Voice-Driven Search: Talk to the TV
By Monta Monaco Hernon - For years, people who talked to the TV were admonished, "It can't hear you, Honey." But now, with voice ...
Video content search, discovery and recommendation continue to be hot topics for cable operators who realize that viewers can easily become frustrated while trying to navigate through myriad programming choices to find something to watch. Voice recognition technology is emerging as a way to help customers request specific content or ask for choices to be presented to them, said Kenn Harper, VP of devices and ecosystem, Nuance Mobile (NASDAQ:NUAN).
"Scrolling menus and trying to type in a movie (name) is an anxiety-producing experience. It takes time using the small buttons, and (someone) might not know what they are looking for," said Harper, who will speak on April 17 at the NAB Show in Las Vegas. His talk, "Improving Content Search and Discovery with Voice Recognition Technology," will be from 10:30 to 11 a.m.
"There isn't a good interface (with) how content catalogues are structured. There isn't an easy way to find content you are looking for. Speech (recognition) can be used to solve a problem that has plagued the TV industry and cable operators for years," Harper said.
While voice recognition can benefit consumers by presenting an alternate and perhaps easier method of searching for content, it also benefits cable operators.
"A large percentage of queries are related to searches for content that has to be paid for - a premium channel or a movie that costs $5.99," Harper said. "Operators aren't just doing this because they are improving usability. By having an easier interface to get to content, they are eliminating barriers to drive more paid content."
How it Works
Voice recognition technology can help with two categories of tasks: TV control and TV content search. The former involves finding a specific channel or program currently airing on television, while the latter involves searching the on-demand catalog.
The first step is audio acquisition. The mechanism for this is generally in the remote, which records the request and streams it to the set-top box and/or the cloud. Next comes speech recognition. Background noise and other talking need to be filtered out, but accuracy also depends on a model that can interpret speech patterns, dialects and even languages. Since the voice recognition here is specific to viewing content, the model can be optimized for how people tend to structure commands. The third component is integration with the operator's on-demand database. While some TV control can be done locally on the set-top box, more complicated content searches are often processed in the cloud, Harper said.
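As a rough illustration of that last point, the split between local TV control and cloud-based content search can be sketched in a few lines of Python. This is not Nuance's or any operator's implementation: the command phrases, the `Route` type and the `route_request` function are all hypothetical, and the sketch assumes the speech recognizer has already turned the audio into text.

```python
from dataclasses import dataclass

# Hypothetical phrases that mark a TV-control request (assumed for illustration).
CONTROL_PREFIXES = ("tune to", "go to channel", "switch to")


@dataclass
class Route:
    category: str  # "tv_control" or "content_search"
    handler: str   # "set_top_box" or "cloud"


def route_request(transcript: str) -> Route:
    """Decide where an already-recognized request should be handled."""
    # Normalize case and collapse repeated whitespace before matching.
    text = " ".join(transcript.lower().split())
    if text.startswith(CONTROL_PREFIXES):
        # Simple control commands can be handled locally on the set-top box.
        return Route("tv_control", "set_top_box")
    # Anything else is treated as a catalog search and sent to the cloud.
    return Route("content_search", "cloud")


control = route_request("Tune to  channel 5")
search = route_request("Find comedies with Bill Murray")
```

Here `control` is routed to the set-top box and `search` to the cloud. A real system would rely on the speech model's understanding of the request rather than on fixed prefixes.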
As operators build personalization frameworks, biometrics can be used within voice recognition to determine who within a household is making a request. "The recommendations (the operator) serves up will be personalized to you and not your child (for example)," Harper said.
Nuance has seen a 70% repeat user rate. In other words, of the viewers who use voice recognition to search for content once, 70% use it again, Harper said. "This stickiness is unprecedented. TV is one of the most sticky experiences for voice that we have ever seen. People (are still) using it six months after they first used it."
Additionally, active users of voice recognition for content search are using it 80 times per month.
"They are not just using it once Friday night. People are using this repeatedly throughout the course of the month. This is a pretty significant metric that we are seeing," Harper said.