A Look Into the Future of Speech Recognition from AT&T R&D Labs
If you thought Google’s Voice Actions, which sends your speech up to the cloud for recognition and pushes the results back down to your device, is cool, then you should be pretty impressed with what the folks over at AT&T Labs are working on in the field of speech recognition. Most of these projects are based on AT&T Watson, a real-time speech recognizer that, like Google Voice Actions, runs in the cloud, so as Watson’s technology improves, so do all of the services using it.
Speech recognition technology has steadily improved over the years, but it still tends to be hit-and-miss. Technology like this matters not only because of the enormous role it can play in helping people with disabilities, but also because it can transform the way we interact with everything from our cell phones to our televisions. At its labs in New York, AT&T invited members of the media to check out some of the projects it is building around Watson. Some of these projects you can try out yourself today via a mobile app, while others may never see the light of day outside the lab.
What’s in the Crystal Ball
iWalk is a navigation service that offers directions and spoken feedback to people with low vision, who can’t rely on street signs alone.
Another AT&T Labs scientist is using Watson to power text-to-speech on an e-reader (or an e-reader desktop application), so that your ebooks or RSS feeds can be read aloud to you while you’re doing something else, whether that’s cooking or anything else. You can even speed the reading up or slow it down, or have it skip around for quick summaries. AT&T Labs is also working on making the synthesized voice sound as natural as possible.
iRemote is a potential smartphone or iPad app that would let you search through your TV’s program guide by voice. Because the system can go back and transcribe archived shows, you would be able to search for video using natural speech and detailed specifications about what you’re looking for. For example, you could say “Search for Obama joke on Jay Leno from two weeks ago,” and it would search for it in shows that have already aired, or possibly on your DVR.
iMiracle is an application that automatically transcribes speech in live TV segments into closed captions, and it can even translate the speech into other languages for those captions. The system also indexes the transcribed text so that it can be searched later.
Voicemail-to-text is certainly nothing new, but the voicemail-to-text app that AT&T is working on is much more accurate than what’s currently on the market, and it even manages to add appropriate formatting, punctuation and capitalization.
iPizza is a concept app that allows users to order complicated items from a pizza menu with their voice.
What’s Available Today
Vlingo is an app that is readily available for Android, BlackBerry, Nokia and iPhone. It lets you search the web, find directions, update your social status, and send emails and text messages to contacts using your voice. Of course, this makes using your phone on the road that much safer.
Have2Eat is an existing iPhone app that summarizes user-generated web reviews to help you find a good restaurant that suits you and your mood.
Speak4it is an app available for the iPhone and iPad that uses a “push to speak” button so that users can say what they’d like to find. It also lets users point to a spot on a map and ask what’s around there.