Controlling a device with your voice is certainly nothing new. It’s been implemented time and time again and the execution has ranged from terrible, to gimmicky, to just ok, but it has never taken off. It’s been the sort of thing that you show off at family gatherings to impress older relatives but you never end up using.
It has existed on mobile devices for a while, but with very limited capabilities. Feature phones could only do things like call up a number. Then Google brought it to Android with Voice Search and Voice Actions, expanding its capabilities. Now, Apple has officially taken a big step into voice control with iOS 5, the iPhone 4S and its personal assistant Siri. Could this mean that we will soon start using our devices in a different way?
Apple didn’t invent the artificial intelligence (AI) and voice recognition technology that makes Siri happen. In fact, Siri on iOS is the result of Apple’s purchase of the company of the same name, Siri, and the use of third-party voice recognition technology (probably Nuance). But Apple has an incredibly successful history of doing exactly this: taking an existing but flailing technology, ironing out the quirks and making it the norm. The Macintosh made the mouse popular, the iPhone revolutionized touch-screen technology and Siri could be the beginning of a new way to interact with our devices.
Google’s steps in voice recognition and voice control
Mobile phones are the ideal test-bed for voice control: their built-in microphones offer great audio pick-up and sound quality (unlike desktop microphones). Google introduced Voice Search for smartphones back in November 2008, and in a show of extreme ‘goodwill’, the application was first released on the iPhone and not its own Android, which saw Voice Search in February 2009.
January 2010 saw the release of Android 2.1, which included voice input for dictation: you could input messages and emails by talking to your phone. In August of the same year, Google finally released Voice Actions for Android, its first voice control application. Note that third-party applications offering voice control, such as Vlingo and Speaktoit, also exist on the Android Market.
Google was claiming in 2010 that one in four searches on Android were being done by voice, and has been incrementally improving its speech recognition technology. In December 2010, Google also acquired Phonetic Arts, a British company that specializes in speech output, so that when your device or computer speaks back to you, it will hopefully sound more natural and less mechanical. Google has even recently introduced speech-to-speech translations for its Android phones.
Siri understands the way you talk. Siri is artificial intelligence first.
However, Google has so far relied on you learning the machine’s ‘language’ in order to convey your commands. We’ve gone over Google Search Voice Actions in detail: although they are intuitive and simple, they do require a short learning curve, and there is only one right way to say things. As it stands, Google’s approach lacks the artificial intelligence that Siri has, which works with more natural language.
Siri was the result of years of research run by the Stanford Research Institute under the CALO (Cognitive Assistant that Learns and Organizes) project, funded by the U.S. military research agency DARPA. Siri was a ‘lowly’ but up-and-coming app available for the iPhone when it was picked up by Apple for around $200 million in April 2010.
Since acquiring Siri, Apple has worked hard on the technology, integrating it into iOS 5 and finally releasing it together with the iPhone 4S. Like Google’s implementation, it is embedded in the OS and can be launched by holding one button. Unlike Google Voice Actions, Siri can understand natural language expressions. There is no special dictionary that you have to learn, and the internet is already full of witty examples of how Siri responds to all kinds of ‘odd’ requests.
If you haven’t already seen it, here is a video by Apple on Siri:
Competition will make things better
Siri, in its current form, is of course not the end of it all. It has been labeled as beta, a rare move from Apple. It’s an incremental step in the evolutionary process. The amounts of data that Siri’s use is going to produce will definitely help Apple make it better and more useful, though in our humble opinion it needs a voice makeover. For now, Siri also supports only English, French and German.
On the other side, Android already has a competing set of “personal assistant” apps, like Speaktoit, though none as polished as Siri. We would expect the response to come directly from Google, which has been the pioneer until now. Competition from the big boys is what will really push things forward. And of course there is Microsoft, which can’t afford to be left lagging, again.
Apple co-founder Steve Wozniak said at an event in Silicon Valley in January this year that voice recognition is computing’s next frontier. He went on to add that we should look for more robust voice recognition to take hold in the realm of personal computing and that it would be able to interpret a wider variety of commands.
Voice control is evolving now: mobile phones offer a controlled environment and have plenty of processing power for some serious AI magic. No one can guarantee it will ‘happen’ anytime soon, but it’s definitely the right time for the technology to build up steam. Stay tuned to this space for the future of computer input.
Some interesting external reading on voice control and Siri: