Voice-controlled technology is changing human-computer interfaces in the home and across a broad spectrum of industries, from medicine and banking through to business applications and industrial robotics. According to Mary Meeker’s 2017 Internet Trends Report, 20% of mobile searches are now made with voice, and 11 million Amazon Echo devices have been installed in US homes. The advances made in this space over the past 12 months are immense, and there’s little doubt that voice assistants are here to stay. Over the next few years, the technology will become integral to our lives in ways we can’t begin to imagine, and by the time our children are adults, it’s likely that text-based discovery will be long gone. But this certainty doesn’t detract from three big issues that we see with voice controls, which the tech giants and businesses rapidly need to overcome…
The first issue has been on my mind for a while, and a recent article in Street Fight Mag entitled ‘Voice Bots Have One Big Problem: Human Behaviour’ confirmed the scale of the problem. While we might be happy to ask Google or Alexa “what is the weather forecast for tomorrow?”, “find a dentist near me” or “send a text message to Charlotte”, other more personal queries such as “where can I buy the morning-after pill?” or “how do I treat Athlete’s Foot?” would prove problematic for the average person, particularly when out and about or in an office setting. Even in the security of your own home there will be searches that you won’t want your children or partner to hear, and this is why text-based technology remains so strong. As the Street Fight article argues, “the social buffer that text interfaces provide allow even the shyest among us to write things we’d never dream of saying.”
So how do we overcome this problem and allow voice-controlled technology to move forward? The sophistication of such technology will undoubtedly improve, particularly in its ability to understand natural language, but until there’s a way to speak to a voice bot in a silent whisper and be clearly understood, human nature will opt for text-based search in many instances. While there are innovations like Hushme, “a personal acoustic device that protects speech privacy in open space environments”, I don’t imagine many would want to be seen wearing one in public.
The only realistic option is for the big smartphone and voice assistant companies to focus on improving microphone technology so that it can detect the tiniest whisper.
Voice systems have been built for standard speech, without enough consideration for speech impairments or strong regional accents. To be accessible to all, it’s critical that the artificial intelligence software used is able to learn from speech that isn’t easily intelligible.
A small amount of innovation is happening in this space. For example, a Tel Aviv-based startup called Voiceitt has recently raised $2 million in seed funding to translate impaired speech into clear words. The app works by asking users to compose and then read short, useful sentences out loud, such as “I’m thirsty” or “turn off the lights”. The software records these and begins to learn the speaker’s particular pronunciation. A carer can type phrases into the app if the user is not able to do so independently. After a brief training period, the Voiceitt app can turn the user’s statements into normalised speech, which it instantly outputs as audio or text messages that can in turn be used by voice-controlled technology.
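As a rough illustration of the train-then-translate loop described above, here is a minimal sketch. To be clear, this is not Voiceitt’s actual method: the class, its names and the simple string-similarity matching are all hypothetical stand-ins for what would really be acoustic modelling.

```python
import difflib

class PersonalSpeechMap:
    """Hypothetical sketch: map a user's own renderings of short
    phrases back to canonical text after a brief training period."""

    def __init__(self):
        self.examples = {}  # user's rendering -> canonical phrase

    def train(self, user_rendering, canonical_phrase):
        # During training, the user (or a carer) labels each recording.
        self.examples[user_rendering] = canonical_phrase

    def translate(self, utterance, cutoff=0.6):
        # Find the closest trained rendering; return its canonical phrase,
        # or None if nothing is similar enough.
        match = difflib.get_close_matches(utterance, self.examples,
                                          n=1, cutoff=cutoff)
        return self.examples[match[0]] if match else None

m = PersonalSpeechMap()
m.train("ahm thuhsty", "I'm thirsty")
m.train("tun off da lites", "turn off the lights")

print(m.translate("ahm thusty"))        # → I'm thirsty
print(m.translate("tun off de lites"))  # → turn off the lights
```

The normalised output, whether spoken or sent as text, could then be handed on to an ordinary voice assistant.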
Add-ons such as Voiceitt could be a good short-term fix, but in the long run, the voice assistant companies themselves need to be exploring speech impairments within their own AI development. From a business perspective, this will also be critical in locations where there are strong regional variations in accent, such as Scotland.
Advances in the AI technology behind voice assistants are balanced against significant security and privacy concerns. Currently, any voice request leaves a trail of breadcrumbs. The query, paired with a unique device ID, is sent to a giant server farm where the answer is sought and returned to the individual’s device. These voice queries are saved: Apple, for example, stores Siri requests with device IDs for six months, then deletes the ID and keeps the audio for another 18 months. This information, when combined with other personal data such as location, online purchase history and browsing habits, can paint an accurate picture of an individual. It helps voice assistants to tailor and personalise results, but it is also concerning from a data security point of view.
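To make the two retention stages concrete, the policy described above can be sketched as a simple timeline check. The function, constants and day counts below are illustrative approximations of the six- and eighteen-month periods, not Apple’s implementation.

```python
from datetime import date

# Stage 1 (~6 months): audio stored together with the device ID.
ID_RETENTION_DAYS = 183
# Stage 2 (~18 more months, ~24 in total): audio only, ID deleted.
AUDIO_RETENTION_DAYS = 183 + 548

def retention_status(recorded_on, today):
    """Which stage of the retention policy a recording is in."""
    age = (today - recorded_on).days
    if age < ID_RETENTION_DAYS:
        return "audio + device ID stored"
    if age < AUDIO_RETENTION_DAYS:
        return "audio only (ID deleted)"
    return "deleted"

print(retention_status(date(2017, 1, 1), date(2017, 3, 1)))  # audio + device ID stored
print(retention_status(date(2016, 1, 1), date(2017, 3, 1)))  # audio only (ID deleted)
print(retention_status(date(2014, 1, 1), date(2017, 3, 1)))  # deleted
```

The point of the sketch is simply that a query is never just answered and forgotten: it passes through distinct storage stages over roughly two years.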
Google and Amazon are said to take things one step further, going as far as recording ambient conversations within the home. This means that their mics are constantly listening, even when an individual isn’t speaking directly to the device. They do so to learn a user’s vocal idiosyncrasies and to understand spoken requests better, helping them to handle queries instantly. But it’s easy to understand why many are apprehensive about placing a voice assistant in their home.
In most cases, there are ways to switch this listening ‘off’, but not everyone is aware of how to do so. In the future, voice assistant companies will need to find ways to reduce this infringement of privacy.