Voice has been one of the most successful technologies of the last decade, especially in the smart home. Thanks to innovations in natural language processing (NLP) between 2012 and 2015, voice became ubiquitous on our phones and in our homes. But now, with news that Amazon is downsizing its Alexa business and that Google is questioning what it can eke out of its Google Assistant, it’s time to take a hard look at how to make money in voice and what Amazon’s and Google’s struggles mean for the smart home.
First up, voice and the smart home are related but entirely separate. Siri launched in 2011. IBM’s Watson was playing Jeopardy back then, too. Thanks to hard work on speech-to-text and NLP software, we could talk to our phones and have them understand us — both to take dictation and to complete programmed tasks. But speech wasn’t transformative on the phone, partly because the phone already had a convenient, established user interface in touching and tapping. Many people were impressed, but talking to your phone to set an alarm or a reminder was still clunky and never saw widespread use. It was a trick, not a transformational technology for most.
Voice is a UI, not a platform
But the need for a new interface became clear when we started adding connected devices to our homes. I saw it back in 2014 when Amazon launched Alexa, because I already had a home full of devices by then. Others were not so sure. Even Kevin doubted my enthusiasm. Voice was intrusive and still somewhat glitchy. Then, when Amazon launched its smart home capabilities in the spring of 2015, voice seemed to have found its killer app. Or so everyone thought.
But voice isn’t the smart home. And Amazon’s Alexa layoffs and losses and Google’s issues aren’t an indictment of voice. They merely show that no one has figured out how to monetize a digital assistant. Voice is the user interface; the digital assistant is the platform. And monetizing a user interface, rather than a platform, is a low-margin endeavor. It’s as if we’re confusing the touchscreen we use to navigate our phones with the app store itself.
And because voice is going to be an essential way people communicate with ubiquitous computers, we have to get voice right. It’s not going anywhere. But trying to make money on it outside of the traditional ways companies monetize user interfaces is a mistake. Logitech sells keyboards and mice. Apple and Google have operating systems that translate taps on touchscreens into instructions. Amazon, Google, and other companies can sell far-field microphones embedded with NLP software to provide a voice interface.
Divorcing voice from the platform
But unlike touchscreens, voice faces high barriers around standardization and understanding intent, which make it harder to divorce from a platform or OS. Alex Capecelatro, the CEO of Josh.ai, a company that builds a voice interface for custom integrators, points out that with voice there are two layers of communication. The first is the actual words, and the second is the intent behind them.
“When you’re dealing with apps, you have a dedicated destination you want to get to,” he said. “With voice, how do you prioritize words and voice commands? How does the system do the right thing, and how does the user have control over where a command goes?” In his example, asking a voice interface to turn off the light requires the interface to know not just what to do, but also which program to ask to do it. Today, voice interfaces use integrations such as Alexa Skills to resolve intent in some cases, like knowing which light to turn off; in others, such as setting an alarm or answering a factual question, the assistant chooses its own source of information or action.
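To make that two-layer problem concrete, here’s a minimal sketch in Python of how an assistant might separate the words from the intent and then decide where a command should go. Everything here is hypothetical — the names and routing rules aren’t any vendor’s actual skills API — but it illustrates the choice Capecelatro describes between handing a command to a third-party integration and handling it with the assistant’s own built-in logic.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: capturing the words is the easy layer; deciding
# which program should act on them (the intent) is the hard one.

@dataclass
class Intent:
    action: str            # e.g. "turn_off", "set_alarm", "answer_question"
    target: Optional[str]  # e.g. "kitchen light", or None if no device is involved

def parse_intent(utterance: str) -> Intent:
    """Naive keyword matching standing in for real speech-to-text plus NLP."""
    text = utterance.lower()
    if "turn off the" in text:
        return Intent("turn_off", text.split("turn off the", 1)[1].strip())
    if "alarm" in text:
        return Intent("set_alarm", None)
    return Intent("answer_question", None)

def route(intent: Intent) -> str:
    """Pick the 'program' that should handle the intent: a third-party
    integration (a skill) for device control, or the assistant's own
    built-in handlers for alarms and factual questions."""
    if intent.action == "turn_off" and intent.target:
        return f"lighting integration: turn off the {intent.target}"
    if intent.action == "set_alarm":
        return "built-in handler: set an alarm"
    return "built-in handler: look up an answer"

if __name__ == "__main__":
    for phrase in ["Turn off the kitchen light",
                   "Set an alarm for 7am",
                   "How tall is Mount Everest?"]:
        print(f"{phrase!r} -> {route(parse_intent(phrase))}")
```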