Analysis

We gave computers vision, now we want them to hear

— One day, Alexa might be able to understand when you're cooking based on the sounds you make in the kitchen.

Take a moment to listen to the world around you. Maybe you are listening to a podcast or the sounds of office life filtered through noise-canceling headphones. Or perhaps you’re on a train or lulled by the sound of a dishwasher. Our brains are constantly taking in the sounds around us and giving us useful information.

In the coming few years, computers will also begin to process those noises to understand what’s happening around them so they can modify the environment, improve hearing, and notify us if something is wrong. Much like computer vision was the success story of the last decade in machine learning, the coming one will see computers gain a sense of hearing.

We’re already seeing the possibilities. Last year Amazon announced Guard, a feature of the Amazon Echo devices that listens for and recognizes the sound of windows breaking. This year we should start seeing over-the-counter hearing aids thanks to a law passed in 2017. Those devices will take advantage of noise sensing to get a feel for the environment in which someone is having a conversation and adapt to make the human voice easier to understand.

In 2017 a Carnegie Mellon researcher proposed a multi-sensor device that included noise detection to understand appliance use as a way to figure out what someone was doing in the home.

But it’s about to get a lot better. Ahead of an event about using machine learning on constrained computers, I spoke with a U.K. company called Audio Analytic that has built 700 different sound profiles that can detect everything from a train station to a baby’s cry. In addition to building and licensing sound models for companies that include Qualcomm and smart lighting company Sengled, it has built a polyphonic sound detection score that will help others building sound detection models measure how effective their approaches are.

The machine learning community has robust data sets and ways to measure the effectiveness of speech and image recognition thanks to years of work and a clear understanding of what metrics matter in determining the effectiveness of an algorithm. In speech, for example, we look at the word error rate, or how often the computer messes up the word we said.
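To make the word error rate concrete, here is a minimal Python sketch of how it is commonly computed: count the fewest word substitutions, insertions, and deletions needed to turn what the computer heard into what was actually said, then divide by the number of words actually said. The function name `word_error_rate` and the sample phrases are my own illustration, not anything from Audio Analytic or a particular speech toolkit, and the sketch assumes simple whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); assumes words are separated
    by whitespace and that case and punctuation are already normalized.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing takes i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: a reference word was missed
                curr[j - 1] + 1,     # insertion: an extra word was heard
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    said = "turn on the kitchen lights"
    heard = "turn on the chicken lights"
    print(word_error_rate(said, heard))
```

In this example one of five words is wrong, so the score is 0.2. Because inserted words count as errors too, the rate can climb above 1.0 when the computer hears far more words than were spoken.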

Classifying sounds is trickier. There’s no limit on the sounds that can be made in the real world, as opposed to the natural limitations of a human voice box. Researchers also have to contend with an almost infinite set of potential noises having a meaning. In language, there is a limited set of phonemes that can help shape a model, but the sounds of a mosquito, glass breaking, a refrigerator running, or a dog barking follow no common patterns.

Thus there are ways sound detection can go wrong that are familiar to anyone building a machine learning model and ways unique to the problem of recognizing millions of sounds. Having a score can help developers see how close a model comes to getting it right. This will help the industry move toward a standard.

That standard will help push sound detection and recognition into more places. For example, at the TinyML conference next week, Audio Analytic is showing off sound detection on a board running an ARM Cortex-M0+ processor. This is a tiny, tiny chip used for sensors. Chris Mitchell, CEO of Audio Analytic, says that the models are shrunk down to a few kilobytes. This means we could have a small, battery-powered sensor on a wall that can detect the sound of glass breaking. Or a pair of headphones might come on the market that can “listen” for the sound of an approaching car and reduce their active noise cancellation so a jogger or biker could hear what’s coming.

Mitchell sees the future of sound detection making waves in five different areas.

The first is in safety and security, and it’s already here in Amazon’s Guard or smart products trained to listen for the sound of smoke detectors and let someone know when they are activated.

The second area is in health and well-being. Here we’ll find sensors trained to hear the sound of a baby’s cry, to detect coughing, or even to figure out if someone is snoring.

Another two areas are the detection of external environments for communications, as is needed for hearing aids, and the detection of external environments to improve the delivery of entertainment. As an example of how entertainment could take advantage of better sound detection, Google Nest Hub Max speakers already adjust the sound of music based on the dimensions of the room, but what if the speaker could also detect if you’re running a fan or there’s a video game playing, and adapt accordingly?

Environmental sound cues also play into the final area of value creation — convenience. Sound can help computers derive context that could influence how a smart home responds. For example, if a computer could recognize a set of sounds as highly correlated to cooking, it might brighten the lights in the kitchen.

The possibilities are endless, and by focusing on creating a good metric for building accurate sound detection and by bringing the technology to low-power and constrained processors, we’ll have more options for the industry to play with going forward. In the next decade, giving computers better hearing will mean smarter homes, better hearing aids, and a better experience for us as we navigate an increasingly loud and confusing world.

Stacey Higginbotham

Tags: alexa, Amazon, Audio Analytics, google, Qualcomm

4 years ago
