Machine learning at the edge may be designed for tiny chips, but it is a huge topic. If we can run machine learning models on constrained devices, we can keep data local, which will boost privacy, and we can reduce the energy demands of both the internet of things and AI. And for those who aren’t sold on privacy or energy conservation, machine learning at the edge will also enable some really cool features.
That is why I was pumped to learn more about MCU-Net, a neural network designed to run on a microcontroller. In a conversation with Song Han, assistant professor in MIT’s Department of Electrical Engineering and Computer Science, I learned that his research team along with folks from the National Taiwan University and MIT-IBM Watson AI Lab are also working on two other significant technologies for edge machine learning, or edge ML: building models with less data and training those models on the edge. If all of this research comes together, we could see real-time training of machine learning models on battery-powered devices, which would give us the ability to customize wake words or build sensors that can monitor the environment while also learning what’s normal. The services offered by connected devices could become highly customized — without sending all of your data to the cloud.
First, a bit about machine learning and what it requires. Training neural networks so they can understand speech, recognize objects, play games, or do any number of tasks requires data. That data enables a computer to understand what it’s trying to classify so it can label it or take an action. Usually the computer requires a lot of data that’s already annotated so it can parse, for example, the meaning of objects in photographs or the role a comma plays in a sentence.
All of this data is then fed into the computer, which starts trying to label or “understand” what it has. A researcher tracks the computer’s success and tweaks the resulting model that it comes up with in an effort to achieve greater accuracy. This process is called training, and it takes a lot of computing power and time. The end result is a model that will run inferences. With inferences, when the computer gets new data it runs it through the model to label it. For the last few years, researchers have been working on chips and models that can run on the edge, with only the most ambitious trying to run models on microcontrollers. Training on microcontrollers is seen as mostly a pipe dream.
Back to Han’s research team, which has done some really powerful things that it presented this week at the NeurIPS conference. We’ll start with the data. The team has come up with a method to generate high-quality data for training using only 100 images — and without any pre-training. This will significantly reduce the amount of data required to train models, making it easier, faster, and cheaper to teach a computer what to recognize. The technique is called Differential Augmentation for Data Efficiency and Training, or DiffAugment.
This is bad news for those of us concerned about deepfakes, but it’s really useful for those who want faster, local person recognition (not detection, but an actual understanding of who is at the door or in the home). After a week of a camera taking shots of you and your family, for example, it has enough data to train and then start inferring exactly who is in the home, adding new people as it encounters them over time — all without the data leaving the device.
When it comes to training on the device, however, researchers need something else. They need Tiny Transfer Learning, or TinyTL. Training a neural network uses a lot of memory and a lot of parallel processing, which is why GPUs are so good at it. But TinyTL reduces the memory required to train a neural network by up to 13 times. That means we could soon have on-device learning, which lets us make the example above in which a camera learns who people are in a home locally and quickly. Currently companies tend to develop a model, deploy it, and then tweak it every few months or even weeks, depending on the device. That tweaking happens in the cloud. When the model gets an update, the device gets a software update with the new model, and users can expect to see new features or simply improved recognition.
With TinyTL, a device could generate updated models on the device itself, and it could happen nightly. This means you could train a smart speaker with a new, customizable wake word or change a process on an industrial production line and then quickly change the models running on sensors to adapt to the new process. Training at the edge is really the holy grail of edge ML, because it lets devices learn and react in real time.
Devices aren’t always learning, however. Much like people, much of machine learning is simply matching new input to an existing model to see if it is worth paying attention to. This is called inference, and we’ve seen some very good efforts aimed at bringing high-quality inference to microcontrollers and edge devices. Wake word detection is the most common example. Other examples might include flare detection in factories or tracking tumors on radiology scans. This is where MCU-Net comes into play.
MCU-Net is comprised of two elements: TinyEngine, which is where researchers design the smaller models that will run on an MCU, and TinyNAS, which maps out a particular MCU’s capabilities and builds a neural network architecture that can optimize the particulars of that MCU. A microcontroller might be as constrained as a few bits of memory and a few hundred hertz of performance or it might scale all the way up to a multicore gigahertz device with up to 2 MB of memory. Before you can build the model, you need a tool that can shrink or expand the model for the device it runs on.
So TinyNAS figures out the architecture for the neural network and then TinyEngine builds the model. By using them in tandem, Han has managed to use a $10 microcontroller to classify images with 70% accuracy. He compared the results to Google’s TensorFlow Lite and said it was both more accurate and did the classification 60% faster.
This research is really exciting for those of us who are eager for the insights we can derive from connected sensors and devices. It means those insights will be cheaper, more private, and will come at a lower environmental cost.
This story was updated from the version in the newsletter to reflect the involvement of MIT, the National Taiwan University and MIT-IBM Watson AI Lab in the research.
John Cohn says
Thanks for the nice write up ! It’s good to reconnect! ‘
Stacey Higginbotham says
Hey John! Did you work on this?
John Cohn says
Yes.. I work now wt the MIT-IBM AI Lab where I collaborate with Song Han and his team. You may remember we met many years ago on this project I lead with Paul Brody https://gigaom.com/2014/09/09/lets-discuss-ibms-new-block-chain-internet-of-things-architecture-and-robots/
It’s good to reconnect.