I talk a lot about computer vision because I think it’s a core enabling technology for vastly more efficient understanding and use of the world around us. When a computer can see, it can apply its intense analytical powers to the images and offer insights humans can’t always match. Plus, when combined with actuators, computers can direct things in the real world to respond immediately to the data they “see.”
Thus, computer vision is a huge stepping stone to the promise of the internet of things. John Deere’s purchase this week of Blue River Technology, a company that makes a computer vision system to identify weeds on farms, is an excellent example of this in action.
With this acquisition, John Deere is adding what Willy Pell, director of new technology at Blue River, calls “real-time perception” to the reams of data the ag firm already provides. This perception comes in the guise of computer vision. The tractors can now pull a trailer behind them that snaps pictures of each plant and prescribes certain actions like dropping pesticide on it. By automating the task, John Deere can offer farms a weed-killing solution that scales cheaply and performs the same way every time while treating each plant individually.
Computer vision is going to pop up everywhere, in part because as humans we are incredibly visual. If dogs were building the internet of things, I bet they’d build sensors that could detect the chemicals that comprise various scents and then translate that back into code a computer could read. While dogs would likely focus on pheromones, we focus on pixels.
And this is an important thing to remember: computers don’t see like we do. Every image is translated into pixels with data associated with each. The computer then applies math to figure out distances between feature points and determines what it is seeing. Right now, a lot of the focus is on teaching computers to use videos, which a computer reads as “flat.” While we can look at a video of an office and estimate a building’s depth, or at least infer it has depth, a computer doesn’t necessarily do that. That’s why facial recognition using cameras can be spoofed by a photo or makeup that disguises contours.
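To make the "pixels plus math" idea concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the tiny grayscale grid, the brightness threshold standing in for a feature detector, and the helper names `bright_pixels` and `pixel_distance`. Real vision systems use far more sophisticated feature detection.

```python
import math

# A tiny 3x4 grayscale "image": each pixel is a brightness value from 0 to 255.
# (Hypothetical data, made up for this example.)
image = [
    [0,   0,   0, 0],
    [0, 255,   0, 0],
    [0,   0, 255, 0],
]

def bright_pixels(img, threshold=200):
    """Return (row, col) positions of pixels brighter than the threshold.

    This is a crude stand-in for a real feature detector.
    """
    return [
        (r, c)
        for r, row in enumerate(img)
        for c, value in enumerate(row)
        if value > threshold
    ]

def pixel_distance(p, q):
    """Straight-line distance between two pixel positions, in pixels."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

points = bright_pixels(image)
print(points)
print(pixel_distance(points[0], points[1]))
```

Note that the distance here is measured across the flat image, in pixels. Nothing in this data says how far away either point is from the camera, which is exactly the missing depth problem.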
Computers need depth perception to see as well as humans. With self-driving cars, consumer products like the Lighthouse personal assistant, some drones and even the anticipated 3-D sensor on the iPhone, computer vision with depth is hitting the mainstream. So I thought I’d show the picture above, which is a drone mapping out the world using double cameras in stereo, and explain the different ways we’re giving computers depth perception.
Old-school depth perception is basically like a moving version of a View-Master. It requires two cameras set side by side, plus processing power and algorithms to handle the math that turns the two camera images into a sense of depth. When seen represented on a monitor, the edges of things are softer and less defined. In some use cases, especially as cameras get cheaper and need less processing power, this can suffice. For example, some drones could use this.
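The core of that stereo math is simple triangulation: an object appears shifted between the left and right images, and the bigger the shift (the "disparity"), the closer the object is. Here is a minimal sketch in Python, assuming an idealized pair of identical, perfectly aligned cameras. The function name and the example numbers are mine, not from any particular drone or camera.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Estimate distance to a point seen by two side-by-side cameras.

    focal_length_px: camera focal length, expressed in pixels
    baseline_m:      distance between the two cameras, in meters
    disparity_px:    how many pixels the point shifts between the two images
    """
    if disparity_px <= 0:
        # Zero shift means the point is effectively infinitely far away.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical rig: 700-pixel focal length, cameras 10 cm apart.
near = depth_from_disparity(700, 0.10, 35)  # large shift, so the point is close
far = depth_from_disparity(700, 0.10, 7)    # small shift, so the point is far
print(near, far)
```

With these made-up numbers the first point comes out at roughly 2 meters and the second at roughly 10 meters. The hard part in practice is not this formula but matching the same point in both images, which is where the processing power goes.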
For everything else, there are 3-D depth sensors. They come in three different types. A familiar type is the laser range finder, which shoots out calibrated laser beams and records what the lasers bounce off of. It’s like sonar for light. This is the type of sensor found in LIDAR. They are extremely accurate at most things, but they are also expensive and require moving parts.
The other two types also use light. One, which generated the image above, is called a structured light camera. It works by sending out a known pattern of light, usually in infrared. The camera then “sees” by figuring out how the pattern was disrupted. The first well-known structured light 3-D sensor was probably the Microsoft Kinect, which launched in 2010. These are cheaper, but they don’t work well outside.
The final light sensor is a time-of-flight camera, which shoots out precisely timed bursts of light and then measures how long it takes them to come back. It calculates the difference between returning pulses to generate a sense of the shape of an object in front of it. These sensors are similar to what might be used in the next-generation iPhone because they work well in a variety of lighting situations but aren’t as expensive as a laser range finder.
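The arithmetic behind time of flight is short enough to sketch. Light travels out and back, so the distance is the speed of light times the round-trip time, divided by two. The Python below is a simplified illustration under that assumption only. The function names and the 2x2 grid of timings are hypothetical, and real sensors typically measure these tiny time differences indirectly rather than with a stopwatch.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458

def distance_from_round_trip(seconds):
    """Distance to a surface, given how long a light pulse took to go out and return."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / 2

def depth_map(round_trip_times):
    """Convert a grid of per-pixel round-trip times (seconds) into distances (meters)."""
    return [[distance_from_round_trip(t) for t in row] for row in round_trip_times]

# Hypothetical 2x2 sensor readings in nanoseconds. One pixel's pulse
# comes back a nanosecond sooner, so that spot is closer to the camera.
times_ns = [
    [20.0, 20.0],
    [20.0, 19.0],
]
depths = depth_map([[t * 1e-9 for t in row] for row in times_ns])
for row in depths:
    print([round(d, 2) for d in row])
```

A 20-nanosecond round trip works out to about 3 meters, and a one-nanosecond difference is only about 15 centimeters, which gives a sense of how precise the timing has to be to pick out the shape of an object.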
As computers gain depth perception, they can become more accurate at a variety of tasks, from robots that can better manipulate objects and perform complicated tasks, to cameras, to high-quality biometric security systems.
And what is the IoT really, except the search for better data and ways to manipulate it?
Update: This story was updated on 9-12-2017 to correct the spelling of Lighthouse and to note that lasers also use light. I could really use an editor in cases like that.