I don’t know about y’all, but I’m hitting that point in the summer where I just want to hang out, read books, and drink a cold beverage while watching the sunset. Luckily, the days are long, so I get to spend time relaxing (at least until the wind shifts and smoke from the nearby wildfires hits) and thinking about the future of the IoT.
This week, I’ve been thinking about trust. That’s in the wake of reading one story about the police in Chicago changing data in the ShotSpotter gunshot detection system and another about a researcher who was struggling with the Apple Watch algorithms and how they might affect his work. Both stories center on trust, a crucial element that gets largely lost in the IoT — specifically, trust in the data and trust in the algorithms.
We talk a lot about security and privacy in the IoT, but very little about how data is gathered, who has access to it, how it’s authenticated, and how it’s kept from being compromised. Meanwhile, when it comes to algorithms, we don’t always know where the training data came from, how the algorithms were built, or how they change in response to real-world experience.
The answers to these questions have huge implications because the data and algorithms are driving public policy. They can also be used to determine credit scores, inflate the price of goods — even determine who gets health care. Not to mention that some data can drive machines to action, such as in the case of an irrigation system or manufacturing plant.
It’s why we need to develop methods for data assurance and attestation so we understand how a sensor or device generates data. We also need to build chains of custody for data as it moves throughout computing systems. And then we need to figure out how to build algorithms in ways that are both replicable and transparent.
What might this look like? First, let’s consider two of the most recent examples of a trust failure related to data. With the ShotSpotter system, court filings allege that the Chicago Police Department changed the classification of certain sounds in the system from fireworks to a gunshot. Later, another analyst working for ShotSpotter changed some of the location information to corroborate a story the police department told about the police shooting of a 13-year-old boy. From the Vice article:
But after the 11:46 p.m. alert came in, a ShotSpotter analyst manually overrode the algorithms and “reclassified” the sound as a gunshot. Then, months later and after “post-processing,” another ShotSpotter analyst changed the alert’s coordinates to a location on South Stony Island Drive near where Williams’ car was seen on camera.
“Through this human-involved method, the ShotSpotter output in this case was dramatically transformed from data that did not support criminal charges of any kind to data that now forms the centerpiece of the prosecution’s murder case against Mr. Williams,” the public defender wrote in the motion.
The prosecutors working for the state decided to withdraw the ShotSpotter evidence rather than explain the changes.
In the second example, JP Onnela, associate professor of biostatistics at the Harvard T.H. Chan School of Public Health, wanted to use data from the Apple Watch showing heart rate variability (HRV). He chose data from a period between early December 2018 and September 2020, and pulled it twice, seven months apart. When he compared the two pulls, he discovered that the HRV results were statistically different. In other words, the way the Apple Watch calculated heart rate variability had changed — and had changed enough to make Onnela question his use of the Apple Watch for his research.
Most doctors and researchers are aware of the foibles of consumer wearable devices, namely their lack of accuracy and their changing algorithms. But as Apple, Google, and Amazon continue pushing these devices for wellness and even on-the-job monitoring, it’s worth understanding how those algorithms change and who those changes might advantage.
When it comes to the data itself, we first need ways to ensure that a sensor is calibrated correctly and that it isn’t compromised. The National Institute of Standards and Technology (NIST) and several other standards bodies exist to ensure that sensors meet the required specifications, but not all sensors meet those standards. Second, the IoT system needs a way to ensure that the data it ingests comes from an authorized sensor that is telling the truth.
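One way to get that second assurance is for the sensor to sign each reading with a key it holds, so the ingesting system can verify both origin and integrity. Here’s a minimal sketch in Python using an HMAC over the reading; the key name, field names, and `mic-07` sensor ID are all hypothetical, and a real deployment would use per-device keys provisioned in secure hardware rather than a shared constant:

```python
import hashlib
import hmac
import json

# Hypothetical shared secret provisioned to the sensor at manufacture time.
# In practice this would live in a secure element, one key per device.
SENSOR_KEY = b"example-device-key"

def sign_reading(reading: dict, key: bytes = SENSOR_KEY) -> dict:
    """Attach an HMAC tag so the backend can check that the reading
    came from an authorized sensor and wasn't altered in transit."""
    payload = json.dumps(reading, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"reading": reading, "tag": tag}

def verify_reading(signed: dict, key: bytes = SENSOR_KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    payload = json.dumps(signed["reading"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["tag"])

signed = sign_reading({"sensor_id": "mic-07", "sound_db": 92.1})
print(verify_reading(signed))          # a genuine reading verifies
signed["reading"]["sound_db"] = 40.0   # tampering breaks the tag
print(verify_reading(signed))
```

Signing proves the data wasn’t changed in transit; it says nothing about whether the sensor itself was calibrated or honest, which is why the standards work above still matters.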
Then we need chains of custody that ensure the sensor data isn’t changed within a system. And finally, we need ways to audit the algorithm processing the data to ensure that it is fair and that it meets the public objective. In the ShotSpotter example, a clear chain of custody might have prevented analysts from reclassifying data, or at least would have forced ShotSpotter to explain why that happened.
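A chain of custody can be as simple as a tamper-evident log: each record carries a hash of the one before it, so any after-the-fact edit to an earlier entry breaks every link that follows. A reclassification can still happen, but it has to be appended as a new, attributable record rather than silently overwriting the old one. This is a toy sketch, not any vendor’s actual system; the event fields are invented for illustration:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record

def _hash_record(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_record(chain: list, event: dict) -> None:
    """Append an event, linking it to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"event": event, "prev_hash": prev_hash}
    chain.append({**body, "hash": _hash_record(body)})

def verify_chain(chain: list) -> bool:
    """Walk the chain; any edited or reordered record breaks a link."""
    prev_hash = GENESIS
    for record in chain:
        body = {"event": record["event"], "prev_hash": record["prev_hash"]}
        if record["prev_hash"] != prev_hash or record["hash"] != _hash_record(body):
            return False
        prev_hash = record["hash"]
    return True

chain = []
append_record(chain, {"classification": "firework", "source": "algorithm"})
append_record(chain, {"classification": "gunshot", "source": "analyst override"})
print(verify_chain(chain))              # the honest history verifies
chain[0]["event"]["classification"] = "gunshot"  # rewriting history...
print(verify_chain(chain))              # ...is now detectable
```

The design choice here is that corrections are visible additions, not invisible edits — exactly the property that was missing when classifications and coordinates could be changed months after the fact.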
ShotSpotter has denied that its analysts changed evidence to fit a police narrative, noting instead that it always has analysts create a separate record for court proceedings and that Vice, in its story, conflated the two separate events. However, the statement from ShotSpotter doesn’t address Vice‘s characterization of the original misclassification: turning fireworks into a gunshot.
The point is that right now, we like to think of data as holding some eternal truth, when in fact it is as biased as the people trying to use it to set policy, monitor gunshots, or promote our health. Without mechanisms to establish trust in the sensors, the data, the way that data gets turned into an insight, and the algorithms themselves, objective, data-driven decisions are as much a chimera as objective journalism.