These days, when marketers and tech companies discuss digital transformation, they’re often talking about 5G deployments or digital twins. Five years ago, when people talked about digital transformation or Industry 4.0, the focus was on data sharing and building out ecosystems.
But even back then it was clear that to make a big impact, any digitization effort would eventually have to extend beyond the factory or enterprise. And to make that happen, a company had to share its data. Figuring out how to do so was initially seen as one of the next hurdles in the IoT, but over the last few years the question seemed to fall by the wayside.
Which is why I was so excited to see that, according to The Information, Microsoft is working on a service that will help companies share data in a secure manner. This service, called Project Oakes, has me hoping that we’re now ready for a conversation about sharing enterprise data so we can make IoT ecosystems real.
Let’s say a manufacturing facility wanted to use sensors to track the health of its machines. The folks inside the factory would have to share data from those machines with the facility’s equipment vendor in order to get repair staff out to the factory before any of the machines broke. In another scenario, if a company wanted to ensure it had a reliable supply of raw materials, it would have to share some of its manufacturing data with its suppliers to make sure the materials were on hand at the factory when needed.
On the machine learning (ML) side, an equipment maker might want a way to group data from machines that reside in factories owned by competing buyers. For example, in the automotive industry a vehicle’s paint job is a pretty guarded secret, in terms of both how the paint is made and how it is applied, because understanding how much paint gets used can tell a lot about the vehicle manufacturer’s production. Thus, the data coming off of painting robots is sensitive.
But the company making the robots would love to get all of this data in aggregate for its own use if it could. So how can parties share this type of sensitive data? I used to think it would involve data contracts and NDAs. But it seems like Microsoft has a different idea with this new Azure service. From a job listing for the service:
“The team started as a small incubation 2 years ago and has already made a buzz within relevant industries, offering a new framework for double blind sharing, modeling, and analysis of data between two or more parties. Using this framework, parties can share raw data into a ‘clean room,’ apply a query, an algorithm or a ML model to the combined dataset and receive the results, but without any party seeing the other parties’ actual data. Our list of excited early adopters already includes significant players and is constantly expanding.”
This isn’t the exact solution for data sharing that I thought would emerge, but it’s solving a problem that the industry has had for a while, and one that was bound to become more pressing once initial rollouts of IoT solutions inside plants became more widespread. Once companies started to see returns on their investment from the low-hanging fruit of adding sensors and automation, it was only natural that they would want to go further. But doing so means sharing more data.
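To make the clean-room idea from the job listing concrete, here is a minimal Python sketch. It is an illustration only, not how Project Oakes works: the `CleanRoom` class, its method names, and the paint-usage figures are all hypothetical, and plain Python does not actually stop anyone from reading the raw rows. A real service would have to enforce that boundary with hardware or cryptography.

```python
from statistics import mean


class CleanRoom:
    """Hypothetical toy clean room: accepts raw data, releases only aggregates."""

    def __init__(self, min_parties=2):
        self._min_parties = min_parties
        self._rows = {}  # party name -> raw values; never returned to callers

    def contribute(self, party, values):
        """Store one party's raw values inside the room."""
        self._rows[party] = list(values)

    def run_average(self):
        """Run an approved query over the combined data and return only the result."""
        if len(self._rows) < self._min_parties:
            # With a single contributor, the aggregate would expose that party's data.
            raise PermissionError("not enough parties to hide any single contributor")
        combined = [v for rows in self._rows.values() for v in rows]
        return mean(combined)


# Made-up litres of paint per vehicle from two competing automakers.
room = CleanRoom()
room.contribute("automaker_a", [4.0, 4.5, 5.0])
room.contribute("automaker_b", [6.0, 6.5])
result = room.run_average()  # the robot maker sees this number, not the rows
```

Even in this toy version, a party that knows its own numbers can work backward from a two-party average, which is one reason real systems tend to add stricter limits on which queries are allowed.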
Now, I will say that Microsoft has been pitching ways to share data for years. As far back as 2009 I was talking to folks there about data marketplaces that never materialized. Last year, I even included data sharing as one of my failed predictions for the IoT.
But Microsoft isn’t alone in pitching ways to shield data while also sharing it. In the machine learning world, there’s a growing body of research on federated machine learning, where researchers try to build models on top of several clusters of data without sharing the data. To do this, researchers try to build a single model that will deliver consistent results across isolated pools of data. This could be used for building algorithms that use pools of patient data or building algorithms on top of data that must stay stored in specific geographic regions.
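One common way to do this is federated averaging: each site trains on its own data, and only the model weights travel to a central server to be averaged. The sketch below is a deliberately tiny version in pure Python, assuming a one-parameter model `y = w * x` and made-up data. The function names, learning rate, and round counts are assumptions for illustration, not taken from any particular framework.

```python
def local_update(w, data, lr=0.01, steps=20):
    """Run a few gradient-descent steps for y = w * x on one site's private data."""
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(sites, rounds=30):
    """Train a shared weight; only weights and sample counts leave each site."""
    w = 0.0
    total = sum(len(data) for data in sites)
    for _ in range(rounds):
        # Each site starts from the current global weight and trains locally.
        local = [(local_update(w, data), len(data)) for data in sites]
        # The server combines the results, weighting each site by its sample count.
        w = sum(w_i * n for w_i, n in local) / total
    return w


# Two isolated pools of (x, y) pairs that both follow y = 2x.
site_a = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
site_b = [(4.0, 8.0), (5.0, 10.0)]
global_w = federated_average([site_a, site_b])  # converges toward 2.0
```

Neither site ever sees the other's rows. Weights can still leak information about the underlying data, so this is usually paired with other protections in practice.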
Data masking is also getting more attention as a way to preserve secrecy around data. Data masking includes techniques to hide the type of information being stored or transferred as well as ways to confuse observers about what that data may indicate. Encryption is the most common form of data masking, but masking can go further.
For example, adding noise to a flow of packets from a sensor can help disguise when the sensor sends information about its state. This might be important if the sensor is monitoring whether a secured door is open or closed. Without that noise, a hacker could watch the traffic and surmise that every time the sensor sent data, the door had just been opened or closed.
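One simple form of this noise is constant-rate cover traffic: the sensor sends a same-sized packet on every tick, whether or not anything happened. The Python sketch below shows what an eavesdropper could observe with and without it. The tick model, packet size, and function names are assumptions for illustration, and a real sensor would also encrypt the payload so that dummy and real packets look identical.

```python
PACKET_SIZE = 16  # bytes; assumed fixed length for every packet


def make_packet(event):
    """Pad the event (or nothing, for a dummy packet) to a fixed size."""
    body = (event or "").encode()
    return body.ljust(PACKET_SIZE, b"\x00")


def transmit_schedule(event_ticks, total_ticks, cover=True):
    """Return the ticks at which an observer would see a packet on the wire."""
    if cover:
        return list(range(total_ticks))  # a packet on every tick hides the real ones
    return sorted(event_ticks)           # packets appear only when the door changes


# The door opens at tick 3 and closes at tick 7.
door_events = {3: "open", 7: "closed"}

leaky = transmit_schedule(door_events, 10, cover=False)  # timing reveals the events
masked = transmit_schedule(door_events, 10, cover=True)  # timing reveals nothing
packets = [make_packet(door_events.get(t)) for t in masked]
```

The trade-off is bandwidth and battery life, since the sensor now transmits on every tick instead of only when the door changes state.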
As these techniques become easier to deploy and as services such as Microsoft’s come to market, sharing data without sharing secrets will likely become more feasible. And that will open up new use cases for the IoT and help expand the data available for training algorithms. Maybe my predictions around the need for trusted data sharing weren’t wrong. Maybe they were just early.