In the wake of a second crash of a Boeing 737 MAX airplane, most countries, including the U.S., have ordered airlines to stop flying that particular plane. The deaths of nearly 350 people in two crashes of the same new model over a five-month period are unprecedented in modern times, and the current consensus is that the new 737 MAX is responsible.
The Lion Air crash five months ago was reportedly caused by pilots not knowing how to disengage the plane’s automatic software program, which kept forcing the nose of the plane down as it tried to compensate for information it was getting from a malfunctioning sensor. It’s everyone’s automation nightmare: humans unable to wrest control from a machine intent on its course of action — a course that would lead to the death of everyone on board.
So why am I writing about this? After all, this is an IoT-focused publication, and the planes were not connected to the internet. They were, however, reliant on complicated software systems and a sensor to automate processes. And planes may connect to the internet eventually. Even if they don’t, plenty of connected devices today will eventually get to the level of automation that Boeing’s airplanes have.
I’m fundamentally interested in what we can learn from these disasters as our regulatory agencies try to figure out how to handle increasing levels of complex software and automation in everyday connected products. Many people, for example, used this news to decry the safety of autonomous vehicles. But commercial planes have relied on fly-by-wire and autopilot systems for decades, with the aircraft doing most of the flying and pilots taking manual control only when they need to actually pilot the plane.
The failure here isn’t actually with autonomous systems; it’s with a regulatory regime that let a massive update to an airliner go through without requiring pilot retraining. In the case of the Boeing 737 MAX, the new design had larger engines than the previous 737 model, mounted closer to the front of the plane. Those engines tended to lift the plane’s nose. To compensate, Boeing engineers installed an automated software system called the Maneuvering Characteristics Augmentation System (MCAS), which relied on a sensor to tell the plane when the nose was too high and then adjusted the plane so the nose trended back down.
In the case of the Lion Air flight, faulty data from that sensor kept telling MCAS to lower the nose. But since the nose was actually level, the plane kept trending down. Every time the pilots adjusted the nose back up, the system pulled it back down again. Thanks to deep reporting by the New York Times, we know that this happened 24 times before the plane went down for the final time.
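To make that failure mode concrete, here is a minimal sketch in Python of an automated trim loop that trusts a single angle-of-attack reading. It is only an illustration of the single-sensor problem described above, not Boeing’s actual control law; the threshold, trim step, and function name are invented for the example.

```python
# Simplified illustration (not Boeing's actual control law) of the single-sensor
# problem: an automated trim routine that trusts one angle-of-attack (AOA) reading.
# If that sensor sticks at a high value, the routine keeps commanding nose-down
# trim on every cycle, no matter how the pilots counter it.

AOA_LIMIT_DEG = 15.0   # hypothetical threshold above which the system trims nose-down
TRIM_STEP_DEG = 2.5    # hypothetical amount of nose-down trim applied per activation


def automated_trim(aoa_reading_deg: float, current_trim_deg: float) -> float:
    """Return a new trim setting based on a single AOA reading."""
    if aoa_reading_deg > AOA_LIMIT_DEG:
        return current_trim_deg - TRIM_STEP_DEG  # push the nose down
    return current_trim_deg


# A stuck sensor reports a dangerously high AOA on every cycle,
# even though the real nose attitude is level.
stuck_sensor_reading = 22.0
trim = 0.0
for cycle in range(5):
    trim = automated_trim(stuck_sensor_reading, trim)
    trim += TRIM_STEP_DEG * 0.5  # pilots trim nose-up, only partially counteracting it
    print(f"cycle {cycle}: trim = {trim:+.1f} deg (negative = nose down)")
# The trim ratchets further nose-down each cycle because the bad reading never clears.
```

Because the routine never questions the sensor, the cycle repeats until the nose-down trim overwhelms whatever the pilots can apply.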
It appears that the pilots were unaware of how to disengage the new automated system, which itself was trying to compensate for the new engines. It wasn’t even documented in the manual. The pilots, when faced with a life-or-death situation, didn’t know how to take over from the system and fly the plane themselves. Never mind the fact that, given the malfunctioning sensor, the plane should never have taken off in the first place.
There were many failures, and together they resulted in the Lion Air flight crashing into the sea, killing 189 people. After the investigation, Boeing said it would prepare a software update. But five months later, another Boeing 737 MAX crashed, and industry watchers see a pattern similar to that of Lion Air. The software update still has not been released. It was delayed by the government shutdown, in part because federal officials deemed work on the project nonessential during the shutdown, having judged it to be low-risk, according to statements Boeing and FAA officials have made to the press.
The update, coming next month, should help. Boeing says it will rely on data from multiple sensors instead of a single sensor, which will lessen the impact of a damaged sensor or a bad reading. But issuing a new software-based automated system with little pilot training seems like an oversight the FAA should not have allowed. Software updates and automated systems can change the functionality of a product. There should be a rigorous testing period for updates and a rigorous training period to inform users on how to handle them.
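For comparison, here is a minimal sketch of the multi-sensor approach, again in Python and again with invented names and thresholds: cross-check the two angle-of-attack readings and, when they disagree, inhibit automatic trim and alert the crew rather than acting on a single value.

```python
# A minimal sketch of the cross-check idea: compare two AOA sensors and refuse to
# command automatic trim when they disagree. The threshold, names, and behavior
# are illustrative assumptions, not Boeing's actual design.
from typing import Optional

DISAGREEMENT_LIMIT_DEG = 5.0  # assumed allowable difference between the two sensors


def aoa_cross_check(aoa_left_deg: float, aoa_right_deg: float) -> Optional[float]:
    """Return a trusted AOA value, or None if the sensors disagree."""
    if abs(aoa_left_deg - aoa_right_deg) > DISAGREEMENT_LIMIT_DEG:
        return None  # disagreement: alert the crew and inhibit automatic trim
    return (aoa_left_deg + aoa_right_deg) / 2.0


reading = aoa_cross_check(22.0, 4.0)  # one sensor stuck high, the other normal
if reading is None:
    print("AOA disagree: automatic trim inhibited, crew alerted")
else:
    print(f"trusted AOA: {reading:.1f} deg")
```

The point of the sketch is the design choice: a disagreement between redundant sensors should degrade the automation gracefully and tell the pilots, not let one bad reading drive the airplane.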
And not just in the airline industry. Boeing’s challenges aside, I have been flummoxed every now and then by a change in the software on my Tesla. So far, any changes that have surprised me have been limited to the user interface, but software updates can also change how the physical car works.
A good model for this type of update and training program is being built by the medical device industry in partnership with the Food and Drug Administration. The FDA has been very proactive about developing standards for software updates to medical devices, partly because it wants to encourage medical device companies to patch security holes before they cause a problem (and in a worst-case scenario, kill a patient). Another reason the agency has created guidelines around updates is so that medical device makers know what to expect, both before a device is released and afterward. It is well aware of the increasingly important role software plays in the physical safety and functionality of the products it monitors.
We need to get all regulatory agencies to understand this link, and to adapt their rules to ensure that software updates and automated systems are given the scrutiny they deserve in safety-critical areas. They also need to make sure that when problems are discovered, the government and manufacturer can work quickly to resolve them.
This statement is incorrect, in my opinion:
The failure here isn’t actually with autonomous systems
I have worked on unmanned avionics for over 15 years, including redundant systems. The aircraft has two angle-of-attack (AOA) sensors installed. Boeing chose to use only one for attitude determination purposes. In my experience, if you spent the time and money to hook up a redundant sensor, give it power, and connect it to the avionics, it’s there for a reason and you MUST use it (and run computations to validate one sensor against the other)! If you do not use it, why was it installed at all? The way I see it (based on the Seattle Times article), they erroneously thought they could get away with using only one because an AOA failure was considered unlikely. They followed that shaky logic with the assumption that MCAS’s control authority was low by design, so the pilot could counteract it easily. Both of these assumptions turned out to be false. When one AOA sensor failed, it drove the plane into a dive. The pilots’ corrective actions were met with ratcheting flight-control inputs that drove the tail to point the plane at the ground, to the point where the pilots could no longer control the elevators (thus a tug of war until death). Also, on the previous 737, yanking the stick would stop the stall protection. On this aircraft it would not, and would just intensify the corrections driving the plane into the ground.
If good design practices had been followed with regard to redundant sensors and flight controls, it is likely this problem would not have occurred (a redundant sensor check would alert the pilot and disable the forced diving of the plane). Additionally, it is axiomatic to warn of redundant sensor problems prior to flight if they can be detected on the ground (and in flight), yet Boeing was treating that warning as optional.
Training… OK… I think that is a straw-man argument myself. Blame the pilot, sure… that’s the easy way out. We have been flying systems like this for many years now… and this is news? No way. People got lazy, stopped checking carefully, and good design/testing/certification was thrown out the window for profits and competitive advantage, and people lost their lives for it. That is what happened. Checks and balances failed miserably. Even the venerable 737 can no longer be trusted once revised…
* Disclaimer: the above assumes that the news reports and facts on the internet supporting it are correct and accurate.