Boeing’s Software Development Problem
Who could imagine in this day and age that one of the biggest companies in the world, Boeing, could make such a mistake?
When the news broke that Boeing’s state-of-the-art plane, the 737 Max, had suffered not one but two tragic crashes on account of the plane’s software, the world was left wondering how this could possibly happen.
Last week, further details emerged that Boeing had outsourced some of the coding to developers who were being paid as little as $9 per hour. Naturally, emotions turned to shock and outrage.
But why would a company that is valued in the tens of billions of dollars outsource work to create systems that were likely central to the safety of its planes anyway? As always, there is a lot more to this story than meets the eye.
In this article, we are going to look past the headlines and take a deeper look at what the most likely cause of failure actually was.
Developing A Better Plane
Developing a passenger airliner is incredibly complex.
While the Wright Brothers may have put together the first flying airplane on their own, today’s aircraft require teams of thousands of individuals to design, test, and build them.
While Boeing scrambles to understand the chain of events that triggered this catastrophe, orders for the new plane have all but dried up (Boeing has only sold a handful of new Max planes since the news broke).
Boeing is clearly using all the resources at its disposal to try to put out the PR disaster that these tragedies have created.
It announced that it had set up a $100 million fund for the families that lost relatives in the two disasters. This fund, along with the recent release of a patch to rectify the software errors that led to these disasters, is a clear admission by Boeing that the problem is of its own making.
Let’s start with why a company like Boeing would outsource its software development to third-party developers in the first place.
All A Matter Of Bad Language?
Like all companies, Boeing has a full-time staff to meet its day to day requirements.
The engineers that develop new planes and their systems will be part of its full-time staff. Boeing will even have internal education programs to ensure that these individuals have all the skills and knowledge that they need to develop new airplanes.
These engineers work in an engineering language called Matlab, which was created to design and mathematically model complex systems.
However, it is a modeling tool and doesn’t work well as a production language of the kind required on airplanes.
In a nutshell, Matlab is great at doing the math, vectors and matrices included, but it is built for analysis on an engineer’s workstation rather than for the real-time, resource-constrained computers on board an aircraft, meaning it cannot do the job that a general-purpose programming language can.
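Its reliance on floating-point numbers is another concern when it comes to airliner computer systems: floating-point results can shift with the order of operations, the compiler, or the hardware, so a calculation that behaves one way in the model may not behave identically on the aircraft. Also, technologies that are now in use in airliner computer systems have superseded this piece of engineering software in many ways.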
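To make the floating-point concern concrete: the same numbers added in a different order can produce a different total, which is one way a translated program can drift from the model it was based on. The following is a minimal Python sketch with made-up values (nothing here comes from Boeing’s software), and the helper `sum_in_order` is hypothetical.

```python
import math


def sum_in_order(values):
    """Add the values one by one, left to right, using plain float addition."""
    total = 0.0
    for v in values:
        total += v
    return total


# Illustrative values only: one very large number, its negative, and a small one.
values = [1e16, 1.0, -1e16]
reordered = [1e16, -1e16, 1.0]  # same numbers, different order

forward_total = sum_in_order(values)       # the 1.0 is lost when added to 1e16
reordered_total = sum_in_order(reordered)  # the large values cancel first
exact_total = math.fsum(values)            # correctly rounded reference sum

print(forward_total, reordered_total, exact_total)
```

Running the same function twice on the same list gives the same answer each time; what changes the result is the order of the additions, which a port to another language or compiler can easily alter.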
Consequently, Boeing will have had to engage developers to translate the Matlab system models/programs into a more usable language such as C++.
Boeing engineers will not be qualified to do this.
This is where the human resources part comes in.
We Need Help
Boeing has no requirement to employ full-time developers for this specific problem, and certainly not in such numbers. Lacking scalable in-house resources to translate the code for its airplane systems into a production language, Boeing will have turned to outsourcing companies.
It is important not to jump to headline conclusions about Boeing’s decision to hire $9 per hour developers.
At the current time, we have no idea how many of them there were and what their actual ability was.
However, this decision almost certainly lies at the heart of the problem.
In this particular case, given that the 737 Max was an update of the extremely successful 737 family of planes, Boeing is unlikely to have built an entirely new software system and would instead have updated an existing one.
Since the new 737 Max software was not a complete rewrite, the engineers would have had to adapt the dynamics modeled in the existing software to fit the new plane’s specifications.
The problem that lies at the heart of these two disasters involved the anti-stall safety feature.
For some reason, as yet not known, the anti-stall overrode the pilot’s actions and tipped the nose into a dive, sending the planes crashing into the ground.
When the engineers were changing/modeling a new anti-stall system, they would have based their model on a new set of dynamics that would have relied on a new optimization method, which comes with a high degree of non-determinism.
Asking third-party developers, who are not engineers, to transfer over a Matlab model that they would not have fully understood is setting them an incredibly difficult task, even for those experienced in doing so.
It appears that at least some of the developers who were tasked with this problem either didn’t know how to handle the anti-stall logic or missed the fatal problem entirely when they created their solution.
While system testing should have picked up this problem, for whatever reason, it failed to do so.
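One common way to catch translation errors of this kind is a back-to-back test: run the original model and the translated code on the same inputs and compare the outputs. The Python sketch below is purely illustrative. The functions `reference_model`, `ported_model` and `back_to_back`, the limit of 2.5 and the tolerance are all hypothetical, and the bug is planted on purpose; none of it reflects Boeing’s actual software or test process.

```python
def reference_model(command, limit=2.5):
    """Hypothetical engineering model: clamp a command to the range [-limit, +limit]."""
    return max(-limit, min(limit, command))


def ported_model(command, limit=2.5):
    """Hypothetical port with a deliberate bug: the lower bound was left out."""
    return min(limit, command)


def back_to_back(reference, candidate, inputs, tolerance=1e-9):
    """Return (input, expected, actual) for every input where the outputs disagree."""
    mismatches = []
    for x in inputs:
        expected = reference(x)
        actual = candidate(x)
        if abs(expected - actual) > tolerance:
            mismatches.append((x, expected, actual))
    return mismatches


# Sweep the input range from -5.0 to 5.0, deliberately going beyond the limits.
inputs = [i * 0.5 for i in range(-10, 11)]
mismatches = back_to_back(reference_model, ported_model, inputs)

for x, expected, actual in mismatches:
    print(f"input={x:+.1f} expected={expected:+.1f} actual={actual:+.1f}")
```

The sweep only exposes the planted bug because it includes inputs below the lower limit; a test set that stayed inside the normal operating range would pass, which is one way a fault can slip through testing.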
It is important not to jump to easy conclusions that someone was asleep at the wheel and either missed this problem or failed to even test for it. We will have these answers in time.
However, the overall source of the failure definitely lay with the development approach Boeing used to create this software.
In short, the company took on a problem that was too complex for the existing development processes and management approach.
The scope and sheer size of the project clearly demanded that it be cross-managed down to the last detail to ensure all the hundreds of developers and teams worked like a Harrington pocket watch.
It seems evident that this was not the case.
The exact chain of errors that led to these disasters will undoubtedly emerge in time.
However, as I have highlighted, one clear point of failure was in the project management side of things. At some point, a well-developed project management system should have flagged up this problem, even if it was as late as the testing phase.
If nothing else, these disasters should remind Boeing of the importance of razor-sharp project management systems and tools, especially given the news that Boeing is considering cutting real-world model prototype testing in favor of more computer simulations in the future.