Tuesday, April 23, 2019

I don't think that I want to fly on a Boeing 737 Max

There is a great analysis of the 737 Max failures at IEEE:
The engines on the original 737 had a fan diameter (that of the intake blades on the engine) of just 100 centimeters (40 inches); those planned for the 737 Max have 176 cm. That’s a centerline difference of well over 30 cm (a foot), and you couldn’t “ovalize” the intake enough to hang the new engines beneath the wing without scraping the ground.
The solution was to extend the engine up and well in front of the wing. However, doing so also meant that the centerline of the engine’s thrust changed. Now, when the pilots applied power to the engine, the aircraft would have a significant propensity to “pitch up,” or raise its nose.
Larger engines were critical to the design, because that's how you get efficiency (read: lowest fuel cost).  The old airframe (fuselage and wings) was critical to the design because if you do a major change to the plane then the FAA certification is no longer valid and you need to (very expensively) re-certify the plane.
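The "well over 30 cm" figure in the first excerpt is easy to check. This is a minimal sketch, assuming (my simplification, not the article's) that both engines would hang with the bottom of the intake at the same height, so the centerline moves by half the difference in fan diameter. The function name is made up for illustration.

```python
# Fan diameters quoted in the excerpt, in centimeters.
OLD_FAN_DIAMETER_CM = 100.0
NEW_FAN_DIAMETER_CM = 176.0


def centerline_shift_cm(old_diameter_cm: float, new_diameter_cm: float) -> float:
    """Centerline shift if the bottom of the intake stays at the same height.

    Assumption: the shift equals the difference in radius, i.e. half the
    difference in diameter. Real nacelle geometry is more complicated.
    """
    return (new_diameter_cm - old_diameter_cm) / 2.0


if __name__ == "__main__":
    shift = centerline_shift_cm(OLD_FAN_DIAMETER_CM, NEW_FAN_DIAMETER_CM)
    print(f"Centerline shift: {shift:.0f} cm")
```

That works out to 38 cm, which squares with the article's "well over 30 cm (a foot)".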
In the 737 Max, the engine nacelles themselves can, at high angles of attack, work as a wing and produce lift. And the lift they produce is well ahead of the wing’s center of lift, meaning the nacelles will cause the 737 Max at a high angle of attack to go to a higher angle of attack. This is aerodynamic malpractice of the worst kind.
This is really, really bad.  Consider a plane that is about to stall.  One approach (especially with large, powerful engines) is to apply power to increase air speed.  On the 737 Max, this will cause the nose to pitch up and bring on the stall.  The design is inherently unstable in this situation.
Let’s review what the MCAS does: It pushes the nose of the plane down when the system thinks the plane might exceed its angle-of-attack limits; it does so to avoid an aerodynamic stall. Boeing put MCAS into the 737 Max because the larger engines and their placement make a stall more likely in a 737 Max than in previous 737 models.
When MCAS senses that the angle of attack is too high, it commands the aircraft’s trim system (the system that makes the plane go up or down) to lower the nose. It also does something else: Indirectly, via something Boeing calls the “Elevator Feel Computer,” it pushes the pilot’s control columns (the things the pilots pull or push on to raise or lower the aircraft’s nose) downward.
This sounds sensible, although kludgy.  The problem is that the Elevator Feel Computer has a really powerful actuator; pilots will struggle to overcome it and pull the nose back up.  It seems that this wasn't a bug, but a feature of the design.  But here's the crux of the problem:
In the 737 Max, only one of the flight management computers is active at a time—either the pilot’s computer or the copilot’s computer. And the active computer takes inputs only from the sensors on its own side of the aircraft.
When the two computers disagree, the solution for the humans in the cockpit is to look across the control panel to see what the other instruments are saying and then sort it out. In the Boeing system, the flight management computer does not “look across” at the other instruments. It believes only the instruments on its side. It doesn’t go old-school. It’s modern. It’s software.
This means that if a particular angle-of-attack sensor goes haywire, which happens all the time in a machine that alternates from one extreme environment to another, vibrating and shaking all the way, the flight management computer just believes it.
There's no redundancy.  Let me elaborate on that:

There's no redundancy.
There's no redundancy.
There's no redundancy.
There's no redundancy.


Holy cow, this is the dumbest design I've ever heard of, and I'm not even an aeronautical engineer.  This smells of "we found this out late in testing and had outsourced software developers write us some code in a hurry to fix it".  I don't know if that's how things happened but I've seen this more than once or twice in my career.
It gets even worse. There are several other instruments that can be used to determine things like angle of attack, either directly or indirectly, such as the pitot tubes, the artificial horizons, etc. All of these things would be cross-checked by a human pilot to quickly diagnose a faulty angle-of-attack sensor.
In a pinch, a human pilot could just look out the windshield to confirm visually and directly that, no, the aircraft is not pitched up dangerously. That’s the ultimate check and should go directly to the pilot’s ultimate sovereignty. Unfortunately, the current implementation of MCAS denies that sovereignty. It denies the pilots the ability to respond to what’s before their own eyes.
Like someone with narcissistic personality disorder, MCAS gaslights the pilots. And it turns out badly for everyone. “Raise the nose, HAL.” “I’m sorry, Dave, I’m afraid I can’t do that.”
There's no redundancy.
There's no redundancy.
There's no redundancy.
There's no redundancy.
So Boeing produced a dynamically unstable airframe, the 737 Max. That is big strike No. 1. Boeing then tried to mask the 737’s dynamic instability with a software system. Big strike No. 2. Finally, the software relied on systems known for their propensity to fail (angle-of-attack indicators) and did not appear to include even rudimentary provisions to cross-check the outputs of the angle-of-attack sensor against other sensors, or even the other angle-of-attack sensor. Big strike No. 3.
None of the above should have passed muster. None of the above should have passed the “OK” pencil of the most junior engineering staff, much less a DER.
That’s not a big strike. That’s a political, social, economic, and technical sin.
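For the software folks, here is roughly what a "rudimentary provision to cross-check" could look like. This is a minimal sketch and emphatically not Boeing's code: the thresholds, names, and the 15 degree limit are all numbers I made up for illustration. It does two things: a bounds check on each angle-of-attack reading, and a comparison of the left sensor against the right one.

```python
from dataclasses import dataclass
from typing import Optional

# All thresholds are illustrative assumptions, not values from any real aircraft.
AOA_MIN_DEG = -20.0       # below this, assume the sensor is broken
AOA_MAX_DEG = 40.0        # above this, assume the sensor is broken
MAX_DISAGREE_DEG = 5.0    # largest tolerated left/right difference


@dataclass
class AoaReading:
    """One sample from the (hypothetical) left and right angle-of-attack vanes."""
    left_deg: float
    right_deg: float


def plausible(aoa_deg: float) -> bool:
    """Bounds check: reject readings outside a physically sensible range."""
    return AOA_MIN_DEG <= aoa_deg <= AOA_MAX_DEG


def validated_aoa(reading: AoaReading) -> Optional[float]:
    """Return an angle of attack to act on, or None if the data can't be trusted."""
    if not (plausible(reading.left_deg) and plausible(reading.right_deg)):
        return None
    if abs(reading.left_deg - reading.right_deg) > MAX_DISAGREE_DEG:
        return None
    return (reading.left_deg + reading.right_deg) / 2.0


def should_trim_nose_down(reading: AoaReading, limit_deg: float = 15.0) -> bool:
    """Only command nose-down trim when both sensors agree the angle is too high."""
    aoa = validated_aoa(reading)
    return aoa is not None and aoa > limit_deg


if __name__ == "__main__":
    print(should_trim_nose_down(AoaReading(17.0, 18.0)))  # sensors agree
    print(should_trim_nose_down(AoaReading(22.0, 5.0)))   # sensors disagree
```

With only two sensors you can tell that they disagree but not which one is lying, so the sketch fails safe: when in doubt it does nothing and leaves the airplane to the pilots.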
This is a long and detailed article and I've only excerpted key bits.  You should really read the whole thing because the situation is simply horrifying.  Boeing has destroyed their reputation.

I've written many, many, many times about design issues in Airbus' flight control software, where the pilots become confused or the software freaks out and people die.  I always liked flying Boeing because their reputation that "the pilot is always in charge" was my strong preference - my whole career has been dealing with software failure, and my imagination is too active to ever be comfortable on an Airbus plane.

Well, that has all changed after the 737 Max.  It's not just that the pilot can't fly the plane now, it's this:
That’s because the major selling point of the 737 Max is that it is just a 737, and any pilot who has flown other 737s can fly a 737 Max without expensive training, without recertification, without another type of rating. Airlines—Southwest is a prominent example—tend to go for one “standard” airplane. They want to have one airplane that all their pilots can fly because that makes both pilots and airplanes fungible, maximizing flexibility and minimizing costs.
It all comes down to money, and in this case, MCAS was the way for both Boeing and its customers to keep the money flowing in the right direction. The necessity to insist that the 737 Max was no different in flying characteristics, no different in systems, from any other 737 was the key to the 737 Max’s fleet fungibility. That’s probably also the reason why the documentation about the MCAS system was kept on the down-low.
And so the pilots on the fatal flights couldn't figure out how to get out of the situation because Boeing intentionally did not tell them.  Allegedly.  This one will have to go through the courts but this very well may end up being the most expensive design mistake in history.

12 comments:

  1. Yeah. I've been harping on lack of redundancy in flight-control systems for a while now.
    There's a lot of redundancy in the data available, that could potentially be used for sanity filters if nothing else. In this case, they're using the unreliable AOA sensor to detect an uncommanded pitch up - and not, e.g., checking the pitch gyro, let alone a combination of airspeed, rate of climb, etc.
    Even a bounds check on the AOA reading could give a useful result: "If this reading is right, and the airspeed is enough to be even vaguely in flight, then the wings should have torn off. Ergo, the sensor is busted. Ignore it."

  2. Sadly true... However, I have been told by friends that fly for two different airlines that US airlines were briefed/trained on the MCAS including disconnect procedures and have done so more than once.

  3. Eric, yeah. There's a lot of sanity checking that seems not to have been coded into the MCAS system. This makes me think that nobody who understood flying was involved in the requirements definition or design reviews.

    Old NFO, that actually makes me feel a little bit better. But only a little bit.

  4. Borepatch: Not just nobody who understood flying, but nobody who understood systems engineering, or even had a clue about managing risks.
    (Or UI design, for that matter. Apparently the extra-cost-optional AOA Disagree indicator is a software option, and it pops up an indication on the same screen as the AOA indicator, but in a visually unrelated location. Brilliant!)

  5. I hangar my plane next to a 737 Max pilot....also used to fly 700 and NG series.

    ALL 737's pitch up on power. ALL of 'em. The engines are far below the longitudinal axis. Obviously the more thrust, the more pitch angle, but they ALL do it.

    And let's face it, there had been AOA sensor failures since the FIRST MAX came off the line. Most "western" pilots are trained to deal with it.

    Asian and third world pilots ....not so much. They rely heavily on the systems. Take off, engage the autopilot at 300 feet, let it fly the departure, the route, the hold and the landing. Few are true "stick and rudder/pitch and power" pilots.

    The Lion Air plane had the EXACT same failure the day before, but that pilot knew how to handle it. The dead crew....not so much.

    The Ethiopian flight crew (the FO had 300-ish hours, BTW) kept the power at 92% while trying to deal with the emergency.... which made trim impossible with that jet wash across the tail. They pretty much made every mistake possible during the emergency...

    Yes, Boeing fucked up by only using one AOA sensor as the input for MCAS, but the crash and the deaths are on the flight crew and their (lack of) training and response to the emergency.



  6. BTW: I'd fly with a WESTERN crew (US or European) on a 737 MAX, but probably not with an Asian or third world airline crew.

  7. B: before you go trusting western crews, remember Air France 447. A grand combination of automation failure and airmanship failure.

  8. The difference is that the Boeing isn't a Fly By Wire aircraft....Apples and Oranges comparison. And the entire MCAS system can be easily disabled on the 737. If the pilots had followed procedure, both of the crashes could have been avoided. They didn't.

    Yes, the system should have redundancy. Bad engineering is definitely a contributing factor. But in the end, the pilots of both planes could not (or would not) fly without the automation, and they failed to disable the (failed) automated systems and just fly.

  9. Also Boeing didn't want to have to certify the MAX as a new type because that would require potential customers buy expensive sim time to retrain their pilots.

  10. I'm pretty sure the airlines didn't want to spend that money either....

  11. That IEEE article isn't available to the unpaid masses.

  12. I smell a software agile team at work.

