Wednesday, December 21, 2011

Another Airbus software problem

The final report is out on the Qantas Flight QF72 incident, and blame is laid squarely on bad software:

On 7 October 2008, the Australian-owned A330-303 aircraft was cruising at 37,000 feet when the autopilot disengaged and the aircraft rose, before plunging downwards sharply, injuring 110 of the aircraft’s 303 passengers and three-quarters of the cabin crew. Three minutes later the aircraft did it again, and the flight crew was bombarded with warnings from the instrumentation – almost all of them false.

The pilots issued a PAN distress call, but upgraded this to MAYDAY after seeing the seriousness of the injuries onboard. They disabled the automatic pilot and throttle control systems and then managed the approach and landing at Learmonth, Western Australia using backup instruments. Since the source of the problems couldn’t be immediately identified the crew used manual pressurisation control and braking equipment because the automatic systems weren’t trusted. In all, 51 passengers and crew required hospitalisation following the incident.
Seems a sensor unit (one of the air data inertial reference units) went bad and fed bad data into the flight computers, which lost their minds trying to make sense of it. It's actually kind of amazing that nobody died here.

They're actually blaming this on cosmic rays (srsly), although that's a bad joke. The real issue (in my professional opinion) is that the computer should be able to recognize when it's lost its mind and tell the pilot that he has to start real flying, as opposed to the usual Airbus desk flying.
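
To make that concrete, here's a toy sketch in Python of the sort of sanity check I mean. It's nothing like real avionics code, and every name and limit in it is made up: reject readings that change faster than the airplane physically can, ride out a brief spike, and if the data stays insane, quit pretending and hand the airplane back to the pilot.

    MAX_STEP = 5.0     # assumed max physically plausible change per sample (degrees)
    FAULT_LIMIT = 3    # consecutive implausible samples before declaring failure

    def check_aoa_stream(samples):
        """Filter an angle-of-attack stream; bail out if the data goes insane."""
        filtered, last_good, bad_run = [], None, 0
        for value in samples:
            if last_good is None or abs(value - last_good) <= MAX_STEP:
                last_good, bad_run = value, 0
                filtered.append(value)
            else:
                bad_run += 1
                filtered.append(last_good)   # ride out a brief spike
                if bad_run >= FAULT_LIMIT:
                    # The honest move: admit failure and hand the airplane back.
                    return "UNRELIABLE - PILOT HAS CONTROL", filtered
        return "ok", filtered

    print(check_aoa_stream([2.1, 2.0, 50.7, 2.2]))           # one spike: held out
    print(check_aoa_stream([2.1, 50.7, 50.9, 51.0, 50.8]))   # persistent: give up

A few lines of logic. I'm not saying it's that easy at 37,000 feet, but the principle isn't exotic.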

11 comments:

North said...

"the computer should be able to recognize when it's lost its mind"

Physician heal thyself?

Easier said than done.

I'd rather see an effort to fix the fact that their engineering department has lost their minds. But that WON'T come from within that department or the out-sourced groups.

Borepatch said...

North, more specifically: the computer should maintain a confidence score, and when that score drops below a threshold it should notify the pilot that its output should not be considered reliable.
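
Something like this toy Python sketch, where the scoring formula, names, and threshold are all invented for illustration (not anything out of a real flight computer): score how tightly the redundant readings agree, and annunciate when the score drops below a floor.

    CONFIDENCE_FLOOR = 0.6   # below this, tell the pilot the computer is unsure

    def confidence(readings):
        """Score 0..1 by how tightly redundant readings agree with their median."""
        readings = sorted(readings)
        median = readings[len(readings) // 2]
        spread = max(abs(r - median) for r in readings)
        return max(0.0, 1.0 - spread / (abs(median) + 1.0))

    def advise_pilot(readings):
        score = confidence(readings)
        if score < CONFIDENCE_FLOOR:
            return "CAUTION: computer confidence %.2f, output unreliable" % score
        return "confidence %.2f, normal operation" % score

    print(advise_pilot([2.1, 2.2, 2.1]))    # channels agree: normal
    print(advise_pilot([2.1, 50.7, 2.2]))   # one channel insane: caution

The point isn't this particular formula; it's that "how sure am I?" is a computable quantity, and the pilot should get to see it.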

BTW, the Space Shuttle had something sort of like this, although there it was implemented across the redundant set of computers that flew the vehicle, which voted on each other's outputs.

There's quite a lot of value in having the engineering teams think in these terms. I don't believe that they have, at least at Airbus.

K said...

I can't believe that they don't have redundant computers with polling/sanity checks to remove the vote of the one with bad data. I realize that's a lot of work, but I assumed that's how it was done in modern systems.
The Shuttles have had that design since the late '70s... It hasn't trickled down to standard airliner design yet?
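
Something like this, boiled down to a toy Python sketch (made-up numbers, nobody's real flight code): take the median of the redundant channels so a single bad channel can't win, and flag whichever one got out-voted.

    def vote(channels, tolerance=0.5):
        """Return the agreed value and the indexes of any out-voted channels."""
        agreed = sorted(channels)[len(channels) // 2]   # median survives one liar
        outvoted = [i for i, v in enumerate(channels)
                    if abs(v - agreed) > tolerance]
        return agreed, outvoted

    value, bad = vote([2.1, 2.2, 50.7])
    print(value, bad)   # -> 2.2 [2] : channel 2's vote is removed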

Old NFO said...

You have to remember, the Airbus was designed with 'female' logic... e.g. ass backwards from US birds. The computer is the ONLY system that can make a decision on that POS... That is why I don't fly on them unless I'm absolutely out of options to get somewhere.

Broken Andy said...

Was it a computer that went bad or a sensor? The computers should be programmed to notice bad values from the sensors, raise an alarm, and possibly compensate. It sounds like the computers disengaged upon realizing they could not actually sense what was going on.

So I think the real issue here is with the pilots, who put the plane back on auto-pilot after the computers disengaged the first time.

An Ordinary American said...

Saying "another software problem with the Airbus" is like saying, "Another lie from the Obama misadministration."

They are indistinguishable from one another.

Agree with Old NFO.

--AOA

North said...

Borepatch: If a confidence score were implemented on the embedded systems, that would indicate possible imperfection in the system and reflect badly on the team. Better to design a flawless system, which makes the team look good.

TOTWTYTR said...

The autopilot on Air France 447 disengaged suddenly during a thunderstorm, probably due to a failure of the air speed sensors.

The pilots were simultaneously overloaded with warnings and under-informed by the system.

Unable to diagnose what was happening, they flew straight into the sea, killing everyone on board.

I'm with Old NFO: I try to avoid Airbuses whenever possible.

Dave H said...

North: "If a confidence score were implemented on the embedded systems, then implementing that would indicate possible imperfection in the system and reflect badly on the team."

I disagree. I've personally developed systems where the input validation and error compensation (on the inputs, not on the system itself) was so aggressive that the product worked, met specifications, and had been shipping to customers for almost six months before we discovered there was a hardware design flaw. That's validation code a manager had wanted me to remove because it "wasn't performing any useful function." (I tend not to make design decisions based on managers' orders any more because of that. I'll probably get fired for insubordination some day. I don't lose any sleep over it.)
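
Boiled way down, the kind of thing I mean looks like this toy Python sketch (the limits are invented for illustration): reject readings outside the physically possible range, substitute the last good value, and, crucially, log the reject so a hardware flaw eventually surfaces instead of staying hidden for six months.

    SENSOR_MIN, SENSOR_MAX = -40.0, 40.0   # invented physical limits for the input

    def validate(raw, last_good):
        """Pass plausible readings through; substitute the last good one otherwise."""
        if SENSOR_MIN <= raw <= SENSOR_MAX:
            return raw, True
        # Compensate, but log it: silent compensation is how a hardware
        # flaw ships to customers unnoticed.
        print("input reject: %r outside [%r, %r]" % (raw, SENSOR_MIN, SENSOR_MAX))
        return last_good, False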

Quizikle said...

Oh, no. It can't be the software. We've finished development and let the contractors go. We can't afford to slip the schedule any further.
Q

Luton Ian said...

Are you sure that it is a software problem, rather than one caused by adding the obligatory €uro symbol keys to the hardware?

Remember, if you're buying anything in 2012, that has a keyboard:

If it has a € symbol on that keyboard, then it is old stock!