Friday, September 29, 2017

A great illustration of just how fragile computing is

This is oral tradition on why computers crash, but it's a great example of why I am skeptical of (for instance) computer-controlled self-driving cars.  Mystery reboots:
One of Rick's customers was a large bank that ordered a pair of SUN E6500 servers [these were enormously expensive - certainly $50,000 each, maybe twice that.  - Borepatch]. Oracle may have hosed out its hardware teams but still has this whopping PDF Reference Manual for the machines. What bruisers they were! Each needed a full rack all to itself to house a 16-slot card cage, something called a “quad fan tray”, memory module, UltraSPARC II module, media tray, a pair of power/cooling modules, an AC power sequencer and even a peripheral power supply! 
Rick told us that bank put the two servers on the top floor of its building, where they hummed away happily until one morning they were discovered to have rebooted overnight. 
And not just rebooted once: they'd been up and down all night like someone who'd topped off a few beers with salmonella-tainted kebab. 
“The customer called and was furious,” Rick told us. And stayed that way for days, because the first technician to visit couldn't figure out what had gone wrong. Nor could other experts over the next week.
You have to read through to see what was causing the problem, but it's a great story that illustrates that computing is nowhere near as clean and antiseptic as designers claim.  That moral applies to self-driving cars as well.  What are the chances that there's a mystery reboot scenario in your new ride?  Me, I'm not willing to bet that there isn't.

6 comments:

Jess said...

That reminds me of a company I once worked for. They had a vault built of thick concrete, with a heavy fireproof door. The important documents, including computer backups, were stored there for protection.

One day, the comptroller reported the accounting backup disks were blank. Further examination revealed more blanks, or corrupted data.

Long story short: A bolt of lightning struck the radio tower outside, and adjacent to, the vault area. Before dissipating, the bolt energized the reinforcement in the concrete walls of the vault. When this happened, a strong degaussing field was created, and the magnetic disks were corrupted.

ASM826 said...

I already know. Housekeeping was unplugging them to plug in a vacuum or a floor buffer.

Flugelman said...

Reminds me of the time I had to visit a customer in Houston whose Delta CPM office computer had repeated hard drive failures. (Four in three months, IIRC.) When I saw the setup it was painfully obvious what the problem was: the clowns we had hired to install the system had situated the printer (a Mannesmann Tally dot matrix machine; this was 1984) on top of the Delta case. Which meant every movement of the huge print head was transmitted to the Delta. That print head had to weigh a half pound.

We didn't contract any more work to the aforementioned clowns.

Sherm said...

I worked for a multi-state health club back in the 1990s. The entire operation worked off a single, very expensive computer with its own room, cooling system, alarms, communication links, etc. Elaborate plans were drawn up to move this computer to a different location, letting a back-up system handle the load for the couple of days it took to make the move. The night before, in anticipation of the move, the alarm system was disconnected. Shortly thereafter a workman hanging a shelf on the outside wall of the computer room drilled through the power cable to the cooling system. Fried was the term most used, and they were able to toss the no-longer-expensive computer in the back of a pick-up truck for the move. I heard the replacement computer was very expensive.

Anonymous said...

Self-driving cars:
A mid-70s pickup with granny gear. Put it in granny about 20 feet before the fence, get out, walk to the fence, open the fence, let the truck drive itself through, close the fence, walk back to the truck, get in, put it in 2nd, drive away.

100% immune from hacking :)
Q

Will said...

ASM826:

Housekeeping strikes again! The whipping boys of industry...