Saturday, October 26, 2013

The inevitable failure of the Healthcare.gov "surge"

Failure is inevitable and in fact it has already started, 2 days after the announcement.  You see, the analogy is entirely wrong: the "surge" in Iraq (especially in al Anbar) was a dynamic of introducing additional, independent assets.  The key word is "independent".  The Healthcare.gov folks aren't.

This was explained nearly 40 years ago in Fred Brooks' software development classic, The Mythical Man-Month.  Brooks expounded at length on the challenge confronting Healthcare.gov today, and summed up the agony with what has come to be called Brooks' Law:

Adding manpower to a late software project makes the project later.

This will seem counterintuitive to non-programmers, people like Health and Human Services Secretaries.  But it's a core concept of software, and one that's fundamentally different from most of the physical world.  In the physical world, there is a much greater degree of independence.  If I need a rush job for a big order of widgets, I can open a new widget factory.  The widgets are independent from each other, and so the manufacturing process can proceed in parallel.

The surge in al Anbar was precisely like this: multiple units acting in parallel.  It worked pretty well, because it was the physical world.

Not so with software.  The design is really an integrated whole.  Essentially, healthcare.gov itself is the widget.  There's only one widget, and it's massively complex.  And this is where we get to the heart of the matter, and Brooks' Law.

You see, there are a few people who actually understand how the widget works.  These, of course, are the ones who wrote the code.  Maybe the code is terrible and maybe it's not (it is very plausible that the programmers simply weren't given enough time to do it properly; this is shockingly common in software projects even in the private sector, and is an almost universal constant in the government).  But this set of people quite simply are the only ones who know what the code does, and why it does it.

Now let's double the size of the team.  Sounds like we'll make good progress, right?  After all, twice as many seems twice as good.  But the new guys don't have a clue about how the code works, or why the code was written the way it was.  The only things they can really do are minor tasks like writing documentation or fixing trivial bugs.  Remember, Healthcare.gov's problems are not trivial bugs; they are fundamental breakage in architecture and design.

The new guys simply cannot help with that.  There's no schedule benefit they can provide, no way to accelerate the project towards completion - at least until they learn the architecture and the code.  Then they can actually help.

So how do they learn the architecture and the code?  They ask the existing core group of programmers.  And so the existing group now finds itself not fixing the fundamental architecture and design breakage, but acting as mentors and tutors to a bunch of (hopefully) smart but new team members.

And the schedule slips further, because actual work isn't being done.


It's actually worse than this.  Coordinating a large software team is much harder than coordinating a small one.  Having worked with both as a Product Manager, I found that the small teams seemed to almost direct themselves in a delightful manner.  With large teams, you find your schedule filling up with Core Team meetings and Test/QA meetings and tracking meetings and meetings to brief the higher ups.  One wag once said to me that we were keeping excellent track of the progress we weren't making.  Morale on a big, high visibility project that is seriously off track is a big problem.  The danger is that the good people get fed up and leave (they're good, and so they will have options) because the team leans on them for more than their share of the progress.
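
To put a rough number on the coordination problem: Brooks points out that when everyone's work has to be coordinated with everyone else's, the number of pairwise communication paths grows as n(n-1)/2.  Here's a minimal Python sketch of that formula.  The function name and the team sizes are mine, picked purely for illustration, and a real team won't literally talk in every possible pair, so treat this as an upper bound and not a measurement.

```python
def communication_paths(team_size: int) -> int:
    """Count the distinct pairs of people on a team of the given size.

    This is the n(n-1)/2 worst case, where every person must
    coordinate directly with every other person.
    """
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Hypothetical team sizes, chosen only to show the growth rate.
for size in (5, 10, 20, 40):
    print(f"{size:>3} people -> {communication_paths(size):>4} pairwise paths")
```

Doubling a team from 10 to 20 people takes the count from 45 paths to 190, which is why the meetings multiply faster than the headcount does.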


And so the statement from Jeff Zients, the project honcho, that the site would be functional ahead of the deadlines is without doubt wrong.  Zients may or may not know this; he is, after all, a management wonk who took a company public during the dot com bubble - but it wasn't a software company.  So the hype* about him as a "Tech Savior" is just hype.  He can't change the dynamics of how software is developed.

And so my prediction is that Healthcare.gov will not really work for another year.  It will take 90 days for the "surge" programmers to become useful, and by then the project will be another 90 days behind.  60 days after that the surge project will get a timeline "break even" - i.e. will be at the same point that it would have been without the surge.  That's April 1.
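
For anyone who wants to check the calendar math, here is a small Python sketch.  It assumes the clock starts on the date of this post (the surge itself was announced a couple of days earlier), and the constant and function names are just labels I made up for the example.

```python
from datetime import date, timedelta

# Assumption: count from the date of this post.
POST_DATE = date(2013, 10, 26)
RAMP_UP_DAYS = 90    # time for the "surge" programmers to become useful
CATCH_UP_DAYS = 60   # further time to reach schedule "break even"


def break_even(start: date, ramp_up_days: int, catch_up_days: int) -> date:
    """Return the date on which the surge stops being a net schedule loss."""
    return start + timedelta(days=ramp_up_days + catch_up_days)


print(break_even(POST_DATE, RAMP_UP_DAYS, CATCH_UP_DAYS))
```

Counting strictly from the post date, 150 days lands on March 25, 2014, which is within a week of the April 1 figure above.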

And so the only question then is just how bad is the architectural and design breakage?  We don't know, but the government's track record on large software development efforts is miserable.  The FAA famously wrote off a $1.5B effort to computerize the air traffic control system.  I myself as a fledgling Electrical Engineer was involved in a government program that ended up $200M over budget and facing Congressional Hearings.  That project never worked, even after the Government "declared victory".

That may in fact be the fate of Healthcare.gov.  Certainly victory will be declared, likely repeatedly.  And the system will collapse repeatedly, shortly after the declarations.  Eventually nobody will care, because they will all mentally write the thing off as a lost cause.

* Who on earth thinks that Zients is some sort of tech guru, anyway?  This is nothing but White House spin, desperately served up to a compliant press.  But as with Obama himself, high initial expectations will not be met, and this will go down as yet another case of over promising and under delivering.

8 comments:

New Jovian Thunderbolt said...

We need a baby, and need it fast. If it takes 9 months for 1 woman to make a baby just hire 9 women and we'll have that kiddo in 31 days.

Old NFO said...

Personally I think you're being optimistic... And you didn't even get into the cost issues, I'm betting at least a doubling of the cost in addition to the year plus delay... Zients will fail massively and be the designated fall guy.

AndyN said...

We don't know, but the government's track record on large software development efforts is miserable.

Even at that, I think CGI's track record is even worse than the government's in general. If you look at what they did with the Canadian gun registry, it's almost as if HHS was looking to give a no bid contract to somebody they knew wouldn't be able to do the job. I'm still trying to resist buying the idea that PPACA was designed poorly on purpose so that it would fail and people would accept socialized medicine, but every time I turn around I'm faced with something else that looks like they're trying to blow it.

One more thing about the surge that I don't think I've seen anybody address. They're supposed to have brought in the best and brightest to help out. Does anybody actually believe that the best and brightest were sitting at home with no work waiting for Obama to ask them to come save him? Not only are they bringing in extra people who will have to be brought up to speed, they're bringing in extra people who nobody wanted to hire before things got desperate.

drjim said...

" bringing in extra people who nobody wanted to hire"

What was it Ronald Reagan said about the best and brightest NOT working for dotgov?

whitecollargreenspaceguy said...

The states are contracting with statewide organizations who subcontract to local organizations. This is too disjointed and waters down getting the word out and the work done. ACA clients will need ongoing help making decisions about providers and claims problems which may be too much for third level contractors to handle. CMS should arrange for Obamacare application helpers to work in all 1300 SSA offices. SSA has lost 10% of staff in last 3 years. There are now between 4 and 8 empty work stations in each SSA office. They total 6,000 to 10,000, altogether they are worth up to $200 million, and they are unused due to staffing losses. If not used by Obamacare, the government is wasting about $1 billion over next 5 years. This would greatly simplify national PSAs - just tell citizens to visit their local SSA office. ACA navigators should use them to reach the public. When Medicare first started, SSA offices had to be open at night and on the weekends to get everyone enrolled. We must be successful in the roll-out of customer services for Affordable Care Act. Web Site and 800# are not enough. I would not buy a car or a house that way. Many citizens need face-to-face customer service. This plan can be applied to other federal agencies and we could add a second shift of white collar workers, see http://whitecollargreenspace.blogspot.com/ or Contact timalantoo@hotmail.com or Tim at 989-701-8813

Unknown said...

Yeah, that was a good book.

All true, but then it's BIG government - so BIGGER is always better. Especially when it's not.

It's not just government that has this problem - they just go that way naturally.

Borepatch said...

Whitecollargreenspaceguy, the people to enroll the public in face to face meetings are called "Insurance Agents". Why would we need to staff up a government bureaucracy to do (poorly) what the private sector does better?

The question is rhetorical. The private sector isn't sprinkled with magic government dust that excuses all failings.

Mark Philip Alger said...

Key problem is that the core design requirement is fundamentally flawed.

There's been a lot of argument over whether or not the law is constitutional, or whether this part is possible or that part is politically feasible.

The part that everybody is ignoring like the emperor's birthday suit is that the law is economically unfeasible.

No amount of lipstick or beer goggles will make this pig pretty.

M