Friday, April 12, 2013

Repost: Government Data Mining FAIL

I do this infrequently, but this topic is so apropos to the previous post that I thought it worth reposting in its entirety.  I'd point out the unbelievable prescience shown in the dawn of this blog (only 100 days old at the time - still had that New Blog smell!), but it's really kind of obvious, isn't it?

--------------------------------------------------------------

Anti-terrorist data mining doesn't work

One of the biggest problems in Internet Security is getting the "False Positive" rate down to a manageable level. A False Positive is an event where your security device reports an attack when no attack is actually happening. It's the Boy Who Cried Wolf problem, and if the rate is too high, people turn the security off.

Apple had a hilarious ad that spoofed Vista's UAC security a while back. The security is so good that the whole system is unusable:



Surprise! Seems that identifying terrorists by mining a bunch of databases isn't any better:
A report scheduled to be released on Tuesday by the National Research Council, which has been years in the making, concludes that automated identification of terrorists through data mining or any other mechanism 'is neither feasible as an objective nor desirable as a goal of technology development efforts.' Inevitable false positives will result in 'ordinary, law-abiding citizens and businesses' being incorrectly flagged as suspects. The whopping 352-page report, called 'Protecting Individual Privacy in the Struggle Against Terrorists,' amounts to [be] at least a partial repudiation of the Defense Department's controversial data-mining program called Total Information Awareness, which was limited by Congress in 2003.
The problem is not so much one of technology as it is of cost. Suppose you could create a system where the data mining results gave you only one chance in a million of a false positive. In other words, for every innocent person the system screens, it is 99.9999% likely to correctly leave them alone. This is almost certainly 3 or 4 orders of magnitude overly optimistic (the actual false positive rate is likely no better than 1 in a thousand, and may well be much worse), but let's ignore that.

There are roughly 700 million air passengers in the US each year. One chance in a million means the system would wrongly flag 700 of them as likely terrorists (remember, this thought experiment assumes a ridiculously low false positive rate). The question, now, is what do you do with these 700 people?
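If you want to play with the numbers yourself, here's a minimal sketch of that arithmetic. The function name is made up for illustration, and the 700 million passenger figure and both false positive rates are just the rough guesses from this post, not measured values.

```python
# Back-of-the-envelope: how many innocent people get flagged per year?
# All inputs are the post's rough estimates, not real measurements.

def expected_false_positives(screened: int, one_in: int) -> float:
    """Expected number of innocent people flagged when the false
    positive rate is 1 in `one_in` and `screened` people are checked."""
    return screened / one_in

PASSENGERS_PER_YEAR = 700_000_000  # rough US figure used in the post

# The wildly optimistic rate, then the more plausible one.
for one_in in (1_000_000, 1_000):
    flagged = expected_false_positives(PASSENGERS_PER_YEAR, one_in)
    print(f"1 in {one_in:,}: about {flagged:,.0f} people flagged per year")
```

At the more plausible 1-in-a-thousand rate, the same arithmetic flags 700,000 people a year instead of 700.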

Right now, we don't do anything, other than not let them fly. If they're Senator Kennedy, they make a fuss at budget time, and someone takes them off the list; otherwise, we don't do anything. So all this fuss, and nothing really happens? How come?

Cost. If we really thought these folks were actually terrorists, we'd investigate them. A reasonable investigation involves a lot of effort - wire taps (first, get a warrant), stakeouts, careful building of a case by Law Enforcement, prosecution. Probably a million dollars each between police, lawyers, courts, etc - probably a lot more, if there's a trial. Multiply that by the 700, and we're looking at 700 million dollars, call it a billion, and this assumes a ridiculously low false positive rate.

There are on the order of a hundred thousand people in TSA's no-fly or watch databases. Not 700. If you investigated them all, you're talking a hundred billion bucks. So they turn the system off.
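Here's the same cost math as a quick sketch. The million-dollar price tag per investigation is the rough guess from above, and the function and constant names are hypothetical, just for illustration.

```python
# Back-of-the-envelope: what does it cost to investigate everyone flagged?
# The per-case cost is the post's rough guess, not a real budget figure.

COST_PER_INVESTIGATION = 1_000_000  # dollars per person (assumed)

def total_investigation_cost(people: int,
                             cost_per_case: int = COST_PER_INVESTIGATION) -> int:
    """Total dollars needed to fully investigate `people` suspects."""
    return people * cost_per_case

scenarios = (
    ("optimistic data mining", 700),          # 1-in-a-million false positives
    ("no-fly and watch databases", 100_000),  # order of magnitude from the post
)

for label, people in scenarios:
    cost = total_investigation_cost(people)
    print(f"{label}: {people:,} people -> ${cost:,}")
```

Bump the per-case cost for trials and the totals only get uglier.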

And that's actually the right answer. The data's lousy, joining lousy data with more lousy data makes the results lousier, and it's too expensive to make it work. How lousy is the data? Sky Marshals are on the No-Fly list. No, really. 5-year-olds, too.

So the Fed.Gov sweeps it under the rug, thanks everyone involved for all their hard work, and pushes the "off" button.

As expected, the Slashdot comments are all over this:
I'd take their "no fly" list and identify every single person on it who was a legitimate threat and either have them under 24 hour surveillance or arrested.
The mere concept of a list of names of people who are too "dangerous" to let fly ... but not dangerous enough to track ... that just [censored - ed] stupid.
At least everyone's looking busy. The analogies to gun control pretty much write themselves.

2 comments:

  1. And remember, "If you're not on some government watchlist, you're not trying!"

    What have you done for false positives today? Do your part!

  2. More magical thinking. If you draw your pentacle just right, and chant the incantation perfectly, then you can summon a demon to do your bidding. However, if you get anything at all wrong, you're dead. Horribly.

    The folks that support all these zero risk programs don't live in a scientific reality, they believe they can manipulate reality to fit their preferences. In medical terms, they're nuts.

    Life is a series of trade-offs, compromises, satisficing (satisfy and suffice) solutions, and coin flips. And then you die. There's no way off this train other than that. I don't mind if other folks want to go play with pixie dust and believe they can fly, but it really pisses me off when they insist that I jump off the cliff with them.

