Monday, November 5, 2012

I don't trust the polls for exactly the same reason I don't trust the climate databases

The problem isn't so much the data, the problem is that the data gets changed.  If all you hear about is the "adjusted" temperature data, then the science does indeed sound settled.  If you dig into the data, you find that the adjustments have entirely changed the outcome.  It's the same for the polls.

First, the headlines.  We are told that the global temperature is increasing at around 0.3°C per decade, maybe a little less.  We're told that the 20th Century saw warming of around 0.7°C.  That's the climate headlines; the RCP poll average shows Obama tied or slightly ahead, and this has been pretty stable over a period of weeks.  Sonic Charmer sums up what is pretty much the consensus view:
Now everyone says that if Romney wins, it’s likely to be via winning Ohio. But to think Romney will win Ohio, or is even all that close (the average gap above is ~3points), you have to think virtually ALL of these pollsters have (intentionally or inadvertently) a (D) bias. It is just astronomically-unlikely to get a set of results like above from 18 different pollsters (using whatever different methods) from a genuinely Romney-leaning state without there being something systematically wrong with virtually all of the pollsters.
Exactly.  Something is wrong, and it's affecting pretty much all of the pollsters.  I think that I've seen this before, in the temperature databases.  Background on temperature is in an old post of mine, How to create a Scientific Consensus on Global Warming, which you really should read because it is long and detailed and I show my work.  The key part is this, though:
There are two parts to the GHCN data: the raw temperature readings, and adjustments to the readings. The raw numbers are easy - they're just the instrument reported temperature for the weather station. Look outside your house at your thermometer - that's the raw data. Here Chez Borepatch, my thermometer says that it's 39°.

Adjustments are modifications to the readings, to "remove inhomogeneities" in the raw data. You (like me) may look at that and say Whiskey Tango Foxtrot are inhomogeneities?

...

OK, we don't want a jump in the historical record if you move a station or replace a thermometer with a better one.

But. All the Climatologists in the world will look at this data. How much do the adjustments change the results?

We don't know, but people are starting to look. They're starting to find that adjustments change the data a lot. They change the data so much that they show that the earth is warming when the raw data may show that it's cooling.

Let me say that again: Thermometers may be showing that the Earth is cooling, but adjustments to this data show a rapid temperature rise.

Let me give three examples.

Darwin, Australia: The blue line is the raw data from the five weather stations in Darwin. It shows a 0.7°C cooling over the 20th Century. The black lines are the adjustments to this data, showing a big jump in 1940 and a substantial increase since then. They turn the raw data decline into a 1.2°C increase over the course of the 20th Century.
The problem isn't with the data as originally collected.  The problem is what gets done to the data after it gets into the hands of the climate modeler.  This is my biggest problem with the idea of Global Warming - it's not clear at all from the data that things are warming up.  The adjustment process is a mess, and the scientific establishment really isn't explaining what's going on.  Thus my position as a skeptic.

It's the same thing with the polls.

Right up front, let me say that I have no reason to think that the pollsters are making mistakes in how they conduct the polls.  I will assume that their sample is suitably "random", that the questions do not lead to bias, that they correctly identify registered vs. likely voters, that the margin of error is as stated.  These may all be wrong (or some of them may be wrong), but Sonic Charmer is quite correct that it is unlikely in the extreme to have all 18 pollsters screwing up the technical internals of how polls are conducted.  My money certainly wouldn't make that bet, and yours shouldn't, either.

But that's the raw poll data.  That's not what's getting reported - what we hear about (and see in the RCP Poll Average) is adjusted poll data.  And here's where I catch more than a whiff of Bravo Sierra.  The Czar of Muscovy covers how the polling data gets adjusted in a fabulous post, Why the Polls are Screwy (Best To Ignore Them):
Something doesn’t seem right—poll after poll...after poll keeps showing the President tied with Mitt Romney and even a little ahead—and in some of the swing state polls, you see the President far ahead. Yet the average person thinks “That can’t be right.” Anecdotes, yes, are not data, but by God it sounds like people are turning out in waves to vote for Romney. Lines in GOP-friendly early voting locations are out the door, with cars lined up waiting for a chance to park. Early voting seems to be showing a massive turnout for Romney, unless a Democrat campaign spokesperson is asked; then, the numbers seem to float for Obama. All the pundits who have done this many times in their life, however, seem to anticipate a decisive Romney win—even a landslide. So how can Obama be so close or even ahead?

...

If we know that Democrats routinely make up 37% of voters, then we can assume that 620 voters is probably too high. 620 out of 1000 is of course 62%, not 37%. So if we got 620 voters to indicate they are Democrats, then the actual number of Republican voters out of every 1,000 is probably closer to 493...all we do is multiply 304 actual Republican participants by 1.62 to normalize the number. And we see that the proportions are maintained pretty well. We can then figure out the probable number of independents from there. The results wind up:
Dems (620) 37%
GOP (493) 29%
Indep (553) 33%
Of course, this seems very scientific but it is of course absolute horseshit. It assumes that the fact Democrats average 36-38% of the vote makes a safe baseline for you to beef up the other numbers.
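As a quick check on the arithmetic in the quoted example, here is a minimal Python sketch. It uses only the figures that appear in the quote (620, 304, 1.62, 493, 553); the variable names and the `shares` helper are my own, and since part of the original post is elided, how the 1.62 multiplier and the 553 independents were derived is not reconstructed here. They are simply taken as given.

```python
# Figures copied from the quoted example; nothing here is new polling data.
dem_respondents = 620    # respondents identifying as Democrats
gop_respondents = 304    # "actual Republican participants"
gop_multiplier = 1.62    # multiplier stated in the quote (derivation not shown there)
gop_quoted = 493         # adjusted Republican count as reported in the quote
indep_quoted = 553       # adjusted independent count as reported in the quote

# 304 * 1.62 comes to roughly 492.5, which the quote reports as 493.
gop_adjusted = gop_respondents * gop_multiplier


def shares(counts):
    """Return each group's percentage of the combined total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


adjusted = {"Dems": dem_respondents, "GOP": gop_quoted, "Indep": indep_quoted}
adjusted_shares = shares(adjusted)

print(f"304 x 1.62 = {gop_adjusted:.2f}")
for name, pct in adjusted_shares.items():
    print(f"{name}: {pct:.1f}%")
```

The adjusted counts sum to 1,666, and the shares come out near 37%, 30% and 33%, which roughly matches the percentages in the quoted list (the quote gives 29% for the GOP line).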
The polls report that Obama is slightly ahead, yet all of the polls show Romney with a double-digit lead with Independents.  History shows that a sitting President never increases his share of the Independent vote over what the polls showed the week before the election.  Never.

You can argue whether the Republicans are sufficiently energized and the Democrats sufficiently demoralized, and what this will do to turnout.  I have my own opinion but it's just that: opinion.  We're all entitled to our own opinion and yours is just as good as mine.

But we're not entitled to our own data, either for climate science or for polling purposes.  When you willy-nilly change the data that's been collected, expect to be challenged on your assumptions.  It looks like all of the pollsters (or almost all) are weighting towards a heavy Democrat turnout and a close split of Independents, resulting in a narrow Obama win.  I think that this is nonsense on stilts, and (opinion) Republican turnout will beat Democrat turnout (note that it seems that Gallup has data backing this up).  More importantly (data) Independents will vote for Romney in significant numbers.

Add this up and you have to adjust the polling weights from D+3 to R+4.  That's my model.  Using the pollsters' data in my model, you come up with Romney running away with the election.  My take there was Romney 359, Obama 179, although I thought I had upped Romney's margin in a subsequent post (I can't find it, drat it all).
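To make the idea of swapping turnout weights concrete, here is a rough Python sketch of how a topline number moves when the same within-party responses are reweighted under two different turnout assumptions. Every number in it is a hypothetical placeholder chosen for illustration, not a figure from any actual poll or from my own spreadsheet, and the names (`SUPPORT_BY_PARTY`, `topline`, `margin`) are made up for this example.

```python
# All numbers below are HYPOTHETICAL placeholders, not figures from any real poll.
# Share of each party group backing each candidate (assumed for illustration).
SUPPORT_BY_PARTY = {
    "Dem": {"Obama": 0.94, "Romney": 0.05},
    "GOP": {"Obama": 0.07, "Romney": 0.92},
    "Ind": {"Obama": 0.42, "Romney": 0.52},
}

# Two assumed turnout mixes: Democrats +3 and Republicans +4.
TURNOUT_D_PLUS_3 = {"Dem": 0.38, "GOP": 0.35, "Ind": 0.27}
TURNOUT_R_PLUS_4 = {"Dem": 0.33, "GOP": 0.37, "Ind": 0.30}


def topline(support, turnout):
    """Weight each group's candidate support by that group's turnout share."""
    totals = {}
    for party, share in turnout.items():
        for candidate, pct in support[party].items():
            totals[candidate] = totals.get(candidate, 0.0) + share * pct
    return totals


def margin(support, turnout):
    """Romney minus Obama, in percentage points."""
    result = topline(support, turnout)
    return 100.0 * (result["Romney"] - result["Obama"])


for label, turnout in (("D+3", TURNOUT_D_PLUS_3), ("R+4", TURNOUT_R_PLUS_4)):
    print(f"{label}: Romney margin {margin(SUPPORT_BY_PARTY, turnout):+.1f} points")
```

With these made-up inputs, the identical responses give a small Obama lead under the D+3 mix and a Romney lead under the R+4 mix. The turnout assumption alone flips the result, which is why I say the argument is about the models and not about the raw responses.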

And so to Sonic Charmer's point, it's not the pollsters' data that I challenge, it's their models.  I put more credence in my model than in theirs.  I'm not sure what motivates the pollsters but it doesn't really matter.  We'll know soon.

7 comments:

Mrs. S. said...

Getting back to the raw data, I usually tell pollsters they will know my answers on election day, and then I hang up the phone. Wonder how many other folks do that?

joe.attaboy said...

I would just like to be polled. Just once. My wife and I have lived in the same Florida county for 27 years. We've been registered to vote in that county...for 27 years. We've never been called, visited, emailed by, or seen anyone who wanted us to answer a genuine voter poll.

I'm beginning to think it has to do with my county (Clay, in Florida). Of 133,000 registered voters in the county, about 73,000 are Republican, 32,000 are Democrat and the other 28,000 are "Others" (most of which are "no party affiliation").

Perhaps the polling organizations just don't see the point in contacting anyone in Clay County because of the lopsided Republican numbers.

As of this morning (Monday 11/5), about 31% of the county's voters have voted early. My wife and I were there Friday afternoon and waited about 40 minutes to get in.

The Czar of Muscovy said...

Joe.Attaboy--

I can answer your question. You are a registered Republican, and are off the call lists. Registered Democrats are as well. They already know to factor your vote into their equation; it's the "undecided" voters who get badgered by calls.

The easiest way to get out of getting poll calls is to register toward a particular party. Too bad that doesn't stop the infomercial robocalls.

BP--thanks for the nod. Appreciate it, buddy.

Anonymous said...

Nice unification of the statistics of AGW and political polls.

The polls I've looked into in depth all oversample Democrats by a lot - up to +10 (IIRC). It's all based on the '08 turnout - but that turnout was historic. It defies sense to think that something that has happened once in recorded history is going to happen exactly the same way immediately - especially when you consider how badly the polls have done at capturing races like, well, most of 2010, Ted Cruz in Texas this year, and so on.


SiGraybeard@work

agirlandhergun said...

Cannot wait for this to be over.

Anonymous said...

I always hang up, saying "I don't do polls. Thank you. Good-bye." if there's a real person calling.

Polling results announced during a campaign are all suspect because they rely on the honesty of the responses, which is undeterminable.

Polls aren't, on the other hand, totally unrelated to reality because some people do respond honestly. But there's no way to set limits on the inherent errors, so all polling is rubbish. Or to be more precise, the fundamental errors mean you can't tell the more accurate polls from the less accurate ones.

So why waste your time with any of them?

And this election looks to be especially unreliable because of what's called the Bradley Effect, which I suspect will be big this year. You can google the term, but it's where people tell pollsters one thing and vote differently.

Mark Smith said...

Given the results, would you like to concede that the pollsters were right?