Wednesday, January 09, 2008

Polling Errors in New Hampshire

Hillary Clinton's stunning win over Barack Obama in New Hampshire is not only sure to be a legendary comeback but equally sure to become a standard example of polls picking the wrong winner. By a lot.

There is a ton of commentary already out on this, and much more to come. Here I simply want to illustrate the poll errors, which show the nature of the problem and help clarify the issues. I'll be back later with some analysis of these errors, but for now let's just see the data.

In the chart, the "cross-hairs" mark the outcome of the race, 39.1% Clinton, 36.4% Obama. This is the "target" the pollsters were shooting for.

The "rings" mark 5%, 10% and 15% errors. Normal sampling error would put a scatter of points inside the "5-ring", if everything else were perfect.
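To put a rough number on "normal sampling error", here is a minimal sketch of the textbook 95% margin of error for one candidate's share. It assumes simple random sampling, and the sample sizes are hypothetical rather than taken from any New Hampshire poll; `margin_of_error` is just an illustrative name.

```python
import math

def margin_of_error(share_pct, n, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a
    candidate polling at share_pct percent in a simple random sample of n."""
    p = share_pct / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)

# Hypothetical sample sizes (assumption, not from the actual polls).
for n in (400, 600, 1000):
    moe = margin_of_error(39.1, n)
    print(f"n={n}: +/- {moe:.1f} points")
```

For samples of a few hundred to a thousand respondents the result stays under 5 points, which is consistent with sampling error alone keeping polls inside the 5-ring.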

In fact, most polling shoots low and to the left, though often within or near the 5-ring. The reason is undecided voters in the survey. Unless the survey organization "allocates" these voters by estimating a vote for them, some 3-10% in a typical election survey are left out of the final vote estimate. Some measures of survey accuracy divide the undecided, either evenly across candidates or proportionately across them. There is good reason to do that, but that is a topic for another post. What the pollsters publish, however, are (almost always) the unallocated numbers, so it seems fair to plot here the percent of the vote the pollster published, not one with the undecided reallocated.
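The two ways of dividing the undecided can be sketched as follows. The poll numbers are hypothetical, chosen only to illustrate the arithmetic, and the function names are my own.

```python
def allocate_evenly(shares, undecided):
    """Split the undecided share equally among the candidates."""
    bonus = undecided / len(shares)
    return {name: pct + bonus for name, pct in shares.items()}

def allocate_proportionally(shares, undecided):
    """Split the undecided share in proportion to each candidate's support."""
    decided = sum(shares.values())
    return {name: pct + undecided * pct / decided
            for name, pct in shares.items()}

# Hypothetical published poll (assumption): shares in percent, 7% undecided.
poll = {"Clinton": 30.0, "Obama": 38.0, "Others": 25.0}
undecided = 7.0

for label, method in (("even", allocate_evenly),
                      ("proportional", allocate_proportionally)):
    allocated = method(poll, undecided)
    print(label, {name: round(pct, 1) for name, pct in allocated.items()})
```

Either method raises every candidate's number, which is why unallocated polls tend to sit low and to the left of the outcome.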

What we see for the Democrats is quite stunning. The polls actually spread very evenly around the actual Obama vote. Whatever went wrong, it was NOT an overestimate of Obama's support. The standard trend estimate for Obama was 36.7%, the sensitive estimate was 39.0% and the last five poll average was 38.4%, all reasonably close to his actual 36.4%.

It is the Clinton vote that was massively underestimated. Every New Hampshire poll was outside the 5-Ring. Clinton's trend estimate was 30.4%, with the sensitive estimate even worse at 29.9% and the 5 poll average at 31.0% compared to her actual vote of 39.1%.
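The size of these misses can be checked directly from the numbers above. This sketch assumes the rings measure straight-line (Euclidean) distance from the actual result on a chart with one candidate per axis, which is my reading of the chart rather than something stated explicitly; the function names are illustrative.

```python
import math

ACTUAL = {"clinton": 39.1, "obama": 36.4}  # actual vote shares quoted above

# (Clinton, Obama) estimates quoted in the text.
ESTIMATES = {
    "standard trend": (30.4, 36.7),
    "sensitive trend": (29.9, 39.0),
    "last five polls": (31.0, 38.4),
}

def point_error(clinton, obama, actual=ACTUAL):
    """Signed error on each candidate, plus distance from the cross-hairs."""
    d_clinton = clinton - actual["clinton"]
    d_obama = obama - actual["obama"]
    return d_clinton, d_obama, math.hypot(d_clinton, d_obama)

def ring(distance, radii=(5.0, 10.0, 15.0)):
    """Smallest ring containing the point, or None if beyond the 15-ring."""
    for radius in radii:
        if distance <= radius:
            return radius
    return None

for name, (clinton, obama) in ESTIMATES.items():
    d_clinton, d_obama, dist = point_error(clinton, obama)
    print(f"{name}: Clinton {d_clinton:+.1f}, Obama {d_obama:+.1f}, "
          f"distance {dist:.1f}, inside the {ring(dist):.0f}-ring")
```

Under that assumption, all three estimates land outside the 5-ring but inside the 10-ring, and nearly all of the distance comes from the Clinton axis.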

So the clear puzzle that needs to be addressed is whether Clinton won on turnout (or Obama's was low) or whether last minute decisions broke overwhelmingly for Clinton. Or whether the pollsters' likely voter screens mis-estimated the makeup of the electorate. Or if the weekend hype led to a feeding frenzy of media coverage that was very favorable to Obama and very negative towards Clinton, which depressed her support in the polls but oddly did not lower her actual vote.

On the Republican side we see a more typical pattern, and with better overall results. About half of the post-Iowa polls were within the 5-ring for the Republicans, and most of the rest within the 10-ring.

As expected, errors tend to be low and left, but the overall accuracy is not bad. This fact adds to the puzzle in an important way:

If the polls were systematically flawed methodologically, then we'd expect similar errors with both parties. Almost all the pollsters did simultaneous Democratic and Republican polls, with the same interviewers using the same questions, the only difference being the screening for which primary a voter would participate in. So if the turnout model was bad for the Democrats, why wasn't it also bad for the Republicans? If the demographics were "off" for the Dems, why not for the Reps?

This is the best reason to think that the failure of polling in New Hampshire was tied to swiftly changing politics rather than to failures of methodology. However, we can't know until much more analysis is done, and more data about the polls themselves become available.

A good starting point would be for each New Hampshire pollster to release their demographic and crosstab data. This would allow sample composition, and voter preferences within demographic groups, to be compared across polls. Another valuable bit of information would be voter preference by day of interview.

In 1948 the polling industry suffered its worst failure when it confidently predicted Truman's defeat. In the wake of that polling disaster, the profession responded positively by appointing a review committee which produced a book-length report on what went wrong, how it could have been avoided and what "best practices" should be adopted. The polling profession was much the better for that examination and report.

The New Hampshire results are not on the same level of embarrassment as 1948, but they do represent a moment when the profession could respond positively by releasing the kind of data that will allow an open assessment of methods. Such an assessment may reveal that in fact the polls were pretty good, but the politics just changed dramatically on election day. Or the facts could show that pollsters need to improve some of their practices and methods. Pollsters have legitimate proprietary interests to protect, but big mistakes like New Hampshire mean there are times when some openness can buy back lost credibility.

26 comments:

Daniel said...

My intuition tells me that the cause of the "errors" was rapidly changing politics. After all, it is a manifest fact that the political situation was especially fluid. So why jump down pollsters' throats? I guess that makes a better story (shrug).

Having said that, I do think that polling plays a much different role in politics now than it did in 1948. Polling, and the spinning of polling, is integral to modern political campaigns. The press has a story to tell and polls play an important part in developing that story line. So it might be too much to hope that the profession will be willing to take such a candid look at its own behavior as it did in 1948.

Robert A Vollrath said...

On the edges of the media I've heard rumbles of voter fraud. Could voter fraud twist the polls out of shape?

Daniel said...

That was the first analysis of the New Hampshire results that didn't try to offer some half baked excuse. Not that I would ever think you would.

Just one more reason why I continue to come back to the site.
Bravo

Oneria said...

REMEMBER FLORIDA! lol

Goldmanusa said...

This is a terrific piece of journalism, and the author is to be complimented. I will be discussing it today on my radio show. If they ever do a TV show like NUMB3RS with politics as the theme, we know who will be playing the lead.

My compliments.

partha said...

"So the clear puzzle that needs to be addressed is whether Clinton won on turnout (or Obama's was low) or whether last minute decisions broke overwhelmingly for Clinton. Or whether the pollsters' likely voter screens mis-estimated the makeup of the electorate. Or if the weekend hype led to a feeding frenzy of media coverage that was very favorable to Obama and very negative towards Clinton, which depressed her support in the polls but oddly did not lower her actual vote."

Nice analysis of the voting patterns. But the real questions are the ones that you pose and leave unanswered.

Carrie said...

"So if the turnout model was bad for the Democrats, why wasn't it also bad for the Republicans? If the demographics were "off" for the Dems, why not for the Reps?"

I am intrigued by Andrew Kohut's hypothesis in the New York Times. Is it possible that in this instance the polling audience was an inadequate substitute for - and only for - the Democratic primary voter?

Could it be that lower-income white Democrats who were not polled but did vote were overwhelmingly pro-Clinton and were enough to skew the results?

You wouldn't see that on the Republican side unless this demographic were heavily skewed toward one Republican candidate.

I just found this blog in reference to the NH polling and I'm so happy I did. Political statistics is *such* a fascinating field!

Daniel T. said...

I just want to point out that the first Daniel that posted (me) is not the same as the second Daniel that posted.

Daniel T.

Everymatter said...

I think Hillary Clinton will not be successful

May god bless him

Zach said...

This is an interesting analysis.

What's also interesting is that the Iowa polls were pretty far off.

There hasn't been a lot of discussion of that, largely because I think everyone brushed it off as a lot of young and minority voters (not considered "likely voters" by most polls) showing up, causing what was supposed to be a neck and neck race to turn into an overwhelming Obama victory.

Although in many places Obama's victory led to more young and minority voters registering, NH was so soon after Iowa that I'm curious as to when their last day to register was. Perhaps the surge in voter registration didn't hit NH like it has other states.

On the other hand, the "likely voters" who would not have voted for Obama may have decided that rather than vote for other establishment candidates (Republican or Democratic), they would vote for Hillary. Also, voters registered as independents in NH but rooting for the Republicans could have voted in the Democratic primary not to boost Clinton, but to put a more beatable candidate than Obama in place.

I think the last two options are especially likely if NH voters had access to the news stories from other major cities attributing surges in voter registration to Obama's success in Iowa.

Most likely, however, the polling errors were some combination of all the factors I listed, plus those in your original post and everyone else's comments.

With no clear "establishment" candidate from either party this year, I think there are a lot of things going on in a lot of different voters' heads.

Bjørn Erik said...

STRATEGIC VOTING?
I wonder if strategic voting among Democrats is part of the explanation. That is, what if many Democrats acted strategically on the basis of polling that overrated Obama? Would this make the opinion polls look even worse after election day than they actually were before election day?

Daniel T said...

http://www.lasvegassun.com/news/2008/jan/11/pollsters-have-plan-nevada-skip-it/

I have not read the original NYT article quoted in the Las Vegas Sun (above). But according to this article the head of polling for the Pew Trusts, a respectable organization, claims that poor white people are racist, are less likely to respond to pollsters, and thus Obama's support was overstated. Such a claim is in direct conflict with your own claim. I am curious as to your response.

Anonymous said...

My name is Andrew Ian Murphy, and what I want to say is how so many of you are so unwilling to be adult enough to face the most simple of all realities...

Polling errors?

How about Voting errors?

Or more to the point; VOTE FRAUD.

God, you guys are such weak people.

Parabellum Ben said...

I've read an article (I don't recall the URL) that claimed Democratic candidate Dennis Kucinich wrote a letter to the Secretary of State of New Hampshire asking for a recount in the interest of restoring/maintaining voter confidence.

However, I know this much--as a registered Independent voter, if my state had open primaries, I would have planned on voting in the Democratic primary, but after seeing polling data suggesting that my vote wouldn't swing things. . . I would have given my vote to Dr. Ron Paul.

Makes me wonder if the polls themselves significantly changed the outcome of the primary.

B.Q. Political Report said...

I think that it mainly comes down to the fact that there were too few days for polling to get a good picture of all of the trends. Individual days sometimes have very odd results. If one examines the exit polling, those deciding within three days of the election and in the last week did mirror the polls in those periods before New Hampshire, and those deciding on the last day resembled the final results.

I have no idea how to fully interpret that.

Topi said...

parabellum ben, your point was very good. After all, there was a considerable number of independents voting in the Republican primary, a fact that helped John McCain win and that then validated the pollsters' predictions for the Republicans. Perhaps some of these independents voting in the Rep primaries would have voted for Obama but chose to vote in the other party's contest because the margin of victory there was erroneously predicted to be smaller than in the Dem race.

But of course, the big problem was not that Obama got too few votes, it was the way the pollsters underestimated the Clinton vote. I think carrie has a point with lower-income Dem voters turning out for Clinton in higher numbers than predicted.

I also think that the change in the front runner status was crucial. Before Iowa many uncertain Obama voters knew that they had to vote for Obama lest he be finished. So they kept the race competitive by handing him the victory. In NH uncertain Clinton voters knew that Clinton winning the primary would not decide the entire race but it would keep her in the race and give her a chance to better formulate her agenda.

Anonymous said...

Has the official story of 9/11 seemed a bit suspect to you? The film below shows you exactly why it should. Don't be overwhelmed by the number of facts thrown at you. There are only a handful of facts here to focus on as proof that the attacks were enhanced by a rogue governmental network.

First there's former New York mayor Rudy Giuliani admitting on ABC News that he was told to leave the World Trade Center to avoid the impending collapse, despite there being no such indication from the buildings' structure. This shows prior knowledge of the collapse on the part of insiders above Giuliani in the chain of command during crisis response. This also shows that there had to be another cause of the collapse that would cause these insiders to be certain of the collapse.

Second, there's the preponderance of eyewitness accounts of explosions going off throughout the towers prior to the collapse. The fact that explosions were heard in areas that were nowhere near the destruction--like the basement, lobby, and lower floors--points to the use of explosives. And explosives would certainly provide those insiders above Giuliani the certainty of a collapse.

Third, there's the collapse of all three buildings occurring too fast for a collapse not induced by demolition devices. Both towers fell in between 10 and 13 seconds, and Building 7 fell in between 6 and 7 seconds. Free-fall speed for the towers is 9.2 seconds, and for Building 7 it is 5.9 seconds. For these three buildings to have fallen at virtually free-fall speed as they did, virtually all key points of structural resistance had to have been removed from the equation. If left intact, they would have provided resistance against the collapse that, in turn, would have caused the collapse to proceed much slower than a free fall. The only thing that could remove those key points of structural resistance would be demolition devices.

Lastly, there's the discovery of the hijackers' identification in the aftermath of the attacks. The fact that an ID turned up so quickly out of the enormous amount of rubble from the towers and was found by itself--not near a hijacker's body nor his belongings--shows that the ID was planted. This corroborates the evidence of insiders working within a rogue governmental network. The plant was obviously intended to keep the focus on the hijackers’ culpability, and not anyone else’s.

Using these core facts you can persuade others that this is conspiracy fact, not theory; and that this knowledge is based on logic, not paranoia. So please push for the truth to be publicly recognized for the sake of our nation’s security and sanity.
.
Loose Change: Final Cut

Princess said...

How much fraud do you think there'll be when a woman or African American gets the Democratic nomination? I hope things will go smoothly either way.

Adam F. said...

As someone who's not well-versed in polling data (thanks to this site, I'm learning), I have a hard time believing that New Hampshire was a result of a quickly changing political landscape. In my humble opinion, that insinuates a fickle voting populace, and while that might be the case, I'm reluctant (or afraid) to believe it.

R2K said...

: )

Surtur said...

Your numbers are quite intriguing. Still I hope Obama will make it all the way.

There is big news elsewhere in the world, i.e. the oldest parliament in the world burned down

Anonymous said...

When you wrote:

The "rings" mark 5%, 10% and 15% errors. Normal sampling error would put a scatter of points inside the "5-ring", if everything else were perfect.

Did you adjust the points representing poll outcomes to somehow standardize each poll's margin of error, or did they all have a margin of error equal to 5% at the same p value? I assume that within each study, the margin of error on the estimated proportion of Clinton supporters and the estimated proportion of Obama supporters was the same because their supporting data probably came from the same question. However, each poll has a potentially different margin of error, and if one is different, each point should be the center of its own circle of radius equal to the margin of error.

Also, with regard to:

What we see for the Democrats is quite stunning. The polls actually spread very evenly around the actual Obama vote. Whatever went wrong, it was NOT an overestimate of Obama's support. The standard trend estimate for Obama was 36.7%, the sensitive estimate was 39.0% and the last five poll average was 38.4%, all reasonably close to his actual 36.4%.

What is the statistical idea behind this? I only took an Introduction to Statistics course and I did not see the technique which involves averaging best-estimators of multiple studies.

chocomush said...

do you think there's a big conspiracy behind it?? or was it simply shoddy workmanship and inaccuracy. . either way most of the newspapers in Ireland got it wrong and it was an embarrassing day all round for journalism. .i mean checking your sources is a fundamental part of being a journalist. .

either way if Obama wins i believe he owes it to the media who have played a vital role so far in his campaign. .

Anonymous said...

i'm confused
the pictures seem misleading

The Gizmole! said...

Yes me too please explain!!!

TheGizmole

frnkline said...

The "boomers" in their 40s and 50s remain the largest group, but for our purposes there are two important points. Those under 30 make up a substantial share of the population, while those 60 and over represent a substantially smaller share at each age.
===================================
frankline
New Hampshire Drug Addiction