Monday, November 06, 2006

From Poll Margin to Wins: Polls as Predictors

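[Figure: share of races the Democrat actually won (y-axis) plotted against the Dem minus Rep poll margin (x-axis), shown separately for Governor, Senate and President races, with a black trend line.]
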
The usual way to look at poll accuracy is to subtract the poll result from the vote result. But an alternative is to look at how the probability that a candidate wins depends on the margin they have in the pre-election polls. Since American elections are "winner-take-all" within districts, this is a good way of looking at the practical power of polls to predict winners.

After all, a statistician would say that a poll predicting 51% for a candidate who lost with 49% was better than a poll predicting 51% for a candidate who won with 55%: the first missed by 2 points, the second by 4. That's right from one point of view, but not from the perspective of picking the winner correctly. Here I take a look at the latter view of what is important.

The data are from all statewide polls for Senate, Governor or President from 2000 and 2002.

The figure above plots results by poll margin. The x-axis shows the Dem minus Rep margin in the polls. The y-axis plots the percent of races the Dem ACTUALLY won for each margin we saw in the polls. So imagine I take all polls that found a 5 point lead for the Dem. The y-axis plots the proportion of those polls with a 5 point lead in which the Dem actually DID win. I do this separately for each type of race: Governor, Senate and President. The dots show there is a lot of variation, but the pattern of points and the black trend line through the data show how the predictive accuracy varies over margins from -30 to +30.
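
The tabulation described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the figure: the `polls` records are made up, the `win_rate_by_margin` helper is a hypothetical name, and the smoothed trend line is not reproduced here.

```python
from collections import defaultdict

# Hypothetical records, one per poll: (race type, Dem minus Rep poll margin
# in points, whether the Democrat went on to win). All values are made up.
polls = [
    ("Sen", 5, True),
    ("Sen", 5, False),
    ("Sen", 5, True),
    ("Gov", 0, True),
    ("Gov", 0, False),
    ("Pres", -10, False),
    ("Pres", -10, False),
]


def win_rate_by_margin(records):
    """Return {(race, margin): share of polls at that margin where the Dem won}."""
    counts = defaultdict(lambda: [0, 0])  # (race, margin) -> [dem wins, polls]
    for race, margin, dem_won in records:
        counts[(race, margin)][0] += int(dem_won)
        counts[(race, margin)][1] += 1
    return {key: wins / total for key, (wins, total) in counts.items()}


rates = win_rate_by_margin(polls)
for (race, margin), rate in sorted(rates.items()):
    print(f"{race:4s} margin {margin:+3d}: Dem won {rate:.0%} of polls")
```

Each key in `rates` corresponds to one dot in a plot of this kind: one race type at one poll margin, with the share of those polls in which the Dem went on to win.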

One interesting feature is that a margin of zero (a tied poll) produces a 50-50 split in wins with remarkable accuracy. There is nothing I did statistically to force the black trend line to go through the "crosshairs" at the (0, .5) point in the graph, but it comes awfully close. So a tied poll really does predict a coin-flip outcome.

The probability of a win rises or falls rapidly as the polls move away from a margin of zero. By the time we see a 10 point lead in the poll for the Dem, about 90% of the Dems win. When we see a 10 point margin for the Rep, about 90% of Reps win. That symmetry is also not something I forced with the statistics-- it represents the simple and symmetric pattern in the data.

More practically, it means that polls showing a 10 point lead rarely miss the winner, but they DO miss about 10% of the time.

A 5 point lead, on the other hand, turns out to be right only about 60-65% of the time. So bet on a candidate with a 5 point lead, but don't give odds. And for 1 or 2 point leads (as in some of our closer races tomorrow) the polls are only barely better than 50% right in picking the winner. That should be a sobering thought to those enthused by a narrow lead in the polls. Quite a few of those "leaders" will lose. Of course, an equal proportion of those trailing in the polls will win.

So read the polls-- they are a lot better than nothing. But don't take that 2 point lead to the bank. Doing so is a failure to appreciate the practical consequences of the margin of error.
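
For a rough sense of what the margin of error by itself implies, here is a small sketch of the chance that a poll leader is truly ahead when sampling error is the only source of error. It is not the calculation behind the figure, which is based on actual outcomes. It assumes a simple random sample, a two-candidate race near 50-50, and a normal approximation; the function name `prob_truly_ahead` and the 4 point margin of error are illustrative choices.

```python
from statistics import NormalDist


def prob_truly_ahead(lead_pts, moe_pts):
    """Chance the poll leader is really ahead, from sampling error alone.

    Assumes moe_pts is the usual 95% margin of error on one candidate's share,
    and that the lead (the difference of two shares) has about twice that
    standard error, which holds for a two-way race near 50-50.
    """
    se_share = moe_pts / 1.96   # standard error of a single candidate's share
    se_lead = 2 * se_share      # approximate standard error of the lead
    return NormalDist().cdf(lead_pts / se_lead)


for lead in (0, 2, 5, 10):
    print(f"{lead:2d} point lead, 4 point MoE: {prob_truly_ahead(lead, 4):.0%}")
```

Under these assumptions a 2 point lead with a 4 point margin of error comes out a bit under 70%. The win rates observed in the data for narrow leads are lower than that, presumably because real polls miss for reasons beyond sampling error.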



9 comments:

Anonymous said...

Interesting results. Any way to add a momentum factor, like slope of the last 3-4 polls?

Anonymous said...

What is the y axis actually measuring? I.e., what data is being aggregated to calculate the probability of a win?

Jeff R said...

Prof. Franklin, about two years ago Kevin Drum at Washington Monthly posted a table that purported to show the probability that one candidate is genuinely ahead of another, given a lead of X percent and a MoE of Y percent. (Example from his table: a 2% lead with a 4% MoE means there is a 69% chance the candidate is genuinely ahead.) At the time, he was debunking the widespread media references to 2 or 3 percent leads as a "statistical tie." He attributed the formulae that generated the table to two professors of mathematics and statistics at Cal State Chico, Nancy Carter and Neil Schwertman.

Are you familiar with their work, and do you have an opinion on how it fits in with what you've said here?

Just wondering.

Anonymous said...

Suppose that you could add additional variables to the equation. What variables would best improve the model (difference in campaign warchest? incumbency?)? Is there a set of variables that could be added to the model that would result in polls no longer contributing significantly to the model?

Anonymous said...

Darned if I can figure out the meaning of each dot.

Anonymous said...

Wow. This is why I went into journalism. What the heck does this mean?

Anonymous said...

Very cool post!
After the last two presidential elections, a lot of people seem to think that even if a dem is ahead a couple of points in the poll the republican is likely to win (because of better republican turnout or social acceptability effects in polls or whatever). Your results seem to show the opposite - if a dem is a couple of points down in your model, the dem still has 50% chance of winning. (it's almost 50/50, but not quite). Or, another way to put it, if the poll is at 50% exactly, the dem has a >50% prob of winning. I'm assuming this difference is not statistically significant, but it's still interesting because it's in the opposite direction of what seems like current conventional wisdom.

Have a good election night!!
-Corrie P.

Jim Miller said...

One thing that struck me is just how bad the polls for governor and senator are. That's not a surprise, but your figure does illustrate that point vividly.

jeremy said...

The claim that a tied poll is a coin flip outcome only makes sense if one is talking about whether there is a bias with respect to a particular variable in one direction or the other, as you look at here, for the Republican or Democrat. You could have, for example, looked at whether a 50-50 poll for incumbent versus challenger was a coin flip and gotten a different result.

It does seem like if you combined this data with the reported margins of error for the polls, you could see how well the margins of error in the aggregate reflected the real error between the last polls and the outcome.