Tuesday, November 08, 2005

Polling for the 2005 elections

Today's elections have produced a good deal of polling and considerable uncertainty as to likely results. MysteryPollster has done several excellent posts on this here, here, here and here. MP stressed the difficulty of polling on California's propositions, and provided a thorough discussion of the implications for poll variability.

Here I want to show you the variability across polls in the propositions plus the NJ and VA governor's races. I'm going to do this a little differently than you usually see it. Here's why.

We can think of each poll as a sample of the likely outcome. The uncertainty arises from sampling error, from campaign dynamics, and from non-sampling errors such as non-response, question wording and a host of other polling demons.

It is common to toss out early polling and focus only on the last week or even days of polling. I'm not going to do this. Rather, I want to show the total distribution of results across the polls. This reflects the uncertainty due to campaign dynamics, as well as the other factors. The reason I think that's worth doing is that changes over the campaign are unpredictable. What has gone down may come back up in the end, or vice versa. We place a lot of faith in the last week of polling (not wholly misplaced), but in representing how uncertain we are about the likely results, I think it makes sense to consider the total variability in polling. This also brings more polls into the evidence.

Of course, where there are clear trends in the polls, this may make the uncertainty appear greater than it would if we looked only at late polls. That matters if we are primarily interested in predicting winners. But if we want to represent uncertainty, I think my approach is preferable.

So in the graphs below I present the distribution of polls for each California proposition and for the NJ and VA governor's races. Where there have been clear trends, I mention them. But you won't find the usual time series plot here. That isn't my point. Rather, look at where polls have been highly variable: we should be very uncertain there. Where they have not been, we should be less uncertain.

The wide variability in percentages and in spread should be appreciated by readers of polls (and perhaps by pollsters themselves). Sampling error is only part of the story of uncertainty. These plots give a better sense of how much uncertainty comes from all sources, not just sampling.
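To make the "only part of the story" point concrete, here is a rough sketch of the comparison I have in mind: the spread we actually see across a set of polls versus the spread we would expect if simple random sampling were the only source of error. The poll numbers and the sample size below are invented purely for illustration (they are not taken from any of the races discussed here), and the function name `sampling_sd` is my own.

```python
import math
import statistics

# HYPOTHETICAL data: percent "Yes" in seven imaginary polls on one proposition.
# These values are made up for illustration and do not come from real polls.
yes_pct = [38.0, 44.0, 41.0, 36.0, 47.0, 40.0, 43.0]

# ASSUMPTION: every poll interviewed about the same number of respondents.
sample_size = 800


def sampling_sd(pct, n):
    """Standard deviation, in percentage points, expected from simple
    random sampling alone for a proportion of pct percent and n respondents."""
    p = pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


mean_pct = statistics.mean(yes_pct)

# Spread actually observed across the polls (sample standard deviation).
observed_sd = statistics.stdev(yes_pct)

# Spread we would expect if sampling error were the whole story.
expected_sd = sampling_sd(mean_pct, sample_size)

# A ratio well above 1 suggests sources of variation beyond sampling,
# such as campaign dynamics, question wording or non-response.
ratio = observed_sd / expected_sd

print(f"mean of polls:          {mean_pct:.1f}")
print(f"observed spread (SD):   {observed_sd:.2f} points")
print(f"sampling-only SD:       {expected_sd:.2f} points")
print(f"observed / sampling:    {ratio:.2f}")
```

With real polls you would use each poll's own sample size, and a design effect would push the sampling-only figure up somewhat, so treat the ratio as a rough indicator rather than a formal test.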

The conclusion from most of these plots: don't place big bets based on the polls. (But do note that polls DO seem to provide clear evidence in some cases. The exceptions count too!)

Data: See RealClearPolitics.Com for the numbers and for the trends.