Thursday, June 29, 2006

Puzzling PA Polls Pondered

Last week two new polls on the Pennsylvania Senate race came out on the same day. Quinnipiac University saw the race 52%-34%, a stunning 18 point margin for Democratic State Treasurer Robert Casey over incumbent Republican Senator Rick Santorum. On the same day, the Wall Street Journal released its latest Zogby Interactive "Battlegrounds" poll showing the race at 47.9%-41.3%, a 6.6 percentage point Casey lead that Zogby says is within the margin of error. Any way you count it, that is a striking discrepancy between polls (even if we allow that they both agree Casey seems to be ahead). So a reasonable person asks "What is going on here and how can I make sense of these crazy polls?"

I touched here on the importance of looking at polls in the context of all the available polling on a race. The Pennsylvania polls show just how revealing this can be. What appears to be a hard-to-reconcile discrepancy is actually pretty clear once we look at all the data.

(A side issue is that Zogby Interactive polls are based on volunteers from the internet rather than a random sample of the population. The data are then weighted to resemble the population in partisan and demographic terms. However, with no probability sample, there is no theoretical justification for computing a margin of error for such polls, and their reliability remains open to much doubt and discussion. I'll pass today on getting into those matters and just talk about the performance of the Zogby poll.)

The figure above shows all the trial heat polls for Pennsylvania since January 1, 2005. The solid lines are the estimated trends across all the polling. After apparent Casey gains and Santorum losses in the first half of 2005, there has been a small decline for Casey and a small rise for Santorum, though Casey continues to hold a clear advantage.

How do the Quinnipiac and Zogby polls fit with this overall trend? The triangles represent Quinnipiac polls. They cluster fairly closely to the estimated trend for Casey, but appear to pretty consistently underestimate support for Santorum compared to the estimated trend. And the latest Quinnipiac poll moved substantially further below the estimated trend for Santorum than previous polls. At the same time the latest Quinnipiac poll appears clearly above the trend for Casey. Combined, these two discrepancies amount to a substantial overstatement of the Casey lead.

At the same time, the Zogby polls (the square symbols) track the Casey vote fairly closely but OVERstate the Santorum vote by a bit. The latest Zogby is slightly above the Santorum trend and a couple of points below the Casey trend. So Zogby understates the margin between the two at the same time Quinnipiac overstates that margin. The result is a very large discrepancy between the two polls, and a considerable puzzlement for campaign observers.

There are a couple of details we could consider. First, the trend line is not very sensitive to whether we include Quinnipiac and Zogby or not, at least once enough other pollsters get involved in the polling, midway through 2005. After that the dashed line, excluding Quinnipiac and Zogby, tracks closely with the solid line based on all the polling. Roughly, the Quinnipiac and Zogby "house effects" cancel each other out in the estimated trend based on all the polls.

Second, the magnitude of the "house effects" is pretty noticeable. Quinnipiac's mean deviation from the trend is -2.49 points for Santorum and +0.58 points for Casey. Zogby reverses this: +3.05 points for Santorum and -0.56 points for Casey. Put those together and Quinnipiac overstates Casey's lead by 3.07 points while Zogby understates it by 3.63 points. The difference between the two polls would then be 6.7 percentage points on the margin between candidates, quite a noticeable discrepancy.
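For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only the rounded mean deviations quoted above; the names `house_effects` and `margin_effect` are made up for illustration. Note that the rounded inputs give a Zogby effect on the margin of -3.61 rather than the 3.63 quoted, presumably because the quoted figure comes from unrounded values (that is an assumption on my part), so the gap computed here is 6.68 rather than exactly 6.7.

```python
# Mean deviations from the trend, in percentage points, as quoted in the post.
# These are rounded values, so results can differ slightly from the text.
house_effects = {
    "Quinnipiac": {"Casey": 0.58, "Santorum": -2.49},
    "Zogby": {"Casey": -0.56, "Santorum": 3.05},
}


def margin_effect(effects):
    """House effect on the Casey-minus-Santorum margin (hypothetical helper)."""
    return round(effects["Casey"] - effects["Santorum"], 2)


# Positive means the pollster overstates Casey's lead, negative understates it.
margin_effects = {name: margin_effect(e) for name, e in house_effects.items()}

# Expected gap between the two pollsters on the margin.
gap = round(margin_effects["Quinnipiac"] - margin_effects["Zogby"], 2)

for name, effect in margin_effects.items():
    print(f"{name}: {effect:+.2f} points on the Casey lead")
print(f"Expected gap between the two polls: {gap:.2f} points")
```

The sign convention is the candidate's poll result minus the trend, so a pollster that is high on Casey and low on Santorum gets a positive effect on the margin.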

The graph below highlights how these two polls fare relative to other polling in the race. The x-axis is the difference between each poll's result for Casey and the Casey trend estimate when the poll was taken. Negative values mean the poll falls below the trend, while positive values mean it falls above the trend. The y-axis is the same discrepancy for Santorum. The vertical and horizontal lines in the graph mark zero, or polling that falls exactly on the trends for each candidate.

The triangular symbols for Quinnipiac polls show that they have mostly fallen into the lower right quadrant, overstating Casey and understating Santorum. Worse, the most recent poll by Quinnipiac (circled) is quite far away from the trend, deviating by even more than the poll's average house effect would suggest. Three of the earlier Quinnipiac polls were quite close to trend for both candidates (they are close to the intersection of the "zero" lines in the graph). But there has been a strong pattern of Quinnipiac results that are well below trend for Santorum while a bit above for Casey.

The Zogby polls have generally been well above the Santorum trend, while only a little low on Casey. The most recent Zogby poll is actually closer to both trends than has been the case with most of his polling in this race.

So that is the solution to the puzzle of the Pennsylvania polls. When compared to all the polling, the discrepancies become rather clear. Both polls have been discrepant and in opposite directions, on average. The latest results exaggerate this already clear tendency, with the quite discrepant Quinnipiac poll far from the estimated trends. When the two polls appear on the same day, this conflict is more apparent than when they are released well apart.

Bottom line: I much prefer my trend estimates, which use all the available polling information. Those trends currently stand at 49.7% for Casey and 40.4% for Santorum.
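As a rough check, the two latest polls can be compared with these trend values. The sketch below (Python) computes each poll's deviation from the trend for both candidates and names the quadrant it would land in on the scatterplot described earlier. One loud assumption: it uses the current trend values as a stand-in for the trend at the time each poll was taken, which should be close but is not identical. The function names are invented for illustration.

```python
# Current trend estimates and the two latest polls, all taken from the post.
# Assumption: the current trend approximates the trend when each poll was taken.
trend = {"Casey": 49.7, "Santorum": 40.4}
latest_polls = {
    "Quinnipiac": {"Casey": 52.0, "Santorum": 34.0},
    "Zogby": {"Casey": 47.9, "Santorum": 41.3},
}


def residuals(poll, trend):
    """Poll minus trend for each candidate, in percentage points."""
    return {c: round(poll[c] - trend[c], 1) for c in trend}


def quadrant(res):
    """Quadrant with Casey on the x-axis and Santorum on the y-axis.

    Simplification: a point exactly on a zero line counts as left or lower.
    """
    horizontal = "right" if res["Casey"] > 0 else "left"
    vertical = "upper" if res["Santorum"] > 0 else "lower"
    return f"{vertical} {horizontal}"


results = {}
for name, poll in latest_polls.items():
    res = residuals(poll, trend)
    margin_res = round(res["Casey"] - res["Santorum"], 1)
    results[name] = (res, margin_res, quadrant(res))
    print(f"{name}: Casey {res['Casey']:+.1f}, Santorum {res['Santorum']:+.1f}, "
          f"margin {margin_res:+.1f}, {quadrant(res)} quadrant")
```

On these numbers the latest Quinnipiac poll is about 2.3 points above the Casey trend and 6.4 below the Santorum trend, while the latest Zogby poll is about 1.8 below for Casey and 0.9 above for Santorum, which matches the description of the two polls given earlier.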

