Thursday, August 02, 2007
Does One Bad Pollster Spoil the Trend Estimate?
Yesterday, in response to this post at Pollster.com, readers raised a number of excellent questions about the effects of individual polls on our trend estimates of candidate support (and on just about everything else here as well, including presidential and congressional approval, support for the war, and more). Much of the discussion was about how to detect and exclude "bad" polls, a topic that covers a huge range of issues, including "house effects" (the tendency of a polling organization to poll consistently high or low on some questions), outliers (single polls that fall far from the rest), and more. The discussion will provide fodder for a number of posts later this month as I review our methods and try to clarify these and other issues. So there is a lot to do. Consider this a down payment on the rest.
To paraphrase one question: "Why not exclude a polling organization if it consistently produces results out of line with everyone else?"
We could approach this in several ways. For example, suppose a pollster was consistently 4 percentage points high, but its polls moved in sync with the trend in all the other polls. Movement in those polls would tell you a lot about the dynamics of opinion even if the pollster were "biased" by 4 points. If the bias were consistent, we could simply subtract 4 points and have an excellent estimate of the trend. A simple shift of the average poll result above or below the overall trend is not, in and of itself, a clinching argument for excluding a pollster. I'll come back to this issue in much more detail later in this series of posts.
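To make the "just subtract the offset" idea concrete, here is a minimal sketch in Python. It is not the method we use here; the numbers and the pollster are invented. It simply takes a pollster's house effect to be the average gap between its readings and the shared trend at the dates it polled, and removes that offset.

```python
import numpy as np

# Hypothetical data, in percentage points: the shared trend evaluated at the
# dates this pollster was in the field, and the pollster's own readings.
trend_at_poll_dates = np.array([38.0, 39.5, 41.0, 42.5])
pollster_readings   = np.array([42.1, 43.3, 45.2, 46.4])

# Estimate the house effect as the mean gap, then subtract it.
house_effect = np.mean(pollster_readings - trend_at_poll_dates)   # roughly +4 points
adjusted = pollster_readings - house_effect

print(f"estimated house effect: {house_effect:+.1f} points")
print("adjusted readings:", np.round(adjusted, 1))
```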
A simpler and more direct way to approach the question is to ask what difference it makes if we include all the polls rather than excluding supposedly "bad" ones. Of course, we would have a major problem if the trend estimator were quite sensitive to individual polls, or to all the polls from a particular pollster. Happily, this is an empirical question, so we can answer it. And we don't have to know which pollster is "bad" to begin with.
The plots above show the trend estimate, using our standard estimator, as the black line. This uses all the polls we have available for the national nomination contests in both parties. The light blue lines are trend estimates that result when I drop each of the 19 different polling organizations, one at a time. Though the lines are indistinguishable, there are 19 different blue ones for each candidate in the figures. If the impact of individual organizations on the trend estimate were large, some of these blue lines would diverge sharply from the black overall trend line and we'd be seriously concerned about those polls that were responsible for the divergent results.
But that isn't what actually happens. The blue lines all fall within +/- 1 percentage point of the overall trend estimate, and the vast majority fall within +/- 0.5 points. There is no evidence that excluding any single organization has more than a trivial effect on the estimated trend. That alone is strong evidence that whatever problems specific pollsters or individual polls may have, they do not seriously disturb the trend estimates we use here at Political Arithmetik and at Pollster.com.
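For readers who want to run the same kind of check on their own data, here is a rough sketch of the leave-one-pollster-out procedure. The data are synthetic, the pollster names are invented, and a generic robust lowess smoother stands in for the trend estimator we actually use here; the point is only to show the loop of dropping each organization, re-fitting, and measuring how far the trend moves.

```python
import numpy as np
import pandas as pd
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic polls: 19 invented organizations sampled over an invented trend.
rng = np.random.default_rng(0)
n = 150
days = np.sort(rng.integers(0, 210, size=n)).astype(float)        # day index in the field period
orgs = rng.choice([f"Pollster {i:02d}" for i in range(1, 20)], size=n)
support = 35 + 8 * days / 210 + rng.normal(0, 2.5, size=n)        # invented trend plus noise
polls = pd.DataFrame({"day": days, "pollster": orgs, "support": support})

grid = np.linspace(0, 210, 200)                                   # common grid for comparison

def trend(df):
    """Stand-in trend estimator: robust lowess, evaluated on the common day grid."""
    fit = lowess(df["support"].to_numpy(), df["day"].to_numpy(), frac=0.3, it=3)
    return np.interp(grid, fit[:, 0], fit[:, 1])

full = trend(polls)
for org in sorted(polls["pollster"].unique()):
    shift = np.abs(trend(polls[polls["pollster"] != org]) - full).max()
    print(f"{org}: max shift from dropping it = {shift:.2f} points")
```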
It is interesting that the variation around the top candidates in both parties, Clinton and Giuliani, is larger than the variation around the third-place candidates, Edwards and Romney, while the variation around the middle candidates falls in between. This is a possible clue to one aspect of "house effects." One well-known source of house effects is how hard the interviewer pushes for an answer. Some organizations now routinely find 20% or more of respondents unable or unwilling to pick a candidate; others have fewer than 5% failing to choose one. Now imagine yourself asked to pick, but lacking an actual preference. When pushed, whom do you most likely "settle" for in order to placate the interviewer? I'd bet on the best-known names. If that were the case, we'd see the greater variation around Clinton and Giuliani substantially explained by differences in how hard pollsters push for answers on the vote preference question. That is one more topic for another day.
The fact that we find little effect on the trend estimate when excluding each pollster could mean one or both of two things: either no pollster is biased or discrepant enough to pose a problem in the first place, or the trend estimator we are using is statistically robust enough to resist the influence of unusual pollsters or polls. The second possibility is true by design. I've chosen an estimation method, and designed the approach we take, so that the trend estimator should resist bias due to a single organization or a single poll. While it can be fooled under the right circumstances, those circumstances should be rare and short-lived, rather than common and long-term.
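As a toy illustration of that design principle (again with a generic lowess smoother standing in for our actual estimator, and with invented data), the robustifying reweighting passes in a local regression sharply limit how far a single wild poll can drag the fitted trend compared with a plain fit:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)
x = np.arange(120, dtype=float)                        # day index for 120 synthetic polls
y = 35 + 8 * x / 120 + rng.normal(0, 2.5, size=120)    # invented trend plus noise
y_out = y.copy()
y_out[60] += 15.0                                      # inject one implausible outlier

for it in (0, 3):   # it=0: plain local regression; it=3: robustifying reweighting passes
    shift = np.abs(lowess(y_out, x, frac=0.3, it=it)[:, 1]
                   - lowess(y, x, frac=0.3, it=it)[:, 1]).max()
    print(f"it={it}: max shift caused by the single outlier = {shift:.2f} points")
```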
We are not in a position today to reach a conclusion about the first possibility, that none of our pollsters is consistently out of line with the others. That could be, but it could also be that one or more pollsters are in fact out of step and that the estimator successfully resists their influence. Addressing this more interesting question will require more work and a separate post (or series of posts). It does seem to me that there are clearly systematic differences across polling organizations. I've done a good many posts in the past on "house effects" and on individual outliers, and will do more of that in the coming weeks. But it also disturbs me that many complaints are hurled at specific polling organizations with little or no effort to support the claims empirically and systematically. That is a job we'll begin to undertake here, as a route to clarifying what the actual evidence is for which polls are less "reliable" than others, and what exactly that means. Stay tuned.