Friday, August 29, 2008

State Battlegrounds and Home Grounds

A quickie from Detroit Metro Airport.

Mark Blumenthal reported on an interview with Obama campaign manager David Plouffe yesterday at Pollster. Plouffe discussed the 18 states the Obama campaign sees as their target states, and Mark reported what states those were in his post.

Here we take a quick look at the polling in those states. The chart above is sorted by the Obama minus McCain margin, and shows the 95% confidence interval. The dot size is proportional to electoral vote.

Below I show the status of the states based on our polling categorization of each state.

Time to run for the plane.

Wednesday, August 27, 2008

McCain, Obama and Clinton Favorability

A little interesting movement in views of the candidates has taken place since the end of the primaries in June. All three candidates, McCain, Obama and Clinton, have seen rises in their favorable ratings and an initial decline in unfavorable views though with a slight upturn recently. McCain and Obama are enjoying essentially identical ratings, with 60% favorable and only 35% unfavorable. Even after a significant amount of negative portrayals of him in RNC and McCain ads, Obama's rating has risen over the summer, and so has McCain's. (According to the Wisconsin Advertising Project, which monitored and coded all 100,000 ad airings in June and July, one third of McCain's ads contained negative information about Obama and 100% of RNC ads were negative. In the same two months, 10% of Obama's ads mentioned McCain.)

Whatever happens after the conventions, both candidates enjoy an enviable standing with voters as attractive figures instead of a pair of lesser evils. The fall campaign may alter this, but even after a hard fought primary season the nominees remain attractive figures.

Meanwhile, Senator Clinton has also enjoyed an upturn in favorable ratings and a decline in unfavorable ratings since the end of the primary season. While improved, Clinton remains a more polarizing figure than either McCain or Obama, with slightly lower favorable but noticeably higher negative ratings.

Senator Clinton is far more popular among Democrats than among either Independents or (especially) Republicans. In that sense, her speech to the Democratic Convention last night was an example of speaking primarily to the party and her supporters, rather than to the broader public. The contrast between former Virginia governor and now Senate candidate Mark Warner's speech and Clinton's is a good example of this difference. Warner stressed unifying themes and appeals across political groups, which was greeted warmly but which fell short of electrifying the Democratic delegates. In contrast, Clinton played to the party and produced a predictably enthusiastic response within the DNC convention hall. Conventions contain both elements. Monday, the party celebrated Sen. Kennedy's life and family legacy, primarily an inside the family affair, perhaps touching some independents but not likely to attract Republicans. In contrast, Michelle Obama's speech could have easily been given at the Republican convention, with its themes of family, hard work, pulling oneself up from working class circumstances. Hers was a speech designed to reach out beyond the party.

The one remaining question from the Clinton speech is whether her supporters also respect her enough to follow her lead. For Clinton to be a power in the party includes the requirement that she be able to deliver her supporters for Obama. If any significant number of her supporters refuse to be delivered, they reduce her status as a result. This is hard to judge from the cable news coverage, which can easily find individual delegates willing to say they are unpersuaded. But what effect the Clinton speech has with her supporters outside the convention hall will be critical.

Sunday, August 24, 2008

How Pollsters Affect Poll Results

Who does the poll affects the results. Some. These are called "house effects" because they are systematic effects due to survey "house" or polling organization. It is perhaps easy to think of these effects as "bias" but that is misleading. The differences are due to a variety of factors that represent reasonable differences in practice from one organization to another.

For example, how you phrase a question can affect the results, and an organization usually asks the question the same way in all their surveys. This creates a house effect. Another source is how the organization treats "don't know" or "undecided" responses. Some push hard for a position even if the respondent is reluctant to give one. Other pollsters take "undecided" at face value and don't push. The latter get higher rates of undecided, but more important they get lower levels of support for both candidates as a result of not pushing for how respondents lean. And organizations differ in whether they typically interview adults, registered voters or likely voters. The differences across those three groups produce differences in results. Which is right? It depends on what you are trying to estimate-- opinion of the population, of people who can easily vote if they choose to do so or of the probable electorate. Not to mention the vagaries of identifying who is really likely to vote. Finally, survey mode may matter. Is the survey conducted by random digit dialing (RDD) with live interviewers, by RDD with recorded interviews ("interactive voice response" or IVR), or by internet using panels of volunteers who are statistically adjusted in some way to make inferences about the population?

Given all these and many other possible sources of house effects, it is perhaps surprising the net effects are as small as they are. They are often statistically significant, but rarely are they notably large.

The chart above shows the house effect for each polling organization that has conducted at least five national polls on the Obama-McCain match-up since 2007. The dots are the estimated house effects and the blue lines extend out to a 95% confidence interval around the effects.

The largest pro-Obama house effect is that of Harris Interactive, at just over 4 points. The poll most favorable to McCain is Rasmussen's Tracking poll at just less than -3 points. Everyone else falls between these extremes.

Now let's put this in context. We are looking at effects on the difference between the candidates, so that +4 from Harris is equivalent to two points high on Obama and two points low on McCain. Taking half the estimated effect above gives the average effect per candidate. The average effects are at most 2 points per candidate. Not trivial, but not huge.

Estimating the house effect is not hard. But knowing where "zero" should be is very hard. A house effect of zero is saying the pollster perfectly matches some standard. The ideal standard, of course, is the actual election outcome. But we don't know that now, only after the fact in November. So the standard used here is the house effect relative to our Pollster Trend Estimate. If a pollster consistently runs 2 points above our trend, their house effect would be +2.

The house effects are calculated so that the average house effect is zero. This doesn't depend on how many polls a pollster conducts. And it doesn't mean the pollster closest to zero is the "best". It just means their results track our trend estimate on average. That can also happen if a pollster gyrates considerably above and below our trend, but balances out. A nicer result is a poll that closely follows the trend. But either pattern could produce a house effect near zero. For example, Democracy Corps and Zogby have very similar house effects near -1. But look at their plots below and you see that Democracy Corps has followed our trend quite closely, though about a point below the trend. Zogby has also been on average a point below trend, but his polls have shown large variation around the trend, with some polls as near-outliers above while others are near outliers below the trend. The net effect is the same as for Democracy Corps, but the variability of Zogby's results is much higher.
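As a rough illustration of the two ideas above (centering the effects on zero, and the difference between a steady and a gyrating pollster), here is a minimal sketch. It assumes a house effect can be approximated as a pollster's average gap from the trend, which is a simplification and not necessarily the estimator used for the chart. The pollster names and residuals are made up for the example.

```python
from statistics import mean, pstdev

# Hypothetical residuals: each poll's (Obama minus McCain) margin minus the
# trend estimate on the day of the poll. These numbers are invented.
residuals = {
    "Steady": [-1.2, -0.8, -1.0, -1.1, -0.9],
    "Swingy": [-6.0, 4.0, -5.0, 3.0, -1.0],
    "Other": [2.5, 1.5, 2.0, 2.0, 2.0],
}

def house_effects(residuals_by_pollster):
    """Average residual per pollster, recentered so the effects average zero."""
    raw = {p: mean(r) for p, r in residuals_by_pollster.items()}
    # Unweighted mean across pollsters, so the number of polls a
    # pollster conducts does not move the zero point.
    center = mean(raw.values())
    return {p: m - center for p, m in raw.items()}

def variability(residuals_by_pollster):
    """Spread of each pollster's polls around the trend."""
    return {p: pstdev(r) for p, r in residuals_by_pollster.items()}

effects = house_effects(residuals)
spread = variability(residuals)
for name in residuals:
    print(f"{name}: house effect {effects[name]:+.1f}, spread {spread[name]:.1f}")
```

In this made-up data "Steady" and "Swingy" end up with the same house effect, but the spread shows that only one of them actually tracks the trend closely.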

Incidentally, the Democracy Corps poll is conducted by the Democratic firm of Greenberg Quinlan Rosner Research in collaboration with Democratic strategist James Carville. Yet the poll has a negative house effect of -1. Does this mean the Democracy Corps poll is biased against Obama? No. It means they use a likely voter sample, which typically produces modestly more pro-Republican responses than do registered voter or adult samples. Assuming that the house effect necessarily reflects a partisan bias is a major mistake.

How can you use these house effects? Take a pollster's latest results and subtract the house effect from their reported Obama minus McCain difference. That puts their results in the same terms as all others, centered on the Trend Estimate. This is especially useful if you are comparing results from two pollsters with different house effects. Removing those house differences makes their results more comparable.
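The adjustment is just a subtraction, sketched here. The house effects of +4 and -3 are rounded from the figures quoted earlier; the reported margins are hypothetical and the function names are mine.

```python
def remove_house_effect(reported_margin, house_effect):
    """Put a poll's Obama-minus-McCain margin on the Trend Estimate's scale."""
    return reported_margin - house_effect

def per_candidate_effect(house_effect):
    """A margin effect splits evenly: half on one candidate, half on the other."""
    return house_effect / 2

# Hypothetical reported margins from two pollsters with different house effects.
poll_a = remove_house_effect(reported_margin=6.0, house_effect=4.0)
poll_b = remove_house_effect(reported_margin=-1.0, house_effect=-3.0)

print(poll_a, poll_b)             # both polls agree once adjusted
print(per_candidate_effect(4.0))  # points per candidate behind a +4 margin effect
```

Two polls that look seven points apart as reported can be telling the same story once each pollster's usual lean is removed.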

What impact do house effects have on our Trend Estimate? A little. Our estimator is designed to resist big effects of any single pollster, but it isn't infallible, especially when some pollsters do far more polls than others or when one pollster dominates during some small period of time. We can estimate house effects, adjust for these, and reestimate our trend with house effects removed. The result runs through the center of the polls, but doesn't allow the number of polls done by an organization to be as influential.

The results are shown in the chart below. The blue line is our standard estimator and the red line is the estimate with house effects removed. Without house effects the current trend stands at +2.0 while ignoring house effects produces an estimate of +1.7. A little different, but given the range of variability across polls and the uncertainty as to where the race "really" stands, this is not a big effect.

The impact of house effects isn't always this small. Looking back along the trend we see that the red and blue lines diverged by as much as 1 point in late June, an effect due significantly to the large number of Rasmussen and Gallup tracking polls during that time and few polls with positive house effects in that period. A smaller but still notable divergence occurred in late February and early March.

The bottom line is that there are real and measurable differences between polling organizations, but the magnitude of these effects is considerably less than some commentary would suggest. Many of the house effect estimates above are not statistically different from zero. Even ignoring that, the range of effects is rather small, though of course in a tight race the differences may be politically important. Finally, the effect on our Trend Estimate is detectable but does not lead to large distortions, even if we can see some noticeable differences at some times.

The charts below move through all the pollsters and plot their poll results compared to the standard trend and the trend removing house effects. Pollsters with fewer than 5 polls are all lumped together as "Other" pollsters. Once they get to our minimum number of polls, we'll have house effects for them too.

Monday, August 11, 2008

Age, Turnout and Votes

It's all about who votes. Those that do win. Those that don't lose. The chronic losers in American politics are the young who famously turn out at low rates election after election.

This year, those young people are of great interest. Allegedly they will be mobilized in huge numbers, and allegedly they will vote strongly for Barack Obama. The latest available Gallup weekly estimate (July 28-Aug 3) shows Obama leading 56%-35% among 18-29 year olds, while McCain leads 46%-37% among those 65 and older.

But will the young vote? And how much difference does it make when they don't?

The chart above shows the turnout rate by age for 2000 and 2004, based on the Census Bureau's "Current Population Survey (CPS)", the largest and best source of detailed data on turnout. The most striking result is just how low turnout is among those under 30 compared to older voters. No age group 18-29 managed to reach 45% turnout in 2000, and only two made it in 2004. Not one single age group over 30 fell so low in either year. Despite a little noise for each group, the pattern is a strong rise in participation rates with every year of age at least until the late 60s, after which there is some decline. Yet even among those 85 and over the turnout rate remains above 55%, more than 10 points higher than among their 20-something grandchildren and great-grandchildren.

The second striking feature of the chart is that the young can be mobilized a bit, under the right circumstances. Turnout among those under 30 rose significantly in 2004 compared to 2000. While turnout went up among all age groups, the relative gain was clearly greater among those under 30. While mobilizing the young is difficult, these data show that it is possible to get significant gains, at least relative to past turnout.

Even so, the "highly mobilized" 20-somethings of 2004 still fell behind the turnout of their 30-something older siblings. A supposed Obama-surge among the young may still not catch up with those even a bit older.

The irony is that the young are a large share of the population, but not of the electorate. The chart below shows the population by age in 2004 (it shifts a little by 2008 but not enough to change the story.)

The "boomers" in their 40s and 50s remain the largest group, but for our purposes there are two important points. Those under 30 make up a substantial share of the population, while those 60 and over represent a substantially smaller share at each age.

In 2004 those 18-29 were 21.8% of the population, while those 58-69 were just 13.2%. Add in the 11.5% 70 and up, and you get just 24.7% of "geezers" over 58 vs. 21.8% of "kids". But the sly old geezers know a thing or two about voting. Shift from share of the population to share of the electorate and the advantage shifts to the old: 18-29 year olds were just 16% of the electorate in 2004, while those 58-69 were an almost equal 15.9%. Add in the 70+ group at 13.4% and the geezers win hands down: 29.3% of voters vs 16% for the young. That difference is the power of high turnout. It goes a long way to explaining why Social Security is the third rail of American politics.

High turnout buys "over-representation". Divide share of voters by share of the population and you get proportionate representation. A ratio of 1.0 means a group votes proportionate to its size. Values over 1 are overrepresented groups. In 2004, for example, 55 year olds were represented 20% more than their population would suggest, with a 1.2 score. The youngest voters, 18 year olds, had an abysmal representation rate of 0.49 in 2000, less than half their share of the population.
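For the grouped figures quoted above, the representation ratio works out as in this small sketch. The percentages are the 2004 numbers from the text; the dictionary and function names are just for illustration.

```python
# 2004 shares quoted above: (percent of population, percent of electorate).
shares_2004 = {
    "18-29": (21.8, 16.0),
    "58-69": (13.2, 15.9),
    "70+": (11.5, 13.4),
}

def representation(pop_share, voter_share):
    """Share of voters divided by share of population; 1.0 is proportionate."""
    return voter_share / pop_share

ratios = {group: representation(pop, voters)
          for group, (pop, voters) in shares_2004.items()}

# Combine the two older groups into the "geezers" of the text.
geezer_pop = shares_2004["58-69"][0] + shares_2004["70+"][0]
geezer_voters = shares_2004["58-69"][1] + shares_2004["70+"][1]
ratios["58+"] = representation(geezer_pop, geezer_voters)

for group, ratio in ratios.items():
    status = "over" if ratio > 1 else "under"
    print(f"{group}: {ratio:.2f} ({status}-represented)")
```

By these numbers the 18-29 group casts roughly three quarters of the votes its size would suggest, while both older groups cast more than their share.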

While turnout rises with age, it is not until we hit 40 or so that we reach "fair" representation (1.0). After that, every age group is over-represented in the electorate. Less than 40, and every age group is under-represented. (Two small exceptions-- so sue me.)

So what are the implications? If you gave me a choice of being wildly popular with the young or moderately popular with the old, I'd take the old any day. They are far more reliable in voting, and while their population numbers are small they more than make up for it in over-representation thanks to turnout differences.

There is much conversation about "youth" turnout this year. Perhaps we will indeed see another rise, as we did in 2004. But unless something truly unprecedented occurs, no one can win on the young alone. The gap in turnout is simply too large.

But is age destiny? If there were constant differences in partisan preference by age, then perhaps so. But there aren't. Despite being supposedly "old and set in their ways", those 60 and up shifted their votes more than any other age group between 2000 and 2004. In 2000, the 60+ vote went to Gore by a 4 point margin. In 2004, however, those 60+ went for Bush by 8 points. That net 12 point swing, multiplied by their over-representation, means a lot.

The 20-somethings also shifted, from +2 for Gore to +9 for Kerry. Coupled with their surge in turnout, the younger voters kept Kerry close in 2004 when he was losing in every other age category. But it wasn't enough to win.

The Obama campaign may be right that they can gain votes by mobilizing the young. But the old play a bigger role in elections, and they are not immovable in their vote preferences. Indeed, they make the youngest group seem a bit static by comparison. It is not the candidate's age that will be the key to winning the votes of those 60 and over. Issues and personality will play a large role. Any candidate would be well advised to recognize that the dynamic swings among older voters coupled with their substantial over-representation makes them a potent force for electoral change.

P.S. At the cross post to, Michael McDonald noticed that my plots were based on population percentages rather than percentage of citizens. That's a good catch and I probably should have used citizens in the first place. But the qualitative results don't change at all, so my story remains exactly as it is above. The precise percentages quoted do shift a little bit, but not in ways that change any of the conclusions. Therefore I've left the text above as it was, but append the first three figures, revised to include only citizens, here for completeness. Thanks to Michael for pointing this out.

Monday, August 04, 2008

Polling Trends in 2008 vs '04 and '00

The most common description of polls is that they are snapshots, not predictions. A good way to look at that in the 2008 election is to compare the '08 campaign with the two that came before.

The chart above shows the trend estimates for each of the last three presidential campaigns. I'm plotting the estimated margin between the two candidates, Dem minus Rep, for each year.

With 93 days to go until the 2008 election, Obama holds a 3.3 point advantage over McCain, though that has been eroding over the past six weeks. If we put a confidence interval around today's estimate, we get a race that is just barely leaning Democratic.

But what about the future? The dynamics of the next 92 days are all important for where we stand on November 4. Since we can't foresee those 92 days yet, let's see what happened during the same time in 2000 and 2004. That gives us a better idea how much change we might anticipate in the next three months.

In 2004, Kerry slowly built a 2 point lead by this time, and held a small lead through much of the summer. But then the race took a sharp turn, with Bush making a 6 point run, taking a four point lead with 50 days to go. Kerry gained back 3 points of that in the polling, but less than 2 points of it in the actual vote, losing by a 2.4 point margin.

In 2000, Bush led in most of the early polls, holding a 6 point lead with 107 days to go. Then Gore moved sharply up, erasing Bush's lead and then adding a 3 point lead for Gore with about 56 days left. Bush promptly reversed Gore's gains with a six point move in the GOP's direction, and led by about 3 points over the last three weeks of the campaign. Of course, the 2000 polls were misleading in predicting a Bush win. Gore won the popular vote by 0.6 points.
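To keep the swings in the last two paragraphs straight, here is the arithmetic as a small sketch. Margins are Dem minus Rep, the moves are the rounded figures quoted above (so the paths are approximate), and the function name is mine.

```python
def margin_path(start, moves):
    """Running Dem-minus-Rep margin after each successive move."""
    path = [start]
    for move in moves:
        path.append(path[-1] + move)
    return path

# 2004: Kerry +2, Bush's 6 point run, then Kerry regains 3 in the polling.
path_2004 = margin_path(2, [-6, 3])
# 2000: Bush +6, Gore's 9 point run, then Bush's 6 point move.
path_2000 = margin_path(-6, [9, -6])

actual_2004 = -2.4  # Kerry lost by 2.4
actual_2000 = 0.6   # Gore won the popular vote by 0.6

print(path_2004, "final polls vs vote:", round(actual_2004 - path_2004[-1], 1))
print(path_2000, "final polls vs vote:", round(actual_2000 - path_2000[-1], 1))
```

On these rounded numbers the 2004 polls ended somewhat more favorable to Kerry than the vote, while the 2000 polls ended well short of Gore's actual showing.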

So far in 2008, Obama has enjoyed a run up of 5.5 points since his low point in late March. That run is on a par with Bush's in 2004 and with Bush's 6 point rebound in 2000, but still a bit less than Gore's 9 point run that year.

Judging from the dynamics we've seen in the past it is quite reasonable to expect the current trend to shift by half-a-dozen points. August and the conventions have been periods of substantial change in both previous elections, so if history repeats itself the next 4 or 5 weeks should be pretty interesting.

The bottom line is neither campaign should be complacent or despondent. There is a lot of time left and recent history shows that both up and down swings of 6-9 points are entirely plausible.

As a P.S. here are the three campaigns with educational confidence intervals around them.

The current 2008 estimate is just barely inside the "lean Dem" range, and will move to toss up if the current trend continues for another couple or three polls.

The 2004 estimate was pretty close to the outcome, which was well within the 68% confidence interval around the trend.

The polls in 2000 were troubling for having the wrong popular vote winner, but even there the outcome was inside the 95% confidence interval. With races as close as the last two, it is worth appreciating just how wide those confidence intervals are.

Our efforts to characterize races rely on the best estimates of those confidence intervals, but it is all too easy to focus on who's ahead and not remember how much uncertainty there is. That uncertainty is both about where the current estimate says the race stands today and about how the race may change in coming weeks. The data here show that unless one candidate builds a bigger lead than either has held so far, the uncertainty remains pretty big.

Note: My trend here is slightly different from the Pollster National trend because I'm working off the difference between candidates, not each trend separately, and because I've made 2008 comparable to 2000 and 2004, which means a slightly different amount of smoothing compared to Pollster's standard estimator this year. None of those differences change the qualitative picture or shift the magnitude of changes I cite above.