Friday, August 29, 2008

State Battlegrounds and Home Grounds
A quickie from Detroit Metro Airport.

Mark Blumenthal reported on an interview with Obama campaign manager David Plouffe yesterday at Pollster. Plouffe discussed the 18 states the Obama campaign sees as their target states, and Mark reported what states those were in his post.

Here we take a quick look at the polling in those states. The chart above is sorted by the Obama minus McCain margin, and shows the 95% confidence interval. The dot size is proportional to electoral vote.

Below I show the status of the states based on our polling categorization of each state.

Time to run for the plane.




Wednesday, August 27, 2008

McCain, Obama and Clinton Favorability

Some interesting movement in views of the candidates has taken place since the end of the primaries in June. All three candidates, McCain, Obama and Clinton, have seen rises in their favorable ratings and an initial decline in unfavorable views, though with a slight upturn recently. McCain and Obama are enjoying essentially identical ratings, with 60% favorable and only 35% unfavorable. Even after a significant number of negative portrayals of him in RNC and McCain ads, Obama's rating has risen over the summer, and so has McCain's. (According to the Wisconsin Advertising Project, which monitored and coded all 100,000 ad airings in June and July, one third of McCain's ads contained negative information about Obama and 100% of RNC ads were negative. In the same two months, 10% of Obama's ads mentioned McCain.)

Whatever happens after the conventions, both candidates enjoy an enviable standing with voters as attractive figures rather than a pair of lesser evils. The fall campaign may alter this, but even after a hard-fought primary season the nominees remain well regarded.

Meanwhile, Senator Clinton has also enjoyed an upturn in favorable ratings and a decline in unfavorable ratings since the end of the primary season. While improved, Clinton remains a more polarizing figure than either McCain or Obama, with slightly lower favorable but noticeably higher negative ratings.

Senator Clinton is far more popular among Democrats than among either Independents or (especially) Republicans. In that sense, her speech to the Democratic Convention last night was an example of speaking primarily to the party and her supporters, rather than to the broader public. The contrast between Clinton's speech and that of former Virginia governor and now Senate candidate Mark Warner is a good example of this difference. Warner stressed unifying themes and appeals across political groups, which was greeted warmly but fell short of electrifying the Democratic delegates. In contrast, Clinton played to the party and produced a predictably enthusiastic response within the convention hall. Conventions contain both elements. Monday, the party celebrated Sen. Kennedy's life and family legacy, primarily an inside-the-family affair, perhaps touching some independents but not likely to attract Republicans. In contrast, Michelle Obama's speech could easily have been given at the Republican convention, with its themes of family, hard work, and pulling oneself up from working-class circumstances. Hers was a speech designed to reach out beyond the party.

The one remaining question from the Clinton speech is whether her supporters also respect her enough to follow her lead. For Clinton to be a power in the party, she must be able to deliver her supporters for Obama. If any significant number of her supporters refuse to be delivered, her status is reduced as a result. This is hard to judge from cable news coverage, which can easily find individual delegates willing to say they are unpersuaded. But what effect the Clinton speech has on her supporters outside the convention hall will be critical.

Sunday, August 24, 2008

How Pollsters Affect Poll Results

Who conducts a poll affects the results. Some. These are called "house effects" because they are systematic effects due to the survey "house," or polling organization. It is tempting to think of these effects as "bias," but that is misleading. The differences are due to a variety of factors that represent reasonable differences in practice from one organization to another.

For example, how you phrase a question can affect the results, and an organization usually asks the question the same way in all its surveys. This creates a house effect. Another source is how the organization treats "don't know" or "undecided" responses. Some push hard for a position even if the respondent is reluctant to give one. Other pollsters take "undecided" at face value and don't push. The latter get higher rates of undecided, but more importantly they get lower levels of support for both candidates as a result of not pushing respondents on how they lean. Organizations also differ in whether they typically interview adults, registered voters or likely voters, and the differences across those three groups produce differences in results. Which is right? It depends on what you are trying to estimate: opinion of the population, of people who can easily vote if they choose to do so, or of the probable electorate. Not to mention the vagaries of identifying who is really likely to vote. Finally, survey mode may matter. Is the survey conducted by random digit dialing (RDD) with live interviewers, by RDD with recorded interviews ("interactive voice response" or IVR), or by internet using panels of volunteers who are statistically adjusted in some way to make inferences about the population?

Given all these and many other possible sources of house effects, it is perhaps surprising the net effects are as small as they are. They are often statistically significant, but rarely are they notably large.

The chart above shows the house effect for each polling organization that has conducted at least five national polls on the Obama-McCain match-up since 2007. The dots are the estimated house effects and the blue lines extend out to a 95% confidence interval around the effects.

The largest pro-Obama house effect is that of Harris Interactive, at just over 4 points. The poll most favorable to McCain is Rasmussen's Tracking poll at just less than -3 points. Everyone else falls between these extremes.

Now let's put this in context. We are looking at effects on the difference between the candidates, so that +4 from Harris is equivalent to two points high on Obama and two points low on McCain. Taking half the estimated effect above gives the average effect per candidate. The average effects are at most 2 points per candidate. Not trivial, but not huge.

Estimating the house effect is not hard. But knowing where "zero" should be is very hard. A house effect of zero is saying the pollster perfectly matches some standard. The ideal standard, of course, is the actual election outcome. But we don't know that now, only after the fact in November. So the standard used here is the house effect relative to our Pollster Trend Estimate. If a pollster consistently runs 2 points above our trend, their house effect would be +2.

The house effects are calculated so that the average house effect is zero. This doesn't depend on how many polls a pollster conducts. And it doesn't mean the pollster closest to zero is the "best". It just means their results track our trend estimate on average. That can also happen if a pollster gyrates considerably above and below our trend, but balances out. A nicer result is a poll that closely follows the trend. But either pattern could produce a house effect near zero. For example, Democracy Corps and Zogby have very similar house effects near -1. But look at their plots below and you see that Democracy Corps has followed our trend quite closely, though about a point below the trend. Zogby has also been on average a point below trend, but his polls have shown large variation around the trend, with some polls as near-outliers above while others are near outliers below the trend. The net effect is the same as for Democracy Corps, but the variability of Zogby's results is much higher.
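To make the centering concrete, here is a minimal Python sketch of one way house effects could be computed from poll margins and a trend estimate. The pollster names and numbers below are entirely illustrative, and the actual Pollster.com estimator is more elaborate than this:

```python
import statistics

# Illustrative data only: (pollster, Obama-minus-McCain margin, trend estimate
# at the poll's date). Real house-effect estimation uses many more polls.
polls = [
    ("Harris", 8.0, 4.0), ("Harris", 7.0, 3.5),
    ("Rasmussen", 1.0, 4.0), ("Rasmussen", 0.5, 3.5),
    ("DemCorps", 3.0, 4.0), ("DemCorps", 2.5, 3.5),
]

# Raw effect: each pollster's average residual (poll margin minus trend).
residuals = {}
for pollster, margin, trend in polls:
    residuals.setdefault(pollster, []).append(margin - trend)
raw = {p: statistics.mean(r) for p, r in residuals.items()}

# Center so the average house effect across pollsters is zero, as in the text.
grand_mean = statistics.mean(raw.values())
house = {p: e - grand_mean for p, e in raw.items()}
```

The centered effects sum to zero across pollsters, so an individual effect near zero means only that a pollster tracks the trend on average, not that it is "best."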

Incidentally, the Democracy Corps poll is conducted by the Democratic firm of Greenberg Quinlan Rosner Research in collaboration with Democratic strategist James Carville. Yet the poll has a negative house effect of -1. Does this mean the Democracy Corps poll is biased against Obama? No. It means they use a likely voter sample, which typically produces modestly more pro-Republican responses than do registered voter or adult samples. Assuming that a house effect necessarily reflects a partisan bias is a major mistake.

How can you use these house effects? Take a pollster's latest results and subtract the house effect from their reported Obama minus McCain difference. That puts their results in the same terms as all others, centered on the Pollster.com Trend Estimate. This is especially useful if you are comparing results from two pollsters with different house effects. Removing those house differences makes their results more comparable.
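The arithmetic is simple enough to show in a couple of lines. The pollster names, reported margins and house effects below are made up for illustration:

```python
# Hypothetical reported Obama-minus-McCain margins and estimated house effects.
reported = {"PollsterA": 6.0, "PollsterB": -1.0}
house_effect = {"PollsterA": 4.0, "PollsterB": -3.0}

# Subtract each pollster's house effect to put results on the common,
# trend-centered scale described in the text.
adjusted = {p: reported[p] - house_effect[p] for p in reported}
```

Here two polls that appear 7 points apart agree at +2 once their house differences are removed, which is exactly the kind of comparison the adjustment makes possible.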

What impact do house effects have on our Pollster.com Trend Estimate? A little. Our estimator is designed to resist big effects of any single pollster, but it isn't infallible, especially when some pollsters do far more polls than others or when one pollster dominates during some small period of time. We can estimate house effects, adjust for these, and reestimate our trend with house effects removed. The result runs through the center of the polls, but doesn't allow the number of polls done by an organization to be as influential.

The results are shown in the chart below. The blue line is our standard estimator and the red line is the estimate with house effects removed. With house effects removed, the current trend stands at +2.0, while the standard estimate, which ignores house effects, is +1.7. A little different, but given the range of variability across polls and the uncertainty as to where the race "really" stands, this is not a big effect.

The impact of house effects isn't always this small. Looking back along the trend we see that the red and blue lines diverged by as much as 1 point in late June, driven largely by the large number of Rasmussen and Gallup tracking polls during that time and the few polls with positive house effects in that period. A smaller but still notable divergence occurred in late February and early March.

The bottom line is that there are real and measurable differences between polling organizations, but the magnitude of these effects is considerably less than some commentary would suggest. Many of the house effect estimates above are not statistically different from zero. Even ignoring that, the range of effects is rather small, though of course in a tight race the differences may be politically important. Finally, the effect on our Pollster.com Trend Estimate is detectable but does not lead to large distortions, even if we can see some noticeable differences at some times.

The charts below move through all the pollsters, plotting their poll results against the standard trend and the trend with house effects removed. Pollsters with fewer than 5 polls are lumped together as "Other" pollsters. Once they reach our minimum number of polls, we'll have house effects for them too.

Monday, August 11, 2008

Age, Turnout and Votes

It's all about who votes. Those who do, win. Those who don't, lose. The chronic losers in American politics are the young, who famously turn out at low rates election after election.

This year, those young people are of great interest. Allegedly they will be mobilized in huge numbers, and allegedly they will vote strongly for Barack Obama. The latest available Gallup weekly estimate (July 28-Aug 3) shows Obama leading 56%-35% among 18-29 year olds, while McCain leads 46%-37% among those 65 and older.

But will the young vote? And how much difference does it make when they don't?

The chart above shows the turnout rate by age for 2000 and 2004, based on the Census Bureau's Current Population Survey (CPS), the largest and best source of detailed data on turnout. The most striking result is just how low turnout is among those under 30 compared to older voters. No age group 18-29 managed to reach 45% turnout in 2000, and only two made it in 2004. Not one single age group over 30 fell so low in either year. Despite a little noise for each group, the pattern is a strong rise in participation rates with every year of age at least until the late 60s, after which there is some decline. Yet even among those 85 and over the turnout rate remains above 55%, more than 10 points higher than among their 20-something grandchildren and great-grandchildren.

The second striking feature of the chart is that the young can be mobilized a bit, under the right circumstances. Turnout among those under 30 rose significantly in 2004 compared to 2000. While turnout went up among all age groups, the relative gain was clearly greater among those under 30. While mobilizing the young is difficult, these data show that it is possible to get significant gains, at least relative to past turnout.

Even so, the "highly mobilized" 20-somethings of 2004 still fell behind the turnout of their 30-something older siblings. A supposed Obama-surge among the young may still not catch up with those even a bit older.

The irony is that the young are a large share of the population, but not of the electorate. The chart below shows the population by age in 2004 (it shifts a little by 2008 but not enough to change the story.)

The "boomers" in their 40s and 50s remain the largest group, but for our purposes there are two important points. Those under 30 make up a substantial share of the population, while those 60 and over represent a substantially smaller share at each age.

In 2004 those 18-29 were 21.8% of the population, while those 58-69 were just 13.2%. Add in the 11.5% 70 and up, and you get just 24.7% of "geezers" over 58 vs. 21.8% of "kids". But the sly old geezers know a thing or two about voting. Shift from share of the population to share of the electorate and the advantage shifts to the old: 18-29 year olds were just 16% of the electorate in 2004, while those 58-69 were an almost equal 15.9%. Add in the 70+ group at 13.4% and the geezers win hands down: 29.3% of voters vs 16% for the young. That difference is the power of high turnout. It goes a long way to explaining why Social Security is the third rail of American politics.

High turnout buys "over-representation". Divide share of voters by share of the population and you get proportionate representation. A ratio of 1.0 means a group votes proportionate to its size. Values over 1 are overrepresented groups. In 2004, for example, 55 year olds were represented 20% more than their population would suggest, with a 1.2 score. The youngest voters, 18 year olds, had an abysmal representation rate of 0.49 in 2000, less than half their share of the population.
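The representation ratio just described is a one-line calculation. Using the 2004 shares quoted above for two of the groups:

```python
# Shares (percent) from the 2004 figures cited in the text.
population_share = {"18-29": 21.8, "70+": 11.5}
electorate_share = {"18-29": 16.0, "70+": 13.4}

# Representation ratio: share of voters divided by share of the population.
# 1.0 means proportional; above 1, over-represented; below 1, under-represented.
representation = {g: electorate_share[g] / population_share[g] for g in population_share}
```

The 70+ group comes out over-represented while 18-29 year olds fall well short of proportionality, matching the pattern in the chart.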

While turnout rises with age, it is not until we hit 40 or so that we reach "fair" representation (1.0). After that, every age group is over-represented in the electorate. Less than 40, and every age group is under-represented. (Two small exceptions-- so sue me.)

So what are the implications? If you gave me a choice of being wildly popular with the young or moderately popular with the old, I'd take the old any day. They are far more reliable in voting, and while their population numbers are small they more than make up for it in over-representation thanks to turnout differences.

There is much conversation about "youth" turnout this year. Perhaps we will indeed see another rise, as we did in 2004. But unless something truly unprecedented occurs, no one can win on the young alone. The gap in turnout is simply too large.

But is age destiny? If there were constant differences in partisan preference by age, then perhaps so. But there aren't. Despite being supposedly "old and set in their ways", those 60 and up shifted their votes more than any other age group between 2000 and 2004. In 2000, the 60+ vote went to Gore by a 4 point margin. In 2004, however, those 60+ went for Bush by 8 points. That net 12 point swing, multiplied by their over-representation means a lot.

The 20-somethings also shifted, from +2 for Gore to +9 for Kerry. Coupled with their surge in turnout, the younger voters kept Kerry close in 2004 when he was losing in every other age category. But it wasn't enough to win.

The Obama campaign may be right that they can gain votes by mobilizing the young. But the old play a bigger role in elections, and they are not immovable in their vote preferences. Indeed, they make the youngest group seem a bit static by comparison. It is not the candidate's age that will be the key to winning the votes of those 60 and over. Issues and personality will play a large role. Any candidate would be well advised to recognize that the dynamic swings among older voters, coupled with their substantial over-representation, make them a potent force for electoral change.



P.S. At the cross post to Pollster.com, Michael McDonald noticed that my plots were based on population percentages rather than percentage of citizens. That's a good catch and I probably should have used citizens in the first place. But the qualitative results don't change at all, so my story remains exactly as it is above. The precise percentages quoted do shift a little bit, but not in ways that change any of the conclusions. Therefore I've left the text above as it was, but append the first three figures, revised to include only citizens, here for completeness. Thanks to Michael for pointing this out.



Monday, August 04, 2008

Polling Trends in 2008 vs '04 and '00

The most common description of polls is that they are snapshots, not predictions. A good way to look at that in the 2008 election is to compare the '08 campaign with the two that came before.

The chart above shows the trend estimates for each of the last three presidential campaigns. I'm plotting the estimated margin between the two candidates, Dem minus Rep, for each year.

With 93 days to go until the 2008 election, Obama holds a 3.3 point advantage over McCain, though that has been eroding over the past six weeks. If we put a confidence interval around today's estimate, we get a race that is just barely leaning Democratic.
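As a sketch of where such a confidence interval comes from, here is the standard formula for the sampling variance of a difference of two proportions from a single poll, in Python. The 48%-45% poll of 1,000 respondents is made up for illustration:

```python
import math

def margin_ci(p1, p2, n, z=1.96):
    """95% CI for the difference of two proportions (p1 - p2) from one poll
    of n respondents; uses Var(p1 - p2) = (p1 + p2 - (p1 - p2)**2) / n."""
    d = p1 - p2
    se = math.sqrt((p1 + p2 - d ** 2) / n)
    return d - z * se, d + z * se

# A hypothetical 48%-45% poll of 1,000 respondents: the 3-point lead is not
# statistically distinguishable from a tie.
lo, hi = margin_ci(0.48, 0.45, 1000)
```

A trend estimate that pools many polls has a narrower interval than any single poll, but the same logic applies.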

But what about the future? The dynamics of the next 92 days are all important for where we stand on November 4. Since we can't foresee those 92 days yet, let's see what happened during the same time in 2000 and 2004. That gives us a better idea how much change we might anticipate in the next three months.

In 2004, Kerry slowly built a 2 point lead by this time, and held a small lead through much of the summer. But then the race took a sharp turn, with Bush making a 6 point run, taking a four point lead with 50 days to go. Kerry gained back 3 points of that in the polling, but less than 2 points of it in the actual vote, losing by a 2.4 point margin.

In 2000, Bush led in most of the early polls, holding a 6 point lead with 107 days to go. Then Gore moved sharply up, erasing Bush's lead and then adding a 3 point lead for Gore with about 56 days left. Bush promptly reversed Gore's gains with a six point move in the GOP's direction, and led by about 3 points over the last three weeks of the campaign. Of course, the 2000 polls were misleading in predicting a Bush win. Gore won the popular vote by 0.6 points.

So far in 2008, Obama has enjoyed a run-up of 5.5 points since his low point in late March. That run is on a par with Bush's in 2004, still a bit less than Gore's 9 point run in 2000, and on par with Bush's 6 point rebound that year.

Judging from the dynamics we've seen in the past it is quite reasonable to expect the current trend to shift by half-a-dozen points. August and the conventions have been periods of substantial change in both previous elections, so if history repeats itself the next 4 or 5 weeks should be pretty interesting.

The bottom line is neither campaign should be complacent or despondent. There is a lot of time left and recent history shows that both up and down swings of 6-9 points are entirely plausible.


As a P.S. here are the three campaigns with educational confidence intervals around them.

The current 2008 estimate is just barely inside the "lean Dem" range, and will move to toss-up if the current trend continues for another two or three polls.

The 2004 estimate was pretty close to the outcome, which was well within the 68% confidence interval around the trend.

The polls in 2000 were troubling for showing the wrong popular vote winner, but even there the outcome was inside the 95% confidence interval. With races as close as the last two, it is worth appreciating just how wide those confidence intervals are.

Our efforts to characterize races rely on the best estimates of those confidence intervals, but it is all too easy to focus on who's ahead and not remember how much uncertainty there is. That uncertainty is both about where the current estimate says the race stands today and about how the race may change in coming weeks. The data here show that unless one candidate builds a bigger lead than either has held so far, the uncertainty remains pretty big.


Note: My trend here is slightly different from the Pollster National trend because I'm working off the difference between candidates, not each trend separately, and because making 2008 comparable to 2000 and 2004 requires a slightly different amount of smoothing than Pollster's standard estimator this year. None of those differences change the qualitative picture or shift the magnitude of changes I cite above.

Friday, June 13, 2008

Trends in Party Identification in Wisconsin


This week my colleague Ken Goldstein and I conducted a Wisconsin statewide survey sponsored by the UW Department of Political Science and WisPolitics.com. So fair warning that I'm a party to this survey rather than an independent observer.

A number of people have commented on the party identification balance in the survey: 38% Dem, 24% Rep, 29% Independent (37% Independent when "no preference/other" are allocated to independent. When this group is asked how they "lean", very few insist on some other party, so this allocation makes sense.) See Alan Reifman's blog on weighting and party id for a good example and discussion of broader issues of weighting to party id.
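For readers unfamiliar with what weighting to party ID would involve, here is a minimal sketch. The target shares are hypothetical, the "Ind" figure simply absorbs the remainder of the sample, and, as discussed below, our survey deliberately did not apply such weights:

```python
# Observed party-ID shares in a sample (the Dem/Rep figures echo the survey
# discussed in the text; "Ind" here absorbs the remainder for illustration).
sample_share = {"Dem": 0.38, "Rep": 0.24, "Ind": 0.38}
# Hypothetical target shares a pollster might weight to.
target_share = {"Dem": 0.33, "Rep": 0.33, "Ind": 0.34}

# Each respondent in group g gets weight target/sample: groups over-represented
# in the sample are weighted down, under-represented groups are weighted up.
weights = {g: target_share[g] / sample_share[g] for g in sample_share}
```

Because party ID itself moves over time, the choice of targets builds an assumption into the results, which is the reservation expressed in the text.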

I want to point out two things here and put our data in the context of other polls in Wisconsin.

The chart above shows party identification trends since 2000 using data from three sources that have done frequent polling in the state. What we see is a relatively stable Dem/Rep parity from 2000-2004, with Dem ID falling a bit around 2004 while Reps moved up slightly.

Starting in 2005, however, there is an initially slow but then sharper shift in partisanship. Republican ID declines from about 30% to about 24% today, while Dem ID rises from about 30% to nearly 40%. After an initial surge of independents, that group has recently fallen off a bit. (You have to squint a bit to see WPRI and Badger after 2005, but they are close to the trend lines during this period, so the changes are not just a matter of house effects or phone vs. IVR methods. WPRI, for example, has Rep ID moving from 33% in 2004 to 28%, 26% and 25% in 2005-2007. Their Dem ID rises from 30% to 33% to 34%, then falls to 29% over the same period. The final 29% is a large discrepancy from the trend, of course.)

We did not weight our survey to party identification, and these trends help explain why we have reservations about doing that. While relatively stable, party id does move over time, and by a fair bit, as you can see here. But that said, our unweighted results turn out to be quite close to the estimated trends in partisan categories in any case.

The second point is to compare these trends with those in exit poll measures of party ID. In 2000, the VNS exit poll put Wisconsin party ID at 37% Dem, 32% Rep and 31% Ind. This shifted in 2004 to 35% Dem, 38% Rep and 27% Ind. But in 2006 the exit polls found that the balance was 38% Dem, 34% Rep and 27% Ind. Those values all show a smaller share of independents at the polls on election day compared to the polling trend, but that is to be expected given differences in turnout between partisans and independents. The size of the party ID groups grows as a result, but the balance between them is in line with what we see in the trends in the polls, though certainly not an exact match. The polls, after all, are of either adults or likely voters, while the exits are by definition a measure of who actually showed up on election day.

For 2006, the Dem exit percent and the Dem trend estimate are a close match. Republicans gain in the exits, by about 6 points over the 2006 trend estimate. If that holds for 2008, we might expect an electorate more like 38% Dem and 30% Rep. Of course both parties will have very active "ground games" and GOTV efforts to try to change those numbers.

While I'm certainly happy that our party id balance is so close to the trend in all the other polling, the more important point is that party id in Wisconsin has shifted quite a bit over the past four years. The coming campaign may alter that, possibly bringing disappointed former Republicans back home, for example. Likewise a Republican advantage in turnout could bring the exit polls back to closer balance. But as the data show, today the GOP is at the worst disadvantage the state has seen in over eight years.

Let me conclude with a bit of description of the polls used here.

Wisconsin Policy Research Institute ("WPRI") has done some of the longest running polls in the state, usually two a year. Their data here is taken from their annual estimates, which I assume pool the two surveys though they don't say so explicitly. WPRI describes itself as "Wisconsin's Free Market Think Tank".

The "Badger Poll" is conducted by the UW Survey Center. They did more extensive polling in 2002-04 but now do about two polls a year.

SurveyUSA is a well known national pollster that uses "Interactive Voice Response" (IVR) automated interviews. SurveyUSA has done monthly polling in the state since 2005, providing some of the best data on state trends in approval of elected officials and, as a byproduct, an excellent data series on party ID.

Finally, there is our new Department of Political Science/WisPolitics poll. Ours uses a commercial call center, not the UW Survey Center or undergrads in a class calling for a grade. WPRI, Badger and our poll all use live interviewers, SurveyUSA uses IVR. Most of these surveys are in the 500-600 respondent range.

Wednesday, May 21, 2008

Gay Marriage Support and Opposition

Marriage for gay and lesbian couples has been a hot button issue, most especially so in the 2004 election cycle when 11 states considered and passed referendums banning (in various ways) same-sex marriages. In 2006 an additional 8 states voted on marriage ballot measures, with only Arizona defeating the proposal. In all, 41 states have statutes defining marriage as "between one man and one woman", and 27 states have put that definition into their constitutions. Only five states currently have no law banning same-sex unions (MA, NJ, NM, NY, RI). In 2008, Florida will have a "defense of marriage" amendment (DOMA) on the ballot, while California is awaiting certification of a ballot proposal and Arizona may reconsider its 2006 initiative (currently awaiting state Senate approval). (An excellent summary of the status of same-sex marriage in the states is available here.)

Despite this overwhelming majority among other states, the California Supreme Court last week ruled that the state cannot constitutionally withhold the right to marriage from same-sex couples. (Text of the ruling is here. The LA Times initial report on the decision is here.) Supporters of gay marriage hailed the decision as a breakthrough for fundamental rights, in line with the same California court's 1948 decision striking down laws banning inter-racial marriage. Opponents of gay marriage argued the ruling puts the issue squarely back on the table for 2008 and confirms the opponents' argument that only constitutional amendments can prevent courts from overturning popular opinion on this issue. In 2000 California passed, by a 61%-39% majority, Proposition 22 affirming that "only marriage between a man and a woman is valid and recognized in California."

California has one of the strongest domestic partnership laws in the nation, so the Court's decision has the effect of ruling that by withholding the designation "marriage", such domestic partnership laws still fall short of the equal treatment required by the state constitution.

The California decision follows the Massachusetts Supreme Court's ruling of November 18, 2003 which ultimately made Massachusetts the first, and so far only, state to legalize same-sex marriage. (Rhode Island law recognizes same-sex marriages from other states.) Subsequently, the state Supreme Courts of New York, New Jersey and Washington have each declined to find a constitutional right to same sex marriage. Four states have civil union laws providing full state-level spousal rights (CT, NJ, NH and VT) while six have domestic partnership laws that provide varying degrees of spousal rights (DC, HI, ME, OR, WA plus the California law at issue in this decision).

In light of the California decision, let's take a look at public opinion on same-sex marriage and how opinion has responded to past events.

A typical question asks "Do you strongly favor, favor, oppose, or strongly oppose allowing gays and lesbians to marry legally?" (This is the form used by the Pew Research Center polls. There is considerable variation in question wording, but most polling has used a similar dichotomy between favoring gay marriage or opposing it. I've collapsed "degrees" of support or opposition into a dichotomous measure for all polls.) The earliest use of such a question I could find dates back to September 1985, but it was not until 1992 that the question began to be asked regularly. There was a flurry of interest in the question following the Massachusetts ruling and during the 2004 election campaign.
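Collapsing the four response categories into a dichotomy is straightforward; a short sketch with made-up response counts:

```python
# Hypothetical counts on the four-category, Pew-style question.
counts = {"strongly favor": 120, "favor": 280,
          "oppose": 300, "strongly oppose": 250}

# Map each degree of support or opposition onto the dichotomy used in the text.
collapse = {"strongly favor": "favor", "favor": "favor",
            "oppose": "oppose", "strongly oppose": "oppose"}

dichotomous = {}
for response, n in counts.items():
    dichotomous[collapse[response]] = dichotomous.get(collapse[response], 0) + n

# Percent among those taking a position.
total = sum(dichotomous.values())
pct = {side: 100 * n / total for side, n in dichotomous.items()}
```

The same collapse applies regardless of a pollster's exact wording, which is what makes results from differently worded polls roughly comparable.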

If we rely on that first poll alone, in 1985 82% of the public opposed same sex marriage, while only 11% supported it. By the early 1990s, when the data become richer, opposition was at about 65% while support stood at about 28%. Congress passed, and President Clinton signed, the federal "Defense of Marriage Act" in September 1996, but public opinion trends seem not to have noticed at all, neither rising nor falling around that time. By the week of the California ruling, May 15, 2008, opposition had declined to about 55% while support had grown to 40%. The net effect of some 16 years of public debate was a 10 point decline in opposition and a 12 point rise in support.

But that trend was not uniform. The Massachusetts ruling, and the 2004 election campaign, coincided with a sharp, if relatively short term, disruption of the previous slow but steady decade long shift of opinion. The Massachusetts Court decision placed the issue squarely on the public radar, and the 11 state ballot proposals in the 2004 election created the setting for public debate and political exploitation of the issue.

During the year from November 2003 to November 2004, opposition to same-sex marriage rose by five points, from 55% to just over 60%. Meanwhile support fell by about eight points, from 38% to 30%, then rebounded by a point or so by election day. (These shifts slightly predate the Massachusetts decision, probably reflecting the increased visibility of the issue prior to the Court's ruling.) The impact of these shifts and of the 11 referendums that were passed on the presidential election remains debatable. Initial punditry credited the referenda with helping defeat John Kerry, especially in Ohio. More careful subsequent analysis doubts much of an effect, however.

These sharp shifts in trend reversed direction immediately following the 2004 election, but took more than two years to return to pre-2004 levels. Support returned to 2003 levels in mid-2007, while opposition has only now, in May 2008, declined back to where it stood in mid-2003. Despite this slow recovery from the 2004 "shock", the 2005-08 trend lines make it clear that public opinion returned to its previous trajectory of slowly rising support and declining opposition. It is also interesting that the 2006 elections, with 8 states voting on referenda, made no discernible difference to the post-2004 trend. In part this may reflect the smaller number of states voting, but it also reflects some decline in the salience of the marriage issue.

The California ruling, and the likely campaign over a proposition there to modify the state constitution this fall, will test whether increasing the salience of the issue will result in a replay of the 2003-04 dynamics, with opponents stimulated and supporters in retreat, or if the 2006 experience means that the issue is no longer the motivator it was in 2004. The 2003-04 data clearly show the potential for sharp changes when the marriage issue becomes extremely salient. That the fight will take place in the most populous state in the Union also guarantees national exposure. However, the fact that most states have already settled this issue through law or amendment, and that only three states (so far) are on track to have proposals on the ballot, means that the issue is more localized than it was in 2004.

Opinion now is not much different from where it was in mid-2003, so a similar reaction is possible, but there may be an element of "been there, done that" as well. The novelty of the issue is surely much reduced compared with five years ago, though the record of referenda passing in 7 of 8 states in 2006 certainly demonstrates that opposition to same-sex marriage remained strong even in a very pro-Democratic election year. (Wisconsin, for example, reelected a Democratic governor and flipped a House seat to the Democrats but also amended its constitution to ban same-sex marriage or anything substantially equivalent to marriage.)

The big question is whether the marriage issue has any carryover to the presidential vote in 2008. Democratic politicians, including Senators Clinton and Obama, have tried to insulate themselves by opposing gay marriage. Instead, they support civil union or domestic partner legislation. Senator McCain opposes same-sex marriage and opposes legal recognition of same-sex partnerships, but also opposes a federal constitutional amendment. This line of debate, with both parties opposing marriage but with Democrats willing to support some legal recognition short of marriage, reflects another way of framing the question, one that is significantly more favorable to limited rights for gays and lesbians.

(Note: This chart is scaled the same as the previous chart so the dynamics and time frame are directly comparable. The large white space prior to 2000 reflects the politically relevant point that in that time period the "civil union" option was not prominent enough to be included in polling questions.)

Beginning in 2004 (with one early exception in 2000), polling organizations began asking a question with three alternatives. The CBS News question wording is representative:
Which comes closest to your view? Gay couples should be allowed to legally marry, or gay couples should be allowed to form civil unions but not legally marry, or there should be no legal recognition of a gay couple's relationship?

When the "civil unions" option is added, opposition to gay rights drops significantly from about 55% to 40%. Likewise, support for gay marriage drops from 40% to 29%. The "comfortable" middle ground is then some 26% who are willing to support civil unions so long as they fall short of "marriage".
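
The arithmetic behind that middle ground, using the approximate figures in the text, can be laid out in a few lines:

```python
# Two-option question (approximate readings from the trend):
oppose_two_way, support_two_way = 55, 40

# Three-option question: no recognition at all, and marriage only.
no_recognition, marriage = 40, 29

# Those who leave each pole when the civil unions option is offered
# make up the middle ground.
civil_unions = (oppose_two_way - no_recognition) + (support_two_way - marriage)
# civil_unions comes to 26, the "comfortable" middle ground
```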

This "half a loaf" approach is acceptable to only some in the gay rights community, but it is precisely the politically acceptable position that Democratic politicians think can move them from the losing side of public opinion to the winning side. If we add supporters of marriage to supporters of civil unions, we get the chart below.

This is now a near mirror image of the balance of opinion in the first chart. Now about 53% support either civil unions or marriage, and a minority of 40% oppose any legal rights for gay and lesbian couples. By assuming supporters of marriage will not punish them for the expedient support of only civil unions, Clinton and Obama (and many other Democrats) have tried to turn a losing position into a winning one.

The remaining uncertainty is whether opponents of any legal recognition are more intense than the supporters of civil unions. If so, then opposition groups may still win the battle between intense minority and lukewarm majority. On ballot propositions, the record is strongly in favor of the opponents of marriage and in some cases of civil unions as well.

The Clinton-Obama position will certainly not win over opponents of any form of legal recognition for gays, but then they probably wouldn't win many such voters in any case (an exception is African-Americans, many of whom are quite opposed to marriage or civil unions). Whether their position provides them popular support in response to attack ads on this issue remains to be seen.

Tuesday, May 06, 2008

NC and IN Final Sensitivity Comparison

The standard and sensitive estimators agree in North Carolina. In Indiana there is a little room between them, but not enough to affect conclusions about the probable outcome (if the polls are right!).

The gyrations the Indiana sensitive estimator for Clinton goes through, thanks to variable polls and relatively few of them, are a good warning that the sensitive estimator may be just a bit too ready to chase after noise.

NC and IN Final Pollster Comparisons

With the last of the pre-election polls in, we can now do our "apples to apples" comparison. Follow each pollster in the charts to see who's high, who's low and who has jumped around.

Note this is for the Obama minus Clinton MARGIN (which makes it easier to plot all the polls in one, still jumbled, chart).

And check back tonight as the votes roll in to see who nailed it and who missed. In North Carolina all agree on the winner, only the margin is in dispute. But Indiana has a little disagreement on who is ahead. Fun!

Monday, May 05, 2008

How much does the Pollster matter for Trend?

One of the things we think about a lot at Pollster.com is the quality of polling. Mark Blumenthal's post on the North Carolina poll demographics here is a great example of how much variability we see among polls, all trying to hit the same target population.

This issue is also raised by those who would like to exclude some polls from our trend estimates. If one "bad apple" spoils the barrel, then this is a serious issue for our efforts to estimate the state of the races here.

We've stuck to our principle that we include all available polls without cherry picking (to shift the fruit metaphor!) but we don't do that out of blind faith. Rather we do it because the empirical evidence shows that the effects of single pollsters are generally small, certainly compared to the other sources of uncertainty about the state of the race.

Here I take a look at this issue for North Carolina and Indiana.

There are four elements that affect how much a pollster influences our trend estimate.

First, the pollster's results must be "different" from the trend we'd estimate without them. If a pollster happened to hit our trend dead on every time, their influence would reinforce our trend estimate, but not change it. So for a poll to affect the trend, it needs to be different from what we'd otherwise estimate.

Second, the pollster needs to produce results that are systematically different from the trend. If a pollster bounces around the trend, some high and some low, then the net effect is small, even if individual polls are rather far off the trend.

Since the trend is determined across all pollsters, these first two points are another way of saying that the pollster must differ from what other pollsters are getting.

Third, volume matters. In some states, a single pollster accounts for a substantial proportion of all polling, while other pollsters contribute only a single poll. The former obviously have more potential influence than the latter. But high volume of polls doesn't matter if they are consistently close to (and scattered around) the trend estimate based on other polling. The problem comes when the prolific pollster is also rather different from others, and especially if there are few other pollsters active in the state.

Fourth, polls late in the game can have more leverage on the "current" trend estimate. So a pollster that does several polls but only in the last week before election day can have more influence on the current estimate than they would if those polls were spread over the entire pre-election period. Again, such an effect is only visible if the late polls are different from other polling.

Having an effect on the trend could be a very good thing if the pollster is right while others are wrong. The problem is knowing a priori which pollster will be right THIS TIME. Experience this year demonstrates that a good day can be followed by a bad day, or both can happen on the same day.

It is also important to put these effects in perspective across all the polls we see in a race. Individual polls are highly variable. Our data often find polls spread plus or minus 5, 6 or even 7 points around our estimated trend for an individual candidate, and double that for the margin between two candidates. There is a lot of noise out there, and the whole point of our trend estimator is to extract the signal from that noise. Our estimator (especially the "standard" estimator I'm using here, as opposed to the "sensitive" estimator we also check) is designed to resist polls that are "way off" (i.e., outliers) but at the same time be able to follow the common trend across polls. (I won't go into the details of our local regression estimator here; it is not a simple rolling average. Let's hold that for another day. The FAQ on this is coming.)
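
The local regression idea can be sketched in a few lines. This is an illustrative toy only, not the actual Pollster estimator (which adds outlier resistance and other refinements); the tricube kernel, the bandwidth, and the data are all assumptions for the example:

```python
def local_linear(xs, ys, x0, bandwidth):
    """Tricube-weighted local linear fit evaluated at x0.

    A minimal sketch of local regression; the production trend
    estimator is more elaborate (e.g., it resists outliers).
    """
    s_w = s_wx = s_wy = s_wxx = s_wxy = 0.0
    for x, y in zip(xs, ys):
        d = abs(x - x0) / bandwidth
        if d >= 1:
            continue                      # outside the smoothing window
        w = (1 - d ** 3) ** 3             # tricube kernel weight
        s_w += w
        s_wx += w * x
        s_wy += w * y
        s_wxx += w * x * x
        s_wxy += w * x * y
    if s_w == 0:
        raise ValueError("no polls within bandwidth of x0")
    denom = s_w * s_wxx - s_wx ** 2
    if denom == 0:
        return s_wy / s_w                 # all polls on one day: weighted mean
    b = (s_w * s_wxy - s_wx * s_wy) / denom
    a = (s_wy - b * s_wx) / s_w
    return a + b * x0                     # fitted line evaluated at x0

# Hypothetical polls: day of field period and Obama-minus-Clinton margin.
days = [1, 3, 6, 8, 12, 15, 19, 22, 25, 28]
margins = [5, 9, 4, 11, 7, 30, 8, 6, 10, 7]   # day 15 looks like an outlier
trend_now = local_linear(days, margins, 28, bandwidth=20)
```

Because nearby polls share the weight, no single survey dictates the estimate, which is the point of pooling.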

So let's take a look at the North Carolina plot way up there at the top of this post. The horizontal axis is scaled to show the range of poll results we've seen in the state since April 1. This provides perspective on how much variation you see from poll to poll in the raw results.

The red "whiskers" at the bottom of the plot are the individual polls taken over this time. There is a bit more than a 25 point range in the Obama-Clinton margin during this period. Since the trends in the state have been relatively flat, only a little of this variation is due to "real change".

Our trend estimate based on all polls is the vertical blue line, which as of Monday afternoon is +8.6 points in Obama's favor.

How much do individual pollsters matter for this estimate? PPP has done the most polling in the state. If we take them out, the trend estimate drops to 7.0, a shift of 1.6 points on the difference (or an average of .8 points for each candidate, moving in opposite directions of course).

At the opposite extreme, removing Insider Advantage from our estimator produces a 10.7 point Obama lead, a shift of 2.1 points on the difference, or 1.05 points per candidate.

For most other pollsters, the effect is far smaller, even for relatively frequent pollsters such as SurveyUSA and ARG.

So the maximum effect of removing a single pollster is a shift between a 7.0 and a 10.7 point Obama lead. A shift of 3.7 points on the difference can matter in a close race, but that difference is relatively small compared to the variation we see in individual polls. Indeed, the four polls completed 5/4 show a range of +3 to +10 for the Obama margin. (They average a +7.25, compared to our trend estimate of +8.6.)
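
A leave-one-pollster-out check like this one can be mimicked in a few lines. The pollster names and margins below are invented for illustration, and a simple mean stands in for the real local-regression trend:

```python
# Hypothetical polls: (pollster, Obama-minus-Clinton margin).
polls = [("PPP", 10), ("PPP", 12), ("SurveyUSA", 7),
         ("ARG", 5), ("InsiderAdv", 3), ("Zogby", 9)]

def trend(data):
    """Stand-in trend estimate; the real one is a local regression."""
    return sum(m for _, m in data) / len(data)

full = trend(polls)
shifts = {name: trend([(p, m) for p, m in polls if p != name]) - full
          for name in {p for p, _ in polls}}
# shifts[name] is how much dropping that pollster moves the estimate.
```

Running this for every pollster, as done for North Carolina above, bounds how much any single house can move the trend.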

There is less polling in Indiana, so we might expect more influence since there are fewer polls to stabilize the trend estimator.

Here the current estimate using all polls is -6.2, a lead for Clinton. The range of results we get from excluding pollsters is from -4.1 (excluding SurveyUSA) to -8.7 (excluding Zogby). That is a bit larger than North Carolina, as expected. But put this in the perspective of the range of raw poll results for Indiana, which is from -16 to +5 in polls taken since April 1. The six latest polls as of Monday, all ending on 5/4, range from -12 to +2.


To sum up: which polls we include affects our results. That both has to be and should be the case. We WANT the data to matter, and of course it does. What we don't want is for individual polls to make such large differences in our results that inclusion or exclusion decisions become critical. The results we see here show that we SHOULD be somewhat uncertain about the trend, as it depends on which individual pollsters are included. What is somewhat different in our approach at Pollster.com is that we want to emphasize this uncertainty and put it in perspective, rather than produce a single number and treat it as if it were "certain". That is why we always show the individual polls spread around our trend estimate in the charts. All estimates have uncertainty. We need to understand both the value of the estimate and the uncertainty inherent in it. Pollster effects are part of that story.

However, what is crucial is that these effects on the trend estimate are small compared to the range of variability we see across individual polls. The goal of our trend estimator is to produce a better estimate than any single poll (or pollster) can provide, and by that standard pollster effects on the trend are modest.

Evaluating the accuracy of the polls is a different topic, one we'll revisit again on Wednesday.

NC and IN Sensitivity Update

As we close in on tomorrow's primaries in North Carolina and Indiana, the "standard" and "sensitive" trend estimates have largely converged.

In North Carolina the standard estimator puts Obama at 50.1% and Clinton at 41.5%. The sensitive estimator has it Obama 49.5% and Clinton 42.2%. Or, a margin in the standard trend of +8.6 for Obama vs +7.3 in the sensitive estimate.

In Indiana, the standard estimator puts Clinton up 49.5% to 43.3% for Obama. Switching to the sensitive estimator makes it Clinton 51.2% to Obama's 43.5%. Or a Clinton advantage of 6.2% for the standard estimator versus 7.7% for the sensitive one.
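
The standard/sensitive distinction is essentially a choice of how aggressively to smooth. As a crude stand-in (the real estimators are local regressions with different bandwidths, and the daily margins here are invented), wide and narrow trailing averages show the same trade-off:

```python
def moving_average(series, window):
    """Trailing moving average; 'window' plays the role of the
    smoothing bandwidth in the real trend estimators."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical daily poll margins with one noisy reading near the end.
margins = [6, 7, 5, 6, 8, 6, 12, 7]
standard = moving_average(margins, window=6)   # smooth, resists the blip
sensitive = moving_average(margins, window=2)  # reactive, chases the blip
```

The narrow window reacts faster to genuine shifts but also to noise, which is exactly why we report both estimators and watch for convergence.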

Either way, the polls foresee a split decision tomorrow. Anything else will be a very interesting surprise.