Friday, August 18, 2006

Votes, Seats and the Generic Ballot

This is a "good news/bad news" story. For Democrats it is good news/bad news and for Republicans it is bad news/good news. It is all about how to convert the generic ballot into votes, and votes into seats.

Yesterday Jay Cost posted an excellent analysis of the generic ballot on RealClearPolitics.com. Here I want to illustrate one of the things he wrote about, and expand a good bit on the problem of going from the generic ballot to the national vote and on to the number of seats won. My colleague at UCSD, Matt Shugart, has also ventured down this road at Fruits and Votes with a very interesting analysis. I'll try to add something to both.

The graph at the top of this page illustrates two of the main points Cost made. The generic ballot almost always overstates the actual Democratic vote, and this bias (he says "skew") gets worse as the Democrats do better on the generic ballot.

In the figure, the gray diagonal line is where the points would fall if the generic ballot exactly predicted the national two-party vote. What we actually see is that in all but 3 of the last 30 elections, the points fall well below this gray diagonal line: the polls overstate the Democratic vote. Further, the gap between the points and the gray diagonal line grows as the generic ballot favors the Democrats more and more. These are exactly Cost's first two points, and the evidence is very clear.

(Technical notes: The straight blue line is the linear fit of actual vote to generic ballot, and the red line is my usual "local regression," which fits the pattern of the data more flexibly, without assuming a linear relationship. In this case, above about 53% Dem on the generic ballot the fit is quite linear. Below that the red line tails down quite a bit, showing that even when Democratic support on the generic ballot is low, the poll still overstates the actual Democratic vote. I use the two-party vote because it is a better predictor of seats won than the D and R shares of the total vote. I also convert the "don't knows" in the generic ballot by allocating them equally to each party, because this gives (somewhat) better predictions of the actual national vote than allocating them proportionally to those who have decided. I then estimate Democratic support on the generic ballot as of election day by using all the polls for the election year and taking the value the trend predicts for election day. This way I avoid relying on a single "September" poll, or any other method that throws out information. The method I'm using here is not all I could wish for, and I hope to make technical improvements in the coming days and weeks, but it is "good enough" for now. From here on, when I talk about the "generic ballot" or the "generic ballot estimate," I mean the estimate as of election day using this method, not a single poll result from earlier in the campaign.)
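
A minimal sketch of the two preprocessing steps described in these notes, assuming each poll is a hypothetical tuple of (days before the election, Dem %, Rep %, undecided %). The notes refer to "the trend" without specifying its form; a straight-line trend is used here purely as a simplifying assumption, and all function names are illustrative.

```python
def two_party_dem_share(dem, rep, undecided):
    """Dem % of the two-party vote, splitting the undecideds equally."""
    dem_total = dem + undecided / 2
    rep_total = rep + undecided / 2
    return 100 * dem_total / (dem_total + rep_total)


def proportional_dem_share(dem, rep):
    """Alternative allocation: undecideds split in proportion to decided
    voters, which is the same as simply dropping them."""
    return 100 * dem / (dem + rep)


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def election_day_estimate(polls):
    """polls: list of (days_before_election, dem, rep, undecided) tuples.
    Fits a straight-line trend of two-party share on days before the
    election and returns its value at day 0, i.e. on election day."""
    days = [p[0] for p in polls]
    shares = [two_party_dem_share(*p[1:]) for p in polls]
    intercept, _slope = fit_line(days, shares)
    return intercept


# Made-up polls, for illustration only.
polls = [(100, 50, 40, 10), (50, 52.5, 37.5, 10)]
print(two_party_dem_share(45, 40, 15))  # 52.5
print(election_day_estimate(polls))     # 60.0
```

Swapping the straight line for a local regression would change only `election_day_estimate`; the undecided allocation is independent of how the trend is fitted.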

The first bit of "good news" for Dems is that despite these problems, it is still the case that as the generic ballot goes up for Dems, their national vote share has also generally gone up. So if we throw out all the econometrics and just ask "Does higher generic ballot support usually mean higher vote share on election day?" then the answer is yes. (Rapidly followed by the qualification that the relationship has a good deal of variability so we can't be very certain about specific predictions.) As I showed in an earlier post here, the 2006 election cycle is showing the highest support by far for Democrats on the generic ballot of any election since 1994. So without quantification, that must be good news for the Dems.

With a bit of quantification, the current estimate for the Dem share on the generic ballot, projected out to election day, is 57.9%. Taking account of the overestimation, the linear and local models give the same prediction: the Dems should be expected to win about 53% of the national two-party vote. The uncertainty of this prediction (about which I will have very little to say here, but perhaps more later) is considerable: +/- 3.3%, or from 49.7% to 56.3%, which is quite a wide range. (And actually, that +/- 3.3% is too SMALL a margin of error. I just don't have time to calculate a better one this morning.) Still, all the caveats aside, this prediction is for the best year the Democrats have had since 1990, at least in terms of the national vote.
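
To make the "taking account of the overestimation" step concrete, here is a sketch of the linear version: regress the actual two-party vote on the generic ballot estimate, then read off the fitted value at the current estimate. The band of plus or minus two residual standard deviations is an assumption of this sketch, not necessarily how the +/- 3.3% above was computed. Because it ignores uncertainty in the fitted line itself, it will tend to be too narrow, in the same spirit as the warning about the 3.3% figure. The historical series are not reproduced here, so the inputs below are made-up placeholders.

```python
import math


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_vote(generic, actual_vote, generic_now, k=2.0):
    """Predict the national two-party Dem vote from the generic ballot.
    Returns (prediction, low, high); the band is +/- k residual standard
    deviations, a rough band that ignores error in the fitted line."""
    a, b = fit_line(generic, actual_vote)
    residuals = [yi - (a + b * xi) for xi, yi in zip(generic, actual_vote)]
    s = math.sqrt(sum(r * r for r in residuals) / (len(generic) - 2))
    pred = a + b * generic_now
    return pred, pred - k * s, pred + k * s


# Placeholder history (NOT real election data).
generic = [50, 52, 54, 56]
actual = [48, 50, 49, 51]
print(predict_vote(generic, actual, 57.9))
```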

(Robert Erikson and Lee Sigelman have an interesting article on the generic ballot in Public Opinion Quarterly, 1995. They point out that there are additional variables at work here, one of which is party control of the White House. Likewise Alan Abramowitz at Emory has a forthcoming article estimating seat change as a function of past share of seats, midterms, presidential approval, generic vote, open seats and challenger quality. By including some or all of that I could improve the fit of my model. However, my goal is to illustrate the relationships and the basic problem, rather than produce the best forecasting model, so I'll not go that route here.)

But House races are won or lost in 435 separate contests, not one national race. So how does the generic ballot do when predicting the number of seats the Democrats might expect to win? We'll look at that two ways, first directly, then indirectly.

We can plot the Democratic seats in the House against the generic ballot estimate for each year. That produces the graph below, again with a blue linear fit and a red local fit.

The magic number here is 218 seats for a majority, marked by the horizontal red line. Once more we see that the better you do on the generic ballot, the more seats you tend to win, but we also see a lot of variation around that relationship. In particular, of the 8 congresses since 1946 that were controlled by Republicans, the linear fit fails to predict the Republican majority 7 times. We also see that 7 of the 8 Republican congresses came from generic ballots in the 50-55% Dem range. However, there were 8 other congresses in this same range of generic ballots that were easily held by Democratic majorities. That's poor prediction when it counts most. We'll return below to this problem of explaining the Republican congresses.
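
The "7 of 8" tally is a simple classification check: for each congress the Republicans actually controlled, did the fitted seat prediction also fall below 218? A sketch of that bookkeeping, with a hypothetical function name and made-up seat counts standing in for the fitted and actual values:

```python
MAJORITY = 218  # Democratic seats needed to control the 435-seat House


def republican_congress_record(predicted, actual, majority=MAJORITY):
    """Among congresses the Republicans actually controlled (actual Dem
    seats below `majority`), count how often the model also predicted a
    Republican House (hits) versus a Democratic one (misses)."""
    hits = misses = 0
    for pred, act in zip(predicted, actual):
        if act < majority:
            if pred < majority:
                hits += 1
            else:
                misses += 1
    return hits, misses


# Hypothetical fitted and actual Democratic seat counts.
print(republican_congress_record([230, 225, 210, 260], [205, 210, 212, 250]))  # (1, 2)
```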

We also see in the figure that, given the estimate for the 2006 generic ballot of 57.9%, we should expect a large Democratic majority, with an astonishing 252 seats. That would be a 49-seat increase, which seems wildly out of touch with current opinion that Democrats MAY not quite manage to gain the 15 seats needed for a majority. And again the uncertainty is huge: +/- 43 seats! (So anywhere between a 6-seat gain and universal domination at 295 seats. One suspects even Democrats don't think 295 is very likely.) Again, this uncertainty could be reduced by adding other predictor variables. But my point is that those who wish to rely on the generic ballot in thinking about likely outcomes have to grapple with both seemingly unrealistically high predictions based on past elections and an embarrassingly wide margin of error.
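
The arithmetic behind that range, as a quick check. The starting figure of 203 Democratic seats is implied by the text (252 predicted minus the 49-seat increase) rather than stated directly, and the function name is illustrative:

```python
def seat_outlook(predicted, margin, current, majority=218):
    """Turn a seat forecast +/- margin into seat and seat-gain ranges."""
    low, high = predicted - margin, predicted + margin
    return {
        "low": low,
        "high": high,
        "gain_low": low - current,
        "gain_high": high - current,
        "gain_needed": majority - current,
    }


print(seat_outlook(252, 43, 203))
# {'low': 209, 'high': 295, 'gain_low': 6, 'gain_high': 92, 'gain_needed': 15}
```

The `gain_needed` value of 15 agrees with the 15 seats cited above, which is a useful consistency check on the implied starting point.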

Still, the good news for Dems is that high generic ballots do tend to go with more seats. Bad news for Reps.

We've seen the perils of predicting the national vote from the generic ballot, and likewise for seats. But let's pause for a moment. Suppose we actually could get a PERFECT prediction of the national vote shares from the generic ballot (or from any other source, for that matter). How much would that help us predict the outcome of the battle for control of the House?

The graph below plots the number of Democratic seats against the national vote share won by the Democrats, so here there is no error in the vote measure. Instead, the remaining variability comes from the fact that national vote totals don't translate directly into seats as they would in a proportional representation system. This "votes-seats" relationship lets us appreciate how much uncertainty there is in party control, EVEN IF WE TOOK OUT ALL THE UNCERTAINTY in the generic ballot.

Here the relationship is, as it must be, tighter than for the generic ballot. Still, the variation around the blue regression line is about +/- 19 seats for any given level of actual votes. That's more than the 15 seats the Dems need to gain to take control, so it seems unlikely we could confidently forecast who will win the House, even if we knew the final vote today.

But let's not let that stop us. Translating the generic ballot estimate of 57.9% into a predicted national vote of 53.0% gives the vertical black line in the figure. It crosses the blue regression line at 251 seats, essentially the same estimate we got from predicting seats directly from the generic ballot.
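
The two-step chain (generic ballot to national vote, then national vote to seats) can be sketched as two simple regressions fitted on past elections and applied in sequence. This is an illustrative reconstruction rather than the code behind the figures, and the toy history below is made up so the arithmetic is easy to follow by hand:

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_forecast(generic_hist, vote_hist, seats_hist, generic_now):
    """Stage 1: national two-party Dem vote on the generic ballot.
    Stage 2: Dem House seats on the national vote.
    Returns (predicted_vote, predicted_seats)."""
    a1, b1 = fit_line(generic_hist, vote_hist)
    a2, b2 = fit_line(vote_hist, seats_hist)
    vote_now = a1 + b1 * generic_now
    return vote_now, a2 + b2 * vote_now


# Made-up, perfectly linear history so the result can be checked by hand.
print(two_stage_forecast([50, 54, 58], [48, 50, 52], [202, 218, 234], 56))  # (51.0, 226.0)
```

Because stage 2 receives only the point estimate from stage 1, the stage-1 uncertainty is dropped, which is one reason a chained forecast can look more certain than it is.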

So what can we say? IF the pattern of relationships between generic ballot and votes, and between votes and seats, held in 2006 as it had from 1946-2004, then we should expect an easy Democratic capture of the House, with much room to spare. (But remember all the uncertainty we are ignoring in rushing to declare a Democratic victory.) Still, uncertain as we are, that would be the smart way to bet.

That's the last of the good news for the Dems. Now the Reps get their turn.

The "catch" in that last paragraph is the phrase "IF the pattern of relationships ... held in 2006 as it had from 1946-2004". That pattern may not hold. If not, then the predictions will, of course, differ.

But this is not idle speculation. There is evidence that the pattern has in fact differed in the last six elections. And if that difference persists, the Democrats may well fail to take control of the House EVEN IF they do very well in the national vote shares.

In the votes-seats graph immediately above, the blue dots fall pretty nicely around the blue regression line, and the red local regression is close enough to the linear model that I don't think it is worth worrying much about nonlinearities in the votes-seats relationship. At a glance, then, this is a linear relationship to write home about, and to use as an example in class. But an example of what? When we add labels for year, a new pattern becomes immediately obvious.

The figure below shows what we should have known anyway: the points below 218 seats are Republican congresses, six of them from 1994-2004 and two from 1946 and 1952. Compared to the overall fit line, they don't look too bad. But if we focus on just the six points since 1994, we see that they don't show the same votes-seats relationship as the rest of the data. Seats have been MUCH less responsive to changes in votes since 1994 than in the previous 46 years.

The green line is the votes-seats relationship for 1946-1992. It is quite close to the gray line estimated from all the data. But the red line is estimated for the 1994-2004 period only, and it is dramatically different. From 1946-1992, a one-percentage-point gain in the Democratic share of the national vote produced a gain of 8.2 seats (and vice versa for Republicans). Since 1994, a one-point gain in votes has produced a gain of only 1.9 seats. The translation of votes into seats has become dramatically less responsive. This is the curse of the Dems in trying to gain a majority, but also the curse of the Reps in trying to expand theirs.
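
A sketch of how the two slopes can be estimated separately: split the series at 1994 and fit seats on votes within each period. The split year is a parameter, and the data below are invented to give round answers, not to reproduce the actual 8.2 and 1.9:

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def slopes_by_period(years, votes, seats, split_year=1994):
    """Seats gained per point of Dem vote, before and from `split_year` on."""
    before = [(v, s) for y, v, s in zip(years, votes, seats) if y < split_year]
    after = [(v, s) for y, v, s in zip(years, votes, seats) if y >= split_year]
    _, slope_before = fit_line(*zip(*before))
    _, slope_after = fit_line(*zip(*after))
    return slope_before, slope_after


# Invented data: 8 seats per point early on, 2 seats per point later.
years = [1980, 1982, 1984, 1994, 1996, 1998]
votes = [50, 52, 54, 46, 48, 50]
seats = [240, 256, 272, 200, 204, 208]
print(slopes_by_period(years, votes, seats))  # (8.0, 2.0)
```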

It is true that during this period votes have held to a relatively narrow range of about 46%-50% Democratic (a bit wider, though, than Michael Barone's "49-49 nation" notion). Yet under the "old" votes-seats relationship, a swing of that size would have produced shifts of some 33 seats. In fact, the number of seats has shifted by only 10 over this period.
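
As a back-of-the-envelope check, the 33-seat figure is the 1946-1992 slope times the width of the vote range. Here $\beta$ is the seats-per-point slope and $\Delta V$ is the change in Democratic vote share, taken as roughly 4 points (46% to 50%):

$$
\begin{aligned}
\Delta S &\approx \beta \, \Delta V \\
\text{1946-1992 slope:} \quad 8.2 \times 4 &= 32.8 \approx 33 \text{ seats} \\
\text{1994-2004 slope:} \quad 1.9 \times 4 &= 7.6 \text{ seats}
\end{aligned}
$$

The observed 10-seat shift sits much closer to the second figure than to the first.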

And this is the good news for Reps and the very bad news for Dems. IF the votes-seats relationship in 2006 follows the pattern of the past six elections, then EVEN THE EXTRAORDINARY SUCCESS currently forecast by the generic ballot may not be enough to give Dems control of the House. In the figure above, the red line for the 1994-2004 votes-seats relationship remains below the magic 218 seats even when Dems win the 53% of the national vote that current generic ballot results would predict. Rather than the 251 seats they could expect under the 1946-1992 relationship, they could expect only 215 seats, 3 short. The Dems would need 55% of the vote to reach a predicted 218 seats, and even the current excellent generic ballot results do not point to a national vote share that high.
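
The 55% figure follows from running the 1994-2004 line forward from the point estimate above (215 seats at 53%) at 1.9 seats per point. A minimal sketch that treats the fitted line as exact, with an illustrative function name:

```python
def vote_needed(seats_at_ref, ref_vote, slope, target=218):
    """Vote share at which a straight votes-seats line reaches `target`
    seats, given the line passes through (ref_vote, seats_at_ref) and
    rises `slope` seats per percentage point."""
    return ref_vote + (target - seats_at_ref) / slope


print(round(vote_needed(215, 53.0, 1.9), 1))  # 54.6, i.e. roughly 55%
```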

So, while the generic ballot looks the best (from a Democratic perspective) it has in years, and in the past would have been plenty of reason to expect to win the House, the votes-seats relationship appears to have changed so much that control may remain beyond the Dems' grasp. Good news, Reps. (If I were worrying about uncertainty here, I'd be insisting that there is so much of it that the small gap between 215 and 218 seats makes the race for control essentially a flip of a fair coin.)

One question is whether we should believe that the votes-seats relationship has really changed, or whether this is just a fluke of these particular six elections. Maybe there have been many groupings of six elections in the last 60 years that would give us equally flat, but transitory, votes-seats relationships.

In the graph below I test this. I run a separate regression for every possible grouping of six consecutive elections since 1946 and plot one blue regression line for each group, 25 estimates in all. If the votes-seats relationship is unstable, we should see lines going every which way, like a dropped bowl of chopsticks.
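
A sketch of the rolling-window idea: fit seats on votes for each run of six consecutive elections. With 30 elections from 1946 to 2004 that yields 30 - 6 + 1 = 25 windows, matching the count above. The vote and seat series in the usage example are placeholders, not real results:

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rolling_slopes(years, votes, seats, window=6):
    """Fit seats on votes for every run of `window` consecutive elections.
    Returns a list of (first_year, last_year, intercept, slope)."""
    results = []
    for i in range(len(years) - window + 1):
        a, b = fit_line(votes[i:i + window], seats[i:i + window])
        results.append((years[i], years[i + window - 1], a, b))
    return results


years = list(range(1946, 2006, 2))          # 30 House elections, 1946-2004
votes = [45 + (i % 7) for i in range(30)]   # placeholder vote shares
seats = [100 + 2 * v for v in votes]        # placeholder seat counts
windows = rolling_slopes(years, votes, seats)
print(len(windows), windows[0][:2], windows[-1][:2])  # 25 (1946, 1956) (1994, 2004)
```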


That doesn't happen. All but two of the 25 lines are pretty close to the overall line based on all years. The largest departure from the overall pattern is clearly the 1994-2004 line we saw in the previous graph (though now colored blue rather than red). In these sixty years, only one other set of years had a similarly small slope. That line is for 1982-1992, and is the highest line in the top left quadrant of the figure. Yet even then the slope was 3.3, compared to 1.9 now. Moreover, that estimate is sensitive to the data in a way the current period is not. The 1982-92 period is marked with red dots. With only six points, it is normal for the regression line to be sensitive to which six points are included. (Which is why the stability of the OTHER 23 lines is remarkable.) But if we look at the red points, it is easy to see that two or three of them are crucial in determining the slope of the line. The estimate is quite sensitive to the data.

In contrast, if you look at the blue dots for 1994-2004, no single point is very influential in determining the slope. Remove any one, and the rest continue to produce essentially the same line, with a very small slope. So I could argue that the 1982-1992 period is more of a fluke of particular elections, but I cannot make the same case for the current period.
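
One way to check the "remove any one" claim is to refit the line with each election dropped in turn and see how far the slope moves. A minimal sketch; the six-point example is invented so that a single point (the last) drives the slope, the kind of sensitivity described above for 1982-1992:

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_out_slopes(votes, seats):
    """Slope of seats on votes after dropping each election in turn."""
    slopes = []
    for i in range(len(votes)):
        _, b = fit_line(votes[:i] + votes[i + 1:], seats[:i] + seats[i + 1:])
        slopes.append(b)
    return slopes


# Invented data: five points on a slope-2 line plus one influential point.
votes = [48, 49, 50, 51, 52, 60]
seats = [210, 212, 214, 216, 218, 260]
print(round(fit_line(votes, seats)[1], 2))               # 4.32
print(round(leave_one_out_slopes(votes, seats)[-1], 2))  # 2.0
```

A stable period, in this sense, is one where every entry of `leave_one_out_slopes` stays close to the full-sample slope.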

(Why these two sets of years? It might be redistricting, which became more sophisticated in the 1980s and still more so each decade since. It could be an increase in incumbency advantage, independent of redistricting. It is interesting that the two periods with smaller slopes come consecutively, suggesting that structural or behavioral changes drive the changing votes-seats relationship. But that's a story for another day, as are the normative implications of a much less responsive "People's House", in the sense of seat change in response to vote change.)

This means that I would be very reluctant to assume that the historic relationship between votes and seats is still true. (And a statistical test confirms that the slopes differ to a statistically significant extent between 1946-1992 and 1994-2004.) And the implication of that is that even a very successful vote for Democrats need not translate into control of the House.
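
The post doesn't say which test was used. One standard choice is a single regression with a period dummy and a vote-by-period interaction, testing the interaction coefficient. With separate intercepts and slopes for each period, that is equivalent to comparing the two separately fitted slopes using a pooled residual variance, which is what this sketch does. The function names and toy data are illustrative:

```python
import math


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def slope_sse_sxx(x, y):
    """Slope, residual sum of squares, and Sxx for a regression of y on x."""
    a, b = fit_line(x, y)
    mean_x = sum(x) / len(x)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return b, sse, sxx


def slope_difference_t(x_a, y_a, x_b, y_b):
    """t statistic for slope_a - slope_b in a fully interacted model
    (separate intercepts and slopes, common error variance).
    Returns (t, degrees_of_freedom)."""
    b_a, sse_a, sxx_a = slope_sse_sxx(x_a, y_a)
    b_b, sse_b, sxx_b = slope_sse_sxx(x_b, y_b)
    df = len(x_a) + len(x_b) - 4        # four estimated coefficients
    s2 = (sse_a + sse_b) / df           # pooled residual variance
    se = math.sqrt(s2 * (1 / sxx_a + 1 / sxx_b))
    return (b_a - b_b) / se, df


# Toy data: a steep group and a flat group.
t, df = slope_difference_t([1, 2, 3, 4], [2, 4.1, 5.9, 8],
                           [1, 2, 3, 4], [1, 1.6, 1.9, 2.5])
print(round(t, 1), df)  # 25.0 4
```

For a 1946-1992 versus 1994-2004 split of 30 elections, df would be 26, where the two-sided 5% critical value of t is about 2.06.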

If we shifted to a race-by-race evaluation, we might well reach different conclusions. The national vote doesn't need to change by much if the shifts are concentrated in exactly the right districts, and those who know such things (Charlie Cook, Stuart Rothenberg, CQ) tell us there are many more competitive races than expected eight months ago. That should raise the uncertainty and anxiety for both parties.

But if you are betting on the generic ballot to predict control of the House, even when it looks awfully strong for Democrats, you might want to think again. The relationship has weakened and the uncertainty is huge.

