Lies, Damn Lies, and Good Statistics

Most of the discussion about why and how the Democrats lost has focused on the data from the CNN exit poll. This poll is immensely frustrating. I’m not upset about the design; I have no reason to think that the numbers are bogus. What’s frustrating is that it’s impossible to get the raw data (if this isn’t true, please let me know). For example, a really interesting factoid is that Bush gained approximately 5% in every income bracket above \$50,000 compared to the 2000 election, and he neither gained nor lost ground in the income brackets below \$50,000. I have no idea why this is the case, although I could speculate along with everyone else.

The polling data could provide answers: maybe the higher income classes voted on the war because they feel economically secure; maybe there’s a religious revival among the middle and upper classes; maybe those with incomes between \$50,000 and \$100,000 voted for different reasons than those who earned over \$100,000 (an aside: those making over \$200,000 overwhelmingly voted for Bush, at 63%). The point is nobody knows, even though the data exist.

The real shame is that there are a lot of people with advanced statistical tools (e.g., multivariate analysis, Mantel tests, etc.) who could address these questions rigorously. One of my superpowers is, in fact, multivariate analysis (don’t worry, I use my superpowers for good, not evil). It would be great if CNN eventually posted the raw data; somewhere out there, there has to be a really humongous Excel spreadsheet. By that I mean one row per respondent: “Voter #1036 is a Republican, voted for Bush, makes over \$200,000, etc.” Not only could we answer many interesting questions, but we might be able to see how divided the country is (i.e., whether certain positions are strongly correlated with each other).
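
To make the "strongly correlated" idea concrete, here is a minimal Python sketch of what one could do with respondent-level data. Everything in it is an assumption for illustration: the six records are made up (they are NOT exit-poll data), and the field names (voted_bush, income_over_50k, war_top_issue) are hypothetical. It cross-tabulates two yes/no answers and computes the phi coefficient, which is just the ordinary correlation coefficient for two binary variables. A real analysis would use proper multivariate methods and the survey weights, not this toy.

```python
from collections import Counter
from math import sqrt

# Made-up respondent-level records, for illustration only.
# These are NOT real exit-poll data; the field names are hypothetical.
respondents = [
    {"id": 1, "voted_bush": True,  "income_over_50k": True,  "war_top_issue": True},
    {"id": 2, "voted_bush": True,  "income_over_50k": True,  "war_top_issue": False},
    {"id": 3, "voted_bush": True,  "income_over_50k": False, "war_top_issue": True},
    {"id": 4, "voted_bush": False, "income_over_50k": False, "war_top_issue": False},
    {"id": 5, "voted_bush": False, "income_over_50k": True,  "war_top_issue": False},
    {"id": 6, "voted_bush": False, "income_over_50k": False, "war_top_issue": False},
]


def crosstab(rows, a, b):
    """Count respondents for each (answer to a, answer to b) combination."""
    return Counter((row[a], row[b]) for row in rows)


def phi(rows, a, b):
    """Phi coefficient between two yes/no fields (ranges from -1 to 1)."""
    table = crosstab(rows, a, b)
    n11 = table[(True, True)]
    n10 = table[(True, False)]
    n01 = table[(False, True)]
    n00 = table[(False, False)]
    # Product of the four marginal totals of the 2x2 table.
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One of the fields never varies, so the correlation is undefined.
        return float("nan")
    return (n11 * n00 - n10 * n01) / denom


if __name__ == "__main__":
    for field in ("income_over_50k", "war_top_issue"):
        print(field, round(phi(respondents, "voted_bush", field), 3))
```

With the real spreadsheet, the same two functions could be run over every pair of questions to see which positions travel together, which is exactly the kind of question the published summary tables can't answer.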

So before you buy into someone’s explanation, remember: they don’t really know the answer (except for me. Always listen to the Mad Biologist).

