Iowahawk breaks out the calculator on poll reliability:
So if the sample size is 400, the margin of error is 1/20 = 5%; if the sample size is 625 the margin of error is 1/25 = 4%; if the sample size is 1000, it’s about 3%.
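The rule of thumb in the quote is the conservative 95% margin of error for a single proportion, 1.96·sqrt(p(1−p)/n). At p = 0.5 this is about 0.98/sqrt(n), which is usually rounded to 1/sqrt(n). Here is a minimal Python sketch that reproduces the quoted figures. It assumes simple random sampling, which is exactly the assumption the list below calls into question.

```python
import math

def moe_rule_of_thumb(n: int) -> float:
    """Rounded 95% margin of error for a proportion: 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def moe_exact(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Normal-approximation 95% margin of error: z * sqrt(p(1-p)/n)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (400, 625, 1000):
    print(f"n={n:5d}  1/sqrt(n)={moe_rule_of_thumb(n):.1%}  exact={moe_exact(n):.1%}")
```

The two versions differ by only a tenth of a point or so, which is why the rounded form is the one people quote.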
Works pretty well if you’re interested in hypothetical colored balls in hypothetical giant urns, or survival rates of plants in a controlled experiment, or defects in a batch of factory products. It may even work well if you’re interested in blind cola taste tests. But what if the thing you are studying doesn’t quite fit the balls & urns template?
- What if 40% of the balls have personally chosen to live in an urn that you legally can’t stick your hand into?
- What if 50% of the balls who live in the legal urn explicitly refuse to let you select them?
- What if the balls inside the urn are constantly interacting and talking and arguing with each other, and can decide to change their color on a whim?
- What if you have to rely on the balls to report their own color, and some unknown number are probably lying to you?
- What if you’ve been hired to count balls by a company who has endorsed blue as their favorite color?
- What if you have outsourced the urn-ball counting to part-time temp balls, most of whom happen to be blue?
- What if the balls inside the urn are listening to you counting out there, and it affects whether they want to be counted, and/or which color they want to be?
If one or more of the above statements is true, then the formula for margin of error simplifies to
Margin of Error = Who the hell knows?
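To see why the balls that refuse to be selected break the formula, imagine a hypothetical urn that is exactly half blue. Suppose blue balls agree to be counted 60% of the time and red ones 40% of the time. These response rates are made up purely for illustration. The resulting bias does not shrink as the sample grows, so the 1/sqrt(n) margin never reflects it:

```python
import math

def observed_share(true_blue: float, resp_blue: float, resp_red: float) -> float:
    """Expected blue share among respondents when response rates differ by color."""
    blue = true_blue * resp_blue
    red = (1 - true_blue) * resp_red
    return blue / (blue + red)

true_blue = 0.50  # hypothetical true share of blue balls
share = observed_share(true_blue, resp_blue=0.60, resp_red=0.40)
bias = share - true_blue

for n in (400, 1000, 10_000):
    moe = 1 / math.sqrt(n)  # the quoted rule of thumb; it ignores nonresponse
    print(f"n={n:6d}  reported MOE=±{moe:.1%}  nonresponse bias={bias:+.1%}")
```

With these assumed rates the poll reads 60% blue in an urn that is really 50% blue. A larger sample only makes that wrong answer look more precise.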
I think that the disparity among the polls is pretty good evidence of this. A lot of it, particularly the weighting, is guesswork, educated or otherwise. There's only one poll that matters (though with all of the chicanery going on, even that one is going to be in doubt, particularly if it's close on Tuesday). What a mess.
Unfortunately, the national polls don't matter much in a republic: it is the individual states that count. The polls in a number of key states show pretty solid leads for Obama, even accounting for margin of error. As the polls stand today, Obama has enough electoral votes to lock in a win. Either McCain is gonna have to put on his miracle perfume to woo the masses, or Obama is gonna have to screw things up pretty bad.
Still, I hold out hope for my pick: Robocop/Unicorn ’08
Josh, I tend to agree with you on remaining pessimistic. However, it did catch my attention that Penn is still in play. If it is in play, then many of these other states may be in play.
Whatever the case, applause to Iowahawk for simplifying the situation with the polls. True brilliance there.
Regardless of the results, I voted Saturday. That’s the only poll I’ve participated in this year outside of the Insta-polls.
I am less impressed than ever with polls.
1. Per Iowahawk, a sample of n=625 (and that’s larger than a lot of polls) gives you a margin of error of +/- 4%. So a poll with a lead of 8 points or less would be within the MOE.
2. Furthermore, as Bob Krumm points out, weekends matter. http://www.bobkrumm.com/blog/?p=2034. What polls as a 7-point lead on a weekend may actually be a 3-point lead if asked midweek.
3. Josh is correct that national polls mean next to nothing since electoral votes are won by state. State-by-state polling might be relevant but that would require, you know, more work.
4. Per Rand’s original post, poll samples are often (usually?) not random. Questioner bias, respondent selection and self-selection and even the very act of asking the “balls” what “color” they are affect results. Solution: Quantum polling. You heard it here first.
5. Finally, the only poll that matters is the one in which you cast your vote.
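The arithmetic in point 1 holds up. The margin is ±4 on each candidate, so a lead of up to about 8 points sits inside the noise. Within a single poll the two shares move against each other, which makes the margin on the lead roughly twice the margin on either share. Below is a sketch using the standard multinomial variance of a difference, Var(p̂A − p̂B) = (pA + pB − (pA − pB)²)/n. It assumes simple random sampling, and the 48/44 split with 8% undecided is a hypothetical example:

```python
import math

def lead_moe(p_a: float, p_b: float, n: int, z: float = 1.96) -> float:
    """95% margin of error on the lead p_a - p_b estimated from one sample."""
    var = (p_a + p_b - (p_a - p_b) ** 2) / n
    return z * math.sqrt(var)

print(f"{lead_moe(0.50, 0.50, 625):.1%}")  # two-way tie, no undecideds
print(f"{lead_moe(0.48, 0.44, 625):.1%}")  # hypothetical split, 8% undecided
```

At n = 625 both cases land at roughly 7.5 to 8 points, which matches the commenter's reading.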
I think the poll uncertainties are even worse than quoted. If I understand what’s going on in these polls, national or state, they are really doing three parallel polls out of that one sample: The voting intentions of identified Democrat, Republican, and Independent voters. So to get to a final set of numbers you need to factor in the uncertainties on, at least,
a) The number of D’s, R’s and I’s in the sample
b) The voting intentions of those groups
c) The relative weightings on each of those groups as given by your turn-out model.
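Item (c) can be partly quantified, but only partly. Weighting to a turnout model shrinks the effective sample size, and Kish's approximation gives that size as (Σw)²/Σw². The choice of turnout model also moves the top-line directly, and the quoted margin includes none of that. In the sketch below, every number is hypothetical: the party-ID counts, the subgroup support levels (which carry their own sampling error, item b), and both turnout models:

```python
import math

# Hypothetical subgroup results: share of each party ID backing candidate A.
support_a = {"D": 0.90, "R": 0.08, "I": 0.50}
# Hypothetical raw sample counts by party ID (n = 1000).
sample_counts = {"D": 420, "R": 280, "I": 300}

def topline(turnout: dict[str, float]) -> float:
    """Candidate A's overall share when groups are weighted to a turnout model."""
    return sum(turnout[g] * support_a[g] for g in support_a)

def kish_neff(turnout: dict[str, float]) -> float:
    """Kish effective sample size (sum w)^2 / sum(w^2) after weighting to turnout."""
    n = sum(sample_counts.values())
    # Each respondent in group g gets weight = target share / sample share.
    w = {g: turnout[g] / (sample_counts[g] / n) for g in sample_counts}
    sum_w = sum(sample_counts[g] * w[g] for g in w)
    sum_w2 = sum(sample_counts[g] * w[g] ** 2 for g in w)
    return sum_w ** 2 / sum_w2

# Two hypothetical turnout models that differ by a few points of party ID.
models = {
    "D+7": {"D": 0.39, "R": 0.32, "I": 0.29},
    "R+1": {"D": 0.35, "R": 0.36, "I": 0.29},
}
for name, turnout in models.items():
    n_eff = kish_neff(turnout)
    print(f"{name}: A={topline(turnout):.1%}  n_eff={n_eff:.0f}  "
          f"MOE=±{1 / math.sqrt(n_eff):.1%}")
```

With these made-up inputs the loss in effective sample size is small. Switching turnout models, however, moves candidate A's share by more than 3 points, about the size of the entire quoted margin, and nothing in the reported margin accounts for it.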
I’m amazed that these are given any credibility at all.
So, tell me. Why don’t they just do away with the subsampling, and instead, as in most places, take a representative demographic sample of the population as a whole, ask who they’re voting for, and report that? Who cares what they’re registered as.
Why don’t they just do away with the subsampling, and instead, as in most places, take a representative demographic sample of the population as a whole, ask who they’re voting for, and report that?
Honestly? Two reasons. First, because Republicans would do better, which would piss off their major customers, the news media.
Secondly, by endlessly "correcting" the data, they can control how the polls change over time. You'd see less volatility. That, too, would piss off the news media, because they like the horse-race aspect: the more tension, the more papers get sold and the more viewers your 60-minute TV special draws. Now The One is ahead by a nose, now the evil challenger is catching up, now it's neck 'n' neck, tune in tomorrow and watch 150 advertisements while we Analyze The Issues and the Race. Brought to you by GoodCorp, makers of Soylent Green…
Also, even if the polls show Obama having the race sewn up, the polls aren't always representative of the final result of the actual election, the only poll that matters.
So, tell me. Why don’t they just do away with the subsampling, and instead, as in most places, take a representative demographic sample of the population as a whole, ask who they’re voting for, and report that? Who cares what they’re registered as.
It’s my understanding that some polls do just that. And they haven’t diverged too far from other polls that use weighting by party ID.
The one thing that Repubs can hope for this year is that for whatever reason (Black, Celebrity, Marxist, Gay, Transgendered, whatever) a significant fraction of those willing to be polled are just lying about their real intentions, because they think it's cool to say they will vote for Obama. Of course, the cross-correlation with their responses to other questions in the same poll doesn't give this idea much credence.
If the poll averages at RCP for example are accurate to within about 3 percentage points (add 3 to favor the Mavericks) it’s probably an Obama win. I still worry about PA though.