Just a couple hours ago, Markos Moulitsas (you know, the Daily Kos) announced that he's suing his former polling group, Research 2000. Evidently, they faked a lot of their data over the last 19 months, and didn't even have the decency to fake it well.
The statistical case against R2000 is made here. Nate Silver chimes in here.
The punchline:
"We do not know exactly how the weekly R2K results were created, but we are confident they could not accurately describe random polls."

What's interesting to me about the situation is how clear-cut the evidence is. Mark Grebner, Michael Weissman, and Jonathan Weissman (GWW) conduct three tests to show that the numbers are cooked. Each statistical test gives results that are devastatingly improbable:
- Probability that men's and women's numbers match in odd/even parity as often as they do: equivalent to 776 heads on 778 tosses of a fair coin, or roughly 1 in $5 \times 10^{228}$.
- Probability of consistency of results in small samples: ~1 in $10^{16}$.
- Probability of missing zeroes in a time trend: ~1 in $10^{16}$.
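The coin-toss figure is easy to check exactly. Here's a minimal sketch, assuming (as the coin analogy does) that each men/women pair independently has a 50% chance of matching parity. It sums the binomial tail for 776 or more matches out of 778 and reports the base-10 logarithm, since the raw probability is far too small to read comfortably. The function name is mine, not GWW's.

```python
from math import comb, log10

def log10_upper_tail(successes: int, trials: int) -> float:
    """log10 of P(X >= successes) for X ~ Binomial(trials, 1/2), computed exactly."""
    # Count every outcome with at least `successes` heads; Python ints never overflow.
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    # P = favorable / 2**trials, so subtract logs instead of dividing huge numbers.
    return log10(favorable) - trials * log10(2)

p_log = log10_upper_tail(776, 778)
print(f"P(>= 776 matches in 778 pairs) ~ 10^{p_log:.1f}")
```

The exponent comes out a little below -228, which is where the "1 in $5 \times 10^{228}$" figure comes from. The exact count only has three terms (776, 777, and 778 heads), so no approximation is needed.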
Statistically, there's almost no wiggle room here. Until the R2000 folks come forward with some really compelling counterevidence, I'm convinced they're frauds.
An interesting feature of the last two patterns is that they hint at overeagerness to give results. Patterns in the data are *too* clear, and the detailed crosstabs supplied by R2000 made it easier to catch them. If they had been willing to dump some random noise into the equations -- blurring the results -- it would have been much harder to build a conclusive case against them.
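To see why honest noise matters, here's a hypothetical simulation of a legitimate weekly tracking poll. My reading of the "missing zeroes" test is that real polls, because of sampling error alone, regularly show a week-to-week change of exactly zero points. The parameters (1,000 respondents a week, support fixed at 50%, whole-percent rounding) are assumptions for illustration, not R2000's actual design.

```python
import random

def simulate_weekly_polls(weeks: int, sample_size: int, true_share: float, seed: int) -> list[int]:
    """Hypothetical honest tracking poll: each week draws a fresh random sample
    and reports the percentage supporting a candidate, rounded to a whole point."""
    rng = random.Random(seed)
    results = []
    for _ in range(weeks):
        # Each respondent independently says "yes" with probability true_share.
        supporters = sum(rng.random() < true_share for _ in range(sample_size))
        results.append(round(100 * supporters / sample_size))
    return results

def fraction_unchanged(series: list[int]) -> float:
    """Share of week-to-week changes that are exactly zero points."""
    changes = [b - a for a, b in zip(series, series[1:])]
    return sum(1 for c in changes if c == 0) / len(changes)

polls = simulate_weekly_polls(weeks=2000, sample_size=1000, true_share=0.5, seed=2010)
print(f"weeks with no change: {fraction_unchanged(polls):.1%}")
```

Under these assumptions, something like one week in six shows no change at all. A fabricated series that almost never repeats a number is conspicuous precisely because real sampling noise keeps producing those flat weeks.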
This jibes with my experience doing survey research for clients: nobody wants to pay for null results. Even when your data collection is legitimate, there's always pressure to twist and squeeze the data to wring out "findings" that aren't really there. Three cheers for transparency and the scientific method.