How the UK Government Spun 136 People into 7 Million -- a radio show looked into the government's claim of 7 million illegal filesharers and discovered that it came down to 136 people in a survey admitting they had used file-sharing.
But when I read the original article at PC Pro, I was moved to reply:
Your headline is more misleading than what the government originally reported. The 136 was 11.6% of the responses to a "survey of 1,176 net-connected households". That's a good sample size for a survey.
Why is it possible (if you ask a well-worded question and pick people randomly) to accurately gauge what's happened in a large population by asking only 1,000 people to respond to a survey?
[pinero50 says "Extrapolating 3.9 million from a sample of ~1000 odd still seems pretty suspect to me." Pinero, I know it seems strange, but that's a basic thing you learn when studying statistics. A sample size on the order of 1,000 gives very accurate results, if picked randomly.]
There were problems, including the involvement of an interested party in the research, but a more sensible comparison would be 3.9 million (mentioned at the end of the article) vs 7 million.
Commenters are right to question the wording used in the survey. If it just mentioned file-sharing, all bets are off as to how many people share illegally.
It's counter-intuitive at first. But try this thought experiment: Imagine a silo full of grains of corn. You want to pull some out and examine it to get a sense of the quality of the corn. Imagine that this is a high-tech silo, and the corn gets thoroughly mixed, so if you reach in and pull out a scoop, it will be a good random sample of the corn. Can you see that it doesn't really matter whether the silo is 10 feet tall or a hundred? The cup of corn you pull out should give you a sense of what's in there, as long as there's not too much variability (i.e., as long as the kernels are all close to the same size, water content, etc.).
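If you'd rather see the silo idea than take it on faith, here is a small simulation sketch in Python. Everything in it is my own assumption for illustration: the function names, the silo sizes, the scoop of 1,000, and a "true" rate of 11.6% borrowed from the survey figure. It draws many random scoops from a small silo and a large one and measures how much the scoop estimates wobble.

```python
import random
import statistics

def sample_estimate(population_size, true_rate, sample_size, rng):
    """Draw one random scoop (without replacement) and return the share of 'yes' items."""
    # Assumption: the first yes_count indices stand for the 'yes' kernels;
    # random sampling plays the role of the well-mixed silo.
    yes_count = round(true_rate * population_size)
    picks = rng.sample(range(population_size), sample_size)
    return sum(1 for i in picks if i < yes_count) / sample_size

def spread_of_estimates(population_size, true_rate=0.116,
                        sample_size=1000, trials=200, seed=1):
    """Standard deviation of the scoop estimates across many repeated scoops."""
    rng = random.Random(seed)
    estimates = [sample_estimate(population_size, true_rate, sample_size, rng)
                 for _ in range(trials)]
    return statistics.pstdev(estimates)

small_silo = spread_of_estimates(100_000)      # hypothetical small population
large_silo = spread_of_estimates(10_000_000)   # hypothetical population 100x larger

print(f"spread with 100,000 kernels:    {small_silo:.4f}")
print(f"spread with 10,000,000 kernels: {large_silo:.4f}")
```

Both spreads should come out at about one percentage point, even though one silo is a hundred times bigger than the other. What drives the wobble is the size of the scoop, not the size of the silo.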
The people surveyed are like the kernels in the scoop. In statistics courses, you learn to put together something called a confidence interval, so you can come up with a precise way of talking about how accurately the sample reflects the population. You end up with a statement along the lines of: "We're 95% confident that 3.9 million people are doing illegal file-sharing, with an error margin of 0.05 million."
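For the curious, here is a minimal sketch of how such an interval can be worked out for the survey's own numbers (136 out of 1,176). It uses the textbook normal approximation for a proportion, which is my choice of method, not necessarily what the researchers did, and it assumes a simple random sample. The function name `proportion_ci` is made up for this example.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    z=1.96 corresponds to roughly 95% confidence.
    Assumes a simple random sample; real surveys often need weighting.
    """
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    margin = z * standard_error
    return p, margin, (p - margin, p + margin)

p, margin, (low, high) = proportion_ci(136, 1176)
print(f"estimate: {p:.1%}, margin of error: {margin:.1%}")
print(f"95% interval: {low:.1%} to {high:.1%}")
```

This gives about 11.6% plus or minus 1.8 percentage points. Turning that into a head count means multiplying by a population figure, which the article doesn't give me, so treat the error margin in my quoted sentence as illustrating the form of the statement, not as a calculated value.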
A more accurate (and less attention-grabbing) title? Perhaps How the UK Government Spun 3.9 Million People into 7 Million. But who'd buy your news if it weren't inflammatory?
*My title takes off from a charming little book called How to Lie with Statistics, by Darrell Huff, written in 1954, an enjoyable and educational read.