It’s all the rage to argue that science is facing a crisis because so many results can’t be repeated. David Colquhoun has a really nice explanation why replication experiments fail so often. Before we get to his explanation, we’ll have to march through a few ugly terms (but it will be worth it!). First, there’s “P(real)”, which is the probability that there is a real effect. For example, if ten percent of drugs tested have a real effect (assume we’re omniscient), then P(real) would equal 0.1. “Power” is the probability that a given experiment will detect that real effect. If the power = 0.8, then eighty percent of the time when there is a real biological effect, it will be observed (most studies shoot for a power of 0.8 or higher); of course, this also means twenty percent of real effects will be missed. Finally, there’s the significance level, usually referred to as p (or a ‘p-value’). I’ll use Colquhoun’s definition:
The P value is the probability that you would find a difference as big as that observed, or a still bigger value, if in fact A and B were identical.
With a p-value threshold of 0.05, the typical minimum, you’ll declare a “significant” effect in five percent of the experiments where there is actually nothing going on. Then there’s the false discovery rate (FDR), which is the percentage of positive results that are actually false positives (i.e., you believe your drug works, but it doesn’t). Colquhoun makes a cartoon example using some reasonable values (power = 0.8, P(real) = 0.1, p = 0.05):
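The cartoon itself isn’t reproduced here, but the arithmetic behind it is easy to sketch. Using the values from the text (the 1,000-test population is just an arbitrary round number for illustration):

```python
# Sketch of Colquhoun's cartoon arithmetic (values from the text;
# the 1,000-test population size is an arbitrary round number).
n_tests = 1000
p_real = 0.1   # fraction of drugs with a real effect
power = 0.8    # chance an experiment detects a real effect
alpha = 0.05   # significance threshold (the p-value cutoff)

real = n_tests * p_real                      # 100 drugs actually work
true_positives = real * power                # 80 of those get detected
false_positives = (n_tests - real) * alpha   # 45 spurious "discoveries"

fdr = false_positives / (false_positives + true_positives)
print(f"False discovery rate: {fdr:.0%}")
```

Of 125 “significant” results, 45 are spurious: 45 / 125 = 0.36.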
Yep, 36 percent false positives with a p-value of 0.05. Depressingly, many studies are underpowered (i.e., power is lower than 0.8) and most treatments won’t work, so P(real) is much less than 0.1. If you want to play around, here’s a generalized version of the cartoon:
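The generalized version isn’t shown here, but a minimal stand-in (the function name and interface are my own) might look like:

```python
def false_discovery_rate(power, p_real, alpha):
    """Expected fraction of 'significant' results that are false positives."""
    true_pos = power * p_real          # real effects that get detected
    false_pos = alpha * (1 - p_real)   # null effects that sneak past the threshold
    return false_pos / (false_pos + true_pos)

print(false_discovery_rate(0.8, 0.1, 0.05))   # the cartoon values: ~0.36
print(false_discovery_rate(0.4, 0.01, 0.05))  # underpowered + rare effects: ~0.93
```

The second call previews the data-mining scenario below: drop the power and shrink P(real), and most of your “discoveries” are noise.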
With Big Data and data mining, P(real) will often be very small. Thanks to limited budgets, power will also be low. Rub these two things together, and we should expect high false discovery rates. Put another way, we get the science we pay for: mining existing data and running low numbers of replicates are cheaper than funding research at the scale the experiments require.
Though it does keep things interesting…
An aside: Don’t even think the word Bayesian. We’re not having that argument…