There’s a PLoS ONE article, “High Impact = High Statistical Standards? Not Necessarily So,” that will probably get a lot of discussion. It describes the misuse of null hypothesis significance testing (“NHST”), which the authors define as:
NHST starts by assuming that a null hypothesis, H0, is true, where H0 is typically a statement of zero effect, zero difference, or zero correlation in the population of interest. A p value is then calculated, where p is the probability, if H0 is true, of obtaining the observed result, or more extreme. A low p value, typically p<.05, throws doubt on H0 and leads to the rejection of H0 and a conclusion that the effect in question is statistically significant.
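To make that recipe concrete, here’s a minimal sketch of NHST in Python (scipy’s two-sample t test on made-up data; this is my illustration, not anything from the paper):

```python
# A minimal NHST sketch: two made-up groups, a two-sample t test,
# and the conventional p < .05 decision rule. Illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=0.0, scale=1.0, size=30)  # H0-style group: true mean 0
treated = rng.normal(loc=0.5, scale=1.0, size=30)  # group with a real 0.5 SD effect

t_stat, p_value = stats.ttest_ind(treated, control)
print(f"t = {t_stat:.2f}, two-tailed p = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```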
No doubt, this will lead to all sorts of philosophy of statistics arguments, which I will avoid here. However, there are two points from the paper that are worth noting for statistical padawans (boldface mine):
The second limitation is that the p value is very likely to be quite different if an experiment is repeated. For example if a two-tailed result gives p = 0.05, there is an 80% chance the one-tailed p value from a replication will fall in the interval (.00008, .44), a 10% chance that p<.00008, and fully a 10% chance that p>.44. In other words, a p value provides only extremely vague information about a result’s repeatability. Researchers do not appreciate this weakness of p.
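As best I can tell, that interval comes from treating the true effect as unknown: given an observed z of about 1.96 (two-tailed p = .05), a replication’s z ends up roughly normal around 1.96 with standard deviation sqrt(2). A quick Monte Carlo along those lines (my assumptions, not the paper’s code) reproduces the quoted numbers:

```python
# Monte Carlo sketch of the "p interval" for a replication, under my reading
# of the model: the original study observed z = 1.96 (two-tailed p = .05),
# the true effect is uncertain, and the replication has the same design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims = 1_000_000
z_obs = stats.norm.isf(0.05 / 2)        # 1.96: two-tailed p = .05

mu = rng.normal(z_obs, 1.0, n_sims)     # plausible true effects given z_obs
z_rep = rng.normal(mu, 1.0)             # the replication's observed z
p_rep = stats.norm.sf(z_rep)            # one-tailed replication p values

lo, hi = np.percentile(p_rep, [10, 90])
print(f"80% of replication p values fall in ({lo:.5f}, {hi:.2f})")
# Prints roughly (.00008, .44), matching the quoted interval.
```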
The third limitation is that the conclusion “Yes, there is a difference from zero” is almost always true. In other words, the null hypothesis is almost never exactly correct. The probability that H0 will be rejected increases with the sample size (N), so the result of NHST says as much, or more, about N [the size of the experiment] as about any hypothesis. One example is that a very low two-tailed correlation coefficient r = 0.10 is not sufficient to reject the H0 of a zero true correlation with p<0.05, up to N = 380 participants. Above this number, H0 can be rejected.
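That cutoff is easy to check with the standard t transform of a Pearson correlation, t = r*sqrt(N-2)/sqrt(1-r^2) with N-2 degrees of freedom; this sketch lands within a few participants of the paper’s N = 380, presumably a rounding difference:

```python
# Find where r = 0.10 starts rejecting H0 (zero true correlation) at
# two-tailed p < .05, via the usual t transform of a Pearson r.
import numpy as np
from scipy import stats

def p_for_r(r: float, n: int) -> float:
    """Two-tailed p value for an observed correlation r with n participants."""
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
    return 2 * stats.t.sf(t, df=n - 2)

r = 0.10
for n in (100, 380, 385, 1000):
    print(f"N = {n:4d}: p = {p_for_r(r, n):.4f}")
# p dips below .05 around N = 385 here: with a big enough sample,
# even r = 0.10 becomes "statistically significant".
```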
These might have something to do with the Decline Effect. Just saying.