Two Issues With Null Hypothesis Testing

There’s a PLoS ONE article, “High Impact = High Statistical Standards? Not Necessarily So,” that will probably get a lot of discussion. It describes the misuse of null hypothesis significance testing (“NHST”), which it defines as follows:

NHST starts by assuming that a null hypothesis, H0, is true, where H0 is typically a statement of zero effect, zero difference, or zero correlation in the population of interest. A p value is then calculated, where p is the probability, if H0 is true, of obtaining the observed result, or more extreme. A low p value, typically p<.05, throws doubt on H0 and leads to the rejection of H0 and a conclusion that the effect in question is statistically significant.
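To make that definition concrete, here is a minimal sketch of a two-tailed p value calculation. It assumes the test statistic is a z score that is standard normal under H0; the paper does not specify a particular test, and the function name `two_tailed_p` is mine, for illustration only.

```python
from statistics import NormalDist


def two_tailed_p(z: float) -> float:
    """Probability, if H0 is true, of a z statistic at least as extreme as `z`.

    Assumes the statistic is standard normal under H0 (an illustrative choice).
    """
    # Area in both tails beyond |z| under the standard normal curve.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# A z score of 1.96 sits right at the conventional p < .05 threshold.
print(round(two_tailed_p(1.96), 3))  # 0.05
```

Note that nothing in this calculation mentions the hypothesis you actually care about: the only input is how surprising the data would be if H0 were true.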

No doubt, this will lead to all sorts of philosophy-of-statistics arguments, which I will avoid here. However, two points from the paper are worth noting for statistical padawans (boldface mine):

The second limitation is that the p value is very likely to be quite different if an experiment is repeated. For example if a two-tailed result gives p = 0.05, there is an 80% chance the one-tailed p value from a replication will fall in the interval (.00008, .44), a 10% chance that p<.00008, and fully a 10% chance that p>.44 [7]. In other words, a p value provides only extremely vague information about a result’s repeatability. Researchers do not appreciate this weakness of p [8].
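The quoted interval can be roughly reproduced with a short calculation. This is a sketch under assumptions I am supplying, not necessarily the exact method of reference [7]: both experiments are treated as z tests with the same sample size, and nothing is assumed about the true effect beyond what the first result says. Under those assumptions the replication z score is normal, centered on the observed z, with standard deviation sqrt(2), since both experiments contribute sampling error. The function name `replication_p_interval` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def replication_p_interval(p_two_tailed: float, coverage: float = 0.80):
    """Interval expected to contain the one-tailed p value of a replication.

    Assumption (mine): z_rep ~ Normal(z_obs, sqrt(2)), i.e. equal-sized z tests
    and no prior knowledge of the true effect.
    """
    # z score corresponding to the original two-tailed p value.
    z_obs = STD_NORMAL.inv_cdf(1 - p_two_tailed / 2)
    # Predictive distribution of the replication's z score.
    z_rep = NormalDist(mu=z_obs, sigma=2 ** 0.5)
    tail = (1 - coverage) / 2
    z_low, z_high = z_rep.inv_cdf(tail), z_rep.inv_cdf(1 - tail)
    # A larger z gives a smaller one-tailed p, so the bounds swap.
    return 1 - STD_NORMAL.cdf(z_high), 1 - STD_NORMAL.cdf(z_low)


low, high = replication_p_interval(0.05)
print(f"80% interval for the replication p: ({low:.5f}, {high:.2f})")
```

With an original p of 0.05 this comes out at approximately (.00008, .44), matching the figures in the quote. The width comes almost entirely from that sqrt(2): a single experiment pins down the true effect only loosely, and the replication adds its own noise on top.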

The third limitation is that the conclusion “Yes, there is a difference from zero” is almost always true. In other words, the null hypothesis is almost never exactly correct. The probability that H0 will be rejected increases with the sample size (N), so the result of NHST says as much, or more, about N [the size of the experiment] as about any hypothesis. One example is that a very low two-tailed correlation coefficient r = 0.10 is not sufficient to reject the H0 of a zero true correlation with p<0.05, up to N = 380 participants. Above this number, H0 can be rejected.
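The sample-size threshold in that example is easy to check. The sketch below uses the Fisher z approximation, in which atanh(r) * sqrt(N - 3) is treated as standard normal under H0; that choice is my assumption, and the paper may have used an exact t test, which gives a slightly different cutoff. The function name `smallest_significant_n` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def smallest_significant_n(r: float, alpha: float = 0.05) -> int:
    """Smallest N at which a sample correlation r rejects H0: rho = 0 (two-tailed).

    Uses the Fisher z approximation (an illustrative assumption).
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be nonzero and strictly between -1 and 1")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    n = 4  # the approximation needs N > 3
    # Grow N until the test statistic clears the critical value.
    while atanh(abs(r)) * sqrt(n - 3) <= z_crit:
        n += 1
    return n


print(smallest_significant_n(0.10))
```

For r = 0.10 this lands in the mid-380s, close to the figure quoted above. The correlation itself never changes in this calculation; only N does, which is exactly the point of the passage.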

These might have something to do with the Decline Effect. Just saying.


2 Responses to Two Issues With Null Hypothesis Testing

  1. Min says:

    Another serious problem is that null hypothesis testing typically inverts the logic of falsifiability. The hypothesis being tested is not the hypothesis of interest. True, evidence against the null hypothesis is taken as evidence in favor of the hypothesis of interest. However, it is also evidence in favor of every other hypothesis than the null. Confirmatory evidence is very weak.

    • Min says:

      I should amend my statement. We may be able to conclude that the data would also disconfirm other hypotheses than the null, even without performing the tests. For instance, if we toss a coin five times and it comes up heads every time, that is evidence against the null hypothesis that the coin is unbiased. It is also evidence against the hypothesis that the coin will come up tails 3/4 of the time. 🙂
