In statistical hypothesis testing, the p-value is the probability, computed under the assumption that the null hypothesis is true, of obtaining a result at least as extreme as the one actually observed. That the p-value is conditioned on this assumption is crucial to its correct interpretation. The p-value may be noted as a decimal: p-value < 0.05 means that, if the null hypothesis were true, a result at least this extreme would occur less than 5% of the time. The lower the p-value, the less likely such a result would be to arise under the null hypothesis.
Coin flipping example
For example, say an experiment is performed to determine if a coin flip is fair (50% chance of landing heads or tails), or unfairly biased, either toward heads (> 50% chance of landing heads) or toward tails (< 50% chance of landing heads). Since we consider both biased alternatives, a two-tailed test is performed. The null hypothesis is that the coin is fair, and that any deviations from the 50% rate can be ascribed to chance alone. Suppose that the experimental results show the coin turning up heads 14 times out of 20 total flips. The p-value of this result would be the chance of a fair coin landing on heads at least 14 times out of 20 flips (as larger values in this case are also less favorable to the null hypothesis of a fair coin) or landing on tails at most 6 times out of 20 flips. In this case the number of heads T has a binomial distribution. The probability that 20 flips of a fair coin would result in 14 or more heads is 0.0577. Since this is a two-tailed test, the probability that 20 flips of the coin would result in 14 or more heads or 6 or fewer heads is 0.0577 × 2 = 0.115.
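Since T follows a Binomial(20, 0.5) distribution under the null hypothesis, the tail probabilities above can be checked directly. The following is a minimal sketch using only the Python standard library; the function name binom_tail is illustrative, not from any particular package:

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(T >= k) for T ~ Binomial(n, p): sum the upper-tail probabilities."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability of 14 or more heads in 20 flips of a fair coin
one_tailed = binom_tail(14, 20)

# Two-tailed p-value: by symmetry of the fair coin, the lower tail
# (6 or fewer heads) contributes an equal amount, so double the upper tail
two_tailed = 2 * one_tailed

print(round(one_tailed, 4))  # 0.0577
print(round(two_tailed, 4))  # 0.1153
```

Doubling the one-tailed probability is valid here only because the null distribution is symmetric about 10 heads; for a biased null hypothesis the two tails would have to be summed separately.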
Generally, the smaller the p-value, the stronger the evidence that the results came from a biased coin.
Generally, one rejects the null hypothesis if the p-value is smaller than or equal to the significance level, often represented by the Greek letter α (alpha). If the level is 0.05, then results at least as extreme as those observed would occur only 5% of the time, given that the null hypothesis is true.
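The decision rule reduces to a single comparison against a level fixed before the data are seen. A minimal sketch, using the 0.05 level from the example in the text:

```python
def reject_null(p_value, alpha=0.05):
    """Reject the null hypothesis when the p-value is at or below the
    significance level alpha, which must be chosen before seeing the data."""
    return p_value <= alpha

print(reject_null(0.115))   # False: not significant at the 5% level
print(reject_null(0.0414))  # True: significant at the 5% level
```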
In the above example, the calculated p-value exceeds 0.05, and thus the null hypothesis - that the observed result of 14 heads out of 20 flips can be ascribed to chance alone - is not rejected. Such a finding is often stated as being "not statistically significant at the 5% level".
However, had a single extra head been obtained, the resulting two-tailed p-value would be 0.0414. This time the null hypothesis - that the observed result of 15 heads out of 20 flips can be ascribed to chance alone - is rejected. Such a finding would be described as being "statistically significant at the 5% level".
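Repeating the tail calculation with 15 heads confirms that the two-tailed p-value now falls below 0.05. The same standard-library sketch as before, with binom_tail again an illustrative name:

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(T >= k) for T ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 15 heads out of 20: double the upper tail for the two-tailed p-value
p_two_tailed = 2 * binom_tail(15, 20)
print(round(p_two_tailed, 4))  # 0.0414
```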
Critics of p-values point out that the criterion used to decide "statistical significance" is based on the somewhat arbitrary choice of level (often set at 0.05). A proposed replacement for the p-value is p-rep.
There are several common misunderstandings about p-values.
- The p-value is not the probability that the null hypothesis is true (a false belief sometimes used to justify the "rule" of considering p-values close to 0 (zero) as significant).
- The p-value is not the probability that a finding is "merely a fluke" (again, justifying the "rule" of considering small p-values as "significant").
- As the calculation of a p-value is based on the assumption that a finding is the product of chance alone, it patently cannot simultaneously be used to gauge the probability of that assumption being true.
- The p-value is not the probability of falsely rejecting the null hypothesis. This error is a version of the so-called prosecutor's fallacy.
- The p-value is not the probability that a replicating experiment would not yield the same conclusion.
- 1 − (p-value) is not the probability of the alternative hypothesis being true (see the first point above).
- The significance level of the test is not determined by the p-value.
- The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.
- The p-value does not indicate the size or importance of the observed effect (compare with effect size).
- Free p-Value Calculator for the Chi-Square test from Daniel Soper's Free Statistics Calculators website. Computes the one-tailed probability value of a chi-square test (i.e., the area under the chi-square distribution from the chi-square value to infinity), given the chi-square value and the degrees of freedom.
- Free p-Value Calculator for the Fisher F-test from Daniel Soper's Free Statistics Calculators website. Computes the probability value of an F-test, given the F-value, numerator degrees of freedom, and denominator degrees of freedom.
- Free p-Value Calculator for the Student t-test from Daniel Soper's Free Statistics Calculators website. Computes the one-tailed and two-tailed probability values of a t-test, given the t-value and the degrees of freedom.
- Understanding P-values, Jim Berger's page with links to various websites about p-values, and a Java applet that illustrates how the numerical values of p-values can give quite misleading impressions about the truth or falsity of the hypothesis under test.
- Dallal GE (2007) Historical background to the origins of p-values and the choice of 0.05 as the cut-off for significance
- Hubbard R, Armstrong JS (2005) Historical background on the widespread confusion of the p-value (PDF)
- Fisher's method for combining independent tests of significance using their p-values
- Duffy ME, Munroe BH, Jacobsen BS. Sifting the evidence — what's wrong with significance tests?
- Sterne JAC, Smith GD (2001). "Sifting the evidence — what's wrong with significance tests?". BMJ. 322 (7280): 226–231.