# P-value

## Overview

In statistical hypothesis testing, the p-value is the probability of obtaining a result at least as extreme as the one actually observed, assuming that the null hypothesis is true (that is, that the result arose by chance alone). The fact that p-values are based on this assumption is crucial to their correct interpretation. The p-value is usually written as a decimal: a p-value below 0.05 means that, if the null hypothesis were true, a result at least this extreme would occur less than 5% of the time. The lower the p-value, the less compatible the observed result is with chance alone.

## Coin flipping example

For example, say an experiment is performed to determine whether a coin is fair (50% chance of landing heads) or biased, either toward heads (> 50% chance of landing heads) or toward tails (< 50% chance of landing heads). Since both biased alternatives are considered, a two-tailed test is performed. The null hypothesis is that the coin is fair, and that any deviation from the 50% rate can be ascribed to chance alone. Suppose that the experimental results show the coin turning up heads 14 times out of 20 total flips. The p-value of this result is the chance of a fair coin landing on heads at least 14 times out of 20 flips (as larger values are also less favorable to the null hypothesis of a fair coin) or landing on heads at most 6 times out of 20 flips (the equally extreme outcome in the other direction). Under the null hypothesis, the number of heads in 20 flips follows a binomial distribution with n = 20 and p = 0.5. The probability that 20 flips of a fair coin would result in 14 or more heads is 0.0577. Since this is a two-tailed test and the distribution is symmetric, the probability that 20 flips of the coin would result in 14 or more heads or 6 or fewer heads is 0.0577 × 2 = 0.115.
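The figures in the coin example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `binomial_tail_ge` is introduced here for illustration and does not come from any particular statistics package.

```python
from math import comb

def binomial_tail_ge(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_flips = 20
heads = 14

# Upper tail: probability of 14 or more heads from a fair coin.
one_tailed = binomial_tail_ge(heads, n_flips)

# A fair coin gives a symmetric distribution, so P(X <= 6) equals P(X >= 14)
# and the two-tailed p-value is simply twice the upper tail.
two_tailed = 2 * one_tailed

print(f"one-tailed: {one_tailed:.4f}")  # one-tailed: 0.0577
print(f"two-tailed: {two_tailed:.4f}")  # two-tailed: 0.1153
```

Doubling the tail is valid here only because the null distribution is symmetric; for a null hypothesis other than p = 0.5, the two tails would need to be computed separately.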

Generally, the smaller the p-value, the stronger the evidence against the hypothesis of a fair coin, and the more readily an observer would conclude that the results came from a biased coin.

## Interpretation

Generally, one rejects the null hypothesis if the p-value is smaller than or equal to the significance level, often represented by the Greek letter α (alpha). If the level is 0.05, the null hypothesis is rejected only when the observed result falls among the most extreme outcomes, which together have at most a 5% probability of occurring when the null hypothesis is true.

In the above example, the calculated p-value exceeds 0.05, and thus the null hypothesis - that the observed result of 14 heads out of 20 flips can be ascribed to chance alone - is not rejected. Such a finding is often stated as being "not statistically significant at the 5% level".

However, had a single extra head been obtained, the resulting two-tailed p-value would be about 0.041 (0.0207 × 2). This time the null hypothesis - that the observed result of 15 heads out of 20 flips can be ascribed to chance alone - is rejected. Such a finding would be described as being "statistically significant at the 5% level".
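The decision rule for the 14-head and 15-head cases can be sketched as follows. The function name `two_tailed_p_fair_coin` and the constant `ALPHA` are illustrative choices, not part of any standard library. The sketch doubles the upper tail, as in the coin example, so for 15 heads it gives a two-tailed value of about 0.041 (the one-tailed value is about 0.021).

```python
from math import comb

ALPHA = 0.05  # significance level, chosen before looking at the data

def two_tailed_p_fair_coin(heads, flips):
    """Two-tailed p-value under the null hypothesis of a fair coin."""
    # Take the more extreme of the two counts and double the upper tail;
    # this relies on the symmetry of the Binomial(flips, 0.5) distribution.
    k = max(heads, flips - heads)
    upper = sum(comb(flips, i) for i in range(k, flips + 1)) / 2**flips
    return min(1.0, 2 * upper)

for heads in (14, 15):
    p = two_tailed_p_fair_coin(heads, 20)
    verdict = "reject" if p <= ALPHA else "do not reject"
    print(f"{heads} heads: p = {p:.4f} -> {verdict} the null hypothesis")
# 14 heads: p = 0.1153 -> do not reject the null hypothesis
# 15 heads: p = 0.0414 -> reject the null hypothesis
```

A single additional head moves the result across the 0.05 threshold, which illustrates how sharply a fixed cut-off divides nearly identical data.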

Critics of p-values point out that the criterion used to decide "statistical significance" is based on the somewhat arbitrary choice of level (often set at 0.05). A proposed replacement for the p-value is p-rep.

## Frequent misunderstandings

There are several common misunderstandings about p-values.

1. The p-value is not the probability that the null hypothesis is true, although this reading is often invoked to justify the "rule" of treating p-values close to zero as significant.
   In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. Comparison of Bayesian and classical approaches shows that a p-value can be very close to zero while the posterior probability of the null is very close to unity. This is the Jeffreys-Lindley paradox.
2. The p-value is not the probability that a finding is "merely a fluke", another reading used to justify the "rule" of treating small p-values as "significant".
   As the calculation of a p-value is based on the assumption that a finding is the product of chance alone, it patently cannot simultaneously be used to gauge the probability of that assumption being true.
3. The p-value is not the probability of falsely rejecting the null hypothesis. This error is a version of the so-called prosecutor's fallacy.
4. The p-value is not the probability that a replicating experiment would not yield the same conclusion.
5. 1 − (p-value) is not the probability of the alternative hypothesis being true (see (1)).
6. The significance level of the test is not determined by the p-value.
   The significance level of a test is a value that should be decided upon by the person interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.
7. The p-value does not indicate the size or importance of the observed effect (compare with effect size). 
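
Point 7 can be illustrated with the coin example: the same observed proportion of heads produces very different p-values depending on the number of flips. The following is a minimal sketch; the function name `two_tailed_p_fair_coin` and the chosen sample sizes are illustrative assumptions, not taken from any particular study.

```python
from math import comb

def two_tailed_p_fair_coin(heads, flips):
    """Two-tailed p-value under the null hypothesis of a fair coin."""
    # Double the upper tail of the more extreme count (symmetric null).
    k = max(heads, flips - heads)
    upper = sum(comb(flips, i) for i in range(k, flips + 1)) / 2**flips
    return min(1.0, 2 * upper)

# Same observed proportion of heads (70%), different numbers of flips.
results = {}
for heads, flips in [(7, 10), (14, 20), (70, 100)]:
    p = two_tailed_p_fair_coin(heads, flips)
    results[(heads, flips)] = p
    print(f"{heads}/{flips} heads ({heads / flips:.0%}): p = {p:.2g}")
```

The observed effect (70% heads) is identical in all three cases, yet the p-value shrinks as the number of flips grows. A small p-value therefore reflects both the size of the effect and the amount of data, and cannot be read as a measure of effect size on its own.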