# Likelihood function

**Likelihood** as a solitary term is a shorthand for **likelihood function**. In non-technical usage, "likelihood" is a synonym for "probability", but throughout this article only the technical definition is used. Informally, if "probability" allows us to predict unknown outcomes based on known parameters, then "likelihood" allows us to determine unknown parameters based on known outcomes.

In a sense, likelihood works backwards from probability: given *B*, we use the conditional probability Pr(*A*|*B*) to reason about *A*, and, given *A*, we use the likelihood function *L*(*B*|*A*) to reason about *B*. This mode of reasoning is formalized in Bayes' theorem:

$$\Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A)}.$$

In statistics, a **likelihood function** is a conditional probability function considered as a function of its *second* argument with its first argument held fixed, thus:

$$b \mapsto \Pr(A \mid B = b),$$

and also any other function proportional to such a function. That is, the likelihood function for *B* is the equivalence class of functions

$$L(b \mid A) = \alpha \, \Pr(A \mid B = b)$$

for any constant of proportionality $\alpha > 0$. Thus the numerical value $L(b \mid A)$ is immaterial; all that matters are ratios of the form

$$\frac{L(b_2 \mid A)}{L(b_1 \mid A)},$$

since these are invariant with respect to the constant of proportionality.
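The invariance of likelihood ratios can be checked numerically. The following is a minimal sketch, not part of the original article: it assumes a hypothetical observation of 7 heads in 10 coin tosses, and the names `binom_prob`, `likelihood`, `likelihood_ratio` and `alpha` are illustrative only.

```python
from math import comb


def binom_prob(k, n, p):
    """Pr(k heads in n tosses | heads probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def likelihood(p, k=7, n=10, alpha=1.0):
    """One member of the equivalence class: alpha * Pr(A | B = p).

    The observation A (k heads in n tosses) is an assumed example.
    """
    return alpha * binom_prob(k, n, p)


def likelihood_ratio(p2, p1, alpha=1.0):
    """Ratio L(p2 | A) / L(p1 | A) for a given scaling constant alpha."""
    return likelihood(p2, alpha=alpha) / likelihood(p1, alpha=alpha)


# The individual values change with alpha, but the ratio does not.
ratio_a = likelihood_ratio(0.7, 0.5, alpha=1.0)
ratio_b = likelihood_ratio(0.7, 0.5, alpha=42.0)
print(ratio_a, ratio_b)
```

The two printed ratios agree up to floating-point rounding, whereas `likelihood(0.7, alpha=1.0)` and `likelihood(0.7, alpha=42.0)` differ by a factor of 42.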

For more about making inferences via likelihood functions, see also the method of maximum likelihood, and likelihood-ratio testing.

## Concentrated likelihood

For a likelihood function of more than one parameter, it is sometimes possible to write some parameters as functions of other parameters, thereby reducing the number of independent parameters. (Each such function gives the value of the eliminated parameter that maximises the likelihood for given values of the other parameters.) This procedure is called concentration of the parameters and results in the concentrated likelihood function.

For example, consider a regression analysis model with normally distributed errors. The most likely value of the error variance is the variance of the residuals. The residuals depend on all other parameters. Hence the variance parameter can be written as a function of the other parameters.
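The regression case can be sketched in a few lines. This is an illustrative example under stated assumptions, not a reference implementation: it assumes a simple linear model with intercept `a`, slope `b` and independent normal errors of variance `sigma2`, uses simulated data, and takes "the variance of the residuals" to mean the residual sum of squares divided by the sample size. All function names are hypothetical.

```python
import math
import random


def residual_sum_of_squares(a, b, xs, ys):
    """Sum of squared residuals of the line y = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def log_likelihood(a, b, sigma2, xs, ys):
    """Full normal log-likelihood in all three parameters."""
    n = len(xs)
    rss = residual_sum_of_squares(a, b, xs, ys)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)


def sigma2_hat(a, b, xs, ys):
    """Variance that maximises the likelihood for the given a and b."""
    return residual_sum_of_squares(a, b, xs, ys) / len(xs)


def concentrated_log_likelihood(a, b, xs, ys):
    """Log-likelihood with sigma2 replaced by sigma2_hat(a, b)."""
    n = len(xs)
    return -0.5 * n * (math.log(2 * math.pi * sigma2_hat(a, b, xs, ys)) + 1.0)


# Simulated data (an assumption made purely for illustration).
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(50)]
ys = [1.0 + 2.0 * x + random.gauss(0, 0.5) for x in xs]

a, b = 0.8, 2.1
s2 = sigma2_hat(a, b, xs, ys)
print(concentrated_log_likelihood(a, b, xs, ys))
print(log_likelihood(a, b, s2, xs, ys))
```

The two printed values coincide, and any other choice of `sigma2` for the same `a` and `b` gives a smaller full log-likelihood. The concentrated function therefore depends on two parameters instead of three.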

## Historical remarks

Some early ideas about likelihood appeared in a book by Thorvald N. Thiele published in 1889^{[1]}.
The first paper in which the full idea of "likelihood" appears was written by R.A. Fisher in 1922^{[2]}: "On the mathematical foundations of theoretical statistics". In that paper, Fisher also uses the term "method of maximum likelihood". Fisher argues against inverse probability as a basis for statistical inferences, and instead proposes inferences based on likelihood functions.

## Likelihood function of a parameterized model

Among many applications, we consider here one of broad theoretical and practical importance. Given a parameterized family of probability density functions

$$x \mapsto f(x \mid \theta),$$

where θ is the parameter (in the case of discrete distributions, the probability density functions are probability "mass" functions), the **likelihood function** is

$$L(\theta \mid x) = f(x \mid \theta),$$

where *x* is the observed outcome of an experiment. In other words, when *f*(*x* | θ) is viewed as a function of *x* with θ fixed, it is a probability density function, and when viewed as a function of θ with *x* fixed, it is a likelihood function.
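The two readings of *f*(*x* | θ) can be illustrated by fixing one argument at a time. The sketch below assumes, purely for illustration, an exponential family with density θ·exp(−θ*x*) for *x* ≥ 0; the choice of family, the observed value `x_observed`, and the grid search are assumptions that do not come from the article.

```python
import math
from functools import partial


def f(x, theta):
    """Exponential density f(x | theta) = theta * exp(-theta * x), x >= 0."""
    return theta * math.exp(-theta * x)


# Fix theta and vary x: a probability density function in x.
density = partial(f, theta=0.5)

# Fix x at the observed outcome and vary theta: a likelihood function.
x_observed = 2.0


def likelihood(theta):
    return f(x_observed, theta)


# As a density in x, the function integrates to (approximately) 1.
# Midpoint rule on [0, 60]; the tail beyond 60 is negligible here.
dx = 0.001
area_in_x = sum(density((i + 0.5) * dx) for i in range(60000)) * dx

# As a likelihood in theta, it can be compared across parameter values,
# for instance by a crude grid search for the most likely theta.
grid = [i / 1000 for i in range(1, 5001)]  # theta in (0, 5]
theta_best = max(grid, key=likelihood)

print(area_in_x, theta_best)
```

For this family the likelihood is maximised at θ = 1/*x*, so with the assumed observation *x* = 2 the grid search returns 0.5.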

*Note:* This is *not* the same as the probability that those parameters are the right ones, given the observed sample. Attempting to interpret the likelihood of a hypothesis given observed evidence as the probability of the hypothesis is a common error, with potentially disastrous real-world consequences in medicine, engineering or jurisprudence. See prosecutor's fallacy for an example of this.

## Example

For example, if a coin is tossed with probability $p_H$ of landing heads up ('H'), the probability of getting two heads in two trials ('HH') is $p_H^2$. If $p_H = 0.5$, then the probability of seeing two heads is 0.25.

In symbols, we can say the above as

$$\Pr(\mathrm{HH} \mid p_H = 0.5) = 0.25.$$

Another way of saying this is to reverse it and say that "the likelihood of $p_H = 0.5$, given the observation 'HH', is 0.25", i.e.,

$$L(p_H = 0.5 \mid \mathrm{HH}) = \Pr(\mathrm{HH} \mid p_H = 0.5) = 0.25.$$

But this is not the same as saying that the *probability* of $p_H = 0.5$, given the observation, is 0.25.

To take an extreme case, on this basis we can say "the likelihood of $p_H = 1$ given the observation 'HH' is 1". But it is clearly not the case that the *probability* of $p_H = 1$ given the observation is 1: the event 'HH' can occur for any $p_H > 0$ (and often does, in reality, for $p_H$ roughly 0.5). If the *probability* of $p_H = 1$ given the observation were 1, it would mean that 'HH' could occur only when $p_H$ equals 1, which is obviously not true.

The likelihood function is not a probability density function; for example, the integral of a likelihood function is not in general 1. In this example, the integral of the likelihood over the interval [0, 1] in $p_H$ is 1/3, demonstrating again that the likelihood function cannot be interpreted as a probability density function for $p_H$. On the other hand, given any particular value of $p_H$, e.g. $p_H = 0.5$, the integral of the probability density function over the domain of the random variables **is** 1.
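The numbers in this example can be reproduced with a short script. It is a minimal sketch assuming two independent tosses of the same coin; the helper names `outcome_probs`, `likelihood` and `integrate` are illustrative, and the integral is approximated with a simple midpoint rule.

```python
def outcome_probs(p_h):
    """Probabilities of the four outcomes of two independent tosses."""
    p_t = 1.0 - p_h
    return {"HH": p_h * p_h, "HT": p_h * p_t, "TH": p_t * p_h, "TT": p_t * p_t}


def likelihood(p_h):
    """Likelihood of p_h given the observation 'HH'."""
    return outcome_probs(p_h)["HH"]


def integrate(func, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width


# Likelihood values quoted in the text.
print(likelihood(0.5))  # 0.25
print(likelihood(1.0))  # 1.0

# Integrating the likelihood over p_h gives about 1/3, not 1.
area_over_p = integrate(likelihood, 0.0, 1.0)
print(area_over_p)

# For a fixed p_h, the probabilities of all outcomes do sum to 1.
total_over_outcomes = sum(outcome_probs(0.5).values())
print(total_over_outcomes)
```

Summing over outcomes with $p_H$ fixed gives 1, while integrating over $p_H$ with the outcome fixed gives about 1/3, which is the asymmetry described above.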

## See also

- Bayes factor
- Bayesian inference
- conditional probability
- likelihood principle
- likelihood-ratio test
- maximum likelihood
- principle of maximum entropy
- score (statistics)

## Notes

- [1] Steffen L. Lauritzen, Aspects of T. N. Thiele's Contributions to Statistics (1999).
- [2] Ronald A. Fisher. "On the mathematical foundations of theoretical statistics". *Philosophical Transactions of the Royal Society*, A, 222:309-368 (1922). *("Likelihood" is discussed in section 6.)*

## References

- A. W. F. Edwards (1972). *Likelihood: An account of the statistical concept of likelihood and its application to scientific inference*, Cambridge University Press. Reprinted in 1992, expanded edition, Johns Hopkins University Press.