# Yule-Simon distribution

| Property | Value |
|---|---|
| Parameters | $\rho >0$ shape (real) |
| Support | $k\in \{1,2,\dots \}$ |
| PMF | $\rho \,\mathrm {B} (k,\rho +1)$ |
| CDF | $1-k\,\mathrm {B} (k,\rho +1)$ |
| Mean | ${\frac {\rho }{\rho -1}}$ for $\rho >1$ |
| Mode | $1$ |
| Variance | ${\frac {\rho ^{2}}{(\rho -1)^{2}\;(\rho -2)}}$ for $\rho >2$ |
| Skewness | ${\frac {(\rho +1)^{2}\;{\sqrt {\rho -2}}}{(\rho -3)\;\rho }}$ for $\rho >3$ |
| Excess kurtosis | $\rho +3+{\frac {11\rho ^{3}-49\rho -22}{(\rho -4)\;(\rho -3)\;\rho }}$ for $\rho >4$ |
| MGF | ${\frac {\rho }{\rho +1}}\;{}_{2}F_{1}(1,1;\rho +2;e^{t})\,e^{t}$ |
| CF | ${\frac {\rho }{\rho +1}}\;{}_{2}F_{1}(1,1;\rho +2;e^{i\,t})\,e^{i\,t}$ |

Figure: Yule-Simon PMF on a log-log scale. The function is defined only at integer values of $k$; the connecting lines do not indicate continuity.

Figure: Yule-Simon CDF. The function is defined only at integer values of $k$; the connecting lines do not indicate continuity.

In probability and statistics, the Yule-Simon distribution is a discrete probability distribution named after Udny Yule and Herbert Simon. Simon originally called it the Yule distribution.

The probability mass function of the Yule-Simon(ρ) distribution is

$f(k;\rho )=\rho \,\mathrm {B} (k,\rho +1),$

for integer $k\geq 1$ and real $\rho >0$, where $\mathrm {B}$ is the beta function. Equivalently, the pmf can be written in terms of the falling factorial as

$f(k;\rho )={\frac {\rho \,\Gamma (\rho +1)}{(k+\rho )^{\underline {\rho +1}}}},$

where $\Gamma$ is the gamma function. Thus, if $\rho$ is an integer,

$f(k;\rho )={\frac {\rho \,\rho !\,(k-1)!}{(k+\rho )!}}.$

The probability mass function $f$ has the property that for sufficiently large $k$ we have

$f(k;\rho )\approx {\frac {\rho \,\Gamma (\rho +1)}{k^{\rho +1}}}\propto {\frac {1}{k^{\rho +1}}}.$

This means that the tail of the Yule-Simon distribution is a realization of Zipf's law: $f(k;\rho )$ can be used to model, for example, the relative frequency of the $k$th most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of $k$.
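The three expressions for the pmf given above can be compared numerically. The following is a minimal sketch in Python using only the standard library; the function names are illustrative and not part of any established API, and evaluating the beta function through `math.lgamma` is one possible choice made here for numerical stability.

```python
import math


def yule_simon_pmf(k: int, rho: float) -> float:
    """f(k; rho) = rho * B(k, rho + 1), evaluated through log-gamma."""
    if k < 1 or rho <= 0:
        raise ValueError("need integer k >= 1 and real rho > 0")
    # log B(k, rho + 1) = log Gamma(k) + log Gamma(rho + 1) - log Gamma(k + rho + 1)
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


def yule_simon_pmf_integer_rho(k: int, rho: int) -> float:
    """Closed form rho * rho! * (k - 1)! / (k + rho)! for integer rho."""
    return rho * math.factorial(rho) * math.factorial(k - 1) / math.factorial(k + rho)


def yule_simon_tail_approx(k: int, rho: float) -> float:
    """Large-k approximation rho * Gamma(rho + 1) / k**(rho + 1)."""
    return rho * math.gamma(rho + 1) / k ** (rho + 1)


# Compare the exact pmf, the integer-rho closed form, and the tail approximation.
for k in (1, 10, 100, 1000):
    exact = yule_simon_pmf(k, 2)
    ratio = yule_simon_tail_approx(k, 2) / exact
    print(k, exact, yule_simon_pmf_integer_rho(k, 2), ratio)
```

The last printed column is the ratio of the power-law approximation to the exact value; it approaches 1 as $k$ grows, which is the sense in which the tail follows Zipf's law.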

## Occurrence

The Yule-Simon distribution arises as a continuous mixture of geometric distributions. Specifically, assume that $W$ follows an exponential distribution with scale $1/\rho$ or rate $\rho$:

$W\sim \mathrm {Exponential} (\rho ),$

$h(w;\rho )=\rho \,\exp(-\rho \,w).$

Then, conditional on $W$, a Yule-Simon distributed variable $K$ has the following geometric distribution:

$K\sim \mathrm {Geometric} (\exp(-W)).$

The pmf of a geometric distribution is

$g(k;p)=p\,(1-p)^{k-1}$

for $k\in \{1,2,\dots \}$. The Yule-Simon pmf is then the following exponential-geometric mixture distribution:

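$f(k;\rho )=\int _{0}^{\infty }g(k;\exp(-w))\,h(w;\rho )\,dw.$

Substituting $p=\exp(-w)$ turns the integral into $\rho \int _{0}^{1}p^{\rho }\,(1-p)^{k-1}\,dp=\rho \,\mathrm {B} (k,\rho +1)$, which recovers the pmf given earlier.

## Generalizations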

The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule-Simon(ρ, α) distribution is defined as

$f(k;\rho ,\alpha )={\frac {\rho }{1-\alpha ^{\rho }}}\;\mathrm {B} _{1-\alpha }(k,\rho +1),$

with $0\leq \alpha <1$. For $\alpha =0$ the ordinary Yule-Simon(ρ) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.
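The generalized pmf can be evaluated numerically, for example to check the normalization and the cutoff. The sketch below is one possible approach in standard-library Python, assuming that the incomplete beta function $\mathrm {B} _{x}(a,b)=\int _{0}^{x}t^{a-1}(1-t)^{b-1}\,dt$ may be approximated by composite Simpson quadrature. The helper names and the number of subintervals are illustrative choices, not an established implementation.

```python
def incomplete_beta(x: float, a: float, b: float, n: int = 2000) -> float:
    """Approximate B_x(a, b) = integral_0^x t**(a-1) * (1-t)**(b-1) dt.

    Uses composite Simpson's rule with n (even) subintervals; assumes
    a >= 1 and b >= 1 so the integrand is finite on [0, x].
    """
    if x == 0.0:
        return 0.0
    h = x / n

    def integrand(t: float) -> float:
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = integrand(0.0) + integrand(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def generalized_yule_simon_pmf(k: int, rho: float, alpha: float) -> float:
    """f(k; rho, alpha) = rho / (1 - alpha**rho) * B_{1-alpha}(k, rho + 1)."""
    if k < 1 or rho <= 0 or not (0 <= alpha < 1):
        raise ValueError("need k >= 1, rho > 0 and 0 <= alpha < 1")
    return rho / (1 - alpha ** rho) * incomplete_beta(1 - alpha, k, rho + 1)


# The probabilities should sum to approximately 1; with alpha > 0 the tail
# decays fast enough that 200 terms suffice for this check.
print(sum(generalized_yule_simon_pmf(k, 1.5, 0.2) for k in range(1, 201)))
```

Setting `alpha` to 0 reproduces the ordinary Yule-Simon pmf up to quadrature error, while a positive `alpha` makes the probabilities at large $k$ much smaller than in the ordinary case.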

Figure: Plot of the Yule-Simon(1) distribution (red) and its asymptotic Zipf law (blue).