# sufficient statistic

Let $\{f_{\theta}\}$ be a statistical model with parameter
$\theta$. Let $\boldsymbol{X}=(X_{1},\ldots,X_{n})$ be a random vector
of random variables representing $n$ observations. A statistic $T=T(\boldsymbol{X})$ of $\boldsymbol{X}$ for the parameter $\theta$ is called a
*sufficient statistic*, or a *sufficient estimator*, if
the conditional probability distribution of $\boldsymbol{X}$ given
$T(\boldsymbol{X})=t$ is not a function of $\theta$ (equivalently,
does not depend on $\theta$).

In other words, all the information about the unknown parameter
$\theta$ is captured in the sufficient statistic $T$. If, say, we
are interested in finding out the percentage of defective light
bulbs in a shipment of new ones, it is enough, or *sufficient*,
to count the number of defective ones (sum of the $X_{i}$’s), rather
than worrying about which individual light bulbs are the defective
ones (the vector $(X_{1},\ldots,X_{n})$). By taking the sum, a certain
“reduction” of data has been achieved.
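This can be checked numerically. The sketch below (not part of the original entry) models each bulb as a Bernoulli($p$) observation and computes, by brute-force enumeration, the conditional distribution of the full vector given that the total number of defectives equals $t$; the result is the same for every value of $p$:

```python
from itertools import product

def conditional_dist(p, n=3, t=2):
    """P(X = x | sum(X) = t) for n i.i.d. Bernoulli(p) observations,
    computed by enumerating all 0/1 tuples with exactly t ones."""
    def prob(x):
        return p ** sum(x) * (1 - p) ** (n - sum(x))
    tuples = [x for x in product([0, 1], repeat=n) if sum(x) == t]
    total = sum(prob(x) for x in tuples)          # P(sum(X) = t)
    return {x: prob(x) / total for x in tuples}

# The conditional distribution does not depend on p: here it is uniform
# over the three tuples with exactly t = 2 defectives.
d1 = conditional_dist(0.1)
d2 = conditional_dist(0.7)
assert all(abs(d1[x] - d2[x]) < 1e-12 for x in d1)
```

Once the total is known, which particular bulbs are defective carries no further information about $p$.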

**Examples**

1. Let $X_{1},\ldots,X_{n}$ be $n$ independent observations from a uniform distribution on integers $1,\ldots,\theta$. Let $T=\max\{X_{1},\ldots,X_{n}\}$ be a statistic for $\theta$. Then the conditional probability distribution of $\boldsymbol{X}=(X_{1},\ldots,X_{n})$ given $T=t$ is

$P(\boldsymbol{X}\mid t)=\frac{P(X_{1}=x_{1},\ldots,X_{n}=x_{n},\max\{X_{i}\}=t)}{P(\max\{X_{i}\}=t)}.$ The numerator is $0$ if $\max\{x_{i}\}\neq t$. In that case, $P(\boldsymbol{X}\mid t)=0$, which is not a function of $\theta$. Otherwise, the numerator is $\theta^{-n}$ and $P(\boldsymbol{X}\mid t)$ becomes

$\frac{\theta^{-n}}{P(\max\{X_{i}\}=t)}=\bigl(\theta^{n}P(X_{(1)}\leq\cdots\leq X_{(n)}=t)\bigr)^{-1},$ where the $X_{(i)}$'s are the $X_{i}$'s rearranged in non-decreasing order, from $i=1$ to $n$. For the denominator, we first note that

$\begin{aligned} P(X_{(1)}\leq\cdots\leq X_{(n)}=t) &= P(X_{(1)}\leq\cdots\leq X_{(n)}\leq t)-P(X_{(1)}\leq\cdots\leq X_{(n)}<t)\\ &= P(X_{(1)}\leq\cdots\leq X_{(n)}\leq t)-P(X_{(1)}\leq\cdots\leq X_{(n)}\leq t-1).\end{aligned}$ Since $P(X_{(1)}\leq\cdots\leq X_{(n)}\leq t)=(t/\theta)^{n}$, the above equation shows that there are $t^{n}-(t-1)^{n}$ $n$-tuples of positive integers whose maximum is exactly $t$. So

$\bigl(\theta^{n}P(X_{(1)}\leq\cdots\leq X_{(n)}=t)\bigr)^{-1}=\bigl(\theta^{n}(t^{n}-(t-1)^{n})\theta^{-n}\bigr)^{-1}=(t^{n}-(t-1)^{n})^{-1},$ which again is not a function of $\theta$. Therefore, $T=\max\{X_{i}\}$ is a sufficient statistic for $\theta$. Here, we see that a reduction of data has been achieved by keeping only the largest member of the set of observations, not the entire set.
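The computation above can be verified by brute force for small cases. This sketch (an addition, not from the original entry) enumerates every $n$-tuple of integers in $\{1,\ldots,\theta\}$ and checks that the conditional probability equals $(t^{n}-(t-1)^{n})^{-1}$ regardless of $\theta$:

```python
from itertools import product

def cond_prob(x, theta):
    """P(X = x | max(X) = t), by enumeration, for i.i.d. observations
    uniform on the integers 1..theta, where t = max(x)."""
    n, t = len(x), max(x)
    p_tuple = theta ** (-n)                      # P(X = x) for any valid tuple
    matching = [y for y in product(range(1, theta + 1), repeat=n)
                if max(y) == t]
    p_t = len(matching) * p_tuple                # P(max(X) = t)
    return p_tuple / p_t

x = (2, 5, 3)  # n = 3 observations with t = 5
# Independent of theta, and equal to (t^n - (t-1)^n)^{-1} = 1/61:
assert abs(cond_prob(x, 5) - cond_prob(x, 9)) < 1e-12
assert abs(cond_prob(x, 9) - 1 / (5 ** 3 - 4 ** 3)) < 1e-12
```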

2. If we set $T(X_{1},\ldots,X_{n})=(X_{1},\ldots,X_{n})$, then $T$ is trivially a sufficient statistic for *any* parameter $\theta$: the conditional probability distribution of $(X_{1},\ldots,X_{n})$ given $T$ is $1$. Even though this is a sufficient statistic by definition (of course, the individual observations provide as much information as there is to know about $\theta$), and there is no loss of data in $T$ (which is simply a list of all the observations), there is really no reduction of data to speak of here.

3. The sample mean

$\overline{X}=\frac{X_{1}+\cdots+X_{n}}{n}$ of $n$ independent observations from a normal distribution $N(\mu,\sigma^{2})$ (both $\mu$ and $\sigma^{2}$ unknown) is a sufficient statistic for $\mu$. This follows from the factorization criterion. Similarly, one sees that any partition of the sum of the $n$ observations $X_{i}$ into $m$ subtotals is a sufficient statistic for $\mu$. For instance,

$T(X_{1},\ldots,X_{n})=\left(\sum_{i=1}^{j}X_{i},\ \sum_{i=j+1}^{k}X_{i},\ \sum_{i=k+1}^{n}X_{i}\right)$ is a sufficient statistic for $\mu$.
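One way to see the sum at work numerically (a sketch added here, fixing $\sigma=1$ for simplicity): for two samples of the same size with the same total, the normal likelihood ratio between any two values of $\mu$ is identical, so the data enter the inference about $\mu$ only through their sum.

```python
import math

def log_lik(xs, mu, sigma=1.0):
    """Log-likelihood of i.i.d. N(mu, sigma^2) observations."""
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

# Two samples with the same size and the same sum (hence the same mean):
a = [1.0, 2.0, 6.0]
b = [3.0, 3.0, 3.0]

# The likelihood ratio between any two values of mu agrees for both
# samples: all information about mu is carried by the sum alone.
for mu1, mu2 in [(0.0, 1.0), (2.5, -1.0)]:
    ra = log_lik(a, mu1) - log_lik(a, mu2)
    rb = log_lik(b, mu1) - log_lik(b, mu2)
    assert abs(ra - rb) < 1e-9
```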

4. Again, assume there are $n$ independent observations $X_{i}$ from a normal distribution $N(\mu,\sigma^{2})$ with unknown mean and variance. The sample variance

$\frac{1}{n-1}\sum_{i=1}^{n}(X_{i}-\overline{X})^{2}$ is *not* a sufficient statistic for $\sigma^{2}$. However, if $\mu$ is a known constant, then

$\frac{1}{n-1}\sum_{i=1}^{n}(X_{i}-\mu)^{2}$

is a sufficient statistic for $\sigma^{2}$.
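The known-$\mu$ case can also be checked by a likelihood-ratio argument (a sketch added here, taking $\mu=0$): two samples with the same value of $\sum(x_{i}-\mu)^{2}$ yield the same likelihood ratio between any two candidate variances, so the data enter only through that statistic.

```python
import math

def log_lik(xs, sigma2, mu=0.0):
    """Log-likelihood of i.i.d. N(mu, sigma2) observations, mu known."""
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

# Two samples with the same value of sum((x - mu)^2) = 25:
a = [3.0, 4.0]
b = [0.0, 5.0]

# Their likelihood ratios between any two variances agree, so the data
# inform sigma^2 only through T = sum((x - mu)^2).
for s1, s2 in [(1.0, 2.0), (0.5, 4.0)]:
    ra = log_lik(a, s1) - log_lik(a, s2)
    rb = log_lik(b, s1) - log_lik(b, s2)
    assert abs(ra - rb) < 1e-9
```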

A sufficient statistic for a parameter $\theta$ is called
a *minimal sufficient statistic* if it can be expressed as a
function of any sufficient statistic for $\theta$.

Example. In example $3$ above, both the sample mean $\overline{X}$ and the finite sum $S=X_{1}+\cdots+X_{n}$ are minimal sufficient statistics for the mean $\mu$. Since, by the factorization criterion, any sufficient statistic $T$ for $\mu$ is a vector whose coordinates form a partition of the finite sum, adding up these coordinates recovers the finite sum $S$. We have thus expressed $S$ as a function of $T$, so $S$ is minimal. Similarly, $\overline{X}$ is minimal.

Two sufficient statistics $T_{1},T_{2}$ for a parameter $\theta$ are said to be equivalent provided that there is a bijection $g$ such that $g\circ T_{1}=T_{2}$. $\overline{X}$ and $S$ from the above example are two equivalent sufficient statistics. Two minimal sufficient statistics for the same parameter are equivalent.
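For a fixed sample size $n$, the bijection linking $S$ and $\overline{X}$ is simply division by $n$. A minimal illustration (added here, not from the original entry):

```python
n = 4
xs = [2.0, 5.0, 1.0, 4.0]

S = sum(xs)              # T1: the finite sum
xbar = S / n             # T2: the sample mean

g = lambda s: s / n      # a bijection on the reals (for fixed n)
g_inv = lambda m: m * n  # its inverse

# g maps one sufficient statistic onto the other, and back:
assert g(S) == xbar
assert g_inv(xbar) == S
```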

## Mathematics Subject Classification

62B05
