Cramer’s V
Cramer’s V is a statistic measuring the strength of association or
dependency between two (nominal) categorical variables in a
contingency table.
Setup. Suppose $X$ and $Y$ are two categorical variables
that are to be analyzed in an experimental or observational study,
with the following information:

$X$ has $M$ distinct categories or classes, labeled $X_{1},\ldots,X_{M}$,

$Y$ has $N$ distinct categories, labeled $Y_{1},\ldots,Y_{N}$,

$n$ pairs of observations $(x_{k},y_{k})$, $k=1,\ldots,n$, are taken, where each $x_{k}$ belongs to one of the $M$ categories of $X$ and each $y_{k}$ belongs to one of the $N$ categories of $Y$.
Form an $M\times N$ contingency table such that cell $(i,j)$ contains the count $n_{ij}$ of observations $(x_{k},y_{k})$ with $x_{k}=X_{i}$ and $y_{k}=Y_{j}$:
$$\begin{array}{c|cccc}
X\backslash Y & Y_{1} & Y_{2} & \cdots & Y_{N}\\
\hline
X_{1} & n_{11} & n_{12} & \cdots & n_{1N}\\
X_{2} & n_{21} & n_{22} & \cdots & n_{2N}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
X_{M} & n_{M1} & n_{M2} & \cdots & n_{MN}
\end{array}$$
Note that $n=\sum_{i=1}^{M}\sum_{j=1}^{N}n_{ij}$.
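A minimal Python sketch of forming the contingency table from the $n$ observed pairs follows. The function name `contingency_table` and the example labels are hypothetical; the sketch assumes category labels are hashable and that the two category lists contain every label that occurs (a pair with an unlisted label would not be counted).

```python
from collections import Counter

def contingency_table(pairs, x_categories, y_categories):
    """Return the M x N table of counts n_ij for a list of (x_k, y_k) pairs.

    x_categories and y_categories fix the row order X_1..X_M and the
    column order Y_1..Y_N (hypothetical helper, for illustration only).
    """
    counts = Counter(pairs)  # counts[(x, y)] = number of pairs equal to (x, y)
    # A Counter returns 0 for pairs that never occur, so empty cells get n_ij = 0.
    return [[counts[(xi, yj)] for yj in y_categories] for xi in x_categories]

pairs = [("a", "u"), ("a", "v"), ("b", "v"), ("b", "v"), ("a", "u")]
table = contingency_table(pairs, ["a", "b"], ["u", "v"])
print(table)  # [[2, 1], [0, 2]]

# Every observation lands in exactly one cell, so the counts sum to n.
assert sum(map(sum, table)) == len(pairs)
```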
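Definition. Let the null hypothesis be that $X$
and $Y$ are independent random variables. Based on the table and
the null hypothesis, the (Pearson) chi-squared statistic
$\chi^{2}=\sum_{i=1}^{M}\sum_{j=1}^{N}\frac{(n_{ij}-E_{ij})^{2}}{E_{ij}}$
can be computed, where $E_{ij}=\frac{n_{i\cdot}\,n_{\cdot j}}{n}$ is the count expected in cell $(i,j)$ under independence, with row totals $n_{i\cdot}=\sum_{j}n_{ij}$ and column totals $n_{\cdot j}=\sum_{i}n_{ij}$. Then Cramer’s V is defined to be
$V=V(X,Y)=\sqrt{\frac{\chi^{2}}{n\min(M-1,N-1)}}.$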
Of course,
in order for $V$ to make sense, each categorical variable must have
at least 2 categories; otherwise $\min(M-1,N-1)=0$ and the denominator vanishes.
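The definition can be sketched in Python as follows. This assumes the Pearson chi-squared statistic without any continuity correction, a table given as a list of rows of nonnegative counts, and every row and column total positive (otherwise some expected count $E_{ij}$ is zero). The names `chi_squared` and `cramers_v` are illustrative, not taken from a particular library.

```python
import math

def chi_squared(table):
    """Pearson chi-squared statistic of an M x N table (list of rows of counts)."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: E_ij = (row total)(column total) / n
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def cramers_v(table):
    """Cramer's V = sqrt(chi^2 / (n * min(M - 1, N - 1)))."""
    m, k = len(table), len(table[0])
    if min(m, k) < 2:
        raise ValueError("each variable needs at least 2 categories")
    n = sum(map(sum, table))
    return math.sqrt(chi_squared(table) / (n * min(m - 1, k - 1)))

# Two strongly (but not perfectly) associated binary variables.
print(cramers_v([[30, 10], [10, 30]]))  # 0.5
```

If SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` should give the same $\chi^{2}$; its default `correction=True` applies Yates' continuity correction when the table has one degree of freedom (the $2\times 2$ case), which would change $V$.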
Remarks.
1. $0\leq V\leq 1$. The closer $V$ is to 0, the smaller the association between the categorical variables $X$ and $Y$. On the other hand, $V$ being close to 1 is an indication of a strong association between $X$ and $Y$. If $X=Y$, then $V(X,Y)=1$.
2. When comparing more than two categorical variables, it is customary to set up a square matrix whose cell $(i,j)$ contains the Cramer’s V between the $i$th variable and the $j$th variable. If there are $k$ variables, only $\frac{k(k-1)}{2}$ Cramer’s V’s need to be calculated, since for any categorical variables $X$ and $Y$, $V(X,X)=1$ and $V(X,Y)=V(Y,X)$. In particular, this matrix is symmetric.
3. If one of the categorical variables is dichotomous (that is, $M=2$ or $N=2$), then $\min(M-1,N-1)=1$ and Cramer’s V equals the phi statistic $\Phi$, which is defined to be
$\Phi=\sqrt{\frac{\chi^{2}}{n}}.$
4. Cramer’s V is named after the Swedish mathematician and statistician Harald Cramér, who sought to make statistics mathematically rigorous, much like Kolmogorov’s axiomatization of probability theory. Cramér also made contributions to number theory, probability theory, and actuarial mathematics, which is widely used in the insurance industry.
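The following self-contained Python sketch illustrates Remarks 1 to 3 on a small hypothetical data set. It works directly from raw observations, fills the matrix of Remark 2 by computing only the $k(k-1)/2$ off-diagonal entries, and compares $\Phi$ with $V$ when one variable is dichotomous. The helper names and data are hypothetical; the sketch assumes the Pearson chi-squared statistic without continuity correction and that each variable has at least 2 observed categories (otherwise the denominator of $V$ is zero).

```python
import math
from collections import Counter
from itertools import combinations

def chi_squared(xs, ys):
    """Pearson chi-squared statistic for two equal-length sequences of labels."""
    n = len(xs)
    cells = Counter(zip(xs, ys))             # n_ij
    x_tot, y_tot = Counter(xs), Counter(ys)  # row and column totals
    chi2 = 0.0
    for a in x_tot:
        for b in y_tot:
            expected = x_tot[a] * y_tot[b] / n  # E_ij under independence
            chi2 += (cells[(a, b)] - expected) ** 2 / expected
    return chi2

def cramers_v(xs, ys):
    m, k = len(set(xs)), len(set(ys))
    return math.sqrt(chi_squared(xs, ys) / (len(xs) * min(m - 1, k - 1)))

def phi(xs, ys):
    return math.sqrt(chi_squared(xs, ys) / len(xs))

def cramers_v_matrix(columns):
    k = len(columns)
    mat = [[1.0] * k for _ in range(k)]      # diagonal: V(X, X) = 1
    for i, j in combinations(range(k), 2):   # only k(k-1)/2 pairs are computed
        # Symmetry V(X, Y) = V(Y, X) fills both cells from one computation.
        mat[i][j] = mat[j][i] = cramers_v(columns[i], columns[j])
    return mat

color = ["red", "red", "blue", "blue", "green", "green"]
size = ["S", "S", "L", "L", "L", "S"]                          # dichotomous
shape = ["round", "round", "square", "square", "tri", "tri"]   # relabeled color

for row in cramers_v_matrix([color, size, shape]):
    print([round(v, 3) for v in row])
```

Here `shape` is a one-to-one relabeling of `color` (effectively $X=Y$), so their V is 1 up to floating-point rounding. Since `size` has only two categories, `phi(color, size)` equals `cramers_v(color, size)`, both $\sqrt{2/3}\approx 0.816$.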
Mathematics Subject Classification
62H17