The $\chi^2$ Test of Independence#

Just as ANOVA extends $t$ procedures to cases with more than two samples of numeric data, $\chi^2$ methods extend $z$-proportion procedures to categorical data with more than two groups.

Example: Using R Calculations#

The table below shows, for a certain university, the number of students who are still undecided about their majors compared to the number who have already chosen a major, broken down by year in school.

	Freshman	Sophomore	Junior
Have Chosen a Major	114	168	198
Have not Chosen a Major	212	171	92

Hypotheses#

The hypothesis setup in its most general form is as follows:

$H_0 : \text{Variables are Independent}$
$H_a : \text{Variables are Dependent}$

We often name the variables explicitly to better indicate what is being studied. In this case, the hypotheses are as follows:

$H_0 : \text{The proportion of students who have chosen a major is }\textbf{independent }\text{of year in school}$
$H_a : \text{The proportion of students who have chosen a major is }\textbf{dependent }\text{upon year in school}$

Observed Data Matrix#

We create the observed data below:

obs <- matrix(c(114,212,168,171,198,92),ncol=3)
obs

114	168	198
212	171	92

We add column titles and row titles as follows:

colnames(obs) <- c('Freshmen', 'Sophomore', 'Junior')
rownames(obs) <- c('Have Chosen', 'Have NOT Chosen')
obs

	Freshmen	Sophomore	Junior
Have Chosen	114	168	198
Have NOT Chosen	212	171	92
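By default, `matrix()` fills its cells column by column, which is why the counts above are listed as 114, 212, 168, and so on. As an alternative sketch, the same matrix can be entered row by row, in the order the table reads, using `byrow = TRUE` and `dimnames`. The name `obs_by_row` is ours and is used only for illustration.

```r
# Enter the counts row by row, matching the layout of the printed table
obs_by_row <- matrix(
  c(114, 168, 198,   # Have Chosen
    212, 171,  92),  # Have NOT Chosen
  nrow = 2, byrow = TRUE,
  dimnames = list(c('Have Chosen', 'Have NOT Chosen'),
                  c('Freshmen', 'Sophomore', 'Junior'))
)

# Column-wise version from the text, for comparison
obs <- matrix(c(114,212,168,171,198,92), ncol=3)

# TRUE when every cell of the two layouts agrees
all(obs_by_row == obs)
```

Either form can be passed to `chisq.test()`; the row-wise form is simply easier to proofread against the table.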

Conduct the Test#

chisq.test(obs)

	Pearson's Chi-squared test

data:  obs
X-squared = 68.207, df = 2, p-value = 1.545e-15

Reporting Out#

Because $p = 1.545\times 10^{-15} < 0.05 = \alpha$, we reject the null. We thus have evidence that the proportion of students who have chosen their majors depends upon their year in school.
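The comparison against $\alpha$ can also be done in R rather than by reading the printed output. The sketch below assumes the same observed counts and $\alpha = 0.05$; the names `result` and `alpha` are ours. It relies on the fact that `chisq.test()` returns an object whose `statistic`, `parameter`, and `p.value` components can be accessed with `$`.

```r
# Observed counts, filled column by column as in the text
obs <- matrix(c(114,212,168,171,198,92), ncol=3)

alpha <- 0.05               # significance level used in the text
result <- chisq.test(obs)   # store the test result instead of printing it

result$statistic            # the X-squared test statistic
result$parameter            # the degrees of freedom
result$p.value < alpha      # TRUE means we reject the null hypothesis
```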

Example: Using Tables and Formulas#

We have the observed data matrix above. We need to calculate the expected matrix. For this, we will need a formula to work with. From the formula sheet, we have the following for calculating cells of the expected matrix:

\[\text{expected count} = \frac{\text{row total}\times \text{column total}}{\text{table total}}\]
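The formula can be applied to every cell at once. The following is a minimal sketch in R, assuming the observed counts from the table above; the variable names (`row_totals`, `col_totals`, `table_total`, `expected`) are ours and chosen only for illustration.

```r
# Observed counts, filled column by column as in the text
obs <- matrix(c(114,212,168,171,198,92), ncol=3)

row_totals  <- rowSums(obs)   # one total per row
col_totals  <- colSums(obs)   # one total per column
table_total <- sum(obs)       # grand total of all cells

# outer() multiplies every row total by every column total,
# giving one (row total x column total) product per cell
expected <- outer(row_totals, col_totals) / table_total
round(expected, 2)
```

The same values are also available from `chisq.test(obs)$expected`, which is a convenient way to check hand calculations.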

Expected Matrix#

Starting with the observed data matrix:

obs

	Freshmen	Sophomore	Junior
Have Chosen	114	168	198
Have NOT Chosen	212	171	92

We calculate the expected matrix with the top-left cell ($TL$) as follows:

\[\begin{split}\begin{align}TL &= \frac{(114+168+198) \times (114+212)}{955}\\&= \frac{(480) \times (326)}{955}\\&= \frac{156480}{955}\\&=163.85\end{align}\end{split}\]

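The top-middle ($TM$) cell, which uses the same row total but the Sophomore column total, is as follows:

\[\begin{split}\begin{align}TM &= \frac{(114+168+198) \times (168+171)}{955}\\&= \frac{(480) \times (339)}{955}\\&=170.39\end{align}\end{split}\]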

Proceeding in the same way for the four remaining cells, we obtain the following expected matrix, which we store as exp:

obs

	Freshmen	Sophomore	Junior
Have Chosen	114	168	198
Have NOT Chosen	212	171	92

exp <- matrix(c(163.85,162.15,170.39,168.61,145.76,144.24),ncol=3)
colnames(exp) <- c('Freshmen', 'Sophomore', 'Junior')
rownames(exp) <- c('Have Chosen', 'Have NOT Chosen')
exp

	Freshmen	Sophomore	Junior
Have Chosen	163.85	170.39	145.76
Have NOT Chosen	162.15	168.61	144.24

Test Statistic $\chi^2$#

To calculate the $\chi^2$ test statistic, we refer to the formula sheet, which provides the following:

\[\chi^2 = \sum \frac{(O-E)^2}{E}\]

where

O : Observed Cell Count
E : Expected Cell Count

Hence:

\[\begin{split}\begin{align}\chi^2 &= \frac{(114-163.85)^2}{163.85}+\frac{(212-162.15)^2}{162.15}+\frac{(168-170.39)^2}{170.39}+\frac{(171-168.61)^2}{168.61}\\&+\frac{(198-145.76)^2}{145.76}+\frac{(92-144.24)^2}{144.24}\\&= \frac{2485.0}{163.85}+\frac{2485.0}{162.15}+\frac{5.7}{170.39}+\frac{5.7}{168.61}+\frac{2729.0}{145.76}+\frac{2729.0}{144.24}\\&= 15.17+15.33+0.03+0.03+18.72+18.92\end{align}\end{split}\]

which gives:

$\displaystyle \chi^2\approx 68.2$
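The hand calculation can be checked in R by applying the same formula to whole matrices. This is a sketch under the assumption that the expected counts are computed from the unrounded row and column totals; the names `expected`, `contributions`, and `chi_sq` are ours.

```r
# Observed counts, filled column by column as in the text
obs <- matrix(c(114,212,168,171,198,92), ncol=3)

# Expected counts: (row total x column total) / table total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Per-cell contribution (O - E)^2 / E, computed element-wise
contributions <- (obs - expected)^2 / expected
round(contributions, 2)

# The test statistic is the sum of all cell contributions
chi_sq <- sum(contributions)
chi_sq
```

Printing `contributions` also shows which cells drive the result: here the Freshmen and Junior columns account for nearly all of the statistic, while the Sophomore column contributes almost nothing.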

Cutoff Value from Table#

To find ${\chi^2}^*$ in the class’s $\chi^2$ table, note that we have

\[df = (r-1)(c-1)=(2-1)(3-1)=1\times 2=2\]

where $r$ and $c$ are the number of rows and the number of columns, respectively, in the observed and expected matrices. Both matrices must have the same shape. In the row where $df = 2$ and the column where $\alpha = 0.05$, we find that:

\[{\chi^2}^* = 5.991\]
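If a printed table is not at hand, the same cutoff can be obtained from R's `qchisq()` quantile function. The sketch below assumes $\alpha = 0.05$ and the $2 \times 3$ table from this example; the names `n_rows`, `n_cols`, `deg_free`, and `critical_value` are ours.

```r
n_rows <- 2                              # rows in the observed matrix
n_cols <- 3                              # columns in the observed matrix
deg_free <- (n_rows - 1) * (n_cols - 1)  # degrees of freedom

alpha <- 0.05

# The cutoff leaves an area of alpha in the upper tail,
# so we ask for the (1 - alpha) quantile
critical_value <- qchisq(1 - alpha, df = deg_free)
round(critical_value, 3)
```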

Reporting Out#

Since $\chi^2 = 68.2 > 5.991 = {\chi^2}^*$, we reject the null hypothesis. We thus have evidence for the alternative, which indicates that the proportion of students who have chosen their major depends upon year in school.
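The table-based decision and the $p$-value reported by `chisq.test()` are two views of the same comparison. As a sketch, the upper-tail area beyond the test statistic can be computed with `pchisq()`; the names `chi_sq`, `deg_free`, and `p_value` are ours, and the statistic is taken from the R output earlier in this section.

```r
chi_sq   <- 68.207   # test statistic from the chisq.test() output
deg_free <- 2        # (r - 1)(c - 1) for a 2 x 3 table

# Upper-tail probability: the chance of a statistic at least this large
# when the null hypothesis is true
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)
p_value

p_value < 0.05       # same conclusion as comparing chi_sq to the cutoff
```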
