{ "cells": [ { "cell_type": "markdown", "id": "8290d04d-fb31-4224-b584-0b305767e5f7", "metadata": {}, "source": [ "# The $\\chi^2$ Test of Independence\n", "\n", "Just as ANOVA is the straightforward extension of $t$ procedures into the cases where we have more than 2 samples of numeric data, $\\chi^2$ methods are the mathematical extension of $z$-proportion procedures for categorical data." ] }, { "cell_type": "markdown", "id": "2b51c8cb-9cbd-4f89-878d-82b40c9ffa76", "metadata": {}, "source": [ "## Example: Using R Calculations\n", "\n", "The table below shows a breakdown at a certain university of the number of students still undecided about their majors compared to the number who chosen a major already." ] }, { "cell_type": "markdown", "id": "07afd179-ed7f-4996-b411-149c9e3d920c", "metadata": {}, "source": [ "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", "
FreshmanSophomoreJunior
Have Chosen a Major114168198
Have not Chosen a Major21217192
" ] }, { "cell_type": "markdown", "id": "1f0efd76-6323-48ce-8952-81c367da5b45", "metadata": {}, "source": [ "### Hypotheses\n", "\n", "The hypothesis setup in its most general form is as follows:\n", "\n", "- $H_0 : \\text{Variables are Independent}$\n", "- $H_a : \\text{Variables are Dependent}$\n", "\n", "We often include more specificity for the names of the variable to better indicate what is being studied which in this case would be as follows:\n", "\n", "- $H_0 : \\text{The proportion of students who have chosen a major is }\\textbf{independent }\\text{of year in school}$\n", "- $H_a : \\text{The proportion of students who have chosen a major is }\\textbf{dependent }\\text{upon year in school}$" ] }, { "cell_type": "markdown", "id": "9e092df2-0718-4799-9d9b-5c220937f8b9", "metadata": {}, "source": [ "### Observed Data Matrix\n", "\n", "We create the observed data below:" ] }, { "cell_type": "code", "execution_count": 1, "id": "c4007310-3e00-4e01-9b3c-0136932ad6da", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\t\n", "\t\n", "\n", "
114168198
212171 92
\n" ], "text/latex": [ "\\begin{tabular}{lll}\n", "\t 114 & 168 & 198\\\\\n", "\t 212 & 171 & 92\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| 114 | 168 | 198 |\n", "| 212 | 171 | 92 |\n", "\n" ], "text/plain": [ " [,1] [,2] [,3]\n", "[1,] 114 168 198 \n", "[2,] 212 171 92 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "obs <- matrix(c(114,212,168,171,198,92),ncol=3)\n", "obs" ] }, { "cell_type": "markdown", "id": "2fe4b6d1-4f70-4fd8-8e85-c8060af69745", "metadata": {}, "source": [ "We add column titles and row titles as follows:" ] }, { "cell_type": "code", "execution_count": 2, "id": "fed6f4de-2943-4fc0-9fc3-69cd6514a2bd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
FreshmenSophomoreJunior
Have Chosen114168198
Have NOT Chosen212171 92
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " & Freshmen & Sophomore & Junior\\\\\n", "\\hline\n", "\tHave Chosen & 114 & 168 & 198\\\\\n", "\tHave NOT Chosen & 212 & 171 & 92\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Freshmen | Sophomore | Junior |\n", "|---|---|---|---|\n", "| Have Chosen | 114 | 168 | 198 |\n", "| Have NOT Chosen | 212 | 171 | 92 |\n", "\n" ], "text/plain": [ " Freshmen Sophomore Junior\n", "Have Chosen 114 168 198 \n", "Have NOT Chosen 212 171 92 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "colnames(obs) <- c('Freshmen', 'Sophomore', 'Junior')\n", "rownames(obs) <- c('Have Chosen', 'Have NOT Chosen')\n", "obs" ] }, { "cell_type": "markdown", "id": "3323d371-1c2f-41f0-8d83-3fcdebab3e52", "metadata": {}, "source": [ "### Conduct the Test" ] }, { "cell_type": "code", "execution_count": 3, "id": "6a6d89a9-2dd8-4004-8eff-cb1d75b5846b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "\tPearson's Chi-squared test\n", "\n", "data: obs\n", "X-squared = 68.207, df = 2, p-value = 1.545e-15\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "chisq.test(obs)" ] }, { "cell_type": "markdown", "id": "2e2d110c-48ff-415f-a51e-07d049b2520c", "metadata": {}, "source": [ "### Reporting Out\n", "\n", "Because $p = 1.545\\times 10^{-15} < 0.05 = \\alpha$, we reject the null. We thus have evidence that the percentage of students who have chosen their majors depends upon which year in school they are." ] }, { "cell_type": "markdown", "id": "97c272bc-8af9-4971-acce-4d18beb9d1e9", "metadata": {}, "source": [ "## Example: Using Tables and Formulas\n", "\n", "We have the observed data matrix above. We need to calculate the expected matrix. For this, we will need a formula to work with. From the [formula sheet](https://faculty.ung.edu/rsinn/3350/StatsFormulas.pdf), we have the following for **calculating cells of the expected matrix**:\n", "\n", "$$\\text{expected count} = \\frac{\\text{row total}\\times \\text{column total}}{\\text{table total}}$$\n", "\n", "### Expected Matrix\n", "\n", "Starting with the observed data matrix:$$$$" ] }, { "cell_type": "code", "execution_count": 4, "id": "eabf920e-c378-4c4d-be84-3e7401d57efa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
FreshmenSophomoreJunior
Have Chosen114168198
Have NOT Chosen212171 92
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " & Freshmen & Sophomore & Junior\\\\\n", "\\hline\n", "\tHave Chosen & 114 & 168 & 198\\\\\n", "\tHave NOT Chosen & 212 & 171 & 92\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Freshmen | Sophomore | Junior |\n", "|---|---|---|---|\n", "| Have Chosen | 114 | 168 | 198 |\n", "| Have NOT Chosen | 212 | 171 | 92 |\n", "\n" ], "text/plain": [ " Freshmen Sophomore Junior\n", "Have Chosen 114 168 198 \n", "Have NOT Chosen 212 171 92 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "obs" ] }, { "cell_type": "markdown", "id": "eb3661c3-aa90-4c25-8e16-7959d664af28", "metadata": {}, "source": [ "We calculate the expected matrix with the top-left cell ($TL$) as follows:\n", "\n", "$$\\begin{align}TL &= \\frac{(114+168+198) \\times (114+212)}{955}\\\\&= \\frac{(480) \\times (326)}{955}\\\\&= \\frac{156480}{955}\\\\&=163.85\\end{align}$$\n", "\n", "The bottom-left ($BL$) is as follows:\n", "$$\\begin{align}BL &= \\frac{(114+168+198) \\times (168+171)}{955}\\\\&=170.39\\end{align}$$" ] }, { "cell_type": "markdown", "id": "f136df0e-06b3-421e-961c-95d2de55a78e", "metadata": {}, "source": [ "Proceeding in the same for four more times, we have the following exp matrix:" ] }, { "cell_type": "code", "execution_count": 5, "id": "22fe0b31-6692-4289-88b5-47c80131de59", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
FreshmenSophomoreJunior
Have Chosen114168198
Have NOT Chosen212171 92
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " & Freshmen & Sophomore & Junior\\\\\n", "\\hline\n", "\tHave Chosen & 114 & 168 & 198\\\\\n", "\tHave NOT Chosen & 212 & 171 & 92\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Freshmen | Sophomore | Junior |\n", "|---|---|---|---|\n", "| Have Chosen | 114 | 168 | 198 |\n", "| Have NOT Chosen | 212 | 171 | 92 |\n", "\n" ], "text/plain": [ " Freshmen Sophomore Junior\n", "Have Chosen 114 168 198 \n", "Have NOT Chosen 212 171 92 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "obs" ] }, { "cell_type": "code", "execution_count": 6, "id": "be6b7446-a9ba-49e1-81c3-adececdd3fd0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "
FreshmenSophomoreJunior
Have Chosen163.85170.39145.76
Have NOT Chosen162.15168.61144.24
\n" ], "text/latex": [ "\\begin{tabular}{r|lll}\n", " & Freshmen & Sophomore & Junior\\\\\n", "\\hline\n", "\tHave Chosen & 163.85 & 170.39 & 145.76\\\\\n", "\tHave NOT Chosen & 162.15 & 168.61 & 144.24\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| | Freshmen | Sophomore | Junior |\n", "|---|---|---|---|\n", "| Have Chosen | 163.85 | 170.39 | 145.76 |\n", "| Have NOT Chosen | 162.15 | 168.61 | 144.24 |\n", "\n" ], "text/plain": [ " Freshmen Sophomore Junior\n", "Have Chosen 163.85 170.39 145.76\n", "Have NOT Chosen 162.15 168.61 144.24" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "exp <- matrix(c(163.85,162.15,170.39,168.61,145.76,144.24),ncol=3)\n", "colnames(exp) <- c('Freshmen', 'Sophomore', 'Junior')\n", "rownames(exp) <- c('Have Chosen', 'Have NOT Chosen')\n", "exp" ] }, { "cell_type": "markdown", "id": "9217a2ac-0f9d-4950-962d-4c8d2135da71", "metadata": {}, "source": [ "### Test Statistic $\\chi^2$\n", "\n", "To calcuate the $\\chi^2$ test statistic, referring to the formula sheet provides the following:\n", "\n", "$$\\chi^2 = \\sum \\frac{(O−E)^2}{E}$$\n", "\n", "where\n", "- O : Observed Cell Count\n", "- E : Expected Cell Count\n", "\n", "Hence:" ] }, { "cell_type": "markdown", "id": "8fadb1fc-23d8-4908-9e4c-72c1c6039f2d", "metadata": {}, "source": [ "$$\\begin{align}\\chi^2 &= \\frac{(114-163.85)^2}{163.85}+\\frac{(212-162.15)^2}{162.15}+\\frac{(168-170.39)^2}{170.39}+\\frac{(171-168.61)^2}{168.61}\\\\&+\\frac{(198-145.76)^2}{145.76}+\\frac{(92-144.24)^2}{144.24}\\\\&= \\frac{2485.0}{163.85}+\\frac{2485.0}{162.15}+\\frac{5.7}{170.39}+\\frac{5.7}{168.61}+\\frac{2729.0}{145.76}+\\frac{2729.0}{144.24}\\\\&= 15.17+15.33+0.03+0.03+18.72+18.92\\end{align}$$\n", "\n", "which gives:\n", "\n", "$\\displaystyle x^2\\approx 68.2$" ] }, { "cell_type": "markdown", "id": "0d213b5d-18ca-4f66-a60e-8adb28e922a5", "metadata": {}, "source": [ "### Cutoff Value from Table\n", "\n", "To find ${\\chi^2}^*$ in the [class's $\\chi^2$ table](https://faculty.ung.edu/rsinn/3350/Table_ChiSquared.pdf), note that we have\n", "\n", "$$df = (r-1)(c-1)=2\\times 1=2$$\n", "\n", "where $r$ and $c$ are the numbers of rows and number of columns respectively in the observed and expected matrices. Both matrices should have identical shape. In the row where $df = 2$ and the column where $\\alpha = 0.05$, we find that:\n", "\n", "$${\\chi^2}^* = 5.991$$" ] }, { "cell_type": "markdown", "id": "cf32c537-5b01-462d-abe8-80aee3d72e08", "metadata": {}, "source": [ "### Reporting Out\n", "\n", "Since $\\chi^2 = 68.2 > 5.991 = {\\chi^2}^2$, we reject the null hypothesis. We thus have evidence for the alternative which indicates that the proportion of students who have chosen their major depends upon the year in school." ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 5 }