{
"cells": [
{
"cell_type": "markdown",
"id": "407fc574-7488-401a-94a7-58ed2b7ce5c3",
"metadata": {},
"source": [
"# Goodness of Fit (GOF)\n",
"\n",
"The $\\chi^2$ test can be used in a rather novel way:\n",
"\n",
"
We can test a probability model.\n",
"\n",
"The $\\chi^2$ GOF allows us to compare observed data to a given probability model. Perhaps the best example is that of eye color."
]
},
{
"cell_type": "markdown",
"id": "d9fb84ff-0aff-4905-b178-aa7026664f65",
"metadata": {},
"source": [
"## Example: Eye Color\n",
"\n",
"A recent release from the American Academy of Opthamology gives the proportion of various eye colors in the population of the United States.\n",
"\n",
"\n",
" | Blue | Green | Hazel | Light Brown | Dark Brown | \n",
"
\n",
" Proportion | 32% | 15% | 12% | 16% | 25% | \n",
"
\n",
"
"
]
},
{
"cell_type": "markdown",
"id": "bf2ab01a-638a-4ee9-90af-c8b655287ad5",
"metadata": {},
"source": [
"At UNG, recent class surveys resulted in the following sample eye color distribution from students at the university which are shown the **obs** vector below:"
]
},
{
"cell_type": "code",
"execution_count": 63,
"id": "a3c5cdf3-16fb-451d-b276-19d1d1f2e684",
"metadata": {},
"outputs": [],
"source": [
"obs <- c(68,41,30,51,60)\n",
"prob <- c(0.32, 0.15, 0.12, 0.16, 0.25)"
]
},
{
"cell_type": "code",
"execution_count": 64,
"id": "187909a6-a26e-4c7d-bd52-ac80bd499a08",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"\tChi-squared test for given probabilities\n",
"\n",
"data: obs\n",
"X-squared = 5.2517, df = 4, p-value = 0.2624\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"chisq.test(obs, p = prob)"
]
},
{
"cell_type": "markdown",
"id": "30f09bb9-ef2a-4544-ad51-5dc8ce1f6947",
"metadata": {},
"source": [
"### Reporting Out\n",
"\n",
"Given that $p = 0.2624 > 0.05 = \\alpha$, we fail to reject the null. We have no evidence that the observed data on eye color from UNG students departs from the nationwide probability."
]
},
{
"cell_type": "markdown",
"id": "01a02295-0bbd-4f42-aa92-50ea82077014",
"metadata": {},
"source": [
"## Example: Using Tables and Formulas\n",
"\n",
"We have the observed data vector above. We need to calculate the expected vector which is based on probabilities.\n",
"\n",
"### Observed Data Vector and Expected Vector\n",
"\n",
"Starting with the observed data vector (shown above), we need to know the sample size which we can find with a summation of the vector **obs**:"
]
},
{
"cell_type": "code",
"execution_count": 83,
"id": "e413a4e7-1129-48c8-ac71-4a8a636c07cf",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"250"
],
"text/latex": [
"250"
],
"text/markdown": [
"250"
],
"text/plain": [
"[1] 250"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sum(obs)"
]
},
{
"cell_type": "markdown",
"id": "645db49d-dc44-44aa-affb-49cc0c4d4f39",
"metadata": {},
"source": [
"We compute the predicted value for the number of students expected to have each eye color by multiplying the probabilities from the model by the total sample size.\n",
"\n",
"- Blue: $32\\%$ of $250 = 80$\n",
"- Green: $15\\%$ of $250 = 37.5$\n",
"- Hazel: $12\\%$ of $250 = 30$\n",
"- Light Brown: $16\\%$ of $250 = 40$\n",
"- Dark Brown: $25\\%$ of $250 = 62.5$\n",
"\n",
"We can calcuate these values in R as shown below:"
]
},
{
"cell_type": "code",
"execution_count": 84,
"id": "d0b21c88-7805-42bd-9c9e-f3a23842b8fa",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\t- 80
\n",
"\t- 37.5
\n",
"\t- 30
\n",
"\t- 40
\n",
"\t- 62.5
\n",
"
\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 80\n",
"\\item 37.5\n",
"\\item 30\n",
"\\item 40\n",
"\\item 62.5\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 80\n",
"2. 37.5\n",
"3. 30\n",
"4. 40\n",
"5. 62.5\n",
"\n",
"\n"
],
"text/plain": [
"[1] 80.0 37.5 30.0 40.0 62.5"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"exp <- prob * 250\n",
"exp"
]
},
{
"cell_type": "markdown",
"id": "e0158825-22b2-4bea-9a00-0333445c1afd",
"metadata": {},
"source": [
"Gathering it all together, we have the following matrix:"
]
},
{
"cell_type": "code",
"execution_count": 87,
"id": "08a5f00d-da3e-40a0-afd8-bf553adccc4d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\tObserved | 68 | 41.0 | 30 | 51 | 60.0 |
\n",
"\tExpected | 80 | 37.5 | 30 | 40 | 62.5 |
\n",
"\n",
"
\n"
],
"text/latex": [
"\\begin{tabular}{r|lllll}\n",
"\tObserved & 68 & 41.0 & 30 & 51 & 60.0\\\\\n",
"\tExpected & 80 & 37.5 & 30 & 40 & 62.5\\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| Observed | 68 | 41.0 | 30 | 51 | 60.0 |\n",
"| Expected | 80 | 37.5 | 30 | 40 | 62.5 |\n",
"\n"
],
"text/plain": [
" [,1] [,2] [,3] [,4] [,5]\n",
"Observed 68 41.0 30 51 60.0\n",
"Expected 80 37.5 30 40 62.5"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"tab = matrix(c(obs, exp), nrow = 2, byrow = TRUE)\n",
"rownames(tab) <- c('Observed', 'Expected')\n",
"tab"
]
},
{
"cell_type": "markdown",
"id": "0885792c-ab4b-4e57-978d-a9ba0cef6d56",
"metadata": {},
"source": [
"### Calculating the Test Statistic $\\chi^2$\n",
"\n",
"Referring to the [formula sheet](https://faculty.ung.edu/rsinn/3350/StatsFormulas.pdf) provides the following:\n",
"\n",
"$$\\chi^2 = \\sum \\frac{(O−E)^2}{E}$$"
]
},
{
"cell_type": "markdown",
"id": "552b2f9d-912a-4d61-99f8-356d3192b779",
"metadata": {},
"source": [
"We enter the data into the formula:\n",
"\n",
"$$\\begin{align}\\chi^2 &= \\frac{(68−80)^2}{80} + \\frac{(41−37.5)^2}{37.5} + \\frac{(30−30)^2}{30} + \\frac{(51−40)^2}{40} + \\frac{(60−62.5)^2}{62.5}\\\\&= \\frac{1.8}{80} + \\frac{12.25}{37.5} + 0 + \\frac{121}{40} + \\frac{6.25}{62.5}\\\\&\\approx 1.80 + 0.33 + 0.00 + 3.03 + 0.1\\\\&\\approx 5.25\\end{align}$$"
]
},
{
"cell_type": "markdown",
"id": "08c1ef30-6b52-4b71-a249-679091c55c6e",
"metadata": {},
"source": [
"### Finding ${\\chi^2}^*$ in the Table\n",
"\n",
"From the [class $\\chi^2$ table](https://faculty.ung.edu/rsinn/3350/Table_ChiSquared.pdf) using $df = \\text{number of probabilities} - 1 = 4$ and $\\alpha = 0.05$, we find that:\n",
"\n",
"$${\\chi^2}^* = 9.488$$"
]
},
{
"cell_type": "markdown",
"id": "05b890ef-66dd-4d1e-8f41-4046ea5a126d",
"metadata": {},
"source": [
"### Reporting Out\n",
"\n",
"Given that $\\chi^2 = 5.25 < 9.488 = {\\chi^2}^*$, we fail to reject the null. We have no evidence that the observed data on eye color from 250 UNG students departs from the nationwide probability."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}