{ "cells": [ { "cell_type": "markdown", "id": "407fc574-7488-401a-94a7-58ed2b7ce5c3", "metadata": {}, "source": [ "# Goodness of Fit (GOF)\n", "\n", "The $\\chi^2$ test can be used in a rather novel way:\n", "\n", "
We can test a probability model.

\n", "\n", "The $\\chi^2$ GOF test allows us to compare observed data to a given probability model. Perhaps the best example is that of eye color." ] }, { "cell_type": "markdown", "id": "d9fb84ff-0aff-4905-b178-aa7026664f65", "metadata": {}, "source": [ "## Example: Eye Color\n", "\n", "A recent release from the American Academy of Ophthalmology gives the proportion of various eye colors in the population of the United States.\n", "\n", "| | Blue | Green | Hazel | Light Brown | Dark Brown |\n", "|---|---|---|---|---|---|\n", "| Proportion | 32% | 15% | 12% | 16% | 25% |\n" ] }, { "cell_type": "markdown", "id": "bf2ab01a-638a-4ee9-90af-c8b655287ad5", "metadata": {}, "source": [ "At UNG, recent class surveys produced the following sample eye color distribution among students at the university, stored in the **obs** vector below:" ] }, { "cell_type": "code", "execution_count": 63, "id": "a3c5cdf3-16fb-451d-b276-19d1d1f2e684", "metadata": {}, "outputs": [], "source": [ "obs <- c(68, 41, 30, 51, 60)\n", "prob <- c(0.32, 0.15, 0.12, 0.16, 0.25)" ] }, { "cell_type": "code", "execution_count": 64, "id": "187909a6-a26e-4c7d-bd52-ac80bd499a08", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "\tChi-squared test for given probabilities\n", "\n", "data: obs\n", "X-squared = 5.2517, df = 4, p-value = 0.2624\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "chisq.test(obs, p = prob)" ] }, { "cell_type": "markdown", "id": "30f09bb9-ef2a-4544-ad51-5dc8ce1f6947", "metadata": {}, "source": [ "### Reporting Out\n", "\n", "Given that $p = 0.2624 > 0.05 = \\alpha$, we fail to reject the null hypothesis that eye colors at UNG follow the nationwide proportions. We have no evidence that the observed eye color data from UNG students depart from the nationwide distribution." ] }, { "cell_type": "markdown", "id": "01a02295-0bbd-4f42-aa92-50ea82077014", "metadata": {}, "source": [ "## Example: Using Tables and Formulas\n", "\n", "We have the observed data vector above. 
We need to calculate the expected vector, which is based on the model probabilities.\n", "\n", "### Observed Data Vector and Expected Vector\n", "\n", "Starting with the observed data vector (shown above), we need the sample size, which we find by summing the vector **obs**:" ] }, { "cell_type": "code", "execution_count": 83, "id": "e413a4e7-1129-48c8-ac71-4a8a636c07cf", "metadata": {}, "outputs": [ { "data": { "text/html": [ "250" ], "text/latex": [ "250" ], "text/markdown": [ "250" ], "text/plain": [ "[1] 250" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sum(obs)" ] }, { "cell_type": "markdown", "id": "645db49d-dc44-44aa-affb-49cc0c4d4f39", "metadata": {}, "source": [ "We compute the number of students expected to have each eye color by multiplying each model probability by the total sample size.\n", "\n", "- Blue: $32\\%$ of $250 = 80$\n", "- Green: $15\\%$ of $250 = 37.5$\n", "- Hazel: $12\\%$ of $250 = 30$\n", "- Light Brown: $16\\%$ of $250 = 40$\n", "- Dark Brown: $25\\%$ of $250 = 62.5$\n", "\n", "We can calculate these values in R as shown below:" ] }, { "cell_type": "code", "execution_count": 84, "id": "d0b21c88-7805-42bd-9c9e-f3a23842b8fa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<ol>\n", "\t<li>80</li>\n", "\t<li>37.5</li>\n", "\t<li>30</li>\n", "\t<li>40</li>\n", "\t<li>62.5</li>\n", "</ol>
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 80\n", "\\item 37.5\n", "\\item 30\n", "\\item 40\n", "\\item 62.5\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 80\n", "2. 37.5\n", "3. 30\n", "4. 40\n", "5. 62.5\n", "\n", "\n" ], "text/plain": [ "[1] 80.0 37.5 30.0 40.0 62.5" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "exp <- prob * 250\n", "exp" ] }, { "cell_type": "markdown", "id": "e0158825-22b2-4bea-9a00-0333445c1afd", "metadata": {}, "source": [ "Gathering it all together, we have the following matrix:" ] }, { "cell_type": "code", "execution_count": 87, "id": "08a5f00d-da3e-40a0-afd8-bf553adccc4d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\t\n", "\t\n", "\n", "
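<table><tbody><tr><th>Observed</th><td>68</td><td>41.0</td><td>30</td><td>51</td><td>60.0</td></tr><tr><th>Expected</th><td>80</td><td>37.5</td><td>30</td><td>40</td><td>62.5</td></tr></tbody></table>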
\n" ], "text/latex": [ "\\begin{tabular}{r|lllll}\n", "\tObserved & 68 & 41.0 & 30 & 51 & 60.0\\\\\n", "\tExpected & 80 & 37.5 & 30 & 40 & 62.5\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "| Observed | 68 | 41.0 | 30 | 51 | 60.0 |\n", "| Expected | 80 | 37.5 | 30 | 40 | 62.5 |\n", "\n" ], "text/plain": [ " [,1] [,2] [,3] [,4] [,5]\n", "Observed 68 41.0 30 51 60.0\n", "Expected 80 37.5 30 40 62.5" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tab <- matrix(c(obs, exp), nrow = 2, byrow = TRUE)\n", "rownames(tab) <- c('Observed', 'Expected')\n", "tab" ] }, { "cell_type": "markdown", "id": "0885792c-ab4b-4e57-978d-a9ba0cef6d56", "metadata": {}, "source": [ "### Calculating the Test Statistic $\\chi^2$\n", "\n", "The [formula sheet](https://faculty.ung.edu/rsinn/3350/StatsFormulas.pdf) gives the test statistic, where $O$ is an observed count and $E$ is the corresponding expected count:\n", "\n", "$$\\chi^2 = \\sum \\frac{(O-E)^2}{E}$$" ] }, { "cell_type": "markdown", "id": "552b2f9d-912a-4d61-99f8-356d3192b779", "metadata": {}, "source": [ "We enter the data into the formula:\n", "\n", "$$\\begin{align}\\chi^2 &= \\frac{(68-80)^2}{80} + \\frac{(41-37.5)^2}{37.5} + \\frac{(30-30)^2}{30} + \\frac{(51-40)^2}{40} + \\frac{(60-62.5)^2}{62.5}\\\\&= \\frac{144}{80} + \\frac{12.25}{37.5} + 0 + \\frac{121}{40} + \\frac{6.25}{62.5}\\\\&\\approx 1.80 + 0.33 + 0.00 + 3.03 + 0.10\\\\&\\approx 5.25\\end{align}$$" ] }, { "cell_type": "markdown", "id": "08c1ef30-6b52-4b71-a249-679091c55c6e", "metadata": {}, "source": [ "### Finding ${\\chi^2}^*$ in the Table\n", "\n", "From the [class $\\chi^2$ table](https://faculty.ung.edu/rsinn/3350/Table_ChiSquared.pdf) using $df = \\text{number of categories} - 1 = 5 - 1 = 4$ and $\\alpha = 0.05$, we find that:\n", "\n", "$${\\chi^2}^* = 9.488$$" ] }, { "cell_type": "markdown", "id": "05b890ef-66dd-4d1e-8f41-4046ea5a126d", "metadata": {}, "source": [ "### Reporting Out\n", "\n", "Given that $\\chi^2 = 5.25 < 9.488 = {\\chi^2}^*$, we fail to reject the null hypothesis. 
We have no evidence that the observed eye color data from 250 UNG students depart from the nationwide distribution." ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 5 }
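
The hand calculation in the tables-and-formulas example can also be reproduced step by step in R. The sketch below is one possible way to do it, not necessarily how the course computes it. The names `expected`, `contrib`, `chi_sq`, `df_gof`, `crit_val`, and `p_val` are illustrative choices that do not appear in the notebook (`expected` is used instead of `exp` so that R's built-in `exp()` function is not masked), and `qchisq()` is assumed here as a stand-in for the printed table lookup.

```r
# Observed counts and model proportions from the eye color example
obs  <- c(68, 41, 30, 51, 60)
prob <- c(0.32, 0.15, 0.12, 0.16, 0.25)

# Expected counts: each model proportion times the sample size
expected <- prob * sum(obs)

# Contribution of each category, (O - E)^2 / E, and their sum
contrib <- (obs - expected)^2 / expected
chi_sq  <- sum(contrib)

# Degrees of freedom: number of categories minus 1
df_gof <- length(obs) - 1

# Critical value at alpha = 0.05 (assumed replacement for the table lookup)
crit_val <- qchisq(0.95, df = df_gof)

# Upper-tail p-value for the observed statistic
p_val <- pchisq(chi_sq, df = df_gof, lower.tail = FALSE)

# Collect the three quantities for comparison
round(c(statistic = chi_sq, critical = crit_val, p.value = p_val), 4)
```

If the statistic is smaller than the critical value (equivalently, if the p-value exceeds $\alpha = 0.05$), we fail to reject the null, which should agree with the result of `chisq.test(obs, p = prob)`.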