Goodness of Fit (GOF)

The \(\chi^2\) test can be used in a rather novel way: to test a probability model.

The \(\chi^2\) GOF test compares observed counts with the counts a given probability model predicts. The null hypothesis is that the data follow the model; the alternative is that they do not. A good example is eye color.

Example: Eye Color

A recent release from the American Academy of Ophthalmology gives the proportions of various eye colors in the population of the United States.

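| Eye color  | Blue | Green | Hazel | Light Brown | Dark Brown |
|------------|------|-------|-------|-------------|------------|
| Proportion | 32%  | 15%   | 12%   | 16%         | 25%        |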
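Before running the test it can help to store the model with labels and confirm it is a valid probability model. A minimal sketch, assuming we keep the proportions in a named R vector; the name `prob_named` and the category labels are our own, not from the original release. `chisq.test()` stops with an error when `p` does not sum to 1 (unless `rescale.p = TRUE`), so checking up front catches typos early.

```r
# Model proportions from the table above; names are illustrative labels
prob_named <- c(Blue = 0.32, Green = 0.15, Hazel = 0.12,
                `Light Brown` = 0.16, `Dark Brown` = 0.25)

# A probability model must sum to 1; all.equal() allows for
# floating-point error in adding the decimals
stopifnot(isTRUE(all.equal(sum(prob_named), 1)))

print(prob_named)
```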

At UNG, recent class surveys produced the following sample of eye colors among students at the university. The counts are stored in the vector obs below, in the same order as the table (Blue, Green, Hazel, Light Brown, Dark Brown):

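# Observed counts: Blue, Green, Hazel, Light Brown, Dark Brown
obs <- c(68, 41, 30, 51, 60)
# Model proportions from the national data, in the same order
prob <- c(0.32, 0.15, 0.12, 0.16, 0.25)
# Chi-squared goodness-of-fit test of obs against prob
chisq.test(obs, p = prob)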
	Chi-squared test for given probabilities

data:  obs
X-squared = 5.2517, df = 4, p-value = 0.2624
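`chisq.test()` returns an object of class `htest`, so rather than reading numbers off the printout we can pull them out by name. A short sketch: `statistic`, `parameter`, `p.value`, and `expected` are components of that object, while `res` is just our own variable name.

```r
obs  <- c(68, 41, 30, 51, 60)
prob <- c(0.32, 0.15, 0.12, 0.16, 0.25)

res <- chisq.test(obs, p = prob)

res$statistic   # X-squared, the test statistic
res$parameter   # df = number of categories - 1
res$p.value     # upper-tail probability of the statistic
res$expected    # expected counts: sum(obs) * prob
```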

Reporting Out

Given that \(p = 0.2624 > 0.05 = \alpha\), we fail to reject the null hypothesis. We do not have sufficient evidence that the observed eye-color data from UNG students depart from the nationwide proportions.
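The decision above follows a fixed rule: reject the null hypothesis when \(p \le \alpha\), otherwise fail to reject. A minimal sketch of that rule; `decide()` is a hypothetical helper, not part of base R, and the default `alpha = 0.05` matches the level used here.

```r
# Hypothetical helper: p-value decision rule (reject when p <= alpha)
decide <- function(p_value, alpha = 0.05) {
  if (p_value <= alpha) {
    "reject the null hypothesis"
  } else {
    "fail to reject the null hypothesis"
  }
}

obs  <- c(68, 41, 30, 51, 60)
prob <- c(0.32, 0.15, 0.12, 0.16, 0.25)
p_value <- chisq.test(obs, p = prob)$p.value   # about 0.2624

decide(p_value)
```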

Example: Using Tables and Formulas

We now carry out the same test by hand. We already have the observed data vector from above; next we need the expected vector, which is computed from the model probabilities.

Observed Data Vector and Expected Vector

To compute the expected counts, we first need the sample size, which is the sum of the entries in obs:

sum(obs)
250

We compute the expected number of students with each eye color by multiplying each proportion in the model by the total sample size of 250:

  • Blue: \(32\%\) of \(250 = 80\)

  • Green: \(15\%\) of \(250 = 37.5\)

  • Hazel: \(12\%\) of \(250 = 30\)

  • Light Brown: \(16\%\) of \(250 = 40\)

  • Dark Brown: \(25\%\) of \(250 = 62.5\)

We can calculate these values in R as shown below:

# Expected counts: model proportion times sample size (sum(obs) = 250)
exp <- prob * sum(obs)
exp
  1. 80
  2. 37.5
  3. 30
  4. 40
  5. 62.5
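As a sanity check: because the proportions sum to 1, the expected counts must add up to the sample size. A minimal sketch, computing the counts from `sum(obs)` rather than a hard-coded 250; here the vector is named `expected` (our choice) to keep it distinct from R's built-in `exp()` function.

```r
obs  <- c(68, 41, 30, 51, 60)
prob <- c(0.32, 0.15, 0.12, 0.16, 0.25)

n <- sum(obs)            # sample size, 250
expected <- prob * n     # expected counts under the model

# Expected counts should add up to the same total as the observed counts
stopifnot(isTRUE(all.equal(sum(expected), n)))
expected
```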

Gathering it all together, we have the following matrix:

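tab <- matrix(c(obs, exp), nrow = 2, byrow = TRUE)
rownames(tab) <- c('Observed', 'Expected')
colnames(tab) <- c('Blue', 'Green', 'Hazel', 'Light Brown', 'Dark Brown')
tab

|          | Blue | Green | Hazel | Light Brown | Dark Brown |
|----------|------|-------|-------|-------------|------------|
| Observed | 68   | 41.0  | 30    | 51          | 60.0       |
| Expected | 80   | 37.5  | 30    | 40          | 62.5       |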
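With observed and expected counts side by side, each category's contribution \((O-E)^2/E\) to \(\chi^2\) is a single vectorized operation. A sketch that rebuilds the table with `rbind()`; the names `expected` and `contrib` are our own.

```r
obs  <- c(68, 41, 30, 51, 60)
prob <- c(0.32, 0.15, 0.12, 0.16, 0.25)
expected <- prob * sum(obs)

tab <- rbind(Observed = obs, Expected = expected)
colnames(tab) <- c("Blue", "Green", "Hazel", "Light Brown", "Dark Brown")

# (O - E)^2 / E for each eye color, computed column by column
contrib <- (tab["Observed", ] - tab["Expected", ])^2 / tab["Expected", ]
round(contrib, 3)
```

The largest contribution comes from Light Brown, where 51 students were observed against 40 expected.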

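Calculating the Test Statistic \(\chi^2\)

The formula sheet gives the test statistic, where \(O\) is an observed count, \(E\) is the corresponding expected count, and the sum runs over all five categories:

\[\chi^2 = \sum \frac{(O-E)^2}{E}\]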
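The formula translates directly into vectorized R. A minimal sketch; `gof_statistic()` is a hypothetical helper name, and it assumes `observed` and `probs` list the categories in the same order.

```r
# Hypothetical helper: chi-squared goodness-of-fit statistic
gof_statistic <- function(observed, probs) {
  expected <- probs * sum(observed)        # E = n * p for each category
  sum((observed - expected)^2 / expected)  # sum of (O - E)^2 / E
}

obs  <- c(68, 41, 30, 51, 60)
prob <- c(0.32, 0.15, 0.12, 0.16, 0.25)
gof_statistic(obs, prob)   # about 5.2517
```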

We enter the data into the formula:

\[\begin{aligned}\chi^2 &= \frac{(68-80)^2}{80} + \frac{(41-37.5)^2}{37.5} + \frac{(30-30)^2}{30} + \frac{(51-40)^2}{40} + \frac{(60-62.5)^2}{62.5}\\&= \frac{144}{80} + \frac{12.25}{37.5} + 0 + \frac{121}{40} + \frac{6.25}{62.5}\\&\approx 1.800 + 0.327 + 0.000 + 3.025 + 0.100\\&\approx 5.25\end{aligned}\]
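Rounding each term before adding is convenient by hand but can shift the last digit, so it is worth checking the hand total against R. A sketch comparing the unrounded sum with the statistic that `chisq.test()` reports; `by_hand` and `from_r` are our own names.

```r
obs  <- c(68, 41, 30, 51, 60)
prob <- c(0.32, 0.15, 0.12, 0.16, 0.25)
expected <- prob * sum(obs)

by_hand <- sum((obs - expected)^2 / expected)
from_r  <- unname(chisq.test(obs, p = prob)$statistic)

c(by_hand = by_hand, from_r = from_r)   # both about 5.2517
all.equal(by_hand, from_r)
```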

Finding \({\chi^2}^*\) in the Table

From the class \(\chi^2\) table, using \(df = \text{number of categories} - 1 = 5 - 1 = 4\) and \(\alpha = 0.05\), we find that:

\[{\chi^2}^* = 9.488\]
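Instead of a printed table, R's `qchisq()` gives the critical value directly: with \(\alpha = 0.05\) it is the 95th percentile of the \(\chi^2\) distribution with 4 degrees of freedom.

```r
alpha <- 0.05
df <- 5 - 1   # number of categories minus 1

# Upper-tail critical value: P(chi-squared with df degrees > crit) = alpha
crit <- qchisq(1 - alpha, df = df)
round(crit, 3)   # 9.488
```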

Reporting Out

Given that \(\chi^2 = 5.25 < 9.488 = {\chi^2}^*\), we fail to reject the null hypothesis. We do not have sufficient evidence that the observed eye-color data from 250 UNG students depart from the nationwide proportions.
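The critical-value comparison here and the p-value in the first example are two views of the same test: the statistic falls below \({\chi^2}^*\) exactly when \(p > \alpha\). A sketch converting the hand-computed statistic into a p-value with `pchisq()`:

```r
obs  <- c(68, 41, 30, 51, 60)
prob <- c(0.32, 0.15, 0.12, 0.16, 0.25)
expected <- prob * sum(obs)

stat <- sum((obs - expected)^2 / expected)   # about 5.2517
df   <- length(obs) - 1                      # 4
crit <- qchisq(0.95, df = df)                # about 9.488

# Upper-tail probability of a statistic at least this large under the model
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

stat < crit        # statistic falls short of the critical value
p_value > 0.05     # the same conclusion, via the p-value
round(p_value, 4)  # 0.2624
```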