30. Homework 10
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
from scipy import stats
An undergraduate statistics research project several years ago studied humor styles and personality. The data set from that study is called personality.csv. We will pull different subsets of that data frame for the work below. Here’s a description of the different variables you will see.
- Sex
M/F response to question about biological sex
- G21
Y/N response to “are you 21 years old or older?”
- Greek
Y/N response to “are you involved in a social Greek fraternity or sorority?”
- AccDate
Y/N response to question: “At a time in your life when you are not involved with anyone, someone asks you out. This person has a great personality, but you do not find them physically attractive. Do you accept the date?”
- SitClass
Front/middle/back response to “where do you prefer to sit in class?”
- Friends
Same/opposite/either response to “which sex do you find it easiest to make friends with?”
- Stress1, Stress2
Pre-post measure of stress in the 2nd week (Stress1) and 7th week (Stress2) of the semester.
- TxRel
Toxic relationships beliefs, higher scores indicate more toxicity.
- Opt
Optimism, higher scores indicate more optimism.
- SE
Self-esteem, higher scores indicate higher levels of self-esteem.
- Neuro
Neuroticism, higher scores indicate higher levels of neuroticism.
- Perf
Perfectionism, higher scores indicate higher levels of perfectionism.
- Narc
Narcissism, higher scores indicate higher levels of narcissism.
You will likely recognize several data sets that we used in class examples and labs, too.
Data for examples
neuroanx = Table.read_table('http://faculty.ung.edu/rsinn/neuroanx.csv')
perfnarc = Table.read_table('http://faculty.ung.edu/rsinn/perfnarc.csv')
nba = Table.read_table('http://faculty.ung.edu/rsinn/nba_salaries.csv')
assault = Table.read_table('http://faculty.ung.edu/rsinn/crime_rates.csv').select(0,1,2,3,4,7)
Task 1
Using the perfnarc table, conduct an exploratory data analysis of the Stress1 values. Be sure to find the mean, median, sample size and standard deviation, and to display a histogram of the variable.
perfnarc.show(5)
Sex | G21 | Greek | AccDate | Stress1 | Stress2 | Perf | Narc |
---|---|---|---|---|---|---|---|
F | N | N | N | 9 | 7 | 99 | 3 |
F | Y | N | Y | 11 | 13 | 86 | 2 |
F | N | Y | N | 15 | 14 | 118 | 4 |
F | N | N | Y | 16 | 15 | 113 | 2 |
F | Y | N | Y | 17 | 17 | 107 | 8 |
... (143 rows omitted)
Remember, you may use the descriptive statistics tools from notebook 26.
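As a rough illustration of the summary statistics requested in this task, here is a minimal NumPy sketch. The helper name `describe` is hypothetical, the data are only the five Stress1 values displayed in the preview, and the choice of the sample standard deviation (`ddof=1`) is an assumption that may differ from the convention used in notebook 26.

```python
import numpy as np

def describe(values):
    """Return sample size, mean, median, and standard deviation of a numeric array."""
    values = np.asarray(values, dtype=float)
    return {
        "n": int(values.size),
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        # ddof=1 gives the sample SD (an assumption; ddof=0 gives the population SD)
        "sd": float(np.std(values, ddof=1)),
    }

# Placeholder data: only the five Stress1 values shown in the preview.
# In the notebook, the full column would presumably come from perfnarc.column('Stress1').
stress1 = np.array([9, 11, 15, 16, 17])
summary = describe(stress1)
print(summary)

# Bin counts behind a histogram; plots.hist(stress1, bins=4) would draw the same bins.
counts, edges = np.histogram(stress1, bins=4)
print(counts, edges)
```

Running the same helper on the full column gives the numbers to report alongside the histogram.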
Task 2
Using data from the perfnarc table, conduct an A/B test on Stress1 values using the grouping variable Greek. The research question is whether students involved in Greek life are more stressed during the 2nd week of the semester. Many social Greek organizations have meetings, socials and philanthropy events early in the semester, so perhaps they experience higher levels of stress.
perfnarc.show(5)
Sex | G21 | Greek | AccDate | Stress1 | Stress2 | Perf | Narc |
---|---|---|---|---|---|---|---|
F | N | N | N | 9 | 7 | 99 | 3 |
F | Y | N | Y | 11 | 13 | 86 | 2 |
F | N | Y | N | 15 | 14 | 118 | 4 |
F | N | N | Y | 16 | 15 | 113 | 2 |
F | Y | N | Y | 17 | 17 | 107 | 8 |
... (143 rows omitted)
Be sure to include your null hypothesis and a for loop that simulates the null hypothesis test statistic. After displaying the simulated distribution and calculating your \(p\)-value, write a sentence or two about the real world conclusions you can draw based on your investigation.
Remember, you may use the A/B testing tools from notebook 20 and notebook 21.
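One possible shape for the simulation is sketched below with plain NumPy. The function name `ab_test`, the choice of a difference in group means as the test statistic, and the toy scores are all assumptions for illustration; they are not the course's notebook tools and not the real perfnarc data.

```python
import numpy as np

def ab_test(values, labels, group_a, group_b, reps=2000, seed=0):
    """One-sided permutation test of mean(group_a) - mean(group_b)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)

    def mean_difference(shuffled_labels):
        in_a = values[shuffled_labels == group_a]
        in_b = values[shuffled_labels == group_b]
        return in_a.mean() - in_b.mean()

    observed = mean_difference(labels)
    simulated = np.empty(reps)
    for i in range(reps):
        # Under the null hypothesis the labels are exchangeable, so shuffle them.
        simulated[i] = mean_difference(rng.permutation(labels))

    # Proportion of shuffles at least as extreme as the observed difference.
    p_value = float(np.mean(simulated >= observed))
    return observed, simulated, p_value

# Hypothetical stress scores and Greek membership, for illustration only.
scores = np.array([20, 21, 22, 23, 24, 10, 11, 12, 13, 14])
greek = np.array(["Y"] * 5 + ["N"] * 5)
observed, simulated, p_value = ab_test(scores, greek, "Y", "N")
print(observed, p_value)
```

The one-sided comparison `simulated >= observed` matches a research question of the form "is group A higher than group B"; a two-sided question would compare absolute values instead.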
Task 3
Using the nba table (loaded from nba_salaries.csv), conduct an A/B test to determine if power forwards (PF) are paid more than shooting guards (SG).
nba.show(5)
PLAYER | POSITION | TEAM | '15-'16 SALARY |
---|---|---|---|
Paul Millsap | PF | Atlanta Hawks | 18.6717 |
Al Horford | C | Atlanta Hawks | 12 |
Tiago Splitter | C | Atlanta Hawks | 9.75625 |
Jeff Teague | PG | Atlanta Hawks | 8 |
Kyle Korver | SG | Atlanta Hawks | 5.74648 |
... (412 rows omitted)
Be sure to include your null hypothesis and a for loop that simulates the null hypothesis test statistic. After displaying the simulated distribution and calculating your \(p\)-value, write a sentence or two about the real world conclusions you can draw based on your investigation.
Remember, you may use the tools from notebook 20 and notebook 21.
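Because POSITION has more than two categories, the rows need to be restricted to PF and SG before any labels are shuffled. The sketch below shows that step with NumPy arrays; the salaries and positions are made-up placeholders rather than rows from the nba table, and the helper name `pf_minus_sg` is hypothetical.

```python
import numpy as np

# Hypothetical positions and salaries (in millions); NOT the real nba table.
positions = np.array(["PF", "C", "SG", "PG", "PF", "SG", "C", "PF", "SG"])
salaries = np.array([18.0, 12.0, 5.0, 8.0, 10.0, 3.0, 9.0, 6.0, 4.0])

# Keep only the two positions being compared.
keep = np.isin(positions, ["PF", "SG"])
pos, sal = positions[keep], salaries[keep]

def pf_minus_sg(labels):
    """Mean PF salary minus mean SG salary for a given labeling."""
    return sal[labels == "PF"].mean() - sal[labels == "SG"].mean()

observed = pf_minus_sg(pos)

rng = np.random.default_rng(1)
reps = 1000
simulated = np.empty(reps)
for i in range(reps):
    # Shuffle only the PF/SG labels among the retained players.
    simulated[i] = pf_minus_sg(rng.permutation(pos))

# One-sided p-value for "PF are paid more than SG".
p_value = float(np.mean(simulated >= observed))
print(observed, p_value)
```

Shuffling the full POSITION column instead would mix centers and point guards into the two groups and change the null distribution.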
Task 4
Using the violent crime data set called crime_rates, conduct an exploratory data analysis as well as a bootstrapping confidence interval estimate of the mean Aggravated Assault Rate in Georgia between 1960 and 1990.
assault.show(5)
State | Year | Population | Violent Crime Rate | Murder Rate | Aggraveted Assault Rate |
---|---|---|---|---|---|
Alaska | 1960 | 226167 | 104.3 | 10.2 | 45.1 |
Alaska | 1961 | 234000 | 88.9 | 11.5 | 51.7 |
Alaska | 1962 | 246000 | 91.5 | 4.5 | 54.5 |
Alaska | 1963 | 248000 | 109.7 | 6.5 | 66.1 |
Alaska | 1964 | 250000 | 150 | 10.4 | 96 |
... (2195 rows omitted)
Discuss your findings. Remember, you may use the bootstrapping tools from notebook 24 and notebook 25 along with the descriptive statistics tools in notebook 26.
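A percentile bootstrap interval for a mean could be sketched as follows. The function name `bootstrap_mean_ci` and its parameters are hypothetical, and the input is only the five Alaska rates visible in the preview, used here as placeholder numbers; the task itself calls for the Georgia rows from 1960 to 1990.

```python
import numpy as np

def bootstrap_mean_ci(values, reps=1000, level=95, size=None, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    size = values.size if size is None else size
    rng = np.random.default_rng(seed)

    means = np.empty(reps)
    for i in range(reps):
        # Resample with replacement and record the resampled mean.
        resample = rng.choice(values, size=size, replace=True)
        means[i] = resample.mean()

    tail = (100 - level) / 2
    lower, upper = np.percentile(means, [tail, 100 - tail])
    return float(lower), float(upper), means

# Placeholder rates: the five Alaska values shown in the preview, not Georgia data.
rates = np.array([45.1, 51.7, 54.5, 66.1, 96.0])
lower, upper, means = bootstrap_mean_ci(rates)
print(lower, upper)
```

A histogram of `means` shows the bootstrap distribution, and the interval endpoints are its 2.5th and 97.5th percentiles.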
Task 5
Using the violent crime data set called crime_rates, conduct a comparison of the aggravated assault rates in Georgia and Alabama from 1960 to 1990. Construct a 95% bootstrap confidence interval for each state's mean, using a resample size of 30 and 1,000 repetitions of your for loop.
With the null hypothesis that the aggravated assault rate distribution will be the same in both GA and AL, we can conduct a hypothesis test. If the confidence intervals do not overlap, we have evidence of a difference in aggravated assault rates between these two states. If the confidence intervals do overlap, there is no evidence for a difference in means.
assault.show(5)
State | Year | Population | Violent Crime Rate | Murder Rate | Aggraveted Assault Rate |
---|---|---|---|---|---|
Alaska | 1960 | 226167 | 104.3 | 10.2 | 45.1 |
Alaska | 1961 | 234000 | 88.9 | 11.5 | 51.7 |
Alaska | 1962 | 246000 | 91.5 | 4.5 | 54.5 |
Alaska | 1963 | 248000 | 109.7 | 6.5 | 66.1 |
Alaska | 1964 | 250000 | 150 | 10.4 | 96 |
... (2195 rows omitted)
Compare and contrast your two bootstrap distributions, and write a sentence or two about the real world conclusions you can draw based on your investigation.
Remember, you may use the tools from notebook 24 and notebook 25.
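The overlap check described in this task can be sketched as below, using a resample size of 30 and 1,000 repetitions. The helper names `bootstrap_ci` and `intervals_overlap` are hypothetical, and the two arrays are evenly spaced placeholder values standing in for 31 yearly rates per state; they are not the Georgia or Alabama figures.

```python
import numpy as np

def bootstrap_ci(values, size=30, reps=1000, seed=0):
    """95% percentile bootstrap interval for the mean, resampling `size` values."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    means = np.empty(reps)
    for i in range(reps):
        means[i] = rng.choice(values, size=size, replace=True).mean()
    lower, upper = np.percentile(means, [2.5, 97.5])
    return float(lower), float(upper)

def intervals_overlap(a, b):
    """True when intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Placeholder data: 31 evenly spaced values per "state", for illustration only.
state_a = np.linspace(100, 200, 31)
state_b = np.linspace(300, 400, 31)

ci_a = bootstrap_ci(state_a, seed=1)
ci_b = bootstrap_ci(state_b, seed=2)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

With real data, a result of `False` corresponds to the "do not overlap" case in the task description, and `True` to the "do overlap" case.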
Task 6
Using the neuroanx table drawn from the personality data set, test for a significant correlation between Anxiety (Anx) and Optimism (Opt). Be sure to display descriptive statistics for regression, a scatter plot, and a simulation of the null hypothesis test statistic calculated using a for loop with 2,000 to 5,000 repetitions.
neuroanx.show(5)
Sex | G21 | SitClass | Friends | TxRel | Anx | Opt | SE | Neuro |
---|---|---|---|---|---|---|---|---|
M | N | F | O | 26 | 23 | 20 | 70 | 10 |
F | N | M | S | 21 | 24 | 22 | 68 | 11 |
M | Y | F | E | 25 | 27 | 29 | 65 | 11 |
M | Y | B | E | 22 | 30 | 28 | 61 | 15 |
M | N | M | E | 23 | 40 | 26 | 64 | 16 |
... (137 rows omitted)
Be sure to include your null hypothesis and a for loop that simulates the null hypothesis test statistic. After displaying the simulated distribution and calculating your \(p\)-value, write a sentence or two about the real world conclusions you can draw based on your investigation.
Remember, you may use the correlation and regression tools from notebook 26 and notebook 27.
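One way to simulate the null hypothesis of no correlation is to shuffle one variable and recompute the correlation coefficient each time. The sketch below assumes that approach and a two-sided \(p\)-value; the function name `correlation_test` is hypothetical, and the data are only the five Anx and Opt pairs visible in the preview, so the output is not a result for the full table.

```python
import numpy as np

def correlation_test(x, y, reps=2000, seed=0):
    """Permutation test for the correlation between x and y (two-sided)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)

    observed_r = np.corrcoef(x, y)[0, 1]
    simulated = np.empty(reps)
    for i in range(reps):
        # Shuffling y breaks any pairing with x, which simulates "no correlation".
        simulated[i] = np.corrcoef(x, rng.permutation(y))[0, 1]

    # Proportion of shuffles with a correlation at least as strong as observed.
    p_value = float(np.mean(np.abs(simulated) >= np.abs(observed_r)))
    return float(observed_r), simulated, p_value

# Placeholder data: the five (Anx, Opt) pairs shown in the preview.
anx = np.array([23, 24, 27, 30, 40])
opt = np.array([20, 22, 29, 28, 26])
observed_r, simulated, p_value = correlation_test(anx, opt)
print(observed_r, p_value)
```

If the research question predicts a direction, for example a negative correlation, a one-sided comparison such as `simulated <= observed_r` would be the matching choice.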
Task 7
Conduct an investigation that interests you using one of the included data sets. Describe your null hypothesis and how you plan to test it. Have fun and be creative, but keep your approach close to one of the example investigations shown. You should include a hypothesis test and a discussion of your calculated \(p\)-value. An exploratory data analysis of a single numeric variable is not sufficient for this task but would be an excellent additional component to one of the hypothesis tests.