Outliers
If a data set contains outliers, we need to know how many there are and whether they lie above or below the mean. We need some data to work with:
pers <- read.csv('https://faculty.ung.edu/rsinn/data/personality.csv')
age <- pers$Age
thrill <- pers$Thrill
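One Code Block to Rule Them All
The code below counts the outliers on each side of the data under two rules: the 2 standard deviation rule and the Box Plot method. Each rule is explained in its own section further down.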
cat('Outliers for the AGE data using the 2 standard deviation rule:\n ',sum(age > mean(age) + 2 * sd(age)),
' to the right, and','\n ', sum(age < mean(age) - 2 * sd(age)) , ' to the left.\n\n')
Q1 <- quantile(age)[["25%"]] ; Q3 <- quantile(age)[["75%"]] ; IQR <- Q3 - Q1
cat('Outliers for the AGE data using the Box Plot method:\n ',sum(age > Q3 + 1.5 * IQR),
' to the right, and','\n ', sum(age < Q1 - 1.5 * IQR) , ' to the left.')
Outliers for the AGE data using the 2 standard deviation rule:
3 to the right, and
0 to the left.
Outliers for the AGE data using the Box Plot method:
9 to the right, and
0 to the left.
cat('Outliers for the THRILL data using the 2 standard deviation rule:\n ',sum(thrill > mean(thrill) + 2 * sd(thrill)),
' to the right, and','\n ', sum(thrill < mean(thrill) - 2 * sd(thrill)) , ' to the left.\n\n')
Q1 <- quantile(thrill)[["25%"]] ; Q3 <- quantile(thrill)[["75%"]] ; IQR <- Q3 - Q1
cat('Outliers for the THRILL data using the Box Plot method:\n ',sum(thrill > Q3 + 1.5 * IQR),
' to the right, and','\n ', sum(thrill < Q1 - 1.5 * IQR) , ' to the left.')
Outliers for the THRILL data using the 2 standard deviation rule:
0 to the right, and
1 to the left.
Outliers for the THRILL data using the Box Plot method:
0 to the right, and
0 to the left.
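Because the same four counts are computed for AGE and again for THRILL, the logic can be wrapped in a function. The following is only a sketch: the name count_outliers, the decision to drop missing values, and the toy vector are assumptions made for illustration and are not part of the original page.

```r
# Hypothetical helper (not from the original page): counts outliers on each
# side under both rules for any numeric vector.
count_outliers <- function(x) {
  x <- x[!is.na(x)]                 # assumption: drop missing values first
  m <- mean(x)
  s <- sd(x)
  q1 <- quantile(x)[["25%"]]
  q3 <- quantile(x)[["75%"]]
  iqr <- q3 - q1                    # lowercase name avoids masking stats::IQR()
  c(sd_right  = sum(x > m + 2 * s),
    sd_left   = sum(x < m - 2 * s),
    box_right = sum(x > q3 + 1.5 * iqr),
    box_left  = sum(x < q1 - 1.5 * iqr))
}

# Made-up data with one planted high value
toy <- c(10, 12, 11, 13, 12, 11, 10, 12, 13, 40)
result <- count_outliers(toy)
print(result)
```

With the data loaded as above, count_outliers(age) and count_outliers(thrill) should give the same counts as the longer cat() calls.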
Outliers by Mean and Standard Deviation
The 2 standard deviation rule, typically used for small data sets, flags any value that lies more than 2 standard deviations above or below the mean. To implement it, we do the following.
cat('Outliers for the AGE data using the 2 standard deviation rule:\n ',sum(age > mean(age) + 2 * sd(age)),
' to the right, and','\n ', sum(age < mean(age) - 2 * sd(age)) , ' to the left.')
Outliers for the AGE data using the 2 standard deviation rule:
3 to the right, and
0 to the left.
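The rule is equivalent to flagging values whose z-score exceeds 2 in absolute value, which also makes it easy to list the flagged values rather than only count them. A minimal sketch on made-up data (the vector x is an assumption, not taken from the personality data set):

```r
# Made-up sample; the value 40 is planted as an obvious high outlier.
x <- c(10, 12, 11, 13, 12, 11, 10, 12, 13, 40)

# z-score: distance from the mean measured in standard deviations
z <- (x - mean(x)) / sd(x)

# Values more than 2 standard deviations from the mean, on either side
flagged <- x[abs(z) > 2]
print(flagged)
```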
Outliers Using 5-Number Summary
We calculate the fences as follows:
Upper Fence = Q3 + 1.5 * IQR
Lower Fence = Q1 - 1.5 * IQR
To implement the Box Plot method for outliers, we do the following.
Q1 <- quantile(age)[["25%"]] ; Q3 <- quantile(age)[["75%"]] ; IQR <- Q3 - Q1
## Now that we've calculated the needed values, we have:
cat('Outliers for the AGE data using the Box Plot method:\n ',sum(age > Q3 + 1.5 * IQR),
' to the right, and','\n ', sum(age < Q1 - 1.5 * IQR) , ' to the left.')
Outliers for the AGE data using the Box Plot method:
9 to the right, and
0 to the left.
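To see the fences themselves, and which values fall beyond them, the calculation can be written out step by step. This is a sketch on made-up data; the vector x and the variable names are assumptions for illustration.

```r
# Made-up sample with one planted high value
x <- c(10, 12, 11, 13, 12, 11, 10, 12, 13, 40)

# First and third quartiles (quantile() defaults to its type 7 definition)
q <- quantile(x, c(0.25, 0.75))
iqr <- q[["75%"]] - q[["25%"]]      # lowercase name avoids masking stats::IQR()

lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr

# Values outside the fences are the Box Plot outliers
beyond <- x[x < lower_fence | x > upper_fence]
cat('Lower fence:', lower_fence, '\nUpper fence:', upper_fence,
    '\nBeyond the fences:', beyond, '\n')
```

Note that R's boxplot() computes its box from hinges rather than from quantile(), so the points it draws as outliers may occasionally differ slightly from this count.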