Outliers

When a data set contains outliers, we want to know how many there are and whether they fall above or below the mean. First, we load some data to work with:

pers <- read.csv('https://faculty.ung.edu/rsinn/data/personality.csv')
age <- pers$Age
thrill <- pers$Thrill

One Code Block to Rule Them All

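The blocks below count outliers on each side under both rules, first for the AGE data and then for the THRILL data. The later sections explain each rule separately.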

cat('Outliers for the AGE data using the 2 standard deviation rule:\n   ',sum(age > mean(age) + 2 * sd(age)), 
    ' to the right, and','\n   ', sum(age < mean(age) - 2 * sd(age)) , ' to the left.\n\n')

Q1 <- quantile(age)[["25%"]] ; Q3 <- quantile(age)[["75%"]] ; IQR <- Q3 - Q1

cat('Outliers for the AGE data using the Box Plot method:\n   ',sum(age > Q3 + 1.5 * IQR), 
    ' to the right, and','\n   ', sum(age < Q1 - 1.5 * IQR) , ' to the left.')
Outliers for the AGE data using the 2 standard deviation rule:
    3  to the right, and 
    0  to the left.
Outliers for the AGE data using the Box Plot method:
    9  to the right, and 
    0  to the left.
cat('Outliers for the THRILL data using the 2 standard deviation rule:\n   ',sum(thrill > mean(thrill) + 2 * sd(thrill)), 
    ' to the right, and','\n   ', sum(thrill < mean(thrill) - 2 * sd(thrill)) , ' to the left.\n\n')

Q1 <- quantile(thrill)[["25%"]] ; Q3 <- quantile(thrill)[["75%"]] ; IQR <- Q3 - Q1

cat('Outliers for the THRILL data using the Box Plot method:\n   ',sum(thrill > Q3 + 1.5 * IQR), 
    ' to the right, and','\n   ', sum(thrill < Q1 - 1.5 * IQR) , ' to the left.')
Outliers for the THRILL data using the 2 standard deviation rule:
    0  to the right, and 
    1  to the left.
Outliers for the THRILL data using the Box Plot method:
    0  to the right, and 
    0  to the left.

Outliers by Mean and Standard Deviation

The 2 standard deviation rule, typically used for small data sets, flags any value that lies more than two standard deviations from the mean. To implement it, we do the following.

cat('Outliers for the AGE data using the 2 standard deviation rule:\n   ',sum(age > mean(age) + 2 * sd(age)), 
    ' to the right, and','\n   ', sum(age < mean(age) - 2 * sd(age)) , ' to the left.')
Outliers for the AGE data using the 2 standard deviation rule:
    3  to the right, and 
    0  to the left.
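The counts say how many outliers there are but not which values they are. As a small sketch under stated assumptions (the `toy` vector is made up for illustration and the variable names are hypothetical), logical indexing with the same two cutoffs returns the flagged values themselves:

```r
# Made-up toy data for illustration only (not the personality data)
toy <- c(18, 19, 19, 20, 20, 21, 21, 22, 23, 45)

# Cutoffs for the 2 standard deviation rule
upper <- mean(toy) + 2 * sd(toy)
lower <- mean(toy) - 2 * sd(toy)

# Logical indexing keeps the values beyond either cutoff
sd_outliers <- toy[toy > upper | toy < lower]
print(sd_outliers)
```

Here the mean is 22.8 and the standard deviation is roughly 7.94, so the only value beyond the cutoffs is 45.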

Outliers Using 5-Number Summary

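The Box Plot method flags any value that falls outside two fences, which are built from the quartiles Q1 and Q3 and the interquartile range IQR = Q3 - Q1. We calculate the fences as follows: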

  • Upper Fence = Q3 + 1.5 * IQR

  • Lower Fence = Q1 - 1.5 * IQR
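As a worked illustration of the two fence formulas, the sketch below computes them for a made-up toy vector. The data and variable names are hypothetical. The interquartile range is stored as `iqr_val` rather than `IQR`, simply to avoid reusing the name of base R's `IQR()` function.

```r
# Made-up toy data for illustration only (not the personality data)
toy <- c(18, 19, 19, 20, 20, 21, 21, 22, 23, 45)

q1 <- quantile(toy, 0.25, names = FALSE)   # first quartile
q3 <- quantile(toy, 0.75, names = FALSE)   # third quartile
iqr_val <- q3 - q1                         # same value as IQR(toy)

upper_fence <- q3 + 1.5 * iqr_val
lower_fence <- q1 - 1.5 * iqr_val

cat('Q1 =', q1, ' Q3 =', q3, ' IQR =', iqr_val, '\n')
cat('Lower Fence =', lower_fence, ' Upper Fence =', upper_fence, '\n')
```

With R's default quantile method this gives Q1 = 19.25, Q3 = 21.75 and IQR = 2.5, so the fences sit at 15.5 and 25.5 and the value 45 lies above the upper fence.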

To implement the Box Plot method for outliers, we do the following.

Q1 <- quantile(age)[["25%"]] ; Q3 <- quantile(age)[["75%"]] ; IQR <- Q3 - Q1
## Now that we've calculated the needed values, we have:
cat('Outliers for the AGE data using the Box Plot method:\n   ',sum(age > Q3 + 1.5 * IQR), 
    ' to the right, and','\n   ', sum(age < Q1 - 1.5 * IQR) , ' to the left.')
Outliers for the AGE data using the Box Plot method:
    9  to the right, and 
    0  to the left.
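As a cross-check on a hand-rolled fence calculation, base R's `boxplot.stats()` reports the values a box plot would draw as outliers. This is a sketch on made-up toy data, not the personality data. Note that `boxplot.stats()` builds its fences from the hinges returned by `fivenum()`, which can differ slightly from the quartiles returned by `quantile()`, so the two approaches may disagree on borderline values.

```r
# Made-up toy data for illustration only (not the personality data)
toy <- c(18, 19, 19, 20, 20, 21, 21, 22, 23, 45)

# Values flagged by the box plot rule (default coef = 1.5)
box_out <- boxplot.stats(toy)$out
print(box_out)

# Hinges used by boxplot.stats() versus quartiles from quantile()
hinges <- fivenum(toy)[c(2, 4)]
quartiles <- quantile(toy, c(0.25, 0.75), names = FALSE)
print(hinges)
print(quartiles)
```

For this toy vector the hinges are 19 and 22 while the quartiles are 19.25 and 21.75. Both versions of the rule still flag only the value 45.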