Descriptive Statistics#
The most basic statistics are the values that describe the data in a vector, such as:
The Standard Descriptives.
Mean
Standard Deviation
Sample Size
The 5-Number Summary.
Min
Q1 or 25th percentile
Median
Q3 or 75th percentile
Max
We will also learn how the cat() function can print text labels alongside computed values, and look at some of the formatting options that cat() offers.
Standard Descriptives#
First, let’s import the personality data frame.
pers <- read.csv('https://faculty.ung.edu/rsinn/personality.csv')
For the column ‘Sleep’, let’s determine the standard descriptives:
s <- pers$Sleep
mean(s)
sd(s)
length(s)
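These three calls work directly on Sleep because the column has no missing values (mean(s) returns a number rather than NA). As a minimal sketch of what changes when a column does contain missing values, the example below uses a small hypothetical vector x, not data from personality.csv:

```r
# Hypothetical sleep values (NOT from personality.csv); NA marks a missing response
x <- c(6.5, 7, NA, 5, 8)

mean(x)                 # NA, because one value is missing
mean(x, na.rm = TRUE)   # mean of the 4 observed values
sd(x, na.rm = TRUE)     # standard deviation of the observed values
sum(!is.na(x))          # number of observed values; length(x) would count the NA too
```

The na.rm = TRUE argument tells mean() and sd() to drop missing values before computing.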
The cat() Function#
From this unformatted output, we see that these students average about 6.24 hours of sleep per night with a standard deviation of 2.13. With the cat() function, we can print some labels along with the calculations.
cat('Standard Descriptive Statistics for SLEEP: Mean = ',mean(s),' Standard Deviation = ',
sd(s),'Sample Size = ',length(s))
Standard Descriptive Statistics for SLEEP: Mean = 6.24031 Standard Deviation = 2.131516 Sample Size = 129
Hard Returns#
The cat() function takes a real step forward once we learn that \n inserts a hard return (a line break). This lets us format the output into something readable that looks professional.
cat('Standard Descriptive Statistics for SLEEP:\n Mean = ',mean(s),
'\n Standard Deviation = ', sd(s),
'\n Sample Size = ',length(s))
Standard Descriptive Statistics for SLEEP:
Mean = 6.24031
Standard Deviation = 2.131516
Sample Size = 129
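Another formatting option worth knowing is the sep argument. By default, cat() places a single space between its arguments, so a label that already ends in a space is followed by two. A minimal sketch, using a hypothetical value m in place of mean(s):

```r
m <- 6.24   # hypothetical value standing in for mean(s)

# Default sep = ' ': a space is added between every pair of arguments
cat('Mean = ', m, '\n')

# sep = '': the pieces are joined exactly as written
cat('Mean = ', m, '\n', sep = '')
```

With sep = '', any spacing you want has to appear inside the quoted labels themselves.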
Rounded Values within the cat() Function#
We can even use the round() function inside the cat() function:
cat('Standard Descriptive Statistics for SLEEP:\n Mean = ',round(mean(s),2),
'\n Standard Deviation = ', round(sd(s),2),
'\n Sample Size = ',length(s))
Standard Descriptive Statistics for SLEEP:
Mean = 6.24
Standard Deviation = 2.13
Sample Size = 129
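One thing to keep in mind is that round() returns a number, so trailing zeros are not printed. If a fixed number of decimal places is wanted, sprintf() is one alternative. The sketch below uses a hypothetical value v, not a statistic from the Sleep column:

```r
v <- 6.2   # hypothetical value

rounded <- round(v, 2)          # still a number: prints as 6.2
fixed   <- sprintf('%.2f', v)   # a character string: '6.20'

cat('round():   ', rounded, '\n')
cat('sprintf(): ', fixed, '\n')
```

Because sprintf() returns text rather than a number, it is best applied only at the point of printing.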
The 5-Number Summary#
The 5-Number Summary splits the data set into four sections (quarters), each holding about 25% of the observations. The values needed to do so are as follows:
Min
Q1
Median
Q3
Max
where Q1 and Q3 represent the 25th and 75th percentiles, respectively. The summary() command produces these values, along with the mean, as shown below.
summary(s)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.50 5.00 6.50 6.24 7.50 11.00
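If only the five numbers are needed, without the mean, base R's quantile() and fivenum() functions are two options. The sketch below uses a small hypothetical vector x rather than the Sleep column:

```r
x <- c(2, 4, 4, 5, 7, 9, 10)   # hypothetical data

# Default probs are 0, 0.25, 0.5, 0.75, 1: Min, Q1, Median, Q3, Max
q <- quantile(x)
q

# Request only Q1 and Q3
quantile(x, probs = c(0.25, 0.75))

# Tukey's five numbers, returned without labels
f <- fivenum(x)
f
```

For this vector the two functions agree, but fivenum() uses hinges rather than percentiles, so its Q1 and Q3 can differ slightly from those of quantile() on other data.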
Putting it All Together#
Let’s use all the techniques in this section to display the standard descriptives and the 5-Number Summary together in a single block of code.
cat('The standard descriptives for SLEEP\n Mean = ', round(mean(s),2),
'\n Standard Deviation = ', round(sd(s),2),
'\n Sample Size = ', length(s),
'\n\nThe 5-number summary for SLEEP')
summary(s)
The standard descriptives for SLEEP
Mean = 6.24
Standard Deviation = 2.13
Sample Size = 129
The 5-number summary for SLEEP
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.50 5.00 6.50 6.24 7.50 11.00
Notice that the summary() function is called outside the cat() function. We did this because, inside the cat() function, the labels for the values get left off. Coding it this way combines the formatting of the cat() function with the labeled layout that the summary() function prints.
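To see the dropped labels, and one possible way to keep them inside cat(), consider the sketch below. It uses a hypothetical vector x, and pasting names to values is just one workaround, not the approach used in this section:

```r
x <- c(2, 4, 4, 5, 7, 9, 10)   # hypothetical data
sm <- summary(x)

# Values only: cat() drops the names attached to the summary
cat(sm, '\n')

# Workaround: paste each name to its rounded value, then print one per line
labeled <- paste0(names(sm), ' = ', round(as.numeric(sm), 2))
cat(labeled, sep = '\n')
```

This prints one statistic per line, at the cost of losing the column layout that summary() provides.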