Descriptive Statistics

The most basic statistics are the values that describe the data in a vector. This section covers:

  1. The Standard Descriptives.

    • Mean

    • Standard Deviation

    • Sample Size

  2. The 5-Number Summary.

    • Min

    • Q1 or 25th percentile

    • Median

    • Q3 or 75th percentile

    • Max

  3. The cat() function, which prints text labels alongside computed values.

    • Formatting options for cat(), such as hard returns and rounding

Standard Descriptives

First, let’s import the personality data frame.

pers <- read.csv('https://faculty.ung.edu/rsinn/personality.csv')

For the column ‘Sleep’, let’s determine the standard descriptives:

s <- pers$Sleep

mean(s)
sd(s)
length(s)
6.24031007751938
2.13151645621158
129
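To make clear what mean() and sd() compute, the short sketch below reproduces both by hand. The vector x holds made-up values for illustration (it is not taken from the personality data), and the names n, x_bar and s_x are hypothetical. Note that sd() returns the sample standard deviation, which divides by n - 1 rather than n.

```r
# Made-up values for illustration; not taken from the personality data.
x <- c(6, 7.5, 5, 8, 6.5)

n <- length(x)                             # sample size
x_bar <- sum(x) / n                        # mean, computed by hand
s_x <- sqrt(sum((x - x_bar)^2) / (n - 1))  # sample standard deviation (n - 1 denominator)

# Compare the hand calculations with the built-in functions.
all.equal(x_bar, mean(x))
all.equal(s_x, sd(x))
```

Both comparisons should agree, which confirms that the built-in functions follow these formulas.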

The cat() Function

From this unformatted output, we see that these students average about 6.24 hours of sleep per night with a standard deviation of 2.13. With the cat() function, we can print some labels along with the calculations.

cat('Standard Descriptive Statistics for SLEEP:   Mean = ',mean(s),'   Standard Deviation = ',
    sd(s),'Sample Size = ',length(s))
Standard Descriptive Statistics for SLEEP:   Mean =  6.24031    Standard Deviation =  2.131516 Sample Size =  129

Hard Returns

The cat() function becomes far more useful once we learn that the escape sequence \n inserts a hard return (a new line). This lets us format the output into something readable that looks professional.

cat('Standard Descriptive Statistics for SLEEP:\n   Mean = ',mean(s),
    '\n   Standard Deviation = ', sd(s),
    '\n   Sample Size = ',length(s))
Standard Descriptive Statistics for SLEEP:
   Mean =  6.24031 
   Standard Deviation =  2.131516 
   Sample Size =  129
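The output above shows two spaces after each equals sign. That happens because cat() places a single space between its arguments by default, on top of the space already typed inside the label. A minimal sketch of the sep argument, which controls that separator, follows. The vector x holds made-up values for illustration.

```r
# Made-up values for illustration; not taken from the personality data.
x <- c(6, 7.5, 5, 8, 6.5)

# Default separator: cat() adds one space between arguments,
# so the label's own trailing space is doubled.
cat('Mean = ', mean(x), '\n')

# sep = '' joins the pieces exactly as written, with no extra spaces.
cat('Mean = ', mean(x), '\n', sep = '')
```

With sep = '', any spacing has to be typed into the labels themselves, which gives full control over the layout.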

Rounded Values within the cat() Function

We can even use the round() function inside the cat() function:

cat('Standard Descriptive Statistics for SLEEP:\n   Mean = ',round(mean(s),2),
    '\n   Standard Deviation = ', round(sd(s),2),
    '\n   Sample Size = ',length(s))
Standard Descriptive Statistics for SLEEP:
   Mean =  6.24 
   Standard Deviation =  2.13 
   Sample Size =  129
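One limitation of round() is that it drops trailing zeros, so a value such as 6.60 prints as 6.6. As a possible alternative (not used in the original lesson), base R's sprintf() returns a string with a fixed number of decimal places. The sketch below uses made-up values in x.

```r
# Made-up values for illustration; not taken from the personality data.
x <- c(6, 7.5, 5, 8, 6.5)

# round() returns a number, so the trailing zero is not shown.
cat('Mean = ', round(mean(x), 2), '\n')

# sprintf('%.2f', ...) returns text with exactly two decimal places.
cat('Mean = ', sprintf('%.2f', mean(x)), '\n')
```

This can help when several values need to line up in a report with the same number of decimals.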

The 5-Number Summary

The 5-Number Summary splits the data set into four sections, each containing roughly a quarter of the observations. The values needed to do so are as follows:

  • Min

  • Q1

  • Median

  • Q3

  • Max

where Q1 and Q3 represent the 25th and 75th percentiles, respectively. The summary() command produces these values, along with the mean, as shown below.

summary(s)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.50    5.00    6.50    6.24    7.50   11.00 
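When only some of these values are needed, base R also offers quantile(), min(), median(), max() and fivenum(). The sketch below shows them on a made-up vector x, chosen for illustration only. For this small example the results agree, though fivenum() uses Tukey's hinges, which can differ slightly from the quartiles of quantile() on other data sets.

```r
# Made-up values for illustration; not taken from the personality data.
x <- c(6, 7.5, 5, 8, 6.5)

# All five values at once: 0%, 25%, 50%, 75% and 100%.
five <- quantile(x)
five

# Just Q1 and Q3.
quantile(x, probs = c(0.25, 0.75))

# Min, median and max individually.
c(min(x), median(x), max(x))

# Tukey's five-number summary, returned without labels.
fivenum(x)
```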

Putting it All Together

Let’s use all the techniques in this section to display the standard descriptives and the 5-Number Summary from a single block of code.

cat('The standard descriptives for SLEEP\n   Mean = ', round(mean(s),2),
    '\n   Standard Deviation = ', round(sd(s),2),
    '\n   Sample Size = ', length(s),
    '\n\nThe 5-number summary for SLEEP')
summary(s)
The standard descriptives for SLEEP
   Mean =  6.24 
   Standard Deviation =  2.13 
   Sample Size =  129 

The 5-number summary for SLEEP
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.50    5.00    6.50    6.24    7.50   11.00 

Notice that the summary() function sits outside the cat() function. Inside cat(), the values would print without their labels (Min., 1st Qu., and so on). Coding it this way combines the custom formatting of cat() with the labeled layout that summary() provides.
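To reuse this report for other columns, the same pattern could be wrapped in a function. The sketch below is one possible way to do so; the function name describe_vector, its arguments and the vector hours are hypothetical and not part of the original lesson. Inside a function, results are not printed automatically, so summary() needs an explicit print() call.

```r
# Hypothetical helper, written for illustration only.
describe_vector <- function(x, label) {
  cat('The standard descriptives for ', label,
      '\n   Mean = ', round(mean(x), 2),
      '\n   Standard Deviation = ', round(sd(x), 2),
      '\n   Sample Size = ', length(x),
      '\n\nThe 5-number summary for ', label, '\n', sep = '')
  # Auto-printing does not happen inside a function, so call print() explicitly.
  print(summary(x))
}

# Made-up values for illustration; not taken from the personality data.
hours <- c(6, 7.5, 5, 8, 6.5)
describe_vector(hours, 'SLEEP')
```

With the personality data loaded, a call such as describe_vector(pers$Sleep, 'SLEEP') should produce a report like the one above.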