Helpful Code#
The very basics of R code are demonstrated below:
Importing data from a URL
Calculating the Standard Descriptives
Calculating the 5-Number Summary
Using the cat() function to print text and code output together, with some formatting options
Importing Data from a URL#
Many data sets are available on the internet in CSV format. The read.csv() function imports them directly:
A URL is its input.
From the URL, R downloads the CSV file.
From the CSV, R imports the file as a data frame.
A typical example is shown below:
pers <- read.csv('https://faculty.ung.edu/rsinn/data/personality.csv')
head(pers,3)
Age | Yr | Sex | G21 | Corps | Res | Greek | VarsAth | Honor | GPA | ... | Perf | OCD | Play | Extro | Narc | HSAF | HSSE | HSAG | HSSD | PHS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
21 | 2 | M | Y | Y | 1 | N | N | N | 3.23 | ... | 105 | 10 | 142 | 8 | 11 | 41 | 40 | 26 | 27 | SE |
20 | 3 | F | N | N | 2 | Y | N | Y | 3.95 | ... | 105 | 3 | 172 | 16 | 11 | 46 | 52 | 26 | 33 | SE |
22 | 3 | M | Y | N | 2 | N | N | N | 3.06 | ... | 73 | 1 | 134 | 15 | 11 | 48 | 42 | 44 | 29 | AG |
To work with the examples below, let’s grab the Age column as a single vector of numeric data.
age <- pers$Age
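The same import-and-extract pattern can be tried without a network connection. The sketch below is only an illustration: the three-row CSV text and the names toy and toy_age are made up here, and read.csv() receives the data through its text argument instead of a URL.

```r
# Hypothetical three-row CSV, made up for illustration (not the personality data)
csv_text <- 'Age,Sex,GPA
21,M,3.23
20,F,3.95
22,M,3.06'

# read.csv() returns a data frame whether the source is a URL, a file, or text
toy <- read.csv(text = csv_text)

# The $ operator pulls one column out of the data frame as a plain vector
toy_age <- toy$Age

class(toy)           # check the type of the imported object
is.numeric(toy_age)  # confirm the column was read as numbers
```

Checking the class of a column right after import is a cheap way to catch a numeric column that was accidentally read as text.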
Standard Descriptives#
The three most valuable statistics for nearly any data set are the mean, standard deviation and sample size. The functions we need are intuitively named:
mean()
sd()
length()
m <- mean(age) ; s <- sd(age) ; n <- length(age)
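Real data often contain missing values, which R stores as NA. The following is a minimal sketch, using a made-up vector x rather than the Age column, of how the three functions behave in that case. The names m_x, s_x, and n_x are chosen here for illustration.

```r
# Made-up ages with one missing value (illustration only)
x <- c(18, 20, NA, 22, 20)

mean(x)                       # a missing value propagates, so this is NA

m_x <- mean(x, na.rm = TRUE)  # drop NAs before averaging
s_x <- sd(x, na.rm = TRUE)    # the same option works for the standard deviation
n_x <- sum(!is.na(x))         # count only observed values; length(x) counts the NA too
```

Using sum(!is.na(x)) for the sample size keeps it consistent with the mean and standard deviation when na.rm = TRUE is in effect.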
5-Number Summary#
The 5-Number Summary of a numeric vector consists of the minimum, Q1, median, Q3, and maximum, where Q1 and Q3 are the 25th and 75th percentiles, respectively. The summary() function reports these five values along with the mean.
summary(age)
Min. 1st Qu. Median Mean 3rd Qu. Max.
17.00 19.00 20.00 20.81 21.00 50.00
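When only the five values are wanted, without the mean, base R also offers quantile() and fivenum(). The sketch below uses a made-up vector x. The two functions use slightly different rules for the quartiles, so they happen to agree here but can differ on other data.

```r
# Made-up data with one large value (illustration only)
x <- c(17, 18, 19, 20, 21, 22, 50)

# quantile() with no other arguments returns the 0%, 25%, 50%, 75%, 100% points
q <- quantile(x)

# fivenum() returns Tukey's five-number summary (min, lower hinge, median, upper hinge, max)
f <- fivenum(x)

# Individual percentiles can be requested through the probs argument
q1_q3 <- quantile(x, probs = c(0.25, 0.75))

print(q)
print(f)
```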
The cat() Function#
We use the cat() function to combine printed output with code output. We can format some text and the values found above:
cat('Standard descriptives for Age variable \nMean = ', m, '\nStd Dev = ', s, '\nSample Size = ',n)
Standard descriptives for Age variable
Mean = 20.81395
Std Dev = 3.639556
Sample Size = 129
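By default, cat() places a single space between its arguments and does not add a line break at the end. The sketch below illustrates the sep argument; the name m_demo is made up here and holds the mean copied by hand from the output above.

```r
m_demo <- 20.81395   # value copied from the output above (illustration only)

# Default behavior: arguments are separated by one space
cat('Mean =', m_demo, '\n')

# sep = '' removes the automatic spaces, so spacing is fully controlled by the strings
cat('Mean = ', m_demo, '\n', sep = '')
```

Ending a cat() call with '\n' keeps the next piece of output from starting on the same line.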
Formatting with the cat() Function#
We can round the mean and standard deviation for readability, and we can place the calculations directly inside the cat() call instead of storing them first.
Tip
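Long Coding Lines
Long lines of code can be separated by hard returns. R keeps reading onto the next line as long as the expression is incomplete, for example inside open parentheses or after a comma, and it ignores extra spaces between arguments. A return inside a quoted string is not ignored: it is kept as part of the string and shows up in the output. Indenting each continuation line by the same number of spaces keeps the code readable.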
# Each string is closed before the line break so no stray return is printed
cat('Standard descriptives for Age variable',
    '\nMean = ', round(mean(age),2),
    '\nStd Dev = ', round(sd(age),3),
    '\nSample Size = ',n)
Standard descriptives for Age variable
Mean = 20.81
Std Dev = 3.64
Sample Size = 129
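The standard deviation was rounded to three places yet printed as 3.64, because round() returns a number and R drops trailing zeros when printing it. When a fixed number of decimals matters, sprintf() or format() can build the text instead. The sketch below uses s_demo, a made-up name holding the standard deviation copied by hand from the earlier output.

```r
s_demo <- 3.639556   # value copied from the earlier output (illustration only)

# round() gives a number; the trailing zero is not shown when it is printed
cat('Std Dev = ', round(s_demo, 3), '\n')

# sprintf() gives text with exactly three decimals
fixed <- sprintf('%.3f', s_demo)
cat('Std Dev = ', fixed, '\n')

# format() with nsmall pads to at least three decimals
padded <- format(round(s_demo, 3), nsmall = 3)
cat('Std Dev = ', padded, '\n')
```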
Finally, let’s also print the 5-Number Summary below the standard descriptives. Note that summary() does not work well inside cat(): cat() prints only the bare values and drops the labels. We can still format the output nicely by ending the cat() call with a heading and then calling summary() on its own line. The code is shown below:
# Each string is closed before the line break so no stray return is printed
cat('Standard descriptives for Age variable',
    '\nMean = ', round(mean(age),2),
    '\nStd Dev = ', round(sd(age),3),
    '\nSample Size = ',n,
    '\n\nThe 5-Number Summary for Age variable\n\n')
summary(age)
Standard descriptives for Age variable
Mean = 20.81
Std Dev = 3.64
Sample Size = 129
The 5-Number Summary for Age variable
Min. 1st Qu. Median Mean 3rd Qu. Max.
17.00 19.00 20.00 20.81 21.00 50.00
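If everything should come from a single cat() call, one option is to pull the labels and values out of the summary and paste them together by hand. This is only a sketch under assumptions: age_demo is a made-up vector, and the names five, labs, vals, and lines are chosen here for illustration.

```r
# Made-up ages (illustration only, not the Age column)
age_demo <- c(17, 19, 20, 21, 50)

five <- summary(age_demo)

labs <- names(five)        # "Min.", "1st Qu.", "Median", ...
vals <- as.numeric(five)   # the bare numbers, which is all cat() would print

# Pair each label with its rounded value, one pair per line
lines <- paste0(labs, ': ', round(vals, 2), collapse = '\n')

cat('The 5-Number Summary for the made-up ages\n', lines, '\n', sep = '')
```

This trades the compact table layout of summary() for full control over the text, so calling summary() on its own line remains the simpler choice for quick checks.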