Helpful Code#
When using R, several tasks come up quite often without a detailed explanation being given each time. We have therefore created a reference page for the most common ones.
Extracting Columns from Data Frames as Vectors#
Generally, a column in a data frame contains the values for a specific variable. Thus, we often wish to extract a column from a data frame as a vector of values so that we can work with it.
pers <- read.csv('https://faculty.ung.edu/rsinn/data/personality.csv')
Option 1: $#
To extract the perfectionism scores column of data using the dollar sign method, we proceed as follows:
perfect <- pers$Perf
head(perfect, 5)
- 105
- 105
- 73
- 90
- 95
Option 2: [Row, Column] Format#
To extract the perfectionism scores column of data using the rows and columns of the data frame, we leave the row indicator empty and specify a column as shown:
perfect2 <- pers[ , 'Perf']
head(perfect2, 5)
- 105
- 105
- 73
- 90
- 95
The column may be described either by number (shown below) or by name (as shown above). The perfectionism scores are stored in column #27.
perfect3 <- pers[ , 27]
head(perfect3, 5)
- 105
- 105
- 73
- 90
- 95
All methods shown work properly and, as one can see, display identical results.
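Rather than comparing the printed values by eye, the three extraction methods can be checked programmatically. The following is a minimal sketch on a small toy data frame; the name toy and all of its values are made up for illustration and are not taken from the personality data.

```r
# Toy data frame standing in for the personality data (values are made up)
toy <- data.frame(
  Sex  = c('F', 'M', 'F', 'M', 'F'),
  Perf = c(101, 88, 120, 95, 110),
  Narc = c(4, 7, 3, 9, 5)
)

by_dollar <- toy$Perf        # Option 1: dollar sign
by_name   <- toy[ , 'Perf']  # Option 2: [row, column] with a column name
by_number <- toy[ , 2]       # Option 2: [row, column] with a column number

# identical() returns TRUE only when two objects match exactly
same_name   <- identical(by_dollar, by_name)
same_number <- identical(by_dollar, by_number)

# Each result is a plain vector, not a one-column data frame
is_vec <- is.vector(by_dollar)
```

Extracting by name is usually the safer habit, since a column number such as 27 silently points at a different variable if columns are added or reordered.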
Subsetting a Data Frame#
What if we wish to compare the biological sexes on the narcissism variable? Then, we need to create a subset of the data frame for each sex, male and female. Working with the females first, we see the following:
females <- subset(pers, Sex == 'F')
head(females, 5)
Row | Age | Yr | Sex | G21 | Corps | Res | Greek | VarsAth | Honor | GPA | ... | Perf | OCD | Play | Extro | Narc | HSAF | HSSE | HSAG | HSSD | PHS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 20 | 3 | F | N | N | 2 | Y | N | Y | 3.95 | ... | 105 | 3 | 172 | 16 | 11 | 46 | 52 | 26 | 33 | SE |
4 | 27 | 3 | F | Y | N | 3 | N | N | N | 2.84 | ... | 90 | 9 | 160 | 16 | 10 | 51 | 51 | 23 | 19 | SE |
6 | 22 | 3 | F | Y | N | 2 | Y | N | N | 2.63 | ... | 114 | 20 | 133 | 10 | 9 | 40 | 27 | 31 | 28 | AG |
8 | 20 | 3 | F | N | N | 1 | Y | N | N | 3.30 | ... | 142 | 17 | 168 | 16 | 9 | 55 | 45 | 24 | 29 | AF |
9 | 22 | 2 | F | Y | N | 1 | N | N | N | 3.02 | ... | 119 | 16 | 141 | 10 | 9 | 52 | 47 | 32 | 26 | SE |
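The male subset is built the same way, and the narcissism scores can then be pulled out of each subset as vectors for comparison. The sketch below again uses a made-up toy data frame (the name toy and its values are assumptions for illustration) and also shows logical indexing as an alternative to subset().

```r
# Toy data frame standing in for the personality data (values are made up)
toy <- data.frame(
  Sex  = c('F', 'M', 'F', 'M', 'F'),
  Perf = c(101, 88, 120, 95, 110),
  Narc = c(4, 7, 3, 9, 5)
)

# One subset per group
females <- subset(toy, Sex == 'F')
males   <- subset(toy, Sex == 'M')

# Logical indexing selects the same rows when Sex has no missing values
females2 <- toy[toy$Sex == 'F', ]

# Extract the variable of interest from each subset as a vector
narc_f <- females$Narc
narc_m <- males$Narc

# Compare the groups, for example by their means
mean_f <- mean(narc_f)
mean_m <- mean(narc_m)
```

One difference worth knowing: when the condition evaluates to NA for some rows, subset() drops those rows, while logical indexing returns them as rows filled with NA.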
Grid of Graphics#
We often wish to show 2 or more graphical displays for a specific data set while minimizing the space required to do so. We will use two functions to assist us:
layout()
matrix()
Warning
We use the option lcm() to specify the height of the graphical display in centimeters. Values between 5 and 12 generally work well, and some guesswork is typically required.
The QQ plot is made up of 2 different graphical pieces: the qqnorm plot and the qqline superimposed on top. Since we wish to display these 2 elements together, we surround them with { } so that they run as a single expression, and we assign the result to plt. Base R graphics are drawn as a side effect, so plt does not hold the plot itself.
For example, let’s display a histogram and a QQ plot for the narcissism variable of the personality data frame:
data <- pers[ , 'Narc']
# heights is named explicitly: the second positional argument of layout() is widths
layout(matrix(c(1, 2), ncol = 2), heights = lcm(8))
hist(data)
plt <- { qqnorm(data, main = 'QQ Plot: Narcissism') ; qqline(data) }
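The same side-by-side layout can be tried without downloading the data. This is a minimal sketch on simulated values (the name scores and the simulation parameters are assumptions for illustration); it names the heights argument explicitly, since the second positional argument of layout() is widths, and it resets the layout afterwards so later plots fill the whole device again.

```r
# Simulated scores standing in for a real variable (values are random)
set.seed(1)
scores <- rnorm(100, mean = 10, sd = 3)

# One row, two columns; the row is 8 cm tall.
# layout() invisibly returns the number of panels it defined.
n_panels <- layout(matrix(c(1, 2), ncol = 2), heights = lcm(8))

# Panel 1: histogram
hist(scores, main = 'Histogram: simulated scores')

# Panel 2: QQ plot with its reference line drawn on top
qqnorm(scores, main = 'QQ Plot: simulated scores')
qqline(scores)

# Return to a single-panel layout for subsequent plots
layout(1)
```

Because qqline() adds to the current panel rather than starting a new one, the two QQ calls share panel 2 without any extra grouping.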
