Subsetting a Data Frame#
What if we wish to compare the biological sexes on the narcissism variable? To do so, we need a separate subset of the data for each sex, female and male. Working with the females first, we see the following:
Subsetting Females Only#
We create the new data frame females, which is made up of the rows of the pers data frame that have F in the Sex (biological sex) column.
pers <- read.csv('https://faculty.ung.edu/rsinn/data/personality.csv')
females <- subset(pers, Sex == 'F')
head(females, 5)
| | Age | Yr | Sex | G21 | Corps | Res | Greek | VarsAth | Honor | GPA | ... | Perf | OCD | Play | Extro | Narc | HSAF | HSSE | HSAG | HSSD | PHS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 20 | 3 | F | N | N | 2 | Y | N | Y | 3.95 | ... | 105 | 3 | 172 | 16 | 11 | 46 | 52 | 26 | 33 | SE |
| 4 | 27 | 3 | F | Y | N | 3 | N | N | N | 2.84 | ... | 90 | 9 | 160 | 16 | 10 | 51 | 51 | 23 | 19 | SE |
| 6 | 22 | 3 | F | Y | N | 2 | Y | N | N | 2.63 | ... | 114 | 20 | 133 | 10 | 9 | 40 | 27 | 31 | 28 | AG |
| 8 | 20 | 3 | F | N | N | 1 | Y | N | N | 3.30 | ... | 142 | 17 | 168 | 16 | 9 | 55 | 45 | 24 | 29 | AF |
| 9 | 22 | 2 | F | Y | N | 1 | N | N | N | 3.02 | ... | 119 | 16 | 141 | 10 | 9 | 52 | 47 | 32 | 26 | SE |
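The row filter performed by subset() can also be written with bracket indexing. The sketch below uses a small toy data frame (hypothetical values standing in for pers, so that it runs without downloading the file) and checks that both forms pick out the same rows.

```r
# Toy stand-in for the pers data frame (hypothetical values, not the real data)
toy <- data.frame(
  Age  = c(21, 20, 22, 27, 18),
  Sex  = c('M', 'F', 'M', 'F', 'M'),
  Narc = c(11, 11, 11, 10, 9)
)

# subset() keeps the rows where the condition is TRUE
females_a <- subset(toy, Sex == 'F')

# Bracket form: a logical vector selects the rows, and the empty
# slot after the comma keeps every column
females_b <- toy[toy$Sex == 'F', ]

# Both approaches keep the same rows (original row labels are preserved)
rownames(females_a)
rownames(females_b)
```

One difference worth knowing: subset() drops rows where the condition evaluates to NA, whereas bracket indexing returns such rows filled with NA values.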
Subsetting Males Only#
We do exactly the same thing for males:
males <- subset(pers, Sex == 'M')
head(males, 5)
| | Age | Yr | Sex | G21 | Corps | Res | Greek | VarsAth | Honor | GPA | ... | Perf | OCD | Play | Extro | Narc | HSAF | HSSE | HSAG | HSSD | PHS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 21 | 2 | M | Y | Y | 1 | N | N | N | 3.23 | ... | 105 | 10 | 142 | 8 | 11 | 41 | 40 | 26 | 27 | SE |
| 3 | 22 | 3 | M | Y | N | 2 | N | N | N | 3.06 | ... | 73 | 1 | 134 | 15 | 11 | 48 | 42 | 44 | 29 | AG |
| 5 | 24 | 3 | M | Y | N | 2 | N | N | N | 2.39 | ... | 95 | 5 | 166 | 14 | 10 | 56 | 46 | 27 | 20 | AF |
| 7 | 18 | 1 | M | N | Y | 1 | N | N | N | 3.17 | ... | 49 | 20 | 114 | 10 | 9 | 56 | 45 | 41 | 38 | AG |
| 11 | 22 | 4 | M | Y | N | 2 | N | Y | N | 3.01 | ... | 104 | 11 | 143 | 10 | 9 | 51 | 54 | 46 | 21 | AG |
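With one data frame per sex, the narcissism scores can be pulled out and compared. The sketch below is only an illustration: the toy data are hypothetical, and the choice of a two-sample t-test is an assumption about which comparison one might run, not something prescribed here.

```r
# Toy stand-in for pers (hypothetical values, not the real data)
toy <- data.frame(
  Sex  = c('M', 'F', 'M', 'F', 'M', 'F', 'F', 'M'),
  Narc = c(11, 11, 11, 10, 10, 9, 9, 9)
)

# One subset per sex, as in the text
females <- subset(toy, Sex == 'F')
males   <- subset(toy, Sex == 'M')

# Pull the narcissism scores out of each subset as numeric vectors
narc_f <- females$Narc
narc_m <- males$Narc

# Group means
mean(narc_f)
mean(narc_m)

# Welch two-sample t-test (the default behavior of t.test)
result <- t.test(narc_f, narc_m)
result$p.value
```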
Subsetting a Subset#
Aggressive humor, a numeric variable, is used more often by males than by females and correlates negatively with age. To test this, we might wish to obtain a sample of young females, say younger than 20.
The quickest way to accomplish this is to subset the females data frame we already created:
youngfemales <- subset(females, Age < 20)
We can extract the HSAG aggressive humor values as a numeric vector ready for use in statistical tests:
aghumor_fem <- youngfemales[,'HSAG']
head(aghumor_fem,8)
- 32
- 23
- 24
- 29
- 18
- 36
- 22
- 15
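A single column can be extracted from a data frame in several equivalent ways. The sketch below, on hypothetical toy data, compares the bracket form used above with the $ and [[ ]] forms, and shows how drop = FALSE keeps the result as a data frame instead of a vector.

```r
# Toy stand-in for the females data frame (hypothetical values)
toy <- data.frame(
  Age  = c(19, 19, 18, 22),
  Sex  = c('F', 'F', 'F', 'F'),
  HSAG = c(32, 23, 18, 26)
)
young <- subset(toy, Age < 20)

# Three ways to get the HSAG column as a plain numeric vector
v1 <- young[, 'HSAG']
v2 <- young$HSAG
v3 <- young[['HSAG']]

identical(v1, v2)
identical(v1, v3)
is.numeric(v1)

# drop = FALSE keeps a one-column data frame instead of a vector
df1 <- young[, 'HSAG', drop = FALSE]
is.data.frame(df1)
```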
Double Subsetting in a Single Command#
We can use the AND operator & along with the OR operator | to conduct multiple subsetting operations in a single call of the function subset().
aghumor_fem <- subset(pers, Sex == 'F' & Age < 20)
## Selecting relevant columns for ease of inspection
aghumor_fem <- aghumor_fem[,c('Age','Sex', 'HSAG')]
head(aghumor_fem, 8)
| | Age | Sex | HSAG |
|---|---|---|---|
| 25 | 19 | F | 32 |
| 28 | 19 | F | 23 |
| 30 | 19 | F | 24 |
| 32 | 19 | F | 29 |
| 34 | 18 | F | 18 |
| 41 | 19 | F | 36 |
| 43 | 19 | F | 22 |
| 44 | 19 | F | 15 |
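The row filter and the column selection can also be combined in one step, because subset() accepts a select argument. A minimal sketch on hypothetical toy data:

```r
# Toy stand-in for pers (hypothetical values, not the real data)
toy <- data.frame(
  Age  = c(21, 19, 18, 27, 19),
  Yr   = c(2, 1, 1, 3, 1),
  Sex  = c('M', 'F', 'F', 'F', 'M'),
  HSAG = c(26, 32, 18, 23, 41)
)

# Filter rows with the condition and keep only three columns via select
young_f <- subset(toy, Sex == 'F' & Age < 20, select = c(Age, Sex, HSAG))
young_f
```

Column names inside select are written without quotes, just like the names used in the row condition.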
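If we wish to compare young females and young males on the variable HSAG, the subsetting could use the OR operator, which is a vertical line |. The savvy student will notice that easier ways exist to accomplish this specific task, but the example below illustrates two points:
- Three or more subsetting criteria may be used in one call of the function subset().
- The usage of the OR operator.
Note that R evaluates & before |, so the condition below, as written, keeps every female regardless of age together with the males younger than 20, which is why females aged 20 and older appear in the output. To restrict both sexes to those younger than 20, wrap the OR condition in parentheses: (Sex == 'F' | Sex == 'M') & Age < 20.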
aghumor <- subset(pers, Sex == 'F' | Sex == 'M' & Age < 20)
## Selecting relevant columns for ease of inspection
aghumor <- aghumor[,c('Age','Sex', 'HSAG')]
head(aghumor, 8)
| | Age | Sex | HSAG |
|---|---|---|---|
| 2 | 20 | F | 26 |
| 4 | 27 | F | 23 |
| 6 | 22 | F | 31 |
| 7 | 18 | M | 41 |
| 8 | 20 | F | 24 |
| 9 | 22 | F | 32 |
| 10 | 20 | F | 32 |
| 12 | 20 | F | 30 |
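How a condition that mixes & and | is grouped changes which rows are returned. The sketch below, on hypothetical toy data, runs the same condition with and without parentheses so that the two results can be compared.

```r
# Toy stand-in for pers (hypothetical values, not the real data)
toy <- data.frame(
  Age  = c(20, 18, 27, 22, 19, 18),
  Sex  = c('F', 'M', 'F', 'M', 'F', 'M'),
  HSAG = c(26, 41, 23, 44, 32, 38)
)

# Without parentheses, & is evaluated first:
# Sex == 'F' | (Sex == 'M' & Age < 20)
# so all females are kept, plus only the males under 20
no_parens <- subset(toy, Sex == 'F' | Sex == 'M' & Age < 20)

# With parentheses, the age filter applies to both sexes
with_parens <- subset(toy, (Sex == 'F' | Sex == 'M') & Age < 20)

nrow(no_parens)
nrow(with_parens)
range(with_parens$Age)
```

When in doubt about evaluation order, adding explicit parentheses costs nothing and makes the intent visible to the reader.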