Subsetting a Data Frame

What if we wish to compare the biological sexes on the narcissism variable? We then need a separate subset of the data for each sex, male and female. We start with the females.

Subsetting Females Only

We create the new data frame females, which is made up of the rows of the pers data frame that have F in the Sex column.

pers <- read.csv('https://faculty.ung.edu/rsinn/data/personality.csv')
females <- subset(pers, Sex == 'F')
head(females, 5)
   Age Yr Sex G21 Corps Res Greek VarsAth Honor  GPA ... Perf OCD Play Extro Narc HSAF HSSE HSAG HSSD PHS
 2  20  3   F   N     N   2     Y       N     Y 3.95 ...  105   3  172    16   11   46   52   26   33  SE
 4  27  3   F   Y     N   3     N       N     N 2.84 ...   90   9  160    16   10   51   51   23   19  SE
 6  22  3   F   Y     N   2     Y       N     N 2.63 ...  114  20  133    10    9   40   27   31   28  AG
 8  20  3   F   N     N   1     Y       N     N 3.30 ...  142  17  168    16    9   55   45   24   29  AF
 9  22  2   F   Y     N   1     N       N     N 3.02 ...  119  16  141    10    9   52   47   32   26  SE
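
As a minimal sketch of what subset() is doing here, the block below builds a small made-up data frame (toy is a hypothetical stand-in for pers with only three of its columns, and the values are invented) and checks that subset() keeps the same rows as logical indexing with brackets.

```r
# Hypothetical toy data standing in for pers (values are made up)
toy <- data.frame(
  Age  = c(21, 20, 22, 27, 24),
  Sex  = c('M', 'F', 'M', 'F', 'M'),
  Narc = c(11, 11, 11, 10, 10)
)

# subset() evaluates the condition using the columns of the data frame
toy_f1 <- subset(toy, Sex == 'F')

# Logical indexing with brackets selects the same rows
toy_f2 <- toy[toy$Sex == 'F', ]

all.equal(toy_f1, toy_f2)   # compare the two results
rownames(toy_f1)            # original row labels are kept
```

The row labels of the result are the original row numbers, which is why the females output starts at row 2 rather than row 1. One difference worth knowing: subset() drops rows where the condition is NA, while bracket indexing returns them as rows of NA.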

Subsetting Males Only

We do exactly the same thing for males:

males <- subset(pers, Sex == 'M')
head(males, 5)
   Age Yr Sex G21 Corps Res Greek VarsAth Honor  GPA ... Perf OCD Play Extro Narc HSAF HSSE HSAG HSSD PHS
 1  21  2   M   Y     Y   1     N       N     N 3.23 ...  105  10  142     8   11   41   40   26   27  SE
 3  22  3   M   Y     N   2     N       N     N 3.06 ...   73   1  134    15   11   48   42   44   29  AG
 5  24  3   M   Y     N   2     N       N     N 2.39 ...   95   5  166    14   10   56   46   27   20  AF
 7  18  1   M   N     Y   1     N       N     N 3.17 ...   49  20  114    10    9   56   45   41   38  AG
11  22  4   M   Y     N   2     N       Y     N 3.01 ...  104  11  143    10    9   51   54   46   21  AG
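
With both subsets in hand, the comparison on narcissism that motivated this section might look like the sketch below. It uses a small made-up data frame (toy, toy_females and toy_males are hypothetical names, and the values are invented), and the Welch t-test is only one possible choice of test, not necessarily the one used later in this text.

```r
# Hypothetical toy data standing in for pers (values are made up)
toy <- data.frame(
  Sex  = c('F', 'F', 'F', 'F', 'M', 'M', 'M', 'M'),
  Narc = c(11, 10, 9, 9, 11, 11, 10, 9)
)

toy_females <- subset(toy, Sex == 'F')
toy_males   <- subset(toy, Sex == 'M')

# Group means of the narcissism score
mean_f <- mean(toy_females$Narc)
mean_m <- mean(toy_males$Narc)

# Two-sample t-test (Welch by default) comparing the two groups
result <- t.test(toy_females$Narc, toy_males$Narc)
result$p.value
```

With the real data, the same calls would be made on females$Narc and males$Narc.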

Subsetting a Subset

Aggressive humor, measured by the numeric variable HSAG, is used more often by males than by females and correlates negatively with age. To test this, we might wish to obtain a sample of young females, say those younger than 20.

The quickest way to accomplish this is to use the females data frame we already created by subsetting:

youngfemales <- subset(females, Age < 20)

We can extract the HSAG aggressive humor values as a numeric vector ready for use in statistical tests:

aghumor_fem <- youngfemales[,'HSAG']
head(aghumor_fem,8)
[1] 32 23 24 29 18 36 22 15
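
The bracket form above is one of several equivalent ways to pull a single column out as a vector. The sketch below compares them on a small made-up data frame (toy is hypothetical and the values are invented) and shows the one form that keeps a data frame instead.

```r
# Hypothetical toy data (values are made up)
toy <- data.frame(
  Age  = c(19, 19, 18, 22),
  HSAG = c(32, 23, 18, 31)
)

v1 <- toy[, 'HSAG']    # row-and-column indexing; a single column drops to a vector
v2 <- toy$HSAG         # dollar-sign extraction
v3 <- toy[['HSAG']]    # double-bracket extraction
d  <- toy['HSAG']      # single brackets without the comma keep a data frame

is.numeric(v1)
identical(v1, v2) && identical(v2, v3)
is.data.frame(d)
```

Functions that expect a numeric vector, such as mean(), need one of the first three forms.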

Double Subsetting in a Single Command

We can use the AND operator & and the OR operator | to apply multiple subsetting criteria in a single call of the function subset(). The first example uses & to keep only the females younger than 20:

aghumor_fem <- subset(pers, Sex == 'F' & Age < 20)

## Selecting relevant columns for ease of inspection
aghumor_fem <- aghumor_fem[,c('Age','Sex', 'HSAG')]

head(aghumor_fem, 8)
   Age Sex HSAG
25  19   F   32
28  19   F   23
30  19   F   24
32  19   F   29
34  18   F   18
41  19   F   36
43  19   F   22
44  19   F   15
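
The row filter and the column selection can also be combined in one call, since subset() accepts a select argument for data frames. The sketch below shows this on a small made-up data frame (toy and young_f are hypothetical names and the values are invented).

```r
# Hypothetical toy data standing in for pers (values are made up)
toy <- data.frame(
  Age  = c(21, 19, 18, 27, 19),
  Sex  = c('M', 'F', 'F', 'F', 'M'),
  GPA  = c(3.23, 3.10, 2.95, 2.84, 3.40),
  HSAG = c(26, 32, 18, 23, 41)
)

# Keep females under 20, and only the three columns of interest
young_f <- subset(toy, Sex == 'F' & Age < 20, select = c(Age, Sex, HSAG))
young_f
```

Inside select, column names are written without quotes, just as in the row condition.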

If we wish to compare young females and young males on the variable HSAG, the subsetting could use the OR operator, which is a vertical bar |. The savvy student will notice that easier ways exist to accomplish this specific task, but the example below illustrates:

  • Three or more subsetting criteria may be used in one call of the function subset(), and

  • The usage of the OR operator.

aghumor <- subset(pers, Sex == 'F' | Sex == 'M' & Age < 20)

## Selecting relevant columns for ease of inspection
aghumor <- aghumor[,c('Age','Sex', 'HSAG')]

head(aghumor, 8)
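   Age Sex HSAG
 2  20   F   26
 4  27   F   23
 6  22   F   31
 7  18   M   41
 8  20   F   24
 9  22   F   32
10  20   F   32
12  20   F   30

Notice that females aged 20 and older appear in this output. R evaluates & before |, so the condition is read as Sex == 'F' | (Sex == 'M' & Age < 20), which keeps all females plus the males under 20. Restricting both sexes to those under 20 requires parentheses around the OR: (Sex == 'F' | Sex == 'M') & Age < 20.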
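
As a quick check of how grouping affects a condition that mixes & and | (R evaluates & before |), the sketch below runs both versions on a small made-up data frame. The names toy, loose, strict and strict2 are hypothetical and the values are invented.

```r
# Hypothetical toy data standing in for pers (values are made up)
toy <- data.frame(
  Age  = c(18, 20, 19, 27, 22),
  Sex  = c('M', 'F', 'F', 'F', 'M'),
  HSAG = c(41, 26, 32, 23, 44)
)

# Without parentheses: all females, plus males under 20
loose <- subset(toy, Sex == 'F' | Sex == 'M' & Age < 20)

# With parentheses: either sex, but only those under 20
strict <- subset(toy, (Sex == 'F' | Sex == 'M') & Age < 20)

# %in% expresses the same OR over values more compactly
strict2 <- subset(toy, Sex %in% c('F', 'M') & Age < 20)

loose$Age    # 18 20 19 27
strict$Age   # 18 19
```

When every row is either F or M, as in this toy data, the sex condition in the parenthesized version is always TRUE and Age < 20 alone gives the same rows, which is one of the easier ways mentioned above.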