Creating a Data Frame#
When data arrive as a printed table rather than as a file, we have to enter the values ourselves as vectors and then assemble them into a data frame. The example below walks through this process for a soda taste test.
Example: Soda#
A beverage manufacturer is testing 3 sweeteners for its zero-carb soda. Thirty taste-testers were each given a 5-question survey about beverage quality after tasting the soda, with ten of the thirty testing each version of the new soda. The results are shown below, where higher scores indicate a more pleasant overall taste.
A | B | C |
---|---|---|
13 | 12 | 7 |
17 | 8 | 19 |
19 | 6 | 15 |
11 | 16 | 14 |
20 | 12 | 10 |
15 | 14 | 16 |
18 | 10 | 18 |
9 | 18 | 11 |
12 | 4 | 14 |
16 | 11 | 11 |
Creating 3 Vectors#
First, let’s enter each column in the table above as a vector.
a <- c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16)
b <- c(12, 8, 6, 16, 12, 14, 10, 18, 4, 11)
c <- c(7, 19, 15, 14, 10, 16, 18, 11, 14, 11)
data <- c(a,b,c)
The last line above combines the 3 vectors into a single numeric vector that will become the first column of our data frame.
Creating a Grouping Variable#
The next job is to create a single text vector that assigns each rating value to a sweetener group. We thus need 10 A’s, 10 B’s and 10 C’s, in that order, and we must use quotation marks so that R reads them as text labels (group IDs) rather than as object names.
grp <- c('A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','C','C','C','C','C','C','C','C','C','C')
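Typing thirty quoted letters by hand is easy to get wrong. As a sketch of a shorter route, base R's rep() with its each argument builds the same vector. The name grp_short is used here only for illustration and is not part of the original example.

```r
# Repeat each label 10 times, in order: 10 A's, then 10 B's, then 10 C's
grp_short <- rep(c('A', 'B', 'C'), each = 10)

# Quick checks: 30 labels in total, 10 per group
length(grp_short)
table(grp_short)
```

Using each = 10 keeps the labels in blocks, which matches the order in which the three rating vectors were combined.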
Creating the Data Frame#
The data.frame() function is straightforward to use.
The factor() function tells R that a vector should be treated as a categorical variable and used (in this case) as a group ID. The group assignments can even be given as numbers, since factor() reads the values in the vector as group IDs rather than as quantities.
df = data.frame(y = data, group = factor(grp))
head(df, 11)
y | group |
---|---|
13 | A |
17 | A |
19 | A |
11 | A |
20 | A |
15 | A |
18 | A |
9 | A |
12 | A |
16 | A |
12 | B |
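To illustrate the remark about numeric group codes, here is a minimal sketch that builds the same data frame from the codes 1, 2 and 3 instead of letters. The names grp_num and df_num, and the mapping 1 = A, 2 = B, 3 = C, are assumptions made for this illustration.

```r
# Ratings from the soda example, entered in the order A, B, C
y <- c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16,
       12, 8, 6, 16, 12, 14, 10, 18, 4, 11,
       7, 19, 15, 14, 10, 16, 18, 11, 14, 11)

# Numeric group codes: 1 = A, 2 = B, 3 = C (a labeling assumed for this sketch)
grp_num <- rep(1:3, each = 10)

# factor() turns the codes into categories; labels = gives them readable names
df_num <- data.frame(y = y, group = factor(grp_num, labels = c('A', 'B', 'C')))

str(df_num)               # group is reported as a factor with 3 levels
is.numeric(grp_num)       # the raw codes are numbers ...
is.factor(df_num$group)   # ... but the data frame column is a factor
```

The call to factor() matters here: if the numeric codes were passed to a model unconverted, R would treat them as a quantitative predictor rather than as three separate groups.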
Completing the Example#
As the output above shows, the data frame df pairs each rating with the proper group. Below, we run a one-way ANOVA on these data to confirm that the data frame was built correctly.
# Use the data frame column 'group' rather than the workspace vector grp
anova <- aov(y ~ group, data = df)
summary(anova)
Df Sum Sq Mean Sq F value Pr(>F)
group 2 77.4 38.70 2.515 0.0996 .
Residuals 27 415.4 15.39
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
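As a cross-check on the ANOVA table, the sums of squares can be recomputed by hand from the group means. The sketch below rebuilds the data frame so that it runs on its own. The names ss_between, ss_within, f_stat and p_value are illustrative.

```r
# Rebuild the soda data frame (ratings in the order A, B, C)
df <- data.frame(
  y = c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16,
        12, 8, 6, 16, 12, 14, 10, 18, 4, 11,
        7, 19, 15, 14, 10, 16, 18, 11, 14, 11),
  group = factor(rep(c('A', 'B', 'C'), each = 10))
)

group_means <- tapply(df$y, df$group, mean)   # A = 15, B = 11.1, C = 13.5
grand_mean  <- mean(df$y)                     # 13.2
n_per_group <- as.vector(table(df$group))     # 10 ratings in each group

# Between-group and within-group sums of squares
ss_between <- sum(n_per_group * (group_means - grand_mean)^2)
ss_within  <- sum((df$y - ave(df$y, df$group))^2)   # ave() gives each row's group mean

# F statistic with 2 and 27 degrees of freedom, and its upper-tail p-value
f_stat  <- (ss_between / 2) / (ss_within / 27)
p_value <- pf(f_stat, df1 = 2, df2 = 27, lower.tail = FALSE)

c(ss_between = ss_between, ss_within = ss_within, F = f_stat, p = p_value)
```

The two sums of squares should agree with the Sum Sq column (77.4 and 415.4), and the last two values with the F value and Pr(>F) entries.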
Example 2: Deck of Cards Data Frame#
Let’s create a deck of cards with values \(2,3,\cdots,14\) in four suits: S, H, D, C. Using the function rep(), we can create two columns for the eventual data frame:
- values: 2:14 repeated 4 times.
- suits: each letter repeated 13 times.
values <- rep(2:14,4)
suits <- c(rep('S',13), rep('H',13), rep('D',13), rep('C',13))
We join the two columns created above into a data frame with a numeric variable (values) and a grouping variable (suits).
deck_df <- data.frame(v = values, suits = factor(suits))
head(deck_df,15)
v | suits |
---|---|
2 | S |
3 | S |
4 | S |
5 | S |
6 | S |
7 | S |
8 | S |
9 | S |
10 | S |
11 | S |
12 | S |
13 | S |
14 | S |
2 | H |
3 | H |
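Before drawing cards, it can help to confirm that the deck is complete. The sketch below rebuilds the deck, using rep() with each = 13 as a shorter way to write the suits column, and then runs a few checks. The name deck_check is used only for illustration.

```r
values <- rep(2:14, times = 4)                    # 2 through 14, four times over
suits  <- rep(c('S', 'H', 'D', 'C'), each = 13)   # 13 S's, 13 H's, 13 D's, 13 C's
deck_check <- data.frame(v = values, suits = factor(suits))

nrow(deck_check)              # a full deck has 52 rows
table(deck_check$suits)       # 13 cards in each suit
any(duplicated(deck_check))   # FALSE means every card appears exactly once
```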
We now have a deck of cards as a data frame, and we can draw cards with the base R sample() function or with the sample.data.frame() function defined further below.
Draw from Deck using sample()#
Notice that here we sample only the suits column, so we track which suits are drawn, not the card values.
x <- sample(deck_df$suits, 13)
sum( x == 'H' )
x
- C
- H
- S
- C
- D
- H
- S
- C
- S
- C
- S
- S
- S
Levels:
- 'C'
- 'D'
- 'H'
- 'S'
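Because sample() draws at random, each run gives a different hand. As a sketch, setting a seed first makes a draw repeatable, and table() counts all four suits at once. The seed value is arbitrary, and the deck is rebuilt here so the block runs on its own.

```r
# Rebuild the deck from the example above
deck_df <- data.frame(v = rep(2:14, 4),
                      suits = factor(rep(c('S', 'H', 'D', 'C'), each = 13)))

set.seed(1)                      # arbitrary seed, chosen only so the draw can be repeated
x <- sample(deck_df$suits, 13)   # 13 suits drawn without replacement

table(x)        # count of each suit in the hand
sum(x == 'H')   # number of hearts only
```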
Draw from Deck using sample.data.frame()#
sample.data.frame <- function(x, size, replace = FALSE, prob = NULL, groups=NULL,
orig.ids = TRUE, fixed = names(x), shuffled = c(),
invisibly.return = NULL, ...) {
if( missing(size) ) size = nrow(x)
if( is.null(invisibly.return) ) invisibly.return = size>50
shuffled <- intersect(shuffled, names(x))
fixed <- setdiff(intersect(fixed, names(x)), shuffled)
n <- nrow(x)
ids <- 1:n
groups <- eval( substitute(groups), x )
newids <- sample(n, size, replace=replace, prob=prob, ...)
origids <- ids[newids]
result <- x[newids, , drop=FALSE]
idsString <- as.character(origids)
for (column in shuffled) {
cids <- sample(newids, groups=groups[newids])
result[,column] <- x[cids,column]
idsString <- paste(idsString, ".", cids, sep="")
}
result <- result[ , union(fixed,shuffled), drop=FALSE]
if (orig.ids) result$orig.id <- idsString
if (invisibly.return) { return(invisible(result)) } else {return(result)}
}
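With its default arguments, the function above essentially draws row numbers with sample() and subsets the data frame by them. A minimal base R sketch of that core step follows. It leaves out the shuffling and orig.id bookkeeping, and the names row_ids and hand are illustrative.

```r
# Rebuild the deck from the example above
deck_df <- data.frame(v = rep(2:14, 4),
                      suits = factor(rep(c('S', 'H', 'D', 'C'), each = 13)))

set.seed(2)                                  # arbitrary seed for a repeatable draw
row_ids <- sample(nrow(deck_df), 5)          # 5 distinct row numbers from 1 to 52
hand    <- deck_df[row_ids, , drop = FALSE]  # keep whole rows: value and suit together

hand
```

Sampling row numbers, rather than a single column, keeps each card's value and suit together.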
Draw a 5-Card Poker Hand#
x <- sample.data.frame(deck_df, 5, orig.ids = FALSE)
head(x,5)
row | v | suits |
---|---|---|
29 | 4 | D |
23 | 11 | H |
13 | 14 | S |
10 | 11 | S |
25 | 13 | H |
Let’s count the number of Hearts in the hand.
sum(x$suits == 'H')
Draw 13-Card Hand, Count Number of Hearts#
x <- sample.data.frame(deck_df, 13, orig.ids = FALSE)
sum(x[,2] == 'H')
x[,2]
- S
- D
- D
- S
- D
- C
- C
- C
- H
- C
- H
- C
- C
Levels:
- 'C'
- 'D'
- 'H'
- 'S'
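A single hand gives just one count of hearts. As a hedged sketch of how the draw could be repeated many times, replicate() can collect the count from a large number of 13-card hands. The number of repetitions (10000) and the seed are arbitrary choices for this illustration.

```r
# Rebuild the deck from the example above
deck_df <- data.frame(v = rep(2:14, 4),
                      suits = factor(rep(c('S', 'H', 'D', 'C'), each = 13)))

set.seed(3)   # arbitrary seed for repeatable results

# Draw 13 suits without replacement and count the hearts, 10000 times
hearts <- replicate(10000, sum(sample(deck_df$suits, 13) == 'H'))

mean(hearts)                    # should be close to 13 * 13/52 = 3.25
table(hearts) / length(hearts)  # estimated proportion of hands with each count
```

Since a quarter of the deck is hearts, the long-run average number of hearts in a 13-card hand is 13/4 = 3.25, which gives a simple sanity check on the simulation.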