Creating a Data Frame#

When data arrive as a printed table rather than as a file, we have to enter the values as vectors ourselves and then combine them into a data frame. The example below does this for a soda taste test.

Example: Soda#

A beverage manufacturer is testing 3 sweeteners for its zero-carb soda. Thirty taste-testers were given a 5-question survey about beverage quality after tasting the soda. Ten of the thirty tested each version of the new soda. The results are shown below, where higher scores indicate a more pleasant overall taste.

 A   B   C
13  12   7
17   8  19
19   6  15
11  16  14
20  12  10
15  14  16
18  10  18
 9  18  11
12   4  14
16  11  11

Creating 3 Vectors#

First, let’s enter each column in the table above as a vector.

a <- c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16)
b <- c(12, 8, 6, 16, 12, 14, 10, 18, 4, 11)
c <- c(7, 19, 15, 14, 10, 16, 18, 11, 14, 11)
data <- c(a,b,c)

The last line above combines the 3 vectors into a single numeric vector that will become the first column of our data frame.

Creating a Grouping Variable#

The next job is to create a single character vector that assigns each rating to a sweetener group. We need 10 A’s, 10 B’s, and 10 C’s, in that order, so that the labels line up with the values in data. Each letter must be in quotation marks so that R reads it as a text label rather than as the name of an object.

grp <- c('A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','C','C','C','C','C','C','C','C','C','C')
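Typing thirty labels by hand is error-prone. As a minimal sketch, the same vector can be built with rep() and its each argument. The name grp_alt is hypothetical and is used here only so that the original grp is not overwritten.

```r
# Build 10 A's, then 10 B's, then 10 C's with rep() instead of typing each label.
grp_alt <- rep(c('A', 'B', 'C'), each = 10)

length(grp_alt)   # 30 labels in total
table(grp_alt)    # 10 labels per group
```

If grp has already been entered by hand, identical(grp, grp_alt) is a quick way to confirm that no label was mistyped.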

Creating the Data Frame#

The data.frame() function takes each column as a named argument. Here, y holds the ratings and group holds the group labels.

The factor() function lets us specify that a vector should be treated as a categorical variable and used (in this case) as a group ID. We can even use numbers for the group assignments, because factor() reads the values in the vector as group IDs rather than as quantities.
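To illustrate the point about numeric group codes, here is a small sketch. The codes 1, 2, 3 and the names grp_num, grp_fac, and grp_lab are assumptions made for this illustration; they are not part of the soda example itself.

```r
# Hypothetical numeric group codes: 1, 2, 3 stand in for sweeteners A, B, C.
grp_num <- rep(1:3, each = 10)

# Without factor(), R treats the codes as ordinary numbers.
is.numeric(grp_num)          # TRUE

# factor() converts the codes into category labels (levels "1", "2", "3").
grp_fac <- factor(grp_num)
is.factor(grp_fac)           # TRUE
levels(grp_fac)              # "1" "2" "3"

# Optionally, attach readable labels to the numeric codes.
grp_lab <- factor(grp_num, levels = 1:3, labels = c('A', 'B', 'C'))
levels(grp_lab)              # "A" "B" "C"
```

Leaving numeric codes unconverted would make a model treat the group as a quantity, so wrapping them in factor() is the safer habit.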

df <- data.frame(y = data, group = factor(grp))
head(df, 11)
 y  group
13  A
17  A
19  A
11  A
20  A
15  A
18  A
 9  A
12  A
16  A
12  B

Completing the Example#

As the output above shows, the data frame df pairs each value with its proper group. Below, we run a one-way ANOVA on these data as a check that the data frame was built correctly.

anova <- aov(y ~ group, data = df)
summary(anova)
            Df Sum Sq Mean Sq F value Pr(>F)  
group        2   77.4   38.70   2.515 0.0996 .
Residuals   27  415.4   15.39                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
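A lighter-weight check of the data frame, sketched below, is to look at its structure, the group sizes, and the group means. The block rebuilds the soda data so that it runs on its own; group_means is a hypothetical name introduced for this illustration.

```r
# Rebuild the soda data frame so this block runs on its own.
a <- c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16)
b <- c(12, 8, 6, 16, 12, 14, 10, 18, 4, 11)
c <- c(7, 19, 15, 14, 10, 16, 18, 11, 14, 11)
df <- data.frame(y = c(a, b, c),
                 group = factor(rep(c('A', 'B', 'C'), each = 10)))

str(df)                                      # 30 obs. of 2 variables: numeric y, factor group
table(df$group)                              # 10 ratings in each group
group_means <- tapply(df$y, df$group, mean)  # mean rating per sweetener
group_means                                  # A = 15, B = 11.1, C = 13.5
```

The three means can be compared by hand against the columns of the original table.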

Example 2: Deck of Cards Data Frame#

Let’s create a deck of cards with values \(2,3,\cdots,14\) in four suits: S, H, D, C. Using the function rep(), we can create two columns for the eventual data frame:

  1. values 2:14 repeated 4 times.

  2. suits with each letter repeated 13 times.

values <- rep(2:14,4)
suits <- c(rep('S',13), rep('H',13), rep('D',13), rep('C',13))
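The two lines above rely on two different behaviors of rep(). As a brief sketch: repeating a whole vector uses times, while repeating each element in place uses each. The name suits_alt is hypothetical and is used so the original suits is left untouched.

```r
# times repeats the whole vector; each repeats every element in place.
rep(1:3, times = 2)            # 1 2 3 1 2 3
rep(c('S', 'H'), each = 3)     # "S" "S" "S" "H" "H" "H"

# The suits column can therefore be written in a single call.
suits_alt <- rep(c('S', 'H', 'D', 'C'), each = 13)
length(suits_alt)              # 52
```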

We join the two vectors created above into a data frame with a numeric column v (holding values) and a factor column suits (the grouping variable).

deck_df <- data.frame(v = values, suits = factor(suits))
head(deck_df,15)
 v  suits
 2  S
 3  S
 4  S
 5  S
 6  S
 7  S
 8  S
 9  S
10  S
11  S
12  S
13  S
14  S
 2  H
 3  H

We now have a deck of cards as a data frame, and we can draw cards from it either with the base R sample() function or with the sample.data.frame() function defined further below.

Draw from Deck using sample()#

Notice that this draw samples only the suits column, so it records which suit each card has but not its value.

x <- sample(deck_df$suits, 13)
sum( x  == 'H' )
x
2
C H S C D H S C S C S S S
Levels: 'C' 'D' 'H' 'S'
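Because sample() is random, each run gives a different hand. The sketch below fixes the seed so a draw can be repeated, and uses table() to count every suit at once. The seed value 1 is arbitrary and suit_counts is a hypothetical name; the block rebuilds the deck so that it runs on its own.

```r
# Rebuild the deck so this block runs on its own.
deck_df <- data.frame(v = rep(2:14, 4),
                      suits = factor(rep(c('S', 'H', 'D', 'C'), each = 13)))

set.seed(1)                        # arbitrary seed; any integer works
x <- sample(deck_df$suits, 13)     # 13 suits drawn without replacement
suit_counts <- table(x)            # count of each suit in the hand
suit_counts
sum(suit_counts)                   # always 13, the size of the hand
```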

Draw from Deck using sample.data.frame()#

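# Draw rows from a data frame. With the default arguments this is a simple
# random sample of `size` rows; `shuffled` names columns to permute separately.
sample.data.frame <- function(x, size, replace = FALSE, prob = NULL, groups=NULL, 
                              orig.ids = TRUE, fixed = names(x), shuffled = c(),
                              invisibly.return = NULL, ...) {
  # Default to sampling every row; return results of more than 50 rows invisibly
  if( missing(size) ) size = nrow(x)
  if( is.null(invisibly.return) ) invisibly.return = size>50 
  # Keep only column names that exist in x; a shuffled column is not also fixed
  shuffled <- intersect(shuffled, names(x))
  fixed <- setdiff(intersect(fixed, names(x)), shuffled)
  n <- nrow(x)
  ids <- 1:n
  groups <- eval( substitute(groups), x )
  # Choose the row numbers to keep
  newids <- sample(n, size, replace=replace, prob=prob, ...)
  origids <- ids[newids]
  result <- x[newids, , drop=FALSE]
  
  idsString <- as.character(origids)
  
  # Runs only when `shuffled` is non-empty. Note: base R's sample() has no
  # `groups` argument, so this branch needs a sample() that accepts one.
  for (column in shuffled) {
    cids <- sample(newids, groups=groups[newids])
    result[,column] <- x[cids,column]
    idsString <- paste(idsString, ".", cids, sep="")
  }
  
  # Order the columns and optionally record the original row ids
  result <-  result[ , union(fixed,shuffled), drop=FALSE]
  if (orig.ids) result$orig.id <- idsString
  
  
  if (invisibly.return) { return(invisible(result)) } else {return(result)}
}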

Draw a 5-Card Poker Hand#

x <- sample.data.frame(deck_df, 5, orig.ids = FALSE)
head(x,5)
     v  suits
29   4  D
23  11  H
13  14  S
10  11  S
25  13  H

Let’s count the number of Hearts in the hand.

sum(x$suits == 'H')
2
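For a plain draw with no shuffled columns, a similar hand can be sketched in base R by sampling row numbers and then subsetting the data frame. This is an illustrative alternative, not a replacement for the function above; idx and hand are hypothetical names, and the block rebuilds the deck so that it runs on its own.

```r
# Rebuild the deck so this block runs on its own.
deck_df <- data.frame(v = rep(2:14, 4),
                      suits = factor(rep(c('S', 'H', 'D', 'C'), each = 13)))

idx  <- sample(nrow(deck_df), 5)   # 5 distinct row numbers from 1 to 52
hand <- deck_df[idx, ]             # keep both columns for the chosen rows
hand
sum(hand$suits == 'H')             # number of Hearts in this hand
```

Sampling row numbers keeps each card's value and suit together, which sampling a single column does not.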

Draw 13-Card Hand, Count Number of Hearts#

x <- sample.data.frame(deck_df, 13, orig.ids = FALSE)
sum(x[,2] == 'H')
x[,2]
2
S D D S D C C C H C H C C
Levels: 'C' 'D' 'H' 'S'
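A single hand says little about how many Hearts to expect. As a hedged sketch, the draw can be repeated many times with replicate() to see the typical count. The 1000 repetitions, the seed value 2, and the name hearts are arbitrary choices made for this illustration; the block uses base R's sample() and rebuilds the deck so that it runs on its own.

```r
# Rebuild the deck so this block runs on its own.
deck_df <- data.frame(v = rep(2:14, 4),
                      suits = factor(rep(c('S', 'H', 'D', 'C'), each = 13)))

set.seed(2)                                    # arbitrary seed for repeatability
# Draw 1000 separate 13-card hands and count the Hearts in each one.
hearts <- replicate(1000, sum(sample(deck_df$suits, 13) == 'H'))

range(hearts)    # every count lies between 0 and 13
mean(hearts)     # should be close to 13 * 13/52 = 3.25
```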