Creating a Data Frame

When data arrive as a printed table rather than as a file, we have to enter the values ourselves as vectors and then assemble them into a data frame. The soda taste test below walks through the process.

Example: Soda

A beverage manufacturer is testing 3 sweeteners (A, B, and C) for its zero-carb soda. Thirty taste-testers each completed a 5-question survey about beverage quality after tasting the soda, with ten testers assigned to each version. The results are shown below; higher scores indicate a more pleasant overall taste.

 A   B   C
13  12   7
17   8  19
19   6  15
11  16  14
20  12  10
15  14  16
18  10  18
 9  18  11
12   4  14
16  11  11

Creating 3 Vectors

First, let’s enter each column in the table above as a vector.

a <- c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16)
b <- c(12, 8, 6, 16, 12, 14, 10, 18, 4, 11)
c <- c(7, 19, 15, 14, 10, 16, 18, 11, 14, 11)
data <- c(a,b,c)

The last line above combines the 3 vectors into a single numeric vector that will become the first column of our data frame.
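As a quick check, offered here as a sketch rather than a required step, we can confirm that c() keeps the values in A, B, C order. That order matters once the group labels are attached. The vectors are re-entered so the block runs on its own.

```r
a <- c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16)
b <- c(12, 8, 6, 16, 12, 14, 10, 18, 4, 11)
c <- c(7, 19, 15, 14, 10, 16, 18, 11, 14, 11)
data <- c(a, b, c)

length(data)   # 30 values in total
data[1:10]     # positions 1-10 hold the A scores
data[11:20]    # positions 11-20 hold the B scores
data[21:30]    # positions 21-30 hold the C scores
```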

Creating a Grouping Variable

The next job is to create a single character vector that assigns each rating to a sweetener group. Because data holds the A scores first, then the B scores, then the C scores, we need 10 A’s, 10 B’s, and 10 C’s in that order. The letters must be in quotation marks so that R reads them as text labels rather than as the names of objects.

grp <- c('A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','C','C','C','C','C','C','C','C','C','C')
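Typing thirty labels by hand is error-prone. One possible shortcut, shown as a sketch and not as the only way to do it, uses the base R function rep() with its each argument to repeat every label ten times in order.

```r
# Repeat each label 10 times: 10 A's, then 10 B's, then 10 C's
grp <- rep(c('A', 'B', 'C'), each = 10)

length(grp)   # 30 labels, one per rating
table(grp)    # counts per label
```

The result is the same character vector as the one typed out above, so it can be passed to factor() in exactly the same way.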

Creating the Data Frame

The data.frame() function is straightforward to use: each argument names a column and supplies the vector that fills it.

The factor() function tells R that a vector should be treated as a categorical variable and used (in this case) as a group ID. We can even use numbers to specify the group assignments, because factor() reads the values in the vector as group IDs rather than as quantities.

df = data.frame(y = data, group = factor(grp))
head(df, 11)
 y  group
13  A
17  A
19  A
11  A
20  A
15  A
18  A
 9  A
12  A
16  A
12  B
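To illustrate the remark about numeric group assignments, here is a minimal sketch that codes the sweeteners as 1, 2, and 3 instead of letters. The names grp_num and df_num are made up for this illustration; the data are re-entered so the block runs on its own.

```r
a <- c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16)
b <- c(12, 8, 6, 16, 12, 14, 10, 18, 4, 11)
c <- c(7, 19, 15, 14, 10, 16, 18, 11, 14, 11)
data <- c(a, b, c)

# Hypothetical numeric coding: 1 = A, 2 = B, 3 = C
grp_num <- rep(1:3, each = 10)

# factor() turns the numbers into category labels
df_num <- data.frame(y = data, group = factor(grp_num))

is.factor(df_num$group)   # TRUE
levels(df_num$group)      # "1" "2" "3"

# With group stored as a factor, the ANOVA has 2 Df for the 3 groups
fit_num <- aov(y ~ group, data = df_num)
summary(fit_num)
```

If factor() were left out, R would treat the codes as a numeric predictor and fit a straight-line trend with 1 Df instead of comparing three groups.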

Completing the Example

As the output above shows, the data frame df pairs each value with the proper group; the 11th row is the first B score. Below, we run the ANOVA on these data to verify that the data frame has been set up properly.

# Use the column name group so the model reads the factor stored in df
anova <- aov(y ~ group, data = df)
summary(anova)
            Df Sum Sq Mean Sq F value Pr(>F)  
group        2   77.4   38.70   2.515 0.0996 .
Residuals   27  415.4   15.39                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
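Beyond the ANOVA table, a direct way to confirm that the data frame matches the original table is to count the rows in each group and compare the group means. The sketch below rebuilds df from scratch so it runs on its own; it uses only base R functions.

```r
a <- c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16)
b <- c(12, 8, 6, 16, 12, 14, 10, 18, 4, 11)
c <- c(7, 19, 15, 14, 10, 16, 18, 11, 14, 11)
data <- c(a, b, c)
grp <- rep(c('A', 'B', 'C'), each = 10)
df <- data.frame(y = data, group = factor(grp))

# Each sweetener should have exactly 10 ratings
group_counts <- table(df$group)
group_counts

# Group means: A = 15.0, B = 11.1, C = 13.5
group_means <- tapply(df$y, df$group, mean)
group_means
```

The means agree with the column averages of the original table, so each rating has been attached to the correct sweetener.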