Creating a Data Frame
When we receive the data as a table, we have to create our own vectors and data frame. The example below walks through this process for a soda taste test.
Example: Soda
A beverage manufacturer is testing 3 sweeteners for its zero-carb soda. Thirty taste-testers were given a 5-question survey about beverage quality after tasting the soda. Ten of the thirty tested each version of the new soda. The results are shown below, where higher scores indicate a more pleasant overall taste.
A | B | C |
---|---|---|
13 | 12 | 7 |
17 | 8 | 19 |
19 | 6 | 15 |
11 | 16 | 14 |
20 | 12 | 10 |
15 | 14 | 16 |
18 | 10 | 18 |
9 | 18 | 11 |
12 | 4 | 14 |
16 | 11 | 11 |
Creating 3 Vectors
First, let’s enter each column in the table above as a vector.
a <- c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16)
b <- c(12, 8, 6, 16, 12, 14, 10, 18, 4, 11)
c <- c(7, 19, 15, 14, 10, 16, 18, 11, 14, 11)
data <- c(a,b,c)
The last line above combines the 3 vectors into a single numeric vector that will become the first column of our data frame.
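As a quick sanity check on the combined vector, the sketch below confirms that it holds all 30 ratings and that the values from b sit in positions 11 through 20. The check itself is an addition for illustration and is not part of the original example.

```r
# Re-enter the three columns so this block runs on its own
a <- c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16)
b <- c(12, 8, 6, 16, 12, 14, 10, 18, 4, 11)
c <- c(7, 19, 15, 14, 10, 16, 18, 11, 14, 11)
data <- c(a, b, c)

# The combined vector should hold 10 + 10 + 10 = 30 values
n_values <- length(data)
n_values

# Positions 11 to 20 should be exactly the ratings for sweetener B
b_in_place <- identical(data[11:20], b)
b_in_place
```

Because c() concatenates its arguments in the order given, the grouping vector created next has to follow the same A, B, C order.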
Creating a Grouping Variable
The next step is to create a single character vector that assigns each rating to a sweetener group. We need 10 A’s, 10 B’s, and 10 C’s, in that order, to match the order of the values in data. The letters must be in quotation marks so that R reads them as text labels rather than as object names.
grp <- c('A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','C','C','C','C','C','C','C','C','C','C')
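Typing out 30 labels by hand is error-prone. As a sketch of a shorter alternative, base R’s rep() function with the each argument builds the same vector. The name grp_rep is hypothetical and is used here only to keep it distinct from grp.

```r
# Repeat each label 10 times, in the order A, B, C
grp_rep <- rep(c('A', 'B', 'C'), each = 10)

# Count how many times each label appears
label_counts <- table(grp_rep)
label_counts
```

Note that each = 10 repeats every label ten times before moving on to the next one, whereas times = 10 would cycle through A, B, C ten times and would not match the order of the ratings.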
Creating the Data Frame
The data.frame() function is straightforward to use.
The factor() function lets us specify that a vector should be treated as a categorical variable and used (in this case) as a group ID. We can even use numbers to specify the group assignments, because factor() will read the values in the vector as group IDs rather than as quantities.
df <- data.frame(y = data, group = factor(grp))
head(df, 11)
y | group |
---|---|
13 | A |
17 | A |
19 | A |
11 | A |
20 | A |
15 | A |
18 | A |
9 | A |
12 | A |
16 | A |
12 | B |
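The remark above about numeric group IDs can be sketched as follows. The names code and df_num are hypothetical, and attaching the letters through the labels argument is one option among several, not necessarily what the original author intended.

```r
# Ratings in the order A, B, C (same values as in the example)
data <- c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16,
          12, 8, 6, 16, 12, 14, 10, 18, 4, 11,
          7, 19, 15, 14, 10, 16, 18, 11, 14, 11)

# Numeric group codes: ten 1's, ten 2's, ten 3's
code <- rep(1:3, each = 10)

# factor() turns the codes into categories; labels maps 1, 2, 3 to A, B, C
df_num <- data.frame(y = data, group = factor(code, labels = c('A', 'B', 'C')))

str(df_num)            # y is numeric, group is a factor with 3 levels
levels(df_num$group)   # the group labels
table(df_num$group)    # number of ratings per group
```

Without factor(), a column of 1’s, 2’s, and 3’s would be treated as a numeric predictor, and a model fit to it would estimate a single slope instead of comparing three group means.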
Completing the Example
The output above shows that the data frame df pairs each rating with the correct group. Below, we run a one-way ANOVA on these data to confirm that the data frame was built properly.
anova <- aov(y ~ group, data = df)
summary(anova)
Df Sum Sq Mean Sq F value Pr(>F)
group 2 77.4 38.70 2.515 0.0996 .
Residuals 27 415.4 15.39
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
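Another way to confirm that the values landed in the right groups is to compare the group means in the data frame with the column means of the original table. The sketch below rebuilds the data frame so it runs on its own. The names group_means, fit, and f_value are hypothetical.

```r
# Rebuild the data frame from the example
df <- data.frame(
  y = c(13, 17, 19, 11, 20, 15, 18, 9, 12, 16,
        12, 8, 6, 16, 12, 14, 10, 18, 4, 11,
        7, 19, 15, 14, 10, 16, 18, 11, 14, 11),
  group = factor(rep(c('A', 'B', 'C'), each = 10))
)

# Mean rating per sweetener; these should match the table's column means
group_means <- tapply(df$y, df$group, mean)
group_means   # A = 15.0, B = 11.1, C = 13.5

# Fit the one-way ANOVA using the group column stored in df
fit <- aov(y ~ group, data = df)

# Pull the F statistic out of the summary table (about 2.515)
f_value <- summary(fit)[[1]][['F value']][1]
f_value
```

If a rating had been assigned to the wrong group, the group means would no longer match the column means of the table, and the sums of squares in the ANOVA output would change as well.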