Data Frames

R stores data in specialized tables called data frames, and many R commands are tailored to this format. The key learning points are as follows:

  1. What is a Data Frame?

    • Columns are variables.

    • Rows are individual research subjects.

    • Each column of a data frame is a vector of data.

  2. Loading a Data Frame.

    • From the internet via URL

  3. Creating data vectors, often by typing in the contents thereof.

We will need to learn to import CSV files as data frames and to manipulate them in various ways.
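As a minimal sketch of the ideas above, the code below types in three short data vectors with c() and combines them into a data frame with data.frame(). The object name toy and all of the values are invented for illustration; they are not taken from the course data.

```r
# Hypothetical data vectors, typed in by hand (values are invented).
sex   <- c('M', 'F', 'M', 'F')
extro <- c(8, 16, 15, 16)
narc  <- c(11, 11, 11, 10)

# Each vector becomes one column (a variable);
# each position in the vectors becomes one row (a subject).
toy <- data.frame(Sex = sex, Extro = extro, Narc = narc)

toy          # print the whole data frame
nrow(toy)    # number of rows (subjects)
ncol(toy)    # number of columns (variables)
```

All vectors passed to data.frame() in this sketch have the same length, which is what lets them line up row by row.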

Loading a Data Frame

Data are often stored in CSV files on the internet. We use the function read.csv() to read the data into R, as shown below.

pers <- read.csv('https://faculty.ung.edu/rsinn/personality.csv')
head(pers, 5)
Age  Yr  Sex  G21  Corps  Res  Greek  VarsAth  Honor  GPA   ...  Perf  OCD  Play  Extro  Narc  HSAF  HSSE  HSAG  HSSD  PHS
21   2   M    Y    Y      1    N      N        N      3.23  ...  105   10   142   8      11    41    40    26    27    SE
20   3   F    N    N      2    Y      N        Y      3.95  ...  105   3    172   16     11    46    52    26    33    SE
22   3   M    Y    N      2    N      N        N      3.06  ...  73    1    134   15     11    48    42    44    29    AG
27   3   F    Y    N      3    N      N        N      2.84  ...  90    9    160   16     10    51    51    23    19    SE
24   3   M    Y    N      2    N      N        N      2.39  ...  95    5    166   14     10    56    46    27    20    AF

To use the function read.csv() properly, please note the following:

  • We must use quotation marks as well as parentheses to enclose the URL where the CSV file lives.

  • Either straight single quotes (') or straight double quotes (") will work, provided the opening and closing quotation marks match one another.
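The quoting rule can be checked without an internet connection. In the sketch below, the CSV contents are an invented two-row example passed to read.csv() through its text argument, so nothing here comes from the course file. The name csv_text is hypothetical.

```r
# A tiny, invented CSV stored as a character string (not the course data).
csv_text <- 'Sex,Narc
M,11
F,10'

# Single and double quotes create the same character string in R.
identical('Narc', "Narc")

# read.csv() reads from a string when the text argument is supplied.
toy <- read.csv(text = csv_text)
toy

# Mixing quote types, as in read.csv('file.csv"), is a syntax error,
# because R never finds the matching closing quote.
```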

Please notice the following attributes of this data frame:

  1. This data frame has many personality variables collected from UNG students circa 2012.

  2. Those variables are columns in the data frame such as Extroversion (Extro), Narcissism (Narc) and Perfectionism (Perf).

  3. Rows represent individuals. In this example, each row is for a unique student.

    • Individuals in a research setting may be mice, houses, cities, mosquitos, trees, plants or rocks.

    • Often, human subjects are needed in research settings.
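A few base R functions make this row and column structure easy to inspect. The sketch below uses a small invented data frame named toy as a stand-in. The same calls should work on pers once it has been loaded.

```r
# Invented stand-in for a data frame of student records.
toy <- data.frame(
  Sex   = c('M', 'F', 'M'),
  Extro = c(8, 16, 15),
  Narc  = c(11, 11, 10)
)

dim(toy)     # number of rows, then number of columns
names(toy)   # column (variable) names
str(toy)     # type and first few values of each column
```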

Extracting Columns from a Data Frame

Because the columns of a data frame are so vital, we often wish to extract a single column from a data frame. We can do so in at least two ways, both of which will be used often in this course.

  1. The dollar sign method.

  2. The row-column method.

The Dollar Sign Method

Above, we have named our data frame pers. Suppose we wish to extract the column with the Narcissism values from the data frame (column ‘Narc’). The format is

pers$Narc

where the dollar sign separates the data frame name and the column title. The first seven entries of the Narcissism column are displayed by the code below.

narc <- pers$Narc
head(narc, 7)
  1. 11
  2. 11
  3. 11
  4. 10
  5. 10
  6. 9
  7. 9
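Because the dollar sign method returns an ordinary vector, the result can be passed directly to other functions. The following is a small sketch on an invented data frame; toy and narc_toy are hypothetical names, and the values are made up.

```r
# Invented data frame standing in for pers.
toy <- data.frame(
  Sex  = c('M', 'F', 'M', 'F'),
  Narc = c(11, 11, 10, 9)
)

narc_toy <- toy$Narc   # extract one column by its title
is.vector(narc_toy)    # check that the result is a plain vector
length(narc_toy)       # one entry per row of the data frame
mean(narc_toy)         # the vector can be passed to functions such as mean()
```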

The Row-Column Method

Tip

The row-column method is quite flexible and can be used in quite a few different commands and calculations. We will only describe its basic features here.

We use square brackets after the data frame name to indicate which rows or columns we are interested in. The entry before the comma selects rows, and the entry after the comma selects columns. Below, to display all rows but only one column, we leave the entry before the comma blank.

narc2 <- pers[ , 'Narc']
head(narc2,7)
  1. 11
  2. 11
  3. 11
  4. 10
  5. 10
  6. 9
  7. 9
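Either position inside the square brackets can be filled in or left blank. The sketch below, which uses an invented data frame named toy in place of pers, shows a few basic combinations.

```r
# Invented data frame standing in for pers.
toy <- data.frame(
  Sex   = c('M', 'F', 'M', 'F'),
  Extro = c(8, 16, 15, 16),
  Narc  = c(11, 11, 10, 9)
)

toy[ , 'Narc']     # all rows, one column (entry before the comma left blank)
toy[2, ]           # one row, all columns (entry after the comma left blank)
toy[2, 'Narc']     # a single cell: row 2 of the Narc column
toy[1:3, 'Narc']   # rows 1 through 3 of the Narc column
```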

We can identify the column by title or by its index number. The index number version is shown below.

narc3 <- pers[ , 31]
head(narc3,7)
  1. 11
  2. 11
  3. 11
  4. 10
  5. 10
  6. 9
  7. 9
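When the index number of a column is not known in advance, it can be looked up from the column titles. The sketch below does this on an invented data frame; toy and idx are hypothetical names.

```r
# Invented data frame standing in for pers.
toy <- data.frame(
  Sex   = c('M', 'F', 'M'),
  Extro = c(8, 16, 15),
  Narc  = c(11, 11, 10)
)

idx <- which(names(toy) == 'Narc')   # position of the column titled 'Narc'
idx

# Selecting by index and selecting by title give the same vector.
identical(toy[ , idx], toy[ , 'Narc'])
```

An index number changes if columns are added, removed or reordered, so selecting by title is usually the less fragile choice.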

We can also extract multiple columns at once using the concatenate function c() as shown below.

narc4 <- pers[ , c('Narc','Sex')]
head(narc4,7)
Narc  Sex
11    M
11    F
11    M
10    F
10    M
9     F
9     M
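Note that the type of the result depends on how many columns are requested. The sketch below illustrates this on an invented data frame; the names toy, two, one_vec and one_df are hypothetical.

```r
# Invented data frame standing in for pers.
toy <- data.frame(
  Sex   = c('M', 'F', 'M'),
  Extro = c(8, 16, 15),
  Narc  = c(11, 11, 10)
)

# Two columns: the result is itself a data frame,
# with columns in the order they were requested.
two <- toy[ , c('Narc', 'Sex')]
class(two)
names(two)

# One column: by default the result is simplified to a vector.
one_vec <- toy[ , 'Narc']
class(one_vec)

# Adding drop = FALSE keeps a one-column data frame instead.
one_df <- toy[ , 'Narc', drop = FALSE]
class(one_df)
```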