6.2 Accessing data frames

The individual columns (vectors) of a data frame can be accessed with the $ operator.

> f <- data.frame(word=c("one", "two", "three", "four","five","six"), number=1:6, bigger2=1:6>2)
> f$word
[1] "one"   "two"   "three" "four"  "five"  "six"  
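As a complement to $, the sketch below shows the [[ ]] operator, which extracts a column either by name or by position. The variable names by_dollar, by_name, by_pos and col are only illustrative and do not appear elsewhere in this text.

```r
# Same data frame as in the example above.
f <- data.frame(word = c("one", "two", "three", "four", "five", "six"),
                number = 1:6, bigger2 = 1:6 > 2)

# Three equivalent ways to extract the column "number" as a vector.
by_dollar <- f$number
by_name   <- f[["number"]]
by_pos    <- f[[2]]

# With [[ ]] the column name can be stored in a variable
# (here the illustrative name `col`); f$col would instead look
# for a column that is literally called "col".
col <- "word"
f[[col]]
```

The [[ ]] form is mostly useful when the column name is not known until the code runs, for example inside a loop over several columns.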

Elements of these vectors are then accessed using the [] operator, just as on ordinary vectors.

> f <- data.frame(word=c("one", "two", "three", "four","five","six"), number=1:6, bigger2=1:6>2)
> f$bigger2[3]
[1] TRUE
> f$number[3]
[1] 3
> f$word[1:2]
[1] "one" "two"

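As with matrices, the desired elements can be identified using both a row and a column index vector, written as [rows, columns]. Leaving one of the two index vectors blank returns all rows or all columns, respectively. A negative index excludes the corresponding row or column.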

> f <- data.frame(word=c("one", "two", "three", "four","five","six"), number=1:6, bigger2=1:6>2)
> f[2,]
  word number bigger2
2  two      2   FALSE
> f[,2]
[1] 1 2 3 4 5 6
> f[4:5,-3]
  word number
4 four      4
5 five      5
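Two details of this indexing are easy to miss, so here is a small sketch of them: a single column selected with [ , ] is simplified to a plain vector unless drop = FALSE is given, and columns can be selected by name as well as by position. The variable names v, g and h are only illustrative.

```r
# Same data frame as in the example above.
f <- data.frame(word = c("one", "two", "three", "four", "five", "six"),
                number = 1:6, bigger2 = 1:6 > 2)

# Selecting one column with [ , ] returns a plain vector ...
v <- f[, 2]
is.data.frame(v)

# ... unless drop = FALSE is given, which keeps the data frame structure.
g <- f[, 2, drop = FALSE]
is.data.frame(g)

# Columns can also be selected by name instead of by position.
# This selects the same rows and columns as f[4:5, -3].
h <- f[4:5, c("word", "number")]
```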

You can also subset data frames using logical vectors. As with ordinary vectors, a useful function is which(), which returns the indices of the TRUE elements. These indices can then be used, for example, to select the corresponding rows.

> f <- data.frame(word=c("one", "two", "three", "four","five","six"), number=1:6, bigger2=1:6>2)
> a<-which(f$bigger2)
> f[a,]
   word number bigger2
3 three      3    TRUE
4  four      4    TRUE
5  five      5    TRUE
6   six      6    TRUE
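The logical vector can also be used as a row index directly, without going through which(). The following sketch compares both forms and combines two conditions. The variable names direct, via_which and middle are only illustrative.

```r
# Same data frame as in the example above.
f <- data.frame(word = c("one", "two", "three", "four", "five", "six"),
                number = 1:6, bigger2 = 1:6 > 2)

# A logical vector used directly as the row index ...
direct <- f[f$bigger2, ]

# ... selects the same rows as the indices returned by which().
via_which <- f[which(f$bigger2), ]

# Conditions can be combined with & (and) and | (or).
# Here: rows whose number is greater than 2 and smaller than 6.
middle <- f[f$number > 2 & f$number < 6, ]
```

One difference between the two forms: if the logical vector contains NA, which() skips those positions, whereas direct logical indexing returns a row of NA values for each of them.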

Data frames can easily be sorted using the function order().

> d <- data.frame(a=c(12,11,20,1,1,5),b=c("lion","monkey","snake","elephant","cat","tiger"))
> order(d$a)
[1] 4 5 6 2 1 3
> d[order(d$a),]
   a        b
4  1 elephant
5  1      cat
6  5    tiger
2 11   monkey
1 12     lion
3 20    snake
> d[order(d$a,d$b),]
   a        b
5  1      cat
4  1 elephant
6  5    tiger
2 11   monkey
1 12     lion
3 20    snake

The function order() returns the indices that would sort its argument, not the rank of each value. In the example above, order(d$a)[1] is 4 because the fourth element of a holds the smallest value, 1. order(d$a)[2] is 5 because the fifth element of a is also 1, and so on. Although both values are 1, R still has to put them in some order, and by default it keeps tied values in the order in which they appear in the vector. If we pass a second argument to order(), it is used to break such ties. Therefore, order(d$a,d$b)[1] is now 5: the “c” of “cat” comes before the “e” of “elephant” in the alphabet, so the row containing “cat” is placed first.
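To make the behaviour of order() more concrete, the sketch below contrasts it with rank() and shows sorting in decreasing order through the decreasing argument of order(). The variable names o and d_desc are only illustrative.

```r
# Same data frame as in the example above.
d <- data.frame(a = c(12, 11, 20, 1, 1, 5),
                b = c("lion", "monkey", "snake", "elephant", "cat", "tiger"))

# order() gives the indices that sort the vector ...
o <- order(d$a)        # 4 5 6 2 1 3

# ... so indexing the vector with them yields the sorted values.
d$a[o]                 # 1 1 5 11 12 20

# rank() instead gives the rank of each original value;
# by default tied values share the average of their ranks.
rank(d$a)              # 5.0 4.0 6.0 1.5 1.5 3.0

# Sort the data frame from the largest to the smallest value of a.
d_desc <- d[order(d$a, decreasing = TRUE), ]
```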

6.2.1 Exercises: Accessing data frames

See Section 18.0.15 for solutions.

  1. Create a data frame with a column x from the vector 1:10 and a column y from the vector c(1,10,-2,3,8,-7,2,1,9,4). Using a logical vector, access the rows in which both columns of the data frame have the same value.

  2. Divide column y by column x, then calculate the mean of the last five values of the result. You can use length() to determine the length of a vector.