6.2 Accessing data frames
Individual vectors (columns) of a data frame can easily be accessed using the operator $.
> f <- data.frame(word=c("one", "two", "three", "four","five","six"), number=1:6, bigger2=1:6>2)
> f$word
[1] "one" "two" "three" "four" "five" "six"
Elements of these vectors are then accessed using the [] operator, just as on ordinary vectors.
> f <- data.frame(word=c("one", "two", "three", "four","five","six"), number=1:6, bigger2=1:6>2)
> f$bigger2[3]
[1] TRUE
> f$number[3]
[1] 3
> f$word[1:2]
[1] "one" "two"
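The same elements can also be reached with the [[ ]] operator, which extracts a column by name just like $ and is convenient when the column name is stored in a variable. The following is a minimal sketch; the variable col is purely illustrative, and stringsAsFactors = FALSE is set explicitly only so that the result is a character vector on older R versions too.

```r
f <- data.frame(word = c("one", "two", "three", "four", "five", "six"),
                number = 1:6, bigger2 = 1:6 > 2,
                stringsAsFactors = FALSE)  # assumption: keep words as character

# $ and [[ ]] both extract one column as an ordinary vector
third_a <- f$number[3]
third_b <- f[["number"]][3]

# [[ ]] also works when the column name is held in a variable
col <- "word"                # hypothetical variable, used only for illustration
first_two <- f[[col]][1:2]

print(third_a)
print(third_b)
print(first_two)
```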
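As with matrices, the desired elements can be identified using both a row and a column index vector. Leaving the row index blank returns all rows, and leaving the column index blank returns all columns. A negative index excludes the corresponding row or column.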
> f <- data.frame(word=c("one", "two", "three", "four","five","six"), number=1:6, bigger2=1:6>2)
> f[2,]
  word number bigger2
2  two      2   FALSE
> f[,2]
[1] 1 2 3 4 5 6
> f[4:5,-3]
  word number
4 four      4
5 five      5
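As a small sketch of the row and column indexing described above, columns can also be selected by name rather than by position. The example also shows that selecting a single column with [ , j] returns a plain vector unless drop = FALSE is given. The object names sub, v and g are illustrative only.

```r
f <- data.frame(word = c("one", "two", "three", "four", "five", "six"),
                number = 1:6, bigger2 = 1:6 > 2,
                stringsAsFactors = FALSE)  # assumption: keep words as character

# Columns can be selected by name as well as by position
sub <- f[4:5, c("word", "number")]

# A single column selected with [ , j] is simplified to a vector ...
v <- f[, 2]
# ... unless drop = FALSE is given, which keeps a one-column data frame
g <- f[, 2, drop = FALSE]

print(sub)
print(is.data.frame(v))
print(is.data.frame(g))
```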
You can also subset data frames based on logical vectors. As with vectors, a useful function is which(), which returns the indices of the TRUE elements. These indices can then be used, for example, to select the corresponding rows.
> f <- data.frame(word=c("one", "two", "three", "four","five","six"), number=1:6, bigger2=1:6>2)
> a<-which(f$bigger2)
> f[a,]
   word number bigger2
3 three      3    TRUE
4  four      4    TRUE
5  five      5    TRUE
6   six      6    TRUE
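The logical vector itself can also serve directly as the row index, without going through which(). The sketch below compares the two approaches and combines two conditions with &. The object names are illustrative only.

```r
f <- data.frame(word = c("one", "two", "three", "four", "five", "six"),
                number = 1:6, bigger2 = 1:6 > 2,
                stringsAsFactors = FALSE)  # assumption: keep words as character

# A logical vector can be used directly as the row index
by_logical <- f[f$bigger2, ]

# which() first converts the logical vector into row positions
by_position <- f[which(f$bigger2), ]

# Conditions can be combined with & (and) and | (or)
middle <- f[f$number > 2 & f$number < 6, ]

print(by_logical)
print(middle)
```

The two approaches differ when the logical vector contains NA: which() drops those positions, whereas indexing with the logical vector directly produces rows filled with NA.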
Data frames can easily be sorted using the function order().
> d <- data.frame(a=c(12,11,20,1,1,5),b=c("lion","monkey","snake","elephant","cat","tiger"))
> order(d$a)
[1] 4 5 6 2 1 3
> d[order(d$a),]
   a        b
4  1 elephant
5  1      cat
6  5    tiger
2 11   monkey
1 12     lion
3 20    snake
> d[order(d$a,d$b),]
   a        b
5  1      cat
4  1 elephant
6  5    tiger
2 11   monkey
1 12     lion
3 20    snake
The function order() returns the positions of the original values in sorted order, that is, the index vector that sorts its argument (not the rank of each value). In the example above, order(d$a)[1] is 4, because the fourth element of a has value 1, which is the smallest value. order(d$a)[2] is 5, because the fifth element of a also has value 1, and so on. Although both values are 1, R still orders them, by default by the position in which they appear in the vector. If we pass a second argument to order(), it will use this second argument to break such ties. Therefore, order(d$a,d$b)[1] is now 5, because the “c” of “cat” comes before the “e” of “elephant” in the alphabet, so the row with “cat” is placed first.
6.2.1 Exercises: Accessing data frames
See Section 18.0.15 for solutions.
Create a data frame with a column x from the vector 1:10 and a column y from the vector c(1,10,-2,3,8,-7,2,1,9,4). Access the rows in which both columns of the data frame have the same value (using a logical vector).
Calculate the mean of the last five rows of the column y after dividing that column by the column x. You can use length() to determine the length of a vector.