6.3 Manipulating data frames

The same indexing that you use to access elements of a data frame also lets you change them in place.

> f <- data.frame(word=c("one", "two", "three", "four","five","six"), number=1:6, bigger2=1:6>2)
> f[2,1] <- "one"
> f$word[3] <- "anotherOne"
> f[2,2] <- 1
> print(f)
        word number bigger2
1        one      1   FALSE
2        one      1   FALSE
3 anotherOne      3    TRUE
4       four      4    TRUE
5       five      5    TRUE
6        six      6    TRUE
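Assignment also works with logical indexing, which changes every element that meets a condition in one step. The following is a minimal sketch; the data frame g and its values are made up for illustration and are not part of the example above.

```r
# Hypothetical data frame used only for this illustration
g <- data.frame(word = c("one", "two", "three"), number = 1:3)

# Set 'number' to 0 in every row where it is greater than 1
g$number[g$number > 1] <- 0

# Change a cell selected by a condition on another column
g[g$word == "one", "number"] <- 10

print(g)
```

The condition inside the square brackets produces a logical vector, and only the positions where it is TRUE are overwritten.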

In particular, you might want to change the class of a column. You can check the class of a column in a data frame with class(). The same function can be used to check the class of any R object.

> class(f$word)
[1] "character"
> class(f$number)
[1] "numeric"

R allows us to convert between classes with functions like as.character(), as.numeric(), as.factor() and as.logical(). For example, we can change a character vector to a factor with as.factor() (note that if you use an R version older than 4.0, character vectors inside a data frame are converted to factors by default). In a similar spirit, we can change a numeric vector to a character vector with as.character(), and so on.

> f$word <- as.factor(f$word)
> class(f$word)
[1] "factor"
> f$number <- as.character(f$number)
> class(f$number)
[1] "character"
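One caveat applies when converting back to numbers: calling as.numeric() directly on a factor returns the internal integer codes of its levels, not the numbers shown in the labels. A common workaround is to go through as.character() first. The sketch below uses a made-up vector v to show the difference.

```r
# Hypothetical factor whose labels look like numbers
v <- as.factor(c("10", "20", "10", "30"))

# Direct conversion returns the level codes (1 2 1 3)
codes <- as.numeric(v)

# Converting to character first recovers the labels as numbers (10 20 10 30)
values <- as.numeric(as.character(v))

print(codes)
print(values)
```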

If you want to see the structure and the classes of all columns at once, you can use the function str().

> str(f)
'data.frame':   6 obs. of  3 variables:
 $ word   : Factor w/ 5 levels "anotherOne","five",..: 4 4 1 3 2 5
 $ number : chr  "1" "1" "3" "4" ...
 $ bigger2: logi  FALSE FALSE TRUE TRUE TRUE TRUE
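If you only need the class names rather than the full structure, one option is to apply class() to every column with sapply(). This is a small sketch on a made-up data frame h, not the only way to do it.

```r
# Hypothetical data frame used only for this illustration
h <- data.frame(word = c("a", "b"), number = 1:2, flag = c(TRUE, FALSE),
                stringsAsFactors = FALSE)

# Returns a named character vector with one class name per column
classes <- sapply(h, class)
print(classes)
```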

6.3.1 Exercises: Manipulating data frames

See Section 18.0.16 for solutions.

  1. Create a data frame data containing a vector id with the labels X1, X2, ..., X10 (using paste), followed by two vectors x and y, each containing 10 numbers sampled from [-10,10] with replacement. Replace each element y[i] of y with its square.

  2. Add a logical column ok to data that is TRUE for all rows with y>20 and FALSE otherwise. Create a data frame other that contains both id and y, but only for those rows with x>3 and ok==TRUE.