3.6 Sorting, Shuffling and Sampling Vectors
The function sort() returns a sorted version of a vector:
> x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)
> sort(x)
[1] -73 -27 -8 -3 2 2 3 5 8 10 47 72 218
> sort(x, decreasing=TRUE)
[1] 218 72 47 10 8 5 3 2 2 -3 -8 -27 -73
The order of the elements can be obtained with the optional argument index.return, which additionally returns an index vector giving the position in x of each element of the sorted result:
> x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)
> sort(x, index.return=TRUE)
$x
[1] -73 -27 -8 -3 2 2 3 5 8 10 47 72 218
$ix
[1] 12 10 1 7 4 8 13 2 3 6 5 11 9
> x[sort(x, index.return=TRUE)$ix]
[1] -73 -27 -8 -3 2 2 3 5 8 10 47 72 218
We can use this index vector to sort x itself, which gives exactly the same result as sort(x) above. The index vector becomes truly useful, however, when we want to sort a vector y according to another vector x:
> x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)
> y <- c(7, 28, 49, 1, -28, 2, 49, 12, 49, 5, -1, 1, 2)
> y[sort(x, index.return=TRUE)$ix]
[1] 1 5 7 49 1 12 2 28 49 2 -28 -1 49
The opposite of sorting is shuffling, in which the elements of a vector are rearranged at random. This can be achieved using the function sample().
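A minimal sketch of shuffling with sample(), reusing the vector x from above. The call to set.seed() and the seed value are assumptions added here only to make the run repeatable; the resulting order depends on the seed and the R version, so no output is shown.

```r
set.seed(42)  # arbitrary seed, used only so the shuffle is repeatable

x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)

# With no further arguments, sample() returns all elements in random order
shuffled <- sample(x)
shuffled

# A shuffle only reorders: sorting both vectors gives identical results
same_elements <- identical(sort(shuffled), sort(x))
same_elements
```

The last line evaluates to TRUE, since shuffling never adds or removes elements.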
While we used the function sample() to shuffle a vector, the function can also be used for what its name suggests: to sample elements from a vector. For this, sample() has the argument size, which indicates the number of elements to draw. By default, this argument is equal to the length of the provided vector, but we can set it to a smaller value to generate sub-samples easily.
Of course, we cannot draw more elements than the vector contains unless we sample with replacement:
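As a small sketch of sub-sampling, the following draws four elements from the vector x used earlier. The sub-sample size of four and the seed are arbitrary choices made for this example.

```r
set.seed(7)  # arbitrary seed, used only so the draw is repeatable

x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)

# Draw 4 elements without replacement (the default)
sub <- sample(x, size = 4)
sub

length(sub)      # number of elements drawn
all(sub %in% x)  # every drawn value comes from x
```

Because x contains the value 2 twice, a sub-sample drawn without replacement may still contain 2 twice: sampling without replacement refers to positions in the vector, not to distinct values.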
> sample(c(10,20), size=10)
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
> sample(c(10,20), size=10, replace=TRUE)
[1] 10 10 20 10 10 10 20 10 10 20
By default, each element has the same probability of being sampled. However, you may provide specific probabilities using the argument prob. These probabilities are automatically normalized so that they sum to 1.
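To illustrate prob, the following sketch draws from two values with weights 9 and 1, which sample() rescales to probabilities 0.9 and 0.1. The values, the weights, the number of draws, and the seed are all arbitrary choices for this example.

```r
set.seed(1)  # arbitrary seed, used only so the draws are repeatable

# Weights need not sum to 1; here "a" is nine times as likely as "b"
draws <- sample(c("a", "b"), size = 1000, replace = TRUE, prob = c(9, 1))

# Observed share of each value; "a" should make up roughly 90% of the draws
table(draws) / length(draws)
```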
Sampling is often used to generate vectors of random indices, that is, to sample from the sequence of integers between 1 and the length of another vector. The function sample.int() makes that easier.
The first argument of sample.int() is the largest possible value, so a call sample.int(17, 3) is equivalent to a call sample(1:17, 3).
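A short sketch of both points: drawing random indices for an existing vector, and checking that sample.int(17, 3) and sample(1:17, 3) give the same result when started from the same seed. The seed values are arbitrary and are only there to make the comparison possible.

```r
x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)

# Three random positions in x, then the elements at those positions
set.seed(3)  # arbitrary seed
idx <- sample.int(length(x), 3)
x[idx]

# The same seed gives the same draw from either form of the call
set.seed(11)  # arbitrary seed
a <- sample.int(17, 3)
set.seed(11)
b <- sample(1:17, 3)
identical(a, b)
```

The final comparison evaluates to TRUE.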
3.6.1 Exercises: Sorting and shuffling vectors
See Section 18.0.10 for solutions.
1. Create a vector x with elements 3, 4, -6, 2, -7, 2, -1, 0 and sort it in increasing and decreasing order.
2. Take the same vector x with elements 3, 4, -6, 2, -7, 2, -1, 0, but this time only sort the first three elements in increasing order.
3. Create a vector v with 98 evenly spaced elements between 8 and 37. Then from v, sample 10 elements at random and store these 10 elements in a new vector called V. Then, sort V in decreasing order.