3.6 Sorting, Shuffling and Sampling Vectors

The function sort() returns a sorted version of a vector:

> x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)
> sort(x)
 [1] -73 -27  -8  -3   2   2   3   5   8  10  47  72 218
> sort(x, decreasing=TRUE)
 [1] 218  72  47  10   8   5   3   2   2  -3  -8 -27 -73

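The permutation that sorts the elements (not to be confused with their ranks) can be obtained with the optional argument index.return. When this argument is set to true, sort() returns a list containing the sorted vector ($x) and an index vector ($ix) that gives, for each sorted element, its position in the original vector: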

> x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)
> sort(x, index.return=TRUE)
$x
 [1] -73 -27  -8  -3   2   2   3   5   8  10  47  72 218

$ix
 [1] 12 10  1  7  4  8 13  2  3  6  5 11  9
> x[sort(x, index.return=TRUE)$ix]
 [1] -73 -27  -8  -3   2   2   3   5   8  10  47  72 218

We can use this index vector to sort x itself, which gives exactly the same result as sort(x) above. The index vector becomes really useful, however, when we want to sort a vector y according to another vector x:

> x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)
> y <- c(7, 28, 49, 1, -28, 2, 49, 12, 49, 5, -1, 1, 2)
> y[sort(x, index.return=TRUE)$ix]
 [1]   1   5   7  49   1  12   2  28  49   2 -28  -1  49
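
The same index vector is also returned by the base R function order(), which is the more common way of writing this. The following sketch is not part of the original example; it reuses the x and y from above, checks that both approaches agree, and contrasts the result with rank(), which reports where each element ends up in the sorted vector rather than the permutation that sorts it.

```r
x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)
y <- c(7, 28, 49, 1, -28, 2, 49, 12, 49, 5, -1, 1, 2)

# Index vector that sorts x, obtained in two ways
ix_sort  <- sort(x, index.return = TRUE)$ix
ix_order <- order(x)

# Both reorder y according to x in the same way
y_by_sort  <- y[ix_sort]
y_by_order <- y[ix_order]
print(all(y_by_sort == y_by_order))

# rank() answers a different question: at which position does each
# element of x end up after sorting? Tied values share an averaged
# rank by default.
r <- rank(x)
print(r)
```

Note that y[order(x)] and y[rank(x)] are generally not the same; only the former sorts y according to x.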

The opposite of sorting is shuffling, in which case the elements of a vector are, well, shuffled at random. This can be achieved using the function sample():

> x <- 1:10
> sample(x)
 [1]  1  3  6  4  2  5  7 10  9  8
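
Since sample() draws at random, the output above will differ from run to run. The following is a minimal sketch, assuming a reproducible shuffle is wanted: calling set.seed() (a base R function not otherwise used in this section; the seed value 42 is arbitrary) before sample() fixes the random number stream, so the same shuffle is obtained each time. Sorting the shuffled vector recovers the original, which confirms that shuffling only rearranges the elements.

```r
x <- 1:10

set.seed(42)              # arbitrary seed, chosen only for illustration
shuffled_a <- sample(x)

set.seed(42)              # same seed again
shuffled_b <- sample(x)

# Same seed, same shuffle
print(identical(shuffled_a, shuffled_b))

# A shuffle is a rearrangement: sorting it gives back 1:10
print(all(sort(shuffled_a) == x))
```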

While we used the function sample() to shuffle a vector, the function can also be used for what its name suggests: to sample elements from a vector. For this, sample() has the argument size, which indicates the number of elements to draw. By default, size is equal to the length of the provided vector, but we can set it to a smaller value to generate sub-samples easily:

> sample(-5:5, size=3)
[1] -4  2 -3

Obviously, we cannot draw more elements than the vector contains, unless we sample with replacement:

> sample(c(10,20), size=10)
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
> sample(c(10,20), size=10, replace=TRUE)
 [1] 10 10 20 10 10 10 20 10 10 20

By default, each element has the same probability of being sampled. However, you may provide specific probabilities using the argument prob. These probabilities are automatically normalized so that they sum to 1:

> sample(1:2, prob=c(0.01, 0.99))
[1] 2 1
> sample(1:2, prob=c(100,1))
[1] 1 2
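
With only two elements drawn without replacement, the effect of prob is hard to see. The following sketch is an illustration and not part of the original text: it draws many times with replacement using the unnormalized weights 1 and 3 (arbitrary values) and compares the observed frequencies with the normalized probabilities 0.25 and 0.75. The seed is again arbitrary.

```r
set.seed(1)                  # arbitrary seed for reproducibility
weights <- c(1, 3)           # unnormalized; treated like c(0.25, 0.75)

draws <- sample(1:2, size = 10000, replace = TRUE, prob = weights)

# Observed relative frequencies of the two values
freq <- table(draws) / length(draws)
print(freq)

# Normalized probabilities for comparison
expected <- weights / sum(weights)
print(expected)
```

The observed frequencies should be close to, but not exactly equal to, the expected values, since the draws are random.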

Sampling is often used to generate vectors of random indices, that is, to sample from a sequence of integers between 1 and the length of another vector. The function sample.int() makes that easier:

> sample.int(20, 4)
[1]  3  9  5 16

The first argument of sample.int() is the largest value that can be drawn, and the second is the number of values to draw. A call sample.int(17, 3) is therefore equivalent to a call sample(1:17, 3).
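
As a sketch of how such random indices are typically used, the same index vector can pick matching elements from two vectors of equal length. The vectors height and weight and their values are made up for this illustration.

```r
# Hypothetical paired measurements (made-up values)
height <- c(172, 165, 181, 158, 190, 176)
weight <- c(68, 59, 85, 52, 92, 74)

# Three distinct random positions between 1 and length(height)
idx <- sample.int(length(height), 3)

# The same positions are used for both vectors, so pairs stay together
height_sub <- height[idx]
weight_sub <- weight[idx]
print(idx)
print(height_sub)
print(weight_sub)
```

Sampling the indices once, rather than calling sample() on each vector separately, is what keeps corresponding elements together.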

3.6.1 Exercises: Sorting and shuffling vectors

See Section 18.0.10 for solutions.

Create a vector x with elements 3, 4, -6, 2, -7, 2, -1, 0 and sort it in increasing and decreasing order.
Take the same vector x with elements 3, 4, -6, 2, -7, 2, -1, 0, but this time only sort the first three elements in increasing order.
Create a vector v with 98 evenly spaced elements between 8 and 37. Then from v, sample 10 elements at random and store these 10 elements in a new vector called V. Then, sort V in decreasing order.