4.6 Sorting, Shuffling and Sampling Vectors
The function sort() returns a sorted version of a vector:
> x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)
> sort(x)
[1] -73 -27 -8 -3 2 2 3 5 8 10 47 72 218
> sort(x, decreasing=TRUE)
[1] 218 72 47 10 8 5 3 2 2 -3 -8 -27 -73
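The permutation that sorts the elements can be obtained with the optional argument index.return. With it, sort() returns a list holding the sorted values ($x) and an index vector ($ix) that gives, for each sorted position, the position of that element in the original vector: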
> x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)
> sort(x, index.return=TRUE)
$x
[1] -73 -27 -8 -3 2 2 3 5 8 10 47 72 218
$ix
[1] 12 10 1 7 4 8 13 2 3 6 5 11 9
> x[sort(x, index.return=TRUE)$ix]
[1] -73 -27 -8 -3 2 2 3 5 8 10 47 72 218
We can use this index vector to sort x itself (in this case, we get exactly the same result as sort(x), see above). However, the index vector becomes very useful if we want to sort a vector y according to another vector x:
> x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)
> y <- c(7, 28, 49, 1, -28, 2, 49, 12, 49, 5, -1, 1, 2)
> y[sort(x, index.return=TRUE)$ix]
[1] 1 5 7 49 1 12 2 28 49 2 -28 -1 49
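Base R also provides the function order(), which returns such an index vector directly. The following is only a minimal sketch of how the same reordering could be written with it, reusing the vectors x and y from above; the names ix and y_by_x are chosen here for illustration.

```r
x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)
y <- c(7, 28, 49, 1, -28, 2, 49, 12, 49, 5, -1, 1, 2)

# Index vector that puts x into increasing order
ix <- order(x)

# Reorder y according to the sorted order of x
y_by_x <- y[ix]
print(y_by_x)
```

Storing the index vector once is handy when several vectors have to be reordered according to the same x.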
The opposite of sorting is shuffling, in which the elements of a vector are, well, shuffled at random. This can be achieved using the function sample().
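As a minimal sketch, shuffling the vector x from above might look as follows. The call to set.seed() and the seed value are assumptions added here only to make the result reproducible; without them, every call returns a different order.

```r
set.seed(42)  # arbitrary seed, only so that the shuffle is reproducible

x <- c(-8, 5, 8, 2, 47, 10, -3, 2, 218, -27, 72, -73, 3)

# Without further arguments, sample() returns all elements of x in random order
x_shuffled <- sample(x)
print(x_shuffled)
```

The shuffled vector contains exactly the same elements as x, so sorting it again recovers sort(x).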
While we used the function sample() to shuffle a vector, it can also be used for what its name suggests: to sample elements from a vector. For this, sample() has the argument size, which indicates the number of elements to choose. By default, this argument is equal to the length of the provided vector, but we can use it to generate sub-samples easily.
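A short sketch of such a sub-sample, assuming an illustrative vector v and an arbitrary seed (neither is from the text above):

```r
set.seed(1)  # arbitrary seed for reproducibility

v <- c(10, 20, 30, 40, 50, 60)  # illustrative vector

# Draw 3 of the 6 elements, without replacement (the default)
sub <- sample(v, size=3)
print(sub)
```

Because sampling is done without replacement by default, no position of v is chosen twice.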
Obviously, we cannot take more samples than there are elements, unless we sample with replacement:
> sample(c(10,20), size=10)
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
> sample(c(10,20), size=10, replace=TRUE)
[1] 20 10 10 10 10 20 20 20 20 10
By default, each element has the same probability of being sampled. However, you may provide specific probabilities using the argument prob. These probabilities are automatically normalized so that they sum to 1.
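The effect of prob can be sketched as follows. The element names, the weights and the seed are assumptions made for illustration; the weights are deliberately given as 6, 3 and 1 rather than as values summing to 1.

```r
set.seed(7)  # arbitrary seed for reproducibility

faces   <- c("a", "b", "c")  # illustrative elements
weights <- c(6, 3, 1)        # normalized internally to 0.6, 0.3, 0.1

# Draw 1000 elements with replacement, using unequal probabilities
draws <- sample(faces, size=1000, replace=TRUE, prob=weights)

# Relative frequency of each element among the draws
freq <- table(factor(draws, levels=faces)) / length(draws)
print(freq)
```

With this many draws, the relative frequencies should come out close to the normalized weights.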
Sampling is often used to generate vectors of random indices, that is, to sample from a sequence of integers between 1 and the length of another vector. The function sample.int() makes that easier.
The first argument of sample.int() is the largest value that can be drawn, so that a call sample.int(17, 3) is equivalent to a call sample(1:17, 3).
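A minimal sketch of both points, reusing the vector y from above; the seeds and the names idx, y_sub, a and b are assumptions chosen here for illustration:

```r
y <- c(7, 28, 49, 1, -28, 2, 49, 12, 49, 5, -1, 1, 2)

set.seed(123)  # arbitrary seed for reproducibility

# Four distinct random indices between 1 and length(y)
idx <- sample.int(length(y), 4)

# Use the indices to pick the corresponding elements of y
y_sub <- y[idx]
print(y_sub)

# Resetting the same seed before each call, both forms draw the same values
set.seed(99); a <- sample.int(17, 3)
set.seed(99); b <- sample(1:17, 3)
print(all(a == b))
```

Keeping the indices in idx, rather than sampling the values directly, allows the same random selection to be applied to other vectors of the same length.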
4.6.1 Exercises: Sorting and shuffling vectors
See Section 18.0.10 for solutions.
1. Create a vector x with elements 3, 4, -6, 2, -7, 2, -1, 0 and sort it in increasing and decreasing order.
2. Take the same vector x with elements 3, 4, -6, 2, -7, 2, -1, 0, but this time only sort the first three elements in increasing order.
3. Create a vector v with 98 evenly spaced elements between 8 and 37. Then, from v, sample 10 elements at random and store these 10 elements in a new vector called V. Then, sort V in decreasing order.