Chapter 13 Writing Functions

Throughout this tutorial, you have been using tons of functions built into R, like seq(), mean(), plot(), rnorm() etc. However, you can easily write your own functions that perform specific tasks!

For example, say you’ve received a cool data set and want to run a bunch of statistical analyses. It is often necessary to normalize variables before calculating certain statistics. One way to normalize is to rescale all values to lie between 0 and 1, which enables a fair comparison of data with different magnitudes. The formula for this normalization is: $\bar{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$. This formula can easily be implemented in R:

> x <- rnorm(5)
> (x - min(x)) / (max(x) - min(x))
[1] 1.000000000 0.003232373 0.578723932 0.000000000 0.304387379

Now imagine a second vector y, which also needs to be normalized. Of course, you could copy-paste your code from above:

> y <- rnorm(5)
> (y - min(y)) / (max(x) - min(y))
[1] 0.0000000 0.4487176 0.6120174 0.4286857 0.3490270

However, this is bad practice. For example, you might forget to change a variable while copy-pasting. Did you spot the mistake in the formula above? We forgot to replace max(x) by max(y)!

The alternative to copy-pasting is to write custom functions. This has three main advantages:

Functions are a natural way to organize your code. Giving a function a reasonable name makes it transparent what that piece of code does. For example, guessing the purpose of a function called normalize() would be pretty straightforward, and calling that function would make your code more readable than a copy-pasted version of a formula.
If you need to fix a bug in your code, or want to add extra steps, you only need to change your code in one place. Imagine you implemented the function for normalization wrongly - e.g. you used $x_{\min} - x_{\max}$ in the denominator - and then copy-pasted it 10 times. You would have to fix this bug 10 times in your code - and would likely miss at least one instance.
By writing functions you make your code easily available for re-use. This not only speeds up development, it also eliminates copy-paste mistakes.

Convinced? Let’s have a look at how to implement our normalization function!

> normalize <- function(x){
+   frac <-  (x - min(x)) / (max(x) - min(x))
+   return(frac)
+ }

We can break down the syntax for writing functions as follows:

> NAME <- function(ARGUMENTS) {
+ 
+   ACTIONS
+ 
+   return(OUTPUT)
+ 
+ }

Let’s look at the four pieces that make up a function in R:

Name: You need to pick a name for your function. This can be any valid object name, ideally something that speaks for itself. As functions always DO something, it is recommended that you pick a verb for your function name - for example, normalize() if the function normalizes variables, permute(), calculateSummaryStats() etc. Although you could use names of existing functions (e.g. create a custom min() function), you should avoid this as it causes ambiguities.
Arguments: You list the arguments (also called inputs) to the function inside the parentheses of function(). The input for the normalization example above is simply x. But you can specify as many inputs as you want, e.g. function() if there are no arguments, or function(x, y, z) if there are three.
Actions: What should the function do with the inputs? Calculate a statistic, as above? Plot something? You write all the real R code inside the body of the function, i.e. inside the curly brackets {}. In the normalization example above, we calculate the variable frac, which contains the normalized values of x.
Output: What should the function return when it’s finished with the actions? A function can return any data type, be it a vector, matrix, data frame, list or even a custom type of your own. Just place the object to return inside return() - in the normalization example above, for instance, we returned the vector frac. Note that you can also omit return(), for example if the function only plots something; it then returns the value of the last expression it evaluated.
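To make the four pieces concrete, here is a minimal sketch with different numbers of arguments and different kinds of output. The function names sayHello(), rescale() and summarizeRange() are hypothetical and chosen only for illustration; they are not used elsewhere in this tutorial.

```r
# Hypothetical helper functions, for illustration only.

# No arguments: always returns the same character string.
sayHello <- function() {
  return("Hello!")
}

# Three arguments: rescales x to the range [lower, upper].
rescale <- function(x, lower, upper) {
  frac <- (x - min(x)) / (max(x) - min(x))
  scaled <- lower + frac * (upper - lower)
  return(scaled)
}

# Returning a list bundles several results into one object.
summarizeRange <- function(x) {
  result <- list(minimum = min(x), maximum = max(x))
  return(result)
}

sayHello()
rescale(c(2, 4, 6), 0, 10)
summarizeRange(c(2, 4, 6))
```

Here rescale(c(2, 4, 6), 0, 10) maps the smallest value to 0 and the largest to 10, and summarizeRange() shows one way to hand back more than one value from a single call.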

Let’s go back to our example normalize(). Try to execute the code where we defined the function. No output should appear, but R now knows about the function, so it is ready to be used! We can call the function like any other function we’ve used before:

> normalize(x)
[1] 1.000000000 0.003232373 0.578723932 0.000000000 0.304387379

… and if we ever encounter other vectors that should be normalized, we can simply call the function again:

> normalize(y)
[1] 0.0000000 0.7331779 1.0000000 0.7004468 0.5702893

As mentioned above, a big advantage of functions is that if our requirements change, we only need to make the change in one place. For example, you might encounter variables including NaN and infinite values. In these cases, normalize() fails:

> x <- c(x, NaN, Inf)
> normalize(x)
[1] NaN NaN NaN NaN NaN NaN NaN

Because we’ve extracted the code into a function, we only need to make the fix in one place:

> normalize <- function(x){
+   x <- x[is.finite(x)]
+   min_x <- min(x, na.rm = TRUE)
+   max_x <- max(x, na.rm = TRUE)
+   frac <-  (x - min_x) / (max_x - min_x) 
+   return(frac)
+ }
> 
> normalize(x)
[1] 1.000000000 0.003232373 0.578723932 0.000000000 0.304387379

This is an important part of the “do not repeat yourself” (or DRY) principle. The more repetition you have in your code, the more places you need to remember to update when things change (and they always do!), and the more likely you are to create bugs over time.
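One thing to keep in mind about the fix above: because non-finite values are dropped, the result can be shorter than the input. If the positions of the values matter, one possible alternative is to compute the minimum and maximum from the finite values only and put NA wherever the input was not finite. The sketch below shows this under the assumption that the input contains at least two distinct finite values; normalizeKeepLength() is a hypothetical name, not a function defined elsewhere in this tutorial.

```r
# Hypothetical variant of normalize() that preserves the input length.
# Assumes x contains at least two distinct finite values.
normalizeKeepLength <- function(x) {
  finite <- is.finite(x)              # FALSE for NA, NaN, Inf and -Inf
  min_x <- min(x[finite])
  max_x <- max(x[finite])
  frac <- rep(NA_real_, length(x))    # start with NA everywhere
  frac[finite] <- (x[finite] - min_x) / (max_x - min_x)
  return(frac)
}

normalizeKeepLength(c(2, 4, NaN, 6, Inf))
```

For this input the three finite values are mapped to 0, 0.5 and 1, while the third and fifth positions are NA. Whichever behavior you prefer, the change again only has to be made in one place.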

There are two ways to shorten the syntax of writing functions:

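The value of the last evaluated expression is returned even without writing return(). If that expression is an assignment, the value is still returned, but invisibly, so nothing is printed.
The curly braces {} are only necessary for function bodies that consist of more than one expression.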

For instance, the function

> square <- function(x){
+   return(x*x)
+ }

could also be written as

> square <- function(x) x*x

However, although this short form is sometimes seen, we agree with the Google style guide that it is bad practice that makes code less readable and more prone to errors. We therefore recommend that you always write functions on multiple lines with curly braces {} and include an explicit return statement that makes clear what your function returns. The only exceptions are anonymous functions, discussed much later (see Functional Programming).
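The following sketch illustrates why an implicit return can be confusing when the last expression is an assignment. The function names squareImplicit() and squareAssigned() are hypothetical and used only for this illustration.

```r
# Hypothetical functions, for illustration only.

# The last expression is returned even without return().
squareImplicit <- function(x) {
  x * x
}

# The last expression is an assignment: the value is still returned,
# but invisibly, so nothing is printed at the console.
squareAssigned <- function(x) {
  result <- x * x
}

squareImplicit(3)             # the result is printed at the console
squareAssigned(3)             # nothing is printed
value <- squareAssigned(3)    # the value is there once you capture it
value
```

Both functions compute the same value, but only the first one prints it when called directly. An explicit return(result) at the end of the body avoids this surprise.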

13.0.1 Exercises: Writing Functions

See Section 18.0.31 for solutions.

Write a function that returns the square of a variable.
Write a function that converts Fahrenheit to Celsius.
Write functions to compute the sample variance and skewness of a numeric vector. Sample variance is defined as $\mathrm{Var}(x) = \frac{1}{n - 1} \sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}$, where $\bar{x} = \frac{1}{n} \sum_{i = 1}^{n} x_{i}$ is the sample mean. Skewness is defined as $\mathrm{Skew}(x) = \frac{\frac{1}{n - 2} \sum_{i = 1}^{n} (x_{i} - \bar{x})^{3}}{\mathrm{Var}(x)^{3 / 2}}$.
Write calculateBothNA(), a function that takes two vectors of the same length and returns the number of positions that have an NA in both vectors.