14.2 Functional Programming

Functional programming is a paradigm for writing computer code. The overall philosophy is to decompose a larger problem into smaller pieces for which generic functions can be written. These functions are then combined to solve the larger problem. Small functions have the advantage that they are simple to understand, easy to test thoroughly, and likely generic enough to be reusable.

In the extreme, each function does as little as possible and is pure, which means that it does not depend on internal state, does not modify global variables, and returns the same output whenever it is called with the same input. Many R functions do not satisfy this, and R is hence not a purely functional language. Functional programming is per se neither better nor worse than other programming paradigms. However, adopting a functional style not only makes code easier to read, debug and reuse, it also allows us to make use of the speed optimizations discussed in the previous chapters.

14.2.1 Functional style - an example

Here is a basic example of how a functional style can help to make code better. The following code generates three example data sets stored in a list: data sampled from a standard normal, a standard logistic, and a Student's t distribution with 5 degrees of freedom.

> #generate some example data
> x <- list(a=rnorm(100), b=rlogis(100), c=rt(100, df=5))

For each data set, the log-likelihood of the data under a standard normal is then calculated.

> #calculate log likelihood of each data set under a standard normal
> LLnorm <- numeric(length(x))
> for(d in seq_along(x)){
+   LLnorm[d] <- 0.0
+   for(i in seq_along(x[[d]])){
+     LLnorm[d] <- LLnorm[d] + log(1/sqrt(2*pi) * exp(-0.5 * x[[d]][i] * x[[d]][i]))
+   }
+ }
> LLnorm
[1] -143.5703 -242.8420 -174.8873

This code can now be made more readable, reusable and even faster by using a functional style. Let us begin by writing a specific function for the calculation of the log-likelihood under the standard normal.

> logLikNormal <- function(x){
+   return(log(1/sqrt(2*pi) * exp(-0.5 * x * x)))
+ }
> 
> LLnorm <- numeric(length(x))
> for(d in seq_along(x)){
+   LLnorm[d] <- 0.0
+   for(i in seq_along(x[[d]])){
+     LLnorm[d] <- LLnorm[d] + logLikNormal(x[[d]][i])
+   }
+ }
> LLnorm
[1] -143.5703 -242.8420 -174.8873

This has two benefits: First, it is more readable because the logic of what the nested for loops do is separated from the intricacies of calculating the log-likelihood under the standard normal. Second, it is more reusable as the calculation of the log-likelihood is now a function usable also outside that loop.

To improve further, we can put both the inner and outer loops into their own functions, which gives the full code:

> logLikNormal <- function(x){
+   return(log(1/sqrt(2*pi) * exp(-0.5 * x * x)))
+ }
> 
> logLikVectorizedNormal <- function(vec){
+   LLnorm <- 0.0
+   for(i in seq_along(vec)){
+     LLnorm <- LLnorm + logLikNormal(vec[i])
+   }
+   return(LLnorm)
+ }
> 
> logLikListNormal <- function(data){
+   LLnorm <- numeric(length(data))
+   for(d in seq_along(data)){
+     LLnorm[d] <- logLikVectorizedNormal(data[[d]])
+   }
+   return(LLnorm)
+ }
> 
> logLikListNormal(x)
[1] -143.5703 -242.8420 -174.8873

Once we have managed to decompose the bigger problem (calculating the log-likelihood for each data set in a list) into individual small functions, we have not only made the code more readable and reusable, we can now also think about improving each function individually. This is now much easier as each function is isolated, i.e. analyzing it no longer requires knowledge about the bigger problem.

For instance, we could benefit from the vectorized nature of R and rewrite logLikVectorizedNormal() as

> logLikVectorizedNormal <- function(vec){
+   return( sum(logLikNormal(vec)) )
+ }

This is possible because all operations in logLikNormal() also work if x is a vector rather than a single value. And since for loops are slow in R, we also gained performance.
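The equivalence is easy to verify. Here is a quick sketch (reusing the definitions from above, with a small example vector v introduced only for illustration) comparing the loop-based accumulation with the vectorized call:

```r
logLikNormal <- function(x){
  return(log(1/sqrt(2*pi) * exp(-0.5 * x * x)))
}
logLikVectorizedNormal <- function(vec){
  return( sum(logLikNormal(vec)) )
}

v <- c(-1, 0, 1)

# accumulate element by element, as in the original loop
LL <- 0.0
for(i in seq_along(v)){
  LL <- LL + logLikNormal(v[i])
}

all.equal(LL, logLikVectorizedNormal(v))  # TRUE
```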

Similarly, we can get rid of the for loop in logLikListNormal() by using sapply():

> logLikListNormal <- function(data){
+   return( sapply(data, logLikVectorizedNormal) )
+ }

We could also think of making the function logLikVectorizedNormal() more general. For instance, we might be interested in also calculating the log-likelihood under alternative models, say under a logistic distribution. A naive implementation could look like this:

> logLikLogistic <- function(x){
+   return(-2 * log(exp(x/2) + exp(-x/2)))
+ }
> 
> logLikVectorizedLogistic <- function(vec){
+   return( sum(logLikLogistic(vec)) )
+ }
> 
> logLikListLogistic <- function(data){
+   return( sapply(data, logLikVectorizedLogistic) )
+ }
> 
> logLikListLogistic(x)
        a         b         c 
-161.4840 -196.1062 -172.4180 

As we will see next, this can be written much more elegantly by using so-called functionals.

14.2.2 Functionals

Functionals are functions that take other functions as input. In R, as in any language that supports functional programming, functions are objects just like other variables: they can be assigned to variables, passed to functions, or be returned by functions.

Here is a way to see this:

> f <- function(x){ return(x*x) }
> f(3)
[1] 9
> g <- f
> g(4)
[1] 16

In the code above, a function is created that returns the square of a number. This function is then assigned to the variable f. To call the function, the parentheses () are used, which contain the arguments to the function. Just like data, a function can be copied to another variable g.

Functions can also be put in a list.

> z <- rnorm(100)
> func <- list(mean, median, sd)
> for(f in func){
+   print(f(z))
+ }
[1] 0.0492941
[1] 0.0389872
[1] 0.9142129
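The same loop can also be written with sapply() by treating the list of functions itself as the data to iterate over. A small sketch (the list names are added only to make the output readable):

```r
z <- rnorm(100)
func <- list(mean = mean, median = median, sd = sd)

# apply each function in the list to z; returns a named numeric vector
sapply(func, function(f) f(z))
```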

More importantly, they can be passed to another function as an argument.

> square <- function(x){ return(x*x) }
> inverse <- function(x){ return(1/x) }
> do <- function(x, f){
+   return(f(x))
+ }
> do(3, square)
[1] 9
> do(4, inverse)
[1] 0.25

In the above example, we defined the function do that takes as arguments some data x as well as a function f that is then applied to the data. Functions such as do are called functionals. You have already come across built-in functionals such as apply() or sweep(), which also take a function as input.

As a fun fact, note that functions do not need to be assigned to a variable. Such functions are called anonymous functions and are handy when a function is used only once, for instance when passed to a functional.

> do(3, function(x){return(x^3)})
[1] 27

The above syntax can be simplified to

> do(3, function(x) x^3)
[1] 27

since the last statement of a function is automatically returned (unless it is an assignment) and because the curly braces are not necessary for single-line functions (see Writing Functions).
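Since R 4.1, there is an even shorter built-in lambda syntax, \(x), which is simply shorthand for function(x). A sketch (reusing do from above):

```r
do <- function(x, f){
  return(f(x))
}

# \(x) x^3 is equivalent to function(x) x^3
do(3, \(x) x^3)
# [1] 27
```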

For functionals to be powerful, the individual tasks obviously need to be coded as functions. To benefit from the power of functionals, code should thus be written in a functional style.

Using functionals, we can generalize the above example as follows:

> logLikNormal <- function(x){
+   return(log(1/sqrt(2*pi) * exp(-0.5 * x * x)))
+ }
> 
> logLikLogistic <- function(x){
+   return(-2 * log(exp(x/2) + exp(-x/2)))
+ }
> 
> logLikVectorized <- function(vec, f){
+   return( sum(f(vec)) )
+ }
> 
> logLikList <- function(data, f){
+   return( sapply(data, logLikVectorized, f) )
+ }
> 
> logLikList(x, logLikNormal)
        a         b         c 
-143.5703 -242.8420 -174.8873 
> logLikList(x, logLikLogistic)
        a         b         c 
-161.4840 -196.1062 -172.4180 

Note that this generalization is only possible because we used functional style.
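Thanks to this generalization, a new model does not even require a named function: an anonymous function can be passed directly. For example, the log-likelihood under a Student's t distribution with 5 degrees of freedom, using the built-in log-density dt() (a sketch reusing logLikList() from above):

```r
logLikVectorized <- function(vec, f){
  return( sum(f(vec)) )
}
logLikList <- function(data, f){
  return( sapply(data, logLikVectorized, f) )
}

x <- list(a = rnorm(100), b = rlogis(100), c = rt(100, df = 5))

# pass the model as an anonymous function
logLikList(x, function(v) dt(v, df = 5, log = TRUE))
```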

14.2.3 Three final thoughts

14.2.3.1 Functional Style vs. Performance

In the above example, using a functional style allowed us to obtain clean, readable and reusable code. Once the individual building blocks (i.e. functions) were identified, it was much easier to see how each block could be improved and generalized, thanks to their isolation. However, the above example also nicely illustrates that the functional paradigm may come with performance limitations if pushed to extremes.

To give an algorithmic example, consider the term 1/sqrt(2*pi) in logLikNormal(): as this term is a constant independent of x, there is no need to recalculate it for every data point. A faster version of the log-likelihood under the standard normal would therefore be:

> logLikVectorizedNormal <- function(x){
+   return(length(x) * log(1/sqrt(2*pi)) - 0.5 * sum(x * x))
+ }

This has the further advantage that only a single call to log() and no calls to exp() are required, both being very costly functions. This performance boost can only be obtained if the summation is done in the same function as the per-data-point log-likelihood calculation. As this shows, the paradigm that functions should do as little as possible may jeopardize the goal of perfectly efficient code.
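It is worth verifying that the rearranged formula indeed gives the same result as the elementwise version, up to floating-point tolerance. A quick sketch (v is an illustration variable, not from the example above):

```r
logLikNormal <- function(x){
  return(log(1/sqrt(2*pi) * exp(-0.5 * x * x)))
}
logLikVectorizedNormal <- function(x){
  return(length(x) * log(1/sqrt(2*pi)) - 0.5 * sum(x * x))
}

v <- rnorm(1000)
# sum(log(c) + log(exp(-0.5*x^2))) = n*log(c) - 0.5*sum(x^2)
all.equal(sum(logLikNormal(v)), logLikVectorizedNormal(v))  # TRUE
```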

However, most parts of the code are not performance critical and the time spent debugging badly written code easily outweighs any performance gains. As Donald Knuth famously said: “The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.”

Conclusion: worry about readability and reusability first, and consider performance only where it is identified as critical! (But that is no excuse for not using faster options when they do not conflict with good coding practices.)

14.2.3.2 Functional style blows up function calls

Functional style implies that many functions get called, which themselves call many functions. This blows up the number of function calls, which is one of the things that makes R code slow, as we have seen above. The solution is not to abandon the functional style; on the contrary: as we have seen above, outsourcing loops is possible precisely because of built-in functionals such as apply(). Only by using a functional style can we benefit from those.

Conclusion: Make good use of apply() and similar functions. For these to be useful, adopt a functional style in coding.

14.2.3.3 Do not repeat yourself (DRY)

As mentioned earlier, an important paradigm is the “do not repeat yourself” (or DRY) principle: reusing a well tested function is much safer (i.e. less bug-prone) than reimplementing the same logic over and over again. Using a functional style is what makes the DRY principle work. However, the principle may also be generalized to “do not repeat others”: using well tested functions (and packages) is much safer than reimplementing the same logic yourself!

In the above example, for instance, we could have used the function dnorm() with log=TRUE rather than writing our own function logLikNormal(). At first sight it may seem that our function should be faster, as it implements the standard normal rather than the general version. However, dnorm() is implemented in C rather than R and is likely much more performant than our solution anyway.
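A quick check confirms that dnorm() with log = TRUE matches our hand-written version (v is an illustration variable):

```r
logLikNormal <- function(x){
  return(log(1/sqrt(2*pi) * exp(-0.5 * x * x)))
}

v <- rnorm(100)
# dnorm() evaluates the standard normal log-density directly
all.equal(sum(logLikNormal(v)), sum(dnorm(v, log = TRUE)))  # TRUE
```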

Conclusion: Check what is available first, before implementing it yourself!

14.2.4 Exercises: Functional programming

See Section 18.0.38 for solutions.

  1. Create a 100x100 matrix A and fill it with random numbers from a uniform distribution. Use functional style to calculate the sum of squares for each row.

  2. If you haven’t already, try to do the above using an anonymous function.

  3. Write your own version of apply(), named my_apply() that takes two arguments: a matrix or data frame x and a function f that is then applied to each row of x (you may use a for-loop). Use it to calculate the sum of squares on A.

  4. Write a functional apply_all() that takes two arguments: a matrix or data frame x and a list of functions funcs. Your functional apply_all() should then return a matrix with columns matching the functions in funcs that were applied to each row (you may use a for-loop). Use it to calculate the sum of squares and the square of the sum on A.