Functions

and debugging

Aurélien Ginolhac, DLSM

University of Luxembourg

Monday, the 24th of March, 2025

Learning objectives

You will learn to:

  • Write and understand functions
  • Use assertions and defensive programming
  • Create a reproducible example

Functions

(in R) Everything that exists is an object. Everything that happens is a function call.

John Chambers, creator of the S language, predecessor of R

Functions

The most important thing to understand about R is that functions are objects in their own right. You can work with them exactly the same way you work with any other type of object.

Hadley Wickham, Advanced R
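
As a minimal sketch of what "functions are objects" means in practice, the snippet below stores functions in a list, passes one as an argument, and creates one without a name. The names `summaries` and `apply_twice` are hypothetical and only serve the illustration.

```r
# Functions can be stored in a list like any other object
summaries <- list(average = mean, spread = sd)
summaries$average(c(1, 2, 3))

# Functions can be passed as arguments to other functions
apply_twice <- function(fun, x) fun(fun(x))
apply_twice(sqrt, 16)

# Functions can be created without a name (anonymous functions)
sapply(1:3, function(i) i * 10)
```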

Everything is a function

Type a function's name without () to print its definition

`+`
function (e1, e2)  .Primitive("+")
sd
function (x, na.rm = FALSE) 
sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
    na.rm = na.rm))
<bytecode: 0x559bcb448a58>
<environment: namespace:stats>

These two expressions are equivalent

1 + 2 * 3
[1] 7
`+`(1, `*`(2, 3))
[1] 7

The first notation is the one humans write. Under the hood, R evaluates the second one.
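
Because operators are ordinary functions, their backtick-quoted names can be used wherever a function is expected. The following is a small sketch of that idea; the vector `x` is made up for the example.

```r
# The prefix form makes an operator usable as an argument to sapply()
sapply(1:3, `+`, 10)      # adds 10 to each element

# Subsetting is a function call too
x <- c(5, 6, 7)
`[`(x, 2)                 # same as x[2]

# Both notations from the slide give identical results
identical(1 + 2 * 3, `+`(1, `*`(2, 3)))
```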

Minimal function

  • Use the assignment <- and the keyword function to store a function in an object

Minimal example

f <- function() {}
f
function () 
{
}
  • Typing f alone prints the function; it does not execute the code inside it
  • A function is called with parentheses (), i.e. f() in our example
f()
NULL
  • A function will always return something: here NULL (the empty element)

Functions have 3 parts

  • body(), the inside code
  • formals(), the list of arguments
  • environment(), where the function's variables are looked up
f <- function(x) x^2
f
function (x) 
x^2
formals(f) # x has no default here (defaults are assigned with '=')
$x
body(f)
x^2
environment(f)
<environment: R_GlobalEnv>
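
The three accessors can also be queried programmatically. Here is a short sketch using a hypothetical function `g` with one default argument, which is not part of the original slides.

```r
g <- function(x, power = 2) {
  x^power
}

names(formals(g))   # argument names: "x" and "power"
formals(g)$power    # default value of 'power'
class(body(g))      # the body is a call to `{`
environment(g)      # the environment where g was defined
```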

Returning a result

f <- function() {
  1 + 1
  print("Hello from basv!")
}
# no usage of return
f()
[1] "Hello from basv!"
  • Assigning the result to a name:
# the assignment itself displays no value; the line below comes from print()
result <- f()
[1] "Hello from basv!"
# display the content
result
[1] "Hello from basv!"

With no explicit return

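A function returns the value of its last evaluated expression. In f() above, print() comes last and returns its argument, so the result is the printed string, not 2.

Explicit use of return() (early exit)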

return() with variable
f3 <- function() {
  res <- 1 + 1
  # stop early if even
  if (res %% 2 == 0) {
    return(res)
  }
  # other computation not run
  3 + 2
}
f3()
[1] 2
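
To make both return mechanisms concrete, here is a small sketch with two hypothetical functions, `f4` and `half_if_even`, that are not part of the original slides. The first relies on the last expression, the second combines an early return() with an implicit one.

```r
# The computation comes last, so its value is what the function returns
f4 <- function() {
  print("side effect first")
  1 + 1
}
res4 <- f4()   # prints the message and stores 2

# Early exit for odd numbers, implicit return for even ones
half_if_even <- function(n) {
  if (n %% 2 != 0) {
    return(n)  # odd: leave the function immediately
  }
  n / 2        # even: last expression, returned implicitly
}
half_if_even(3)
half_if_even(8)
```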

Function arguments

Without default

z <- NULL # empty z object from global env
f <- function(z) { # Note the 'z'
  z + 1
}
f()
Error in f(): argument "z" is missing, with no default
f(2) # same as f(z = 2)
[1] 3

Function environments are enclosed: the formal argument z has nothing to do with the z defined in the Global Env.
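
A quick way to convince yourself is to modify the argument inside the function and check the outer variable afterwards. This sketch uses a hypothetical function `add_one` and an arbitrary starting value.

```r
z <- 10                # z outside the function

add_one <- function(z) {
  z <- z + 1           # modifies the local z only
  z
}

add_one(1)             # works on the supplied value, not on the outer z
z                      # the outer z is unchanged
```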

With default

z <- NULL
f <- function(z = 0) { 
  z + 1
}

f()
[1] 1
f(2)
[1] 3
f(z = 2) # name the argument explicitly
[1] 3
  • When an argument is supplied, it overrides the default.
  • Arguments can be named or not (naming is usual with more than one argument)
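
With more than one argument, naming becomes useful. The sketch below assumes a hypothetical two-argument function `power`, introduced only for illustration.

```r
power <- function(base, exponent = 2) {
  base^exponent
}

power(3)                        # default exponent is used: 3^2
power(3, 3)                     # arguments matched by position: 3^3
power(exponent = 3, base = 2)   # named arguments can come in any order: 2^3
```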

Environments

Environments are enclosed, but if a variable is not found inside the function, R looks for it in the parent environment, here the Global Env

y <- 1
f <- function(x) x + y

f(x = 4)
[1] 5
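
The lookup happens when the function is called, and a local variable takes precedence over an outer one. A minimal sketch, with the hypothetical names `add_y` and `add_local_y`:

```r
y <- 1
add_y <- function(x) x + y   # y is not defined inside the function

first <- add_y(4)            # y is found outside the function: 4 + 1
y <- 10
second <- add_y(4)           # the current value of y is used: 4 + 10

add_local_y <- function(x) {
  y <- 100                   # a local y masks the outer one
  x + y
}
third <- add_local_y(4)      # 4 + 100
y                            # the outer y is untouched
```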

Debugging

Unhelpful error outputs

library(tibble)
dat <- tibble(a = 1:2,
              b = c("a", "b"))
df$a
Error in df$a: object of type 'closure' is not subsettable

The tidyverse gives a more helpful message

library(dplyr, warn.conflicts = FALSE)
select(df, a)
Error in UseMethod("select"): no applicable method for 'select' applied to an object of class "function"

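The tibble was stored as dat, but df was typed. df is actually a function (the density of the F distribution, from the stats package), and "closure" is the type R gives to ordinary functions, hence the first message.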
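
When such an error appears, a first check is to ask R what the name actually refers to. A minimal sketch, assuming the stats package is attached as in a default R session:

```r
# What is behind the name 'df'?
is.function(df)                    # is it a function?
typeof(df)                         # its internal type
environmentName(environment(df))   # the package namespace it comes from
```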

Defensive programming: assertions

Computing without any assertion

for (i in list(1, 2, "a")) {
  print(3 / i)
}
[1] 3
[1] 1.5
Error in 3/i: non-numeric argument to binary operator

The error message is not very clear

Be defensive and anticipate issues

for (i in list(1, 2, "a")) {
  if (!is.numeric(i)) {
    stop(paste(i, "is not a number"))
  }
  print(3 / i)
}
[1] 3
[1] 1.5
Error: a is not a number
  • if tests for the relevant condition
  • stop creates the error with a meaningful message

Another useful test to add:

  [...]
  if (i == 0) stop("Cannot divide by 0")
  [...]

For a one-liner, curly braces are optional
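
Both checks can be gathered in one function. The sketch below uses a hypothetical helper `safe_divide` and adds tryCatch(), which is not covered in these slides, only so that the loop can report every failure instead of stopping at the first one.

```r
# Hypothetical helper: divide 3 by x after validating the input
safe_divide <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be a number, not ", class(x)[1])
  }
  if (x == 0) stop("Cannot divide by 0")
  3 / x
}

# tryCatch() turns each error into its message so the loop keeps going
for (i in list(1, 0, "a")) {
  result <- tryCatch(safe_divide(i), error = function(e) conditionMessage(e))
  print(result)
}
```

Base R also provides stopifnot(), which raises an error as soon as one of its conditions is not TRUE; writing if and stop() by hand gives more control over the message.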

Reproducible Example: reprex

  • Forces you to test in a fresh R session
  • Helps you narrow down the issue
  • Uses a built-in dataset
  • In most cases, you find the error yourself; otherwise, it makes it easy for others to help you
my_files <- fs::dir_ls("data", glob = "*.xlsx")
wunsch <- map_dfr(my_files, readxl::read_xlsx)
wunsch |> 
  group_by(id, cell) |> 
  summarise(n = n(),
            top = slice_max(joer))
Error: Problem with `summarise()` input `top`. 
x no applicable method for 'slice_max' applied 
to an object of class "c('double', 'numeric')"
ℹ Input `top` is `slice_max(joer)`. 
ℹ The error occurred in group 1: id = "0012".

Reduced to a reprex:

library(dplyr, warn.conflicts = FALSE)
ToothGrowth |> 
  # should be slice_max(len, n = 2, by = supp) only
   mutate(top = slice_max(len, n = 2),
          .by = supp)
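Compared with the original code, the reprex has:

  • No personal files (instead of multiple Excel files)
  • 1 package (instead of 6)
  • 3 functions (instead of 8)
  • 2 lines (instead of 6)
  • Code that others can run on their machines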

reprex usage and rendered output:

Copy the code to the clipboard and run reprex::reprex(); it renders:

library(dplyr, warn.conflicts = FALSE)
ToothGrowth |> 
  # should be slice_max(len, n = 2, by = supp) only
   mutate(top = slice_max(len, n = 2),
          .by = supp)
#> Error in `mutate()`:
#> ℹ In argument: `top = slice_max(len, n = 2)`.
#> ℹ In group 1: `supp = VC`.
#> Caused by error in `slice_max()`:
#> ! `order_by` is absent but must be supplied.

Created on 2024-02-29 with reprex v2.1.0

Code style

We know R is flexible and not strict about indentation and style.

Thus, to make code look more alike and consistent, we need to follow some guidelines

Recommendations:

  • Spaces around operators
  • One operation per line
  • } on its own line
  • Naming things
# Bad
neige<-function(x) x+1; return(x)

# Good
increment <- function(x) {
  x + 1
}
  • Indentation and spaces (RStudio helps, including its code Diagnostics)
  • Naming things (yes again)
# Bad
for(xx in seq_along(vec)){
a[i]<-vec[xx]%%2}

# Good
for (i in seq_along(vec)) {
  res[i] <- vec[i] %% 2
}

Write a function

Features

  • Named is_even
  • Takes a numeric vector as input
  • Returns a logical vector: TRUE where the element is even
  • Should check if the input is correct:
    • Data type (must be numbers)
    • No missing data

You should make it work without a function first, then wrap it in a function.

Once done, submit it on Moodle in the dedicated assignment.

Examples of desired outputs:

  • for is_even(1)
  • for is_even(2)
  • for is_even(NA)
[1] "for 1:"
[1] FALSE
[1] "for 2:"
[1] TRUE
[1] "for NA:"
Error: argument n should be a number!

Before we stop

You learned to:

  • Read and write functions
  • Debug and program defensively with assertions

Acknowledgments 🙏 👏

  • Hadley Wickham
  • Jenny Bryan

Thank you for your attention!