Advanced Control Flow

&& and ||

This tutorial extends the Control Flow tutorial, where you learned how to use if, else, return(), and stop().

Here you will learn how to

combine logical tests in an if statement
write if statements that work with vectors, which is a prerequisite if you want to write vectorized functions.

Here’s what clean() looked like at the end of the Control Flow tutorial. Do you notice that all of the if statements have the same outcome?

clean <- function(x) {
  stopifnot(!is.null(x))
  if (x == -99) return(NA)
  if (x == ".") return(NA)
  if (x == "NaN") return(NA)
  x
}

Let’s use your knowledge of logical tests to trim them down to a single if statement.

Write a logical test that returns TRUE when x is -99 OR x is “.” (Let’s ignore the “NaN” case to keep things simple). Then click Submit Answer.

"You can combine two logical tests in R with `&` (and) and `|` (or), e.g. x < 0 & x > 1."

x == -99 | x == "."

& and |

& and | are R’s boolean operators for combining logical tests.

& stands for “and”. It returns TRUE if both tests return TRUE and returns FALSE otherwise.
| stands for “or”. It returns TRUE if one or both tests return TRUE and returns FALSE otherwise.

So,

x <- -99
x == -99 | x == "."

## [1] TRUE
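
As a small sketch of how the two operators differ, the lines below run both on the same pair of tests. The value of x is chosen only for illustration.

```r
x <- -99

# & needs both tests to be TRUE; the second test is FALSE here
x == -99 & x == "."
## [1] FALSE

# | needs only one of the tests to be TRUE
x == -99 | x == "."
## [1] TRUE
```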

However, it is bad practice to use & and | to combine logical tests within an if condition. Why? Because:

there is something better (as you’ll see in a minute)
& and | can return more than one value, which makes if throw an error (versions of R before 4.2.0 showed a warning message instead)

As R operators, both & and | are vectorized, which means that you can use them with vectors. This is very useful.

x <- c(-99, 0 , 1)
x == -99

## [1]  TRUE FALSE FALSE

x == "."

## [1] FALSE FALSE FALSE

x == -99 | x == "."

## [1]  TRUE FALSE FALSE

However, if conditions are not vectorized. if expects the logical test contained within its parentheses to return a single TRUE or FALSE. If the condition returns a vector of TRUEs and FALSEs, if will throw an error. (Versions of R before 4.2.0 used the first value and showed a warning message.)

x <- c(-99, 0 , 1)
if (x == -99 | x == ".") NA

## Error in if (x == -99 | x == ".") NA: the condition has length > 1
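
If you do need to test a whole vector inside an if condition, one option is to collapse the vector to a single value first. The sketch below uses base R's any() and all(), which this tutorial does not otherwise cover, so treat it as an aside rather than part of the lesson.

```r
x <- c(-99, 0, 1)

# any() returns TRUE if at least one element is TRUE
if (any(x == -99 | x == ".")) "x contains a missing value code"
## [1] "x contains a missing value code"

# all() returns TRUE only if every element is TRUE
all(x == -99 | x == ".")
## [1] FALSE
```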

&& and ||

You can avoid this by always using && and || within your if conditions. && and || are lazy substitutes for & and |. They are lazy in two ways.

First, && and || always return a single TRUE or FALSE. They are meant for tests that each return a single value. If you give && or || longer vectors, R 4.3.0 and later will throw an error. Older versions of R compared only the first elements of the vectors.

x <- c(-99, 0 , 1)
x == -99 || x == "."

## Error in x == -99 || x == ".": 'length = 3' in coercion to 'logical(1)'
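
When each test returns a single value, && and || give the same answers as & and |. A minimal sketch, with values of x made up for illustration:

```r
x <- 0.5
# both tests are TRUE, so && returns TRUE
x > 0 && x < 1
## [1] TRUE

x <- -99
# the first test is TRUE, so || returns TRUE
x == -99 || x == "."
## [1] TRUE
```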

Use ||

Let’s use this to our immediate advantage.

Replace the two if statements below with a single statement that tests whether x is -99 or "." without throwing error messages.

clean <- function(x) {
  stopifnot(!is.null(x))
  if (x == -99) return(NA)
  if (x == ".") return(NA)
  x
}

"Like |, || expects a _complete_ logical test on each side of ||."

clean <- function(x) {
  stopifnot(!is.null(x))
  if (x == -99 || x == ".") return(NA)
  x
}

Computation

The most important reason to use || instead of | is that || saves unnecessary computation when possible. This is the second way that && and || are lazy.

When possible, && and || jump to the correct conclusion after evaluating the first of the two logical tests (not so with & and |).

&& will return FALSE if the test on the left returns FALSE (because the combined test would return FALSE).
|| will return TRUE if the test on the left returns TRUE (because the combined test would return TRUE).

In either case, && and || will not evaluate the test on the right.

x <- -99
if (x == -99 || stop("if you evaluate this.")) "I didn't evaluate stop()."

## [1] "I didn't evaluate stop()."
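
The same idea can be sketched for &&, with | shown for contrast. tryCatch() is base R but is not covered in this tutorial; it is used here only to catch the error so that the example runs to the end.

```r
x <- 0

# && stops at the first FALSE, so stop() is never evaluated
x == -99 && stop("if you evaluate this.")
## [1] FALSE

# | evaluates both sides, so stop() runs and signals an error;
# tryCatch() catches the error and returns a message instead
tryCatch(
  x == 0 | stop("if you evaluate this."),
  error = function(e) "I evaluated stop()."
)
## [1] "I evaluated stop()."
```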

How could you use this?

Remember how this code returns an error because if cannot handle the result of NULL == -99?

clean <- function(x) {
  if (x == -99) return(NA)
  x
}
clean(NULL)

## Error in if (x == -99) return(NA): argument is of length zero

Quiz

Suppose we redefine clean() like this:

clean <- function(x) {
  if (is.null(x) || x == -99) return(NA)
  x
}
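
Here is a sketch of how this version behaves with a few inputs. The inputs are chosen only for illustration.

```r
clean <- function(x) {
  # is.null(x) is TRUE for NULL, so || returns TRUE right away
  # and x == -99 is never evaluated
  if (is.null(x) || x == -99) return(NA)
  x
}

clean(NULL)
## [1] NA

clean(-99)
## [1] NA

clean(1)
## [1] 1
```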

Vectorized if

Buried in the last section is an interesting question: what if you do want clean() to work with vectors? For example:

clean(c(-99, 0, 1))

## [1] NA  0  1

That would be a handy way to clean whole columns of data. How could you do it?

Compare these two functions (one should seem familiar). What is different?

clean <- function(x) {
  if (x == -99) NA else x
}

clean2 <- function(x) {
  ifelse(x == -99, NA, x)
}

ifelse()

ifelse() is a function that replicates an if else statement. It takes three arguments: a logical test followed by two pieces of code. If the test returns TRUE, ifelse() will return the results of the first piece of code. If the test returns FALSE, ifelse() will return the results of the second piece of code.

So clean(-99) and clean2(-99) both return NA.

clean(-99)

## [1] NA

clean2(-99)

## [1] NA

However, unlike if and else, ifelse is vectorized. As a result, you can pass ifelse() a vector of values and it will apply the implied if else statement separately to each element of the vector.

x <- c(-99, 0, 1)
ifelse(x == -99, NA, x)

## [1] NA  0  1

clean2() inherits this vectorized property from ifelse().

clean2(c(-99, 0, 1))

## [1] NA  0  1

Compare that to clean() (which is non-vectorized because it relies on if and else, which are non-vectorized).

clean(c(-99, 0, 1))

## Error in if (x == -99) NA else x: the condition has length > 1
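
Because | and ifelse() are both vectorized, the two can be combined. The sketch below is one possible vectorized function that treats both -99 and "." as missing. The name clean3() is hypothetical and used only here.

```r
clean3 <- function(x) {
  # | compares element by element, so the test is a logical vector
  ifelse(x == -99 | x == ".", NA, x)
}

clean3(c(-99, 0, 1))
## [1] NA  0  1

clean3(c(".", "a", "b"))
## [1] NA  "a" "b"
```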

if_else

The dplyr package offers a slight improvement on ifelse() named if_else(). if_else() is faster than ifelse(), but it requires you to make sure that each case in the if else statement returns the same type of object. For example, the statement needs to return a real number (or a string, or a logical, etc.) whether or not the condition is TRUE.

No big deal, right? Well kind of.

library(dplyr) # if_else() comes from dplyr
x <- c(-99, 0, 1)
if_else(x == -99, NA, x)

## [1] NA  0  1

NA

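What happened? In versions of dplyr before 1.1.0, this call stops with an error instead of returning the result above, because the two cases have different types (dplyr 1.1.0 and later convert the NA for you). Recall that data in R comes in six atomic types.

A plain NA is a logical: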

typeof(NA)

## [1] "logical"

So when you write if_else(x == -99, NA, x), if_else() returns a logical in the first case and a double (real number) in the second (assuming x is a real number).

You can get around this mishap in two ways:

Stick to ifelse()
Use a NA that comes with a type

Types of NA

You may not realize it, but R comes with five types of NA. They all appear as NA when printed, but they are each saved with a separate data type. These are:

NA # logical

## [1] NA

NA_integer_ # integer

## [1] NA

NA_real_ # double

## [1] NA

NA_complex_ # complex

## [1] NA

NA_character_ # character

## [1] NA
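
You can check the type behind each typed NA with typeof(), just as with the plain NA earlier. A short sketch:

```r
typeof(NA_integer_)
## [1] "integer"

typeof(NA_real_)
## [1] "double"

typeof(NA_complex_)
## [1] "complex"

typeof(NA_character_)
## [1] "character"
```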

You can fix if_else() by being precise about which NA to use (most other R functions will convert the type of NA without bothering you).

x <- c(-99, 0, 1)
if_else(x == -99, NA_real_, x)

## [1] NA  0  1

Use if_else

Fix the if_else() statement of clean2() to work with real numbers. Then click Submit Answer.

clean2 <- function(x) {
  if_else(x == -99, NA, x)
}

clean2 <- function(x) {
  if_else(x == -99, NA_real_, x)
}

Vectorized else if

What if you want to write a vectorized version of a multi-part if else tree? Like the tree in this function:

clean <- function(x) {
  if (x == -99) NA 
  else if (x == ".") NA
  else if (x == "") NA
  else if (x == "NaN") NA
  else x
}

In this case, neither ifelse() nor if_else() will do. Why? Because each can only handle a single if condition, but our tree has four.

case_when()

You can vectorize multi-part if else statements with dplyr’s case_when() function. Here is how you would use case_when() to rewrite our foo() function from the Control Flow tutorial.

Here is the masterpiece in its original form:

foo <- function(x) {
  if (x > 2) "a"
  else if (x < 2) "b"
  else if (x == 1) "c"
  else "d"
}

And here it is with case_when().

foo2 <- function(x) {
  case_when(
    x > 2  ~ "a",
    x < 2  ~ "b",
    x == 1 ~ "c",
    TRUE   ~ "d"
  )
}

And here are our foos in action to prove that foo2() is vectorized.

x <- c(3, 2, 1)
foo(x)

## Error in if (x > 2) "a" else if (x < 2) "b" else if (x == 1) "c" else "d": the condition has length > 1

foo2(x)

## [1] "a" "d" "b"

Notice that

case_when() returns a single case for each element, the first case whose left hand side evaluates to TRUE
The left hand side of the last case evaluates to TRUE no matter what the value of x is (In fact, the left hand side is TRUE). This is an easy way to add an else clause to the end of case_when().
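
To see that the order of the cases matters, here is a sketch of a variant that moves the x == 1 case above the x < 2 case. The name foo3() is hypothetical, and the example assumes the dplyr package is installed.

```r
library(dplyr)

# Same cases as foo2(), but with x == 1 moved above x < 2
foo3 <- function(x) {
  case_when(
    x > 2  ~ "a",
    x == 1 ~ "c",
    x < 2  ~ "b",
    TRUE   ~ "d"
  )
}

foo3(c(3, 2, 1, 0))
## [1] "a" "d" "c" "b"
```

With this ordering, 1 matches x == 1 before case_when() ever reaches x < 2, so the "c" case is no longer unreachable.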

Now let’s look at the unusual syntax of case_when().

case_when() syntax

foo2 <- function(x) {
  case_when(
    x > 2  ~ "a",
    x < 2  ~ "b",
    x == 1 ~ "c",
    TRUE   ~ "d"
  )
}

Each argument of case_when() is a pair that consists of a logical test on the left hand side and a piece of code on the right hand side. The two are always separated by a ~.

Like if_else(), case_when() expects each case to return the same type of output. So keep those NA types handy: NA, NA_integer_, NA_real_, NA_complex_, NA_character_.
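
As a sketch of matching the NA type to the other cases, the function below returns strings, so its fallback case uses NA_character_. The name sign_label() is hypothetical and not part of this tutorial, and the example assumes the dplyr package is installed.

```r
library(dplyr)

sign_label <- function(x) {
  case_when(
    x > 0 ~ "positive",
    x < 0 ~ "negative",
    TRUE  ~ NA_character_  # character NA, to match the strings above
  )
}

sign_label(c(-1, 0, 1))
## [1] "negative" NA         "positive"
```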

Final Challenge

Rewrite the multi-part version of clean() to use case_when(), which will allow clean() to handle vectors. Retain each case. Assume where necessary that clean() will only work with real numbers. Then click Submit Answer.

clean <- function(x) {
  if (x == -99) NA 
  else if (x == ".") NA
  else if (x == "") NA
  else if (x == "NaN") NA
  else x
}

"Use NA's that have the right type."

clean <- function(x) {
  case_when(
    x == -99 ~ NA_real_, 
    x == "." ~ NA_real_,
    x == "" ~ NA_real_,
    x == "NaN" ~ NA_real_,
    TRUE ~ x
  )
}

Congratulations!

You’ve learned how to alter the control flow of your functions with:

if
else
return()
stop()
stopifnot()
ifelse()

Not only that, you tackled two advanced methods: dplyr’s if_else() and dplyr’s case_when().