&& and ||
This tutorial extends the Control Flow tutorial, where you learned how to use if, else, return(), and stop().
Here you will learn how to
- combine logical tests in an if statement
- write if statements that work with vectors, which is a prerequisite if you want to write vectorized functions.
Here’s what clean() looked like at the end of the Control Flow tutorial. Do you notice that all of the if statements have the same outcome?
clean <- function(x) {
  stopifnot(!is.null(x))
  if (x == -99) return(NA)
  if (x == ".") return(NA)
  if (x == "NaN") return(NA)
  x
}
Let’s use your knowledge of logical tests to trim them down to a single if statement.
- Write a logical test that returns TRUE when x is -99 OR x is “.” (let’s ignore the “NaN” case to keep things simple). Then click Submit Answer.
Hint: You can combine two logical tests in R with & (and) and | (or), e.g. x > 0 & x < 1.
Solution:
x == -99 | x == "."
& and |
& and | are R’s Boolean operators for combining logical tests.
- & stands for “and”. It returns TRUE if both tests return TRUE and returns FALSE otherwise.
- | stands for “or”. It returns TRUE if one or both tests return TRUE and returns FALSE otherwise.
So,
x <- -99
x == -99 | x == "."
## [1] TRUE
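The sketch below lists all four combinations at once. It is a small base R truth table for & and |. The names a, b and truth_table are illustrative and do not come from the tutorial.

```r
# Every combination of two logical values, one combination per position
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

# & is TRUE only where both are TRUE; | is FALSE only where both are FALSE
truth_table <- data.frame(a, b, and = a & b, or = a | b)
truth_table
```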
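However, it is bad practice to use & and | to combine logical tests within an if condition. Why? Because:
- there is something better (as you’ll see in a minute)
- & and | return a whole vector when given vectors, and if cannot handle a condition longer than one value (it throws an error in R 4.2.0 and later, and a warning in earlier versions)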
As R operators, both & and | are vectorized, which means that you can use them with vectors: they combine the tests element by element. This is very useful.
x <- c(-99, 0, 1)
x == -99
## [1] TRUE FALSE FALSE
x == "."
## [1] FALSE FALSE FALSE
x == -99 | x == "."
## [1] TRUE FALSE FALSE
However, if conditions are not vectorized. if expects the logical test contained within its parentheses to return a single TRUE or FALSE. If the condition returns a vector of TRUEs and FALSEs, if throws an error in R 4.2.0 and later. (Earlier versions of R used the first value and showed a warning message.)
x <- c(-99, 0, 1)
if (x == -99 | x == ".") NA
## Error in if (x == -99 | x == ".") NA: the condition has length > 1
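You can see what if objects to by checking the length of the condition. This base R sketch uses tryCatch() to capture the problem instead of stopping. The result depends on your R version, as the comments note.

```r
x <- c(-99, 0, 1)
cond <- x == -99 | x == "."

# if() needs a condition of length exactly 1, but this one has length 3
length(cond)

# Capture the failure instead of halting. R 4.2.0 and later raise an error;
# earlier versions raised a warning and used only cond[1]
outcome <- tryCatch(
  if (cond) "first element was TRUE",
  warning = function(w) "warning",
  error = function(e) "error"
)
outcome
```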
&& and ||
You can guard against this by always using && and || within your if conditions. && and || are lazy substitutes for & and |. They are lazy in two ways.
First, && and || always return a single TRUE or FALSE. They are meant for length-one tests and do not work element by element. In R versions before 4.2.0, if you gave && or || vectors, they compared only the first elements of the vectors without a warning. Since R 4.3.0, giving them a vector longer than one is an error:
x <- c(-99, 0, 1)
x == -99 || x == "."
## Error in x == -99 || x == ".": 'length = 3' in coercion to 'logical(1)'
Use ||
Let’s use this to our immediate advantage.
- Replace the two if statements below with a single if statement that tests whether x is -99 or "." without throwing error messages.
clean <- function(x) {
stopifnot(!is.null(x))
if (x == -99) return(NA)
if (x == ".") return(NA)
x
}
Hint: Like |, || expects a complete logical test on each side of ||.
Solution:
clean <- function(x) {
stopifnot(!is.null(x))
if (x == -99 || x == ".") return(NA)
x
}
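One way to check the solution is to call it on each kind of input. This sketch repeats the solution and tries three arbitrary test values.

```r
clean <- function(x) {
  stopifnot(!is.null(x))
  if (x == -99 || x == ".") return(NA)
  x
}

clean(-99)  # NA: the left-hand test is TRUE
clean(".")  # NA: the left-hand test is FALSE, the right-hand test is TRUE
clean(5)    # 5: both tests are FALSE
```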
Computation
The most important reason to use || instead of | is that || saves unnecessary computation when possible. This is the second way that && and || are lazy.
When possible, && and || jump to the correct conclusion after evaluating the first of the two logical tests (not so with & and |).
- && will return FALSE if the test on the left returns FALSE (because the combined test would return FALSE).
- || will return TRUE if the test on the left returns TRUE (because the combined test would return TRUE).
In either case, && and || will not evaluate the test on the right.
x <- -99
if (x == -99 || stop("if you evaluate this.")) "I didn't evaluate stop()."
## [1] "I didn't evaluate stop()."
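The sketch below contrasts the lazy operators with & by counting how often a helper runs. The names right_side() and calls are hypothetical and exist only for this illustration.

```r
# Hypothetical helper that records how many times it runs
calls <- 0
right_side <- function() {
  calls <<- calls + 1
  TRUE
}

FALSE && right_side()  # FALSE, and right_side() never runs
TRUE || right_side()   # TRUE, and right_side() never runs
calls                  # still 0

FALSE & right_side()   # FALSE, but & evaluates both sides
calls                  # now 1
```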
How could you use this? Remember how this code returns an error because if cannot handle the result of NULL == -99 (which is logical(0), a logical vector of length zero)?
clean <- function(x) {
  if (x == -99) return(NA)
  x
}
clean(NULL)
## Error in if (x == -99) return(NA): argument is of length zero
Quiz
Suppose we redefine clean() like this:
clean <- function(x) {
  if (is.null(x) || x == -99) return(NA)
  x
}
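The redefined clean() relies on short-circuiting. When x is NULL, is.null(x) returns TRUE, so || never evaluates x == -99. Here is a minimal check using arbitrary test values.

```r
clean <- function(x) {
  if (is.null(x) || x == -99) return(NA)
  x
}

clean(NULL)  # NA: is.null(NULL) is TRUE, so x == -99 is never evaluated
clean(-99)   # NA: is.null() is FALSE, so || evaluates x == -99, which is TRUE
clean(3)     # 3: both tests are FALSE
```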
Vectorized if
Buried in the last section is an interesting question: what if you do want clean() to work with vectors? That is,
clean(c(-99, 0, 1))
## [1] NA 0 1
That would be a handy way to clean whole columns of data. How could you do it?
Compare these two functions (one should seem familiar). What is different?
clean <- function(x) {
  if (x == -99) NA else x
}
clean2 <- function(x) {
  ifelse(x == -99, NA, x)
}
ifelse()
ifelse() is a function that replicates an if else statement. It takes three arguments: a logical test followed by two pieces of code. If the test returns TRUE, ifelse() will return the results of the first piece of code. If the test returns FALSE, ifelse() will return the results of the second piece of code.
So clean(-99) and clean2(-99) both return NA.
clean(-99)
## [1] NA
clean2(-99)
## [1] NA
However, unlike if and else, ifelse() is vectorized. As a result, you can pass ifelse() a vector of values and it will apply the implied if else statement separately to each element of the vector.
x <- c(-99, 0, 1)
ifelse(x == -99, NA, x)
## [1] NA 0 1
clean2() inherits this vectorized property from ifelse().
clean2(c(-99, 0, 1))
## [1] NA 0 1
Compare that to clean() (which is non-vectorized because it relies on if and else, which are non-vectorized).
clean(c(-99, 0, 1))
## Error in if (x == -99) NA else x: the condition has length > 1
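The element-wise rule is easiest to see with three arbitrary vectors. The names test, yes and no in this base R sketch are illustrative. At each position, ifelse() takes the element of yes where test is TRUE and the element of no where it is FALSE.

```r
test <- c(TRUE, FALSE, TRUE)
yes  <- c("a1", "a2", "a3")
no   <- c("b1", "b2", "b3")

# Positions 1 and 3 come from yes, position 2 comes from no
ifelse(test, yes, no)
# [1] "a1" "b2" "a3"
```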
if_else
The dplyr package offers a slight improvement on ifelse() named if_else(). if_else() is faster than ifelse(), but it requires you to make sure that each case in the if else statement returns the same type of object. For example, the statement needs to return a real number (or a string, or a logical, etc.) whether or not the condition is TRUE.
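No big deal, right? Well, kind of.
x <- c(-99, 0, 1)
if_else(x == -99, NA, x)
## [1] NA 0 1
With dplyr 1.1.0 and later, this call succeeds, as the output above shows. With earlier versions of dplyr, it fails with an error about mismatched types, even though the same call works with ifelse(). Why? Recall that data in R comes in six atomic types, and NA has a type too. It is a logical:
typeof(NA)
## [1] "logical"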
So when you write if_else(x == -99, NA, x), if_else() returns a logical in the first case and a double (real number) in the second (assuming x is a real number).
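The type check behind this mismatch can be sketched in base R. strict_ifelse() below is a hypothetical helper, not part of dplyr. It mimics only the idea of refusing yes and no values of different types before handing off to ifelse(). It also uses NA_real_, a missing value of type double.

```r
# Hypothetical helper: refuse branches of different types, then use ifelse()
strict_ifelse <- function(test, yes, no) {
  if (typeof(yes) != typeof(no)) {
    stop("`yes` is ", typeof(yes), " but `no` is ", typeof(no))
  }
  ifelse(test, yes, no)
}

x <- c(-99, 0, 1)
# Fails: NA is logical while x is double
try(strict_ifelse(x == -99, NA, x))
# Works: NA_real_ is double, like x
strict_ifelse(x == -99, NA_real_, x)
```

Base ifelse() makes no such check, which is why clean2() works with a plain NA.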
You can get around this mishap in two ways:
- Stick to ifelse()
- Use an NA that comes with a type
Types of NA
You may not realize it, but R comes with five types of NA. They all appear as NA when printed, but each is saved with a separate data type. These are:
NA # logical
## [1] NA
NA_integer_ # integer
## [1] NA
NA_real_ # double
## [1] NA
NA_complex_ # complex
## [1] NA
NA_character_ # character
## [1] NA
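You can confirm that these five constants really differ in type by asking typeof() about each one. Here is a base R sketch.

```r
nas <- list(NA, NA_integer_, NA_real_, NA_complex_, NA_character_)

# One type per NA constant: logical, integer, double, complex, character
vapply(nas, typeof, character(1))
```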
You can fix if_else() by being precise about which NA to use (most other R functions will convert the type of NA without bothering you).
x <- c(-99, 0, 1)
if_else(x == -99, NA_real_, x)
## [1] NA 0 1
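You can check in base R that other functions quietly convert the type of NA. In this sketch, c() and ifelse() both coerce the logical NA to double without complaint.

```r
x <- c(-99, 0, 1)

typeof(NA)                       # "logical"
typeof(c(NA, 0, 1))              # "double": c() coerced the logical NA
typeof(ifelse(x == -99, NA, x))  # "double": so did ifelse()
```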
Use if_else
- Fix the if_else() statement of clean2() to work with real numbers. Then click Submit Answer.
clean2 <- function(x) {
  if_else(x == -99, NA, x)
}
Solution:
clean2 <- function(x) {
  if_else(x == -99, NA_real_, x)
}
Vectorized else if
What if you want to write a vectorized version of a multi-part if else tree? Like the tree in this function:
clean <- function(x) {
  if (x == -99) NA
  else if (x == ".") NA
  else if (x == "") NA
  else if (x == "NaN") NA
  else x
}
In this case, neither ifelse() nor if_else() will do. Why? Because each can only handle a single if condition, but our tree has four.
case_when()
You can vectorize multi-part if else statements with dplyr’s case_when() function. Here is how you would use case_when() to rewrite our foo() function from the Control Flow tutorial.
Here is the masterpiece in its original form:
foo <- function(x) {
  if (x > 2) "a"
  else if (x < 2) "b"
  else if (x == 1) "c"
  else "d"
}
And here it is with case_when().
foo2 <- function(x) {
  case_when(
    x > 2 ~ "a",
    x < 2 ~ "b",
    x == 1 ~ "c",
    TRUE ~ "d"
  )
}
And here are our foos in action to prove that foo2() is vectorized.
x <- c(3, 2, 1)
foo(x)
## Error in if (x > 2) "a" else if (x < 2) "b" else if (x == 1) "c" else "d": the condition has length > 1
foo2(x)
## [1] "a" "d" "b"
Notice that
- case_when() returns a single case for each element: the first case whose left hand side evaluates to TRUE.
- The left hand side of the last case evaluates to TRUE no matter what the value of x is (in fact, the left hand side is TRUE). This is an easy way to add an else clause to the end of case_when().
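The first-TRUE-wins rule can be sketched in base R. first_match() below is a hypothetical helper written only for illustration and is not how dplyr implements case_when(). It walks the conditions in order and fills each position only if no earlier condition has claimed it.

```r
# Hypothetical helper illustrating case_when()'s first-match rule
first_match <- function(conditions, values, default) {
  out  <- rep(default, length(conditions[[1]]))
  done <- rep(FALSE, length(out))     # positions already claimed
  for (i in seq_along(conditions)) {
    hit <- conditions[[i]] & !done    # TRUE here and not claimed earlier
    out[hit] <- values[[i]]
    done <- done | hit
  }
  out
}

x <- c(3, 2, 1)
# Same cases as foo2(): x == 1 never wins, because x < 2 claims 1 first
first_match(list(x > 2, x < 2, x == 1), list("a", "b", "c"), default = "d")
# [1] "a" "d" "b"
```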
Now let’s look at the unusual syntax of case_when().
case_when() syntax
foo2 <- function(x) {
  case_when(
    x > 2 ~ "a",
    x < 2 ~ "b",
    x == 1 ~ "c",
    TRUE ~ "d"
  )
}
Each argument of case_when() is a pair that consists of a logical test on the left hand side and a piece of code on the right hand side. The two are always separated by a ~.
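Each ~ pair is an ordinary R formula. It stores its left and right hand sides without evaluating them, and case_when() then evaluates each side itself. In this base R sketch, f is an illustrative name, and x does not need to exist yet.

```r
# Build one case_when()-style pair without calling case_when()
f <- x > 2 ~ "a"

class(f)   # "formula"
f[[2]]     # the left hand side, the unevaluated call x > 2
f[[3]]     # the right hand side, "a"
```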
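Like if_else(), case_when() expects each case to return the same type of output. So keep those NA types handy: NA, NA_integer_, NA_real_, NA_complex_, NA_character_. (Since dplyr 1.1.0, case_when() also accepts a bare NA and converts it to the type of the other cases, but a typed NA makes your intent explicit and works with older versions too.)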
Final Challenge
- Rewrite the multi-part version of clean() to use case_when(), which will allow clean() to handle vectors. Retain each case. Assume where necessary that clean() will only work with real numbers. Then click Submit Answer.
clean <- function(x) {
if (x == -99) NA
else if (x == ".") NA
else if (x == "") NA
else if (x == "NaN") NA
else x
}
Hint: Use NAs that have the right type.
Solution:
clean <- function(x) {
case_when(
x == -99 ~ NA_real_,
x == "." ~ NA_real_,
x == "" ~ NA_real_,
x == "NaN" ~ NA_real_,
TRUE ~ x
)
}
Congratulations!
You’ve learned how to alter the control flow of your functions with:
- if
- else
- return()
- stop()
- stopifnot()
- ifelse()
Not only that, you tackled two advanced methods: dplyr’s if_else() and dplyr’s case_when().