&& and ||
This tutorial extends the Control Flow tutorial, where you learned how to use if, else, return(), and stop().
Here you will learn how to:
- combine logical tests in an if statement
- write if statements that work with vectors, which is a prerequisite if you want to write vectorized functions.
Here’s what clean() looked like at the end of the Control Flow tutorial. Do you notice that all of the if statements have the same outcome?
clean <- function(x) {
  stopifnot(!is.null(x))
  if (x == -99) return(NA)
  if (x == ".") return(NA)
  if (x == "NaN") return(NA)
  x
}
Let’s use your knowledge of logical tests to trim them down to a single if statement.
- Write a logical test that returns TRUE when x is -99 OR x is “.” (let’s ignore the “NaN” case to keep things simple).
Hint: You can combine two logical tests in R with `&` (and) and `|` (or), e.g. x > 0 & x < 1.
Solution:
x == -99 | x == "."
& and |
& and | are R’s boolean operators for combining logical tests.
- & stands for “and” and returns TRUE if both tests return TRUE, and FALSE otherwise.
- | stands for “or” and returns TRUE if one or both tests return TRUE, and FALSE otherwise.
So,
## [1] TRUE
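Working through the TRUE/FALSE combinations makes the rules above concrete. This is a small illustration only; the value assigned to x is an assumption chosen so that the exercise's test returns TRUE.

```r
# & returns TRUE only when both sides are TRUE
TRUE & TRUE     # TRUE
TRUE & FALSE    # FALSE

# | returns TRUE when at least one side is TRUE
TRUE | FALSE    # TRUE
FALSE | FALSE   # FALSE

# The exercise's combined test, assuming x holds -99
x <- -99
x == -99 | x == "."   # TRUE, because the left-hand test is TRUE
```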
However, it is bad practice to use & and | to combine logical tests within an if condition. Why? Because:
- there is something better (as you’ll see in a minute)
- & and | can return more than one TRUE or FALSE, which makes if stop with an error (or, in versions of R before 4.2.0, issue a warning)
As R operators, both & and | are vectorized, which means that you can use them with vectors. This is very useful.
## [1] TRUE FALSE FALSE
## [1] FALSE FALSE FALSE
## [1] TRUE FALSE FALSE
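The code that produced the three results above is not shown. One input that reproduces them, assuming x is the vector c(-99, 0, 1), is:

```r
x <- c(-99, 0, 1)

x == -99              # TRUE FALSE FALSE
x == "."              # FALSE FALSE FALSE (x is coerced to character first)
x == -99 | x == "."   # TRUE FALSE FALSE: | combines the two vectors element by element
```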
However, if conditions are not vectorized. if expects the logical test contained within its parentheses to return a single TRUE or FALSE. If the condition returns a vector of TRUEs and FALSEs, if stops with an error in R 4.2.0 and later, as shown below. (Earlier versions of R used the first value and showed a warning message.)
## Error in if (x == -99 | x == ".") NA: the condition has length > 1
&& and ||
You can avoid this by always using && and || within your if conditions.
&& and || are lazy substitutes for & and |. They are lazy in two ways.
First, && and || always return a single TRUE or FALSE. If you give && or || vectors longer than one, R 4.3.0 and later stop with an error, as shown below. (Older versions of R compared only the first elements of the vectors and carried on.)
## Error in x == -99 || x == ".": 'length = 3' in coercion to 'logical(1)'
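A minimal sketch of both behaviors: with single values, || returns one TRUE or FALSE; with a vector, recent R stops. The tryCatch() wrapper only captures the error message so the script keeps running; the exact wording of the message may vary between R versions.

```r
x <- -99
x == -99 || x == "."   # TRUE: a single logical value

x <- c(-99, 0, 1)
# Capture the error instead of halting (R 4.3.0 and later raise one here)
msg <- tryCatch(
  x == -99 || x == ".",
  error = function(e) conditionMessage(e)
)
msg
```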
Use ||
Let’s use this to our immediate advantage.
- Replace the two if statements below with a single statement that tests whether x is -99 or "." without throwing error messages.
clean <- function(x) {
  stopifnot(!is.null(x))
  if (x == -99) return(NA)
  if (x == ".") return(NA)
  x
}
Hint: Like |, || expects a complete logical test on each side of ||.
Solution:
clean <- function(x) {
  stopifnot(!is.null(x))
  if (x == -99 || x == ".") return(NA)
  x
}
Computation
The most important reason to use || instead of | is that || saves unnecessary computation when possible. This is the second way that && and || are lazy.
When possible, && and || jump to the correct conclusion after evaluating the first of the two logical tests (not so with & and |).
- && will return FALSE if the test on the left returns FALSE (because the combined test would return FALSE).
- || will return TRUE if the test on the left returns TRUE (because the combined test would return TRUE).
In either case, && and || will not evaluate the test on the right.
## [1] "I didn't evaluate stop()."
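The code behind that output is not shown. A call like the following reproduces it (an assumption about the original): stop() sits on the right of ||, but because the left side is already TRUE, it never runs.

```r
# || stops at the TRUE on its left, so stop() is never called
if (TRUE || stop()) "I didn't evaluate stop()."

# && stops at the FALSE on its left, so stop() is never called here either
FALSE && stop()   # FALSE
```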
How could you use this? Remember how this code returns an error because if cannot handle the result of NULL == -99, which is an empty logical vector?
## Error in if (x == -99) return(NA): argument is of length zero
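One way to use lazy evaluation here, sketched below, is to put is.null(x) on the left of ||. When x is NULL, || returns TRUE before it ever evaluates x == -99, so if never sees an empty condition. The name clean_null() is hypothetical, chosen so it does not clash with the tutorial's clean().

```r
# Hypothetical variant of clean() that treats NULL as missing
clean_null <- function(x) {
  # is.null(x) is tested first; if it is TRUE, || skips the remaining tests
  if (is.null(x) || x == -99 || x == ".") return(NA)
  x
}

clean_null(NULL)   # NA
clean_null(-99)    # NA
clean_null(5)      # 5
```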
Vectorized if
Buried in the last section is an interesting question: what if you do want clean() to work with vectors? For example:
## [1] NA 0 1
That would be a handy way to clean whole columns of data. How could you do it?
Compare these two functions (one should seem familiar). What is different?
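The two functions are missing from this copy. Judging from the error message for clean() further down and the clean2() exercise code later on, they were most likely the following (a reconstruction, not the original listing):

```r
# Non-vectorized: relies on if and else
clean <- function(x) {
  if (x == -99) NA else x
}

# Vectorized: relies on ifelse()
clean2 <- function(x) {
  ifelse(x == -99, NA, x)
}
```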
ifelse()
ifelse() is a function that replicates an if else statement. It takes three arguments: a logical test followed by two pieces of code. If the test returns TRUE, ifelse() will return the results of the first piece of code. If the test returns FALSE, ifelse() will return the results of the second piece of code.
So clean(-99) and clean2(-99) both return NA.
## [1] NA
## [1] NA
However, unlike if and else, ifelse() is vectorized. As a result, you can pass ifelse() a vector of values and it will apply the implied if else statement separately to each element of the vector.
## [1] NA 0 1
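A short sketch of that element-by-element behavior, assuming an input vector of c(-99, 0, 1), which matches the outputs shown:

```r
x <- c(-99, 0, 1)

# The test is evaluated once per element: TRUE FALSE FALSE
x == -99

# Each TRUE picks NA, each FALSE picks the matching element of x
ifelse(x == -99, NA, x)   # NA 0 1
```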
clean2() inherits this vectorized property from ifelse().
## [1] NA 0 1
Compare that to clean() (which is non-vectorized because it relies on if and else, which are non-vectorized).
## Error in if (x == -99) NA else x: the condition has length > 1
if_else
The dplyr package offers a slight improvement on ifelse() named if_else(). if_else() is faster than ifelse(), but it requires you to make sure that each case in the if else statement returns the same type of object. For example, the statement needs to return a real number (or a string, or a logical, etc.) whether or not the condition is TRUE.
No big deal, right? Well, kind of.
## [1] NA 0 1
NA
What happened? Recall that data in R comes in six atomic types. NA itself has one of these types. It is a logical:
## [1] "logical"
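The call behind that output is not shown; typeof() is one way to inspect the type. A small sketch, assuming x is a vector of real numbers such as c(-99, 0, 1):

```r
typeof(NA)          # "logical": a bare NA is a logical value

x <- c(-99, 0, 1)
typeof(x)           # "double": real numbers are stored as doubles
```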
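So when you write if_else(x == -99, NA, x), if_else() returns a logical in the first case and a double (real number) in the second (assuming x is a real number). Older versions of dplyr stop with an error when the two cases have different types. Since dplyr 1.1.0, if_else() casts both cases to a common type, which is why the output above is NA 0 1 rather than an error.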
You can get around this mishap in two ways:
- Stick to ifelse()
- Use an NA that comes with a type
Types of NA
You may not realize it, but R comes with five types of NA. They all appear as NA when printed, but each is saved with a separate data type. These are:
## [1] NA
## [1] NA
## [1] NA
## [1] NA
## [1] NA
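The five values behind the output above are presumably R's built-in typed NA constants. typeof() reveals the type each one carries, even though all five print as NA:

```r
typeof(NA)              # "logical"
typeof(NA_integer_)     # "integer"
typeof(NA_real_)        # "double"
typeof(NA_complex_)     # "complex"
typeof(NA_character_)   # "character"
```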
You can fix if_else() by being precise about which NA to use (most other R functions will convert the type of NA without bothering you).
## [1] NA 0 1
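A minimal sketch of the fix, assuming dplyr is installed and x is the same vector of real numbers: NA_real_ is a double, so both cases of if_else() return doubles in every dplyr version.

```r
library(dplyr)

x <- c(-99, 0, 1)
# NA_real_ matches the type of x, so the true and false cases are both doubles
if_else(x == -99, NA_real_, x)   # NA 0 1
```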
Use if_else
- Fix the if_else() statement of clean2() to work with real numbers.
clean2 <- function(x) {
  if_else(x == -99, NA, x)
}
Solution:
clean2 <- function(x) {
  if_else(x == -99, NA_real_, x)
}
Vectorized else if
What if you want to write a vectorized version of a multi-part if else tree? Like the tree in this function:
clean <- function(x) {
  if (x == -99) NA
  else if (x == ".") NA
  else if (x == "") NA
  else if (x == "NaN") NA
  else x
}
In this case, neither ifelse() nor if_else() will do. Why? Because each can only handle a single if condition, but our tree has four.
case_when()
You can vectorize multi-part if else statements with dplyr’s case_when() function. Here is how you would use case_when() to rewrite our foo() function from the Control Flow tutorial.
Here is the masterpiece in its original form
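The listing is missing from this copy. The error message further down shows the body of foo(), so it was presumably:

```r
# Non-vectorized: a four-part if else tree
foo <- function(x) {
  if (x > 2) "a"
  else if (x < 2) "b"
  else if (x == 1) "c"   # never reached: x == 1 already satisfies x < 2
  else "d"
}
```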
And here it is with case_when().
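A sketch of foo2() that is consistent with the output below, assuming dplyr is installed; the input vector c(3, 2, 1) is an assumption that reproduces "a" "d" "b":

```r
library(dplyr)

# Vectorized: each formula is one case, tried in order for every element
foo2 <- function(x) {
  case_when(
    x > 2  ~ "a",
    x < 2  ~ "b",
    x == 1 ~ "c",
    TRUE   ~ "d"   # catch-all, plays the role of the final else
  )
}

foo2(c(3, 2, 1))   # "a" "d" "b"
```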
And here are our foos in action to prove that foo2() is vectorized.
## Error in if (x > 2) "a" else if (x < 2) "b" else if (x == 1) "c" else "d": the condition has length > 1
## [1] "a" "d" "b"
Notice that:
- case_when() returns a single case for each element: the first case whose left hand side evaluates to TRUE.
- The left hand side of the last case evaluates to TRUE no matter what the value of x is (in fact, the left hand side is TRUE). This is an easy way to add an else clause to the end of case_when().
Now let’s look at the unusual syntax of case_when().
case_when() syntax
Each argument of case_when() is a pair that consists of a logical test on the left hand side and a piece of code on the right hand side. The two are always separated by a ~.
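A compact illustration of these left ~ right pairs, assuming dplyr is installed; the input vector and labels are made up for the example:

```r
library(dplyr)

x <- c(-5, 0, 5)
case_when(
  x < 0  ~ "negative",  # logical test ~ value to return
  x == 0 ~ "zero",
  TRUE   ~ "positive"   # matches anything the earlier cases missed
)
```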
Like if_else(), case_when() expects each case to return the same type of output. So keep those NA types handy: NA, NA_integer_, NA_real_, NA_complex_, NA_character_. (dplyr 1.1.0 and later cast the cases to a common type, but typed NAs work in every version.)
Final Challenge
- Rewrite the multi-part version of clean() to use case_when(), which will allow clean() to handle vectors. Retain each case. Assume where necessary that clean() will only work with real numbers.
clean <- function(x) {
  if (x == -99) NA
  else if (x == ".") NA
  else if (x == "") NA
  else if (x == "NaN") NA
  else x
}
Hint: Use NAs that have the right type.
Solution:
clean <- function(x) {
  case_when(
    x == -99 ~ NA_real_,
    x == "." ~ NA_real_,
    x == "" ~ NA_real_,
    x == "NaN" ~ NA_real_,
    TRUE ~ x
  )
}
Congratulations!
You’ve learned how to alter the control flow of your functions with:
- if
- else
- return()
- stop()
- stopifnot()
- ifelse()
Not only that, you tackled two advanced methods: dplyr’s if_else() and dplyr’s case_when().