Why More?
The map functions come with two useful shortcuts:
- an easy way to extract elements from nested lists
- an easy way to apply non-function expressions, like x + 1, to the elements of a vector
This tutorial will teach you how to use the shortcuts as you work through a data analysis case study. This first section will introduce the case study.
Data
us is a list of statistics about the US, measured every five years from 1952 to 2007. The data is a reformatted portion of the gapminder data set, which comes in the gapminder package.
- Click Submit Answer to see the contents of us. Can you spot the life expectancy for a US citizen in 1952? How about in 2007?
us
Can you use the values of lifeExp to compute the rise in life expectancy in the US from 1952 to 2007?
This is a simple task, but we will use it as the basis of an iteration project.
Data Wrangling
First, you need to access the values of lifeExp. The values are difficult to work with because they are stored inside of a list (us).
You can extract the values with purrr’s pluck() function.
pluck()
pluck() extracts an element from a list, by name or by position. It returns the contents of the element as they are, without surrounding them in a new list.
For example, the two pluck() calls below both extract the first element of list1.
list1 <- list(
  numbers = 1:3,
  letters = c("a", "b", "c"),
  logicals = c(TRUE, FALSE)
)
pluck(list1, 1) # list1 %>% pluck(1)
pluck(list1, "numbers") # list1 %>% pluck("numbers")
## [1] 1 2 3
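To make "without surrounding them in a new list" concrete, here is a minimal sketch that contrasts pluck() with single-bracket subsetting, using the same list1 as above. The names plucked and wrapped are introduced only for illustration.

```r
library(purrr)

list1 <- list(
  numbers = 1:3,
  letters = c("a", "b", "c"),
  logicals = c(TRUE, FALSE)
)

# pluck() returns the element itself: here, an integer vector
plucked <- pluck(list1, "numbers")

# Single brackets keep the surrounding list: a list of length one
wrapped <- list1["numbers"]

is.integer(plucked)
is.list(wrapped)

# pluck() gives the same result as double-bracket extraction
identical(plucked, list1[["numbers"]])
```

Because pluck() hands back the bare vector, you can pass its result straight to functions that expect a vector.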
- Use pluck() to extract lifeExp from us. Then click Submit Answer. I’ve pre-loaded the purrr package for you.
"The name of the element that you want to pluck is lifeExp."
"Don't forget to put quotes around lifeExp when you pass it to pluck()."
us %>% pluck("lifeExp")
Rise in lifeExp
pluck() returns the values of lifeExp as a vector, which makes it easy to manipulate the values with two new functions:
- last() returns the last element of a vector.
- first() returns the first element of a vector.
Both come in the dplyr package, which I have pre-loaded for you.
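As a quick illustration, the sketch below applies first() and last() to a short made-up vector (the values are hypothetical, not taken from us) and shows the base R indexing they stand in for.

```r
library(dplyr)

# Hypothetical values, for illustration only
x <- c(68.4, 70.2, 78.2)

first(x)  # same element as x[1]
last(x)   # same element as x[length(x)]

# The difference between the last and first values
rise <- last(x) - first(x)
rise
```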
- Use last() and first() to compute the rise in life expectancy in the US from 1952 to 2007. Then click Submit Answer.
lifeExp <- us %>% pluck("lifeExp")
"The change in life expectancy will be the last value of lifeExp minus the first value of lifeExp."
lifeExp <- us %>% pluck("lifeExp")
last(lifeExp) - first(lifeExp)
Recap
You computed that the life expectancy of US citizens increased by 9.8 years between 1952 and 2007. To do this, you:
- Plucked the lifeExp values from us with pluck()
- Calculated the change in life expectancy with last(lifeExp) - first(lifeExp)
It’d be nice to do this for other countries as well. And you can. gap_list contains lists of statistics for 142 countries. Click Submit Answer to see.
gap_list
To apply your two-step process to each country, you will need to use map() and each of its shortcuts.
Click Continue when you are ready to begin.
Shortcuts
To map your work to every country, you will need to do two things for each value of gap_list:
- Pluck the values of lifeExp from the sub-list that contains them
- Compute the change in life expectancy with last(lifeExp) - first(lifeExp)
Each step will reveal new aspects of the map functions.
- Step 1 will demonstrate two useful shortcuts for extracting sub-elements
- Step 2 will demonstrate how to make and apply expressions with map functions
Step 1
- Map pluck() over gap_list to return the lifeExp vectors of each country in gap_list.
"Recall that map() takes a vector to iterate over, a function to apply to each element of the vector, and then any arguments that that function needs."
gap_list %>% map(pluck, "lifeExp")
shortcuts
Before you go on, consider what you did.
gap_list contained 142 sub-lists, and each sub-list contained the same set of elements. This setup is more common than you may think.
You pulled out the same element from each of those sub-lists. This operation is also more common than you might think.
In fact, this operation is so common that map() provides two shortcuts to help you do it.
“name” shortcut
First, you can pass map() the name of an element to extract as a character string. When map() receives a character string instead of a function, map() will return the element of each sub-list whose name matches the character string.
params <- list(
  "norm1" = list("mu" = 0, "sd" = 1),
  "norm2" = list("mu" = 1, "sd" = 1),
  "norm3" = list("mu" = 2, "scale" = 1)
)
map(params, "mu") # params %>% map("mu")
## $norm1
## [1] 0
##
## $norm2
## [1] 1
##
## $norm3
## [1] 2
- Change the code below to use the “name” shortcut. Then click Submit Answer. Do you get the same result?
gap_list %>% map(pluck, "lifeExp")
"Did you remember to remove pluck from the map() call?"
"The element that you want to extract from each sub-list is named lifeExp."
gap_list %>% map("lifeExp")
Integers shortcut
Instead of passing map() a character string that identifies the element to extract by name, you can pass map() an integer that identifies the element to extract by position, e.g.
params <- list(
  "norm1" = list("mu" = 0, "sd" = 1),
  "norm2" = list("mu" = 1, "sd" = 1),
  "norm3" = list("mu" = 2, "scale" = 1)
)
map(params, 1) # params %>% map(1)
## $norm1
## [1] 0
##
## $norm2
## [1] 1
##
## $norm3
## [1] 2
- Extract lifeExp with the integer shortcut. Then click Submit Answer.
gap_list
"Use map() and pass it a number."
"Do you remember the position of lifeExp within each sub-list of gap_list? It is the second element."
gap_list %>% map(2)
Data frames
The best thing about these shortcuts is that they also work when your vector contains data frames. You can use the shortcuts to pull out the same column of each data frame.
For example, gap_dfs contains the same information as gap_list, but it organizes each sub-list into a data frame, which is more user-friendly.
gap_dfs %>% pluck(1)
- Use the “name” shortcut to extract the lifeExp column of each data frame in gap_dfs. Then click Submit Answer. What happens?
gap_dfs
gap_dfs %>% map("lifeExp")
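If you want to try the shortcuts on data frames outside the tutorial, here is a minimal sketch. It uses toy_dfs, a small made-up list of data frames that stands in for gap_dfs; the countries and values are hypothetical.

```r
library(purrr)

# Hypothetical stand-in for gap_dfs: a named list of small data frames
toy_dfs <- list(
  A = data.frame(year = c(1952, 2007), lifeExp = c(50, 60)),
  B = data.frame(year = c(1952, 2007), lifeExp = c(65, 72))
)

# "name" shortcut: extract the lifeExp column of each data frame
by_name <- map(toy_dfs, "lifeExp")

# Integer shortcut: lifeExp is the second column in this toy data
by_position <- map(toy_dfs, 2)

by_name
identical(by_name, by_position)
```

Each element of the result is the column as a plain vector, not a one-column data frame.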
One more thing
There is one more thing that you should know about these shortcuts. In the Map tutorial, you learned that map() is part of a larger family of map functions:
Function | Output |
---|---|
map() | list |
map_chr() | character vector |
map_dbl() | double (numeric) vector |
map_dfc() | data frame (output column-bound) |
map_dfr() | data frame (output row-bound) |
map_int() | integer vector |
map_lgl() | logical vector |
walk() | returns the input invisibly (used to trigger side effects) |
The “name” and integer shortcuts will work with all of these functions. So will the expressions that you are about to learn. Speaking of that, let’s get back to your case study.
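For instance, a minimal sketch with the params list from earlier shows the “name” and integer shortcuts working with map_dbl(), which returns a named double vector instead of a list:

```r
library(purrr)

params <- list(
  "norm1" = list("mu" = 0, "sd" = 1),
  "norm2" = list("mu" = 1, "sd" = 1),
  "norm3" = list("mu" = 2, "scale" = 1)
)

# "name" shortcut with a typed map function
mus <- map_dbl(params, "mu")

# Integer shortcut with the same function: mu is the first element
mus_by_position <- map_dbl(params, 1)

mus
```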
Expressions
Recall your goals. For each value of gap_list, you want to:
- Pluck the values of lifeExp from the sub-list that contains them
- Compute the change in life expectancy with last(lifeExp) - first(lifeExp)
Step 1 was easy and you learned two shortcuts along the way. Step 2 will be harder.
Step 2
You know that you can use map() to apply a function to each element of a list, but last(lifeExp) - first(lifeExp) isn’t a function. It is an expression that uses two functions. How can you pass it to map()?
With a map expression.
A pattern
At the heart of map() is the pattern:
For each element, do _____
When you fill in the blank with a function, or a character string, or an integer, map() knows just what to do.
But you can also fill in the blank with an arbitrary R expression (like last(lifeExp) - first(lifeExp)) if you follow two rules.
Rule 1 - ~
First, place a ~ at the start of the expression. This alerts map() that you are giving it an expression to run:
For each element, do ~last(lifeExp) - first(lifeExp)
Rule 2 - .x
Second, replace the name of the thing to manipulate with .x wherever it appears in your expression.
For each element, do ~last(.x) - first(.x)
Or more simply,
For each .x, do ~last(.x) - first(.x)
This tells map() where to use the element within your expression. If an expression uses each element multiple times, you will need to insert multiple .x’s into your expression.
How does this look with the map() function? You pass the expression to map() exactly as you would pass a function.
In this example, the expression plucks the two values in each sub-list of params and uses them to generate five random normal values.
params <- list(
  "norm1" = list("mu" = 0, "sd" = 1),
  "norm2" = list("mu" = 1, "sd" = 1),
  "norm3" = list("mu" = 2, "scale" = 1)
)
map(params, ~ rnorm(5, mean = pluck(.x, 1), sd = pluck(.x, 2)))
## $norm1
## [1] 1.5017946 0.5057648 0.1666227 -0.5757261 1.1411239
##
## $norm2
## [1] 0.4547392 2.3756265 1.2167280 3.2791252 0.1169030
##
## $norm3
## [1] 0.74409517 3.25762670 4.48313753 2.05811611 0.08310932
# params %>% map(~rnorm(5, mean = pluck(.x, 1), sd = pluck(.x, 2)))
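One way to think about a map expression is as shorthand for an anonymous function whose argument is named .x. The sketch below compares the two forms on vecs, a small made-up list introduced only for illustration.

```r
library(purrr)
library(dplyr)

# Hypothetical input: a named list of numeric vectors
vecs <- list(a = c(1, 4, 9), b = c(10, 7))

# Map expression: begins with ~ and refers to each element as .x
with_expression <- map(vecs, ~ last(.x) - first(.x))

# The equivalent anonymous function, written out in full
with_function <- map(vecs, function(x) last(x) - first(x))

identical(with_expression, with_function)
```

Both calls return the same list, so the choice between them is a matter of brevity.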
Now it is your turn. To finish your code:
- Turn last(lifeExp) - first(lifeExp) into an expression.
- Use map() to apply the expression to each element returned by the code below.
- Click Submit Answer.
gap_list %>%
map("lifeExp")
"Use %>% to add a second map() call to your code."
"Remember the two rules for map expressions: 1) begin with a ~, 2) refer to the elements with .x. Do not surround the expression with quotes."
gap_list %>%
map("lifeExp") %>%
map(~ last(.x) - first(.x))
Beyond map()
- Change the code below to return a double vector in the last step. Will the expression still work? Click Submit Answer to find out.
gap_list %>%
map("lifeExp") %>%
map(~ last(.x) - first(.x))
gap_list %>%
map("lifeExp") %>%
map_dbl(~ last(.x) - first(.x))
A small payoff
It would be rewarding to learn which countries had the largest and smallest changes in life expectancy. You can find out with three new functions:
- enframe() turns a named vector into a data frame with two columns: name and value.
  named_vec <- c(uno = 1, dos = 2, tres = 3)
  enframe(named_vec)
  enframe() comes in the tibble package, which I’ve pre-loaded for you.
- top_n() returns the n rows that have the highest value of a weighting variable.
  top_n(mtcars, n = 5, wt = mpg)
  (Don’t be fooled: these are the rows with the five highest values of mpg. top_n() retrieves them but does not sort them by mpg.)
  You can combine top_n() with desc() to retrieve the lowest n values.
  top_n(mtcars, n = 5, wt = desc(mpg))
Both top_n() and desc() come in the dplyr package, which I have pre-loaded for you.
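The sketch below runs these functions together on changes, a small made-up named vector, so you can see the shape of the result before applying them to the case study. It loads enframe() from the tibble package; the names and values are hypothetical.

```r
library(tibble)
library(dplyr)

# Hypothetical named vector, for illustration only
changes <- c(A = 5, B = 12, C = 8, D = 1)

# Named vector -> two-column data frame (name, value) -> rows with the 2 highest values
top2 <- changes %>%
  enframe() %>%
  top_n(2, wt = value)

# Wrapping the weight in desc() retrieves the rows with the 2 lowest values instead
bottom2 <- changes %>%
  enframe() %>%
  top_n(2, wt = desc(value))

top2
bottom2
```

Note that neither result is sorted by value; the rows keep the order they had in the input.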
The most extreme changes in life expectancy
- Extend your code with enframe() and top_n() to retrieve the five countries with the most positive change in life expectancy. Then click Submit Answer.
gap_list %>%
map("lifeExp") %>%
map_dbl(~ last(.x) - first(.x))
"To begin pipe your results into enframe()."
"...i.e. add %>% enframe() to the end of your code."
"Then pipe that result to top_n(). You will need to set the n and wt arguments of top_n()."
'enframe() named the column that you want to wt by "value".'
gap_list %>%
map("lifeExp") %>%
map_dbl(~ last(.x) - first(.x)) %>%
enframe() %>%
top_n(5, wt = value)
- Add desc() to your code to retrieve the five countries with the least positive change in life expectancy. Then click Submit Answer.
gap_list %>%
map("lifeExp") %>%
map_dbl(~ last(.x) - first(.x)) %>%
enframe() %>%
top_n(5, wt = value)
"This time, you want to order by descending values of value."
gap_list %>%
map("lifeExp") %>%
map_dbl(~ last(.x) - first(.x)) %>%
enframe() %>%
top_n(5, wt = desc(value))
Best practices
You can do almost anything with map() if you use the right expression. The best way to write an expression is to:
- Pluck a single element from your vector
- Write code that works correctly for that element
- Transform the code into an expression to use with map()
The alternative is to write an expression in your head, and then see if it works. Too often it won’t.
Models
Let’s put this workflow into practice.
Another way to quantify the change in life expectancy over time is to fit a simple linear model to the data. The slope of the model will be how fast life expectancy increased per year, on average.
Step 1 - Pluck a single element from your vector
- Pluck the “United States” element from gap_dfs. Save it as usa, so you don’t overwrite us. Then click Submit Answer.
"You do not need to use map() here, just pluck()."
usa <- gap_dfs %>% pluck("United States")
Step 2 - Write code that works correctly for that element
You may not know how to fit a linear model with R, so I assure you that the code below will work.
- Click Submit Answer to double check the code runs with your test case.
lm(lifeExp ~ year, data = usa)
"Leave the code as is and click Submit Answer."
lm(lifeExp ~ year, data = usa)
Step 3 - Transform the code into an expression to use with map()
- Turn your code into an expression and map() it to each element of gap_dfs. Then click Submit Answer.
lm(lifeExp ~ year, data = usa)
"Begin with gap_dfs and pass it to map()."
"The next argument of map() should be your function, rewritten as an expression. Do you remember the two rules for writing map expressions?"
"Rule 1: begin the expression with a ~."
"Rule 2: replace every appearance of the element to use with .x. In our code, each element would iteratively take the place of usa."
gap_dfs %>%
map(~ lm(lifeExp ~ year, data = .x))
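To read the slopes out of a list of models like this one, you could map a second expression over the fitted models. The sketch below is one possible approach, not part of the exercise; it uses toy_dfs, a made-up stand-in for gap_dfs, and base R's coef() to pull the year coefficient from each model.

```r
library(purrr)

# Hypothetical stand-in for gap_dfs, with perfectly linear toy data
toy_dfs <- list(
  A = data.frame(year = c(1952, 1957, 1962), lifeExp = c(50, 51, 52)),
  B = data.frame(year = c(1952, 1957, 1962), lifeExp = c(60, 62.5, 65))
)

# Fit one linear model per data frame, as in the exercise
models <- map(toy_dfs, ~ lm(lifeExp ~ year, data = .x))

# Extract the slope (the coefficient on year) from each model as a double vector
slopes <- map_dbl(models, ~ coef(.x)[["year"]])
slopes
```

Each slope is the average change in life expectancy per year for that element of the list.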
A model that accounts for GDP
Use the best practices to fit a model that accounts for GDP to each country. Your model should use the formula lifeExp ~ year + gdpPercap inside of lm(). The best practices are:
- Pluck a single element from your vector (in this case usa)
- Write code that works correctly for that element
- Transform the code into an expression to use with map()
When you are finished fitting the models, click Submit Answer.
gap_dfs %>%
map(~ lm(lifeExp ~ year + gdpPercap, data = .x))
The End
Congratulations on making it to the end. The next tutorial, which teaches a new dimension of map, will use the two lists of models that you created here.
In that tutorial, you will learn how to iterate over two or more vectors at the same time.