Introduction
You’ve learned how to map any expression over the elements of a single vector.
Now you’ll learn how to map an expression over the elements of two or more vectors at once.
A case study
To see how useful this technique can be, let’s use multi-vector mapping to compare two lists of models. At the end of the Map Shortcuts tutorial, you fit two models for every country in gap_dfs.
This created two lists:
A list of models that predicted life expectancy by year, which I saved as model1.
model1 <- gap_dfs %>% map(~ lm(lifeExp ~ year, data = .x))
A list of models that predicted life expectancy by both year and GDP per capita, which I saved as model2.
model2 <- gap_dfs %>% map(~ lm(lifeExp ~ year + gdpPercap, data = .x))
But which model is better? In other words, does adding GDP per capita improve your predictions for life expectancy?
anova()
One way to tell is with the anova() function. anova() takes two models and tests whether the second model outperforms the first.
Let’s try it. You can use your two models for the United States:
usa_mod1 <- model1 %>% pluck("United States")
usa_mod2 <- model2 %>% pluck("United States")
anova(usa_mod1, usa_mod2)
The anova results suggest that adding GDP per capita did not improve your predictions for the US.
How can you tell? The very last number in the anova table is a p-value (here 0.1424). If the number is above 0.05, there is not enough evidence to suggest that the second model outperforms the first: you can ascribe the difference between the two to random chance.
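If you would rather read that p-value with code than by eye, here is a minimal sketch. It assumes R's built-in mtcars data as a stand-in for one country's data frame (the tutorial's own models come from gapminder), and it relies on the anova table storing its p-values in a column named Pr(>F).

```r
# Illustration only: mtcars stands in for a single country's data frame.
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

comparison <- anova(m1, m2)

# Row 1 of Pr(>F) is NA because the first model has nothing to be compared with.
# Row 2 holds the p-value for m2 versus m1.
p_value <- comparison[["Pr(>F)"]][2]

# TRUE would suggest that the second model outperforms the first.
p_value < 0.05
```

Pulling the number out this way becomes handy once you have many pairs of models to compare.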
Beyond the US
So GDP per capita doesn’t improve your predictions for the United States, but what about for the other countries?
You can iterate through all of the countries to find out, but this is a new type of iteration problem. At each step, anova() will require one element from model1 and one element from model2. You’ll need to simultaneously iterate over two lists. But how?
map2()
Enter map2().
Syntactically, map2() behaves like map(), but it takes two vectors as arguments before it takes a function (remember that lists are a type of vector).
At each step of the iteration, map2() will pass an element from the first vector to the first argument of the function. It will pass an element from the second vector to the second argument of the function.
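To make the pairing concrete, here is a small sketch on toy data. The lists base and power and the helper raise() are hypothetical names made up for this illustration; they are not part of the tutorial's data.

```r
library(purrr)

base <- list(2, 3, 4)
power <- list(3, 2, 1)

# Hypothetical helper: its first argument receives an element of `base`,
# its second argument receives the matching element of `power`.
raise <- function(x, p) x^p

# Step 1 calls raise(2, 3), step 2 calls raise(3, 2), step 3 calls raise(4, 1).
powers <- map2(base, power, raise)
str(powers)
```

The result is a list with one element per pair, in the same order as the inputs.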
- Use map2() to run an anova on each pair of models in model1 and model2. Then click Submit Answer.
map2(model1, model2, anova)
map2() and map()
map2() is similar to map() in almost every way. For example, you can pass extra arguments to your function as extra arguments to map2(), e.g.
# map2 will pass test = "Chisq" to the anova function
map2(model1, model2, anova, test = "Chisq")
Like map(), map2() returns its results as a list, but it also comes with variants that will return the result in other formats. As with map(), you should use the variant of map2() that will return your results in the format that you want:
Map function | Map2 function | Output |
---|---|---|
map() | map2() | list |
map_chr() | map2_chr() | character vector |
map_dbl() | map2_dbl() | double (numeric) vector |
map_dfc() | map2_dfc() | data frame (outputs bound by column) |
map_dfr() | map2_dfr() | data frame (outputs bound by row) |
map_int() | map2_int() | integer vector |
map_lgl() | map2_lgl() | logical vector |
walk() | walk2() | returns the input invisibly (used to trigger side effects) |
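Here is a short sketch of two of these variants on toy data. The vectors given and family are made up for the illustration. The first call also shows an extra argument (sep) being passed along to the function.

```r
library(purrr)

given <- c("Ada", "Alan")
family <- c("Lovelace", "Turing")

# map2_chr() returns a character vector; sep = " " is passed on to paste().
full_names <- map2_chr(given, family, paste, sep = " ")

# map2_int() returns an integer vector; nchar() already returns integers.
name_lengths <- map2_int(given, family, function(g, f) nchar(g) + nchar(f))

full_names
name_lengths
```

If the function returned something that does not fit the requested type, the variant would throw an error instead of silently converting the result.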
Map2 expressions
map2() also uses shortcuts like map(). While the “name” and integer shortcuts do not make sense for map2(), the expressions shortcut does.
To make an expression for map2():
- Begin the expression with ~
- Refer to elements from the first vector as .x
- Refer to elements from the second vector as .y
Only the last step is different from map().
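As a sketch of the expression shortcut in action, the code below collects one anova p-value per pair of models. It assumes the built-in mtcars data split by number of cylinders in place of the per-country gapminder data frames, and the names dfs, mods1, and mods2 are made up for this illustration.

```r
library(purrr)

# Illustration only: one data frame per cylinder count instead of per country.
dfs <- split(mtcars, mtcars$cyl)

mods1 <- map(dfs, ~ lm(mpg ~ wt, data = .x))
mods2 <- map(dfs, ~ lm(mpg ~ wt + hp, data = .x))

# .x is a model from mods1, .y is the matching model from mods2.
# The second entry of Pr(>F) is the p-value for the comparison.
p_values <- map2_dbl(mods1, mods2, ~ anova(.x, .y)[["Pr(>F)"]][2])
p_values
```

Because map2_dbl() keeps the names of its first input, each p-value stays labelled with the group it came from.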
Try an expression
How much does the year coefficient change between model1 and model2 for the US? The code below computes the answer.
pluck(coef(usa_mod2), "year") - pluck(coef(usa_mod1), "year")
## [1] 0.07383349
Can you do the same thing for every country?
- Turn the code above into an expression and map it over the model1 and model2 lists.
- Return the results as a double vector.
- Round the results to two decimal places.
- Click Submit Answer when you are finished.
model1 %>%
map2(model2, anova)
"First, replace anova with your expression."
"Second, replace map2() with a variant that will return a double (numeric) vector."
"Third, round the results with %>% round(digits = 2)."
model1 %>%
map2_dbl(model2, ~ pluck(coef(.y), "year") - pluck(coef(.x), "year")) %>%
round(digits = 2)
From named vector to data frame
You can make long named vectors easier to work with by using the enframe() function from the tibble package. enframe() takes a named vector and returns a data frame with two columns:
- A name column that contains the names in the vector
- A value column that contains the value associated with each name
Once you turn your long vector into a data frame, you can use the familiar dplyr tools to explore the vector.
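Here is a minimal sketch of that workflow on a tiny named vector. The country names and numbers are made up for the illustration; they are not results from the models above.

```r
library(tibble)
library(dplyr)

# Made-up values, for illustration only.
changes <- c(Norway = 0.02, Kenya = -0.11, Chile = 0.07)

# enframe() moves the names into a `name` column and the values into `value`.
changes_df <- enframe(changes)

# Sort from the largest change to the smallest.
sorted <- arrange(changes_df, desc(value))
sorted
```

From here, any other dplyr verb, such as filter() or mutate(), works on the result as well.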
- enframe() the result below. I’ve pre-loaded the package for you.
- Use arrange() and desc() to see which countries had the biggest change.
- Click Submit Answer when you are finished.
model1 %>%
map2_dbl(model2, ~ pluck(coef(.y), "year") - pluck(coef(.x), "year")) %>%
round(digits = 2)
model1 %>%
map2_dbl(model2, ~ pluck(coef(.y), "year") - pluck(coef(.x), "year")) %>%
round(digits = 2) %>%
enframe() %>%
arrange(desc(value))
"First, pass the result to enframe()."
"Then pass the result to arrange(). arrange() comes in the dplyr package, you can learn how to use it in the Work with Data primer."
'Finally, recall that enframe() named the variable that you want to arrange over "value". Here, you want to arrange over descending values of value.'
More models
You can use anova() to compare any number of models. For example, you could compare three different models for each country:
- A model that predicts life expectancy by year, like those in model1.
- A model that predicts life expectancy by both year and GDP per capita, like those in model2.
- A model that predicts life expectancy by year and GDP per capita and population, like those in model3. (I made model3 for you while you weren’t looking.)
Here’s the comparison for the US:
usa_mod1 <- model1 %>% pluck("United States")
usa_mod2 <- model2 %>% pluck("United States")
usa_mod3 <- model3 %>% pluck("United States")
anova(usa_mod1, usa_mod2, usa_mod3)
Each value in the Pr(>F) column shows the p-value that results from comparing the model to the model above it.
Now can we do this for every country?
Mapping over three vectors
Anova can handle three arguments. Can you?
Yes, but not with map3() as you might think. map3() doesn’t exist. Instead, purrr offers the pmap() function for mapping over three or more vectors.
pmap()
The syntax of pmap() is a little different from the syntax of map() and map2(). Instead of accepting vectors one at a time as arguments, pmap() expects a single argument that contains a list of vectors and then a function to apply to the vectors within that list of vectors.
Name each vector in your list of vectors with the name of the argument that it should map to. pmap() will match names to arguments whenever you provide them. So this code, for example, will round each of the long numbers to a different number of digits, and it works because round’s arguments are called x and digits.
long_numbers <- list(pi, exp(1), sqrt(2))
digits <- list(2, 3, 4)
pmap(list(x = long_numbers, digits = digits), round)
## [[1]]
## [1] 3.14
##
## [[2]]
## [1] 2.718
##
## [[3]]
## [1] 1.4142
If you do not provide names, pmap() will map vectors to arguments by order.
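The sketch below contrasts the two matching rules. It reuses long_numbers and digits from the example above; round_to() is a hypothetical wrapper, written only so that the argument names are easy to see.

```r
library(purrr)

long_numbers <- list(pi, exp(1), sqrt(2))
digits <- list(2, 3, 4)

# Without names, the first vector feeds round()'s first argument
# and the second vector feeds its second argument.
by_position <- pmap_dbl(list(long_numbers, digits), round)

# Hypothetical wrapper with explicit argument names.
round_to <- function(x, digits) round(x, digits)

# With names, the order of the vectors in the list no longer matters.
by_name <- pmap_dbl(list(digits = digits, x = long_numbers), round_to)

by_position
by_name
```

Both calls give the same three rounded numbers, but only the named version would survive a reordering of the list.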
Use pmap()
Got that? Let’s give it a try.
- Use pmap() to iterate over model1, model2, and model3, applying anova() as you go. Then click Submit Answer.
Note: anova() does not use argument names, so you will need to supply model1, model2, and model3 in the correct order.
"Be sure to wrap model1, model2, and model3 in a list before/as you pass them to pmap()."
pmap(list(model1, model2, model3), anova)
Similarities between pmap(), map2() and map()
pmap() also tries to resemble map() and map2() wherever possible.
Specifically, pmap() will pass extra arguments to your function, just as map() and map2() will.
pmap() also comes with derivative functions that return output in new formats:
Map function | Map2 function | pmap function | Output |
---|---|---|---|
map() | map2() | pmap() | list |
map_chr() | map2_chr() | pmap_chr() | character vector |
map_dbl() | map2_dbl() | pmap_dbl() | double (numeric) vector |
map_dfc() | map2_dfc() | pmap_dfc() | data frame (outputs bound by column) |
map_dfr() | map2_dfr() | pmap_dfr() | data frame (outputs bound by row) |
map_int() | map2_int() | pmap_int() | integer vector |
map_lgl() | map2_lgl() | pmap_lgl() | logical vector |
walk() | walk2() | pwalk() | returns the input invisibly (used to trigger side effects) |
pmap() also uses expressions that start with a ~. However, pmap() expects you to name the elements within an expression ..1, ..2, ..3 and so on instead of .x and .y. Notice the double dots.
For example, you could return the year coefficients of each model with the code below.
- Click Run Code to see the results.
pmap(
list(model1, model2, model3),
~ c(
mod1 = pluck(coef(..1), "year"),
mod2 = pluck(coef(..2), "year"),
mod3 = pluck(coef(..3), "year")
)
)
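If the model example feels heavy, the same ..1, ..2, ..3 notation can be seen on plain numbers. This is a toy sketch; the vectors price, quantity, and discount are made up for the illustration.

```r
library(purrr)

price <- c(10, 20)
quantity <- c(3, 4)
discount <- c(5, 6)

# ..1 is an element of price, ..2 of quantity, ..3 of discount.
totals <- pmap_dbl(list(price, quantity, discount), ~ ..1 * ..2 - ..3)
totals
```

The first result is 10 * 3 - 5 and the second is 20 * 4 - 6.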
pmap() and data frames
Although its name does not suggest it, pmap() is an important tool for manipulating data frames.
Recall from the Introduction to Iteration tutorial that a data frame is a list of column vectors that contains a special class (“data.frame”) and a row.names attribute.
Rowwise operations
Because a data frame is a list, you can pass a data frame to the first argument of pmap(). pmap() will do something natural when you do this: it will apply a function to each row of a data frame (i.e., to corresponding elements of the columns in the list used to store the data frame).
Take a look at this in action. The code below makes a small data frame of values.
parameters <- data.frame(
n = c(1, 2, 3),
min = c(0, 5, 10),
max = c(1, 6, 11)
)
parameters
- Click Submit Answer to see how pmap() applies runif() in a rowwise fashion to simulate three groups of random uniform values.
parameters %>% pmap(runif)
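Because random draws differ on every run, a deterministic sketch may make the rowwise behavior easier to check. It assumes a made-up data frame named labels whose column names, x and times, match the argument names of base R's strrep().

```r
library(purrr)

# Made-up data frame; the column names match strrep()'s arguments.
labels <- data.frame(
  x = c("a", "b", "c"),
  times = 1:3,
  stringsAsFactors = FALSE
)

# Row 1 calls strrep(x = "a", times = 1), row 2 calls strrep(x = "b", times = 2),
# and row 3 calls strrep(x = "c", times = 3).
repeated <- pmap_chr(labels, strrep)
repeated
```

Since pmap() matches columns to arguments by name, the order of the columns in the data frame would not change the result.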
Your Turn
Can you use this technique to run your own simulation? R’s rnorm() function generates random normal variables and takes three arguments:
- n - the number of values to generate
- mean - the mean of the distribution to draw the values from
- sd - the standard deviation of the distribution to draw values from
- Create a data frame and use pmap() to generate three groups of normal values:
  - Three values drawn from a normal distribution with a mean of zero and a standard deviation of one.
  - Two values from a normal distribution with a mean of one and a standard deviation of two.
  - One value from a normal distribution with a mean of 10 and a standard deviation of 100.
- Click Submit Answer when you are finished.
parameters <- data.frame(n = c(3, 2, 1),
                         mean = c(0, 1, 10),
                         sd = c(1, 2, 100))
parameters %>% pmap(rnorm)
More simulation
Generating small numbers of values is a simple way to demonstrate how to use pmap() with a data frame.
Now, let’s push it one step further. What if you wanted to use a different simulation function for each row? For example, you could use:
- rnorm() to generate normal values from the first row
- rlnorm() to generate log normal values from the second row
- rcauchy() to generate Cauchy values from the third row
This would be a different type of iteration problem. Instead of iterating over values, you would need to iterate over functions, which is the topic of the last section.
invoke_map()
purrr’s invoke_map() function is designed to iterate over a vector of functions followed by a vector of arguments. It will run the first function with the first argument, the second function with the second argument, and so on.
So for example, you could use invoke_map() to generate sets of standard normal, log normal, and Cauchy values. These are “standard” because in this example we do not change the default parameter values for each distribution, e.g. mean = 0, sd = 1.
- Click Submit Answer to give it a try.
functions <- list(rnorm, rlnorm, rcauchy)
n <- c(1, 2, 3)
invoke_map(functions, n) # functions %>% invoke_map(n)
Arguments
As with map(), map2(), and pmap(), invoke_map() will pass on extra arguments to the functions that it runs. For example, you could provide a mean argument:
invoke_map(functions, n, mean = 100)
But before you run this code, let me tell you: it will return an error.
When you pass extra arguments to invoke_map(), invoke_map() will try to match them to each function by argument name, which is why the last piece of code does not run.
invoke_map(functions, n, mean = 100)
## Warning: `invoke_map()` was deprecated in purrr 1.0.0.
## ℹ Please use map() + exec() instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Error in `map2()`:
## ℹ In index: 3.
## Caused by error:
## ! unused argument (mean = 100)
rnorm(), rlnorm(), and rcauchy() all take a “mean-like” value and a “sd-like” value, but they name them different things:
rnorm(n, mean = 0, sd = 1)
rlnorm(n, meanlog = 0, sdlog = 1)
rcauchy(n, location = 0, scale = 1)
But recall that map functions, including invoke_map(), will pass on unnamed extra arguments by position.
- Take advantage of this to create values that are all centered around 100 instead of zero. Then click Submit Answer.
invoke_map(functions, n)
invoke_map(functions, n, 100)
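The deprecation warning printed earlier recommends map() + exec() in place of invoke_map(). Here is one way that replacement might look for this exercise. It is a sketch, not the tutorial's official solution, and the argument names f and size are made up for the illustration.

```r
library(purrr)

functions <- list(rnorm, rlnorm, rcauchy)
n <- c(1, 2, 3)

# exec(f, size) calls f(size), so this mirrors invoke_map(functions, n).
draws <- map2(functions, n, function(f, size) exec(f, size))

# An unnamed 100 is matched by position, so it lands in mean, meanlog,
# and location respectively, mirroring invoke_map(functions, n, 100).
centered <- map2(functions, n, function(f, size) exec(f, size, 100))

lengths(draws)
```

Pairing each function with its own argument through map2() keeps the same "first function with first argument" behavior described above.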
Multiple arguments
If you would like to iterate over multiple arguments, you can pass invoke_map() a list of vectors as its second argument. Each vector should contain a set of arguments tailored to a single function. invoke_map() will pass the first vector to the first function, the second vector to the second, and so on.
- Use functions, parameters, and invoke_map to generate three normal values, two log normal values, and one cauchy value.
args <- list(norm = c(3, mean = 0, sd = 1),
             lnorm = c(2, meanlog = 1, sdlog = 2),
             cauchy = c(1, location = 10, scale = 100))
invoke_map(functions, args)
## [[1]]
## [1] 0.009629572 -0.676311274 -0.240454242
##
## [[2]]
## [1] 3.277851 7.171651
##
## [[3]]
## [1] -43.16578
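The same idea can be sketched without invoke_map(), following the map() + exec() suggestion in the deprecation warning. This version assumes each set of arguments is stored as a list rather than a vector, and it uses rlang's !!! operator to splice that list into the call. The argument names f and a are made up for the illustration.

```r
library(purrr)

functions <- list(rnorm, rlnorm, rcauchy)

# One list of arguments per function (lists, so that !!! can splice them).
args <- list(
  list(3, mean = 0, sd = 1),
  list(2, meanlog = 1, sdlog = 2),
  list(1, location = 10, scale = 100)
)

# exec(f, !!!a) calls f with the elements of `a` as its arguments,
# e.g. rnorm(3, mean = 0, sd = 1) on the first step.
draws <- map2(functions, args, function(f, a) exec(f, !!!a))

lengths(draws)
```

The values themselves are random, so only the number of draws per function is predictable: three, two, and one.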
invoke_map() variants
Lastly, invoke_map() comes with most of the familiar map variants that return output in new formats:
Map function | Map2 function | pmap function | invoke function | Output |
---|---|---|---|---|
map() | map2() | pmap() | invoke_map() | list |
map_chr() | map2_chr() | pmap_chr() | invoke_map_chr() | character vector |
map_dbl() | map2_dbl() | pmap_dbl() | invoke_map_dbl() | double (numeric) vector |
map_dfc() | map2_dfc() | pmap_dfc() | invoke_map_dfc() | data frame (outputs bound by column) |
map_dfr() | map2_dfr() | pmap_dfr() | invoke_map_dfr() | data frame (outputs bound by row) |
map_int() | map2_int() | pmap_int() | invoke_map_int() | integer vector |
map_lgl() | map2_lgl() | pmap_lgl() | invoke_map_lgl() | logical vector |
walk() | walk2() | pwalk() |  | returns the input invisibly (used to trigger side effects) |
A few final functions
To be a true purrr master, you should know that a couple more mapping functions come with purrr:
- lmap(), which works exclusively with functions that take lists
- imap(), which applies a function to each element of a vector, and its index
- map_at() and map_if(), which only map a function to specific elements of a list
- modify(), modify_at(), modify_if(), and modify_depth(), which return a modified version of the original data.
You can learn more about each at their help pages or purrr.tidyverse.org.
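For a quick taste of a few of these, here is a small sketch on toy inputs. All of the data below is made up for the illustration.

```r
library(purrr)

scores <- c(math = 90, art = 75)

# imap_chr(): .x is the element, .y is its name (or its position if unnamed).
labelled <- imap_chr(scores, ~ paste0(.y, ": ", .x))

# map_if(): only elements that satisfy the predicate are transformed.
doubled <- map_if(list(1, "a", 3), is.numeric, ~ .x * 2)

# map_at(): only the elements selected by name (or position) are transformed.
scaled <- map_at(list(a = 1, b = 2), "b", ~ .x * 10)

# modify(): returns the same type as its input, here a double vector.
bumped <- modify(c(1, 2, 3), ~ .x + 1)
```

Note that map_if() and map_at() leave the unselected elements untouched, and modify() hands back a double vector rather than a list.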
Congratulations
You’ve finished the Multiple Vectors tutorial, and you now have a wealth of new information to think about.
When you are ready, the List Column tutorial will show you how to integrate your purrr skills with your dplyr skills to create an unusually well organized data science workflow.