Multiple Vectors

Introduction

You’ve learned how to map any expression over the elements of a single vector.

Now you’ll learn how to map an expression over the elements of two or more vectors at once.

A case study

To see how useful this technique can be, let’s use multi-vector mapping to compare two lists of models. At the end of the Map Shortcuts tutorial, you fit two models for every country in gap_dfs. This created two lists:

A list of models that predicted life expectancy by year, which I saved as model1.
```
model1 <- gap_dfs %>%
  map(~ lm(lifeExp ~ year, data = .x))
```
A list of models that predicted life expectancy by both year and GDP per capita, which I saved as model2.
```
model2 <- gap_dfs %>%
  map(~ lm(lifeExp ~ year + gdpPercap, data = .x))
```

But which model is better? In other words, does adding GDP per capita improve your predictions for life expectancy?

anova()

One way to tell is with the anova() function. anova() takes two models and tests whether the second model outperforms the first.

Let’s try it. You can use your two models for the United States:

usa_mod1 <- model1 %>% pluck("United States")
usa_mod2 <- model2 %>% pluck("United States")

anova(usa_mod1, usa_mod2)

The anova results suggest that adding GDP per capita did not improve your predictions for the US.

How can you tell? The very last number in the anova table is a p-value (here 0.1424). If the p-value is above 0.05, there is not enough evidence to conclude that the second model outperforms the first: the difference between the two is small enough to be explained by random chance.
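If you want the p-value as a number rather than reading it off the printed table, you can pull it out of the object that anova() returns. The sketch below is only an illustration: it uses the built-in mtcars data rather than the tutorial's models, and the names small_model, large_model, and comparison are made up for the example.

```r
# Illustration only: mtcars ships with R and stands in for the tutorial's data.
small_model <- lm(mpg ~ wt, data = mtcars)
large_model <- lm(mpg ~ wt + hp, data = mtcars)

comparison <- anova(small_model, large_model)

# The table has one row per model. The first row has no p-value (NA)
# because there is no model above it to compare against.
p_value <- comparison[["Pr(>F)"]][2]

# TRUE would suggest that the larger model is a real improvement.
p_value < 0.05
```

The same extraction should work on anova(usa_mod1, usa_mod2), since both comparisons involve two lm() fits.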

Beyond the US

So GDP per capita doesn’t improve your predictions for the United States, but what about for the other countries?

You can iterate through all of the countries to find out, but this is a new type of iteration problem. At each step, anova() will require one element from model1 and one element from model2. You’ll need to simultaneously iterate over two lists. But how?

map2()

Enter map2().

Syntactically, map2() behaves like map(), but it takes two vectors as arguments before it takes a function (remember that lists are a type of vector).

At each step of the iteration, map2() will pass an element from the first vector to the first argument of the function. It will pass an element from the second vector to the second argument of the function.
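Here is a minimal sketch of that pairing on two small numeric vectors. The vectors after and before are invented for the illustration; only map2() comes from purrr.

```r
library(purrr)

after  <- c(10, 20, 30)
before <- c(1, 2, 3)

# Step 1 calls `-`(10, 1), step 2 calls `-`(20, 2), step 3 calls `-`(30, 3):
# the first vector always feeds the first argument, the second vector the second.
differences <- map2(after, before, `-`)
str(differences)
```

Swapping the two vectors would flip the sign of every result, which is a quick way to check which vector is landing in which argument.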

Use map2() to run an anova on each pair of models in model1 and model2. Then click Submit Answer.

map2(model1, model2, anova)

map2() and map()

map2() is similar to map() in almost every way. For example, you can pass extra arguments to your function as extra arguments to map2(), e.g.

# map2 will pass test = "Chisq" to the anova function
map2(model1, model2, anova, test = "Chisq")

Like map(), map2() returns its results as a list, but it also comes with variants that will return the result in other formats. As with map(), you should use the variant of map2() that will return your results in the format that you want:

| Map function | Map2 function | Output |
| --- | --- | --- |
| `map()` | `map2()` | list |
| `map_chr()` | `map2_chr()` | character vector |
| `map_dbl()` | `map2_dbl()` | double (numeric) vector |
| `map_dfc()` | `map2_dfc()` | data frame (output bound by column) |
| `map_dfr()` | `map2_dfr()` | data frame (output bound by row) |
| `map_int()` | `map2_int()` | integer vector |
| `map_lgl()` | `map2_lgl()` | logical vector |
| `walk()` | `walk2()` | returns the input invisibly (used to trigger side effects) |
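To see how the choice of variant changes the shape of the result, here is a small sketch on toy data. The vectors width and height are assumptions made up for the example.

```r
library(purrr)

width  <- c(2, 3, 4)
height <- c(5, 6, 7)

# Same inputs, three different output types.
areas   <- map2_dbl(width, height, `*`)              # double vector
labels  <- map2_chr(width, height, paste, sep = "x") # character vector
is_tall <- map2_lgl(width, height, `<`)              # logical vector

areas
labels
is_tall
```

Each variant checks that the function really returns the promised type, so picking the variant also documents what you expect the function to produce.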

Map2 expressions

map2() also uses shortcuts like map(). While the “name” and integer shortcuts do not make sense for map2(), the expressions shortcut does.

To make an expression for map2():

1. Begin the expression with ~
2. Refer to elements from the first vector as .x
3. Refer to elements from the second vector as .y

Only the last step is different from map().
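The three steps above can be sketched on a pair of toy vectors. The names before and after are hypothetical stand-ins for any two vectors of equal length.

```r
library(purrr)

before <- c(1, 5, 10)
after  <- c(4, 7, 9)

# .x is the current element of `before`, .y the current element of `after`.
changes <- map2_dbl(before, after, ~ .y - .x)
changes
```

Because .x and .y are tied to argument order rather than to variable names, swapping before and after in the call would reverse the sign of each change.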

Try an expression

How much does the year coefficient change between model1 and model2 for the US? The code below computes the answer.

pluck(coef(usa_mod2), "year") - pluck(coef(usa_mod1), "year")

## [1] 0.07383349

Can you do the same thing for every country?

Turn the code above into an expression and map it over the model1 and model2 lists.
Return the results as a double vector.
Round the results to two decimal places.
Click Submit Answer when you are finished.

model1 %>%
  map2(model2, anova)

"First, replace anova with your expression."

"Second, replace map2() with a variant that will return a double (numeric) vector."

"Third, round the results with %>% round(digits = 2)."

model1 %>%
  map2_dbl(model2, ~ pluck(coef(.y), "year") - pluck(coef(.x), "year")) %>%
  round(digits = 2)

From named vector to data frame

You can make long named vectors easier to work with by using the enframe() function from the tibble package. enframe() takes a named vector and returns a data frame with two columns:

A name column that contains the names in the vector
A value column that contains the value associated with each name

Once you turn your long vector into a data frame, you can use the familiar dplyr tools to explore the vector.
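As a rough sketch of that workflow, the code below enframes a tiny named vector and sorts it. The vector changes and its values are invented for the illustration; enframe() is loaded from tibble, and arrange() and desc() from dplyr.

```r
library(tibble)
library(dplyr)

# A made-up named vector standing in for a long vector of results.
changes <- c(Chile = 0.12, Kenya = -0.05, Japan = 0.30)

# One row per element: names go to `name`, values go to `value`.
changes_df <- enframe(changes)

# Largest change first.
sorted_df <- changes_df %>% arrange(desc(value))
sorted_df
```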

enframe() the result below. I’ve pre-loaded the package that provides enframe() for you.
Use arrange() and desc() to see which countries had the biggest change.
Click Submit Answer when you are finished.

model1 %>%
  map2_dbl(model2, ~ pluck(coef(.y), "year") - pluck(coef(.x), "year")) %>%
  round(digits = 2)

model1 %>%
  map2_dbl(model2, ~ pluck(coef(.y), "year") - pluck(coef(.x), "year")) %>%
  round(digits = 2) %>%
  enframe() %>%
  arrange(desc(value))

"First, pass the result to enframe()."

"Then pass the result to arrange(). arrange() comes in the dplyr package; you can learn how to use it in the Work with Data primer."

'Finally, recall that enframe() named the variable that you want to arrange over "value". Here, you want to arrange over descending values of value.'

More models

You can use anova() to compare any number of models. For example, you could compare three different models for each country:

A model that predicts life expectancy by year, like those in model1.
A model that predicts life expectancy by both year and GDP per capita, like those in model2.
A model that predicts life expectancy by year, GDP per capita, and population, like those in model3. (I made model3 for you while you weren’t looking.)

Here’s the comparison for the US:

usa_mod1 <- model1 %>% pluck("United States")
usa_mod2 <- model2 %>% pluck("United States")
usa_mod3 <- model3 %>% pluck("United States")
anova(usa_mod1, usa_mod2, usa_mod3)

Each value in the Pr(>F) column shows the p-value that results from comparing the model to the model above it.

Now can we do this for every country?

Mapping over three vectors

anova() can handle three arguments. Can you?

Yes, but not with map3() as you might think. map3() doesn’t exist. Instead, purrr offers the pmap() function for mapping over three or more vectors.

pmap()

The syntax of pmap() is a little different from the syntax of map() and map2(). Instead of accepting vectors one at a time as arguments, pmap() expects a single argument that contains a list of vectors and then a function to apply to the vectors within that list of vectors.

Name each vector in your list of vectors with the name of the argument that it should map to. pmap() will match names to arguments whenever you provide them. So this code, for example, will round each of the long numbers to a different number of digits, and it works because round’s arguments are called x and digits.

long_numbers <- list(pi, exp(1), sqrt(2))
digits <- list(2, 3, 4)
pmap(list(x = long_numbers, digits = digits), round)

## [[1]]
## [1] 3.14
## 
## [[2]]
## [1] 2.718
## 
## [[3]]
## [1] 1.4142

If you do not provide names, pmap() will map vectors to arguments by order.
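To make the difference between matching by name and matching by order concrete, here is a short sketch that reuses the long_numbers and digits lists from above. The result names by_name and by_order are made up for the example.

```r
library(purrr)

long_numbers <- list(pi, exp(1), sqrt(2))
digits <- list(2, 3, 4)

# Named: the order of the list does not matter, because pmap() matches
# each vector to round()'s arguments x and digits by name.
by_name <- pmap(list(digits = digits, x = long_numbers), round)

# Unnamed: the order matters, because the first vector becomes the first
# argument of round() and the second vector becomes the second.
by_order <- pmap(list(long_numbers, digits), round)

identical(by_name, by_order)
```

If you swapped the two unnamed vectors, round() would try to round 2, 3, and 4 to roughly three digits instead, which is why names are the safer habit whenever the function has them.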

Use pmap()

Got that? Let’s give it a try.

Use pmap() to iterate over model1, model2, and model3, applying anova() as you go. Then click Submit Answer.

Note: anova() does not use argument names, so you will need to supply model1, model2, and model3 in the correct order.

"Be sure to wrap model1, model2, and model3 in a list before/as you pass them to pmap()."

pmap(list(model1, model2, model3), anova)

Similarities between pmap(), map2(), and map()

pmap() also tries to resemble map() and map2() wherever possible.

Specifically, pmap() will pass extra arguments to your function, just as map() and map2() will.

pmap() also comes with derivative functions that return output in new formats:

| Map function | Map2 function | pmap function | Output |
| --- | --- | --- | --- |
| `map()` | `map2()` | `pmap()` | list |
| `map_chr()` | `map2_chr()` | `pmap_chr()` | character vector |
| `map_dbl()` | `map2_dbl()` | `pmap_dbl()` | double (numeric) vector |
| `map_dfc()` | `map2_dfc()` | `pmap_dfc()` | data frame (output bound by column) |
| `map_dfr()` | `map2_dfr()` | `pmap_dfr()` | data frame (output bound by row) |
| `map_int()` | `map2_int()` | `pmap_int()` | integer vector |
| `map_lgl()` | `map2_lgl()` | `pmap_lgl()` | logical vector |
| `walk()` | `walk2()` | `pwalk()` | returns the input invisibly (used to trigger side effects) |
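As a small illustration of both points (extra arguments and typed variants), the sketch below sums corresponding elements of three toy vectors. The list scores and its values are assumptions made up for the example.

```r
library(purrr)

# Three made-up vectors, each with a missing value in a different place.
scores <- list(
  c(1, NA, 3),
  c(4, 5, NA),
  c(7, 8, 9)
)

# pmap_dbl() returns a double vector instead of a list, and na.rm = TRUE
# is passed on to sum() at every step.
totals <- pmap_dbl(scores, sum, na.rm = TRUE)
totals
```

Without na.rm = TRUE, the second and third totals would be NA, because sum() propagates missing values by default.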

pmap() also uses expressions that start with a ~. However, pmap() expects you to name the elements within an expression ..1, ..2, ..3 and so on instead of .x and .y. Notice the double dots.

For example, you could return the year coefficients of each model with the code below.

Click Run Code to see the results.

pmap(
  list(model1, model2, model3),
  ~ c(
    mod1 = pluck(coef(..1), "year"),
    mod2 = pluck(coef(..2), "year"),
    mod3 = pluck(coef(..3), "year")
  )
)
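If you do not have the model lists at hand, the same ..1, ..2, ..3 notation can be tried on plain numbers. This is only a sketch; the three vectors below are invented for the illustration.

```r
library(purrr)

lengths_cm <- c(2, 3)
widths_cm  <- c(4, 5)
heights_cm <- c(6, 7)

# ..1, ..2, and ..3 refer to the current element of the first, second,
# and third vector in the list.
volumes <- pmap_dbl(
  list(lengths_cm, widths_cm, heights_cm),
  ~ ..1 * ..2 * ..3
)
volumes
```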

pmap() and data frames

Although its name does not suggest it, pmap() is an important tool for manipulating data frames.

Recall from the Introduction to Iteration tutorial that a data frame is a list of column vectors that carries a special class attribute (“data.frame”) and a row.names attribute.
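You can check this description yourself with a few base R calls. The data frame df below is a throwaway example, not part of the tutorial's data.

```r
df <- data.frame(n = c(1, 2, 3), min = c(0, 5, 10))

is.list(df)            # a data frame passes the list test
class(df)              # the special class attribute
names(attributes(df))  # includes "names", "class", and "row.names"
length(df)             # the number of columns, as for any list
df[["min"]]            # each list element is one column vector
```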

Rowwise operations

Because a data frame is a list, you can pass a data frame to the first argument of pmap(). pmap() will do something natural when you do this: it will apply a function to each row of a data frame (i.e., to corresponding elements of the columns in the list used to store the data frame).

Take a look at this in action. The code below makes a small data frame of values.

parameters <- data.frame(
  n = c(1, 2, 3),
  min = c(0, 5, 10),
  max = c(1, 6, 11)
)
parameters

Click Submit Answer to see how pmap() applies runif() in a rowwise fashion to simulate three groups of random uniform values.

parameters %>% pmap(runif)

Your Turn

Can you use this technique to run your own simulation? R’s rnorm() function generates random normal variables and takes three arguments:

n - the number of values to generate
mean - the mean of the distribution to draw the values from
sd - the standard deviation of the distribution to draw values from

Create a data frame and use pmap() to generate three groups of normal values:
1. Three values drawn from a normal distribution with a mean of zero and a standard deviation of one.
2. Two values from a normal distribution with a mean of one and a standard deviation of two.
3. One value from a normal distribution with a mean of 10 and a standard deviation of 100.
Click Submit Answer when you are finished.

parameters <- data.frame(
  n = c(3, 2, 1),
  mean = c(0, 1, 10),
  sd = c(1, 2, 100)
)

parameters %>% pmap(rnorm)

More simulation

Generating small numbers of values is a simple way to demonstrate how to use pmap() with a data frame.

Now, let’s push it one step further. What if you wanted to use a different simulation function for each row? For example, you could use:

rnorm() to generate normal values from the first row
rlnorm() to generate log-normal values from the second row
rcauchy() to generate Cauchy values from the third row

This would be a different type of iteration problem. Instead of iterating over values, you would need to iterate over functions, which is the topic of the last section.

invoke_map()

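purrr’s invoke_map() function is designed to iterate over a vector of functions paired with a vector of arguments. It will run the first function with the first argument, the second function with the second argument, and so on. Note that invoke_map() was deprecated in purrr 1.0.0, and its warning message recommends map() plus exec() instead; the examples in this section still run, but they print that warning.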

So for example, you could use invoke_map() to generate sets of standard normal, log normal, and cauchy values. These are “standard” because in this example we do not change the default parameter values for each distribution, e.g. mean = 0, sd = 1.

Click Submit Answer to give it a try.

functions <- list(rnorm, rlnorm, rcauchy)
n <- c(1, 2, 3)

invoke_map(functions, n) # functions %>% invoke_map(n)
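For readers on purrr 1.0.0 or later, where invoke_map() triggers a deprecation warning, the same pairing of functions and arguments can be sketched with map2() and exec(). This is one possible translation, not the tutorial's own solution; exec() is called through rlang here, the result name draws is made up, and the helper arguments f and k are arbitrary names.

```r
library(purrr)

functions <- list(rnorm, rlnorm, rcauchy)
n <- c(1, 2, 3)

# map2() pairs each function with its n; rlang::exec(f, k) then calls f(k).
draws <- map2(functions, n, function(f, k) rlang::exec(f, k))

# One value from rnorm(), two from rlnorm(), three from rcauchy().
lengths(draws)
```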

Arguments

As with map(), map2(), and pmap(), invoke_map() will pass on extra arguments to the functions that it runs. For example, you could provide a mean argument:

invoke_map(functions, n, mean = 100)

But before you run this code, let me tell you: it will return an error.

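When you pass named extra arguments to invoke_map(), invoke_map() will try to match them to each function by argument name. rcauchy() has no argument named mean, which is why the last piece of code fails at the third function. (rlnorm() accepts mean only because R partially matches it to meanlog.)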

invoke_map(functions, n, mean = 100)

## Warning: `invoke_map()` was deprecated in purrr 1.0.0.
## ℹ Please use map() + exec() instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Error in `map2()`:
## ℹ In index: 3.
## Caused by error:
## ! unused argument (mean = 100)

rnorm(), rlnorm(), and rcauchy() all take a “mean-like” value and a “sd-like” value, but they name them different things:

rnorm(n, mean = 0, sd = 1)
rlnorm(n, meanlog = 0, sdlog = 1)
rcauchy(n, location = 0, scale = 1)

But recall that map functions, including invoke_map(), will pass on unnamed extra arguments by position.
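Here is a minimal sketch of positional passing with plain map_dbl(), using round() because its second argument is easy to spot. The vector values is made up for the example.

```r
library(purrr)

values <- c(3.14159, 2.71828)

# The unnamed 2 is not matched by name. It simply fills the next open
# position in each call, so every step runs round(<value>, 2).
rounded <- map_dbl(values, round, 2)
rounded
```

The same mechanism lets a single unnamed value land in mean, meanlog, or location, depending on which function receives it.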

Take advantage of this to create values that are all centered around 100 instead of zero. Then click Submit Answer.

invoke_map(functions, n)

invoke_map(functions, n, 100)

Multiple arguments

If you would like to iterate over multiple arguments, you can pass invoke_map() a list of vectors as its second argument. Each vector should contain a set of arguments tailored to a single function. invoke_map() will pass the first vector to the first function, the second vector to the second, and so on.

Use functions, a list of argument vectors, and invoke_map() to generate three normal values, two log-normal values, and one Cauchy value.

args <- list(norm = c(3, mean = 0, sd = 1), 
             lnorm = c(2, meanlog = 1, sdlog = 2),
             cauchy = c(1, location = 10, scale = 100))

invoke_map(functions, args)

## [[1]]
## [1] -0.484585114 -1.127470348  0.001633088
## 
## [[2]]
## [1]  0.6833004 79.0483641
## 
## [[3]]
## [1] -32.97893
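If you would rather avoid the deprecated invoke_map(), one way to sketch the same idea is map2() with base R's do.call(). This swaps in do.call() rather than the exec() function that the purrr warning suggests, and the names arg_sets, f, and a are invented for the illustration.

```r
library(purrr)

functions <- list(rnorm, rlnorm, rcauchy)

# One vector of arguments per function; the unnamed first element is n.
arg_sets <- list(
  c(3, mean = 0, sd = 1),
  c(2, meanlog = 1, sdlog = 2),
  c(1, location = 10, scale = 100)
)

# as.list() turns each named vector into a list of arguments, and
# do.call() calls the function with that list.
draws <- map2(functions, arg_sets, function(f, a) do.call(f, as.list(a)))

# Three normal, two log-normal, and one Cauchy value.
lengths(draws)
```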

invoke_map() variants

Lastly, invoke_map() comes with most of the familiar map variants that return output in new formats:

| Map function | Map2 function | pmap function | invoke function | Output |
| --- | --- | --- | --- | --- |
| `map()` | `map2()` | `pmap()` | `invoke_map()` | list |
| `map_chr()` | `map2_chr()` | `pmap_chr()` | `invoke_map_chr()` | character vector |
| `map_dbl()` | `map2_dbl()` | `pmap_dbl()` | `invoke_map_dbl()` | double (numeric) vector |
| `map_dfc()` | `map2_dfc()` | `pmap_dfc()` | `invoke_map_dfc()` | data frame (output bound by column) |
| `map_dfr()` | `map2_dfr()` | `pmap_dfr()` | `invoke_map_dfr()` | data frame (output bound by row) |
| `map_int()` | `map2_int()` | `pmap_int()` | `invoke_map_int()` | integer vector |
| `map_lgl()` | `map2_lgl()` | `pmap_lgl()` | `invoke_map_lgl()` | logical vector |
| `walk()` | `walk2()` | `pwalk()` | | returns the input invisibly (used to trigger side effects) |

A few final functions

To be a true purrr master, you should know that a couple more mapping functions come with purrr:

lmap(), which works exclusively with functions that take lists
imap(), which applies a function to each element of a vector, and its index
map_at() and map_if(), which only map a function to specific elements of a list
modify(), modify_at(), modify_if(), and modify_depth(), which return a modified version of the original data.

You can learn more about each at their help pages or purrr.tidyverse.org.
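To give a flavor of three of these functions, here is a brief sketch on toy data. All of the objects below (heights, mixed, df) are made up for the illustration.

```r
library(purrr)

# imap(): .x is the element, .y is its name (or its position if unnamed).
heights <- c(ana = 1.62, bo = 1.80)
labels <- imap_chr(heights, ~ paste(.y, "is", .x, "m tall"))
labels

# map_if(): apply the function only where the predicate is TRUE.
mixed <- list(1, "a", 3)
scaled <- map_if(mixed, is.numeric, ~ .x * 10)
str(scaled)

# modify_if(): like map_if(), but the result keeps the input's type,
# so a data frame goes in and a data frame comes out.
df <- data.frame(x = c(1, 2), y = c("a", "b"))
modified <- modify_if(df, is.numeric, ~ .x * 10)
modified
```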

Congratulations

You’ve finished the Multiple Vectors tutorial, and you now have a wealth of new information to think about.

When you are ready, the List Column tutorial will show you how to integrate your purrr skills with your dplyr skills to create an unusually well-organized data science workflow.