Map Shortcuts

Why More?

The map functions come with two useful shortcuts:

an easy way to extract elements from nested lists
an easy way to apply non-function expressions, like x + 1, to the elements of a vector

This tutorial will teach you how to use the shortcuts as you work through a data analysis case study. This first section will introduce the case study.

Data

us is a list of statistics about the US, measured every five years from 1952 to 2007. The data is a reformatted portion of the gapminder data set, which comes in the gapminder package.

Click Submit Answer to see the contents of us. Can you spot the life expectancy for a US citizen in 1952? How about in 2007?

us

us

Can you use the values of lifeExp to compute the rise in life expectancy in the US from 1952 to 2007?

This is a simple task, but we will use it as the basis of an iteration project.

Data Wrangling

First, you need to access the values of lifeExp. The values are difficult to work with because they are stored inside of a list (us).

You can extract the values with purrr’s pluck() function.

pluck()

pluck() extracts an element from a list, by name or by position. It returns the contents of the element as they are, without surrounding them in a new list.

For example, the two pluck() calls below both extract the first element of list1.

list1 <- list(
  numbers = 1:3,
  letters = c("a", "b", "c"),
  logicals = c(TRUE, FALSE)
)

pluck(list1, 1) # list1 %>% pluck(1)
pluck(list1, "numbers") # list1 %>% pluck("numbers")

## [1] 1 2 3
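
To make "as they are" concrete, the sketch below contrasts pluck() with single-bracket subsetting, which keeps the surrounding list. It assumes the purrr package is installed, and it redefines list1 so the block runs on its own.

```r
library(purrr)

list1 <- list(
  numbers = 1:3,
  letters = c("a", "b", "c"),
  logicals = c(TRUE, FALSE)
)

# Single brackets return a list of length one that still wraps the element
wrapped <- list1["numbers"]
is.list(wrapped)

# pluck() returns the element itself, here an integer vector
bare <- pluck(list1, "numbers")
is.list(bare)

# pluck() gives the same result as double-bracket subsetting
identical(bare, list1[["numbers"]])
```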

Use pluck() to extract lifeExp from us. Then click Submit Answer. I’ve pre-loaded the purrr package for you.

"The name of the element that you want to pluck is lifeExp."

"Don't forget to put quotes around lifeExp when you pass it to pluck()."

us %>% pluck("lifeExp")

Rise in lifeExp

pluck() returns the values of lifeExp as a vector, which makes it easy to manipulate the values with two new functions:

last() returns the last element of a vector.
first() returns the first element of a vector.

Both come in the dplyr package, which I have pre-loaded for you.
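
Here is a small sketch of the two functions on a made-up vector. The values in heights are hypothetical and chosen only for illustration; the block assumes dplyr is installed.

```r
library(dplyr)

# Made-up measurements, ordered from earliest to latest
heights <- c(10.5, 11.2, 12.8, 14.1)

first(heights)  # the earliest value
last(heights)   # the latest value

# The change over the whole period
change <- last(heights) - first(heights)
change
```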

Use last() and first() to compute the rise in life expectancy in the US from 1952 to 2007. Then click Submit Answer.

lifeExp <- us %>% pluck("lifeExp")

"The change in life expectancy will be the last value of lifeExp minus the first value of lifeExp."

lifeExp <- us %>% pluck("lifeExp")
last(lifeExp) - first(lifeExp)

Recap

You computed that the life expectancy of US citizens increased by 9.8 years between 1952 and 2007. To do this, you:

Plucked the lifeExp values from us with pluck()
Calculated the change in life expectancy with last(lifeExp) - first(lifeExp)

It’d be nice to do this for other countries as well. And you can. gap_list contains lists of statistics for 142 countries. Click Submit Answer to see.

gap_list

To apply your two step process to each country, you will need to use map() and each of its shortcuts.

Click Continue when you are ready to begin.

Shortcuts

To map your work to every country, you will need to do two things for each value of gap_list:

Pluck the values of lifeExp from the sub-list that contains them
Compute the change in life expectancy with last(lifeExp) - first(lifeExp)

Each step will reveal new aspects of the map functions.

Step 1 will demonstrate two useful shortcuts for extracting sub-elements
Step 2 will demonstrate how to make and apply expressions with map functions

Step 1

Map pluck() over gap_list to return the lifeExp vectors of each country in gap_list.

"Recall that map() takes a vector to iterate over, a function to apply to each element of the vector, and then any arguments that that function needs."

gap_list %>% map(pluck, "lifeExp")

shortcuts

Before you go on, consider what you did.

gap_list contained 142 sub-lists, and each sub-list contained the same set of elements. This setup is more common than you may think.

You pulled out the same element from each of those sub-lists. This operation is also more common than you might think.

In fact, this operation is so common that map() provides two shortcuts to help you do it.

“name” shortcut

First, you can pass map() the name of an element to extract as a character string. When map() receives a character string instead of a function, map() will return the element of each sub-list whose name matches the character string.

params <- list(
  "norm1" = list("mu" = 0, "sd" = 1),
  "norm2" = list("mu" = 1, "sd" = 1),
  "norm3" = list("mu" = 2, "scale" = 1)
)
map(params, "mu") # params %>% map("mu")

## $norm1
## [1] 0
## 
## $norm2
## [1] 1
## 
## $norm3
## [1] 2

Change the code below to use the “name” shortcut. Then click Submit Answer. Do you get the same result?

gap_list %>% map(pluck, "lifeExp")

"Did you remember to remove pluck from the map() call?"

"The element that you want to extract from each sub-list is named lifeExp."

gap_list %>% map("lifeExp")

Integer shortcut

Instead of passing map() a character string that identifies the element to extract by name, you can pass map() an integer that identifies the element to extract by position. For example:

params <- list(
  "norm1" = list("mu" = 0, "sd" = 1),
  "norm2" = list("mu" = 1, "sd" = 1),
  "norm3" = list("mu" = 2, "scale" = 1)
)
map(params, 1) # params %>% map(1)

## $norm1
## [1] 0
## 
## $norm2
## [1] 1
## 
## $norm3
## [1] 2
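
One practical difference between the two shortcuts shows up in params itself: the second element of norm3 is named scale, not sd. The sketch below, which assumes purrr is installed, suggests how each shortcut behaves in that case. The name shortcut looks for a matching name, while the integer shortcut only considers position.

```r
library(purrr)

params <- list(
  "norm1" = list("mu" = 0, "sd" = 1),
  "norm2" = list("mu" = 1, "sd" = 1),
  "norm3" = list("mu" = 2, "scale" = 1)
)

by_name <- map(params, "sd")
by_position <- map(params, 2)

by_name$norm3      # NULL, because norm3 has no element named "sd"
by_position$norm3  # 1, the second element regardless of its name
```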

Extract lifeExp with the integer shortcut. Then click Submit Answer.

gap_list

"Use map() and pass it a number."

"Do you remember the position of lifeExp within each sub-list of gap_list? It is the second element."

gap_list %>% map(2)

Data frames

The best thing about these shortcuts is that they also work when your vector contains data frames. You can use the shortcuts to pull out the same column of each data frame.

For example, gap_dfs contains the same information as gap_list, but it organizes each sub-list into a data frame, which is more user-friendly.

gap_dfs %>% pluck(1)

Use the “name” shortcut to extract the lifeExp column of each data frame in gap_dfs. Then click Submit Answer. What happens?

gap_dfs

gap_dfs %>% map("lifeExp")
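
The same idea can be tried on a smaller scale. In the sketch below, toy_dfs is a hypothetical stand-in for gap_dfs with made-up values, so the block runs on its own (assuming purrr is installed). The shortcut works because a data frame is itself a list of columns.

```r
library(purrr)

# Two tiny, made-up data frames standing in for gap_dfs
toy_dfs <- list(
  a = data.frame(year = c(2000, 2005), score = c(1.5, 2.5)),
  b = data.frame(year = c(2000, 2005), score = c(3.0, 4.5))
)

# The "name" shortcut pulls the score column out of each data frame
scores <- map(toy_dfs, "score")
scores
```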

One more thing

There is one more thing that you should know about these shortcuts. In the Map tutorial, you learned that map() is part of a larger family of map functions:

Function	Output
`map()`	list
`map_chr()`	character vector
`map_dbl()`	double (numeric) vector
`map_dfc()`	data frame (output column-bound)
`map_dfr()`	data frame (output row-bound)
`map_int()`	integer vector
`map_lgl()`	logical vector
`walk()`	returns the input invisibly (used to trigger side effects)

The “name” and integer shortcuts will work with all of these functions. So will the expressions that you are about to learn. Speaking of that, let’s get back to your case study.
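
As a quick illustration before returning to the case study, the sketch below combines the shortcuts with map_dbl() instead of map(). It reuses the params list from the earlier examples and assumes purrr is installed.

```r
library(purrr)

params <- list(
  "norm1" = list("mu" = 0, "sd" = 1),
  "norm2" = list("mu" = 1, "sd" = 1),
  "norm3" = list("mu" = 2, "scale" = 1)
)

# "name" shortcut, returned as a named double vector instead of a list
mus <- map_dbl(params, "mu")
mus

# Integer shortcut, also returned as a named double vector
seconds <- map_dbl(params, 2)
seconds
```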

Expressions

Recall your goals. For each value of gap_list, you want to:

Pluck the values of lifeExp from the sub-list that contains them
Compute the change in life expectancy with last(lifeExp) - first(lifeExp)

Step 1 was easy and you learned two shortcuts along the way. Step 2 will be harder.

Step 2

You know that you can use map() to apply a function to each element of a list, but last(lifeExp) - first(lifeExp) isn’t a function. It is an expression that uses two functions. How can you pass it to map()?

With a map expression.

A pattern

At the heart of map() is the pattern:

For each element, do _____

When you fill in the blank with a function, or a character string, or an integer, map() knows just what to do.

But you can also fill in the blank with an arbitrary R expression (like last(lifeExp) - first(lifeExp)) if you follow two rules.

Rule 1 - ~

First, place a ~ at the start of the expression. This alerts map() that you are giving it an expression to run:

For each element, do ~last(lifeExp) - first(lifeExp)

Rule 2 - .x

Second, replace the name of the thing to manipulate with .x wherever it appears in your expression.

For each element, do ~last(.x) - first(.x)

Or more simply,

For each .x, do ~last(.x) - first(.x)

This tells map() where to use the element within your expression. If an expression uses each element multiple times, you will need to insert multiple .xs into your expression.

How does this look with the map() function? You pass the expression to map() exactly as you would pass a function.

In this example, the expression plucks the two values in each sub-list of params and uses them to generate five random normal values.

params <- list(
  "norm1" = list("mu" = 0, "sd" = 1),
  "norm2" = list("mu" = 1, "sd" = 1),
  "norm3" = list("mu" = 2, "scale" = 1)
)

map(params, ~ rnorm(5, mean = pluck(.x, 1), sd = pluck(.x, 2)))

## $norm1
## [1]  1.7611691 -1.9087050  0.9397371 -1.7454697 -0.2741927
## 
## $norm2
## [1]  1.022036 -0.290552  1.604339  1.186206  1.584581
## 
## $norm3
## [1] 0.1444424 2.4429881 1.7164643 1.2980527 1.9737017

# params %>% map(~rnorm(5, mean = pluck(.x, 1), sd = pluck(.x, 2)))
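
It may help to see that a map expression behaves like an ordinary anonymous function. The sketch below applies x + 1 to each element of a small made-up list, once as an expression and once as a function; nums is a hypothetical example object, and the block assumes purrr is installed.

```r
library(purrr)

nums <- list(a = 1, b = 10, c = 100)

# A map expression: ~ marks the expression, .x stands for each element
with_formula <- map_dbl(nums, ~ .x + 1)

# The same computation written as an ordinary anonymous function
with_function <- map_dbl(nums, function(x) x + 1)

# Both approaches give the same named vector
identical(with_formula, with_function)
```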

Now it is your turn. To finish your code:

Turn last(lifeExp) - first(lifeExp) into an expression.
Use map() to apply the expression to each element returned by the code below.
Click Submit Answer.

gap_list %>%
  map("lifeExp")

"Use %>% to add a second map() call to your code."

"Remember the two rules for map expressions: 1) begin with a ~, 2) refer to the elements with .x. Do not surround the expression with quotes."

gap_list %>%
  map("lifeExp") %>%
  map(~ last(.x) - first(.x))

Beyond map()

Change the code below to return a double vector in the last step. Will the expression still work? Click Submit Answer to find out.

gap_list %>%
  map("lifeExp") %>%
  map(~ last(.x) - first(.x))

gap_list %>%
  map("lifeExp") %>%
  map_dbl(~ last(.x) - first(.x))

A small payoff

It would be rewarding to learn which countries had the largest and smallest changes in life expectancy. You can find out with three new functions:

enframe() turns a named vector into a data frame with two columns: name and value.
```
named_vec <- c(uno = 1, dos = 2, tres = 3)
enframe(named_vec)
```
enframe() comes in the tibble package, which I’ve pre-loaded for you.

top_n() returns the n rows that have the highest value of a weighting variable.
```
top_n(mtcars, n = 5, wt = mpg)
```
(Don’t be fooled: these are the rows with the five highest values of mpg. top_n() retrieves them but does not sort them by mpg).

You can combine top_n() with desc() to retrieve the lowest n values.
```
top_n(mtcars, n = 5, wt = desc(mpg))
```
Both top_n() and desc() come in the dplyr package, which I have pre-loaded for you.
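
Because top_n() does not sort its result, one option is to follow it with dplyr's arrange(). The sketch below chains the functions on the small named vector from above. It assumes dplyr and tibble (the package that defines enframe()) are installed; adding arrange() is a suggestion of this sketch, not a required step of the case study.

```r
library(dplyr)
library(tibble)

named_vec <- c(uno = 1, dos = 2, tres = 3)

top_two <- named_vec %>%
  enframe() %>%               # tibble with columns `name` and `value`
  top_n(2, wt = value) %>%    # keep the two largest values (unsorted)
  arrange(desc(value))        # then sort them, largest first

top_two
```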

The most extreme changes in life expectancy

Extend your code with enframe() and top_n() to retrieve the five countries with the most positive change in life expectancy. Then click Submit Answer.

gap_list %>%
  map("lifeExp") %>%
  map_dbl(~ last(.x) - first(.x))

"To begin, pipe your results into enframe()."

"...i.e. add %>% enframe() to the end of your code."

"Then pipe that result to top_n(). You will need to set the n and wt arguments of top_n()."

'enframe() named the column that you want to wt by "value".'

gap_list %>%
  map("lifeExp") %>%
  map_dbl(~ last(.x) - first(.x)) %>%
  enframe() %>%
  top_n(5, wt = value)

Add desc() to your code to retrieve the five countries with the smallest change in life expectancy. Then click Submit Answer.

gap_list %>%
  map("lifeExp") %>%
  map_dbl(~ last(.x) - first(.x)) %>%
  enframe() %>%
  top_n(5, wt = value)

"This time, you want to order by descending values of value."

gap_list %>%
  map("lifeExp") %>%
  map_dbl(~ last(.x) - first(.x)) %>%
  enframe() %>%
  top_n(5, wt = desc(value))

Best practices

You can do almost anything with map() if you use the right expression. The best way to write an expression is to:

1. Pluck a single element from your vector
2. Write code that works correctly for that element
3. Transform the code into an expression to use with map()

The alternative is to write an expression in your head, and then see if it works. Too often it won’t.

Models

Let’s put this workflow into practice.

Another way to quantify the change in life expectancy over time is to fit a simple linear model to the data. The slope of the model will be how fast life expectancy increased per year, on average.

Step 1 - Pluck a single element from your vector

Pluck the “United States” element from gap_dfs. Save it as usa, so you don’t overwrite us. Then click Submit Answer.

"You do not need to use map() here, just pluck()."

usa <- gap_dfs %>% pluck("United States")

Step 2 - Write code that works correctly for that element

You may not know how to fit a linear model with R, so I assure you that the code below will work.

Click Submit Answer to double check the code runs with your test case.

lm(lifeExp ~ year, data = usa)

"Leave the code as is and click Submit Answer."

lm(lifeExp ~ year, data = usa)

Step 3 - Transform the code into an expression to use with map()

Turn your code into an expression and map() it to each element of gap_dfs. Then click Submit Answer.

lm(lifeExp ~ year, data = usa)

"Begin with gap_dfs and pass it to map()."

"The next argument of map() should be your function, rewritten as an expression. Do you remember the two rules for writing map expressions?"

"Rule 1: begin the expression with a ~."

"Rule 2: replace every appearance of the element to use with .x. In our code, each element would iteratively take the place of usa."

gap_dfs %>%
  map(~ lm(lifeExp ~ year, data = .x))
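
The Models section noted that the slope of each model gives the average yearly change in life expectancy. A minimal sketch of how such slopes could be pulled out with a second map expression follows. It assumes purrr is installed and uses two tiny made-up data frames (toy_dfs, a hypothetical stand-in for gap_dfs) so that it runs on its own.

```r
library(purrr)

# Made-up data: lifeExp rises by 0.2 per year in `a` and 0.5 per year in `b`
toy_dfs <- list(
  a = data.frame(year = c(1950, 1960, 1970), lifeExp = c(60, 62, 64)),
  b = data.frame(year = c(1950, 1960, 1970), lifeExp = c(50, 55, 60))
)

# Fit one linear model per data frame
models <- map(toy_dfs, ~ lm(lifeExp ~ year, data = .x))

# coef() returns the fitted coefficients; the one named "year" is the slope
slopes <- map_dbl(models, ~ coef(.x)[["year"]])
slopes
```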

A model that accounts for GDP

Use the best practices to fit a model that accounts for GDP to each country. Your model should use the formula lifeExp ~ year + gdpPercap inside of lm(). The best practices are:
1. Pluck a single element from your vector (in this case usa)
2. Write code that works correctly for that element
3. Transform the code into an expression to use with map()
When you are finished fitting the models, click Submit Answer.

gap_dfs %>%
  map(~ lm(lifeExp ~ year + gdpPercap, data = .x))

The End

Congratulations on making it to the end. In the next tutorial, which teaches a new dimension of map, we will use the two lists of models that you created here.

In that tutorial, you will learn how to iterate over two or more vectors at the same time.