Environments and Scoping

Environments

Thanks to the previous tutorials, you can write and execute functions, but can you predict how a function will work?

To do that precisely, you need to know how R will look up the values of objects that appear in the function.

Consider the code below, which defines and then calls the function foo(). With its last line, foo() returns the value of z, but what will the value be?

z <- 1

foo <- function(z = 2) {
  z <- 3
  z
}

foo(z = 4)

Scoping rules and environments

This tutorial will teach you R’s rules for looking up values.

The rules that a language uses to look up values are known as scoping rules, and R’s scoping rules are closely tied to a new type of R object: the environment. So let’s start there.

Before we begin, let me assure you: this topic is worth studying, even though it is unusually technical. R becomes much more predictable when you know how R looks up objects and their values.

The big picture

An R environment is a list of object names paired with the values contained in the objects. Each environment is linked to another environment, and together these links form a chain of environments.

Every object in R is saved somewhere in an environment. When R needs to look up the value of an object, R searches through the chain of environments until R finds the object and its value.
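As a minimal sketch of the name-value idea, the base R functions new.env(), assign(), ls(), and get() can build and inspect a small environment by hand. The environment e and the names a and b are made up for this illustration.

```r
# Create a new, empty environment (base R).
e <- new.env()

# Store two name-value pairs in it, in two equivalent ways.
assign("a", 1, envir = e)
e$b <- "two"

# List the names stored in the environment.
ls(e)
# "a" "b"

# Look up a value by its name.
get("a", envir = e)
# 1
```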

That’s the big picture. Let’s look at the details.

globalenv()

globalenv() is a function that returns an R environment. In fact, globalenv() returns a very special R environment named the global environment. You’ll learn more about the global environment later, but first let’s take a look at how R displays environments.

Click Submit Answer to run the code below.

globalenv()

ls.str()

I’ve saved some objects in the global environment. Would you like to see them? You can display the contents of an R environment with ls.str().

To see what I’ve saved in the global environment, run ls.str() on the code below. Then click Submit Answer.

globalenv()

"Pass `globalenv()` to `ls.str()`."

ls.str(globalenv())
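ls.str() works on any environment, not only the global one. The sketch below uses a hypothetical environment named demo_env, created only for this illustration, so the result does not depend on what happens to be in your workspace.

```r
# A small environment with known contents (demo_env is a made-up name).
demo_env <- new.env()
demo_env$n <- 42
demo_env$greeting <- "hello"

# ls() returns only the names; ls.str() also prints a short
# description of each value.
ls(demo_env)
ls.str(demo_env)
```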

Parent environments

In addition to name-value pairs, each environment contains a link to another environment. This second environment is called the parent environment of the first environment.

The relationship between an environment and its parent is a special relationship that we will return to in a moment. But first:

Call parent.env() on globalenv() to see which environment is the parent environment of the global environment. Then click Submit Answer.

parent.env(globalenv())

parenvs()

The parent of an environment will also have a parent environment. And that parent environment will have a parent environment, and so on. Together the parents will form a chain of environments that ends with the empty environment, which is the only environment in R that does not have a parent.

Every environment can trace its lineage to the empty environment through a set of parents.
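The lineage described above can also be walked by hand with base R alone. The helper below, lineage(), is a hypothetical name for this sketch and is not part of any package; it follows parent.env() until it reaches emptyenv().

```r
# Collect an environment and all of its ancestors, ending with the
# empty environment. lineage() is an illustrative helper.
lineage <- function(e) {
  envs <- list(e)
  while (!identical(e, emptyenv())) {
    e <- parent.env(e)
    envs[[length(envs) + 1]] <- e
  }
  envs
}

chain <- lineage(globalenv())

# Show a readable name for each environment in the chain.
vapply(chain, environmentName, character(1))
```

The exact names in the middle of the chain depend on which packages are attached in your session, but the first entry is the global environment and the last is the empty environment.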

You can see an environment’s lineage with the parenvs() function that comes in the pryr package. parenvs(e, all = TRUE) will display the chain of parent environments that leads to the empty environment from whichever environment you pass to the e argument.

Use parenvs() to see the “lineage” of the global environment. Don’t forget to include the argument all = TRUE. Then click Submit Answer.

library(pryr)

"Set e = globalenv()."

"Don't forget the argument all = TRUE."

library(pryr)
parenvs(globalenv(), all = TRUE)

Summary

You’ve learned four things about environments:

An environment is a list of name-value pairs that define the values of R objects.
Each environment contains a link to a parent environment (with the exception of the empty environment).
Each environment is linked to the empty environment by a chain of parent environments.
Every object in R is stored in an environment.

Scoping rules

The active environment

At any moment in time, R is working closely with a single environment, which I will call the active environment. If I say that R is running code in an environment, I mean that the environment is the active environment when R runs the code.

The active environment is special in two ways:

If code creates a new object, R will store the object in the active environment.
If code calls an object, R will look for the object in the active environment.

What if R cannot find the value in the active environment?

Search path

If R cannot find an object in the active environment, then R will look in the parent of the active environment, and then the parent of the parent, and so on until R either finds the object or reaches the empty environment.

If R gets to the empty environment before it finds the object, R will return the familiar error message: Error: object not found.

In this way, the chain of environments from the active environment to the empty environment forms R’s search path.
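Here is a small sketch of that lookup, using two hand-made environments in place of the active environment and its parent. The names parent, child, and y are assumptions chosen for the illustration.

```r
# Two linked environments: child's parent is parent.
parent <- new.env()
child <- new.env(parent = parent)
assign("y", "found in the parent", envir = parent)

# The child itself holds no object named y ...
in_child <- exists("y", envir = child, inherits = FALSE)
in_child
# FALSE

# ... but an ordinary lookup continues into the parent.
value <- get("y", envir = child)
value
# "found in the parent"

# A name that exists nowhere on the chain ends in an error,
# which is caught here so the script keeps running.
outcome <- tryCatch(
  get("no_such_object_for_this_sketch", envir = child),
  error = function(cond) "lookup failed"
)
outcome
# "lookup failed"
```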

environment()

Which environment is active will change from time to time depending on what R is doing (which means that the search path will change as well).

You can use the environment() function to return the current active environment.

Type environment() in the exercise chunk below to return the label of the active environment. Then click Submit Answer.

"Run `environment()` with no arguments."

environment()
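To see that the active environment really does change, you can call environment() from inside a function. This is only a sketch, and where_am_i() is a made-up name.

```r
# Called inside a function body, environment() returns the environment
# that is active while the function runs.
where_am_i <- function() environment()

inside <- where_am_i()

is.environment(inside)
# TRUE

# The environment that was active inside the function is not the
# global environment.
identical(inside, globalenv())
# FALSE
```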

The global environment

The global environment plays a very important role in R because it is the active environment when you run code from the command line of an R console, such as the console in the RStudio IDE.

As a result, the global environment acts as your personal workspace: it is where R will save the objects that you create at the command line, and it is where R will look for the objects that you call at the command line.

Other environments

Other R environments include:

the empty environment
package environments (which contain all of the objects loaded by a package)
temporary environments (that R creates to do certain tasks, like execute the exercise chunks in this tutorial)

You can see each type of environment in the search path from the active environment of the exercise chunk below.

Use the exercise chunk to display the search path from the active environment (i.e. the list of parents that connects the active environment() to the empty environment). Then click Submit Answer.

"Use parenvs()."

"Set e = environment(), which returns the active environment."

"Don't forget to include all = TRUE."

parenvs(e = environment(), all = TRUE)

Environments in tutorials

Whenever you run an exercise chunk in a tutorial, R creates a temporary environment to run your code in. The parent of this environment is the global environment.

Compare this to what will happen when you run code in an R console. There, the code that you run at the command line will be executed in the global environment (i.e. the global environment will be active).

This is a small difference, but since we’re talking about environments today, I want you to be aware of it.

Summary

The chain of parent environments from the active environment to the empty environment creates a search path that R uses to look for objects.

R first looks for objects in the current active environment.
If R cannot find an object in the active environment, R looks for the object in the parent of the active environment. R then looks in the parent of the parent, and so on until R finds the object or comes to the empty environment.
If R gets to the empty environment before it finds an object, R will return the familiar error message: Error: object not found.

These three rules are R’s scoping rules.

Keep in mind

R’s search path will vary based on which environment is active when you call parenvs() (or begin searching).

The search path will also vary based on which packages you have loaded. The environments of loaded packages appear between the global environment and the empty environment, in the reverse of the order in which the packages were loaded.
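Base R's search() function gives a quick view of this ordering as seen from the global environment. It is a sketch of the same idea rather than a replacement for parenvs(): it lists names, not the environments themselves, and it does not show the empty environment.

```r
# Names of the global environment and the attached packages,
# in the order R searches them.
path <- search()
path

# The global environment comes first and the base package comes last.
path[1]
# ".GlobalEnv"
path[length(path)]
# "package:base"
```

The entries in between depend on your session. A package attached with library() is placed directly after the global environment by default, which is why the most recently loaded package is searched first.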

Now that you know how R looks up objects, let’s look at what can go wrong.

Overwriting and Masking

Overwriting

In the exercise below, I’ve saved an object named x to the active environment.

Call x to see its value. Then click Submit Answer.

x is also stored in the active environment of the exercise chunk below (you can check if you like).

What would happen if you run the code in the chunk? (Click Start Over if the code is no longer there.)

Make a prediction then click Submit Answer. What happened?

x <- "oops"
x

Masking

This time, let’s save x <- "password123" to the global environment (I’ll no longer add x to the active environments for the exercise chunks).

Check the contents of the global environment to see if x is there. Then click Submit Answer.

"Use ls.str()"

"Run ls.str() on globalenv()"

ls.str(globalenv())

Now what would happen if you run the code below?

Make a prediction then click Submit Answer. What happened?

x <- "oops"
x

This behavior is called masking. Masking occurs whenever two objects with the same name exist in the search path. When masking happens, R will always use the object that appears first in the search path.

Masking can be confusing if you do not realize that it is happening.
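The sketch below reproduces masking with two hand-made environments, so you can see that the masked object is hidden rather than changed. The names outer and inner are assumptions made for this illustration.

```r
# outer plays the role of the global environment,
# inner plays the role of the active environment.
outer <- new.env()
inner <- new.env(parent = outer)

assign("x", "password123", envir = outer)
assign("x", "oops", envir = inner)

# A lookup that starts in inner finds the nearer x first.
get("x", envir = inner)
# "oops"

# The x in outer is hidden, not overwritten.
get("x", envir = outer)
# "password123"

# Removing the nearer x makes the original visible again.
rm("x", envir = inner)
get("x", envir = inner)
# "password123"
```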

Masking and packages

R will help you detect one source of masking: R will return an informative message if you load a package that contains objects that mask other objects. Here, R tells us that the date() function in the lubridate package masks the date() function in the base package.

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

::

You can get around package masking with the :: syntax. To use ::, write a package name followed by :: and then an object name. R will look for the object in the package environment, circumventing the search path and any masking conflicts.

lubridate::date

## function (x) 
## {
##     UseMethod("date")
## }
## <bytecode: 0x64264d2f68f8>
## <environment: namespace:lubridate>

base::date

## function () 
## .Internal(date())
## <bytecode: 0x64264a64a038>
## <environment: namespace:base>
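The same syntax works for any masked function, not only date(). As a sketch that does not require lubridate, the code below deliberately masks base R's mean() inside a local() block and then reaches the original with base::mean(). The replacement function is made up for this illustration.

```r
results <- local({
  # This definition masks base R's mean() inside the local block only.
  mean <- function(x) "masked version"

  list(
    plain = mean(c(1, 2, 3)),            # finds the masking function
    qualified = base::mean(c(1, 2, 3))   # goes straight to the base package
  )
})

results$plain
# "masked version"
results$qualified
# 2
```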

Summary

Overwriting happens when you assign a new value to a name that already exists in the active environment, replacing the old value.
Masking happens when you create an object that has the same name as an object further down the search path, hiding the object.

Back to functions

You now know everything you need to know to understand R’s scoping and execution rules for functions. R must follow a set of rules to execute the code in functions safely, without accidentally masking or overwriting existing variables.

Function Rules

When you call a function, R executes the code that is saved in the body of the function. To execute that code safely:

R creates a fresh environment to run the code in. I’ll call this environment the execution environment.
R sets the parent of the execution environment to the function’s enclosing environment, which is the environment where the function was first defined. This ensures that the function will use the same, predictable search path each time that it runs.
When R finishes running the function, R returns the result to the calling environment, which is the environment that was active when the function was called. R also makes the calling environment the active environment again, which removes the execution environment from the search path.
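The three steps above can be observed with base R. In this sketch, probe() and wrapper() are hypothetical functions written for the illustration: probe() reports its own execution environment, that environment's parent, and the environment it was called from (via parent.frame()).

```r
probe <- function() {
  list(
    execution = environment(),              # the fresh execution environment
    parent = parent.env(environment()),     # its parent
    caller = parent.frame()                 # the calling environment
  )
}

first <- probe()
second <- probe()

# Each call gets its own execution environment.
identical(first$execution, second$execution)
# FALSE

# The parent is always the function's enclosing environment.
identical(first$parent, environment(probe))
# TRUE

# When probe() is called from inside another function, the calling
# environment is that function's execution environment.
wrapper <- function() {
  here <- environment()
  caller <- probe()$caller
  identical(caller, here)
}
wrapper()
# TRUE
```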

R repeats these steps every time it runs a function. Let’s use some quizzes to unpack these steps and their implications.

Calling Environments

What would happen…?

Consider the foo() function:

foo <- function(z = 2) {
  z <- 3
  z
}

Could the execution environment stay active?

Suppose there is an object named z stored in the calling environment. In fact, z contains my password, "password123". Boy, am I glad that R runs the body of foo() in a fresh execution environment, where z <- 3 cannot overwrite my password!
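You can check this directly. The sketch below stores the password in the environment that calls foo(), runs foo(), and then looks at z again.

```r
# z in the calling environment.
z <- "password123"

foo <- function(z = 2) {
  z <- 3   # assigns to z in the execution environment only
  z
}

result <- foo(z = 4)

result
# 3

# The z in the calling environment is untouched.
z
# "password123"
```

Inside foo(), the argument z = 4 replaces the default of 2, and the assignment z <- 3 then overwrites that argument, so foo() returns 3 whatever value it is given.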

What if…?

The Call Stack

Since one R function can call another R function, an execution environment can become the calling environment for a second execution environment. If the second function calls a third function, then the second execution environment would become the calling environment for a third execution environment, and so on.

These chains of calling environments are known as the call stack. Let’s see one in action.

show_stack()

The show_stack() function comes in the envtutorial package, which is a package I made specifically for this tutorial. show_stack() shows the call stack at the moment it is called.

The call stack does not look very impressive when you call show_stack() directly from the active environment (which in my case is the global environment):

show_stack()

##    label                         name
## 1  <environment: 0x64264cd83250> ""  
## 2  <environment: 0x64264f129888> ""  
## 3  <environment: 0x64264cd6b9f8> ""  
## 4  <environment: 0x64264cbe5458> ""  
## 5  <environment: 0x64264cbe0e00> ""  
## 6  <environment: 0x64264990bd30> ""  
## 7  <environment: 0x64264990b2e8> ""  
## 8  <environment: 0x64264aa73728> ""  
## 9  <environment: 0x64264aa735a0> ""  
## 10 <environment: 0x64264ab33300> ""  
## 11 <environment: 0x64264ab7c9b8> ""  
## 12 <environment: 0x64264f128e78> ""  
## 13 <environment: 0x642648fe3bf0> ""  
## 14 <environment: 0x642648ff6128> ""  
## 15 <environment: 0x642648ce3f08> ""  
## 16 <environment: R_GlobalEnv>    ""

The first row of the result is the execution environment of show_stack(). The second row is the calling environment of show_stack().

A bigger stack

But it is easy to embed show_stack() in a series of functions. When I run the code below, the k() function will call the j() function, which will call the i() function, which will call show_stack().

i <- function() show_stack()
j <- function() i()
k <- function() j()
k()

##    label                         name
## 1  <environment: 0x64264dbb51d8> ""  
## 2  <environment: 0x64264dbb5280> ""  
## 3  <environment: 0x64264dbb52f0> ""  
## 4  <environment: 0x64264dbb5360> ""  
## 5  <environment: 0x64264f129888> ""  
## 6  <environment: 0x64264dbb5520> ""  
## 7  <environment: 0x64264d558e30> ""  
## 8  <environment: 0x64264d5563e8> ""  
## 9  <environment: 0x64264d3b8ce0> ""  
## 10 <environment: 0x64264d3b9418> ""  
## 11 <environment: 0x64264d12dbd8> ""  
## 12 <environment: 0x64264d12ddd0> ""  
## 13 <environment: 0x64264ab33300> ""  
## 14 <environment: 0x64264ab7c9b8> ""  
## 15 <environment: 0x64264f128e78> ""  
## 16 <environment: 0x642648fe3bf0> ""  
## 17 <environment: 0x642648ff6128> ""  
## 18 <environment: 0x642648ce3f08> ""  
## 19 <environment: R_GlobalEnv>    ""

Here, the first row is the execution environment of show_stack(). The second row is the calling environment of show_stack(), which is the execution environment of i(). The third row is the calling environment of i(), which is the execution environment of j(). The fourth row is the calling environment of j(), which is the execution environment of k(). And the final row is the calling environment of k(), which is the global environment.
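If you do not have the envtutorial package, base R's sys.calls() offers a rough look at the same stack. This is a sketch under different conventions, not a reimplementation of show_stack(): it returns the calls rather than the environments, and it lists the outermost call first instead of last.

```r
# Redefine i() to report the calls that are active when it runs.
i <- function() sys.calls()
j <- function() i()
k <- function() j()

calls <- k()

# Turn each call into a readable label.
labels <- vapply(as.list(calls), function(cl) deparse(cl)[1], character(1))

# The last three entries are the calls made by this code; any earlier
# entries come from whatever is running the code for you.
tail(labels, 3)
# "k()" "j()" "i()"
```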

Call stack life cycle

Picture call stacks expanding and then collapsing as R runs its code. R built the call stack above one environment at a time, first making an execution environment to run k(), then an execution environment to run j(), and so on.

After R ran the last function, show_stack(), R switched the active environment back to the calling environment of show_stack(), removing the execution environment of show_stack() from the call stack.

The calling environment of show_stack() was the execution environment of i(). When R finished running i(), R switched the active environment back to the calling environment of i(), removing the execution environment of i() from the call stack, and so on.

Eventually, R had finished running all of the code and had removed the execution environments one at a time until the call stack only contained the original active environment (here the global environment).

Call stacks and search paths

Masking in the call stack

Let’s call show_stack() in a different way:

i <- function() show_stack()
j <- function() {
  show_stack <- function() 1 + 1
  i()
}
k <- function() j()
k()

This time the j() function defines its own version of show_stack(), which will live in the calling environment of i() (i.e. in the call stack).

Believe it or not, show_stack() isn’t defined in the execution environment of i(). Instead, show_stack() is defined in the package environment for the envtutorial package. As a result, R needs to look up show_stack() in the same way that it looks up any other object.

If R uses the call stack as its search path, R will find and use the incorrect version of show_stack() that was created by j().

Enclosing environments

It would be a bad idea to use the call stack as the search path, since there is no way to police what might appear in the call stack that leads to a function. But what does R do instead?

Every function saves a reference to the environment where it was originally defined. This environment is known as the function’s enclosing environment.

You can look up a function’s enclosing environment by running environment() on the function, or by simply typing the name of the function: its enclosing environment will appear after its code body.

Click Submit Answer below to try it out.

environment(show_stack)
show_stack

Aside

You may have noticed that there is both a package environment and a namespace environment for the envtutorial package (and for every other package). The difference between package environments and namespace environments is very technical, and not important today. We won’t cover it.

If you would like to learn more about the difference between namespaces and package environments, you can read about them here.

Enclosing environments and the search path

Each time R runs a function, R will create a new execution environment; but each of these execution environments will use the same parent environment: the enclosing environment of the function.

As a result, the function will always run with the same search path, finding the same values for undefined variables each time. (Note that different functions will have different enclosing environments and hence different search paths).

In my example, i() was defined in the global environment, which became its enclosing environment.

environment(i)

## <environment: 0x64264f129888>

When R needed to look up show_stack() it first looked in the execution environment of i(), and then in the global environment, bypassing the execution environment of j().
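The same behavior can be reproduced without the envtutorial package. In the sketch below, helper() is a hypothetical stand-in for show_stack(), defined alongside i(), j(), and k().

```r
# The "real" function, defined in the same environment as i().
helper <- function() "original helper"

i <- function() helper()

j <- function() {
  # An impostor that lives only in j()'s execution environment.
  helper <- function() "impostor defined inside j()"
  i()
}

k <- function() j()

# i() looks up helper() through its enclosing environment,
# so it never sees the impostor in the call stack.
k()
# "original helper"
```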

What if you need an object from the calling environment?

Enclosing environments mean that the calling environment will (usually) not be on a function’s search path.
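When a function does need something from its caller, base R's parent.frame() returns the calling environment so that you can look there explicitly. The sketch below is one way to do it; peek(), naive(), caller(), and secret_for_sketch are made-up names, and the example assumes that no object named secret_for_sketch exists elsewhere in your session.

```r
# Looks for the object explicitly in the calling environment.
peek <- function() get("secret_for_sketch", envir = parent.frame())

# Relies on the ordinary search path, which skips the caller.
naive <- function() exists("secret_for_sketch")

caller <- function() {
  secret_for_sketch <- "defined inside caller()"
  list(naive = naive(), peek = peek())
}

found <- caller()

found$naive
# FALSE

found$peek
# "defined inside caller()"
```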

Congratulations

You have finished the most technical tutorial in this primer!

The remaining tutorials will show you how to write functions that do different things, like handle cases or iterate over loops.