Welcome

This tutorial will show you how to make and enhance bar charts with the ggplot2 package. You will learn how to:

  • make and interpret bar charts
  • customize bar charts with aesthetics and parameters
  • use position adjustments
  • use facets to create subplots

The tutorial is adapted from R for Data Science by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. You can purchase the book at shop.oreilly.com.

The tutorial uses the ggplot2 and dplyr packages, which have been pre-loaded for your convenience.

Bar Charts

How to make a bar chart

To make a bar chart with ggplot2, add geom_bar() to the ggplot2 template. For example, the code below plots a bar chart of the cut variable in the diamonds dataset, which comes with ggplot2.

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

The y axis

You should not supply a \(y\) aesthetic when you use geom_bar(); ggplot2 will count how many times each \(x\) value appears in the data, and then display the counts on the \(y\) axis. So, for example, the plot above shows that over 20,000 diamonds in the data set have a cut value of Ideal.

You can compute this information manually with the count() function from the dplyr package.

diamonds %>% 
  count(cut)
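
To check that the bars really show the same numbers that count() returns, one option is to pull the computed bar heights out of the plot with ggplot2's layer_data() function and compare them. This is only a sketch; the object names p, bar_heights, and manual_counts are made up for illustration.

```r
library(ggplot2)
library(dplyr)

# Build the bar chart, but keep it in an object instead of printing it
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# Heights that geom_bar() computed for each bar
bar_heights <- layer_data(p)$count

# Heights computed manually with count()
manual_counts <- diamonds %>%
  count(cut) %>%
  pull(n)

# TRUE if every bar matches its manual count
all(bar_heights == manual_counts)
```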

geom_col()

Sometimes, you may want to map the heights of the bars not to counts, but to a variable in the data set. To do this, use geom_col(), which is short for column.

ggplot(data = pressure) +
  geom_col(mapping = aes(x = temperature, y = pressure))

geom_col() data

When you use geom_col(), your \(x\) and \(y\) values should have a one-to-one relationship, as they do in the pressure data set (i.e. each value of temperature is paired with a single value of pressure).

pressure
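
If you are unsure whether a data set meets this condition, a quick check (a sketch, with the hypothetical name repeated_x) is to count how often each \(x\) value appears and keep only the values that occur more than once.

```r
library(dplyr)

# Count the rows for each temperature, then keep any temperature
# that appears more than once
repeated_x <- pressure %>%
  count(temperature) %>%
  filter(n > 1)

# Zero rows means each temperature is paired with a single pressure
nrow(repeated_x)
```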

Exercise 1 - Make a bar chart

Use the code chunk below to plot the distribution of the color variable in the diamonds data set, which comes in the ggplot2 package.

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color))

Exercise 2 - Interpretation


Exercise 3 - What went wrong?

Diagnose the error below and then fix the code chunk to make a plot.

ggplot(data = pressure) +
  geom_bar(mapping = aes(x = temperature, y = pressure))

Error in `geom_bar()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `setup_params()`:
! `stat_count()` must only have an x or y aesthetic.

geom_bar() computes the bar heights itself, so it cannot accept both an \(x\) and a \(y\) aesthetic. To map the bar heights to a variable, use geom_col() instead:

ggplot(data = pressure) +
  geom_col(mapping = aes(x = temperature, y = pressure))

Exercise 4 - count() and geom_col()

Recreate the bar graph of color from exercise one, but this time first use count() to manually compute the heights of the bars. Then use geom_col() to plot the results as a bar graph. Does your graph look the same as in exercise one?

diamonds %>% 
  count(color) %>% 
  ggplot() +
    geom_col(mapping = aes(x = color, y = n))

Aesthetics

Aesthetics for bars

geom_bar() and geom_col() can use several aesthetics:

  • alpha
  • color
  • fill
  • linetype
  • linewidth (called size before ggplot2 3.4.0)

One of these, color, creates the most surprising results. Predict what the code below will return and then run it.

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, color = cut))

fill

The color aesthetic controls the outline of each bar in your bar plot, which may not be what you want. To color the interior of each bar, use the fill aesthetic.

Use the code chunk below to experiment with fill, along with other geom_bar() aesthetics, like alpha, linetype, and linewidth (size before ggplot2 3.4.0).

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, color = cut))
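
One possible result of this experiment is sketched below. It maps fill to cut inside aes() and sets the other aesthetics to fixed values outside aes(); the specific values ("black", 0.7, "dashed") and the name p_fill are arbitrary choices for illustration.

```r
library(ggplot2)

p_fill <- ggplot(data = diamonds) +
  geom_bar(
    mapping = aes(x = cut, fill = cut),  # interior color varies with cut
    color = "black",                     # same outline color for every bar
    alpha = 0.7,                         # slightly transparent bars
    linetype = "dashed"                  # dashed outlines
  )

p_fill
```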

Width

You can control the width of each bar in your bar chart with the width parameter. In the chunk below, set width = 1, then width = 0.5. Can you spot the difference?

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut), width = 0.9)

Notice that width is a parameter, not an aesthetic mapping. Hence, you should set width outside of the aes() function.

Exercise 5 - aesthetics

Create a colored bar chart of the class variable from the mpg data set, which comes with ggplot2. Map the interior color of each bar to class.

ggplot(data = mpg) +
  geom_bar(mapping = aes(x = class, fill = class))

Position adjustments

Positions

If you map fill to a new variable, geom_bar() will display a stacked bar chart:

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity))

This plot displays 40 different combinations of cut and clarity, each displayed by its own rectangle. geom_bar() lays out the rectangles by stacking rectangles that have the same cut value on top of one another. You can change this behavior with a position adjustment.

Position = “dodge”

To place rectangles that have the same cut value beside each other, set position = "dodge".

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")

This plot shows the same rectangles as the previous chart; however, it lays out rectangles that have the same cut value beside each other.

Position = “stack”

To create the familiar stacked bar chart, set position = "stack" (which is the default for geom_bar()).

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "stack")

Position = “fill”

To expand each bar to take up the entire \(y\) axis, set position = "fill". ggplot2 will stack the rectangles and then scale them within each bar.

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

This makes it easy to compare proportions. For example, you can scan across the bars to see how the proportion of IF diamonds changes from cut to cut.
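
The proportions that position = "fill" displays can also be computed by hand, which is a useful cross-check. The sketch below uses dplyr; the names clarity_props and prop are chosen here for illustration.

```r
library(ggplot2)  # provides the diamonds data set
library(dplyr)

clarity_props <- diamonds %>%
  count(cut, clarity) %>%        # one row per cut and clarity combination
  group_by(cut) %>%
  mutate(prop = n / sum(n)) %>%  # share of each clarity within its cut
  ungroup()

# Proportion of IF diamonds for each cut
clarity_props %>%
  filter(clarity == "IF")
```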

What is a position adjustment?

Every geom function in ggplot2 takes a position argument that is preset to a reasonable default. You can use position to determine how a geom should adjust objects that would otherwise overlap with each other.

For example, in our plot, each value of cut is associated with eight rectangles: one each for I1, SI2, SI1, VS2, VS1, VVS2, VVS1, and IF. Each of these eight rectangles deserves to go in the same place: directly above the value of cut that it is associated with, with the bottom of the rectangle placed at count = 0. But if we drew the plot like that, the rectangles would overlap each other.

Here’s what that would look like if you could peek around the side of the graph.

Position = “identity”

And here’s what that would look like if you could see the graph from the front. You can make this plot by setting position = "identity".

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "identity")

Position adjustments tell ggplot2 how to redistribute objects when they overlap. position = "identity" is the “adjustment” that lets objects overlap each other. It is a bad choice for bar graphs because the result looks like a stacked bar chart, even though it is not.
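
One way to see the overlap is to make the bars partly transparent and to compare where each rectangle starts under the two adjustments. The sketch below relies on layer_data() from ggplot2 to read the computed rectangle positions; the alpha value and the names p_identity and p_stack are arbitrary.

```r
library(ggplot2)

# Transparent bars reveal that the rectangles sit on top of each other
p_identity <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity),
           position = "identity", alpha = 0.2)

p_stack <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "stack")

# With "identity", does every rectangle start at count = 0?
all(layer_data(p_identity)$ymin == 0)

# With "stack", do some rectangles start above count = 0?
any(layer_data(p_stack)$ymin > 0)
```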

Exercise 6 - Positions

Use the code chunk to recreate the plot you see below. Remember: color is the name of a variable in diamonds (not to be confused with an aesthetic).

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color, fill = clarity), position = "fill", width = 1)

Exercise 7 - Positions

Use the code chunk to recreate the plot you see below. Remember: color is the name of a variable in diamonds (not to be confused with an aesthetic).

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color, fill = cut), position = "dodge")

Exercise 8 - position = “identity”

Why is position = "identity" a bad idea?

Facets

Facetting

You can more easily compare subgroups of data if you place each subgroup in its own subplot, a process known as facetting.

facet_grid()

ggplot2 provides two functions for facetting. facet_grid() divides the plot into a grid of subplots based on the values of one or two facetting variables. To use it, add facet_grid() to the end of your plot call.

The code chunks below show three ways to facet with facet_grid(). Spot the differences between the chunks, then run the code to learn what the differences do.

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color)) +
  facet_grid(clarity ~ cut)

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color)) +
  facet_grid(. ~ cut)

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color)) +
  facet_grid(clarity ~ .)

facet_grid() recap

As you saw in the code examples, you use facet_grid() by passing it a formula, the names of two variables connected by a ~.

facet_grid() will split the plot into rows by the values of the first variable (the one before the ~): each facet will contain the observations that have a common value of the variable. facet_grid() will split the plot into columns by the values of the second variable (the one after the ~). The result is a grid of facets, where each subplot shows a specific combination of values.

If you do not wish to split on the vertical or horizontal dimension, pass facet_grid() a . instead of a variable name as a placeholder.
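
Because the first variable defines the rows and the second defines the columns, you can predict the size of the grid from the number of levels in each variable. The sketch below does this for facet_grid(clarity ~ cut); the names n_rows, n_cols, and p_grid are made up for illustration.

```r
library(ggplot2)

# clarity comes before the ~, so it defines the rows;
# cut comes after the ~, so it defines the columns
n_rows <- nlevels(diamonds$clarity)
n_cols <- nlevels(diamonds$cut)
c(rows = n_rows, cols = n_cols, panels = n_rows * n_cols)

p_grid <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color)) +
  facet_grid(clarity ~ cut)

# Number of panels that actually contain bars
length(unique(layer_data(p_grid)$PANEL))
```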

facet_wrap()

facet_wrap() provides a more relaxed way to facet a plot on a single variable. It will split the plot into subplots and then reorganize the subplots into multiple rows so that each plot has a more or less square aspect ratio. In short, facet_wrap() wraps the single row of subplots that you would get with facet_grid() into multiple rows.

To use facet_wrap() pass it a single variable name with a ~ before it, e.g. facet_wrap( ~ color).

Add facet_wrap() to the code below to create the graph that appeared at the start of this section. Facet on cut.

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color, fill = cut))

ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color, fill = cut)) +
  facet_wrap( ~ cut)
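
If the default arrangement is not the one you want, facet_wrap() also accepts nrow and ncol arguments that fix the number of rows or columns. These arguments are not covered elsewhere in this tutorial, so treat the sketch below as an optional extra; nrow = 2 and the name p_wrap are arbitrary choices.

```r
library(ggplot2)

# Wrap the five cut facets into two rows
p_wrap <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = color, fill = cut)) +
  facet_wrap(~ cut, nrow = 2)

p_wrap
```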

scales

By default, each facet in your plot will share the same \(x\) and \(y\) ranges. You can change this by adding a scales argument to facet_wrap() or facet_grid().

  • scales = "free" will let the \(x\) and \(y\) range of each facet vary

  • scales = "free_x" will let the \(x\) range of each facet vary, but not the \(y\) range

  • scales = "free_y" will let the \(y\) range of each facet vary, but not the \(x\) range. This is a convenient way to compare the shapes of different distributions:

    ggplot(data = diamonds) +
      geom_bar(mapping = aes(x = color, fill = cut)) +
      facet_wrap( ~ cut, scales = "free_y")

Recap

In this tutorial, you learned how to make bar charts, but much of what you learned applies to other types of charts as well. Here’s what you should know:

  • Bar charts are the basis for histograms, which means that you can interpret histograms in a similar way.
  • Bars are not the only geom in ggplot2 that uses the fill aesthetic. You can use both fill and color aesthetics with any geom that has an “interior” region.
  • You can use the same position adjustments with any ggplot2 geom: "identity", "stack", "dodge", "fill", "nudge", and "jitter" (we’ll learn about "nudge" and "jitter" later). Each geom comes with its own sensible default.
  • You can facet any ggplot2 plot by adding facet_grid() or facet_wrap() to the plot call.

Bar charts are an excellent way to display the distribution of a categorical variable. In the next tutorial, we’ll meet a set of geoms that display the distribution of a continuous variable.
