Creating and using custom ggplot2 themes

The best way to make each plot your own.

ggplot2
data visualization
theme
tidyverse
NFL
espnscrapeR
Author

Thomas Mock

Published

December 26, 2020

Graphs for Communication

ggplot2 is an R package for producing statistical graphics, and is an implementation of the “Grammar of Graphics” by Leland Wilkinson. As such the primary way that folks use ggplot2 initially is for robust, fast, and easy exploratory data analysis or the creation of scientific/statistical plots while analyzing data. Here the primary use-case is for the data scientist themself, and the quick iteration of graphs. Relatively little time is spent on adjusting the theme or on making the graph “beautiful” as it may be ephemeral, or simply a visual check of the underlying relationships.

The next stage in many analyses is creating a graphic primarily intended for others to consume. At this point, the graphic needs to be more engaging, and often needs to be more focused on telling a “story” or reinforcing some point that you are trying to present from the underlying data.

In this blogpost, I’ll be covering some example themes, how to create and apply a theme, and then walk through some real life applications. I’ll leave larger color theory, and creating your own ggplot2 color scales for a future blogpost.

Themes

ggplot2 is remarkably extensible and customizable both through specific graphical components (geom_, scale_, aes, etc) or by theme components (grid lines, background colors, fonts, etc). There is also the concept of fully established themes which change many theme components at once.

First, we’ll get some data and load our libraries of interest.

We’re returning some data on NFL win rates, playoff seeding, and point differentials (ie difference between points given up and points scored).

nfl_stand <- 2014:2020 %>% 
  map_dfr(get_nfl_standings)
Returning 2014
Returning 2015
Returning 2016
Returning 2017
Returning 2018
Returning 2019
Returning 2020
diff_df <- nfl_stand %>% 
  select(season = season, conf, team_name, team_abb = team_abb, team_logo, win_pct, pts_diff) %>% 
  arrange(season, conf, desc(win_pct))

diff_df
# A tibble: 224 × 7
   season conf  team_name team_abb team_logo                    win_pct pts_diff
    <int> <chr> <chr>     <chr>    <chr>                          <dbl>    <dbl>
 1   2014 AFC   Patriots  NE       https://a.espncdn.com/i/tea…   0.75       155
 2   2014 AFC   Broncos   DEN      https://a.espncdn.com/i/tea…   0.75       128
 3   2014 AFC   Steelers  PIT      https://a.espncdn.com/i/tea…   0.688       68
 4   2014 AFC   Colts     IND      https://a.espncdn.com/i/tea…   0.688       89
 5   2014 AFC   Bengals   CIN      https://a.espncdn.com/i/tea…   0.656       21
 6   2014 AFC   Ravens    BAL      https://a.espncdn.com/i/tea…   0.625      107
 7   2014 AFC   Chiefs    KC       https://a.espncdn.com/i/tea…   0.562       72
 8   2014 AFC   Texans    HOU      https://a.espncdn.com/i/tea…   0.562       65
 9   2014 AFC   Chargers  SD       https://a.espncdn.com/i/tea…   0.562        0
10   2014 AFC   Bills     BUF      https://a.espncdn.com/i/tea…   0.562       54
# … with 214 more rows

theme_grey()

The default ggplot2 theme is theme_grey(), it’s major components are a grey panel.background, white panel.grid lines, coupled with a white plot.background, black text and a default black color for geom_ components.

The signature ggplot2 theme with a grey background and white gridlines, designed to put the data forward yet make comparisons easy.

ggplot(diff_df, aes(x = pts_diff, y = win_pct)) +
  geom_point()

theme_minimal()

This theme drops the panel.background and no longer has the visual separation between the panel vs plot areas.

A minimalistic theme with no background annotations.

ggplot(diff_df, aes(x = pts_diff, y = win_pct)) +
  geom_point() +
  theme_minimal()

theme_bw()

It’s major components are a white panel.background, grey panel.grid lines, coupled with a white plot.background, black text and a default black color for geom_ components.

The classic dark-on-light ggplot2 theme. May work better for presentations displayed with a projector.

ggplot(diff_df, aes(x = pts_diff, y = win_pct)) +
  geom_point() +
  theme_bw()

Other themes

There are several other themes built into ggplot2, and other packages that provide new themes. The most well-known external theme package is ggthemes, which provides a number of Data Journalism themes such as theme_few(), theme_fivethirtyeight(), theme_economist() and others.

These are implementations of very opinionated frameworks, and highlight the extensibility of the ggplot2 theme components.

ggplot(diff_df, aes(x = pts_diff, y = win_pct)) +
  geom_point() +
  ggthemes::theme_fivethirtyeight()

ggplot(diff_df, aes(x = pts_diff, y = win_pct)) +
  geom_point() +
  ggthemes::theme_economist()

ggplot2 theme components

You can always change specific theme components one at a time or in conjunction with a proper theme_. For example, we can apply the theme_fivethirtyeight() theme and then change one of the theme components individually.

ggplot(diff_df, aes(x = pts_diff, y = win_pct)) +
  geom_point() +
  ggthemes::theme_fivethirtyeight() +
  theme(
    panel.grid.major = element_line(color = "red")
  )

There are dozens of customizable theme components. They are outlined in the ggplot2 documentation. A deep walkthrough of a lot of ggplot2 customizations can be found on Cedric Scherer’s blog. Kieran Healy’s Data Visualization: A practical introduction covers additional considerations for refining and customizing plots and why certain decisions are good.

ggplot2 theme elements

For each of the theme components there are one of a few theme elements. For example, to change the axis grid line color, you’ll use element_line() as in the following pseudocode:

plot_object +
  theme(panel.grid.major = element_line(color = "red"))

For the theme elements, there are:

  • element_line() - change line element components, takes arguments like color, size, linetype (dotted, dashed, solid, etc)

  • element_rect() - change rectangular components like plot backgrounds, legend backgrounds, etc, takes arguments like fill, color, size

  • element_text() - change text components like axis labels, titles, and takes arguments like family (font family), face (bold, italics, etc), hjust/vjust (horizontal or vertical alignment), color, etc

  • element_blank() - completely remove an element by name

  • margin() - adjust margins of an element, can be used within some other theme componenets, and takes arguments of t (top), r (right), b (bottom), l (left), and unit (unit such as points, in, cm, etc)

  • rel() - relative sizing of elements, useful for text especially, ie choosing a base font size and scaling the titles vs body fonts relative to each other

You can always refer to the ggplot2 documentation for which theme element to use, but you can also typically “guess” which is the right one to use by referring to the output type (ie text uses element_text(), lines use element_line(), etc).


Inspiration

For inspiration, I often find that “Stealing like an artist” is my favorite approach.

Your job is to collect good ideas. The more good ideas you collect, the more you can choose from to be influenced by.

Plagiarism is trying to pass someone else’s work off as your own. Copying is about reverse-engineering.

With this approach, you are allowing yourself to be influenced by “good” work, and learning how to implement the approaches of experts while making it your own. I find that FiveThirtyEight as an organization produces great graphics and tables, and thus I take a lot of influence from their approaches with my own novel data. It helps that they do produce some of their graphics in ggplot2!

That being said, projects like #TidyTuesday or #StorytellingWithData can also give you a much more focused batch of concepts to borrow from as the focus is on applying concepts as a community with either a similar output or similar starting data.


A DataViz Journey

Let’s begin our journey by telling two stories, and working through the refinement of plots to help tell your story.

Story one is a continuation of our data above. We’re exploring the relationship between scoring points (offense), preventing the opposing team from scoring points (defense), and how this relates to wins. This is a relatively obvious story, but has some neat “sub” stories we can look into.

The second story will be about a specific team (the Indianapolis Colts), and how their approach to defense is effective while being very different than other teams.

Note that since this is using live data feeds, to recreate the plots exactly as they are you’d need to read in the data I used below.

diff_df <- readr::read_csv("https://raw.githubusercontent.com/jthomasmock/radix_themockup/master/static/diff_df.csv")
combo_pass <- readr::read_csv("https://raw.githubusercontent.com/jthomasmock/radix_themockup/master/static/combo_pass.csv")

Point pts_diff

Point differential is a simple metric (points scored - points against) that is pretty predictive of overall win rate and making the playoffs. Teams that score more points than the other team typically win more games, which is fairly obvious!

However simply returning the plot below may not be as engaging to a lay audience or tell some of the nuance of what happens in individual seasons.

Data Collection and Plot
nfl_stand <- 2014:2020 %>% 
  map_dfr(get_nfl_standings)
Returning 2014
Returning 2015
Returning 2016
Returning 2017
Returning 2018
Returning 2019
Returning 2020
nfl_stand_plot <- nfl_stand %>% 
  ggplot(aes(x = pts_diff, y = win_pct)) +
  geom_point() +
  geom_smooth(method = "lm")
nfl_stand_plot
`geom_smooth()` using formula 'y ~ x'

That plot told us there is a fairly linear response between having a larger point differential and having a better win rate, but does winning games and scoring more points always guarantee a playoff spot or even possibly missing the playoffs?

Playoff Teams

Let’s add a color that shows where playoff teams fit into the big picture. Note that in 2020 teams with a ranking of 7th or better make the playoffs vs previously only the top 6 teams made it.

Add Color
nfl_stand %>% 
  mutate(
    color = case_when(
      season < 2020 & seed <= 6 ~ "blue",
      season == 2020 & seed <= 7 ~ "blue",
      TRUE ~  "red"
    )
  ) %>% 
  ggplot(aes(x = pts_diff, y = win_pct)) +
  geom_vline(xintercept = 0, size = 0.75, color = "#737373") +
  geom_point(aes(color = color))

Add some color

Uh-oh… what happened with the colors??? We told it to be blue and red but it reversed the colors!

We needed a scale_color_identity() call. While we’re adding color, let’s also add some “color commentary” of the axes, a title, and a source caption. There we go! Now we can show that: “Playoff teams typically have a positive point differential”. I’ll also bump up the size of the points to bump up the “ink to white” ratio and some transparency (alpha) so that overlapping points are clear.

Lastly, I’ve added a vertical line at the 0 mark to clearly indicate the transition from negative to positive point differential.

Proper Colors and Text
nfl_stand %>% 
  mutate(
    color = case_when(
      season < 2020 & seed <= 6 ~ "blue",
      season == 2020 & seed <= 7 ~ "blue",
      TRUE ~  "red"
    )
  ) %>% 
  ggplot(aes(x = pts_diff, y = win_pct)) +
  geom_vline(xintercept = 0, size = 0.75, color = "#737373") +
  geom_point(
    aes(color = color),
    size = 3, alpha = 0.8
    ) +
  scale_color_identity() +
  labs(x = "Points Differential", y = "Win Percent",
       title = "Playoff teams typically have a positive point differential",
       subtitle = "Data through week 15 of the 2020 NFL Season",
       caption = "Plot: @thomas_mock | Data: ESPN")

Refine labels

Now, because we don’t have a legend for the colors, let’s add some direct labels to indicate what’s going on. I’ll use ggtext to add nicely formatted color text. Note that I’ve also borrowed the NFL shield’s blue and red hex color code to add a bit more engaging color for both the labels and points. Lastly, I’ve also converted the y-axis from decimals to proper percent labels via scales::percent_format() and some better breakpoints.

This is starting to look a lot better, but I’m not a huge fan of the base theme.

Refine Labels
library(ggtext)

playoff_label_scatter <- tibble(
  pts_diff = c(25,-125), y = c(0.3, 0.8), 
  label = c("Missed<br>Playoffs", "Made<br>Playoffs"),
  color = c("#D50A0A", "#013369")
)


playoff_diff_plot <- nfl_stand %>% 
  mutate(
    color = case_when(
      season < 2020 & seed <= 6 ~ "#013369",
      season == 2020 & seed <= 7 ~ "#013369",
      TRUE ~  "#D50A0A"
    )
  ) %>% 
  ggplot(aes(x = pts_diff, y = win_pct)) +
  geom_vline(xintercept = 0, size = 0.75, color = "#737373") +
  geom_hline(yintercept = 0, size = 0.75, color = "#737373") +
  geom_point(
    aes(color = color),
    size = 3, alpha = 0.8
    ) +
  ggtext::geom_richtext(
    data = playoff_label_scatter,
    aes(x = pts_diff, y = y, label = label, color = color),
    fill = "#f0f0f0", label.color = NA, # remove background and outline
    label.padding = grid::unit(rep(0, 4), "pt"), # remove padding
    family = "Chivo", hjust = 0.1, fontface = "bold",
    size = 8
  ) +
  scale_color_identity() +
  labs(x = "Points Differential", y = "Win Percent",
       title = "Playoff teams typically have a positive point differential",
       subtitle = "Data through week 15 of the 2020 NFL Season",
       caption = str_to_upper("Plot: @thomas_mock | Data: ESPN")) +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1),
    breaks = seq(.0, 1, by = .10)
    ) +
  scale_x_continuous(
    breaks = seq(-200, 250, by = 50)
  )
playoff_diff_plot

Premade themes

Now, as mentioned earlier we can use themes to change theme components in bulk. Since we’re interested in some FiveThirtyEight style plots, let’s try the theme_fivethirtyeight() from ggthemes.

This looks relatively close to the right overall style, but we lost axis labels, and the fonts are still a bit basic. Let’s try and build our own!

playoff_diff_plot +
  ggthemes::theme_fivethirtyeight() 

Custom Themes

There are two ways of building your own themes:

  • theme() - Add theme components individually
  • theme_?? %+replace% - Apply an existing theme and overwriting components of it

theme()

Let’s first just build up the theme components, we’ll be attempting to recreate the FiveThirtyEight theme from scratch. I’m focusing on the minimal changes necessary to recreate the plotting style. We need a lightgrey background, grey gridlines, larger text with a specific font. Note that I’m using systemfonts to load all my system font libraries into R.

A quick example of a FiveThirtyEight style plot:

In short, a few concepts that make up the FiveThirtyEight “style guide”:

  • Focus on Web, ie relatively small graphics
  • Light smoke-grey background with grey gridlines
  • Black Plot Titles/Subtitles and Axis Labels
  • Grey axis text (ie numbers on axis)
  • LARGE plot titles and axis labels, with medium subtitles and axis text
  • Always add a source
  • Bright colors for plots

There’s a bit more fine details to what makes their plots so good but that covers the big parts we’ll try to capture in our theme. Some of the nuance is up to the individual to enact (ie choosing fonts, specific colors, just how big to make the fonts/points/etc).

The first pass at a theme that we’ve created to match that style. I’ve added comments as to what different theme elements change.

Custom Theme Code
theme_538 <- function(..., base_size = 12) {
  
    theme(
      # plotting components
      
      ## drop minor gridlines
      panel.grid.minor = element_blank(),
      # change grid lines to gray
      panel.grid.major =  element_line(color = "#d0d0d0"),
      # fill the plot and panel spaces with grey and remove border
      panel.background = element_rect(fill = "#f0f0f0", color = NA),
      plot.background = element_rect(fill = "#f0f0f0", color = NA),
      panel.border = element_blank(),
      # remove strip background
      strip.background = element_blank(),
      # adjust the margins of plots and remove axis ticks
      plot.margin = margin(0.5, 1, 0.5, 1, unit = "cm"),
      axis.ticks = element_blank(),
      # change text family, size, and adjust position of titles
      text = element_text(family = "Chivo", size = base_size),
      axis.text = element_text(face = "bold", color = "grey", size = base_size),
      axis.title = element_text(face = "bold", size = rel(1.33)),
      axis.title.x = element_text(margin = margin(0.5, 0, 0, 0, unit = "cm")),
      axis.title.y = element_text(margin = margin(0, 0.5, 0, 0, unit = "cm"), angle =90),
      plot.title = element_text(face = "bold", size = rel(1.67), hjust = 0),
      plot.title.position = "plot",
      plot.subtitle = element_text(size = 16, margin = margin(0.2, 0, 1, 0, unit = "cm"), hjust = 0),
      plot.caption = element_text(size = 10, margin = margin(1, 0, 0, 0, unit = "cm"), hjust = 1),
      strip.text = element_text(size = rel(1.33), face = "bold"),
      ...
    )
}

Now let’s apply the theme to our plot.

playoff_diff_plot +
  theme_538()

I think we’ve created a nice representation of the FiveThirtyEight style!

theme_?? %+replace%

The %+replace% version instead applies a theme and then completely replaces those components rather than adding to them. This is typically more robust, but requires you to specific more arguments as they’re removed otherwise. For our purposes the code inside theme() is identical but there are situations where building up by theme() alone requires less initial work but is less robust to “real life” edge cases.

%+replace% theme
theme_539 <- function(base_size = 12, base_family = "Chivo") {
  
  theme_grey(base_size = base_size, base_family = base_family) %+replace%
    theme(
      panel.grid.minor = element_blank(),
      axis.ticks = element_blank(),
      text = element_text(family = "Chivo", size = base_size),
      axis.text = element_text(face = "bold", color = "grey", size = base_size),
      axis.title = element_text(face = "bold", size = rel(1.33)),
      axis.title.x = element_text(margin = margin(0.5, 0, 0, 0, unit = "cm")),
      axis.title.y = element_text(margin = margin(0, 0.5, 0, 0, unit = "cm"), angle =90),
      plot.title = element_text(face = "bold", size = rel(1.67), hjust = 0),
      plot.title.position = "plot",
      plot.subtitle = element_text(size = 16, margin = margin(0.2, 0, 1, 0, unit = "cm"), hjust = 0),
      plot.caption = element_text(size = 10, margin = margin(1, 0, 0, 0, unit = "cm"), hjust = 1),
      plot.background = element_rect(fill = "#f0f0f0", color = NA),
      panel.background = element_rect(fill = "#f0f0f0", color = NA),
      panel.grid.major =  element_line(color = "#d0d0d0"),
      panel.border = element_blank(),
      plot.margin = margin(0.5, 1, 0.5, 1, unit = "cm"),
      strip.background = element_blank(),
      strip.text = element_text(size = rel(1.33), face = "bold")
    )
}
playoff_diff_plot +
  theme_539()

Same Data, Different Story

The Same Data, Different Story (SDDS) means we can use the same data in a different plot to tell a slightly different story. We can now represent theNow we’re going to use our theme and the same data, but represent the data a bit differently. We’ll load the ggridges package to let us plot many distributions at once. Again we’re adjusting the colors, plotting the data, and adding our theme. Overall this looks pretty excellent out of the gate! This is why themes + custom colors are so useful as you can get 80% of the way there with just those two changes.

We can see that for most years playoff teams are more frequently with positive point differentials and non-playoff teams have negative differentials, but there are clear cases (2014, 2016) where the story is not as clean!

library(ggridges)

stand_density <- nfl_stand %>% 
  mutate(
    color = case_when(
      season < 2020 & seed <= 6 ~ "#013369",
      season == 2020 & seed <= 7 ~ "#013369",
      TRUE ~  "#D50A0A"
    )
  ) %>% 
  ggplot(aes(x = pts_diff, y = factor(season), color = color, fill = color)) +
  geom_vline(xintercept = 0.5, size = 0.75, color = "#737373") +
  geom_density_ridges(alpha = 0.8, scale = 1.1) +
  scale_color_identity(aesthetics = c("fill", "color")) +
  theme_538()

stand_density
Picking joint bandwidth of 30.5

Add context

We can add a bit more context with custom embedded labels, a title, subtitle, etc and by adjusting some of the x/y breaks. I’m going to completely drop the y-gridlines since they’re not needed. Note that for the custom annotations that factors are represented essentially as discrete ordered integers, so for the 7 years we have plotted we can put a label just above the last year by plotting at 7.5 (just above the 7th year/2020).

Add Context
# create a small dataset for the custom annotations
playoff_label_ridge <- tibble(
  y = c(7.75, 7.75), pts_diff = c(-250,150),
  label = c("Missed<br>Playoffs", "Made<br>Playoffs"),
  color = c("#D50A0A", "#013369")
)

stand_density +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  coord_cartesian(xlim = c(-250, 250)) +
  ggtext::geom_richtext(
    data = playoff_label_ridge,
    aes(x = pts_diff, y = y, label = label, color = color),
    fill = "#f0f0f0", label.color = NA, # remove background and outline
    label.padding = grid::unit(rep(0, 4), "pt"), # remove padding
    family = "Chivo", hjust = 0 , fontface = "bold",
    size = 6
  ) +
  theme_538() + 
  theme(panel.grid.major.y = element_blank()) +
  labs(
    x = "Point Differential", y = "",
    title = "Playoff teams typically have a positive point differential",
    subtitle = "Data through week 15 of the 2020 NFL Season",
    caption = "Plot: @thomas_mock | Data: ESPN"
    )
Picking joint bandwidth of 30.5

Picking joint bandwidth of 30.5

And if you want to look at the full code to create this all at once, see the expandable tag below.

Full code
library(ggridges)

playoff_label_ridge <- tibble(
  y = c(7.5, 7.5), pts_diff = c(-225,150),
  label = c("Missed<br>Playoffs", "Made<br>Playoffs"),
  color = c("#D50A0A", "#013369")
)


nfl_stand %>% 
  mutate(
    color = case_when(
      season < 2020 & seed <= 6 ~ "#013369",
      season == 2020 & seed <= 7 ~ "#013369",
      TRUE ~  "#D50A0A"
    )
  ) %>% 
  ggplot(aes(x = pts_diff, y = factor(season), color = color, fill = color)) +
  geom_vline(xintercept = 0.5, size = 0.75, color = "#737373") +
  geom_density_ridges(alpha = 0.8, scale = 1.1) +
  ggtext::geom_richtext(
    data = playoff_label_ridge,
    aes(x = pts_diff, y = y, label = label, color = color),
    fill = "#f0f0f0", label.color = NA, # remove background and outline
    label.padding = grid::unit(rep(0, 4), "pt"), # remove padding
    family = "Chivo", hjust = 0 , fontface = "bold",
    size = 6
  ) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  coord_cartesian(xlim = c(-250, 250)) +
  scale_color_identity(aesthetics = c("fill", "color")) +
  theme_538() +
  theme(panel.grid.major.y = element_blank()) +
  labs(
    x = "Point Differential", y = "",
    title = "Playoff teams typically have a positive point differential",
    subtitle = "Data through Week 15 of the 2020 NFL Season",
    caption = "Plot: @thomas_mock | Data: ESPN"
    )
Picking joint bandwidth of 30.5


SDDS 2: Rise of 2020

We can go one step deeper on the Same Data, Different Story principle and plot the data just a bit differently. Let’s plot the playoff teams for 2020 (top 7), and the top two “on-the-bubble” teams. I’ve added the theme_538(), and a horizontal line at the 0 mark again. Note that I’m using tidytext::reorder_within() to reorder the teams by their playoff seed within a conference. reorder() would work here as well, so tidytext::reorder_within() isn’t truly necessary, but can be very useful when re-ordering columns across facets.

stand_df <- nfl_stand %>% 
  filter(season == 2020)

stand_df %>% 
  filter(seed <= 12) %>% 
  ggplot(aes(x = tidytext::reorder_within(team_abb, seed, conf), y = pts_diff)) +
  geom_col() + 
  tidytext::scale_x_reordered() +
  facet_grid(~conf, scales = "free_x") +
  geom_hline(yintercept = 0, size = 0.75, color = "#737373") +
  theme_538()

Add Context

Rather than labeling the x-axis with team, we could supply playoff seed. We’ll add our titles, captions, etc along with a vertical line for separation of the playoff vs non-playoff teams, and a vertical line to separate the Y-axis from the X-axis baseline. We’ve also expanded the break points on the y-axis and dropped the x-axis gridlines.

Add Context
# Small label dataset
playoff_label <- tibble(
  seed = c(9, 4),
  pts_diff = c(30, 100),
  conf = c("AFC", "AFC"),
  label = c("Outside<br>looking in", "Playoff<br>teams"),
  color = c("#D50A0A", "#013369")
)

stand_df %>%
  filter(seed <= 12) %>%
  ggplot(aes(x = as.factor(seed), y = pts_diff)) +
  geom_col(
    aes(fill = if_else(seed <= 7, "#013369", "#D50A0A")),
    width = 0.8
  ) +
  ggtext::geom_richtext(
    data = playoff_label,
    aes(label = label, color = color),
    fill = "#f0f0f0",
    label.color = NA,
    # remove background and outline
    label.padding = grid::unit(rep(0, 4), "pt"),
    # remove padding
    family = "Chivo",
    hjust = 0.1,
    fontface = "bold",
    size = 6
  ) +
  geom_hline(yintercept = 0, size = 0.75, color = "#737373") +
  geom_vline(xintercept = 7.5, size = 1, color = "grey") +
  geom_vline(xintercept = 0.5, size = 0.75, color = "#737373") +
  facet_grid(~conf, scales = "free_x") +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
  scale_fill_identity(aesthetics = c("fill", "color")) +
  theme_538() +
  theme(panel.grid.major.x = element_blank()) +
  labs(
    x = "Playoff Seed",
    y = "Points Differential",
    title = "Playoff teams typically have a positive point differential",
    subtitle = "Data through week 15 of the 2020 NFL Season",
    caption = "Plot: @thomas_mock | Data: ESPN"
  )

Team vs Seed

By converting the x-axis to playoff seed instead of team, we’ve lost some data here, specifically which teams are which. We could add these back with by adding in an image at the end of each “bar”.

Add images
link_to_img <- function(x, width = 25) {
  glue::glue("<img src='{x}' width='{width}'/>")
}

playoff_label <- tibble(
  seed = c(9, 4),
  pts_diff = c(90, 110),
  conf = c("AFC", "AFC"),
  label = c("Outside<br>looking in", "Playoff<br>teams"),
  color = c("#D50A0A", "#013369")
)

stand_plot_logo <- stand_df %>%
  filter(seed <= 12) %>%
  mutate(label = link_to_img(team_logo, width = 25)) %>%
  ggplot(aes(x = as.factor(seed), y = pts_diff)) +
  geom_col(
    aes(fill = if_else(seed <= 7, "#013369", "#D50A0A")),
    width = 0.8
  ) +
  ggtext::geom_richtext(
    data = playoff_label,
    aes(label = label, color = color),
    fill = "#f0f0f0",
    label.color = NA,
    # remove background and outline
    label.padding = grid::unit(rep(0, 4), "pt"),
    # remove padding
    family = "Chivo",
    hjust = 0.1,
    fontface = "bold",
    size = 5
  ) +
  geom_richtext(
    aes(
      x = seed,
      y = pts_diff,
      label = label,
      vjust = if_else(pts_diff <= 1, 1.1, -0.1)
    ),
    size = 1,
    fill = "#f0f0f0",
    label.color = NA,
    # remove background and outline
    label.padding = grid::unit(rep(0, 4), "pt") # remove padding
  ) +
  geom_hline(yintercept = 0, size = 0.75, color = "#737373") +
  geom_vline(xintercept = 7.5, size = 1, color = "grey") +
  geom_vline(xintercept = 0.5, size = 0.75, color = "#737373") +
  facet_grid(~conf, scales = "free_x") +
  scale_y_continuous(
    breaks = seq(-125, 125, by = 25),
    limits = c(-130, 130)
  ) +
  scale_fill_identity(aesthetics = c("fill", "color")) +
  theme_538() +
  theme(
    panel.grid.major.x = element_blank()
  ) +
  labs(
    x = "Playoff Seed",
    y = "Points Differential",
    title = "Playoff teams typically have a positive point differential",
    subtitle = "Amongst the bubble teams, Baltimore appears to be the most 'deservering' of a playoff spot",
    caption = "Plot: @thomas_mock | Data: ESPN"
  )

ggsave("stand_plot_logo.png", stand_plot_logo, height = 8, width = 10, dpi = 300)
Warning: Removed 4 rows containing missing values (position_stack).
Warning: Removed 4 rows containing missing values (geom_rich_text).

Alternatively, we could convert the X-axis back to team, and then add playoff seed numbers back to each of the bars. This is getting a bit busier, but I think it’s useful to show the if_else() workflow for plotting a “label” at a specific point based on a criterion. That can be very helpful in plotting where you can’t control the “direction” of a bar just like we see here.

Direct Labels
stand_df %>%
  filter(seed <= 12) %>%
  ggplot(aes(x = tidytext::reorder_within(team_abb, seed, conf), y = pts_diff)) +
  geom_col(
    aes(fill = if_else(seed <= 7, "#013369", "#D50A0A")),
    width = 0.8
  ) +
  ggtext::geom_richtext(
    data = playoff_label,
    aes(x = seed, label = label, color = color),
    fill = "#f0f0f0",
    label.color = NA,
    # remove background and outline
    label.padding = grid::unit(rep(0, 4), "pt"),
    # remove padding
    family = "Chivo",
    hjust = 0.1,
    fontface = "bold",
    size = 6
  ) +
  ggtext::geom_richtext(
    aes(label = seed, y = if_else(pts_diff <= 0, 10, -10)),
    color = "black",
    fill = "#f0f0f0",
    label.color = NA,
    # remove background and outline
    label.padding = grid::unit(rep(0, 4), "pt"),
    # remove padding
    family = "Chivo",
    hjust = 0.5,
    fontface = "bold",
    size = 4
  ) +
  geom_hline(yintercept = 0, size = 0.75, color = "#737373") +
  geom_vline(xintercept = 7.5, size = 1, color = "grey") +
  geom_vline(xintercept = 0.5, size = 0.75, color = "#737373") +
  facet_grid(~conf, scales = "free_x") +
  tidytext::scale_x_reordered() +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10)) +
  scale_fill_identity(aesthetics = c("fill", "color")) +
  theme_538() +
  theme(
    panel.grid.major.x = element_blank()
  ) +
  labs(
    x = "Playoff Seed",
    y = "Points Differential",
    title = "Playoff teams typically have a positive point differential",
    subtitle = "Amongst the bubble teams, Baltimore appears to be the most 'deservering' of a playoff spot",
    caption = "Plot: @thomas_mock | Data: ESPN"
  )


Blitzing

Now for the next example of a story with various components, we can look at blitz rate vs pressure rate, or blitz rate vs pass affected rate (sacks + ints + passes defended)/pass att. This is building off an article from FiveThirtyEight highlighting the Colts defense about a month ago.

The data they used is available at Pro-Football-Reference under a few defensive categories.

Get the data

First we have to download the data from PFR. There are two defensive datasets we’ll need, the Advanced Stats for pressure/blitz rates and the general Passing Stats for passes defended + intercetions.

We’ll get them via the script below, and then combine by team.

Data Collection
raw_url <- "https://www.pro-football-reference.com/years/2020/opp.htm"

raw_html <- read_html(raw_url)

raw_table <- raw_html %>% 
  html_table(fill = TRUE) %>% 
  .[[2]] %>% 
  janitor::clean_names() %>% 
  tibble()

pressure_df <- raw_table %>% 
  select(tm, blitz_pct = bltz_percent, press_pct = prss_percent) %>% 
  mutate(across(c(blitz_pct, press_pct), parse_number))

pass_def_raw <- raw_html %>% 
  html_node("#all_passing") %>% 
  html_nodes(xpath = "comment()") %>% 
  html_text() %>% 
  read_html() %>% 
  html_node("table") %>% 
  html_table() %>% 
  janitor::clean_names() %>% 
  tibble()

pass_def_df <- pass_def_raw %>% 
  select(tm, pass_att = att, int, pass_def = pd, sack = sk, ypa = y_a, anypa = any_a)

We can then do our left-join by tm to get our working dataset for this example.

combo_pass <- left_join(
  pressure_df, pass_def_df,
  by = "tm"
)

combo_pass
# A tibble: 32 × 9
   tm              blitz_pct press_pct pass_att   int pass_def  sack   ypa anypa
   <chr>               <dbl>     <dbl>    <dbl> <dbl>    <dbl> <dbl> <dbl> <dbl>
 1 Atlanta Falcons      32.9      23.6      625    12       51    29   7.9   7.4
 2 Buffalo Bills        35.8      22.2      573    15       76    38   6.9   5.7
 3 Carolina Panth…      24        22.4      585     7       58    29   6.9   6.6
 4 Chicago Bears        21.4      22.4      547    10       71    35   7.2   6.6
 5 Cincinnati Ben…      31.1      19        541    11       80    17   7.3   7.2
 6 Cleveland Brow…      21.3      21.9      585    11       74    38   7.2   6.6
 7 Indianapolis C…      17.1      23.3      562    15       78    40   7.3   6.1
 8 Arizona Cardin…      39.4      25.9      570    11       57    48   6.9   5.9
 9 Dallas Cowboys       22.8      22.8      513    10       46    31   7.4   7.1
10 Denver Broncos       27.9      26.2      567    10       64    42   7.2   6.2
# … with 22 more rows

Initial Plot

We’ll first plot the data and try out our theme. I want to add the colors of the specific article we’re recreating.

combo_pass %>% 
  ggplot(aes(x = blitz_pct, y = press_pct)) +
  geom_point() +
  labs(
    x = "Blitz Rate", y = "Pressure Rate",
    title = "The Colts are pressuring QBs without much of a blitz",
    subtitle = "Blitz rate vs. pressure rate for each NFL defense, through Week 15 of the 2020 season"
  ) + 
  theme_538()

Add color

If we add color specifically for the Colts, and then also expand the axis range to match the article, we get the following plot. I’ll save the mutate calls into a new dataframe, and then plot it.

Add Color and Text
colt_df <- combo_pass %>% 
  mutate(
    color = if_else(tm == "Indianapolis Colts", "#359fda", "#91c390"),
    fill = colorspace::lighten(color, amount = 0.3)
    ) %>% 
  rowwise() %>% 
  mutate(
    att_def = sum(int, pass_def, sack),
    cov_rate = att_def/pass_att*100
    ) %>% 
  ungroup() %>% 
  arrange(desc(cov_rate))

colt_df %>% 
  ggplot(aes(x = blitz_pct, y = press_pct, fill = fill, color = color)) +
  geom_point(size = 5, pch = 21, alpha = 0.8) +
  scale_color_identity(aesthetics = c("fill", "color")) +
  labs(
    x = "Blitz Rate", y = "Pressure Rate",
    title = "The Colts are pressuring QBs without much of a blitz",
    subtitle = "Blitz rate vs. pressure rate for each NFL defense,\nthrough Week 15 of the 2020 season",
    caption = toupper("Plot: @thomas_mock | Data: PFR | Inspiration: FiveThirtyEight")
  ) +
  scale_x_continuous(limits = c(10, 45), breaks = seq(10, 45, by = 5)) +
  scale_y_continuous(limits = c(10, 35), breaks = seq(10, 35, by = 5)) +
  theme_538()
Warning: Removed 1 rows containing missing values (geom_point).

Warning: Removed 1 rows containing missing values (geom_point).

Add labels

We can create a small “helper” dataset to plot the labels, and we’ve basically re-created the original plot!

Add Labels
label_df_press <- tibble(
  label = c("Colts", "Everyone else"),
  color = c("#359fda", "#91c390"),
  fill = colorspace::lighten(color, amount = 0.3),
  x = c(16, 30),
  y = c(25, 29)
)

colt_df %>% 
  ggplot(aes(x = blitz_pct, y = press_pct, fill = fill, color = color)) +
  geom_point(size = 5, pch = 21, alpha = 0.8) +
  scale_color_identity(aesthetics = c("fill", "color")) +
  labs(
    x = "Blitz Rate", y = "Pressure Rate",
    title = "The Colts are pressuring QBs without much of a blitz",
    subtitle = "Blitz rate vs. pressure rate for each NFL defense,\nthrough Week 15 of the 2020 season",
    caption = "Source: Pro-Football-Reference.com"
  ) +
  scale_x_continuous(limits = c(10, 45), breaks = seq(10, 45, by = 5)) +
  scale_y_continuous(limits = c(10, 35), breaks = seq(10, 35, by = 5)) +
  geom_label(
    data = label_df_press,
    aes(x = x, y = y, color = color, label = label),
    fill = "#f0f0f0",
    size = 6,
    fontface = "bold",
    hjust = 0.8,
    label.size = NA # remove the border
  ) +
  theme_538()
Warning: Removed 1 rows containing missing values (geom_point).

Warning: Removed 1 rows containing missing values (geom_point).

Pass Affected Rate

The Colts are also getting good pass affected rate, which I’ve extended to be (interceptions + sacks + passes defended)/pass att. This is a proxy for the number of plays where the defense either stops the pass or is directly defending the reception.

label_df_cov <- tibble(
  label = c("Colts", "Everyone else"),
  color = c("#359fda", "#91c390"),
  fill = colorspace::lighten(color, amount = 0.3),
  x = c(16, 33),
  y = c(25, 28)
)

We can then use that to create a very similar plot as what we saw before. Note I’m adding an annotation outside the plot area by turning clip = 'off' in coord_cartesian(). Otherwise this plot is very close code-wise to the previous one.

Add Annotations
colt_df %>%
  ggplot(aes(x = blitz_pct, y = cov_rate, color = color, fill = fill)) +
  geom_point(size = 5, pch = 21) +
  scale_color_identity(aesthetics = c("fill", "color")) +
  labs(
    x = "Blitz Rate",
    y = "Pass Affected Rate",
    title = "The Colts affect passes at an elite rate while blitzing the least",
    subtitle = "Blitz rate vs. Pass affected rate for each NFL defense,\nthrough Week 15 of the 2020 season",
    caption = "Plot: @thomas_mock | Source: PFR"
  ) +
  scale_x_continuous(limits = c(10, 45), breaks = seq(10, 45, by = 5)) +
  scale_y_continuous(limits = c(10, 35), breaks = seq(10, 35, by = 5)) +
  coord_cartesian(clip = "off") +
  annotate("text", x = 10, y = 10, label = "Pass affected rate = (ints + sacks + passes defended)/pass attempts", vjust = 10, hjust = 0.2, color = "darkgrey") +
  theme_538() +
  geom_label(
    data = label_df_cov,
    aes(x = x, y = y, color = color, label = label),
    fill = "#f0f0f0",
    size = 6,
    fontface = "bold",
    hjust = 0.8,
    label.size = NA
  )

Slope Chart

By pivoting the data longer, and converting the blitz/pass affected rate columns into a combined column we can pretty quickly get a decent slope-chat put together. In short, we are creating a new metric and corresponding value column out of the blitz/pass affected rates.

This can then be plotted where the metric name is on the x-axis and the metric value is the y-axis. Because both metrics are percentages they’re in the same plotting range.

long_colts <- colt_df %>%
  mutate(
    color = if_else(color == "#91c390", "grey", "#359fda"),
    fill = color
  ) %>%
  rename(pass_affected_rate = cov_rate) %>%
  pivot_longer(
    cols = c(blitz_pct, pass_affected_rate), 
    names_to = "metric", values_to = "value"
    ) %>%
  mutate(
    metric = if_else(
      metric == "blitz_pct", 
      "Blitz Rate", 
      "Pass Affected Rate"
      )
  )

long_colts
# A tibble: 64 × 13
   tm    press_pct pass_att   int pass_def  sack   ypa anypa color fill  att_def
   <chr>     <dbl>    <dbl> <dbl>    <dbl> <dbl> <dbl> <dbl> <chr> <chr>   <dbl>
 1 Pitt…      35.1      526    18       84    56   6.6   4.7 grey  grey      158
 2 Pitt…      35.1      526    18       84    56   6.6   4.7 grey  grey      158
 3 New …      26.3      557    18       84    45   6.7   5.4 grey  grey      147
 4 New …      26.3      557    18       84    45   6.7   5.4 grey  grey      147
 5 Miam…      24.8      545    18       76    41   8     6.2 grey  grey      135
 6 Miam…      24.8      545    18       76    41   8     6.2 grey  grey      135
 7 Los …      23.4      548    14       68    53   6.2   4.6 grey  grey      135
 8 Los …      23.4      548    14       68    53   6.2   4.6 grey  grey      135
 9 Wash…      25.9      529    16       66    47   6.4   4.8 grey  grey      129
10 Wash…      25.9      529    16       66    47   6.4   4.8 grey  grey      129
# … with 54 more rows, and 2 more variables: metric <chr>, value <dbl>

Initial Slope Chart

The initial slope chart is only a few lines of code. We’ll plot metric names on the x-axis, metric values on the y-axis, add lines by team between the compared measures, and then layer points on top of it all.

long_colts %>%
  ggplot(aes(x = metric, y = value, group = tm, color = color, fill = fill)) +
  geom_line(aes(size = if_else(tm == "Indianapolis Colts", 2, 0.5))) +
  geom_point(size = 5, pch = 21, color = "#f0f0f0", stroke = 1) +
  scale_color_identity(aesthetics = c("fill", "color")) +
  scale_size_identity()

Add theme and context

We can further enhance the plot with the title/source data. I’ve also shared a little “hack” that allows you to plot outside of the plot area. By setting clip = 'off' in coord_cartestian(), we can add an annotation that falls “outside” of the plot area. This lets us add a description note similar to what FiveThirtyEight does with some of their plots.

Add Theme and Context
colt_slope <- long_colts %>%
  ggplot(aes(x = metric, y = value, group = tm, color = color, fill = fill)) +
  geom_line(aes(size = if_else(tm == "Indianapolis Colts", 2, 0.5))) +
  geom_point(size = 5, pch = 21, color = "#f0f0f0", stroke = 1) +
  geom_label(
    data = filter(long_colts, tm == "Indianapolis Colts"),
    aes(label = paste0(round(value, 0), "%")),
    fill = NA,
    hjust = c(1, 0),
    nudge_x = c(-0.02, 0.02),
    label.size = NA,
    family = "Chivo",
    fontface = "bold",
    size = 6
  ) +
  annotate(
    "text", x = 1, y = 12, vjust = 11, hjust = 0.4, color = "darkgrey",
    family = "Chivo",
    label = "Pass affected rate = (sacks + ints + passes defended)/pass att"
    ) +
  coord_cartesian(clip = "off") +
  scale_y_continuous(limits = c(12, 43), breaks = scales::pretty_breaks(n = 7)) +
  scale_x_discrete(expand = c(0.2, 0.2)) +
  scale_color_identity(aesthetics = c("fill", "color")) +
  scale_size_identity() +
  theme_538() +
  theme(
    panel.grid.major.y = element_blank(),
    axis.text.y = element_blank(),
    plot.margin = margin(0.5, 0.5, 0.1, 0.5, unit = "cm"),
    axis.text.x = element_text(color = "black", size = 20),
    plot.caption = element_markdown(size = 14)
  ) +
  labs(
    x = "",
    y = "",
    title = "The Colts affect more passes with fewer rushers",
    subtitle = "Data since through Week 15 of the 2020 NFL season",
    caption = "**Plot**: @thomas_mock | **Data**: PFR"
  )

ggsave("colt_slope.png", colt_slope, height = 5, width = 4, dpi = 500, scale = 1.75)
Warning: Removed 1 row(s) containing missing values (geom_path).
Warning: Removed 1 rows containing missing values (geom_point).

Barbell

For our last example with this data, we’re going to really highlight the Colt’s passing defense effectiveness without a blitz. A barbell plot is another combo of lines + points, where the line is a segment between the two measures of interest.

Our first attempt is pretty straightforward, we’ll plot the segment between the points, then layer two geom_points(), one for each measure.

Initial Barbell
barbell_first <- colt_df %>% 
  rename(pass_affected_rate = cov_rate) %>% 
  mutate(
    color = str_replace(color, "#91c390", "grey"),
    fill = str_replace(fill, "#91c390", "grey"),
    tm = word(tm, -1)
    ) %>% 
  ggplot(aes(x = blitz_pct, y = fct_reorder(tm, pass_affected_rate), 
             group = tm, color = color)) +
  geom_segment(aes(xend = pass_affected_rate, yend = tm), size = 2) +
  geom_point(size = 5, color = "grey") +
  geom_point(aes(x = pass_affected_rate, y = tm), size = 5, color = "black") +
  scale_color_identity()

ggsave("barbell_first.png", barbell_first, height = 12, width = 8, units = "in", dpi = 300)

Theme and context

We can add some more labels and context, along with our theme and a custom color placement to indicate the Colts as the outlier of interest.

Barbell Theme
colt_colors <- colt_df %>% 
  arrange(cov_rate) %>% 
  mutate(color = str_replace(color, "#91c390", "black")) %>% 
  pull(color)

pass_label_df <- tibble(
  x = c(20, 19.5),
  tm = c(30, 25),
  label = c("Blitz Rate", "Pass Affected<br>Rate"),
  color = c("grey", "black")
)

ex_barbell <- colt_df %>%
  rename(pass_affected_rate = cov_rate) %>%
  mutate(
    color = str_replace(color, "#91c390", "grey"),
    fill = str_replace(fill, "#91c390", "grey"),
    tm = word(tm, -1)
  ) %>%
  ggplot(aes(x = blitz_pct, y = fct_reorder(tm, pass_affected_rate), 
             group = tm, color = color, fill = fill)) +
  geom_segment(aes(xend = pass_affected_rate, yend = tm), size = 2) +
  geom_point(size = 5, color = "grey") +
  geom_point(aes(x = pass_affected_rate, y = tm), size = 5, color = "black") +
  ggtext::geom_richtext(
    data = pass_label_df,
    aes(x = x, y = tm, label = label, color = color),
    fill = "#f0f0f0",
    label.color = NA,
    # remove background and outline
    label.padding = grid::unit(rep(0, 4), "pt"),
    # remove padding
    family = "Chivo",
    hjust = 1,
    fontface = "bold",
    size = 5
  ) +
  annotate(
    "text",
    x = 10,
    y = 1,
    vjust = 10,
    hjust = 0.4,
    color = "darkgrey",
    family = "Chivo",
    label = "Pass affected rate = (sacks + ints + passes defended)/pass att"
  ) +
  scale_x_continuous(limits = c(10, 45), breaks = scales::pretty_breaks(n = 7)) +
  scale_color_identity(aesthetics = c("fill", "color")) +
  coord_cartesian(clip = "off") +
  labs(
    x = "<span style = 'color:grey;'>Blitz rate</span> vs Pass affected rate",
    y = "",
    title = "The Colts affect more passes with fewer rushers",
    subtitle = "Data through Week 15 of the 2020 NFL season",
    caption = "**Plot**: @thomas_mock | **Data**: PFR"
  ) +
  theme_538() +
  theme(
    panel.grid.major.y = element_blank(),
    axis.text.y = element_text(color = colt_colors, size = 18),
    axis.title.x = element_markdown(),
    plot.caption = element_markdown()
  )
Warning: Vectorized input to `element_text()` is not officially supported.
Results may be unexpected or may change in future versions of ggplot2.
ggsave("barbell.png", ex_barbell, height = 12, width = 8, units = "in", dpi = 300)


The end

We’ve now covered quite a bit about themes and customization of ggplot2 beyond the basics! You can dive back into other themes from ggthemes to recreate them/build off of them or develop your own novel theme to use.

If my blog has helped you, you can buy me a coffee!

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.0 (2022-04-22)
 os       macOS Monterey 12.2.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Chicago
 date     2022-04-28
 pandoc   2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
 quarto   0.9.294 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 colorspace  * 2.0-3   2022-02-21 [1] CRAN (R 4.2.0)
 dplyr       * 1.0.8   2022-02-08 [1] CRAN (R 4.2.0)
 espnscrapeR * 0.6.5   2022-04-26 [1] Github (jthomasmock/espnscrapeR@084ce80)
 forcats     * 0.5.1   2021-01-27 [1] CRAN (R 4.2.0)
 ggplot2     * 3.3.5   2021-06-25 [1] CRAN (R 4.2.0)
 ggridges    * 0.5.3   2021-01-08 [1] CRAN (R 4.2.0)
 ggtext      * 0.1.1   2020-12-17 [1] CRAN (R 4.2.0)
 ggthemes    * 4.2.4   2021-01-20 [1] CRAN (R 4.2.0)
 purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
 readr       * 2.1.2   2022-01-30 [1] CRAN (R 4.2.0)
 rvest       * 1.0.2   2021-10-16 [1] CRAN (R 4.2.0)
 sessioninfo * 1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
 stringr     * 1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
 systemfonts * 1.0.4   2022-02-11 [1] CRAN (R 4.2.0)
 tibble      * 3.1.6   2021-11-07 [1] CRAN (R 4.2.0)
 tidyr       * 1.2.0   2022-02-01 [1] CRAN (R 4.2.0)
 tidyverse   * 1.3.1   2021-04-15 [1] CRAN (R 4.2.0)

 [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────