2020 in Review

meta

Surviving a pandemic at home.

Author

Thomas Mock

Published

12-31-2020

My 2020

2020 was absolutely difficult in a lot of ways, both for the world and for me personally. Rather than focusing on some of the difficulties, I’d rather focus on some of the successes for 2020.

I feel very blessed that my family has stayed happy and healthy, that I have the change to continue being able to grow into expanded roles at RStudio, that I have had a lot of time with my wife Maria and our dog Howard, and that 2020 is now over. Here’s to hopefully efficient vaccine delivery in 2021, and a safer new year to all!

Growth of #TidyTuesday

TidyTuesday has always been a labor of love for me, and the past 3 years have provided me with such a cool view of the R community, and I believe have helped thousands of learners across the world engage with the R community.

If we’re using the raw Tweet data, it looks like 8,601 unique tweets that reference #TidyTuesday along with some key plotting related words.

Code to reproduce
library(tidyverse)
library(lubridate)
raw_url <- "https://github.com/rfordatascience/tidytuesday/blob/master/tidytuesday_tweets/data.csv?raw=true"

raw_df <- read_csv(raw_url)

raw_df %>% 
  filter(
    str_detect(tolower(text), "rstats|code|plot|graph|viz|data|tidyverse")
  ) %>% 
  count()

raw_df %>% 
  filter(
    str_detect(tolower(text), "rstats|code|plot|graph|viz|data|tidyverse")
  ) %>% 
  distinct(screen_name) %>% nrow()

sum_df <- raw_df %>% 
  filter(str_detect(tolower(text), "rstats|code|plot|graph|viz|data|tidyverse")) %>% 
  mutate(created_date = lubridate::as_date(created_at),
         year = year(created_date),
         week = week(created_date),
  ) %>% 
  count(year, week) %>% 
  group_by(year) %>% 
  mutate(roll_n = cumsum(n),
         week = if_else(year > 2018, week - 1, week - 14),
         year = factor(year, levels = c(2020, 2019, 2018)),
         colour = case_when(
           year == 2020 ~ "#003399",
           year == 2019 ~ "#FF2B4F",
           year == 2018 ~ "#fcab27",
           TRUE ~ "gray80"
         )
  ) %>% 
  ungroup()

top_weeks <- sum_df %>% 
  group_by(year) %>% 
  arrange(desc(roll_n)) %>% 
  slice(1)

tt_plot <-  ggplot(
  sum_df, 
  aes(x = week, y = roll_n, 
      color = colour, group = year)
) +
  geom_step(size = 1) +
  geom_point(data = top_weeks,
             aes(col = colour), 
             size = 2.5, stroke = 1) +
  geom_text(data = top_weeks,
            aes(label = year),
            size = 8, hjust = c(1, 1, 0), nudge_y = 50,
            vjust = 0) +
  geom_hline(yintercept = 0, size = 1, color = "black") +
  scale_y_continuous(breaks = seq(0, 5000, by = 500),
                     limits = c(0, 5000)) +
  scale_x_continuous(breaks = c(seq(0, 50, 5), 52),
                     limits = c(0, 53)) +
  scale_color_identity(aesthetics = c("colour", "fill")) +
  labs(x = "\nWeek Number", y = "Cumulative Tweets\n",
       caption = "Data: rtweet | Plot: @thomas_mock",
       title = "Cumulative tweets for #TidyTuesday by year",
       subtitle = "Note that Week 1 of 2018 started in April\nTweets must contain: 'rstats, code, plot, graph, viz, data or tidyverse'\n")  +
  tomtom::theme_538() +
  theme(legend.position = c(0.1, 0.8),
        legend.background = element_blank(),
        legend.title = element_blank(),
        legend.text = element_text(size = 12),
        plot.background = element_blank(),
        plot.title = element_text(size = 24))

tt_plot

ggsave("tt-plot.png", tt_plot, dpi = "retina")

Re-emphasis on blogging

I had neglected my blog since Jan 2019, but kicked off with a new blog post about simulating optimal strategy for my favorite dice game “10,000”, after seeing some Dave Robinson simulations examples. I have a strong desire to create tutorials/guides so that I can remember things for the future and to help provide some scaffolding for others to learn from as they progress through their data science learning journey.


This really took off in March/April when Texas was heavily in lockdown and I could no longer go out to restaurants or see family. I ended up with a lot of exploration of topics, mostly for the TidyTuesday and the NFL sports analytics communities.

  • Beer and PDF tools - a vignette: this one covered extracting tables from many PDFs at once, which we used for a TidyTuesday dataset
  • Bigger, nflfastR and dbplyr: with the launch of nflfastR for large NFL play-by-play, I put together an example of creating local SQLite databases and querying them via dplyr
  • Flipping tibbles for many models: in a previous life, I used broom to tidy up my stats, and in this post I covered how to run any number of models at one time by nesting dataframes and tidying them up with broom

In May, I started a “How to make tables with R” series, mostly focused on gt and reactable.


I also started providing some more general resources on how to use the new tidyr functions to clean up deeply nested JSON, how to use RMarkdown, and some other guides.


By August, I was still stuck inside, so I put together a resource on how to start your OWN blog with distill, and some other guides. Eventually, distill got a HUGE update which I provided some ideas to, although all of the real work was done by Alison Hill and the RMarkdown team.

  • Build a blog with distill: End-to-end how to guide for building and deploying a blog to netlify for free.
  • Heatmaps in ggplot2: Created examples of how to create various heatmaps and some info on when to choose a specific version to avoid overplotting

September kicked off a deeper focus on gt specifically.


In October and November, I spent more time on plotting.


In December, I revisited JSON data and more plotting.

For 2021, I’ll be continuing to blog and create technical guides - tweet at me if there’s something you want to see! @thomas_mock .

Conferences and Presentations

I gave some conference talks, mostly invited which I can attribute to being engaged with the community and technical blogging.

useR 2020

I was lucky enough to both attend (virtually) and present at my first ever useR! I focused on the #TidyTuesday project, what it’s all about, and how the community has built around it.

NYC R

Got to attend and present at my first NYC R conference, focused on doing more with RMarkdown!

Hockey Analytics Night in Canada: NFL Edition

I was honored to be invited to and present at HANICx NFL, and after running a poll decided to present on the most requested topic: tidymodels!

  • Slides: A short intro presentation on using tidymodels to predict whether the opposing team will run or pass
  • Recording

NHS R Conference

Gave an adapted version of my RMarkdown presentation for the NHS R community, enjoyed the chatter in the live groups there!

Carnegie Mellon Sports Analytics Conference

Gave my first public workshop for the CMSAC team, big thanks to Ron Yorko for the invite! I’ve always wanted to attend this conference and was deeply honored to join as a workshop presenter.

Tables Presentation

Gave a 60 min presentation on “Beautiful Tables in R”

Personal

I survived, along with my very patient wife Maria, and our dog Howard!

  • We played a lot of Mario Kart on our Switch 🎮
  • I beat Zelda: Breath of the Wild (but am nowhere near all the Korok seeds) 🏆
  • I cut my own hair 7 times so far this year 💇‍♂️
  • We spent a lot of time indoors or in the outdoors but isolated 🌲
  • We ate a lot of takeout at home 🥘
  • We voted 🇺🇸
  • We wore a LOT of masks 😷

I’m looking forward to some rest, and then a refresh for 2021!

Blessings to you and yours, and happy New Year!

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.0 (2022-04-22)
 os       macOS Monterey 12.2.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Chicago
 date     2022-04-28
 pandoc   2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
 quarto   0.9.294 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 sessioninfo * 1.2.2   2021-12-06 [1] CRAN (R 4.2.0)

 [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────