Creating a custom gt function for aligning first-row text and testing it with testthat

gt functions testing

Creating and testing your own functions is fun!

Thomas Mock https://twitter.com/thomas_mock
2021-03-07

Creating and testing gt functions

It’s no secret, but I love the gt package and tables in general. I’ve been on a big table kick for almost a year at this point!

While I love all the amazing features built into gt, sometimes I also want to create my own functions to wrap or extend features.

For example, I’ve done:

This blogpost will cover how to solve a fairly common ask, how to add a symbol/character to the end of ONLY the first row of a column and maintain the alignment of the entire column. We’ll walk through how to accomplish this with gt only, creating our own function to do it more succinctly, and then how to further test our gt outputs with testthat!

No repeats

I’ve always been a fan of not having to repeat symbols/prefixes/suffixes inside tables. There’s some ongoing work here in gt to add this as a feature, but in the meantime I wanted to play around with a few ways to accomplish this with gt as it is, and/or a custom function as of today.

You can imagine a situation like below, where we want to label cells within a column as a percent, and want to indicate that it’s a percent ONLY on the first row.

head(gtcars) %>%
  mutate(hp_pct = (hp/max(hp) * 100)) %>% 
  dplyr::select(mfr, model, year, trim, hp, hp_pct) %>%
  gt() %>% 
  fmt_percent(columns = vars(hp_pct), rows = 1, scale_values = FALSE) %>% 
  fmt_number(columns = vars(hp_pct), rows = 2:6) %>% 
  tab_style(
    style = cell_text(color = "red"), 
    locations = cells_body(vars(hp_pct), rows = 1)
    )
mfr model year trim hp hp_pct
Ford GT 2017 Base Coupe 647 97.88%
Ferrari 458 Speciale 2015 Base Coupe 597 90.32
Ferrari 458 Spider 2015 Base 562 85.02
Ferrari 458 Italia 2014 Base Coupe 562 85.02
Ferrari 488 GTB 2016 Base Coupe 661 100.00
Ferrari California 2015 Base Convertible 553 83.66

However, you can quickly see that this misaligned the first row from the remaining rows.


No repeats in gt

An alternative would be to convert those rows to text and apply specific changes.

There’s quite a bit going on here:

I want to pause here and say with the code below, we have officially accomplished our goal. However, this was fairly manual and can be repetitive for adding several of these transformations in a single table.

head(gtcars) %>%
  mutate(hp_pct = (hp/max(hp) * 100)) %>% 
  dplyr::select(mfr, model, year, trim, hp, hp_pct) %>%
  gt() %>%
  # use a mono-spaced font
  tab_style(
    style = cell_text(font = google_font("Fira Mono")),
    locations = cells_body(columns = vars(hp_pct))
    ) %>% 
  # align the column of interst to right
  cols_align(align = "right", columns = vars(hp_pct)) %>% 
  # round and transform the first row to percent
  text_transform(
    locations = cells_body(vars(hp_pct), rows = 1),
    fn = function(x){ 
      fmt_val <- format(as.double(x), nsmall = 1, digits = 1)
      paste0(fmt_val, "%") %>% gt::html()}
  ) %>% 
  text_transform(
    locations = cells_body(vars(hp_pct), rows = 2:6),
    fn = function(x){ 
      # round remaining rows, add a non-breaking space
     fmt_val <- format(as.double(x), nsmall = 1, digits = 1)
     lapply(fmt_val, function(x) paste0(x, '&nbsp') %>% gt::html())
  })
mfr model year trim hp hp_pct
Ford GT 2017 Base Coupe 647 97.9%
Ferrari 458 Speciale 2015 Base Coupe 597 90.3 
Ferrari 458 Spider 2015 Base 562 85.0 
Ferrari 458 Italia 2014 Base Coupe 562 85.0 
Ferrari 488 GTB 2016 Base Coupe 661 100.0 
Ferrari California 2015 Base Convertible 553 83.7 

Format symbol first Function

We can try to wrap some of the gt code into a function and apply these transformations in bulk at the location of our choosing! This is especially important for making it generally apply to other types of inputs instead of JUST %. The function of interest is actually two custom functions, some gt functions, and a good chunk of logic.

I’ve commented the individual sections as to their purpose, and included quite a bit of error-handling or protecting against various user inputs.

fmt_symbol_first <- function(
  gt_data,
  column = NULL,        # column of interest to apply to
  symbol = NULL,        # symbol to add, optionally
  suffix = "",          # suffix to add, optionally
  decimals = NULL,      # number of decimal places to round to
  last_row_n,           # what's the last row in data?
  symbol_first = FALSE  # symbol before or after suffix?
) {
  
  # Test and error out if mandatory columns are missing
  stopifnot("`symbol_first` argument must be a logical" = is.logical(symbol_first))
  stopifnot("`last_row_n` argument must be specified and numeric" = is.numeric(last_row_n))
  stopifnot("Input must be a gt table" = class(gt_data)[[1]] == "gt_tbl")

  # needs to type convert to double to play nicely with decimals and rounding
  # as it's converted to character by gt::text_transform
  add_to_first <- function(x, suff = suffix, symb = symbol) {
    if (!is.null(decimals)) {
      x <- suppressWarnings(as.double(x))
      fmt_val <- format(x = x, nsmall = decimals, digits = decimals)
    } else {
      fmt_val <- x
    }

    # combine the value, passed suffix, symbol -> html
    if (isTRUE(symbol_first)) {
      paste0(fmt_val, symb, suff) %>% gt::html()
    } else {
      paste0(fmt_val, suff, symb) %>% gt::html()
    }
  }

  # repeat non-breaking space for combined length of suffix + symbol
  # logic is based on is a NULL passed or not
  if (!is.null(symbol) | !identical(as.character(symbol), character(0))) {
    suffix <- ifelse(identical(as.character(suffix), character(0)), "", suffix)
    length_nbsp <- c("&nbsp", rep("&nbsp", nchar(suffix))) %>%
      paste0(collapse = "")
  } else {
    suffix <- ifelse(identical(as.character(suffix), character(0)), "", suffix)
    length_nbsp <- rep("&nbsp", nchar(suffix)) %>%
      paste0(collapse = "")
  }

  # affect rows OTHER than the first row
  add_to_remainder <- function(x, length = length_nbsp) {
    if (!is.null(decimals)) {
      # if decimal not null, convert to double
      x <- suppressWarnings(as.double(x))
      # then round and format ALL to force specific decimals
      fmt_val <- format(x = x, nsmall = decimals, digits = decimals)
    } else {
      fmt_val <- x
    }
    paste0(fmt_val, length) %>% lapply(FUN = gt::html)
  }

  # pass gt object
  # align right to make sure the spacing is meaningful
  gt_data %>%
    cols_align(align = "right", columns = vars({{ column }})) %>%
    # convert to mono-font for column of interest
    tab_style(
      style = cell_text(font = google_font("Fira Mono")),
      locations = cells_body(columns = vars({{ column }}))
    ) %>%
    # transform first rows
    text_transform(
      locations = cells_body(vars({{ column }}), rows = 1),
      fn = add_to_first
    ) %>%
    # transform remaining rows
    text_transform(
      locations = cells_body(vars({{ column }}), rows = 2:last_row_n),
      fn = add_to_remainder
    )
}

Use the function

We can now use that fmt_symbol_first() function, note that I’m testing a few different combinations of suffix/symbols, decimals, etc that may be a bit nonsensical in the table itself but are interactively testing that the results are what I expect. Specifically, I’m making sure that symbols/suffixes are added, and that the spacing is correct. While this is useful for sanity checking quickly, we can also take another step to apply some proper unit-testing in the next section.

gtcars %>% 
  head() %>% 
  dplyr::select(mfr, year, bdy_style, mpg_h, hp) %>% 
  dplyr::mutate(mpg_h = rnorm(n = dplyr::n(), mean = 22, sd = 1)) %>% 
  gt() %>% 
  opt_table_lines() %>% 
  fmt_symbol_first(column = mfr, symbol = "&#x24;", suffix = " ", last_row_n = 6) %>%
  fmt_symbol_first(column = year, symbol = NULL, suffix = "%", last_row_n = 6) %>%
  fmt_symbol_first(column = mpg_h, symbol = "&#37;", suffix = NULL, last_row_n = 6, decimals = 1) %>% 
  fmt_symbol_first(column = hp, symbol = "&#176;", suffix = "F", last_row_n = 6, decimals = NULL, symbol_first = TRUE)
mfr year bdy_style mpg_h hp
Ford $ 2017% coupe 23.2% 647°F
Ferrari   2015  coupe 21.7  597  
Ferrari   2015  convertible 21.9  562  
Ferrari   2014  coupe 20.8  562  
Ferrari   2016  coupe 22.2  661  
Ferrari   2015  convertible 20.6  553  

Unit testing

At this point, we’ve created a custom gt wrapper function, added some relatively robust checks into the function, but are still manually checking the output confirms to our expectations. We can perform proper unit testing with the {testthat} package.

Testing your code can be painful and tedious, but it greatly increases the quality of your code. testthat tries to make testing as fun as possible, so that you get a visceral satisfaction from writing tests.

While an in-depth run through of testhat is beyond the scope of this post, I have included an expandable section with a minimal example below, expanded from the “R Packages” book chapter on testing:

testthat Example
[1] 1
str_length("ab")  # 2
[1] 2
str_length("abc") # 3
[1] 3

So str_length() counts the length of a string, fairly straightforward!

We can convert this to a logical confirmation, which means that a computer can understand if the output was as expected, rather than just printing and reading which is mainly for our interactive use. I have included one FALSE output just as an example.

str_length("a")   == 1 # 1 TRUE
[1] TRUE
str_length("ab")  == 2 # 2 TRUE
[1] TRUE
str_length("abc") == 3 # 3 TRUE
[1] TRUE
str_length("abc") == 1 # 3 FALSE
[1] FALSE

While this testing is useful, we can make it even easier with testhat, by using expect_equal(). Now, these functions will not return anything if they pass. If they fail, then they will print an error, and a helpful statement saying what the failure was.

### All TRUE
testthat::expect_equal(str_length("a"),   1) # TRUE
testthat::expect_equal(str_length("ab"),  2) # TRUE
testthat::expect_equal(str_length("abc"), 3) # TRUE

Just to show you, here’s one where we get a FALSE, the match is off by 2.

testthat::expect_equal(str_length("a"),  3) # FALSE
Error: str_length("a") not equal to 3.
1/1 mismatches
[1] 1 - 3 == -2

The last step, is wrapping our various tests into test_that structure. Here, while the individual tests return no visible output, we can get a friendly message saying they have all passed!

test_that(
  desc = "str_length is number of characters",
  code = {
    expect_equal(str_length("a"), 1)
    expect_equal(str_length("ab"), 2)
    expect_equal(str_length("abc"), 3)
  }
)
Test passed 🥇

We can also see what happens if there is a failure (abcd is not 3 characters, but 4).

test_that(
  desc = "str_length is number of characters",
  code = {
    expect_equal(str_length("a"), 1)
    expect_equal(str_length("ab"), 2)
    expect_equal(str_length("abc"), 3)
    expect_equal(str_length("abcd"), 3)
  }
)
── Failure (<text>:7:5): str_length is number of characters ────────────────────
str_length("abcd") not equal to 3.
1/1 mismatches
[1] 4 - 3 == 1

These tests can be used interactively, but ultimately are even more useful when rolled into an R package. For that next step, I recommend reading through the “R Packages” book, specifically the Packages Chapter.

Testing gt

Now you may say, well those minimal example tests were easy, it’s just counting?! How do I test gt? We can treat gt exactly like what it is, a HTML table. Quick example below using our custom function (fmt_symbol_first()).

ex_gt <- gtcars %>% 
  head() %>% 
  dplyr::select(mfr, year, bdy_style, mpg_h, hp) %>% 
  dplyr::mutate(mpg_h = c(20.2, 22.0, 20.8, 21.2, 22.8, 22.7)) %>% 
  gt() %>% 
  opt_table_font(font = google_font("Roboto Mono")) %>%
  opt_table_lines() %>% 
  fmt_symbol_first(column = mfr, symbol = "&#x24;", suffix = " ", last_row_n = 6) %>%
  fmt_symbol_first(column = year, symbol = NULL, suffix = "%", last_row_n = 6) %>%
  fmt_symbol_first(column = mpg_h, symbol = "&#37;", suffix = NULL, last_row_n = 6, decimals = 1) %>% 
  fmt_symbol_first(column = hp, symbol = "&#176;", suffix = "F", last_row_n = 6, decimals = NULL, symbol_first = TRUE)

# example table
ex_gt
mfr year bdy_style mpg_h hp
Ford $ 2017% coupe 20.2% 647°F
Ferrari   2015  coupe 22.0  597  
Ferrari   2015  convertible 20.8  562  
Ferrari   2014  coupe 21.2  562  
Ferrari   2016  coupe 22.8  661  
Ferrari   2015  convertible 22.7  553  
# what is it?
ex_gt %>% 
  as_raw_html() %>%  
  str(max.level = 1)
 'html' chr "<table style=\"font-family: 'Roboto Mono', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubunt"| __truncated__
 - attr(*, "html")= logi TRUE

rvest

That’s a relatively basic table, but if used interactively it will just print out the output. We can “capture” the raw HTML via gt::as_raw_html(), and then just treat it like another table to “webscrape” with rvest.

library(rvest)
# create object as RAW html
ex_gt_raw <- ex_gt %>% 
  as_raw_html()

# read into rvest, and grab the table body
ex_html_tab <- read_html(ex_gt_raw) %>% 
  html_node("table > tbody") 

# 6 row table!
ex_html_tab
{html_node}
<tbody style="border-top-style: solid; border-top-width: 2px; border-top-color: #D3D3D3; border-bottom-style: solid; border-bottom-width: 2px; border-bottom-color: #D3D3D3;">
[1] <tr>\n<td style="padding-top: 8px; padding-bottom: 8px; padding ...
[2] <tr>\n<td style="padding-top: 8px; padding-bottom: 8px; padding ...
[3] <tr>\n<td style="padding-top: 8px; padding-bottom: 8px; padding ...
[4] <tr>\n<td style="padding-top: 8px; padding-bottom: 8px; padding ...
[5] <tr>\n<td style="padding-top: 8px; padding-bottom: 8px; padding ...
[6] <tr>\n<td style="padding-top: 8px; padding-bottom: 8px; padding ...

Test HTML

Now that it’s saved as HTML, we can extract a column, and we’re back to raw strings!

col1_extract <- ex_html_tab %>% 
    html_nodes(paste0("td:nth-child(",1 , ")")) %>% 
    html_text() 

col1_extract
[1] "Ford $"            "Ferrari&nbsp&nbsp" "Ferrari&nbsp&nbsp"
[4] "Ferrari&nbsp&nbsp" "Ferrari&nbsp&nbsp" "Ferrari&nbsp&nbsp"

Now, rather than using the whole string, I’m going to focus on testing the 1st row and then the remainder. Mainly because our function should do different things to the first row versus the remaining rows!

col1_extract[1] 
[1] "Ford $"

So this should be relatively straightforward, we want to do an exact match expecting "Ford $".

testthat::expect_match(col1_extract[1], "Ford $")
Error: col1_extract\[1\] does not match "Ford $".
Actual value: "Ford \$"

BUT oh no we get a failure??? This is because the $ is a special character in regex, so we need to “escape” it with \\. This tells regex to parse it as a literal “dollar sign”. After passing the escape, we now get a silent pass!

testthat::expect_match(col1_extract[1], "Ford \\$")

We can run it with test_that() as well, and since it passes we get our friendly little message!

test_that(
  desc = "First word is Ford $",
  code = testthat::expect_match(col1_extract[1], "Ford \\$")
)
Test passed 🌈

Testing function

Now, I want to test the individual columns for different things, so I’m going to write a test expectation function.

I’m interested in:

All using the same HTML input

test_gt_by_col <- function(col_n, row_first = TRUE, expectation){
  
  # if row_first = TRUE, then just get the 1st row
  # otherwise select the remainder
  if(isTRUE(row_first)){
    row_sel <- 1
  } else {
    row_sel <- 2:6
  }
  
  # use our example html
  # grab the column by number
  # get the rows by selection
  # test the expectation
  ex_html_tab %>% 
    html_nodes(paste0("td:nth-child(",col_n , ")")) %>% 
    html_text() %>% 
    .[row_sel] %>% 
    testthat::expect_match(expectation)
}

We can then use our function and avoid having to copy-paste much at all!

test_that(
  desc = "First word is Ford $",
  code = test_gt_by_col(1, row_first = TRUE, expectation = "Ford \\$")
)
Test passed 🌈

Just a quick reminder, if it fails (I’m intentionally failing). We can see that the expectation doesn’t match the remainder.

test_that(
  desc = "First word is Ford $",
  code = test_gt_by_col(1, row_first = FALSE, expectation = "Ford \\$")
)
── Failure (<text>:15:3): First word is Ford $ ─────────────────────────────────
`\.` does not match "Ford \\$".
Actual values:
* Ferrari&nbsp&nbsp
* Ferrari&nbsp&nbsp
* Ferrari&nbsp&nbsp
* Ferrari&nbsp&nbsp
* Ferrari&nbsp&nbsp
Backtrace:
 1. global::test_gt_by_col(1, row_first = FALSE, expectation = "Ford \\$")
 3. testthat::expect_match(., expectation)

Put it all together

We can put it all together now, and test all of our columns of interest, with testthat using our custom testing function ON the output of the custom function we wrote earlier.

Create HTML table, extract w/ rvest, define test function
ex_gt <- gtcars %>% 
  head() %>% 
  dplyr::select(mfr, year, bdy_style, mpg_h, hp) %>% 
  dplyr::mutate(mpg_h = c(20.2, 22.0, 20.8, 21.2, 22.8, 22.7)) %>% 
  gt() %>% 
  opt_table_font(font = google_font("Roboto Mono")) %>%
  opt_table_lines() %>% 
  fmt_symbol_first(column = mfr, symbol = "&#x24;", suffix = " ", last_row_n = 6) %>%
  fmt_symbol_first(column = year, symbol = NULL, suffix = "%", last_row_n = 6) %>%
  fmt_symbol_first(column = mpg_h, symbol = "&#37;", suffix = NULL, last_row_n = 6, decimals = 1) %>% 
  fmt_symbol_first(column = hp, symbol = "&#176;", suffix = "F", last_row_n = 6, decimals = NULL, symbol_first = TRUE)

ex_gt_raw <- ex_gt %>% 
  as_raw_html()

# read into rvest, and grab the table body
ex_html_tab <- read_html(ex_gt_raw) %>% 
  html_node("table > tbody") 

test_gt_by_col <- function(col_n, row_first = TRUE, expectation){
  
  # if row_first = TRUE, then just get the 1st row
  # otherwise select the remainder
  if(isTRUE(row_first)){
    row_sel <- 1
  } else {
    row_sel <- 2:6
  }
  
  # use our example html
  # grab the column by number
  # get the rows by selection
  # test the expectation
  ex_html_tab %>% 
    html_nodes(paste0("td:nth-child(",col_n , ")")) %>% 
    html_text() %>% 
    .[row_sel] %>% 
    testthat::expect_match(expectation)
}

Now we can run our tests on the specific columns and get a lot of “praise”! That’s it for now, but maybe we’ll explore putting these tests into a package down the line.

# Test for escaped characters ---------------------------------------------
# check that a suffix + symbol worked, and that escaped characters can be tested
testthat::test_that(
  "Escaped characters work",
  {
    test_gt_by_col(1, expectation = "Ford \\$")
    test_gt_by_col(1, row_first = FALSE, expectation = "Ferrari&nbsp&nbsp")
  }
  )
Test passed 🥳
# Test for raw percent ----------------------------------------------------
# on this column we used the literal string of %
testthat::test_that(
  "Raw percent character works",
  {
    test_gt_by_col(2, expectation = "2017%")
    test_gt_by_col(2, row_first = FALSE, expectation = "201[4-7]&nbsp")
  }
  )
Test passed 😀
# Test for symbolic percent -----------------------------------------------
# on this column we used the HTML code for percent
testthat::test_that(
  "HTML symbol for percent works",
  {
    test_gt_by_col(4, expectation = "20.2%")
    test_gt_by_col(4, row_first = FALSE, expectation = "[0-9]+&nbsp")
  }
  )
Test passed 🌈
# Test for suffix + symbol ------------------------------------------------
# test for case where the symbol is in front of suffix
testthat::test_that(
  "A combined suffix + symbol work",
  {
    test_gt_by_col(5, expectation = "647°F")
    test_gt_by_col(5, row_first = FALSE, expectation = "[0-9]+&nbsp&nbsp")
  }
  )
Test passed 🌈

Citation

For attribution, please cite this work as

Mock (2021, March 7). The Mockup Blog: Creating a custom gt function for aligning first-row text and testing it with testthat. Retrieved from https://themockup.blog/posts/2021-03-07-creating-a-custom-gt-function-for-aligning-first-row-text-and-testing-it-with-testthat/

BibTeX citation

@misc{mock2021creating,
  author = {Mock, Thomas},
  title = {The Mockup Blog: Creating a custom gt function for aligning first-row text and testing it with testthat},
  url = {https://themockup.blog/posts/2021-03-07-creating-a-custom-gt-function-for-aligning-first-row-text-and-testing-it-with-testthat/},
  year = {2021}
}