Removing image backgrounds with magick

magick tables

Tables with dark backgrounds deserve transparent logos!

Thomas Mock https://twitter.com/thomas_mock
01-28-2021

The Problem

Colin Welsh reached out on Twitter asking about removing the background from player headshots for use in dark-themed table. He had a bunch of player headshots for the NHL, but they had a white background which he wanted to remove and then embed the headshots in gt.

He thought that {magick} could be used to remove the background, and let’s see what we can do!

If you missed my last blogpost, it has some more details on the {magick} package. In short, {magick} is an R wrapper around the ImageMagick library that is used for image processing.

For another fantastic longer form blogpost, make sure to check out Deemah’s blogpost on Miracles with magick! I adapted some of his examples for the logos at the end. He pointed out rightfully so, that simply replacing ALL white with transparent can have some negative effects (this REALLy is a problem with logos). I’ve gone ahead and rebuilt the examples with that in mind.

Load the Data

We’ll load our libraries and pull in the data of interest, there are a lot of columns but I’ll limit it to a subset later on.

library(tidyverse)
library(gt)
library(magick)

skater_game_score <- read_rds(url('https://github.com/Colinifer/hockey/blob/master/gt_help_dataset.rds?raw=true'))

glimpse(skater_game_score)
Rows: 50
Columns: 26
$ player          <chr> "CONNOR.MCDAVID", "NATHAN.MACKINNON", "MIKK…
$ games           <int> 7, 6, 6, 6, 6, 7, 6, 6, 7, 7, 6, 6, 6, 6, 6…
$ g               <dbl> 4, 2, 5, 1, 5, 4, 3, 3, 4, 3, 0, 1, 3, 2, 4…
$ a1              <dbl> 4, 5, 1, 6, 1, 6, 1, 1, 2, 4, 2, 4, 2, 4, 1…
$ a2              <dbl> 2, 1, 1, 3, 2, 0, 2, 3, 1, 2, 5, 1, 4, 2, 2…
$ pts             <dbl> 10, 8, 7, 10, 8, 10, 6, 7, 7, 9, 7, 6, 9, 8…
$ gs_tot          <dbl> 11.855, 10.630, 10.020, 9.785, 9.690, 9.135…
$ team.x          <chr> "EDM", "COL", "COL", "L.A", "MTL", "TOR", "…
$ gs_avg          <dbl> 1.5067857, 1.7733333, 1.8225000, 1.7329167,…
$ num_last_first  <chr> "97 MCDAVID, CONNOR", "29 MACKINNON, NATHAN…
$ player_team_num <chr> "EDM97", "COL29", "COL96", "L.A11", "MTL73"…
$ position        <chr> "C", "C", "R", "C", "C", "C", "C", "C", "C"…
$ position_type   <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F"…
$ season          <chr> "20202021", "20202021", "20202021", "202020…
$ session         <chr> "R", "R", "R", "R", "R", "R", "R", "R", "R"…
$ team.y          <chr> "EDM", "COL", "COL", "L.A", "MTL", "TOR", "…
$ jersey_number   <dbl> 97, 29, 96, 11, 73, 16, 34, 87, 91, 29, 9, …
$ first_name      <chr> "CONNOR", "NATHAN", "MIKKO", "ANZE", "TYLER…
$ last_name       <chr> "MCDAVID", "MACKINNON", "RANTANEN", "KOPITA…
$ year            <int> 20202021, 20202021, 20202021, 20202021, 202…
$ full_name       <glue> "CONNOR MCDAVID", "NATHAN MACKINNON", "MIK…
$ team_name       <chr> "Oilers", "Avalanche", "Avalanche", "Kings"…
$ player_id       <int> 8478402, 8477492, 8478420, 8471685, 8475726…
$ logo            <chr> "http://content.sportslogos.net/logos/1/12/…
$ headshot_url    <glue> "https://cms.nhl.bamgrid.com/images/headsh…
$ action_shot_url <glue> "https://cms.nhl.bamgrid.com/images/action…

Initial Table

We can quickly convert this into gt table like so. Look pretty good, and the player headshots look fine as well.

skater_game_score %>% 
  slice(1:10) %>%
  select(player, headshot_url, games:pts)%>% 
  gt() %>% 
  text_transform(
    locations = cells_body(vars(headshot_url)),
    fn = function(x){
      web_image(url = x)
    }
  )
player headshot_url games g a1 a2 pts
CONNOR.MCDAVID 7 4 4 2 10
NATHAN.MACKINNON 6 2 5 1 8
MIKKO.RANTANEN 6 5 1 1 7
ANZE.KOPITAR 6 1 6 3 10
TYLER.TOFFOLI 6 5 1 2 8
MITCH.MARNER 7 4 6 0 10
AUSTON.MATTHEWS 6 3 1 2 6
SIDNEY.CROSBY 6 3 1 3 7
JOHN.TAVARES 7 4 2 1 7
LEON.DRAISAITL 7 3 4 2 9

The real problem here is that Colin was interested in using a black background for his table. Let’s see what that looks like. Here we can see that the player background adds a lot of unnecessary white to our otherwise nice looking black table.

skater_game_score %>% 
  slice(1:10) %>%
  select(player, headshot_url, games:pts)%>% 
  gt() %>% 
  text_transform(
    locations = cells_body(vars(headshot_url)),
    fn = function(x){
      web_image(url = x)
    }
  ) %>% 
  tab_options(
    table.background.color = "black"
  )
player headshot_url games g a1 a2 pts
CONNOR.MCDAVID 7 4 4 2 10
NATHAN.MACKINNON 6 2 5 1 8
MIKKO.RANTANEN 6 5 1 1 7
ANZE.KOPITAR 6 1 6 3 10
TYLER.TOFFOLI 6 5 1 2 8
MITCH.MARNER 7 4 6 0 10
AUSTON.MATTHEWS 6 3 1 2 6
SIDNEY.CROSBY 6 3 1 3 7
JOHN.TAVARES 7 4 2 1 7
LEON.DRAISAITL 7 3 4 2 9

Clean the images

So our goal is to remove the white background and turn it transparent. There are 3 steps here:

Extract the image name

We have a url that points at a player headshot. An example is:

https://cms.nhl.bamgrid.com/images/headshots/current/168x168/8478402.jpg

Which returns:

Now, we only need the image name, not the URL. We can remove the extra “fluff” around the image name with regex + stringr::str_replace() or base::gsub().

Now… if you’re anything like me, the below regex kind of looks like gibberish.

ex_url <- "https://cms.nhl.bamgrid.com/images/headshots/current/168x168/8478402.jpg"

str_replace(ex_url, ".*[/]([^.]+)[.].*", "\\1")
[1] "8478402"

If you want to see some explanations for the regex, see the details tag below.

Fun aside on regex

This regex code: .*[/]([^.]+)[.].* gives us the following explanation at regex101.com

We can see the capture group via str_match(), this will separate out the full match from the capture group.

str_match(ex_url, ".*[/]([^.]+)[.].*")
     [,1]                                                                      
[1,] "https://cms.nhl.bamgrid.com/images/headshots/current/168x168/8478402.jpg"
     [,2]     
[1,] "8478402"

Note that this could actually be much simpler to fix…just remove the static portions!

ex_url %>% 
  str_remove("https://cms.nhl.bamgrid.com/images/headshots/current/168x168/") %>% 
  str_remove(".jpg")
[1] "8478402"

Trim the image and remove background

We can now trim the image if necessary and turn the background “white” into transparent. Note we’re using image_fill() here instead of simply image_background() which would replace ALL the white in the image with transparency.

# clean image and write to disk
clean_img_transparent <- function(img_url, trim = FALSE){
  
  # find the name of the img and extract it
  img_name <- str_replace(img_url, ".*[/]([^.]+)[.].*", "\\1")
  
  # some images need to be trimmed
  trim_area <- if(isTRUE(trim)){
    geometry_area(0, 0, 0, 10)
  } else {
    geometry_area(0, 0, 0, 0)
  }
  
  img_url %>% 
    image_read() %>% 
    image_crop(geometry = trim_area) %>% 
    image_fill(
      color = "transparent", 
      refcolor = "white", 
      fuzz = 4,
      point = "+1+1" # start at top left 1 pixel in
      ) 
}

We can test the function with and without trimming, then stack them next to each other. I will use image_ggplot() to “show” the image in this RMarkdown blog, but interactively you could remove that as it prints to the R Console.

img_ex <- clean_img_transparent(ex_url)
img_ex_trim <- clean_img_transparent(ex_url, trim = TRUE) 

c(img_ex, img_ex_trim) %>% 
  image_append() %>% 
  image_ggplot()

This looks great, but the for the last portion we’ll need to write to disk. It doesn’t really “look” any different, but the whitepspace around the player image is now “transparent”.

We could see this more clearly by replacing with a different color. Note that again if you just used image_transparent() as I did in a previous version of this post you’d “lose” some white details in the player’s jersey/sweater.

ex_url %>% 
    image_read() %>% 
    image_crop(geometry = geometry_area(0, 0, 0, 10)) %>% 
    image_fill(
      color = "green", 
      refcolor = "white", 
      fuzz = 4,
      point = "+1+1" # start at top left 1 pixel in
      ) %>% 
  image_ggplot()

Write to disk

We can now use image_write() to write the image to disk so it can be used with gt::local_image(). We’ll add that to our function.

# clean image and write to disk
clean_img_transparent <- function(img_url, trim = FALSE){
  
  # find the name of the img and extract it
  img_name <- str_replace(img_url, ".*[/]([^.]+)[.].*", "\\1")
  
  # some images need to be trimmed
  trim_area <- if(isTRUE(trim)){
    geometry_area(0, 0, 0, 10)
  } else {
    geometry_area(0, 0, 0, 0)
  }
  
  img_url %>% 
    image_read() %>% 
    image_crop(geometry = trim_area) %>% 
    image_fill(
      color = "transparent", 
      refcolor = "white", 
      fuzz = 4,
      point = "+1+1"
      ) %>% 
    image_write(path = paste0(img_name, ".png"), format = "png")
}

All together now

Now we need to get the top 10 players, grab the headshot_url column, and then remove any missing images (na.jpg), trim the first image, and then write out to disk.

skater_game_score <- read_rds(url('https://github.com/Colinifer/hockey/blob/master/gt_help_dataset.rds?raw=true'))

skater_include <- skater_game_score %>% 
  slice(1:10) %>%
  select(img_url = headshot_url) %>% 
  filter(str_detect(img_url, pattern = "NA.jpg", negate = TRUE)) %>% 
  mutate(trim = c(TRUE, rep(FALSE, 9)))

skater_include
# A tibble: 10 x 2
   img_url                                                       trim 
   <glue>                                                        <lgl>
 1 https://cms.nhl.bamgrid.com/images/headshots/current/168x168… TRUE 
 2 https://cms.nhl.bamgrid.com/images/headshots/current/168x168… FALSE
 3 https://cms.nhl.bamgrid.com/images/headshots/current/168x168… FALSE
 4 https://cms.nhl.bamgrid.com/images/headshots/current/168x168… FALSE
 5 https://cms.nhl.bamgrid.com/images/headshots/current/168x168… FALSE
 6 https://cms.nhl.bamgrid.com/images/headshots/current/168x168… FALSE
 7 https://cms.nhl.bamgrid.com/images/headshots/current/168x168… FALSE
 8 https://cms.nhl.bamgrid.com/images/headshots/current/168x168… FALSE
 9 https://cms.nhl.bamgrid.com/images/headshots/current/168x168… FALSE
10 https://cms.nhl.bamgrid.com/images/headshots/current/168x168… FALSE

I’ll use pwalk() to pass the urls + trim argument to clean_img_transparent() for each of the combos.

skater_include%>% 
  pwalk(clean_img_transparent)

We can then read them back in and apply to a gt table, and boom we’re done!

skater_game_score %>% 
  slice(1:10) %>%
  mutate(
    img_name = str_replace(headshot_url, ".*[/]([^.]+)[.].*", "\\1"),
    img_name = paste0(img_name, ".png"),
    img_name = map(img_name, local_image),
    img_name = map(img_name, ~html(as.character(.x)))
    ) %>% 
  select(player, img_name, games:pts) %>% 
  gt() %>% 
  tab_options(
    table.background.color = "black"
  ) 
player img_name games g a1 a2 pts
CONNOR.MCDAVID 7 4 4 2 10
NATHAN.MACKINNON 6 2 5 1 8
MIKKO.RANTANEN 6 5 1 1 7
ANZE.KOPITAR 6 1 6 3 10
TYLER.TOFFOLI 6 5 1 2 8
MITCH.MARNER 7 4 6 0 10
AUSTON.MATTHEWS 6 3 1 2 6
SIDNEY.CROSBY 6 3 1 3 7
JOHN.TAVARES 7 4 2 1 7
LEON.DRAISAITL 7 3 4 2 9

What about the logos?

Now, the logos present another problem, they have large whitespace areas INSIDE the logos themselves… so if you were to apply a global transparency they would end up transparent INSIDE the logos where they should be white.

Again, for another fantastic longer form blogpost, make sure to check out Deemah’s blogpost on Miracles with magick! I looked through some of his examples, although I used a slightly different workflow here for the logos. Another ImageMagick proper blogpost from Nate Murray was also very useful.

So here, we’ll need to convert some of our code, but it will still turn out pretty good! Again please note, that I’m using image_ggplot() just to show it in the blogpost, and interactively you can just return the output to the RStudio viewer.

Logo Fill

We can use a logo with a lot of whitespace, to play around with.

logo_url  <- "http://content.sportslogos.net/logos/1/16/thumbs/124.gif"                     

raw_logo <- logo_url %>%
  image_read() 

raw_logo %>%
  image_ggplot()

For our steps we’re going to fill (basically flood) in the white space around the logo with green. Now because some logos (like the Avalanche) have areas that can’t get flooded with color in one path, I’m going to flood at each corner (top left, top right, bottom left, bottom right). For our real usage, we’re going to convert this “green” space to transparent instead.

img_filled <- raw_logo %>% 
    image_fill("green", "+1+1", fuzz = 50, refcolor = "white") %>% 
    image_fill("green", "+140+1", fuzz = 50, refcolor = "white") %>% 
    image_fill("green", "+1+99", fuzz = 50, refcolor = "white") %>% 
    image_fill("green", "+140+99", fuzz = 50, refcolor = "white")

img_filled %>% 
  image_ggplot()
img_filled <- raw_logo %>% 
    image_fill("transparent", "+1+1", fuzz = 50, refcolor = "white") %>% 
    image_fill("transparent", "+140+1", fuzz = 50, refcolor = "white") %>% 
    image_fill("transparent", "+1+99", fuzz = 50, refcolor = "white") %>% 
    image_fill("transparent", "+140+99", fuzz = 50, refcolor = "white")

Logo edges and mask

Now that we have a transparency around the logo, we can take that ‘opacity’ channel and extract everything but that.

img_filled %>% 
    image_channel("Opacity") %>% 
    image_convert(matte=FALSE) %>% 
  image_ggplot()

Now to create a proper “mask” that we can apply to the image, we can negate this and apply a gentle blur to make the edges not as “sharp” against the background.

logo_mask <- img_filled %>% 
    image_channel("Opacity") %>% 
    image_convert(matte=FALSE) %>% 
    image_negate() %>% 
    image_blur()

logo_mask %>% 
  image_ggplot()

This looks great as a mask! Note, that you can’t really “see” the mask here since it’s really just affecting the whitespace around the logo.

image_composite(raw_logo, logo_mask, operator = "CopyOpacity") %>% 
  image_ggplot()

Function applied

We can convert this to a function just like we did above. We’re getting the name, then reading in the image from a url, applying our fills, converting to transparent, flipping the image as a mask, and then applying our blur. Once we apply the mask we’ll write it back to disk.

clean_logo_transparent <- function(img_url) {
  
  # find the name of the img and extract it
  img_name <- str_replace(img_url, ".*[/]([^.]+)[.].*", "\\1")

  raw_img <- img_url %>%
    image_read() %>% 
    image_convert("PNG")
  
  img_mask <- raw_img  %>% 
    image_fill("transparent", "+1+1", fuzz = 2, refcolor = "white") %>% 
    image_fill("transparent", "+1+99", fuzz = 2, refcolor = "white") %>% 
    image_fill("transparent", "+140+1", fuzz = 2, refcolor = "white") %>% 
    image_fill("transparent", "+140+99", fuzz = 2, refcolor = "white") %>% 
    image_channel("Opacity") %>%
    image_convert(matte=FALSE) %>%
    image_negate() %>%
    image_blur()
  
  
  image_composite(raw_img, img_mask, operator = "CopyOpacity") %>%
    image_write(paste0(img_name, ".png"))
}

Once again, we can use purrr::pwalk() to write out the images to disk in bulk.

skater_game_score %>% 
  slice(1:10) %>%
  select(img_url = logo) %>% 
  filter(str_detect(img_url, pattern = "NA.jpg", negate = TRUE)) %>% 
  pwalk(clean_logo_transparent)

Put in a table

Our code here is again just a repeat of what we did above. This turns out remarkably nice!

skater_game_score %>% 
  slice(1:10) %>%
  mutate(
    img_name = str_replace(logo, ".*[/]([^.]+)[.].*", "\\1"),
    img_name = paste0(img_name, ".png"),
    img_name = map(img_name, local_image),
    img_name = map(img_name, ~html(as.character(.x)))
  ) %>% 
  select(player, img_name, games:pts) %>% 
  gt() %>% 
  tab_options(
    table.background.color = "black"
  ) 
player img_name games g a1 a2 pts
CONNOR.MCDAVID 7 4 4 2 10
NATHAN.MACKINNON 6 2 5 1 8
MIKKO.RANTANEN 6 5 1 1 7
ANZE.KOPITAR 6 1 6 3 10
TYLER.TOFFOLI 6 5 1 2 8
MITCH.MARNER 7 4 6 0 10
AUSTON.MATTHEWS 6 3 1 2 6
SIDNEY.CROSBY 6 3 1 3 7
JOHN.TAVARES 7 4 2 1 7
LEON.DRAISAITL 7 3 4 2 9

Logos without Mask

OK so one more spoiler, we don’t HAVE to create a mask, but it could be useful in the future! We can just flood the area with transparency and write it out.

Still fun to play around with various techniques!

clean_logo_transparent <- function(img_url) {
  
  # find the name of the img and extract it
  img_name <- str_replace(img_url, ".*[/]([^.]+)[.].*", "\\1")

  raw_img <- img_url %>%
    image_read() %>% 
    image_convert("PNG")
  
  img_out <- raw_img  %>% 
    image_fill("transparent", "+1+1", fuzz = 2, refcolor = "white") %>% 
    image_fill("transparent", "+1+99", fuzz = 2, refcolor = "white") %>% 
    image_fill("transparent", "+140+1", fuzz = 2, refcolor = "white") %>% 
    image_fill("transparent", "+140+99", fuzz = 2, refcolor = "white")
  
  
  img_out %>% 
    image_write(paste0(img_name, ".png"))
}

Once again, we can use purrr::pwalk() to write out the images to disk in bulk.

skater_game_score %>% 
  slice(1:10) %>%
  select(img_url = logo) %>% 
  filter(str_detect(img_url, pattern = "NA.jpg", negate = TRUE)) %>% 
  pwalk(clean_logo_transparent)

Put in a table

Our code here is again just a repeat of what we did above. This turns out remarkably nice!

skater_game_score %>% 
  slice(1:10) %>%
  mutate(
    img_name = str_replace(logo, ".*[/]([^.]+)[.].*", "\\1"),
    img_name = paste0(img_name, ".png"),
    img_name = map(img_name, local_image),
    img_name = map(img_name, ~html(as.character(.x)))
  ) %>% 
  select(player, img_name, games:pts) %>% 
  gt() %>% 
  tab_options(
    table.background.color = "black"
  ) 
player img_name games g a1 a2 pts
CONNOR.MCDAVID 7 4 4 2 10
NATHAN.MACKINNON 6 2 5 1 8
MIKKO.RANTANEN 6 5 1 1 7
ANZE.KOPITAR 6 1 6 3 10
TYLER.TOFFOLI 6 5 1 2 8
MITCH.MARNER 7 4 6 0 10
AUSTON.MATTHEWS 6 3 1 2 6
SIDNEY.CROSBY 6 3 1 3 7
JOHN.TAVARES 7 4 2 1 7
LEON.DRAISAITL 7 3 4 2 9

Thanks to Colin for sharing this problem, and for Deemah pointing me to his blogpost!

Citation

For attribution, please cite this work as

Mock (2021, Jan. 28). The Mockup Blog: Removing image backgrounds with magick. Retrieved from https://themockup.blog/posts/2021-01-28-removing-image-backgrounds-with-magick/

BibTeX citation

@misc{mock2021removing,
  author = {Mock, Thomas},
  title = {The Mockup Blog: Removing image backgrounds with magick},
  url = {https://themockup.blog/posts/2021-01-28-removing-image-backgrounds-with-magick/},
  year = {2021}
}