overview

This document demonstrates the prep + plot + print technique using data from the Canadian Chronic Disease Surveillance System (CCDSS). We follow the phases of transforming a concept ggplot2 plot into a sequence of custom functions that automate graph production. See the README.md for a description of the data.

Another demonstration of this technique was created using data from the VADA 2019 Summer School Data Challenge. See the github.com/andkov/vada-2019-summer-school repository for reproducible scripts. The data, however, are not publicly available. Please contact the VADA program coordinator to enquire about data access.

We will build a system for reproducible graphing in the following sequence of phases:

  • PHASE 0 - explore the data
  • PHASE 1 - build the plot
  • PHASE 2 - build the plot function
  • PHASE 3 - isolate the prep step
  • PHASE 4 - isolate the print step
  • PHASE 5 - serialize graph production
  • PHASE 6 - place graphs onto the canvas

Each phase marks a milestone in the expanding complexity of the visualization system.

disclaimer

mindset

Data scientists describe the ultimate reality about data using various dialects of expression. Each translation has its benefits and disadvantages. We need them all to tell a good story.

No one language is better than another. Each allows for different shades of distinction in model specification.

set the scene

PHASE 0 - explore

inspect-2 categorical

area                         n
Alberota_placeholder

inspect-3 continuous

variable          type  na  na_pct  unique    min        mean          max
rate              dou   31     0.7    1596   0.91        9.87        22.47
rate_cv           dou   31     0.7     394   0.05        1.93        31.62
rate_95_ci_lower  dou   31     0.7    1571   0.74        9.60        22.20
rate_95_ci_upper  dou   31     0.7    1576   0.99       10.17        22.74
number            dou    0     0.0    2902   0.00   105923.92   3617090.00
population        dou    0     0.0    3969  50.00  1025646.57  36691810.00
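
The columns of the table above are simple per-variable summaries. For example, 31 missing values out of 4242 rows gives 31 / 4242, or about 0.7 percent. The code that produced the table is not shown in this report, so the sketch below is only an illustration in base R: the helper name `describe_continuous` and the toy data frame are hypothetical, and the way missing values are counted in `unique` is an assumption.

```r
# Hypothetical helper (not from the report): one summary row per numeric column.
describe_continuous <- function(d){
  rows <- lapply(names(d), function(v){
    x <- d[[v]]
    data.frame(
      variable = v
      ,type    = substr(typeof(x), 1, 3)          # e.g. "dou" for double
      ,na      = sum(is.na(x))                    # count of missing values
      ,na_pct  = round(100 * mean(is.na(x)), 1)   # percent of missing values
      ,unique  = length(unique(x))                # distinct values (NA counts as one)
      ,min     = min(x,  na.rm = TRUE)
      ,mean    = mean(x, na.rm = TRUE)
      ,max     = max(x,  na.rm = TRUE)
      ,stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy data, invented for illustration only
toy <- data.frame(
  rate        = c(1.5, 2.5, NA, 4.0)
  ,population = c(100, 200, 300, 400)
)
summary_toy <- describe_continuous(toy)
print(summary_toy)
```

Applied to the real data, the same helper would be called on the six continuous columns only, e.g. `describe_continuous(ds1[, c("rate", "population")])`.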

PHASE 1 - graph

graph-1 basic

PHASE 2 - make plot

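# attach the packages that supply the pipe operator and the plotting functions
library(magrittr) # %>%
library(ggplot2)  # ggplot(), aes_string(), geom_*(), facet_grid()

# We need our function to offer us a convenient way to:
# 1. Control the order of the columns (and which are displayed)
# 2. Control the order of the rows    (and which are displayed)
# 3. Control the order and aesthetics of the color dimension

# If we were to pack everything into a single function, we would get something like: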
make_plot_1_packed <- function(
  d
  ,measure
){
  d1 <- d
  # d1 <- ds1 # for testing and development
  # create support objects
  order_of_age_groups <- d1 %>% 
    dplyr::distinct(age_group) %>% 
    dplyr::arrange(age_group) %>% # sort the distinct values, as done for `area` below
    as.list() %>% unlist() %>% as.character()
  # make total value to be at the end of the vector
  order_of_age_groups <- c(setdiff(order_of_age_groups,"1+"),"1+")
  
  order_of_areas <- d1 %>% 
    dplyr::distinct(area) %>% 
    dplyr::arrange(area) %>% 
    as.list() %>% unlist() %>% as.character()
  # make total value to be at the beginning of the vector
  order_of_areas <- c("Canada", setdiff(order_of_areas, "Canada")  )
  # to customize the order of levels
  levels_sex <- c("Females", "Males","Both sexes")
  # 
  d1 <- d %>% 
    dplyr::mutate(
      years_since_2000 = year - 2000 # to create a shorter label
      # to enforce the chosen order of the levels:
      ,area            = factor(area,      levels = order_of_areas)
      ,age_group       = factor(age_group, levels = order_of_age_groups)
      ,sex             = factor(sex,       levels = levels_sex)
    )
  # to create custom palettes:
  
  # descriptive tag              # green     # red      # blue
  palette_sex_dark         <- c("#1b9e77", "#d95f02", "#7570b3") #duller than below
  # palette_sex_dark         <- c("#66c2a5", "#fc8d62", "#8da0cb") #brighter than above
  # taken from http://colorbrewer2.org/#type=qualitative&scheme=Dark2&n=3
  pallete_sex_light        <- adjustcolor(palette_sex_dark, alpha.f = .2)
  names(palette_sex_dark)  <- c("Both sexes", "Females", "Males")
  names(pallete_sex_light) <- names(palette_sex_dark) # reuse the names assigned above
  
  g_out <- d1 %>% 
    ggplot(aes_string(
      x      = "years_since_2000"
      ,y     = measure # the column named in the `measure` argument
      ,color = "sex"
    ))+
    geom_line( aes_string(group = "sex") )+
    geom_point()+
    facet_grid(area ~ age_group)+
    scale_color_manual(values = palette_sex_dark)+
    # scale_color_manual(values = pallete_sex_light)+
    theme_minimal()+
    labs( title = "Crude prevalence of MH service utilization in Canada")
  return(g_out)  
}
# how to use
ds1 %>% 
  # to limit the view while in development
  dplyr::filter(age_group %in% c("1-19", "20-34", "35-49", "65-79")) %>%
  dplyr::filter(area %in% c("Canada", "Manitoba", "British Columbia")) %>%
  dplyr::filter(sex %in% c("Males","Females")) %>%
  make_plot_1_packed(measure = "rate")

PHASE 3 - prep data
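
The body of `prep_data_plot_1()` is not shown in this report. The sketch below is an assumption inferred from the "how to use" call in PHASE 4: the argument names (`set_sex`, `set_area`, `set_age_group`) and the `l_support$set` element come from that call, while the `data` element, the default values, and the toy data are hypothetical. It is written in base R so that it runs on its own; the report's own code relies on dplyr.

```r
# Hypothetical sketch of the prep step: filter, order the levels, bundle.
prep_data_plot_1 <- function(
  d
  ,set_sex       = c("Females", "Males")
  ,set_area      = c("Canada")
  ,set_age_group = c("1+")
){
  # keep only the requested levels of each dimension
  keep <- d$sex %in% set_sex & d$area %in% set_area & d$age_group %in% set_age_group
  d1   <- d[keep, , drop = FALSE]
  # the order of each set becomes the order of the factor levels
  d1$sex       <- factor(d1$sex,       levels = set_sex)
  d1$area      <- factor(d1$area,      levels = set_area)
  d1$age_group <- factor(d1$age_group, levels = set_age_group)
  d1$years_since_2000 <- d1$year - 2000 # to create a shorter label
  # bundle the data with the settings that produced it
  l_support <- list(
    data = d1 # assumed element name
    ,set = list(sex = set_sex, area = set_area, age_group = set_age_group)
  )
  return(l_support)
}

# Toy data, invented for illustration only
ds_toy <- expand.grid(
  area       = c("Canada", "Manitoba", "Ontario")
  ,age_group = c("1-19", "1+")
  ,sex       = c("Both sexes", "Females", "Males")
  ,year      = 2000:2001
  ,stringsAsFactors = FALSE
)
ds_toy$rate <- seq_len(nrow(ds_toy)) / 10

l_support <- prep_data_plot_1(
  ds_toy
  ,set_sex       = c("Females", "Males")
  ,set_area      = c("Canada", "Manitoba")
  ,set_age_group = c("1+")
)
str(l_support$set)
```

Keeping the chosen sets inside the returned list is what lets a later step, such as the automatic file name in `print_plot_1()`, describe the graph without being told the settings again.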

PHASE 4 - print plot

print_plot_1 <- function(
  l_support
  ,path_output_folder
  ,prefex     = NA
  ,graph_name = "auto"
  ,...
){
  if( graph_name == "auto" ){
    graph_name <- paste0(
      # should be replaced with features appropriate for analysis
      l_support$measure
      ,"-("
      ,l_support$set$sex %>% paste0(collapse = "-")
      ,")-("
      ,l_support$set$area %>% paste0(collapse = "-")
      ,")-("
      ,l_support$set$age_group %>% paste0(collapse = "-")
      ,")"
      ,collapse = "-"
    )
  }else{
    graph_name <- paste0(l_support$measure,"-", graph_name)
  }
  # add an optional label to distinguish a particular graph (placed before the graph name)
  if( !is.na(prefex) ){ # inserts a PREFEX before the graph name
    (path_save_plot <- paste0(path_output_folder, prefex,"-",graph_name) )
  }else{
    ( path_save_plot <- paste0(path_output_folder, graph_name) )
  }

  # if folder does not exist yet, create it (including any missing parent folders)
  if( !dir.exists(path_output_folder) ){
    dir.create(path_output_folder, recursive = TRUE)
  }
  # print the graphical object using jpeg device
  path_printed_plot <- paste0(path_save_plot, ".jpg")
  jpeg(
    filename = path_printed_plot
    ,...
  )
  l_support$graph %>% print() # reach into the custom object we made for graphing
  dev.off() # close the device
  l_support[["path_plot"]] <- path_printed_plot
  return(l_support)
}
# how to use
l_support <- ds1 %>% 
  prep_data_plot_1(
    set_sex        = c("Females", "Males") 
    # set_sex        = c("Females", "Males", "Both sexes") 
    ,set_area      = c("Canada", "Manitoba", "British Columbia") 
    ,set_age_group = c("1-19",  "35-49", "65-79","1+")
  ) %>% 
  make_plot_1(
    measure = "rate"
  ) %>% 
  print_plot_1(
    path_output_folder = "./analysis/scenario-3/prints/demo-1/"
    # ,prefex            = "attempt1"
    # ,graph_name        = "take1" # `auto` by default
    # options passed through `...` into the jpeg() function
    ,width   = 1700
    ,height  = 700
    ,units   = "px"
    ,quality = 100
    ,res     = 200
  )

PHASE 5 - serialize
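
No code for this phase appears in the report. As a minimal sketch of what serializing could look like, the loop below calls one graph-producing function once per setting and collects the results. Everything here is an assumption: `produce_one_graph()` is a hypothetical stand-in that only builds the graph name, so that the example runs without the data or the plotting functions.

```r
# Hypothetical stand-in for the full chain, so that this sketch runs on its own.
# In the report the body would be the pipeline shown in PHASE 4:
#   ds1 %>% prep_data_plot_1(...) %>% make_plot_1(...) %>% print_plot_1(...)
produce_one_graph <- function(set_area, set_age_group, measure = "rate"){
  graph_name <- paste0(
    measure
    ,"-(", paste0(set_area,      collapse = "-"), ")"
    ,"-(", paste0(set_age_group, collapse = "-"), ")"
  )
  list(
    set         = list(area = set_area, age_group = set_age_group)
    ,measure    = measure
    ,graph_name = graph_name
  )
}

# one graph per province, each shown next to the national total
areas_to_graph <- c("Alberta", "Manitoba", "Ontario")
l_graphs <- list()
for( area_i in areas_to_graph ){
  l_graphs[[area_i]] <- produce_one_graph(
    set_area       = c("Canada", area_i)
    ,set_age_group = c("1-19", "1+")
  )
}
# collect the names of everything that was produced
graph_names <- vapply(l_graphs, function(l) l$graph_name, character(1))
print(graph_names)
```

Because each step returns the same kind of list, the loop body stays a single call no matter how many settings are iterated over.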

future directions

  • How to incorporate an additional condition (e.g. anxiety_mood instead of mental_health)?
  • How to add an additional INTERNAL dimension (e.g. marital_status)?

session information

For the sake of documentation and reproducibility, the current report was rendered in the following environment.

Environment

- Session info -----------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.2 (2018-12-20)
 os       Windows >= 8 x64            
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/Los_Angeles         
 date     2019-06-19                  

- Packages ---------------------------------------------------------------------------------------
 package      * version date       lib source        
 abind          1.4-5   2016-07-21 [1] CRAN (R 3.5.2)
 assertthat     0.2.1   2019-03-21 [1] CRAN (R 3.5.3)
 backports      1.1.4   2019-04-10 [1] CRAN (R 3.5.3)
 callr          3.2.0   2019-03-15 [1] CRAN (R 3.5.3)
 car            3.0-3   2019-05-27 [1] CRAN (R 3.5.3)
 carData        3.0-2   2018-09-30 [1] CRAN (R 3.5.2)
 cellranger     1.1.0   2016-07-27 [1] CRAN (R 3.5.3)
 cli            1.1.0   2019-03-19 [1] CRAN (R 3.5.3)
 colorspace     1.4-1   2019-03-18 [1] CRAN (R 3.5.3)
 crayon         1.3.4   2017-09-16 [1] CRAN (R 3.5.3)
 curl           3.3     2019-01-10 [1] CRAN (R 3.5.3)
 data.table     1.12.2  2019-04-07 [1] CRAN (R 3.5.3)
 desc           1.2.0   2018-05-01 [1] CRAN (R 3.5.3)
 devtools       2.0.2   2019-04-08 [1] CRAN (R 3.5.3)
 dichromat    * 2.0-0   2013-01-24 [1] CRAN (R 3.5.2)
 digest         0.6.19  2019-05-20 [1] CRAN (R 3.5.3)
 dplyr        * 0.8.1   2019-05-14 [1] CRAN (R 3.5.3)
 DT             0.6     2019-05-09 [1] CRAN (R 3.5.3)
 evaluate       0.14    2019-05-28 [1] CRAN (R 3.5.2)
 explore        0.4.2   2019-05-22 [1] CRAN (R 3.5.3)
 extrafont    * 0.17    2014-12-08 [1] CRAN (R 3.5.2)
 extrafontdb    1.0     2012-06-11 [1] CRAN (R 3.5.2)
 fansi          0.4.0   2018-10-05 [1] CRAN (R 3.5.3)
 forcats        0.4.0   2019-02-17 [1] CRAN (R 3.5.3)
 foreign        0.8-71  2018-07-20 [2] CRAN (R 3.5.2)
 fs             1.3.1   2019-05-06 [1] CRAN (R 3.5.3)
 ggplot2      * 3.1.1   2019-04-07 [1] CRAN (R 3.5.3)
 glue           1.3.1   2019-03-12 [1] CRAN (R 3.5.3)
 gridExtra      2.3     2017-09-09 [1] CRAN (R 3.5.3)
 gtable         0.3.0   2019-03-25 [1] CRAN (R 3.5.3)
 haven          2.1.0   2019-02-19 [1] CRAN (R 3.5.3)
 highr          0.8     2019-03-20 [1] CRAN (R 3.5.3)
 hms            0.4.2   2018-03-10 [1] CRAN (R 3.5.3)
 htmltools      0.3.6   2017-04-28 [1] CRAN (R 3.5.3)
 htmlwidgets    1.3     2018-09-30 [1] CRAN (R 3.5.3)
 httpuv         1.5.1   2019-04-05 [1] CRAN (R 3.5.3)
 httr           1.4.0   2018-12-11 [1] CRAN (R 3.5.3)
 jpeg           0.1-8   2014-01-23 [1] CRAN (R 3.5.2)
 kableExtra     1.1.0   2019-03-16 [1] CRAN (R 3.5.3)
 knitr        * 1.23    2019-05-18 [1] CRAN (R 3.5.2)
 labeling       0.3     2014-08-23 [1] CRAN (R 3.5.2)
 later          0.8.0   2019-02-11 [1] CRAN (R 3.5.3)
 lazyeval       0.2.2   2019-03-15 [1] CRAN (R 3.5.3)
 magrittr     * 1.5     2014-11-22 [1] CRAN (R 3.5.3)
 memoise        1.1.0   2017-04-21 [1] CRAN (R 3.5.3)
 mime           0.6     2018-10-05 [1] CRAN (R 3.5.2)
 munsell        0.5.0   2018-06-12 [1] CRAN (R 3.5.3)
 openxlsx       4.1.0.1 2019-05-28 [1] CRAN (R 3.5.3)
 pillar         1.4.1   2019-05-28 [1] CRAN (R 3.5.2)
 pkgbuild       1.0.3   2019-03-20 [1] CRAN (R 3.5.3)
 pkgconfig      2.0.2   2018-08-16 [1] CRAN (R 3.5.3)
 pkgload        1.0.2   2018-10-29 [1] CRAN (R 3.5.3)
 plyr           1.8.4   2016-06-08 [1] CRAN (R 3.5.3)
 prettyunits    1.0.2   2015-07-13 [1] CRAN (R 3.5.3)
 processx       3.3.1   2019-05-08 [1] CRAN (R 3.5.2)
 promises       1.0.1   2018-04-13 [1] CRAN (R 3.5.3)
 ps             1.3.0   2018-12-21 [1] CRAN (R 3.5.3)
 purrr          0.3.2   2019-03-15 [1] CRAN (R 3.5.3)
 R6             2.4.0   2019-02-14 [1] CRAN (R 3.5.3)
 RColorBrewer * 1.1-2   2014-12-07 [1] CRAN (R 3.5.2)
 Rcpp           1.0.1   2019-03-17 [1] CRAN (R 3.5.3)
 readr          1.3.1   2018-12-21 [1] CRAN (R 3.5.3)
 readxl         1.3.1   2019-03-13 [1] CRAN (R 3.5.3)
 remotes        2.0.4   2019-04-10 [1] CRAN (R 3.5.3)
 reshape2       1.4.3   2017-12-11 [1] CRAN (R 3.5.3)
 rio            0.5.16  2018-11-26 [1] CRAN (R 3.5.3)
 rlang          0.3.4   2019-04-07 [1] CRAN (R 3.5.3)
 rmarkdown      1.13    2019-05-22 [1] CRAN (R 3.5.3)
 rprojroot      1.3-2   2018-01-03 [1] CRAN (R 3.5.3)
 rstudioapi     0.10    2019-03-19 [1] CRAN (R 3.5.3)
 Rttf2pt1       1.3.7   2018-06-29 [1] CRAN (R 3.5.2)
 rvest          0.3.4   2019-05-15 [1] CRAN (R 3.5.3)
 scales         1.0.0   2018-08-09 [1] CRAN (R 3.5.3)
 sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 3.5.3)
 shiny          1.3.2   2019-04-22 [1] CRAN (R 3.5.2)
 stringi        1.4.3   2019-03-12 [1] CRAN (R 3.5.3)
 stringr        1.4.0   2019-02-10 [1] CRAN (R 3.5.3)
 testit         0.9     2018-12-05 [1] CRAN (R 3.5.3)
 tibble         2.1.3   2019-06-06 [1] CRAN (R 3.5.3)
 tidyr          0.8.3   2019-03-01 [1] CRAN (R 3.5.3)
 tidyselect     0.2.5   2018-10-11 [1] CRAN (R 3.5.3)
 usethis        1.5.0   2019-04-07 [1] CRAN (R 3.5.3)
 utf8           1.1.4   2018-05-24 [1] CRAN (R 3.5.3)
 vctrs          0.1.0   2018-11-29 [1] CRAN (R 3.5.3)
 viridis      * 0.5.1   2018-03-29 [1] CRAN (R 3.5.3)
 viridisLite  * 0.3.0   2018-02-01 [1] CRAN (R 3.5.3)
 webshot        0.5.1   2018-09-28 [1] CRAN (R 3.5.3)
 withr          2.1.2   2018-03-15 [1] CRAN (R 3.5.3)
 xfun           0.7     2019-05-14 [1] CRAN (R 3.5.3)
 xml2           1.2.0   2018-01-24 [1] CRAN (R 3.5.3)
 xtable         1.8-4   2019-04-21 [1] CRAN (R 3.5.3)
 yaml           2.2.0   2018-07-25 [1] CRAN (R 3.5.2)
 zeallot        0.1.0   2018-01-28 [1] CRAN (R 3.5.3)
 zip            2.0.2   2019-05-13 [1] CRAN (R 3.5.3)

[1] C:/Users/an499583/Documents/R/win-library/3.5
[2] C:/Program Files/R/R-3.5.2/library

Report rendered by an499583 at 2019-06-19, 10:21 -0700 in 22 seconds.