overview

This document narrates the demonstration of the (prep + plot + print) technique using the data from Canadian Chronic Disease Survaillance System (CCDSS). We follow the phases of transforming a concept ggplot2 plot into a sequence of custom functions that automate graph production. View data description on this README.md.

Another demonstration of this technique was created data from VADA 2019 Summer School Data Challenge. See github.com/andkov/vada-2019-summer-school repository for reproducible scripts. The data, however, is not available publically. Please contact VADA program coordinator to enquire about data access.

We will proceed building a system for reproducible graphing in the following sequence of phases:

PHASE 0 - explore the data
PHASE 1 - build the plot
PHASE 2 - build the plot function
PHASE 3 - isolate the prep step
PHASE 4 - isolate the print step
PHASE 5 - serialize graph production
PHASE 6 - place graphs onto the canvas

Each phase marks a milestone in the expanding complexity of the visualization system.

disclaimer

This essay is more about making graphs than about analyzing data
This essay is more about demonstrating a technique than about prescribing a framework. (“Frameworks are like toothbrushes: everyone’s got one and no-one wants to use anyone else’s”)

mindset

Data scientists describe the ultimate reality about data using various dialets of expression. Each translation has its benefits and disadvantages. We need them all to tell a good story.

No one language is better than the other. Each allows for different shades of distinction in model specification.

set the scene

load packages

# Attach these packages so their functions don't need to be qualified
library(magrittr)          # enables piping : %>% 
library(dplyr)             # data wrangling
library(knitr)             # tables
library(ggplot2)           # graphs
library(viridis)           # for colorblind friendly color palettes
requireNamespace("dplyr")  # data wrangling
requireNamespace("readr")  # for data input
requireNamespace("tidyr")  # for data manipulation
requireNamespace("testit") # for asserting conditions meet expected patterns
requireNamespace("car")    # for its `recode()` function

load sources

# Call `base::source()` on any repo file that defines external functions needed in this report. 
# Ideally, no real operations are performed in these scripts.
base::source("./scripts/common-functions.R") # used in multiple reports
base::source("./scripts/graphing/graph-presets.R") # fonts, colors, themes

declare globals

# location of the data sources
path_input  <- "./data-public/raw/scenario-3/PHAC_Infobase_CCDSS_-8586421808967913043.csv" # mental illness
# path_input   <- "./data-public/raw/scenario-3/PHAC_Infobase_CCDSS_-8586421810172823624.csv" # anxiety & mood
output_format = "html"

load data

View ./data-unshared/contents.md for data dictionary and the retrieval specifications.

# see ./data-unshared/contents.md for origin of the data
ds0 <- path_input %>% readr::read_csv(skip = 3) %>% tibble::as_tibble()
ds0 %>% dplyr::glimpse(60)

Observations: 4,257
Variables: 12
$ Area                <chr> "Canada", "Canada", "Canada...
$ Condition           <chr> "Use of health services for...
$ `Age Group`         <chr> "'1+'", "'1-19'", "'20-34'"...
$ Sex                 <chr> "Both sexes", "Both sexes",...
$ Year                <dbl> 2000, 2000, 2000, 2000, 200...
$ Rate                <dbl> 10.71, 3.04, 11.11, 14.29, ...
$ `Rate CV`           <dbl> 0.05, 0.21, 0.12, 0.09, 0.1...
$ `Rate_95%_CI_Lower` <dbl> 10.70, 3.03, 11.08, 14.26, ...
$ `Rate_95%_CI_Upper` <dbl> 10.72, 3.05, 11.13, 14.31, ...
$ Number              <dbl> 3364580, 236410, 723770, 11...
$ Population          <dbl> 31412970, 7781830, 6517490,...
$ X12                 <lgl> NA, NA, NA, NA, NA, NA, NA,...

PHASE 0 - explore

tweak data

# to trace the tweaks
ds1 <- ds0 # start with the identical copy
# to systematize variable name for easier remembering and typing
names(ds1) <- gsub(" ","_", names(ds1)) # to make into a single word (all word chars)
names(ds1) <- gsub("%","", names(ds1))  # to remove special characters
names(ds1) <- tolower(names(ds1))       # to standardize spellining

ds1 <- ds1 %>% dplyr::select(-x12) # to remove an empty column

ds1 <- ds1 %>% 
  dplyr::mutate(
    age_group = gsub("'","",age_group) # to remove extra set of quotes
  ) %>% 
  dplyr::filter(!is.na(condition)) # to remove notes at the end of the spreadsheet
ds1 %>% dplyr::glimpse(60)

Observations: 4,242
Variables: 11
$ area             <chr> "Canada", "Canada", "Canada", ...
$ condition        <chr> "Use of health services for mo...
$ age_group        <chr> "1+", "1-19", "20-34", "35-49"...
$ sex              <chr> "Both sexes", "Both sexes", "B...
$ year             <dbl> 2000, 2000, 2000, 2000, 2000, ...
$ rate             <dbl> 10.71, 3.04, 11.11, 14.29, 14....
$ rate_cv          <dbl> 0.05, 0.21, 0.12, 0.09, 0.12, ...
$ rate_95_ci_lower <dbl> 10.70, 3.03, 11.08, 14.26, 14....
$ rate_95_ci_upper <dbl> 10.72, 3.05, 11.13, 14.31, 14....
$ number           <dbl> 3364580, 236410, 723770, 11277...
$ population       <dbl> 31412970, 7781830, 6517490, 78...

inspect-1 raw

# to demonstrate the principle of custom functions
print_distinct <- function(
  d
  ,group_by_variables
){
  # define values needed for testing and development inside the function:
  # d <- ds0
  # group_by_variables <- "area"
  # group_by_variables <- c("area","age_group")
  
  d_out <- d %>%
    dplyr::group_by(.dots = c(group_by_variables) ) %>%
    dplyr::count() 
  
  return(d_out)
}
# How to use
ds1 %>% print_distinct(group_by_variables = "area")

# A tibble: 13 x 2
# Groups:   area [13]
   area                          n
   <chr>                     <int>
 1 Alberta                     336
 2 British Columbia            336
 3 Canada                      336
 4 Manitoba                    336
 5 New Brunswick               336
 6 Newfoundland and Labrador   336
 7 Northwest Territories       336
 8 Nova Scotia                 336
 9 Nunavut                     231
10 Ontario                     336
11 Prince Edward Island        336
12 Quebec                      336
13 Saskatchewan                315

# to demonstrate how to vary the input (values for the argument of the function)
# ds1 %>% print_distinct(group_by_variables = "condition")
# ds1 %>% print_distinct(group_by_variables = "age_group")
# ds1 %>% print_distinct(group_by_variables = c("area","age_group") )
# TODO: instead of `n` compute the average of a specific variable

declare components

# to help remember how each variable could be used during serial application
varnames_categorical <- c("area","condition", "age_group", "sex","year")
varnames_continuous  <- c(
  "rate", "rate_cv", "rate_95_ci_lower","rate_95_ci_upper", "number","population"
)
# to help with mapping data space into the visualization space
varnames_rate     <- c("rate", "rate_cv","rate_95_ci_lower","rate_95_ci_upper")
varnames_measure  <- c(varnames_rate,"number","population" )
varnames_time     <- c("year")
varnames_design   <- c("area","age_group","sex", "condition")

inspect-2 categorical

# to explore the scles of the CATEGORICAL variables
for(i in varnames_categorical){
  ds1 %>% 
    print_distinct( group_by_variables = i ) %>%
    neat() %>% # to apply custom style for html canvas
    print()    # because inside the loop
}

area	n
Alberta	336
British Columbia	336
Canada	336
Manitoba	336
New Brunswick	336
Newfoundland and Labrador	336
Northwest Territories	336
Nova Scotia	336
Nunavut	231
Ontario	336
Prince Edward Island	336
Quebec	336
Saskatchewan	315

condition	n
Use of health services for mood and anxiety disorders (annual)	4242

age_group	n
1-19	606
1+	606
20-34	606
35-49	606
50-64	606
65-79	606
80+	606

sex	n
Both sexes	1414
Females	1414
Males	1414

year	n
2000	252
2001	252
2002	252
2003	252
2004	252
2005	273
2006	273
2007	273
2008	273
2009	273
2010	273
2011	273
2012	273
2013	273
2014	273
2015	252

inspect-3 continuous

# to explore the scales of CONTINUOUS variables
ds1 %>% 
  dplyr::select_(.dots = varnames_continuous) %>% 
  explore::describe() %>% neat()

variable	type	na	na_pct	unique	min	mean	max
rate	dou	31	0.7	1596	0.91	9.87	22.47
rate_cv	dou	31	0.7	394	0.05	1.93	31.62
rate_95_ci_lower	dou	31	0.7	1571	0.74	9.60	22.20
rate_95_ci_upper	dou	31	0.7	1576	0.99	10.17	22.74
number	dou	0	0.0	2902	0.00	105923.92	3617090.00
population	dou	0	0.0	3969	50.00	1025646.57	36691810.00

PHASE 1 - graph

sketch the space

# now let us find ways to look at/in/with data.
# the above analysis helps us to conceptualize available variables as:

#  I - DATA space

## MEASURE - rate       # crude rate (includes cv, ci_upper, ci_lower) 
## MEASURE - number     # count of cases of the condition
## MEASURE - population # total alive
## TIME    - year       # fiscal
## DESIGN  - area       ( 12 provinces + 1 total )
## DESIGN  - age_group  ( 6 gropus + 1 total )
## DESIGN  - sex        ( 2 gropus + 1 total )
## DESIGN  - condition  ( 1 )

#  II - VISUALIZATION space 

## INNER - horizontal - TIME    - (year) 
## INNER - vertical   - MEASURE - (rate)  
## INNER - color      - DESIGN  - (sex)
## OUTER - horizontal - DESIGN  - (area)
## OUTER - vertial    - DESIGN  - (age_group)

graph-1 basic

# let us sketch the most basic graph in 3 internal dimensions of our blueprint
# notice that we isolate a single value on all other dimensions
g1 <- ds1 %>% 
  dplyr::filter(area       ==  "British Columbia" ) %>% 
  dplyr::filter(age_group  ==  "20-34" ) %>%
  dplyr::filter(sex       %in% c("Males","Females") ) %>% 
  ggplot(aes(
    x      = year
    ,y     = rate
    ,color = sex
  ))+
  geom_point()+
  geom_line( aes(group = sex) )+
  theme_minimal()+
  labs( title = "Crude prevalence of MH service utilization in BC among 20-34 year olds")
g1

# to demonstrate how we can enhance the effectiveness of information display
g1a <- ds1 %>% 
  dplyr::filter(area       ==  "British Columbia" ) %>% 
  dplyr::filter(age_group  ==  "20-34" ) %>%
  dplyr::filter(sex       %in% c("Males","Females") ) %>% 
  ggplot(aes(
    x      = year
    ,y     = rate
    ,fill  = sex # maps onto a different aesthetic than `color = ` 
  ))+
  geom_smooth(method = "lm", se = F, size=1, aes(color=sex))+ # to show linear trend
  geom_line(aes(group = sex), alpha = .5 )+                   # to minimize emphasis
  geom_point(shape = 21, color="black", size = 3)+            # to minimize ink
  scale_fill_viridis_d( end=.85, option="plasma",alpha = .6)+ # colorblind-friendly
  scale_color_viridis_d(end=.85, option="plasma",alpha = .6)+ # colorblind-friendly
  theme_minimal()+
  labs( title = "Crude prevalence of MH service utilization in BC among 20-34 year olds")
g1a

# AESTHETIC ASIDE: the following values seems to work well in this context
# scale_color_viridis_d(end=.75)+
# scale_color_viridis_d(end=.85, option="plasma")+
# scale_color_viridis_d(end=.70, option="magma")+
# scale_color_viridis_d(end=.80, option="inferno")+

graph-2 wrap

# now we can introduce external dimensions 
# notice we remove the filter and feed `area` to ggplot2::facet_wrap()
g2a <- ds1 %>% 
  # dplyr::filter(area       ==  "British Columbia" ) %>% # to enable faceting
  dplyr::filter(age_group  ==  "20-34" ) %>%
  dplyr::filter(sex       %in% c("Males","Females") ) %>% 
  ggplot(aes(
    x      = year
    ,y     = rate
    ,color = sex
  ))+
  geom_line(aes(group= sex))+
  geom_point()+
  facet_wrap("area")+ # we can becasue we removed filter
  scale_color_viridis_d(end=.70, option="magma")+ 
  theme_minimal()+
  labs( title = "Crude prevalence of MH service utilization in Canada among 20-34 year olds")
g2a

# see more about the viridis package: 
# https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html

# now let us feed `age_group` to ggplot2::facet_wrap()
g2b <- ds1 %>% 
  dplyr::filter(area       ==  "British Columbia" ) %>%
  # dplyr::filter(age_group  ==  "20-34" ) %>%
  dplyr::filter(sex       %in% c("Males","Females") ) %>% 
  ggplot(aes(
    x      = year
    ,y     = rate
    ,color = sex  
  ))+
  geom_point()+ 
  geom_line(aes(group= sex) )+
  facet_wrap("age_group")+ # to change what we facet on
  scale_color_viridis_d(end=.70, option="magma")+ 
  theme_minimal()+
  labs( title = "Crude prevalence of MH service utilization in British Columbia")
g2b

graph-2 grid

# now let facet_grid on two demensions
g2c <- ds1 %>% 
  dplyr::mutate(
    years_since_2000 = year - 2000 # for shorter axis labels
  ) %>% 
  # dplyr::filter(area       ==  "British Columbia" ) %>%
  # dplyr::filter(age_group  ==  "20-34" ) %>%
  dplyr::filter(sex       %in% c("Males","Females") ) %>% 
  ggplot(aes(
    x      = years_since_2000 
    ,y     = rate
    ,color = sex
  ))+
  geom_point()+
  geom_line(aes(group=sex))+
  facet_grid(age_group ~ area)+ # new
  scale_color_viridis_d(end=.70, option="magma")+ 
  theme_minimal()+
  theme(legend.position = "bottom")+
  labs( title = "Crude prevalence of MH service utilization in Canada")
g2c

g2d <- ds1 %>% 
  dplyr::mutate(
    years_since_2000 = year - 2000 # for shorter axis labels
  ) %>% 
  # dplyr::filter(area       ==  "British Columbia" ) %>%
  # dplyr::filter(age_group  ==  "20-34" ) %>%
  dplyr::filter(sex       %in% c("Males","Females") ) %>% 
  ggplot(aes(
    x      = years_since_2000  
    ,y     = rate
    ,color  = sex
  ))+
  geom_line(aes(group=sex))+
  geom_point()+
  facet_grid(area ~ age_group)+ # change faceting
  scale_color_viridis_d(end=.70, option="magma")+ 
  theme_minimal()+
  labs( title = "Crude prevalence of MH service utilization in Canada")
g2d

PHASE 2 - make plot

# suppose, we have settled on the graphical form `g2d` (immediately above)
g2d <- ds1 %>% 
  dplyr::mutate(
    years_since_2000 = year - 2000
  ) %>% 
  dplyr::filter(sex       %in% c("Males","Females") ) %>% 
  ggplot(aes(
    x      = years_since_2000 # new
    ,y     = rate
    ,color  = sex
  ))+
  geom_point()+
  geom_line(aes(group=sex))+
  facet_grid(area ~ age_group)+
  scale_color_viridis_d(end=.70, option="magma")+ 
  theme_minimal()+
  labs( title = "Crude prevalence of MH service utilization in Canada")

# now let us re-express this plot as a custom function
make_plot_1_basic <- function(
  d
  ,measure = "rate"
){
  d1 <- d %>% 
    dplyr::mutate(
      years_since_2000 = year - 2000 # to create a shorter axis label
    )
  
  g_out <- d1 %>% 
    ggplot(aes_string(
       x      = "years_since_2000"
       ,y     = measure
       ,color  = "sex"
    ))+
    geom_point()+
    geom_line(aes(group=sex))+
    facet_grid(area ~ age_group)+
    scale_color_viridis_d(end=.70, option="magma")+ 
    theme_minimal()+
    labs( title = "Crude prevalence of MH service utilization in Canada")
  return(g_out)  
}
# how to use:
ds1 %>% 
  dplyr::filter(sex       %in% c("Males","Females") ) %>% 
  dplyr::filter(age_group %in% c("1-19", "20-34", "35-49", "65-79")) %>%
  dplyr::filter(area %in% c("Canada", "Manitoba", "British Columbia")) %>%
  # notice that we keep operations on the data outside of the function definition
  make_plot_1_basic(measure = "rate")

# We need our function to offer us a convenient way to:
# 1. Control the order of the columns (and which are displayed)
# 2. Control the order of the rows    (and which are displayed)
# 3. Control the order and aesthetics of the color dimention

# if we were to pack everything into a single function we would get something like:
make_plot_1_packed <- function(
  d
  ,measure
){
  d1 <- d
  # d1 <- ds1 # for testing and development
  # create support objects
  order_of_age_groups <- d1 %>% 
    dplyr::arrange() %>% 
    dplyr::distinct(age_group) %>% 
    as.list() %>% unlist() %>% as.character()
  # make total value to be at the end of the vector
  order_of_age_groups <- c(setdiff(order_of_age_groups,"1+"),"1+")
  
  order_of_areas <- d1 %>% 
    dplyr::distinct(area) %>% 
    dplyr::arrange(area) %>% 
    as.list() %>% unlist() %>% as.character()
  # make total value to be at the beginning of the vector
  order_of_areas <- c("Canada", setdiff(order_of_areas, "Canada")  )
  # to customize the order of levels
  levels_sex <- c("Females", "Males","Both sexes")
  # 
  d1 <- d %>% 
    dplyr::mutate(
      years_since_2000 = year - 2000 # to create a shorter label
      # to enforce the chosen order of the levels:
      ,area            = factor(area,      levels = order_of_areas)
      ,age_group       = factor(age_group, levels = order_of_age_groups)
      ,levels_sex      = factor(sex,       levels = levels_sex)
    )
  # to create custom pallets:
  
  # descriptive tag              # green     # red      # blue
  palette_sex_dark         <- c("#1b9e77", "#d95f02", "#7570b3") #duller than below
  # palette_sex_dark         <- c("#66c2a5", "#fc8d62", "#8da0cb") #brighter than above
  # taken from http://colorbrewer2.org/#type=qualitative&scheme=Dark2&n=3
  pallete_sex_light        <- adjustcolor(palette_sex_dark, alpha.f = .2)
  names(palette_sex_dark)  <- c("Both sexes", "Females", "Males")
  names(pallete_sex_light) <- names(pallete_sex_light)
  
  g_out <- d1 %>% 
    ggplot(aes_string(
      x      = "years_since_2000"
      ,y     = "rate"
      ,color = "sex"
    ))+
    geom_line( aes_string(group = "sex") )+
    geom_point()+
    facet_grid(area ~ age_group)+
    scale_color_manual(values = palette_sex_dark)+
    # scale_color_manual(values = pallete_sex_light)+
    theme_minimal()+
    labs( title = "Crude prevalence of MH service utilization in Canada")
  return(g_out)  
}
# how to use
ds1 %>% 
  # to limit the view while in development
  dplyr::filter(age_group %in% c("1-19", "20-34", "35-49", "65-79")) %>%
  dplyr::filter(area %in% c("Canada", "Manitoba", "British Columbia")) %>%
  dplyr::filter(sex %in% c("Males","Females")) %>%
  make_plot_1_packed(measure = "rate")

# as you notice, the function got bulkier due to operations needed
# to construct a reference vector for factor levels 
# these and other operations are typically best sourced out 
# to the `prep_data` function

PHASE 3 - prep data

# let us construct a new `prep_data` function that would
# isolate the preparatory operations from the `make_plot` function
prep_data_plot_1 <- function(
  d_input
  ,set_area      #= c("Canada")
  ,set_age_group #= c("20-34")
  ,set_sex       #= c("Males","Females")
){
  d1 <- d_input  # for within-function use
  
  d2 <- d1 %>% 
    dplyr::filter(area      %in% set_area      ) %>%
    dplyr::filter(age_group %in% set_age_group ) %>%
    dplyr::filter(sex       %in% set_sex       ) %>% 
    dplyr::mutate(
      # to create a shorter label
      years_since_2000 = year - 2000
      # to enforce the chosen order of the levels:
      ,area      = factor(area,      levels = set_area)
      ,age_group = factor(age_group, levels = set_age_group)
      ,sex       = factor(sex,       levels = set_sex )
    )
  # to store objects for passing to the `make_plot` function
  l_support <- list()
  l_support[["data"]] <- d2
  l_support[["set"]] <- list() # in case the first element is single
  l_support[["set"]][["sex"]]       <- set_sex
  l_support[["set"]][["area"]]      <- set_area
  l_support[["set"]][["age_group"]] <- set_age_group
  lapply(l_support, class) # view contents
  # the make_plot funtion will rely on the structure and values in l_support

  return(l_support)
}
# how to use
l_support <- ds1 %>% 
  prep_data_plot_1(
    set_area      = c("Canada")
    # set_area      = c("Canada", "Manitoba", "British Columbia") 
    # ,set_sex       = c("Females", "Males")
    ,set_sex       = c("Both sexes")
    # ,set_age_group = c("1-19",  "35-49", "65-79","1+")
    ,set_age_group = c("1+")
  )
l_support %>% print()

$data
# A tibble: 16 x 12
   area  condition age_group sex    year  rate rate_cv rate_95_ci_lower rate_95_ci_upper number
   <fct> <chr>     <fct>     <fct> <dbl> <dbl>   <dbl>            <dbl>            <dbl>  <dbl>
 1 Cana~ Use of h~ 1+        Both~  2000 10.7     0.05            10.7             10.7  3.36e6
 2 Cana~ Use of h~ 1+        Both~  2001 10.8     0.05            10.8             10.8  3.43e6
 3 Cana~ Use of h~ 1+        Both~  2002 10.9     0.05            10.8             10.9  3.48e6
 4 Cana~ Use of h~ 1+        Both~  2003 10.8     0.05            10.7             10.8  3.49e6
 5 Cana~ Use of h~ 1+        Both~  2004 10.6     0.05            10.6             10.6  3.48e6
 6 Cana~ Use of h~ 1+        Both~  2005 10.7     0.05            10.6             10.7  3.53e6
 7 Cana~ Use of h~ 1+        Both~  2006 10.5     0.05            10.5             10.6  3.51e6
 8 Cana~ Use of h~ 1+        Both~  2007 10.4     0.05            10.4             10.4  3.49e6
 9 Cana~ Use of h~ 1+        Both~  2008 10.2     0.05            10.2             10.3  3.48e6
10 Cana~ Use of h~ 1+        Both~  2009 10.2     0.05            10.2             10.2  3.49e6
11 Cana~ Use of h~ 1+        Both~  2010 10.0     0.05            10.0             10.0  3.50e6
12 Cana~ Use of h~ 1+        Both~  2011 10.1     0.05            10.1             10.1  3.56e6
13 Cana~ Use of h~ 1+        Both~  2012  9.86    0.05             9.85             9.87 3.53e6
14 Cana~ Use of h~ 1+        Both~  2013  9.9     0.05             9.89             9.91 3.59e6
15 Cana~ Use of h~ 1+        Both~  2014  9.86    0.05             9.85             9.87 3.62e6
16 Cana~ Use of h~ 1+        Both~  2015 10.0     0.05            10.0             10.1  3.61e6
# ... with 2 more variables: population <dbl>, years_since_2000 <dbl>

$set
$set$sex
[1] "Both sexes"

$set$area
[1] "Canada"

$set$age_group
[1] "1+"

# now we can pass this curated object `l_support` to graphing function
# note that we need to adjust the function to accomodate a new input object
make_plot_1 <- function(
  l_support
  ,measure 
){
  d <- l_support$data
  # to customize the color 
  # descriptive tag              # green     # red      # blue
  palette_sex_dark         <- c("#1b9e77", "#d95f02", "#7570b3") #duller than below
  # palette_sex_dark         <- c("#66c2a5", "#fc8d62", "#8da0cb") #brighter than above
  # taken from http://colorbrewer2.org/#type=qualitative&scheme=Dark2&n=3
  pallete_sex_light        <- adjustcolor(palette_sex_dark, alpha.f = .2)
  names(palette_sex_dark)  <- c("Both sexes", "Females", "Males")
  names(pallete_sex_light) <- names(pallete_sex_light)
  
  g_out <- d %>% 
    ggplot(aes_string(
      x      = "years_since_2000"
      ,y     = measure
      ,color = "sex"
    ))+
    geom_point()+
    geom_line( aes_string(group = "sex") )+
    facet_grid(area ~ age_group)+
    scale_color_manual(values = palette_sex_dark)+
    # scale_color_manual(values = pallete_sex_light)+
    theme_minimal()+
    labs( title = "Crude prevalence of MH service utilization")
  l_support[["graph"]]   <- g_out
  l_support[["measure"]] <- measure
  return(l_support)  
}
# how to use
l_support <- ds1 %>% 
  prep_data_plot_1(
    set_area      = c("Canada")
    # set_area      = c("Canada", "Manitoba", "British Columbia")
    ,set_sex       = c("Females", "Males")
    # ,set_sex       = c("Both sexes")
    ,set_age_group = c("1-19", "20-34","35-49","50-64","65-79", "80+")
    # ,set_age_group = c("1+")
  ) %>% 
  make_plot_1(measure = "rate")
l_support$graph %>% print()

PHASE 4 - print plot

print_plot_1 <- function(
  l_support
  ,path_output_folder
  ,prefex     = NA
  ,graph_name = "auto"
  ,...
){
  if( graph_name == "auto" ){
    graph_name <- paste0(
      # should be replaced with features appropriate for analysis
      l_support$measure
      ,"-("
      ,l_support$set$sex %>% paste0(collapse = "-")
      ,")-("
      ,l_support$set$area %>% paste0(collapse = "-")
      ,")-("
      ,l_support$set$age_group %>% paste0(collapse = "-")
      ,")"
      ,collapse = "-"
    )
  }else{
    graph_name <- paste0(l_support$measure,"-", graph_name)
  }
  # add a label to distinguish a particular graph (last element in the file name)
  if( !is.na(prefex) ){ # inserts a PREFEX before the graph name
    (path_save_plot <- paste0(path_output_folder, prefex,"-",graph_name) )
  }else{
    ( path_save_plot <- paste0(path_output_folder, graph_name) )
  }

  # if folder does not exist yet, create it
  if( !dir.exists(path_output_folder) ){
    dir.create(path_output_folder)
  }
  # print the graphical object using jpeg device
  path_printed_plot <- paste0(path_save_plot, ".jpg")
  jpeg(
    filename = path_printed_plot
    ,...
  )
  l_support$graph %>% print() # reach into the custom object we made for graphing
  dev.off() # close the device
  l_support[["path_plot"]] <- path_printed_plot
  return(l_support)
}
# how to use
l_support <- ds1 %>% 
  prep_data_plot_1(
    set_sex        = c("Females", "Males") 
    # set_sex        = c("Females", "Males", "Both sexes") 
    ,set_area      = c("Canada", "Manitoba", "British Columbia") 
    ,set_age_group = c("1-19",  "35-49", "65-79","1+")
  ) %>% 
  make_plot_1(
    measure = "rate"
  ) %>% 
  print_plot_1(
    path_output_folder = "./analysis/scenario-3/prints/demo-1/"
    # ,prefex            = "attempt1"
    # ,graph_name        = "take1" # `auto` by default
# options added through `...` into the jpeg() function   
    ,width   = 1700
    ,height  = 700
    ,units   = "px"
    ,quality = 100
    ,res     = 200
  )

# notice that if I print the GRAPH by reaching into the `l_support` object
# if will be displayed according to the `fig.width`, `fig.height`, and `out.width`
# parameters specified in the chuck options (in the .Rmd file)
l_support$graph %>% print()

# if, however, we reach into the disk, we will recover the image generated
# with the dimensions and specs defined in the `print_plot` function
l_support$path_plot %>% jpeg::readJPEG() %>% grid::grid.raster()

PHASE 5 - serialize

# it often makes sense to genrate a series of plot to be explored manually

# GRAPH SERIES 1
path_target           <- "./analysis/scenario-3/prints/series_1/"
provinces_to_pair     <- c("British Columbia", "Alberta", "Manitoba")
age_groups_to_display <- c("1-19", "20-34", "35-49", "50-64","65-79","80+","1+")
# for each selected province create a comparison with Canada
ls_plot_series <- list()
for(province_i in provinces_to_pair){
  ls_plot_series[[province_i]]  <- ds1 %>%
    prep_data_plot_1(
      set_sex        = c("Females", "Males")
      ,set_area      = c("Canada", province_i)
      ,set_age_group = age_groups_to_display
    ) %>%
    make_plot_1(
      measure = "rate"
    ) %>%
    print_plot_1(
      path_output_folder = path_target
      # options added through `...` into the jpeg() function
      ,width   = 1700
      ,height  = 600
      ,units   = "px"
      ,quality = 100
      ,res     = 200
    )
}
saveRDS(ls_plot_series, paste0(path_target,"ls_plots.rds") )

# GRAPH SERIES 2
path_target           <- "./analysis/scenario-3/prints/series_2/"
provinces_to_pair     <- c("British Columbia", "Alberta", "Manitoba")
age_groups_to_display <- c( "+1")
# for each selected province create a comparison with Canada
ls_plot_series <- list()
for(province_i in provinces_to_pair){
  ls_plot_series[[province_i]]  <- ds1 %>%
    prep_data_plot_1(
      set_sex        = c("Males","Females")
      ,set_area      = c("Canada", province_i)
      ,set_age_group = c("20-34", "+1")
    ) %>%
    make_plot_1(
      measure = "rate"
    ) %>%
    print_plot_1(
      path_output_folder = path_target
      # options added through `...` into the jpeg() function
      ,width   = 900
      ,height  = 600
      ,units   = "px"
      ,quality = 100
      ,res     = 200
    )
}
saveRDS(ls_plot_series, paste0(path_target,"ls_plots.rds") )

PHASE 6 - place plot

l_support <- readRDS("./analysis/scenario-3/prints/series_1/ls_plots.rds")
l_support[["British Columbia"]]$path_plot

[1] "./analysis/scenario-3/prints/series_1/rate-(Females-Males)-(Canada-British Columbia)-(1-19-20-34-35-49-50-64-65-79-80+-1+).jpg"

l_support[["British Columbia"]]$path_plot %>% jpeg::readJPEG() %>% grid::grid.raster()

future directions

how to incorporate additional condition? (i.e. anxiety_mood instead of mental_health)
how to add additional INTERNAL dimension? (i.e. marital_status)

session information

For the sake of documentation and reproducibility, the current report was rendered in the following environment. Click the line below to expand.

Environment

- Session info -----------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.2 (2018-12-20)
 os       Windows >= 8 x64            
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/Los_Angeles         
 date     2019-06-19                  

- Packages ---------------------------------------------------------------------------------------
 package      * version date       lib source        
 abind          1.4-5   2016-07-21 [1] CRAN (R 3.5.2)
 assertthat     0.2.1   2019-03-21 [1] CRAN (R 3.5.3)
 backports      1.1.4   2019-04-10 [1] CRAN (R 3.5.3)
 callr          3.2.0   2019-03-15 [1] CRAN (R 3.5.3)
 car            3.0-3   2019-05-27 [1] CRAN (R 3.5.3)
 carData        3.0-2   2018-09-30 [1] CRAN (R 3.5.2)
 cellranger     1.1.0   2016-07-27 [1] CRAN (R 3.5.3)
 cli            1.1.0   2019-03-19 [1] CRAN (R 3.5.3)
 colorspace     1.4-1   2019-03-18 [1] CRAN (R 3.5.3)
 crayon         1.3.4   2017-09-16 [1] CRAN (R 3.5.3)
 curl           3.3     2019-01-10 [1] CRAN (R 3.5.3)
 data.table     1.12.2  2019-04-07 [1] CRAN (R 3.5.3)
 desc           1.2.0   2018-05-01 [1] CRAN (R 3.5.3)
 devtools       2.0.2   2019-04-08 [1] CRAN (R 3.5.3)
 dichromat    * 2.0-0   2013-01-24 [1] CRAN (R 3.5.2)
 digest         0.6.19  2019-05-20 [1] CRAN (R 3.5.3)
 dplyr        * 0.8.1   2019-05-14 [1] CRAN (R 3.5.3)
 DT             0.6     2019-05-09 [1] CRAN (R 3.5.3)
 evaluate       0.14    2019-05-28 [1] CRAN (R 3.5.2)
 explore        0.4.2   2019-05-22 [1] CRAN (R 3.5.3)
 extrafont    * 0.17    2014-12-08 [1] CRAN (R 3.5.2)
 extrafontdb    1.0     2012-06-11 [1] CRAN (R 3.5.2)
 fansi          0.4.0   2018-10-05 [1] CRAN (R 3.5.3)
 forcats        0.4.0   2019-02-17 [1] CRAN (R 3.5.3)
 foreign        0.8-71  2018-07-20 [2] CRAN (R 3.5.2)
 fs             1.3.1   2019-05-06 [1] CRAN (R 3.5.3)
 ggplot2      * 3.1.1   2019-04-07 [1] CRAN (R 3.5.3)
 glue           1.3.1   2019-03-12 [1] CRAN (R 3.5.3)
 gridExtra      2.3     2017-09-09 [1] CRAN (R 3.5.3)
 gtable         0.3.0   2019-03-25 [1] CRAN (R 3.5.3)
 haven          2.1.0   2019-02-19 [1] CRAN (R 3.5.3)
 highr          0.8     2019-03-20 [1] CRAN (R 3.5.3)
 hms            0.4.2   2018-03-10 [1] CRAN (R 3.5.3)
 htmltools      0.3.6   2017-04-28 [1] CRAN (R 3.5.3)
 htmlwidgets    1.3     2018-09-30 [1] CRAN (R 3.5.3)
 httpuv         1.5.1   2019-04-05 [1] CRAN (R 3.5.3)
 httr           1.4.0   2018-12-11 [1] CRAN (R 3.5.3)
 jpeg           0.1-8   2014-01-23 [1] CRAN (R 3.5.2)
 kableExtra     1.1.0   2019-03-16 [1] CRAN (R 3.5.3)
 knitr        * 1.23    2019-05-18 [1] CRAN (R 3.5.2)
 labeling       0.3     2014-08-23 [1] CRAN (R 3.5.2)
 later          0.8.0   2019-02-11 [1] CRAN (R 3.5.3)
 lazyeval       0.2.2   2019-03-15 [1] CRAN (R 3.5.3)
 magrittr     * 1.5     2014-11-22 [1] CRAN (R 3.5.3)
 memoise        1.1.0   2017-04-21 [1] CRAN (R 3.5.3)
 mime           0.6     2018-10-05 [1] CRAN (R 3.5.2)
 munsell        0.5.0   2018-06-12 [1] CRAN (R 3.5.3)
 openxlsx       4.1.0.1 2019-05-28 [1] CRAN (R 3.5.3)
 pillar         1.4.1   2019-05-28 [1] CRAN (R 3.5.2)
 pkgbuild       1.0.3   2019-03-20 [1] CRAN (R 3.5.3)
 pkgconfig      2.0.2   2018-08-16 [1] CRAN (R 3.5.3)
 pkgload        1.0.2   2018-10-29 [1] CRAN (R 3.5.3)
 plyr           1.8.4   2016-06-08 [1] CRAN (R 3.5.3)
 prettyunits    1.0.2   2015-07-13 [1] CRAN (R 3.5.3)
 processx       3.3.1   2019-05-08 [1] CRAN (R 3.5.2)
 promises       1.0.1   2018-04-13 [1] CRAN (R 3.5.3)
 ps             1.3.0   2018-12-21 [1] CRAN (R 3.5.3)
 purrr          0.3.2   2019-03-15 [1] CRAN (R 3.5.3)
 R6             2.4.0   2019-02-14 [1] CRAN (R 3.5.3)
 RColorBrewer * 1.1-2   2014-12-07 [1] CRAN (R 3.5.2)
 Rcpp           1.0.1   2019-03-17 [1] CRAN (R 3.5.3)
 readr          1.3.1   2018-12-21 [1] CRAN (R 3.5.3)
 readxl         1.3.1   2019-03-13 [1] CRAN (R 3.5.3)
 remotes        2.0.4   2019-04-10 [1] CRAN (R 3.5.3)
 reshape2       1.4.3   2017-12-11 [1] CRAN (R 3.5.3)
 rio            0.5.16  2018-11-26 [1] CRAN (R 3.5.3)
 rlang          0.3.4   2019-04-07 [1] CRAN (R 3.5.3)
 rmarkdown      1.13    2019-05-22 [1] CRAN (R 3.5.3)
 rprojroot      1.3-2   2018-01-03 [1] CRAN (R 3.5.3)
 rstudioapi     0.10    2019-03-19 [1] CRAN (R 3.5.3)
 Rttf2pt1       1.3.7   2018-06-29 [1] CRAN (R 3.5.2)
 rvest          0.3.4   2019-05-15 [1] CRAN (R 3.5.3)
 scales         1.0.0   2018-08-09 [1] CRAN (R 3.5.3)
 sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 3.5.3)
 shiny          1.3.2   2019-04-22 [1] CRAN (R 3.5.2)
 stringi        1.4.3   2019-03-12 [1] CRAN (R 3.5.3)
 stringr        1.4.0   2019-02-10 [1] CRAN (R 3.5.3)
 testit         0.9     2018-12-05 [1] CRAN (R 3.5.3)
 tibble         2.1.3   2019-06-06 [1] CRAN (R 3.5.3)
 tidyr          0.8.3   2019-03-01 [1] CRAN (R 3.5.3)
 tidyselect     0.2.5   2018-10-11 [1] CRAN (R 3.5.3)
 usethis        1.5.0   2019-04-07 [1] CRAN (R 3.5.3)
 utf8           1.1.4   2018-05-24 [1] CRAN (R 3.5.3)
 vctrs          0.1.0   2018-11-29 [1] CRAN (R 3.5.3)
 viridis      * 0.5.1   2018-03-29 [1] CRAN (R 3.5.3)
 viridisLite  * 0.3.0   2018-02-01 [1] CRAN (R 3.5.3)
 webshot        0.5.1   2018-09-28 [1] CRAN (R 3.5.3)
 withr          2.1.2   2018-03-15 [1] CRAN (R 3.5.3)
 xfun           0.7     2019-05-14 [1] CRAN (R 3.5.3)
 xml2           1.2.0   2018-01-24 [1] CRAN (R 3.5.3)
 xtable         1.8-4   2019-04-21 [1] CRAN (R 3.5.3)
 yaml           2.2.0   2018-07-25 [1] CRAN (R 3.5.2)
 zeallot        0.1.0   2018-01-28 [1] CRAN (R 3.5.3)
 zip            2.0.2   2019-05-13 [1] CRAN (R 3.5.3)

[1] C:/Users/an499583/Documents/R/win-library/3.5
[2] C:/Program Files/R/R-3.5.2/library

Report rendered by an499583 at 2019-06-19, 10:21 -0700 in 22 seconds.

Scenario 3: CCDSS

Andriy Koval

Created on 2019-06-05.
Updated on 2019-06-19.

overview

disclaimer

mindset

set the scene

load packages

load sources

declare globals

load data

PHASE 0 - explore

tweak data

inspect-1 raw

declare components

inspect-2 categorical

inspect-3 continuous

PHASE 1 - graph

sketch the space

graph-1 basic

graph-2 wrap

graph-2 grid

PHASE 2 - make plot

PHASE 3 - prep data

PHASE 4 - print plot

PHASE 5 - serialize

PHASE 6 - place plot

future directions

session information

Scenario 3: CCDSS

Andriy Koval

Created on 2019-06-05. Updated on 2019-06-19.

overview

disclaimer

mindset

set the scene

load packages

load sources

declare globals

load data

PHASE 0 - explore

tweak data

inspect-1 raw

declare components

inspect-2 categorical

inspect-3 continuous

PHASE 1 - graph

sketch the space

graph-1 basic

graph-2 wrap

graph-2 grid

PHASE 2 - make plot

PHASE 3 - prep data

PHASE 4 - print plot

PHASE 5 - serialize

PHASE 6 - place plot

future directions

session information

Created on 2019-06-05.
Updated on 2019-06-19.