--- title: "Plotting in .mono[R]" subtitle: "EC 425/525, Lab 5" author: "Edward Rubin" date: "`r format(Sys.time(), '%d %B %Y')`" output: xaringan::moon_reader: css: ['default', 'metropolis', 'metropolis-fonts', 'my-css.css'] # self_contained: true nature: highlightStyle: github highlightLines: true countIncrementalSlides: false --- class: inverse, middle ```{R, setup, include = F} # devtools::install_github("dill/emoGG") library(pacman) p_load( broom, tidyverse, latex2exp, ggplot2, ggthemes, ggforce, viridis, extrafont, gridExtra, kableExtra, snakecase, janitor, data.table, dplyr, estimatr, lubridate, knitr, parallel, lfe, here, magrittr ) # Define pink color red_pink <- "#e64173" turquoise <- "#20B2AA" orange <- "#FFA500" red <- "#fb6107" blue <- "#3b3b9a" green <- "#8bb174" grey_light <- "grey70" grey_mid <- "grey50" grey_dark <- "grey20" purple <- "#6A5ACD" slate <- "#314f4f" # Dark slate grey: #314f4f # Knitr options opts_chunk$set( comment = "#>", fig.align = "center", fig.height = 7, fig.width = 10.5, warning = F, message = F ) opts_chunk$set(dev = "svg") options(device = function(file, width, height) { svg(tempfile(), width = width, height = height) }) options(knitr.table.format = "html") ``` # Prologue --- name: schedule # Schedule ## Last time Regession ## Today Plotting in .mono[R] (especially `ggplot2`) --- layout: true # Plotting --- name: plotting class: inverse, middle --- name: plot ## The default option: `plot()` While we'll quickly move on to other options, .mono[R]'s `plot()` function (in the default `graphics` package) is a great tool for basic data exploration—it's fast, simple, and flexible. -- In fact, `plot()` is a generic function, that works for many classes. -- .hi-slate[General arguments] for `plot()`: - `x` and `y` for coordinates - `type =` {`"p"`oints, `"l"`ines, *etc.*} .it[.grey-light[(optional)]] - `xlab`, `ylab`, `main`, and `sub` for axis labels and (sub)title .it[.grey-light[(optional)]] - `col` and `pch` for color and plot character .it[.grey-light[(optional)]] - `lty` and `lwd` for line type, and line width .it[.grey-light[(optional)]] --- layout: false class: clear, middle Let's see `plot()` in action. ```{R, ex-plot0} # Define two vectors a <- seq(from = 0, to = 2*pi, by = 0.2) b <- sin(a) ``` --- layout: true class: clear, middle --- name: ex-plot ```{R, ex-plot1} plot(x = a, y = b) ``` --- ```{R, ex-plot2} plot(x = a, y = b, xlab = "x", ylab = "sin(x)") ``` --- ```{R, ex-plot3} plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue") ``` --- ```{R, ex-plot4} plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue", type = "l") ``` --- ```{R, ex-plot5} plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue", type = "b") ``` --- ```{R, ex-plot6} plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue", type = "s") ``` --- name: multiple `plot()` is essentially calling `points()` or `lines()`. You can layer plots by using these individual functions. --- ```{R, ex-plot7} plot(x = a, y = b, col = "blue") ``` --- ```{R, ex-plot8} plot(x = a, y = b, col = "blue"); points(x = a, y = -b, col = "orange") ``` --- `graphics` also offers a nice histogram function in `hist()`. --- name: hist ```{R, ex-hist} hist(x = b, breaks = 10, col = "purple", xlab = "sin(x)", main = "Wow.") ``` --- That said/done, further customization/manipulation of your graphics using `graphics` plotting functions can become quite difficult. .note[Enter] `ggplot2` --- layout: true # ggplot2 --- name: ggplot2 class: inverse, middle --- name: gg-intro ## The grammar The `ggplot2` package offers an incredibly flexible, diverse, and powerful set of functions for creating graphics in .mono[R]. -- The `gg` stands for the *grammar of graphics*. -- `ggplot2` 1. centers on a .hi-slate[data frame] (the `data` argument) 1. maps variables to .hi-slate[aesthetics] (the `aes` argument) 1. .hi-slate[layers geometries] to *build up* your graphic .note[Note] The package is called `ggplot2`, but the main function is `ggplot()`. --- name: ggplot ## `ggplot()` Main arguments 1. .hi-pink[`data`] Your dataset. As a data frame (or `tibble`). -- 2. .hi-purple[`aes()`] Maps variables in `data` to "aesthetics" like `x`, `color`, `shape`. -- .ex[Example] A time series of problems, `color` defined by money ```{R, gg-fake, eval = F} library(ggplot2) ggplot( data = pretend_df, aes(x = time, y = problems, color = money) ) ``` --- name: layers ## Layers The `ggplot()` function doesn't plot anything—it *sets up* the plot. To create the actual figure, you layer .hi-slate[geometries] (*e.g.*, `geom_point()`), --
.hi-slate[scales] (*e.g.*, `scale_color_manual()`), -- and other .hi-slate[options] (*e.g.*, `xlab()`). -- You .hi-slate[add layers] using the addition sign (`+`). -- .ex[Example] A time series of problems, `color` defined by money ```{R, gg-fake2, eval = F} library(ggplot2) ggplot( data = pretend_df, aes(x = time, y = problems, color = money) *) + *geom_point() + geom_line() ``` --- layout: true class: clear, middle --- Alright, let's build a plot. We'll use the `economics` dataset that comes with `ggplot2`
(because economics). --- ```{R, view-economics, echo = F, eval = T} DT::datatable( economics, fillContainer = FALSE, options = list(pageLength = 8) ) ``` --- name: ex-gg .smaller[Set up the plot. ```{R, gg0, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop)) ``` ] --- .smaller[Label the axes. ```{R, gg1, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop)) + ylab("Unemployment rate") + xlab("Date") ``` ] --- .smaller[Draw some points. ```{R, gg2, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop)) + ylab("Unemployment rate") + xlab("Date") + geom_point() ``` ] --- .smaller[Map the `size` to the median duration of unemployment. ```{R, gg3, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) + ylab("Unemployment rate") + xlab("Date") + geom_point() ``` ] --- .smaller[Change the `shape` of the points. ```{R, gg4, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) + ylab("Unemployment rate") + xlab("Date") + geom_point(shape = 1) ``` ] --- .smaller[Map points' `color` to the median duration of unemployment. ```{R, gg5, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) + ylab("Unemployment rate") + xlab("Date") + geom_point(aes(color = uempmed)) ``` ] --- .smaller[Add some transparency (`alpha`) to our points. ```{R, gg6, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) + ylab("Unemployment rate") + xlab("Date") + geom_point(aes(color = uempmed), alpha = 0.5) ``` ] --- .smaller[Same size points; all bigger. ```{R, gg7, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop)) + ylab("Unemployment rate") + xlab("Date") + geom_point(aes(color = uempmed), alpha = 0.5, size = 3) ``` ] --- .smaller[Change our theme—maybe you're a minimalist (but want slightly larger fonts)? ```{R, gg8, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop)) + ylab("Unemployment rate") + xlab("Date") + geom_point(aes(color = uempmed), alpha = 0.5, size = 3) + theme_minimal(base_size = 14) ``` ] --- .smaller[Want your figure to look like Stata made it? ```{R, gg9, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop)) + ylab("Unemployment rate") + xlab("Date") + geom_point(aes(color = uempmed), alpha = 0.5, size = 3) + ggthemes::theme_stata(base_size = 14) ``` ] --- .smaller[The "pander" theme from the `ggthemes` package. ```{R, gg10, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop)) + ylab("Unemployment rate") + xlab("Date") + geom_point(aes(color = uempmed), alpha = 0.5, size = 3) + ggthemes::theme_pander(base_size = 14) ``` ] --- .smaller[Change (and label) our color scale. .note[Note] `viridis` [is the best](https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html). ```{R, gg11, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop)) + ylab("Unemployment rate") + xlab("Date") + geom_point(aes(color = uempmed), alpha = 0.5, size = 3) + ggthemes::theme_pander(base_size = 14) + scale_color_viridis_c("Dur. unemp.") ``` ] --- .smaller[Connect the dots. ```{R, gg12, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop)) + ylab("Unemployment rate") + xlab("Date") + geom_line(color = "grey80") + geom_point(aes(color = uempmed), alpha = 0.5, size = 3) + ggthemes::theme_pander(base_size = 14) + scale_color_viridis_c("Dur. unemp.") ``` ] --- .smaller[How about a smoother? ```{R, gg13, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop)) + ylab("Unemployment rate") + xlab("Date") + geom_point(aes(color = uempmed), alpha = 0.5, size = 3) + geom_smooth(se = F) + ggthemes::theme_pander(base_size = 14) + scale_color_viridis_c("Dur. unemp.") ``` ] --- .smaller[The `group` aesthetic separates groups. ```{R, gg14, fig.height = 5} ggplot(data = economics, aes(x = date, y = unemploy/pop, group = date < ymd(19900101))) + ylab("Unemployment rate") + xlab("Date") + geom_point(aes(color = uempmed), alpha = 0.5, size = 3) + geom_smooth(se = F) + ggthemes::theme_pander(base_size = 14) + scale_color_viridis_c("Dur. unemp.") ``` ] --- .note[Note] The `ymd()` function comes from the `lubridate` package. --- `ggplot2` knows histogams. --- name: gg-hist A histogram. .smaller[ ```{R, gg-hist1, fig.height = 5} ggplot(data = economics, aes(x = unemploy/pop)) + xlab("Unemployment rate") + geom_histogram(color = "white", fill = "#e64173") + ggthemes::theme_pander(base_size = 14) ``` ] --- Add a horizontal line where count = 0. .smaller[ ```{R, gg-hist2, fig.height = 5} ggplot(data = economics, aes(x = unemploy/pop)) + xlab("Unemployment rate") + geom_histogram(color = "white", fill = "#e64173") + geom_hline(yintercept = 0) + ggthemes::theme_pander(base_size = 14) ``` ] --- `ggplot2` knows densities. --- name: gg-density A density plot. .smaller[ ```{R, gg-density1, fig.height = 5} ggplot(data = economics, aes(x = unemploy/pop)) + xlab("Unemployment rate") + geom_density(color = NA, fill = "#e64173") + geom_hline(yintercept = 0) + ggthemes::theme_pander(base_size = 14) ``` ] --- Now with Epanechnikov kernel! .smaller[ ```{R, gg-density2, fig.height = 5} ggplot(data = economics, aes(x = unemploy/pop)) + xlab("Unemployment rate") + geom_density(kernel = "epanechnikov", color = NA, fill = "#e64173") + geom_hline(yintercept = 0) + ggthemes::theme_pander(base_size = 14) ``` ] --- `ggplot2` itself is incredibly flexible/powerful. But there are [even more packages](https://www.ggplot2-exts.org/gallery/) that extend its power—_e.g._, `ggthemes`, `gganimate`, `cowplot`, `ggmap`, `ggExtra`, and (of course) `viridis`. --- name: gg-more Gapminder meets `gganimate` ```{R, ex-gganimate, include = F, cache = T, dev = "png", eval = F} # The package for animating ggplot2 p_load(gganimate, gapminder) # As before gg <- ggplot( data = gapminder %>% filter(continent != "Oceania"), aes(gdpPercap, lifeExp, size = pop, color = country) ) + geom_point(alpha = 0.7, show.legend = FALSE) + scale_colour_manual(values = country_colors) + scale_size(range = c(2, 12)) + scale_x_log10("GDP per capita", label = scales::comma) + facet_wrap(~continent) + theme_pander(base_size = 16) + theme(panel.border = element_rect(color = "grey90", fill = NA)) + # Here comes the gganimate-specific bits labs(title = "Year: {frame_time}") + ylab("Life Expectancy") + transition_time(year) + ease_aes("linear") # Save the animation anim_save( animation = gg, filename = "ex_gganimate.gif", path = here(), width = 10.5, height = 7, # units = "in", # res = 150, nframes = 56 ) ``` .center[![Gapminder](ex_gganimate.gif)] --- US births by month since 1933 ```{R, ex-new-ts, echo = F, eval = T} # Load births data; drop totals; create time variable birth_df <- read_csv("usa_birth_1933_2015.csv") %>% janitor::clean_names() %>% filter(month != "TOT") %>% mutate( month = as.numeric(month), time = year + (month-1)/12 ) # Load days of months data days_df <- read_csv("days_of_month.csv") # Clean up days days_lon <- gather(days_df, year, n_days, -Month) days_lon <- janitor::clean_names(days_lon) days_lon$year <- as.integer(days_lon$year) # Join birth_df <- left_join( x = birth_df, y = days_lon, by = c("year", "month") ) # Calculate 30-day equivalent births by month birth_df %<>% mutate( births_30day = births / n_days * 30 ) lo <- min(c(birth_df$births, birth_df$births_30day)) hi <- max(c(birth_df$births, birth_df$births_30day)) # Plot new-ish time-series graph of birth rates # Plot newfangled time-series graph of birth rates ggplot(data = birth_df %>% filter(year < 2050), aes( x = year, y = factor(month, labels = month.abb), fill = births/1e5, color = births/1e5 ) ) + geom_tile() + xlab("Year") + ylab("Month") + theme_pander(base_family = "Fira Sans Book", base_size = 20) + scale_fill_viridis("Births (100K)", option = "magma", limits = c(lo, hi)/1e5) + scale_color_viridis("Births (100K)", option = "magma", limits = c(lo, hi)/1e5) + theme( legend.position = "bottom", legend.key.width = unit(1.5, units = "in"), legend.key.height = unit(0.2, units = "in"), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), line = element_blank(), rect = element_blank(), axis.ticks = element_blank() ) ``` --- layout: true # ggplot2 --- name: ggsave ## Saving plots You can save your `ggplot2`-based figures using `ggsave()`. --- ## `ggsave()` Option 1 By default, `ggsave()` saves the last plot printed to the screen. ```{R, ex-ggsave-1, eval = F} # Create a simple scatter plot ggplot(data = fun_df, aes(x = x, y = y)) + geom_point() # Save our simple scatter plot ggsave(filename = "simple_scatter.pdf") ``` -- .note[Notes] - This example creates a PDF. Change to `".png"` for PNG, *etc.* - There several helpful, optional arguments: `path`, `width`, `height`, `dpi`. --- ## `ggsave()` Option 2 You can assign your `ggplot()` objects to memory ```{R, ex-gg-assign, eval = F} # Create a simple scatter plot named 'gg_points' gg_points <- ggplot(data = fun_df, aes(x = x, y = y)) + geom_point() ``` -- We can then save this figure with the name `gg_points` using `ggsave()` ```{R, ex-ggsave-2, eval = F} # Save our simple scatter plot name 'ggsave' ggsave( filename = "simple_scatter.pdf", plot = gg_points ) ``` --- layout: false # Resources ## There's always more `ggplot2` 1. .mono[RStudio]'s [cheat sheet for `ggplot2`](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf). 1. `ggplot2` [reference index](https://ggplot2.tidyverse.org/reference/index.html) 1. The `tidyverse` [page](https://ggplot2.tidyverse.org) on `ggplot2`. 1. Hadley Wickham's on [*Data visualization*](https://r4ds.had.co.nz/data-visualisation.html) in his data science book. --- # Table of contents .pull-left[ ### Default options .smaller[ 1. [`plot()`](#plot) - [Description](#plot) - [Examples](#ex-plot) - [Layering plots](#add) 1. [`hist()`](#hist) ]] .pull-right[ ### ggplot2 .smaller[ 1. [`ggplot2`](#ggplot2) - [Intro](#gg-intro) - [`ggplot()`](#ggplot) - [Layers](#layers) - [Building a plot](#gg-ex) - [Histogram](#gg-hist) - [Density](#gg-density) - [More](#gg-more) - [Saving](#ggsave) 1. [More resources](#resources) ]] --- exclude: true ```{R, generate pdfs, include = F, eval = T} source("../../ScriptsR/unpause.R") unpause("05RPlot.Rmd", ".", T, T) ```