class: center, middle, inverse, title-slide

# Plotting in .mono[R]
## EC 425/525, Lab 5
### Edward Rubin
### 10 May 2019

---
class: inverse, middle
# Prologue

---
name: schedule

# Schedule

## Last time

Regression

## Today

Plotting in .mono[R] (especially `ggplot2`)

---
layout: true
# Plotting

---
name: plotting
class: inverse, middle

---
name: plot

## The default option: `plot()`

While we'll quickly move on to other options, .mono[R]'s `plot()` function (in the default `graphics` package) is a great tool for basic data exploration: it's fast, simple, and flexible.

--

In fact, `plot()` is a generic function that works for many classes.

--

.hi-slate[General arguments] for `plot()`:

- `x` and `y` for coordinates
- `type =` {`"p"`oints, `"l"`ines, *etc.*} .it[.grey-light[(optional)]]
- `xlab`, `ylab`, `main`, and `sub` for axis labels and (sub)title .it[.grey-light[(optional)]]
- `col` and `pch` for color and plot character .it[.grey-light[(optional)]]
- `lty` and `lwd` for line type and line width .it[.grey-light[(optional)]]

---
layout: false
class: clear, middle

Let's see `plot()` in action.
```r
# Define two vectors
a <- seq(from = 0, to = 2*pi, by = 0.2)
b <- sin(a)
```

---
layout: true
class: clear, middle

---
name: ex-plot

```r
plot(x = a, y = b)
```

<img src="05RPlot_files/figure-html/ex-plot1-1.svg" style="display: block; margin: auto;" />

---

```r
plot(x = a, y = b, xlab = "x", ylab = "sin(x)")
```

<img src="05RPlot_files/figure-html/ex-plot2-1.svg" style="display: block; margin: auto;" />

---

```r
plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue")
```

<img src="05RPlot_files/figure-html/ex-plot3-1.svg" style="display: block; margin: auto;" />

---

```r
plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue", type = "l")
```

<img src="05RPlot_files/figure-html/ex-plot4-1.svg" style="display: block; margin: auto;" />

---

```r
plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue", type = "b")
```

<img src="05RPlot_files/figure-html/ex-plot5-1.svg" style="display: block; margin: auto;" />

---

```r
plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue", type = "s")
```

<img src="05RPlot_files/figure-html/ex-plot6-1.svg" style="display: block; margin: auto;" />

---
name: multiple

`plot()` is essentially calling `points()` or `lines()`.

You can layer plots by using these individual functions.

---

```r
plot(x = a, y = b, col = "blue")
```

<img src="05RPlot_files/figure-html/ex-plot7-1.svg" style="display: block; margin: auto;" />

---

```r
plot(x = a, y = b, col = "blue"); points(x = a, y = -b, col = "orange")
```

<img src="05RPlot_files/figure-html/ex-plot8-1.svg" style="display: block; margin: auto;" />

---

`graphics` also offers a nice histogram function in `hist()`.

---
name: hist

```r
hist(x = b, breaks = 10, col = "purple", xlab = "sin(x)", main = "Wow.")
```

<img src="05RPlot_files/figure-html/ex-hist-1.svg" style="display: block; margin: auto;" />

---

That said/done, further customization/manipulation of your graphics using `graphics` plotting functions can become quite difficult.
.note[Enter] `ggplot2`

---
layout: true
# ggplot2

---
name: ggplot2
class: inverse, middle

---
name: gg-intro

## The grammar

The `ggplot2` package offers an incredibly flexible, diverse, and powerful set of functions for creating graphics in .mono[R].

--

The `gg` stands for the *grammar of graphics*.

--

`ggplot2`

1. centers on a .hi-slate[data frame] (the `data` argument)
1. maps variables to .hi-slate[aesthetics] (the `mapping` argument, set with `aes()`)
1. .hi-slate[layers geometries] to *build up* your graphic

.note[Note] The package is called `ggplot2`, but the main function is `ggplot()`.

---
name: ggplot

## `ggplot()`

Main arguments

1. .hi-pink[`data`] Your dataset. As a data frame (or `tibble`).

--

2. .hi-purple[`aes()`] Maps variables in `data` to "aesthetics" like `x`, `color`, `shape`.

--

.ex[Example] A time series of problems, `color` defined by money

```r
library(ggplot2)
ggplot(
  data = pretend_df,
  aes(x = time, y = problems, color = money)
)
```

---
name: layers

## Layers

The `ggplot()` function doesn't plot anything; it *sets up* the plot.

To create the actual figure, you layer .hi-slate[geometries] (*e.g.*, `geom_point()`),

--

<br>.hi-slate[scales] (*e.g.*, `scale_color_manual()`),

--

and other .hi-slate[options] (*e.g.*, `xlab()`).

--

You .hi-slate[add layers] using the addition sign (`+`).

--

.ex[Example] A time series of problems, `color` defined by money

```r
library(ggplot2)
ggplot(
  data = pretend_df,
  aes(x = time, y = problems, color = money)
*) +
*geom_point() + geom_line()
```

---
layout: true
class: clear, middle

---

Alright, let's build a plot.

We'll use the `economics` dataset that comes with `ggplot2` <br>(because economics).

---
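One way to check that `ggplot()` only *sets up* a plot, and that each `+` adds a layer, is to count the layers stored in the plot object. This is a minimal sketch, not the only way to inspect a plot: the object names (`p_base`, `p_full`, `n_base`, `n_full`) are made up for illustration, and it assumes the plot object exposes its layers through `$layers` (as `ggplot2` 3.x does).

```r
library(ggplot2)

# ggplot() alone stores the data and the aesthetic mapping; it adds no geometries
p_base <- ggplot(data = economics, aes(x = date, y = unemploy/pop))

# Each geom added with `+` becomes one layer of the plot object
p_full <- p_base +
  geom_line(color = "grey80") +
  geom_point(alpha = 0.5)

# Count the layers in each object (assumes `$layers` access; see note above)
n_base <- length(p_base$layers)
n_full <- length(p_full$layers)
```

Printing `p_base` draws an empty panel with axes; printing `p_full` draws the line and the points on top of it.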
---
name: ex-gg

.smaller[Set up the plot.
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop))
```

<img src="05RPlot_files/figure-html/gg0-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[Label the axes.
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") + xlab("Date")
```

<img src="05RPlot_files/figure-html/gg1-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[Draw some points.
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point()
```

<img src="05RPlot_files/figure-html/gg2-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[Map the `size` to the median duration of unemployment.
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point()
```

<img src="05RPlot_files/figure-html/gg3-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[Change the `shape` of the points.
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point(shape = 1)
```

<img src="05RPlot_files/figure-html/gg4-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[Map points' `color` to the median duration of unemployment.
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point(aes(color = uempmed))
```

<img src="05RPlot_files/figure-html/gg5-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[Add some transparency (`alpha`) to our points.
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5)
```

<img src="05RPlot_files/figure-html/gg6-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[Same size points; all bigger.
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3)
```

<img src="05RPlot_files/figure-html/gg7-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[Change our theme. Maybe you're a minimalist (but want slightly larger fonts)?
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  theme_minimal(base_size = 14)
```

<img src="05RPlot_files/figure-html/gg8-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[Want your figure to look like Stata made it?
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  ggthemes::theme_stata(base_size = 14)
```

<img src="05RPlot_files/figure-html/gg9-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[The "pander" theme from the `ggthemes` package.
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  ggthemes::theme_pander(base_size = 14)
```

<img src="05RPlot_files/figure-html/gg10-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[Change (and label) our color scale. .note[Note] `viridis` [is the best](https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html).
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  ggthemes::theme_pander(base_size = 14) +
  scale_color_viridis_c("Dur. unemp.")
```

<img src="05RPlot_files/figure-html/gg11-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[Connect the dots.
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_line(color = "grey80") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  ggthemes::theme_pander(base_size = 14) +
  scale_color_viridis_c("Dur. unemp.")
```

<img src="05RPlot_files/figure-html/gg12-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[How about a smoother?
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  geom_smooth(se = FALSE) +
  ggthemes::theme_pander(base_size = 14) +
  scale_color_viridis_c("Dur. unemp.")
```

<img src="05RPlot_files/figure-html/gg13-1.svg" style="display: block; margin: auto;" />
]

---

.smaller[The `group` aesthetic separates groups.
```r
ggplot(data = economics, aes(x = date, y = unemploy/pop, group = date < ymd(19900101))) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  geom_smooth(se = FALSE) +
  ggthemes::theme_pander(base_size = 14) +
  scale_color_viridis_c("Dur. unemp.")
```

<img src="05RPlot_files/figure-html/gg14-1.svg" style="display: block; margin: auto;" />
]

---

.note[Note] The `ymd()` function comes from the `lubridate` package.

---

`ggplot2` knows histograms.

---
name: gg-hist

A histogram.
.smaller[
```r
ggplot(data = economics, aes(x = unemploy/pop)) +
  xlab("Unemployment rate") +
  geom_histogram(color = "white", fill = "#e64173") +
  ggthemes::theme_pander(base_size = 14)
```

<img src="05RPlot_files/figure-html/gg-hist1-1.svg" style="display: block; margin: auto;" />
]

---

Add a horizontal line where count = 0.
.smaller[
```r
ggplot(data = economics, aes(x = unemploy/pop)) +
  xlab("Unemployment rate") +
  geom_histogram(color = "white", fill = "#e64173") +
  geom_hline(yintercept = 0) +
  ggthemes::theme_pander(base_size = 14)
```

<img src="05RPlot_files/figure-html/gg-hist2-1.svg" style="display: block; margin: auto;" />
]

---

`ggplot2` knows densities.

---
name: gg-density

A density plot.
.smaller[
```r
ggplot(data = economics, aes(x = unemploy/pop)) +
  xlab("Unemployment rate") +
  geom_density(color = NA, fill = "#e64173") +
  geom_hline(yintercept = 0) +
  ggthemes::theme_pander(base_size = 14)
```

<img src="05RPlot_files/figure-html/gg-density1-1.svg" style="display: block; margin: auto;" />
]

---

Now with Epanechnikov kernel!
.smaller[
```r
ggplot(data = economics, aes(x = unemploy/pop)) +
  xlab("Unemployment rate") +
  geom_density(kernel = "epanechnikov", color = NA, fill = "#e64173") +
  geom_hline(yintercept = 0) +
  ggthemes::theme_pander(base_size = 14)
```

<img src="05RPlot_files/figure-html/gg-density2-1.svg" style="display: block; margin: auto;" />
]

---

`ggplot2` itself is incredibly flexible/powerful.

But there are [even more packages](https://www.ggplot2-exts.org/gallery/) that extend its power, _e.g._, `ggthemes`, `gganimate`, `cowplot`, `ggmap`, `ggExtra`, and (of course) `viridis`.

---
name: gg-more

Gapminder meets `gganimate`

.center[]

---

US births by month since 1933

<img src="05RPlot_files/figure-html/ex-new-ts-1.svg" style="display: block; margin: auto;" />

---
layout: true
# ggplot2

---
name: ggsave

## Saving plots

You can save your `ggplot2`-based figures using `ggsave()`.

---

## `ggsave()` Option 1

By default, `ggsave()` saves the last plot printed to the screen.

```r
# Create a simple scatter plot
ggplot(data = fun_df, aes(x = x, y = y)) +
  geom_point()
# Save our simple scatter plot
ggsave(filename = "simple_scatter.pdf")
```

--

.note[Notes]

- This example creates a PDF.
Change to `".png"` for PNG, *etc.*
- There are several helpful, optional arguments: `path`, `width`, `height`, `dpi`.

---

## `ggsave()` Option 2

You can assign your `ggplot()` objects to memory

```r
# Create a simple scatter plot named 'gg_points'
gg_points <- ggplot(data = fun_df, aes(x = x, y = y)) +
  geom_point()
```

--

We can then save the figure stored in `gg_points` using `ggsave()`

```r
# Save our simple scatter plot named 'gg_points'
ggsave(
  filename = "simple_scatter.pdf",
  plot = gg_points
)
```

---
layout: false
name: resources
# Resources

## There's always more

`ggplot2`

1. .mono[RStudio]'s [cheat sheet for `ggplot2`](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf).
1. `ggplot2` [reference index](https://ggplot2.tidyverse.org/reference/index.html)
1. The `tidyverse` [page](https://ggplot2.tidyverse.org) on `ggplot2`.
1. Hadley Wickham's chapter on [*Data visualization*](https://r4ds.had.co.nz/data-visualisation.html) in his data science book.

---

# Table of contents

.pull-left[
### Default options
.smaller[
1. [`plot()`](#plot)
  - [Description](#plot)
  - [Examples](#ex-plot)
  - [Layering plots](#multiple)
1. [`hist()`](#hist)
]]

.pull-right[
### ggplot2
.smaller[
1. [`ggplot2`](#ggplot2)
  - [Intro](#gg-intro)
  - [`ggplot()`](#ggplot)
  - [Layers](#layers)
  - [Building a plot](#ex-gg)
  - [Histogram](#gg-hist)
  - [Density](#gg-density)
  - [More](#gg-more)
  - [Saving](#ggsave)
1. [More resources](#resources)
]]

---
exclude: true