
Plotting in R

EC 425/525, Lab 5

Edward Rubin

10 May 2019

1 / 54

Prologue

2 / 54

Schedule

Last time

Regression

Today

Plotting in R (especially ggplot2)

3 / 54

Plotting

4 / 54

Plotting

The default option: plot()

While we'll quickly move on to other options, R's plot() function (in the default graphics package) is a great tool for basic data exploration—it's fast, simple, and flexible.

In fact, plot() is a generic function: it works with objects of many classes.
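A quick way to see what "generic" means: plot() looks at the class of its first argument and dispatches to a matching method. This is a small sketch. The vectors x and y are made up for illustration and are not from the slides.

```r
# Made-up data for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# A fitted linear model has class "lm"
fit <- lm(y ~ x)
class(fit)

# plot() dispatches on class, so each call below uses a different method
plot(x, y)                            # plot.default(): a scatter plot
plot(fit, which = 1)                  # plot.lm(): residuals vs. fitted values
plot(factor(c("a", "b", "a", "c")))   # plot.factor(): a bar plot of counts
```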

General arguments for plot():

  • x and y for coordinates
  • type = {"p"oints, "l"ines, etc.} (optional)
  • xlab, ylab, main, and sub for axis labels and (sub)title (optional)
  • col and pch for color and plot character (optional)
  • lty and lwd for line type and line width (optional)
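The sketch below uses several of the optional arguments listed above in a single call. The title, subtitle, symbol, and line settings are arbitrary choices and do not come from the slides.

```r
# Same vectors as on the next slide
a <- seq(from = 0, to = 2*pi, by = 0.2)
b <- sin(a)

plot(
  x = a, y = b,
  type = "b",                        # both points and lines
  xlab = "x", ylab = "sin(x)",       # axis labels
  main = "A sine curve",             # title
  sub = "Points every 0.2 radians",  # subtitle
  col = "blue",                      # color
  pch = 19,                          # plot character: solid circles
  lty = 2,                           # line type: dashed
  lwd = 2                            # line width: twice the default
)
```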
5 / 54

Let's see plot() in action.

# Define two vectors
a <- seq(from = 0, to = 2*pi, by = 0.2)
b <- sin(a)
6 / 54
plot(x = a, y = b)

7 / 54
plot(x = a, y = b, xlab = "x", ylab = "sin(x)")

8 / 54
plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue")

9 / 54
plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue", type = "l")

10 / 54
plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue", type = "b")

11 / 54
plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue", type = "s")

12 / 54

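plot() starts a new plot and then draws points or lines, much as points() and lines() do.

Because points() and lines() add to the existing plot instead of starting a new one, you can use them to layer plots.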

13 / 54
plot(x = a, y = b, col = "blue")

14 / 54
plot(x = a, y = b, col = "blue"); points(x = a, y = -b, col = "orange")
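lines() layers the same way points() does. This sketch adds a cosine curve as a line, a dashed reference line at zero, and a legend() that tells the layers apart. The cosine curve, colors, and legend position are our own additions.

```r
a <- seq(from = 0, to = 2*pi, by = 0.2)
b <- sin(a)

# Start a new plot with points
plot(x = a, y = b, col = "blue", xlab = "x", ylab = "")
# Add a line layer on top of the existing plot
lines(x = a, y = cos(a), col = "orange", lwd = 2)
# Add a horizontal reference line at zero
abline(h = 0, lty = 3)
# Label the layers
legend(
  "bottomleft",
  legend = c("sin(x)", "cos(x)"),
  col = c("blue", "orange"),
  pch = c(1, NA),  # point symbol only for the points layer
  lty = c(NA, 1)   # line only for the lines layer
)
```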

15 / 54

The graphics package also offers a nice histogram function: hist().

16 / 54
hist(x = b, breaks = 10, col = "purple", xlab = "sin(x)", main = "Wow.")
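Note that breaks = 10 is only a suggestion: hist() picks "pretty" break points close to that number. Setting plot = FALSE returns the computed histogram without drawing it, which makes it easy to check the bins. Here is a minimal sketch using the same vector b:

```r
a <- seq(from = 0, to = 2*pi, by = 0.2)
b <- sin(a)

# Compute (but don't draw) the histogram from the slide
h <- hist(x = b, breaks = 10, plot = FALSE)
h$breaks   # break points actually chosen by hist()
h$counts   # number of observations in each bin
```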

17 / 54

That said, further customizing your figures with the graphics package's plotting functions can quickly become difficult.

Enter ggplot2

18 / 54

ggplot2

19 / 54

ggplot2

The grammar

The ggplot2 package offers an incredibly flexible, diverse, and powerful set of functions for creating graphics in R.

The gg stands for the grammar of graphics.

ggplot2

  1. centers on a data frame (the data argument)
  2. maps variables to aesthetics (the aes argument)
  3. layers geometries to build up your graphic

Note: The package is called ggplot2, but the main function is ggplot().
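Here is a minimal sketch of those three pieces. It uses R's built-in mtcars data frame, which we chose for illustration; the slides use other data.

```r
library(ggplot2)

# 1. data: a data frame
# 2. aes(): map wt to the x axis and mpg to the y axis
# 3. a geometry layer, added with +
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Printing the object draws the plot
print(p)
```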

20 / 54

ggplot2

ggplot()

Main arguments

  1. data: Your dataset, as a data frame (or tibble).

  2. aes(): Maps variables in data to "aesthetics" such as x, y, color, and shape.

Example: A time series of problems, colored by money

library(ggplot2)
ggplot(
  data = pretend_df,
  aes(x = time, y = problems, color = money)
)
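As its name says, pretend_df is pretend. To run the example, you could make up a small data frame with the same column names. The values below are invented. The resulting object holds the data and the aesthetic mappings but has no layers yet, so it draws no data.

```r
library(ggplot2)

# Hypothetical data: invented values with the column names used in the slide
pretend_df <- data.frame(
  time     = 1:10,
  problems = c(2, 3, 3, 5, 4, 6, 8, 7, 9, 12),
  money    = c(10, 12, 15, 15, 20, 22, 25, 30, 33, 40)
)

gg_setup <- ggplot(
  data = pretend_df,
  aes(x = time, y = problems, color = money)
)

# No geometry layers yet: ggplot() has only set up data and mappings
length(gg_setup$layers)
```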
21 / 54

ggplot2

Layers

On its own, ggplot() doesn't draw any data; it only sets up the plot.

To create the actual figure, you layer geometries (e.g., geom_point()),
scales (e.g., scale_color_manual()), and other options (e.g., xlab()).

You add layers with the plus sign (+).

Example: A time series of problems, colored by money

library(ggplot2)
ggplot(
  data = pretend_df,
  aes(x = time, y = problems, color = money)
) +
  geom_point() + geom_line()
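Because each + returns a new ggplot object, you can also build a plot step by step and keep the intermediate pieces. The following is a sketch with invented data; as on the slide, pretend_df is hypothetical.

```r
library(ggplot2)

# Hypothetical data with the slide's column names
pretend_df <- data.frame(
  time     = 1:10,
  problems = c(2, 3, 3, 5, 4, 6, 8, 7, 9, 12),
  money    = c(10, 12, 15, 15, 20, 22, 25, 30, 33, 40)
)

# Set up the plot (no layers yet)
p <- ggplot(data = pretend_df, aes(x = time, y = problems, color = money))

# Add layers one at a time; each + returns a new, larger plot object
p_points <- p + geom_point()
p_both   <- p_points + geom_line()

# Labels are added the same way
p_both + xlab("Time") + ylab("Problems")
```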
22 / 54

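Alright, let's build a plot.

We'll use the economics dataset that comes with ggplot2
(because economics).

Note: unemploy/pop divides the number of unemployed people by the total population, so it runs below the official unemployment rate, which divides by the labor force. For simplicity, the plots below still label it "Unemployment rate".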
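Before plotting, it helps to look at the data. This is a quick sketch. economics ships with ggplot2, and date, pop, unemploy, and uempmed are among its columns.

```r
library(ggplot2)

# Monthly US economic time series bundled with ggplot2
head(economics)

# The ratio plotted on the following slides: unemployed / total population
share <- economics$unemploy / economics$pop
summary(share)
```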

23 / 54
 
24 / 54

Set up the plot.

ggplot(data = economics, aes(x = date, y = unemploy/pop))

25 / 54

Label the axes.

ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
ylab("Unemployment rate") + xlab("Date")

26 / 54

Draw some points.

ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
ylab("Unemployment rate") + xlab("Date") +
geom_point()

27 / 54

Map the size to the median duration of unemployment.

ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) +
ylab("Unemployment rate") + xlab("Date") +
geom_point()

28 / 54

Change the shape of the points.

ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) +
ylab("Unemployment rate") + xlab("Date") +
geom_point(shape = 1)

29 / 54

Map points' color to the median duration of unemployment.

ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) +
ylab("Unemployment rate") + xlab("Date") +
geom_point(aes(color = uempmed))

30 / 54

Add some transparency (alpha) to our points.

ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) +
ylab("Unemployment rate") + xlab("Date") +
geom_point(aes(color = uempmed), alpha = 0.5)

31 / 54

Give all points the same, larger size.

ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
ylab("Unemployment rate") + xlab("Date") +
geom_point(aes(color = uempmed), alpha = 0.5, size = 3)

32 / 54

Change our theme—maybe you're a minimalist (but want slightly larger fonts)?

ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
ylab("Unemployment rate") + xlab("Date") +
geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
theme_minimal(base_size = 14)

33 / 54

Want your figure to look like Stata made it?

ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
ylab("Unemployment rate") + xlab("Date") +
geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
ggthemes::theme_stata(base_size = 14)

34 / 54

The "pander" theme from the ggthemes package.

ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
ylab("Unemployment rate") + xlab("Date") +
geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
ggthemes::theme_pander(base_size = 14)

35 / 54

Change (and label) our color scale. (Note: viridis is the best.)

ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
ylab("Unemployment rate") + xlab("Date") +
geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
ggthemes::theme_pander(base_size = 14) +
scale_color_viridis_c("Dur. unemp.")

36 / 54

Connect the dots.

ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
ylab("Unemployment rate") + xlab("Date") +
geom_line(color = "grey80") +
geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
ggthemes::theme_pander(base_size = 14) +
scale_color_viridis_c("Dur. unemp.")

37 / 54

How about a smoother?

ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
ylab("Unemployment rate") + xlab("Date") +
geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
geom_smooth(se = FALSE) +
ggthemes::theme_pander(base_size = 14) +
scale_color_viridis_c("Dur. unemp.")
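When method isn't set, geom_smooth() picks a smoothing method for you and prints a message saying so. For data with fewer than 1,000 rows, as here, it uses loess. Setting method and formula explicitly documents that choice and silences the message. In this sketch, the ggthemes theme is left out so that it needs only ggplot2.

```r
library(ggplot2)

p <- ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  # Explicit loess smoother of y on x, without a confidence band
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  scale_color_viridis_c("Dur. unemp.")
p
```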

38 / 54

The group aesthetic separates groups.

library(lubridate) # provides ymd()
ggplot(data = economics, aes(x = date, y = unemploy/pop, group = date < ymd(19900101))) +
ylab("Unemployment rate") + xlab("Date") +
geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
geom_smooth(se = FALSE) +
ggthemes::theme_pander(base_size = 14) +
scale_color_viridis_c("Dur. unemp.")

39 / 54

Note: The ymd() function comes from the lubridate package.
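If you'd rather not load lubridate, base R's as.Date() can build the same cutoff date. The sketch below applies that swap to the grouping from the previous slide. The theme and color scale are left out for brevity.

```r
library(ggplot2)

# Base-R cutoff date (instead of lubridate::ymd(19900101))
cutoff <- as.Date("1990-01-01")

# The group aesthetic splits the data into before-1990 and from-1990 groups,
# so geom_smooth() fits a separate curve to each group
p <- ggplot(data = economics, aes(x = date, y = unemploy/pop, group = date < cutoff)) +
  ylab("Unemployment rate") + xlab("Date") +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE)
p
```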

40 / 54

ggplot2 knows histograms.

41 / 54

A histogram.

ggplot(data = economics, aes(x = unemploy/pop)) +
xlab("Unemployment rate") +
geom_histogram(color = "white", fill = "#e64173") +
ggthemes::theme_pander(base_size = 14)
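By default, geom_histogram() uses 30 bins and prints a message suggesting a better choice. This sketch sets binwidth instead; the width of 0.002 is an arbitrary choice. It then uses layer_data() to confirm that every month falls into some bin. The theme is left out.

```r
library(ggplot2)

p <- ggplot(data = economics, aes(x = unemploy/pop)) +
  xlab("Unemployment rate") +
  geom_histogram(binwidth = 0.002, color = "white", fill = "#e64173")
p

# The computed bins: one row per bin, with a count column
bins <- layer_data(p)
sum(bins$count)  # should equal nrow(economics)
```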

42 / 54

Add a horizontal line where count = 0.

ggplot(data = economics, aes(x = unemploy/pop)) +
xlab("Unemployment rate") +
geom_histogram(color = "white", fill = "#e64173") +
geom_hline(yintercept = 0) +
ggthemes::theme_pander(base_size = 14)

43 / 54

ggplot2 knows densities.

44 / 54

A density plot.

ggplot(data = economics, aes(x = unemploy/pop)) +
xlab("Unemployment rate") +
geom_density(color = NA, fill = "#e64173") +
geom_hline(yintercept = 0) +
ggthemes::theme_pander(base_size = 14)

45 / 54

Now with an Epanechnikov kernel!

ggplot(data = economics, aes(x = unemploy/pop)) +
xlab("Unemployment rate") +
geom_density(kernel = "epanechnikov", color = NA, fill = "#e64173") +
geom_hline(yintercept = 0) +
ggthemes::theme_pander(base_size = 14)
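geom_density() also accepts an adjust argument that scales the default bandwidth. Values below 1 give a wigglier curve, and values above 1 give a smoother one. This sketch overlays two choices; 0.5 and 2 are arbitrary, and the theme is left out.

```r
library(ggplot2)

p <- ggplot(data = economics, aes(x = unemploy/pop)) +
  xlab("Unemployment rate") +
  # Half the default bandwidth: more detail
  geom_density(adjust = 0.5, color = "#e64173") +
  # Twice the default bandwidth: smoother
  geom_density(adjust = 2, color = "grey40") +
  geom_hline(yintercept = 0)
p
```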

46 / 54

ggplot2 itself is incredibly flexible and powerful.

But even more packages extend its power, e.g., ggthemes, gganimate, cowplot, ggmap, ggExtra, and (of course) viridis.

47 / 54

Gapminder meets gganimate

Figure: animated Gapminder plot (made with gganimate).

48 / 54

US births by month since 1933

49 / 54

ggplot2

Saving plots

You can save your ggplot2-based figures using ggsave().

50 / 54

ggplot2

ggsave() Option 1

By default, ggsave() saves the last plot you created or modified (the one returned by last_plot()).

# Create a simple scatter plot
ggplot(data = fun_df, aes(x = x, y = y)) +
geom_point()
# Save our simple scatter plot
ggsave(filename = "simple_scatter.pdf")

Notes

  • This example creates a PDF. Change to ".png" for PNG, etc.
  • There are several helpful, optional arguments: path, width, height, dpi.
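The sketch below is self-contained and fills in some of these optional arguments. fun_df is hypothetical, so the values here are made up. The temporary-directory path and the 6 x 4 inch size are arbitrary choices. dpi matters only for raster formats such as PNG, so this PDF example leaves it out.

```r
library(ggplot2)

# Hypothetical data standing in for fun_df
fun_df <- data.frame(x = 1:20, y = (1:20)^2)

gg_points <- ggplot(data = fun_df, aes(x = x, y = y)) +
  geom_point()

# Save to a temporary folder instead of the working directory
out_dir <- tempdir()
ggsave(
  filename = "simple_scatter.pdf",
  plot = gg_points,            # save this object, not just the last plot
  path = out_dir,              # folder to save into
  width = 6, height = 4, units = "in"
)

file.exists(file.path(out_dir, "simple_scatter.pdf"))
```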
51 / 54

ggplot2

ggsave() Option 2

You can also assign a ggplot() object to a name, storing it in memory.

# Create a simple scatter plot named 'gg_points'
gg_points <- ggplot(data = fun_df, aes(x = x, y = y)) +
  geom_point()

We can then save this stored figure by passing gg_points to ggsave()'s plot argument.

# Save our simple scatter plot, stored as 'gg_points'
ggsave(
  filename = "simple_scatter.pdf",
  plot = gg_points
)
52 / 54

Resources

There's always more

ggplot2

  1. RStudio's cheat sheet for ggplot2.
  2. ggplot2 reference index
  3. The tidyverse page on ggplot2.
  4. The data visualization chapter of Hadley Wickham's data science book, R for Data Science.
53 / 54
