Regression
Plotting in R (especially ggplot2)
plot()
While we'll quickly move on to other options, R's plot() function (in the default graphics package) is a great tool for basic data exploration: it's fast, simple, and flexible.
In fact, plot() is a generic function that works for many classes.
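As a rough illustration of what "generic" means here, the sketch below passes objects of several classes to plot(). The vectors x and y are made up for this example; the methods named in the comments are the ones base R normally dispatches to.

```r
# Made-up example data (not from the slides)
x <- seq(from = 0, to = 2 * pi, by = 0.2)
y <- sin(x)

# plot() picks a method based on the class of its first argument
plot(x, y)                       # numeric vectors: plot.default (scatter plot)
plot(data.frame(x = x, y = y))   # data frame: plot.data.frame
plot(density(y))                 # "density" object: plot.density
plot(lm(y ~ x), which = 1)       # "lm" object: residuals vs. fitted values

# List the plot() methods available in the current session
methods(plot)
```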
General arguments for plot():
- x and y for coordinates
- type = {"p"oints, "l"ines, etc.} (optional)
- xlab, ylab, main, and sub for axis labels and (sub)title (optional)
- col and pch for color and plot character (optional)
- lty and lwd for line type and line width (optional)
Let's see plot() in action.
# Define two vectors
a <- seq(from = 0, to = 2*pi, by = 0.2)
b <- sin(a)
plot(x = a, y = b)
plot(x = a, y = b, xlab = "x", ylab = "sin(x)")
plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue")
plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue", type = "l")
plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue", type = "b")
plot(x = a, y = b, xlab = "x", ylab = "sin(x)", col = "blue", type = "s")
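The calls above do not use main, sub, pch, lty, or lwd. A minimal sketch that combines them in one call might look like this; the title and subtitle text are invented for the example.

```r
# Same two vectors as above
a <- seq(from = 0, to = 2 * pi, by = 0.2)
b <- sin(a)

plot(
  x = a, y = b,
  type = "b",                        # both points and lines
  xlab = "x", ylab = "sin(x)",       # axis labels
  main = "A sine wave",              # title (made up for this example)
  sub = "Drawn with base graphics",  # subtitle (made up for this example)
  col = "blue", pch = 19,            # color and plot character (filled circle)
  lty = 2, lwd = 2                   # dashed line, double width
)
```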
plot() is essentially calling points() or lines().
You can layer plots by using these individual functions.
plot(x = a, y = b, col = "blue")
plot(x = a, y = b, col = "blue"); points(x = a, y = -b, col = "orange")
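The same layering idea works with lines(), and legend() can label the layers. This is only a sketch; the cosine curve and the legend placement are choices made for this example.

```r
a <- seq(from = 0, to = 2 * pi, by = 0.2)

# First layer: plot() opens the figure and draws the points
plot(x = a, y = sin(a), col = "blue", xlab = "x", ylab = "y")

# Second layer: lines() draws onto the existing figure
lines(x = a, y = cos(a), col = "orange", lwd = 2)

# Label the two layers (points for sin, a line for cos)
legend(
  "bottomleft",
  legend = c("sin(x)", "cos(x)"),
  col = c("blue", "orange"),
  pch = c(1, NA),
  lty = c(NA, 1)
)
```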
graphics also offers a nice histogram function in hist().
hist(x = b, breaks = 10, col = "purple", xlab = "sin(x)", main = "Wow.")
That said, further customizing your figures with the graphics package's plotting functions can become quite difficult.
Enter ggplot2
The ggplot2 package offers an incredibly flexible, diverse, and powerful set of functions for creating graphics in R.
The gg stands for the grammar of graphics.
ggplot2
- Data (data argument)
- Aesthetic mappings (aes argument)
Note: The package is called ggplot2, but the main function is ggplot().
ggplot()
Main arguments
- data: Your dataset, as a data frame (or tibble).
- aes(): Maps variables in data to "aesthetics" like x, color, shape.
Example: A time series of problems, color defined by money
library(ggplot2)
ggplot(data = pretend_df, aes(x = time, y = problems, color = money))
The ggplot() function doesn't plot anything; it sets up the plot.
To create the actual figure, you layer geometries (e.g., geom_point()), scales (e.g., scale_color_manual()), and other options (e.g., xlab()).
You add layers using the addition sign (+).
Example: A time series of problems, color defined by money
library(ggplot2)
ggplot(data = pretend_df, aes(x = time, y = problems, color = money)) +
  geom_point() +
  geom_line()
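The slides never define pretend_df, so the sketch below invents a small stand-in to make the example runnable. It also shows that the bare ggplot() call carries no layers, and that geometries, a scale, and labels are added with +. The data values, colors, and label text are assumptions made for this illustration.

```r
library(ggplot2)

# Hypothetical stand-in for pretend_df (values invented for this example)
pretend_df <- data.frame(
  time = rep(1:5, times = 2),
  problems = c(1, 2, 2, 3, 4, 2, 4, 5, 7, 9),
  money = rep(c("less", "more"), each = 5)
)

# Set up the plot: data and aesthetic mappings only
gg_base <- ggplot(data = pretend_df, aes(x = time, y = problems, color = money))
length(gg_base$layers)  # number of layers so far

# Add geometries, a manual color scale, and axis labels
gg_full <- gg_base +
  geom_point() +
  geom_line() +
  scale_color_manual(name = "Money", values = c(less = "grey50", more = "darkgreen")) +
  xlab("Time") +
  ylab("Problems")
length(gg_full$layers)  # number of layers after adding the two geoms

# Printing the object draws the figure
gg_full
```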
Alright, let's build a plot.
We'll use the economics dataset that comes with ggplot2 (because economics).
Set up the plot.
ggplot(data = economics, aes(x = date, y = unemploy/pop))
Label the axes.
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") +
  xlab("Date")
Draw some points.
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point()
Map the size to the median duration of unemployment.
ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point()
Change the shape of the points.
ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point(shape = 1)
Map points' color to the median duration of unemployment.
ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point(aes(color = uempmed))
Add some transparency (alpha) to our points.
ggplot(data = economics, aes(x = date, y = unemploy/pop, size = uempmed)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5)
Give all points the same, larger size.
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3)
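The last few steps rely on the difference between mapping an aesthetic (inside aes(), so it varies with a variable) and setting it (outside aes(), so it is a constant). A minimal side-by-side sketch, with object names chosen for this example:

```r
library(ggplot2)

# Mapped: size is inside aes(), so it varies with uempmed and gets a legend
gg_mapped <- ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  geom_point(aes(size = uempmed), alpha = 0.5)

# Set: size is outside aes(), so every point has size 3 and there is no legend
gg_set <- ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  geom_point(size = 3, alpha = 0.5)

gg_mapped
gg_set
```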
Change our theme. Maybe you're a minimalist (but want slightly larger fonts)?
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  theme_minimal(base_size = 14)
Want your figure to look like Stata made it?
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  ggthemes::theme_stata(base_size = 14)
The "pander" theme from the ggthemes package.
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  ggthemes::theme_pander(base_size = 14)
Change (and label) our color scale. Note: viridis is the best.
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  ggthemes::theme_pander(base_size = 14) +
  scale_color_viridis_c("Dur. unemp.")
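scale_color_viridis_c() also takes option and direction arguments for choosing among the viridis palettes. The sketch below picks "magma" and reverses it, purely as an example. It uses theme_minimal() rather than the ggthemes theme so that it depends only on ggplot2.

```r
library(ggplot2)

gg_magma <- ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  theme_minimal(base_size = 14) +
  # option selects the palette; direction = -1 reverses its order
  scale_color_viridis_c(name = "Dur. unemp.", option = "magma", direction = -1)

gg_magma
```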
Connect the dots.
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_line(color = "grey80") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  ggthemes::theme_pander(base_size = 14) +
  scale_color_viridis_c("Dur. unemp.")
How about a smoother?
ggplot(data = economics, aes(x = date, y = unemploy/pop)) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  geom_smooth(se = FALSE) +
  ggthemes::theme_pander(base_size = 14) +
  scale_color_viridis_c("Dur. unemp.")
The group aesthetic separates groups.
ggplot(data = economics, aes(x = date, y = unemploy/pop, group = date < ymd(19900101))) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point(aes(color = uempmed), alpha = 0.5, size = 3) +
  geom_smooth(se = FALSE) +
  ggthemes::theme_pander(base_size = 14) +
  scale_color_viridis_c("Dur. unemp.")
Note: The ymd() function comes from the lubridate package.
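If lubridate is not installed, base R's as.Date() can build the same cutoff date. This is a sketch of that substitution and not the slides' own code; the theme and color scale are dropped to keep it short.

```r
library(ggplot2)

# Group observations by whether they fall before 1 January 1990;
# as.Date() replaces lubridate's ymd() here
gg_grouped <- ggplot(
  data = economics,
  aes(x = date, y = unemploy/pop, group = date < as.Date("1990-01-01"))
) +
  ylab("Unemployment rate") +
  xlab("Date") +
  geom_point(alpha = 0.5) +
  geom_smooth(se = FALSE)  # one smoother per group

gg_grouped
```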
ggplot2 knows histograms.
A histogram.
ggplot(data = economics, aes(x = unemploy/pop)) +
  xlab("Unemployment rate") +
  geom_histogram(color = "white", fill = "#e64173") +
  ggthemes::theme_pander(base_size = 14)
Add a horizontal line where count = 0.
ggplot(data = economics, aes(x = unemploy/pop)) +
  xlab("Unemployment rate") +
  geom_histogram(color = "white", fill = "#e64173") +
  geom_hline(yintercept = 0) +
  ggthemes::theme_pander(base_size = 14)
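By default, geom_histogram() uses 30 bins and prints a message suggesting you choose a value yourself. A minimal sketch that sets the bin count explicitly; 20 bins is an arbitrary choice for this example, and binwidth is the alternative argument.

```r
library(ggplot2)

gg_hist <- ggplot(data = economics, aes(x = unemploy/pop)) +
  xlab("Unemployment rate") +
  # bins sets the number of bins; use binwidth instead to fix the bin width
  geom_histogram(bins = 20, color = "white", fill = "#e64173") +
  geom_hline(yintercept = 0)

gg_hist
```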
ggplot2 knows densities.
A density plot.
ggplot(data = economics, aes(x = unemploy/pop)) +
  xlab("Unemployment rate") +
  geom_density(color = NA, fill = "#e64173") +
  geom_hline(yintercept = 0) +
  ggthemes::theme_pander(base_size = 14)
Now with an Epanechnikov kernel!
ggplot(data = economics, aes(x = unemploy/pop)) +
  xlab("Unemployment rate") +
  geom_density(kernel = "epanechnikov", color = NA, fill = "#e64173") +
  geom_hline(yintercept = 0) +
  ggthemes::theme_pander(base_size = 14)
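To see how much the kernel choice matters, one option is to overlay the two estimates as outlines. The sketch below does that and also computes the Epanechnikov estimate directly with base R's density(), which accepts the same kernel names. Colors and object names are choices made for this example.

```r
library(ggplot2)

# Overlay the default Gaussian kernel and the Epanechnikov kernel
gg_kernels <- ggplot(data = economics, aes(x = unemploy/pop)) +
  xlab("Unemployment rate") +
  geom_density(kernel = "gaussian", color = "grey40") +
  geom_density(kernel = "epanechnikov", color = "#e64173") +
  geom_hline(yintercept = 0)

gg_kernels

# The same kind of estimate computed directly in base R
dens_epan <- density(economics$unemploy / economics$pop, kernel = "epanechnikov")
```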
ggplot2 itself is incredibly flexible and powerful.
But there are even more packages that extend its power, e.g., ggthemes, gganimate, cowplot, ggmap, ggExtra, and (of course) viridis.
Gapminder meets gganimate
US births by month since 1933
You can save your ggplot2-based figures using ggsave().
ggsave()
Option 1: By default, ggsave() saves the last plot printed to the screen.
# Create a simple scatter plot
ggplot(data = fun_df, aes(x = x, y = y)) +
  geom_point()
# Save our simple scatter plot
ggsave(filename = "simple_scatter.pdf")
Notes
- This example saves a PDF; use ".png" for PNG, etc.
- Other optional arguments include path, width, height, and dpi.
ggsave()
Option 2: You can assign your ggplot() objects to memory.
# Create a simple scatter plot named 'gg_points'
gg_points <- ggplot(data = fun_df, aes(x = x, y = y)) +
  geom_point()
We can then save this figure with the name gg_points using ggsave().
# Save our simple scatter plot named 'gg_points'
ggsave(filename = "simple_scatter.pdf", plot = gg_points)
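The slides never define fun_df, so the sketch below invents one to give a runnable version of Option 2 that also uses the path, width, and height arguments. The data values, the temporary output directory, and the 6 by 4 inch size are all assumptions made for this example.

```r
library(ggplot2)

# Hypothetical stand-in for fun_df (values invented for this example)
fun_df <- data.frame(x = 1:10, y = (1:10)^2)

# Create a simple scatter plot named 'gg_points'
gg_points <- ggplot(data = fun_df, aes(x = x, y = y)) +
  geom_point()

# Save to a temporary directory as a 6 x 4 inch PDF;
# for a raster format such as ".png", dpi would set the resolution
out_dir <- tempdir()
ggsave(
  filename = "simple_scatter.pdf",
  plot = gg_points,
  path = out_dir,
  width = 6,
  height = 4
)

# Check that the file was written
file.exists(file.path(out_dir, "simple_scatter.pdf"))
```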
ggplot2
- ggplot2 reference index
- tidyverse page on ggplot2