Maps with ggplot2!

Maps!

Who doesn’t love a nice-looking map?!

Today you will create a map of the United States that describes geographic variation in social mobility. One concrete measure of social mobility is the fraction of people raised by low-income parents who go on to earn incomes in the top 20 percent of the national income distribution. You will use county-level data on this measure and other related variables from the Opportunity Atlas.

Where is the land of opportunity?

One version of the American Dream is that children grow up to do better than their parents. If by “better” we mean “earn more,” then the American Dream isn’t faring so well. Research by Harvard economist Raj Chetty and his team of researchers demonstrates that social mobility has declined in the United States over the last few decades. That said, there are areas of the country where high social mobility persists. Your task today is to identify those areas.

Step 1: Social Mobility Data

Download the file social_mobility.csv from Canvas.

library(pacman)
p_load(tidyverse)

mobility <- read_csv("social_mobility.csv")

# view the data
head(mobility)

## # A tibble: 6 x 8
##    fips county_name parent_inc hh_income hh_income_t20 hs_grad
##   <dbl> <chr>       <chr>          <dbl>         <dbl>   <dbl>
## 1  1001 Autauga Co~ High           52207        0.239    0.915
## 2  1001 Autauga Co~ Low            26977        0.0603   0.725
## 3  1001 Autauga Co~ Middle         38817        0.133    0.843
## 4  1003 Baldwin Co~ High           50902        0.253    0.913
## 5  1003 Baldwin Co~ Low            29832        0.0872   0.750
## 6  1003 Baldwin Co~ Middle         39821        0.155    0.851
## # ... with 2 more variables: incarceration <dbl>, teen_birth <dbl>

The data are based on detailed administrative records that follow 20 million Americans born between 1978 and 1983. To protect the identity of the individuals in the data, the Opportunity Atlas team aggregated the records to the county level. For each county, there are average adult outcomes for three discrete levels of parental income: low, middle, and high. The outcome variables include teenage birth rates, incarceration rates, high school graduation rates, household income, and the fraction of people who ended up in the top 20 percent of the income distribution.

Consider the value of hh_income_t20 for the first row. It tells you that 23.87 percent of the people who were raised by high-income parents in Autauga County, AL earned household incomes in the top 20 percent of the income distribution as adults.
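
To read that number from the data rather than from the printout, you can filter to the relevant row. The sketch below rebuilds the first three printed rows by hand, so the values are rounded as displayed (0.239 stands in for the unrounded 0.2387). The names mobility_head and autauga_high are made up for this example.

```r
library(dplyr)

# Hand-typed copy of the first three printed rows (rounded as displayed)
mobility_head <- tibble(
  fips          = c(1001, 1001, 1001),
  parent_inc    = c("High", "Low", "Middle"),
  hh_income_t20 = c(0.239, 0.0603, 0.133)
)

# Children of high-income parents in Autauga County (FIPS 1001):
# share reaching the top 20 percent, converted to percent
autauga_high <- mobility_head %>%
  filter(fips == 1001, parent_inc == "High") %>%
  mutate(pct_top20 = 100 * hh_income_t20)

autauga_high
```

The same filter() call works on the full mobility data frame, where it returns the unrounded value.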

Step 2: Map Data

Before you can map the social mobility data, you need to retrieve map data. To do this, you will use the maps package.

p_load(maps)

us_counties <- map_data("county")

# view the data
head(us_counties)

##        long      lat group order  region subregion
## 1 -86.50517 32.34920     1     1 alabama   autauga
## 2 -86.53382 32.35493     1     2 alabama   autauga
## 3 -86.54527 32.36639     1     3 alabama   autauga
## 4 -86.55673 32.37785     1     4 alabama   autauga
## 5 -86.57966 32.38357     1     5 alabama   autauga
## 6 -86.59111 32.37785     1     6 alabama   autauga

The us_counties data frame contains points that trace out US counties. You can visualize the data using ggplot. The geom_polygon function will plot the county borders.

us_counties %>% 
  ggplot(aes(x = long, y = lat, group = group, fill = region)) +
  # tell ggplot what to do with the data
  # linewidth replaces size for line widths in ggplot2 3.4.0 and later
  geom_polygon(color = "white", linewidth = 0.1) +
  # hide the legend
  guides(fill = "none")

It’s a start, but plotting raw longitude and latitude on a flat grid stretches the map and isn’t exactly the most visually pleasing. Use the Albers projection from the mapproj package instead.¹

p_load(mapproj)

us_counties %>% 
  ggplot(aes(x = long, y = lat, group = group, fill = region)) +
  # tell ggplot what to do with the data
  # linewidth replaces size for line widths in ggplot2 3.4.0 and later
  geom_polygon(color = "white", linewidth = 0.1) +
  # hide the legend
  guides(fill = "none") +
  # change the map projection
  coord_map(projection = "albers", lat0 = 39, lat1 = 45)
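
Both plots above rely on the group aesthetic to keep each county a separate polygon. To see what that column does, here is a small sketch with two made-up squares standing in for two counties. The coordinates and the names toy_polygons and vertices_per_polygon are invented for illustration.

```r
library(dplyr)

# Two made-up squares standing in for two counties (not real coordinates)
toy_polygons <- tibble(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = c(1, 1, 1, 1,  2, 2, 2, 2),
  order = 1:8
)

# One row per polygon, with the number of vertices that trace its outline
vertices_per_polygon <- toy_polygons %>%
  count(group, name = "n_vertices")

vertices_per_polygon
```

Running the same count() on us_counties shows how many points trace each county. Without group = group in aes(), ggplot would join vertices from different counties into a single shape.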

Step 3: Join Data

You will join the social mobility data with the map data using county FIPS codes.²

Preliminary steps: fetch FIPS codes

The us_counties dataset does not include FIPS codes. To add them, import the county.fips dataset from the maps package.

fips <- get("county.fips")

# view the data
head(fips)

##   fips        polyname
## 1 1001 alabama,autauga
## 2 1003 alabama,baldwin
## 3 1005 alabama,barbour
## 4 1007    alabama,bibb
## 5 1009  alabama,blount
## 6 1011 alabama,bullock

Next, you need to join us_counties with fips. To join two data frames, you need a common variable; here, use polyname. This variable is not in us_counties, but its two components are: the state name (region) and the county name (subregion). You can add polyname to us_counties using mutate() and paste().

us_counties <- us_counties %>% 
  mutate(polyname = paste(region, subregion, sep = ","))

# take a peek
head(us_counties)

##        long      lat group order  region subregion        polyname
## 1 -86.50517 32.34920     1     1 alabama   autauga alabama,autauga
## 2 -86.53382 32.35493     1     2 alabama   autauga alabama,autauga
## 3 -86.54527 32.36639     1     3 alabama   autauga alabama,autauga
## 4 -86.55673 32.37785     1     4 alabama   autauga alabama,autauga
## 5 -86.57966 32.38357     1     5 alabama   autauga alabama,autauga
## 6 -86.59111 32.37785     1     6 alabama   autauga alabama,autauga

Then you can add FIPS codes to us_counties by using the left_join() function.

us_counties <- left_join(us_counties, fips, by = "polyname")

# take another peek
head(us_counties)

##        long      lat group order  region subregion        polyname fips
## 1 -86.50517 32.34920     1     1 alabama   autauga alabama,autauga 1001
## 2 -86.53382 32.35493     1     2 alabama   autauga alabama,autauga 1001
## 3 -86.54527 32.36639     1     3 alabama   autauga alabama,autauga 1001
## 4 -86.55673 32.37785     1     4 alabama   autauga alabama,autauga 1001
## 5 -86.57966 32.38357     1     5 alabama   autauga alabama,autauga 1001
## 6 -86.59111 32.37785     1     6 alabama   autauga alabama,autauga 1001
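
A left join keeps every row of us_counties, so any county whose polyname has no match in fips ends up with a missing FIPS code. One way to check for such cases is sketched below on made-up data. The names toy_map, toy_fips, and unmatched, and the key "alabama,nowhere", are hypothetical.

```r
library(dplyr)

# Made-up map rows; "alabama,nowhere" is a hypothetical key with no FIPS match
toy_map <- tibble(
  polyname = c("alabama,autauga", "alabama,autauga", "alabama,nowhere"),
  order    = 1:3
)

toy_fips <- tibble(
  fips     = c(1001, 1003),
  polyname = c("alabama,autauga", "alabama,baldwin")
)

toy_joined <- left_join(toy_map, toy_fips, by = "polyname")

# Keys that failed to match, one row per distinct polyname
unmatched <- toy_joined %>%
  filter(is.na(fips)) %>%
  distinct(polyname)

unmatched
```

Applying the same filter() and distinct() steps to us_counties after the join would list any counties that lack a FIPS code and would therefore show up blank on the map.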

Make the join

Use left_join() again to join the social mobility data with the map data.

mobility <- left_join(us_counties, mobility, by = "fips")
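
Because the social mobility data have three rows per county (one for each level of parent_inc), this join repeats every map point three times. That is why the maps in Step 4 filter to a single level of parent_inc before plotting. The sketch below shows the effect on made-up data. The names toy_points, toy_outcomes, toy_joined, and toy_low are hypothetical, and the outcome values are the rounded ones printed in Step 1.

```r
library(dplyr)

# Two made-up map points for one county
toy_points <- tibble(
  fips = c(1001, 1001),
  long = c(-86.5, -86.6),
  lat  = c(32.3, 32.4)
)

# Three outcome rows for the same county, one per parental income level
toy_outcomes <- tibble(
  fips          = c(1001, 1001, 1001),
  parent_inc    = c("High", "Low", "Middle"),
  hh_income_t20 = c(0.239, 0.0603, 0.133)
)

toy_joined <- left_join(toy_points, toy_outcomes, by = "fips")

# Each map point now appears once per parental income level
nrow(toy_joined)

# Filtering to one level restores one row per map point
toy_low <- toy_joined %>% filter(parent_inc == "Low")
nrow(toy_low)
```

Recent versions of dplyr may print a warning that the join is many-to-many. Here that is expected, since many map points share a FIPS code and each county has three outcome rows.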

Step 4: Social Mobility Map

Now that the mobility dataset contains map data, you can map the social mobility data. Plot the fraction of low-income children who end up earning incomes in the top 20 percent of the income distribution (variable name: hh_income_t20; subset: parent_inc == "Low").

mobility %>% 
  filter(parent_inc == "Low") %>% 
  ggplot(aes(x = long, y = lat, group = group, fill = hh_income_t20)) +
  # linewidth replaces size for line widths in ggplot2 3.4.0 and later
  geom_polygon(color = "white", linewidth = 0.01) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45)

It’s a start, but the colors don’t look great and the axes clutter the map.

A better-looking map

To make a map that’s easier on the eyes, you can 1) use a color scheme from the viridis package, 2) apply theme_map() from the ggthemes package, which removes unnecessary clutter from the plot, and 3) add state borders.

p_load(viridis, ggthemes)

# data on state borders
state_df <- map_data("state")

mobility %>% 
  filter(parent_inc == "Low") %>% 
  ggplot(aes(x = long, y = lat, group = group, fill = hh_income_t20)) +
  # linewidth replaces size for line widths in ggplot2 3.4.0 and later
  geom_polygon(color = "white", linewidth = 0.01) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  # use a viridis color scale and add titles and a caption
  scale_fill_viridis(option = "inferno", na.value = "white") +
  labs(title = "Where is the Land of Opportunity?", 
       subtitle = "Fraction of low-income children earning top-20% incomes as adults", 
       fill = NULL, 
       caption = "Source: Opportunity Atlas.") +
  # add state borders
  geom_polygon(data = state_df, color = "white", fill = NA, linewidth = 0.65) +
  # remove clutter
  theme_map()

What do you see?

Your Turn

Make a map using one of the other outcome variables in the mobility dataset. Your choices include:

hh_income: Average household income as adults.
hs_grad: Fraction who graduated high school.
incarceration: Fraction who went to prison.
teen_birth: Fraction who gave birth as a teenager.

You can also examine other subsets of the data:

parent_inc == "Middle".
parent_inc == "High".
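
Before mapping a new variable, it can help to see how it varies across the parent_inc groups. A minimal sketch, using only the six rows printed in Step 1 (rounded as displayed), with the made-up names mobility_head and hs_grad_by_income:

```r
library(dplyr)

# Hand-typed copy of the six printed rows (hs_grad rounded as displayed)
mobility_head <- tibble(
  fips       = c(1001, 1001, 1001, 1003, 1003, 1003),
  parent_inc = c("High", "Low", "Middle", "High", "Low", "Middle"),
  hs_grad    = c(0.915, 0.725, 0.843, 0.913, 0.750, 0.851)
)

# Average high school graduation rate by parental income level
hs_grad_by_income <- mobility_head %>%
  group_by(parent_inc) %>%
  summarize(mean_hs_grad = mean(hs_grad))

hs_grad_by_income
```

Run on the full data before the join with the map data, the same summary gives a quick sense of how large the gaps between groups are, which helps when choosing what to map.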

Describe the data you mapped and make note of any interesting patterns that emerge. Does a story emerge from your map? Provide a discussion.

One of the computational problems of Problem Set 5 will ask you to 1) make a map using a different variable from social_mobility.csv (i.e., not hh_income_t20) and 2) describe the geographic patterns you observe.

Kyle Raze, Youssef Ait Benasser, & Saurabh Gupta
EC 320: Introduction to Econometrics
University of Oregon

Fall 2019
