Maps!

Who doesn’t love a nice-looking map?!

Today you will create a map of the United States that describes geographic variation in social mobility. One concrete measure of social mobility is the fraction of people raised by low-income parents who go on to earn incomes in the top 20 percent of the national income distribution. You will use county-level data on this measure and other related variables from the Opportunity Atlas.

Where is the land of opportunity?

One version of the American Dream is that children grow up to do better than their parents. If by “better” we mean “earn more,” then the American Dream isn’t faring so well. Research by Harvard economist Raj Chetty and his team of researchers demonstrates that social mobility has declined in the United States over the last few decades. That said, there are areas of the country where high social mobility persists. Your task today is to identify those areas.

Step 1: Social Mobility Data

Download the file social_mobility.csv from Canvas.
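
A minimal sketch of the import step, assuming the file sits in your working directory. The data frame name mobility is my choice here, based on how the dataset is referred to later in the lab.

```r
library(readr)

# Read the county-level social mobility data
# (assumes social_mobility.csv is in the working directory)
mobility <- read_csv("social_mobility.csv")

# Peek at the first six rows
head(mobility)
```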

## # A tibble: 6 x 8
##    fips county_name parent_inc hh_income hh_income_t20 hs_grad
##   <dbl> <chr>       <chr>          <dbl>         <dbl>   <dbl>
## 1  1001 Autauga Co~ High           52207        0.239    0.915
## 2  1001 Autauga Co~ Low            26977        0.0603   0.725
## 3  1001 Autauga Co~ Middle         38817        0.133    0.843
## 4  1003 Baldwin Co~ High           50902        0.253    0.913
## 5  1003 Baldwin Co~ Low            29832        0.0872   0.750
## 6  1003 Baldwin Co~ Middle         39821        0.155    0.851
## # ... with 2 more variables: incarceration <dbl>, teen_birth <dbl>

The data are based on detailed administrative records that follow 20 million Americans born between 1978 and 1983. To protect the identity of the individuals in the data, the Opportunity Atlas team aggregated the records to the county level. For each county, there are average adult outcomes for three discrete levels of parental income: low, middle, and high. The outcome variables include teenage birth rates, incarceration rates, high school graduation rates, household income, and the fraction of people who ended up in the top 20 percent of the income distribution.

Consider the value of hh_income_t20 for the first row. It tells you that 23.87 percent of the people who were raised by high-income parents in Autauga County, AL earned household incomes in the top 20 percent of the income distribution as adults.

Step 2: Map Data

Before you can map the social mobility data, you need to retrieve map data. To do this, you will use the maps package.
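
One way to retrieve the county outlines is the map_data() helper from ggplot2, which converts maps from the maps package into a regular data frame. This sketch assumes both packages are installed.

```r
library(ggplot2)  # map_data() lives in ggplot2 and relies on the maps package

# Convert the county outlines from the maps package into a data frame
us_counties <- map_data("county")

head(us_counties)
```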

##        long      lat group order  region subregion
## 1 -86.50517 32.34920     1     1 alabama   autauga
## 2 -86.53382 32.35493     1     2 alabama   autauga
## 3 -86.54527 32.36639     1     3 alabama   autauga
## 4 -86.55673 32.37785     1     4 alabama   autauga
## 5 -86.57966 32.38357     1     5 alabama   autauga
## 6 -86.59111 32.37785     1     6 alabama   autauga

The us_counties data frame contains points that trace out US counties. You can visualize the data using ggplot. The geom_polygon function will plot the county borders.
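
A sketch of a first attempt at the county map. The fill and border colors are arbitrary choices. Calling coord_map() with no arguments uses its default Mercator projection and requires the mapproj package.

```r
library(ggplot2)

us_counties <- map_data("county")

# group = group tells ggplot which points belong to the same polygon
p <- ggplot(us_counties, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "gray50") +
  coord_map()  # default projection is Mercator

p
```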

It’s a start, but the Mercator projection isn’t exactly the most visually pleasing. Use the Albers projection from the mapproj package instead.1
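
A sketch of the same map with the Albers projection. The two standard parallels (lat0 = 39 and lat1 = 45) are an assumption borrowed from the Healy chapter this lab is based on. Other values also work and change the curvature slightly.

```r
library(ggplot2)

us_counties <- map_data("county")

p <- ggplot(us_counties, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "gray50") +
  # lat0 and lat1 are passed on to mapproj as the standard parallels
  coord_map(projection = "albers", lat0 = 39, lat1 = 45)

p
```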

Step 3: Join Data

You will join the social mobility data with the map data using county FIPS codes.2

Preliminary steps: fetch FIPS codes

The us_counties dataset does not include FIPS codes. To add them, import the county.fips dataset from the maps package.
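
A sketch of the import, storing the lookup table under the name fips to match the text that follows. Using data() with the package argument is one of a few ways to load a dataset that ships with a package.

```r
# Load the county.fips lookup table that ships with the maps package
data(county.fips, package = "maps")

fips <- county.fips

head(fips)
```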

##   fips        polyname
## 1 1001 alabama,autauga
## 2 1003 alabama,baldwin
## 3 1005 alabama,barbour
## 4 1007    alabama,bibb
## 5 1009  alabama,blount
## 6 1011 alabama,bullock

Next, you need to join us_counties with fips. To join two data frames, you need a variable that appears in both. Here that variable is polyname. It is not yet in us_counties, but its two components are: region holds the state name and subregion holds the county name. You can build polyname in us_counties by pasting the two together, separated by a comma, using mutate and paste.
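
A sketch of that step, assuming the dplyr package is installed. The sep = "," argument matters because the polyname values in the FIPS table have no space after the comma.

```r
library(dplyr)
library(ggplot2)

us_counties <- map_data("county") %>%
  # Build "state,county" strings such as "alabama,autauga"
  mutate(polyname = paste(region, subregion, sep = ","))

head(us_counties)
```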

##        long      lat group order  region subregion        polyname
## 1 -86.50517 32.34920     1     1 alabama   autauga alabama,autauga
## 2 -86.53382 32.35493     1     2 alabama   autauga alabama,autauga
## 3 -86.54527 32.36639     1     3 alabama   autauga alabama,autauga
## 4 -86.55673 32.37785     1     4 alabama   autauga alabama,autauga
## 5 -86.57966 32.38357     1     5 alabama   autauga alabama,autauga
## 6 -86.59111 32.37785     1     6 alabama   autauga alabama,autauga

Then you can add FIPS codes to us_counties by using the left_join() function.
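
A sketch of the join. Converting polyname to character first is a precaution on my part, in case your version of the maps package stores it as a factor. As far as I know, a handful of polyname entries in county.fips carry suffixes (such as ":main") and will not match, so a few counties may end up with a missing FIPS code.

```r
library(dplyr)
library(ggplot2)

# FIPS lookup table from the maps package
data(county.fips, package = "maps")
fips <- county.fips %>%
  mutate(polyname = as.character(polyname))

us_counties <- map_data("county") %>%
  mutate(polyname = paste(region, subregion, sep = ",")) %>%
  # Keep every map point and attach the matching FIPS code
  left_join(fips, by = "polyname")

head(us_counties)
```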

##        long      lat group order  region subregion        polyname fips
## 1 -86.50517 32.34920     1     1 alabama   autauga alabama,autauga 1001
## 2 -86.53382 32.35493     1     2 alabama   autauga alabama,autauga 1001
## 3 -86.54527 32.36639     1     3 alabama   autauga alabama,autauga 1001
## 4 -86.55673 32.37785     1     4 alabama   autauga alabama,autauga 1001
## 5 -86.57966 32.38357     1     5 alabama   autauga alabama,autauga 1001
## 6 -86.59111 32.37785     1     6 alabama   autauga alabama,autauga 1001

Make the join

Use left_join() again to join the social mobility data with the map data.
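
A sketch of the final join, repeating the earlier steps so that the chunk runs on its own. It assumes social_mobility.csv is in your working directory. Overwriting mobility with the joined result is my choice, to match the wording in Step 4.

```r
library(readr)
library(dplyr)
library(ggplot2)

# Map data with FIPS codes (from the preliminary steps)
data(county.fips, package = "maps")
fips <- county.fips %>%
  mutate(polyname = as.character(polyname))

us_counties <- map_data("county") %>%
  mutate(polyname = paste(region, subregion, sep = ",")) %>%
  left_join(fips, by = "polyname")

# Social mobility data (from Step 1)
mobility <- read_csv("social_mobility.csv")

# Attach the border points to each county-by-income row.
# Newer versions of dplyr may warn about a many-to-many relationship.
# That is expected here: each county has three income rows and many points.
mobility <- left_join(mobility, us_counties, by = "fips")
```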

Step 4: Social Mobility Map

Now that the mobility dataset contains map data, you can map the social mobility data. Plot the fraction of low-income children who end up earning incomes in the top 20 percent of the income distribution (variable name: hh_income_t20; subset: parent_inc == "Low").
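
A sketch of a first version of the map, again rebuilding the joined data so the chunk runs on its own. The filter on missing longitudes is a precaution I added in case the file includes counties that have no outline in the map data.

```r
library(readr)
library(dplyr)
library(ggplot2)

# Rebuild the joined data from Steps 1-3
data(county.fips, package = "maps")
fips <- county.fips %>%
  mutate(polyname = as.character(polyname))

us_counties <- map_data("county") %>%
  mutate(polyname = paste(region, subregion, sep = ",")) %>%
  left_join(fips, by = "polyname")

mobility <- read_csv("social_mobility.csv") %>%
  left_join(us_counties, by = "fips")

p <- mobility %>%
  # Children of low-income parents, dropping counties with no outline
  filter(parent_inc == "Low", !is.na(long)) %>%
  ggplot(aes(x = long, y = lat, group = group, fill = hh_income_t20)) +
  geom_polygon() +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45)

p
```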

It’s a start, but the colors don’t look great and the axes clutter the map.
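
One possible cleanup, offered as a sketch rather than the only way to do it: swap in a viridis color scale, show the legend in percent, and use theme_void() to strip the axes. The palette, title, and legend label are all my choices.

```r
library(readr)
library(dplyr)
library(ggplot2)

# Rebuild the joined data from Steps 1-3
data(county.fips, package = "maps")
fips <- county.fips %>%
  mutate(polyname = as.character(polyname))

us_counties <- map_data("county") %>%
  mutate(polyname = paste(region, subregion, sep = ",")) %>%
  left_join(fips, by = "polyname")

mobility <- read_csv("social_mobility.csv") %>%
  left_join(us_counties, by = "fips")

p <- mobility %>%
  filter(parent_inc == "Low", !is.na(long)) %>%
  ggplot(aes(x = long, y = lat, group = group, fill = hh_income_t20)) +
  geom_polygon() +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  # Perceptually uniform palette, with the legend shown in percent
  scale_fill_viridis_c(option = "magma", labels = scales::percent) +
  labs(
    title = "Where is the land of opportunity?",
    fill = "Share reaching\nthe top 20%"
  ) +
  # Remove axes, gridlines, and the gray background
  theme_void()

p
```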

Your Turn

Make a map using one of the other outcome variables in the mobility dataset. Your choices include:

  • hh_income: Average household income as adults.
  • hs_grad: Fraction who graduated high school.
  • incarceration: Fraction who went to prison.
  • teen_birth: Fraction who gave birth as a teenager.

You can also examine other subsets of the data:

  • parent_inc == "Middle".
  • parent_inc == "High".

Describe the data you mapped and make note of any interesting patterns that emerge. Does a story emerge from your map? Provide a discussion.

One of the computational problems of Problem Set 5 will ask you to 1) make a map using a different variable from social_mobility.csv (i.e., not hh_income_t20) and 2) describe the geographic patterns you observe.

Further reading

This lab is based on Chapter 7 of Data Visualization, a free ebook by Duke University sociologist Kieran Healy. If you want to hone your ggplot skills, I can’t think of a better resource. The book features numerous hands-on examples for a variety of data visualizations. Chapter 7 focuses on maps, using US elections as a running example.


  1. What’s the deal with projections? Imagine that you have a globe that shows the political boundaries of every country on earth. Your globe is perfect: it shows the exact relative geographic position of each country, the exact shape of each country, and the area of each country in exact proportions. If you were to cut your globe in half and flatten it on a table, you would necessarily distort the relative geographic position, shape, or areas of the countries on your globe. Any map of the earth is really a flattened globe that makes distortions. The way you flatten the globe is called a projection. There are many different projections that make trade-offs between various kinds of distortions. The Albers projection is an equal-area projection: it sacrifices geographic position (e.g., the flat part of the US-Canada border is curved) in exchange for accurate areas and minimal shape distortions.

  2. County FIPS codes provide a unique identifier for each county. There are FIPS codes for other geographies, too, such as states or metro areas. In case you were wondering, “FIPS” stands for Federal Information Processing Standards.

Kyle Raze