ggplot2
Who doesn’t love a nice-looking map?
Today you will create a map of the United States that describes geographic variation in social mobility. One concrete measure of social mobility is the fraction of people raised by low-income parents who go on to earn incomes in the top 20 percent of the national income distribution. You will use county-level data on this measure and other related variables from the Opportunity Atlas.
One version of the American Dream is that children grow up to do better than their parents. If by “better” we mean “earn more,” then the American Dream isn’t faring so well. Research by Harvard economist Raj Chetty and his team of researchers demonstrates that social mobility has declined in the United States over the last few decades. That said, there are areas of the country where high social mobility persists. Your task today is to identify those areas.
Before you can map the social mobility data, you need to retrieve map data. To do this, you will use the maps package.
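The code that produced the output below is not shown here. A minimal sketch, assuming the county outlines come from ggplot2's map_data() function (which reads them from the maps package) and that the result is stored as us_counties, the name used in the rest of the lab:

```r
library(ggplot2)  # provides map_data(); the maps package must be installed

# Assumption: the lab builds us_counties this way.
# map_data("county") converts the county outlines in the maps package
# into a data frame with one row per boundary point.
us_counties <- map_data("county")

# take a peek
head(us_counties)
```

If you load packages with p_load, as elsewhere in this lab, p_load(maps) would take the place of installing maps by hand.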
## long lat group order region subregion
## 1 -86.50517 32.34920 1 1 alabama autauga
## 2 -86.53382 32.35493 1 2 alabama autauga
## 3 -86.54527 32.36639 1 3 alabama autauga
## 4 -86.55673 32.37785 1 4 alabama autauga
## 5 -86.57966 32.38357 1 5 alabama autauga
## 6 -86.59111 32.37785 1 6 alabama autauga
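The us_counties data frame contains points that trace out US counties. Each row is one point on a county boundary: group identifies the polygon the point belongs to, and order gives the sequence in which the points are connected. You can visualize the data using ggplot. The geom_polygon function will plot the county borders.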
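us_counties %>%
  ggplot(aes(x = long, y = lat, group = group, fill = region)) +
  # tell ggplot what to do with the data
  # (linewidth replaces size for line widths in ggplot2 3.4.0 and later)
  geom_polygon(color = "white", linewidth = 0.1) +
  # hide the legend; fill = FALSE is deprecated
  guides(fill = "none")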
It’s a start, but plotting raw longitude and latitude on a plain rectangular grid isn’t exactly the most visually pleasing way to draw the country. Use the Albers projection from the mapproj package instead.[1]
p_load(mapproj)
us_counties %>%
  ggplot(aes(x = long, y = lat, group = group, fill = region)) +
  # tell ggplot what to do with the data
  # (linewidth replaces size for line widths in ggplot2 3.4.0 and later)
  geom_polygon(color = "white", linewidth = 0.1) +
  # hide the legend; fill = FALSE is deprecated
  guides(fill = "none") +
  # change the map projection
  coord_map(projection = "albers", lat0 = 39, lat1 = 45)
You will join the social mobility data with the map data using county FIPS codes.[2] However, the us_counties dataset does not include FIPS codes. To add them, import the county.fips dataset from the maps package.
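The import step itself is not shown. One way it might look, assuming the lookup table is stored under the name fips (the name used in the join instructions that follow):

```r
# Load the county.fips lookup table that ships with the maps package
data("county.fips", package = "maps")

# Assumption: the lab stores the table under the shorter name fips
fips <- county.fips

# take a peek
head(fips)
```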
## fips polyname
## 1 1001 alabama,autauga
## 2 1003 alabama,baldwin
## 3 1005 alabama,barbour
## 4 1007 alabama,bibb
## 5 1009 alabama,blount
## 6 1011 alabama,bullock
Next, you need to join us_counties with fips. To join two data frames, you need a common variable. Use the variable polyname. While this variable is not in the us_counties dataset, its components (state name and county name) are. You can add polyname to us_counties using mutate and paste.
us_counties <- us_counties %>%
  mutate(polyname = paste(region, subregion, sep = ","))
# take a peek
head(us_counties)
## long lat group order region subregion polyname
## 1 -86.50517 32.34920 1 1 alabama autauga alabama,autauga
## 2 -86.53382 32.35493 1 2 alabama autauga alabama,autauga
## 3 -86.54527 32.36639 1 3 alabama autauga alabama,autauga
## 4 -86.55673 32.37785 1 4 alabama autauga alabama,autauga
## 5 -86.57966 32.38357 1 5 alabama autauga alabama,autauga
## 6 -86.59111 32.37785 1 6 alabama autauga alabama,autauga
Then you can add FIPS codes to us_counties by using the left_join() function.
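The join call is not shown. A minimal sketch, assuming us_counties and fips were built as described in the earlier steps (they are rebuilt here only so the block runs on its own):

```r
library(dplyr)
library(ggplot2)  # for map_data(); the maps package must be installed

# Rebuild the two inputs from the earlier steps
us_counties <- map_data("county") %>%
  mutate(polyname = paste(region, subregion, sep = ","))
data("county.fips", package = "maps")
fips <- county.fips %>%
  mutate(polyname = as.character(polyname))  # guard in case polyname is a factor

# Keep every row of the map data and attach the matching FIPS code
us_counties <- us_counties %>%
  left_join(fips, by = "polyname")

# take a peek
head(us_counties)
```

A left join keeps every row of us_counties, so a county whose polyname has no match in fips stays on the map with a missing FIPS code rather than being dropped.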
## long lat group order region subregion polyname fips
## 1 -86.50517 32.34920 1 1 alabama autauga alabama,autauga 1001
## 2 -86.53382 32.35493 1 2 alabama autauga alabama,autauga 1001
## 3 -86.54527 32.36639 1 3 alabama autauga alabama,autauga 1001
## 4 -86.55673 32.37785 1 4 alabama autauga alabama,autauga 1001
## 5 -86.57966 32.38357 1 5 alabama autauga alabama,autauga 1001
## 6 -86.59111 32.37785 1 6 alabama autauga alabama,autauga 1001
Use left_join() again to join the social mobility data with the map data.
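The second join and the final map can be sketched as a small helper. Everything about the helper is an assumption for illustration: the name plot_mobility_map is hypothetical, and it assumes the mobility data has a county code column named fips (of the same type as the fips column in the map data) along with the parent_inc column used to pick a subset.

```r
library(dplyr)
library(ggplot2)  # coord_map() also needs the mapproj package installed

# Hypothetical helper, not part of the lab: join one parent-income subset of
# the mobility data onto the map data and shade each county by `outcome`.
# Assumes mobility_df has columns named fips and parent_inc.
plot_mobility_map <- function(map_df, mobility_df, outcome, parent_group = "Low") {
  map_df %>%
    # keep every map row; counties without mobility data get NA
    left_join(filter(mobility_df, parent_inc == parent_group), by = "fips") %>%
    ggplot(aes(x = long, y = lat, group = group, fill = .data[[outcome]])) +
    # linewidth needs ggplot2 3.4.0 or later; older versions use size
    geom_polygon(color = "white", linewidth = 0.1) +
    coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
    labs(fill = outcome)
}
```

With the lab's objects in place, a call such as plot_mobility_map(us_counties, mobility, "hh_income_t20") would map the top-20-percent measure for children of low-income parents. Passing a different outcome name or parent_group value switches the variable or the subset.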
Make a map using one of the other outcome variables in the mobility dataset. Your choices include:
- hh_income: Average household income as adults.
- hs_grad: Fraction who graduated high school.
- incarceration: Fraction who went to prison.
- teen_birth: Fraction who gave birth as a teenager.
You can also examine other subsets of the data:
- parent_inc == "Middle"
- parent_inc == "High"
Describe the data you mapped and make note of any interesting patterns that emerge. Does a story emerge from your map? Provide a discussion.
One of the computational problems of Problem Set 5 will ask you to 1) make a map using a different variable from social_mobility.csv (i.e., not hh_income_t20) and 2) describe the geographic patterns you observe.
This lab is based on Chapter 7 of Data Visualization, a free ebook by Duke University sociologist Kieran Healy. If you want to hone your ggplot skills, I can’t think of a better resource. The book features numerous hands-on examples for a variety of data visualizations. Chapter 7 focuses on maps, using US elections as a running example.
[1] What’s the deal with projections? Imagine that you have a globe that shows the political boundaries of every country on earth. Your globe is perfect: it shows the exact relative geographic position of each country, the exact shape of each country, and the area of each country in exact proportions. If you were to cut your globe in half and flatten it on a table, you would necessarily distort the relative geographic position, shape, or areas of the countries on your globe. Any map of the earth is really a flattened globe that makes distortions. The way you flatten the globe is called a projection. There are many different projections that make trade-offs between various kinds of distortions. The Albers projection is an equal-area projection: it sacrifices geographic position (e.g., the flat part of the US-Canada border is curved) in exchange for accurate areas and minimal shape distortion.
[2] County FIPS codes provide a unique identifier for each county. There are FIPS codes for other geographies, too, such as states or metro areas. In case you were wondering, “FIPS” stands for Federal Information Processing Standards.