class: center, middle, inverse, title-slide .title[ # Lecture 16: Spatial Analysis, Part 2 ] .author[ ### Nick Hagerty*
ECNS 460/560 Fall 2023
Montana State University ] .date[ ### .small[
*Adapted from
“R Geospatial Fundamentals”
by the UC Berkeley D-Lab, used under
CC BY-NC 4.0
.] ] --- name: toc <style type="text/css"> # CSS for including pauses in printed PDF output (see bottom of lecture) @media print { .has-continuation { display: block !important; } } .remark-code-line { font-size: 95%; } .small { font-size: 75%; } .medsmall { font-size: 90%; } .scroll-output-full { height: 90%; overflow-y: scroll; } .scroll-output-75 { height: 75%; overflow-y: scroll; } </style> # Table of contents **Part 1** 1. Spatial data and quick mapping 1. Reference systems and projections **Part 2** 1. [Spatial queries](#queries) 1. [Spatial subsetting](#subset) 1. [Geometry operations](#geometry) 1. [Spatial joins](#joins) --- # Preliminaries Set your working directory (in your console) to the location of this file. E.g.: ```r setwd("C:/git/491/course-materials/lecture-slides/16-Spatial") ``` Run this (from Part 1): ```r library(sf) library(tidyverse) library(tmap) counties = st_read("data/california_counties/CaliforniaCounties.shp") |> st_make_valid() alameda = counties |> filter(NAME == "Alameda") ``` --- class: inverse, middle name: queries # Spatial queries --- # Types of spatial queries **Spatial measurement** - Returns a continuous numerical answer. - **Length:** What is the length of I-90 in Montana? - **Area:** What is the area of Gallatin County? - **Distance:** What is the distance between Bozeman and Livingston? **Spatial relationships** - Returns TRUE or FALSE, or the set of matching features. - **Intersect:** What states is Yellowstone National Park in? - **Within:** Is the city of Bozeman entirely within Gallatin County? - **Cross:** What counties does the Madison River pass through? --- # Calculate area ```r st_area(alameda) ``` ``` ## 1927093261 [m^2] ``` Too big! -- ```r st_area(alameda)/1000000 ``` ``` ## 1927.093 [m^2] ``` Wait, the units are wrong! -- ```r library(units) set_units(st_area(alameda), km^2) ``` ``` ## 1927.093 [km^2] ``` --- # Calculate area For all counties (add as new column/attribute): ```r counties2 = counties |> cbind(area = set_units(st_area(counties), km^2)) mean(counties2$area) ``` ``` ## 7061.969 [km^2] ``` --- # Calculate area Spatial measurements can differ greatly depending on the CRS. ```r # Calculate area using data in WGS84 CRS (4326) counties2$area_wgs84 = set_units(st_area(st_transform(counties, 4326)), km^2) # Calculate area using data in Web Mercator CRS (3857) counties2$area_web = set_units(st_area(st_transform(counties, 3857)), km^2) # Take a look at the results counties2 |> select(starts_with("area")) |> head() ``` ``` ## Simple feature collection with 6 features and 3 fields ## Geometry type: GEOMETRY ## Dimension: XY ## Bounding box: xmin: -267387.9 ymin: -578158.6 xmax: 216677.6 ymax: 352693.6 ## Projected CRS: NAD83 / California Albers ## area area_wgs84 area_web ## 1 21137.940 [km^2] 21137.848 [km^2] 31840.623 [km^2] ## 2 3603.706 [km^2] 3603.104 [km^2] 5528.030 [km^2] ## 3 3443.291 [km^2] 3440.360 [km^2] 5725.400 [km^2] ## 4 12225.762 [km^2] 12210.926 [km^2] 21276.793 [km^2] ## 5 10585.866 [km^2] 10588.192 [km^2] 15559.365 [km^2] ## 6 5577.010 [km^2] 5574.650 [km^2] 8810.596 [km^2] ## geometry ## 1 POLYGON ((213672.6 -243975,... ## 2 POLYGON ((12524.03 -179431.... ## 3 POLYGON ((-235734.3 155339.... ## 4 POLYGON ((12.28914 351918.3... ## 5 MULTIPOLYGON (((147026.8 -5... ## 6 MULTIPOLYGON (((11664.15 -1... ``` --- # Calculate area Of these: * Albers is an equal area CRS optimized for CA, so area measurements will be highly accurate. * WGS84 is a geographic (unprojected) CRS in degrees... but `sf` calculates area using spherical geometry! * Web Mercator is terrible! Preserves shape, but area is highly distorted. **Your choice of CRS is absolutely critical to accurate calculations.** --- # Calculate distance Load BART rapid transit stations: ```r stations = st_read("data/bart_stations_2019.geojson") ``` ``` ## Reading layer `bart_stations_2019' from data source ## `C:\git\491\course-materials\lecture-slides\16-Spatial\data\bart_stations_2019.geojson' ## using driver `GeoJSON' ## Simple feature collection with 48 features and 11 fields ## Geometry type: POINT ## Dimension: XY ## Bounding box: xmin: -122.4691 ymin: 37.50217 xmax: -121.7799 ymax: 38.01891 ## Geodetic CRS: WGS 84 ``` Calculate distance from each station to the Oakland Airport: ```r st_distance(filter(stations, station_na == "Oakland Airport"), stations) ``` ``` ## Units: [m] ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] ## [1,] 11334.6 19134.84 11680.88 18629.62 16331.04 20718.99 7752.583 12274.3 ## [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] ## [1,] 19232.88 4692.455 22576.08 33128.98 22612.12 18122.17 25284.96 27566.01 ## [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] ## [1,] 18505.79 27025.27 6929.976 19616.87 12030 21436.17 10416.66 13752.12 ## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] ## [1,] 19844.22 18640.85 18937.08 36202.1 0 18529.67 27585.41 41289.05 ## [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40] ## [1,] 22395.15 18948.58 27774.66 15018.23 4619.099 19826.3 19199.73 16217.29 ## [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] ## [1,] 21109.25 21908.65 33595.12 24872.9 25024.8 12529.76 49234.41 44088.15 ``` --- class: inverse, middle name: subsets # Spatial subsetting --- # Spatial subsetting Often we want to **subset** (i.e., filter) features according to their spatial relationship with another set of features. Types of relationships (queries that are either `TRUE` or `FALSE`): .pull-left[  ] .pull-right[ Common `sf` functions: * `st_intersects` (most general) * `st_within` * `st_contains` (the inverse of `st_within`) ] These functions are **binary predicates** that return matrices of `TRUE` and `FALSE` for all permutations of features in two shapefiles. They are most useful in combination with `st_filter`. --- # Within vs. intersect Load a shapefile of protected areas: ```r protected = st_read("data/protected_areas/CPAD_2020a_units.shp") ``` ``` ## Reading layer `CPAD_2020a_Units' from data source ## `C:\git\491\course-materials\lecture-slides\16-Spatial\data\protected_areas\CPAD_2020a_Units.shp' ## using driver `ESRI Shapefile' ## Simple feature collection with 17068 features and 21 fields ## Geometry type: MULTIPOLYGON ## Dimension: XY ## Bounding box: xmin: -374984.2 ymin: -604454.8 xmax: 540016.3 ymax: 449743.2 ## Projected CRS: NAD83 / California Albers ``` ```r # Verify CRS is the same as county stopifnot(st_crs(protected) == st_crs(alameda)) ``` --- # Within vs. intersect Map them (takes a minute): ```r tm_shape(alameda) + tm_polygons() + tm_shape(protected) + tm_polygons(col = "green") ``` <img src="16-Spatial-part2_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> --- # Within vs. intersect Some protected areas are completely **within** the county, some **overlap**, and some are completely **outside**. * `st_intersects` returns both the "within" and "overlap" cases. * `st_within` returns only the "completely within" cases. Select the protected areas matching each of these criteria: ```r protected_intersect = st_filter(protected, alameda) # default: st_intersects protected_within = st_filter(protected, alameda, .predicate = st_within) # Alternative syntax using [column, row] subsetting: # protected_within = protected[alameda, , op = st_within] ``` --- # Within vs. intersect Map them: ```r tm_shape(alameda) + tm_polygons(alpha = 0.5) + tm_shape(protected_intersect) + tm_polygons(col = "green", alpha = 0.5) + tm_shape(protected_within) + tm_polygons(col = "blue", alpha = 0.5) ``` <img src="16-Spatial-part2_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> --- # Challenge **Map only the BART stations that are within Alameda County.** Make them nice big dots with a fun color. (Hint: use `tm_dots()`.) --- class: inverse, middle name: geometry # Geometry operations --- # Intersections What if we want only the parts of the protected areas that fall within the county? * Now we aren't just *selecting* features; we're *manipulating* their geometry. We need a different spatial operation: `st_intersection`. * It returns a new set of features that are **clipped** to the boundary of the county. ```r protected_intersection = st_intersection(protected, alameda) ``` --- # Intersect vs. intersection ```r tm_shape(alameda) + tm_polygons() + tm_shape(protected_intersect) + tm_polygons(col = "green", alpha = 0.5) + tm_shape(protected_intersection) + tm_polygons(col = "red", alpha = 0.5) ``` <img src="16-Spatial-part2_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" /> --- # Intersections Check the resulting object: ```r table(protected_intersection$COUNTY) ``` ``` ## ## Alameda Contra Costa San Joaquin Santa Clara ## 645 12 1 3 ``` Whoops! What went wrong? --- # Intersections .scroll-output-full[ ```r str(protected_intersection) ``` ``` ## Classes 'sf' and 'data.frame': 661 obs. of 46 variables: ## $ ACCESS_TYP: chr "Open Access" "Open Access" "Open Access" "Open Access" ... ## $ UNIT_ID : int 185 366 586 1438 48353 1631 1962 3448 3504 5062 ... ## $ UNIT_NAME : chr "Augustin Bernal Park" "San Antonio Park" "Quarry Lakes Regional Recreation Area" "Tennis & Community Park" ... ## $ SUID_NMA : int 8732 24832 30594 26243 32917 26365 27270 23819 20226 33002 ... ## $ AGNCY_ID : int 1257 1228 2032 1257 1090 1001 1306 2032 1228 1178 ... ## $ AGNCY_NAME: chr "Pleasanton, City of" "Oakland, City of" "East Bay Regional Park District" "Pleasanton, City of" ... ## $ AGNCY_LEV : chr "City" "City" "Special District" "City" ... ## $ AGNCY_TYP : chr "City Agency" "City Agency" "Recreation/Parks District" "City Agency" ... ## $ AGNCY_WEB : chr "http://www.cityofpleasantonca.gov/" "http://www2.oaklandnet.com/Government/o/opr/index.htm" "http://www.ebparks.org/" "http://www.cityofpleasantonca.gov/" ... ## $ LAYER : chr "City" "City" "Special District" "City" ... ## $ MNG_AG_ID : int 1257 1228 2032 1257 1090 1001 1306 2032 1228 1178 ... ## $ MNG_AGENCY: chr "Pleasanton, City of" "Oakland, City of" "East Bay Regional Park District" "Pleasanton, City of" ... ## $ MNG_AG_LEV: chr "City" "City" "Special District" "City" ... ## $ MNG_AG_TYP: chr "City Agency" "City Agency" "Recreation/Parks District" "City Agency" ... ## $ PARK_URL : chr "http://www.cityofpleasantonca.gov/services/recreation/rec-augustinpark.html" NA NA NA ... ## $ COUNTY : chr "Alameda" "Alameda" "Alameda" "Alameda" ... ## $ ACRES : num 217.39 10.62 254.62 15.6 4.99 ... ## $ LABEL_NAME: chr "Augustin Bernal Park" "San Antonio Park" "Quarry Lakes Reg. Rec. Area" "Tennis & Community Park" ... ## $ YR_EST : int 0 0 2001 0 2018 0 NA NA 0 0 ... ## $ DES_TP : chr "Local Park" "Local Park" "Local Recreation Area" "Local Park" ... ## $ GAP_STS : chr "4" "4" "4" "4" ... ## $ NAME : chr "Alameda" "Alameda" "Alameda" "Alameda" ... ## $ STATE_NAME: chr "California" "California" "California" "California" ... ## $ POP2010 : int 1510271 1510271 1510271 1510271 1510271 1510271 1510271 1510271 1510271 1510271 ... ## $ POP10_SQMI: num 2030 2030 2030 2030 2030 ... ## $ POP2012 : int 1534551 1534551 1534551 1534551 1534551 1534551 1534551 1534551 1534551 1534551 ... ## $ POP12_SQMI: num 2062 2062 2062 2062 2062 ... ## $ WHITE : int 649122 649122 649122 649122 649122 649122 649122 649122 649122 649122 ... ## $ BLACK : int 190451 190451 190451 190451 190451 190451 190451 190451 190451 190451 ... ## $ AMERI_ES : int 9799 9799 9799 9799 9799 9799 9799 9799 9799 9799 ... ## $ ASIAN : int 394560 394560 394560 394560 394560 394560 394560 394560 394560 394560 ... ## $ HAWN_PI : int 12802 12802 12802 12802 12802 12802 12802 12802 12802 12802 ... ## $ HISPANIC : int 339889 339889 339889 339889 339889 339889 339889 339889 339889 339889 ... ## $ OTHER : int 162540 162540 162540 162540 162540 162540 162540 162540 162540 162540 ... ## $ MULT_RACE : int 90997 90997 90997 90997 90997 90997 90997 90997 90997 90997 ... ## $ MALES : int 740573 740573 740573 740573 740573 740573 740573 740573 740573 740573 ... ## $ FEMALES : int 769698 769698 769698 769698 769698 769698 769698 769698 769698 769698 ... ## $ MED_AGE : num 36.6 36.6 36.6 36.6 36.6 36.6 36.6 36.6 36.6 36.6 ... ## $ AVE_HH_SZ : num 2.7 2.7 2.7 2.7 2.7 2.7 2.7 2.7 2.7 2.7 ... ## $ AVE_FAM_SZ: num 3.3 3.3 3.3 3.3 3.3 3.3 3.3 3.3 3.3 3.3 ... ## $ HSE_UNITS : int 582549 582549 582549 582549 582549 582549 582549 582549 582549 582549 ... ## $ VACANT : int 37411 37411 37411 37411 37411 37411 37411 37411 37411 37411 ... ## $ OWNER_OCC : int 291242 291242 291242 291242 291242 291242 291242 291242 291242 291242 ... ## $ RENTER_OCC: int 253896 253896 253896 253896 253896 253896 253896 253896 253896 253896 ... ## $ CountyFIPS: chr "06068" "06068" "06068" "06068" ... ## $ geometry :sfc_GEOMETRY of length 661; first list element: List of 1 ## ..$ : num [1:17, 1:2] -168711 -168257 -167928 -167934 -168561 ... ## ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg" ## - attr(*, "sf_column")= chr "geometry" ## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ... ## ..- attr(*, "names")= chr [1:45] "ACCESS_TYP" "UNIT_ID" "UNIT_NAME" "SUID_NMA" ... ``` ] --- # Intersections `st_intersection` returns attributes from **both** intersecting features. * Mindlessly, without checking whether they still makes sense for the *intersection* geometry. * For example, `ACRES` describes the area of the original, full protected area -- not only the part that is in Alameda County. -- **Be careful when interpreting attributes after geometry operations.** --- # Other geometry operations  --- # Challenge **Combine these Southern California counties into one shape and map it:** Los Angeles, Orange, and San Diego. --- # Centroids The **geographic centroid** of a polygon is its "center of gravity". * A way to create a point representation of a more complex object. ```r county_centroids = st_centroid(counties) tm_shape(counties) + tm_borders() + tm_shape(county_centroids) + tm_dots(col = "red") ``` <img src="16-Spatial-part2_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" /> --- # Buffering **How many protected areas are within walking distance of a BART station?** Let's say walking distance is 1 km, or about 0.6 miles. First, **buffer** the BART stations: ```r stations_1km = st_buffer(stations, dist = 1000) # in the CRS's length units tm_shape(stations_1km) + tm_polygons(col = "pink") + tm_shape(stations) + tm_dots() ``` <img src="16-Spatial-part2_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" /> --- # Buffering Now, we can subset the protected areas by whether they intersect one of these buffer polygons. ```r protected_transit = st_filter(protected, stations_1km) ``` ``` ## Error in `stopifnot()`: ## ℹ In argument: `lengths(.predicate(x, y, ...)) > 0`. ## Caused by error in `st_geos_binop()`: ## ! st_crs(x) == st_crs(y) is not TRUE ``` Oops... we forgot to put them in the same CRS. --- # Buffering ```r stations_1km_proj = st_transform(stations_1km, st_crs(protected)) protected_transit = st_filter(protected, stations_1km_proj) tm_shape(stations_1km) + tm_polygons(col = "pink") + tm_shape(protected_transit) + tm_polygons(col = "green", alpha = 0.4) ``` <img src="16-Spatial-part2_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" /> --- # Challenge **What fraction of the area of Alameda County is within 2 km of a BART station?** Hint: Make a step-by-step plan before you begin. --- class: inverse, middle name: joins # Spatial joins --- # Spatial joins **Spatial joins** let us conduct graphical, statistical, or econometric analysis on variables that are originally at different levels of spatial aggregation. **Are parent incomes predictive of school academic performance?** What data would you ideally want to answer this question? -- What we have: * By school: academic performance index (API), location. * By Census tract: median household income, tract boundary. How can we link this data for an imperfect approximation? --- # Spatial joins Load the data: ```r tracts = st_read("data/census/tracts_acs_sdf_ac.json") ``` ``` ## Reading layer `tracts_acs_sdf_ac' from data source ## `C:\git\491\course-materials\lecture-slides\16-Spatial\data\census\tracts_acs_sdf_ac.json' ## using driver `GeoJSON' ## Simple feature collection with 360 features and 52 fields ## Geometry type: MULTIPOLYGON ## Dimension: XY ## Bounding box: xmin: -122.3423 ymin: 37.45419 xmax: -121.4692 ymax: 37.90582 ## Geodetic CRS: NAD83 ``` ```r schools = read_csv("data/alco_schools.csv") ``` --- # Spatial joins Look at the data: .medsmall[ ```r colnames(tracts) ``` ``` ## [1] "GEOID" "STATEFP" "COUNTYFP" "TRACTCE" "AFFGEOID" ## [6] "NAME.x" "LSAD" "ALAND" "AWATER" "NAME.y" ## [11] "c_race" "c_white" "c_black" "c_asian" "c_latinx" ## [16] "state_fips" "county_fips" "tract_fips" "med_rent" "med_hhinc" ## [21] "c_tenants" "c_owners" "c_renters" "c_movers" "c_stay" ## [26] "c_movelocal" "c_movecounty" "c_movestate" "c_moveabroad" "c_commute" ## [31] "c_car" "c_carpool" "c_transit" "c_bike" "c_walk" ## [36] "year" "p_white" "p_black" "p_asian" "p_latinx" ## [41] "p_owners" "p_renters" "p_stay" "p_movelocal" "p_movecounty" ## [46] "p_movestate" "p_moveabroad" "p_car" "p_carpool" "p_transit" ## [51] "p_bike" "p_walk" "geometry" ``` ```r head(schools) ``` ``` ## # A tibble: 6 × 9 ## X Y Site Address City State Type API Org ## <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> ## 1 -122. 37.7 Amelia Earhart Elementary 400 Packe… Alam… CA ES 933 Publ… ## 2 -122. 37.7 Bay Farm Elementary 200 Aughi… Alam… CA ES 932 Publ… ## 3 -122. 37.8 Donald D. Lum Elementary 1801 Sand… Alam… CA ES 853 Publ… ## 4 -122. 37.8 Edison Elementary 2700 Buen… Alam… CA ES 927 Publ… ## 5 -122. 37.8 Frank Otis Elementary 3010 Fill… Alam… CA ES 894 Publ… ## 6 -122. 37.8 Franklin Elementary 1433 San … Alam… CA ES 893 Publ… ``` ] --- # Spatial joins The schools dataset is just a csv, not a shapefile / feature collection. * But it does have lat/lon coordinates, from which we can create a point layer. Convert to an `sf` object. (GCS is unknown, but fine to assume it's WGS84.) ```r wgs84 = 4326 schools = read_csv("data/alco_schools.csv") |> st_as_sf(coords = c("X", "Y"), crs = wgs84) colnames(schools) ``` ``` ## [1] "Site" "Address" "City" "State" "Type" "API" "Org" ## [8] "geometry" ``` --- # Spatial joins Let's see where the schools are in relation to the Census tracts: ```r tm_shape(tracts) + tm_polygons(col = "skyblue") + tm_shape(schools) + tm_dots(col = "navy") ``` <img src="16-Spatial-part2_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" /> --- # Spatial joins We want to join the tract data to the schools data. Normally we look for a common variable. ```r colnames(schools) ``` ``` ## [1] "Site" "Address" "City" "State" "Type" "API" "Org" ## [8] "geometry" ``` ```r colnames(tracts) ``` ``` ## [1] "GEOID" "STATEFP" "COUNTYFP" "TRACTCE" "AFFGEOID" ## [6] "NAME.x" "LSAD" "ALAND" "AWATER" "NAME.y" ## [11] "c_race" "c_white" "c_black" "c_asian" "c_latinx" ## [16] "state_fips" "county_fips" "tract_fips" "med_rent" "med_hhinc" ## [21] "c_tenants" "c_owners" "c_renters" "c_movers" "c_stay" ## [26] "c_movelocal" "c_movecounty" "c_movestate" "c_moveabroad" "c_commute" ## [31] "c_car" "c_carpool" "c_transit" "c_bike" "c_walk" ## [36] "year" "p_white" "p_black" "p_asian" "p_latinx" ## [41] "p_owners" "p_renters" "p_stay" "p_movelocal" "p_movecounty" ## [46] "p_movestate" "p_moveabroad" "p_car" "p_carpool" "p_transit" ## [51] "p_bike" "p_walk" "geometry" ``` -- **But there are no variables in common!** (What if the schools dataset identified the county?) -- Still a bad idea -- we would throw away more detailed information. --- # Spatial joins Instead, we need to perform a join using spatial relationships -- a **spatial join**. First, check the CRS: .scroll-output-full[ ```r st_crs(schools) ``` ``` ## Coordinate Reference System: ## User input: EPSG:4326 ## wkt: ## GEOGCRS["WGS 84", ## ENSEMBLE["World Geodetic System 1984 ensemble", ## MEMBER["World Geodetic System 1984 (Transit)"], ## MEMBER["World Geodetic System 1984 (G730)"], ## MEMBER["World Geodetic System 1984 (G873)"], ## MEMBER["World Geodetic System 1984 (G1150)"], ## MEMBER["World Geodetic System 1984 (G1674)"], ## MEMBER["World Geodetic System 1984 (G1762)"], ## MEMBER["World Geodetic System 1984 (G2139)"], ## ELLIPSOID["WGS 84",6378137,298.257223563, ## LENGTHUNIT["metre",1]], ## ENSEMBLEACCURACY[2.0]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["geodetic latitude (Lat)",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["geodetic longitude (Lon)",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## USAGE[ ## SCOPE["Horizontal component of 3D system."], ## AREA["World."], ## BBOX[-90,-180,90,180]], ## ID["EPSG",4326]] ``` ```r st_crs(tracts) ``` ``` ## Coordinate Reference System: ## User input: NAD83 ## wkt: ## GEOGCRS["NAD83", ## DATUM["North American Datum 1983", ## ELLIPSOID["GRS 1980",6378137,298.257222101, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["geodetic latitude (Lat)",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["geodetic longitude (Lon)",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## ID["EPSG",4269]] ``` ] --- # Spatial joins Put them in the same CRS, then do a spatial join: ```r schools_nad83 = st_transform(schools, st_crs(tracts)) schools_tract = st_join(schools_nad83, tracts) View(schools_tract) colnames(schools_tract) ``` ``` ## [1] "Site" "Address" "City" "State" "Type" ## [6] "API" "Org" "geometry" "GEOID" "STATEFP" ## [11] "COUNTYFP" "TRACTCE" "AFFGEOID" "NAME.x" "LSAD" ## [16] "ALAND" "AWATER" "NAME.y" "c_race" "c_white" ## [21] "c_black" "c_asian" "c_latinx" "state_fips" "county_fips" ## [26] "tract_fips" "med_rent" "med_hhinc" "c_tenants" "c_owners" ## [31] "c_renters" "c_movers" "c_stay" "c_movelocal" "c_movecounty" ## [36] "c_movestate" "c_moveabroad" "c_commute" "c_car" "c_carpool" ## [41] "c_transit" "c_bike" "c_walk" "year" "p_white" ## [46] "p_black" "p_asian" "p_latinx" "p_owners" "p_renters" ## [51] "p_stay" "p_movelocal" "p_movecounty" "p_movestate" "p_moveabroad" ## [56] "p_car" "p_carpool" "p_transit" "p_bike" "p_walk" ``` Now we know the neighborhood household income for each school! -- * Is this what we wanted? --- # Checking our output **Good questions to ask after any operation:** 1. What type of object should we have gotten? 2. What should the dimensions of that object be, and why? 3. How can we visually check the results? --- # Checking our output ```r tmap_mode("view") tm_shape(tracts) + tm_polygons(col = 'white', border.col = 'black') + tm_shape(schools_tract) + tm_dots(col = 'GEOID') ```
--- # Spatial joins Finally, we can look at a scatterplot of the relationship we wanted: ```r ggplot(schools_tract, aes(x = med_hhinc, y = API)) + geom_point() ``` <img src="16-Spatial-part2_files/figure-html/unnamed-chunk-31-1.png" width="80%" style="display: block; margin: auto;" /> --- # Spatial joins Removing the zeros (probably missing data) and add a flexibly-estimated CEF: ```r schools_tract |> filter(API > 0) |> ggplot(aes(x = med_hhinc, y = API)) + geom_point() + geom_smooth() ``` <img src="16-Spatial-part2_files/figure-html/unnamed-chunk-32-1.png" width="80%" style="display: block; margin: auto;" /> --- # Challenge **What fraction of the population of Alameda County has a BART station in their Census tract?** (Hint: `c_race` is total population.) --- # Challenge But Census tracts don't really correspond to anything meaningful in people's daily lives. Maybe what we really want to know is: **What fraction of the population of Alameda County is within 2 km of a BART station?** You can't do this perfectly. Try to find the least bad way, and state what assumptions you're making. --- # Table of contents **Part 1** 1. Spatial data and quick mapping 1. Reference systems and projections **Part 2** 1. [Spatial queries](#queries) 1. [Spatial subsetting](#subset) 1. [Geometry operations](#geometry) 1. [Spatial joins](#joins)