class: center, middle, inverse, title-slide .title[ # Lecture 16: Spatial Analysis, Part 1 ] .author[ ### Nick Hagerty*
ECNS 460/560 Fall 2023
Montana State University ] .date[ ### .small[
*Adapted from
“R Geospatial Fundamentals”
by the UC Berkeley D-Lab, used under
CC BY-NC 4.0
.] ] --- name: toc <style type="text/css"> # CSS for including pauses in printed PDF output (see bottom of lecture) @media print { .has-continuation { display: block !important; } } .remark-code-line { font-size: 95%; } .small { font-size: 75%; } .medsmall { font-size: 90%; } .scroll-output-full { height: 90%; overflow-y: scroll; } .scroll-output-75 { height: 75%; overflow-y: scroll; } </style> # Table of contents **Part 1** 1. [Spatial data and quick mapping](#start) 1. [Reference systems and projections](#crs) **Part 2** 1. Spatial queries 1. Spatial subsetting 1. Geometry operations 1. Spatial joins --- class: inverse, middle name: start # Spatial data and quick mapping --- # GIS The most widespread **geographic information system** is **ArcGIS.** **Advantages of ArcGIS:** * Avoid coding. * Interface for browsing and exploring data is incredibly comprehensive and fast. **Why we're using R instead:** - Free. - Reproducible. - Scriptable. - Easily integrated with the rest of your project. - Easy to export attractive, professional maps. - Honestly, easier if you know some R already. --- # sf: Simple Features `sf` is the main package for working with **vector** data in R. Install and load it (and a couple others): ```r library(sf) library(tidyverse) library(tmap) ``` Set your working directory (in your console) to the location of this file. E.g.: ```r setwd("C:/git/491/course-materials/lecture-slides/16-Spatial") ``` --- # Shapefiles The ESRI **shapefile** is the most widely used type of file format for storing geospatial vector data. A "shapefile" is actually a collection of 3+ files: Required files: - `shp`: The main file that stores the feature geometry - `shx`: A positional index for locating the feature geometry in the `shp` file - `dbf`: The data table (in dBase IV format) that stores the attribute information for each feature Optional files: - `prj`: Stores the coordinate reference system information. (**should be required!**) - `sbn` and `sbx`: spatial index to speed up geometry operations (*used only by ESRI software*) - `xml`: Metadata — Stores information about the shapefile. - `cpg`: Specifies the code page for identifying the character encoding set to be used. All files need to be kept together in the same directory. --- # Load shapefile data List the files: ```r dir("data/california_counties") ``` ``` ## [1] "CaliforniaCounties.dbf" "CaliforniaCounties.prj" ## [3] "CaliforniaCounties.shp" "CaliforniaCounties.shp.xml" ## [5] "CaliforniaCounties.shx" ``` Load the data: ```r counties = st_read("data/california_counties/CaliforniaCounties.shp") ``` ``` ## Reading layer `CaliforniaCounties' from data source ## `C:\git\491\course-materials\lecture-slides\16-Spatial\data\california_counties\CaliforniaCounties.shp' ## using driver `ESRI Shapefile' ## Simple feature collection with 58 features and 24 fields ## Geometry type: MULTIPOLYGON ## Dimension: XY ## Bounding box: xmin: -374445.4 ymin: -604500.7 xmax: 540038.5 ymax: 450022 ## Projected CRS: NAD83 / California Albers ``` --- # Geodatabases Shapefiles have some severe limitations. * They must be less than 2 GB. * Column names cannot be longer than 10 characters. * The number of columns is limited to 255. Another, newer file format is called a **geodatabase.** `st_read` can handle geodatabases with the `layer` argument. * The important thing to keep in mind is that in your computer, the `.gdb` file *appears* to be a folder, but the individual files within it are uninterpretable. * `st_layers` will show you the list of layers in a geodatabase. --- # Quick mapping With base R: ```r plot(counties["MED_AGE"]) ``` <img src="16-Spatial-part1_files/figure-html/unnamed-chunk-6-1.png" width="55%" style="display: block; margin: auto;" /> --- # Quick mapping With tmap: ```r qtm(counties) ``` <img src="16-Spatial-part1_files/figure-html/unnamed-chunk-7-1.png" width="55%" style="display: block; margin: auto;" /> --- # Quick mapping With tmap, interactively: ```r tmap_mode("view") # the default is mode = "plot" qtm(counties) ``` ``` ## Error: Shape contains invalid polygons. Please fix it or set tmap_options(check.and.fix = TRUE) and rerun the plot ``` -- This is a common problem and only takes a quick fix: ```r counties = st_read("data/california_counties/CaliforniaCounties.shp") |> st_make_valid() ``` ``` ## Reading layer `CaliforniaCounties' from data source ## `C:\git\491\course-materials\lecture-slides\16-Spatial\data\california_counties\CaliforniaCounties.shp' ## using driver `ESRI Shapefile' ## Simple feature collection with 58 features and 24 fields ## Geometry type: MULTIPOLYGON ## Dimension: XY ## Bounding box: xmin: -374445.4 ymin: -604500.7 xmax: 540038.5 ymax: 450022 ## Projected CRS: NAD83 / California Albers ``` --- # Quick mapping With tmap, interactively: ```r tmap_mode("view") # the default is mode = "plot" qtm(counties) ```
--- # sf objects An **sf** object is a standard data frame with an extra **geometry** column. .scroll-output-full[ ```r str(counties) ``` ``` ## Classes 'sf' and 'data.frame': 58 obs. of 25 variables: ## $ NAME : chr "Kern" "Kings" "Lake" "Lassen" ... ## $ STATE_NAME: chr "California" "California" "California" "California" ... ## $ POP2010 : int 839631 152982 64665 34895 9818605 150865 252409 18251 87841 255793 ... ## $ POP10_SQMI: num 102.9 109.9 48.6 7.4 2402.3 ... ## $ POP2012 : int 851089 155039 65253 35039 9904341 153025 255509 18455 88094 256841 ... ## $ POP12_SQMI: num 104.28 111.43 49.08 7.42 2423.26 ... ## $ WHITE : int 499766 83027 52033 25532 4936599 94456 201963 16103 67218 148381 ... ## $ BLACK : int 48921 11014 1232 2834 856874 5629 6987 138 622 9926 ... ## $ AMERI_ES : int 12676 2562 2049 1234 72828 4136 1523 527 4277 3473 ... ## $ ASIAN : int 34846 5620 724 356 1346865 2802 13761 204 1450 18836 ... ## $ HAWN_PI : int 1252 271 108 165 26094 162 509 26 119 583 ... ## $ HISPANIC : int 413033 77866 11088 6117 4687889 80992 39069 1676 19505 140485 ... ## $ OTHER : int 204314 42996 5455 3562 2140632 37380 16973 508 10185 62665 ... ## $ MULT_RACE : int 37856 7492 3064 1212 438713 6300 10693 745 3970 11929 ... ## $ MALES : int 433108 86344 32469 22416 4839654 72682 124072 9269 43983 128737 ... ## $ FEMALES : int 406523 66638 32196 12479 4978951 78183 128337 8982 43858 127056 ... ## $ MED_AGE : num 30.7 31.1 45 37 34.8 33.1 44.5 49.2 41.6 29.6 ... ## $ AVE_HH_SZ : num 3.15 3.19 2.39 2.5 2.98 3.28 2.36 2.28 2.46 3.32 ... ## $ AVE_FAM_SZ: num 3.61 3.59 2.94 2.98 3.58 3.63 2.94 2.77 3.02 3.74 ... ## $ HSE_UNITS : int 284367 43867 35492 12710 3445076 49140 111214 10188 40323 83698 ... ## $ VACANT : int 29757 2634 8944 2652 203872 5823 8004 2495 5378 8056 ... ## $ OWNER_OCC : int 152828 22329 17472 6590 1544749 27726 64637 5227 20601 41196 ... ## $ RENTER_OCC: int 101782 18904 9076 3468 1696455 15591 38573 2466 14344 34446 ... ## $ CountyFIPS: chr "06103" "06089" "06106" "06086" ... ## $ geometry :sfc_GEOMETRY of length 58; first list element: List of 1 ## ..$ : num [1:943, 1:2] 213673 213724 213750 213712 213801 ... ## ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg" ## - attr(*, "sf_column")= chr "geometry" ## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ... ## ..- attr(*, "names")= chr [1:24] "NAME" "STATE_NAME" "POP2010" "POP10_SQMI" ... ``` ] --- # sf objects The geometry column is **sticky:** ```r head(counties[1:3]) ``` ``` ## Simple feature collection with 6 features and 3 fields ## Geometry type: GEOMETRY ## Dimension: XY ## Bounding box: xmin: -267387.9 ymin: -578158.6 xmax: 216677.6 ymax: 352693.6 ## Projected CRS: NAD83 / California Albers ## NAME STATE_NAME POP2010 geometry ## 1 Kern California 839631 POLYGON ((213672.6 -243975,... ## 2 Kings California 152982 POLYGON ((12524.03 -179431.... ## 3 Lake California 64665 POLYGON ((-235734.3 155339.... ## 4 Lassen California 34895 POLYGON ((12.28914 351918.3... ## 5 Los Angeles California 9818605 MULTIPOLYGON (((147026.8 -5... ## 6 Madera California 150865 MULTIPOLYGON (((11664.15 -1... ``` --- # sf objects If you want to remove the geometry: ```r counties_df = st_drop_geometry(counties) head(counties_df[1:3]) ``` ``` ## NAME STATE_NAME POP2010 ## 1 Kern California 839631 ## 2 Kings California 152982 ## 3 Lake California 64665 ## 4 Lassen California 34895 ## 5 Los Angeles California 9818605 ## 6 Madera California 150865 ``` ```r class(counties_df) ``` ``` ## [1] "data.frame" ``` --- # Data wrangling works normally Map all counties and overlay Alameda County in green: ```r alameda = counties |> filter(NAME == "Alameda") map_alameda = tm_shape(counties) + tm_polygons(border.col = "white") + tm_shape(alameda) + tm_borders(col = "green", lwd = 3) map_alameda ```
--- # Save your map and your shapefile ```r # Static image tmap_save(map_alameda, filename = "output/map_alameda.png") # Interactive version tmap_save(map_alameda, filename = "output/map_alameda.html") # Shapefile st_write(alameda, dsn = "output/alameda.shp", delete_dsn = TRUE) ``` ``` ## Deleting source `output/alameda.shp' using driver `ESRI Shapefile' ## Writing layer `alameda' to data source ## `output/alameda.shp' using driver `ESRI Shapefile' ## Writing 1 features with 24 fields and geometry type Multi Polygon. ``` --- class: inverse, middle name: crs # Reference systems and projections --- # Another shapefile Load in another shapefile of US state borders: ```r states = st_read("data/us_states_contiguous/states_contiguous.shp") ``` ``` ## Reading layer `states_contiguous' from data source ## `C:\git\491\course-materials\lecture-slides\16-Spatial\data\us_states_contiguous\states_contiguous.shp' ## using driver `ESRI Shapefile' ## Simple feature collection with 49 features and 3 fields ## Geometry type: MULTIPOLYGON ## Dimension: XY ## Bounding box: xmin: -124.7318 ymin: 24.54547 xmax: -66.97626 ymax: 49.38436 ## Geodetic CRS: WGS 84 ``` --- # Another shapefile Plot it alongside counties: ```r plot(counties$geometry, col = 'lightgrey', border = 'white') plot(states$geometry, col = 'blue', border = 'red', lwd = 5, add = T) ``` <img src="16-Spatial-part1_files/figure-html/unnamed-chunk-17-1.png" width="55%" style="display: block; margin: auto;" /> --- # Another shapefile What do you think happened? -- ```r st_bbox(counties) ``` ``` ## xmin ymin xmax ymax ## -374445.4 -604500.7 540038.5 450022.0 ``` ```r st_bbox(states) ``` ``` ## xmin ymin xmax ymax ## -124.73183 24.54547 -66.97626 49.38436 ``` --- # Getting the CRS What is the CRS of `states`? ```r st_crs(states) ``` ``` ## Coordinate Reference System: ## User input: WGS 84 ## wkt: ## GEOGCRS["WGS 84", ## DATUM["World Geodetic System 1984", ## ELLIPSOID["WGS 84",6378137,298.257223563, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["latitude",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["longitude",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## ID["EPSG",4326]] ``` --- # Getting the CRS And the CRS of `counties`? .scroll-output-full[ ```r st_crs(counties) ``` ``` ## Coordinate Reference System: ## User input: NAD83 / California Albers ## wkt: ## PROJCRS["NAD83 / California Albers", ## BASEGEOGCRS["NAD83", ## DATUM["North American Datum 1983", ## ELLIPSOID["GRS 1980",6378137,298.257222101, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## ID["EPSG",4269]], ## CONVERSION["California Albers", ## METHOD["Albers Equal Area", ## ID["EPSG",9822]], ## PARAMETER["Latitude of false origin",0, ## ANGLEUNIT["degree",0.0174532925199433], ## ID["EPSG",8821]], ## PARAMETER["Longitude of false origin",-120, ## ANGLEUNIT["degree",0.0174532925199433], ## ID["EPSG",8822]], ## PARAMETER["Latitude of 1st standard parallel",34, ## ANGLEUNIT["degree",0.0174532925199433], ## ID["EPSG",8823]], ## PARAMETER["Latitude of 2nd standard parallel",40.5, ## ANGLEUNIT["degree",0.0174532925199433], ## ID["EPSG",8824]], ## PARAMETER["Easting at false origin",0, ## LENGTHUNIT["metre",1], ## ID["EPSG",8826]], ## PARAMETER["Northing at false origin",-4000000, ## LENGTHUNIT["metre",1], ## ID["EPSG",8827]]], ## CS[Cartesian,2], ## AXIS["easting (X)",east, ## ORDER[1], ## LENGTHUNIT["metre",1]], ## AXIS["northing (Y)",north, ## ORDER[2], ## LENGTHUNIT["metre",1]], ## USAGE[ ## SCOPE["State-wide spatial data management."], ## AREA["United States (USA) - California."], ## BBOX[32.53,-124.45,42.01,-114.12]], ## ID["EPSG",3310]] ``` ] --- # EPSG codes CRSs are most commonly referenced by EPSG codes, which you can Google. * The counties shapefile was in California Albers Equal, or EPSG:3310. [https://epsg.io/3310](https://epsg.io/3310) To get two shapefiles to work with each other, we need to give them the same CRS. To change a CRS, we need to **project** (or re-project) our data. **Do you *have* to use a projected CRS?** - No -- much of the time, it's fine to keep your spatial data in a geographic CRS. - Most functions will project "on the fly" using a default. **When *should* you use a projected CRS?** - When doing calculations, like area or distance. - When you want to control the appearance of your map output. **How should you choose a projected CRS?** - What you want to preserve (area, direction, or distance). - Location on Earth of the area of focus. --- # Common projections **1. Web Mercator** - Preserves direction/angle/shape but distorts area and distance. - Decent starting point for most places in the world. ```r states_mercator = st_transform(states, crs = 3857) tmap_mode("plot") qtm(states_mercator) ``` <img src="16-Spatial-part1_files/figure-html/unnamed-chunk-21-1.png" width="50%" style="display: block; margin: auto;" /> --- # Common projections **2. U.S. National Atlas (Albers) Equal Area** - Preserves area but distorts direction/angle/shape and distance. - My favorite for the continental U.S. ```r states_albers = st_transform(states, crs = 2163) qtm(states_albers) ``` <img src="16-Spatial-part1_files/figure-html/unnamed-chunk-22-1.png" width="50%" style="display: block; margin: auto;" /> --- # Common projections **3. UTM Zone 11N** - Preserves direction/angle/shape but distorts area and distance. - Different UTM zones are centered at different locations. - Good for maps of smaller areas. ```r states_utm11N = st_transform(states, crs = 2955) qtm(states_utm11N) ``` <img src="16-Spatial-part1_files/figure-html/unnamed-chunk-23-1.png" width="45%" style="display: block; margin: auto;" /> --- # Common projections **4. Pseudo Plate Caree** - Distorts everything! Simply a graph of latitude vs. longitude. - Common default, but no excuse for using it in 2023. ```r plot(states$geometry, asp = 1) ``` <img src="16-Spatial-part1_files/figure-html/unnamed-chunk-24-1.png" width="65%" style="display: block; margin: auto;" /> --- # (Re)projecting a shapefile The safest way to reproject is to directly reference the CRS of another layer: ```r counties_unprojected = st_transform(counties, crs = st_crs(states)) stopifnot(st_crs(states) == st_crs(counties_unprojected)) tm_shape(states) + tm_polygons(col = "lightgrey") + tm_shape(counties_unprojected) + tm_polygons(col = "darkgreen") ``` <img src="16-Spatial-part1_files/figure-html/unnamed-chunk-25-1.png" width="55%" style="display: block; margin: auto;" />