R: Introduction and review

EC421, Set 01(R)

Econometrics

Recall: Applied econometrics, data science, analytics require:

  1. Intuition for the theory behind statistics/econometrics
    (assumptions, results, strengths, weaknesses).
  2. Practical knowledge of how to apply theoretical methods to data.
  3. Efficient methods for working with data
    (cleaning, aggregating, joining, visualizing).

This course aims to deepen your knowledge in each of these three areas.

  • Area 1: As before.
  • Areas 2-3: R

R

What is R?

To quote the R project website:

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.

What does that mean?

  • R was created for the statistical and graphical work required by econometrics.
  • R has a vibrant, thriving online community (e.g., Stack Overflow).
  • Plus it’s free and open source.

R

Why are we using R?

  1. R is free and open source, saving both you and the university 💰💵💰.
  2. Related: Outside of a small group of economists, private- and public-sector employers favor R over Stata and most competing software.
  3. R is very flexible and powerful: it adapts to nearly any task, e.g., metrics, spatial data analysis, machine learning, web scraping, data cleaning, website building, teaching (these slides).

R

Why are we using R?

  4. Related: R imposes no limitations on your number of observations, variables, memory, or processing power. (I’m looking at you, Stata.)
  5. If you put in the work,[†] you will come away with a valuable and marketable tool.
  6. I 💖 R
  7. R is a nice gateway to (and plays well with) other programming languages (e.g., Python, SQL, C++, JavaScript).

[†] Learning R definitely requires time and effort.

R + Examples

R + Data

# Load the gapminder dataset (via the package)
library(gapminder)
# View the first 5 rows
gapminder |> head(5)
# A tibble: 5 × 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1952    28.8  8425333      779.
2 Afghanistan Asia       1957    30.3  9240934      821.
3 Afghanistan Asia       1962    32.0 10267083      853.
4 Afghanistan Asia       1967    34.0 11537966      836.
5 Afghanistan Asia       1972    36.1 13079460      740.

R + Regression

# Fit a linear regression
fit = lm(lifeExp ~ 1 + I(gdpPercap / 1e3) + I(pop / 1e6), data = gapminder)
# Show the results
fit |> summary() |> coef()
                      Estimate  Std. Error    t value      Pr(>|t|)
(Intercept)       53.648242353 0.322479721 166.361600  0.000000e+00
I(gdpPercap/1000)  0.767564618 0.025681107  29.888299 4.035572e-158
I(pop/1e+06)       0.009728224 0.002384659   4.079504  4.721928e-05
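
A quick check on the I(gdpPercap / 1e3) and I(pop / 1e6) terms: rescaling a regressor only changes the units of its coefficient, not the fit. The sketch below assumes the gapminder package is installed; the names fit_scaled and fit_raw are illustrative and do not appear in the original slides.

```r
library(gapminder)

# Specification from the slide: GDP per capita in $1,000s, population in millions
fit_scaled = lm(lifeExp ~ 1 + I(gdpPercap / 1e3) + I(pop / 1e6), data = gapminder)
# Same regression with the regressors left in their original units
fit_raw = lm(lifeExp ~ 1 + gdpPercap + pop, data = gapminder)

# Each rescaled slope is the raw slope times the scaling factor
ratio_gdp = unname(coef(fit_scaled)[2] / coef(fit_raw)[2])  # approximately 1,000
ratio_pop = unname(coef(fit_scaled)[3] / coef(fit_raw)[3])  # approximately 1,000,000

# The fitted values are the same (up to numerical precision)
all.equal(fitted(fit_scaled), fitted(fit_raw))
```

With the rescaled version, the estimate of roughly 0.77 reads as "an additional $1,000 of GDP per capita is associated with about 0.77 more years of life expectancy, holding population fixed," which is easier to discuss than a coefficient of 0.00077 per dollar.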

R + Regression

# Fit a linear regression
fit = lm(lifeExp ~ 1 + I(gdpPercap / 1e3) + I(pop / 1e6), data = gapminder)
# 'tidy' the results (from the broom package)
fit |> broom::tidy()
# A tibble: 3 × 5
  term              estimate std.error statistic   p.value
  <chr>                <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)       53.6       0.322      166.   0        
2 I(gdpPercap/1000)  0.768     0.0257      29.9  4.04e-158
3 I(pop/1e+06)       0.00973   0.00238      4.08 4.72e-  5
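
One reason a tidy table can be handy: extras such as confidence intervals arrive as ordinary columns. A minimal sketch, assuming the gapminder and broom packages are installed; the name fit_ci is illustrative.

```r
library(gapminder)

# Same regression as above
fit = lm(lifeExp ~ 1 + I(gdpPercap / 1e3) + I(pop / 1e6), data = gapminder)

# Ask broom for 95% confidence intervals (adds conf.low and conf.high columns)
fit_ci = fit |> broom::tidy(conf.int = TRUE, conf.level = 0.95)
fit_ci

# Base-R counterpart, returned as a matrix rather than a tibble
confint(fit, level = 0.95)
```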

R + Plotting (w/ plot)

# Load packages with dataset
library(gapminder)

# Create simple plot
plot(
  x = gapminder$gdpPercap,
  y = gapminder$lifeExp,
  xlab = 'GDP per capita',
  ylab = 'Life Expectancy'
)

R + Plotting (w/ ggplot2)

# Load packages
library(gapminder)
library(dplyr)
library(ggplot2)   # ggplot(), geom_point(), scales, labs()
library(ggthemes)  # theme_pander()

# Plot with ggplot2
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.75) +
  scale_x_continuous('GDP per capita', labels = scales::comma) +
  labs(y = 'Life Expectancy') +
  theme_pander(base_size = 16)

R + More plotting (w/ ggplot2)

Notice we’ve moved to log-10 scale on the \(x\)-axis.

# Plot with ggplot2 (scale_color_viridis() comes from the viridis package)
library(viridis)
ggplot(
  data = filter(gapminder, year %in% c(1952, 2002)),
  aes(x = gdpPercap, y = lifeExp, color = continent, group = country)
) +
  geom_path(alpha = 0.25) +
  geom_point(aes(shape = as.character(year), size = pop), alpha = 0.75) +
  scale_x_log10('GDP per capita', labels = scales::comma) +
  labs(y = 'Life Expectancy') +
  scale_shape_manual('Year', values = c(1, 17)) +
  scale_color_viridis('Continent', discrete = TRUE, end = 0.95) +
  # Hide the size legend ('none' replaces the deprecated FALSE)
  guides(size = 'none') +
  theme_pander(base_size = 16)
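
For intuition on the log-10 scale: equal distances on the axis correspond to equal multiplicative changes in GDP per capita. A small sketch using base plot() instead of ggplot2, assuming the gapminder package is installed; the name log_ticks is illustrative.

```r
library(gapminder)

# Each tenfold increase in GDP per capita is one unit on the log-10 scale
log_ticks = log10(c(1000, 10000, 100000))
log_ticks  # 3 4 5

# Base-R version of the scatter plot with a log-scaled x-axis
plot(
  x = gapminder$gdpPercap,
  y = gapminder$lifeExp,
  log = 'x',
  xlab = 'GDP per capita (log scale)',
  ylab = 'Life Expectancy'
)
```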

R + Animated plots (w/ gganimate)

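# Load packages (country_colors comes with gapminder)
library(gapminder)
library(dplyr)
library(ggplot2)
library(ggthemes)
library(gganimate)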
ggplot(
  data = gapminder |> filter(continent != 'Oceania'),
  aes(gdpPercap, lifeExp, size = pop, color = country)
) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10('GDP per capita', label = scales::comma) +
  facet_wrap(~continent) +
  theme_pander(base_size = 16) +
  theme(panel.border = element_rect(color = 'grey90', fill = NA)) +
  labs(title = 'Year: {frame_time}', y = 'Life Expectancy') +
  transition_time(year) +
  ease_aes('linear')

R + Interactive plots (w/ plotly)

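# Load packages
library(gapminder)
library(dplyr)
library(plotly)
# Build an interactive scatter plot of the 2007 data
plot_ly(
  data = gapminder |> filter(year == 2007),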
  x = ~gdpPercap,
  y = ~lifeExp,
  type = 'scatter',
  mode = 'markers',
  size = ~pop,
  color = ~continent,
  colors = viridis::plasma(n = 5, end = .93),
  text = ~paste(
    'Country: ', country,
    '<br>GDP per capita:', scales::dollar(gdpPercap, 1),
    '<br>Life Expectancy:', scales::comma(lifeExp, 1),
    '<br>Population:', scales::comma(pop)
  ),
  hoverinfo = 'text',
  sizes = c(5, 100)
) |>
  layout(
    title = 'Gapminder data in 2007',
    xaxis = list(title = 'GDP per capita (log scale)', type = 'log'),
    yaxis = list(title = 'Life Expectancy')
  )

R + Maps

Getting started with R

Starting R

Installation

  • Install R.
  • Install RStudio.
  • Install Quarto.
  • Optional/Overkill: Git
    • Create an account on GitHub
    • Register for a student/educator discount.
    • For installation guidance and troubleshooting, check out Jenny Bryan’s website.
  • Note: Many UO labs have R installed and ready (helpful in a pinch).
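
Once R is installed, one way to check which of the packages used in these slides still need installing is sketched below. The package list is read off the code in this deck, and the names pkgs and missing_pkgs are illustrative. The install line is left commented out so that nothing is installed by accident.

```r
# Packages that appear in the examples in these slides
pkgs = c(
  "gapminder", "dplyr", "ggplot2", "ggthemes",
  "broom", "viridis", "gganimate", "plotly"
)

# TRUE for each package that R can find on this machine
installed = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that are not yet installed
missing_pkgs = pkgs[!installed]
missing_pkgs

# Uncomment to install whatever is missing
# install.packages(missing_pkgs)
```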

R resources

Free(-ish)

Money

Short online courses, e.g., DataCamp

Starting R

Some R basics

You will dive deeper into R in lab, but here are six big points about R:

  1. Everything is an object. Example: foo
  2. Every object has a name and value. Example: foo = 2
  3. You use functions on these objects. Example: mean(foo)
  4. Functions come in libraries (packages). Example: library(dplyr)
  5. R will try to help you. Example: ?dplyr
  6. R has its quirks. Examples: NA; error; warning
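
The six points can be tried directly in the console. This is a small base-R sketch; the objects bar, warn_result, and err_result are illustrative names that are not in the original slides.

```r
# 1-2. Objects: foo is the name, 2 is the value
foo = 2
foo

# 3. Functions act on objects
bar = c(1, 2, 3, NA)
mean(bar)                # NA, because missing values propagate by default
mean(bar, na.rm = TRUE)  # 2, after dropping the NA

# 4. Functions live in packages; pkg::fun() calls one without library()
stats::sd(bar, na.rm = TRUE)  # 1

# 5. Help pages: type ?mean in the console to read the documentation for mean()

# 6. Quirks: a warning lets the code continue, an error stops it
warn_result = tryCatch(log(-1), warning = function(w) "warning caught")
err_result = tryCatch("a" + 1, error = function(e) "error caught")
warn_result
err_result
```

The tryCatch() calls are only there so the script keeps running; typing log(-1) or "a" + 1 directly shows the actual warning and error messages.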

Next: Metrics review(s)