Problem set 2 solutions

EC607

Part 1: Contemplate

1.1 (5 points) It’s that time again. Take 15 minutes to think about (1) your research interests, (2) how you are doing, and (3) what matters to you. No screens. No paper. No nothing. Just sit somewhere (ideally not PLC) quietly and think.

To submit: Where and when did you complete this task?

Answer: Completion.

Part 2: Extend your OLS function

2.1 (5 points) Extend your OLS function from the last problem set to estimate “classic OLS” standard errors (i.e., standard errors that assume homoskedastic and independent disturbances).

As with the first problem set: Do not use lm or other regression functions. You need to program the standard error estimator yourself (you can use matrix operations, sum, mean, etc.).

Answer: Here’s my function…

# The function
ols_fun = function(X, y) {
  # Create a column of ones
  ones = matrix(1, nrow = nrow(X), ncol = 1)
  # Combine the X matrix with the column of ones
  X_ones = cbind(ones, X)
  # Calculate the coefficients
  b = solve(t(X_ones) %*% X_ones) %*% t(X_ones) %*% y
  # Calculate the residuals
  e = y - X_ones %*% b
  # Calculate the residual variance
  s2 = (t(e) %*% e) / (nrow(X_ones) - ncol(X_ones))
  s2 = s2 |> as.numeric()
  # Spherical VCOV
  b_vcov = s2 * solve(t(X_ones) %*% X_ones)
  # Return the coefficients and standard errors
  data.frame(
    b = b[,1],
    se = sqrt(diag(b_vcov))
  )
}
# Run the function on cars
ols_fun(
  X = matrix(cars$speed, ncol = 1),
  y = matrix(cars$dist, ncol = 1)
)

           b        se
1 -17.579095 6.7584402
2   3.932409 0.4155128

# Check against fixest
fixest::feols(dist ~ speed, data = cars)

OLS estimation, Dep. Var.: dist
Observations: 50
Standard-errors: IID 
             Estimate Std. Error  t value   Pr(>|t|)    
(Intercept) -17.57909   6.758440 -2.60106 1.2319e-02 *  
speed         3.93241   0.415513  9.46399 1.4898e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 15.1   Adj. R2: 0.64381

They match!

2.2 (10 points) Now further extend your OLS function to estimate heteroskedasticity-robust standard errors (same rules).

You will want to do one of the following:

Add a new argument to your function that dictates the type of standard error to compute or create a new function that computes the heteroskedasticity-robust standard errors.
Just return both types of standard errors.

Answer: Updating the function…

# The function
ols_fun = function(X, y, se_type = 'iid') {
  # Create a column of ones
  ones = matrix(1, nrow = nrow(X), ncol = 1)
  # Combine the X matrix with the column of ones
  X_ones = cbind(ones, X)
  # Define n and p
  n = nrow(X_ones)
  p = ncol(X_ones)
  # Calculate the coefficients
  b = solve(t(X_ones) %*% X_ones) %*% t(X_ones) %*% y
  # Calculate the residuals
  e = y - X_ones %*% b
  # Calculate the residual variance
  s2 = (t(e) %*% e) / (n - p)
  s2 = s2 |> as.numeric()
  # Spherical VCOV
  vcov_sph = s2 * solve(t(X_ones) %*% X_ones)
  # Heteroskedasticity-robust VCOV
  vcov_het =
    solve(t(X_ones) %*% X_ones) %*%
    (t(X_ones) %*% diag(as.vector(e)^2) %*% X_ones) %*%
    solve(t(X_ones) %*% X_ones)
  # Return the coefficients and standard errors
  data.frame(
    b = b[,1],
    se_sph = sqrt(diag(vcov_sph)),
    se_het = sqrt(diag(vcov_het)),
    se_hc1 = sqrt(diag(vcov_het) * n / (n - p))
  )
}
# Run the function on cars
ols_fun(
  X = matrix(cars$speed, ncol = 1),
  y = matrix(cars$dist, ncol = 1)
)

           b    se_sph    se_het   se_hc1
1 -17.579095 6.7584402 5.5418722 5.656150
2   3.932409 0.4155128 0.3986809 0.406902

# Check against fixest
fixest::feols(dist ~ speed, data = cars, vcov = 'het')

OLS estimation, Dep. Var.: dist
Observations: 50
Standard-errors: Heteroskedasticity-robust 
             Estimate Std. Error  t value   Pr(>|t|)    
(Intercept) -17.57909   5.656150 -3.10796 3.1627e-03 ** 
speed         3.93241   0.406902  9.66427 7.6542e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 15.1   Adj. R2: 0.64381

Notice that the default heteroskedasticity-robust standard error estimator (in fixest) is the “HC1” estimator.

Part 3: Simulate with a familiar DGP

3.1 (10 points) Now run a simulation (at least 1,000 iterations) to see how well each of the standard error estimators behaves with a fairly small sample size.

In each iteration draw 25 observations from the DGP from problem 2.1 in the first problem set, i.e.,

\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \epsilon_i \]

where

\(\beta_0 = 0\), \(\beta_1 = 1\), \(\beta_2 = 2\), and \(\beta_3 = 3\)
\(x_{1i} \sim N(0, 1)\)
\(x_{2i} \sim N(0, 1)\)
\(x_{3i} = u_i + w_i\)
\(\epsilon_i = v_i + w_i\)
and \(w_i \sim N(0, 1)\), \(u_i \sim N(0, 1)\), and \(v_i \sim N(0, 1)\)

To be clear: within each iteration, use OLS to estimate the regression coefficients, the “classical” standard errors, and the “heteroskedasticity-robust” standard errors.

Answer: Run the simulation…

# Load the `tidyverse` (using `pacman`)
library(pacman)
p_load(tidyverse, purrr, furrr, future)

# Set the seed
set.seed(12)
# Define the iteration function
iter_fun = function(it, n = 25) {
  # Draw n observations from the population
  sample_df = tibble(
    id = 1:n,
    u = rnorm(n),
    v = rnorm(n),
    w = rnorm(n),
    epsilon = v + w,
    x1 = rnorm(n),
    x2 = rnorm(n),
    x3 = u + w,
    y = 0 + 1 * x1 + 2 * x2 + 3 * x3 + epsilon
  )
  # Now run our function
  est =
    ols_fun(
      X = sample_df |> select(x1, x2, x3) |> as.matrix(),
      y = sample_df |> select(y) |> as.matrix()
    ) |>
    mutate(
      param = c('beta_0', 'beta_1', 'beta_2', 'beta_3'),
      iteration = it
    )
}
# Run the simulation 25 times (in parallel)
plan(multisession, workers = 10)
sim_list = future_map(
  1:1e4,
  iter_fun,
  .options = furrr_options(seed = TRUE)
)
# Combine the results into a single data frame
sim_df = bind_rows(sim_list) |> tibble()

3.2 (10 points) Using the results of your simulation in 3.1: What percent of the 95% confidence intervals for \(\beta_1\) contain the true value?

Answer: Calculate the coverage rate…

# Filter results to beta_1
b1_df = sim_df |> filter(param == 'beta_1')
# Calculate the lower and upper bounds of the 95% CI
q = qnorm(0.975)
b1_df =
  b1_df |>
  mutate(
    lower_sph = b - q * se_sph,
    upper_sph = b + q * se_sph,
    lower_het = b - q * se_het,
    upper_het = b + q * se_het,
    lower_hc1 = b - q * se_hc1,
    upper_hc1 = b + q * se_hc1
  )
# Determine coverage
b1_df =
  b1_df |>
  mutate(
    covers_sph = (lower_sph < 1) * (upper_sph > 1),
    covers_het = (lower_het < 1) * (upper_het > 1),
    covers_hc1 = (lower_hc1 < 1) * (upper_hc1 > 1)
  )
# Coverage rate by estimator
b1_df |>
  summarise(
    coverage_sph = mean(covers_sph),
    coverage_het = mean(covers_het),
    coverage_hc1 = mean(covers_hc1)
  )

# A tibble: 1 × 3
  coverage_sph coverage_het coverage_hc1
         <dbl>        <dbl>        <dbl>
1        0.935        0.880        0.908

3.3 (10 points) Now repeat 3.1 and 3.2 but with 1,000 observations per sample (rather than 25).

Answer: Run the simulation with 1,000 observations per iteration…

# Run the simulation 10,000 times (in parallel)
plan(multisession, workers = 10)
sim_1k = future_map(
  1:1e4,
  iter_fun,
  n = 1e3,
  .options = furrr_options(seed = TRUE)
) |> bind_rows(sim_list) |> tibble()
# Filter results to beta_1
b1_1k = sim_1k |> filter(param == 'beta_1')
# Calculate the lower and upper bounds of the 95% CI
q = qnorm(0.975)
b1_1k =
  b1_1k |>
  mutate(
    lower_sph = b - q * se_sph,
    upper_sph = b + q * se_sph,
    lower_het = b - q * se_het,
    upper_het = b + q * se_het,
    lower_hc1 = b - q * se_hc1,
    upper_hc1 = b + q * se_hc1
  )
# Determine coverage
b1_1k =
  b1_1k |>
  mutate(
    covers_sph = (lower_sph < 1) * (upper_sph > 1),
    covers_het = (lower_het < 1) * (upper_het > 1),
    covers_hc1 = (lower_hc1 < 1) * (upper_hc1 > 1)
  )
# Coverage rate by estimator
b1_1k |>
  summarise(
    coverage_sph = mean(covers_sph),
    coverage_het = mean(covers_het),
    coverage_hc1 = mean(covers_hc1)
  )

# A tibble: 1 × 3
  coverage_sph coverage_het coverage_hc1
         <dbl>        <dbl>        <dbl>
1        0.942        0.913        0.927

3.4 (5 points) How do the confidence intervals’ coverage rates compare (a) across the two standard-error estimators and (b) across the sample sizes? Explain why this is suprising or expected.

Answer: The coverage rates in small samples are not great, with the exception of the “classic” OLS standard errors (they should be close 95%). Once the sample size increases, coverage rates get better—especially for the HC1 estimator.

The fact that the “classic” OLS standard errors perform well is actually expected: our DGP does not have heteroskedasticity. Thus, the classic OLS standard errors are unbiased in our setting.

The estimator that I’ve labeled “het” is biased but consistent. Our small sample (n = 25) is not large enough to get close to the asymptotic distribution. It looks like it might not be getting close even with 1,000 observations.

The HC1 estimator is unbiased but less efficient than the “classic” OLS standard errors when the disturbance actually is homoskedastic.

3.5 (5 points) What is the average confidence interval width for each of the estimator by sample-size combinations? Does this result match what you would expect? Explain.

Answer: First, let’s check the average width of the confidence intervals for each of the estimators in the 25-observation simulation.

# For the 25-observation simulation
b1_df |>
  transmute(
    w_sph_25 = upper_sph - lower_sph,
    w_het_25 = upper_het - lower_het,
    w_hc1_25 = upper_hc1 - lower_hc1
  ) |>
  summarise(across(everything(), mean))

# A tibble: 1 × 3
  w_sph_25 w_het_25 w_hc1_25
     <dbl>    <dbl>    <dbl>
1     1.05    0.910    0.993

Now let’s check the average width of the confidence intervals for each of the estimators in the 1,000-observation simulation.

# For the 1,000-observation simulation
b1_1k |>
  transmute(
    w_sph_1k = upper_sph - lower_sph,
    w_het_1k = upper_het - lower_het,
    w_hc1_1k = upper_hc1 - lower_hc1
  ) |>
  summarise(across(everything(), mean))

# A tibble: 1 × 3
  w_sph_1k w_het_1k w_hc1_1k
     <dbl>    <dbl>    <dbl>
1    0.600    0.531    0.572

Given what we have already seen, the size of the intervals make sense. The estimators with too-small coverage rates are too tight.

The 1,000-observation intervals are narrower than the 25-observation intervals, which is expected: the more information we have, the more precise our estimates should be.