EC 421, Set 05
Prologue
Heteroskedasticity: Issues and tests
R does the calculations and has already memorized the formulas.
I want you to know what the formulas mean, when/why we use them, and when they fail/work.
This course has the potential to be one of the most useful/valuable/applicable/marketable classes that you take at UO.
Heteroskedasticity
Review
Three review questions
Question 1: What is the difference between \(u_i\) and \(e_i\)?
Question 2: We spend a lot of time discussing \(u_i^2\). Why?
Question 3: We also spend a lot of time discussing \(e_i^2\). Why?
Question 1: What is the difference between \(u_i\) and \(e_i\)?
Answer 1: \(\textcolor{#e64173}{u_i}\) gives the population disturbance for the \(i\)th observation.
\(u_i\) measures how far the \(i\)th observation is from the population line, i.e.,
\[ u_i = y_i - \textcolor{#e64173}{\underbrace{\left(\beta_0 + \beta_1 x_i\right)}_{\text{Population line}}} \]
\(\textcolor{#6A5ACD}{e_i}\) gives the regression residual (error) for the \(i\)th observation.
\(e_i\) measures how far the \(i\)th observation is from the sample regression line, i.e.,
\[ e_i = y_i - \textcolor{#6A5ACD}{\underbrace{\left(\hat{\beta}_0 + \hat{\beta}_1 x_i\right)}_{\text{Sample reg. line} = \hat{y}}} = y_i - \textcolor{#6A5ACD}{\hat{y}_i} \]
Question 2: We spend a lot of time discussing \(u_i^2\). Why?
Answer 2:
One of our major assumptions is that the disturbances (the \(u_i\)’s) are homoskedastic (they have constant variance), i.e., \(\mathop{\text{Var}} \left( u_i \middle| x_i \right) = \sigma^2\).
We also assume that the mean of these disturbances is zero, \(\textcolor{#e64173}{\mathop{\boldsymbol{E}}\left[ u_i \middle| x_i \right] = 0}\).
By definition, \(\mathop{\text{Var}} \left( u_i \middle| x_i \right) = \mathop{\boldsymbol{E}}\Big[ u_i^2 - \underbrace{\textcolor{#e64173}{\mathop{\boldsymbol{E}}\left[ u_i \middle| x_i \right]}^2}_{\textcolor{#e64173}{=0}} \Big| x_i \Big] = \mathop{\boldsymbol{E}}\left[ u_i^2 \middle| x_i \right]\)
Thus, if we want to learn about the variance of \(u_i\), we can focus on \(u_i^2\).
Question 3: We also spend a lot of time discussing \(e_i^2\). Why?
Answer 3:
We cannot observe \(u_i\) (or \(u_i^2\)).
But \(u_i^2\) tells us about the variance of \(u_i\).
We use \(e_i^2\) to learn about \(u_i^2\) and, consequently, \(\sigma_i^2\).
We’re currently focused on assumption #5:
5. The disturbances have constant variance \(\sigma^2\) and zero covariance, i.e., \(\mathop{\text{Var}} \left( u_i \middle| x_i \right) = \sigma^2\) and \(\mathop{\text{Cov}} \left( u_i,\, u_j \middle| x_i,\, x_j \right) = 0\) for \(i \neq j\).
Specifically, we’re discussing the assumption of constant variance (also known as homoskedasticity).
Violation of this assumption: Our disturbances have different variances.
Heteroskedasticity: \(\mathop{\text{Var}} \left( u_i \right) = \sigma^2_i\) and \(\sigma^2_i \neq \sigma^2_j\) for some \(i\neq j\).
Classic example of heteroskedasticity: The funnel
Variance of \(u\) increases with \(x\)
Another example of heteroskedasticity: (double funnel?)
Variance of \(u\) increasing at the extremes of \(x\)
Another example of heteroskedasticity:
Differing variances of \(u\) by group
Heteroskedasticity is present when the variance of \(u\) changes with any combination of our explanatory variables \(x_1\) through \(x_k\).
We have some tests that may help us detect heteroskedasticity.
What do we do if we detect it?
Living with heteroskedasticity
In the presence of heteroskedasticity, OLS is still unbiased and consistent, but it is no longer the most efficient linear unbiased estimator.
On average, we get the right answer but with more noise (less precision).
Also: Our standard errors are biased.
Options:
As we’ve discussed, the specification of your regression model matters a lot for the unbiasedness and efficiency of your estimator.
Response #1: Ensure your specification doesn’t cause heteroskedasticity.
Example: Let the population relationship be \[ y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + u_i \]
with \(\mathop{\boldsymbol{E}}\left[ u_i \middle| x_i \right] = 0\) and \(\mathop{\text{Var}} \left( u_i \middle| x_i \right) = \sigma^2\).
However, we omit \(x^2\) and estimate \[ y_i = \gamma_0 + \gamma_1 x_i + w_i \]
Then \[ w_i = u_i + \beta_2 x_i^2 \implies \mathop{\text{Var}} \left( w_i \right) = f(x_i) \]
I.e., the variance of \(w_i\) changes systematically with \(x_i\) (heteroskedasticity).
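To see this in action, here is a minimal sketch with simulated data (the coefficients, sample size, and seed are all illustrative assumptions): omitting \(x^2\) produces residuals whose squares vary systematically with \(x\), which a hand-rolled White test picks up.

```r
# Sketch: omitting x^2 makes the misspecified model's residuals
# vary systematically with x (heteroskedasticity).
set.seed(421)
n = 1000
x = runif(n, min = 0, max = 3)
u = rnorm(n)                     # homoskedastic disturbances
y = 1 + 2 * x + 3 * x^2 + u     # true model includes x^2

# Misspecified regression: omit x^2
miss = lm(y ~ x)
e2 = resid(miss)^2

# Manual White test: regress e^2 on x and x^2; LM = n * R^2 ~ Chi-sq(2) under H0
white = lm(e2 ~ x + I(x^2))
lm_stat = n * summary(white)$r.squared
lm_stat > qchisq(0.95, df = 2)  # TRUE: reject homoskedasticity
```
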
Truth: \(\textcolor{#e64173}{\log\left(y_i\right) = \beta_0 + \beta_1 x_i + u_i}\) Misspecification: \(\textcolor{#314f4f}{y_i = \beta_0 + \beta_1 x_i + v_i}\)
More generally:
Misspecification problem: Incorrect specification of the regression model can cause heteroskedasticity (among other problems).
Solution: 💡 Get it right (e.g., don’t omit \(x^2\)).
New problems: We rarely know the correct specification, and even a correctly specified model can still have heteroskedastic disturbances.
Conclusion: Specification often will not “solve” heteroskedasticity.
However, correctly specifying your model is still really important.
Weighted least squares (WLS) presents another approach.
Response #2: Increase efficiency by weighting our observations.
Let the true population relationship be
\[ \begin{align} y_i = \beta_0 + \beta_1 x_{i} + u_i \tag{1} \end{align} \]
with \(u_i \sim \mathop{N} \left( 0,\, \sigma_i^2 \right)\).
Now transform \((1)\) by dividing each observation’s data by \(\sigma_i\), i.e.,
\[ \begin{align} \dfrac{y_i}{\sigma_i} &= \beta_0 \dfrac{1}{\sigma_i} + \beta_1 \dfrac{x_{i}}{\sigma_i} + \dfrac{u_i}{\sigma_i} \tag{2} \end{align} \]
Whereas \((1)\) is heteroskedastic,
\(\textcolor{#e64173}{(2)}\) is homoskedastic.
∴ OLS is efficient and unbiased for estimating the \(\beta_k\) in \((2)\)!
Why is \((2)\) homoskedastic?
\(\mathop{\text{Var}} \left( \dfrac{u_i}{\sigma_i} \middle| x_i \right) =\) \(\dfrac{1}{\sigma_i^2} \mathop{\text{Var}} \left( u_i \middle| x_i \right) =\) \(\dfrac{1}{\sigma_i^2} \sigma_i^2 =\) \(1\)
WLS is great, but we need to know \(\sigma_i^2\), which we generally do not.
We can slightly relax this requirement: instead, we require
\(\mathop{\text{Var}} \left( u_i | x_i \right) = \sigma_i^2 = \sigma^2 h(x_i)\)
where \(h(x_i)\) is a function that we know.
As before, we transform our heteroskedastic model into a homoskedastic model. This time we divide each observation’s data by \(\sqrt{h(x_i)}\).
\[ \begin{align} y_i &= \beta_0 + \beta_1 x_{i} + u_i \tag{1} \\[1em] \dfrac{y_i}{\sqrt{h(x_i)}} &= \beta_0 \dfrac{1}{\sqrt{h(x_i)}} + \beta_1 \dfrac{x_{i}}{\sqrt{h(x_i)}} + \dfrac{u_i}{\sqrt{h(x_i)}} \tag{2} \end{align} \] with \(\mathop{\text{Var}} \left( u_i | x_i \right) = \sigma^2 h(x_i)\).
Now let’s check that \((2)\) is indeed homoskedastic.
\(\mathop{\text{Var}} \left( \dfrac{u_i}{\sqrt{h(x_i)}} \middle| x_i \right) =\) \(\dfrac{1}{h(x_i)} \mathop{\text{Var}} \left( u_i \middle| x_i \right) =\) \(\dfrac{1}{h(x_i)} \sigma^2 h(x_i) =\) \(\textcolor{#e64173}{\sigma^2}\)
Homoskedasticity!
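As a sanity check, here is a small simulated example (the DGP and the choice \(h(x) = x^2\) are assumptions for illustration) showing that transforming the data by hand and using `lm()`’s `weights` argument give identical coefficients.

```r
# Sketch: WLS by hand (dividing by sqrt(h(x))) matches lm() with weights = 1/h(x).
set.seed(421)
n = 500
x = runif(n, min = 1, max = 5)
h = x^2                                  # assume Var(u|x) = sigma^2 * x^2
y = 1 + 10 * x + rnorm(n, sd = sqrt(h))

# By hand: divide y, the intercept, and x by sqrt(h); '0 +' drops R's intercept
wls_hand = lm(I(y / sqrt(h)) ~ 0 + I(1 / sqrt(h)) + I(x / sqrt(h)))
# Built-in: lm() minimizes the same weighted sum of squares
wls_lm = lm(y ~ x, weights = 1 / h)

all.equal(unname(coef(wls_hand)), unname(coef(wls_lm)))  # TRUE
```
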
Weighted least squares (WLS) estimators are a special class of
generalized least squares (GLS) estimators focused on heteroskedasticity.
\[ y_i = \beta_0 + \beta_1 x_{1i} + u_i \quad \textcolor{#e64173}{\text{ vs. }} \quad \dfrac{y_i}{\sigma_i} = \beta_0 \dfrac{1}{\sigma_i} + \beta_1 \dfrac{x_{1i}}{\sigma_i} + \dfrac{u_i}{\sigma_i} \]
Notes:
Response #3: Heteroskedasticity-robust standard errors.
Q: What is a standard error?
A: The standard deviation of an estimator’s distribution.
Estimators (like \(\hat{\beta}_1\)) are random variables, so they have distributions.
Standard errors give us a sense of how much variability is in our estimator.
Recall: We can write the OLS estimator for \(\beta_1\) as
\[ \hat{\beta}_1 = \beta_1 + \dfrac{\sum_i \left( x_i - \overline{x} \right) u_i}{\sum_i \left( x_i - \overline{x} \right)^2} = \beta_1 + \dfrac{\sum_i \left( x_i - \overline{x} \right) u_i}{\text{SST}_x} \tag{3} \]
Let \(\mathop{\text{Var}} \left( u_i \middle| x_i \right) = \sigma_i^2\).
We can use \((3)\) to write the variance of \(\hat{\beta}_1\), i.e.,
\[ \mathop{\text{Var}} \left( \hat{\beta}_1 \middle| x_i \right) = \dfrac{\sum_i \left( x_i - \overline{x} \right)^2 \sigma_i^2}{\text{SST}_x^2} \tag{4} \]
If we want unbiased estimates for our standard errors, we need an unbiased estimate for
\[ \dfrac{\sum_i \left( x_i - \overline{x} \right)^2 \sigma_i^2}{\text{SST}_x^2} \]
Our old friend Hal White provided such an estimator:
\[ \widehat{\mathop{\text{Var}}} \left( \hat{\beta}_1 \right) = \dfrac{\sum_i \left( x_i - \overline{x} \right)^2 e_i^2}{\text{SST}_x^2} \]
where the \(e_i\) comes from the OLS regression of interest.
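Here is a sketch with simulated data (the DGP is assumed for illustration) confirming that the scalar formula above matches the general matrix “sandwich” version of White’s estimator, \(\left(X'X\right)^{-1} X' \mathop{\text{diag}}\left(e_i^2\right) X \left(X'X\right)^{-1}\).

```r
# Sketch: White's scalar formula for Var-hat(beta1-hat) in simple regression
# equals the (2,2) element of the matrix sandwich estimator.
set.seed(421)
n = 100
x = runif(n)
y = 1 + 2 * x + rnorm(n, sd = abs(x))   # heteroskedastic disturbances

fit = lm(y ~ x)
e = resid(fit)

# Scalar version
sst_x = sum((x - mean(x))^2)
v_scalar = sum((x - mean(x))^2 * e^2) / sst_x^2

# Matrix sandwich version
X = cbind(1, x)
bread = solve(t(X) %*% X)
meat = t(X) %*% diag(e^2) %*% X
v_matrix = (bread %*% meat %*% bread)[2, 2]

all.equal(v_scalar, v_matrix)  # TRUE
```
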
Our heteroskedasticity-robust estimators for the standard errors of the \(\hat{\beta}_j\)’s:
Case 1 Simple linear regression, \(y_i = \beta_0 + \beta_1 x_i + u_i\)
\[ \widehat{\mathop{\text{Var}}} \left( \hat{\beta}_1 \right) = \dfrac{\sum_i \left( x_i - \overline{x} \right)^2 e_i^2}{\text{SST}_x^2} \]
Case 2 Multiple (linear) regression, \(y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + u_i\)
\[ \widehat{\mathop{\text{Var}}} \left( \hat{\beta}_j \right) = \dfrac{\sum_i \hat{r}_{ij}^2 e_i^2}{\text{SST}_{x_j}^2} \]
where \(\hat{r}_{ij}\) denotes the \(i\)th residual from regressing \(x_j\) on all other explanatory variables.
With these standard errors, we can return to correct statistical inference.
E.g., we can update our previous \(t\) statistic formula with our new heteroskedasticity-robust standard errors.
\[ t = \dfrac{\text{Estimate} - \text{Hypothesized value}}{\text{Standard error}} \]
Notes
These standard errors go by many names
Do not say: “Robust standard errors”. The problem: “robust” to what?
Living with heteroskedasticity
Examples
Back to our test-scores dataset…
# Load packages
library(pacman)
p_load(tidyverse, Ecdat)
# Select and rename desired variables; assign to new dataset; format as tibble
test_df = Caschool %>% select(
test_score = testscr, ratio = str, income = avginc, enrollment = enrltot
) %>% as_tibble()
# View first 2 rows of the dataset
head(test_df, 2)
#> # A tibble: 2 × 4
#> test_score ratio income enrollment
#> <dbl> <dbl> <dbl> <int>
#> 1 691. 17.9 22.7 195
#> 2 661. 21.5 9.82 240
We found significant evidence of heteroskedasticity.
Let’s check if it was due to misspecifying our model.
Model1: \(\text{Score}_i = \beta_0 + \beta_1 \text{Ratio}_i + \beta_2 \text{Income}_i + u_i\) lm(test_score ~ ratio + income, data = test_df)
Model2: \(\log\left(\text{Score}_i\right) = \beta_0 + \beta_1 \text{Ratio}_i + \beta_2 \text{Income}_i + u_i\) lm(log(test_score) ~ ratio + income, data = test_df)
Model3: \(\log\left(\text{Score}_i\right) = \beta_0 + \beta_1 \text{Ratio}_i + \beta_2 \log\left(\text{Income}_i\right) + u_i\) lm(log(test_score) ~ ratio + log(income), data = test_df)
Let’s test this new specification with the White test for heteroskedasticity.
Model3: \(\log\left(\text{Score}_i\right) = \beta_0 + \beta_1 \text{Ratio}_i + \beta_2 \log\left(\text{Income}_i\right) + u_i\)
The regression for the White test
\[ \begin{align} e_i^2 = &\alpha_0 + \alpha_1 \text{Ratio}_i + \alpha_2 \log\left(\text{Income}_i\right) + \alpha_3 \text{Ratio}_i^2 + \alpha_4 \left(\log\left(\text{Income}_i\right)\right)^2 \\ &+ \alpha_5 \left(\text{Ratio}_i\times\log\left(\text{Income}_i\right)\right) + v_i \end{align} \]
yields \(R_e^2\approx0.029\) and a test statistic of \(\widehat{\text{LM}} = n\times R_e^2 \approx 12.2\).
Under H0, \(\text{LM}\) is distributed as \(\chi_5^2\) \(\implies\) p-value \(\approx\) 0.033.
∴ Reject H0. Conclusion: There is (still) statistically significant evidence of heteroskedasticity at the 5-percent level.
Okay, we tried adjusting our specification, but there is still evidence of heteroskedasticity.
Next: In general, you will turn to heteroskedasticity-robust standard errors.
OLS is still unbiased for the coefficients (the \(\beta_j\)’s)
Heteroskedasticity-robust standard errors are unbiased for the
standard errors of the \(\hat{\beta}_j\)’s, i.e., \(\sqrt{\mathop{\text{Var}} \left( \hat{\beta}_j \right)}\).
Let’s return to our model
\[ \text{Score}_i = \beta_0 + \beta_1 \text{Ratio}_i + \beta_2 \text{Income}_i + u_i \]
We can use the fixest package in R to calculate standard errors.
\[ \text{Score}_i = \beta_0 + \beta_1 \text{Ratio}_i + \beta_2 \text{Income}_i + u_i \]
1. Run the regression with feols() (instead of lm())
Notice that feols() uses the same syntax as lm() for this regression.
\[ \text{Score}_i = \beta_0 + \beta_1 \text{Ratio}_i + \beta_2 \text{Income}_i + u_i \]
2. Estimate het.-robust standard errors with vcov = 'hetero' option in summary()
Coefficients and heteroskedasticity-robust standard errors:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 638.729155 7.301234 87.48235 < 2.2e-16 ***
#> ratio -0.648740 0.353340 -1.83602 0.067066 .
#> income 1.839112 0.114733 16.02949 < 2.2e-16 ***
Coefficients and plain OLS standard errors (assumes homoskedasticity):
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 638.729155 7.449077 85.74608 < 2.2e-16 ***
#> ratio -0.648740 0.354405 -1.83051 0.067888 .
#> income 1.839112 0.092787 19.82083 < 2.2e-16 ***
We mentioned that WLS is often not possible—we need to know the functional form of the heteroskedasticity—either
A. \(\sigma_i^2\)
or
B. \(h(x_i)\), where \(\sigma_i^2 = \sigma^2 h(x_i)\)
There are occasions in which we can know \(h(x_i)\).
Imagine individuals in a population have homoskedastic disturbances.
However, instead of observing individuals’ data, we observe (in data) groups’ averages (e.g., cities, counties, school districts).
If these groups have different sizes, then our dataset will be heteroskedastic—in a predictable fashion.
Recall: The variance of the sample mean depends upon the sample size, \[ \mathop{\text{Var}} \left( \overline{x} \right) = \dfrac{\sigma_x^2}{n} \]
Example: Our school testing data is averaged at the school level.
Even if individual students have homoskedastic disturbances, the schools would have heteroskedastic disturbances, i.e.,
Individual-level model: \(\quad \text{Score}_i = \beta_0 + \beta_1 \text{Ratio}_i + \beta_2 \text{Income}_i + u_i\)
School-level model: \(\quad \overline{\text{Score}}_s = \beta_0 + \beta_1 \overline{\text{Ratio}}_s + \beta_2 \overline{\text{Income}}_s + \overline{u}_s\)
where the \(s\) subscript denotes an individual school (just as \(i\) indexes an individual person).
\[ \mathop{\text{Var}} \left( \overline{u}_s \right) = \dfrac{\sigma^2}{n_s} \]
For WLS, we’re looking for a function \(h(x_s)\) such that \(\mathop{\text{Var}} \left( \overline{u}_s | x_s \right) = \sigma^2 h(x_s)\).
We just showed that \(\mathop{\text{Var}} \left( \overline{u}_s |x_s \right) = \dfrac{\sigma^2}{n_s}\).
Thus, \(h(x_s) = 1/n_s\), where \(n_s\) is the number of students in school \(s\).
To implement WLS, we divide each observation’s data by \(\sqrt{h(x_s)} = 1/\sqrt{n_s}\), meaning we need to multiply each school’s data by \(\sqrt{n_s}\).
The variable enrollment in the test_df dataset is our \(n_s\).
Using WLS to estimate \(\text{Score}_i = \beta_0 + \beta_1 \text{Ratio}_i + \beta_2 \text{Income}_i + u_i\)
Step 1: Multiply each variable by \(1/\sqrt{h(x_i)} = \sqrt{\text{Enrollment}_i}\)
Notice that we are creating a transformed intercept.
Using WLS to estimate \(\text{Score}_i = \beta_0 + \beta_1 \text{Ratio}_i + \beta_2 \text{Income}_i + u_i\)
Step 2: Run our WLS (transformed) regression
Note: The -1 in our regression tells R not to add an intercept, since we are adding a transformed intercept (intercept_wls).
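A quick base-R illustration of the -1 syntax (the tiny data frame is made up for illustration): it removes the automatic intercept column from the model matrix.

```r
# By default, R's model matrix includes an intercept column; -1 drops it.
d = data.frame(y = 1:4, x = c(2, 4, 6, 8))
colnames(model.matrix(y ~ x, data = d))      # "(Intercept)" "x"
colnames(model.matrix(y ~ x - 1, data = d))  # "x"
```
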
The WLS estimates and standard errors:
#> Estimate Std. Error t value Pr(>|t|)
#> intercept_wls 618.78331 8.26929 74.829 <2e-16 ***
#> ratio_wls -0.21314 0.37676 -0.566 0.572
#> income_wls 2.26493 0.09065 24.985 <2e-16 ***
The OLS estimates and het.-robust standard errors:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 638.729155 7.301234 87.48235 < 2.2e-16 ***
#> ratio -0.648740 0.353340 -1.83602 0.067066 .
#> income 1.839112 0.114733 16.02949 < 2.2e-16 ***
Alternative to doing your own weighting: feed lm() some weights.
#> Estimate Std. Error t value Pr(>|t|)
#> intercept_wls 618.78331 8.26929 74.829 <2e-16 ***
#> ratio_wls -0.21314 0.37676 -0.566 0.572
#> income_wls 2.26493 0.09065 24.985 <2e-16 ***
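To verify the two routes agree, here is a self-contained sketch with simulated school-level data (all numbers are made-up assumptions): transforming each variable by \(\sqrt{n_s}\) by hand reproduces `lm()` with `weights = enrollment`.

```r
# Sketch: when h(x_s) = 1/n_s, WLS multiplies each school's data by sqrt(n_s),
# which is identical to lm() with weights = n_s.
set.seed(421)
n_schools = 200
enrollment = sample(50:2000, n_schools, replace = TRUE)
ratio = runif(n_schools, 14, 26)
income = runif(n_schools, 5, 55)
# School-level disturbance variance shrinks with enrollment
u = rnorm(n_schools, sd = 10 / sqrt(enrollment))
score = 640 - 0.5 * ratio + 2 * income + u
school_df = data.frame(score, ratio, income, enrollment)

# Steps 1-2 by hand: multiply everything (incl. the intercept) by sqrt(n_s)
wls_hand = lm(
  I(score * sqrt(enrollment)) ~ 0 + sqrt(enrollment) +
    I(ratio * sqrt(enrollment)) + I(income * sqrt(enrollment)),
  data = school_df
)
# Built-in weights
wls_lm = lm(score ~ ratio + income, data = school_df, weights = enrollment)

all.equal(unname(coef(wls_hand)), unname(coef(wls_lm)))  # TRUE
```
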
In this example
Heteroskedasticity-robust standard errors did not change our standard errors very much (relative to plain OLS standard errors).
WLS changed our answers a bit—coefficients and standard errors.
These examples highlighted a few things:
Using the correct estimator for your standard errors really matters.
Econometrics doesn’t always offer an obviously correct route.
To see #1, let’s run a simulation.
Let’s examine a simple linear regression model with heteroskedasticity.
\[ y_i = \underbrace{\beta_0}_{=1} + \underbrace{\beta_1}_{=10} x_i + u_i \]
where \(\mathop{\text{Var}} \left( u_i | x_i \right) = \sigma_i^2 = \sigma^2 x_i^2\).
Note regarding WLS:
Since \(\mathop{\text{Var}} \left( u_i | x_i \right) = \sigma^2 x_i^2\),
\[ \mathop{\text{Var}} \left( u_i | x_i \right) = \sigma^2 h(x_i) \implies h(x_i) = x_i^2 \]
WLS multiplies each variable by \(1/\sqrt{h(x_i)} = 1/x_i\).
In this simulation, we want to compare
The efficiency of OLS and WLS (with correct and with incorrect weights)
How well our standard errors perform (via confidence intervals) with the homoskedasticity-assuming, het.-robust, and WLS estimators
The simulation plan:
Do 10,000 times:
Generate a sample of size 30 from the population
Calculate/save OLS and WLS (×2) estimates for β1
Calculate/save standard errors for β1 using each method (plain OLS, het.-robust, and both WLS estimators)
For one iteration of the simulation:
Code to generate the data…
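The original code chunk is not shown here; the following is one possible sketch of the data-generating process, consistent with the setup above (the range of \(x\) and the scale of \(\sigma\) are assumptions).

```r
# One iteration's data: beta0 = 1, beta1 = 10, Var(u|x) = sigma^2 * x^2
n = 30
x = runif(n, min = 1, max = 10)     # keep x > 0 so 1/x weights are defined
u = rnorm(n, mean = 0, sd = 2 * x)  # sd proportional to x (heteroskedastic)
y = 1 + 10 * x + u
sample_df = data.frame(y, x)
```
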
For one iteration of the simulation:
Code to estimate our coefficients and standard errors…
# OLS
ols = feols(y ~ x, data = sample_df)
# WLS: Correct weights
wls_t = lm(y ~ x, data = sample_df, weights = 1/x^2)
# WLS: Incorrect weights
wls_f = lm(y ~ x, data = sample_df, weights = 1/x)
# Coefficients and standard errors
summary(ols, vcov = 'iid')
summary(ols, vcov = 'hetero')
summary(wls_t)
summary(wls_f)
Then save the results.
Summarizing our simulation results (10,000 iterations)
Estimation: Summary of \(\hat{\beta}_1\)’s
| Estimator | Mean | S.D. |
|---|---|---|
| OLS | 9.994 | 0.882 |
| WLS Correct | 9.989 | 0.678 |
| WLS Incorrect | 9.991 | 0.764 |
Inference: % of times we reject the true \(\beta_1\) (at the 5-percent level)
| Estimators | % Reject |
|---|---|
| OLS + Het.-robust | 6.7 |
| OLS + Homosk. | 7.3 |
| WLS Correct | 6.4 |
| WLS Incorrect | 7.4 |
Going further…
Recall our old assumption that led to this heteroskedasticity discussion:
5. The disturbances have constant variance \(\sigma^2\) and zero covariance, i.e., \(\mathop{\text{Var}} \left( u_i \middle| x_i \right) = \sigma^2\) and \(\mathop{\text{Cov}} \left( u_i,\, u_j \middle| x_i,\, x_j \right) = 0\) for \(i \neq j\).
It’s also possible (likely) that the disturbances are correlated.
Ignoring this correlation is even more problematic for inference.
In many cases, observations’ disturbances \(\left(u_i,\,u_j\right)\) may correlate.
Remember
If these observation-level relationships extend to the variables in the disturbance, then disturbances can correlate.
\(\implies \mathop{\text{Cov}} \left( u_i, \, u_j \middle| X_i,\,X_j \right) \textcolor{#e64173}{\neq} 0\).
Ignoring this correlation can cause big problems in your inference.
Why is ignoring this correlation problematic?
False precision: We can get “overconfident” in our knowledge.
When we treat correlated observations as independent, OLS thinks we’re learning more than we actually are.
Extreme example: If you duplicate your dataset (stack it on top of itself), plain OLS standard errors would decrease every time you duplicated the dataset.
The effect of duplicating our data on the OLS standard error of ratio.
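A small simulated demonstration of the duplication point (the data are made up): stacking an exact copy of the dataset adds no information, yet the plain OLS standard error shrinks.

```r
# Duplicating data shrinks plain OLS standard errors without adding information.
set.seed(421)
df1 = data.frame(x = runif(100))
df1$y = 1 + 2 * df1$x + rnorm(100)
df2 = rbind(df1, df1)  # stack the dataset on top of itself

se1 = summary(lm(y ~ x, data = df1))$coefficients["x", "Std. Error"]
se2 = summary(lm(y ~ x, data = df2))$coefficients["x", "Std. Error"]
se2 < se1              # TRUE: (falsely) more "precise"
```
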
Correcting our standard errors for clustering (observations’ correlation).
“Real” examples where disturbances might correlate:
Just like we calculate heteroskedasticity-robust standard errors,
we can also calculate standard errors robust to correlated disturbances.
People call these cluster-robust standard errors (or just clustered).
From fixest package:
feols(y ~ x, data = fake_data, cluster = 'cluster_var')
or even
feols(y ~ x, data = fake_data, cluster = c('cluster1', 'cluster2'))
You should think about whether your observations’ disturbances might correlate within groups.
Failure to address these issues can lead to standard errors that are too small, and thus to false precision in your inference.