Metrics review, part 2

EC421, Set 03

Prologue

R showcase

ggplot2

  • Powerful graphing and mapping package for R.
  • Idea: Build your figures layer by layer.
  • Exportable to many applications; part of the tidyverse.

shiny

Schedule

Last Time

We reviewed the fundamentals of statistics and econometrics.

Today

We review more of the main/basic results in metrics.

This week

We will post the first assignment (focused on review) soon.
First we need to finish more (of this) review.

Multiple regression

Multiple regression

More explanatory variables

We’re moving from simple linear regression
(one outcome variable and one explanatory variable)

\[ \textcolor{#e64173}{y_i} = \beta_0 + \beta_1 \textcolor{#6A5ACD}{x_i} + u_i \]

to the land of multiple linear regression
(one outcome variable and multiple explanatory variables)

\[ \textcolor{#e64173}{y_i} = \beta_0 + \beta_1 \textcolor{#6A5ACD}{x_{1i}} + \beta_2 \textcolor{#6A5ACD}{x_{2i}} + \cdots + \beta_k \textcolor{#6A5ACD}{x_{ki}} + u_i \]

Why?

We can better explain variation in \(y\), improve predictions, avoid omitted-variable bias, …

Multiple regression

\(y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i \quad\) \(x_1\) is continuous \(\quad x_2\) is categorical

Multiple regression

The intercept and categorical variable \(x_2\) control for the groups’ means.

Multiple regression

With groups’ means removed:

Multiple regression

\(\hat{\beta}_1\) estimates the relationship between \(y\) and \(x_1\) after controlling for \(x_2\).

Multiple regression

Another way to think about it: We’re estimating two (parallel) lines.

Looking at our estimator can also help.

Multiple regression

For the simple linear regression \(y_i = \beta_0 + \beta_1 x_i + u_i\)

\[ \begin{aligned} \hat{\beta}_1 &= \dfrac{\sum_i \left( x_i - \overline{x} \right) \left( y_i - \overline{y} \right)}{\sum_i \left( x_i -\overline{x} \right)^2} \\[0.3em] &= \dfrac{\sum_i \left( x_i - \overline{x} \right) \left( y_i - \overline{y} \right)/(n-1)}{\sum_i \left( x_i -\overline{x} \right)^2 / (n-1)} \\[0.3em] &= \dfrac{\mathop{\hat{\text{Cov}}}(x,\,y)}{\mathop{\hat{\text{Var}}} \left( x \right)} \end{aligned} \]

Multiple regression

Simple linear regression estimator:

\[ \hat{\beta}_1 = \dfrac{\mathop{\hat{\text{Cov}}}(x,\,y)}{\mathop{\hat{\text{Var}}} \left( x \right)} \]

Moving to multiple linear regression, the estimator changes slightly:

\[ \hat{\beta}_1 = \dfrac{\mathop{\hat{\text{Cov}}}(\textcolor{#e64173}{\tilde{x}_1},\,y)}{\mathop{\hat{\text{Var}}} \left( \textcolor{#e64173}{\tilde{x}_1} \right)} \]

where \(\textcolor{#e64173}{\tilde{x}_1}\) is the residualized \(x_1\) variable—the variation remaining in \(x_1\) after controlling for the other explanatory variables.

Multiple regression

Consider the multiple-regression model

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u \]

Residualized \(x_{1}\) (\(\textcolor{#e64173}{\tilde{x}_1}\)) comes from regressing \(x_1\) on an intercept and all other explanatory variables (then collecting the residuals), i.e.,

\[ \begin{aligned} \hat{x}_{1} &= \hat{\gamma}_0 + \hat{\gamma}_2 \, x_{2} + \hat{\gamma}_3 \, x_{3} \\ \textcolor{#e64173}{\tilde{x}_{1}} &= x_{1} - \hat{x}_{1} \end{aligned} \]

allowing us to better understand our OLS multiple-regression estimator

\[ \hat{\beta}_1 = \dfrac{\mathop{\hat{\text{Cov}}}(\textcolor{#e64173}{\tilde{x}_1},\,y)}{\mathop{\hat{\text{Var}}} \left( \textcolor{#e64173}{\tilde{x}_1} \right)} \]
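This residual-regression (Frisch–Waugh–Lovell) result is easy to check with a quick simulation; the data and coefficients below are made up purely for illustration:

```r
# Made-up data: verify that residualizing x1 recovers the multiple-regression slope
set.seed(42)
n  = 1e3
x2 = rnorm(n)
x3 = rnorm(n)
x1 = 0.5 * x2 - 0.3 * x3 + rnorm(n)  # x1 correlates with x2 and x3
y  = 1 + 2 * x1 + x2 - x3 + rnorm(n)
# Slope on x1 from the full multiple regression
b1_full = unname(coef(lm(y ~ x1 + x2 + x3))["x1"])
# Residualized x1: what's left of x1 after regressing it on the other regressors
x1_tilde = residuals(lm(x1 ~ x2 + x3))
# Cov(x1_tilde, y) / Var(x1_tilde) gives the same estimate
all.equal(b1_full, cov(x1_tilde, y) / var(x1_tilde))  # TRUE
```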

Multiple regression

Model fit

Measures of goodness of fit quantify how well a model describes/fits the data.

Common measure: \(R^2\) [R-squared] (a.k.a. coefficient of determination)

\[ R^2 = \dfrac{\sum_i (\hat{y}_i - \overline{y})^2}{\sum_i \left( y_i - \overline{y} \right)^2} = 1 - \dfrac{\sum_i \left( y_i - \hat{y}_i \right)^2}{\sum_i \left( y_i - \overline{y} \right)^2} \]

Notice our old friend SSE: \(\sum_i \left( y_i - \hat{y}_i \right)^2 = \sum_i e_i^2\).

\(R^2\) literally tells us the share of the variation in \(y\) that our current model accounts for.
Thus \(0 \leq R^2 \leq 1\).
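To make the formula concrete, here is \(R^2\) computed by hand and checked against `lm()`'s summary (simulated data):

```r
# R2 by hand vs. lm()'s summary
set.seed(101)
x = rnorm(100)
y = 1 + 2 * x + rnorm(100)
fit = lm(y ~ x)
sse = sum(residuals(fit)^2)   # sum of squared errors
tss = sum((y - mean(y))^2)    # total sum of squares
all.equal(1 - sse / tss, summary(fit)$r.squared)  # TRUE
```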

Multiple regression

The problem: As we add variables to our model, \(R^2\) mechanically increases.

To see this problem, we can simulate a dataset of 10,000 observations on \(y\) and 1,000 random \(x_k\) variables—with no relationship between \(y\) and the \(x_k\)!

Pseudo-code outline of the simulation:

  • Generate 10,000 observations on \(y\)
  • Generate 10,000 observations on variables \(x_1\) through \(x_{1000}\)
  • Regressions
    • LM1: Regress \(y\) on \(x_1\); record \(R^2\)
    • LM2: Regress \(y\) on \(x_1\) and \(x_2\); record \(R^2\)
    • LM3: Regress \(y\) on \(x_1\), \(x_2\), and \(x_3\); record \(R^2\)
    • ⋮
    • LM1000: Regress \(y\) on \(x_1\), \(x_2\), …, \(x_{1000}\); record \(R^2\)

Multiple regression

The problem: As we add variables to our model, \(R^2\) mechanically increases.

R code for the simulation:

library(parallel)  # mclapply(), detectCores()
library(magrittr)  # %>% and %$% pipes
library(dplyr)     # bind_rows()
set.seed(1234)
y = rnorm(1e4)
x = matrix(data = rnorm(1e7), nrow = 1e4)
r_df = mclapply(X = 1:1e3, mc.cores = detectCores() - 1, FUN = function(i) {
  # Regress y on the first i columns of x (lm() adds the intercept)
  tmp_reg = lm(y ~ x[, 1:i]) %>% summary()
  data.frame(
    k = i,
    r2 = tmp_reg %$% r.squared,
    r2_adj = tmp_reg %$% adj.r.squared
  )
}) %>% bind_rows()

Multiple regression

The problem: As we add variables to our model, \(\textcolor{#314f4f}{R^2}\) mechanically increases.

Multiple regression

One solution: Adjusted \(\textcolor{#e64173}{R^2}\)

Multiple regression

The problem: As we add variables to our model, \(R^2\) mechanically increases.

One solution: Penalize for the number of variables, e.g., adjusted \(R^2\):

\[ \overline{R}^2 = 1 - \dfrac{\sum_i \left( y_i - \hat{y}_i \right)^2/(n-k-1)}{\sum_i \left( y_i - \overline{y} \right)^2/(n-1)} \]

Note: Adjusted \(R^2\) need not be between 0 and 1.
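We can verify the adjusted-\(R^2\) formula against `lm()`'s summary with simulated data (\(n\) and \(k\) chosen arbitrarily):

```r
# Adjusted R2 by hand vs. lm()'s summary
set.seed(202)
n = 100; k = 3
X = matrix(rnorm(n * k), nrow = n)
y = drop(1 + X %*% c(1, 0.5, 0) + rnorm(n))
fit = lm(y ~ X)
sse = sum(residuals(fit)^2)
tss = sum((y - mean(y))^2)
r2_adj = 1 - (sse / (n - k - 1)) / (tss / (n - 1))
all.equal(r2_adj, summary(fit)$adj.r.squared)  # TRUE
```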

Multiple regression

Tradeoffs

There are tradeoffs to remember as we add/remove variables:

Fewer variables

  • generally explain less variation in \(y\),
  • provide simple interpretations and visualizations (parsimonious),
  • may need to worry about omitted-variable bias (OVB).

More variables

  • more likely to find spurious relationships (statistically significant due to chance—does not reflect a true, population-level relationship),
  • more difficult to interpret the model,
  • may still miss important variables—still OVB.

Omitted-variable bias

Omitted-variable bias

We’ll go deeper into this issue in a few weeks, but as a refresher:

Omitted-variable bias (OVB) arises when we omit a variable that

  1. affects our outcome variable \(y\)

  2. correlates with an explanatory variable \(x_j\)

As its name suggests, this situation leads to bias in our estimate of \(\beta_j\).

Note: OVB is not exclusive to multiple linear regression, but it does require that multiple variables affect \(y\).

Omitted-variable bias

Example

Let’s imagine a simple model of the returns to schooling \[ \text{Pay}_i = \beta_0 + \beta_1 \text{School}_i + \beta_2 \text{Male}_i + u_i \] where \(\text{School}_i\) gives \(i\)’s years of schooling; \(\text{Male}_i\) represents an indicator variable for whether individual \(i\) is male.

Thus

  • \(\beta_1\): the returns to an additional year of schooling (ceteris paribus)
  • \(\beta_2\): the “premium” for being male (ceteris paribus)

If \(\beta_2 > 0\), then males are favored in the labor market
(discrimination, all else equal).

Omitted-variable bias

Example, continued

From our population model

\[ \text{Pay}_i = \beta_0 + \beta_1 \text{School}_i + \beta_2 \text{Male}_i + u_i \]

If a study focuses on the relationship between pay and schooling, i.e., \[ \text{Pay}_i = \beta_0 + \beta_1 \text{School}_i + \left(\beta_2 \text{Male}_i + u_i\right) \] \[ \text{Pay}_i = \beta_0 + \beta_1 \text{School}_i + \varepsilon_i \] the “disturbance” becomes \(\varepsilon_i = \beta_2 \text{Male}_i + u_i\).

OLS needs exogeneity to be unbiased. Exogeneity is likely violated here.

But even if \(\mathop{\boldsymbol{E}}\left[ u | X \right] = 0\), it is not true that \(\mathop{\boldsymbol{E}}\left[ \varepsilon | X \right] = 0\) so long as \(\beta_2 \neq 0\).

Unless \(\text{School}\) and \(\text{Male}\) are unrelated, OLS is biased.

Omitted-variable bias

Example, continued

Let’s try to see this result graphically.

Population model:

\[ \text{Pay}_i = 20 + 0.5 \times \text{School}_i + 10 \times \text{Male}_i + u_i \]

Our regression model that suffers from omitted-variable bias:

\[ \text{Pay}_i = \hat{\beta}_0 + \hat{\beta}_1 \times \text{School}_i + e_i \]

Finally, imagine that women, on average, receive more schooling than men.

Omitted-variable bias

Example, continued: \(\text{Pay}_i = 20 + 0.5 \times \text{School}_i + 10 \times \text{Male}_i + u_i\)

The relationship between pay and schooling.

Omitted-variable bias

Example, continued: \(\text{Pay}_i = 20 + 0.5 \times \text{School}_i + 10 \times \text{Male}_i + u_i\)

Biased regression estimate: \(\widehat{\text{Pay}}_i = 31.3 - 0.9 \times \text{School}_i\)

Omitted-variable bias

Example, continued: \(\text{Pay}_i = 20 + 0.5 \times \text{School}_i + 10 \times \text{Male}_i + u_i\)

Recalling the omitted variable: Gender (female and male)

Omitted-variable bias

Example, continued: \(\text{Pay}_i = 20 + 0.5 \times \text{School}_i + 10 \times \text{Male}_i + u_i\)

Unbiased regression estimate: \(\widehat{\text{Pay}}_i = 20.9 + 0.4 \times \text{School}_i + 9.1 \times \text{Male}_i\)

Omitted-variable bias

Solutions

  1. Don’t omit variables

  2. Instrumental variables and two-stage least squares

Warning: There are situations in which neither solution is possible.

  1. Proceed with caution (sometimes you can sign the bias).

  2. Maybe just stop.

Interpreting coefficients

Interpreting coefficients

Continuous variables

Consider the relationship

\[ \text{Pay}_i = \beta_0 + \beta_1 \, \text{School}_i + u_i \]

where

  • \(\text{Pay}_i\) is a continuous variable measuring an individual’s pay
  • \(\text{School}_i\) is a continuous variable that measures years of education

Interpretations

  • \(\beta_0\): the \(y\)-intercept, i.e., \(\text{Pay}\) when \(\text{School} = 0\)
  • \(\beta_1\): the expected increase in \(\text{Pay}\) for a one-unit increase in \(\text{School}\)

Interpreting coefficients

Continuous variables

Deriving the slope’s interpretation:

\[ \begin{aligned} \mathop{\boldsymbol{E}}\left[ \text{Pay} | \text{School} = \ell + 1 \right] - \mathop{\boldsymbol{E}}\left[ \text{Pay} | \text{School} = \ell \right] &= \\[0.5em] \mathop{\boldsymbol{E}}\left[ \beta_0 + \beta_1 (\ell + 1) + u \right] - \mathop{\boldsymbol{E}}\left[ \beta_0 + \beta_1 \ell + u \right] &= \\[0.5em] \left[ \beta_0 + \beta_1 (\ell + 1) \right] - \left[ \beta_0 + \beta_1 \ell \right] &= \\[0.5em] \beta_0 - \beta_0 + \beta_1 \ell - \beta_1 \ell + \beta_1 &= \beta_1 \end{aligned} \]

I.e., the slope gives the expected increase in our outcome variable for a one-unit increase in the explanatory variable.

Interpreting coefficients

Continuous variables

If we have multiple explanatory variables, e.g.,

\[ \text{Pay}_i = \beta_0 + \beta_1 \, \text{School}_i + \beta_2 \, \text{Ability}_i + u_i \]

then the interpretation changes slightly.

\[ \begin{aligned} \mathop{\boldsymbol{E}}\left[ \text{Pay} | \text{School} = \ell + 1 \land \text{Ability} = \alpha \right] - & \\ \mathop{\boldsymbol{E}}\left[ \text{Pay} | \text{School} = \ell \land \text{Ability} = \alpha \right] &= \\ \mathop{\boldsymbol{E}}\left[ \beta_0 + \beta_1 (\ell + 1) + \beta_2 \alpha + u \right] - \mathop{\boldsymbol{E}}\left[ \beta_0 + \beta_1 \ell + \beta_2 \alpha + u \right] &= \\ \left[ \beta_0 + \beta_1 (\ell + 1) + \beta_2 \alpha \right] - \left[ \beta_0 + \beta_1 \ell + \beta_2 \alpha \right] &= \\ \beta_0 - \beta_0 + \beta_1 \ell - \beta_1 \ell + \beta_1 + \beta_2 \alpha - \beta_2 \alpha &= \beta_1 \end{aligned} \]

I.e., the slope gives the expected increase in our outcome variable for a one-unit increase in the explanatory variable, holding all other variables constant (ceteris paribus).

Interpreting coefficients

Continuous variables

Alternative derivation

Consider the model

\[ y = \beta_0 + \beta_1 \, x + u \]

Differentiate the model:

\[ \dfrac{dy}{dx} = \beta_1 \]

Interpreting coefficients

Categorical variables

Consider the relationship

\[ \text{Pay}_i = \beta_0 + \beta_1 \, \text{Female}_i + u_i \]

where

  • \(\text{Pay}_i\) is a continuous variable measuring an individual’s pay
  • \(\text{Female}_i\) is a binary/indicator variable taking \(1\) when \(i\) is female

Interpretations

  • \(\beta_0\): the expected \(\text{Pay}\) for non-females (i.e., when \(\text{Female} = 0\))
  • \(\beta_1\): the expected difference in \(\text{Pay}\) between females and non-females
  • \(\beta_0 + \beta_1\): the expected \(\text{Pay}\) for females

Interpreting coefficients

Categorical variables

Derivations

\[ \begin{aligned} \mathop{\boldsymbol{E}}\left[ \text{Pay} | \text{Non-female} \right] &= \mathop{\boldsymbol{E}}\left[ \beta_0 + \beta_1\times 0 + u_i \right] \\ &= \mathop{\boldsymbol{E}}\left[ \beta_0 + 0 + u_i \right] \\ &= \beta_0 \end{aligned} \]

\[ \begin{aligned} \mathop{\boldsymbol{E}}\left[ \text{Pay} | \text{Female} \right] &= \mathop{\boldsymbol{E}}\left[ \beta_0 + \beta_1\times 1 + u_i \right] \\ &= \mathop{\boldsymbol{E}}\left[ \beta_0 + \beta_1 + u_i \right] \\ &= \beta_0 + \beta_1 \end{aligned} \]

Note: If there are no other variables to condition on, then \(\hat{\beta}_1\) equals the difference in group means, e.g., \(\overline{\text{Pay}}_\text{Female} - \overline{\text{Pay}}_\text{Non-female}\).

Note 2: The holding-all-other-variables-constant interpretation also applies to categorical variables in multiple-regression settings.
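The group-means equivalence is easy to confirm with made-up data:

```r
# Made-up data: the OLS slope on a 0/1 variable equals the difference in group means
set.seed(303)
female = rep(c(0, 1), each = 50)
pay = 30 + 5 * female + rnorm(100)
b1 = unname(coef(lm(pay ~ female))["female"])
all.equal(b1, mean(pay[female == 1]) - mean(pay[female == 0]))  # TRUE
```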

Interpreting coefficients

Categorical variables

\(y_i = \beta_0 + \beta_1 x_i + u_i\) for binary variable \(x_i = \{\textcolor{#314f4f}{0}, \, \textcolor{#e64173}{1}\}\)


Interpreting coefficients

Interactions

Interactions allow the effect of one variable to change based upon the level of another variable.

Examples

  1. Does the effect of schooling on pay change by gender?

  2. Does the effect of gender on pay change by race?

  3. Does the effect of schooling on pay change by experience?

Interpreting coefficients

Interactions

Previously, we considered a model that allowed women and men to have different wages, but the model assumed the effect of school on pay was the same for everyone:

\[ \text{Pay}_i = \beta_0 + \beta_1 \, \text{School}_i + \beta_2 \, \text{Female}_i + u_i \]

but we can also allow the effect of school to vary by gender:

\[ \text{Pay}_i = \beta_0 + \beta_1 \, \text{School}_i + \beta_2 \, \text{Female}_i + \beta_3 \, \text{School}_i\times\text{Female}_i + u_i \]

Interpreting coefficients

Interactions

The model where schooling has the same effect for everyone (F and M):

Interpreting coefficients

Interactions

The model where schooling’s effect can differ by gender (F and M):

Interpreting coefficients

Interactions

Interpreting coefficients can be a little tricky with interactions, but the key is to carefully work through the math.

\[ \text{Pay}_i = \beta_0 + \beta_1 \, \text{School}_i + \beta_2 \, \text{Female}_i + \beta_3 \, \text{School}_i\times\text{Female}_i + u_i \]

Expected returns for an additional year of schooling for women:

\[ \begin{aligned} \mathop{\boldsymbol{E}}\left[ \text{Pay}_i | \text{Female} \land \text{School} = \ell + 1 \right] - \mathop{\boldsymbol{E}}\left[ \text{Pay}_i | \text{Female} \land \text{School} = \ell \right] &= \\[.5em] \mathop{\boldsymbol{E}}\left[ \beta_0 + \beta_1 (\ell+1) + \beta_2 + \beta_3 (\ell + 1) + u_i \right] - \mathop{\boldsymbol{E}}\left[ \beta_0 + \beta_1 \ell + \beta_2 + \beta_3 \ell + u_i \right] &= \\[.5em] \beta_1 + \beta_3 \end{aligned} \]

Expected returns for an additional year of schooling for women: \(\beta_1 + \beta_3\)

  • \(\beta_1\): the expected return for an add. yr. of schooling for non-females;

  • \(\beta_3\): the difference in the returns to schooling for females vs. non-females.

Interpreting coefficients

Interactions

The previous slides focused on interactions where one variable was binary.

If both variables are continuous, then the interpretation is slightly trickier.

Key: Interactions simply mean that the effect of one variable depends on the level of another variable.

Interpreting coefficients

Interactions

Suppose we’re interested in the model

\[ \text{Pay}_i = \beta_0 + \beta_1 \, \text{School}_i + \beta_2 \, \text{Experience}_i + \beta_3 \, \text{School}_i\times\text{Experience}_i + u_i \]

where \(\text{School}_i\) and \(\text{Experience}_i\) are both continuous variables (in years).

How do we interpret the interaction here?

School’s effect on pay now depends on the level of experience.

Interpretation: Consider the partial derivative:

\[ \dfrac{\partial\text{Pay}_i}{\partial\text{School}_i} = \beta_1 + \beta_3 \text{Experience}_i \]

Interpreting coefficients

Interactions

In the model

\[ \text{Pay}_i = \beta_0 + \beta_1 \, \text{School}_i + \beta_2 \, \text{Experience}_i + \beta_3 \, \text{School}_i\times\text{Experience}_i + u_i \]

all else equal, an additional year of school changes pay by

\[ \beta_1 + \beta_3 \text{Experience} \]

Interpreting coefficients

Polynomials

Polynomials are just interactions: they interact a variable with itself.

\[ \text{Pay}_i = \beta_0 + \beta_1 \, \text{School}_i + \beta_2 \, \text{School}_i^2 + u_i \]

Here the effect of schooling depends on an individual’s level of schooling.

Interpretation: Back to the partial derivative:

\[ \dfrac{\partial\text{Pay}_i}{\partial\text{School}_i} = \beta_1 + 2 \beta_2 \text{School}_i \]

all else equal, an additional year of school changes pay by

\[ \beta_1 + 2 \beta_2 \text{School}_i \]
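With simulated data (hypothetical coefficients), we can evaluate this marginal effect at different levels of schooling:

```r
# Simulated data: the marginal effect of schooling varies with schooling itself
set.seed(505)
school = runif(500, min = 8, max = 20)
pay = 20 + 3 * school - 0.08 * school^2 + rnorm(500)
fit = lm(pay ~ school + I(school^2))
b1 = unname(coef(fit)["school"])
b2 = unname(coef(fit)["I(school^2)"])
# Estimated effect of one more year, evaluated at 10 and 18 years of schooling
b1 + 2 * b2 * c(10, 18)
```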

Interpreting coefficients

Binary outcomes

When your outcome variable is binary, the interpretation changes slightly.

Recall: The avg. of a binary variable gives the % of observations with a ‘1’.

Example: Avg(0, 0, 0, 1, 1) = 0.40 \(\implies\) 40% of observations = 1.

If your outcome is binary, then you are modeling the probability (percent) that the outcome equals one.

\[ \text{Employed}_i = \beta_0 + \beta_1 \text{School}_i + u_i \]

Interpretation: \(\beta_1\) is the effect of one additional year of schooling on the probability an individual is employed (all else equal).

Interpreting coefficients

Log-linear specification

In economics, you will frequently see logged outcome variables with linear (non-logged) explanatory variables, e.g.,

\[ \log(\text{Pay}_i) = \beta_0 + \beta_1 \, \text{School}_i + u_i \]

This specification changes our interpretation of the slope coefficients.

Interpretation

  • A one-unit increase in our explanatory variable changes the outcome variable by approximately \(100 \times \beta_1\) percent.

  • Example: An additional year of schooling increases pay by approximately 3 percent (for \(\beta_1 = 0.03\)).

Interpreting coefficients

Log-linear specification

Derivation

Consider the log-linear model

\[ \log(y) = \beta_0 + \beta_1 \, x + u \]

and differentiate

\[ \dfrac{dy}{y} = \beta_1 dx \]

So a marginal change in \(x\) (i.e., \(dx\)) leads to a \(\beta_1 dx\) percentage change in \(y\).

Interpreting coefficients

Log-linear specification

Because the log-linear specification comes with a different interpretation, you need to make sure it fits your data-generating process/model.

Does \(x\) change \(y\) in levels (e.g., a 3-unit increase) or percentages (e.g., a 10-percent increase)?

I.e., you need to be sure an exponential relationship makes sense:

\[ \log(y_i) = \beta_0 + \beta_1 \, x_i + u_i \iff y_i = e^{\beta_0 + \beta_1 x_i + u_i} \]

Interpreting coefficients

Log-linear specification

Interpreting coefficients

Log-log specification

Similarly, econometricians frequently employ log-log models, in which the outcome variable and at least one explanatory variable are logged, e.g.,

\[ \log(\text{Pay}_i) = \beta_0 + \beta_1 \, \log(\text{School}_i) + u_i \]

Interpretation:

  • A one-percent increase in \(x\) will lead to a \(\beta_1\) percent change in \(y\).
  • Often interpreted as an elasticity.

Interpreting coefficients

Log-log specification

Derivation

Consider the log-log model

\[ \log(y) = \beta_0 + \beta_1 \, \log(x) + u \]

and differentiate

\[ \dfrac{dy}{y} = \beta_1 \dfrac{dx}{x} \]

which says that for a one-percent increase in \(x\), we will see a \(\beta_1\) percent increase in \(y\). As an elasticity:

\[ \dfrac{dy}{dx} \dfrac{x}{y} = \beta_1 \]

Interpreting coefficients

Log-linear with a binary variable

Note: If you have a log-linear model with a binary indicator variable, the interpretation for the coefficient on that variable changes.

Consider

\[ \log(y_i) = \beta_0 + \beta_1 x_1 + u_i \]

for binary variable \(x_1\).

The interpretation of \(\beta_1\) is now

  • When \(x_1\) changes from 0 to 1, \(y\) will change by \(100 \times \left( e^{\beta_1} -1 \right)\) percent.
  • When \(x_1\) changes from 1 to 0, \(y\) will change by \(100 \times \left( e^{-\beta_1} -1 \right)\) percent.
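Both bullets follow from exponentiating the model; e.g., for the 0-to-1 change (holding \(u\) fixed):

\[ \dfrac{y \mid x_1 = 1}{y \mid x_1 = 0} = \dfrac{e^{\beta_0 + \beta_1 + u}}{e^{\beta_0 + u}} = e^{\beta_1} \quad\implies\quad \%\Delta y = 100 \times \left( e^{\beta_1} - 1 \right) \]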

Interpreting coefficients

Log-log specification

Additional topics

Additional topics

Inference vs. prediction

So far, we’ve focused mainly on statistical (causal) inference—using estimators and their distributional properties to learn about underlying, unknown population parameters.

\[ y_i = \textcolor{#e64173}{\hat{\beta}_0} + \textcolor{#e64173}{\hat{\beta}_1} \, x_{1i} + \textcolor{#e64173}{\hat{\beta}_2} \, x_{2i} + \cdots + \textcolor{#e64173}{\hat{\beta}_k} \, x_{ki} + e_i \]

Prediction includes a fairly different set of topics/tools within econometrics (and data science/machine learning)—creating models that accurately estimate individual observations.

\[ \textcolor{#e64173}{\hat{y}_i} = \mathop{\hat{f}}\left( x_1,\, x_2,\, \ldots x_k \right) \]

Additional topics

Inference vs. prediction

Succinctly

  • Inference: causality, \(\hat{\beta}_k\) (consistent and efficient), standard errors/hypothesis tests for \(\hat{\beta}_k\), generally OLS

  • Prediction: correlation, \(\hat{y}_i\) (low error), model selection, nonlinear models are much more common

Additional topics

Treatment effects and causality

Much of modern (micro)econometrics focuses on causally estimating (identifying) the effects of programs/policies.

In this literature, the program is often a binary variable, and we place high importance on finding an unbiased estimate for the program’s effect.

Additional topics

Transformations

Our linearity assumption requires

  1. parameters enter linearly (i.e., the \(\beta_k\) multiplied by variables)
  2. the \(u_i\) disturbances enter additively

We allow nonlinear relationships between \(y\) and the explanatory variables.

Examples

  • Polynomials and interactions: \(y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_2^2 + \beta_5 \left( x_1 x_2 \right) + u_i\)

  • Exponentials and logs: \(\log(y_i) = \beta_0 + \beta_1 x_1 + \beta_2 e^{x_2} + u_i\)

  • Indicators and thresholds: \(y_i = \beta_0 + \beta_1 x_1 + \beta_2 \, \mathbb{I}(x_1 \geq 100) + u_i\)

Additional topics

Transformation challenge: (literally) infinite possibilities. What do we pick?

Additional topics

\(y_i = \beta_0 + u_i\)

Additional topics

\(y_i = \beta_0 + \beta_1 x + u_i\)

Additional topics

\(y_i = \beta_0 + \beta_1 x + \beta_2 x^2 + u_i\)

Additional topics

\(y_i = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + u_i\)

Additional topics

\(y_i = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 x^4 + u_i\)

Additional topics

\(y_i = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 x^4 + \beta_5 x^5 + u_i\)

Additional topics

Truth: \(y_i = 2 e^{x} + u_i\)

Additional topics

Outliers

Because OLS minimizes the sum of the squared errors, outliers can play a large role in our estimates.

Additional topics

Outliers

Because OLS minimizes the sum of the squared errors, outliers can play a large role in our estimates.

Common responses

  • remove the outliers from the dataset;

  • related: leave-one-out regression to identify influential observations;

  • replace outliers with the 99th percentile of their variable (winsorize);

  • take the log of the variable to “take care of” outliers.

Another option
Do nothing. Outliers are not always bad. Some people are “far” from the average. It may not make sense to try to change this variation.
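A minimal winsorizing sketch, using hypothetical data and the 99th-percentile cap mentioned above:

```r
# Hypothetical data with one extreme outlier
set.seed(404)
pay = c(rnorm(99, mean = 50, sd = 5), 500)
# Winsorize: replace values above the 99th percentile with that percentile
cap = quantile(pay, probs = 0.99)
pay_w = pmin(pay, cap)
c(before = max(pay), after = max(pay_w))
```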

Additional topics

Missing data

Similarly, missing data can affect your results.

R doesn’t know how to deal with a missing observation.

#> [1] NA

If you run a regression with missing values, R drops the observations missing those values.

If observations are missing in a nonrandom way, then dropping them can leave you with a nonrandom sample—even if the original sample was random.
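A small illustration of both behaviors (hypothetical values):

```r
x = c(1, 2, NA, 4)
mean(x)                # NA: missing values propagate by default
mean(x, na.rm = TRUE)  # drop the NA before averaging
# lm() silently drops incomplete rows:
fit = lm(y ~ x, data = data.frame(y = c(1, 2, 3, 4), x = x))
nobs(fit)              # 3 of the 4 observations were used
```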

Wrapping up

Wrapping up

We’ve refreshed the main ingredients for OLS regression:

  • OLS estimates unknown population parameters using a random sample;
    uncertainty is unavoidable.
  • standard errors, conf. intervals, and hyp. tests help us infer from sample to population (while accounting for sampling variability/uncertainty).
  • omitted variables bias \(\beta\) estimates when they affect \(y\) and correlate with regressors.
  • coef. interpretation depends on the variable type & model spec.

Wrapping up

Moving forward…

So far, the big message has been

  • OLS is powerful when we satisfy its assumptions.
  • Different assumptions protect different results
    • exogeneity provides unbiasedness,
    • the disturbance’s variance/covariance affects efficiency and inference.
  • Residuals \(e_i\) help us learn about unobserved disturbances \(u_i\).

Next What happens when we violate \(u_i\) var. or cov. assumptions?

  • Is OLS biased? What happens to efficiency and inference?
  • What can we do about it?