class: center, middle, inverse, title-slide

# Classical Assumptions
## EC 320: Introduction to Econometrics
### Kyle Raze
### Fall 2019

---
class: inverse, middle

# Prologue

---
# Housekeeping

Lab attendance

Problem Set 2 grades

Problem Set 3

- Due this Wednesday by 17:00.
- For computational problems, submit an HTML document generated by R Markdown.

---
# Agenda

## Last Week

How does OLS estimate a regression line?

- Minimize RSS.

What are the direct consequences of minimizing RSS?

- Residuals sum to zero.
- Residuals and the explanatory variable `\(X\)` are uncorrelated.
- Mean values of `\(X\)` and `\(Y\)` are on the fitted regression line.

What do we mean by *goodness of fit*?

- What information does `\(R^2\)` convey?

---
# Agenda

## Today

Under what conditions is OLS *desirable*?

- **Desired properties:** Unbiasedness, efficiency, and the ability to conduct hypothesis tests.
- **Cost:** Six .hi-green[classical assumptions] about the population relationship and the sample.

---
# Returns to Schooling

__Policy Question:__ How much should the state subsidize higher education?

- Could higher education subsidies increase future tax revenue?
- Could targeted subsidies reduce income inequality and racial wealth gaps?
- Are there positive externalities associated with higher education?

--

__Empirical Question:__ What is the monetary return to an additional year of education?

- Focuses on the private benefits of education. Not the only important question!
- Useful for learning about the econometric assumptions that allow causal interpretation.

---
# Returns to Schooling

**Step 1:** Write down the population model.

`$$\log(\text{Earnings}_i) = \beta_1 + \beta_2\text{Education}_i + u_i$$`

--

**Step 2:** Find data.

- *Source:* Blackburn and Neumark (1992).

--

**Step 3:** Run a regression using OLS.

`$$\widehat{\log(\text{Earnings}_i)} = \hat{\beta}_1 + \hat{\beta}_2\text{Education}_i$$`

---
# Returns to Schooling

`\(\widehat{\log(\text{Earnings}_i)}\)` `\(=\)` .hi-purple[5.97] `\(+\)` .hi-purple[0.06] `\(\times\)` `\(\text{Education}_i\)`.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" />

---
# Returns to Schooling

An additional year of school is associated with a .hi-purple[6%] increase in earnings.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

---
# Returns to Schooling

`\(R^2\)` `\(=\)` .hi-purple[0.097].

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

---
# Returns to Schooling

Education explains .hi-purple[9.7%] of the variation in log earnings.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />

---
# Returns to Schooling

What must we __assume__ to interpret `\(\hat{\beta}_2\)` `\(=\)` .hi-purple[0.06] as the return to schooling?

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />

---
# Residuals *vs.* Errors

The most important assumptions concern the error term `\(u_i\)`.

--

**Important:** An error `\(u_i\)` and a residual `\(\hat{u}_i\)` are related, but different.

--

- .hi-green[Error:] Difference between the wage of a worker with 16 years of education and the .hi-green[expected wage] of workers with 16 years of education.

--

- .hi-purple[Residual:] Difference between the wage of a worker with 16 years of education and the .hi-purple[average wage] of workers in the sample with 16 years of education.

--

- .hi-green[Population] ***vs.*** .hi-purple[sample]**.**
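---
# Returns to Schooling in R

A minimal sketch of how one might estimate the regression from the previous slides in R. The `wage2` data from the **wooldridge** package are commonly cited as drawing on the Blackburn and Neumark (1992) sample; the package, dataset, and variable names here are assumptions for illustration, not part of the original slides.

```r
# Sketch: estimate the return to schooling with OLS
# (assumes wooldridge::wage2, where lwage = log(monthly earnings), educ = years of schooling)
library(wooldridge)

fit <- lm(lwage ~ educ, data = wage2)

coef(fit)                 # intercept and slope (the estimated return to schooling)
summary(fit)$r.squared    # share of variation in log earnings explained by education

head(resid(fit))          # residuals: each worker's log earnings minus the fitted value
```

The residuals above live in the sample; the errors `\(u_i\)` in the population model are never observed.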
---
# Residuals *vs.* Errors

A .hi[residual] tells us how a .hi-slate[worker]'s wages compare to the average wages of workers in the .hi-purple[sample] with the same level of education.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

---
# Residuals *vs.* Errors

A .hi[residual] tells us how a .hi-slate[worker]'s wages compare to the average wages of workers in the .hi-purple[sample] with the same level of education.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" />

---
# Residuals *vs.* Errors

An .hi-orange[error] tells us how a .hi-slate[worker]'s wages compare to the expected wages of workers in the .hi-green[population] with the same level of education.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" />

---
class: inverse, middle

# Classical Assumptions

---
# Classical Assumptions of OLS

1. **Linearity:** The population relationship is .hi[linear in parameters] with an additive error term.

--

2. **Sample Variation:** There is variation in `\(X\)`.

--

3. **Random Sampling:** We have a random sample from the population of interest.

--

4. **Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).

--

5. **Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`).

--

6. **Normality:** The population error term is normally distributed with mean zero and variance `\(\sigma^2\)` (*i.e.,* `\(u \sim N(0,\sigma^2)\)`).

---
class: inverse, middle

# When Can We Trust OLS?

---
# Bias

An estimator is __biased__ if its expected value is different from the true population parameter.

.pull-left[

**Unbiased estimator:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] = \beta\)`

<img src="09-Classical_Assumptions_files/figure-html/unbiased pdf-1.svg" style="display: block; margin: auto;" />

]

--

.pull-right[

**Biased estimator:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] \neq \beta\)`

<img src="09-Classical_Assumptions_files/figure-html/biased pdf-1.svg" style="display: block; margin: auto;" />

]

---
# When is OLS Unbiased?

## Assumptions

1. **Linearity:** The population relationship is .hi[linear in parameters] with an additive error term.

2. **Sample Variation:** There is variation in `\(X\)`.

3. **Random Sampling:** We have a random sample from the population of interest.

4. **Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).

--

## Result

OLS is unbiased.

---
# Linearity

## Assumption

The population relationship is __linear in parameters__ with an additive error term.

--

## Examples

- `\(\text{Wage}_i = \beta_1 + \beta_2 \text{Experience}_i + u_i\)`

--

- `\(\log(\text{Happiness}_i) = \beta_1 + \beta_2 \log(\text{Money}_i) + u_i\)`

--

- `\(\sqrt{\text{Convictions}_i} = \beta_1 + \beta_2 (\text{Early Childhood Lead Exposure})_i + u_i\)`

--

- `\(\log(\text{Earnings}_i) = \beta_1 + \beta_2 \text{Education}_i + u_i\)`
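---
# Linearity

A short sketch of the point above: models that are nonlinear in the *variables* but linear in the *parameters* can still be estimated with `lm()`. The data below are simulated purely for illustration; the coefficient values are made up.

```r
# Hypothetical data: nonlinear in variables, linear in parameters
set.seed(320)
educ     <- sample(8:20, 100, replace = TRUE)
earnings <- exp(5 + 0.06 * educ + rnorm(100, sd = 0.4))

lm(log(earnings) ~ educ)        # log-level model: still linear in the betas
lm(log(earnings) ~ log(educ))   # log-log model: also linear in the betas
```

The violations on the next slide cannot be rewritten as linear in the parameters with an additive error, so plain `lm()` no longer applies.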
---
# Linearity

## Assumption

The population relationship is __linear in parameters__ with an additive error term.

## Violations

- `\(\text{Wage}_i = (\beta_1 + \beta_2 \text{Experience}_i)u_i\)`

--

- `\(\text{Consumption}_i = \frac{1}{\beta_1 + \beta_2 \text{Income}_i} + u_i\)`

--

- `\(\text{Population}_i = \frac{\beta_1}{1 + e^{\beta_2 + \beta_3 \text{Food}_i}} + u_i\)`

--

- `\(\text{Batting Average}_i = \beta_1 (\text{Wheaties Consumption})_i^{\beta_2} + u_i\)`

---
# Sample Variation

## Assumption

There is variation in `\(X\)`.

## Example

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" />

---
# Sample Variation

## Assumption

There is variation in `\(X\)`.

## Violation

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" />

---
# Random Sampling

## Assumption

We have a random sample from the population of interest.

## Examples

Random sampling generates many cross-sectional datasets (especially surveys).

- Government surveys (*e.g.,* Current Population Survey, American Community Survey).
- Scientific surveys (*e.g.,* General Social Survey, American National Election Study).
- High-quality political polls (*e.g.,* YouGov, Quinnipiac University, Gallup).

---
# Random Sampling

## Assumption

We have a random sample from the population of interest.

## Violations

- Data collected from non-probability sampling (*e.g.,* snowball sampling).
- Most (all?) time-series data.
- Self-selected samples.

---
# Exogeneity

## Assumption

The `\(X\)` variable is __exogenous:__ `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`.

- For _any_ value of `\(X\)`, the mean of the error term is zero.

.hi[The most important assumption!]

--

Really two assumptions bundled into one:

1. On average, the error term is zero: `\(\mathop{\mathbb{E}}\left(u\right) = 0\)`.
2. The mean of the error term is the same for each value of `\(X\)`: `\(\mathop{\mathbb{E}}\left( u|X \right) = \mathop{\mathbb{E}}\left(u\right)\)`.

---
# Exogeneity

## Assumption

The `\(X\)` variable is __exogenous:__ `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`.

- The assignment of `\(X\)` is effectively random.
- **Implication:** .hi-purple[no selection bias] and .hi-green[no omitted-variable bias].

--

## Examples

In the labor market, an important component of `\(u\)` is unobserved ability.

- `\(\mathop{\mathbb{E}}\left( u|\text{Education} = 12 \right) = 0\)` and `\(\mathop{\mathbb{E}}\left( u|\text{Education} = 20 \right) = 0\)`.
- `\(\mathop{\mathbb{E}}\left( u|\text{Experience} = 0 \right) = 0\)` and `\(\mathop{\mathbb{E}}\left( u|\text{Experience} = 40 \right) = 0\)`.
- **Do you believe this?**

---
layout: false
class: white-slide, middle

Graphically...

---
exclude: true

---
class: white-slide

Valid exogeneity, _i.e._, `\(\mathop{\mathbb{E}}\left( u \mid X \right) = 0\)`

<img src="09-Classical_Assumptions_files/figure-html/ex_good_exog-1.svg" style="display: block; margin: auto;" />

---
class: white-slide

Invalid exogeneity, _i.e._, `\(\mathop{\mathbb{E}}\left( u \mid X \right) \neq 0\)`

<img src="09-Classical_Assumptions_files/figure-html/ex_bad_exog-1.svg" style="display: block; margin: auto;" />

---
class: inverse, middle

# Variance Matters, Too

---
# Why Variance Matters

Unbiasedness tells us that OLS gets it right, _on average_.

- But we can't tell whether our sample is "typical."

--

**Variance** tells us how far OLS estimates can deviate from their expected value.

- How tightly is OLS centered on its expected value?

--

The smaller the variance, the closer OLS estimates tend to be to the true population parameters in any given sample.

- Given two unbiased estimators, we want the one with the smaller variance.
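---
# Why Variance Matters

A small simulation sketch of the last point, using made-up parameter values: both estimators below are unbiased for the slope, but OLS has a much smaller sampling variance.

```r
# Two unbiased slope estimators, different variances (illustrative values)
set.seed(320)
x <- runif(50, 0, 10)            # hold X fixed across repeated samples

one_draw <- function() {
  y     <- 2 + 0.5 * x + rnorm(50)
  ols   <- unname(coef(lm(y ~ x))[2])              # OLS slope
  naive <- (y[which.max(x)] - y[which.min(x)]) /
           (max(x) - min(x))                       # slope through two points
  c(ols = ols, naive = naive)
}

draws <- replicate(5000, one_draw())
rowMeans(draws)       # both centered near the true slope of 0.5
apply(draws, 1, sd)   # OLS has the smaller spread
```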
---
# OLS Variance

To calculate the variance of OLS, we need:

1. The same four assumptions we made for unbiasedness.

2. __Homoskedasticity.__

---
# Homoskedasticity

## Assumption

The error term has the same variance for each value of the independent variable:

`$$\mathop{\text{Var}}(u|X) = \sigma^2.$$`

## Example

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" />

---
# Homoskedasticity

## Assumption

The error term has the same variance for each value of the independent variable:

`$$\mathop{\text{Var}}(u|X) = \sigma^2.$$`

## Violation: Heteroskedasticity

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" />

---
count: false
# Homoskedasticity

## Assumption

The error term has the same variance for each value of the independent variable:

`$$\mathop{\text{Var}}(u|X) = \sigma^2.$$`

## Violation: Heteroskedasticity

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" />

---
# OLS Variance

Variance of the slope estimator:

`$$\mathop{\text{Var}}(\hat{\beta}_2) = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}.$$`

- As the error variance increases, the variance of the slope estimator increases.
- As the variation in `\(X\)` increases, the variance of the slope estimator decreases.
- Larger samples have more total variation in `\(X\)` `\(\implies\)` `\(\mathop{\text{Var}}(\hat{\beta}_2)\)` falls as `\(n\)` rises.

---
class: inverse, middle

# Gauss-Markov

---
# Gauss-Markov Theorem

OLS is the __Best Linear Unbiased Estimator (BLUE)__ when:

1. **Linearity:** The population relationship is .hi[linear in parameters] with an additive error term.

2. **Sample Variation:** There is variation in `\(X\)`.

3. **Random Sampling:** We have a random sample from the population of interest.

4. **Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).

5. **Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`).

---
class: middle

# Gauss-Markov Theorem

OLS is the __Best Linear Unbiased Estimator (BLUE)__.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" />
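---
# OLS Variance

A quick check of the variance formula in R, on simulated homoskedastic data; the numbers and variable names are illustrative, not from the slides.

```r
# Compare the formula for Var(beta_2 hat) with lm()'s reported standard error
set.seed(320)
x <- rnorm(200, mean = 10, sd = 2)
y <- 1 + 0.5 * x + rnorm(200, sd = 3)          # homoskedastic errors

fit    <- lm(y ~ x)
sigma2 <- sum(resid(fit)^2) / (length(x) - 2)  # estimate of sigma^2
var_b2 <- sigma2 / sum((x - mean(x))^2)        # variance formula for the slope

sqrt(var_b2)                                   # standard error from the formula
summary(fit)$coefficients["x", "Std. Error"]   # same number reported by lm()
```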
---
class: inverse, middle

# Population *vs.* Sample, Revisited

---
layout: true

# Population *vs.* Sample

**Question:** Why do we care about *population vs. sample*?

---

--

.pull-left[

<img src="09-Classical_Assumptions_files/figure-html/pop1-1.svg" style="display: block; margin: auto;" />

.center[**Population**]

]

--

.pull-right[

<img src="09-Classical_Assumptions_files/figure-html/scatter1-1.svg" style="display: block; margin: auto;" />

.center[**Population relationship**]

$$ y_i = 2.53 + 0.57 x_i + u_i $$

$$ y_i = \beta_1 + \beta_2 x_i + u_i $$

]

---

.pull-left[

<img src="09-Classical_Assumptions_files/figure-html/sample1-1.svg" style="display: block; margin: auto;" />

.center[**Sample 1:** 30 random individuals]

]

--

.pull-right[

<img src="09-Classical_Assumptions_files/figure-html/sample1 scatter-1.svg" style="display: block; margin: auto;" />

.center[

**Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)`

**Sample relationship** <br> `\(\hat{y}_i = 1.36 + 0.76 x_i\)`

]

]

---
count: false

.pull-left[

<img src="09-Classical_Assumptions_files/figure-html/sample2-1.svg" style="display: block; margin: auto;" />

.center[**Sample 2:** 30 random individuals]

]

.pull-right[

<img src="09-Classical_Assumptions_files/figure-html/sample2 scatter-1.svg" style="display: block; margin: auto;" />

.center[

**Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)`

**Sample relationship** <br> `\(\hat{y}_i = 3.53 + 0.34 x_i\)`

]

]

---
count: false

.pull-left[

<img src="09-Classical_Assumptions_files/figure-html/sample3-1.svg" style="display: block; margin: auto;" />

.center[**Sample 3:** 30 random individuals]

]

.pull-right[

<img src="09-Classical_Assumptions_files/figure-html/sample3 scatter-1.svg" style="display: block; margin: auto;" />

.center[

**Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)`

**Sample relationship** <br> `\(\hat{y}_i = 1.44 + 0.86 x_i\)`

]

]

---
layout: false
class: white-slide, middle

Repeat **10,000 times** (Monte Carlo simulation).

---
layout: true
class: white-slide

---

<img src="09-Classical_Assumptions_files/figure-html/simulation scatter-1.png" style="display: block; margin: auto;" />

---
layout: true

# Population *vs.* Sample

**Question:** Why do we care about *population vs. sample*?

---

.pull-left[

<img src="09-Classical_Assumptions_files/figure-html/simulation scatter2-1.png" style="display: block; margin: auto;" />

]

.pull-right[

- On **average**, the regression lines match the population line nicely.
- However, **individual lines** (samples) can miss the mark.
- Differences between individual samples and the population create **uncertainty**.

]

---

--

**Answer:** Uncertainty matters.

`\(\hat{\beta}_1\)` and `\(\hat{\beta}_2\)` are random variables that depend on the random sample. We can't tell whether we have a "good" sample (similar to the population) or a "bad" sample (very different from the population).

--

Next time, we will leverage all six classical assumptions, including **normality**, to conduct hypothesis tests.
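---
layout: false

# Population *vs.* Sample: Simulation Sketch

A minimal sketch of the Monte Carlo exercise above. The population intercept and slope (2.53 and 0.57) come from the slides; the distribution of `\(x\)`, the error variance, and the population size are made-up choices for illustration.

```r
# Repeated sampling: draw 30 individuals, fit OLS, repeat 10,000 times
set.seed(320)
population   <- data.frame(x = runif(1e4, 0, 10))
population$y <- 2.53 + 0.57 * population$x + rnorm(1e4, sd = 1.5)

one_sample <- function() {
  s <- population[sample(nrow(population), 30), ]   # 30 random individuals
  coef(lm(y ~ x, data = s))
}

estimates <- t(replicate(1e4, one_sample()))
colMeans(estimates)       # centered near (2.53, 0.57): unbiased on average
apply(estimates, 2, sd)   # but any single sample can miss the mark
```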