class: center, middle, inverse, title-slide # Classical Assumptions ## EC 320: Introduction to Econometrics ### Philip Economides ### Winter 2022 --- class: inverse, middle # Prologue --- # Housekeeping Survey results in: - Assignments (x2), more room between lab and HW - Lecture slides (x1), uploaded by Sunday night at the latest -- Updates - .hi-pink[Problem Set 3:] due by Wednesday 11:59pm - This lecture is the last one relevant to the .hi-pink[Midterm exam] - Revise the material and have questions ready for the .hi-pink[review session] --- # Agenda ## Last Week How does OLS estimate a regression line? - .hi-pink[Minimize RSS]. What are the direct consequences of minimizing RSS? - Residuals sum to zero. - Residuals and the explanatory variable `\(X\)` are uncorrelated. - Mean values of `\(X\)` and `\(Y\)` are on the fitted regression line. Whatever do we mean by *goodness of fit*? - What information does `\(R^2\)` convey? --- # Agenda ## Today Under what conditions is OLS *desirable*? - **Desired properties:** Unbiasedness, efficiency, and the ability to conduct hypothesis tests. - **Cost:** Six .hi-green[classical assumptions] about the population relationship and the sample. --- # Returns to Schooling __Policy Question:__ How much should the state subsidize higher education? - Could higher education subsidies increase future tax revenue? - Could targeted subsidies reduce income inequality and racial wealth gaps? - Are there positive externalities associated with higher education? -- __Empirical Question:__ What is the monetary return to an additional year of education? - Focuses on the private benefits of education. Not the only important question! - Useful for learning about the econometric assumptions that allow causal interpretation. --- # Returns to Schooling **Step 1:** Write down the population model. `$$\log(\text{Earnings}_i) = \beta_1 + \beta_2\text{Education}_i + u_i$$` -- **Step 2:** Find data.
- *Source:* [Blackburn and Neumark (1992)](https://econpapers.repec.org/article/oupqjecon/v_3a107_3ay_3a1992_3ai_3a4_3ap_3a1421-1436..htm). -- **Step 3:** Run a regression using OLS. `$$\log(\hat{\text{Earnings}_i}) = \hat{\beta}_1 + \hat{\beta}_2\text{Education}_i$$` --- # Returns to Schooling `\(\log(\hat{\text{Earnings}_i})\)` `\(=\)` .hi-purple[5.97] `\(+\)` .hi-purple[0.06] `\(\times\)` `\(\text{Education}_i\)`. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling Additional year of school associated with a .hi-purple[6%] increase in earnings. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling `\(R^2\)` `\(=\)` .hi-purple[0.097]. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling Education explains .hi-purple[9.7%] of the variation in wages. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling What must we __assume__ to interpret `\(\hat{\beta}_2\)` `\(=\)` .hi-purple[0.06] as the return to schooling? <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" /> --- # Residuals *vs.* Errors The most important assumptions concern the error term `\(u_i\)`. -- **Important:** An error `\(u_i\)` and a residual `\(\hat{u}_i\)` are related, but different. -- - .hi-green[Error:] Difference between the wage of a worker with 16 years of education and the .hi-green[expected wage] with 16 years of education. -- - .hi-purple[Residual:] Difference between the wage of a worker with 16 years of education and the .hi-purple[average wage] of workers with 16 years of education. 
-- - .hi-green[Population] ***vs.*** .hi-purple[sample]**.** --- # Residuals *vs.* Errors A .hi[residual] tells us how a .hi-slate[worker]'s wages compare to the average wages of workers in the .hi-purple[sample] with the same level of education. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" /> --- # Residuals *vs.* Errors A .hi[residual] tells us how a .hi-slate[worker]'s wages compare to the average wages of workers in the .hi-purple[sample] with the same level of education. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" /> --- # Residuals *vs.* Errors An .hi-orange[error] tells us how a .hi-slate[worker]'s wages compare to the expected wages of workers in the .hi-green[population] with the same level of education. <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" /> --- class: inverse, middle # Classical Assumptions --- # Classical Assumptions of OLS 1. **Linearity:** The population relationship is .hi[linear in parameters] with an additive error term. -- 2. **Sample Variation:** There is variation in `\(X\)`. -- 3. **Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).<sup>.pink[†]</sup> -- 4. **Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`). -- 5. **Non-autocorrelation:** The error terms are independently distributed (*i.e.,* `\(\mathop{\mathbb{E}}\left[ u_i u_j \right] = 0, \ \forall \ i \text{ s.t. } i \neq j\)`). -- 6. **Normality:** The population error term is normally distributed with mean zero and variance `\(\sigma^2\)` (*i.e.,* `\(u \sim N(0,\sigma^2)\)`). .footnote[ .pink[†] Implies the assumption of **Random Sampling:** We have a random sample from the population of interest.
] --- class: inverse, middle # When Can We Trust OLS? --- # Bias An estimator is __biased__ if its expected value differs from the true population parameter. .pull-left[ **Unbiased estimator:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] = \beta\)` <img src="08-Classical_Assumptions_files/figure-html/unbiased pdf-1.svg" style="display: block; margin: auto;" /> ] -- .pull-right[ **Biased estimator:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] \neq \beta\)` <img src="08-Classical_Assumptions_files/figure-html/biased pdf-1.svg" style="display: block; margin: auto;" /> ] --- # When is OLS Unbiased? ## Required Assumptions 1. **Linearity:** The population relationship is .hi[linear in parameters] with an additive error term. 2. **Sample Variation:** There is variation in `\(X\)`. 3. **Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`). -- ☛ (3) implies **Random Sampling**. Without it, the internal validity of OLS is uncompromised, but our external validity becomes uncertain.<sup>.pink[†]</sup> .footnote[ .pink[†] **Internal Validity:** relates to how well a study is conducted (does it satisfy the OLS assumptions?).<br> **External Validity:** relates to how applicable the findings are to the real world. ] --- ## Result Under assumptions (1)–(3), OLS is unbiased. --- # Linearity (A1.) ## Assumption The population relationship is __linear in parameters__ with an additive error term. -- ## Examples - `\(\text{Wage}_i = \beta_1 + \beta_2 \text{Experience}_i + u_i\)` -- - `\(\log(\text{Happiness}_i) = \beta_1 + \beta_2 \log(\text{Money}_i) + u_i\)` -- - `\(\sqrt{\text{Convictions}_i} = \beta_1 + \beta_2 (\text{Early Childhood Lead Exposure})_i + u_i\)` -- - `\(\log(\text{Earnings}_i) = \beta_1 + \beta_2 \text{Education}_i + u_i\)` --- # Linearity (A1.) ## Assumption The population relationship is __linear in parameters__ with an additive error term.
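Linearity in parameters is what lets the closed-form OLS formulas apply directly. A minimal Python sketch of the first example (the data and coefficients below are invented for illustration, not the course data):

```python
import numpy as np

rng = np.random.default_rng(320)

# Hypothetical linear-in-parameters model: Wage = 10 + 0.5 * Experience + u
experience = rng.uniform(0, 40, 1_000)
wage = 10 + 0.5 * experience + rng.normal(0, 1, 1_000)  # additive error term

# OLS estimates from the closed-form formulas
b2 = np.cov(experience, wage, ddof=1)[0, 1] / np.var(experience, ddof=1)
b1 = wage.mean() - b2 * experience.mean()  # recovers roughly (10, 0.5)
```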
## Violations - `\(\text{Wage}_i = (\beta_1 + \beta_2 \text{Experience}_i)u_i\)` -- - `\(\text{Consumption}_i = \frac{1}{\beta_1 + \beta_2 \text{Income}_i} + u_i\)` -- - `\(\text{Population}_i = \frac{\beta_1}{1 + e^{\beta_2 + \beta_3 \text{Food}_i}} + u_i\)` -- - `\(\text{Batting Average}_i = \beta_1 (\text{Wheaties Consumption})_i^{\beta_2} + u_i\)` --- # Sample Variation (A2.) ## Assumption There is variation in `\(X\)`. ## Example <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" /> --- # Sample Variation (A2.) ## Assumption There is variation in `\(X\)`. ## Violation <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" /> --- # Exogeneity (A3.) ## Assumption The `\(X\)` variable is __exogenous:__ `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`. - For _any_ value of `\(X\)`, the mean of the error term is zero. .hi[The most important assumption!] -- Really two assumptions bundled into one: 1. On average, the error term is zero: `\(\mathop{\mathbb{E}}\left(u\right) = 0\)`. 2. The mean of the error term is the same for each value of `\(X\)`: `\(\mathop{\mathbb{E}}\left( u|X \right) = \mathop{\mathbb{E}}\left(u\right)\)`. --- # Exogeneity (A3.) ## Assumption The `\(X\)` variable is __exogenous:__ `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`. - The assignment of `\(X\)` is effectively random. - **Implication:** .hi-purple[no selection bias] and .hi-green[no omitted-variable bias]. -- ## Examples In the labor market, an important component of `\(u\)` is unobserved ability. - `\(\mathop{\mathbb{E}}\left( u|\text{Education} = 12 \right) = 0\)` and `\(\mathop{\mathbb{E}}\left( u|\text{Education} = 20 \right) = 0\)`. - `\(\mathop{\mathbb{E}}\left( u|\text{Experience} = 0 \right) = 0\)` and `\(\mathop{\mathbb{E}}\left( u|\text{Experience} = 40 \right) = 0\)`. - **Do you believe this?** --- layout: false class: white-slide, middle Graphically... 
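---

# Exogeneity (A3.): A Simulated Example

A hedged sketch of why `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)` matters: if unobserved ability sits in the error term and also drives education, OLS picks up the bias. Every number below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(320)
n = 10_000

# Hypothetical DGP: unobserved ability raises both schooling and earnings,
# so folding ability into the error term makes E(u | X) != 0.
ability   = rng.normal(0, 1, n)
education = 12 + 2 * ability + rng.normal(0, 1, n)
log_earn  = 1.0 + 0.06 * education + 0.10 * ability + rng.normal(0, 0.3, n)

def ols_slope(x, y):
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

biased = ols_slope(education, log_earn)                   # exceeds 0.06
valid  = ols_slope(education, log_earn - 0.10 * ability)  # close to 0.06
```

The first slope overstates the return to schooling because education is endogenous; purging ability from the error term restores exogeneity.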
--- exclude: true --- class: white-slide Valid exogeneity, _i.e._, `\(\mathop{\mathbb{E}}\left( u \mid X \right) = 0\)` <img src="08-Classical_Assumptions_files/figure-html/ex_good_exog-1.svg" style="display: block; margin: auto;" /> --- class: white-slide Invalid exogeneity, _i.e._, `\(\mathop{\mathbb{E}}\left( u \mid X \right) \neq 0\)` <img src="08-Classical_Assumptions_files/figure-html/ex_bad_exog-1.svg" style="display: block; margin: auto;" /> --- class: inverse, middle # Variance Matters, Too --- # Why Variance Matters Unbiasedness tells us that OLS gets it right, _on average_. - But we can't tell whether our sample is "typical." -- **Variance** tells us how far OLS can deviate from its expected value. - How tightly is OLS centered on its expected value? - This determines the .hi-pink[efficiency] of our estimator. -- The smaller the variance, the closer OLS tends to get to the true population parameters _in any given sample_. - Given two unbiased estimators, we want the one with the smaller variance. - If (A4.) and (A5.) are satisfied as well, we are using the .hi-pink[most efficient] linear estimator. --- # OLS Variance To calculate the variance of OLS, we need: 1. The same three assumptions we made for unbiasedness. 2. __Homoskedasticity.__ 3. __Non-autocorrelation.__ --- # Homoskedasticity (A4.) ## Assumption The error term has the same variance for each value of the independent variable: `$$\mathop{\text{Var}}(u|X) = \sigma^2.$$` ## Example <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> --- # Homoskedasticity (A4.) ## Assumption The error term has the same variance for each value of the independent variable: `$$\mathop{\text{Var}}(u|X) = \sigma^2.$$` ## Violation: Heteroskedasticity <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" /> --- count: false # Homoskedasticity (A4.)
## Assumption The error term has the same variance for each value of the independent variable: `$$\mathop{\text{Var}}(u|X) = \sigma^2.$$` ## Violation: Heteroskedasticity <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" /> --- # Non-Autocorrelation (A5.) ## Assumption Any individual's error term is drawn independently of the other error terms. `$$\mathop{\text{Cov}}(u_i, u_j) = \mathop{\mathbb{E}}[(u_i - \mu_u)(u_j - \mu_u)]\\ = \mathop{\mathbb{E}}[u_i u_j] = \mathop{\mathbb{E}}[u_i] \mathop{\mathbb{E}}[u_j] = 0, \text{ where } i \neq j$$` - The second equality uses `\(\mathop{\mathbb{E}}(u) = 0\)` from (A3.); the third uses independence. - This implies no systematic association between the error term values of any pair of individuals. - In practice, there is always some correlation in unobservables across individuals (e.g., common correlation in unobservables among individuals within a given US state). - This is referred to as the .hi-pink[clustering] problem. Standard errors can be adjusted to address it. --- # OLS Variance Variance of the slope estimator: `$$\mathop{\text{Var}}(\hat{\beta}_2) = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}.$$` - As the error variance increases, the variance of the slope estimator increases. - As the variation in `\(X\)` increases, the variance of the slope estimator decreases. - Larger samples have a larger `\(\sum_{i=1}^n (X_i - \bar{X})^2\)`, so `\(\mathop{\text{Var}}(\hat{\beta}_2)\)` falls as `\(n\)` rises. --- class: inverse, middle # Gauss-Markov --- # Gauss-Markov Theorem OLS is the __Best Linear Unbiased Estimator (BLUE)__ when: 1. **Linearity:** The population relationship is .hi[linear in parameters] with an additive error term. 2. **Sample Variation:** There is variation in `\(X\)`. 3. **Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`). 4. **Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`). 5.
**Non-Autocorrelation:** Any two error terms are drawn independently of each other (*i.e.,* `\(\mathop{\mathbb{E}}(u_i u_j) = 0 \ \forall \ i \text{ s.t. } i \neq j\)`). --- class: middle # Gauss-Markov Theorem OLS is the __Best Linear Unbiased Estimator (BLUE)__ <img src="08-Classical_Assumptions_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" /> --- class: inverse, middle # Population *vs.* Sample, Revisited --- layout: true # Population *vs.* Sample **Question:** Why do we care about *population vs. sample*? --- -- .pull-left[ <img src="08-Classical_Assumptions_files/figure-html/pop1-1.svg" style="display: block; margin: auto;" /> .center[**Population**] ] -- .pull-right[ <img src="08-Classical_Assumptions_files/figure-html/scatter1-1.svg" style="display: block; margin: auto;" /> .center[**Population relationship**] $$ y_i = 2.53 + 0.57 x_i + u_i $$ $$ y_i = \beta_1 + \beta_2 x_i + u_i $$ ] --- .pull-left[ <img src="08-Classical_Assumptions_files/figure-html/sample1-1.svg" style="display: block; margin: auto;" /> .center[**Sample 1:** 30 random individuals] ] -- .pull-right[ <img src="08-Classical_Assumptions_files/figure-html/sample1 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 2.36 + 0.61 x_i\)` ] ] --- count: false .pull-left[ <img src="08-Classical_Assumptions_files/figure-html/sample2-1.svg" style="display: block; margin: auto;" /> .center[**Sample 2:** 30 random individuals] ] .pull-right[ <img src="08-Classical_Assumptions_files/figure-html/sample2 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 2.79 + 0.56 x_i\)` ] ] --- count: false .pull-left[ <img src="08-Classical_Assumptions_files/figure-html/sample3-1.svg" style="display: block; margin: auto;" /> .center[**Sample
3:** 30 random individuals] ] .pull-right[ <img src="08-Classical_Assumptions_files/figure-html/sample3 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 3.21 + 0.45 x_i\)` ] ] --- layout: false class: white-slide, middle Repeat **10,000 times** (Monte Carlo simulation). --- layout: true class: white-slide --- <img src="08-Classical_Assumptions_files/figure-html/simulation scatter-1.png" style="display: block; margin: auto;" /> --- layout: true # Population *vs.* Sample **Question:** Why do we care about *population vs. sample*? --- .pull-left[ <img src="08-Classical_Assumptions_files/figure-html/simulation scatter2-1.png" style="display: block; margin: auto;" /> ] .pull-right[ - On **average**, the regression lines match the population line nicely. - However, **individual lines** (samples) can miss the mark. - Differences between individual samples and the population create **uncertainty**. ] --- -- **Answer:** Uncertainty matters. `\(\hat{\beta}_1\)` and `\(\hat{\beta}_2\)` are random variables that depend on the random sample. We can't tell whether we have a "good" sample (similar to the population) or a "bad" sample (very different from the population). -- Next time, we will leverage all six classical assumptions, including **normality**, to conduct hypothesis tests. --- exclude: true
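---
layout: false

# Appendix: The Monte Carlo in Code

The 10,000-sample exercise above can be sketched in a few lines. This is a hedged reconstruction, not the code behind the figures; in particular, the distribution of `\(x\)` is assumed uniform for illustration.

```python
import numpy as np

rng = np.random.default_rng(320)

# Population model from the slides: y_i = 2.53 + 0.57 x_i + u_i
def ols_slope(n=30):
    x = rng.uniform(0, 10, n)                  # assumed x distribution
    y = 2.53 + 0.57 * x + rng.normal(0, 1, n)  # draw one 30-person sample
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

slopes = np.array([ols_slope() for _ in range(10_000)])

# Unbiasedness: the slopes average out near 0.57, but any single
# 30-observation sample can miss the mark.
print(slopes.mean(), slopes.std())
```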