class: center, middle, inverse, title-slide

# Classical Assumptions
## EC 320: Introduction to Econometrics
### Winter 2022

---
class: inverse, middle

# Prologue

---
# Housekeeping

- Problem Set 02 due today by 11:59 pm on Canvas
- The solution to the problem set will be released on Wednesday.
- Midterm grade appeals accepted until tomorrow. The solution will be posted tomorrow; no appeals will be addressed after the solution is posted.

---
# Agenda

## Last Week

How does OLS estimate a regression line?

- Minimize RSS.

What are the direct consequences of minimizing RSS?

- Residuals sum to zero.
- Residuals and the explanatory variable `\(X\)` are uncorrelated.
- Mean values of `\(X\)` and `\(Y\)` are on the fitted regression line.

Whatever do we mean by *goodness of fit*?

- What information does `\(R^2\)` convey? "The proportion of the variance explained by the regression line."

---
# Agenda

## Today

Under what conditions is OLS *desirable*?

- **Desired properties:** Unbiasedness, efficiency, and the ability to conduct hypothesis tests.
- **Cost:** Six .hi-green[classical assumptions] about the population relationship and the sample.

---
# Returns to Schooling

__Policy Question:__ How much should the state subsidize higher education?

- Could higher education subsidies increase future tax revenue?
- Could targeted subsidies reduce income inequality and racial wealth gaps?
- Are there positive externalities associated with higher education?

--

__Empirical Question:__ What is the monetary return to an additional year of education?

- Focuses on the private benefits of education. Not the only important question!
- Useful for learning about the econometric assumptions that allow causal interpretation.

---
# Returns to Schooling

**Step 1:** Write down the population model.

`$$\log(\text{Earnings}_i) = \beta_0 + \beta_1\text{Education}_i + u_i$$`

--

**Step 2:** Find data.

- *Source:* Blackburn and Neumark (1992).

--

**Step 3:** Run a regression using OLS.
`$$\log(\hat{\text{Earnings}_i}) = \hat{\beta}_0 + \hat{\beta}_1\text{Education}_i$$` --- # Returns to Schooling `\(\log(\hat{\text{Earnings}_i})\)` `\(=\)` .hi-purple[5.97] `\(+\)` .hi-purple[0.06] `\(\times\)` `\(\text{Education}_i\)`. <img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling Additional year of school associated with a .hi-purple[6%] increase in earnings. <img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling `\(R^2\)` `\(=\)` .hi-purple[0.097]. <img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling Education explains .hi-purple[9.7%] of the variation in wages. <img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" /> --- # Returns to Schooling What must we __assume__ to interpret `\(\hat{\beta}_1\)` `\(=\)` .hi-purple[0.06] as the return to schooling? <img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" /> --- # Residuals *vs.* Errors The most important assumptions concern the error term `\(u_i\)`. -- **Important:** An error `\(u_i\)` and a residual `\(\hat{u}_i\)` are related, but different. -- .pull-left[ .hi-green[Population] - `\(Y_i = \beta_0 + \beta_1 X_i + u_i\)` - .hi-green[Error:] Difference between the wage of a worker with 16 years of education and the .hi-green[expected wage] with 16 years of education. ] -- .pull-right[ .hi-purple[Sample] - `\(Y_i = \hat{\beta_0} + \hat{\beta_1} X_i + \hat{u_i}\)` - .hi-purple[Residual:] Difference between the wage of a worker with 16 years of education and the .hi-purple[average wage] of workers with 16 years of education. 
]

---
# Residuals *vs.* Errors

A .hi[residual] tells us how a .hi-slate[worker]'s wages compare to the average wages of workers in the .hi-purple[sample] with the same level of education.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

---
# Residuals *vs.* Errors

A .hi[residual] tells us how a .hi-slate[worker]'s wages compare to the average wages of workers in the .hi-purple[sample] with the same level of education.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" />

---
# Residuals *vs.* Errors

An .hi-orange[error] tells us how a .hi-slate[worker]'s wages compare to the expected wages of workers in the .hi-green[population] with the same level of education.

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" />

---
class: inverse, middle

# Classical Assumptions

---
# Classical Assumptions of OLS

- **A1. Linearity:** The population relationship is .hi[linear in parameters] with an additive error term.
- **A2. Sample Variation:** There is variation in `\(X\)`.
- **A3. Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).<sup>.pink[†]</sup>
- **A4. Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`).
- **A5. Non-autocorrelation:** The values of the error terms are independent of one another (*i.e.,* `\(E[u_i u_j]=0, \forall i \text{ s.t. } i \neq j\)`).<sup>.pink[†]</sup>
- **A6. Normality:** The population error term is normally distributed with mean zero and variance `\(\sigma^2\)` (*i.e.,* `\(u \sim N(0,\sigma^2)\)`).

.footnote[
.pink[†] **Random Sampling:** Note that, up to this point, the data underlying our analysis have been cross-sectional.
Under random sampling, `\(u_i\)` and `\(u_j\)` are independent for any two observations `\(i\)` and `\(j\)`, which is exactly what the non-autocorrelation assumption requires. I state **non-autocorrelation** explicitly (1) to generalize to settings with time-series data and (2) to stay consistent with the textbook's notation. It can also be shown that, under random sampling, the errors for different observations in a cross-sectional sample are independent conditional on the explanatory variables.
]

---
class: inverse, middle

# When Can We Trust OLS?

---
# Bias

An estimator is __biased__ if its expected value is different from the true population parameter.

.pull-left[

**Unbiased estimator:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] = \beta\)`

<img src="09-Classical_Assumptions_files/figure-html/unbiased pdf-1.svg" style="display: block; margin: auto;" />

]

--

.pull-right[

**Biased estimator:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] \neq \beta\)`

<img src="09-Classical_Assumptions_files/figure-html/biased pdf-1.svg" style="display: block; margin: auto;" />

]

---
# When is OLS Unbiased?

## Assumptions

**A1. Linearity:** The population relationship is .hi[linear in parameters] with an additive error term.

**A2. Sample Variation:** There is variation in `\(X\)`.

**A3. Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).<sup>.pink[†]</sup>

.footnote[
.pink[†] In cross-sectional data, A3 is typically justified by random sampling; in this sense, A3 embeds a random-sampling assumption.
]

## Result

OLS is unbiased.

---
# Linearity

## Assumption

The population relationship is __linear in parameters__ with an additive error term.
--

## Examples

- `\(\text{Wage}_i = \beta_0 + \beta_1 \text{Experience}_i + u_i\)`

--

- `\(\log(\text{Happiness}_i) = \beta_0 + \beta_1 \log(\text{Money}_i) + u_i\)`

--

- `\(\sqrt{\text{Convictions}_i} = \beta_0 + \beta_1 (\text{Early Childhood Lead Exposure})_i + u_i\)`

--

- `\(\log(\text{Earnings}_i) = \beta_0 + \beta_1 \text{Education}_i + u_i\)`

---
# Linearity

## Assumption

The population relationship is __linear in parameters__ with an additive error term.

## Violations

- `\(\text{Wage}_i = (\beta_0 + \beta_1 \text{Experience}_i)u_i\)`

--

- `\(\text{Consumption}_i = \frac{1}{\beta_0 + \beta_1 \text{Income}_i} + u_i\)`

--

- `\(\text{Population}_i = \frac{\beta_0}{1 + e^{\beta_1 + \beta_2 \text{Food}_i}} + u_i\)`

--

- `\(\text{Batting Average}_i = \beta_0 (\text{Wheaties Consumption})_i^{\beta_1} + u_i\)`

---
# Sample Variation

## Assumption

There is variation in `\(X\)`.

## Example

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" />

---
# Sample Variation

## Assumption

There is variation in `\(X\)`.

## Violation

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" />

---
# Exogeneity

## Assumption

The `\(X\)` variable is __exogenous:__ `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`.

- For _any_ value of `\(X\)`, the mean of the error term is zero.

.hi[The most important assumption!]

--

Really two assumptions bundled into one:

1. On average, the error term is zero: `\(\mathop{\mathbb{E}}\left(u\right) = 0\)`.
2. The mean of the error term is the same for each value of `\(X\)`: `\(\mathop{\mathbb{E}}\left( u|X \right) = \mathop{\mathbb{E}}\left(u\right)\)`.

---
# Exogeneity

## Assumption

The `\(X\)` variable is __exogenous:__ `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`.

- The assignment of `\(X\)` is effectively random.
- **Implication:** .hi-purple[no selection bias] and .hi-green[no omitted-variable bias].
--

## Examples

In the labor market, an important component of `\(u\)` is unobserved ability.

- `\(\mathop{\mathbb{E}}\left( u|\text{Education} = 12 \right) = 0\)` and `\(\mathop{\mathbb{E}}\left( u|\text{Education} = 20 \right) = 0\)`.
- `\(\mathop{\mathbb{E}}\left( u|\text{Experience} = 0 \right) = 0\)` and `\(\mathop{\mathbb{E}}\left( u|\text{Experience} = 40 \right) = 0\)`.
- **Do you believe this?**

---
layout: false
class: white-slide, middle

Graphically...

---
exclude: true

---
class: white-slide

Valid exogeneity, _i.e._, `\(\mathop{\mathbb{E}}\left( u \mid X \right) = 0\)`

<img src="09-Classical_Assumptions_files/figure-html/ex_good_exog-1.svg" style="display: block; margin: auto;" />

---
class: white-slide

Invalid exogeneity, _i.e._, `\(\mathop{\mathbb{E}}\left( u \mid X \right) \neq 0\)`

<img src="09-Classical_Assumptions_files/figure-html/ex_bad_exog-1.svg" style="display: block; margin: auto;" />

---
class: inverse, middle

# Variance Matters, Too

---
# Why Variance Matters

Unbiasedness tells us that OLS gets it right, _on average_.

- But we can't tell whether our sample is "typical."

--

**Variance** tells us how far OLS can deviate from the population parameter.

- How tightly is OLS centered on its expected value?

--

The smaller the variance, the closer OLS gets to the true population parameters _on any sample_.

- Given two unbiased estimators, we want the one with smaller variance.

---
# OLS Variance

To calculate the variance of OLS, we need:

1. The same three assumptions we made for unbiasedness.
2. __Homoskedasticity__
3.
__Non-autocorrelation__

---
# Homoskedasticity

## Assumption

The error term has the same variance for each value of the independent variable:

`$$\mathop{\text{Var}}(u|X) = \sigma^2.$$`

## Example

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" />

---
# Homoskedasticity

## Assumption

The error term has the same variance for each value of the independent variable:

`$$\mathop{\text{Var}}(u|X) = \sigma^2.$$`

## Violation: Heteroskedasticity

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" />

---
count: false

# Homoskedasticity

## Assumption

The error term has the same variance for each value of the independent variable:

`$$\mathop{\text{Var}}(u|X) = \sigma^2.$$`

## Violation: Heteroskedasticity

<img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" />

---
# Non-Autocorrelation

## Assumption

The population covariance between `\(u_i\)` and `\(u_j\)` is zero: any individual's error term is drawn independently of the others.

`$$\mathop{\text{Cov}}(u_i, u_j) = E[(u_i - \mu_u)(u_j - \mu_u)]\\ = E[u_i u_j] = E[u_i] E[u_j] = 0, \text{ where } i \neq j$$`

- .small[This implies no systematic association between error-term values for any pair of individuals. (The second equality uses `\(\mu_u = 0\)`; the third uses independence.) If this assumption fails, OLS gives inefficient estimates.]
- .small[**Example:** The sign and magnitude of the disturbance for one observation should give no indication of the sign and magnitude of the disturbance for any other observation.
] - .small[**Violation:** Errors that are correlated with time (autocorrelation)] --- # OLS Variance Variance of the slope estimator: `$$\mathop{\text{Var}}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}.$$` - As the error variance increases, the variance of the slope estimator increases. - As the variation in `\(X\)` increases, the variance of the slope estimator decreases. - Larger sample sizes exhibit more variation in `\(X \implies \mathop{\text{Var}}(\hat{\beta}_1)\)` falls as `\(n\)` rises. --- class: inverse, middle # Gauss-Markov --- # Gauss-Markov Theorem OLS is the __Best Linear Unbiased Estimator__ under assumptions A1-A5: - **A1. Linearity:** The population relationship is .hi[linear in parameters] with an additive error term. - **A2. Sample Variation:** There is variation in `\(X\)`. - **A3. Exogeneity:** The `\(X\)` variable is .hi[exogenous] (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`). - **A4. Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`). - **A5. Non-Autocorrelation:** Any pair of error terms are drawn independently of each other (*i.e.,* `\(\mathop{\text{E}}(u_i u_j) = 0 \ \forall \ i \text{ s.t. } i \neq j\)`) --- class: middle # Gauss-Markov Theorem OLS is the __Best Linear Unbiased Estimator (BLUE)__ <img src="09-Classical_Assumptions_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" /> --- class: inverse, middle # Population *vs.* Sample, Revisited --- layout: true # Population *vs.* Sample **Question:** Why do we care about *population vs. sample*? 
--- -- .pull-left[ <img src="09-Classical_Assumptions_files/figure-html/pop1-1.svg" style="display: block; margin: auto;" /> .center[**Population**] ] -- .pull-right[ <img src="09-Classical_Assumptions_files/figure-html/scatter1-1.svg" style="display: block; margin: auto;" /> .center[**Population relationship**] $$ y_i = 2.53 + 0.57 x_i + u_i $$ $$ y_i = \beta_0 + \beta_1 x_i + u_i $$ ] --- .pull-left[ <img src="09-Classical_Assumptions_files/figure-html/sample1-1.svg" style="display: block; margin: auto;" /> .center[**Sample 1:** 30 random individuals] ] -- .pull-right[ <img src="09-Classical_Assumptions_files/figure-html/sample1 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 2.36 + 0.61 x_i\)` ] ] --- count: false .pull-left[ <img src="09-Classical_Assumptions_files/figure-html/sample2-1.svg" style="display: block; margin: auto;" /> .center[**Sample 2:** 30 random individuals] ] .pull-right[ <img src="09-Classical_Assumptions_files/figure-html/sample2 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 2.79 + 0.56 x_i\)` ] ] --- count: false .pull-left[ <img src="09-Classical_Assumptions_files/figure-html/sample3-1.svg" style="display: block; margin: auto;" /> .center[**Sample 3:** 30 random individuals] ] .pull-right[ <img src="09-Classical_Assumptions_files/figure-html/sample3 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 3.21 + 0.45 x_i\)` ] ] --- layout: false class: white-slide, middle Repeat **10,000 times** (Monte Carlo simulation). 
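---
class: white-slide

A minimal sketch of this Monte Carlo exercise in Python (the deck's figures come from R; the standard normal errors, the `\(\text{Uniform}(0, 10)\)` draws for `\(x\)`, and the seed below are illustrative assumptions, not the deck's actual settings):

```python
import numpy as np

rng = np.random.default_rng(320)

# Population relationship: y_i = 2.53 + 0.57 x_i + u_i
# (error and x distributions are assumed for illustration)
beta0, beta1 = 2.53, 0.57

def ols_slope(x, y):
    """OLS slope: sample covariance of (x, y) over sample variance of x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Draw 10,000 samples of 30 individuals each; estimate the slope every time
slopes = []
for _ in range(10_000):
    x = rng.uniform(0, 10, 30)   # 30 random individuals
    u = rng.normal(0, 1, 30)     # population errors
    y = beta0 + beta1 * x + u
    slopes.append(ols_slope(x, y))

print(round(np.mean(slopes), 3))  # averages out very close to the true 0.57
```

On average, the estimated slopes recover the population slope — the unbiasedness result the surrounding figures illustrate.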
---
layout: true
class: white-slide

---

<img src="09-Classical_Assumptions_files/figure-html/simulation scatter-1.png" style="display: block; margin: auto;" />

---
layout: true

# Population *vs.* Sample

**Question:** Why do we care about *population vs. sample*?

---

.pull-left[

<img src="09-Classical_Assumptions_files/figure-html/simulation scatter2-1.png" style="display: block; margin: auto;" />

]

.pull-right[

- On **average**, the regression lines match the population line nicely.
- However, **individual lines** (samples) can miss the mark.
- Differences between individual samples and the population create **uncertainty**.

]

---

--

**Answer:** Uncertainty matters.

`\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` are random variables that depend on the random sample. We can't tell if we have a "good" sample (similar to the population) or a "bad" sample (very different from the population).

--

Next time, we will leverage all six classical assumptions, including **normality**, to conduct hypothesis tests.
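---
class: white-slide

The slope-variance formula can be checked by the same kind of simulation. A sketch under assumed parameters (fixed `\(X\)` across samples, `\(\sigma = 1\)`, `\(n = 30\)` — all illustrative choices, not from the deck):

```python
import numpy as np

rng = np.random.default_rng(320)

# Hold X fixed across samples and redraw only the errors,
# matching the conditional-on-X variance formula
n, sigma = 30, 1.0
x = rng.uniform(0, 10, n)
beta0, beta1 = 2.53, 0.57

# Var(beta1_hat) = sigma^2 / sum((x_i - x_bar)^2)
theoretical = sigma**2 / np.sum((x - x.mean()) ** 2)

slopes = []
for _ in range(10_000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    slopes.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

empirical = np.var(slopes, ddof=1)
# empirical and theoretical agree to within simulation noise
```

The empirical variance of the 10,000 estimated slopes lines up with `\(\sigma^2 / \sum_{i=1}^n (X_i - \bar{X})^2\)`, the formula from the OLS Variance slide.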