class: center, middle, inverse, title-slide .title[ # .b[The Classical Linear Regression Model] ] .subtitle[ ## .b[.green[EC 339]] ] .author[ ### Marcio Santetti ] .date[ ### Fall 2022 ] --- class: inverse, middle # Motivation --- # OLS works, but it needs assumptions <br><br> - The goal when using OLS is to obtain .b[unbiased], .b[efficient], and .b[consistent] estimators. -- - Moreover, we want to be able to do .b[hypothesis testing]. -- - All these properties are made possible through .b[7 assumptions]. -- - This set of assumptions is known as the .b[Classical Linear Regression Model] (CLRM). --- class: inverse, middle # The Classical Assumptions --- # The set of Classical Assumptions <br> **1**. The regression model is .b[linear], .b[correctly specified], and has an .b[additive] stochastic error term. -- **2**. The stochastic error term `\((u_i)\)` has a .b[zero] population mean. -- **3**. All explanatory variables `\((x_i)\)` are .b[uncorrelated] with the error term. -- **4**. Observations of the error term are .b[uncorrelated] with each other. -- **5**. The error term has a .b[constant variance]. -- **6**. No explanatory variable is a .b[perfect linear function] of any other explanatory variable. -- **7**. The error term is .b[normally distributed]. --- # Assumption 1 > "*The regression model is .b[linear], .b[correctly specified], and has an .b[additive] stochastic error term.*" -- - .it[Linear] means linear in .b[parameters] `\((\beta_i)\)`; - .it[Correctly specified] means that it has the correct .b[functional form] and .b[no] omitted variables. - And an .b[additive] error term implies .b[no] other form in which `\(u_i\)` appears in a model. 
--

<br>

- **Examples of violations**:

$$
`\begin{align}
y_i = \beta_0 \beta_1x_{1i} + \beta_2x_{2i} + u_i
\end{align}`
$$

$$
`\begin{align}
y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i}u_i
\end{align}`
$$

$$
`\begin{align}
y_i = \beta_0 + \log(\beta_1)x_{1i} + \beta_2x_{2i} + u_i
\end{align}`
$$

---

# Assumption 1

One of the main reasons for a .it[violation] of CLRM Assumption I is an .b[incorrectly specified] model.

--

- This may happen due to:
  - Incorrect .b[functional form] (data visualization matters!);
  - .b[Omitted] variables (leading to omitted variables bias).

--

<br>

A regression's error term may sometimes be a .b[black box].

--

- Recall that any potentially omitted variable(s) lie(s) there!

--

Therefore, our models must have a .b[theoretical] motivation.

---

# What is bias?

An estimator is .b[biased] if its expected value differs from the *true* population parameter.

--

When considering our slope coefficients `\((\hat{\beta}_i)\)`, we expect them, on average, to equal the .b["true"] population parameter, `\(\beta_{pop}\)`.

.pull-left[

**Unbiased:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta}_{OLS} \right] = \beta_{pop}\)`

<img src="003-clrm_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" />

]

--

.pull-right[

**Biased:** `\(\mathop{\mathbb{E}}\left[ \hat{\beta}_{OLS} \right] \neq \beta_{pop}\)`

<img src="003-clrm_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

]

---

# Assumption 2

> *"The stochastic error term `\((u_i)\)` has a .b[zero] population mean."*

--

<br>

- Values of the stochastic error term are determined by .b[pure chance].
- It follows a probability .b[distribution] centered around zero.
- Also known as the .b[exogeneity] assumption.

--

<br>

From standard Microeconomic theory, recall:

- Factors that influence the .b[demand] for a given good:
  - Price of the good itself, price of substitutes, preferences...
- Any of these factors .b[left out] of the model ends up in the error term.
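---

# Assumption 2: a simulation sketch

The exogeneity idea can be checked numerically. Below is a minimal sketch in Python (illustrative only, not part of this course's R workflow; the model `\(y = 2 + 3x + u\)`, the omitted variable `z`, and all parameter values are invented for this example). When the error term has zero mean given `\(x\)`, the OLS slope recovers the population parameter; when a variable correlated with `\(x\)` is omitted and hides in the error term, the slope is biased.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical population model: y = 2 + 3x + u (all values invented)
x = rng.normal(5, 2, n)
u = rng.normal(0, 1, n)              # E[u | x] = 0 by construction
y = 2 + 3 * x + u

# OLS slope via Cov(x, y) / Var(x)
beta1_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(round(beta1_hat, 2))           # close to the population value of 3

# Violation: z is correlated with x but omitted, so it hides in the error term
z = x + rng.normal(0, 1, n)
y2 = 2 + 3 * x + 1.5 * z + rng.normal(0, 1, n)
beta1_biased = np.cov(x, y2)[0, 1] / np.var(x, ddof=1)
print(round(beta1_biased, 2))        # systematically above 3: E[u | x] != 0
```

The second estimate picks up the effect of the omitted `z` through its correlation with `x`, which is exactly the omitted variables bias mentioned earlier.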
---

# Assumption 2

<br>

> *"The stochastic error term `\((u_i)\)` has a .b[zero] population mean."*

<br><br>

In practice, what is the difference between `\(\mathbb{E}[u \ | \ x] = 0\)` and `\(\mathbb{E}[u \ | \ x] \neq 0\)`?

---

# Assumption 3

> *"All explanatory variables `\((x_i)\)` are .b[uncorrelated] with the error term."*

--

<br><br>

- Observed values of the independent variables are determined .b[independently] of the values contained in the error term.
- `\(Cor(x_i, u_i) \neq 0 \implies\)` .b[violation] of CLRM Assumption III.
- A possible reason: a variable correlated with some `\(x_i\)` being .b[omitted] from the model.

---

# Assumption 4

<br><br>

> *"Observations of the error term are .b[uncorrelated] with each other."*

--

<br>

- A .b[violation] of this assumption is known as .b[autocorrelation] (or serial correlation).
- It is common in .b[time-series] data.
- It occurs when the model's disturbances are correlated .b[over time], i.e., `\(Cor(u_t, u_j) \neq 0\)` for `\(t \neq j\)`.

---

# Assumption 4

Behavior of `\(u_t\)` over time (positive serial correlation)

<img src="003-clrm_files/figure-html/positive auto u-1.svg" style="display: block; margin: auto;" />

---

# Assumption 4

Behavior of `\(u_t\)` over time (negative serial correlation)

<img src="003-clrm_files/figure-html/negative auto u-1.svg" style="display: block; margin: auto;" />

---

# Assumption 5

> *"The error term has a .b[constant variance]."*

--

<br>

- Also known as the .b[homoskedasticity] assumption.
- If violated, we have .b[heteroskedasticity].
- Heteroskedasticity is extremely .b[common] in cross-section data sets (and also in financial time-series data).

--

<br>

- This assumption implies that the error term has the .b[same variance] for each value of the independent variable.
- `\(Var(u|x) = \sigma^2\)`

---

# Assumption 5

- .b[Homoskedastic] residuals:

<img src="003-clrm_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

---

# Assumption 5

- .b[Heteroskedastic] residuals:

<img src="003-clrm_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />

---

# Assumption 6

> *"No explanatory variable is a .b[perfect linear function] of any other explanatory variable."*

--

<br><br>

- Also known as the .b[no perfect multicollinearity] assumption.
- Only completely .b[violated] if an independent variable `\(x_i\)` is a .b[deterministic] function of another variable `\(x_j\)`, for `\(i \neq j\)`.

--

<br>

Examples of violations:

- `\(x_3 = x_1 - 1,000\)`
- `\(x_2 = 50 + x_1\)`

---

# Assumption 7

<br><br>

> *"The error term is .b[normally distributed]."*

--

<br><br>

- Summarized by `\(u_i \sim \mathcal{N}(0, \sigma^2)\)`.

--

<br>

OLS .b[still works] without this assumption!

--

But it is crucial for .b[hypothesis testing and inference].

---

layout: false
class: inverse, middle

# The Gauss-Markov theorem

---

# The Gauss-Markov theorem

<br><br>

Under CLRM Assumptions .b[I through VI], OLS is guaranteed to be .hi-blue[BLUE]: the .b[B]est .b[L]inear .b[U]nbiased .b[E]stimator.

--

<br><br>

We will learn how to deal with the most common .b[violations] of the CLRM Assumptions after the Midterm exam.

---

layout: false
class: inverse, middle

# Next time: CLRM in practice

---
exclude: true
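---

# Appendix: unbiasedness in simulation

The "unbiased" panel from earlier can be reproduced with a short Monte Carlo sketch in Python (illustrative only, not part of this course's R workflow; the population values `\(\beta_0 = 2\)`, `\(\beta_1 = 3\)`, the sample size, and the number of replications are all invented). When the data satisfy Assumptions I through VI, the OLS slope estimates average out to the population parameter.

```python
import numpy as np

# Monte Carlo sketch: draw many samples from a model satisfying
# Assumptions I-VI and re-estimate the OLS slope each time.
rng = np.random.default_rng(339)
beta1_pop = 3.0                       # invented "true" population slope
estimates = []

for _ in range(2_000):
    x = rng.normal(5, 2, 200)
    u = rng.normal(0, 1, 200)         # zero mean, constant variance, uncorrelated
    y = 2 + beta1_pop * x + u
    estimates.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

# Unbiasedness: the estimates average out to the population parameter
print(round(float(np.mean(estimates)), 2))   # close to 3
```

Plotting a histogram of `estimates` would recover the bell-shaped sampling distribution centered on `\(\beta_{pop}\)` shown in the "What is bias?" slide.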