class: center, middle, inverse, title-slide

# Multiple Linear Regression: Inference
## EC 320: Introduction to Econometrics
### Winter 2022

---
class: inverse, middle

# Prologue

---
# Housekeeping

- Problem Set 03 out
  - Due next Monday 11:59 p.m.
- Lab today & Ex 06 due today
- Midterm next Wednesday
- Update on lecture09, Classical Assumption slides

---
# Review

Suppose that an economist studies the effect of years of schooling on hourly earnings by estimating

`$$\text{Earnings}_i = \beta_0 + \beta_1 \text{Schooling}_i + u_i.$$`

1. What do we have to assume to interpret `\(\beta_1\)` as the true effect of schooling on earnings?
2. What omitted variables would bias the estimator of `\(\beta_1\)`?
3. For each omitted variable, how would you sign the bias?

---
class: inverse, middle

# OLS Variances

---
# OLS Variances

Multiple regression model: `\(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_{m} X_{mi} + u_i\)`.

--

The variance of a slope estimator `\(\hat{\beta_j}\)` on an independent variable `\(X_{j}\)` is

`$$\mathop{\text{Var}} \left( \hat{\beta_j} \right) = \dfrac{\sigma^2}{\left( 1 - R^2_j \right)\sum_{i=1}^n \left( X_{ji} - \bar{X}_j \right)^2},$$`

where `\(j \in \{1,2,\dots,m\}\)`, `\(R^2_j\)` is the `\(R^2\)` from a regression of `\(X_{j}\)` on the other independent variables and an intercept, and `\(X_{ji}\)` denotes the `\(i^\text{th}\)` observation of explanatory variable `\(X_{j}\)`.

---
# OLS Variances

`$$\mathop{\text{Var}} \left( \hat{\beta_j} \right) = \dfrac{\sigma^2}{\left( 1 - R^2_j \right)\sum_{i=1}^n \left( X_{ji} - \bar{X}_j \right)^2}$$`

## Moving parts

1. **Error variance:** As `\(\sigma^2\)` increases, `\(\mathop{\text{Var}} \left( \hat{\beta_j} \right)\)` increases.

--

2. **Total variation in** `\(X_j\)`**:** As `\(\sum_{i=1}^n \left( X_{ji} - \bar{X}_j \right)^2\)` increases, `\(\mathop{\text{Var}} \left( \hat{\beta_j} \right)\)` decreases.

--

3. **Relationships between independent variables:** As `\(R^2_j\)` increases, `\(\mathop{\text{Var}} \left( \hat{\beta_j} \right)\)` increases.

---
# Multicollinearity

Suppose that we want to understand the relationship between crime rates and poverty rates in US cities. We could estimate the model

`$$\text{Crime}_i = \beta_0 + \beta_1 \text{Poverty}_i + \beta_2 \text{Income}_i + u_i,$$`

where `\(\text{Income}_i\)` controls for median income in city `\(i\)`.

--

Before obtaining standard errors and conducting hypothesis tests, we need:

`$$\mathop{\text{Var}} \left( \hat{\beta}_1 \right) = \dfrac{\sigma^2}{\left( 1 - R^2_1 \right)\sum_{i=1}^n \left( \text{Poverty}_{i} - \overline{\text{Poverty}} \right)^2}.$$`

--

`\(R^2_1\)` is the `\(R^2\)` from a regression of poverty on median income:

`$$\text{Poverty}_i = \gamma_0 + \gamma_1 \text{Income}_i + v_i.$$`

---
# Multicollinearity

**Scenario 1:** If `\(\text{Income}_i\)` explains most of the variation in `\(\text{Poverty}_i\)`, then `\(R^2_1\)` will approach one.

- If `\(R^2_1\)` is one, then `\(\text{Poverty}_i\)` and `\(\text{Income}_i\)` are perfectly collinear (violates the _no perfect collinearity_ assumption).

--

**Scenario 2:** If `\(\text{Income}_i\)` explains none of the variation in `\(\text{Poverty}_i\)`, then `\(R^2_1\)` is zero.

--

**Question:** In which scenario is the variance of the poverty coefficient smaller?

`$$\mathop{\text{Var}} \left( \hat{\beta}_1 \right) = \dfrac{\sigma^2}{\left( 1 - R^2_1 \right)\sum_{i=1}^n \left( \text{Poverty}_{i} - \overline{\text{Poverty}} \right)^2}$$`

--

**Answer:** Scenario 2.
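---
# Multicollinearity: Checking `\(R^2_j\)` in R

A minimal simulation sketch of the two scenarios (the variable names and magnitudes here are made up for illustration, not course data): compute `\(R^2_1\)` by regressing one regressor on the other, then plug it into the variance formula.

```r
set.seed(320)
n       <- 500
income  <- rnorm(n)
poverty <- -0.8 * income + rnorm(n)  # correlated with income

# R^2 from regressing poverty on the other independent variable
r2_1 <- summary(lm(poverty ~ income))$r.squared

# Var(beta1-hat), up to the scale factor sigma^2
1 / ((1 - r2_1) * sum((poverty - mean(poverty))^2))
```

Pushing the assumed `-0.8` coefficient toward a perfect linear relationship drives `\(R^2_1\)` toward one and inflates the variance; setting it to zero reproduces Scenario 2.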
---
# Multicollinearity

<img src="12-Multiple_Linear_Regression_Inference_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" />

---
# Multicollinearity

<img src="12-Multiple_Linear_Regression_Inference_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

---
# Multicollinearity

As the relationships among the independent variables strengthen, `\(R^2_j\)` increases. For high `\(R^2_j\)`, `\(\mathop{\text{Var}} \left( \hat{\beta_j} \right)\)` is large:

`$$\mathop{\text{Var}} \left( \hat{\beta_j} \right) = \dfrac{\sigma^2}{\left( 1 - R^2_j \right)\sum_{i=1}^n \left( X_{ji} - \bar{X}_j \right)^2}.$$`

--

This phenomenon is known as .hi[multicollinearity].

- Some view multicollinearity as a "problem" to be solved.
- Can increase `\(n\)` or drop independent variables that are highly related to the others.
- .hi[Warning:] Dropping variables can generate omitted variable bias.

---
# Multicollinearity

**Example:** Effect of different types of school spending on high school graduation rates.

$$
`\begin{aligned}
\text{Graduation}_i = \beta_0 &+ \beta_1\text{Salaries}_i + \beta_2 \text{Athletics}_i \\
& + \beta_3 \text{Textbooks}_i + \beta_4 \text{Facilities}_i + u_i
\end{aligned}`
$$

- Schools that spend more on teachers also tend to spend more on athletic programs, textbooks, and building maintenance.
- While total spending likely has a statistically significant effect on graduation rates, we might not be able to detect statistically significant effects for individual line items.

--

**Potential solutions:** Re-define the research question to consider the effect of total spending on graduation rates _or_ gather more data to decrease OLS variances (*i.e.*, increase `\(n\)`).

---
# Irrelevant Variables

Suppose that the true relationship between birth weight and _in utero_ exposure to toxic air pollution is

`$$(\text{Birth Weight})_i = \beta_0 + \beta_1 \text{Pollution}_i + u_i.$$`

--

Suppose that, instead of estimating the "true model," an analyst estimates

`$$(\text{Birth Weight})_i = \tilde{\beta_0} + \tilde{\beta_1} \text{Pollution}_i + \tilde{\beta_2} \text{NBA}_i + u_i,$$`

where `\(\text{NBA}_i\)` is the record of the nearest NBA team during the season before birth.

--

One can show that `\(\mathop{\mathbb{E}} \left( \hat{\tilde{\beta_1}} \right) = \beta_1\)` (*i.e.*, `\(\hat{\tilde{\beta_1}}\)` is unbiased). However, the variances of `\(\hat{\tilde{\beta_1}}\)` and `\(\hat{\beta_1}\)` differ.

---
# Irrelevant Variables

<img src="12-Multiple_Linear_Regression_Inference_files/figure-html/venn-1.svg" style="display: block; margin: auto;" />

---
# Irrelevant Variables

The variance of `\(\hat{\beta_1}\)` from estimating the "true model" is

`$$\mathop{\text{Var}} \left( \hat{\beta_1} \right) = \dfrac{\sigma^2}{\sum_{i=1}^n \left( \text{Pollution}_{i} - \overline{\text{Pollution}} \right)^2}.$$`

The variance of `\(\hat{\tilde{\beta_1}}\)` from estimating the model with the irrelevant variable is

`$$\mathop{\text{Var}} \left( \hat{\tilde{\beta_1}} \right) = \dfrac{\sigma^2}{\left( 1 - R^2_1 \right)\sum_{i=1}^n \left( \text{Pollution}_{i} - \overline{\text{Pollution}} \right)^2}.$$`

Notice that `\(\mathop{\text{Var}} \left( \hat{\beta_1} \right) \leq \mathop{\text{Var}} \left( \hat{\tilde{\beta_1}} \right)\)`.

--

.hi[Including irrelevant control variables can increase OLS variances!]
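---
# Irrelevant Variables: A Simulation

A minimal sketch of the variance comparison (simulated data; the coefficients and the correlation between `pollution` and `nba` are assumptions for illustration). Both estimators recover `\(\beta_1\)` on average, but the standard error grows when the irrelevant variable is correlated with pollution.

```r
set.seed(320)
n         <- 1000
pollution <- rnorm(n)
nba       <- 0.6 * pollution + rnorm(n)      # irrelevant, but correlated
birth_wt  <- 7 - 0.5 * pollution + rnorm(n)  # nba has no true effect

true_model  <- lm(birth_wt ~ pollution)
extra_model <- lm(birth_wt ~ pollution + nba)

# Compare the standard errors on the pollution coefficient
summary(true_model)$coefficients["pollution", "Std. Error"]
summary(extra_model)$coefficients["pollution", "Std. Error"]
```

Setting the assumed `0.6` to zero makes `\(R^2_1 \approx 0\)`, and the two standard errors converge.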
---
# Estimating Error Variance

We cannot observe `\(\sigma^2\)`, so we must estimate it using the residuals from an estimated regression:

`$$s_u^2 = \dfrac{\sum_{i=1}^n \hat{u}_i^2}{n - k}$$`

- `\(k\)` is the number of estimated parameters (one "slope" for each `\(X\)` variable, plus an intercept). With `\(m\)` explanatory variables, `\(k = m + 1\)` because we also estimate an intercept parameter.
- `\(n - k\)` .mono[=] degrees of freedom.
- Using the first five OLS assumptions, one can prove that `\(s_u^2\)` is an unbiased estimator of `\(\sigma^2\)`.

---
# Standard Errors

The formula for the standard error is the square root of `\(\mathop{\text{Var}} \left( \hat{\beta_j} \right)\)`:

`$$\mathop{\text{SE}}(\hat{\beta_j}) = \sqrt{ \frac{s^2_u}{( 1 - R^2_j ) \sum_{i=1}^n ( X_{ji} - \bar{X}_j )^2} }.$$`

---
class: inverse, middle

# Inference

---
# OLS Classical Assumptions

- **A1. Linearity:** The population relationship is linear in parameters with an additive error term.
- **A2. No perfect collinearity:** No `\(X\)` variable is a perfect linear combination of the others.
<!-- 3. **Random Sampling:** We have a random sample from the population of interest. -->
- **A3. Exogeneity:** The `\(X\)` variables are exogenous (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`).
- **A4. Homoskedasticity:** The error term has the same variance for each value of the independent variables (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`).
- **A5. Non-autocorrelation:** The error terms are independent of one another (*i.e.,* `\(\mathop{\mathbb{E}}[u_i u_j] = 0\)` for all `\(i \neq j\)`).
- **A6. Normality:** The population error term is normally distributed with mean zero and variance `\(\sigma^2\)` (*i.e.,* `\(u \sim N(0,\sigma^2)\)`).

--

A1-A3 imply .hi[unbiasedness.]

--

A4-A5 imply .hi[efficiency.]

---
# Normality

With the first five assumptions, normality buys us a __sampling distribution__ for `\(\hat{\beta_j}\)`:

- `\(\hat{\beta_j} \sim \mathop{\text{Normal}}\left( \beta_j, \ \mathop{\text{Var}} \left( \hat{\beta_j} \right) \right)\)`
- `\(\frac{\hat{\beta_j} - \beta_j}{\sqrt{\mathop{\text{Var}} \left( \hat{\beta_j} \right)}} \sim \mathop{\text{Normal}}(0, 1)\)`

--

Common violations of the underlying assumptions: .hi-green[autocorrelation] and .hi-orange[spatially correlated errors] (both break A5).

---
# Sampling Distribution

In practice, we can only estimate `\(\sigma^2\)`, so we use the `\(t\)` distribution:

- `\(\frac{\hat{\beta_j} - \beta_j}{\mathop{\text{SE}} \left( \hat{\beta_j} \right)} \sim t_{n-k} = t_{\text{df}}\)`.
- Use this to construct `\(t\)`-statistics and conduct hypothesis tests.

--

Where are the critical values?

- Critical values describe specific quantiles of the `\(t_{\text{df}}\)` distribution.
- `\(t_{\text{df}}\)` is the entire sampling distribution.

---
# Hypothesis Testing

**Conduct a one-sided (right tail) test at the 5% level.**

```r
lm(read4 ~ lexppp + lunch, data = meap01) %>% tidy()
```

```
#> # A tibble: 3 × 5
#>   term        estimate std.error statistic   p.value
#>   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
#> 1 (Intercept)  -14.0      14.2      -0.989 3.23e-  1
*#> 2 lexppp        10.8       1.68      6.45  1.40e- 10
#> 3 lunch         -0.463     0.0136  -33.9   5.72e-196
```

--

H.sub[0]: `\(\beta_\text{Spend} = 0\)` *vs.* H.sub[a]: `\(\beta_\text{Spend} > 0\)`

--

`\(t_\text{stat} = 6.45\)` and `\(t_{0.95, \ 1823-3} = 1.65\)`

--

Reject H.sub[0] if `\(t_\text{stat} = 6.45 > t_{0.95, \ 1823-3} = 1.65\)`.

--

Statement is true, so we .hi[reject H.sub[0]] at the 5% level.
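---
# Hypothesis Testing: Critical Values in R

The critical value and p-value for the test above can be checked directly in R (a quick sketch; the numbers come from the regression output on the previous slide).

```r
t_stat <- 6.45
df     <- 1823 - 3

# Right-tail critical value at the 5% level (about 1.65)
qt(0.95, df = df)

# Reject H0 when the t statistic exceeds the critical value
t_stat > qt(0.95, df = df)

# Corresponding right-tail p-value
pt(t_stat, df = df, lower.tail = FALSE)
```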
---
# Hypothesis Testing

**Conduct a one-sided (left tail) test at the 5% level.**

```r
lm(read4 ~ lexppp + lunch, data = meap01) %>% tidy()
```

```
#> # A tibble: 3 × 5
#>   term        estimate std.error statistic   p.value
#>   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
#> 1 (Intercept)  -14.0      14.2      -0.989 3.23e-  1
*#> 2 lexppp        10.8       1.68      6.45  1.40e- 10
#> 3 lunch         -0.463     0.0136  -33.9   5.72e-196
```

--

H.sub[0]: `\(\beta_\text{Spend} = 0\)` *vs.* H.sub[a]: `\(\beta_\text{Spend} < 0\)`

--

`\(t_\text{stat} = 6.45\)` and `\(t_{0.95, \ 1823-3} = 1.65\)`

--

Reject H.sub[0] if `\(t_\text{stat} = 6.45 < -t_{0.95, \ 1823-3} = -1.65\)`.

--

Statement is false, so we .hi[fail to reject H.sub[0]] at the 5% level.

---
# Hypothesis Testing

**Conduct a two-sided test at the 5% level.**

```r
lm(read4 ~ lexppp + lunch, data = meap01) %>% tidy()
```

```
#> # A tibble: 3 × 5
#>   term        estimate std.error statistic   p.value
#>   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
#> 1 (Intercept)  -14.0      14.2      -0.989 3.23e-  1
*#> 2 lexppp        10.8       1.68      6.45  1.40e- 10
#> 3 lunch         -0.463     0.0136  -33.9   5.72e-196
```

--

H.sub[0]: `\(\beta_\text{Spend} = 0\)` *vs.* H.sub[a]: `\(\beta_\text{Spend} \neq 0\)`

--

`\(t_\text{stat} = 6.45\)` and `\(t_{0.975, \ 1823-3} = 1.96\)`

--

Reject H.sub[0] if `\(|t_\text{stat}| = |6.45| > t_{0.975, \ 1823-3} = 1.96\)`.

--

Statement is true, so we .hi[reject H.sub[0]] at the 5% level.

---
# Hypothesis Testing

**Conduct a two-sided test at the 5% level.**

```r
lm(read4 ~ lexppp + lunch, data = meap01) %>% tidy()
```

```
#> # A tibble: 3 × 5
#>   term        estimate std.error statistic   p.value
#>   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
#> 1 (Intercept)  -14.0      14.2      -0.989 3.23e-  1
#> 2 lexppp        10.8       1.68      6.45  1.40e- 10
*#> 3 lunch         -0.463     0.0136  -33.9   5.72e-196
```

H.sub[0]: `\(\beta_\text{Lunch} = -1\)` *vs.* H.sub[a]: `\(\beta_\text{Lunch} \neq -1\)`

--

`\(t_\text{stat} = \frac{\hat{\beta}_\text{Lunch} - \beta_\text{Lunch}^0}{\mathop{\text{SE}}(\hat{\beta}_\text{Lunch})} = 39.49\)` and `\(t_{0.975, \ 1823-3} = 1.96\)`

--

Reject H.sub[0] if `\(|t_\text{stat}| = |39.49| > t_{0.975, \ 1823-3} = 1.96\)`.

--

Statement is true, so we .hi[reject H.sub[0]] at the 5% level.

---
# *F* Tests

.hi-purple[*t* tests] allow us to test simple hypotheses involving a .purple[single parameter].

- _e.g._, `\(\beta_1 = 0\)` or `\(\beta_2 = 1\)`.

--

.hi[*F* tests] allow us to test hypotheses that involve .pink[multiple parameters].

- _e.g._, `\(\beta_1 = \beta_2\)` or `\(\beta_3 + \beta_4 = 1\)`.

---
# *F* Tests

**Example** Economists often say that "money is fungible." We might want to test whether money received as income actually has the same effect on consumption as money received from tax credits.
`$$\text{Consumption}_i = \beta_0 + \beta_1 \text{Income}_{i} + \beta_2 \text{Credit}_i + u_i$$`

---
# *F* Tests

**Example, continued**

We can write our null hypothesis as

`$$H_0:\: \beta_1 = \beta_2 \iff H_0 :\: \beta_1 - \beta_2 = 0$$`

Imposing the null hypothesis gives us a **restricted model**

`$$\text{Consumption}_i = \beta_0 + \beta_1 \text{Income}_{i} + \beta_1 \text{Credit}_i + u_i$$`

`$$\text{Consumption}_i = \beta_0 + \beta_1 \left( \text{Income}_{i} + \text{Credit}_i \right) + u_i$$`

---
# *F* Tests

**Example, continued**

To test the null hypothesis `\(H_0 :\: \beta_1 = \beta_2\)` against `\(H_a :\: \beta_1 \neq \beta_2\)`, <br>we use the `\(F\)` statistic

$$
`\begin{align}
  F_{q,\,n-k} = \dfrac{\left(\text{RSS}_r - \text{RSS}_u\right)/q}{\text{RSS}_u/(n-k)}
\end{align}`
$$

which (as its name suggests) follows the `\(F\)` distribution with `\(q\)` numerator degrees of freedom and `\(n-k\)` denominator degrees of freedom. Here, `\(q\)` is the number of restrictions we impose via `\(H_0\)`.

---
# *F* Tests

**Example, continued**

The term `\(\text{RSS}_r\)` is the sum of squared residuals (RSS) from our **restricted model**

`$$\text{Consumption}_i = \beta_0 + \beta_1 \left( \text{Income}_{i} + \text{Credit}_i \right) + u_i$$`

and `\(\text{RSS}_u\)` is the sum of squared residuals (RSS) from our **unrestricted model**

`$$\text{Consumption}_i = \beta_0 + \beta_1 \text{Income}_{i} + \beta_2 \text{Credit}_i + u_i$$`

---
# *F* Tests

Finally, we compare our `\(F\)`-statistic to a critical value of `\(F\)` to test the null hypothesis. If `\(F > F_\text{crit}\)`, then reject the null hypothesis at the `\(\alpha \times 100\)` percent level.

- Find `\(F_\text{crit}\)` in a table using the desired significance level, numerator degrees of freedom, and denominator degrees of freedom.

--

**Aside:** Why are `\(F\)`-statistics always positive?

---
# *F* Tests

RSS is usually a large, cumbersome number.

**Alternative:** Calculate the `\(F\)`-statistic using `\(R^2\)`:

$$
`\begin{align}
  F = \dfrac{\left(R^2_u - R^2_r\right)/q}{ (1 - R^2_u)/(n-k)}
\end{align}`
$$

--

Where does this come from?

- `\(\text{TSS} = \text{RSS} + \text{ESS}\)`
- `\(R^2 = \text{ESS}/\text{TSS}\)`
- `\(\text{RSS}_r = \text{TSS}(1-R^2_r)\)`
- `\(\text{RSS}_u = \text{TSS}(1-R^2_u)\)`
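---
# *F* Tests in R

A sketch of how the consumption example could be run in R. The data are simulated (the variable names and coefficient values are assumptions, not real estimates); `anova()` computes the `\(F\)`-statistic from the two models' RSS values, exactly as in the formula above.

```r
set.seed(320)
n <- 1000
income <- rnorm(n, mean = 50, sd = 10)
credit <- rnorm(n, mean = 5,  sd = 2)
consumption <- 10 + 0.8 * income + 0.8 * credit + rnorm(n, sd = 5)

unrestricted <- lm(consumption ~ income + credit)
restricted   <- lm(consumption ~ I(income + credit))  # imposes beta1 = beta2

# Manual F statistic: q = 1 restriction, k = 3 parameters
rss_r <- sum(resid(restricted)^2)
rss_u <- sum(resid(unrestricted)^2)
((rss_r - rss_u) / 1) / (rss_u / (n - 3))

# The same test in one line
anova(restricted, unrestricted)
```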
---
class: inverse, middle

# Application: Hedonic Modeling

---
# Hedonic Modeling

**Questions**

- How much are home buyers willing to pay for houses with additional bedrooms?
- How much salary are workers willing to give up in exchange for safer working conditions?
- What is the market value of my neighbor's house?

--

**Answers?** .hi[Hedonic modeling] is a specific application of multiple regression.

- Prices or wages on the left-hand side.
- Attributes of a good or a job on the right-hand side.
- Use coefficient estimates and fitted values.

---
# Hedonic Modeling

## Example

Using data on home sales, you run a regression and obtain the fitted model

`$$\hat{\text{Price}}_i = 75000 + 50 \cdot (\text{Sq. ft.})_i + 16000 \cdot \text{Bedrooms}_i + 10000 \cdot \text{Bathrooms}_i$$`

--

What is the forecasted price of a 1000-square-foot house with 1 bedroom and 1 bathroom?

--

`$$\hat{\text{Price}} = 75000 + 50 \cdot (1000) + 16000 \cdot (1) + 10000 \cdot (1) = 151{,}000$$`

--

A homeowner is thinking about adding 1500 square feet to her home, with 3 more bedrooms and an additional bathroom. How much extra money could she expect if she completed the addition and sold her home?

--

`$$\Delta\text{Price} = 50 \cdot (1500) + 16000 \cdot (3) + 10000 \cdot (1) = 133{,}000$$`
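---
# Hedonic Modeling in R

A sketch of the forecast using `lm()` and `predict()`. The `homes` data frame is simulated around the fitted coefficients from the example (an assumption for illustration, not actual home-sales data).

```r
set.seed(320)
n <- 200
homes <- data.frame(
  sqft      = runif(n, 800, 3500),
  bedrooms  = sample(1:5, n, replace = TRUE),
  bathrooms = sample(1:3, n, replace = TRUE)
)
homes$price <- 75000 + 50 * homes$sqft + 16000 * homes$bedrooms +
  10000 * homes$bathrooms + rnorm(n, sd = 20000)

hedonic <- lm(price ~ sqft + bedrooms + bathrooms, data = homes)

# Forecasted price of a 1000 sq. ft., 1-bed, 1-bath house
predict(hedonic,
        newdata = data.frame(sqft = 1000, bedrooms = 1, bathrooms = 1))
```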