class: center, middle, inverse, title-slide # Simple Linear Regression: Inference ## EC 320: Introduction to Econometrics ### Winter 2022 --- class: inverse, middle # Prologue --- # Housekeeping - Lab today & Ex05 due today - Midterm 1 solution posted - Extra OH 7pm-8pm on Zoom --- # Last Time We discussed the .hi-green[classical assumptions of OLS:] > 1. **Linearity:** The population relationship is linear in parameters with an additive error term. 2. **Sample Variation:** There is variation in `\(X\)`. 3. **Random Sampling:** We have a random sample from the population of interest. 4. **Exogeneity:** The `\(X\)` variable is exogenous (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`). 5. **Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`). 6. **Normality:** The population error term is normally distributed with mean zero and variance `\(\sigma^2\)` (*i.e.,* `\(u \sim N(0,\sigma^2)\)`) We restricted our attention to the first 5 assumptions. --- count: false # Last Time We discussed the .hi-green[classical assumptions of OLS:] > 1. **Linearity:** The population relationship is linear in parameters with an additive error term. 2. **Sample Variation:** There is variation in `\(X\)`. 3. **Random Sampling:** We have a random sample from the population of interest. 4. **Exogeneity:** The `\(X\)` variable is exogenous (*i.e.,* `\(\mathop{\mathbb{E}}\left( u|X \right) = 0\)`). 5. **Homoskedasticity:** The error term has the same variance for each value of the independent variable (*i.e.,* `\(\mathop{\text{Var}}(u|X) = \sigma^2\)`). 6. .hi[Normality:] .pink[The population error term is normally distributed with mean zero and variance] `\(\color{#e64173}{\sigma^2}\)` .pink[(*i.e.,*] `\(\color{#e64173}{u \sim N(0,\sigma^2)}\)`.pink[)] We restricted our attention to the first 5 assumptions. --- # Classical Assumptions ## Last Time 1. 
We used the first 4 assumptions to show that OLS is unbiased: `\(\mathop{\mathbb{E}}\left[ \hat{\beta} \right] = \beta\)` 2. We used the first 5 assumptions to derive a formula for the __variance__ of the OLS estimator: `\(\mathop{\text{Var}}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}\)`. --- # Classical Assumptions ## Today We will use the sampling distribution of `\(\hat{\beta}_1\)` to conduct hypothesis tests. - Can use all 6 classical assumptions to show that OLS is normally distributed: `$$\hat{\beta}_1 \sim \mathop{N}\left( \beta_1, \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2} \right)$$` - We'll "prove" this using .mono[R]. --- # Simulation .pull-left[ <img src="10-Simple_Linear_Regression_Inference_files/figure-html/pop1-1.svg" style="display: block; margin: auto;" /> .center[**Population**] ] -- .pull-right[ <img src="10-Simple_Linear_Regression_Inference_files/figure-html/scatter1-1.svg" style="display: block; margin: auto;" /> .center[**Population relationship**] $$ Y_i = 2.53 + 0.57 X_i + u_i $$ $$ Y_i = \beta_0 + \beta_1 X_i + u_i $$ ] --- # Simulation .pull-left[ <img src="10-Simple_Linear_Regression_Inference_files/figure-html/sample1-1.svg" style="display: block; margin: auto;" /> .center[**Sample 1:** 30 random individuals] ] -- .pull-right[ <img src="10-Simple_Linear_Regression_Inference_files/figure-html/sample1 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(Y_i = 2.53 + 0.57 X_i + u_i\)` **Sample relationship** <br> `\(\hat{Y}_i = 2.36 + 0.61 X_i\)` ] ] --- # Simulation .pull-left[ <img src="10-Simple_Linear_Regression_Inference_files/figure-html/sample2-1.svg" style="display: block; margin: auto;" /> .center[**Sample 2:** 30 random individuals] ] .pull-right[ <img src="10-Simple_Linear_Regression_Inference_files/figure-html/sample2 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(Y_i = 2.53 + 0.57 X_i + u_i\)`
**Sample relationship** <br> `\(\hat{Y}_i = 2.79 + 0.56 X_i\)` ] ] --- # Simulation .pull-left[ <img src="10-Simple_Linear_Regression_Inference_files/figure-html/sample3-1.svg" style="display: block; margin: auto;" /> .center[**Sample 3:** 30 random individuals] ] .pull-right[ <img src="10-Simple_Linear_Regression_Inference_files/figure-html/sample3 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(Y_i = 2.53 + 0.57 X_i + u_i\)` **Sample relationship** <br> `\(\hat{Y}_i = 3.21 + 0.45 X_i\)` ] ] --- layout: false class: white-slide, middle Repeat **10,000 times** (Monte Carlo simulation). --- class: white-slide <img src="10-Simple_Linear_Regression_Inference_files/figure-html/simulation scatter-1.png" style="display: block; margin: auto;" /> --- class: white-slide, middle .pull-left[ .center[ **Intercept Estimates** ] <img src="10-Simple_Linear_Regression_Inference_files/figure-html/simulation hist1-1.png" style="display: block; margin: auto;" /> ] .pull-right[ .center[ **Slope Estimates** ] <img src="10-Simple_Linear_Regression_Inference_files/figure-html/simulation hist2-1.png" style="display: block; margin: auto;" /> ] --- # Simulation Can you spot the classical assumptions? ```r # Set population and sample sizes n_p <- 100 n_s <- 30 # Generate population data pop_df <- tibble( x = rnorm(n_p, mean = 5, sd = 1.5), e = rnorm(n_p, mean = 0, sd = 1), y = 2.53 + 0.57 * x + e ) # Define simulation procedure sim_ols <- function(x, size = n_s) { lm(y ~ x, data = pop_df %>% sample_n(size = size)) %>% tidy() %>% mutate(iteration = x) } # Run simulation sim_df <- map_df(1:10000, ~sim_ols(.x, size = n_s)) ``` --- count: false # Simulation Can you spot the classical assumptions? 
```r # Set population and sample sizes *n_p <- 100 *n_s <- 30 # Generate population data pop_df <- tibble( x = rnorm(n_p, mean = 5, sd = 1.5), e = rnorm(n_p, mean = 0, sd = 1), y = 2.53 + 0.57 * x + e ) # Define simulation procedure sim_ols <- function(x, size = n_s) { lm(y ~ x, data = pop_df %>% sample_n(size = size)) %>% tidy() %>% mutate(iteration = x) } # Run simulation sim_df <- map_df(1:10000, ~sim_ols(.x, size = n_s)) ``` --- count: false # Simulation Can you spot the classical assumptions? ```r # Set population and sample sizes n_p <- 100 n_s <- 30 # Generate population data pop_df <- tibble( * x = rnorm(n_p, mean = 5, sd = 1.5), e = rnorm(n_p, mean = 0, sd = 1), y = 2.53 + 0.57 * x + e ) # Define simulation procedure sim_ols <- function(x, size = n_s) { lm(y ~ x, data = pop_df %>% sample_n(size = size)) %>% tidy() %>% mutate(iteration = x) } # Run simulation sim_df <- map_df(1:10000, ~sim_ols(.x, size = n_s)) ``` --- count: false # Simulation Can you spot the classical assumptions? ```r # Set population and sample sizes n_p <- 100 n_s <- 30 # Generate population data pop_df <- tibble( x = rnorm(n_p, mean = 5, sd = 1.5), * e = rnorm(n_p, mean = 0, sd = 1), y = 2.53 + 0.57 * x + e ) # Define simulation procedure sim_ols <- function(x, size = n_s) { lm(y ~ x, data = pop_df %>% sample_n(size = size)) %>% tidy() %>% mutate(iteration = x) } # Run simulation sim_df <- map_df(1:10000, ~sim_ols(.x, size = n_s)) ``` --- count: false # Simulation Can you spot the classical assumptions? 
```r # Set population and sample sizes n_p <- 100 n_s <- 30 # Generate population data pop_df <- tibble( x = rnorm(n_p, mean = 5, sd = 1.5), e = rnorm(n_p, mean = 0, sd = 1), * y = 2.53 + 0.57 * x + e ) # Define simulation procedure sim_ols <- function(x, size = n_s) { lm(y ~ x, data = pop_df %>% sample_n(size = size)) %>% tidy() %>% mutate(iteration = x) } # Run simulation sim_df <- map_df(1:10000, ~sim_ols(.x, size = n_s)) ``` --- count: false # Simulation Can you spot the classical assumptions? ```r # Set population and sample sizes n_p <- 100 n_s <- 30 # Generate population data pop_df <- tibble( x = rnorm(n_p, mean = 5, sd = 1.5), e = rnorm(n_p, mean = 0, sd = 1), y = 2.53 + 0.57 * x + e ) # Define simulation procedure *sim_ols <- function(x, size = n_s) { * lm(y ~ x, data = pop_df %>% sample_n(size = size)) %>% * tidy() %>% * mutate(iteration = x) *} # Run simulation sim_df <- map_df(1:10000, ~sim_ols(.x, size = n_s)) ``` --- count: false # Simulation Can you spot the classical assumptions? ```r # Set population and sample sizes n_p <- 100 n_s <- 30 # Generate population data pop_df <- tibble( x = rnorm(n_p, mean = 5, sd = 1.5), e = rnorm(n_p, mean = 0, sd = 1), y = 2.53 + 0.57 * x + e ) # Define simulation procedure sim_ols <- function(x, size = n_s) { lm(y ~ x, data = pop_df %>% sample_n(size = size)) %>% tidy() %>% mutate(iteration = x) } # Run simulation *sim_df <- map_df(1:10000, ~sim_ols(.x, size = n_s)) ``` --- class: inverse, middle # Inference --- # Motivation What does statistical evidence say about existing theories? We want to test hypotheses posed by politicians, economists, scientists, people with foil hats, _etc._ - Does building a giant wall **reduce crime**? - Does shutting down a government **adversely affect the economy**? - Does legal cannabis **reduce drunk driving** or **reduce opioid use**? - Do air quality standards **improve health** or **reduce jobs**? 
-- While uncertainty exists, we can still conduct *reliable* statistical tests (rejecting or failing to reject a hypothesis). --- # Inference We know OLS has some nice properties, and we know how to estimate an intercept and slope coefficient using OLS. Our current workflow: - Get data (points with `\(X\)` and `\(Y\)` values). - Regress `\(Y\)` on `\(X\)`. - Plot the fitted values (*i.e.*, `\(\hat{Y_i} = \hat{\beta}_0 + \hat{\beta}_1X_i\)`) and report the estimates. -- But how do we actually **learn** something from this exercise? -- - Based upon our value of `\(\hat{\beta}_1\)`, can we rule out previously hypothesized values? - How confident should we be in the precision of our estimates? -- We need to be able to deal with uncertainty. Enter: **Inference.** --- # Inference We use the standard error of `\(\hat{\beta}_1\)`, along with `\(\hat{\beta}_1\)` itself, to learn about the parameter `\(\beta_1\)`. After deriving the distribution of `\(\hat{\beta}_1\)`,<sup>.pink[†]</sup> we have two (related) options for formal statistical inference (learning) about our unknown parameter `\(\beta_1\)`: - **Hypothesis tests:** Determine whether there is statistically significant evidence to reject a hypothesized value or range of values. - **Confidence intervals:** Use the estimate and its standard error to create an interval that, when repeated, will generally<sup>.pink[††]</sup> contain the true parameter. .footnote[ .pink[†] *Hint:* It's normal with mean `\(\beta_1\)` and variance `\(\frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}\)`. <br> .pink[††] _E.g._, similarly constructed 95% confidence intervals will contain the true parameter 95% of the time. ] --- # OLS Variance Hypothesis tests and confidence intervals require information about the variance of the OLS estimator: `$$\mathop{\text{Var}}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2}.$$` -- **Problem** - The variance formula has a population parameter: `\(\sigma^2\)` (a.k.a. error variance). 
- We can't observe population parameters. -- - **Solution:** Estimate `\(\sigma^2\)`. --- # Estimating Error Variance ## Learning from our (prediction) errors We can estimate the variance of `\(u_i\)` (a.k.a. `\(\sigma^2\)`) using the sum of squared residuals: $$ s^2_u = \dfrac{\sum_i \hat{u}_i^2}{n - k} $$ where `\(k\)` gives the number of regression parameters. - In a simple linear regression, `\(k=2\)`. - `\(s^2_u\)` is an unbiased estimator of `\(\sigma^2\)`. --- # OLS Variance, Take 2 With `\(s^2_u = \dfrac{\sum_i \hat{u}_i^2}{n - k}\)`, we can calculate `$$\mathop{\text{Var}}(\hat{\beta}_1) = \frac{s^2_u}{\sum_{i=1}^n (X_i - \bar{X})^2}.$$` -- Taking the square root, we get the __standard error__ of the OLS estimator: `$$\mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right) = \sqrt{ \frac{s^2_u}{\sum_{i=1}^n (X_i - \bar{X})^2} }.$$` - Standard error .mono[=] standard deviation of an estimator. --- # Standard Errors .mono[R]'s `lm()` function estimates standard errors out of the box: ```r tidy(lm(y ~ x, pop_df)) ``` ``` #> # A tibble: 2 × 5 #> term estimate std.error statistic p.value #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 2.53 0.422 6.00 3.38e- 8 #> 2 x 0.567 0.0793 7.15 1.59e-10 ``` I won't ask you to estimate standard errors by hand! --- class: inverse, middle # Hypothesis Tests --- # Hypothesis Tests __Null hypothesis (H.sub[0]):__ `\(\beta_1 = 0\)` __Alternative hypothesis (H.sub[a]):__ `\(\beta_1 \neq 0\)` -- There are four possible outcomes of our test: 1. We __fail to reject__ the null hypothesis and the null is true. 2. We __reject__ the null hypothesis and the null is false. 3. We __reject__ the null hypothesis, but the null is actually true (**Type I error**). 4. We __fail to reject__ the null hypothesis, but the null is actually false (**Type II error**). --- # Hypothesis Tests **Goal:** Make a statement about `\(\beta_1\)` using information on `\(\hat{\beta}_1\)`. 
-- `\(\hat{\beta}_1\)` is random: it could be anything, even if `\(\beta_1 = 0\)` is true. - But if `\(\beta_1 = 0\)` is true, then `\(\hat{\beta}_1\)` is unlikely to take values far from zero. - As the standard error shrinks, we are even less likely to observe "extreme" values of `\(\hat{\beta}_1\)` (assuming `\(\beta_1 = 0\)`). -- Our test should take .pink[extreme values] of `\(\hat{\beta}_1\)` as .pink[evidence against the null hypothesis], but it should also weight them by what we know about the variance of `\(\hat{\beta}_1\)`. --- # Hypothesis Tests .pull-left[ __Null hypothesis__ H.sub[0]: `\(\beta_1 = 0\)` ] .pull-right[ __Alternative hypothesis__ H.sub[a]: `\(\beta_1 \neq 0\)` ] <br> <br> To conduct the test, we calculate a `\(t\)`-statistic: `$$t = \frac{\hat{\beta}_1 - \beta_1^0}{\mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right)}$$` - Distributed according to a `\(t\)`-distribution with `\(n-2\)` _degrees of freedom_. - `\(\beta_1^0\)` is the value of `\(\beta_1\)` in our null hypothesis (*e.g.,* `\(\beta_1^0 = 0\)`). --- # Hypothesis Tests Next, we use the `\(\color{#708090}{t}\)`.hi-slate[-statistic] to calculate a `\(\color{#007935}{p}\)`.hi-green[-value]. <img src="10-Simple_Linear_Regression_Inference_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" /> Describes the probability of seeing a `\(t\)`-statistic as extreme as the one we observe _if the null hypothesis is actually true_. -- But...we still need some benchmark to compare our `\(p\)`-value against. --- # Hypothesis Tests We worry mostly about false positives, so we conduct hypothesis tests based on the probability of making a Type I error. **How?** We select a .hi[significance level] `\(\color{#e64173}{\alpha}\)` that specifies our tolerance for false positives. This is the probability of Type I error we choose to live with. 
<img src="10-Simple_Linear_Regression_Inference_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" /> --- # Hypothesis Tests We then compare `\(\alpha\)` to the `\(p\)`-value of our test. - If the `\(p\)`-value is less than `\(\alpha\)`, then we __reject the null hypothesis__ at the `\(\alpha\cdot100\)` percent level. - If the `\(p\)`-value is greater than `\(\alpha\)`, then we __fail to reject the null hypothesis__. - **Note:** _Fail to reject_ `\(\neq\)` _accept_. --- # Hypothesis Tests **Example:** Are campus police associated with campus crime? ```r lm(crime ~ police, data = campus) %>% tidy() ``` ``` #> # A tibble: 2 × 5 #> term estimate std.error statistic p.value #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 18.4 2.38 7.75 1.06e-11 *#> 2 police 1.76 1.30 1.35 1.81e- 1 ``` H.sub[0]: `\(\beta_\text{Police} = 0\)` *vs.* H.sub[a]: `\(\beta_\text{Police} \neq 0\)` -- Significance level: `\(\alpha = 0.05\)` (*i.e.,* 5 percent test) -- Test Condition: Reject H.sub[0] if `\(p < \alpha\)` -- `\(p = 0.18\)`. **Do we reject the null hypothesis?** --- # Hypothesis Tests `\(p\)`-values are difficult to calculate by hand. __Alternative:__ Compare `\(\color{#708090}{t}\)`.hi-slate[-statistic] to .hi-purple[critical values] from the `\(t\)`-distribution. <img src="10-Simple_Linear_Regression_Inference_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> --- # Hypothesis Tests **Notation:** `\(t_{1-\alpha/2, n-2}\)` or `\(t_\text{crit}\)`. - Find in a `\(t\)` table using the significance level `\(\alpha\)` and `\(n-2\)` degrees of freedom. Compare the critical value to your `\(t\)`-statistic: - If `\(|t| > |t_{1-\alpha/2, n-2}|\)`, then __reject the null__. - If `\(|t| < |t_{1-\alpha/2, n-2}|\)`, then __fail to reject the null__. --- # Two-Sided Tests Based on a critical value of `\(t_{1-\alpha/2, n-2} = t_{0.975, 100} =\)` 1.98, we can identify a .hi[rejection region] on the `\(t\)`-distribution.
<img src="10-Simple_Linear_Regression_Inference_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" /> -- If our `\(t\)` statistic is in the rejection region, then we reject the null hypothesis at the 5 percent level. --- # Two-Sided Tests .mono[R] defaults to testing hypotheses against the null hypothesis of zero. ```r lm(y ~ x, data = pop_df) %>% tidy() ``` ``` #> # A tibble: 2 × 5 #> term estimate std.error statistic p.value #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 2.53 0.422 6.00 3.38e- 8 *#> 2 x 0.567 0.0793 7.15 1.59e-10 ``` -- H.sub[0]: `\(\beta_1 = 0\)` *vs.* H.sub[a]: `\(\beta_1 \neq 0\)` -- Significance level: `\(\alpha = 0.05\)` (*i.e.,* 5 percent test) -- `\(t_\text{stat} = 7.15\)` and `\(t_\text{0.975, 98} = 1.98\)` -- , which implies that `\(p < 0.05\)`. -- Therefore, we .hi[reject H.sub[0]] at the 5% level. --- # Two-Sided Tests **Example:** Are campus police associated with campus crime? ```r lm(crime ~ police, data = campus) %>% tidy() ``` ``` #> # A tibble: 2 × 5 #> term estimate std.error statistic p.value #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 18.4 2.38 7.75 1.06e-11 *#> 2 police 1.76 1.30 1.35 1.81e- 1 ``` H.sub[0]: `\(\beta_\text{Police} = 0\)` *vs.* H.sub[a]: `\(\beta_\text{Police} \neq 0\)` -- Significance level: `\(\alpha = 0.1\)` (*i.e.,* 10 percent test) -- Test Condition: Reject H.sub[0] if `\(|t| > t_\text{crit}\)` -- `\(t = 1.35\)` and `\(t_\text{crit} = 1.66\)`. **Do we reject the null hypothesis?** --- # One-Sided Tests Sometimes we are confident that a parameter is non-negative or non-positive. A __one-sided__ test assumes that values on one side of the null hypothesis are impossible. - __Option 1:__ H.sub[0]: `\(\beta_1 = 0\)` *vs.* H.sub[a]: `\(\beta_1 > 0\)` - __Option 2:__ H.sub[0]: `\(\beta_1 = 0\)` *vs.* H.sub[a]: `\(\beta_1 < 0\)` -- If this assumption is reasonable, then our rejection region changes. - Same `\(\alpha\)`.
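--- # Critical Values in .mono[R] Critical values for either kind of test come straight from `qt()`. A quick sketch (the degrees of freedom below are an assumption for illustration; use `\(n-2\)` from your own regression):

```r
# Critical values at alpha = 0.05 with 98 degrees of freedom (n = 100)
alpha <- 0.05
dof   <- 98

qt(1 - alpha / 2, dof)  # two-sided critical value, approx. 1.98
qt(1 - alpha, dof)      # one-sided critical value, approx. 1.66
```

`pt()` goes the other way: `2 * pt(-abs(t_stat), dof)` returns the two-sided `\(p\)`-value for a `\(t\)`-statistic `t_stat`.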
--- # One-Sided Tests __Left-tailed:__ Based on a critical value of `\(t_{1-\alpha, n-2} = t_{0.95, 100} =\)` 1.66, we can identify a .hi[rejection region] on the `\(t\)`-distribution. <img src="10-Simple_Linear_Regression_Inference_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" /> -- If our `\(t\)` statistic is in the rejection region, then we reject the null hypothesis at the 5 percent level. --- # One-Sided Tests __Right-tailed:__ Based on a critical value of `\(t_{1-\alpha, n-2} = t_{0.95, 100} =\)` 1.66, we can identify a .hi[rejection region] on the `\(t\)`-distribution. <img src="10-Simple_Linear_Regression_Inference_files/figure-html/unnamed-chunk-15-1.svg" style="display: block; margin: auto;" /> -- If our `\(t\)` statistic is in the rejection region, then we reject the null hypothesis at the 5 percent level. --- # One-Sided Tests **Example:** Do campus police deter campus crime? ```r lm(crime ~ police, data = campus) %>% tidy() ``` ``` #> # A tibble: 2 × 5 #> term estimate std.error statistic p.value #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 18.4 2.38 7.75 1.06e-11 *#> 2 police 1.76 1.30 1.35 1.81e- 1 ``` H.sub[0]: `\(\beta_\text{Police} = 0\)` *vs.* H.sub[a]: `\(\beta_\text{Police} < 0\)` -- Significance level: `\(\alpha = 0.1\)` (*i.e.,* 10 percent test) -- Test Condition: Reject H.sub[0] if `\(t < -t_\text{crit}\)` -- `\(t = 1.35\)` and `\(t_\text{crit} = 1.29\)`. **Do we reject the null hypothesis?** --- class: inverse, middle # Confidence Intervals --- # Confidence Intervals Until now, we have considered __point estimates__ of population parameters. - Sometimes a range of values is more interesting/honest.
-- We can construct `\((1-\alpha)\cdot100\)`-percent level confidence intervals for `\(\beta_1\)` `$$\hat{\beta}_1 \pm t_{1-\alpha/2, n-2} \, \mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right)$$` -- `\(t_{1-\alpha/2,n-2}\)` denotes the `\(1-\alpha/2\)` quantile of a `\(t\)` distribution with `\(n-2\)` degrees of freedom. --- # Confidence Intervals **Q:** Where does the confidence interval formula come from? -- **A:** The confidence interval formula comes from the rejection condition of a two-sided test. > Reject H.sub[0] if `\(|t| > t_\text{crit}\)` -- The test condition implies > Fail to reject H.sub[0] if `\(|t| \leq t_\text{crit}\)` which is equivalent to > Fail to reject H.sub[0] if `\(-t_\text{crit} \leq t \leq t_\text{crit}\)`. --- # Confidence Intervals Replacing `\(t\)` with its formula gives > Fail to reject H.sub[0] if `\(-t_\text{crit} \leq \frac{\hat{\beta}_1 - \beta_1^0}{\mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right)} \leq t_\text{crit}\)`. -- Standard errors are always positive, so the inequalities do not flip when we multiply by `\(\mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right)\)`: > Fail to reject H.sub[0] if `\(-t_\text{crit} \mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right) \leq \hat{\beta}_1 - \beta_1^0\leq t_\text{crit} \mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right)\)`. -- Subtracting `\(\hat{\beta}_1\)` yields > Fail to reject H.sub[0] if `\(-\hat{\beta}_1 -t_\text{crit} \mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right) \leq - \beta_1^0 \leq - \hat{\beta}_1 + t_\text{crit} \mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right)\)`. --- # Confidence Intervals Multiplying by -1 and rearranging gives > Fail to reject H.sub[0] if <br> `\(\hat{\beta}_1 - t_\text{crit} \mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right) \leq \beta_1^0 \leq \hat{\beta}_1 + t_\text{crit} \mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right)\)`. 
-- Replacing `\(\beta_1^0\)` with `\(\beta_1\)` and dropping the test condition yields the interval `$$\hat{\beta}_1 - t_\text{crit} \mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right) \leq \beta_1 \leq \hat{\beta}_1 + t_\text{crit} \mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right)$$` which is equivalent to `$$\hat{\beta}_1 \pm t_\text{crit} \, \mathop{\hat{\text{SE}}} \left( \hat{\beta}_1 \right).$$` --- # Confidence Intervals **Insight:** A confidence interval is related to a two-sided hypothesis test. - If a 95 percent confidence interval contains zero, then we fail to reject the null hypothesis at the 5 percent level. - If a 95 percent confidence interval does not contain zero, then we reject the null hypothesis at the 5 percent level. - **Generally:** A `\((1- \alpha) \cdot 100\)` percent confidence interval embeds a two-sided test at the `\(\alpha \cdot 100\)` level. --- # Confidence Intervals ## Example ```r lm(y ~ x, data = pop_df) %>% tidy() ``` ``` #> # A tibble: 2 × 5 #> term estimate std.error statistic p.value #> <chr> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 2.53 0.422 6.00 3.38e- 8 *#> 2 x 0.567 0.0793 7.15 1.59e-10 ``` -- ```r # find degrees of freedom dof <- summary(lm(y ~ x, data = pop_df))$df[2] # return critical value qt(0.975, dof) ``` ``` #> [1] 1.984467 ``` -- **95% confidence interval** for `\(\beta_1\)` is `\(0.567 \pm 1.98 \times 0.0793 = \left[ 0.410,\, 0.724 \right]\)` --- # Confidence Intervals We have a confidence interval for `\(\beta_1\)`, *i.e.,* `\(\left[ 0.410,\, 0.724 \right]\)`. .hi[What does it mean?] -- **Informally:** The confidence interval gives us a region (interval) in which we can place some trust (confidence) for containing the parameter. -- **More formally:** If we repeatedly sample from our population and construct confidence intervals for each of these samples, then `\((1-\alpha) \cdot100\)` percent of our intervals (*e.g.,* 95%) will contain the population parameter *somewhere in the interval*. 
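In .mono[R], `confint()` assembles this interval for you. A sketch, assuming `pop_df` from the earlier simulation is still in memory:

```r
# confint() computes estimate ± t_crit × SE for each coefficient
fit <- lm(y ~ x, data = pop_df)
confint(fit, parm = "x", level = 0.95)  # should reproduce the by-hand interval [0.410, 0.724]
```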
-- Now back to our simulation... --- # Confidence Intervals We drew 10,000 samples (each of size `\(n = 30\)`) from our population and estimated our regression model for each sample: $$ Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i $$ <center>(repeated 10,000 times)</center> Now, let's estimate 95% confidence intervals for each of these samples... --- # Confidence Intervals **From our previous simulation:** 97.9% of 95% confidence intervals contain the true parameter value of `\(\beta_1\)`. <img src="10-Simple_Linear_Regression_Inference_files/figure-html/simulation ci-1.svg" style="display: block; margin: auto;" /> --- # Confidence Intervals ## Example: Association of police with crime You can instruct `tidy` to return a 95 percent confidence interval for the association of campus police with campus crime: ```r lm(crime ~ police, data = campus) %>% tidy(conf.int = TRUE, conf.level = 0.95) ``` ``` #> # A tibble: 2 × 7 #> term estimate std.error statistic p.value conf.low conf.high #> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 (Intercept) 18.4 2.38 7.75 1.06e-11 13.7 23.1 *#> 2 police 1.76 1.30 1.35 1.81e- 1 -0.830 4.34 ``` --- # Confidence Intervals ## Example: Association of police with crime <img src="10-Simple_Linear_Regression_Inference_files/figure-html/unnamed-chunk-19-1.svg" style="display: block; margin: auto;" /> Four confidence intervals for the same coefficient.
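--- # Confidence Intervals The coverage rate can be computed directly from the simulation output. A sketch, assuming `sim_df` from the earlier Monte Carlo (with the `tidy()` columns `term`, `estimate`, and `std.error`):

```r
# For each simulated sample, build a 95% CI and ask whether it covers beta_1 = 0.57
sim_df %>%
  filter(term == "x") %>%
  mutate(
    lower = estimate - qt(0.975, df = 28) * std.error,  # n_s = 30, so df = 28
    upper = estimate + qt(0.975, df = 28) * std.error
  ) %>%
  summarize(coverage = mean(lower <= 0.57 & 0.57 <= upper))
```

Coverage above the nominal 95 percent is plausible here: each sample of 30 is drawn without replacement from a population of only 100, which likely makes the usual standard errors conservative.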