class: center, middle, inverse, title-slide # ECON 3818 ## Chapter 18 ### Kyle Butts ### 27 September 2021 --- class: clear, middle <!-- Custom css --> <style type="text/css"> /* ------------------------------------------------------- * * !! This file was generated by xaringanthemer !! * * Changes made to this file directly will be overwritten * if you used xaringanthemer in your xaringan slides Rmd * ------------------------------------------------------- */ @import url(https://fonts.googleapis.com/css?family=Roboto&display=swap); @import url(https://fonts.googleapis.com/css?family=Roboto&display=swap); @import url(https://fonts.googleapis.com/css?family=Source+Code+Pro:400,700&display=swap); @import url(https://fonts.googleapis.com/css2?family=Atkinson+Hyperlegible&display=swap); :root { /* Fonts */ --text-font-family: 'Atkinson Hyperelegible'; --text-font-is-google: 1; --text-font-family-fallback: Roboto, -apple-system, BlinkMacSystemFont, avenir next, avenir, helvetica neue, helvetica, Ubuntu, roboto, noto, segoe ui, arial; --text-font-base: sans-serif; --header-font-family: 'Atkinson Hyperelegible' --header-font-is-google: 1; --header-font-family-fallback: Georgia, serif; --code-font-family: 'Source Code Pro'; --code-font-is-google: 1; --base-font-size: 20px; --text-font-size: 1rem; --code-font-size: 0.9rem; --code-inline-font-size: 1em; --header-h1-font-size: 1.75rem; --header-h2-font-size: 1.6rem; --header-h3-font-size: 1.5rem; /* Colors */ --text-color: #131516; --text-color-light: #555F61; --header-color: #FFF; --background-color: #FFF; --link-color: #107895; --code-highlight-color: rgba(255,255,0,0.5); --inverse-text-color: #d6d6d6; --inverse-background-color: #272822; --inverse-header-color: #f3f3f3; --inverse-link-color: #107895; --title-slide-background-color: #272822; --title-slide-text-color: #d6d6d6; --header-background-color: #FFF; --header-background-text-color: #FFF; } html { font-size: var(--base-font-size); } body { font-family: 
var(--text-font-family), var(--text-font-family-fallback), var(--text-font-base); font-weight: normal; color: var(--text-color); } h1, h2, h3 { font-family: var(--header-font-family), var(--header-font-family-fallback); color: var(--text-color-light); } .remark-slide-content { background-color: var(--background-color); font-size: 1rem; padding: 24px 32px 16px 32px; width: 100%; height: 100%; } .remark-slide-content h1 { font-size: var(--header-h1-font-size); } .remark-slide-content h2 { font-size: var(--header-h2-font-size); } .remark-slide-content h3 { font-size: var(--header-h3-font-size); } .remark-code, .remark-inline-code { font-family: var(--code-font-family), Menlo, Consolas, Monaco, Liberation Mono, Lucida Console, monospace; } .remark-code { font-size: var(--code-font-size); } .remark-inline-code { font-size: var(--code-inline-font-size); color: #000; } .remark-slide-number { color: #107895; opacity: 1; font-size: 0.9em; } a, a > code { color: var(--link-color); text-decoration: none; } .footnote { position: absolute; bottom: 60px; padding-right: 6em; font-size: 0.9em; } .remark-code-line-highlighted { background-color: var(--code-highlight-color); } .inverse { background-color: var(--inverse-background-color); color: var(--inverse-text-color); } .inverse h1, .inverse h2, .inverse h3 { color: var(--inverse-header-color); } .inverse a, .inverse a > code { color: var(--inverse-link-color); } img, video, iframe { max-width: 100%; } blockquote { border-left: solid 5px lightgray; padding-left: 1em; } @page { margin: 0; } @media print { .remark-slide-scaler { width: 100% !important; height: 100% !important; transform: scale(1) !important; top: 0 !important; left: 0 !important; } } /* Modified metropolis */ .clear{ border-top: 0px solid #FAFAFA; } h1 { margin-top: -5px; margin-left: -00px; margin-bottom: 30px; color: var(--text-color-light); font-weight: 200; } h2, h3, h4 { padding-top: -15px; padding-bottom: 00px; color: #1A292C; text-shadow: none; font-weight: 
400; text-align: left; margin-left: 00px; margin-bottom: -10px; } .title-slide .inverse .remark-slide-content { background-color: #FAFAFA; } .title-slide { background-color: #FAFAFA; border-top: 80px solid #FAFAFA; } .title-slide h1 { color: var(--text-color); font-size: 40px; text-shadow: none; font-weight: 400; text-align: left; margin-left: 15px; } .title-slide h2 { margin-top: -15px; color: var(--link-color); text-shadow: none; font-weight: 300; font-size: 35px; text-align: left; margin-left: 15px; } .title-slide h3 { color: var(--text-color-light); text-shadow: none; font-weight: 300; font-size: 25px; text-align: left; margin-left: 15px; margin-bottom: 0px; } .title-slide h3:last-of-type { font-style: italic; font-size: 1rem; } /* Remove orange line */ hr, .title-slide h2::after, .mline h1::after { content: ''; display: block; border: none; background-color: #e5e5e5; color: #e5e5e5; height: 1px; } hr, .mline h1::after { margin: 1em 15px 0 15px; } .title-slide h2::after { margin: 10px 15px 35px 0; } .mline h1::after { margin: 10px 15px 0 15px; } /* turns off slide numbers for title page: https://github.com/gnab/remark/issues/298 */ .title-slide .remark-slide-number { display: none; } /* Custom CSS */ /* More line spacing */ body { line-height: 1.5; } /* Font styling */ .hi { font-weight: 600; } .mono { font-family: monospace; } .ul { text-decoration: underline; } .ol { text-decoration: overline; } .st { text-decoration: line-through; } .bf { font-weight: bold; } .it { font-style: italic; } /* Font Sizes */ .bigger { font-size: 125%; } .huge{ font-size: 150%; } .small { font-size: 95%; } .smaller { font-size: 85%; } .smallest { font-size: 75%; } .tiny { font-size: 50%; } /* Remark customization */ .clear .remark-slide-number { display: none; } .inverse .remark-slide-number { display: none; } .remark-code-line-highlighted { background-color: rgba(249, 39, 114, 0.5); } /* Xaringan tweeks */ .inverse { background-color: #23373B; text-shadow: 0 0 20px #333; /* 
text-shadow: none; */ } .title-slide { background-color: #ffffff; border-top: 80px solid #ffffff; } .footnote { bottom: 1em; font-size: 80%; color: #7f7f7f; } /* Lists */ li { margin-top: 4px; } /* Mono-spaced font, smaller */ .mono-small { font-family: monospace; font-size: 16px; } .mono-small .mjx-chtml { font-size: 103% !important; } .pseudocode, .pseudocode-small { font-family: monospace; background: #f8f8f8; border-radius: 3px; padding: 10px; padding-top: 0px; padding-bottom: 0px; } .pseudocode-small { font-size: 16px; } .remark-code { font-size: 68%; } .remark-inline-code { background: #F5F5F5; /* lighter */ /* background: #e7e8e2; /* darker */ border-radius: 3px; padding: 4px; } /* Super and Subscripts */ .super{ vertical-align: super; font-size: 70%; line-height: 1%; } .sub{ vertical-align: sub; font-size: 70%; line-height: 1%; } /* Subheader */ .subheader{ font-weight: 100; font-style: italic; display: block; margin-top: -25px; margin-bottom: 25px; } /* 2/3 left; 1/3 right */ .more-left { float: left; width: 63%; } .less-right { float: right; width: 31%; } .more-right ~ * { clear: both; } /* 9/10 left; 1/10 right */ .left90 { padding-top: 0.7em; float: left; width: 85%; } .right10 { padding-top: 0.7em; float: right; width: 9%; } /* 95% left; 5% right */ .left95 { padding-top: 0.7em; float: left; width: 91%; } .right05 { padding-top: 0.7em; float: right; width: 5%; } .left5 { padding-top: 0.7em; margin-left: 0em; margin-right: -0.4em; float: left; width: 7%; } .left10 { padding-top: 0.7em; margin-left: -0.2em; margin-right: -0.5em; float: left; width: 10%; } .left30 { padding-top: 0.7em; float: left; width: 30%; } .right30 { padding-top: 0.7em; float: right; width: 30%; } .thin-left { padding-top: 0.7em; margin-left: -1em; margin-right: -0.5em; float: left; width: 27.5%; } /* Example */ .ex { font-weight: 300; color: #555F61 !important; font-style: italic; } .col-left { float: left; width: 47%; margin-top: -1em; } .col-right { float: right; width: 47%; 
margin-top: -1em; } .clear-up { clear: both; margin-top: -1em; } /* Format tables */ table { color: #000000; font-size: 14pt; line-height: 100%; border-top: 1px solid #ffffff !important; border-bottom: 1px solid #ffffff !important; } th, td { background-color: #ffffff; } table th { font-weight: 400; } /* Attention */ .attn { font-weight: 500; color: #e64173 !important; font-family: 'Zilla Slab' !important; } /* Note */ .note { font-weight: 300; font-style: italic; color: #314f4f !important; /* color: #cccccc !important; */ font-family: 'Zilla Slab' !important; } /* Question and answer */ .qa { font-weight: 500; /* color: #314f4f !important; */ color: #e64173 !important; font-family: 'Zilla Slab' !important; } /* Figure Caption */ .caption { font-size: 0.8888889em; line-height: 1.5; margin-top: 1em; color: #6b7280; } </style> <!-- From xaringancolor --> <div style = "position:fixed; visibility: hidden"> $$ \require{color} \definecolor{purple}{rgb}{0.337254901960784, 0.00392156862745098, 0.643137254901961} \definecolor{navy}{rgb}{0.0509803921568627, 0.23921568627451, 0.337254901960784} \definecolor{ruby}{rgb}{0.603921568627451, 0.145098039215686, 0.0823529411764706} \definecolor{alice}{rgb}{0.0627450980392157, 0.470588235294118, 0.584313725490196} \definecolor{daisy}{rgb}{0.92156862745098, 0.788235294117647, 0.266666666666667} \definecolor{coral}{rgb}{0.949019607843137, 0.427450980392157, 0.129411764705882} \definecolor{kelly}{rgb}{0.509803921568627, 0.576470588235294, 0.337254901960784} \definecolor{jet}{rgb}{0.0745098039215686, 0.0823529411764706, 0.0862745098039216} \definecolor{asher}{rgb}{0.333333333333333, 0.372549019607843, 0.380392156862745} \definecolor{slate}{rgb}{0.192156862745098, 0.309803921568627, 0.309803921568627} \definecolor{cranberry}{rgb}{0.901960784313726, 0.254901960784314, 0.450980392156863} $$ </div> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { Macros: { purple: ["{\\color{purple}{#1}}", 1], navy: ["{\\color{navy}{#1}}", 
1], ruby: ["{\\color{ruby}{#1}}", 1], alice: ["{\\color{alice}{#1}}", 1], daisy: ["{\\color{daisy}{#1}}", 1], coral: ["{\\color{coral}{#1}}", 1], kelly: ["{\\color{kelly}{#1}}", 1], jet: ["{\\color{jet}{#1}}", 1], asher: ["{\\color{asher}{#1}}", 1], slate: ["{\\color{slate}{#1}}", 1], cranberry: ["{\\color{cranberry}{#1}}", 1] }, loader: {load: ['[tex]/color']}, tex: {packages: {'[+]': ['color']}} } }); </script> <style> .purple {color: #5601A4;} .navy {color: #0D3D56;} .ruby {color: #9A2515;} .alice {color: #107895;} .daisy {color: #EBC944;} .coral {color: #F26D21;} .kelly {color: #829356;} .jet {color: #131516;} .asher {color: #555F61;} .slate {color: #314F4F;} .cranberry {color: #E64173;} </style>

## Chapter 18: Inference in Practice

---

# Making Inferences

So far we have discussed two ways to make inferences about a parameter using our estimate:

- Confidence intervals
- Hypothesis testing

---

# Cautions about Confidence Intervals

It is important to note that the .hi.daisy[margin of error] doesn't cover all errors

- It addresses only the randomness due to drawing a *random* sample
- It does not address issues such as undercoverage, nonresponse, etc.

---

# Choosing Sample Size for Confidence Intervals

A researcher can determine the number of observations required in the sample in order to achieve a desired margin of error.

`$$\daisy{m} = z^* \frac{\sigma}{\sqrt{n}} \implies n= \left( \frac{z^*\sigma}{m} \right)^2$$`

where `\(\daisy{m}\)` is the desired margin of error, and `\(z^*\)` is the critical z-value for the chosen confidence level

---

# Example

Say we are recording tip size of patrons when a waiter writes a message on the receipt. We know `\(\sigma=2\)`.

We want to estimate the mean percentage tip `\(\mu\)` for patrons who receive the message within `\(\pm 0.5\)` with 90% confidence. How many patrons must we observe?
In other words, we want `\(\daisy{m} = 0.5\)`:

--

<br/>

$$
n = \left( \frac{z^*\sigma}{m} \right)^2 \implies n = \left(\frac{1.645\cdot 2}{0.5} \right)^2 \approx 43.3
$$

Since `\(n\)` must be a whole number, we round up and observe at least 44 patrons.

---

# Cautions about Hypothesis Testing

These tests of significance depend on:

- The alternative hypothesis (left-tail, right-tail, two-tail)
- The sample size, `\(n\)`
- The level of significance, `\(\alpha\)`

---

# Planning for Hypothesis Testing

How do we choose `\(\alpha\)`?

Our choice of level of significance, `\(\alpha\)`, depends on whether we REALLY don't want to wrongly reject a true `\(H_0\)` or we REALLY don't want to fail to reject a false `\(H_0\)`

- .ex[Example:] Are you NASA trying to land someone on the moon? Small `\(\alpha\)`!!!
- .ex[Example:] Are you a business trying to figure out if an A/B test on your website went well? You can have a larger `\(\alpha\)`

---

# Types of Error

In any statistical test there are four possible outcomes:
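<table>
<thead>
<tr> <th></th> <th>\(H_0\) true</th> <th>\(H_a\) true</th> </tr>
</thead>
<tbody>
<tr> <td>Reject \(H_0\)</td> <td>Type I Error</td> <td>Correct</td> </tr>
<tr> <td>Fail to Reject \(H_0\)</td> <td>Correct</td> <td>Type II Error</td> </tr>
</tbody>
</table>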
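
---

# Types of Error

Each column of the table has its own pair of probabilities. As a rough sketch, the snippet below fills in the four cells for one hypothetical one-sided z-test. The test setup (`SIGMA`, `N`, `CUTOFF`), the assumed alternative mean of 4, and the helper name `outcome_probabilities` are illustrative assumptions, not part of the table itself.

```python
from statistics import NormalDist

# Illustrative assumptions: a one-sided z-test of H0: mu = 3 with sigma = 1
# and n = 16 that rejects H0 when the sample mean exceeds 3.41.
SIGMA, N, CUTOFF = 1.0, 16, 3.41
STD_ERR = SIGMA / N ** 0.5


def outcome_probabilities(true_mean):
    """Return P(reject H0) and P(fail to reject H0) for a given true mean."""
    sampling_dist = NormalDist(mu=true_mean, sigma=STD_ERR)
    p_fail = sampling_dist.cdf(CUTOFF)  # sample mean lands at or below the cutoff
    return {"reject": 1 - p_fail, "fail_to_reject": p_fail}


h0_true = outcome_probabilities(3.0)  # left column: the null is true
ha_true = outcome_probabilities(4.0)  # right column: assumed true mean of 4

print(f"H0 true: P(Type I error) = {h0_true['reject']:.3f}, "
      f"P(correct) = {h0_true['fail_to_reject']:.3f}")
print(f"Ha true: P(correct) = {ha_true['reject']:.3f}, "
      f"P(Type II error) = {ha_true['fail_to_reject']:.3f}")
```

Within each column the two probabilities sum to 1, because the test either rejects or fails to reject.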
---

# Type I Error
.subheader.coral[False Positive]

.hi.coral[Type I Error]: We reject `\(H_0\)`, even though `\(H_0\)` is true

- False positive on a covid test
- `\(H_0\)`: You do not have covid

Denote the probability of a Type I error as `\(\alpha\)`

<br/>

Since our null hypothesis is .it[typically] that there is no effect, a Type I error .it[typically] says there is an effect when in reality there is not

---
class: clear,middle

<img src="data:image/png;base64,#ch18_files/figure-html/type-1-ex1-1.svg" style="display: block; margin: auto;" />

---

# Type II Error
.subheader.ruby[False Negative]

.hi.ruby[Type II Error]: We fail to reject `\(H_0\)`, even though `\(H_0\)` is false

- False negative on a covid test
- `\(H_0\)`: You do not have covid

Denote the probability of a Type II error as `\(\beta\)`

<br/>

Since our null hypothesis is .it[typically] that there is no effect, a Type II error .it[typically] says there is no effect when in reality there is one

---
class: clear,middle

<img src="data:image/png;base64,#ch18_files/figure-html/type-2-ex1-1.svg" style="display: block; margin: auto;" />

---
class: clear,middle

<img src="data:image/png;base64,#ch18_files/figure-html/type-2-ex2-1.svg" style="display: block; margin: auto;" />

---
class: clear,middle

<img src="data:image/png;base64,#ch18_files/figure-html/type-2-ex3-1.svg" style="display: block; margin: auto;" />

---

# How to remember

> When the boy cried wolf, the village committed Type I and Type II errors, in that order

<br/>

`\(H_0\)`: There is no wolf

- First the village believes the boy when there is no wolf: it rejects a true null (Type I)
- Later the village ignores the boy when the wolf is real: it fails to reject a false null (Type II)

---

# Clicker Question

Suppose we have the following hypothesis test:

- `\(H_0\)`: Taking multivitamins does not impact your running speed
- `\(H_1\)`: Taking multivitamins .it[will increase] your running speed

If we make the claim "Taking multivitamins will increase your running speed" and it is not true, we have committed a:

<ol type = "a"> <li>Type I error</li> <li>Type II error</li> </ol>

---

# Errors in Hypothesis Testing

How do these errors happen?

- Our conclusions are based on sample data and probabilities
- The p-value is the probability of observing a sample at least as extreme as ours when the null is true. The p-value is `\(> 0\)`, so it is possible to observe such a sample even when the null is true
- We do not have enough information (sample size)
- We do not choose to be very rigorous (`\(\alpha\)`)

In particular:

- The probability of a Type I error is determined by the significance level of the test, `\(\alpha\)`, which we choose
- The probability of a Type II error depends on the .hi[true distribution] when the null is false; however, we can reduce it by increasing the sample size

---

# Improving power by increasing sample size

<img src="data:image/png;base64,#ch18_files/figure-html/unnamed-chunk-1-1.svg" style="display: block; margin: auto;" />

---

# Improving power by increasing sample size

<img src="data:image/png;base64,#ch18_files/figure-html/type-2-ex4-1.svg" style="display: block; margin: auto;" />

---

# Size of a Test

Now that we've defined Type I error, let's define size:

The .hi.coral[size] of a test, `\(\coral{\alpha}\)`, is the probability of making a Type I error.

Given a null hypothesis `\(H_0: \theta = \theta_0\)`, a test statistic `\(\hat{\theta}\)`, and a rejection region `\(R\)`, the size is:

`$$\coral{\alpha} = P(\coral{\text{Type I Error}})=P(\hat{\theta}\in R \ \vert \ \theta=\theta_0)$$`

---

# Calculating the Size of a Test

How do we actually calculate `\(\alpha\)`?

Let's suppose we have `\(n=16\)` and `\(\sigma=1\)`, and we want to test `\(H_0\)`: `\(\mu=3\)` vs. `\(H_a\)`: `\(\mu > 3\)`.

Given a rejection region of `\(R=\{ \bar{X} \ \vert \ \bar{X} > 3.41 \}\)`, what is `\(\alpha\)`?
--

<br/>

$$
\alpha = P( \hat{\theta} \in R \ \vert \ \theta = \theta_0 ) = P\left(\bar{X} > 3.41 \ \vert \ {\mu=3}\right)
$$

$$
= P\left(\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} > \frac{3.41 - 3}{1/\sqrt{16}}\right) = P(Z > 1.64) \approx 0.05
$$

---

# Choosing Size

Note that we have to pick *either* the rejection region or the size; choosing one determines the other

- We generally pick a size and calculate the rejection region based off that size
- Because the size is the probability of rejecting a true null, by choosing `\(\alpha\)` we are choosing how much we are willing to risk .it[incorrectly] rejecting the null hypothesis
- A higher `\(\alpha\)` means more of the possible sample statistics fall in the rejection region, meaning a higher risk of rejecting the null even though it's true

---

# Power of a Test

While size deals with Type I errors, power deals with Type II errors.

The .hi.ruby[power] is the probability of correctly rejecting a false null, or one minus the probability of a Type II error

$$
\ruby{\text{Power}} = 1 - P(\ruby{\text{Type II Error}} )
$$

Intuitively, power is the likelihood of detecting a false null using your test statistic.

---

# Power and Probability of Type II Errors

A .ruby[Type II] error is failing to reject a false null. Its probability is

`$$P(\ruby{\text{Type II}})=P(\bar{X} \notin R \ \vert \ \mu = \mu_A)$$`

The .hi.ruby[power] is the probability of correctly rejecting a false null

$$
\ruby{\text{Power}} =P(\bar{X} \in R \ \vert \ \mu = \mu_A)
$$

<br/>

You can think of power as the probability of .it[not making a .ruby[Type II] error]

You can calculate power by doing `\(1 - P(\ruby{\text{Type II}})\)` or by calculating the power directly.

---

# Power

<img src="data:image/png;base64,#ch18_files/figure-html/power-ex1-1.svg" style="display: block; margin: auto;" />

---

# Calculating the Power of a Test

Back to the previous example, where `\(n=16\)`, `\(\sigma=1\)`, and `\(R=\{ \bar{X} \ \vert \ \bar{X} > 3.41 \}\)`, and we are testing `\(H_0: \mu=3\)` vs.
`\(H_1: \mu>3\)`:

Power can be calculated in two ways:

$$
\ruby{\text{Power}} = P(\text{reject } H_0 \ \vert \ \mu = \mu_A ) = P(\bar{X} \in R \ \vert \ \mu = \mu_A)
$$

$$
\ruby{\text{Power}} = 1 - P(\ruby{\text{Type II Error}}) = 1 - P(\bar{X} \notin R \ \vert \ \mu = \mu_A)
$$

---

# Calculating the Power of a Test

In order to calculate the power of a test, we must assume a specific true mean, `\(\mu_A\)`.

For example, what is the power of the test if the true mean is `\(\mu_A = 4\)`?

$$
\ruby{\text{Power}} = P(\bar{X} \in R \ \vert \ \mu = \mu_A = 4)
$$

--

$$
P(\bar{X} > 3.41 \ \vert \ \mu=4) = P\left(Z > \frac{3.41-4}{1/\sqrt{16}}\right) = P(Z > -2.36) = 0.9909
$$

---

# Calculating the Power of a Test

We can also calculate the power of a test by subtracting the probability of making a Type II error, `\(\beta\)`, from 1.

$$
\ruby{\beta} = P(\bar{X} < 3.41 \ \vert \ \mu = \mu_A = 4)
$$

--

$$
\implies \ruby{\beta} = P\left(Z < \frac{3.41-4}{1/\sqrt{16}}\right) = P(Z < -2.36) = 0.0091
$$

Meaning the power of the test is:

$$
\ruby{\text{Power}} = 1 - \ruby{\beta} = 1 - 0.0091 = 0.9909
$$

There is a 99.1% chance that in *repeated sampling* we reject the null that `\(\mu = 3\)` if the true mean is equal to 4.

---

# Group Question

Assume `\(X\sim N(\mu,5^2)\)`.
From a sample of size `\(n=100\)`, we wish to test the following at the `\(\alpha=0.05\)` level

$$
H_0: \mu=3
$$

$$
H_1: \mu>3
$$

What is the power of your test if `\(\mu = \mu_A = 4\)`?

<ol type = "a"> <li> 0.85</li> <li> 0.15</li> <li> 0.64</li> <li> 0.36</li> </ol>

---

# Interpreting Power

Power is the probability of correctly rejecting a false null hypothesis

- Can be thought of as our ability to tell the true value apart from the null value

<br/>

In general, the power is a function of the true value.sup[*]

- It changes as we try out different possible true values

<div class="footnote"> <span class="sup">*</span> Must specify a specific true \(\mu\) in order to calculate power </div>

---

# Power

<img src="data:image/png;base64,#ch18_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

---

# Visualizing Underpowered Estimates
.subheader.alice[Imprecise Estimates]

<img src="data:image/png;base64,#ch18_files/figure-html/power-ex2-1.svg" style="display: block; margin: auto;" />

---

# Visualizing Underpowered Estimates
.subheader.alice[Small Relative Differences]

<img src="data:image/png;base64,#ch18_files/figure-html/power-ex3-1.svg" style="display: block; margin: auto;" />

---

# Spotting Underpowered Estimates

How can we avoid underpowered estimates? There are two main root causes:

Imprecise estimates

- Low precision/high variance
- Large standard errors interpreted as "no effect"

Small relative differences between `\(\theta_0\)` and `\(\theta_A\)`

- Precise estimates can detect small relative differences
- Imprecise estimates require large relative differences to detect the truth.

Watch for imprecise estimates! An insignificant result is often read as evidence of no effect when really the test is underpowered.

---

# Example
.subheader.alice[Underpowered Estimates]

Suppose from 10 observations you estimate that raising the minimum wage by 1% would lead to only a 0.1% decline in employment on average with a standard deviation of 6%.
Can you reject the null that employment wouldn't decrease at the 5% significance level?

$$
p\text{-value} = P(\bar{X} < -0.1 \ \vert \ \mu=0) = P\left(\frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} < \frac{-0.1 - 0}{6/\sqrt{10}}\right)
$$

$$
= P(Z < -0.053) = 0.479
$$

Since `\(p\text{-value} > \alpha\)`, we fail to reject the null: there is not enough evidence to say that the average change in employment differs from 0% (no effect of minimum wage).

---

# Example
.subheader.alice[Underpowered Estimates]

Great news! Raising the minimum wage has no statistically discernible effect on employment, right?

Well... hold on...

If there is an effect on employment, our test may be too underpowered to detect it.

Let's calculate the power of this test...

---

# Example
.subheader.alice[Underpowered Estimates]

Calculate power by `\(P(\bar{X} \in R \ \vert \ \mu = -0.5)\)`

This means we must first calculate the rejection region

If `\(\alpha=.05\)`, then the cutoff is `\(0 - 1.645 \cdot 6/\sqrt{10} \approx -3.12\)`, so the rejection region is `\(R = \{ \bar{X} \ \vert \ \bar{X} < -3.12 \}\)`.

---

# Example
.subheader.alice[Underpowered Estimates]

Let's assume a reasonable negative impact on employment of 0.5%. (So we're assuming the true `\(\mu=-0.5\)`). Then the power is:

$$
P(\text{Reject }H_0 \ \vert \ \mu=-0.5) = P(\bar{X} < -3.12 \ \vert \ \mu=-0.5)
$$

$$
= P\left(Z < \frac{-3.12 - (-0.5)}{6/\sqrt{10}}\right)
$$

$$
= P(Z < -1.38) \approx 0.0836
$$

Our power to detect a measurable effect is a measly 8.4%!
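
---

# Example
.subheader.alice[Underpowered Estimates]

The minimum-wage calculation can be checked numerically. The sketch below uses only Python's standard-library `statistics.NormalDist`. The helper name `left_tail_test` and the larger sample size `n = 1000` are hypothetical choices for illustration; the remaining numbers come from the example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def left_tail_test(n, sigma, mu_null, mu_alt, alpha):
    """Return (cutoff, power) for a left-tailed z-test on the sample mean."""
    std_err = sigma / n ** 0.5
    # Reject H0 when the sample mean falls below this cutoff
    cutoff = mu_null + Z.inv_cdf(alpha) * std_err
    # Power: chance the sample mean lands below the cutoff if the true mean is mu_alt
    power = Z.cdf((cutoff - mu_alt) / std_err)
    return cutoff, power


# Numbers from the minimum-wage example
n, sigma, x_bar = 10, 6.0, -0.1
p_value = Z.cdf((x_bar - 0.0) / (sigma / n ** 0.5))
cutoff, power = left_tail_test(n, sigma, mu_null=0.0, mu_alt=-0.5, alpha=0.05)
print(f"p-value = {p_value:.3f}, cutoff = {cutoff:.2f}, power = {power:.4f}")

# Hypothetical larger sample (n = 1000 is an assumption, not from the example)
_, power_large = left_tail_test(1000, sigma, mu_null=0.0, mu_alt=-0.5, alpha=0.05)
print(f"power with n = 1000: {power_large:.3f}")
```

Holding `\(\sigma\)` and the assumed effect fixed, the larger sample shrinks the standard error, which moves the cutoff closer to zero and raises the power.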