class: center, middle, inverse, title-slide

# ECON 3818
## Chapter 22
### Kyle Butts
### 01 September 2021

---
class: clear, middle

<!-- Custom css -->
<style type="text/css"> /* ------------------------------------------------------- * * !! This file was generated by xaringanthemer !! * * Changes made to this file directly will be overwritten * if you used xaringanthemer in your xaringan slides Rmd * ------------------------------------------------------- */ @import url(https://fonts.googleapis.com/css?family=Roboto&display=swap); @import url(https://fonts.googleapis.com/css?family=Source+Code+Pro:400,700&display=swap); @import url(https://fonts.googleapis.com/css2?family=Atkinson+Hyperlegible&display=swap); :root { /* Fonts */ --text-font-family: 'Atkinson Hyperlegible'; --text-font-is-google: 1; --text-font-family-fallback: Roboto, -apple-system, BlinkMacSystemFont, avenir next, avenir, helvetica neue, helvetica, Ubuntu, roboto, noto, segoe ui, arial; --text-font-base: sans-serif; --header-font-family: 'Atkinson Hyperlegible'; --header-font-is-google: 1; --header-font-family-fallback: Georgia, serif; --code-font-family: 'Source Code Pro'; --code-font-is-google: 1; --base-font-size: 20px; --text-font-size: 1rem; --code-font-size: 0.9rem; --code-inline-font-size: 1em; --header-h1-font-size: 1.75rem; --header-h2-font-size: 1.6rem; --header-h3-font-size: 1.5rem; /* Colors */ --text-color: #131516; --text-color-light: #555F61; --header-color: #FFF; --background-color: #FFF; --link-color: #107895; --code-highlight-color: rgba(255,255,0,0.5); --inverse-text-color: #d6d6d6; --inverse-background-color: #272822; --inverse-header-color: #f3f3f3; --inverse-link-color: #107895; --title-slide-background-color: #272822; --title-slide-text-color: #d6d6d6; --header-background-color: #FFF; --header-background-text-color: #FFF; } html { font-size: var(--base-font-size); } body { font-family: 
var(--text-font-family), var(--text-font-family-fallback), var(--text-font-base); font-weight: normal; color: var(--text-color); } h1, h2, h3 { font-family: var(--header-font-family), var(--header-font-family-fallback); color: var(--text-color-light); } .remark-slide-content { background-color: var(--background-color); font-size: 1rem; padding: 24px 32px 16px 32px; width: 100%; height: 100%; } .remark-slide-content h1 { font-size: var(--header-h1-font-size); } .remark-slide-content h2 { font-size: var(--header-h2-font-size); } .remark-slide-content h3 { font-size: var(--header-h3-font-size); } .remark-code, .remark-inline-code { font-family: var(--code-font-family), Menlo, Consolas, Monaco, Liberation Mono, Lucida Console, monospace; } .remark-code { font-size: var(--code-font-size); } .remark-inline-code { font-size: var(--code-inline-font-size); color: #000; } .remark-slide-number { color: #107895; opacity: 1; font-size: 0.9em; } a, a > code { color: var(--link-color); text-decoration: none; } .footnote { position: absolute; bottom: 60px; padding-right: 6em; font-size: 0.9em; } .remark-code-line-highlighted { background-color: var(--code-highlight-color); } .inverse { background-color: var(--inverse-background-color); color: var(--inverse-text-color); } .inverse h1, .inverse h2, .inverse h3 { color: var(--inverse-header-color); } .inverse a, .inverse a > code { color: var(--inverse-link-color); } img, video, iframe { max-width: 100%; } blockquote { border-left: solid 5px lightgray; padding-left: 1em; } @page { margin: 0; } @media print { .remark-slide-scaler { width: 100% !important; height: 100% !important; transform: scale(1) !important; top: 0 !important; left: 0 !important; } } /* Modified metropolis */ .clear{ border-top: 0px solid #FAFAFA; } h1 { margin-top: -5px; margin-left: -00px; margin-bottom: 30px; color: var(--text-color-light); font-weight: 200; } h2, h3, h4 { padding-top: -15px; padding-bottom: 00px; color: #1A292C; text-shadow: none; font-weight: 
400; text-align: left; margin-left: 00px; margin-bottom: -10px; } .title-slide .inverse .remark-slide-content { background-color: #FAFAFA; } .title-slide { background-color: #FAFAFA; border-top: 80px solid #FAFAFA; } .title-slide h1 { color: var(--text-color); font-size: 40px; text-shadow: none; font-weight: 400; text-align: left; margin-left: 15px; } .title-slide h2 { margin-top: -15px; color: var(--link-color); text-shadow: none; font-weight: 300; font-size: 35px; text-align: left; margin-left: 15px; } .title-slide h3 { color: var(--text-color-light); text-shadow: none; font-weight: 300; font-size: 25px; text-align: left; margin-left: 15px; margin-bottom: 0px; } .title-slide h3:last-of-type { font-style: italic; font-size: 1rem; } /* Remove orange line */ hr, .title-slide h2::after, .mline h1::after { content: ''; display: block; border: none; background-color: #e5e5e5; color: #e5e5e5; height: 1px; } hr, .mline h1::after { margin: 1em 15px 0 15px; } .title-slide h2::after { margin: 10px 15px 35px 0; } .mline h1::after { margin: 10px 15px 0 15px; } /* turns off slide numbers for title page: https://github.com/gnab/remark/issues/298 */ .title-slide .remark-slide-number { display: none; } /* Custom CSS */ /* More line spacing */ body { line-height: 1.5; } /* Font styling */ .hi { font-weight: 600; } .mono { font-family: monospace; } .ul { text-decoration: underline; } .ol { text-decoration: overline; } .st { text-decoration: line-through; } .bf { font-weight: bold; } .it { font-style: italic; } /* Font Sizes */ .bigger { font-size: 125%; } .huge{ font-size: 150%; } .small { font-size: 95%; } .smaller { font-size: 85%; } .smallest { font-size: 75%; } .tiny { font-size: 50%; } /* Remark customization */ .clear .remark-slide-number { display: none; } .inverse .remark-slide-number { display: none; } .remark-code-line-highlighted { background-color: rgba(249, 39, 114, 0.5); } /* Xaringan tweeks */ .inverse { background-color: #23373B; text-shadow: 0 0 20px #333; /* 
text-shadow: none; */ } .title-slide { background-color: #ffffff; border-top: 80px solid #ffffff; } .footnote { bottom: 1em; font-size: 80%; color: #7f7f7f; } /* Lists */ li { margin-top: 4px; } /* Mono-spaced font, smaller */ .mono-small { font-family: monospace; font-size: 16px; } .mono-small .mjx-chtml { font-size: 103% !important; } .pseudocode, .pseudocode-small { font-family: monospace; background: #f8f8f8; border-radius: 3px; padding: 10px; padding-top: 0px; padding-bottom: 0px; } .pseudocode-small { font-size: 16px; } .remark-code { font-size: 68%; } .remark-inline-code { background: #F5F5F5; /* lighter */ /* background: #e7e8e2; /* darker */ border-radius: 3px; padding: 4px; } /* Super and Subscripts */ .super{ vertical-align: super; font-size: 70%; line-height: 1%; } .sub{ vertical-align: sub; font-size: 70%; line-height: 1%; } /* Subheader */ .subheader{ font-weight: 100; font-style: italic; display: block; margin-top: -25px; margin-bottom: 25px; } /* 2/3 left; 1/3 right */ .more-left { float: left; width: 63%; } .less-right { float: right; width: 31%; } .more-right ~ * { clear: both; } /* 9/10 left; 1/10 right */ .left90 { padding-top: 0.7em; float: left; width: 85%; } .right10 { padding-top: 0.7em; float: right; width: 9%; } /* 95% left; 5% right */ .left95 { padding-top: 0.7em; float: left; width: 91%; } .right05 { padding-top: 0.7em; float: right; width: 5%; } .left5 { padding-top: 0.7em; margin-left: 0em; margin-right: -0.4em; float: left; width: 7%; } .left10 { padding-top: 0.7em; margin-left: -0.2em; margin-right: -0.5em; float: left; width: 10%; } .left30 { padding-top: 0.7em; float: left; width: 30%; } .right30 { padding-top: 0.7em; float: right; width: 30%; } .thin-left { padding-top: 0.7em; margin-left: -1em; margin-right: -0.5em; float: left; width: 27.5%; } /* Example */ .ex { font-weight: 300; color: #555F61 !important; font-style: italic; } .col-left { float: left; width: 47%; margin-top: -1em; } .col-right { float: right; width: 47%; 
margin-top: -1em; } .clear-up { clear: both; margin-top: -1em; } /* Format tables */ table { color: #000000; font-size: 14pt; line-height: 100%; border-top: 1px solid #ffffff !important; border-bottom: 1px solid #ffffff !important; } th, td { background-color: #ffffff; } table th { font-weight: 400; } /* Attention */ .attn { font-weight: 500; color: #e64173 !important; font-family: 'Zilla Slab' !important; } /* Note */ .note { font-weight: 300; font-style: italic; color: #314f4f !important; /* color: #cccccc !important; */ font-family: 'Zilla Slab' !important; } /* Question and answer */ .qa { font-weight: 500; /* color: #314f4f !important; */ color: #e64173 !important; font-family: 'Zilla Slab' !important; } /* Figure Caption */ .caption { font-size: 0.8888889em; line-height: 1.5; margin-top: 1em; color: #6b7280; } </style> <!-- From xaringancolor --> <div style = "position:fixed; visibility: hidden"> $$ \require{color} \definecolor{purple}{rgb}{0.337254901960784, 0.00392156862745098, 0.643137254901961} \definecolor{navy}{rgb}{0.0509803921568627, 0.23921568627451, 0.337254901960784} \definecolor{ruby}{rgb}{0.603921568627451, 0.145098039215686, 0.0823529411764706} \definecolor{alice}{rgb}{0.0627450980392157, 0.470588235294118, 0.584313725490196} \definecolor{daisy}{rgb}{0.92156862745098, 0.788235294117647, 0.266666666666667} \definecolor{coral}{rgb}{0.949019607843137, 0.427450980392157, 0.129411764705882} \definecolor{kelly}{rgb}{0.509803921568627, 0.576470588235294, 0.337254901960784} \definecolor{jet}{rgb}{0.0745098039215686, 0.0823529411764706, 0.0862745098039216} \definecolor{asher}{rgb}{0.333333333333333, 0.372549019607843, 0.380392156862745} \definecolor{slate}{rgb}{0.192156862745098, 0.309803921568627, 0.309803921568627} \definecolor{cranberry}{rgb}{0.901960784313726, 0.254901960784314, 0.450980392156863} $$ </div> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { Macros: { purple: ["{\\color{purple}{#1}}", 1], navy: ["{\\color{navy}{#1}}", 
1], ruby: ["{\\color{ruby}{#1}}", 1], alice: ["{\\color{alice}{#1}}", 1], daisy: ["{\\color{daisy}{#1}}", 1], coral: ["{\\color{coral}{#1}}", 1], kelly: ["{\\color{kelly}{#1}}", 1], jet: ["{\\color{jet}{#1}}", 1], asher: ["{\\color{asher}{#1}}", 1], slate: ["{\\color{slate}{#1}}", 1], cranberry: ["{\\color{cranberry}{#1}}", 1] }, loader: {load: ['[tex]/color']}, tex: {packages: {'[+]': ['color']}} } }); </script> <style> .purple {color: #5601A4;} .navy {color: #0D3D56;} .ruby {color: #9A2515;} .alice {color: #107895;} .daisy {color: #EBC944;} .coral {color: #F26D21;} .kelly {color: #829356;} .jet {color: #131516;} .asher {color: #555F61;} .slate {color: #314F4F;} .cranberry {color: #E64173;} </style>

## Chapter 22: Inference about a Population Proportion

---

# Inference about a Population Proportion

Previously we discussed making inferences about population *means*.

This chapter covers questions where we're interested in the proportion of an outcome.

.bf.alice[Single population proportion]

- .ex[Examples:] Proportion of people voting for a candidate; percent of people who are vaccinated; percent of people who support an issue; etc.

.bf.alice[Comparing two population proportions]

- .ex[Examples:] Is there a difference between the proportion of male students and the proportion of female students who smoke cigarettes; do Republicans and Democrats differ in their support for policy X; etc.

---

# The Sample Proportion, `\(\hat{p}\)`

The statistic that estimates the population proportion, `\(p\)`, is the .hi.coral[sample proportion]:

$$ \coral{\hat{p}} = \frac{\text{number of "successes" in the sample}}{n} $$

For example: Say we want to estimate the proportion of heterosexual adults who have had more than one sexual partner in the past year.
To estimate this proportion, a researcher surveyed 2673 people, and 170 said they had multiple sex partners:

$$ \coral{\hat{p}} = \frac{170}{2673} = 0.0636 $$

---

# Sampling Distribution of a Sample Proportion

.subheader.alice[Binomial Distribution Review]

We can think of a binary random variable (one that takes only two values) as following a Bernoulli distribution:

- Assign one outcome 0 and the other outcome 1
- `\(X \sim B(1,p)\)`
- This means `\(p\)` is the **unobserved** probability of outcome 1 occurring

We use the sample statistic `\(\coral{\hat{p}} = \frac{\text{number of successes}}{\text{total observations}}\)`

- The .hi.kelly[mean] of the sampling distribution is `\(p\)`
- The .hi.kelly[standard deviation] of the sampling distribution is `\(\sqrt{\frac{p(1-p)}{n}}\)`

---

# Sampling Distribution of a Sample Proportion

.subheader.alice[Binomial Distribution Review]

Say we draw a simple random sample of size `\(n\)` from a large population in which a proportion `\(p\)` are successes. Let `\(\coral{\hat{p}}\)` be the .hi.purple[sample proportion] of successes,

$$ \coral{\hat{p}} = \frac{\text{number of successes in the sample}}{n} $$

The .hi.purple[Central Limit Theorem] tells us that with a large enough sample size, the standardized value of `\(\coral{\hat{p}}\)` will be approximately normal:

$$ \frac{\coral{\hat{p}} - p}{\sqrt{p(1-p)/n}} \sim N(0, 1) $$

- `\(p\)` is the true population proportion
- `\(\sqrt{p(1-p)/n}\)` is the standard deviation of the sampling distribution of `\(\coral{\hat{p}}\)`

---

# Clicker Question

A study investigated ways to prevent staph infections in surgery patients. In a first step, the researchers examined the nasal secretions of a random sample of 6771 patients admitted to various hospitals for surgery. They found that 1251 of these patients tested positive for .it[Staphylococcus aureus], a bacterium responsible for most staph infections.

What is the population and what is the parameter `\(p\)`?
--

Calculate the statistic `\(\coral{\hat{p}}\)` that estimates `\(p\)`

<ol type = "a">
  <li> 5.41 </li>
  <li> 0.185 </li>
  <li> 0.341 </li>
</ol>

---

# Election

A poll by YouGov asked `\(1360\)` voters in Pennsylvania, the day before the election, whether they were going to vote for Biden or Trump. We will code a vote for Biden `\(=1\)`, so `\(\coral{\hat{p}}\)` is the sample proportion of people who will vote for Biden. Biden will win Pennsylvania if the population proportion is `\(p > .5\)`.

They find that `\(\coral{\hat{p}} = .53\)`. What is the sampling distribution of `\(\coral{\hat{p}}\)`?

.footnote.small[Source: https://projects.fivethirtyeight.com/polls/]

---

# What's the probability Biden Wins PA?

Using the sampling distribution, what's the probability that `\(p > .50\)`?

---

# Clicker Question

Only 30% of American adults eat breakfast daily. A cereal manufacturer contacts an SRS of 1500 American adults to see what proportion of them consume breakfast daily. What is the approximate distribution of `\(\coral{\hat{p}}\)`?

---

# Confidence Intervals for a Population Proportion

We follow the same path from sampling distribution to confidence interval as we did for `\(\bar{X}\)`.

Note that the standard deviation of `\(\coral{\hat{p}}\)` depends on the parameter `\(p\)`, a value that we don't know.

--

We therefore estimate the standard deviation with the standard error of `\(\coral{\hat{p}}\)`:

$$ SE_{\coral{\hat{p}}}=\sqrt{\frac{\coral{\hat{p}}(1-\coral{\hat{p}})}{n}} $$

--

- Remember! Estimating `\(SE\)` means `\(t\)`-dist if sample is small!!!

---

# Confidence Intervals for a Population Proportion

Say we draw a simple random sample of size `\(n\)` from a large population that contains an unknown proportion `\(p\)` of successes. An approximate C% .hi.purple[confidence interval] for `\(p\)` is:

$$ \coral{\hat{p}} \pm Z^{\frac{1-C}{2}} \sqrt{\frac{\coral{\hat{p}}(1-\coral{\hat{p}})}{n}} $$

--

What do we mean by large?
We can only use this confidence interval when the numbers of successes and failures in the sample are both at least 15 (to remember: half of 30 each).

---

# Example

A poll by YouGov asked 1360 voters in Pennsylvania if they were going to vote for Biden or Trump. We will code a vote for Biden `\(=1\)`, so `\(\coral{\hat{p}}\)` is the sample proportion of people who will vote for Biden. Biden will win Pennsylvania if the population proportion is `\(p > .5\)`.

They find that `\(\coral{\hat{p}} = .53\)`. What is the sampling distribution of `\(\coral{\hat{p}}\)`?

Check the conditions:

- SRS `\(\checkmark\)`
- number of successes (`\(1360 \cdot 0.53\)`) and failures (`\(1360 \cdot 0.47\)`) are both larger than `\(15\)` `\(\checkmark\)`

So we can go ahead and calculate a 95% confidence interval for the population parameter `\(p\)`...

---

---

# Example

The behavioral survey found that 170 individuals of a simple random sample of 2673 adult heterosexuals had had multiple partners. That is, `\(\coral{\hat{p}} = \frac{170}{2673} = 0.0636\)`. Provide a 99% confidence interval for the proportion `\(p\)` of all adult heterosexuals who have had multiple partners.

Check the conditions:

- SRS `\(\checkmark\)`
- number of successes (170) and failures (2503) are both larger than 15 `\(\checkmark\)`

So we can go ahead and calculate the confidence interval...

---

# Example

$$\coral{\hat{p}} \pm Z^{\frac{1-C}{2}} \sqrt{\frac{\coral{\hat{p}}(1-\coral{\hat{p}})}{n}} = $$

--

`$$0.0636 \pm 2.576 \cdot \sqrt{\frac{(0.0636)\cdot(0.9364)}{2673}}$$`

We are 99% confident that the proportion of heterosexual adults who have had more than one partner in the past year is between 5.14% and 7.58%.

---

# Clicker Question

We are given that `\(n = 670\)` and `\(\coral{\hat{p}} = 0.85\)`. We will use the standard error of the sample proportion:

$$ SE_{\coral{\hat{p}}}=\sqrt{\coral{\hat{p}}(1-\coral{\hat{p}})/n} $$

Which of the following is the correct calculation for a 95% confidence interval?
<ol type = "a">
  <li> \( 0.85 \pm 1.96 \cdot \sqrt{\frac{0.85\cdot 0.15}{670}} \) </li>
  <li> \( 0.85 \pm 1.645 \cdot \sqrt{\frac{0.85\cdot 0.15}{670}} \) </li>
  <li> \( 0.85 \pm 1.96 \cdot \frac{0.85\cdot 0.15}{\sqrt{670}} \) </li>
  <li> \( 571 \pm 1.96 \cdot \sqrt{\frac{571\cdot 99}{670}} \) </li>
</ol>

---

# Hypothesis Testing

We design a hypothesis test about the population proportion `\(p\)`, such as:

$$ H_0: p = p_0 \ \text{ vs. } \ H_1: p \neq p_0 $$

Or one-sided alternatives, such as `\(p < p_0\)` or `\(p > p_0\)`.

We reject `\(H_0\)` if our p-value is lower than our *level of significance*

- p-value: the probability of observing the sample proportion we have, or a more extreme value, *given* that the null hypothesis is true

---

# Test Statistic

Draw an SRS of size `\(n\)` from a large population that contains an unknown proportion `\(p\)` of successes. To test the hypothesis `\(H_0: p = p_0\)`, compute the following z-statistic:

$$ Z=\frac{\coral{\hat{p}}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} $$

Look up this `\(Z\)` value in the `\(Z\)`-table. This is valid when the sample size `\(n\)` is large enough that both `\(n \cdot p_0\)` and `\(n \cdot (1-p_0)\)` are 15 or more.

---

# Breakout Group

A survey found that 571 out of 670 (85%) Americans answered a question on experimental design correctly. Do these data provide convincing evidence that more than 80% of Americans have a good intuition about experimental design?

$$ H_0: p=0.8 \ \text{ vs. } \ H_1: p>0.8 $$

---

# Breakout Group

A survey found that 571 out of 670 (85%) Americans answered a question on experimental design correctly. Do these data provide convincing evidence that more than 80% of Americans have a good intuition about experimental design?

$$ H_0: p = 0.8 \ \text{ vs. 
} \ H_1: p > 0.8 $$

Calculate the p-value:

$$ P(\coral{\hat{p}} > 0.85 \ \vert \ p = 0.8) $$

$$ P\left(Z > \frac{0.85-0.8}{\sqrt{\frac{0.8 \cdot 0.2}{670}}}\right) = P(Z > 3.24) = 0.0006 $$

Since `\(p\)`-value `\(= 0.0006 < \alpha = 0.05\)`, reject `\(H_0\)`.

---

# Election: Hypothesis Testing

On Nov. 1st, the New York Times and Siena College released a poll for Wisconsin with `\(n = 1253\)`, and the sample proportion of people supporting Biden was `\(\coral{\hat{p}} = 0.52\)`.

On election day, we learned the population proportion supporting Biden was `\(p = 0.495\)`. Would we have rejected the following hypothesis at the `\(\alpha = 0.05\)` significance level?

$$ H_0: p = 0.495 $$

$$ H_1: p > 0.495 $$

---

# Election Polling and Simple Random Sample

Why did we reject a null hypothesis that was true? Which of the following problems do we think could have occurred (list from ch. 9)?

- .hi.purple[Undercoverage]: when some groups in the population are left out of the process of choosing the sample
- .hi.purple[Oversampling]: when some groups are sampled more often than others in a way that is not representative of the population
- .hi.purple[Nonresponse]: when an individual chosen for the sample can't be contacted or refuses to participate
- .hi.purple[Response Bias]: a systematic pattern of incorrect responses in a sample survey
- .hi.purple[Wording Effect]: a systematic pattern of responses due to poor (or manipulated) wording of survey questions

---

# Breakout Group

Suppose you are an epidemiologist studying cancer incidence in an old manufacturing town. It is believed the cancer incidence in this town is above average. You know that the proportion of the national population that has a certain cancer is 0.03. The manufacturing town has an observed cancer incidence of 0.045 among a sample of 400 residents. Test the following hypothesis at the `\(\alpha = 0.05\)` significance level.
$$ H_0: p = 0.03 $$

$$ H_1: p > 0.03 $$

<ol type = "a">
  <li> Reject \(H_0\) </li>
  <li> Fail to reject \(H_0\) </li>
</ol>

---

---

# Breakout Group

Suppose you are an epidemiologist studying cancer incidence in an old manufacturing town. It is believed the cancer incidence in this town is above average. You know that the proportion of the national population that has a certain cancer is 0.03. The manufacturing town has an observed cancer incidence of 0.045 among a sample of 400 residents.

For what region of sample proportions `\(\coral{\hat{p}}\)` will you reject the following null hypothesis at the `\(\alpha = 0.05\)` significance level?

$$ H_0: p = 0.03 $$

$$ H_1: p > 0.03 $$

---
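
# Breakout Group: A Sketch in Python

The cancer-incidence test can be checked numerically. The following is a minimal sketch, not part of the original slides: it applies the z-statistic from the Test Statistic slide to `\(p_0 = 0.03\)`, `\(\coral{\hat{p}} = 0.045\)`, `\(n = 400\)`. The function name `upper_tail_z_test` is a hypothetical helper, and `NormalDist` from Python's standard library stands in for the `\(Z\)`-table.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(p_hat, p0, n, alpha=0.05):
    """Upper-tailed z-test for one proportion (H0: p = p0 vs. H1: p > p0).

    Hypothetical helper for illustration; returns the z-statistic,
    the p-value, and the smallest p-hat that leads to rejecting H0.
    """
    std_normal = NormalDist()                  # N(0, 1)
    sd0 = sqrt(p0 * (1 - p0) / n)              # sd of p-hat if H0 is true
    z = (p_hat - p0) / sd0                     # test statistic
    p_value = 1 - std_normal.cdf(z)            # P(Z > z)
    z_crit = std_normal.inv_cdf(1 - alpha)     # critical value for level alpha
    cutoff = p0 + z_crit * sd0                 # reject H0 when p_hat > cutoff
    return z, p_value, cutoff


z, p_value, cutoff = upper_tail_z_test(p_hat=0.045, p0=0.03, n=400)
print(f"z = {z:.2f}, p-value = {p_value:.3f}, reject H0 when p-hat > {cutoff:.3f}")
```

Under these assumptions the statistic is about `\(Z \approx 1.76\)` with a p-value just under 0.04, so `\(H_0\)` is rejected at `\(\alpha = 0.05\)`; the rejection region is roughly `\(\coral{\hat{p}} > 0.044\)`. Note that `\(n \cdot p_0 = 12\)` is below the threshold of 15 given earlier, so the normal approximation is rough here.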