class: center, middle, inverse, title-slide # ECON 3818 ## Chapter 15 ### Kyle Butts ### 03 October 2021 --- exclude: true --- class: clear, middle <!-- Custom css --> <style type="text/css"> /* ------------------------------------------------------- * * !! This file was generated by xaringanthemer !! * * Changes made to this file directly will be overwritten * if you used xaringanthemer in your xaringan slides Rmd * ------------------------------------------------------- */ @import url(https://fonts.googleapis.com/css?family=Roboto&display=swap); @import url(https://fonts.googleapis.com/css?family=Source+Code+Pro:400,700&display=swap); @import url(https://fonts.googleapis.com/css2?family=Atkinson+Hyperlegible&display=swap); :root { /* Fonts */ --text-font-family: 'Atkinson Hyperlegible'; --text-font-is-google: 1; --text-font-family-fallback: Roboto, -apple-system, BlinkMacSystemFont, avenir next, avenir, helvetica neue, helvetica, Ubuntu, roboto, noto, segoe ui, arial; --text-font-base: sans-serif; --header-font-family: 'Atkinson Hyperlegible'; --header-font-is-google: 1; --header-font-family-fallback: Georgia, serif; --code-font-family: 'Source Code Pro'; --code-font-is-google: 1; --base-font-size: 20px; --text-font-size: 1rem; --code-font-size: 0.9rem; --code-inline-font-size: 1em; --header-h1-font-size: 1.75rem; --header-h2-font-size: 1.6rem; --header-h3-font-size: 1.5rem; /* Colors */ --text-color: #131516; --text-color-light: #555F61; --header-color: #FFF; --background-color: #FFF; --link-color: #107895; --code-highlight-color: rgba(255,255,0,0.5); --inverse-text-color: #d6d6d6; --inverse-background-color: #272822; --inverse-header-color: #f3f3f3; --inverse-link-color: #107895; --title-slide-background-color: #272822; --title-slide-text-color: #d6d6d6; --header-background-color: #FFF; --header-background-text-color: #FFF; } html { font-size: var(--base-font-size); } body { font-family: 
var(--text-font-family), var(--text-font-family-fallback), var(--text-font-base); font-weight: normal; color: var(--text-color); } h1, h2, h3 { font-family: var(--header-font-family), var(--header-font-family-fallback); color: var(--text-color-light); } .remark-slide-content { background-color: var(--background-color); font-size: 1rem; padding: 24px 32px 16px 32px; width: 100%; height: 100%; } .remark-slide-content h1 { font-size: var(--header-h1-font-size); } .remark-slide-content h2 { font-size: var(--header-h2-font-size); } .remark-slide-content h3 { font-size: var(--header-h3-font-size); } .remark-code, .remark-inline-code { font-family: var(--code-font-family), Menlo, Consolas, Monaco, Liberation Mono, Lucida Console, monospace; } .remark-code { font-size: var(--code-font-size); } .remark-inline-code { font-size: var(--code-inline-font-size); color: #000; } .remark-slide-number { color: #107895; opacity: 1; font-size: 0.9em; } a, a > code { color: var(--link-color); text-decoration: none; } .footnote { position: absolute; bottom: 60px; padding-right: 6em; font-size: 0.9em; } .remark-code-line-highlighted { background-color: var(--code-highlight-color); } .inverse { background-color: var(--inverse-background-color); color: var(--inverse-text-color); } .inverse h1, .inverse h2, .inverse h3 { color: var(--inverse-header-color); } .inverse a, .inverse a > code { color: var(--inverse-link-color); } img, video, iframe { max-width: 100%; } blockquote { border-left: solid 5px lightgray; padding-left: 1em; } @page { margin: 0; } @media print { .remark-slide-scaler { width: 100% !important; height: 100% !important; transform: scale(1) !important; top: 0 !important; left: 0 !important; } } /* Modified metropolis */ .clear{ border-top: 0px solid #FAFAFA; } h1 { margin-top: -5px; margin-left: -00px; margin-bottom: 30px; color: var(--text-color-light); font-weight: 200; } h2, h3, h4 { padding-top: -15px; padding-bottom: 00px; color: #1A292C; text-shadow: none; font-weight: 
400; text-align: left; margin-left: 00px; margin-bottom: -10px; } .title-slide .inverse .remark-slide-content { background-color: #FAFAFA; } .title-slide { background-color: #FAFAFA; border-top: 80px solid #FAFAFA; } .title-slide h1 { color: var(--text-color); font-size: 40px; text-shadow: none; font-weight: 400; text-align: left; margin-left: 15px; } .title-slide h2 { margin-top: -15px; color: var(--link-color); text-shadow: none; font-weight: 300; font-size: 35px; text-align: left; margin-left: 15px; } .title-slide h3 { color: var(--text-color-light); text-shadow: none; font-weight: 300; font-size: 25px; text-align: left; margin-left: 15px; margin-bottom: 0px; } .title-slide h3:last-of-type { font-style: italic; font-size: 1rem; } /* Remove orange line */ hr, .title-slide h2::after, .mline h1::after { content: ''; display: block; border: none; background-color: #e5e5e5; color: #e5e5e5; height: 1px; } hr, .mline h1::after { margin: 1em 15px 0 15px; } .title-slide h2::after { margin: 10px 15px 35px 0; } .mline h1::after { margin: 10px 15px 0 15px; } /* turns off slide numbers for title page: https://github.com/gnab/remark/issues/298 */ .title-slide .remark-slide-number { display: none; } /* Custom CSS */ /* More line spacing */ body { line-height: 1.5; } /* Font styling */ .hi { font-weight: 600; } .mono { font-family: monospace; } .ul { text-decoration: underline; } .ol { text-decoration: overline; } .st { text-decoration: line-through; } .bf { font-weight: bold; } .it { font-style: italic; } /* Font Sizes */ .bigger { font-size: 125%; } .huge{ font-size: 150%; } .small { font-size: 95%; } .smaller { font-size: 85%; } .smallest { font-size: 75%; } .tiny { font-size: 50%; } /* Remark customization */ .clear .remark-slide-number { display: none; } .inverse .remark-slide-number { display: none; } .remark-code-line-highlighted { background-color: rgba(249, 39, 114, 0.5); } /* Xaringan tweeks */ .inverse { background-color: #23373B; text-shadow: 0 0 20px #333; /* 
text-shadow: none; */ } .title-slide { background-color: #ffffff; border-top: 80px solid #ffffff; } .footnote { bottom: 1em; font-size: 80%; color: #7f7f7f; } /* Lists */ li { margin-top: 4px; } /* Mono-spaced font, smaller */ .mono-small { font-family: monospace; font-size: 16px; } .mono-small .mjx-chtml { font-size: 103% !important; } .pseudocode, .pseudocode-small { font-family: monospace; background: #f8f8f8; border-radius: 3px; padding: 10px; padding-top: 0px; padding-bottom: 0px; } .pseudocode-small { font-size: 16px; } .remark-code { font-size: 68%; } .remark-inline-code { background: #F5F5F5; /* lighter */ /* background: #e7e8e2; /* darker */ border-radius: 3px; padding: 4px; } /* Super and Subscripts */ .super{ vertical-align: super; font-size: 70%; line-height: 1%; } .sub{ vertical-align: sub; font-size: 70%; line-height: 1%; } /* Subheader */ .subheader{ font-weight: 100; font-style: italic; display: block; margin-top: -25px; margin-bottom: 25px; } /* 2/3 left; 1/3 right */ .more-left { float: left; width: 63%; } .less-right { float: right; width: 31%; } .more-right ~ * { clear: both; } /* 9/10 left; 1/10 right */ .left90 { padding-top: 0.7em; float: left; width: 85%; } .right10 { padding-top: 0.7em; float: right; width: 9%; } /* 95% left; 5% right */ .left95 { padding-top: 0.7em; float: left; width: 91%; } .right05 { padding-top: 0.7em; float: right; width: 5%; } .left5 { padding-top: 0.7em; margin-left: 0em; margin-right: -0.4em; float: left; width: 7%; } .left10 { padding-top: 0.7em; margin-left: -0.2em; margin-right: -0.5em; float: left; width: 10%; } .left30 { padding-top: 0.7em; float: left; width: 30%; } .right30 { padding-top: 0.7em; float: right; width: 30%; } .thin-left { padding-top: 0.7em; margin-left: -1em; margin-right: -0.5em; float: left; width: 27.5%; } /* Example */ .ex { font-weight: 300; color: #555F61 !important; font-style: italic; } .col-left { float: left; width: 47%; margin-top: -1em; } .col-right { float: right; width: 47%; 
margin-top: -1em; } .clear-up { clear: both; margin-top: -1em; } /* Format tables */ table { color: #000000; font-size: 14pt; line-height: 100%; border-top: 1px solid #ffffff !important; border-bottom: 1px solid #ffffff !important; } th, td { background-color: #ffffff; } table th { font-weight: 400; } /* Attention */ .attn { font-weight: 500; color: #e64173 !important; font-family: 'Zilla Slab' !important; } /* Note */ .note { font-weight: 300; font-style: italic; color: #314f4f !important; /* color: #cccccc !important; */ font-family: 'Zilla Slab' !important; } /* Question and answer */ .qa { font-weight: 500; /* color: #314f4f !important; */ color: #e64173 !important; font-family: 'Zilla Slab' !important; } /* Figure Caption */ .caption { font-size: 0.8888889em; line-height: 1.5; margin-top: 1em; color: #6b7280; } </style> <!-- From xaringancolor --> <div style = "position:fixed; visibility: hidden"> $$ \require{color} \definecolor{purple}{rgb}{0.337254901960784, 0.00392156862745098, 0.643137254901961} \definecolor{navy}{rgb}{0.0509803921568627, 0.23921568627451, 0.337254901960784} \definecolor{ruby}{rgb}{0.603921568627451, 0.145098039215686, 0.0823529411764706} \definecolor{alice}{rgb}{0.0627450980392157, 0.470588235294118, 0.584313725490196} \definecolor{daisy}{rgb}{0.92156862745098, 0.788235294117647, 0.266666666666667} \definecolor{coral}{rgb}{0.949019607843137, 0.427450980392157, 0.129411764705882} \definecolor{kelly}{rgb}{0.509803921568627, 0.576470588235294, 0.337254901960784} \definecolor{jet}{rgb}{0.0745098039215686, 0.0823529411764706, 0.0862745098039216} \definecolor{asher}{rgb}{0.333333333333333, 0.372549019607843, 0.380392156862745} \definecolor{slate}{rgb}{0.192156862745098, 0.309803921568627, 0.309803921568627} \definecolor{cranberry}{rgb}{0.901960784313726, 0.254901960784314, 0.450980392156863} $$ </div> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { Macros: { purple: ["{\\color{purple}{#1}}", 1], navy: ["{\\color{navy}{#1}}", 
1], ruby: ["{\\color{ruby}{#1}}", 1], alice: ["{\\color{alice}{#1}}", 1], daisy: ["{\\color{daisy}{#1}}", 1], coral: ["{\\color{coral}{#1}}", 1], kelly: ["{\\color{kelly}{#1}}", 1], jet: ["{\\color{jet}{#1}}", 1], asher: ["{\\color{asher}{#1}}", 1], slate: ["{\\color{slate}{#1}}", 1], cranberry: ["{\\color{cranberry}{#1}}", 1] }, loader: {load: ['[tex]/color']}, tex: {packages: {'[+]': ['color']}} } }); </script> <style> .purple {color: #5601A4;} .navy {color: #0D3D56;} .ruby {color: #9A2515;} .alice {color: #107895;} .daisy {color: #EBC944;} .coral {color: #F26D21;} .kelly {color: #829356;} .jet {color: #131516;} .asher {color: #555F61;} .slate {color: #314F4F;} .cranberry {color: #E64173;} </style> ## Chapter 15: Parameters and Statistics --- # Parameters and Statistics We have discussed using sample data to make inference about the population. In particular, we will use sample .hi.kelly[statistics] to make inference about population .hi.purple[parameters]. A .hi.purple[parameter] is a number that describes the population. In practice, parameters are unknown because we cannot examine the entire population. A .hi.kelly[statistic] is a number that can be calculated from sample data without using any unknown parameters. In practice, we use statistics to estimate parameters. --- # Greek Letters and Statistics .pull-left[ .hi.kelly[Latin Letters] - Latin letters like `\(\bar{x}\)` and `\(s^2\)` are calculations that represent guesses (estimates) at the population values. ] .pull-right[ .hi.purple[Greek Letters] - Greek letters like `\(\mu\)` and `\(\sigma^2\)` represent the truth about the population. 
] The goal for the class is for the Latin letters to be good guesses for the Greek letters: $$ \kelly{\text{Data}} \longrightarrow \kelly{\text{Calculation}} \longrightarrow \kelly{\text{Estimates}} \longrightarrow^{hopefully!} \purple{\text{Truth}} $$ For example, $$ \kelly{X} \longrightarrow \kelly{\frac{1}{n} \sum_{i=1}^n X_i} \longrightarrow \kelly{\bar{x}} \longrightarrow^{hopefully!} \purple{\mu} $$ --- # Examples of Parameters Some parameters of distributions we've encountered are - `\(n\)` and `\(p\)` in `\(X\sim B(n,p)\)` with probability mass function $$ P(X=x)={n \choose x} p^x \left(1-p\right)^{n-x} $$ - `\(a\)` and `\(b\)` in `\(X\sim U(a,b)\)` with probability density function $$ f(x)=\frac{1}{b-a}, \quad a \leq x \leq b $$ - `\(\mu\)` and `\(\sigma^2\)` in `\(X\sim N(\mu,\sigma^2)\)` with probability density function $$ f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} $$ --- # Mean and Variance Two population parameters of particular interest are - the mean, denoted `\(\mu\)`, defined by `\(E(X)\)` - the variance, denoted `\(\sigma^2\)`, defined by `\(E(X^2)-E(X)^2\)` We .hi[do not] observe these. Therefore, we guess using - the sample mean, `\(\bar{X}\)` - the sample variance, `\(s^2\)` Why do we use these as our guesses? --- # Getting the right sample Before we talk about the properties of sample statistics, we need to make sure we have the right sample. We talked about good ways to generate a sample. .hi.it[The right sample is the most important part of any data analysis.] A .hi.kelly[Simple Random Sample] has no bias and has observations that are from the same population. --- # Identically Distributed If every observation is from the same population, we say all of the observations in our sample are .hi.cranberry[identically distributed]. In math, this means for any two observations `\(X_i\)` and `\(X_j\)`, $$ Pr(X_i < x) = Pr(X_j < x) $$ --- # Independent Observations Does observing `\(X_i\)` impact our best guess of `\(X_j\)`? 
Sometimes yes (time series, spatial dependence), but hopefully not. To simplify things, we need to assume .hi.red[independent sample observations], meaning $$ Pr(X_i=a \ \vert \ X_j=b) = Pr(X_i=a) $$ Intuitively, this means that .it[observing] one outcome doesn't help you .it[predict] any other outcome. To summarize, we want an .it[i.i.d.] sample, i.e. sample observations that are .hi.purple[independent and identically distributed]. --- # Sample Statistics are Random Variables For a sample `\(X_1,..., X_n\)` of the random variable `\(X\)`, any function of that sample, `\(\hat{\theta}=g(X_1,...,X_n)\)`, is a .hi.ruby[sample statistic]. For example, `$$\ruby{\bar{X}} = \frac{1}{n} \sum_{i=1}^{n} X_i$$` `$$\ruby{\displaystyle s^2} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$` Because `\(X_1,..., X_n\)` are random variables, any sample statistic `\(\ruby{\hat{\theta}} = g(X_1,...,X_n)\)` is itself a random variable! That means there is some distribution for the values of `\(\ruby{\hat{\theta}}\)`. --- # Sampling Distributions This is one of the most important concepts in the course. One .hi[trial] would consist of the following: - .hi.kelly[Random Sample] - Grab a group of observations from the population - .hi.ruby[Sample Statistic] - Take your particular random sample and calculate a sample statistic (e.g. sample mean) .hi.coral[Sampling Distribution] - Imagine repeatedly grabbing a different group of observations from the population and calculating the sample mean. This is performing many .hi[trials]. The sample means themselves will have a distribution. 
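The trial-and-repeat procedure above can be sketched in a few lines of code (a Python sketch for intuition, not part of the course materials; the `\(N(15, 25)\)` population and sample size of 100 are illustrative assumptions):

```python
import random
import statistics

random.seed(0)  # reproducible draws

def one_trial(n):
    # One trial: grab a random sample of size n from the population
    # (assumed here to be N(mu = 15, sigma^2 = 25)), then compute the
    # sample statistic of interest -- the sample mean.
    sample = [random.gauss(15, 5) for _ in range(n)]
    return statistics.mean(sample)

# Many trials trace out the sampling distribution of the sample mean.
sample_means = [one_trial(n=100) for _ in range(10_000)]

print(statistics.mean(sample_means))      # close to mu = 15
print(statistics.variance(sample_means))  # close to sigma^2 / n = 0.25
```

A histogram of `sample_means` would reproduce the bell-shaped sampling distribution shown in the animations that follow.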
--- class: clear, center <img src="data:image/png;base64,#frame1.png" width="100%" style="display: block; margin: auto;" /> --- class: clear, center <img src="data:image/png;base64,#frame2.png" width="100%" style="display: block; margin: auto;" /> --- class: clear, center .center[ <img style="width:100%;" src="data:image/png;base64,#sample_dist.gif"/> ] --- class: clear, middle, center <img src="data:image/png;base64,#frame1000.png" width="100%" style="display: block; margin: auto;" /> --- # Sample Size The variance of the .it.coral[sampling distribution] depends on the sample size. As `\(n\)` gets larger, each individual .hi[trial] gives a better guess at the mean. Hence, the .coral[sampling distribution] becomes narrower. .center[ <img style="width:80%;" src="data:image/png;base64,#dist_n.gif"/> ] --- class: clear <img src="data:image/png;base64,#sample_dist_diff_n.png" width="100%" style="display: block; margin: auto;" /> --- # Sampling Distributions In the real world, though, we only observe one sample. How does the concept of the .coral[sampling distribution] help us? -- - Since we don't know the true population parameter, our .ruby[sample statistic] is our best guess at the true value. - If we know the .coral[sampling distribution], then we can quantify the uncertainty in our .ruby[sample statistic]. --- # Law of Large Numbers Is `\(\bar{X}\)` actually a good guess for `\(\mu\)`? Under certain conditions, we can use the .hi.purple[Law of Large Numbers (LLN)] to guarantee that `\(\bar{X}\)` approaches `\(\mu\)` as the sample size grows large. -- .hi[Theorem]: Let `\(X_1,X_2,...,X_n\)` be an i.i.d. set of observations with `\(E(X_i) = \mu\)`. Define the sample mean of size `\(n\)` as `\(\bar{X}_n = \frac{1}{n}\sum_{i = 1}^{n}X_i\)`. Then $$ \bar{X}_n \to \mu \quad \text{as} \quad n \to \infty. $$ Intuitively, as we observe a larger and larger sample, we average over randomness and our sample mean approaches the true population mean. 
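The LLN is easy to see in a quick simulation: track the running sample mean as the sample grows (a Python sketch; the `\(N(15, 25)\)` population is an illustrative assumption):

```python
import random
import statistics

random.seed(1)  # reproducible draws

mu, sigma = 15, 5  # assumed (illustrative) population mean and sd

# One long stream of i.i.d. observations; by the LLN the running
# sample mean xbar_n settles down toward mu as n grows.
draws = [random.gauss(mu, sigma) for _ in range(100_000)]

for n in (10, 100, 10_000, 100_000):
    xbar_n = statistics.mean(draws[:n])
    print(f"n = {n:>7}  xbar_n = {xbar_n:.4f}")
```

The printed means wander for small `\(n\)` and hug `\(\mu = 15\)` for large `\(n\)`, mirroring the animation on the next slide.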
--- # Law of Large Numbers .center[ <img style="width: 90%;" src="data:image/png;base64,#lln.gif"/> ] --- # Law of Large Numbers <img src="data:image/png;base64,#sample_dist_diff_n.png" width="100%" style="display: block; margin: auto;" /> --- # Properties of the sample mean .hi[Theorem]: Let `\(X_1,X_2,...,X_n\)` be an i.i.d. sample with `\(E(X_i) = \mu\)` and `\(Var(X_i) = \sigma^2 < \infty\)`. Then `$$E(\bar{X}_n) = \mu$$` `$$Var(\bar{X}_n) = \frac{\sigma^2}{n}$$` Intuitively, we grab many samples from a population. The variance of our sample averages shrinks as we observe more observations per sample. --- # Clicker Question Suppose we sample 100 observations from a distribution with `\(\mu = 15\)` and `\(\sigma^2 = 25\)`. What are `\(E(\bar{X}_{100})\)` and `\(Var(\bar{X}_{100})\)`? <ol type = "a"> <li>\(E(\bar{X}_{100}) = 15\), \(Var(\bar{X}_{100}) = 25\) <li>\(E(\bar{X}_{100}) = 0.15\), \(Var(\bar{X}_{100}) = 0.25\) <li>\(E(\bar{X}_{100}) = 15\), \(Var(\bar{X}_{100}) = 5\) <li>\(E(\bar{X}_{100}) = 15\), \(Var(\bar{X}_{100}) = 0.25\) </ol> --- class: clear ## When is the sample mean Normally Distributed? Although we know the mean and variance of `\(\bar{X}\)`, we generally don't know its distribution function. .hi[Theorem]: Let `\(X_1,X_2,...,X_n\)` be an i.i.d. sample with `\(X_i \sim N(\mu, \sigma^2)\)` for `\(i=1,2,...,n\)`. Then $$ \bar{X}_n \sim N(\mu, \frac{\sigma^2}{n}). $$ Intuitively, if all the observations come from the same normal distribution, then the sample average is also normally distributed and centered at the true mean (but with a much smaller variance). --- # Central Limit Theorem What if `\(X_i\)` are not normally distributed? If the number of observations per sample, `\(n\)`, is large (we will discuss this more later), then the distribution of `\(X_i\)` doesn't matter. The sample mean will be approximately distributed as $$ \bar{X}_n \sim N(\mu, \frac{\sigma^2}{n}). $$
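To see the CLT at work, we can average samples drawn from a distribution that looks nothing like a normal (a Python sketch; the Exponential(1) population is an illustrative choice, not from the slides):

```python
import random
import statistics

random.seed(2)  # reproducible draws

# Population: Exponential(1) -- heavily right-skewed, nothing like a
# normal -- with mu = 1 and sigma^2 = 1.
n = 50
means = [
    statistics.mean(random.expovariate(1) for _ in range(n))
    for _ in range(20_000)
]

# CLT: the sample means are approximately N(mu, sigma^2 / n) = N(1, 0.02),
# even though each individual observation is far from normal.
print(statistics.mean(means))      # close to 1
print(statistics.variance(means))  # close to 1 / 50 = 0.02
```

Plotting a histogram of `means` would show the familiar bell shape, despite the skewed population the observations came from.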