class: center, middle, inverse, title-slide

# ECON 3818

## Chapter 2

### Kyle Butts

### 27 September 2021

---
class: clear, middle

<!-- Custom css -->
<style type="text/css">
/* ------------------------------------------------------- * * !! This file was generated by xaringanthemer !! * * Changes made to this file directly will be overwritten * if you used xaringanthemer in your xaringan slides Rmd * ------------------------------------------------------- */ @import url(https://fonts.googleapis.com/css?family=Roboto&display=swap); @import url(https://fonts.googleapis.com/css?family=Source+Code+Pro:400,700&display=swap); @import url(https://fonts.googleapis.com/css2?family=Atkinson+Hyperlegible&display=swap); :root { /* Fonts */ --text-font-family: 'Atkinson Hyperlegible'; --text-font-is-google: 1; --text-font-family-fallback: Roboto, -apple-system, BlinkMacSystemFont, avenir next, avenir, helvetica neue, helvetica, Ubuntu, roboto, noto, segoe ui, arial; --text-font-base: sans-serif; --header-font-family: 'Atkinson Hyperlegible'; --header-font-is-google: 1; --header-font-family-fallback: Georgia, serif; --code-font-family: 'Source Code Pro'; --code-font-is-google: 1; --base-font-size: 20px; --text-font-size: 1rem; --code-font-size: 0.9rem; --code-inline-font-size: 1em; --header-h1-font-size: 1.75rem; --header-h2-font-size: 1.6rem; --header-h3-font-size: 1.5rem; /* Colors */ --text-color: #131516; --text-color-light: #555F61; --header-color: #FFF; --background-color: #FFF; --link-color: #107895; --code-highlight-color: rgba(255,255,0,0.5); --inverse-text-color: #d6d6d6; --inverse-background-color: #272822; --inverse-header-color: #f3f3f3; --inverse-link-color: #107895; --title-slide-background-color: #272822; --title-slide-text-color: #d6d6d6; --header-background-color: #FFF; --header-background-text-color: #FFF; } html { font-size: var(--base-font-size); } body { font-family: 
var(--text-font-family), var(--text-font-family-fallback), var(--text-font-base); font-weight: normal; color: var(--text-color); } h1, h2, h3 { font-family: var(--header-font-family), var(--header-font-family-fallback); color: var(--text-color-light); } .remark-slide-content { background-color: var(--background-color); font-size: 1rem; padding: 24px 32px 16px 32px; width: 100%; height: 100%; } .remark-slide-content h1 { font-size: var(--header-h1-font-size); } .remark-slide-content h2 { font-size: var(--header-h2-font-size); } .remark-slide-content h3 { font-size: var(--header-h3-font-size); } .remark-code, .remark-inline-code { font-family: var(--code-font-family), Menlo, Consolas, Monaco, Liberation Mono, Lucida Console, monospace; } .remark-code { font-size: var(--code-font-size); } .remark-inline-code { font-size: var(--code-inline-font-size); color: #000; } .remark-slide-number { color: #107895; opacity: 1; font-size: 0.9em; } a, a > code { color: var(--link-color); text-decoration: none; } .footnote { position: absolute; bottom: 60px; padding-right: 6em; font-size: 0.9em; } .remark-code-line-highlighted { background-color: var(--code-highlight-color); } .inverse { background-color: var(--inverse-background-color); color: var(--inverse-text-color); } .inverse h1, .inverse h2, .inverse h3 { color: var(--inverse-header-color); } .inverse a, .inverse a > code { color: var(--inverse-link-color); } img, video, iframe { max-width: 100%; } blockquote { border-left: solid 5px lightgray; padding-left: 1em; } @page { margin: 0; } @media print { .remark-slide-scaler { width: 100% !important; height: 100% !important; transform: scale(1) !important; top: 0 !important; left: 0 !important; } } /* Modified metropolis */ .clear{ border-top: 0px solid #FAFAFA; } h1 { margin-top: -5px; margin-left: -00px; margin-bottom: 30px; color: var(--text-color-light); font-weight: 200; } h2, h3, h4 { padding-top: -15px; padding-bottom: 00px; color: #1A292C; text-shadow: none; font-weight: 
400; text-align: left; margin-left: 00px; margin-bottom: -10px; } .title-slide .inverse .remark-slide-content { background-color: #FAFAFA; } .title-slide { background-color: #FAFAFA; border-top: 80px solid #FAFAFA; } .title-slide h1 { color: var(--text-color); font-size: 40px; text-shadow: none; font-weight: 400; text-align: left; margin-left: 15px; } .title-slide h2 { margin-top: -15px; color: var(--link-color); text-shadow: none; font-weight: 300; font-size: 35px; text-align: left; margin-left: 15px; } .title-slide h3 { color: var(--text-color-light); text-shadow: none; font-weight: 300; font-size: 25px; text-align: left; margin-left: 15px; margin-bottom: 0px; } .title-slide h3:last-of-type { font-style: italic; font-size: 1rem; } /* Remove orange line */ hr, .title-slide h2::after, .mline h1::after { content: ''; display: block; border: none; background-color: #e5e5e5; color: #e5e5e5; height: 1px; } hr, .mline h1::after { margin: 1em 15px 0 15px; } .title-slide h2::after { margin: 10px 15px 35px 0; } .mline h1::after { margin: 10px 15px 0 15px; } /* turns off slide numbers for title page: https://github.com/gnab/remark/issues/298 */ .title-slide .remark-slide-number { display: none; } /* Custom CSS */ /* More line spacing */ body { line-height: 1.5; } /* Font styling */ .hi { font-weight: 600; } .mono { font-family: monospace; } .ul { text-decoration: underline; } .ol { text-decoration: overline; } .st { text-decoration: line-through; } .bf { font-weight: bold; } .it { font-style: italic; } /* Font Sizes */ .bigger { font-size: 125%; } .huge{ font-size: 150%; } .small { font-size: 95%; } .smaller { font-size: 85%; } .smallest { font-size: 75%; } .tiny { font-size: 50%; } /* Remark customization */ .clear .remark-slide-number { display: none; } .inverse .remark-slide-number { display: none; } .remark-code-line-highlighted { background-color: rgba(249, 39, 114, 0.5); } /* Xaringan tweeks */ .inverse { background-color: #23373B; text-shadow: 0 0 20px #333; /* 
text-shadow: none; */ } .title-slide { background-color: #ffffff; border-top: 80px solid #ffffff; } .footnote { bottom: 1em; font-size: 80%; color: #7f7f7f; } /* Lists */ li { margin-top: 4px; } /* Mono-spaced font, smaller */ .mono-small { font-family: monospace; font-size: 16px; } .mono-small .mjx-chtml { font-size: 103% !important; } .pseudocode, .pseudocode-small { font-family: monospace; background: #f8f8f8; border-radius: 3px; padding: 10px; padding-top: 0px; padding-bottom: 0px; } .pseudocode-small { font-size: 16px; } .remark-code { font-size: 68%; } .remark-inline-code { background: #F5F5F5; /* lighter */ /* background: #e7e8e2; /* darker */ border-radius: 3px; padding: 4px; } /* Super and Subscripts */ .super{ vertical-align: super; font-size: 70%; line-height: 1%; } .sub{ vertical-align: sub; font-size: 70%; line-height: 1%; } /* Subheader */ .subheader{ font-weight: 100; font-style: italic; display: block; margin-top: -25px; margin-bottom: 25px; } /* 2/3 left; 1/3 right */ .more-left { float: left; width: 63%; } .less-right { float: right; width: 31%; } .more-right ~ * { clear: both; } /* 9/10 left; 1/10 right */ .left90 { padding-top: 0.7em; float: left; width: 85%; } .right10 { padding-top: 0.7em; float: right; width: 9%; } /* 95% left; 5% right */ .left95 { padding-top: 0.7em; float: left; width: 91%; } .right05 { padding-top: 0.7em; float: right; width: 5%; } .left5 { padding-top: 0.7em; margin-left: 0em; margin-right: -0.4em; float: left; width: 7%; } .left10 { padding-top: 0.7em; margin-left: -0.2em; margin-right: -0.5em; float: left; width: 10%; } .left30 { padding-top: 0.7em; float: left; width: 30%; } .right30 { padding-top: 0.7em; float: right; width: 30%; } .thin-left { padding-top: 0.7em; margin-left: -1em; margin-right: -0.5em; float: left; width: 27.5%; } /* Example */ .ex { font-weight: 300; color: #555F61 !important; font-style: italic; } .col-left { float: left; width: 47%; margin-top: -1em; } .col-right { float: right; width: 47%; 
margin-top: -1em; } .clear-up { clear: both; margin-top: -1em; } /* Format tables */ table { color: #000000; font-size: 14pt; line-height: 100%; border-top: 1px solid #ffffff !important; border-bottom: 1px solid #ffffff !important; } th, td { background-color: #ffffff; } table th { font-weight: 400; } /* Attention */ .attn { font-weight: 500; color: #e64173 !important; font-family: 'Zilla Slab' !important; } /* Note */ .note { font-weight: 300; font-style: italic; color: #314f4f !important; /* color: #cccccc !important; */ font-family: 'Zilla Slab' !important; } /* Question and answer */ .qa { font-weight: 500; /* color: #314f4f !important; */ color: #e64173 !important; font-family: 'Zilla Slab' !important; } /* Figure Caption */ .caption { font-size: 0.8888889em; line-height: 1.5; margin-top: 1em; color: #6b7280; } </style> <!-- From xaringancolor --> <div style = "position:fixed; visibility: hidden"> $$ \require{color} \definecolor{purple}{rgb}{0.337254901960784, 0.00392156862745098, 0.643137254901961} \definecolor{navy}{rgb}{0.0509803921568627, 0.23921568627451, 0.337254901960784} \definecolor{ruby}{rgb}{0.603921568627451, 0.145098039215686, 0.0823529411764706} \definecolor{alice}{rgb}{0.0627450980392157, 0.470588235294118, 0.584313725490196} \definecolor{daisy}{rgb}{0.92156862745098, 0.788235294117647, 0.266666666666667} \definecolor{coral}{rgb}{0.949019607843137, 0.427450980392157, 0.129411764705882} \definecolor{kelly}{rgb}{0.509803921568627, 0.576470588235294, 0.337254901960784} \definecolor{jet}{rgb}{0.0745098039215686, 0.0823529411764706, 0.0862745098039216} \definecolor{asher}{rgb}{0.333333333333333, 0.372549019607843, 0.380392156862745} \definecolor{slate}{rgb}{0.192156862745098, 0.309803921568627, 0.309803921568627} \definecolor{cranberry}{rgb}{0.901960784313726, 0.254901960784314, 0.450980392156863} $$ </div> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { Macros: { purple: ["{\\color{purple}{#1}}", 1], navy: ["{\\color{navy}{#1}}", 
1], ruby: ["{\\color{ruby}{#1}}", 1], alice: ["{\\color{alice}{#1}}", 1], daisy: ["{\\color{daisy}{#1}}", 1], coral: ["{\\color{coral}{#1}}", 1], kelly: ["{\\color{kelly}{#1}}", 1], jet: ["{\\color{jet}{#1}}", 1], asher: ["{\\color{asher}{#1}}", 1], slate: ["{\\color{slate}{#1}}", 1], cranberry: ["{\\color{cranberry}{#1}}", 1] }, loader: {load: ['[tex]/color']}, tex: {packages: {'[+]': ['color']}} } }); </script>
<style> .purple {color: #5601A4;} .navy {color: #0D3D56;} .ruby {color: #9A2515;} .alice {color: #107895;} .daisy {color: #EBC944;} .coral {color: #F26D21;} .kelly {color: #829356;} .jet {color: #131516;} .asher {color: #555F61;} .slate {color: #314F4F;} .cranberry {color: #E64173;} </style>

## Chapter 2: Describing Distributions with Numbers

---

## Chapter Overview

Population vs. Sample

Measures of Central Tendency

- Mean
- Median

Measures of Variability

- Quartiles
- Variance and Standard Deviation

---
# Population vs Sample

.hi.purple[Population]: the entire set of entities under study

- Examples: all men, all NBA players, all children under 5

.hi.kelly[Sample]: a subset of the population

- Can be used to draw inferences about the population
- Examples: our class, Denver Nuggets players, daycares in Colorado
- We are interested in parameters of the .hi.purple[population] distribution. Because measuring population parameters directly is usually infeasible, we estimate them using data from .hi.kelly[samples]

---
# Population Distribution

.hi.kelly[Distribution of a variable]: tells us .it[what values] it takes and .it[how often] it takes these values

- We are interested in the underlying population distribution of some variable
- The fundamental problem of statistics is that we cannot collect data on every member of the population

---
class: clear

<img src="pop_graph.png" width="100%" style="display: block; margin: auto;" />

---
# Population Inference

Instead, we take a sample from the population and use the sample distribution to estimate the parameters of interest

.center[ <img 
style="width:80%;" src="sample_anim.gif"/> ]

---
# Parameters of Interest

Two primary kinds of .hi.purple[population] parameters of interest:

- Measures of central tendency:
  - Population .coral[mean], `\(\mu\)`
  - Population .cranberry[median]
- Measures of variability:
  - Population .alice[variance], `\(\sigma^2\)`

--

We will .it.kelly[estimate] these using the .hi.kelly[sample] distribution

---
# Measuring Center: the Mean

The most common measure of center is the arithmetic average, or .hi.coral[mean]

`$$\coral{\bar{x}} = \frac{x_1 + x_2 + \dots + x_n}{n}$$`

or more compactly:

<img src="mean_annotated.png" width="45%" style="display: block; margin: auto;" />

---
# Population Inference: Mean

.center[ <img style="width:100%;" src="sample_anim_mean.gif"/> ]

---
# Population Inference: Mean

.center[ <img style="width:100%;" src="sample_anim_mean_wider.gif"/> ]

---
# Measuring Center: the Median

The .hi.cranberry[median] is the midpoint of a distribution

- It is more resistant than the mean to the influence of .hi[extreme observations]

How to calculate the median:

- Arrange the observations from smallest to largest
- If there is an odd number of observations, the median is the center observation. If there is an even number of observations, the median is the average of the two center observations

---
# Mean vs. Median

- Although we will primarily use the mean throughout the semester, its biggest drawback is that it is not resistant to .hi.purple[outliers]
- The median, however, is resistant to .hi.purple[outliers], so it can be important to calculate for smaller samples

--

.center[ <img style="width: 60%;" src="meme.png"/> ]

---
# Mean vs. Median Example

<img src="rebounds.png" width="90%" style="display: block; margin: auto;" />

.hi[Median]: 205.5 rebounds and .hi[Mean]: 250.5 rebounds

---
# Clicker Question

What is the sample mean of the participants' ages?
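<table>
<caption>Sample of individuals</caption>
<thead>
<tr> <th>Age</th> <th>Sex</th> <th>BMI</th> <th>Drinks per week</th> </tr>
</thead>
<tbody>
<tr> <td>59</td> <td>male</td> <td>32.26</td> <td>3 drinks</td> </tr>
<tr> <td>62</td> <td>male</td> <td>25.09</td> <td>2 drinks</td> </tr>
<tr> <td>60</td> <td>female</td> <td>32.58</td> <td>1 drink</td> </tr>
<tr> <td>18</td> <td>male</td> <td>99.99</td> <td>6 drinks</td> </tr>
<tr> <td>57</td> <td>female</td> <td>31.88</td> <td>2 drinks</td> </tr>
<tr> <td>56</td> <td>male</td> <td>42.80</td> <td>3 drinks</td> </tr>
</tbody>
</table>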
<ol type="a"> <li>58</li> <li>51.2</li> <li>52</li> <li>49.7</li> </ol>

---
# Clicker Question

Which measure of central tendency best describes the age of participants?
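<table>
<caption>Sample of individuals</caption>
<thead>
<tr> <th>Age</th> <th>Sex</th> <th>BMI</th> <th>Drinks per week</th> </tr>
</thead>
<tbody>
<tr> <td>59</td> <td>male</td> <td>32.26</td> <td>3 drinks</td> </tr>
<tr> <td>62</td> <td>male</td> <td>25.09</td> <td>2 drinks</td> </tr>
<tr> <td>60</td> <td>female</td> <td>32.58</td> <td>1 drink</td> </tr>
<tr> <td>18</td> <td>male</td> <td>99.99</td> <td>6 drinks</td> </tr>
<tr> <td>57</td> <td>female</td> <td>31.88</td> <td>2 drinks</td> </tr>
<tr> <td>56</td> <td>male</td> <td>42.80</td> <td>3 drinks</td> </tr>
</tbody>
</table>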
<ol type="a"> <li>Median</li> <li>Mean</li> </ol>

---
# Measuring Variability

Measures of central tendency do not tell the whole story. To further characterize the distribution, we need to know how spread out the data are

- Quartiles
- Variance

---
# Variability: Quartiles

A measure of center alone can be misleading. One way to measure variability is to use quartiles.

How to calculate quartiles:

- Arrange the observations in increasing order and locate the .hi.cranberry[median]
- The .hi.kelly[first quartile] is the median of the observations located to the left of the median
- The .hi.kelly[third quartile] is the median of the observations located to the right of the median

<img src="quartiles.png" width="50%" style="display: block; margin: auto;" />

---
# Boxplots

.hi.alice[Five-number summary]: smallest observation (minimum), the first quartile, the median, the third quartile, and the largest observation (maximum)

A .hi.purple[boxplot] displays quantitative data using this five-number summary

How to make a boxplot:

- A central box spans the first and third quartiles
- A line in the box marks the median
- Lines extend from the box out to the smallest and largest observations

---
# Boxplots

<img src="ch2_files/figure-html/rodman-box-1.svg" width="80%" style="display: block; margin: auto;" />

---
# Interquartile Range

The .hi.ruby[interquartile range], IQR, is the distance between the first and third quartiles

- IQR = `\(Q_3 - Q_1\)`
- The IQR measures the spread of the data and also helps identify outliers

Rule for outliers:

- An observation is an outlier if it falls more than `\(1.5 \times IQR\)` above the third quartile or below the first quartile

---
# Variability: Variance

.hi.purple[Variance]: denoted `\(s^2\)`, measures how "spread out" the data are on average

`$$s^2 = \frac{(x_1-\coral{\bar{x}})^2 + (x_2-\coral{\bar{x}})^2 + \dots + (x_n - \coral{\bar{x}})^2}{n-1},$$`

or more compactly

<img src="var_annotated.png" width="65%" style="display: block; margin: auto;" />

---
# Visualizing Variance

<img src="ch2_files/figure-html/multiple-vars-1.svg" width="90%" style="display: block; margin: auto;" />

---
# Example

<img src="giraffe_variance1.jpg" width="100%" style="display: block; margin: auto;" />

.footnote[Figure from [Teacups, Giraffes, & Statistics](https://tinystats.github.io/teacups-giraffes-and-statistics/04_variance.html)]

1. Calculate the mean height in the sample

---
# Example

<img src="giraffe_variance2.jpg" width="100%" style="display: block; margin: auto;" />

.footnote[Figure from [Teacups, Giraffes, & Statistics](https://tinystats.github.io/teacups-giraffes-and-statistics/04_variance.html)]

<ol start = "2"> <li>Calculate deviations from mean</li> <li>Square and sum</li> </ol>

---
# Variability: Standard Deviation

.hi.purple[Standard deviation]: measures how far observations typically are from the mean; it is the square root of the variance

`$$s=\sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2} = \sqrt{s^2}$$`

- `\(n-1\)` is referred to as the degrees of freedom
- `\(s\)` measures variability about the mean
- More variable `\(\implies\)` larger `\(s\)`
- `\(s\)` is always greater than or equal to zero, but usually `\(> 0\)`
- When would it be `\(=0\)`?
- `\(s\)` is not resistant to outliers.

---
# Practice Question

Calculate the standard deviation of age.
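<table>
<caption>Sample of individuals</caption>
<thead>
<tr> <th>Age</th> <th>Sex</th> <th>BMI</th> <th>Drinks per week</th> </tr>
</thead>
<tbody>
<tr> <td>59</td> <td>male</td> <td>32.26</td> <td>3 drinks</td> </tr>
<tr> <td>62</td> <td>male</td> <td>25.09</td> <td>2 drinks</td> </tr>
<tr> <td>60</td> <td>female</td> <td>32.58</td> <td>1 drink</td> </tr>
<tr> <td>18</td> <td>male</td> <td>99.99</td> <td>6 drinks</td> </tr>
<tr> <td>57</td> <td>female</td> <td>31.88</td> <td>2 drinks</td> </tr>
<tr> <td>56</td> <td>male</td> <td>42.80</td> <td>3 drinks</td> </tr>
</tbody>
</table>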
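
One way to check a hand calculation is a short Python sketch. Python is our choice here, not something the course prescribes, and the variable names are made up for illustration. It follows the same steps as the giraffe example (mean, deviations, square and sum, divide by `n - 1`, square root) using the Age column of the table above, then applies the `1.5 × IQR` rule with quartiles computed as medians of the lower and upper halves.

```python
import math
import statistics

ages = [59, 62, 60, 18, 57, 56]  # Age column from the table above
n = len(ages)

# Step 1: sample mean
x_bar = sum(ages) / n

# Step 2: deviations from the mean
deviations = [x - x_bar for x in ages]

# Step 3: square and sum, then divide by n - 1 to get the sample variance
s2 = sum(d ** 2 for d in deviations) / (n - 1)

# Step 4: the standard deviation is the square root of the variance
s = math.sqrt(s2)

print(x_bar, s2, round(s, 2))  # 52.0 282.0 16.79

# Quartiles as medians of the lower and upper halves of the sorted data
# (for odd n this slicing leaves the middle observation out of both halves)
ordered = sorted(ages)
half = n // 2
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[-half:])
iqr = q3 - q1

# 1.5 x IQR rule for outliers
outliers = [x for x in ages if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print(q1, q3, iqr, outliers)  # 56 60 4 [18]
```

The 18-year-old is flagged as an outlier and accounts for most of the sum of squared deviations, which illustrates why `\(s\)` is not resistant to outliers.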

---
# Summary of Summary Statistics

Two basic ways to summarize the center and spread of a distribution

- Mean and standard deviation (or variance)
- The five-number summary

.hi.slate[When to Use Which]

Use `\(\bar{x}\)` and `\(s\)` when the distribution is reasonably symmetric and free of outliers

Use the five-number summary if the distribution is skewed or has outliers

---
# Greek Letters and Statistics

.pull-left[
.hi.kelly[Latin Letters]

- Latin letters like `\(\bar{x}\)` and `\(s^2\)` are calculations that represent guesses (estimates) at the population values.
]

.pull-right[
.hi.purple[Greek Letters]

- Greek letters like `\(\mu\)` and `\(\sigma^2\)` represent the truth about the population.
]

The goal for the class is for the Latin letters to be good guesses for the Greek letters:

$$
\kelly{\text{Data}} \longrightarrow \kelly{\text{Calculation}} \longrightarrow \kelly{\text{Estimates}} \longrightarrow^{hopefully!} \purple{\text{Truth}}
$$

For example,

$$
\kelly{X} \longrightarrow \kelly{\frac{1}{n} \sum_{i=1}^n X_i} \longrightarrow \kelly{\bar{x}} \longrightarrow^{hopefully!} \purple{\mu}
$$
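
The Data → Calculation → Estimates chain can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the "population" is simply the integers 1 to 100, the sample size is 10, and the seed is arbitrary. In real problems the population is not observed, so `mu` and `sigma2` could not be computed this way.

```python
import random
import statistics

# Hypothetical population (assumption): the integers 1..100. It is written
# out only so that the "truth" can be compared with the estimates.
population = list(range(1, 101))
mu = statistics.mean(population)           # population mean (Greek letter)
sigma2 = statistics.pvariance(population)  # population variance, divides by N

# Data: a simple random sample of n = 10, seeded so the run is repeatable
rng = random.Random(3818)
sample = rng.sample(population, 10)

# Calculation -> Estimates (Latin letters)
x_bar = statistics.mean(sample)   # estimates mu
s2 = statistics.variance(sample)  # estimates sigma^2, divides by n - 1

print(f"mu = {mu}, x_bar = {x_bar}")
print(f"sigma^2 = {sigma2}, s^2 = {s2:.2f}")
```

Changing the seed gives a different sample and therefore different values of `x_bar` and `s2`, while `mu` and `sigma2` stay fixed. That is the sense in which the Latin letters are guesses at the Greek ones.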