class: center, middle, inverse, title-slide

# ECON 3818
## Chapter 4
### Kyle Butts
### 23 August 2021

---
class: clear, middle

<!-- Custom css -->
<style type="text/css"> /* ------------------------------------------------------- * * !! This file was generated by xaringanthemer !! * * Changes made to this file directly will be overwritten * if you used xaringanthemer in your xaringan slides Rmd * ------------------------------------------------------- */ @import url(https://fonts.googleapis.com/css?family=Roboto&display=swap); @import url(https://fonts.googleapis.com/css?family=Source+Code+Pro:400,700&display=swap); @import url(https://fonts.googleapis.com/css2?family=Atkinson+Hyperlegible&display=swap); :root { /* Fonts */ --text-font-family: 'Atkinson Hyperlegible'; --text-font-is-google: 1; --text-font-family-fallback: Roboto, -apple-system, BlinkMacSystemFont, avenir next, avenir, helvetica neue, helvetica, Ubuntu, roboto, noto, segoe ui, arial; --text-font-base: sans-serif; --header-font-family: 'Atkinson Hyperlegible'; --header-font-is-google: 1; --header-font-family-fallback: Georgia, serif; --code-font-family: 'Source Code Pro'; --code-font-is-google: 1; --base-font-size: 20px; --text-font-size: 1rem; --code-font-size: 0.9rem; --code-inline-font-size: 1em; --header-h1-font-size: 1.75rem; --header-h2-font-size: 1.6rem; --header-h3-font-size: 1.5rem; /* Colors */ --text-color: #131516; --text-color-light: #555F61; --header-color: #FFF; --background-color: #FFF; --link-color: #107895; --code-highlight-color: rgba(255,255,0,0.5); --inverse-text-color: #d6d6d6; --inverse-background-color: #272822; --inverse-header-color: #f3f3f3; --inverse-link-color: #107895; --title-slide-background-color: #272822; --title-slide-text-color: #d6d6d6; --header-background-color: #FFF; --header-background-text-color: #FFF; } html { font-size: var(--base-font-size); } body { font-family: 
var(--text-font-family), var(--text-font-family-fallback), var(--text-font-base); font-weight: normal; color: var(--text-color); } h1, h2, h3 { font-family: var(--header-font-family), var(--header-font-family-fallback); color: var(--text-color-light); } .remark-slide-content { background-color: var(--background-color); font-size: 1rem; padding: 24px 32px 16px 32px; width: 100%; height: 100%; } .remark-slide-content h1 { font-size: var(--header-h1-font-size); } .remark-slide-content h2 { font-size: var(--header-h2-font-size); } .remark-slide-content h3 { font-size: var(--header-h3-font-size); } .remark-code, .remark-inline-code { font-family: var(--code-font-family), Menlo, Consolas, Monaco, Liberation Mono, Lucida Console, monospace; } .remark-code { font-size: var(--code-font-size); } .remark-inline-code { font-size: var(--code-inline-font-size); color: #000; } .remark-slide-number { color: #107895; opacity: 1; font-size: 0.9em; } a, a > code { color: var(--link-color); text-decoration: none; } .footnote { position: absolute; bottom: 60px; padding-right: 6em; font-size: 0.9em; } .remark-code-line-highlighted { background-color: var(--code-highlight-color); } .inverse { background-color: var(--inverse-background-color); color: var(--inverse-text-color); } .inverse h1, .inverse h2, .inverse h3 { color: var(--inverse-header-color); } .inverse a, .inverse a > code { color: var(--inverse-link-color); } img, video, iframe { max-width: 100%; } blockquote { border-left: solid 5px lightgray; padding-left: 1em; } @page { margin: 0; } @media print { .remark-slide-scaler { width: 100% !important; height: 100% !important; transform: scale(1) !important; top: 0 !important; left: 0 !important; } } /* Modified metropolis */ .clear{ border-top: 0px solid #FAFAFA; } h1 { margin-top: -5px; margin-left: -00px; margin-bottom: 30px; color: var(--text-color-light); font-weight: 200; } h2, h3, h4 { padding-top: -15px; padding-bottom: 00px; color: #1A292C; text-shadow: none; font-weight: 
400; text-align: left; margin-left: 00px; margin-bottom: -10px; } .title-slide .inverse .remark-slide-content { background-color: #FAFAFA; } .title-slide { background-color: #FAFAFA; border-top: 80px solid #FAFAFA; } .title-slide h1 { color: var(--text-color); font-size: 40px; text-shadow: none; font-weight: 400; text-align: left; margin-left: 15px; } .title-slide h2 { margin-top: -15px; color: var(--link-color); text-shadow: none; font-weight: 300; font-size: 35px; text-align: left; margin-left: 15px; } .title-slide h3 { color: var(--text-color-light); text-shadow: none; font-weight: 300; font-size: 25px; text-align: left; margin-left: 15px; margin-bottom: 0px; } .title-slide h3:last-of-type { font-style: italic; font-size: 1rem; } /* Remove orange line */ hr, .title-slide h2::after, .mline h1::after { content: ''; display: block; border: none; background-color: #e5e5e5; color: #e5e5e5; height: 1px; } hr, .mline h1::after { margin: 1em 15px 0 15px; } .title-slide h2::after { margin: 10px 15px 35px 0; } .mline h1::after { margin: 10px 15px 0 15px; } /* turns off slide numbers for title page: https://github.com/gnab/remark/issues/298 */ .title-slide .remark-slide-number { display: none; } /* Custom CSS */ /* More line spacing */ body { line-height: 1.5; } /* Font styling */ .hi { font-weight: 600; } .mono { font-family: monospace; } .ul { text-decoration: underline; } .ol { text-decoration: overline; } .st { text-decoration: line-through; } .bf { font-weight: bold; } .it { font-style: italic; } /* Font Sizes */ .bigger { font-size: 125%; } .huge{ font-size: 150%; } .small { font-size: 95%; } .smaller { font-size: 85%; } .smallest { font-size: 75%; } .tiny { font-size: 50%; } /* Remark customization */ .clear .remark-slide-number { display: none; } .inverse .remark-slide-number { display: none; } .remark-code-line-highlighted { background-color: rgba(249, 39, 114, 0.5); } /* Xaringan tweeks */ .inverse { background-color: #23373B; text-shadow: 0 0 20px #333; /* 
text-shadow: none; */ } .title-slide { background-color: #ffffff; border-top: 80px solid #ffffff; } .footnote { bottom: 1em; font-size: 80%; color: #7f7f7f; } /* Lists */ li { margin-top: 4px; } /* Mono-spaced font, smaller */ .mono-small { font-family: monospace; font-size: 16px; } .mono-small .mjx-chtml { font-size: 103% !important; } .pseudocode, .pseudocode-small { font-family: monospace; background: #f8f8f8; border-radius: 3px; padding: 10px; padding-top: 0px; padding-bottom: 0px; } .pseudocode-small { font-size: 16px; } .remark-code { font-size: 68%; } .remark-inline-code { background: #F5F5F5; /* lighter */ /* background: #e7e8e2; /* darker */ border-radius: 3px; padding: 4px; } /* Super and Subscripts */ .super{ vertical-align: super; font-size: 70%; line-height: 1%; } .sub{ vertical-align: sub; font-size: 70%; line-height: 1%; } /* Subheader */ .subheader{ font-weight: 100; font-style: italic; display: block; margin-top: -25px; margin-bottom: 25px; } /* 2/3 left; 1/3 right */ .more-left { float: left; width: 63%; } .less-right { float: right; width: 31%; } .more-right ~ * { clear: both; } /* 9/10 left; 1/10 right */ .left90 { padding-top: 0.7em; float: left; width: 85%; } .right10 { padding-top: 0.7em; float: right; width: 9%; } /* 95% left; 5% right */ .left95 { padding-top: 0.7em; float: left; width: 91%; } .right05 { padding-top: 0.7em; float: right; width: 5%; } .left5 { padding-top: 0.7em; margin-left: 0em; margin-right: -0.4em; float: left; width: 7%; } .left10 { padding-top: 0.7em; margin-left: -0.2em; margin-right: -0.5em; float: left; width: 10%; } .left30 { padding-top: 0.7em; float: left; width: 30%; } .right30 { padding-top: 0.7em; float: right; width: 30%; } .thin-left { padding-top: 0.7em; margin-left: -1em; margin-right: -0.5em; float: left; width: 27.5%; } /* Example */ .ex { font-weight: 300; color: #555F61 !important; font-style: italic; } .col-left { float: left; width: 47%; margin-top: -1em; } .col-right { float: right; width: 47%; 
margin-top: -1em; } .clear-up { clear: both; margin-top: -1em; } /* Format tables */ table { color: #000000; font-size: 14pt; line-height: 100%; border-top: 1px solid #ffffff !important; border-bottom: 1px solid #ffffff !important; } th, td { background-color: #ffffff; } table th { font-weight: 400; } /* Attention */ .attn { font-weight: 500; color: #e64173 !important; font-family: 'Zilla Slab' !important; } /* Note */ .note { font-weight: 300; font-style: italic; color: #314f4f !important; /* color: #cccccc !important; */ font-family: 'Zilla Slab' !important; } /* Question and answer */ .qa { font-weight: 500; /* color: #314f4f !important; */ color: #e64173 !important; font-family: 'Zilla Slab' !important; } /* Figure Caption */ .caption { font-size: 0.8888889em; line-height: 1.5; margin-top: 1em; color: #6b7280; } </style> <!-- From xaringancolor --> <div style = "position:fixed; visibility: hidden"> $$ \require{color} \definecolor{purple}{rgb}{0.337254901960784, 0.00392156862745098, 0.643137254901961} \definecolor{navy}{rgb}{0.0509803921568627, 0.23921568627451, 0.337254901960784} \definecolor{ruby}{rgb}{0.603921568627451, 0.145098039215686, 0.0823529411764706} \definecolor{alice}{rgb}{0.0627450980392157, 0.470588235294118, 0.584313725490196} \definecolor{daisy}{rgb}{0.92156862745098, 0.788235294117647, 0.266666666666667} \definecolor{coral}{rgb}{0.949019607843137, 0.427450980392157, 0.129411764705882} \definecolor{kelly}{rgb}{0.509803921568627, 0.576470588235294, 0.337254901960784} \definecolor{jet}{rgb}{0.0745098039215686, 0.0823529411764706, 0.0862745098039216} \definecolor{asher}{rgb}{0.333333333333333, 0.372549019607843, 0.380392156862745} \definecolor{slate}{rgb}{0.192156862745098, 0.309803921568627, 0.309803921568627} \definecolor{cranberry}{rgb}{0.901960784313726, 0.254901960784314, 0.450980392156863} $$ </div> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { Macros: { purple: ["{\\color{purple}{#1}}", 1], navy: ["{\\color{navy}{#1}}", 
1], ruby: ["{\\color{ruby}{#1}}", 1], alice: ["{\\color{alice}{#1}}", 1], daisy: ["{\\color{daisy}{#1}}", 1], coral: ["{\\color{coral}{#1}}", 1], kelly: ["{\\color{kelly}{#1}}", 1], jet: ["{\\color{jet}{#1}}", 1], asher: ["{\\color{asher}{#1}}", 1], slate: ["{\\color{slate}{#1}}", 1], cranberry: ["{\\color{cranberry}{#1}}", 1] }, loader: {load: ['[tex]/color']}, tex: {packages: {'[+]': ['color']}} } }); </script> <style> .purple {color: #5601A4;} .navy {color: #0D3D56;} .ruby {color: #9A2515;} .alice {color: #107895;} .daisy {color: #EBC944;} .coral {color: #F26D21;} .kelly {color: #829356;} .jet {color: #131516;} .asher {color: #555F61;} .slate {color: #314F4F;} .cranberry {color: #E64173;} </style>

## Chapter 4: Correlation

---
# Multiple Variables

Almost everything we've done so far has been .it[univariate] statistics, but often we're interested in how multiple random variables are related.

- How does education affect earnings?
- How does race affect earnings?
- How does experience affect earnings?

Many events are .it[dependent] on other random variables. In this chapter we'll formalize this concept.

---
# Probability Theory

Recall that with single random variables we characterized probabilities with

- PMF (probability mass function), `\(P(X=x)\)`, in the discrete case
- PDF (probability density function), `\(f(x)\)`, in the continuous case

--

When we have multiple random variables we use the .hi.purple[joint distribution]

- `\(P(X=x, Y=y)\)` in the discrete case
- `\(f(x,y)\)` in the continuous case

The joint distribution describes the probability of obtaining `\(X=x\)` *and* `\(Y=y\)`.

- .ex[Example:] If `\(X\)` is education and `\(Y\)` is income, you could ask: what is the probability that a person drawn from the population earns $65,000/year *and* has a master's degree?

---
# Properties for Joint Distribution

As shorthand, we write `\(P(x,y) = P(X=x, Y=y)\)`

In this class we'll focus solely on the discrete case

- `\(0 \leq P(x,y) \leq 1\)`
- `\(\sum_x \sum_y P(x,y)=1\)`

If X and Y are .hi.purple[not independent], then for at least one pair `\((x,y)\)`

`$$P(x,y) \neq P(x)P(y)$$`

---
# Example

Suppose that `\(X\)` is the number of girls born out of three kids and `\(Y\)` indicates whether the first child is a girl (1 if yes, 0 if no).

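<table>
<caption>Sample Space</caption>
<thead>
<tr><th>Outcome</th><th>\(X\)</th><th>\(Y\)</th></tr>
</thead>
<tbody>
<tr><td>BBB</td><td>0</td><td>0</td></tr>
<tr><td>GBB</td><td>1</td><td>1</td></tr>
<tr><td>BGB</td><td>1</td><td>0</td></tr>
<tr><td>BBG</td><td>1</td><td>0</td></tr>
<tr><td>GGB</td><td>2</td><td>1</td></tr>
<tr><td>GBG</td><td>2</td><td>1</td></tr>
<tr><td>BGG</td><td>2</td><td>0</td></tr>
<tr><td>GGG</td><td>3</td><td>1</td></tr>
</tbody>
</table>
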
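
As a rough sketch of how the sample-space table could be generated, the Python snippet below enumerates the eight birth orders and computes `\(X\)` and `\(Y\)` for each. The helper names `girls` and `first_is_girl` are illustrative choices, not part of the course materials.

```python
from itertools import product

# Every ordering of three children, each child being a boy (B) or girl (G).
outcomes = ["".join(kids) for kids in product("BG", repeat=3)]


def girls(outcome):
    """X: number of girls among the three children."""
    return outcome.count("G")


def first_is_girl(outcome):
    """Y: 1 if the first child is a girl, 0 otherwise."""
    return 1 if outcome[0] == "G" else 0


# One (outcome, X, Y) row per element of the sample space.
rows = [(o, girls(o), first_is_girl(o)) for o in outcomes]

for outcome, x, y in rows:
    print(outcome, x, y)
```

The rows come out in a different order than the table, but they contain the same eight outcomes with the same values of `\(X\)` and `\(Y\)`.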
---
# Example

Notice that the sample spaces are `\(S_X =\{0,1,2,3\}\)` and `\(S_Y=\{0,1\}\)`.

The associated joint probabilities are:

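<table>
<thead>
<tr><th>Joint PMF</th><th>\( Y = 0 \)</th><th>\( Y = 1 \)</th></tr>
</thead>
<tbody>
<tr><td>\(X = 0 \)</td><td>0.125</td><td>0.000</td></tr>
<tr><td>\(X = 1 \)</td><td>0.250</td><td>0.125</td></tr>
<tr><td>\(X = 2 \)</td><td>0.125</td><td>0.250</td></tr>
<tr><td>\(X = 3 \)</td><td>0.000</td><td>0.125</td></tr>
</tbody>
</table>
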
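
The joint probabilities can also be obtained by counting outcomes. The sketch below assumes all eight birth orders are equally likely (which is what the 1/8 entries imply) and uses exact fractions; the names `joint` and `p` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 birth orders, assumed equally likely.
outcomes = list(product("BG", repeat=3))

# Count how many outcomes map to each (x, y) pair:
# x = number of girls, y = 1 if the first child is a girl.
counts = Counter((o.count("G"), int(o[0] == "G")) for o in outcomes)

# Joint PMF as exact fractions, keyed by (x, y).
joint = {xy: Fraction(n, len(outcomes)) for xy, n in counts.items()}


def p(x, y):
    """P(X=x, Y=y); pairs that never occur have probability 0."""
    return joint.get((x, y), Fraction(0))


# Property 1: every joint probability lies between 0 and 1.
assert all(0 <= p(x, y) <= 1 for x in range(4) for y in range(2))

# Property 2: the joint probabilities sum to 1.
total = sum(p(x, y) for x in range(4) for y in range(2))
print(total)
```

Using `Fraction` rather than floats keeps the sum exactly equal to 1, so the check does not depend on rounding.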
---
# Example

Let's check that this table satisfies the definition of a joint distribution

- `\(0 \leq P(x,y) \leq 1 \checkmark\)`

--

- `\(\sum_x \sum_y P(x,y)=1\)`

`\begin{align}
\sum_{x \in S_X} \sum_{y \in S_Y} P(x,y) &= P(0,0) + P(0,1) + P(1,0) + P(1,1) \\\\
&+ P(2,0) + P(2,1) + P(3,0) + P(3,1) \\\\
&= 1/8 + 0 + 2/8 + 1/8 \\\\
&+ 1/8 + 2/8 + 0 + 1/8 = 1
\end{align}`

---
# Clicker Question

Given the following joint probability mass function, what is the probability that the NASDAQ increases in value and your portfolio loses value?

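<table>
<thead>
<tr><th>NASDAQ</th><th>Portfolio Increases</th><th>Portfolio Decreases</th></tr>
</thead>
<tbody>
<tr><td>NASDAQ Increases</td><td>0.40</td><td>0.05</td></tr>
<tr><td>NASDAQ Decreases</td><td>0.15</td><td>0.40</td></tr>
</tbody>
</table>
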
---
# Clicker Question

Given the following joint probability mass function, what is the probability that the NASDAQ increases in value?

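<table>
<thead>
<tr><th>NASDAQ</th><th>Portfolio Increases</th><th>Portfolio Decreases</th></tr>
</thead>
<tbody>
<tr><td>NASDAQ Increases</td><td>0.40</td><td>0.05</td></tr>
<tr><td>NASDAQ Decreases</td><td>0.15</td><td>0.40</td></tr>
</tbody>
</table>
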
<ol type = "a">
<li>0.40</li>
<li>0.05</li>
<li>0.15</li>
<li>0.45</li>
</ol>

---
# Clicker Question

Given the following joint probability mass function, what is the probability that the NASDAQ increases in value, conditional on the portfolio's value decreasing?

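<table>
<thead>
<tr><th>NASDAQ</th><th>Portfolio Increases</th><th>Portfolio Decreases</th></tr>
</thead>
<tbody>
<tr><td>NASDAQ Increases</td><td>0.40</td><td>0.05</td></tr>
<tr><td>NASDAQ Decreases</td><td>0.15</td><td>0.40</td></tr>
</tbody>
</table>
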
<ol type = "a">
<li>0.111</li>
<li>0.889</li>
<li>0.05</li>
<li>0.40</li>
</ol>

---
# Visualizing a Joint Distribution

The most useful tool for displaying the relationship between two .ul[quantitative] variables is a .hi.kelly[scatterplot]

- Shows the relationship between two quantitative variables
- Each axis represents a variable
- Each observation appears as a point, fixed by the values of both variables

---
# Scatterplot Example

<img src="data:image/png;base64,#midtermscatter.png" width="90%" style="display: block; margin: auto;" />

---
# Interpreting a Scatterplot

Look for patterns, and for deviations from those patterns

- Direction, form, strength of relationship
- Any outliers?

Describing the association

- .hi.kelly[Positive Association]: above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together
- .hi.ruby[Negative Association]: above-average values of one tend to accompany below-average values of the other, and vice versa

In general, if one variable is explanatory (influences change) and one is a response variable (outcome), then the explanatory variable is plotted on the x-axis

---
# Correlation

We need to supplement the graph with a numerical measure; generally we use .hi.alice[correlation].

The .alice[correlation] measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as `\(r\)`

---
# Covariance

In order to understand correlation, we must first discuss .hi.purple[covariance]

Recall: `\(V(aX+bY)=a^2V(X)+b^2V(Y)+2ab\cdot cov(X,Y)\)`

Covariance measures the joint variability of two random variables

- Sign of covariance gives the direction of the relationship
- Magnitude of covariance is hard to interpret.sup[*]
- The .alice[correlation coefficient] will not have this problem
- Covariance equals zero whenever X and Y are .hi.purple[independent]

.footnote[.sup[*] Double the units, double the covariance!!]

---
# Covariance

We use the following formula to calculate covariance

`$$cov(X,Y)=E(XY)-E(X)E(Y)$$`

Note: in general `\(E(XY) \neq E(X)E(Y)\)`. If X and Y are independent, then `\(E(XY) = E(X)E(Y)\)`, so `\(cov(X,Y)=0\)`

The magnitude of covariance depends on the units of X and Y

- This means `\(cov(A,B) > cov(C,D)\)` .hi[does not] imply that A and B have a stronger relationship than C and D
- In order to compare relationships we must find a way to normalize their covariances

---
# Correlation

The .hi.alice[correlation] measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as `\(r\)`

To calculate correlation, we normalize the covariance as follows:

$$ r=\frac{cov(X,Y)}{\sqrt{V(X)}\cdot \sqrt{V(Y)}} $$

---
# Correlation
.subheader.alice[Notes on correlation]

Values are always between -1 and 1

- `\(1 \implies\)` perfectly linear positive relationship (the points fall exactly on an upward-sloping line)
- `\(-1 \implies\)` perfectly linear negative relationship (the points fall exactly on a downward-sloping line)
- Correlations are unit-less

--

.it.purple[Doesn't imply a causal relationship]

--

Drawbacks of correlation

- Only measures .it[linear relationships] (we will see what this means)
- A correlation of zero doesn't necessarily mean the variables are independent
- Not resistant to outliers

---
# Correlations - Example

<img src="data:image/png;base64,#ch4_files/figure-html/unnamed-chunk-2-1.svg" width="90%" style="display: block; margin: auto;" />

---
# Correlations - Example

<img src="data:image/png;base64,#ch4_files/figure-html/unnamed-chunk-3-1.svg" width="90%" style="display: block; margin: auto;" />

---
# Correlations - Visualized

<img src="data:image/png;base64,#correlations.jpg" width="90%" style="display: block; margin: auto;" />

---
# Why Correlation isn't Perfect

<img src="data:image/png;base64,#corr.png" width="90%" style="display: block; margin: auto;" />

Bottom row is an example of non-linear relationships

<!-- 
--- class: center, middle  -->

---
# Covariance and Independence

Since covariance (and correlation) only measures linear relationships:

`$$cov(X,Y) = 0 \not\implies X \text{ and } Y \text{ are independent}$$`

--

<br/>

However, since `\(E(XY)=E(X)E(Y)\)` when X and Y are independent:

`$$X \text{ and } Y \text{ are independent} \implies cov(X,Y)=0$$`

---
# Joint Distributions

When calculating the covariance we use the equation `\(cov(X,Y) = E(XY)-E(X)E(Y)\)`

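<table>
<thead>
<tr><th></th><th>\(Y = 0\)</th><th>\(Y = 1\)</th></tr>
</thead>
<tbody>
<tr><td>\(X = 0\)</td><td>0.125</td><td>0.000</td></tr>
<tr><td>\(X = 1\)</td><td>0.250</td><td>0.125</td></tr>
<tr><td>\(X = 2\)</td><td>0.125</td><td>0.250</td></tr>
<tr><td>\(X = 3\)</td><td>0.000</td><td>0.125</td></tr>
</tbody>
</table>
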
`$$E(XY)=\sum_x \sum_y x\cdot y \cdot P(x,y)$$`

In this example:

`$$E(XY)=(0\cdot 0 \cdot 1/8) + (0\cdot 1\cdot 0) + (1\cdot 0 \cdot 2/8) + (1\cdot 1 \cdot 1/8) +$$`

`$$(2\cdot 0 \cdot 1/8) + (2\cdot 1 \cdot 2/8) + (3\cdot 0 \cdot 0) + (3\cdot 1 \cdot 1/8)=1$$`

---
# Marginal Probabilities

In order to calculate `\(E(X)\)` and `\(E(Y)\)` from a joint distribution we must first calculate the .hi.purple[marginal probabilities] of both X and Y.

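<table>
<thead>
<tr><th></th><th>\(Y = 0\)</th><th>\(Y = 1\)</th><th>\( Pr(X) \)</th></tr>
</thead>
<tbody>
<tr><td>\(X = 0\)</td><td>1/8</td><td>0</td><td>1/8</td></tr>
<tr><td>\(X = 1\)</td><td>2/8</td><td>1/8</td><td>3/8</td></tr>
<tr><td>\(X = 2\)</td><td>1/8</td><td>2/8</td><td>3/8</td></tr>
<tr><td>\(X = 3\)</td><td>0</td><td>1/8</td><td>1/8</td></tr>
<tr><td>\(Pr(Y)\)</td><td>4/8</td><td>4/8</td><td>1</td></tr>
</tbody>
</table>
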
These marginal probabilities, `\(P(X=x)\)`, are calculated by adding up the joint probabilities across every outcome where `\(X=x\)`

---
# Marginal Probabilities

We can use these marginal probabilities to calculate `\(E(X)\)` and `\(E(Y)\)`.

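<table>
<thead>
<tr><th></th><th>\(Y = 0\)</th><th>\(Y = 1\)</th><th>\( Pr(X) \)</th></tr>
</thead>
<tbody>
<tr><td>\(X = 0\)</td><td>1/8</td><td>0</td><td>1/8</td></tr>
<tr><td>\(X = 1\)</td><td>2/8</td><td>1/8</td><td>3/8</td></tr>
<tr><td>\(X = 2\)</td><td>1/8</td><td>2/8</td><td>3/8</td></tr>
<tr><td>\(X = 3\)</td><td>0</td><td>1/8</td><td>1/8</td></tr>
<tr><td>\(Pr(Y)\)</td><td>4/8</td><td>4/8</td><td>1</td></tr>
</tbody>
</table>
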
`\(E(X)=(0 \cdot 1/8) + (1 \cdot 3/8) + (2 \cdot 3/8) + (3 \cdot 1/8) =1.5\)`

`\(E(Y)=(0 \cdot 4/8) + (1 \cdot 4/8) =0.5\)`

---
# Covariance of Joint Distribution

All of that work leads us here:

`$$E(XY)=1$$`

`$$E(X)= 1.5$$`

`$$E(Y)= 0.5$$`

`$$cov(X,Y)=E(XY)-E(X)E(Y) =1 - (1.5 \cdot 0.5) = 0.25$$`

--

What does `\(0.25\)` mean? Is this a strong relationship? A weak one?

---
# Covariance to Correlation

Again, we often use correlation instead of covariance because correlation .hi[does not depend on the units]

To find correlation from covariance we use the following equation:

`$$r = \frac{cov(X,Y)}{\sqrt{V(X)\cdot V(Y)}}$$`

So we need to calculate the variance of X and Y, using information about the joint probabilities

---
# Covariance to Correlation

Recall the joint and marginal probabilities we gathered from the table

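<table>
<thead>
<tr><th></th><th>\(Y = 0\)</th><th>\(Y = 1\)</th><th>\( Pr(X) \)</th></tr>
</thead>
<tbody>
<tr><td>\(X = 0\)</td><td>1/8</td><td>0</td><td>1/8</td></tr>
<tr><td>\(X = 1\)</td><td>2/8</td><td>1/8</td><td>3/8</td></tr>
<tr><td>\(X = 2\)</td><td>1/8</td><td>2/8</td><td>3/8</td></tr>
<tr><td>\(X = 3\)</td><td>0</td><td>1/8</td><td>1/8</td></tr>
<tr><td>\(Pr(Y)\)</td><td>4/8</td><td>4/8</td><td>1</td></tr>
</tbody>
</table>
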
$$ E(X^2) = (0^2 \cdot 1/8) + (1^2 \cdot 3/8) + (2^2 \cdot 3/8) + (3^2 \cdot 1/8) = 3$$

$$ E(Y^2)=(0^2 \cdot 4/8) + (1^2 \cdot 4/8) = 0.5$$

---
# Covariance to Correlation

Using `\(V(X)=E(X^2)-E(X)^2\)`:

`\(E(X)=1.5\)` and `\(E(X^2)=3 \rightarrow V(X)=3-1.5^2= 0.75\)`

`\(E(Y)=0.5\)` and `\(E(Y^2)=0.5 \rightarrow V(Y)=0.5-0.5^2 = 0.25\)`

`\(cov(X,Y)=0.25\)`

`$$r = \frac{cov(X,Y)}{\sqrt{V(X)}\cdot \sqrt{V(Y)}} =\frac{0.25}{\sqrt{0.75}\cdot\sqrt{0.25}} \approx 0.577$$`

---
# Clicker Question

What can be said of the correlation between the brand of an automobile and its quality?

<ol type = "a">
<li>The correlation is negative, because smaller cars tend to have higher quality and larger cars tend to have lower quality.</li>
<li>The correlation is positive, because better brands have higher quality.</li>
<li>If the correlation is negative, an arithmetic mistake was made; correlation must be positive.</li>
<li>Correlation makes no sense here, because brand is a categorical variable.</li>
</ol>

---
# Clicker Question

Which of the following statements is false?

<ol type = "a">
<li>Older men tend to have lower muscle density, so the correlation between age and muscle density in older men must be negative.</li>
<li>Older children tend to be taller than younger children, so the correlation between age and height in children must be positive.</li>
<li>A researcher finds that the correlation between two variables is close to 0, so the two variables must be unrelated.</li>
<li>Taller people tend to be heavier than shorter people, so the correlation between height and weight must be positive.</li>
</ol>
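
---
# Checking the Worked Example in Code

The covariance and correlation calculations from this chapter can be cross-checked with a short script. The sketch below is one possible way to do it, not the course's official method: the function name `summarize` and the second distribution (`parabola`, with `\(X\)` uniform on `\(\{-1,0,1\}\)` and `\(Y=X^2\)`) are illustrative additions that do not appear in the slides. The second distribution shows a case where the covariance is zero even though the variables are not independent.

```python
from fractions import Fraction
from math import sqrt


def summarize(joint):
    """Return (cov, corr) for a joint PMF given as {(x, y): probability}."""
    e_x = sum(x * p for (x, y), p in joint.items())
    e_y = sum(y * p for (x, y), p in joint.items())
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    # V(X) = E(X^2) - E(X)^2, and likewise for Y.
    v_x = sum(x**2 * p for (x, y), p in joint.items()) - e_x**2
    v_y = sum(y**2 * p for (x, y), p in joint.items()) - e_y**2
    cov = e_xy - e_x * e_y
    corr = float(cov) / sqrt(float(v_x) * float(v_y))
    return cov, corr


# Joint PMF from the three-children example (zero-probability cells omitted).
girls = {
    (0, 0): Fraction(1, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(1, 8),
    (2, 0): Fraction(1, 8), (2, 1): Fraction(2, 8),
    (3, 1): Fraction(1, 8),
}
cov_girls, r_girls = summarize(girls)
print(cov_girls, round(r_girls, 3))

# Hypothetical example (not from the slides): X uniform on {-1, 0, 1}, Y = X^2.
parabola = {
    (-1, 1): Fraction(1, 3),
    (0, 0): Fraction(1, 3),
    (1, 1): Fraction(1, 3),
}
cov_par, r_par = summarize(parabola)

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0); compare the two.
p_x0 = sum(p for (x, y), p in parabola.items() if x == 0)
p_y0 = sum(p for (x, y), p in parabola.items() if y == 0)
p_joint_00 = parabola[(0, 0)]
print(cov_par, p_joint_00, p_x0 * p_y0)
```

For the three-children table this reproduces `\(cov(X,Y)=1/4\)` and `\(r \approx 0.577\)`. For the second distribution the covariance is 0, yet `\(P(X=0,Y=0)=1/3\)` differs from `\(P(X=0)P(Y=0)=1/9\)`, so the variables are dependent.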