class: center, middle, inverse, title-slide # Statistics Review I ## EC 320: Introduction to Econometrics ### Winter 2022 --- class: inverse, middle # Prologue --- # Housekeeping .small[ .hi-pink[Office hours] - Kyu's: T 1500-1600 & R 1400-1500 - Mine : MW 1400-1500 .hi-pink[Lab today] at 4 p.m. MCK 442 .hi-pink[Exercise 1] this Friday by 11:59 p.m. - Just this once. You'll need to submit other exercises normally by Wednesday and not Friday - No need to worry. The assigned exercises shouldn't take longer than the lab time. If you attend the lab, you'll be able to complete the exercise on the spot. - **[SUBMIT FORMAT]** Please have your work knitted in **html** format. .hi-pink[Problem Set 1] will be posted by the end of this week, which will be due next Friday 11:59 p.m. .hi-pink[Issues with .mono[R]?] - After class - I have office hours today after class (14:00-15:00). ] --- # Motivation The focus of our course is __regression analysis__, a useful toolkit for learning from data. -- To understand regression, its mechanics, and its pitfalls, __we need to understand the underlying statistical theory.__ - Insights from theory can help us become better practitioners and savvier consumers of science. -- Today, we will review important concepts you learned in Math 243. - Maybe some you missed, too. --- class: inverse, middle # A Brief Math Review --- # Notation __Data__ on a variable `\(X\)` __are__<sup>*</sup> a sequence of `\(n\)` observations, indexed by `\(i\)`: `$$\{x_i: 1, \dots, n \}.$$` -- .pull-left[
] .pull-right[ - `\(i\)` indicates the row number. - `\(n\)` is the number of rows. - `\(x_i\)` is the value of `\(X\)` for row `\(i\)`. ] .footnote[ <sup>*</sup> _Data_ .mono[=] __plural__ of _datum_. ] --- # Summation The __summation operator__ adds a sequence of numbers over an index: `$$\sum_{i=1}^{n} x_i \equiv x_1 + x_2 + \dots + x_n.$$` - "The sum of `\(x_i\)` from 1 to `\(n\)`." -- .pull-left[
] .pull-right[ $$ `\begin{aligned} \sum_{i=1}^{4} x_i &= 7 + 4 + 10 + 2 \\ &= 23 \end{aligned}` $$ ] --- # Summation ## Rule 1 For any constant `\(c\)`, `$$\sum_{i=1}^{n} c = nc.$$` -- .pull-left[
] .pull-right[ $$ `\begin{aligned} \sum_{i=1}^{4} 2 &= 4 \times 2 \\ &= 8 \end{aligned}` $$ ] --- # Summation ## Rule 2 For any constant `\(c\)`, `$$\sum_{i=1}^{n} cx_i = c \sum_{i=1}^{n} x_i.$$` -- .pull-left[
] .pull-right[ $$ `\begin{aligned} \sum_{i=1}^{3} 2x_i &= 2\times7 + 2\times4 + 2 \times10\\ &= 14 + 8 + 20\\ &= 42 \end{aligned}` $$ $$ `\begin{aligned} 2 \sum_{i=1}^{3} x_i &= 2(7 + 4 + 10) \\ &= 42 \end{aligned}` $$ ] --- # Summation ## Rule 3 If `\(\{(x_i, y_i): 1, \dots, n \}\)` is a set of `\(n\)` pairs, and `\(a\)` and `\(b\)` are constants, then `$$\sum_{i=1}^{n} (ax_i + by_i) = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i.$$` -- .pull-left[
] .pull-right[ $$ `\begin{aligned} \sum_{i=1}^{2} (2x_i + y_i) &= 18 + 10 \\ &= 28 \end{aligned}` $$ $$ `\begin{aligned} 2 \sum_{i=1}^{2} x_i + \sum_{i=1}^{2} y_i &= 2 \times 11 + 6 \\ &= 28 \end{aligned}` $$ ] --- # Summation ## __Caution__ The .hi-purple[sum of the ratios] __is not__ the .hi-green[ratio of the sums]: `$$\color{#9370DB}{\sum_{i=1}^{n} x_i / y_i} \neq \color{#007935}{\left(\sum_{i=1}^{n} x_i \right) \Bigg/ \left(\sum_{i=1}^{n} y_i \right)}.$$` - If `\(n = 2\)`, then `\(\frac{x_1}{y_1} + \frac{x_2}{y_2} \neq \frac{x_1 + x_2}{y_1 + y_2}\)`. -- The .hi-purple[sum of squares] __is not__ the .hi-green[square of the sums]: `$$\color{#9370DB}{\sum_{i=1}^{n} x_i^2} \neq \color{#007935}{\left(\sum_{i=1}^{n} x_i \right)^2}.$$` - If `\(n = 2\)`, then `\(x_1^2 + x_2^2 \neq (x_1 + x_2)^2 = x_1^2 + 2x_1x_2 + x_2^2\)`. --- class: inverse, middle # Probability Review --- # Random Variables __Experiment:__ Any procedure that is _infinitely repeatable_ and has a _well-defined set of outcomes_. __Sample space:__ The set of all possible outcomes an experiment could generate __Event:__ A subset of the .purple[sample space] or a combination of outcomes __Random Variable:__ A variable with _numerical values determined by an experiment or a random phenomenon_. This could also be considered as a function that maps elements of the sample space to another set (often set of the real numbers). --- # Random Variables .pull-left[ Example 1: __Experiment:__ Roll two six-sided dice __Sample space:__ {11, 12, ... , 21, 22, ... , 66} __Event:__ The sum of two top number is equal to 10, this could be combination of outcomes {46, 55, 64} __Random variable:__ The sum of two top number {2, 3, ..., 12} ] .pull-right[ Example 2: __Experiment:__ Tossing a coin two times __Sample space:__ {HH, HT, TH, TT} __Event:__ The fist toss results in a Heads, {HH, HT} __Random variable:__ The number of heads, {0, 1, 2} ] --- # Random Variables __Notation:__ capital letters for random variables (_e.g._, `\(X\)`, `\(Y\)`, or `\(Z\)`) and lowercase letters for particular outcomes (_e.g._, `\(x\)`, `\(y\)`, or `\(z\)`). -- __Example 1:__ Flipping a coin. - Two outcomes: heads or tails. - Quantify the outcomes: Define a random variable `\(\text{Heads}\)` such that `\(\text{Heads}=1\)` if heads and `\(\text{Heads}=0\)` if tails. -- __Example 2:__ Flipping a coin 10 times. - Several outcomes: 10 heads and 0 tails, 9 heads and 1 tails, 8 heads and 2 tails, _etc_. - The number of heads is a random variable: `\(\{\text{Heads}: 0,1,2,3,4,5,6,7,8,9,10\}.\)` --- # Discrete Random Variables __Discrete Random Variable:__ A random variable that takes a countable set of values. -- A __Bernoulli__ (or binary) random variable takes values of either 1 or 0. - Characterized by `\(\mathop{\mathbb{P}}(X=1)\)`, "the probability of success." - Probabilities sum to 1: `\(\mathop{\mathbb{P}}(X=1) + \mathop{\mathbb{P}}(X=0) = 1\)`. - For a "fair" coin, `\(\mathop{\mathbb{P}}(\text{Heads}=1)=\frac{1}{2} \implies \mathop{\mathbb{P}}(\text{Heads}=0)=\frac{1}{2}\)`. -- - More generally, if `\(\mathop{\mathbb{P}}(X=1) = \theta\)` for some `\(\theta \in [0,1]\)`, then `\(\mathop{\mathbb{P}}(X=0) = 1 - \theta\)`. - If the probability of passing this class is 75%, then the probability of not passing is 25%. --- # Discrete Random Variables ## Probabilities We describe a discrete random variable by listing its possible values with associated probabilities. If `\(X\)` takes on `\(k\)` possible values `\(\{x_1, \dots, x_k\}\)`, then the probabilities `\(p_1, p_2, \dots, p_k\)` are defined by `$$p_j = \mathop{\mathbb{P}}(X=x_j), \quad j = 1,2, \dots, k,$$` where `$$p_j \in [0,1]$$` and `$$p_1 + p_2 + \dots + p_k = 1.$$` --- # Discrete Random Variables ## Probability density function The __probability density function__ (__pdf__) of `\(X\)` summarizes possible outcomes and associated probabilities: `$$f(x_j)=p_j, \quad j=1,2,\dots,k.$$` -- ## Example 2020 Presidential election: 538 electoral votes at stake. - `\(\{X:0,1, \dots, 538\}\)` is the number of electoral votes won by the Democratic candidate. - Extremely unlikely that she will win 0 votes or all 538 votes: `\(f(0) \approx 0\)` and `\(f(538) \approx 0\)`. - Nonzero probability of winning an exact majority: `\(f(270) > 0\)`. --- # Discrete Random Variables ## Example Basketball player goes to the foul line to shoot two free throws. - `\(X\)` is the number of shots made (either 0, 1, or 2). - Suppose the pdf of `\(X\)` is `\(f(0)= 0.3\)`, `\(f(1) = 0.4\)`, `\(f(2) = 0.3\)`. - __Note:__ the probabilities sum to 1. Use the pdf to calculate the probability of the .hi-green[event] that the player makes _at least one shot_, _i.e._, `\(\mathop{\mathbb{P}}(X \geq 1)\)`. -- - `\(\mathop{\mathbb{P}}(X \geq 1) = \mathop{\mathbb{P}}(X=1) + \mathop{\mathbb{P}}(X=2)= 0.4 + 0.3 = 0.7\)`. --- # Continuous Random Variables __Continuous Random Variable:__ A random variable that takes any real value with _zero_ probability (that takes an infinite number of possible values). A continuous random variable is not defined at specific values but over an interval of values. - .hi-purple[Wait, what?!] The variable takes so many values that we can't count all possibilities, so the probability of any one particular value is zero. -- Measurement is discrete (_e.g._, dollars and cents), but variables with many possible values are best treated as continuous. - _e.g._, height, wages, temperature, _etc._ --- # Continuous Random Variables Probability density functions also describe continuous random variables. - Difference: Interested in the probability of events within a _range_ of values. - _e.g._ What is the probability of more than 1 inch of rain tomorrow? --- # Continuous Random Variables ## Uniform Distribution The probability density function of a variable uniformly distributed between 0 and 2 is $$ f(x) = `\begin{cases} \frac{1}{2} & \text{if } 0 \leq x \leq 2 \\ 0 & \text{if } x < 0 \text{ or } x>2 \end{cases}` $$ <img src="02-Statistics_Review_files/figure-html/unif-1.svg" style="display: block; margin: auto;" /> --- # Continuous Random Variables ## Uniform Distribution By definition, the area under `\(f(x)\)` is equal to 1. The .hi[shaded area] illustrates the probability of the event `\(1 \leq X \leq 1.5\)`. - `\(\mathop{\mathbb{P}}(1 \leq X \leq 1.5) = (1.5-1) \times0.5 = 0.25\)`. <img src="02-Statistics_Review_files/figure-html/unif2-1.svg" style="display: block; margin: auto;" /> --- # Continuous Random Variables ## Normal Distribution .hi-purple[The "bell curve."] - Symmetric: mean and median occur at the same point (_i.e._, no skew). - Low-probability events in tails; high-probability events near center. <img src="02-Statistics_Review_files/figure-html/norm-1.svg" style="display: block; margin: auto;" /> --- # Continuous Random Variables ## Normal Distribution The .hi[shaded area] illustrates the probability of the event `\(-2 \leq X \leq 2\)`. - "Find area under curve" .mono[=] use integral calculus (or, in practice, .mono[R]). - `\(\mathop{\mathbb{P}}(-2 \leq X \leq 2) \approx 0.95\)`. <img src="02-Statistics_Review_files/figure-html/norm2-1.svg" style="display: block; margin: auto;" /> --- # Expected Value A density function describes an entire distribution, but sometimes we just want a summary. -- The __expected value__ describes the _central tendency_ of distribution in a single number. - _Central tendency_ .mono[=] typical value. --- # Expected Value ## Definition (Discrete) The expected value of a discrete random variable `\(X\)` is the weighted average of its `\(k\)` values `\(\{x_1, \dots, x_k\}\)` and their associated probabilities: $$ `\begin{aligned} \mathop{\mathbb{E}}(X) &= x_1 \mathop{\mathbb{P}}(x_1) + x_2 \mathop{\mathbb{P}}(x_2) + \dots +x_k \mathop{\mathbb{P}}(x_k) \\ &= \sum_{j=1}^{k} x_j\mathop{\mathbb{P}}(x_j). \end{aligned}` $$ -- - Also known as the .hi[population mean]. --- # Expected Value ## Example Rolling a six-sided die once can take values `\(\{1, 2, 3, 4, 5, 6\}\)`, each with equal probability. .hi-purple[What is the expected value of a roll?] -- `\(\mathop{\mathbb{E}}(\text{Roll}) = 1 \times \frac{1}{6} + 2 \times \frac{1}{6} + 3 \times \frac{1}{6} + 4 \times \frac{1}{6} + 5 \times \frac{1}{6} + 6 \times \frac{1}{6} = \color{#9370DB}{3.5}\)`. - __Note:__ The expected value can be a number that isn't a possible outcome of `\(X\)`. --- # Expected Value ## Definition (Continuous) If `\(X\)` is a continuous random variable and `\(f(x)\)` is its probability density function, then the expected value of `\(X\)` is $$ \mathop{\mathbb{E}}(X) = \int_{-\infty}^{\infty} x f(x) dx. $$ - __Note:__ `\(x\)` represents the particular values of `\(X\)`. - Same idea as the discrete definition: describes the .hi[population mean]. <img src="02-Statistics_Review_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" /> --- # Expected Value ## Rule 1 For any constant `\(c\)`, `\(\mathop{\mathbb{E}}(c) = c\)`. -- ## Not-so-exciting examples `\(\mathop{\mathbb{E}}(5) = 5\)`. `\(\mathop{\mathbb{E}}(1) = 1\)`. `\(\mathop{\mathbb{E}}(4700) = 4700\)`. --- # Expected Value ## Rule 2 For any constants `\(a\)` and `\(b\)`, `\(\mathop{\mathbb{E}}(aX + b) = a\mathop{\mathbb{E}}(X) + b\)`. -- ## Example Suppose `\(X\)` is the high temperature in degrees Celsius in Eugene during August. The long-run average is `\(\mathop{\mathbb{E}}(X) = 28\)`. If `\(Y\)` is the temperature in degrees Fahrenheit, then `\(Y = 32 + \frac{9}{5} X\)`. .hi-purple[What is] `\(\color{#9370DB}{\mathop{\mathbb{E}}(Y)}\)`.hi-purple[?] -- - `\(\mathop{\mathbb{E}}(Y) = 32 + \frac{9}{5} \mathop{\mathbb{E}}(X) = 32 + \frac{9}{5} \times 28 = \color{#9370DB}{82.4}\)`. --- # Expected Value ## Rule 3 If `\(\{a_1, a_2, \dots , a_n\}\)` are constants and `\(\{X_1, X_2, \dots , X_n\}\)` are random variables, then $$ \color{#FD5F00}{\mathop{\mathbb{E}}(a_1 X_1 + a_2 X_2 + \dots + a_n X_n)} = \color{#007935}{a_1 \mathop{\mathbb{E}}(X_1) + a_2 \mathop{\mathbb{E}}(X_2) + \dots + a_n \mathop{\mathbb{E}}(X_n)}. $$ In English, .hi-orange[the expected value of the sum] .mono[=] .hi-green[the sum of expected values]. --- # Expected Value ## Rule 3 .hi-orange[The expected value of the sum] .mono[=] .hi-green[the sum of expected values]. ## Example Suppose that a coffee shop sells `\(X_1\)` small, `\(X_2\)` medium, and `\(X_3\)` large caffeinated beverages in a day. The quantities sold are random with expected values `\(\mathop{\mathbb{E}}(X_1) = 43\)`, `\(\mathop{\mathbb{E}}(X_2) = 56\)`, and `\(\mathop{\mathbb{E}}(X_3) = 21\)`. The prices of small, medium, and large beverages are `\(1.75\)`, `\(2.50\)`, and `\(3.25\)` dollars. .hi-purple[What is expected revenue?] -- $$ `\begin{aligned} \color{#FD5F00}{\mathop{\mathbb{E}}(1.75 X_1 + 2.50 X_2 + 3.35 X_n)} &= \color{#007935}{1.75 \mathop{\mathbb{E}}(X_1) + 2.50 \mathop{\mathbb{E}}(X_2) + 3.25 \mathop{\mathbb{E}}(X_3)} \\ &= \color{#9370DB}{1.75(43) + 2.50(56) + 3.25(21)} \\ &= \color{#9370DB}{283.5} \end{aligned}` $$ --- # Expected Value ## __Caution__ Previously, we found that the expected value of rolling a six-sided die is `\(\mathop{\mathbb{E}} \left(\text{Roll} \right) = 3.5\)`. - If we square this number, we get `\(\left[\mathop{\mathbb{E}} ( \text{Roll} ) \right]^2 = 12.25\)`. __Is__ `\(\left[\mathop{\mathbb{E}} \left( \text{Roll} \right) \right]^2\)` __the same as__ `\(\mathop{\mathbb{E}} \left(\text{Roll}^2 \right)\)`__?__ -- __No!__ $$ `\begin{aligned} \mathop{\mathbb{E}} \left( \text{Roll}^2 \right) &= 1^2 \times \frac{1}{6} + 2^2 \times \frac{1}{6} + 3^2 \times \frac{1}{6} + 4^2 \times \frac{1}{6} + 5^2 \times \frac{1}{6} + 6^2 \times \frac{1}{6} \\ &\approx 15.167 \\ &\neq 12.25. \end{aligned}` $$ --- # Expected Value ## __Caution__ Except in special cases, .hi-purple[the transformation of an expected value] __is not__ .hi-green[the expected value of a transformed random variable]. For some function `\(g(\cdot)\)`, it is typically the case that `$$\color{#9370DB}{g \left( \mathop{\mathbb{E}}(X) \right)} \neq \color{#007935}{\mathop{\mathbb{E}} \left( g(X) \right)}.$$` --- # Variance Random variables `\(\color{#e64173}{X}\)` and `\(\color{#9370DB}{Y}\)` share the same population mean, but are distributed differently. <img src="02-Statistics_Review_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" /> --- # Variance How tightly is a random variable distributed about its mean? - Let `\(\mu = \mathop{\mathbb{E}}(X)\)`. - Describe the distance of `\(X\)` from its population mean `\(\mu\)` as the squared difference: `\((X - \mu)^2\)`. -- __Variance__ tells us how far `\(X\)` deviates from `\(\mu\)`, _on average_: $$ \mathop{\text{Var}}(X) \equiv \mathop{\mathbb{E}}\left( (X - \mu)^2 \right) = \sigma^2 $$ - `\(\sigma^2\)` is shorthand for variance. <!-- - Distributing the terms above yields `\(\sigma^2 = \mathop{\mathbb{E}}(X^2 - 2X \mu + \mu^2) = \mathop{\mathbb{E}}(X^2) - 2 \mu^2 + \mu^2 = \mathop{\mathbb{E}}(X^2) - \mu^2\)`. --> --- # Variance ## Rule 1 `\(\mathop{\text{Var}}(X) = 0 \iff X\)` is a constant. - If a random variable never deviates from its mean, then it has zero variance. - If a random variable is always equal to its mean, then it's a (not-so-random) constant. --- # Variance ## Rule 2 For any constants `\(a\)` and `\(b\)`, `\(\mathop{\text{Var}}(aX + b) = a^2\mathop{\text{Var}}(X)\)`. -- ## Example Suppose `\(X\)` is the high temperature in degrees Celsius in Eugene during August. If `\(Y\)` is the temperature in degrees Fahrenheit, then `\(Y = 32 + \frac{9}{5} X\)`. .hi-purple[What is] `\(\color{#9370DB}{\mathop{\text{Var}}(Y)}\)`.hi-purple[?] -- - `\(\mathop{\text{Var}}(Y) = (\frac{9}{5})^2 \mathop{\text{Var}}(X) = \color{#9370DB}{\frac{81}{25} \mathop{\text{Var}}(X)}\)`. --- # Standard Deviation __Standard deviation__ is the positive square root of the variance: $$ \mathop{\text{sd}}(X) = +\sqrt{\mathop{\text{Var}}(X)} = \sigma $$ - `\(\sigma\)` is shorthand for standard deviation. --- # Standard Deviation ## Rule 1 For any constant `\(c\)`, `\(\mathop{\text{sd}}(c) = 0\)`. -- ## Rule 2 For any constants `\(a\)` and `\(b\)`, `\(\mathop{\text{sd}}(aX + b) = \left| a \right|\mathop{\text{sd}}(X)\)`. --- # Standardizing a Random Variable When we're working with a random variable `\(X\)` with an unfamiliar scale, it is useful to __standardize__ it by defining a new variable `\(Z\)`: $$ Z \equiv \frac{X - \mu}{\sigma}. $$ -- `\(Z\)` has mean `\(0\)` and standard deviation `\(1\)`. How? -- - First, some simple trickery: `\(Z = aX + b\)`, where `\(a \equiv \frac{1}{\sigma}\)` and `\(b \equiv - \frac{\mu}{\sigma}\)`. -- - `\(\mathop{\mathbb{E}}(Z) = a \mathop{\mathbb{E}}(X) + b = \mu \frac{1}{\sigma} - \frac{\mu}{\sigma} = 0\)`. -- - `\(\text{Var}(Z) = a^2\text{Var}(X) = \frac{1}{\sigma^2} \sigma^2 = 1\)`. --- # Covariance __Idea:__ Characterize the relationship between two random variables `\(X\)` and `\(Y\)`. __Definition:__ `\(\mathop{\text{Cov}}(X, Y) \equiv \mathop{\mathbb{E}} \left[ (X - \mu_X) (Y - \mu_Y) \right] = \sigma_{xy}\)`. -- - .hi-orange[Positive correlation:] When `\(\sigma_{xy} > 0\)`, then `\(X\)` is .orange[above] its mean when `\(Y\)` is .orange[above] its mean, _on average_. - .hi-green[Negative correlation:] When `\(\sigma_{xy} < 0\)`, then `\(X\)` is .green[below] its mean when `\(Y\)` is .green[above] its mean, _on average_. --- # Covariance ## Rule 1 If `\(X\)` and `\(Y\)` are independent, then `\(\mathop{\text{Cov}}(X, Y) = 0\)`. -- - __Statistical independence:__ If `\(X\)` and `\(Y\)` are independent, then `\(\mathop{\mathbb{E}}(XY) = \mathop{\mathbb{E}}(X)\mathop{\mathbb{E}}(Y)\)`. -- - `\(\mathop{\text{Cov}}(X, Y) = 0\)` means that `\(X\)` and `\(Y\)` are _uncorrelated_. __Caution:__ `\(\mathop{\text{Cov}}(X, Y) = 0\)` .hi-red[does not imply] that `\(X\)` and `\(Y\)` are independent. --- # Covariance ## Rule 2 For any constants `\(a\)`, `\(b\)`, `\(c\)`, and `\(d\)`, `\(\mathop{\text{Cov}}(aX + b, cY + d) = ac\mathop{\text{Cov}}(X, Y)\)` --- # Correlation Coefficient A problem with covariance is that it is sensitive to units of measurement. -- The __correlation coefficient__ solves this problem by rescaling the covariance: $$ \mathop{\text{Corr}}(X,Y) \equiv \frac{\mathop{\text{Cov}}(X,Y)}{\mathop{\text{sd}}(X) \times \mathop{\text{sd}}(Y)} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}. $$ - Also denoted as `\(\rho_{XY}\)`. - `\(-1 \leq \mathop{\text{Corr}}(X,Y) \leq 1\)` - Invariant to scale: if I double `\(Y\)`, `\(\mathop{\text{Corr}}(X,Y)\)` will not change. --- # Correlation Coefficient Perfect positive correlation: `\(\mathop{\text{Corr}}(X,Y) = 1\)`. <img src="02-Statistics_Review_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" /> --- # Correlation Coefficient Perfect negative correlation: `\(\mathop{\text{Corr}}(X,Y) = -1\)`. <img src="02-Statistics_Review_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" /> --- # Correlation Coefficient Positive correlation: `\(\mathop{\text{Corr}}(X,Y) > 0\)`. <img src="02-Statistics_Review_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" /> --- # Correlation Coefficient Negative correlation: `\(\mathop{\text{Corr}}(X,Y) < 0\)`. <img src="02-Statistics_Review_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" /> --- # Correlation Coefficient No correlation: `\(\mathop{\text{Corr}}(X,Y) = 0\)`. <img src="02-Statistics_Review_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" /> --- # Variance, Revisited ## Variance Rule 3 For constants `\(a\)` and `\(b\)`, $$ \mathop{\text{Var}} (aX + bY) = a^2 \mathop{\text{Var}}(X) + b^2 \mathop{\text{Var}}(Y) + 2ab\mathop{\text{Cov}}(X, Y). $$ -- - If `\(X\)` and `\(Y\)` are uncorrelated, then `\(\mathop{\text{Var}} (X + Y) = \mathop{\text{Var}}(X) + \mathop{\text{Var}}(Y)\)` - If `\(X\)` and `\(Y\)` are uncorrelated, then `\(\mathop{\text{Var}} (X - Y) = \mathop{\text{Var}}(X) + \mathop{\text{Var}}(Y)\)`