Given a fair 6-sided dice 🎲, i.e. an outcome space \(\Omega = \{1,2,3,4,5,6\}\) and events
\[\begin{align} A &= \{2,4,6\},\\ B &= \{1,3,5\}. \end{align}\]
Define \(X\) as the outcome of rolling the dice once. What is \(\Pr(X \in A)\)?
In this simple sample space (i.e. a sample space where each element has the same probability of occurring), we can just count. There are 6 elements in \(\Omega\), and event \(A\) contains 3 of those, i.e. in 3 out of 6 cases \(X\) lands on an even number. That is
\[\Pr(X \in A) = \frac{\text{number of successes}}{\text{number of potential outcomes}} = \frac{3}{6} = 0.5\]
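We can mimic this counting directly in R; a minimal sketch (the names `omega` and `A` are just illustrative):

```r
# Sample space of one die, and the event "even number"
omega <- 1:6
A <- c(2, 4, 6)
# Probability by counting: favorable outcomes over all outcomes
length(A) / length(omega)  # 0.5
mean(omega %in% A)         # same thing: 0.5
```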
Given two fair 6-sided dice 🎲 🎲, what is the probability of obtaining at least once the face “5”?
Again, counting. The total number of events from 2 dice are
o = outer(1:6,1:6,FUN = function(x,y){paste0("(",x,",",y,")")})
o
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "(1,1)" "(1,2)" "(1,3)" "(1,4)" "(1,5)" "(1,6)"
## [2,] "(2,1)" "(2,2)" "(2,3)" "(2,4)" "(2,5)" "(2,6)"
## [3,] "(3,1)" "(3,2)" "(3,3)" "(3,4)" "(3,5)" "(3,6)"
## [4,] "(4,1)" "(4,2)" "(4,3)" "(4,4)" "(4,5)" "(4,6)"
## [5,] "(5,1)" "(5,2)" "(5,3)" "(5,4)" "(5,5)" "(5,6)"
## [6,] "(6,1)" "(6,2)" "(6,3)" "(6,4)" "(6,5)" "(6,6)"
i.e. for each potential value of the first dice \((1,2,\dots,6)\) we have 6 values for the second dice \((1,2,\dots,6)\). In total, 36 potential outcomes.
How many of those correspond to at least once the face “5”? Well, just look for “5” above! (Caution, don’t count “(5,5)” twice!)
So, we have
o[,5]
## [1] "(1,5)" "(2,5)" "(3,5)" "(4,5)" "(5,5)" "(6,5)"
and
o[5,]
## [1] "(5,1)" "(5,2)" "(5,3)" "(5,4)" "(5,5)" "(5,6)"
which together would make 12 elements, but we must not count “(5,5)” twice. Therefore,
\[\Pr(\text{At least once "5"}) = \frac{11}{36}\]
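We can also let R do the counting on the same matrix of outcomes; a quick sketch (rebuilding `o` so the chunk is self-contained):

```r
# All 36 outcomes of rolling two dice, encoded as strings "(i,j)"
o <- outer(1:6, 1:6, FUN = function(x, y) paste0("(", x, ",", y, ")"))
# grepl() flags each outcome string containing a "5";
# "(5,5)" is a single string, so it is counted only once
sum(grepl("5", o)) / length(o)  # 11/36
```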
First a trick. Oftentimes it’s easier to work with the following formulation of variance:
\[\begin{align} Var(X) &= E[(X-\mu_X)^2] \\ &= E[X^2 - 2 X \mu_X + \mu_X^2] \\ &= E(X^2) - E(2 X \mu_X) + E(\mu_X^2) \\ &= E(X^2) - 2 \mu_X E(X) + \mu_X^2 \\ &= E(X^2) - 2 \mu_X^2 + \mu_X^2 \\ &= E(X^2) - \mu_X^2 \end{align}\]
We know that \(\mu_X = 3.5 = \frac{7}{2}\) from before, so we just need the first part.
\[E(X^2) = \sum_{i=1}^6 \Pr(X = i)\, i^2 = \frac{1}{6}\, 91\] Therefore \[Var(X) = E(X^2) - \mu_X^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} \approx 2.9167\]
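Both moments are easy to verify numerically; a small check in R:

```r
x <- 1:6                 # faces of a fair die, each with probability 1/6
ex2 <- mean(x^2)         # E(X^2) = 91/6
vx  <- ex2 - mean(x)^2   # Var(X) = 91/6 - 49/4 = 35/12
c(ex2, vx)
```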
Again,
\[Var(X) = E(X^2) - \mu_X^2 = E(X^2) - p^2\] and \(E(X^2) = p \cdot 1^2 + (1-p) \cdot 0^2 = p\). This gives us
\[Var(X) = E(X^2) - \mu_x^2 = p - p^2 = p(1-p)\] Bonus question: what value of \(p\) maximizes this variance?
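If you want to check your answer to the bonus question numerically, a quick grid search will do:

```r
# Bernoulli variance p(1-p) evaluated on a fine grid of p values
p <- seq(0, 1, by = 0.001)
v <- p * (1 - p)
p[which.max(v)]  # the maximizing p
```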
r = data.frame(x0 = c(0.15,0.15),x1 = c(0.07,0.63))
names(r) <- c("Rain (X=0)", "No Rain (X=1)")
rownames(r) <- c("Long Commute (Y=0)", "Short Commute (Y=1)")
r
## Rain (X=0) No Rain (X=1)
## Long Commute (Y=0) 0.15 0.07
## Short Commute (Y=1) 0.15 0.63
\[\Pr(Y = 0) = \Pr(Y=0, X = 0) + \Pr(Y=0, X = 1) = 0.22\] \[\Pr(Y = 1) = \Pr(Y=1, X = 0) + \Pr(Y=1, X = 1) = 0.78\] In general, just sum across rows (or columns):
(PrY = rowSums(r))
## Long Commute (Y=0) Short Commute (Y=1)
## 0.22 0.78
(PrX = colSums(r))
## Rain (X=0) No Rain (X=1)
## 0.3 0.7
Consider
## Rain (X=0) No Rain (X=1)
## Long Commute (Y=0) 0.15 0.07
## Short Commute (Y=1) 0.15 0.63
first, we can just grab the column of data where our imposed condition is true:
r[ , 2]
## [1] 0.07 0.63
however, we clearly see that this is not a valid probability distribution (it does not sum to 1). We must reweight the distribution so that it accounts for the fact that within the sample space "No rain", there is a total of 0.07 + 0.63 = 0.7 probability mass. In other words, the 0.07 here does not mean that, given "no rain", \(Y=0\) occurs with probability 7%. As with ordinary probabilities, we compute the probability by counting the number of successful events divided by the number of total events; in terms of probability mass,
\[\frac{\text{number of times }Y=0}{\text{number of times no rain}} = \frac{0.07}{0.07 + 0.63} = 0.1\]
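The reweighting is one line in R; a sketch reusing the joint table from above (column names simplified here):

```r
# Joint distribution: rows = commute length (long, short), columns = weather
r <- data.frame(rain = c(0.15, 0.15), no_rain = c(0.07, 0.63))
# Condition on "no rain": renormalize that column so it sums to 1
r$no_rain / sum(r$no_rain)  # 0.1 0.9
```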
library(dplyr)  # for %>%, filter(), and mutate() below
gapminder <- dslabs::gapminder
gapminder_new <- gapminder %>%
filter(year == 2015) %>%
mutate(fertility_above_2 = (fertility > 2.1)) # dummy variable for fertility rate above replacement rate
gn = gapminder_new %>% filter(country != "Greenland")
abs_tab = table(gn$continent, gn$fertility_above_2)
prop_tab = round(prop.table(abs_tab),2)
prop_tab[,1] / sum(prop_tab[,1])
## Africa Americas Asia Europe Oceania
## 0.02325581 0.18604651 0.25581395 0.48837209 0.04651163
prop_tab[3,] / sum(prop_tab[3,])
## FALSE TRUE
## 0.4230769 0.5769231
colSums(prop_tab)
## FALSE TRUE
## 0.43 0.57
What is the expected value of \(M\)? Hint: remember the formula for \(E(Y)\) from a few slides ago
What is the expected value of \(M\) given \(A=0\)? Hint: Just replace the \(\Pr\) parts in your formula!
Formula is of course
\[E(M)= \sum_i \Pr(M = m_i) m_i\] So, what is \(\Pr(M = m_i)\), i.e. the unconditional distribution of M? Just fill in the values:
\[0.8 \times 0 + 0.1 \times 1 + 0.06 \times 2 + 0.03 \times 3 + 0.01 \times 4 = 0.35\]
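The same weighted sum in R (the name `pr_m`, for the unconditional distribution of \(M\), is chosen here for illustration):

```r
pr_m <- c(0.8, 0.1, 0.06, 0.03, 0.01)  # Pr(M = 0), ..., Pr(M = 4)
sum(pr_m * 0:4)                        # E(M) = 0.35
```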
Ok, next, the whole thing given \(A=0\). Just grab the correct row from the above table, and recompute.
pr_a0 = c(0.7,0.13,0.1,0.05,0.02)
# check
sum(pr_a0)
## [1] 1
OK, we just need to take the dot product of that with the values 0, 1, 2, 3, 4. That's a range, right?
pr_a0 %*% 0:4 # the %*% performs a vector multiplication. a dot-product.
## [,1]
## [1,] 0.56
Well, for \(A=1\) this is now super easy. Just grab the correct row from the table.
pr_a1 = c(0.9,0.07,0.02,0.01,0.00)
# check
sum(pr_a1)
## [1] 1
pr_a1 %*% 0:4
## [,1]
## [1,] 0.14
Show this result!
\[E(Y|X) = \mu_Y \Rightarrow cov(X,Y) = 0 \text{ and } corr(X,Y) = 0\]
\[\begin{align} cov(X,Y) &= E\left[ (X - \mu_X) (Y- \mu_Y) \right] \\ &= E\left[ XY - X \mu_Y - \mu_X Y + \mu_X\mu_Y \right] \\ &= E\left[ XY \right] - \mu_X\mu_Y \end{align}\]
\[\begin{align} E\left[ XY \right] &= E\left[ E(XY \mid X) \right] \\ &= E\left[ X\, E(Y \mid X) \right] \\ &= E\left[ X \mu_Y \right] = \mu_X\mu_Y \end{align}\]
because we were told that \(E(Y|X) = \mu_Y\) (the first part of the statement!). Therefore \(cov(X,Y) = \mu_X\mu_Y - \mu_X\mu_Y = 0\), and hence \(corr(X,Y) = 0\) as well.
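A quick simulation illustrates the result. Note that mean independence does not require full independence: in the sketch below (with \(\mu_Y = 0\) for simplicity), \(Y\)'s spread depends on \(X\), yet \(E(Y|X) = 0\) and the sample covariance is zero up to sampling noise:

```r
set.seed(1)
x <- rnorm(1e5)
# Y's variance depends on X, but E(Y | X) = X * E(eps) = 0
y <- x * rnorm(1e5)
cov(x, y)  # close to 0
```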
\[var(\bar{y}) = var\left( \frac{1}{n}\sum_{i=1}^n y_i \right) = \frac{\sigma_Y^2}{n}\]
First, we need to know that \(var(X + Y) = var(X) + var(Y) + 2\, cov(X,Y)\). You can try to show that yourself (or ask Kacper!)
Then, we need to know that \(var(a X)\), where \(a\) is a constant, is equal to \(a^2 var(X)\):
\[\begin{align} var(a X) &= E\left[ (aX)^2\right] - E\left[ aX\right]^2\\ &= E\left[ (aX)^2\right] - a^2 \mu_X^2\\ &= a^2 E\left[ X^2\right] - a^2 \mu_X^2\\ &= a^2 \left( E\left[ X^2\right] - \mu_X^2 \right) \\ &= a^2 var(X) \end{align}\]
\[\begin{align} var(\bar{y}) &= var\left( \frac{1}{n}\sum_{i=1}^n y_i \right)\\ &= \frac{1}{n^2}var\left( \sum_{i=1}^n y_i \right) \end{align}\]
\[var(y_1 + y_2) = var(y_1) + var(y_2) + 2 cov(y_1, y_2)\]
the last term being equal to zero courtesy of i.i.d. So, that means that this is actually just
\[var(y_1 + y_2) = var(y_1) + var(y_2) = 2 \sigma^2\]
\[var(y_1 + y_2 + y_3) = var(y_1) + var(y_2) + var(y_3) + 2 cov(y_1, y_2) + 2 cov(y_1, y_3) + 2 cov(y_2, y_3)\]
So, that’s \[var(y_1 + y_2 + y_3) = 3\sigma^2\]
This means that we end up with
\[\begin{align} var(\bar{y}) &= var\left( \frac{1}{n}\sum_{i=1}^n y_i \right)\\ &= \frac{1}{n^2}var\left( \sum_{i=1}^n y_i \right)\\ &= \frac{1}{n^2}n \sigma_Y^2\\ &= \frac{1}{n}\sigma_Y^2\\ \end{align}\]
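A simulation check of \(var(\bar{y}) = \sigma_Y^2/n\); a sketch with \(n = 25\) and \(\sigma_Y = 2\):

```r
set.seed(42)
n <- 25
# 10,000 sample means, each from n i.i.d. draws with variance sigma^2 = 4
ybar <- replicate(1e4, mean(rnorm(n, mean = 0, sd = 2)))
var(ybar)  # close to 4 / 25 = 0.16
```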
Remember the definition of expected value: it is a probability-weighted sum. When you add two random variables, you add their outcomes in each state of the world and then take the probability-weighted average. Because multiplication distributes over addition, you can either:
- add the outcomes first and then take the probability-weighted average, or
- take each variable's probability-weighted average separately and then add the two.
Both give the same answer.
Here, we want to establish that
\[E(X+Y) = E(X) + E(Y)\]
but that
\[E(g(X)) \neq g\left( E(X) \right)\]
for some non-linear function \(g(x)\).
Setup: You’re taking two classes. Your grade is uncertain.
Class A (X): grade 80 with probability 1/4, grade 60 with probability 3/4.
Class B (Y): grade 90 with probability 1/3, grade 60 with probability 2/3.
Assume independence between X and Y.
All possible outcomes for X + Y:
| X  | Y  | X + Y | Probability       |
|----|----|-------|-------------------|
| 80 | 90 | 170   | (1/4)(1/3) = 1/12 |
| 80 | 60 | 140   | (1/4)(2/3) = 2/12 |
| 60 | 90 | 150   | (3/4)(1/3) = 3/12 |
| 60 | 60 | 120   | (3/4)(2/3) = 6/12 |
\[E(X + Y) = 170 \cdot \frac{1}{12} + 140 \cdot \frac{2}{12} + 150 \cdot \frac{3}{12} + 120 \cdot \frac{6}{12}\]
\[= \frac{170 + 280 + 450 + 720}{12} = \frac{1620}{12} = 135\]
Calculate E(X): \[E(X) = 80 \cdot \frac{1}{4} + 60 \cdot \frac{3}{4} = \frac{80 + 180}{4} = \frac{260}{4} = 65\]
Calculate E(Y): \[E(Y) = 90 \cdot \frac{1}{3} + 60 \cdot \frac{2}{3} = \frac{90 + 120}{3} = \frac{210}{3} = 70\]
Sum: \[E(X) + E(Y) = 65 + 70 = 135\]
✓ Same answer!
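The whole computation fits in a few lines of R; under independence, the joint probabilities are products of the marginals:

```r
x_vals <- c(80, 60); x_probs <- c(1/4, 3/4)  # Class A grades
y_vals <- c(90, 60); y_probs <- c(1/3, 2/3)  # Class B grades
# E(X + Y) computed from the full joint distribution...
e_joint <- sum(outer(x_vals, y_vals, "+") * outer(x_probs, y_probs))
# ...equals E(X) + E(Y) computed marginal by marginal
e_marg <- sum(x_vals * x_probs) + sum(y_vals * y_probs)
c(e_joint, e_marg)  # 135 135
```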
If we transform the values nonlinearly (i.e. we don't add/subtract the same amount to each value but instead scale them unequally in some way), we distort the average.
Just take the dice \(X \in \{1,2,3,4,5,6\}\) each with \(p=1/6\). Let’s take
\[g(x) = x^2\] as a simple example.
We know that \(E(X) = 3.5\). So, the first part is simply
\[g\left( E(X) \right) = 3.5^2 = 12.25\]
But the second part is \[E(g(X)) = E(X^2) = \frac{1}{6}(1^2 + 2^2 + \dots + 6^2) = \frac{91}{6} \approx 15.17\]
So, the two results are clearly different.
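The same gap shows up in two lines of R:

```r
x <- 1:6      # faces of a fair die
mean(x)^2     # g(E(X)) = 12.25
mean(x^2)     # E(g(X)) = 91/6, about 15.17
```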