Task 1: Probabilities

Given a fair 6-sided dice 🎲, i.e. an outcome space \(\Omega = \{1,2,3,4,5,6\}\) and events

\[\begin{align} A &= \{2,4,6\},\\ B &= \{1,3,5\}. \end{align}\]

Define \(X\) as the outcome of rolling the dice once. What is \(\Pr(X \in A)\)?

Answer

In this simple sample space (i.e. a sample space where each element has the same probability of occurring), we can just count. There are 6 elements in \(\Omega\), and event \(A\) contains 3 of those, i.e. in 3 out of 6 cases \(X\) lands on an even number. That is

\[\Pr(X \in A) = \frac{\text{number of favorable outcomes}}{\text{all potential outcomes}} = \frac{3}{6} = 0.5\]
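As a quick sanity check, the same count can be done in R (the object names `omega` and `A` are ours):

```r
omega <- 1:6            # sample space of one dice
A <- c(2, 4, 6)         # event: even outcome
length(A) / length(omega)
## [1] 0.5
```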



Task 2: 2 Dice

Given two fair 6-sided dice 🎲 🎲, what is the probability of obtaining at least once the face “5”?

Again, counting. The total set of outcomes from 2 dice is

o = outer(1:6,1:6,FUN = function(x,y){paste0("(",x,",",y,")")})
o
##      [,1]    [,2]    [,3]    [,4]    [,5]    [,6]   
## [1,] "(1,1)" "(1,2)" "(1,3)" "(1,4)" "(1,5)" "(1,6)"
## [2,] "(2,1)" "(2,2)" "(2,3)" "(2,4)" "(2,5)" "(2,6)"
## [3,] "(3,1)" "(3,2)" "(3,3)" "(3,4)" "(3,5)" "(3,6)"
## [4,] "(4,1)" "(4,2)" "(4,3)" "(4,4)" "(4,5)" "(4,6)"
## [5,] "(5,1)" "(5,2)" "(5,3)" "(5,4)" "(5,5)" "(5,6)"
## [6,] "(6,1)" "(6,2)" "(6,3)" "(6,4)" "(6,5)" "(6,6)"

i.e. for each potential value of the first dice (1,2,...,6) we have 6 values for the second dice (1,2,...,6). In total, 36 potential outcomes.

How many of those correspond to at least once the face “5”? Well, just look for “5” above! (Caution, don’t count “(5,5)” twice!)

So, we have

o[,5]
## [1] "(1,5)" "(2,5)" "(3,5)" "(4,5)" "(5,5)" "(6,5)"

and

o[5,]
## [1] "(5,1)" "(5,2)" "(5,3)" "(5,4)" "(5,5)" "(5,6)"

which together would make 12 elements, but we must not count “(5,5)” twice. Therefore,

\[\Pr(\text{At least once "5"}) = \frac{11}{36}\]
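We can also let R do the counting on the same `outer` grid as above (reconstructed here so the snippet is self-contained). Each cell that contains the character "5" is an outcome with at least one face "5":

```r
o <- outer(1:6, 1:6, FUN = function(x, y) paste0("(", x, ",", y, ")"))
sum(grepl("5", o))              # outcomes showing at least one "5"
## [1] 11
sum(grepl("5", o)) / length(o)  # the probability 11/36
## [1] 0.3055556
```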

Task 3: Computing Variance by hand

  1. Compute the variance of one 6-sided dice!

First a trick. Oftentimes it’s easier to work with the following formulation of variance:

\[\begin{align} Var(X) &= E[(X-\mu_X)^2] \\ &= E[X^2 - 2 X \mu_X + \mu_X^2] \\ &= E(X^2) - E(2 X \mu_X) + E(\mu_X^2) \\ &= E(X^2) - 2 \mu_X E(X) + \mu_X^2 \\ &= E(X^2) - 2 \mu_X^2 + \mu_X^2 \\ &= E(X^2) - \mu_X^2 \end{align}\]

We know that \(\mu_X = 3.5 = \frac{7}{2}\) from before, so we only need the first part.

\[E(X^2) = \sum_{i=1}^6 \Pr(X = i)\, i^2 = \frac{1}{6} \cdot 91 = \frac{91}{6}\] Therefore \[Var(X) = E(X^2) - \mu_X^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} \approx 2.92\]
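A quick numerical check. Note that R's built-in `var()` uses the \(n-1\) divisor (it estimates a sample variance), so we compute \(E(X^2) - \mu_X^2\) directly instead:

```r
x <- 1:6
mean(x^2) - mean(x)^2   # population variance of one fair dice
## [1] 2.916667
35/12                   # same thing as a fraction
## [1] 2.916667
```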

  2. Compute the variance of \(X \sim \text{Bernoulli}(p)\)!

Again,

\[Var(X) = E(X^2) - \mu_X^2 = E(X^2) - p^2\] \(E(X^2) = p \cdot 1^2 + (1-p) \cdot 0^2 = p\). This gives us

\[Var(X) = E(X^2) - \mu_X^2 = p - p^2 = p(1-p)\] Bonus question: what value of \(p\) maximizes this variance?
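For the bonus question, a small grid search in R (the objects `p` and `v` are ours) shows that the variance is maximized at \(p = 0.5\), where it equals \(0.25\):

```r
p <- seq(0, 1, by = 0.01)   # grid of Bernoulli parameters
v <- p * (1 - p)            # variance at each p
p[which.max(v)]             # maximizing p
## [1] 0.5
max(v)
## [1] 0.25
```

(You can also see this analytically: \(p(1-p)\) is a downward parabola with vertex at \(p = 1/2\).)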

Task 4

r = data.frame(x0 = c(0.15,0.15),x1 = c(0.07,0.63))
names(r) <- c("Rain (X=0)", "No Rain (X=1)")
rownames(r) <- c("Long Commute (Y=0)", "Short Commute (Y=1)")
r
##                     Rain (X=0) No Rain (X=1)
## Long Commute (Y=0)        0.15          0.07
## Short Commute (Y=1)       0.15          0.63
  1. Compute the marginal distribution of \(X\)
  2. Compute the marginal distribution of \(Y\)

\[\Pr(Y = 0) = \Pr(Y=0, X = 0) + \Pr(Y=0, X = 1) = 0.22\] \[\Pr(Y = 1) = \Pr(Y=1, X = 0) + \Pr(Y=1, X = 1) = 0.78\] In general, just sum across rows (or columns):

(PrY = rowSums(r))
##  Long Commute (Y=0) Short Commute (Y=1) 
##                0.22                0.78
(PrX = colSums(r))
##    Rain (X=0) No Rain (X=1) 
##           0.3           0.7

Task 5: Compute a Conditional Distribution

Consider again the joint distribution of rain and commute time from Task 4, and compute the conditional distribution of \(Y\) given "No Rain" (\(X=1\)):

##                     Rain (X=0) No Rain (X=1)
## Long Commute (Y=0)        0.15          0.07
## Short Commute (Y=1)       0.15          0.63

Answer

First, we can just grab the column of data where our imposed condition is true:

r[ , 2]
## [1] 0.07 0.63

However, we clearly see that this is not a valid probability distribution (it does not sum to 1). We must reweight the distribution to account for the fact that within the sample space "No Rain" there is a total of 0.07 + 0.63 = 0.7 of probability mass. In other words, the 0.07 here does not mean that, given "No Rain", \(Y=0\) occurs with probability 7%. As with ordinary probabilities, we compute the conditional probability as the number of successful events divided by the number of total events — here, in terms of probability mass,

\[\Pr(Y = 0 \mid X = 1) = \frac{\text{mass on }Y=0\text{ and no rain}}{\text{total mass on no rain}} = \frac{0.07}{0.07 + 0.63} = 0.1\]
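In R, the reweighting is a one-liner; we rebuild the table from Task 4 so the snippet stands alone:

```r
r <- data.frame(x0 = c(0.15, 0.15), x1 = c(0.07, 0.63))
names(r) <- c("Rain (X=0)", "No Rain (X=1)")
rownames(r) <- c("Long Commute (Y=0)", "Short Commute (Y=1)")
r[, 2] / sum(r[, 2])   # Pr(Y | X = 1): renormalize the "No Rain" column
## [1] 0.1 0.9
```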

Task 6: More conditional dists

library(dplyr)   # for %>%, filter and mutate
gapminder <- dslabs::gapminder
gapminder_new <- gapminder %>%
  filter(year == 2015) %>%
  mutate(fertility_above_2 = (fertility > 2.1)) # dummy for fertility rate above replacement rate
gn = gapminder_new %>% filter(country != "Greenland")
abs_tab = table(gn$continent, gn$fertility_above_2)
prop_tab = round(prop.table(abs_tab), 2)
prop_tab[, 1] / sum(prop_tab[, 1]) # Pr(continent | fertility below replacement)
##     Africa   Americas       Asia     Europe    Oceania 
## 0.02325581 0.18604651 0.25581395 0.48837209 0.04651163
prop_tab[3, ] / sum(prop_tab[3, ]) # Pr(fertility above replacement | continent == "Asia")
##     FALSE      TRUE 
## 0.4230769 0.5769231
colSums(prop_tab) # marginal distribution of fertility_above_2
## FALSE  TRUE 
##  0.43  0.57

Task 7: Conditional Expectation

  1. What is the expected value of \(M\)? Hint: remember the formula for \(E(Y)\) from a few slides ago

  2. What is the expected value of \(M\) given \(A=0\)? Hint: Just replace the \(\Pr\) parts in your formula!

The formula is of course

\[E(M)= \sum_i \Pr(M = m_i) m_i\] So, what is \(\Pr(M = m_i)\), i.e. the unconditional distribution of M? Just fill in the values:

\[0.8 \times 0 + 0.1 \times 1 + 0.06 \times 2 + 0.03 \times 3 + 0.01 \times 4 = 0.35\]
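We can verify this sum with a dot product (the vector name `pr_m` is ours; its entries are the unconditional probabilities just listed):

```r
pr_m <- c(0.8, 0.1, 0.06, 0.03, 0.01)  # Pr(M = 0), ..., Pr(M = 4)
pr_m %*% 0:4                           # E(M) as a dot product
##      [,1]
## [1,] 0.35
```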

Ok, next, the whole thing given \(A=0\). Just grab the correct row from the above table, and recompute.

pr_a0 = c(0.7,0.13,0.1,0.05,0.02)
# check
sum(pr_a0)
## [1] 1

OK, we just need to sum that against the values 0, 1, 2, 3, 4. That's a range, right?

pr_a0 %*% 0:4 # the %*% performs a vector multiplication. a dot-product.
##      [,1]
## [1,] 0.56

Well, for \(A=1\) this is now super easy. Just grab the correct row from the table.

pr_a1 = c(0.9,0.07,0.02,0.01,0.00)
# check
sum(pr_a1)
## [1] 1
pr_a1 %*% 0:4
##      [,1]
## [1,] 0.14

Task 8 : Show Correlation and Conditional Mean

Show this result!

\[E(Y|X) = \mu_Y \Rightarrow cov(X,Y) = 0 \text{ and } corr(X,Y) = 0\]

\[\begin{align} cov(X,Y) &= E\left[ (X - \mu_X) (Y- \mu_Y) \right] \\ &= E\left[ XY - X \mu_Y - \mu_X Y + \mu_X\mu_Y \right] \\ &= E\left[ XY \right] - \mu_X \mu_Y \end{align}\]

By the law of iterated expectations,

\[\begin{align} E\left[ XY \right] &= E\left[ E(XY | X) \right] \\ &= E\left[ X \, E(Y | X) \right] \\ &= E\left[ X \mu_Y \right] = \mu_X \mu_Y \end{align}\]

because we were told that \(E(Y|X) = \mu_Y\) (the first part of the statement!). Therefore \(cov(X,Y) = \mu_X \mu_Y - \mu_X \mu_Y = 0\), and since \(corr(X,Y) = cov(X,Y)/(\sigma_X \sigma_Y)\), the correlation is zero as well.
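A small numerical illustration (the joint pmf below is a made-up example, not from the slides): take a distribution where \(E(Y|X) = 0\) for every \(x\), so \(\mu_Y = 0\), while \(Y\)'s spread still depends on \(X\). The covariance comes out to exactly zero even though \(X\) and \(Y\) are not independent:

```r
# joint pmf: given X = 0, Y is +/-1; given X = 1, Y is +/-2 (each with prob 1/2)
joint <- data.frame(
  x = c(0, 0, 1, 1),
  y = c(-1, 1, -2, 2),
  p = rep(0.25, 4)
)
EX  <- sum(joint$x * joint$p)
EY  <- sum(joint$y * joint$p)
EXY <- sum(joint$x * joint$y * joint$p)
EXY - EX * EY   # covariance
## [1] 0
```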

Task 9: Variance of the Sample Average

\[var(\bar{y}) = var\left( \frac{1}{n}\sum_{i=1}^n y_i \right) = \frac{\sigma_Y^2}{n}\]

  1. First, we need to know that \(var(x + y) = var(x) + var(y) + 2 cov(x,y)\). You can try to show that yourself (or ask Kacper!)

  2. Then, we need to know that \(var(a X)\) where \(a\) is a constant is equal to \(a^2 var(x)\)

\[\begin{align} var(a X) &= E\left[ (aX)^2\right] - E\left[ (aX)\right]^2\\ &= E\left[ (aX)^2\right] - a^2 \mu_X^2\\ &= a^2 E\left[ X^2\right] - a^2 \mu_X^2\\ &= a^2 \left( E\left[ X^2\right] - \mu_X^2 \right) \\ &= a^2 var(x) \end{align}\]

  3. Great. Now we can tackle this expression head on:

\[\begin{align} var(\bar{y}) &= var\left( \frac{1}{n}\sum_{i=1}^n y_i \right)\\ &= \frac{1}{n^2}var\left( \sum_{i=1}^n y_i \right) \end{align}\]

  4. So far so good. Let us prove the rest by induction. That is, we first show that this works for \(n=2\), then for \(n=3\), and so on until we are convinced it is a general result:

\[var(y_1 + y_2) = var(y_1) + var(y_2) + 2 cov(y_1, y_2)\]

the last term being equal to zero courtesy of the i.i.d. assumption (independent draws have zero covariance). So, that means that this is actually just

\[var(y_1 + y_2) = var(y_1) + var(y_2) = 2 \sigma^2\]

For \(n=3\), the same logic gives

\[var(y_1 + y_2 + y_3) = var(y_1) + var(y_2) + var(y_3) + 2 cov(y_1, y_2) + 2 cov(y_1, y_3) + 2 cov(y_2, y_3)\]

Again all covariance terms vanish, so that’s \[var(y_1 + y_2 + y_3) = 3\sigma^2\]

This means that we end up with

\[\begin{align} var(\bar{y}) &= var\left( \frac{1}{n}\sum_{i=1}^n y_i \right)\\ &= \frac{1}{n^2}var\left( \sum_{i=1}^n y_i \right)\\ &= \frac{1}{n^2}n \sigma_Y^2\\ &= \frac{1}{n}\sigma_Y^2\\ \end{align}\]
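A Monte Carlo sanity check (the simulation parameters are ours): for \(n = 10\) rolls of a fair dice (whose variance is \(35/12\)), the variance of the sample average should be close to \(\sigma_Y^2/n = (35/12)/10\):

```r
set.seed(1)
n <- 10
ybar <- replicate(1e5, mean(sample(1:6, n, replace = TRUE)))
var(ybar)      # simulated variance of the sample average (approximate)
(35/12) / n    # theoretical value
## [1] 0.2916667
```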

Task 10: Show That \(E\) is a linear operator

Remember the definition of expected value. Expected value is a probability-weighted sum. When you add two random variables, you’re adding their outcomes in each state of the world, then taking the probability-weighted average. Because multiplication distributes over addition, you can either:

  1. Add the outcomes first, then weight by probabilities, OR
  2. Weight each variable separately, then add together.

Both give the same answer.

Here, we want to establish that

\[E(X+Y) = E(X) + E(Y)\]

but that

\[E(g(X)) \neq g\left( E(X) \right)\]

for some non-linear function \(g(x)\).

Example with linear operation

Setup: You’re taking two classes. Your grade is uncertain.

Class A (X):

  • 1/4 chance you get 80
  • 3/4 chance you get 60

Class B (Y):

  • 1/3 chance you get 90
  • 2/3 chance you get 60

Assume independence between X and Y.

Method 1: E(X + Y) directly

All possible outcomes for X + Y:

| X  | Y  | X + Y | Probability       |
|----|----|-------|-------------------|
| 80 | 90 | 170   | (1/4)(1/3) = 1/12 |
| 80 | 60 | 140   | (1/4)(2/3) = 2/12 |
| 60 | 90 | 150   | (3/4)(1/3) = 3/12 |
| 60 | 60 | 120   | (3/4)(2/3) = 6/12 |

\[E(X + Y) = 170 \cdot \frac{1}{12} + 140 \cdot \frac{2}{12} + 150 \cdot \frac{3}{12} + 120 \cdot \frac{6}{12}\]

\[= \frac{170 + 280 + 450 + 720}{12} = \frac{1620}{12} = 135\]

Method 2: E(X) + E(Y)

Calculate E(X): \[E(X) = 80 \cdot \frac{1}{4} + 60 \cdot \frac{3}{4} = \frac{80 + 180}{4} = \frac{260}{4} = 65\]

Calculate E(Y): \[E(Y) = 90 \cdot \frac{1}{3} + 60 \cdot \frac{2}{3} = \frac{90 + 120}{3} = \frac{210}{3} = 70\]

Sum: \[E(X) + E(Y) = 65 + 70 = 135\]

Same answer!
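The two methods can also be checked in R (the vector names are ours; `outer` builds the grid of sums and of joint probabilities, using independence):

```r
x <- c(80, 60); px <- c(1/4, 3/4)   # Class A grades and probabilities
y <- c(90, 60); py <- c(1/3, 2/3)   # Class B grades and probabilities
sum(outer(x, y, "+") * outer(px, py))  # Method 1: E(X + Y) directly
## [1] 135
sum(x * px) + sum(y * py)              # Method 2: E(X) + E(Y)
## [1] 135
```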

Weights do not distribute over nonlinear functions

If we transform the values nonlinearly (i.e. we don’t add/subtract the same amount to each value but instead scale them unequally in some way), we distort the average.

Just take the dice \(X \in \{1,2,3,4,5,6\}\) each with \(p=1/6\). Let’s take

\[g(x) = x^2\] as a simple example.

We know that \(E(X) = 3.5\). So, the first part is simply

\[g\left( E(X) \right) = 3.5^2 = 12.25\]

But the second part is \[E(g(X)) = E(X^2) = \frac{1}{6}(1^2 + 2^2 + \dots + 6^2) = \frac{91}{6} \approx 15.17\]

So, the two results are clearly different.
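And in R, using the same dice values:

```r
x <- 1:6
mean(x)^2   # g(E(X)): square the mean
## [1] 12.25
mean(x^2)   # E(g(X)): mean of the squares
## [1] 15.16667
```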