class: center, middle, inverse, title-slide

.title[
# Econometrics
]
.subtitle[
## Probability: 🎲, 🔮, 🪙
]
.author[
### Florian Oswald
]
.date[
### UniTo ESOMAS 2025-10-09
]

---
layout: true

<div class="my-footer"><img src="../img/logo/unito-shield.png" style="height: 60px;"/></div>

---

# Probability

This set of slides covers concepts from Stock and Watson, Chapter 2.

---

# Probabilities

*Probability* is a function that assigns a value in `\([0,1]\)` to a *set* (representing *events*). Consider a fair 6-sided die. 🎲

* Outcome space: `\(\Omega = \{1,2,3,4,5,6\}\)`
* Event: a subset of `\(\Omega\)`.

--

For example, the events A ("the outcome is even") and B ("the outcome is odd") are

`$$\begin{align} A &= \{2,4,6\},\\ B &= \{1,3,5\}. \end{align}$$`

Having a *fair* die means that

`$$\Pr(1) = \dots = \Pr(6) = \frac{1}{6}$$`

---
class: inverse

# Task 1: Probabilities

Given a fair 6-sided die 🎲, i.e. an outcome space `\(\Omega = \{1,2,3,4,5,6\}\)` and events

`$$\begin{align} A &= \{2,4,6\},\\ B &= \{1,3,5\}, \end{align}$$`

how does one determine the ***likelihood*** (or the **probability**) of event A occurring?

---

# Discrete Random Variables

A *Discrete Random Variable* is a function mapping outcomes to measurements. For example, we might call

* `\(X\)` the number we obtain from throwing the die once.
* `\(X_1,X_2\)` the 2 numbers we obtain from throwing the die two times.
* `\(D\)` an indicator (either `0` or `1`) for whether a randomly sampled person answers "yes" or "no" when asked whether they have children.

---
class: inverse

# Gambler's Fallacy

The dealer tosses a fair coin. If it comes up tails (`T`) you win, if it's heads (`H`) you lose. Suppose we've seen the following sequence of tosses so far:

```
outcome:   H  H  H  H  H  H  H  H  H  H   ?
toss num:  1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.
```

You have lost 10 times in a row now. Does this increase the probability that you will win the next round?

--

Which sequence is more likely to occur?

1. `TTTTTTTTTT` (10 `T` in a row)?
2. `HTHTTHTHHT` (a random occurrence of `T` and `H`)?

---

## Gambler's Fallacy

The probability of hitting 11 `H` in a row is

$$ \Pr(\text{11 Heads in a Row}) = \frac{1}{2^{11}} \approx 0.00049 \approx 0.05\% $$

--

Ok, but I already had 10 `H`! What's the next toss likely going to be, given that miserable history?

$$ \Pr(\text{toss 11 is Heads} | 10 \text{ Heads before}) $$

Let's calculate that probability on the board!

---
class: inverse

# Task 2: 2 Dice

Given two fair 6-sided dice 🎲 🎲, what is the probability of obtaining the face "5" *at least once*?

---
layout: false
class: title-slide-section-red, middle

# Probability Distributions

---
layout: true

<div class="my-footer"><img src="../img/logo/unito-shield.png" style="height: 60px;"/></div>

---

## PDF/PMF and CDF of discrete RVs

* PDF/PMF: Probability Density/Mass Function. A table listing each outcome and the associated probability of observing it.
* CDF: Cumulative Distribution Function. The probability that a given RV takes on a value *less than or equal to a certain value* (requires a notion of *ordering* - e.g. what about a die with 6 different **colors**?).

### For our 6-sided fair die

```
##   outcome pdf cdf
## 1       1 1/6 1/6
## 2       2 1/6 2/6
## 3       3 1/6 3/6
## 4       4 1/6 4/6
## 5       5 1/6 5/6
## 6       6 1/6 6/6
```
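A minimal sketch of how such a table can be built in `R` (probabilities print as decimals rather than fractions here):

``` r
pdf_die <- rep(1/6, 6)      # a fair die: each face has probability 1/6
cdf_die <- cumsum(pdf_die)  # the CDF is the running sum of the PMF
data.frame(outcome = 1:6, pdf = pdf_die, cdf = cdf_die)
```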
---

## PDF/PMF and CDF of discrete RVs

* PDF/PMF: Probability Density/Mass Function. A table listing each outcome and the associated probability of observing it.
* CDF: Cumulative Distribution Function. The probability that a given RV takes on a value *less than or equal to a certain value* (requires a notion of *ordering* - e.g. what about a die with 6 different **colors**?).

### For our 6-sided fair die

.pull-left[
<img src="chapter_probability_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />
]

.pull-right[
<img src="chapter_probability_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />
]

---

## PDF/PMF and CDF of discrete RVs

* PDF/PMF: Probability Density/Mass Function. A table listing each outcome and the associated probability of observing it.
* CDF: Cumulative Distribution Function. The probability that a given RV takes on a value *less than or equal to a certain value* (requires a notion of *ordering* - e.g. what about a die with 6 different **colors**?).

### For our fair coin

```
##   outcome pdf cdf
## 1    H(0) 1/2 1/2
## 2    T(1) 1/2 2/2
```

---

## PDF/PMF and CDF of discrete RVs

* PDF/PMF: Probability Density/Mass Function. A table listing each outcome and the associated probability of observing it.
* CDF: Cumulative Distribution Function. The probability that a given RV takes on a value *less than or equal to a certain value* (requires a notion of *ordering* - e.g. what about a die with 6 different **colors**?).

### For our fair coin

.pull-left[
<img src="chapter_probability_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />
]

.pull-right[
<img src="chapter_probability_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />
]

---

## Bernoulli Distribution

* A special discrete RV with 2 outcomes, `0` and `1`, where event `1` ("success") occurs with probability `\(p\)`. For example:
* Tomorrow it will rain with probability `\(p\)` (it will *not* rain with probability `\(1-p\)`).

--

What kind of RV is flipping a *fair* coin? What about an *unfair* coin?

---

## PDFs and CDFs of continuous Variables

* Slightly more complicated, because we need calculus and integration.
* The basic idea is the same!

---
layout: false
class: title-slide-section-red, middle

# Expected Value and Variance

---
layout: true

<div class="my-footer"><img src="../img/logo/unito-shield.png" style="height: 60px;"/></div>

---

## Expected Value and Variance

We write

`$$E(Y) = y_1 p_1 + \dots + y_n p_n$$`

where `\(p_i = \Pr(Y = y_i)\)`. For our die, where `\(X\)` is the result of a single throw:

`$$\begin{align} E(X) &= 1 \times \Pr(1) + 2 \times \Pr(2) + \dots + 6 \times \Pr(6) \\ &= \end{align}$$`

**Everybody calculate this result now!**

---

# Expected Value and Variance

We write

$$ E(Y) = y_1 p_1 + \dots + y_n p_n $$

For our die, where `\(X\)` is the result of a single throw:

`$$\begin{align} E(X) &= 1 \times \Pr(1) + 2 \times \Pr(2) + \dots + 6 \times \Pr(6) \\ &= 1 \times \frac{1}{6} + 2 \times \frac{1}{6} + \dots + 6 \times \frac{1}{6} \\ &= 3.5 \\ \end{align}$$`

* So: we need to know ***all*** weights and values (the `\(\Pr\)`'s and the values of `\(X\)`) in order to compute this quantity.
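A one-line check of this weighted sum in `R` (a sketch):

``` r
values  <- 1:6
weights <- rep(1/6, 6)  # a fair die: all faces equally likely
sum(values * weights)   # E(X) = 3.5
```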
---

# `\(E(X)\)` is a Theoretical Concept. Mission Impossible?

* Imagine you get this message:

--

1. Your mission, should you choose to accept it, is to inspect this device: 🔮. Whenever you touch it, it displays a number in `\(\{1,\dots,10\}\)`. E.g. `4,8,4,9,1,5,10,9,1`...kind of random.
2. We must know the long-run average, i.e. E(🔮). Or something terrible will happen!
3. This message will destroy itself in 10,9,8,...💣

--

* You **don't know** the theoretical distribution of all the numbers in 🔮 (the `\(\Pr\)`'s). Time is running out ⏰

--

* What could you do? 🤔

---

<iframe src="https://giphy.com/embed/plVdDRfj5WV47sIAsh" width="960" height="538" style="" frameBorder="0" class="giphy-embed" allowFullScreen></iframe>

---

# `\(E(X)\)` is a Theoretical Concept. Mission Impossible?

* If we had a huge number of observations from 🔮 (a *sample*), we could come up with a *guess* of E(🔮), based on the data we got. Empirical, like.

``` r
set.seed(12345)    # to ensure reproducibility
Ps = runif(10)
Ps[c(2,3)] = 0
Ps = Ps / sum(Ps)  # generate random weights
x = sample(1:10, size = 10000, replace = TRUE, prob = Ps)
xbar = mean(x)
xbar
```

```
## [1] 6.3418
```

--

* `xbar` = `\(\bar{x} = \frac{1}{N} \sum_i^{N} x_i\)` is called the *Sample Mean*, *Arithmetic Average*, or *Sample Average*.
* `\(\bar{x}\)` is an ***estimator*** for `\(\mu_X = E(X)\)`.

---

# Central Tendency - Mean and Median

.pull-left[
`mean(x)`: the average of all values in `x`.

`$$E(X) = \mu_X = \frac{1}{N}\sum_{i=1}^N x_i = \bar{x}$$`

***The second equality 👆 is correct only if...???***

``` r
x <- c(1,2,2,2,2,100)
mean(x)
```

```
## [1] 18.16667
```

``` r
mean(x) == sum(x) / length(x)
```

```
## [1] FALSE
```
]

--

.pull-right[
`median`: the value `\(x_j\)` below and above which 50% of the values in `x` lie. `\(m\)` is the median if

`$$\Pr(X \leq m) \geq 0.5 \text{ and } \Pr(X \geq m) \geq 0.5$$`

The median is robust against *outliers*.

``` r
median(x)
```

```
## [1] 2
```
]

---

## Quick Review

1. EV of a Bernoulli
2. Continuous RVs

---

# Variance and Standard Deviation

* A measure of the *spread* of a distribution.
* The definition of *variance* is

`$$var(Y) = E[(Y-\mu_Y)^2]$$`

and, if `\(Y\)` is discrete,

`$$var(Y) = \sum_{i=1}^N (y_i-\mu_Y)^2 \times p_i.$$`

--

* There is an issue with scaling: we *square* deviations.
* The standard deviation scales back to the units of the data:

`$$\sigma_Y = \sqrt{var(Y)}$$`

???

* why squared?
* because deviations of the same magnitude but opposite sign would cancel out: mean([5,15])
* also because we want to "exaggerate" larger deviations
* could we also take the absolute value of deviations instead?

---

## Variance

.pull-left[
Consider two normal distributions with equal mean `0`:
]

--

.pull-right[
<img src="chapter_probability_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" />

Compute with:

``` r
var(x)
```
]

---

## Example of other Spread Measures

.pull-left[
``` r
# % catholic in 47 french-speaking
# swiss cantons in 1888
plot(swiss$Catholic, rep(1, nrow(swiss)), pch = 3,
     cex = 2, xlab = "% Catholic", yaxt = "n", ylab = "")
```

<img src="chapter_probability_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" />
]

.pull-right[
How do the values in column `Catholic` *vary*?

| Measure            | `R`                     | Result    |
|:------------------:|:-----------------------:|:---------:|
| Variance           | `var(swiss$Catholic)`   | 1739.29   |
| Standard Deviation | `sd(swiss$Catholic)`    | 41.7      |
| IQR                | `IQR(swiss$Catholic)`   | 87.93     |
| Minimum            | `min(swiss$Catholic)`   | 2.15      |
| Maximum            | `max(swiss$Catholic)`   | 100       |
| Range              | `range(swiss$Catholic)` | 2.15, 100 |
]

---
class: inverse

# Task 3: Computing Variance by hand

1. Compute the variance of our 6-sided die!
2. Compute the variance of `\(X \sim \text{Bernoulli}(p)\)`! Bonus question: what value of `\(p\)` maximizes this variance?
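Once you have pen-and-paper answers, a hedged simulation check (for a large number of draws, the sample variance should come out close to your analytical results):

``` r
set.seed(1)
n <- 1e6                               # a large number of draws
die <- sample(1:6, n, replace = TRUE)  # simulated die rolls
var(die)                               # compare to your answer for the die
p <- 0.3                               # an arbitrary success probability
bern <- rbinom(n, size = 1, prob = p)  # simulated Bernoulli(p) draws
var(bern)                              # compare to your formula in p
```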
---
layout: false
class: title-slide-section-red, middle

# Two Random Variables <br> `\((X,Y)\)`, (🎲, 🎲), (🎲, 🪙), (🪙, 🪙)

---
layout: true

<div class="my-footer"><img src="../img/logo/unito-shield.png" style="height: 60px;"/></div>

---

# Two Random Variables: Example

.pull-left[
<img src="chapter_probability_files/figure-html/x-y-corr-1.svg" style="display: block; margin: auto;" />
]

.pull-right[
* Here, `\(X\)` and `\(Y\)` are ***jointly normally*** distributed.
* We would write

`$$(X, Y) \sim \mathcal{N}\biggl( \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix}, \, \Sigma \biggr)$$`

where `\(\Sigma\)` is a *matrix*

`$$\begin{bmatrix} \sigma_X^2 & \rho \, \sigma_X \sigma_Y \\ \rho \, \sigma_X \sigma_Y & \sigma_Y^2 \end{bmatrix}$$`
]

Taking as example the data in this plot, the concepts *covariance* and *correlation* relate to the following type of question: if `\(X\)` takes a value above its mean, is `\(Y\)` then also likely to lie above its mean?

---

# Tabulating Data

`table(x)` is a useful function that counts the occurrence of each unique value in `x`:

``` r
# install.packages("dslabs") in order to use this command
data("gapminder", package = "dslabs")
table(gapminder$continent)
```

```
##
##   Africa Americas     Asia   Europe  Oceania
##     2907     2052     2679     2223      684
```

--

The same can be achieved using the `count` function (from `dplyr`):

``` r
gapminder %>%
  count(continent)
```

```
##   continent    n
## 1    Africa 2907
## 2  Americas 2052
## 3      Asia 2679
## 4    Europe 2223
## 5   Oceania  684
```

---

# Tabulating Data

Given two variables, `table` produces a contingency table:

``` r
gapminder_new <- gapminder %>%
  filter(year == 2015) %>%
  # dummy variable for fertility rate above replacement rate
  mutate(fertility_above_2 = (fertility > 2.1))
```

``` r
table(gapminder_new$fertility_above_2)
```

```
##
## FALSE  TRUE
##    80   104
```

``` r
table(gapminder_new$continent, gapminder_new$fertility_above_2)
```

```
##
##            FALSE TRUE
##   Africa       2   49
##   Americas    15   20
##   Asia        20   27
##   Europe      39    0
##   Oceania      4    8
```

---

# Cross-Tabulating Data: A joint distribution!

The probability that `\(X = x\)` and `\(Y=y\)` is

`$$\Pr(X=x,Y=y)$$`

.pull-left[
``` r
# table with absolute numbers
abstab = table(gapminder_new$continent,
               gapminder_new$fertility_above_2)
(jd <- round(prop.table(abstab), 2))  # proportions in each cell
```

```
##
##            FALSE TRUE
##   Africa    0.01 0.27
##   Americas  0.08 0.11
##   Asia      0.11 0.15
##   Europe    0.21 0.00
##   Oceania   0.02 0.04
```

* check: a proper probability distribution?

``` r
sum(jd)  # sums up all elements
```

```
## [1] 1
```
]

--

.pull-right[
* for instance, `\(\Pr(\text{Asia},\text{TRUE}) = 0.1467\)` (rounded to `0.15` in the table).
* You can see 👀 that this is just the **share** of cases in each bin!
]
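Under the hood, `prop.table` simply divides each cell count by the grand total; a quick sketch of the equivalence:

``` r
round(abstab / sum(abstab), 2)  # same as round(prop.table(abstab), 2)
```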
---

# Tabulating Data: marginal distributions!

.pull-left[
``` r
# proportions by row
(m1 = prop.table(abstab, margin = 1))
```

```
##
##            FALSE      TRUE
##   Africa   0.03921569 0.96078431
##   Americas 0.42857143 0.57142857
##   Asia     0.42553191 0.57446809
##   Europe   1.00000000 0.00000000
##   Oceania  0.33333333 0.66666667
```

check?

``` r
rowSums(m1)
```

```
##   Africa Americas     Asia   Europe  Oceania
##        1        1        1        1        1
```
]

--

.pull-right[
``` r
# proportions by column
(m2 = prop.table(abstab, margin = 2))
```

```
##
##            FALSE      TRUE
##   Africa   0.02500000 0.47115385
##   Americas 0.18750000 0.19230769
##   Asia     0.25000000 0.25961538
##   Europe   0.48750000 0.00000000
##   Oceania  0.05000000 0.07692308
```

check?

``` r
colSums(m2)
```

```
## FALSE  TRUE
##     1     1
```
]

---

# Joint Distributions

The probability that `\(X = x\)` and `\(Y=y\)` is

`$$\Pr(X=x,Y=y)$$`

```
##                     Rain (X=0) No Rain (X=1)
## Long Commute (Y=0)        0.15          0.07
## Short Commute (Y=1)       0.15          0.63
```

for instance,

* `\(\Pr(Y = 1, X = 0) = 0.15\)`

---

# Marginal Distributions

`$$\Pr(Y = y) = \sum_{i=1}^k \Pr(X=x_i, Y = y)$$`

```
##                     Rain (X=0) No Rain (X=1)
## Long Commute (Y=0)        0.15          0.07
## Short Commute (Y=1)       0.15          0.63
```

* The probability of a long commute, *irrespective of rain*, is 0.15 + 0.07 = 0.22.
* The probability of rain, *irrespective of the length of the commute*, is 0.15 + 0.15 = 0.3.
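In `R`, one hypothetical way to store this table as a matrix (typed in by hand here to mirror the numbers above) and recover both marginals as row and column sums:

``` r
r <- matrix(c(0.15, 0.15, 0.07, 0.63), nrow = 2,
            dimnames = list(c("Long Commute (Y=0)", "Short Commute (Y=1)"),
                            c("Rain (X=0)", "No Rain (X=1)")))
rowSums(r)  # marginal distribution of Y
colSums(r)  # marginal distribution of X
```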
---
class: inverse

# Task 4: Computing Marginal Distributions

⏳ 05:00
.pull-left[
``` r
r
```

```
##                     Rain (X=0) No Rain (X=1)
## Long Commute (Y=0)        0.15          0.07
## Short Commute (Y=1)       0.15          0.63
```
]

.pull-right[
1. Compute the marginal distribution of `\(X\)`.
1. Compute the marginal distribution of `\(Y\)`.
]

---

# Conditional Distributions

* A conditional distribution is `\(\Pr(X=x|Y=y) = \frac{\Pr(X=x,Y=y)}{\Pr(Y=y)}\)`.
* We **fix** the value of one variable, and look at the resulting distribution of the other variable.
* Notice that the result must be a *valid* probability distribution, i.e. it must sum to 1.
* We compute the probability as before, but restrict attention to the cases where the condition we impose is true.
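For example, conditioning on rain (`\(X=0\)`) in the commute table - a sketch in `R`, re-using the hand-typed matrix `r` from before:

``` r
# Pr(Y | X=0): take the "Rain" column, divide by the column total Pr(X=0)
(pr_y_given_rain <- r[, "Rain (X=0)"] / sum(r[, "Rain (X=0)"]))
sum(pr_y_given_rain)  # a valid distribution: sums to 1
```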
---
class: inverse

# Task 5: Compute a Conditional Distribution

⏳ 03:00
Consider

```
##                     Rain (X=0) No Rain (X=1)
## Long Commute (Y=0)        0.15          0.07
## Short Commute (Y=1)       0.15          0.63
```

* Suppose we know that it does not rain, i.e. `\(x = 1\)`. What is the distribution of `\(Y\)` *given* this knowledge, i.e. what is the distribution `\(\Pr(Y|X=1)\)`?

---
class: inverse

# Task 6: Computing More Conditional Distributions
⏳ 05:00
.pull-left[
``` r
jd
```

```
##
##            FALSE TRUE
##   Africa    0.01 0.27
##   Americas  0.08 0.11
##   Asia      0.11 0.15
##   Europe    0.21 0.00
##   Oceania   0.02 0.04
```
]

.pull-right[
1. Compute the distribution of continents conditional on low fertility.
2. Compute the distribution of high/low fertility conditional on being in Asia.
3. Compute the marginal distribution of high/low fertility.
]

---

# Conditional Expectation

We define

`$$E(Y|X = x) = \sum_{i=1}^k y_i \Pr(Y = y_i|X=x)$$`

* Notice the `\(y_i\)` there!
* Basically, this is the expected value of `\(Y\)`, but **under the assumption that** `\(X\)` takes on the value `\(x\)`.
* We *fix* the joint distribution of `\((Y,X)\)` at a certain value `\(X=x\)`.

---
class: inverse

# Task 7: Conditional Expectation

*(Joint distribution table of `\(M\)` and `\(A\)` from Stock and Watson, Chapter 2 - shown in class.)*

1. Compute `\(E(M)\)`
2. Compute `\(E(M|A = 0)\)`

---

# Law of Iterated Expectations

* We can decompose the expected value of `\(Y\)` into subgroups, see what fraction of total probability each group makes up, and compute their weighted average.

--

```
##    name sex height
## 1 Peter   M   1.90
## 2 James   M   1.75
## 3  Mary   F   1.68
## 4  John   M   1.86
## 5   Amy   F   1.59
```

* The average height here is `\((1.9 + 1.75 + 1.68 + 1.86 + 1.59) / 5 = 1.756\)`.

---

# Law of Iterated Expectations

> The average height here is 1.756

* What is the average for each **group** by sex?

```
## # A tibble: 2 × 3
##   sex       n mean_height
##   <chr> <int>       <dbl>
## 1 F         2        1.64
## 2 M         3        1.84
```

--

* Can we recover the unconditional mean from this information? Yes we can!
* We need again *weights* and *values*, like for the standard `\(E(Y)\)`:

``` r
weights = c(2/5, 3/5)
values = c(1.635, 1.836667)
sum(weights * values)  # 2/5 * 1.635 + 3/5 * 1.836667
```

```
## [1] 1.756
```

---

# Law of Iterated Expectations

* What did we just do?
* We established that we can get the unconditional mean of `\(Y\)` from the *conditional* one, if we know the *weights* we have to give to each subgroup. More precisely:

.pull-left[
1. We know `\(E(Y|X)\)`:
  1. `\(E(Y|\text{male})= 1.836667\)`
  2. `\(E(Y|\text{female})= 1.635\)`
2. We know the proportion of both groups:
  1. `\(\Pr(\text{male})= 3/5\)`
  2. `\(\Pr(\text{female})= 2/5\)`
]

.pull-right[
Therefore, we can recompose `\(E(Y)\)`:

`$$\begin{align} E(Y) &= \sum_{i=1}^k E(Y|X = x_i) \Pr(X = x_i)\\ &= E_X\left[ E(Y|X) \right] \end{align}$$`
]

---

# Law of Iterated Expectations

* One particular version of this comes up often in econometrics:

`$$E[Y|X] = 0 \Rightarrow E[Y] = 0$$`

* Why? Well, now you know that `\(E[Y|X=x] = 0\)` for every possible value `\(x\)`.
* Compute `\(E(Y) = \sum_{i=1}^k 0 \times \Pr(X = x_i)\)`!

---

# Conditional Expectation and Prediction Errors

* We said (when talking about variance) that the squared error is a sensible criterion.
* Suppose we want to design a **predictor** `\(g(X)\)` which can **predict** `\(Y\)` reasonably well. One criterion for that might be the *average squared error of prediction*.

--

* In other words: restricting attention to constant predictors `\(m\)`, what is the solution to this problem?

`$$\min_m E\left[ (Y - m)^2 \right]$$`

---

# Independence

* We call 2 RVs *independent* if knowing the value of one of them is not informative about the value of the other.
* The precise definition of this statement is

`$$\Pr(X=x | Y = y) = \Pr(X=x)$$`

--

* or:

`$$\Pr(X=x , Y = y) = \Pr(X=x) \Pr(Y=y)$$`
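A small simulation sketch of independence: two separately rolled dice are independent, so their empirical joint frequencies come out close to the product of the marginals:

``` r
set.seed(42)
n <- 1e5
die1 <- sample(1:6, n, replace = TRUE)  # first die
die2 <- sample(1:6, n, replace = TRUE)  # second die, rolled independently
joint <- prop.table(table(die1, die2))  # empirical joint distribution
joint["3", "5"]                         # close to ...
(1/6) * (1/6)                           # ... Pr(X=3) * Pr(Y=5) = 1/36
```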
---

# Covariance

* Covariance tells us how two RVs co-vary:

`$$\begin{align} cov(X,Y) = \sigma_{XY} &= E\left[ (X - \mu_X) (Y- \mu_Y) \right] \\ &= \sum_{i=1}^k \sum_{j=1}^m (x_i - \mu_X) (y_j- \mu_Y) \Pr(X = x_i, Y = y_j) \end{align}$$`

* Notice how this object tells us how large deviations of `\(X\)` from its mean *jointly* occur with such deviations of `\(Y\)`.
* This confounds the scales of `\(X\)` and `\(Y\)`, however, so it is not easy to interpret.

---

# Correlation

* Correlation fixes this issue:

`$$corr(X,Y) = \frac{cov(X,Y)}{\sqrt{var(X)var(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$`

* Correlation is always between -1 and 1.
* `\(corr(X,Y) = 0\)` means they are *uncorrelated*.

---

# Correlation and Conditional Mean

* If the conditional mean of `\(Y\)` does **not** depend on `\(X\)`, then they must be uncorrelated. That is,

`$$E(Y|X) = \mu_Y \Rightarrow cov(X,Y) = 0 \text{ and } corr(X,Y) = 0$$`

---
class: inverse

# Task 8: Correlation and Conditional Mean

Show this result!

`$$E(Y|X) = \mu_Y \Rightarrow cov(X,Y) = 0 \text{ and } corr(X,Y) = 0$$`

---

# Random Sampling (SW KC 2.5)

* Suppose we **randomly** select `\(n\)` objects from a large population.
* `\(Y_i\)` is the value of the `\(i\)`-th object chosen.
* Each object `\(i\)` has the same chance of getting drawn, which means that the *distribution of the `\(Y\)`s is the same*, and object `\(i\)` being drawn does not depend on object `\(j\)` being drawn.
* **Same** means **identical**. So, the `\(Y\)`s are ***identically and independently*** distributed (i.i.d.).

---

# Sampling Distribution of the Sample Average

* We already defined the **sample average**:

`$$\bar{y} = \frac{1}{n}(y_1 + y_2 + \dots + y_n) = \frac{1}{n}\sum_{i=1}^n y_i$$`

--

* A core concept is to realize that the value of `\(\bar{y}\)` is itself ***random***. Just imagine drawing a second sample of `\(y\)`'s!

--

* Because `\(\bar{y}\)` varies with each sample, the object `\(\bar{y}\)` has a **sampling distribution**.
* Every well-defined distribution has a **mean** and a **variance** - same for this sampling distribution. Let's figure it out!

---

# Mean of the Sample Average

* The mean of the sample average (*the mean of the mean!*) is just

`$$E(\bar{y}) = E\left[\frac{1}{n}\sum_{i=1}^n y_i\right] = \frac{1}{n}\sum_{i=1}^n E[y_i] = \frac{1}{n} n \mu_Y = \mu_Y$$`

--

* So that's good news. The expected value is equal to the population mean.
* This statement is important. We say that `\(\bar{y}\)` is an ***unbiased estimator*** for `\(E(y)\)`.

---
class: inverse

# Task 9: Variance of the Sample Average

* Similarly,

`$$var(\bar{y}) = var\left( \frac{1}{n}\sum_{i=1}^n y_i \right) = \frac{\sigma_Y^2}{n}$$`

* Show this result!

---

# Properties of the Sample Mean

The following facts **always** hold for the sample mean:

1. `\(E(\bar{y}) = \mu_Y\)`
2. `\(var(\bar{y}) = \frac{\sigma_Y^2}{n}\)`
3. `\(std.dev(\bar{y}) = \frac{\sigma_Y}{\sqrt{n}}\)`

--

Those are true ***whenever `\(y\)` is i.i.d.*** In particular, we do **not** need the `\(y\)` to be normally distributed!

---

# Sampling from a Normal Distribution

1. The sum of normally distributed RVs is itself normally distributed.

--

2. Suppose we draw `\(Y_i\)` i.i.d. from a normal distribution.

--

3. We already know the mean and variance of the sample mean `\(\bar{y}\)` in this case (previous slide).

--

4. In this **special case** we actually ***know*** the full distribution of `\(\bar{y}\)`, which has to be *normal* itself (see point 1). That is,

`$$\bar{y} \sim \mathcal{N}\left(\mu_Y, \frac{\sigma_Y^2}{n} \right)$$`
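A quick simulation sketch of this result (`mu`, `sigma`, and `n` below are arbitrary illustrative choices):

``` r
set.seed(45)
mu <- 2; sigma <- 3; n <- 25  # illustrative population values
ybars <- replicate(1e4, mean(rnorm(n, mean = mu, sd = sigma)))
mean(ybars)  # close to mu = 2
var(ybars)   # close to sigma^2 / n = 9 / 25 = 0.36
```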
--

What if we **don't know** the distribution of our `\(Y_i\)`?

---

# **NOT** Sampling from a Normal Distribution

* This is clearly the empirically relevant case. We almost **never** know the precise distribution of our data.
* Let us repeat that:

> The following facts **always** hold for the sample mean:
>
> 1. `\(E(\bar{y}) = \mu_Y\)`
> 2. `\(var(\bar{y}) = \frac{\sigma_Y^2}{n}\)`
> 3. `\(std.dev(\bar{y}) = \frac{\sigma_Y}{\sqrt{n}}\)`
>
> Those are true ***whenever `\(y\)` is i.i.d.*** In particular, we do **not** need the `\(y\)` to be normally distributed!

---

# Asymptotic Distributions

* If we don't know the *exact* distribution of the data, we can still derive an **approximate** distribution.
* The approximation requires that we have a lot of data, i.e. that `\(n\)` is large.
* We use 2 key tools for this: the Law of Large Numbers and the Central Limit Theorem.

---

# The Law of Large Numbers

* We say that a sequence of RVs `\(z_n\)` **converges in probability** to the constant `\(a\)` if, for any `\(c > 0\)`, the probability that `\(z_n\)` comes to lie in the interval `\(\left[a - c, a + c\right]\)` goes to 1 as `\(n\)` goes to infinity.
* We would write that as

`$$z_n \overset{p}{\to} a$$`

--

The Law of Large Numbers says that *if* the `\(Y_i\)` are i.i.d. with `\(E(Y)=\mu_Y\)`, and if large outliers are very unlikely (i.e. `\(var(Y) = \sigma_Y^2 < \infty\)`), then the sample mean converges in probability to the population mean:

`$$\bar{y} \overset{p}{\to} \mu_Y$$`

* This property (`\(\bar{y} \overset{p}{\to} \mu_Y\)`) is called ***consistency***: `\(\bar{y}\)` is **consistent** for `\(\mu_Y\)`.

---

# Demo

``` r
set.seed(45)
n = 2
p = 0.78
s = sample(c(0,1), size = n, prob = c(1-p, p), replace = TRUE)
print(s)
```

```
## [1] 1 1
```

``` r
mean(s) # ybar
```

```
## [1] 1
```

``` r
n = 10
s = sample(c(0,1), size = n, prob = c(1-p, p), replace = TRUE)
print(s)
```

```
## [1] 1 1 1 1 1 1 1 1 1 0
```

``` r
mean(s) # ybar
```

```
## [1] 0.9
```

---

# The Central Limit Theorem

* This important theorem says that, under relatively mild conditions, the distribution of `\(\bar{y}\)` is **normal** when `\(n\)` is large.
* We already knew from above that the mean and variance of `\(\bar{y}\)` are `\(\mu_Y\)` and `\(\frac{\sigma_Y^2}{n}\)` - *whatever* the underlying distribution of `\(y\)` was.

--

* But that `\(\bar{y}\)` converges to a normal distribution when `\(n\)` is large, that is new!

--

* This holds for ***any*** distribution that `\(Y\)` might have!

---

# Demo

``` r
library(shinyCLT)
CLT()
```
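---

# Demo: The CLT by Simulation

If the `shinyCLT` app is not at hand, here is a minimal base-`R` sketch of the same idea (the exponential distribution is just an arbitrary, skewed example):

``` r
set.seed(45)
n <- 50                                           # sample size
ybars <- replicate(1e4, mean(rexp(n, rate = 1)))  # 10,000 sample means
hist(ybars, breaks = 50, freq = FALSE)            # looks bell-shaped ...
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)),       # ... and close to the
      add = TRUE, col = "red")                    # CLT's normal prediction
```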
---
class: inverse

# Task 10: Summarising data

⏳ 10:00
1. Compute the mean of GDP in 2011 and assign it to an object `mean`. You should exclude missing values. (*Hint: read the help for `mean` to see how to remove `NA`s.*)
1. Compute the median of GDP in 2011 and assign it to an object `median`. Again, you should exclude missing values. Is it greater or smaller than the average?
1. Create a density plot of GDP in 2011 using `geom_density`. A density plot is a way of representing the distribution of a numeric variable. Add the following code to your plot to show the median and mean as vertical lines. What do you observe?
   `geom_vline(xintercept = as.numeric(mean), colour = "red") +` <br> `geom_vline(xintercept = as.numeric(median), colour = "orange")`
1. Compute the correlation between fertility and infant mortality in 2015. To drop `NA`s in either variable, set the argument `use` to `"pairwise.complete.obs"` in your `cor()` function. Is this correlation consistent with the graph you produced in Task 3?

---
class: title-slide-final, middle
background-image: url(../img/logo/esomas.png)
background-size: 250px
background-position: 9% 19%

# That's all for Probability!

| | |
| :--- | :--- |
| <a href="https://github.com/floswald/Econometrics-Slides">.ScPored[<i class="fa fa-link fa-fw"></i>]</a> | Slides |
| <a href="https://floswald.github.io">.ScPored[<i class="fa fa-link fa-fw"></i>]</a> | My Homepage |
| <a href="https://scpoecon.github.io/Econometrics/">.ScPored[<i class="fa fa-github fa-fw"></i>]</a> | Book |