Slides.knit

class: center, inverse, middle

.pull-left-wide {
  float: left;
  width: 66%;
}
.pull-right-wide {
  float: right;
  width: 66%;
}
.pull-right-wide ~ p {
  clear: both;
}

.pull-left-narrow {
  float: left;
  width: 30%;
}
.pull-right-narrow {
  float: right;
  width: 30%;
}

.tiny123 {
  font-size: 0.40em;
}

.small123 {
  font-size: 0.80em;
}

.large123 {
  font-size: 2em;
}

.red {
  color: red
}

.orange {
  color: orange
}

.green {
  color: green
}
</style>

# Statistics
## Estimating the mean using a simple random sample
### (Chapter 10)

### Christian Vedel,<br>Department of Economics<br>University of Southern Denmark

### Email: [christian-vs@sam.sdu.dk](mailto:christian-vs@sam.sdu.dk)

### Updated 2026-04-17

---
class: middle
# Today's lecture

.pull-left-wide[
**Estimating the population mean from a sample, and understanding the properties that make an estimator reliable.**

- **Section 1:** Simple random sampling
- **Section 2:** Constructing an estimator of the mean value
- **Section 3:** Properties of estimators
- **Section 4:** The distribution of the estimator `$\bar{X}$`
]

.pull-right-narrow[
![Trees](Figures/Trees1.jpg)
]

---
# What are we building towards?

.pull-left-wide[
- Suppose you compute the sample average wage from 800 workers: `$\bar{x} = 47{,}200$` DKK
- That is just a number. The real question is: **what does it tell us about the world?**
]

.pull-left-wide[
- Is the true average wage above 45,000 DKK? Could the gap between men and women be zero and we just got unlucky with our sample?
- To answer these questions we need to know **how much to trust our estimate** — i.e. how the estimator `$\bar{X}$` behaves across all possible samples
]

.pull-left-wide[
- This lecture builds the foundation. Once we know the distribution of `$\bar{X}$`, we can:
  - Build a **confidence interval**: a range of plausible values for the true mean
  - Run a **hypothesis test**: decide whether an effect is real or just noise
  - Do **inference**: draw conclusions about populations from samples
]

.pull-right-narrow[
*We will 'invent' an estimator of the mean: the average of the sample. And then we will study its properties. This is a common task in econometrics.*
]

---
class: inverse, middle, center
# Simple random sampling

---
# Descriptive measures in a sample

.pull-left-wide[
- Often, we want to derive information from a sample about the unknown distribution in the population
]

.pull-left-wide[
- There are various ways to characterize the population distribution, but it is common to use a **descriptive measure**
]

.pull-left-wide[
- The question is: how can we use the information in the sample to infer something about the value of a descriptive measure in the population?
]

---
# Simple random sample

.pull-left-wide[
> A **simple random sample** is a sample `$(X_1, X_2, \ldots, X_n)$` such that:
>
> 1. `$X_1$`, `$X_2$`, `$\ldots$`, `$X_n$` are statistically independent
> 2. `$X_1$`, `$X_2$`, `$\ldots$`, `$X_n$` have the same marginal distribution `$f_X(x)$`
]

.pull-left-wide[
- The marginal distribution `$f_X(x)$` is the distribution of interest (in the population)
]

.pull-left-wide[
- The second condition assures that the simple random sample is a representative sample for `$X$`
]

.pull-right-narrow[
> .red[If data is generated according to the rules of being above, we say that it is **i.i.d.** (independent and identically distributed)]
]

---
# Distribution of a simple random sample

.pull-left-wide[
- It is easy to determine the joint distribution of the sample using the two conditions in the definition:
`$$f(x_1, \ldots, x_n) = f_X(x_1) \cdot f_X(x_2) \cdot \ldots \cdot f_X(x_n)$$`
]

.pull-left-wide[
- So, if we knew the distribution of `$X$` in the population, we would be able to determine the distribution of the simple random sample
]

---
# Distribution of a simple random sample

.pull-left-wide[
- For example, suppose `$X$` follows a Bernoulli distribution with `$p = 0.7$`, so `$f_X(0) = 0.3$` and `$f_X(1) = 0.7$`, and `$n = 3$`
]

.pull-left-wide[
- Then we can calculate `$f(x_1, x_2, x_3)$` for any realized sample:

`$$f(0, 0, 0) = f_X(0) \cdot f_X(0) \cdot f_X(0) = 0.3^3 = 0.027$$`

`$$f(0, 0, 1) = f_X(0) \cdot f_X(0) \cdot f_X(1) = 0.3^2 \cdot 0.7 = 0.063$$`

`$$\vdots$$`

`$$f(1, 1, 1) = f_X(1) \cdot f_X(1) \cdot f_X(1) = 0.7^3 = 0.343$$`
]

.pull-right-narrow[
.red[**But, remember:** In practice, we do not know the distribution of `$X$` in the population. We have to infer rather than deduce.]
]

---
# .red[Raise your hand 1: Simple random sampling]

.pull-left-wide[
**Q1.** A researcher surveys 200 workers by asking each manager to nominate two employees. Is this a simple random sample?

- **A)** Yes — the sample size is large enough to be representative
- **B)** No — nominations by managers likely violate independence and identical distribution; certain workers are more likely to be chosen
- **C)** Yes — as long as each worker could in principle be nominated
]

.pull-left-wide[
**Q2.** We draw a simple random sample `$(X_1, X_2, X_3)$`. A colleague says: "Since the draws are independent, `$f(x_1, x_2, x_3)$` tells us nothing about the population distribution `$f_X$`." Is this right?

- **A)** Yes — independence means the joint distribution carries no information about the marginal
- **B)** Yes — we need a much larger sample before `$f_X$` can be inferred
- **C)** No — because `$f(x_1, x_2, x_3) = f_X(x_1) \cdot f_X(x_2) \cdot f_X(x_3)$`, the joint distribution is built directly from `$f_X$`
]

---
class: inverse, middle, center
# Constructing an estimator of the mean value

---
# Estimating the mean value

.pull-left-wide[
- Recall that each element of the sample is a random variable following the same distribution
- In other words, each element of the sample has the same mean value
]

.pull-left-wide[
- We could use each realized value as an estimate (approximation) of the mean
]

.pull-left-wide[
- This gives us `$n$` different estimates of one value `$\Rightarrow$` we should combine them into one single value
]

---
# The analogy principle

.red[*This is the central principle of estimation. You meet it everywhere.*]

.pull-left-wide[
- The **analogy principle** of estimation says: use the formulas we would use if we knew the population distribution, but replace all unknown population quantities with the corresponding sample quantities
]

.pull-left-wide[
- If we knew the distribution of `$X$`, taking values `$x_1, \ldots, x_N$` with probabilities `$f(x_1), \ldots, f(x_N)$`, its mean value is:
`$$\mu = \sum_{i=1}^N x_i f(x_i) = x_1 f(x_1) + x_2 f(x_2) + \ldots + x_N f(x_N)$$`
]

---
# The sample average

.pull-left-wide[
- In practice, we have a sample of `$n$` values `$x_1, x_2, \ldots, x_n$`, each occurring with probability `$1/n$`
]

.pull-right-narrow[
.red[
**Recall (Lecture 3):**

`$X_i$` is a **random variable** — it could take many values before we observe the sample

`$x_i$` is a **realized value** — the specific number we actually observed

Uppercase = random; lowercase = realized
]
]

.pull-left-wide[
- The analogy principle gives us the **sample average** (realized value):
`$$\bar{x} = x_1 \cdot \frac{1}{n} + x_2 \cdot \frac{1}{n} + \cdots + x_n \cdot \frac{1}{n}$$`
`$$= \sum_{i=1}^n x_i \cdot \frac{1}{n} = \frac{1}{n} \cdot \sum_{i=1}^n x_i$$`
]

.pull-left-wide[
- Since this can be calculated for any realized sample, we define the **sample average estimator** as:
`$$\bar{X} = \frac{1}{n} \cdot \sum_{i=1}^n X_i$$`
]

---
# Estimate vs estimator

.pull-left-wide[
- `$\bar{X}$` is a *random variable*, while `$\bar{x}$` is a realized value of `$\bar{X}$`
]

.pull-left-wide[
- `$\bar{X}$` is an **estimator**: a random variable that is a function of the sample and aims to replicate a population parameter
]

.pull-left-wide[
- `$\bar{x}$` is an **estimate** (or **estimated value**): a realized value of the estimator, based on a particular sample
- The estimate is our guess of the true value of the parameter
]

.pull-left-wide[
#### Why the distinction?
]

.pull-left-wide[
- We can now begin to think of questions of the form `$P(\bar{X} = \bar{x})$`, which is the probability that the estimator takes a particular value
- This is the basis for inference.
]

.pull-right-narrow[
- Notice any problems here? `$\bar{X}$` is a **continuous** random variable.
- We usually study questions of the form `$P(\bar{X} \leq 5)$`
]

---
# .red[Practice 1: The sample average]

.pull-left-wide[
Suppose we observe the sample: `$x_1 = 3,\ x_2 = 7,\ x_3 = 2,\ x_4 = 8,\ x_5 = 5$`.

1. Calculate the sample average `$\bar{x}$`
2. We now draw a new sample of size `$n = 100$` from a population with `$\mu = 5$` and `$\sigma^2 = 4$`. What are `$E(\bar{X})$` and `$Var(\bar{X})$`?

.small123[
*Hint:* `$E(\bar{X}) = \dfrac{1}{n}\displaystyle\sum_{i=1}^n E(X_i)$` and `$Var(\bar{X}) = \dfrac{1}{n^2}\displaystyle\sum_{i=1}^n Var(X_i)$`
]
]

---
class: inverse, middle, center
# Properties of estimators

---
# Roadmap for the remainder of the lecture
.pull-left-wide[
  - We have now used the analogy principle to construct an estimator of the mean value: the sample average `$\bar{X}$`
  - But how good is this estimator? How much can we trust it?
  - ... and can we say anything about its probability distribution function in general?
]

.pull-left-wide[
  - We will derive the classical properties of a 'good' estimator: **unbiasedness**, **efficiency**, and **consistency**
  - One of these have a special name for the 'average' as an estimator: the **Law of Large Numbers**
  - We will then proceed to derive the *Central Limit Theorem*, which gives us the distribution of `$\bar{X}$` in large samples
]

---
# Properties of estimators

.pull-left-wide[
- We can construct infinitely many estimators, but we want a *good* one
- We need to define what makes an estimator *good*, given that we do not know the true value of the parameter
]

.pull-left-wide[
- We are generally interested in three properties:
  - **Unbiasedness:** Our guess is correct on average across all possible samples
  - **Efficiency:** We leave no information from the sample unused; the estimator is as precise as possible
  - **Consistency:** The estimator gets closer to the true parameter as the sample size grows — for the sample average this is known as the **Law of Large Numbers**
]

---
# Unbiasedness

.pull-left-wide[
> An estimator is **unbiased** if its expected value equals the parameter it aims to approximate
]

.pull-left[
- Suppose we could sample many times and calculate the estimator in each sample
- We would want these values to be "around" the true parameter — on average equal to it
]

.pull-left-wide[
#### Counter example
1. We take 100 samples from a normal distribution and use an estimator
2. Repeat this process 1,000 times to get a distribution of estimates
3. The vertical solid line is the true mean (5), while the dashed red line is the average of the estimates (5.05)  
*This estimator is **not** unbiased but close*
]

.pull-right-narrow[

![Unbiased](Figures/unbiased-fig.png)
]

---
# Unbiasedness of sample average

.pull-left-wide[
`$$\begin{align}
E(\bar{X}) &= E\!\left[\frac{1}{n}\sum_{i=1}^n X_i\right] \\[6pt]
&= \frac{1}{n} \cdot E\!\left[\sum_{i=1}^n X_i\right] && \text{(constant out)} \\[6pt]
&= \frac{1}{n} \cdot \sum_{i=1}^n E(X_i) && \text{(sum of expectations)} \\[6pt]
&= \frac{1}{n} \cdot n\mu = \mu && \text{(i.i.d.: } E(X_i)=\mu\text{)}
\end{align}$$`
]

.pull-right-narrow[
.red[
**Intuition:** Some draws land above `$\mu$`, some below — they cancel out on average. No systematic over- or underestimation.
]
]

---
# Efficiency

.pull-left-wide[
- In practice, we cannot resample many times — we usually only have one realized sample
]

.pull-left-wide[
> An estimator is more **efficient** than another if its variance is smaller
]

.pull-left[
- Since the estimator is a random variable, its realized value can be far from the parameter even if unbiased
- We want its variance to be as small as possible (a *precise* estimator)
]

.pull-right[

![Efficiency](Figures/efficiency-fig.png)
]

---
# Variance of sample average

.pull-left-wide[
- Because sample elements are independent, we can calculate the variance directly:

`$$Var(\bar{X}) = Var \left[ \frac{1}{n} \sum_{i=1}^n X_i \right] = \frac{1}{n^2} \cdot \sum_{i=1}^n Var(X_i)$$`
]

.pull-left-wide[
`$$= \frac{1}{n^2} \cdot n \cdot \sigma^2 = \frac{\sigma^2}{n}$$`

- The variance of `$\bar{X}$` decreases with sample size: the larger the sample, the more precise the estimator
]

.pull-left-wide[
- *Next semester you will learn about the 'Best Linear Unbiased Estimator' estimator.*
  - 'Best' means lowest possible variance
  - Turns out the 'average' is the best linear unbiased estimator of the mean value
]

---
# Consistency

.pull-left-wide[
> An estimator is **consistent** if it is unbiased and its variance goes to zero as `$n \to \infty$`
]

.pull-left-wide[
- As sample size grows toward infinity, the distribution of the estimator collapses on a single value
- If that value equals the true parameter, the estimator is consistent
]

.pull-left[
- This implies that a consistent estimator will have realized values very close to the true parameter in a "sufficiently large" sample
]

.pull-right[

![Consistency](Figures/consistency-fig.png)
]

---
# Consistency of sample average

.pull-left-wide[
- Recall that the sample average is unbiased: `$E(\bar{X}) = \mu$`
]

.pull-left-wide[
- Its variance goes to zero as sample size grows:
`$$\lim_{n \rightarrow \infty} Var(\bar{X}) = \lim_{n \rightarrow \infty} \frac{\sigma^2}{n} = 0$$`
]

.pull-left-wide[
- Therefore, the sample average is a **consistent** estimator of `$\mu$`
]

---
# Estimators: mixing and matching properties

| Estimator | Unbiased? | Efficient? | Consistent? |
|:---|:---:|:---:|:---:|
| OLS (Gauss-Markov assumptions hold) | ✓ | ✓ | ✓ |
| OLS (heteroskedastic errors) | ✓ | ✗ | ✓ |
| IV / 2SLS | ✗ | ✗ | ✓ |
| OLS (endogeneity) | ✗ | ✗ | ✗ |
| First observation `$X_1$` only | ✓ | ✗ | ✗ |
| Self-selected survey (e.g. opt-in online poll) | ✗ | ✗ | ✗ |

.orange[
.pull-left-wide[
.small123[
*You will meet all of these estimators in future courses. The point now is that the properties you are learning today — unbiasedness, efficiency, consistency — are exactly what practitioners use to evaluate and choose between estimators.*

- **OLS** is the workhorse of econometrics — its position in the table depends entirely on whether the assumptions hold
- **IV / 2SLS** is the go-to fix when OLS fails due to endogeneity — it sacrifices some efficiency to restore consistency
- **Convenience samples** are a warning: a biased and inconsistent estimator can look convincing but mislead systematically
]
]
]

---
# The Law of Large Numbers

.pull-left-wide[
> **Law of Large Numbers:** As the sample size `$n \to \infty$`, the sample average `$\bar{X}$` converges to the true population mean `$\mu$`
]

.pull-left-wide[
**Example 1 — Coin flips.** A fair coin has `$\mu = 0.5$`. With `$n = 10$` flips the proportion of heads might be 0.3 or 0.7. With `$n = 10{,}000$` flips it will be very close to 0.5. The randomness averages out.
]

.pull-left-wide[
**Example 2 — Wages.** Survey 5 workers: the sample average wage could easily be far from the true national average. Survey 50,000 workers: the sample average will be extremely close to the true average — any one unusual wage is drowned out by the rest.
]

.pull-left-wide[
**Example 3 — Dice.** Roll a die once: you get a number from 1 to 6. The true mean is 3.5. Roll it 100,000 times and average the results: you will get almost exactly 3.5.
]

---
# .red[Practice 2: Properties of estimators]

.pull-left-wide[
Consider `$\tilde{X} = X_1$` (just the first observation) as an estimator of `$\mu$`.

1. Is `$\tilde{X}$` unbiased? Show your work.
2. Compare `$Var(\tilde{X})$` to `$Var(\bar{X})$` for `$n > 1$`. Which is more efficient?
3. Is `$\tilde{X}$` consistent? Why or why not?
]

---
# .red[Raise your hand 2: Estimator properties]

.pull-left-wide[
**Q1.** The sample average `$\bar{X}$` is unbiased for `$\mu$`. Define `$\tilde{X} = 2\bar{X}$`. Is `$\tilde{X}$` unbiased for `$\mu$`?

- **A)** No — `$E(\tilde{X}) = 2\mu \neq \mu$`, so it is biased
- **B)** It depends on whether `$\mu > 0$`
- **C)** Yes — it is a function of the sample, so it inherits unbiasedness
]

.pull-left-wide[
**Q2.** Which statement about consistency is correct?

- **A)** Any unbiased estimator is automatically consistent
- **B)** Consistency only requires that the estimator's bias goes to zero
- **C)** An estimator can be consistent without being unbiased in finite samples
]

---
class: inverse, middle, center
# The distribution of the estimator `$\bar{X}$`

---
# Why does the shape of `$\bar{X}$` matter?

.pull-left-wide[
- We know `$E(\bar{X}) = \mu$` and `$Var(\bar{X}) = \sigma^2/n$` — but that only tells us the *centre* and *spread*
]

.pull-left-wide[
- In practice, we want to answer questions like:
  - *"A central bank surveys 500 firms. What is the probability that the true average inflation expectation across all firms exceeds 2% — enough to justify raising interest rates?"*
  - *"From 800 sampled workers, what can we say about the gender wage gap across all 3 million workers in the country?"*
- These require knowing the **shape** of the distribution of `$\bar{X}$`
]

.pull-left-wide[
- Coming up: The *Central Limit Theorem* — a result so powerful it works **regardless of what the distribution of `$X$` looks like**
]

.pull-right-narrow[
.red[**Key idea**]

*Chaos averages into order*

No matter how strange the individual outcomes are, their averages always converge to the same universal shape
]

---
# The distribution of `$\bar{X}$` if sample is normal

.red[*First a simpler result*]

.pull-left-wide[
- Suppose `$X \sim \mathcal{N}(\mu, \sigma^2)$`. Then each `$X_i \sim \mathcal{N}(\mu, \sigma^2)$`, so:
`$$\frac{X_i}{n} \sim \mathcal{N}\left(\frac{\mu}{n}, \frac{\sigma^2}{n^2} \right)$$`
]

.pull-left-wide[
- As a sum of normally-distributed random variables:
`$$\bar{X} = \frac{X_1}{n} + \frac{X_2}{n} + \ldots + \frac{X_n}{n} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n} \right)$$`
]

.pull-left-wide[
> ***Interpretation:*** *If the sample is drawn from a normal distribution, the sample average is also normally distributed.*
]

.pull-right-narrow[
.red[**But what if the sample is not normal?**]
]

---
# The CLT in action

.pull-left-narrow[
- We draw samples of size `$n = 1$`, `$n = 2$`, `$n = 5$`, `$n = 10$`, `$n = 30$`, `$n = 100$`, and `$n = 200$` from a weird distribution
- We take the average
- What is the distribution of this average across many samples?
]

.pull-right-wide[
.panelset[
.panel[.panel-name[n = 1]
.center[
![Individual draws — the raw population](Figures/clt_n1.png)
]
]
.panel[.panel-name[n = 2]
.center[
![Sample mean of 2 draws](Figures/clt_n2.png)
]
]
.panel[.panel-name[n = 5]
.center[
![Sample mean of 5 draws](Figures/clt_n5.png)
]
]
.panel[.panel-name[n = 10]
.center[
![Sample mean of 10 draws](Figures/clt_n10.png)
]
]
.panel[.panel-name[n = 30]
.center[
![Sample mean of 30 draws](Figures/clt_n30.png)
]
]
.panel[.panel-name[n = 100]
  .center[
  ![Sample mean of 100 draws](Figures/clt_n100.png)
  ]
]
.panel[.panel-name[n = 200]
.center[
![Sample mean of 200 draws](Figures/clt_n200.png)
]
]
]
]

---
# The approximate distribution of `$\bar{X}$`

.pull-left-wide[
- In practice, we will not know the distribution of `$X$` — but we have a very powerful result
]

.pull-left-wide[
> **Central limit theorem**
>
> Suppose we have a simple random sample `$X_1, X_2, \ldots, X_n$` with `$E(X_i) = \mu$` and `$Var(X_i) = \sigma^2$` for all `$i$`. As `$n \to \infty$`, the sample mean converges to a normal distribution:
>
> `$$\bar{X} \overset{a}{\sim} \mathcal{N}\left(\mu, \frac{\sigma^2}{n} \right)$$`

- This holds *regardless* of the distribution of `$X$` — it is one of the most important results in statistics
]

.pull-right-narrow[
.small123[
#### More on the CLT
- 3Blue1Brown does a good job of explaining how powerfull and widely applicable the CLT is: https://youtu.be/zeJD6dqJ5lo?si=aCQrLf-SJWl8fXyW

#### Proving the CLT
- The proof is beyond the scope of this course, but it relies on the fact that the moment generating function of `$\bar{X}$` converges to that of a normal distribution as `$n \to \infty$`
- A concise proof is given here: https://youtu.be/nWadI0_u6QU?si=zkzSdGqI9DuUHS0t 
]
]

---
# .red[Practice 3: The central limit theorem]

.pull-left-wide[
Household incomes have mean `$\mu = 450{,}000$` DKK and standard deviation `$\sigma = 120{,}000$` DKK. We draw a simple random sample of `$n = 144$` households.

1. State the approximate distribution of `$\bar{X}$`. What are its mean and variance?
2. Using the CLT, approximate `$F(460{,}000)$`
3. How does your answer change if `$n = 36$`?
]

---

# Summary

.pull-left-wide[
Central question: **How good is the 'average' as an estimator of `$\mu$`?**
- **Unbiased**: `$E(\bar{X}) = \mu$` — correct on average
- **Efficient**: Smallest variance among unbiased linear estimators
- **Consistent**: `$\bar{X} \xrightarrow{p} \mu$` as `$n \to \infty$` (**Law of Large Numbers**)
]

.pull-left-wide[
**Distribution of `$\bar{X}$` (Central Limit Theorem)**

For large `$n$`, regardless of the population distribution:
`$$\bar{X} \overset{a}{\sim} N\!\left(\mu,\, \frac{\sigma^2}{n}\right)$$`
]

.pull-left-wide[
These properties — unbiasedness, efficiency, consistency — are exactly the criteria used to evaluate estimators in future courses (OLS, IV, and beyond).
]

---
# Next time

.pull-left[
- Stratified/Cluster, Estimators
- .red[Remember, it's on Monday 12:00-14:00 in U168.]
]

.pull-right[
![Trees](Figures/Trees1.jpg)
]