Lecture 4: Confidence Intervals

L2 - Statistics

Gustave Kenedi

2026-02-02

Recap quiz

Or click here: link to Wooclap

Today’s lecture

The big question: We can compute sample statistics, but how confident should we be that they reflect the true population parameters?

Confidence intervals: a primer

Why do we need confidence intervals?

The problem with point estimates:

  • In real life we only get to take one sample from the population

  • Also, we obviously don’t know the true population parameter, that’s what we are interested in!

  • Even though it is unobserved, we know that the sampling distribution exists, and better still, we know how it behaves!

Key insight: A single point estimate tells us nothing about how precise or reliable it is!

From point estimates to confidence intervals

  • Until now, we have only computed point estimates from our samples: sample means, sample proportions, sample variances, etc.
  • We know that this sample statistic differs from the true population parameter due to sampling variation.
  • Rather than a point estimate, we could give a range of plausible values for the population parameter.
  • This is precisely what a confidence interval (CI) provides.

What is a confidence interval?

A confidence interval for \(\theta\) is an estimated interval that covers the true value of \(\theta\) with at least a given probability.

For example: if we were to take an i.i.d. sample of size \(n\) of a random variable \(X\) and compute a 95% confidence interval for the population mean \(\mathbb{E}[X]\), then with 95% probability, this interval will include \(\mathbb{E}[X]\).

Definition

A confidence interval for \(\theta\) with coverage \((1 - \alpha)\) is a random interval \(CI_{1-\alpha}(\theta)\), such that \(\mathbb{P}[\theta \in CI_{1-\alpha}(\theta)] \geq 1-\alpha\).

For any given \(\alpha \in (0,1)\), we say that the confidence level is \(100(1-\alpha)\%\) and that \(CI_{1-\alpha}(\theta)\) is a \(100(1-\alpha)\%\) confidence interval for \(\theta\).

What is a confidence interval?

Intuition: Instead of saying “the mean is 65,” we say:

“We are 95% confident that the true mean lies between 62 and 68”

Common misconception

A 95% CI does not mean “there’s a 95% probability the true parameter is in this interval.”

\(\rightarrow\) the true parameter is either in the interval or not!

The correct interpretation

What 95% confidence actually means:

If we repeated our sampling procedure many times and computed a CI each time, 95% of those intervals would contain the true parameter.
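This repeated-sampling interpretation can be checked by simulation: draw many samples, build a CI from each, and count how often the interval covers the truth. A minimal base-R sketch (the population values mu and sigma are arbitrary choices for illustration):

```r
set.seed(42)
mu <- 10; sigma <- 2; n <- 50

# For each of 10,000 samples, build a 95% z-interval and check coverage
covered <- replicate(10000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  me <- 1.96 * sigma / sqrt(n)
  (mean(x) - me <= mu) & (mu <= mean(x) + me)
})
mean(covered)  # close to 0.95
```

The individual intervals differ from sample to sample; what is guaranteed is the long-run coverage of the procedure.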

Constructing confidence intervals

There are several approaches to constructing confidence intervals:

  1. Theory: use mathematical formulas (Central Limit Theorem) to derive the sampling distribution of our point estimate under certain conditions \(\rightarrow\) what R does under the hood!
  2. Simulation: use the bootstrapping method to reconstruct the sampling distribution of our point estimate

We will cover both methods in this lecture. Let’s start with theory and then cover bootstrap.

The anatomy of a confidence interval

\[\text{CI} = {\color{#648FFF}{\text{Point Estimate}}} \pm \color{#DC267F}{\text{Margin of Error}}\] where:

\[{\color{#DC267F}{\text{Margin of Error}}} = {\color{#785EF0}{\text{Critical Value}}} \times \color{#FFB000}{\text{Standard Error}}\]

Components:

  • Point estimate: our best guess (e.g., \(\bar{x}\), \(\hat{p}\))
  • Margin of error: accounts for sampling variability
  • Critical value: depends on confidence level (e.g., 1.96 for 95%)
  • Standard error: estimated standard deviation of the sampling distribution

Where do critical values come from?

For a \(100(1-\alpha)\%\) confidence interval, we need to find \(z_{\alpha/2}\) such that the middle \(1-\alpha\) probability mass of the standard normal distribution lies between \(-z_{\alpha/2}\) and \(+z_{\alpha/2}\).

Why \(z_{\alpha/2}\)?

We want to construct an interval that captures the true parameter with probability \(1-\alpha\) (e.g., if \(\alpha = 0.05 \rightarrow 95\%\)).

The notation: \(z_p\) denotes the z-value with probability \(p\) in the upper tail:

\[P(Z > z_p) = p\]

The logic:

  1. We split the “error probability” \(\alpha\) equally between both tails of the distribution

  2. Each tail gets \(\alpha/2\) of the probability

  3. So we need \(z_{\alpha/2}\): the z-value with \(\alpha/2\) probability in the upper tail

# z_{0.025}: 2.5% in upper tail
qnorm(1 - 0.025)  # = qnorm(0.975)
[1] 1.959964
# -z_{0.025}: 2.5% in lower tail
qnorm(0.025)
[1] -1.959964

Why not \(z_{\alpha}\)?

What if we used \(z_{\alpha}\) instead of \(z_{\alpha/2}\)?

For a symmetric confidence interval around our estimate, we need to split \(\alpha\) between both tails → use \(z_{\alpha/2}\)!

Confidence level: a trade-off

Confidence Level Critical Value (\(z^*\)) Margin of Error
90% 1.645 Narrower
95% 1.960 Medium
99% 2.576 Wider
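The critical values in this table come straight from the standard normal quantile function:

```r
# Two-sided critical values for 90%, 95%, 99% confidence
alpha <- c(0.10, 0.05, 0.01)
round(qnorm(1 - alpha / 2), 3)
# [1] 1.645 1.960 2.576
```

Higher confidence requires a larger critical value, hence a wider interval.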

Today’s lecture

The big question: We can compute sample statistics, but how confident should we be that they reflect the true population parameters?

1. Confidence intervals: a primer

2. CI for a population mean: theory-based

3. CI for a proportion: theory-based

4. Bootstrap CI

5. Comparing two groups

6. Practical guidelines

CI for a population mean: theory-based

CI for a population mean (\(\sigma\) known)

Setting: \(X_1, \ldots, X_n\) i.i.d. with unknown mean \(\mu\) and known variance \(\sigma^2\)

By the CLT, for large \(n\): \[\bar{X}_n \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)\]

95% Confidence Interval for \(\mu\) (\(\sigma\) known)

\[\bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}\]

General formula for a \(100(1-\alpha)\%\) CI: \[\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\]

where \(z_{\alpha/2}\) is the \((1-\alpha/2)\) quantile of \(\mathcal{N}(0,1)\)

Why does this formula work?

We know that \(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \mathcal{N}(0,1)\), so by definition of \(z_{\alpha/2}\):

\[1 - \alpha = \mathbb{P}\left( -z_{\alpha/2} \leq \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \leq z_{\alpha/2} \right)\]

Now we solve for \(\mu\) in the inequalities:

\[= \mathbb{P}\left( -z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \leq \bar{X} - \mu \leq z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)\]

\[= \mathbb{P}\left( -\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \leq - \mu \leq -\bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)\]

\[= \mathbb{P}\left( \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)\]

We therefore have: \(\boxed{\mathbb{P}\left( \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha}\)

This shows that the interval \(\left[ \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right]\) contains \(\mu\) with probability \(1 - \alpha\). ✓

Example: student grades (\(\sigma\) known)

Problem: A sample of \(n = 36\) students has a mean grade of \(\bar{x} = 11\). Assuming grades have a known standard deviation \(\sigma = 2\), compute a 95% CI for the population mean.

Manual calculation:

\[11 \pm 1.96 \times \frac{2}{\sqrt{36}} = [10.347, 11.653]\]

Interpretation: we are 95% confident that the true mean grade in the population is between 10.35 and 11.65.

Using R directly:

# Given values
x_bar <- 11
sigma <- 2
n <- 36
alpha <- 0.05

# Critical value
z_crit <- qnorm(1 - alpha/2)

# Standard error
se <- sigma / sqrt(n)

# Margin of error
me <- z_crit * se

# Confidence interval
c(x_bar - me, x_bar + me)
[1] 10.34668 11.65332

CI for a population mean (\(\sigma\) unknown)

The realistic case: We almost never know \(\sigma\)!

Solution: Replace \(\sigma\) with the sample standard deviation \(s\)

\[s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}\]

As we saw last week, our standardized statistic follows a t-distribution with \(n-1\) degrees of freedom:

\[\frac{\bar{X}_n - \mu}{s/\sqrt{n}} \sim t_{n-1}\]

The t-distribution: a reminder

Key insight: Heavier tails → wider CIs → accounts for uncertainty in estimating \(\sigma\)

Student’s t critical values table

Critical values t for two-sided confidence intervals
df 90% (α = 0.10) 95% (α = 0.05) 99% (α = 0.01)
1 6.314 12.706 63.657
2 2.920 4.303 9.925
3 2.353 3.182 5.841
4 2.132 2.776 4.604
5 2.015 2.571 4.032
10 1.812 2.228 3.169
15 1.753 2.131 2.947
20 1.725 2.086 2.845
25 1.708 2.060 2.787
30 1.697 2.042 2.750
50 1.676 2.009 2.678
100 1.660 1.984 2.626
∞ (Normal) 1.645 1.960 2.576

In R: Use qt(1 - α/2, df) to get critical values, e.g., qt(0.975, df = 29) for 95% CI with n = 30.

CI formula with t-distribution

\(100(1-\alpha)\%\) Confidence Interval for \(\mu\) (\(\sigma\) unknown)

\[\boxed{\bar{x} \pm t_{\alpha/2, n-1} \times \frac{s}{\sqrt{n}}}\]

where \(t_{\alpha/2, n-1}\) is the \((1-\alpha/2)\) quantile of \(t_{n-1}\)

In R:

# Critical value for 95% CI with n=30
qt(0.975, df = 29)
[1] 2.04523

Note: For \(n > 30\), \(t_{n-1} \approx \mathcal{N}(0,1)\), so using \(z\) instead of \(t\) is often acceptable.
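This convergence is easy to check by comparing t and normal critical values as the degrees of freedom grow:

```r
# 97.5th-percentile critical values: t_{df} approaches z = 1.96
round(qt(0.975, df = c(5, 10, 30, 100, 1000)), 3)
# [1] 2.571 2.228 2.042 1.984 1.962
qnorm(0.975)
```

With small df the t critical value is noticeably larger, which is exactly the extra width that accounts for estimating \(\sigma\).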

Real data example: European Social Survey

Let’s use data from the European Social Survey (Round 11, 2023-2024) to estimate the 95% confidence interval for average life satisfaction in France. The data can be found here.

# Load packages and the ESS France data
# In practice, download from https://ess.sikt.no/en/country/321b06ad-1b98-4b7d-93ad-ca8a24e8788a/fr/
library(tidyverse)
ess_france <- read.csv("https://www.dropbox.com/scl/fi/25h4lezq3zuzla94ejmqq/ess_france_11ed.csv?rlkey=lf23ra0i6bvq5dzgtt2u1gy9a&dl=1")
# Summary statistics
ess_france |> 
  count(stflife)
   stflife   n
1        0  41
2        1  22
3        2  47
4        3  70
5        4  70
6        5 193
7        6 166
8        7 336
9        8 444
10       9 202
11      10 147
12      77  10
13      88  23
ess_france_clean <- ess_france |>
  filter(!stflife %in% c(77, 88))  # drop refusal (77) and don't know (88)

ess_france_clean |> 
  summarise(n = n(),
            mean_life_satisfaction = mean(stflife),
            sd_life_satis = sd(stflife))
     n mean_life_satisfaction sd_life_satis
1 1738               6.765823      2.294356

Computing the CI in R

Method 1: Manual calculation

# Extract summary stats
n <- nrow(ess_france_clean)
x_bar <- mean(ess_france_clean$stflife)
s <- sd(ess_france_clean$stflife)
se <- s / sqrt(n)

# Critical value for 95% CI
t_crit <- qt(0.975, df = n - 1)

# Confidence interval
ci_lower <- x_bar - t_crit * se
ci_upper <- x_bar + t_crit * se

c(ci_lower, ci_upper)
[1] 6.657882 6.873764

Method 2: Using t.test()

t.test(ess_france_clean$stflife, conf.level = 0.95)$conf.int
[1] 6.657882 6.873764
attr(,"conf.level")
[1] 0.95

Visualizing the result

Interpretation

Interpretation

We are 95% confident that the true average life satisfaction for adults in France is between 6.7 and 6.9 (on a 0-10 scale).

What this means practically:

  • Life satisfaction is measured on a scale from 0 (extremely dissatisfied) to 10 (extremely satisfied)
  • The average French respondent reports being fairly satisfied with life

Remember: The confidence is about the procedure, not this specific interval!

Your turn! #1

  1. Load the ESS Round 11 data for France
ess_france <- read.csv("https://www.dropbox.com/scl/fi/25h4lezq3zuzla94ejmqq/ess_france_11ed.csv?rlkey=lf23ra0i6bvq5dzgtt2u1gy9a&dl=1")
  2. Read the documentation on the Trust in politicians trstplt variable here. Drop the values corresponding to refusal, don’t know and no answer.

  3. Compute the sample mean trust in politicians, the sample standard deviation and the standard error.

  4. Find the critical value for a 90% confidence interval.

  5. Compute the 90% confidence interval manually.

  6. Use the t.test() command to compute the 90% confidence interval directly.

  7. Interpret your results.

Today’s lecture

The big question: We can compute sample statistics, but how confident should we be that they reflect the true population parameters?

1. Confidence intervals: a primer

2. CI for a population mean: theory-based

3. CI for a proportion: theory-based

4. Bootstrap CI

5. Comparing two groups

6. Practical guidelines

CI for a proportion: theory-based

From means to proportions

Setting: We want to estimate a population proportion \(p\)

Examples:

  • What proportion of French adults support the EU?
  • What percentage of Americans trust the government?
  • What fraction of voters will vote for candidate X?

Key insight: A proportion is just a mean of 0s and 1s!

If \(X_i \in \{0, 1\}\) with \(P(X_i = 1) = p\), then: \[\hat{p} = \bar{X} = \frac{\text{number of successes}}{n}\]
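To see this concretely, a proportion computed with mean() on a 0/1 vector:

```r
# A proportion is just the mean of 0s and 1s
x <- c(1, 0, 1, 1, 0)
mean(x)             # 0.6: 3 successes out of 5
sum(x) / length(x)  # identical
```

This is why all the CLT machinery for means carries over directly to proportions.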

Sampling distribution of proportions

Since \(X_i \sim \text{Bernoulli}(p)\):

  • \(\mathbb{E}[X_i] = p\)
  • \(\mathbb{V}(X_i) = p(1-p)\)

By the CLT (for large \(n\)):

\[\hat{p} \sim \mathcal{N}\left(p, \frac{p(1-p)}{n}\right)\]

Standard Error of a Proportion

\[SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

CI for a proportion

\(100(1-\alpha)\%\) Confidence Interval for p

\[\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

Rule of thumb for CLT validity:

  • \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\)

Caution

For small samples or extreme proportions (close to 0 or 1), other methods may be more appropriate.

Example: Interpersonal Trust

Using ESS data, let’s estimate the proportion of French respondents who have high interpersonal trust (i.e., believe most people can be trusted).

# Create binary variable for high trust (> 5 on 0-10 scale)
# Filter out refusals (77) and don't know (88)
ess_trust_prop <- ess_france |> 
  filter(!ppltrst %in% c(77, 88)) |> 
  mutate(high_trust = ppltrst > 5)

# Compute proportion and CI
n_trust <- nrow(ess_trust_prop)
p_hat <- mean(ess_trust_prop$high_trust)
se_p <- sqrt(p_hat * (1 - p_hat) / n_trust)
z_crit <- qnorm(0.975)

ci_p_lower <- p_hat - z_crit * se_p
ci_p_upper <- p_hat + z_crit * se_p

p_hat
[1] 0.3198864
c(ci_p_lower, ci_p_upper)
[1] 0.2980952 0.3416775

Using binom.confint() in R

# binom.confint with the asymptotic (Wald) method: matches our manual CI
library(binom)
binom.confint(x = sum(ess_trust_prop$high_trust), 
              n = n_trust, 
              methods = "asymptotic")
      method   x    n      mean     lower     upper
1 asymptotic 563 1760 0.3198864 0.2980952 0.3416775
# manual CI
c(ci_p_lower, ci_p_upper)
[1] 0.2980952 0.3416775
# a common method: prop.test
prop.test(sum(ess_trust_prop$high_trust), n_trust, conf.level = 0.95)$conf.int
[1] 0.2982293 0.3423382
attr(,"conf.level")
[1] 0.95

Note: prop.test() uses the Wilson score interval, which is generally more accurate than the Wald interval we computed manually.

Your turn! #2

  1. Create education groups and a binary variable for high interpersonal trust (ppltrst > 5):
ess_trust_educ <- ess_france |>
  filter(!ppltrst %in% c(77, 88),
         !is.na(education)) |>
  mutate(high_trust = ppltrst > 5)
  2. Compute the proportion with high trust for each education group.

  3. Compute the standard error for each proportion.

  4. Compute 95% CIs for each education group.

  5. Create a visualization showing point estimates and CIs.

  6. Based on the CIs, do you think there are “true” differences in interpersonal trust across education levels? Why or why not?

Comparing proportions across education levels

Today’s lecture

The big question: We can compute sample statistics, but how confident should we be that they reflect the true population parameters?

1. Confidence intervals: a primer

2. CI for a population mean: theory-based

3. CI for a proportion: theory-based

4. Bootstrap CI

5. Comparing two groups

6. Practical guidelines

Bootstrap CI

Motivation: Why bootstrap?

Limitations of CLT-based CIs:

  1. Require large samples for CLT to kick in
  2. Assume specific parametric forms
  3. Difficult for complex statistics (medians, ratios, correlations)

The bootstrap idea:

If we can’t sample repeatedly from the population, let’s sample repeatedly from our sample!

Key insight

The sample is our best representation of the population. Resampling from the sample mimics sampling from the population.

Bootstrap: the procedure

Bootstrap Algorithm

  1. Original sample: Start with sample of size \(n\)
  2. Resample: Draw \(n\) observations with replacement
  3. Compute: Calculate statistic of interest
  4. Repeat: Do this \(B\) times (typically \(B \geq 1000\))
  5. CI: Two methods:
    • Percentile method: Use the 2.5th and 97.5th percentiles of bootstrap distribution
    • SE method: \(\hat{\theta} \pm z_{\alpha/2} \times SE_{boot}\), where \(SE_{boot}\) is the SD of bootstrap estimates
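The algorithm above can be sketched in a few lines of base R, with no packages (the data here are simulated, purely for illustration):

```r
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)  # pretend this is our only sample

B <- 1000
# Steps 2-4: resample with replacement, compute the mean, repeat B times
boot_means <- replicate(B, mean(sample(x, length(x), replace = TRUE)))

# Percentile method: middle 95% of the bootstrap distribution
quantile(boot_means, probs = c(0.025, 0.975))

# SE method: point estimate +/- 1.96 * SD of the bootstrap means
mean(x) + c(-1, 1) * 1.96 * sd(boot_means)
```

With a well-behaved statistic like the mean, the two methods give very similar intervals.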

Our original sample

Question: How can we estimate the variability of \(\bar{x}\) without taking new samples from the population?

Building a bootstrap sample

We resample with replacement from our original sample:

Note

Each draw is independent and with replacement — the same observation can be selected multiple times!
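In R, this is what sample() with replace = TRUE does; note that repeats are possible:

```r
set.seed(7)  # arbitrary seed, for reproducibility
original <- c("A", "B", "C", "D", "E")
boot_draw <- sample(original, size = length(original), replace = TRUE)
boot_draw
```

Some elements of the original sample typically appear more than once in a bootstrap sample, and others not at all.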

Bootstrap resampling

Bootstrap in R with infer

  • Let’s repeat the resampling procedure 1,000 times: there will be 1,000 bootstrap samples and 1,000 bootstrap estimates!
  • We use the infer package to ease the bootstrapping procedure.
# Load infer and use our life satisfaction data
library(infer)
bootstrap_means <- ess_france_clean |> 
  # specify the variable of interest
  specify(response = stflife) |> 
  # generate 1000 bootstrap samples
  generate(reps = 1000, type = "bootstrap") |> 
  # calculate the mean
  calculate(stat = "mean")
# View first few bootstrap means
head(bootstrap_means)
Response: stflife (numeric)
# A tibble: 6 × 2
  replicate  stat
      <int> <dbl>
1         1  6.81
2         2  6.74
3         3  6.76
4         4  6.74
5         5  6.75
6         6  6.83
nrow(bootstrap_means)
[1] 1000

Bootstrap distribution

# mean of the 1,000 bootstrap means
mean(bootstrap_means$stat)
[1] 6.763766

# sample mean, for comparison
mean(ess_france_clean$stflife)
[1] 6.765823

Bootstrap distribution: Percentile CI

quantile(bootstrap_means$stat, probs = c(0.025, 0.975))
    2.5%    97.5% 
6.658789 6.864873 

Bootstrap distribution: SE method

# SE-method upper bound: bootstrap mean + 1.96 * SD of bootstrap means
mean(bootstrap_means$stat) + 1.96 * sd(bootstrap_means$stat)
[1] 6.871183

Bootstrap distribution: Comparison with theory

# theory-based upper bound, for comparison
mean(ess_france_clean$stflife) + 1.96 * sd(ess_france_clean$stflife)/sqrt(nrow(ess_france_clean))
[1] 6.873691

Why the bootstrap works: Intuition

The bootstrap principle: The empirical distribution of the sample is a good approximation of the population distribution.

Bootstrap for complex statistics

The power of bootstrap: Works for any statistic!

# Bootstrap CI for the median
boot_median <- ess_france_clean %>%
  specify(response = stflife) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "median")

boot_median %>% get_ci(level = 0.95, type = "percentile")
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1        7        7
# Bootstrap CI for the standard deviation
boot_sd <- ess_france_clean %>%
  specify(response = stflife) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "sd")

boot_sd %>% get_ci(level = 0.95, type = "percentile")
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     2.20     2.38

Your Turn! #3

  1. Compute the sample mean, standard deviation, and standard error for satisfaction with democracy (stfdem) in the ESS France data.

  2. Compute the t-based 95% CI for mean satisfaction manually and using t.test().

  3. Compute the bootstrap 95% CI using the infer package (use 1000 reps):

bootstrap_stfdem <- ess_france_clean |>
  specify(response = stfdem) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "mean")
  1. Visualize the bootstrap distribution with a histogram.

  2. Compare the two methods — how similar are the CIs? Why might they differ?

Today’s lecture

The big question: We can compute sample statistics, but how confident should we be that they reflect the true population parameters?

1. Confidence intervals: a primer

2. CI for a population mean: theory-based

3. CI for a proportion: theory-based

4. Bootstrap CI

5. Comparing two groups

6. Practical guidelines

Comparing two groups

Key distinctions in group comparisons

When comparing two groups, we need to consider three important dimensions:

1. What are we comparing?

  • Means: Continuous variables (e.g., average income, test scores)
    • Use: t-tests, z-tests
  • Proportions: Binary/categorical outcomes (e.g., % with high trust)
    • Use: proportion tests

2. What do we know about variability?

  • Known variance (\(\sigma^2\)): rarely in practice, use z-distribution
    • \(SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\)
  • Unknown variance: most realistic case, use t-distribution
    • \(SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\)

Key distinctions in group comparisons

3. How are the samples related?

  • Independent samples: Two separate groups (e.g., men vs women, treatment vs control)
    • Standard errors add: \(SE(\bar{x}_1 - \bar{x}_2) = \sqrt{SE_1^2 + SE_2^2}\)
  • Dependent samples: Paired/matched observations (e.g., before-after, twins)
    • Analyze differences: \(\bar{d} = \frac{1}{n}\sum(x_i - y_i)\), then \(SE(\bar{d}) = \frac{s_d}{\sqrt{n}}\)

CI for difference in means

Question: Is there a difference in life satisfaction between men and women in France?

Setup:

  • Population 1 (men): mean \(\mu_M\), sample mean \(\bar{x}_M\), sample size \(n_M\)
  • Population 2 (women): mean \(\mu_F\), sample mean \(\bar{x}_F\), sample size \(n_F\)

Parameter of interest: \(\mu_M - \mu_F\)

Independent samples: the two samples are independent (pretty reasonable)

CI for Difference in Means (known \(\sigma\))

\[(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \times \sqrt{\frac{\sigma_M^2}{n_M} + \frac{\sigma_F^2}{n_F}}\]

CI for difference in means

Question: Is there a difference in life satisfaction between men and women in France?

Setup:

  • Population 1 (men): mean \(\mu_M\), sample mean \(\bar{x}_M\), sample size \(n_M\)
  • Population 2 (women): mean \(\mu_F\), sample mean \(\bar{x}_F\), sample size \(n_F\)

Parameter of interest: \(\mu_M - \mu_F\)

Independent samples: the two samples are independent (pretty reasonable)

CI for Difference in Means (unknown \(\sigma\))

\[(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,df} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]

Welch–Satterthwaite df

CI for Difference in Means (unknown \(\sigma\))

\[(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,df} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]

The degrees of freedom \((df)\) in this case is given by the Welch–Satterthwaite equation:

\[ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1-1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2-1}} \]

This is what t.test() does by default (see ?t.test).

Special case: equal variance

If we assume \(\sigma_M^2 = \sigma_F^2 = \sigma^2\), the variance of the difference becomes:

\[\begin{align*} \mathbb{V}(\bar{X}_M - \bar{X}_F) &= \mathbb{V}(\bar{X}_M) + \mathbb{V}(\bar{X}_F) \\ &= \frac{\sigma^2}{n_M} + \frac{\sigma^2}{n_F} \\ &= \sigma^2 \left(\frac{1}{n_M} + \frac{1}{n_F} \right) \end{align*}\]

Natural unbiased estimator is a weighted average of the two sample variances:

\[ s_p^2 = \frac{(n_M-1)s_M^2 + (n_F-1)s_F^2}{n_M+n_F-2} \]

\[\begin{align*} &\Rightarrow \widehat{\mathbb{V}(\bar{X}_M - \bar{X}_F)} = s_p^2\left( \frac{1}{n_M} + \frac{1}{n_F} \right) \\ &\Rightarrow \boxed{\hat{SE}_{pooled} = s_p \sqrt{\left( \frac{1}{n_M} + \frac{1}{n_F} \right)}} \end{align*}\]

The critical \(t\)-value has \(n_M + n_F - 2\) degrees of freedom.
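In R, the pooled version corresponds to t.test() with var.equal = TRUE (the default is the Welch test). A quick comparison on simulated data (means and SDs here are arbitrary, for illustration):

```r
set.seed(3)
g1 <- rnorm(40, mean = 6.8, sd = 2.3)
g2 <- rnorm(45, mean = 6.7, sd = 2.3)

# Welch (default): separate variances, Welch-Satterthwaite df
t.test(g1, g2)$conf.int
# Pooled: assumes equal variances, df = n1 + n2 - 2 = 83
t.test(g1, g2, var.equal = TRUE)$conf.int
```

When the sample variances are similar, the two intervals are nearly identical; they diverge when variances or sample sizes are very unequal.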

Example: gender gap in life satisfaction

# Compare life satisfaction by gender
gender_summary <- ess_france_clean |> 
  mutate(gender = ifelse(gndr == 1, "male", "female")) |> 
  summarise(n = n(),
            mean = mean(stflife),
            sd = sd(stflife),
            var_mean = sd^2 / n,  # variance of the sample mean, s^2/n
            .by = gender) |> 
  pivot_wider(names_from = gender,
              values_from = c(n, mean, sd, var_mean))

diff_mean <- gender_summary$mean_male - gender_summary$mean_female
n_m <- gender_summary$n_male
n_f <- gender_summary$n_female
vm_m <- gender_summary$var_mean_male
vm_f <- gender_summary$var_mean_female

# SE of the difference in means (independent samples)
se_diff <- sqrt(vm_m + vm_f)
# Welch-Satterthwaite degrees of freedom
df_welch <- (vm_m + vm_f)^2 / (vm_m^2/(n_m - 1) + vm_f^2/(n_f - 1))

t_crit <- qt(0.975, df = df_welch)
c(diff_mean - t_crit * se_diff, diff_mean + t_crit * se_diff)
[1] -0.07055206  0.36130738
# Two-sample t-test
t.test(stflife ~ gndr, data = ess_france_clean)

    Welch Two Sample t-test

data:  stflife by gndr
t = 1.3205, df = 1730.8, p-value = 0.1868
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
 -0.07055206  0.36130738
sample estimates:
mean in group 1 mean in group 2 
       6.839348        6.693970 

Visualizing the difference

Interpretation: If the CIs overlap substantially, we can’t confidently conclude the means differ.

Bootstrap for difference in means

# Bootstrap approach
boot_diff <- ess_france_clean |> 
  mutate(gender = ifelse(gndr == 1, "Male", "Female")) |> 
  specify(stflife ~ gender) |> 
  generate(reps = 1000, type = "bootstrap") |> 
  calculate(stat = "diff in means", order = c("Male", "Female"))

# Get CI
boot_diff |> get_ci(level = 0.95, type = "percentile")
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1  -0.0542    0.376

CI for difference in means: paired samples

When samples are dependent (paired):

  • Paired data: Two measurements on the same individuals (before/after, twins, matched pairs)
  • Ex: pre-test vs post-test scores, trust in parliament vs trust in politicians

Key insight: instead of comparing two independent samples, we analyze the differences within each pair!

Approach:

  1. Compute differences: \(d_i = x_i - y_i\) for each pair
  2. Calculate mean difference: \(\bar{d} = \frac{1}{n}\sum_{i=1}^n d_i\)
  3. Calculate SD of differences: \(s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^n(d_i - \bar{d})^2}\)

CI for difference in means: paired samples

CI for Paired Difference in Means

\[\bar{d} \pm t_{\alpha/2, n-1} \times \frac{s_d}{\sqrt{n}}\]

Example: trust in parliament vs politicians

Question: Do French adults trust parliament more than politicians?

# Prepare paired data (trust in parliament vs trust in politicians)
ess_paired <- ess_france_clean |>
  filter(!trstprl %in% c(77, 88),  # trust parliament
         !trstplt %in% c(77, 88))   # trust politicians

# Create differences
ess_paired <- ess_paired %>%
  mutate(diff = trstprl - trstplt)  # positive = more trust in parliament

# Summary
ess_paired |>
  summarise(mean_parliament = mean(trstprl),
            mean_politicians = mean(trstplt),
            mean_difference = mean(diff))
  mean_parliament mean_politicians mean_difference
1        4.158269         3.766449       0.3918198

Question: is there a difference between these two variables?

Manual calculation: Paired t-test

# Summary statistics
n_paired <- nrow(ess_paired)
mean_diff <- mean(ess_paired$diff)
sd_diff <- sd(ess_paired$diff)
se_diff <- sd_diff / sqrt(n_paired)

t_crit <- qt(0.975, df = n_paired - 1)

ci_lower <- mean_diff - t_crit * se_diff
ci_upper <- mean_diff + t_crit * se_diff

c(ci_lower, ci_upper)
[1] 0.2956501 0.4879895

Interpretation: We are 95% confident that French adults trust parliament between 0.3 and 0.49 points more than politicians (on a 0-10 scale).

Using t.test() for paired samples

# Using t.test with paired = TRUE
t.test(ess_paired$trstprl, ess_paired$trstplt,
       paired = TRUE,
       conf.level = 0.95)

    Paired t-test

data:  ess_paired$trstprl and ess_paired$trstplt
t = 7.9911, df = 1686, p-value = 2.457e-15
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 0.2956501 0.4879895
sample estimates:
mean difference 
      0.3918198 

Why does pairing matter?

  • Paired design controls for individual differences (some people are generally more/less trusting)
  • This reduces variability and gives more precise estimates
  • More statistical power to detect differences!

CI for difference in proportions: Theory

Setting: We want to compare proportions between two independent groups

  • Group 1: proportion \(p_1\), sample proportion \(\hat{p}_1\), sample size \(n_1\)
  • Group 2: proportion \(p_2\), sample proportion \(\hat{p}_2\), sample size \(n_2\)

Parameter of interest: \(p_1 - p_2\)

Sampling distribution:

By the CLT (for large samples), each sample proportion follows:

\[\hat{p}_1 \sim \mathcal{N}\left(p_1, \frac{p_1(1-p_1)}{n_1}\right) \quad \text{and} \quad \hat{p}_2 \sim \mathcal{N}\left(p_2, \frac{p_2(1-p_2)}{n_2}\right)\]

Therefore, the difference follows:

\[\hat{p}_1 - \hat{p}_2 \sim \mathcal{N}\left(p_1 - p_2, \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right)\]

CI for difference in proportions: Formula

Standard error of the difference:

\[SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]

\(100(1-\alpha)\%\) Confidence Interval for \(p_1 - p_2\)

\[(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]

Rule of thumb for CLT validity:

  • \(n_1\hat{p}_1 \geq 10\), \(n_1(1-\hat{p}_1) \geq 10\)
  • \(n_2\hat{p}_2 \geq 10\), \(n_2(1-\hat{p}_2) \geq 10\)

CI for difference in proportions: Example

Question: Is the proportion of people with high interpersonal trust different by gender?

# Create binary variable for high trust (> 5 on 0-10 scale)
ess_trust <- ess_france_clean |> 
  filter(!ppltrst %in% c(77, 88)) |> 
  mutate(high_trust = ppltrst > 5,
         gender = ifelse(gndr == 1, "Male", "Female"))

# Compare proportions by gender
prop_comparison <- ess_trust |> 
  summarise(n = n(),
            p = mean(high_trust),
            se = sqrt(p * (1 - p) / n),
            .by = gender)
prop_comparison
  gender   n         p         se
1   Male 855 0.3403509 0.01620455
2 Female 872 0.2993119 0.01550837

Visualizing proportions with confidence intervals

Interpretation: The confidence intervals overlap substantially, so we cannot confidently conclude that the proportion with high trust differs between men and women in France.

CI for difference in proportions: Example

Question: Is the proportion of people with high interpersonal trust different by gender?

# Two-sample proportion test
male_trust <- sum(ess_trust$high_trust[ess_trust$gender == "Male"])
female_trust <- sum(ess_trust$high_trust[ess_trust$gender == "Female"])
n_male <- sum(ess_trust$gender == "Male")
n_female <- sum(ess_trust$gender == "Female")

diff_trust <- male_trust/n_male - female_trust/n_female
se <- sqrt((male_trust/n_male * (1 - male_trust/n_male))/n_male +
           (female_trust/n_female * (1 - female_trust/n_female))/n_female)

c(diff_trust - 1.96 * se, diff_trust + 1.96 * se)
[1] -0.002923497  0.085001398
prop.test(c(male_trust, female_trust), c(n_male, n_female))

    2-sample test for equality of proportions with continuity correction

data:  c(male_trust, female_trust) out of c(n_male, n_female)
X-squared = 3.1574, df = 1, p-value = 0.07559
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.004080879  0.086158780
sample estimates:
   prop 1    prop 2 
0.3403509 0.2993119 

Your turn! #4

Using the same data as for Your Turn #2 (slide 35), investigate whether interpersonal trust (ppltrst) differs by education level.

ess_trust_educ <- ess_france |>
  filter(!ppltrst %in% c(77, 88),
         education != "") |>
  mutate(high_trust = ppltrst > 5,
         edu_cat = case_when(education == "High school diploma or less" ~ "High school diploma or less",
                             education != "High school diploma or less" ~ "Some higher education or more"))
  1. Create a visualization comparing the groups with their associated 95% confidence intervals.

  2. Compute the 95% confidence interval for the difference in proportions using t.test().

  3. What can you deduce about the difference by education level?

Today’s lecture

The big question: We can compute sample statistics, but how confident should we be that they reflect the true population parameters?

1. Confidence intervals: a primer

2. CI for a population mean: theory-based

3. CI for a proportion: theory-based

4. Bootstrap CI

5. Comparing two groups

6. Practical guidelines

Practical guidelines

When to use which method?

Method Best for Requirements
z-interval Large n, known σ n > 30, σ known
t-interval Means, unknown σ Approx. normal or n > 30
Proportion z Proportions np ≥ 10, n(1-p) ≥ 10
Bootstrap Any statistic Representative sample

Common mistakes to avoid

Don’t say:

❌ “There’s a 95% probability the true mean is in this interval”

❌ “95% of the data falls in this interval”

❌ “If we repeated this study, we’d get the same interval 95% of the time”

Do say:

✅ “We are 95% confident that the true mean is between X and Y”

✅ “This interval was constructed using a procedure that captures the true parameter 95% of the time”

Reporting confidence intervals

Good practice in reporting:

The average life satisfaction among French adults was 6.8 (95% CI: 6.7 to 6.9) on a 0-10 scale. This estimate is based on a sample of 1,738 respondents from the European Social Survey (Round 11, 2023-2024).

Include:

  1. Point estimate
  2. CI bounds with confidence level
  3. Sample size
  4. Data source and time period

Summary: key formulas

Parameter Point Estimate Standard Error CI Formula
Mean (σ known) \(\bar{x}\) \(\sigma/\sqrt{n}\) \(\bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}\)
Mean (σ unknown) \(\bar{x}\) \(s/\sqrt{n}\) \(\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}\)
Proportion \(\hat{p}\) \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) \(\hat{p} \pm z^* \cdot SE\)
Diff. in means \(\bar{x}_1 - \bar{x}_2\) \(\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\) \((\bar{x}_1 - \bar{x}_2) \pm t^* \cdot SE\)
Diff. in proportions \(\hat{p}_1 - \hat{p}_2\) \(\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\) \((\hat{p}_1 - \hat{p}_2) \pm z^* \cdot SE\)
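To tie the table together, the unknown-\(\sigma\) mean formula can be wrapped in a small helper function (ci_mean is a hypothetical name, not used elsewhere in the lecture; a sketch, not a replacement for t.test()):

```r
# t-based CI for a mean, as in the second row of the table
ci_mean <- function(x, conf = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)
  t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)
  mean(x) + c(-1, 1) * t_crit * se
}

set.seed(5)
x <- rnorm(50, mean = 7, sd = 2)  # simulated data, for illustration
ci_mean(x)                        # matches t.test(x)$conf.int
```

The same pattern (estimate plus/minus critical value times SE) covers every row of the table; only the SE and the reference distribution change.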

Your turn! #5

Putting it all together (15 minutes)

Design and conduct a complete analysis:

  1. Choose a question: Pick one substantive question using the ESS France data
  2. Compute the CI: Use both CLT-based and bootstrap methods
  3. Visualize: Create an informative plot showing your results
  4. Interpret: Write a paragraph interpreting your findings

Examples:

  • Compare trust in police (trstplc) between age groups
  • Analyze happiness (happy) by gender
  • Investigate feelings about the economy (stfeco) across education levels

See you next week!