L2 - Statistics
2026-02-02
The big question: We can compute sample statistics, but how confident should we be that they reflect the true population parameters?
The problem with point estimates:
In real life we only get to take one sample from the population
Also, we obviously don’t know the true population parameter, that’s what we are interested in!
Even though we never observe it, we know that the sampling distribution exists, and better still, we know how it behaves!
Key insight: A single point estimate tells us nothing about how precise or reliable it is!
A confidence interval for \(\theta\) is an estimated interval that covers the true value of \(\theta\) with at least a given probability.
For example: if we were to take an i.i.d. sample of size \(n\) of a random variable \(X\) and compute a 95% confidence interval for the population mean \(\mathbb{E}[X]\), then with 95% probability, this interval will include \(\mathbb{E}[X]\).
Definition
A confidence interval for \(\theta\) with coverage \((1 - \alpha)\) is a random interval \(CI_{1-\alpha}(\theta)\), such that \(Pr[\theta \in CI_{1-\alpha}(\theta)] \geq 1-\alpha\).
For any given \(\alpha \in (0,1)\), we say that the confidence level is \(100(1-\alpha)\%\) and that \(CI_{1-\alpha}(\theta)\) is a \(100(1-\alpha)\%\) confidence interval for \(\theta\).
Intuition: Instead of saying “the mean is 65,” we say:
“We are 95% confident that the true mean lies between 62 and 68”
Common misconception
A 95% CI does not mean “there’s a 95% probability the true parameter is in this interval.”
\(\rightarrow\) the true parameter is either in the interval or not!
What 95% confidence actually means:
If we repeated our sampling procedure many times and computed a CI each time, 95% of those intervals would contain the true parameter.
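This procedure-level interpretation can be checked by simulation. A minimal sketch, assuming a hypothetical normal population with known mean and standard deviation (all numbers here are illustrative):

```r
# Draw many samples from a known population, build a 95% z-interval from each,
# and count how often the interval covers the true mean.
set.seed(42)
mu <- 5; sigma <- 2; n <- 50
covered <- replicate(10000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  ci <- mean(x) + c(-1, 1) * qnorm(0.975) * sigma / sqrt(n)
  ci[1] <= mu && mu <= ci[2]
})
mean(covered)  # close to 0.95
```

The coverage proportion is close to, but not exactly, 0.95: the guarantee is about the long-run behavior of the procedure, not any single interval.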
There are two main approaches to constructing confidence intervals: theory-based (relying on the CLT) and the bootstrap (resampling).
We will cover both methods in this lecture. Let’s start with theory and then cover the bootstrap.
\[\text{CI} = {\color{#648FFF}{\text{Point Estimate}}} \pm \color{#DC267F}{\text{Margin of Error}}\] where:
\[{\color{#DC267F}{\text{Margin of Error}}} = {\color{#785EF0}{\text{Critical Value}}} \times \color{#FFB000}{\text{Standard Error}}\]
Components:
For a \((1-\alpha)\)% confidence interval, we need to find \(z_{\alpha/2}\) such that the middle \((1-\alpha)\) of the standard normal distribution lies between \(-z_{\alpha/2}\) and \(+z_{\alpha/2}\).
We want to construct an interval that captures the true parameter with probability \(1-\alpha\) (e.g., if \(\alpha = 0.05 \rightarrow 95\%\)).
The notation: \(z_p\) denotes the z-value with probability \(p\) in the upper tail:
\[P(Z > z_p) = p\]
The logic:
We split the “error probability” \(\alpha\) equally between both tails of the distribution
Each tail gets \(\alpha/2\) of the probability
So we need \(z_{\alpha/2}\): the z-value with \(\alpha/2\) probability in the upper tail
What if we used \(z_{\alpha}\) instead of \(z_{\alpha/2}\)?
For a symmetric confidence interval around our estimate, we need to split \(\alpha\) between both tails → use \(z_{\alpha/2}\)!
| Confidence Level | Critical Value (\(z^*\)) | Margin of Error |
|---|---|---|
| 90% | 1.645 | Narrower |
| 95% | 1.960 | Medium |
| 99% | 2.576 | Wider |
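The critical values in the table can be reproduced with the standard normal quantile function: qnorm(1 - alpha/2) returns \(z_{\alpha/2}\).

```r
# z critical values for 90%, 95% and 99% confidence levels
alpha <- c(0.10, 0.05, 0.01)
round(qnorm(1 - alpha / 2), 3)
# [1] 1.645 1.960 2.576
```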
1. Confidence intervals: a primer
2. CI for a population mean: theory-based
3. CI for a proportion: theory-based
4. Bootstrap confidence intervals
5. Comparing two groups
Setting: \(X_1, \ldots, X_n\) i.i.d. with unknown mean \(\mu\) and known variance \(\sigma^2\)
By the CLT, for large \(n\): \[\bar{X}_n \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)\]
95% Confidence Interval for \(\mu\) (\(\sigma\) known)
\[\bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}\]
General formula for \((1-\alpha)\)% CI: \[\bar{x} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\]
where \(z_{\alpha/2}\) is the \((1-\alpha/2)\) quantile of \(\mathcal{N}(0,1)\)
We know that \(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \mathcal{N}(0,1)\), so by definition of \(z_{\alpha/2}\):
\[1 - \alpha = \mathbb{P}\left( -z_{\alpha/2} \leq \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \leq z_{\alpha/2} \right)\]
Now we solve for \(\mu\) in the inequalities:
\[= \mathbb{P}\left( -z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \leq \bar{X} - \mu \leq z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)\]
\[= \mathbb{P}\left( -\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \leq - \mu \leq -\bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)\]
\[= \mathbb{P}\left( \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)\]
We therefore have: \(\boxed{\mathbb{P}\left( \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha}\)
This shows that the interval \(\left[ \bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right]\) contains \(\mu\) with probability \(1 - \alpha\). ✓
Problem: A sample of \(n = 36\) students has a mean grade of \(\bar{x} = 11\). Assuming grades have a known standard deviation \(\sigma = 2\), compute a 95% CI for the population mean.
Manual calculation:
\[11 \pm 1.96 \times \frac{2}{\sqrt{36}} = [10.347, 11.653]\]
Interpretation: we are 95% confident that the true mean grade in the population is between 10.35 and 11.65.
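The same interval in R (using qnorm rather than the rounded 1.96):

```r
# 95% z-interval for the mean grade: x_bar ± z * sigma / sqrt(n)
11 + c(-1, 1) * qnorm(0.975) * 2 / sqrt(36)
# [1] 10.34668 11.65332
```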
The realistic case: We almost never know \(\sigma\)!
Solution: Replace \(\sigma\) with the sample standard deviation \(s\)
\[s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}\]
As we saw last week, our standardized statistic follows a t-distribution with \(n-1\) degrees of freedom:
\[\frac{\bar{X}_n - \mu}{s/\sqrt{n}} \sim t_{n-1}\]
Key insight: Heavier tails → wider CIs → accounts for uncertainty in estimating \(\sigma\)
| df | 90% (α = 0.10) | 95% (α = 0.05) | 99% (α = 0.01) |
|---|---|---|---|
| 1 | 6.314 | 12.706 | 63.657 |
| 2 | 2.920 | 4.303 | 9.925 |
| 3 | 2.353 | 3.182 | 5.841 |
| 4 | 2.132 | 2.776 | 4.604 |
| 5 | 2.015 | 2.571 | 4.032 |
| 10 | 1.812 | 2.228 | 3.169 |
| 15 | 1.753 | 2.131 | 2.947 |
| 20 | 1.725 | 2.086 | 2.845 |
| 25 | 1.708 | 2.060 | 2.787 |
| 30 | 1.697 | 2.042 | 2.750 |
| 50 | 1.676 | 2.009 | 2.678 |
| 100 | 1.660 | 1.984 | 2.626 |
| ∞ (Normal) | 1.645 | 1.960 | 2.576 |
In R: Use qt(1 - α/2, df) to get critical values, e.g., qt(0.975, df = 29) for 95% CI with n = 30.
\((1-\alpha)\)% Confidence Interval for \(\mu\) (\(\sigma\) unknown)
\[\boxed{\bar{x} \pm t_{\alpha/2, n-1} \times \frac{s}{\sqrt{n}}}\]
where \(t_{\alpha/2, n-1}\) is the \((1-\alpha/2)\) quantile of \(t_{n-1}\)
Note: For \(n > 30\), \(t_{n-1} \approx \mathcal{N}(0,1)\), so using \(z\) instead of \(t\) is often acceptable.
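A quick check of this approximation: the t critical value shrinks toward the normal critical value as the degrees of freedom grow.

```r
# 95% critical values: t with increasing df versus the normal limit
sapply(c(5, 30, 100, 1000), function(df) round(qt(0.975, df), 3))
# [1] 2.571 2.042 1.984 1.962
qnorm(0.975)  # 1.959964
```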
Let’s use data from the European Social Survey (Round 11, 2023-2024) to estimate the 95% confidence interval for average life satisfaction in France. The data can be found here.
Method 1: Manual calculation
# Extract summary stats
n <- nrow(ess_france_clean)
x_bar <- mean(ess_france_clean$stflife)
s <- sd(ess_france_clean$stflife)
se <- s / sqrt(n)
# Critical value for 95% CI
t_crit <- qt(0.975, df = n - 1)
# Confidence interval
ci_lower <- x_bar - t_crit * se
ci_upper <- x_bar + t_crit * se
c(ci_lower, ci_upper)
[1] 6.657882 6.873764
Interpretation
We are 95% confident that the true average life satisfaction for adults in France is between 6.7 and 6.9 (on a 0-10 scale).
What this means practically:
Remember: The confidence is about the procedure, not this specific interval!
Read the documentation on the Trust in politicians trstplt variable here. Drop the values corresponding to refusal, don’t know and no answer.
Compute the sample mean trust in politicians, the sample standard deviation and the standard error.
Find the critical value for a 90% confidence interval.
Compute the 90% confidence interval manually.
Use the t.test() command to compute the 90% confidence interval directly.
Interpret your results.
The big question: We can compute sample statistics, but how confident should we be that they reflect the true population parameters?
1. Confidence intervals: a primer
2. CI for a population mean: theory-based
3. CI for a proportion: theory-based
Setting: We want to estimate a population proportion \(p\)
Examples:
Key insight: A proportion is just a mean of 0s and 1s!
If \(X_i \in \{0, 1\}\) with \(P(X_i = 1) = p\), then: \[\hat{p} = \bar{X} = \frac{\text{number of successes}}{n}\]
Since \(X_i \sim \text{Bernoulli}(p)\):
By the CLT (for large \(n\)):
\[\hat{p} \sim \mathcal{N}\left(p, \frac{p(1-p)}{n}\right)\]
Standard Error of a Proportion
\[SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
\((1-\alpha)\)% Confidence Interval for p
\[\hat{p} \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
Rule of thumb for CLT validity: at least 10 successes and 10 failures, i.e., \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\).
Caution
For small samples or extreme proportions (close to 0 or 1), other methods may be more appropriate.
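To see why, here is a hypothetical small-sample example (3 successes out of 20) comparing the Wald interval above with base R’s exact (Clopper-Pearson) interval from binom.test():

```r
x <- 3; n <- 20
p_hat <- x / n
# Wald interval (formula above): the lower bound falls below 0!
p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)
# Exact Clopper-Pearson interval stays inside [0, 1]:
binom.test(x, n)$conf.int
```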
Using ESS data, let’s estimate the proportion of French respondents who have high interpersonal trust (i.e., believe most people can be trusted).
# Create binary variable for high trust (> 5 on 0-10 scale)
# Filter out refusals (77), don't know (88), and no answer (99)
ess_trust_prop <- ess_france |>
filter(!ppltrst %in% c(77, 88, 99)) |>
mutate(high_trust = ppltrst > 5)
# Compute proportion and CI
n_trust <- nrow(ess_trust_prop)
p_hat <- mean(ess_trust_prop$high_trust)
se_p <- sqrt(p_hat * (1 - p_hat) / n_trust)
z_crit <- qnorm(0.975)
ci_p_lower <- p_hat - z_crit * se_p
ci_p_upper <- p_hat + z_crit * se_p
p_hat
[1] 0.3198864
c(ci_p_lower, ci_p_upper)
[1] 0.2980952 0.3416775
binom.confint() in R
# Alternative: binom.confint with the asymptotic (Wald) method
library(binom)
binom.confint(x = sum(ess_trust_prop$high_trust),
n = n_trust,
methods = "asymptotic")
      method   x    n      mean     lower     upper
1 asymptotic 563 1760 0.3198864 0.2980952 0.3416775
# a common method: prop.test
prop.test(sum(ess_trust_prop$high_trust), n_trust, conf.level = 0.95)$conf.int
[1] 0.2982293 0.3423382
attr(,"conf.level")
[1] 0.95
Note: prop.test() uses the Wilson score interval, which is generally more accurate than the Wald interval we computed manually.
Using the same high-trust indicator (ppltrst > 5):
Compute the proportion with high trust for each education group.
Compute the standard error for each proportion.
Compute 95% CIs for each education group.
Create a visualization showing point estimates and CIs.
Based on the CIs, do you think there are “true” differences in interpersonal trust across education levels? Why or why not?
Limitations of CLT-based CIs: they rely on large-sample normality and on having a standard-error formula for the statistic, which exists for means and proportions but not for every statistic (e.g., the median).
The bootstrap idea:
If we can’t sample repeatedly from the population, let’s sample repeatedly from our sample!
Key insight
The sample is our best representation of the population. Resampling from the sample mimics sampling from the population.
Bootstrap Algorithm
Question: How can we estimate the variability of \(\bar{x}\) without taking new samples from the population?
We resample with replacement from our original sample:
Note
Each draw is independent and with replacement — the same observation can be selected multiple times!
We will use the infer package to ease the bootstrapping procedure.
The bootstrap principle: The empirical distribution of the sample is a good approximation of the population distribution.
The power of bootstrap: Works for any statistic!
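Before turning to infer, here is a minimal base-R sketch of the percentile bootstrap for a statistic with no simple SE formula, the median. The data are simulated and purely illustrative; infer automates the same resample-compute-summarize steps.

```r
set.seed(1)
x <- rexp(200, rate = 1/5)   # skewed toy sample
# Resample with replacement, recomputing the median each time
boot_medians <- replicate(2000, median(sample(x, replace = TRUE)))
# 95% percentile CI: the middle 95% of the bootstrap distribution
quantile(boot_medians, probs = c(0.025, 0.975))
```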
Compute the sample mean, standard deviation, and standard error for satisfaction with democracy (stfdem) in the ESS France data.
Compute the t-based 95% CI for mean satisfaction manually and using t.test().
Compute the bootstrap 95% CI using the infer package (use 1000 reps):
Visualize the bootstrap distribution with a histogram.
Compare the two methods — how similar are the CIs? Why might they differ?
When comparing two groups, we need to consider three important dimensions:
1. What are we comparing?
2. What do we know about variability?
3. How are the samples related?
Question: Is there a difference in life satisfaction between men and women in France?
Setup:
Parameter of interest: \(\mu_M - \mu_F\)
Independent samples: the two samples are independent (pretty reasonable)
CI for Difference in Means (known \(\sigma\))
\[(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \times \sqrt{\frac{\sigma_M^2}{n_M} + \frac{\sigma_F^2}{n_F}}\]
CI for Difference in Means (unknown \(\sigma\))
\[(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,df} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]
The degrees of freedom \((df)\) in this case is given by the Welch–Satterthwaite equation:
\[ df = \frac{\left(\frac{s_M^2}{n_M} + \frac{s_F^2}{n_F}\right)^2}{\frac{\left(\frac{s_M^2}{n_M}\right)^2}{n_M-1} + \frac{\left(\frac{s_F^2}{n_F}\right)^2}{n_F-1}} \]
This is what t.test does by default (see help).
If we assume \(\sigma_M^2\) = \(\sigma_F^2\), the pooled variance becomes:
\[\begin{align*} \mathbb{V}(\bar{X}_M - \bar{X}_F) &= \mathbb{V}(\bar{X}_M) + \mathbb{V}(\bar{X}_F) \\ &= \frac{\sigma^2}{n_M} + \frac{\sigma^2}{n_F} \\ &= \sigma^2 \left(\frac{1}{n_M} + \frac{1}{n_F} \right) \end{align*}\]
Natural unbiased estimator is a weighted average of the two sample variances:
\[ s_p^2 = \frac{(n_M-1)s_M^2 + (n_F-1)s_F^2}{n_M+n_F-2} \]
\[\begin{align*} &\Rightarrow \widehat{\mathbb{V}(\bar{X}_M - \bar{X}_F)} = s_p^2\left( \frac{1}{n_M} + \frac{1}{n_F} \right) \\ &\Rightarrow \boxed{\hat{SE}_{pooled} = s_p \sqrt{\left( \frac{1}{n_M} + \frac{1}{n_F} \right)}} \end{align*}\]
The critical \(t\)-value has \(n_M + n_F - 2\) degrees of freedom.
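A toy numeric check of the pooled formulas (all numbers hypothetical):

```r
# Two groups with similar spread, so pooling is reasonable
n1 <- 40; s1 <- 2.0; xbar1 <- 6.8
n2 <- 45; s2 <- 2.2; xbar2 <- 6.5
# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
se_pooled <- sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
t_crit <- qt(0.975, df = n1 + n2 - 2)
(xbar1 - xbar2) + c(-1, 1) * t_crit * se_pooled
```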
# Compare life satisfaction by gender
gender_summary <- ess_france_clean |>
mutate(gender = ifelse(gndr == 1, "male", "female")) |>
summarise(n = n(),
mean = mean(stflife),
sd = sd(stflife),
var_mean = sd^2 / n,  # variance of the sample mean, s^2 / n
.by = gender) |>
pivot_wider(names_from = gender,
values_from = c(n, mean, sd, var_mean))
diff_mean <- gender_summary$mean_male - gender_summary$mean_female
n_m <- gender_summary$n_male
n_f <- gender_summary$n_female
v_m <- gender_summary$var_mean_male
v_f <- gender_summary$var_mean_female
se_diff <- sqrt(v_m + v_f)  # Welch (unpooled) standard error
df_welch <- (v_m + v_f)^2 / (v_m^2 / (n_m - 1) + v_f^2 / (n_f - 1))
t_crit <- qt(0.975, df = df_welch)
c(diff_mean - t_crit * se_diff, diff_mean + t_crit * se_diff)
[1] -0.07055206 0.36130738
Welch Two Sample t-test
data: stflife by gndr
t = 1.3205, df = 1730.8, p-value = 0.1868
alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
95 percent confidence interval:
-0.07055206 0.36130738
sample estimates:
mean in group 1 mean in group 2
6.839348 6.693970
Interpretation: the 95% CI for the difference includes 0, so we cannot confidently conclude that the means differ.
# Bootstrap approach
boot_diff <- ess_france_clean |>
mutate(gender = ifelse(gndr == 1, "Male", "Female")) |>
specify(stflife ~ gender) |>
generate(reps = 1000, type = "bootstrap") |>
calculate(stat = "diff in means", order = c("Male", "Female"))
# Get CI
boot_diff |> get_ci(level = 0.95, type = "percentile")
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1  -0.0542    0.376
When samples are dependent (paired):
Key insight: instead of comparing two independent samples, we analyze the differences within each pair!
Approach:
CI for Paired Difference in Means
\[\bar{d} \pm t_{\alpha/2, n-1} \times \frac{s_d}{\sqrt{n}}\]
Question: Do French adults trust parliament more than politicians?
# Prepare paired data (trust in parliament vs trust in politicians)
ess_paired <- ess_france_clean |>
filter(!trstprl %in% c(77, 88), # trust parliament
!trstplt %in% c(77, 88)) # trust politicians
# Create differences
ess_paired <- ess_paired %>%
mutate(diff = trstprl - trstplt) # positive = more trust in parliament
# Summary
ess_paired |>
summarise(mean_parliament = mean(trstprl),
mean_politicians = mean(trstplt),
mean_difference = mean(diff)) mean_parliament mean_politicians mean_difference
1 4.158269 3.766449 0.3918198
Question: is there a difference between these two variables?
# Summary statistics
n_paired <- nrow(ess_paired)
mean_diff <- mean(ess_paired$diff)
sd_diff <- sd(ess_paired$diff)
se_diff <- sd_diff / sqrt(n_paired)
t_crit <- qt(0.975, df = n_paired - 1)
ci_lower <- mean_diff - t_crit * se_diff
ci_upper <- mean_diff + t_crit * se_diff
c(ci_lower, ci_upper)
[1] 0.2956501 0.4879895
Interpretation: We are 95% confident that French adults trust parliament between 0.3 and 0.49 points more than politicians (on a 0-10 scale).
t.test() for paired samples
# Using t.test with paired = TRUE
t.test(ess_paired$trstprl, ess_paired$trstplt,
paired = TRUE,
conf.level = 0.95)
Paired t-test
data: ess_paired$trstprl and ess_paired$trstplt
t = 7.9911, df = 1686, p-value = 2.457e-15
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
0.2956501 0.4879895
sample estimates:
mean difference
0.3918198
Why does pairing matter?
Setting: We want to compare proportions between two independent groups
Parameter of interest: \(p_1 - p_2\)
Sampling distribution:
By the CLT (for large samples), each sample proportion follows:
\[\hat{p}_1 \sim \mathcal{N}\left(p_1, \frac{p_1(1-p_1)}{n_1}\right) \quad \text{and} \quad \hat{p}_2 \sim \mathcal{N}\left(p_2, \frac{p_2(1-p_2)}{n_2}\right)\]
Therefore, the difference follows:
\[\hat{p}_1 - \hat{p}_2 \sim \mathcal{N}\left(p_1 - p_2, \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right)\]
Standard error of the difference:
\[SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]
\((1-\alpha)\)% Confidence Interval for \(p_1 - p_2\)
\[(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\]
Rule of thumb for CLT validity: each group should have at least 10 successes and 10 failures.
Question: Is the proportion of people with high interpersonal trust different by gender?
# Create binary variable for high trust (> 5 on 0-10 scale)
ess_trust <- ess_france_clean |>
filter(!ppltrst %in% c(77, 88)) |>
mutate(high_trust = ppltrst > 5,
gender = ifelse(gndr == 1, "Male", "Female"))
# Compare proportions by gender
prop_comparison <- ess_trust |>
summarise(n = n(),
p = mean(high_trust),
se = sqrt(p * (1 - p) / n),
.by = gender)
prop_comparison
  gender   n         p         se
1   Male 855 0.3403509 0.01620455
2 Female 872 0.2993119 0.01550837
Interpretation: the two confidence intervals overlap, suggesting no clear difference in the proportion with high trust between men and women in France. (Overlapping individual CIs are only a rough check; a CI for the difference is more informative.)
Question: Is the proportion of people with high interpersonal trust different by gender?
# Two-sample proportion test
male_trust <- sum(ess_trust$high_trust[ess_trust$gender == "Male"])
female_trust <- sum(ess_trust$high_trust[ess_trust$gender == "Female"])
n_male <- sum(ess_trust$gender == "Male")
n_female <- sum(ess_trust$gender == "Female")
diff_trust <- male_trust/n_male - female_trust/n_female
se = sqrt((male_trust/n_male * (1 - male_trust/n_male))/n_male + (female_trust/n_female * (1 - female_trust/n_female))/n_female)
c(diff_trust - 1.96 * se, diff_trust + 1.96 * se)
[1] -0.002923497 0.085001398
2-sample test for equality of proportions with continuity correction
data: c(male_trust, female_trust) out of c(n_male, n_female)
X-squared = 3.1574, df = 1, p-value = 0.07559
alternative hypothesis: two.sided
95 percent confidence interval:
-0.004080879 0.086158780
sample estimates:
prop 1 prop 2
0.3403509 0.2993119
Using the same data as for Your Turn #2 (slide 35), investigate whether interpersonal trust (ppltrst) differs by education level.
Create a visualization comparing the groups with their associated 95% confidence intervals.
Compute the 95% confidence interval for the difference in proportions using prop.test().
What can you deduce about the difference by education level?
| Method | Best for | Requirements |
|---|---|---|
| z-interval | Large n, known σ | n > 30, σ known |
| t-interval | Means, unknown σ | Approx. normal or n > 30 |
| Proportion z | Proportions | np ≥ 10, n(1-p) ≥ 10 |
| Bootstrap | Any statistic | Representative sample |
Don’t say:
❌ “There’s a 95% probability the true mean is in this interval”
❌ “95% of the data falls in this interval”
❌ “If we repeated this study, we’d get the same interval 95% of the time”
Do say:
✅ “We are 95% confident that the true mean is between X and Y”
✅ “This interval was constructed using a procedure that captures the true parameter 95% of the time”
Good practice in reporting:
The average life satisfaction among French adults was 7.0 (95% CI: 6.9 to 7.1) on a 0-10 scale. This estimate is based on a sample of 1,836 respondents from the European Social Survey (Round 11, 2023-2024).
Include:
| Parameter | Point Estimate | Standard Error | CI Formula |
|---|---|---|---|
| Mean (σ known) | \(\bar{x}\) | \(\sigma/\sqrt{n}\) | \(\bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}\) |
| Mean (σ unknown) | \(\bar{x}\) | \(s/\sqrt{n}\) | \(\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}\) |
| Proportion | \(\hat{p}\) | \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) | \(\hat{p} \pm z^* \cdot SE\) |
| Diff. in means | \(\bar{x}_1 - \bar{x}_2\) | \(\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\) | \((\bar{x}_1 - \bar{x}_2) \pm t^* \cdot SE\) |
| Diff. in proportions | \(\hat{p}_1 - \hat{p}_2\) | \(\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\) | \((\hat{p}_1 - \hat{p}_2) \pm z^* \cdot SE\) |
Putting it all together (15 minutes)
Design and conduct a complete analysis:
Examples:
Trust in the police (trstplc) between age groups
Happiness (happy) by gender
Satisfaction with the economy (stfeco) across education levels
Lecture 4: Confidence Intervals