L2 - Statistics
2026-02-06
Imagine you are the campaign manager of a new political candidate: Pumpernickel. She wants to know the share of the population who will vote for her. How would you proceed?
\(\rightarrow\) Too costly: only survey a fraction or sample of the population.
\(\rightarrow\) How many people do you survey? How would you select them?
\(\rightarrow\) Will the sample give you the exact population share? Probably not: you obtain an estimate of this share.
Definitions
Population: the entire group about which we want information, denoted \(N\).
Sample: the fraction of the population for which we collect information, denoted \(n\).
Sampling rate: ratio between the sample size and the population size, denoted \(t = n/N\).
Sampling error: the error incurred when the statistical characteristics of a population are estimated from a sample of that population.
Goal: use information from the sample to draw conclusions about the population as a whole. In order to ensure accurate inferences about the population, the sample needs to be representative of the population.
When it is costly to collect information on the whole population, and you want to know some characteristics of a population, you only survey a sample of individuals.
Two main challenges:
Selecting the sample
Measuring the level of uncertainty on the statistics you get
How do you ensure your sample is representative?
\(\rightarrow\) Does it sound like a good idea? Why?
The best way to obtain a representative sample of the population of interest is to randomly select the individuals who are included in the sample.
– You can do so if you have access to the exhaustive list of individuals in the population of interest: the sampling frame
– In practice, having access to such an exhaustive list is often difficult
Definition
The random variables \(X_1, ..., X_n\) are called a random sample of size n from a population if \(X_1, ..., X_n\) are mutually independent random variables and each \(X_i\) has the same probability distribution. Alternatively, \(X_1, ..., X_n\) are called independent and identically distributed random variables. This is commonly abbreviated to i.i.d. random variables.
Let’s imagine the population of interest is 1,000 individuals.
We make some simplifying assumptions:
individuals truthfully report their voting intentions
their current preferences correspond to their preferences on election day
we know that Pumpernickel’s vote share is 23%
– 0.23 is the true population proportion (denoted \(p\))
In practice, you do not know the true population parameter.
Let’s survey 20 individuals.
We ensure they are selected at random.
How many report they will vote for Pumpernickel?
5 out of 20 report they will vote for Pumpernickel: 5/20 = 25%
0.25 can be thought of as our guess of the vote share for Pumpernickel
The goal for the class is for the estimates to be good guesses for the population parameters:
\[ \text{Data} \rightarrow \text{Calculation} \rightarrow \hat{\text{Estimates}} \xrightarrow{\text{hopefully!}} \text{Truth} \]
For example,
\[ X_1, \ldots, X_n \rightarrow \frac{1}{n}\sum_{i=1}^{n}X_i \rightarrow \bar{x} \xrightarrow{\text{hopefully!}} \mu \]
What would happen if we took a new sample? Would we also get 5 Pumpernickel voters as before?
What if we repeated this survey multiple times?
Probably not. The samples will vary from draw to draw.
Key to this observation: these are randomly drawn samples.
Create a data.frame containing the proportions of Pumpernickel voters from the previous slide. Name it vote_share and name the variable containing the proportions prop_votes. (Hint: to create a data.frame you need to use the data.frame() function.) The proportions are as follows: (0.25, 0.25, 0.35, 0.20, 0.20, 0.35, 0.1, 0.35, 0.35, 0.2, 0.25, 0.25, 0.2, 0.15, 0.25, 0.2, 0.15, 0.15, 0.2, 0.1).
Create a histogram of these proportions using ggplot2.
What do you observe?
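One possible solution, as a sketch (the binwidth is an illustrative choice; any reasonable value works). The ggplot2 call is guarded with requireNamespace() so the script also runs where ggplot2 is not installed:

```r
# Build the data.frame of sample proportions from the slide
vote_share <- data.frame(
  prop_votes = c(0.25, 0.25, 0.35, 0.20, 0.20, 0.35, 0.1, 0.35, 0.35, 0.2,
                 0.25, 0.25, 0.2, 0.15, 0.25, 0.2, 0.15, 0.15, 0.2, 0.1)
)

# Histogram of the proportions (guarded so the script runs without ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ggplot(vote_share, aes(x = prop_votes)) +
    geom_histogram(binwidth = 0.05) +
    labs(x = "Proportion of Pumpernickel voters", y = "Count")
}

mean(vote_share$prop_votes)  # the estimates cluster around the true p = 0.23
```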
Demonstrated the statistical concept of sampling
Objective: know the proportion of Pumpernickel voters, \(p\)
Methods:
Census: time-consuming (and in many cases very costly);
Random Sampling: extract a sample of 20 voters from the population to obtain an estimate, \(\hat{p}\). Our first estimate of the proportion of Pumpernickel voters was 0.25, and it turned out to be around the mean of our other estimates
Important: each sample was drawn randomly \(\rightarrow\) samples are different from each other!
\(\rightarrow\) different proportions 👉 sampling variation
This diagram applies to all estimates.
# voter_population <- read.csv("https://www.dropbox.com/scl/fi/bizhr0jjvrzm4kyhggvos/population_clean.csv?rlkey=ryeswyb5mgbvvdqaptwtleybj&dl=1")
voter_population <- read.csv("data/population_clean.csv")
head(voter_population)
  voter_id       candidate
1        1 Other candidate
2        2 Other candidate
3        3 Other candidate
4        4 Other candidate
5        5    Pumpernickel
6        6 Other candidate
2 variables:
voter_id: voter identifier
candidate: candidate of voter
To draw a random sample of 50 voters from the population, we use the moderndive function rep_sample_n:
virtual_shovel <- voter_population |>
  rep_sample_n(size = 50)
head(virtual_shovel)
# A tibble: 6 × 3
# Groups: replicate [1]
  replicate voter_id candidate
      <int>    <int> <chr>
1         1      287 Other candidate
2         1      466 Other candidate
3         1      530 Other candidate
4         1      995 Other candidate
5         1      488 Other candidate
6         1      814 Pumpernickel
replicate tells us the ID of the sample. Here: 1.
sample_1 <- virtual_shovel |>
summarise(
# number of observations in sample
n_sample = n(),
# number of Pumpernickel voters in sample
n_pumpernickel = sum(candidate == "Pumpernickel")) |>
mutate(
# proportion of Pumpernickel voters in sample
prop_pumpernickel = n_pumpernickel / n_sample)
sample_1
# A tibble: 1 × 4
replicate n_sample n_pumpernickel prop_pumpernickel
<int> <int> <int> <dbl>
1 1 50 13 0.26
👉 26% are Pumpernickel voters! This is an estimate of the proportion of Pumpernickel voters in the population. What if we try again?
What if we try many times, like, 100 times?
100 samples (replicates) of size 50.
virtual_samples <- voter_population |>
# get 100 samples of size 50
rep_sample_n(size = 50, reps = 100)
virtual_samples
# A tibble: 5,000 × 3
# Groups: replicate [100]
replicate voter_id candidate
<int> <int> <chr>
1 1 912 Other candidate
2 1 551 Other candidate
3 1 352 Other candidate
4 1 833 Other candidate
5 1 862 Pumpernickel
6 1 826 Other candidate
7 1 217 Pumpernickel
8 1 696 Pumpernickel
9 1 683 Other candidate
10 1 451 Other candidate
# ℹ 4,990 more rows
Compute the proportion of Pumpernickel voters in each sample.
virtual_prop_pumpernickel <- virtual_samples |>
group_by(replicate) |>
summarise(prop_pumpernickel = mean(candidate == "Pumpernickel"))
virtual_prop_pumpernickel
# A tibble: 100 × 2
replicate prop_pumpernickel
<int> <dbl>
1 1 0.26
2 2 0.2
3 3 0.2
4 4 0.28
5 5 0.28
6 6 0.3
7 7 0.3
8 8 0.26
9 9 0.3
10 10 0.22
# ℹ 90 more rows
Just as before, the virtual sampler also creates random samples.
The prop_pumpernickel column in the virtual_prop_pumpernickel data.frame differs across samples.
And again, we can visualize the sampling distribution:
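A self-contained way to reproduce this kind of plot is sketched below. The lecture uses rep_sample_n() from moderndive; here base R's sample() plays that role, and the population construction and seed are illustrative assumptions:

```r
set.seed(1)  # for reproducibility (assumption: any seed works)

# A population of 1,000 voters in which 23% support Pumpernickel
population <- c(rep("Pumpernickel", 230), rep("Other candidate", 770))

# 100 samples of size 50: proportion of Pumpernickel voters in each
prop_pumpernickel <- replicate(100, mean(sample(population, 50) == "Pumpernickel"))

# Base-R histogram of the sampling distribution
hist(prop_pumpernickel,
     main = "Sampling distribution (100 samples of size 50)",
     xlab = "Proportion of Pumpernickel voters")
```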
Instead of taking only 100 samples, let’s take 1000!
Load the data into an object voter_population
Obtain 1,000 samples of size 50 using the rep_sample_n() function from the moderndive package.
Calculate the proportion of Pumpernickel voters in each sample.
Plot a histogram of the obtained voter shares in each sample.
What do you observe? Which voter shares occur most frequently? How does the shape of the histogram compare to when we took only 100 samples?
How likely is it that we sample 50 voters of which less than 15% are Pumpernickel voters?
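A possible solution sketch, again using base R's sample() in place of rep_sample_n() (the population vector and seed are assumptions):

```r
set.seed(42)  # assumption: any seed works
population <- c(rep("Pumpernickel", 230), rep("Other candidate", 770))

# 1,000 samples of size 50 (the lecture uses moderndive::rep_sample_n())
props <- replicate(1000, mean(sample(population, 50) == "Pumpernickel"))

hist(props, breaks = 20,
     main = "1,000 samples of size 50",
     xlab = "Proportion of Pumpernickel voters")

# Empirical answer to the last question: share of samples with less than 15%
mean(props < 0.15)
```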
Looks remarkably close to a normal distribution \(\rightarrow\) the more samples we take, the more their sampling distribution will resemble a normal distribution
Imagine you could change the size of your samples and had the option of the following sizes: 25, 50 and 100.
If your goal is still to estimate the share of Pumpernickel voters in the population, which sample size would you choose?
Let’s repeat what we did previously but for different sample sizes.
Let’s take 1000 samples each for \(n = 25\), \(n = 50\), \(n = 100\).
We will use rep_sample_n() again.
Generate all samples of different sizes:
Compute proportion of Pumpernickel voters
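A base-R sketch of these two steps (sample() stands in for rep_sample_n(); the seed and population construction are assumptions). It produces a summary like the comparison table that follows:

```r
set.seed(7)  # assumption: any seed works
population <- c(rep("Pumpernickel", 230), rep("Other candidate", 770))

# For each sample size, draw 1,000 samples and record the proportion
sizes <- c(25, 50, 100)
summary_by_size <- t(sapply(sizes, function(n) {
  props <- replicate(1000, mean(sample(population, n) == "Pumpernickel"))
  c(sample_size = n, mean = mean(props), sd = sd(props))
}))
summary_by_size  # the sd column shrinks as the sample size grows
```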
Comparison of sampling distribution under different sample sizes
| Sample Size | Mean | Standard Deviation |
|---|---|---|
| 25 | 0.2287 | 0.0826 |
| 50 | 0.2299 | 0.0578 |
| 100 | 0.2306 | 0.0401 |
We have discussed using sample data to make inference about the population.
A parameter is a number that describes the population. In practice, parameters are unknown because we cannot examine the entire population.
A statistic is a number that can be calculated from sample data without using any unknown parameters. In practice, we use statistics to estimate parameters.
Definition
Let \(X_1, ..., X_n\) be a random sample of size \(n\) from a population and let \(g(x_1, ..., x_n)\) be a function of the realizations of \((X_1, ..., X_n)\). Then the random variable (or random vector) \(T_n = g(X_1, ..., X_n)\) is called a (sample) statistic. The probability distribution of a statistic \(T_n\) is called the sampling distribution of \(T_n\).
We will denote \(t_n = g(x_1, ..., x_n)\) the realization of this random variable for a realization of the sample \((x_1, ..., x_n)\).
Let \(X_1, ..., X_n\) be a random sample from a population with mean \(\mu\) and variance \(\sigma^2 < \infty\). The sample mean is the arithmetic average of the values in a random sample:
\[ \bar{X_n} = \frac{X_1 + X_2 + ... + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i \]
We denote \(\bar x\) the realization of the random variable \(\bar{X_n}\):
\[ \bar{x} = \frac{x_1 + x_2 + ... + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Key insight: the sample mean is a random variable! It changes from sample to sample, and therefore has a probability distribution (as we saw earlier).
Question: On average, does \(\bar{X_n}\) equal the population mean \(\mu\)?
Answer: Yes! Let’s prove it:
\[\mathbb{E}[\bar{X_n}] = \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^n X_i\right]\]
Using linearity of expectation:
\[= \frac{1}{n} \sum_{i=1}^n \mathbb{E}[X_i] = \frac{1}{n} \sum_{i=1}^n \mu = \frac{1}{n} \cdot n\mu = \mu\]
Conclusion: \(\boxed{\mathbb{E}[\bar{X_n}] = \mu}\) ✓
We say that the sample mean is an unbiased estimator of \(\mu\)!
Question: How spread out is the distribution of \(\bar{X_n}\)?
Answer: We can calculate the variance of the sample mean, the sampling variance of the sample mean.
\[\mathbb{V}(\bar{X_n}) = \mathbb{V}\left(\frac{1}{n} \sum_{i=1}^n X_i\right) = \frac{1}{n^2} \mathbb{V}\left( \sum_{i=1}^n X_i\right)\]
Since the \(X_i\) are independent and each have variance \(\sigma^2\):
\[= \frac{1}{n^2} \sum_{i=1}^n \mathbb{V}(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}\]
Conclusion: \(\boxed{\mathbb{V}(\bar{X_n}) = \frac{\sigma^2}{n}}\) ✓
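These two results can be checked with a quick simulation (a sketch; the choices of \(\mu\), \(\sigma\), \(n\), and the seed are arbitrary assumptions):

```r
set.seed(123)  # assumption: any seed works
mu <- 2; sigma <- 3; n <- 40

# 10,000 sample means, each from a sample of size n
xbars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

mean(xbars)  # close to mu = 2 (unbiasedness)
var(xbars)   # close to sigma^2 / n = 9 / 40 = 0.225
```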
This last result is important: it states that the larger the sample, the more precise our estimate of the mean (as we saw earlier).

| Sample Size | Mean | Standard Deviation | \(\sigma/\sqrt{n}\) |
|---|---|---|---|
| N = 1000 | \(\mu\) = 0.23 | \(\sigma\) = 0.421 | |
| 25 | 0.2287 | 0.0826 | 0.0842 |
| 50 | 0.2299 | 0.0578 | 0.0595 |
| 100 | 0.2306 | 0.0401 | 0.0421 |

The standard deviation of a (sample) statistic has a very important name: the standard error.
It plays a central role in how we make statistical inferences.
Theorem (Weak Law of Large Numbers)
Let \(X_1, ..., X_n\) be i.i.d. random variables with a finite expected value \(\mathbb{E}[X_i] = \mu\) and \(\mathbb{V}(X_i) = \sigma^2 < \infty\). Then, for every \(\epsilon > 0\),
\[ \lim_{n \rightarrow \infty} P(|\bar{X_n} - \mu| < \epsilon) = 1. \]
We say that the sample mean, \(\bar{X_n}\), converges in probability to \(\mu\). In other words, as the sample size increases, the sample mean gets arbitrarily close to the population mean with probability approaching one.
This is a profound result: if you have a large sample, you can be almost certain that the sample mean is close to the population mean!
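A quick illustration of this convergence (a sketch; the Bernoulli parameter 0.23 echoes the Pumpernickel example, and the seed is an assumption):

```r
set.seed(1)  # assumption: any seed works

# Bernoulli(0.23) draws: voting intentions in an ever-growing sample
x <- rbinom(100000, size = 1, prob = 0.23)

# Running sample mean after the first k observations, for each k
running_mean <- cumsum(x) / seq_along(x)
running_mean[c(10, 100, 1000, 10000, 100000)]  # settles towards 0.23
```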
Definition
A (point) estimator, \(\hat{\theta}\), is any function \(g(X_1, ..., X_n)\) of a sample; that is, any (sample) statistic is a (point) estimator.
Example: suppose you are interested in estimating the average income in France. Denoting \(X\) the random variable of income in France, you are effectively interested in \(\mathbb{E}[X]\). What are the desirable properties of estimators of \(\mathbb{E}[X]\)?
\(\rightarrow\) we saw earlier that the sample mean \(\bar{X_n}\) is an unbiased estimator of the population mean.
While unbiasedness is desirable (on average, the estimator equals the true value), it does not guarantee our estimator will usually be close to the true value.
Definition
An estimator \(\hat{\theta}_n\) of \(\theta\) is consistent if it converges in probability to \(\theta\): for every \(\epsilon > 0\), \(\lim_{n \rightarrow \infty} P(|\hat{\theta}_n - \theta| < \epsilon) = 1\).
In words: if we have a large sample, our estimate \(\hat{\theta}\) is very likely close to the true \(\theta\).
\(\rightarrow\) we saw earlier that the WLLN states that the sample mean \(\bar{X_n}\) is a consistent estimator of the population mean.
The sample variance is:
\[ S_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X_n})^2 \]
The sample standard deviation is: \(S_n = \sqrt{S_n^2}\)
Question: Is the sample variance an unbiased estimator of the population variance? (Recall: an estimator \(\hat{\theta}\) is unbiased for \(\theta\) if \(\mathbb{E}[\hat{\theta}] = \theta\).)
Answer: No! Let’s prove it together.
Let \(X_1, ..., X_n\) be i.i.d. random variables with a finite expected value \(E[X_i] = \mu\) and \(\mathbb{V}(X_i) = \sigma^2 < \infty\).
Show that \(\mathbb{E}[S_n^2] = \mathbb{E}[X^2] - \mathbb{E}[\bar{X_n}^2]\).
Show that \(\mathbb{E}[X^2] = \sigma^2 + \mu^2\). Hint: recall the formula for the variance using expectations.
Show that \(\mathbb{E}[\bar{X_n}^2] = \frac{\sigma^2}{n} + \mu^2\). Hint: the formula for \(\mathbb{V}(\bar{X_n})\) will be useful.
Combine the above results to show that \(\mathbb{E}[S_n^2] \neq \sigma^2\).
Load this voter data. Use R to compute the variance of Pumpernickel voters: (1) manually (i.e., taking the sum of deviations from the mean), and (2) using the var command. What do you notice?
Read the sd command’s help file (it is simpler than var’s) to see if you can understand what’s going on.
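A sketch of what the exercise reveals, using a hypothetical stand-in vector since the dataset is linked from the slide (here x plays the role of a 0/1 Pumpernickel indicator):

```r
# Stand-in data (assumption): 0/1 indicator of voting for Pumpernickel
x <- c(rep(1, 23), rep(0, 77))
n <- length(x)

# (1) "Manual" variance: average squared deviation from the mean
var_manual <- sum((x - mean(x))^2) / n

# (2) R's var(): divides by (n - 1), not n
var_r <- var(x)

c(manual = var_manual, var = var_r)  # var() is slightly larger
```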
The bias problem is solved with Bessel’s correction. Instead of dividing by \(n\), we divide by \((n-1)\):
\[ S_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X_n})^2 \]
Now the estimator is unbiased for the population variance:
\[ \boxed{\mathbb{E}[S_n^2] = \sigma^2} \quad ✓ \]
Note: this unbiased estimator of the population variance is what is typically referred to as the sample variance. This is what the R command var (or sd) computes!
The red line consistently underestimates the true variance. This is bias!
The blue line stays centered on the true value. Bessel’s correction works!
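The simulation behind such a plot can be sketched as follows (the values of \(\sigma^2\), \(n\), the number of replications, and the seed are assumptions):

```r
set.seed(2024)  # assumption: any seed works
sigma2 <- 4; n <- 10

# 20,000 samples of size n; both variance estimators computed on each
draws <- replicate(20000, {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  c(biased   = sum((x - mean(x))^2) / n,  # divides by n
    unbiased = var(x))                    # divides by n - 1 (Bessel)
})

rowMeans(draws)  # biased ~ sigma2 * (n-1)/n = 3.6; unbiased ~ sigma2 = 4
```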
We now know how estimators behave across repeated samples: the sample mean is unbiased and consistent, with variance \(\sigma^2/n\).
But here’s the challenge: we usually have only ONE sample.
How do we know how far our single estimate is from the true population parameter?
Answer: The central limit theorem!
The miraculous result: For large samples, the sampling distribution of the sample mean is approximately normal, regardless of the underlying distribution!
Formally: Let \(X_1, ..., X_n\) be i.i.d. random variables with \(\mathbb{E}[X_i] = \mu\) and \(\mathbb{V}(X_i) = \sigma^2 > 0\). Then:
\[ \bar{X_n} \sim \mathcal{N}\left(\mu_{\bar{X_n}}, \sigma_{\bar{X_n}}^2 \right) = \mathcal{N} \left(\mu, \frac{\sigma^2}{n} \right) \quad \text{for large } n \text{ } (\geq 30) \]
We say that the sampling distribution of the sample mean follows a normal distribution with mean \(\mu\) and variance \(\frac{\sigma^2}{n}\) if \(n\) is large.
Or equivalently (standardized mean):
\[ Z_n = \sqrt{n}\frac{\bar{X_n} - \mu}{\sigma} \sim \mathcal{N}(0, 1) \]
The CLT holds regardless of how the original \(X_i\)’s are distributed (uniform, skewed, anything!)
It only requires \(n\) to be large enough (rule of thumb: \(n > 30\))
This lets us use the normal distribution for inference, even without knowing the population distribution
It’s the foundation of most statistical inference methods
See it? Regardless of the original population distribution (uniform, exponential, binomial), as \(n\) increases, the distribution of the sample mean becomes more and more normal! That’s the CLT!
The CLT can just as easily be applied to a sample proportion, \(\hat{p}\). Substitute \(\mu = p\) and \(\sigma^2 = p \cdot (1-p)\) and you obtain:
\[ Z_n = \sqrt{n}\frac{\hat{p} - p}{\sqrt{p \cdot (1-p)}} \sim \mathcal{N}(0, 1) \]
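A quick simulation check of this statement (a sketch; the values of p, n, and the seed are assumptions):

```r
set.seed(5)  # assumption: any seed works
p <- 0.23; n <- 200

# 10,000 sample proportions, each from n Bernoulli(p) draws
p_hat <- rbinom(10000, size = n, prob = p) / n

# Standardize as in the formula above
z <- sqrt(n) * (p_hat - p) / sqrt(p * (1 - p))

c(mean = mean(z), sd = sd(z))  # close to 0 and 1, as N(0, 1) predicts
```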
R and normal distributions
We can use the normal distribution to find percentiles or probabilities. Without computers, we use a normal probability table, which lists z-scores and associated percentiles. With R, the pnorm function gives us the probability associated with any cutoff on a normal curve. In the same way, we can get the z-score associated with a given probability using the qnorm function.
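For example (the cutoff values are chosen purely for illustration):

```r
pnorm(1.96)    # about 0.975: area to the left of z = 1.96
qnorm(0.975)   # about 1.96: pnorm and qnorm are inverses
pnorm(0.5, mean = 1, sd = 2)  # works for any normal, not just the standard one
```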
Use pnorm to confirm the figure on slide 65, i.e., within +/- 1 (2) standard deviation of the mean, the standard normal distribution contains 68% (95%) of observations.
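One way to check this:

```r
# Share of a standard normal within 1 and 2 standard deviations of the mean
pnorm(1) - pnorm(-1)   # about 0.68
pnorm(2) - pnorm(-2)   # about 0.95
```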
A region has household incomes with mean \(\mu\) = 58,000 euros and standard deviation \(\sigma\) = 15,000 euros. An economist surveys 100 randomly selected households. What is the probability that the sample mean income is less than 55,000 euros?
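A worked answer, using the CLT result that \(\bar{X_n} \sim \mathcal{N}(\mu, \sigma^2/n)\):

```r
mu <- 58000; sigma <- 15000; n <- 100

# Standard error of the sample mean
se <- sigma / sqrt(n)      # 1500

# z-score of 55,000 and the corresponding probability
z <- (55000 - mu) / se     # -2
pnorm(z)                   # about 0.023

# Equivalently, without standardizing by hand
pnorm(55000, mean = mu, sd = se)
```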
If \(n\) is small and/or \(\sigma\) is unknown, the standardized sample mean follows a Student \(t\) distribution instead of normal.
\[ T = \frac{\bar{X_n} - \mu}{S_n/\sqrt{n}} \sim t_{n-1} \] where \(n-1\) is the number of degrees of freedom.
The \(t\) distribution has heavier tails than normal (more uncertainty), which makes sense for small samples.
When \(n \to \infty\), the \(t\) distribution converges to the normal distribution.
Key insight: For small samples (small \(df\)), the \(t\)-distribution has heavier tails than normal, meaning more extreme values are more likely. This gives wider confidence intervals and more conservative tests—appropriate when we have less information!
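This can be seen directly by comparing quantiles with qt() and qnorm() (the degrees-of-freedom values are chosen for illustration):

```r
# 97.5th percentile cutoffs: t distribution vs standard normal
qt(0.975, df = 5)     # about 2.57: heavier tails, wider intervals
qt(0.975, df = 30)    # about 2.04
qt(0.975, df = 1000)  # about 1.96: close to the normal cutoff
qnorm(0.975)          # 1.96
```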
The Student \(t\) distribution was developed by W. S. Gosset in 1908.
Gosset worked for the Guinness brewery, which would not permit him to publish research in his own name. So he used the pseudonym “Student.”
Thus, the distribution is known as the Student \(t\) distribution—named after a pseudonym!
The \(t\) distribution is crucial for small-sample inference and remains one of the most widely used distributions in statistics.
Lecture 3: Sampling and Inferential Statistics