Properties of the Sample Average as Estimator for the Mean

Bernoulli Distribution

Topic of the module

Understand the effect of increasing the sample size \(n\) on the sampling distribution of the sample average as an estimator for the mean.


Data generating process (DGP)

Suppose \(n\) i.i.d. observations are drawn from the Bernoulli distribution,

$$ \begin{align} Y_i &\sim \text{Bernoulli}\left(p\right), \end{align} $$

where \(p\) is the parameter representing the probability of success of the Bernoulli distribution.


Estimator and parameter of interest

We are interested in the sampling properties of the sample average \(\overline{Y}\) given by,

$$ \begin{align} \overline{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_{i}, \end{align} $$

as an estimator for the mean \(\mu=\text{E}\left(Y_{i}\right)\) of the Bernoulli distribution given by,

$$ \begin{align} \mu=p, \end{align} $$

where \(p\) is the probability of success of the Bernoulli distribution.
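The estimator can be tried out on a single simulated sample. The following is a minimal sketch in Python using only the standard library. The function names `draw_bernoulli_sample` and `sample_average`, the seed, and the values \(n=1000\) and \(p=0.3\) are illustrative assumptions and are not taken from the module itself.

```python
import random

def draw_bernoulli_sample(n, p, rng):
    """Draw n i.i.d. Bernoulli(p) observations as a list of 0s and 1s."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sample_average(ys):
    """Sample average: (1/n) times the sum of the observations."""
    return sum(ys) / len(ys)

rng = random.Random(42)  # seed chosen arbitrarily, only for reproducibility
ys = draw_bernoulli_sample(n=1000, p=0.3, rng=rng)
y_bar = sample_average(ys)
print(f"sample average = {y_bar:.3f}, mean mu = p = 0.3")
```

Because every \(Y_i\) is either 0 or 1, the sample average is simply the share of ones in the sample.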

Illustration

Change the parameters and see the effect on the properties of the sample average \(\overline{Y}\) as an estimator for \(\mu\).

Parameters


Sample size \(n\)


Probability of success \(p\)


Bar chart

The bar chart shows the number of ones and zeros of one realization of the DGP.

Histogram of the sample average \(\overline{Y}\)
Consistency:

As the sample size \(n\) grows, the sample average \(\overline{Y}\) converges in probability to \(\mu\), i.e.,

$$ \begin{align} \overline{Y} \overset{p}{\to} \mu. \end{align} $$
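Convergence in probability means that, for any fixed tolerance \(\varepsilon > 0\), the probability that \(\overline{Y}\) lies within \(\varepsilon\) of \(\mu\) approaches one as \(n\) grows. The sketch below estimates this probability by simulation for a few sample sizes. The helper `share_within`, the tolerance of 0.05, the number of replications, and the seed are assumptions made for illustration.

```python
import random

def share_within(n, p, eps, reps, rng):
    """Share of simulated sample averages with |Y_bar - p| <= eps."""
    hits = 0
    for _ in range(reps):
        # One realization: n Bernoulli(p) draws, averaged.
        y_bar = sum(rng.random() < p for _ in range(n)) / n
        if abs(y_bar - p) <= eps:
            hits += 1
    return hits / reps

rng = random.Random(0)  # arbitrary seed for reproducibility
p, eps, reps = 0.3, 0.05, 1000
shares = {n: share_within(n, p, eps, reps, rng) for n in (10, 100, 1000)}
for n, share in shares.items():
    print(f"n = {n:5d}: share within {eps} of mu = {share:.3f}")
```

The printed shares should increase with \(n\), which is the pattern the histogram of the sample average displays as it narrows around \(\mu\).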
Histogram of the standardized sample average \(z_{\overline{Y}}\)
Asymptotic Normality:

As the sample size \(n\) grows the distribution of the standardized sample average,

$$ \begin{align} z_{\overline{Y}} &= \frac{\overline{Y} - \mu}{\sigma_{\overline{Y}}}, \\ z_{\overline{Y}} &= \frac{\overline{Y} - \mu}{\frac{\sigma}{\sqrt{n}}}, \end{align} $$

gets closer to the standard normal distribution \(N\left(0, 1\right)\).
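As a small worked example of the standardization, the sketch below computes \(z_{\overline{Y}}\) for a given sample average, using \(\mu=p\) and the Bernoulli variance \(\sigma^{2}=p\left(1-p\right)\). The function name `standardized_average` and the input values are hypothetical.

```python
import math

def standardized_average(y_bar, n, p):
    """z = (Y_bar - mu) / (sigma / sqrt(n)) with mu = p and sigma^2 = p(1 - p)."""
    mu = p
    sigma = math.sqrt(p * (1 - p))
    return (y_bar - mu) / (sigma / math.sqrt(n))

# With p = 0.5 and n = 100 the standard deviation of the sample average is
# 0.5 / 10 = 0.05, so a sample average of 0.55 lies about one standard
# deviation above the mean.
z = standardized_average(y_bar=0.55, n=100, p=0.5)
print(z)
```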

More Details

For the construction of the standardized sample average \(z_{\overline{Y}}\), the mean \(\mu\) as well as the variance \(\sigma^{2}\) are used.

For the Bernoulli distribution the mean is given by,

$$ \begin{align} \mu=p, \end{align} $$

and the variance is given by,

$$ \begin{align} \sigma^{2}=p\left(1-p\right), \end{align} $$

where \(p\) is the probability of success of the Bernoulli distribution.
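As a short check of these two expressions (a standard derivation, added here for reference): since \(Y_{i}\) only takes the values 0 and 1, \(Y_{i}^{2}=Y_{i}\), so

$$ \begin{align} \text{E}\left(Y_{i}\right) &= 1 \cdot p + 0 \cdot \left(1-p\right) = p, \\ \text{Var}\left(Y_{i}\right) &= \text{E}\left(Y_{i}^{2}\right) - \text{E}\left(Y_{i}\right)^{2} = p - p^{2} = p\left(1-p\right). \end{align} $$

For i.i.d. observations the variance of the sample average is therefore \(\sigma^{2}/n = p\left(1-p\right)/n\), which gives the denominator \(\sigma/\sqrt{n}\) used in the standardization.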

  1. A realization of the DGP specified above is simulated, i.e., an i.i.d. sequence of realizations \(Y_{1}, Y_{2}, ..., Y_{n}\) is drawn from the distribution of \(Y_{i}\) above.
  2. Based on the sequence of observations \(Y_{1}, Y_{2}, ..., Y_{n}\), the sample average and standardized sample average are calculated.
  3. The values of the sample average \(\overline{Y}\) and the standardized sample average \(z_{\overline{Y}}\) are stored.
  4. Steps 1 to 3 are repeated \(10,\!000\) times, resulting in \(10,\!000\) sample averages and standardized sample averages.
  5. The distributions of the sample averages and standardized sample averages are illustrated using histograms.
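The five steps above can be sketched in Python with the standard library. This is an illustrative reconstruction under stated assumptions, not the module's own implementation: the function names `simulate` and `text_histogram`, the seed, the values \(n=100\) and \(p=0.3\), and the text-based histogram are all choices made here.

```python
import math
import random
import statistics

def simulate(n, p, reps, rng):
    """Steps 1 to 4: return sample averages and standardized sample averages."""
    mu = p
    se = math.sqrt(p * (1 - p) / n)  # sigma / sqrt(n)
    averages, z_values = [], []
    for _ in range(reps):
        ys = [1 if rng.random() < p else 0 for _ in range(n)]  # step 1
        y_bar = sum(ys) / n                                     # step 2
        averages.append(y_bar)                                  # step 3
        z_values.append((y_bar - mu) / se)
    return averages, z_values

def text_histogram(values, lo, hi, bins, width=40):
    """Step 5: crude text histogram of the values that fall in [lo, hi)."""
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            index = min(int((v - lo) / (hi - lo) * bins), bins - 1)
            counts[index] += 1
    peak = max(counts) or 1  # avoid division by zero if nothing falls in range
    lines = []
    for i, c in enumerate(counts):
        left = lo + i * (hi - lo) / bins  # left edge of bin i
        lines.append(f"{left:6.2f} | " + "#" * round(c / peak * width))
    return "\n".join(lines)

rng = random.Random(1)  # arbitrary seed for reproducibility
averages, z_values = simulate(n=100, p=0.3, reps=10_000, rng=rng)
print(text_histogram(z_values, lo=-4.0, hi=4.0, bins=16))
print("mean of sample averages:", round(statistics.fmean(averages), 3))
print("std. dev. of standardized averages:", round(statistics.pstdev(z_values), 3))
```

The mean of the simulated sample averages should be close to \(p\), and the standard deviation of the standardized sample averages should be close to one, in line with the standardization by \(\sigma/\sqrt{n}\).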
The figure shows a bar plot of the number of zeros and ones for one particular realization of the DGP.
Increasing the sample size increases the total number of zeros and ones in one particular realization of the DGP.
Increasing the probability of success increases the expected number of ones in one particular realization of the DGP.
The figure shows:
The histogram of the sample averages across all realizations of the DGP.
The red vertical dashed line represents the sample average for one particular realization of the DGP. The green vertical dashed line represents the mean, i.e., the probability of success, of the underlying DGP.
As the sample size increases, the sample averages concentrate more tightly around the mean, i.e., the probability of success, used to generate the data.
This is a result of the law of large numbers.
Changing the probability of success, i.e., changing the mean of the underlying DGP, shifts the histogram.
The figure shows:
The histogram of the standardized sample averages across all realizations of the DGP.
The red vertical dashed line represents the standardized sample average for one particular realization of the DGP. The green dashed curve represents the pdf of the standard normal distribution.
As the sample size increases, the sampling distribution of the standardized sample average gets closer to the standard normal distribution, whose pdf is illustrated by the green dashed curve.
This is a result of the central limit theorem.
Changing the probability of success does not affect the limiting distribution given by the central limit theorem, although for \(p\) close to 0 or 1 a larger sample size is needed before the normal approximation becomes accurate.

This module is part of the DeLLFi project of the University of Hohenheim and is funded by the Foundation for Innovation in University Teaching.

Creative Commons License