Interactive and Animated
Illustrations
of Sampling Properties of Estimators

Content

This learning module contains interactive and animated illustrations of the sampling properties of (parametric) estimators for parametric statistical models.

The starting point is the specification of the data-generating process (DGP).

To understand the sampling properties of an estimator, realizations of data are simulated based on an underlying DGP of interest, and the value of a specific estimator is calculated.

This procedure is repeated a large number of times: the value of the estimator is stored for each realization of the data, and the distribution of the stored values is analyzed across simulations.
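The simulation procedure above can be sketched in a few lines. This is a minimal illustration, not code from the modules; the function names, the Bernoulli DGP, and all parameter values are arbitrary choices for the example.

```python
import numpy as np

def simulate_sampling_distribution(estimator, dgp, n, reps=10_000, seed=0):
    """Draw `reps` data sets of size `n` from `dgp` and apply `estimator` to each."""
    rng = np.random.default_rng(seed)
    return np.array([estimator(dgp(rng, n)) for _ in range(reps)])

# Example DGP: Bernoulli trials with success probability p = 0.3;
# example estimator: the sample average.
draws = simulate_sampling_distribution(
    estimator=np.mean,
    dgp=lambda rng, n: rng.binomial(1, 0.3, size=n),
    n=100,
)
print(draws.mean())  # close to the population mean p = 0.3
```

The array `draws` is an approximation of the sampling distribution of the estimator; its mean, spread, and shape can then be inspected across simulations.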


Motivation

Interactive and animated illustrations based on simulation studies to understand:

  • The effect of the sample size \(n\) on the sampling properties of estimators.
  • The role of the law of large numbers (LLN) and the central limit theorem (CLT) for the sampling properties of estimators.
  • The sampling properties of the sample average and the ordinary least squares (OLS) estimator.
  • The effect of heteroskedasticity on the sampling distribution of the OLS estimator in a linear regression model.
  • The relevance of assumptions for causal inference on the sampling distribution of the OLS estimator in a linear regression model.



Approach

The course is divided into individual topics, each addressed in its own module.

Each topic, i.e., each module, addresses a specific learning goal, e.g., to understand the effect of increasing the sample size \(n\) on the sampling properties of the sample average.

To reach these learning goals the modules contain interactive and animated illustrations based on simulation studies.

The learning modules can be studied in the given order or independently of each other.

Properties of the Sample Average

Illustration of the properties of the sample average as an estimator of the mean of a random variable.

Module 1) Probability of Success of a Bernoulli Experiment

Motivation: Understand the effect of increasing the sample size \(n\) on the sampling distribution of the sample average as an estimator for the mean, i.e., the probability of success \(p\), of a Bernoulli experiment.

Results: As the sample size \(n\) increases, the sample average gets closer to the population mean, and the sampling distribution of the standardized sample average approaches the standard normal distribution.
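This Bernoulli result can be checked directly by simulation. The sketch below is illustrative only; the seed, \(p\), \(n\), and the number of replications are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.3, 500, 20_000

# Sample averages across many simulated Bernoulli samples of size n
p_hat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)

# Standardize: subtract the population mean and divide by the standard
# deviation of the sample average, sqrt(p(1 - p)/n)
z = (p_hat - p) / np.sqrt(p * (1 - p) / n)

# By the CLT, z is approximately standard normal: mean near 0, std near 1
print(z.mean(), z.std())
```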

Access: Link to Module

Module 2) Mean of the Continuous Uniform Distribution

Motivation: Understand the effect of increasing the sample size \(n\) on the sampling distribution of the sample average as an estimator for the mean of a random variable following a continuous uniform distribution.

Results: As the sample size \(n\) increases, the sample average gets closer to the population mean, and the sampling distribution of the standardized sample average approaches the standard normal distribution.

Access: Link to Module

Properties of the OLS Estimator in a Regression Model

Illustration of the properties of the OLS estimator of the slope coefficient \(\beta_{1}\) in the linear regression model, i.e.,

$$ \begin{align} Y_{i} = \beta_{0} + \beta_{1} X_{i} + u_{i} \end{align} $$

Module 3) Effect of the sample size

Motivation: Understand the concept of the sampling distribution of the OLS estimator and the effect of increasing the sample size \(n\) on the sampling distribution of the OLS estimator for the slope parameter of a linear regression model.

Results: As the sample size \(n\) increases, the OLS estimator of the slope parameter gets closer to the population slope parameter, and the sampling distribution of the standardized OLS estimator of the slope parameter approaches the standard normal distribution.
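The shrinking spread of the OLS sampling distribution can be seen by simulating it for two sample sizes. All names and parameter values below are illustrative assumptions, not taken from the module.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1 = 1.0, 2.0  # population intercept and slope (arbitrary)

def ols_slope(rng, n):
    X = rng.normal(0, 1, n)
    u = rng.normal(0, 1, n)
    Y = beta0 + beta1 * X + u
    x = X - X.mean()
    # OLS slope: sample covariance of X and Y over sample variance of X
    return (x * (Y - Y.mean())).sum() / (x ** 2).sum()

# Spread of the simulated sampling distribution shrinks as n grows
spreads = {}
for n in (25, 400):
    draws = np.array([ols_slope(rng, n) for _ in range(5_000)])
    spreads[n] = draws.std()
    print(n, round(draws.mean(), 2), round(spreads[n], 3))
```

For both sample sizes the mean of the simulated estimates is close to the population slope \(\beta_{1} = 2\), while the standard deviation is markedly smaller for the larger sample.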

Access: Link to Module

Module 4) Effect of the parameterization

Motivation: Understand the effects of (1) increasing the sample size \(n\), (2) changing the value of the slope coefficient \(\beta_{1}\), (3) changing the variance of \(u_{i}\), i.e., \(\sigma_{u}^{2}\), and (4) changing the variance of \(X_{i}\), i.e., \(\sigma_{X}^{2}\), on the sampling distribution of the OLS estimator for the slope coefficient of a simple linear regression model.

Results:

(1) A larger sample size \(n\) results in a smaller spread of the OLS estimates, (2) a higher variance of \(u_{i}\) results in a wider spread of the OLS estimates, and (3) a higher variance of \(X_{i}\) results in a narrower spread of the OLS estimates.

Thus, to conduct hypothesis tests for the slope coefficient, the OLS estimator has to be standardized with respect to (1) the sample size, (2) the underlying (hypothetical) slope coefficient, (3) the spread of \(u_{i}\), and (4) the spread of \(X_{i}\), i.e.,

$$ \begin{align} z_{\widehat{\beta}_1} &= \frac{\widehat{\beta}_1 - \textcolor{blue}{\beta_1}}{\sigma_{\widehat{\beta}_{1}}}, \end{align} $$

with,

$$ \begin{align} \widetilde{\sigma}_{\widehat{\beta}_{1}}^{2} = \frac{\textcolor{red}{\frac{1}{n-2}}\textcolor{green}{\sum_{i=1}^{n}\widehat{u}_{i}^{2}}}{\textcolor{purple}{\sum_{i=1}^{n}\left(X_{i} - \overline{X}\right)^{2}}}, \end{align} $$

based on the so-called homoskedasticity-only estimator of \(\sigma_{\widehat{\beta}_{1}}^{2}\).
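The homoskedasticity-only variance formula above translates directly into code. The DGP and all parameter values in this sketch are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = rng.normal(0, 1, n)
u = rng.normal(0, 1, n)
Y = 1.0 + 2.0 * X + u  # arbitrary population coefficients

# OLS fit of the simple linear regression
x = X - X.mean()
b1 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()
resid = Y - b0 - b1 * X

# Homoskedasticity-only variance estimator of the slope:
# (1/(n-2)) * sum of squared residuals, divided by sum of (X_i - Xbar)^2
var_b1 = (resid ** 2).sum() / (n - 2) / (x ** 2).sum()
se_b1 = np.sqrt(var_b1)
print(round(se_b1, 3))
```

With \(\sigma_{u} = \sigma_{X} = 1\), the true standard deviation of the slope estimator is approximately \(1/\sqrt{n} \approx 0.07\), which the estimate recovers.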

Access: Link to Module

Module 5) Effect of heteroskedasticity

Motivation: Understand the effect of heteroskedasticity on the sampling distribution of the OLS estimator for the slope coefficient of a simple linear regression model.

Results: To account for potential heteroskedasticity, the heteroskedasticity-robust estimator of the standard errors of the OLS estimator should be used.
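The difference between the two standard-error estimators can be illustrated under a heteroskedastic DGP. This sketch uses an Eicker-Huber-White (HC1-type) robust estimator as one concrete choice; the DGP and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
X = rng.normal(0, 1, n)
# Heteroskedastic errors: the error variance grows with |X|
u = rng.normal(0, 1, n) * (0.5 + np.abs(X))
Y = 1.0 + 2.0 * X + u

x = X - X.mean()
b1 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()
resid = Y - b0 - b1 * X

# Homoskedasticity-only SE (formula from Module 4)
se_homo = np.sqrt((resid ** 2).sum() / (n - 2) / (x ** 2).sum())

# Heteroskedasticity-robust (HC1-type) SE for the slope
se_robust = np.sqrt(n / (n - 2) * (x ** 2 * resid ** 2).sum() / (x ** 2).sum() ** 2)

print(round(se_homo, 3), round(se_robust, 3))
```

Under this DGP the error variance increases with \(X\), so the homoskedasticity-only formula understates the true standard error while the robust estimator corrects for it.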

Access: Link to Module

Module 6) Effect of Omitted Variables

Motivation: Understand the effect of omitted variables on the sampling distribution of the OLS estimator for the slope coefficient of a simple linear regression model.

Results: If an omitted variable, e.g., \(X_{2}\), is a relevant determinant of \(Y_{i}\) and correlated with the variable of interest, e.g., \(X_{1}\), the OLS estimator is biased.
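The omitted-variable bias can be quantified by simulation. In the sketch below, with \(\operatorname{Corr}(X_{1}, X_{2}) = \rho\) and unit variances, the short regression of \(Y\) on \(X_{1}\) alone converges to \(\beta_{1} + \rho \beta_{2}\) rather than \(\beta_{1}\); the coefficients, \(\rho\), and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 200, 2_000
beta1, beta2 = 1.0, 1.0
rho = 0.8  # correlation between X1 and the omitted variable X2

slopes = []
for _ in range(reps):
    X1 = rng.normal(0, 1, n)
    # X2 correlated with X1 (unit variance, correlation rho)
    X2 = rho * X1 + np.sqrt(1 - rho ** 2) * rng.normal(0, 1, n)
    Y = beta1 * X1 + beta2 * X2 + rng.normal(0, 1, n)
    # Short regression: Y on X1 only, omitting X2
    x = X1 - X1.mean()
    slopes.append((x * (Y - Y.mean())).sum() / (x ** 2).sum())

# Mean of the short-regression slope is near beta1 + rho * beta2 = 1.8,
# not the causal coefficient beta1 = 1.0
print(round(np.mean(slopes), 2))
```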

Access: Link to Module


This module is part of the DeLLFi project of the University of Hohenheim and is funded by the
Foundation for Innovation in University Teaching

Creative Commons License