Linear Regression Model and the OLS Estimator
Effect of the Sample Size
Topic of the module
Understand the procedure to simulate the sampling distribution of the OLS estimator for the slope parameter of a linear regression model.
A realization of the DGP (see below) is simulated and the value of the estimator (see below), i.e., an estimate, is calculated.
This exercise is repeated \(10,\!000\) times, resulting in \(10,\!000\) values of the estimator, i.e., \(10,\!000\) estimates. The distribution of these \(10,\!000\) estimates is the sampling distribution of the estimator.
Based on the simulation understand the effect of increasing the sample size \(n\) on the sampling distribution of the OLS estimator.
Data generating process (DGP)
Consider \(n\) observations generated from the simple regression model,
$$ \begin{align} Y_i = \beta_{0} + \beta_{1} X_{i} + u_{i}, \end{align} $$where \(\beta_{0}=0\) is the intercept and \(\beta_{1}=1\) is the slope parameter.
Furthermore, \(X_{i}\) and \(u_{i}\) are i.i.d. normally distributed, i.e.,
$$ \begin{align} X_{i} \sim N\left(0, \sigma_{X}^{2}\right), \;\;\;\;\; u_{i} \sim N\left(0, \sigma_{u}^{2}\right), \end{align} $$where \(\sigma_{X}=5\) and \(\sigma_{u}=5\) are the standard deviations of \(X_{i}\) and \(u_{i}\), respectively.
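As an illustration, a single realization of this DGP can be simulated, e.g., in Python with NumPy; the parameter values are the ones stated above, while the sample size \(n=100\) and the seed are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # seed fixed only for reproducibility

# Parameters of the DGP as stated above
beta_0, beta_1 = 0.0, 1.0
sigma_X, sigma_u = 5.0, 5.0
n = 100                               # sample size (illustrative choice)

# One realization of the DGP
X = rng.normal(loc=0.0, scale=sigma_X, size=n)
u = rng.normal(loc=0.0, scale=sigma_u, size=n)
Y = beta_0 + beta_1 * X + u
```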
Estimator and parameter of interest
We are interested in the sampling properties of the OLS estimator \(\widehat{\beta}_{1}\) given by,
$$ \begin{align} \widehat{\beta}_{1} = \frac{\sum_{i=1}^{n}\left(X_{i} - \overline{X}\right)\left(Y_{i} - \overline{Y}\right)}{\sum_{i=1}^{n}\left(X_{i} - \overline{X}\right)^{2}}, \end{align} $$as an estimator for the slope parameter \(\beta_{1}\) of the regression model above.
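A minimal sketch of this estimator in Python (the function name is illustrative, not part of the module):

```python
import numpy as np

def ols_slope(X, Y):
    """OLS estimator of the slope parameter, following the formula above."""
    X_dev = X - X.mean()
    return np.sum(X_dev * (Y - Y.mean())) / np.sum(X_dev**2)

# Applied to the realization (X, Y) from the previous sketch,
# ols_slope(X, Y) returns one estimate of beta_1.
```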
Illustration
Change the parameters and see the effect on the properties of the OLS estimator \(\widehat{\beta}_{1}\) as an estimator for \(\beta_{1}\).
Parameters
Realization of the DGP \(r\)
Sample size \(n\)
Scatter plot (realizations)
The red fitted regression line is based on the regression of,
$$ \begin{align} Y_{i} \;\;\;\;\; \text{on} \;\;\;\;\; X_{i}. \end{align} $$The scatter plot and the fitted regression line represent the result for one realization of the DGP. The shaded area illustrates the range of all fitted regression lines across all \(10,\!000\) realizations of the DGP.
Scatter plot (fitted residuals)
The fitted OLS residuals are
$$ \begin{align} \widehat{u}_{i} = Y_{i} - \widehat{\beta}_{0} - \widehat{\beta}_{1} X_{i}, \end{align} $$and are illustrated for one realization of the DGP, where \(\widehat{\beta}_{0}\) and \(\widehat{\beta}_{1}\) are the respective OLS estimates.
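A sketch of the residual computation for one realization, assuming the regression of \(Y_{i}\) on a constant and \(X_{i}\) as above (names are illustrative):

```python
import numpy as np

def fitted_residuals(X, Y):
    """Fitted OLS residuals of a regression of Y on a constant and X."""
    X_dev = X - X.mean()
    b1 = np.sum(X_dev * (Y - Y.mean())) / np.sum(X_dev**2)   # slope estimate
    b0 = Y.mean() - b1 * X.mean()                            # intercept estimate
    return Y - b0 - b1 * X
```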
Histogram of the OLS estimates \(\widehat{\beta}_{1}\)
As the sample size \(n\) grows, the OLS estimator \(\widehat{\beta}_{1}\) gets closer to \(\beta_{1}\),
$$ \begin{align} \widehat{\beta}_{1} \overset{p}{\to} \beta_{1}. \end{align} $$Note, to conduct hypothesis tests we need a sampling distribution which is stable across \(n\). For this, we have to standardize or scale the estimate.
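The simulation procedure described under "Topic of the module" and the effect of \(n\) can be reproduced with a short Monte Carlo sketch; the sample sizes below and the function name are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
beta_0, beta_1, sigma_X, sigma_u = 0.0, 1.0, 5.0, 5.0
R = 10_000                                    # number of realizations of the DGP

def simulate_estimates(n):
    """R OLS slope estimates, one per realization of the DGP."""
    estimates = np.empty(R)
    for r in range(R):
        X = rng.normal(0.0, sigma_X, size=n)
        u = rng.normal(0.0, sigma_u, size=n)
        Y = beta_0 + beta_1 * X + u
        X_dev = X - X.mean()
        estimates[r] = np.sum(X_dev * (Y - Y.mean())) / np.sum(X_dev**2)
    return estimates

# Mean and spread of the sampling distribution for increasing n
for n in (25, 100, 400):                      # illustrative sample sizes
    est = simulate_estimates(n)
    print(n, round(est.mean(), 4), round(est.std(ddof=1), 4))
```

The histogram in this panel corresponds to the distribution of the \(10,\!000\) estimates for one fixed \(n\); across the three sample sizes the mean stays close to \(\beta_{1}=1\) while the spread of the estimates shrinks, which is the consistency statement above.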
Histogram of the standardized OLS estimates \(z_{\widehat{\beta}_{1}}\)
As the sample size \(n\) grows, the distribution of the standardized OLS estimator,
$$ \begin{align} z_{\widehat{\beta}_1} &= \frac{\widehat{\beta}_1 - \beta_1}{\sigma_{\widehat{\beta}_{1}}}, \end{align} $$gets closer to the standard normal distribution \(N\left(0, 1\right)\).
Note, the sampling distribution of the standardized estimates is stable across \(n\). Thus, it can be used to conduct hypothesis tests.
More Details
For the construction of the standardized OLS estimator \(z_{\widehat{\beta}_{1}}\), the variance of \(\widehat{\beta}_{1}\), i.e., \(\sigma_{\widehat{\beta}_{1}}^{2}\), has to be estimated.
The variance of \(\widehat{\beta}_1\), i.e., \(\sigma_{\widehat{\beta}_{1}}^{2}\), can be robustly estimated by,
$$ \begin{align} \widehat{\sigma}_{\widehat{\beta}_{1}}^{2} = \frac{1}{n} \times \frac{\frac{1}{n-2}\sum_{i=1}^{n}\left(X_{i} - \overline{X}\right)^{2}\widehat{u}_{i}^{2}}{\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_{i} - \overline{X}\right)^{2}\right]^{2}}, \end{align} $$where \(\widehat{u}_{i}\) are the residuals of the estimated regression line.
Note, the estimator for \(\sigma_{\widehat{\beta}_{1}}^{2}\) above is robust w.r.t. heteroskedasticity, i.e., it does not rely on the assumption of homoskedasticity.
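A sketch of this robust variance estimator and of the standardized estimate \(z_{\widehat{\beta}_{1}}\) for one realization (illustrative code, not the module's implementation):

```python
import numpy as np

def slope_and_robust_se(X, Y):
    """OLS slope estimate and the square root of the robust variance
    estimator given above (its standard error)."""
    n = X.size
    X_dev = X - X.mean()
    b1 = np.sum(X_dev * (Y - Y.mean())) / np.sum(X_dev**2)   # slope estimate
    b0 = Y.mean() - b1 * X.mean()                            # intercept estimate
    u_hat = Y - b0 - b1 * X                                   # fitted residuals
    var_b1 = ((np.sum(X_dev**2 * u_hat**2) / (n - 2))
              / (np.sum(X_dev**2) / n) ** 2 / n)
    return b1, np.sqrt(var_b1)

# Standardized estimate for one realization (X, Y), with beta_1 = 1 as in the DGP:
# b1, se = slope_and_robust_se(X, Y)
# z = (b1 - 1.0) / se
```

For the simple regression with an intercept, this scaling corresponds to what is often labeled HC1 in statistical software.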
In contrast, some statistical software reports estimates of \(\sigma_{\widehat{\beta}_{1}}^{2}\) based on the assumption of homoskedasticity.
The so-called homoskedasticity-only estimator of \(\sigma_{\widehat{\beta}_{1}}^{2}\) is given by,
$$ \begin{align} \widetilde{\sigma}_{\widehat{\beta}_{1}}^{2} = \frac{\frac{1}{n-2}\sum_{i=1}^{n}\widehat{u}_{i}^{2}}{\sum_{i=1}^{n}\left(X_{i} - \overline{X}\right)^{2}}. \end{align} $$
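For comparison, a sketch of the homoskedasticity-only estimator (illustrative code, not the module's implementation):

```python
import numpy as np

def homoskedasticity_only_variance(X, Y):
    """Homoskedasticity-only estimate of the variance of the OLS slope
    estimator (formula above)."""
    n = X.size
    X_dev = X - X.mean()
    b1 = np.sum(X_dev * (Y - Y.mean())) / np.sum(X_dev**2)   # slope estimate
    b0 = Y.mean() - b1 * X.mean()                            # intercept estimate
    u_hat = Y - b0 - b1 * X                                   # fitted residuals
    return (np.sum(u_hat**2) / (n - 2)) / np.sum(X_dev**2)
```

Under the DGP above the errors are in fact homoskedastic, since \(u_{i}\) is drawn independently of \(X_{i}\) with constant variance, so both estimators are valid here; the robust estimator simply remains valid when the conditional variance of \(u_{i}\) depends on \(X_{i}\).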
This module is part of the DeLLFi project of the University of Hohenheim and funded by [funder logo].