Asymptotics and consistency

EC 421, Set 6

Prologue

Schedule

Last Time

Living with heteroskedasticity

Today

Asymptotics and consistency

Next

  • New problem set
  • Next topic: Time series
  • Midterm on Feb. 12

R showcase

Need speed? R allows essentially infinite parallelization.

Three popular packages:

And here’s a nice tutorial.

Consistency

Consistency

Welcome to asymptopia

Previously: We examined estimators (e.g., \(\hat{\beta}_j\)) and their properties using

  1. The mean of the estimator’s distribution: \(\mathop{\boldsymbol{E}}\left[ \hat{\beta}_j \right] = ?\)
  2. The variance of the estimator’s distribution: \(\mathop{\text{Var}} \left( \hat{\beta}_j \right) = ?\)

which tell us about the tendency of the estimator if we took ∞ samples, each with sample size \(\textcolor{#e64173}{n}\).

This approach misses something.

Consistency

Welcome to asymptopia

New question:
How does our estimator behave as our sample gets larger (as \(n\rightarrow\infty\))?

This new question forms a new way to think about the properties of estimators: asymptotic properties (or large-sample properties).

A “good” estimator will become indistinguishable from the parameter it estimates when \(n\) is very large (close to \(\infty\)).

Consistency

Probability limits

Just as the expected value helped us characterize the finite-sample distribution of an estimator with sample size \(n\),

the probability limit helps us analyze the asymptotic distribution of an estimator (the distribution of the estimator as \(n\) gets “big”).


Consistency

Probability limits

Let \(B_n\) be our estimator with sample size \(n\).

Then the probability limit of \(B_n\) is \(\alpha\) if

\[ \lim_{n\rightarrow\infty} \mathop{P}\left( \left| B_n - \alpha \right| > \epsilon \right) = 0 \tag{1} \]

for any \(\epsilon > 0\).

The definition in \((1)\) essentially says that as the sample size approaches infinity, the probability that \(B_n\) differs from \(\alpha\) by more than a very small number \((\epsilon)\) is zero.

Practically: \(B\)’s distribution collapses to a spike at \(\alpha\) as \(n\) approaches \(\infty\).
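A quick simulation sketch (assumed setup, not from the slides) makes the “collapse to a spike” concrete: for the sample mean of \(N(\mu,\,1)\) draws, the probability of landing more than \(\epsilon\) away from \(\mu\) shrinks toward zero as \(n\) grows.

```python
# Monte Carlo estimate of P(|sample mean - mu| > eps) at several n.
# Illustrative values: mu = 5, eps = 0.1.
import numpy as np

rng = np.random.default_rng(421)
mu, eps, reps = 5.0, 0.1, 1_000

def prob_outside(n):
    """Estimate P(|mean of n draws - mu| > eps) from `reps` repetitions."""
    means = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)
    return np.mean(np.abs(means - mu) > eps)

probs = {n: prob_outside(n) for n in (10, 100, 1_000, 10_000)}
print(probs)  # the probabilities shrink toward zero as n grows
```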

Consistency

Probability limits

Equivalent statements:

  • The probability limit of \(B_n\) is \(\alpha\).

  • \(\text{plim}\: B_n = \alpha\)

  • \(B_n\) converges in probability to \(\alpha\).

Consistency

Probability limits

Probability limits have some nice/important properties:

  • \(\mathop{\text{plim}}\left( X \times Y \right) = \mathop{\text{plim}}\left( X \right) \times \mathop{\text{plim}}\left( Y \right)\)

  • \(\mathop{\text{plim}}\left( X + Y \right) = \mathop{\text{plim}}\left( X \right) + \mathop{\text{plim}}\left( Y \right)\)

  • \(\mathop{\text{plim}}\left( c \right) = c\), where \(c\) is a constant

  • \(\mathop{\text{plim}}\left( \dfrac{X}{Y} \right) = \dfrac{\mathop{\text{plim}}\left( X \right)}{ \mathop{\text{plim}}\left( Y \right)}\), provided \(\mathop{\text{plim}}\left( Y \right) \neq 0\)

  • \(\mathop{\text{plim}}\!\big( f(X) \big) = \mathop{f}\!\big(\mathop{\text{plim}}\left( X \right)\big)\) for continuous \(f\)

Consistency

Consistent estimators

We say that an estimator is consistent if

  1. The estimator has a prob. limit (its distribution collapses to a spike).

  2. This spike is located at the parameter being estimated.

In other words…

An estimator is consistent if its asymptotic distribution collapses to a spike located at the estimated parameter.

In math: The estimator \(B\) is consistent for \(\alpha\) if \(\mathop{\text{plim}} B = \alpha\).

The estimator is inconsistent if \(\mathop{\text{plim}} B \neq \alpha\).

Consistency

Consistent estimators

Example: We want to estimate the population mean \(\mu_x\) (where \(X \sim \text{Normal}\)).

Let’s compare the asymptotic distributions of three competing estimators:

  1. The first observation: \(X_{1}\)
  2. The sample mean: \(\overline{X} = \dfrac{1}{n} \sum_{i=1}^n x_i\)
  3. Some other estimator: \(\widetilde{X} = \dfrac{1}{n+1} \sum_{i=1}^n x_i\)

Note that (1) and (2) are unbiased, but (3) is biased.
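Before looking at the math, a simulation sketch (assumed setup, not the course’s code) previews the punchline: the spread of \(X_1\) never shrinks, while the distributions of \(\overline{X}\) and \(\widetilde{X}\) both collapse as \(n\) grows.

```python
# Compare the sampling spread of X1, the sample mean, and
# X-tilde = sum(x) / (n + 1) at increasing sample sizes.
import numpy as np

rng = np.random.default_rng(421)
mu, reps = 5.0, 5_000

def estimator_sd(n):
    x = rng.normal(mu, 1.0, size=(reps, n))
    x1 = x[:, 0]                      # first observation
    xbar = x.mean(axis=1)             # sample mean
    xtilde = x.sum(axis=1) / (n + 1)  # biased competitor
    return x1.std(), xbar.std(), xtilde.std()

for n in (2, 30, 1_000):
    print(n, estimator_sd(n))
# X1's spread stays near 1; the other two shrink toward a spike
```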

Consistency

Consistent estimators

To see which are unbiased/biased:

\(\mathop{\boldsymbol{E}}\left[ X_1 \right] = \mu_x\)

\[ \mathop{\boldsymbol{E}}\left[ \overline{X} \right] = \mathop{\boldsymbol{E}}\left[ \dfrac{1}{n} \sum_{i=1}^n x_i \right] = \dfrac{1}{n} \sum_{i=1}^n \mathop{\boldsymbol{E}}\left[ x_i \right] = \dfrac{1}{n} \sum_{i=1}^n \mu_x = \mu_x \]

\[ \mathop{\boldsymbol{E}}\left[ \widetilde{X} \right] = \mathop{\boldsymbol{E}}\left[ \dfrac{1}{n+1} \sum_{i=1}^n x_i \right] = \dfrac{1}{n+1} \sum_{i=1}^n \mathop{\boldsymbol{E}}\left[ x_i \right] = \dfrac{n}{n+1}\mu_x \]

Consistency

Distributions of \(\textcolor{#FFA500}{X_1}\), \(\textcolor{#e64173}{\overline{X}}\), and \(\textcolor{#314f4f}{\widetilde{X}}\)

(Figures omitted.) Plots for \(n \in \{2,\, 5,\, 10,\, 30,\, 50,\, 100,\, 500,\, 1000\}\): the distribution of \(\textcolor{#FFA500}{X_1}\) does not change with \(n\), while the distributions of \(\textcolor{#e64173}{\overline{X}}\) and \(\textcolor{#314f4f}{\widetilde{X}}\) collapse into spikes at \(\mu_x\).

Consistency

The distributions of \(\textcolor{#314f4f}{\widetilde{X}}\)
For \(n\) in \(\{\textcolor{#FCCE25}{2},\, \textcolor{#F89441}{5},\, \textcolor{#E16462}{10},\, \textcolor{#BF3984}{50},\, \textcolor{#900DA4}{100},\, \textcolor{#5601A4}{500},\, \textcolor{#0D0887}{1000}\}\) (figure omitted): the spike shifts toward \(\mu_x\) as the bias shrinks.

Consistency

The takeaway?

  • An estimator can be unbiased without being consistent (e.g., \(\textcolor{#FFA500}{X_1}\)).

  • An estimator can be unbiased and consistent (e.g., \(\textcolor{#e64173}{\overline{X}}\)).

  • An estimator can be biased but consistent (e.g., \(\textcolor{#314f4f}{\widetilde{X}}\)).

  • An estimator can be biased and inconsistent (e.g., \(\overline{X} - 50\)).

Best-case scenario: The estimator is unbiased and consistent.

Consistency

Why consistency (asymptotics)?

  1. We cannot always find an unbiased estimator. In these situations, we generally (at least) want consistency.

  2. Expected values can be hard/undefined. Probability limits are less constrained, e.g., \[ \mathop{\boldsymbol{E}}\left[ g(X)h(Y) \right] \text{ vs. } \mathop{\text{plim}}\left( g(X)h(Y) \right) \]

  3. Asymptotics help us move away from assuming the distribution of \(u_i\).


Caution: As we saw, consistent estimators can be biased in small samples.
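For instance, \(\widetilde{X}\)’s bias vanishes as \(n\) grows, even though it is nonzero at every finite \(n\):

\[ \mathop{\text{Bias}}\left( \widetilde{X} \right) = \dfrac{n}{n+1}\mu_x - \mu_x = -\dfrac{\mu_x}{n+1} \rightarrow 0 \quad \text{as } n\rightarrow\infty \]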

OLS in asymptopia

OLS in asymptopia

OLS has two very nice asymptotic properties:

  1. Consistency

  2. Asymptotic Normality

Let’s prove #1 for OLS with simple, linear regression, i.e.,

\[ y_i = \beta_0 + \beta_1 x_i + u_i \]

OLS in asymptopia

Proof of consistency

First, recall our previous derivation of \(\hat{\beta}_1\),

\[ \hat{\beta}_1 = \beta_1 + \dfrac{\sum_i \left( x_i - \overline{x} \right) u_i}{\sum_i \left( x_i - \overline{x} \right)^2} \]

Now multiply the numerator and denominator by \(1/n\)

\[ \hat{\beta}_1 = \beta_1 + \dfrac{\frac{1}{n} \sum_i \left( x_i - \overline{x} \right) u_i}{\frac{1}{n}\sum_i \left( x_i - \overline{x} \right)^2} \]

OLS in asymptopia

Proof of consistency

We actually want to know the probability limit of \(\hat{\beta}_1\), so

\[ \mathop{\text{plim}} \hat{\beta}_1 = \mathop{\text{plim}}\left(\beta_1 + \dfrac{\frac{1}{n} \sum_i \left( x_i - \overline{x} \right) u_i}{\frac{1}{n}\sum_i \left( x_i - \overline{x} \right)^2} \right) \]

which, by the properties of probability limits, gives us

\[ = \beta_1 + \dfrac{\mathop{\text{plim}}\left(\frac{1}{n} \sum_i \left( x_i - \overline{x} \right) u_i \right)}{\mathop{\text{plim}}\left(\frac{1}{n}\sum_i \left( x_i - \overline{x} \right)^2 \right)} \]

By the law of large numbers, the numerator and denominator converge in probability to population quantities

\[ = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x,\, u \right)}{\mathop{\text{Var}} \left( x \right)} \]

OLS in asymptopia

Proof of consistency

So we have

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x,\, u \right)}{\mathop{\text{Var}} \left( x \right)} \]

By our assumption of exogeneity (plus the law of total expectation)

\[ \mathop{\text{Cov}} \left( x,\,u \right) = 0 \]

Combining these two equations yields

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{0}{\mathop{\text{Var}} \left( x \right)} = \beta_1 \quad\text{🤓} \]

so long as \(\mathop{\text{Var}} \left( x \right) \neq 0\) (which we’ve assumed).
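A simulation sketch (with an assumed data-generating process; \(\beta_1 = 2\) is illustrative) shows the result in action: under exogeneity, the OLS slope tightens around the true \(\beta_1\) as \(n\) grows.

```python
# OLS slope from simulated data where the disturbance is exogenous.
import numpy as np

rng = np.random.default_rng(421)
b0, b1 = 1.0, 2.0  # assumed true parameters

def ols_slope(n):
    x = rng.normal(0, 1, n)
    u = rng.normal(0, 1, n)  # independent of x: exogeneity holds
    y = b0 + b1 * x + u
    xd = x - x.mean()
    return (xd @ (y - y.mean())) / (xd @ xd)

for n in (50, 5_000, 500_000):
    print(n, ols_slope(n))  # estimates tighten around the true slope
```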

OLS in asymptopia

Asymptotic normality

Up to this point, we made a very specific assumption about the distribution of \(u_i\)—the \(u_i\) came from a normal distribution.

We can relax this assumption—allowing the \(u_i\) to come from any distribution (still assume exogeneity, independence, and homoskedasticity).

We will focus on the asymptotic distribution of our estimators (how they are distributed as \(n\) gets large), rather than their finite-sample distribution.

As \(n\) approaches \(\infty\), the distribution of the OLS estimator converges to a normal distribution.
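A simulation sketch (assumed DGP with skewed, decidedly non-normal disturbances) illustrates the claim: the standardized OLS slope behaves like a standard normal for large \(n\), so roughly 95% of standardized estimates fall inside \(\pm 1.96\).

```python
# Standardized OLS slope with mean-zero exponential (skewed) errors.
import numpy as np

rng = np.random.default_rng(421)
b1, n, reps = 2.0, 1_000, 2_000  # illustrative values

def standardized_slope():
    x = rng.normal(0, 1, n)
    u = rng.exponential(1.0, n) - 1.0  # skewed, mean zero, variance 1
    y = 1.0 + b1 * x + u
    xd = x - x.mean()
    slope = (xd @ (y - y.mean())) / (xd @ xd)
    se = 1.0 / np.sqrt(xd @ xd)        # using the known error sd of 1
    return (slope - b1) / se

z = np.array([standardized_slope() for _ in range(reps)])
coverage = np.mean(np.abs(z) < 1.96)
print(coverage)  # close to 0.95 despite the non-normal errors
```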

OLS in asymptopia

Recap

With a more limited set of assumptions, OLS is consistent and is asymptotically normally distributed.

Current assumptions

  1. Our data were randomly sampled from the population.
  2. \(y_i\) is a linear function of its parameters and disturbance.
  3. There is no perfect collinearity in our data.
  4. The \(u_i\) have conditional mean of zero (exogeneity), \(\mathop{\boldsymbol{E}}\left[ u_i \middle| X_i \right] = 0\).
  5. The \(u_i\) are homoskedastic with zero correlation between \(u_i\) and \(u_j\).

Omitted-variable bias, redux

Omitted-variable bias, redux

Inconsistency?

Imagine we have a population whose true model is

\[ \begin{align} y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i \tag{2} \end{align} \]

Recall: Omitted-variable bias occurs when we omit a variable from our linear regression model (e.g., leaving out \(x_2\)) such that

  1. \(x_{2}\) affects \(y\), i.e., \(\beta_2 \neq 0\).
  2. \(x_{2}\) correlates with an included explanatory variable, i.e., \(\mathop{\text{Cov}} \left( x_1,\, x_2 \right) \neq 0\).

Omitted-variable bias, redux

Inconsistency?

Imagine we have a population whose true model is

\[ \begin{align} y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i \tag{2} \end{align} \]

Recall: We defined the bias of an estimator \(W\) for parameter \(\theta\) as

\[ \mathop{\text{Bias}}_\theta \left( W \right) = \mathop{\boldsymbol{E}}\left[ W \right] - \theta \]

Omitted-variable bias, redux

Inconsistency?

Imagine we have a population whose true model is

\[ \begin{align} y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i \tag{2} \end{align} \]

We know that omitted-variable bias causes biased estimates.

Question: Do omitted variables also cause inconsistent estimates?

To answer it, we find \(\mathop{\text{plim}} \hat{\beta}_1\) in a regression that omits \(x_2\).

Omitted-variable bias, redux

Inconsistency?

Imagine we have a population whose true model is

\[ \begin{align} y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i \tag{2} \end{align} \]

but we instead specify the model as

\[ \begin{align} y_i = \beta_0 + \beta_1 x_{1i} + w_i \tag{3} \end{align} \]

where \(w_i = \beta_2 x_{2i} + u_i\).

We estimate \((3)\) via OLS

\[ \begin{align} y_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{w}_i \tag{4} \end{align} \]

Our question: Is \(\hat{\beta}_1\) consistent for \(\beta_1\) when we omit \(x_2\)?

\[ \mathop{\text{plim}}\left( \hat{\beta}_1 \right) \overset{?}{=} \beta_1 \]

Omitted-variable bias, redux

Inconsistency?

Truth: \(y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i\)

Specified: \(y_i = \beta_0 + \beta_1 x_{1i} + w_i\)

We already showed \(\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x_1,\, w \right)}{\mathop{\text{Var}} \left( x_1 \right)}\)

where \(w\) is the disturbance.

Here, we know \(w = \beta_2 x_2 + u\). Thus,

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x_1,\, \beta_2 x_2 + u \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]

Now, we make use of \(\mathop{\text{Cov}} \left( X,\, Y + Z \right) = \mathop{\text{Cov}} \left( X,\, Y \right) + \mathop{\text{Cov}} \left( X,\, Z \right)\)

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x_1,\, \beta_2 x_2 \right) + \mathop{\text{Cov}} \left( x_1,\, u \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]

Omitted-variable bias, redux

Inconsistency?

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x_1,\, \beta_2 x_2 \right) + \mathop{\text{Cov}} \left( x_1,\, u \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]

Now we use the fact that \(\mathop{\text{Cov}} \left( X,\, cY \right) = c\mathop{\text{Cov}} \left( X,\,Y \right)\) for a constant \(c\).

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\beta_2 \mathop{\text{Cov}} \left( x_1,\, x_2 \right) + \mathop{\text{Cov}} \left( x_1,\, u \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]

As before, our exogeneity (conditional mean zero) assumption implies \(\mathop{\text{Cov}} \left( x_1,\, u \right) = 0\), which gives us

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\beta_2 \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]

Omitted-variable bias, redux

Inconsistency?

Thus, we find that

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]

In other words, an omitted variable will cause OLS to be inconsistent if both of the following statements are true:

  1. The omitted variable affects our outcome, i.e., \(\beta_2 \neq 0\).

  2. The omitted variable correlates with included explanatory variables, i.e., \(\mathop{\text{Cov}} \left( x_1,\,x_2 \right) \neq 0\).

If both of these statements are true, then the OLS estimate \(\hat{\beta}_1\) will not converge to \(\beta_1\), even as \(n\) approaches \(\infty\).
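A simulation sketch (assumed DGP; the numbers are illustrative) shows the estimator converging to the wrong place when both conditions hold:

```python
# Omit x2 from the regression; the slope on x1 converges to
# beta_1 + beta_2 * Cov(x1, x2) / Var(x1), not to beta_1.
import numpy as np

rng = np.random.default_rng(421)
b1, b2, n = 2.0, 3.0, 1_000_000  # illustrative true parameters

x1 = rng.normal(0, 1, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)  # Cov(x1, x2) = 0.5; Var(x1) = 1
u = rng.normal(0, 1, n)
y = 1.0 + b1 * x1 + b2 * x2 + u

xd = x1 - x1.mean()
slope = (xd @ (y - y.mean())) / (xd @ xd)
print(slope)  # near b1 + b2 * 0.5 = 3.5, not the true 2.0
```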

Omitted-variable bias, redux

Signing the bias

Sometimes we’re stuck with omitted-variable bias.

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]

When this happens, we can often at least know the direction of the inconsistency.

Omitted-variable bias, redux

Signing the bias

Begin with

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]

We know \(\textcolor{#20B2AA}{\mathop{\text{Var}} \left( x_1 \right) > 0}\). Suppose \(\textcolor{#e64173}{\beta_2 > 0}\) and \(\textcolor{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right) > 0}\). Then

\[ \begin{align} \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \textcolor{#e64173}{(+)} \dfrac{\textcolor{#FFA500}{(+)}}{\textcolor{#20B2AA}{(+)}} \implies \mathop{\text{plim}} \hat{\beta}_1 > \beta_1 \end{align} \] ∴ In this case, OLS is biased upward (estimates are too large).

\[ \begin{matrix} \enspace & \textcolor{#FFA500}{\text{Cov}(x_1,\,x_2)> 0} & \textcolor{#FFA500}{\text{Cov}(x_1,\,x_2)< 0} \\ \textcolor{#e64173}{\beta_2 > 0} & \text{Upward} & \\ \textcolor{#e64173}{\beta_2 < 0} & & \end{matrix} \]

Omitted-variable bias, redux

Signing the bias

Begin with

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]

We know \(\textcolor{#20B2AA}{\mathop{\text{Var}} \left( x_1 \right) > 0}\). Suppose \(\textcolor{#e64173}{\beta_2 < 0}\) and \(\textcolor{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right) > 0}\). Then

\[ \begin{align} \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \textcolor{#e64173}{(-)} \dfrac{\textcolor{#FFA500}{(+)}}{\textcolor{#20B2AA}{(+)}} \implies \mathop{\text{plim}} \hat{\beta}_1 < \beta_1 \end{align} \] ∴ In this case, OLS is biased downward (estimates are too small).

\[ \begin{matrix} \enspace & \textcolor{#FFA500}{\text{Cov}(x_1,\,x_2)> 0} & \textcolor{#FFA500}{\text{Cov}(x_1,\,x_2)< 0} \\ \textcolor{#e64173}{\beta_2 > 0} & \text{Upward} & \\ \textcolor{#e64173}{\beta_2 < 0} & \text{Downward} & \end{matrix} \]

Omitted-variable bias, redux

Signing the bias

Begin with

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]

We know \(\textcolor{#20B2AA}{\mathop{\text{Var}} \left( x_1 \right) > 0}\). Suppose \(\textcolor{#e64173}{\beta_2 > 0}\) and \(\textcolor{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right) < 0}\). Then

\[ \begin{align} \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \textcolor{#e64173}{(+)} \dfrac{\textcolor{#FFA500}{(-)}}{\textcolor{#20B2AA}{(+)}} \implies \mathop{\text{plim}} \hat{\beta}_1 < \beta_1 \end{align} \] ∴ In this case, OLS is biased downward (estimates are too small).

\[ \begin{matrix} \enspace & \textcolor{#FFA500}{\text{Cov}(x_1,\,x_2)> 0} & \textcolor{#FFA500}{\text{Cov}(x_1,\,x_2)< 0} \\ \textcolor{#e64173}{\beta_2 > 0} & \text{Upward} & \text{Downward} \\ \textcolor{#e64173}{\beta_2 < 0} & \text{Downward} & \end{matrix} \]

Omitted-variable bias, redux

Signing the bias

Begin with

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]

We know \(\textcolor{#20B2AA}{\mathop{\text{Var}} \left( x_1 \right) > 0}\). Suppose \(\textcolor{#e64173}{\beta_2 < 0}\) and \(\textcolor{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right) < 0}\). Then

\[ \begin{align} \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \textcolor{#e64173}{(-)} \dfrac{\textcolor{#FFA500}{(-)}}{\textcolor{#20B2AA}{(+)}} \implies \mathop{\text{plim}} \hat{\beta}_1 > \beta_1 \end{align} \] ∴ In this case, OLS is biased upward (estimates are too large).

\[ \begin{matrix} \enspace & \textcolor{#FFA500}{\text{Cov}(x_1,\,x_2)> 0} & \textcolor{#FFA500}{\text{Cov}(x_1,\,x_2)< 0} \\ \textcolor{#e64173}{\beta_2 > 0} & \text{Upward} & \text{Downward} \\ \textcolor{#e64173}{\beta_2 < 0} & \text{Downward} & \text{Upward} \end{matrix} \]

Omitted-variable bias, redux

Signing the bias

Thus, in cases where we have a sense of

  1. the sign of \(\mathop{\text{Cov}} \left( x_1,\,x_2 \right)\)

  2. the sign of \(\beta_2\)

we know in which direction inconsistency pushes our estimates.

Direction of bias

\[ \begin{matrix} \enspace & \textcolor{#FFA500}{\text{Cov}(x_1,\,x_2)> 0} & \textcolor{#FFA500}{\text{Cov}(x_1,\,x_2)< 0} \\ \textcolor{#e64173}{\beta_2 > 0} & \text{Upward} & \text{Downward} \\ \textcolor{#e64173}{\beta_2 < 0} & \text{Downward} & \text{Upward} \end{matrix} \]
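A quick numeric illustration (the numbers are hypothetical): if \(\beta_2 = 2\), \(\mathop{\text{Cov}} \left( x_1,\,x_2 \right) = 1\), and \(\mathop{\text{Var}} \left( x_1 \right) = 4\), then

\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + 2 \times \dfrac{1}{4} = \beta_1 + 0.5 \]

so OLS converges to a value 0.5 above \(\beta_1\): upward bias, matching the table’s top-left cell.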

Measurement error

Measurement error in our explanatory variables presents another case in which OLS is inconsistent.

Consider the population model: \(y_i = \beta_0 + \beta_1 z_i + u_i\)

  • We want to observe \(z_i\) but cannot.

  • Instead, we measure the variable \(x_i\), which is \(z_i\) plus some error (noise): \[ x_i = z_i + \omega_i \]

  • Assume \(\mathop{\boldsymbol{E}}\left[ \omega_i \right] = 0\), \(\mathop{\text{Var}} \left( \omega_i \right) = \sigma^2_\omega\), and \(\omega\) is independent of \(z\) and \(u\).


OLS regression of \(y\) on \(x\) will produce inconsistent estimates of \(\beta_1\).

Measurement error

Proof

\[ \begin{aligned} y_i &= \beta_0 + \beta_1 z_i + u_i \\ &= \beta_0 + \beta_1 \left( x_i - \omega_i \right) + u_i \\ &= \beta_0 + \beta_1 x_i + \left( u_i - \beta_1 \omega_i \right) \\ &= \beta_0 + \beta_1 x_i + \varepsilon_i \end{aligned} \]

where \(\varepsilon_i = u_i - \beta_1 \omega_i\)

What happens when we estimate \(y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i\)?

\(\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x,\,\varepsilon \right)}{\mathop{\text{Var}} \left( x \right)}\)

We will derive the numerator and denominator separately…

Measurement error

Proof

The covariance of our noisy variable \(x\) and the disturbance \(\varepsilon\):

\[ \begin{aligned} \mathop{\text{Cov}} \left( x,\, \varepsilon \right) &= \mathop{\text{Cov}} \left( \left[ z + \omega \right],\, \left[ u - \beta_1 \omega \right] \right) \\ &= \mathop{\text{Cov}} \left( z,\,u \right) - \beta_1 \mathop{\text{Cov}} \left( z,\,\omega \right) + \mathop{\text{Cov}} \left( \omega,\, u \right) - \beta_1 \mathop{\text{Var}} \left( \omega \right) \\ &= 0 + 0 + 0 - \beta_1 \sigma_\omega^2 \\ &= - \beta_1 \sigma_\omega^2 \end{aligned} \]
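A quick numeric check by simulation (assumed values \(\beta_1 = 2\) and \(\sigma_\omega^2 = 0.25\); illustrative only):

```python
# Verify Cov(x, eps) = -beta_1 * sigma_omega^2 numerically.
import numpy as np

rng = np.random.default_rng(421)
b1, n = 2.0, 1_000_000

z = rng.normal(0, 1, n)
omega = rng.normal(0, 0.5, n)  # measurement error with variance 0.25
u = rng.normal(0, 1, n)

x = z + omega                  # what we actually observe
eps = u - b1 * omega           # disturbance in the misspecified model
print(np.cov(x, eps)[0, 1])   # near -2 * 0.25 = -0.5
```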

Measurement error

Proof

Now for the denominator, \(\mathop{\text{Var}} \left( x \right)\).

\[ \begin{aligned} \mathop{\text{Var}} \left( x \right) &= \mathop{\text{Var}} \left( z + \omega \right) \\ &= \mathop{\text{Var}} \left( z \right) + \mathop{\text{Var}} \left( \omega \right) + 2\mathop{\text{Cov}} \left( z,\,\omega \right) \\ &= \sigma_z^2 + \sigma_\omega^2 \end{aligned} \]

(The covariance term drops out because \(\omega\) is independent of \(z\).)

Measurement error

Proof

Putting the numerator and denominator back together,

\[ \begin{align} \mathop{\text{plim}} \hat{\beta}_1 &= \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x,\,\varepsilon \right)}{\mathop{\text{Var}} \left( x \right)} \\ &= \beta_1 + \dfrac{-\beta_1 \sigma_\omega^2}{\sigma_z^2 + \sigma_\omega^2} \\ &= \beta_1 - \beta_1 \dfrac{\sigma_\omega^2}{\sigma_z^2 + \sigma_\omega^2} \\ &= \beta_1 \dfrac{\sigma_z^2 + \sigma_\omega^2}{\sigma_z^2 + \sigma_\omega^2} - \beta_1 \dfrac{\sigma_\omega^2}{\sigma_z^2 + \sigma_\omega^2} \\ &= \beta_1 \dfrac{\sigma_z^2}{\sigma_z^2 + \sigma_\omega^2} \end{align} \]

Measurement error

Summary

\(\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 \dfrac{\sigma_z^2}{\sigma_z^2 + \sigma_\omega^2}\).

What does this equation tell us?

Measurement error in our explanatory variables biases the coefficient estimates toward zero.

  • This type of bias/inconsistency is often called attenuation bias.

  • If the measurement error correlates with the explanatory variables, we have bigger problems with inconsistency/bias.
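A simulation sketch of attenuation bias (assumed values \(\sigma_z^2 = \sigma_\omega^2 = 1\) and \(\beta_1 = 2\); illustrative only): OLS on the noisy regressor recovers only half of \(\beta_1\), exactly as \(\sigma_z^2 / (\sigma_z^2 + \sigma_\omega^2) = 1/2\) predicts.

```python
# Attenuation bias: regress y on a noisy measurement of z.
import numpy as np

rng = np.random.default_rng(421)
b1, n = 2.0, 1_000_000

z = rng.normal(0, 1, n)                # true regressor (unobserved)
x = z + rng.normal(0, 1, n)            # observed: z plus noise
y = 1.0 + b1 * z + rng.normal(0, 1, n)

xd = x - x.mean()
slope = (xd @ (y - y.mean())) / (xd @ xd)
print(slope)  # near b1 * 1 / (1 + 1) = 1.0, pulled toward zero
```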

Measurement error

Summary

What about measurement error in the outcome variable?

It matters much less: classical measurement error in \(y\) does not bias our estimates; it just inflates our standard errors.

Measurement error

It’s everywhere

General cases

  1. We cannot perfectly observe a variable.
  2. We use one variable as a proxy for another.

Specific examples

  • GDP
  • Population
  • Crime/police statistics
  • Air quality
  • Health data
  • Proxy ability with test scores