EC 421, Set 6
Prologue
Living with heteroskedasticity
Asymptotics and consistency
Need speed? R allows essentially infinite parallelization.
Three popular packages (and a nice tutorial) cover this.
Consistency
Previously: We examined estimators (e.g., \(\hat{\beta}_j\)) and their properties using expected values and variances,
which tell us about the tendency of the estimator if we took ∞ samples, each with sample size \(\textcolor{#e64173}{n}\).
This approach misses something.
New question:
How does our estimator behave as our sample gets larger (as \(n\rightarrow\infty\))?
This new question forms a new way to think about the properties of estimators: asymptotic properties (or large-sample properties).
A “good” estimator will become indistinguishable from the parameter it estimates when \(n\) is very large (close to \(\infty\)).
Just as the expected value helped us characterize the finite-sample distribution of an estimator with sample size \(n\),
the probability limit helps us analyze the asymptotic distribution of an estimator (the distribution of the estimator as \(n\) gets “big”†).
Let \(B_n\) be our estimator with sample size \(n\).
Then the probability limit of \(B_n\) is \(\alpha\) if
\[ \lim_{n\rightarrow\infty} \mathop{P}\left( \left| B_n - \alpha \right| > \epsilon \right) = 0 \tag{1} \]
for any \(\epsilon > 0\).
The definition in \((1)\) essentially says that as the sample size approaches infinity, the probability that \(B_n\) differs from \(\alpha\) by more than a very small number \((\epsilon)\) is zero.
Practically: \(B_n\)’s distribution collapses to a spike at \(\alpha\) as \(n\) approaches \(\infty\).
Equivalent statements:
The probability limit of \(B_n\) is \(\alpha\).
\(\mathop{\text{plim}}\: B_n = \alpha\)
\(B_n\) converges in probability to \(\alpha\).
Probability limits have some nice/important properties:
\(\mathop{\text{plim}}\left( X \times Y \right) = \mathop{\text{plim}}\left( X \right) \times \mathop{\text{plim}}\left( Y \right)\)
\(\mathop{\text{plim}}\left( X + Y \right) = \mathop{\text{plim}}\left( X \right) + \mathop{\text{plim}}\left( Y \right)\)
\(\mathop{\text{plim}}\left( c \right) = c\), where \(c\) is a constant
\(\mathop{\text{plim}}\left( \dfrac{X}{Y} \right) = \dfrac{\mathop{\text{plim}}\left( X \right)}{ \mathop{\text{plim}}\left( Y \right)}\), provided \(\mathop{\text{plim}}\left( Y \right) \neq 0\)
\(\mathop{\text{plim}}\!\big( f(X) \big) = \mathop{f}\!\big(\mathop{\text{plim}}\left( X \right)\big)\) for continuous \(f\) (the continuous-mapping theorem)
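These rules are easy to check numerically. Below is a minimal simulation sketch (in Python rather than the course's R; the values 2 and 3 and the choice of \(\exp\) are hypothetical examples, not from the slides), where a very large \(n\) stands in for the limit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # a "large" n standing in for n -> infinity

# X ~ N(2, 1), Y ~ N(3, 1), so plim(Xbar) = 2 and plim(Ybar) = 3
x = rng.normal(2, 1, n)
y = rng.normal(3, 1, n)

# Product rule: plim(Xbar * Ybar) = plim(Xbar) * plim(Ybar) = 6
print(x.mean() * y.mean())

# Continuous-mapping rule: plim(exp(Xbar)) = exp(plim(Xbar)) = e^2
print(np.exp(x.mean()))
```

Both printed values sit within sampling noise of the theoretical limits (6 and \(e^2 \approx 7.389\)).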
We say that an estimator is consistent if
The estimator has a prob. limit (its distribution collapses to a spike).
This spike is located at the parameter being estimated.
In other words…
An estimator is consistent if its asymptotic distribution collapses to a spike located at the estimated parameter.
In math: The estimator \(B_n\) is consistent for \(\alpha\) if \(\mathop{\text{plim}} B_n = \alpha\).
The estimator is inconsistent if \(\mathop{\text{plim}} B_n \neq \alpha\).
Example: We want to estimate the population mean \(\mu_x\) (where \(X\) is normally distributed).
Let’s compare the asymptotic distributions of three competing estimators:
(1) \(X_1\), the first observation in the sample;
(2) \(\overline{X} = \frac{1}{n} \sum_{i=1}^n x_i\), the sample mean;
(3) \(\widetilde{X} = \frac{1}{n+1} \sum_{i=1}^n x_i\).
Note that (1) and (2) are unbiased, but (3) is biased.
To see which are unbiased/biased:
\(\mathop{\boldsymbol{E}}\left[ X_1 \right] = \mu_x\)
\(\mathop{\boldsymbol{E}}\left[ \overline{X} \right]\)
\(= \mathop{\boldsymbol{E}}\left[ \dfrac{1}{n} \sum_{i=1}^n x_i \right]\)
\(= \dfrac{1}{n} \sum_{i=1}^n \mathop{\boldsymbol{E}}\left[ x_i \right]\)
\(= \dfrac{1}{n} \sum_{i=1}^n \mu_x\)
\(= \mu_x\)
\(\mathop{\boldsymbol{E}}\left[ \widetilde{X} \right]\)
\(= \mathop{\boldsymbol{E}}\left[ \dfrac{1}{n+1} \sum_{i=1}^n x_i \right]\)
\(= \dfrac{1}{n+1} \sum_{i=1}^n \mathop{\boldsymbol{E}}\left[ x_i \right]\)
\(= \dfrac{1}{n+1} \sum_{i=1}^n \mu_x\)
\(= \dfrac{n}{n+1}\mu_x\)
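The expectations above can be made concrete with a simulation. The sketch below (Python; \(\mu_x = 5\) and \(\sigma = 2\) are hypothetical parameter values) draws many samples of size \(n\) and compares the three estimators:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 5.0, 2.0  # hypothetical population parameters

def simulate(n, reps=20_000):
    # Draw `reps` independent samples of size n from Normal(mu, sigma)
    x = rng.normal(mu, sigma, size=(reps, n))
    x1 = x[:, 0]                       # (1) first observation
    xbar = x.mean(axis=1)              # (2) sample mean
    xtilde = x.sum(axis=1) / (n + 1)   # (3) biased competitor
    return x1.mean(), xbar.mean(), xtilde.mean(), xbar.std(), xtilde.std()

# Mean of X~ is n/(n+1) * mu (biased), but its bias vanishes as n grows;
# the spreads of Xbar and X~ shrink toward zero, while X1's never does.
for n in (2, 10, 100, 1000):
    print(n, simulate(n))
```

For small \(n\), \(\widetilde{X}\)’s average sits visibly below \(\mu_x\); by \(n = 1000\) both \(\overline{X}\) and \(\widetilde{X}\) have collapsed onto \(\mu_x\).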
Figures (omitted): distributions of \(\textcolor{#FFA500}{X_1}\), \(\textcolor{#e64173}{\overline{X}}\), and \(\textcolor{#314f4f}{\widetilde{X}}\) for \(n \in \{2,\, 5,\, 10,\, 30,\, 50,\, 100,\, 500,\, 1000\}\). As \(n\) grows, the distributions of \(\overline{X}\) and \(\widetilde{X}\) collapse toward spikes at \(\mu_x\), while the distribution of \(X_1\) does not change.
Figure (omitted): the distributions of \(\textcolor{#314f4f}{\widetilde{X}}\)
for \(n\) in \(\{\textcolor{#FCCE25}{2},\, \textcolor{#F89441}{5},\, \textcolor{#E16462}{10},\, \textcolor{#BF3984}{50},\, \textcolor{#900DA4}{100},\, \textcolor{#5601A4}{500}, \textcolor{#0D0887}{1000}\}\)
Estimators can be unbiased but inconsistent (e.g., \(\textcolor{#FFA500}{X_1}\)), unbiased and consistent (e.g., \(\textcolor{#e64173}{\overline{X}}\)), biased but consistent (e.g., \(\textcolor{#314f4f}{\widetilde{X}}\)), or biased and inconsistent (e.g., \(\overline{X} - 50\)).
Best-case scenario: The estimator is unbiased and consistent.
We cannot always find an unbiased estimator. In these situations, we generally (at least) want consistency.
Expected values can be hard/undefined. Probability limits are less constrained, e.g., \[ \mathop{\boldsymbol{E}}\left[ g(X)h(Y) \right] \text{ vs. } \mathop{\text{plim}}\left( g(X)h(Y) \right) \]
Asymptotics help us move away from assuming the distribution of \(u_i\).
Caution: As we saw, consistent estimators can be biased in small samples.
OLS in asymptopia
OLS has two very nice asymptotic properties:
Consistency
Asymptotic Normality
Let’s prove #1 for OLS with simple, linear regression, i.e.,
\[ y_i = \beta_0 + \beta_1 x_i + u_i \]
First, recall our previous derivation of \(\hat{\beta}_1\),
\[ \hat{\beta}_1 = \beta_1 + \dfrac{\sum_i \left( x_i - \overline{x} \right) u_i}{\sum_i \left( x_i - \overline{x} \right)^2} \]
Now multiply both the numerator and the denominator by \(1/n\)
\[ \hat{\beta}_1 = \beta_1 + \dfrac{\frac{1}{n} \sum_i \left( x_i - \overline{x} \right) u_i}{\frac{1}{n}\sum_i \left( x_i - \overline{x} \right)^2} \]
We actually want to know the probability limit of \(\hat{\beta}_1\), so
\[ \mathop{\text{plim}} \hat{\beta}_1 = \mathop{\text{plim}}\left(\beta_1 + \dfrac{\frac{1}{n} \sum_i \left( x_i - \overline{x} \right) u_i}{\frac{1}{n}\sum_i \left( x_i - \overline{x} \right)^2} \right) \]
which, by the properties of probability limits, gives us
\[ = \beta_1 + \dfrac{\mathop{\text{plim}}\left(\frac{1}{n} \sum_i \left( x_i - \overline{x} \right) u_i \right)}{\mathop{\text{plim}}\left(\frac{1}{n}\sum_i \left( x_i - \overline{x} \right)^2 \right)} \]
By the law of large numbers, the numerator and denominator converge in probability to population quantities
\[ = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x,\, u \right)}{\mathop{\text{Var}} \left( x \right)} \]
So we have
\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x,\, u \right)}{\mathop{\text{Var}} \left( x \right)} \]
By our assumption of exogeneity (plus the law of total expectation)
\[ \mathop{\text{Cov}} \left( x,\,u \right) = 0 \]
Combining these two equations yields
\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{0}{\mathop{\text{Var}} \left( x \right)} = \beta_1 \quad\text{🤓} \]
so long as \(\mathop{\text{Var}} \left( x \right) \neq 0\) (which we’ve assumed).
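We can corroborate this consistency result numerically. A minimal sketch (Python rather than the course's R; the parameter values \(\beta_0 = 1\), \(\beta_1 = 2\) are hypothetical) computes the OLS slope as \(\widehat{\text{Cov}}(x, y)/\widehat{\text{Var}}(x)\) at increasing sample sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 2.0  # hypothetical true parameters

def ols_slope(n):
    x = rng.normal(0, 1, n)
    u = rng.normal(0, 1, n)  # exogenous disturbance: Cov(x, u) = 0
    y = beta0 + beta1 * x + u
    # OLS slope = sample Cov(x, y) / sample Var(x)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# The slope estimate collapses onto beta1 = 2 as n grows
for n in (10, 100, 10_000, 1_000_000):
    print(n, ols_slope(n))
```

Early estimates bounce around; by \(n = 10^6\) the slope is essentially pinned to \(\beta_1\).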
Up to this point, we made a very specific assumption about the distribution of \(u_i\)—the \(u_i\) came from a normal distribution.
We can relax this assumption—allowing the \(u_i\) to come from any distribution (still assume exogeneity, independence, and homoskedasticity).
We will focus on the asymptotic distribution of our estimators (how they are distributed as \(n\) gets large), rather than their finite-sample distribution.
As \(n\) approaches \(\infty\), the distribution of the OLS estimator converges to a normal distribution.
With a more limited set of assumptions, OLS is consistent and is asymptotically normally distributed.
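To see asymptotic normality without normally distributed disturbances, the sketch below (Python; the centered-exponential error and the specific \(n\) and replication counts are my choices, not the slides') simulates many OLS slopes with skewed errors and checks that the standardized slopes behave like a standard normal:

```python
import numpy as np

rng = np.random.default_rng(2)
beta1, n, reps = 2.0, 500, 5_000

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(0, 1, n)
    u = rng.exponential(1, n) - 1  # skewed, mean-zero disturbance
    y = 1.0 + beta1 * x + u
    slopes[r] = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Standardize the simulated slopes; if approximately N(0, 1),
# about 95% should fall inside (-1.96, 1.96)
z = (slopes - slopes.mean()) / slopes.std()
print(np.mean(np.abs(z) < 1.96))
```

The printed coverage is close to 0.95 even though the \(u_i\) are far from normal.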
Omitted-variable bias, redux
Imagine we have a population whose true model is
\[ \begin{align} y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i \tag{2} \end{align} \]
Recall: Omitted-variable bias occurs when we omit a variable from our linear regression model (e.g., leaving out \(x_2\)) that both affects the outcome and correlates with an included explanatory variable.
Recall: We defined the bias of an estimator \(W\) for parameter \(\theta\) as
\[ \mathop{\text{Bias}}_\theta \left( W \right) = \mathop{\boldsymbol{E}}\left[ W \right] - \theta \]
We know that omitted-variable bias causes biased estimates.
Question: Do omitted variables also cause inconsistent estimates?
Answer: Find \(\mathop{\text{plim}} \hat{\beta}_1\) in a regression that omits \(x_2\).
Suppose that, given the true model in \((2)\), we instead specify the model as
\[ \begin{align} y_i = \beta_0 + \beta_1 x_{1i} + w_i \tag{3} \end{align} \]
where \(w_i = \beta_2 x_{2i} + u_i\).
We estimate \((3)\) via OLS
\[ \begin{align} y_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \hat{w}_i \tag{4} \end{align} \]
Our question: Is \(\hat{\beta}_1\) consistent for \(\beta_1\) when we omit \(x_2\)?
\[ \mathop{\text{plim}}\left( \hat{\beta}_1 \right) \overset{?}{=} \beta_1 \]
Truth: \(y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i\)
Specified: \(y_i = \beta_0 + \beta_1 x_{1i} + w_i\)
We already showed \(\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x_1,\, w \right)}{\mathop{\text{Var}} \left( x_1 \right)}\)
where \(w\) is the disturbance.
Here, we know \(w = \beta_2 x_2 + u\). Thus,
\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x_1,\, \beta_2 x_2 + u \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]
Now, we make use of \(\mathop{\text{Cov}} \left( X,\, Y + Z \right) = \mathop{\text{Cov}} \left( X,\, Y \right) + \mathop{\text{Cov}} \left( X,\, Z \right)\)
\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x_1,\, \beta_2 x_2 \right) + \mathop{\text{Cov}} \left( x_1,\, u \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]
Now we use the fact that \(\mathop{\text{Cov}} \left( X,\, cY \right) = c\mathop{\text{Cov}} \left( X,\,Y \right)\) for a constant \(c\).
\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\beta_2 \mathop{\text{Cov}} \left( x_1,\, x_2 \right) + \mathop{\text{Cov}} \left( x_1,\, u \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]
As before, our exogeneity (conditional mean zero) assumption implies \(\mathop{\text{Cov}} \left( x_1,\, u \right) = 0\), which gives us
\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\beta_2 \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]
Thus, we find that
\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]
In other words, an omitted variable will cause OLS to be inconsistent if both of the following statements are true:
The omitted variable affects our outcome, i.e., \(\beta_2 \neq 0\).
The omitted variable correlates with included explanatory variables, i.e., \(\mathop{\text{Cov}} \left( x_1,\,x_2 \right) \neq 0\).
If both of these statements are true, then the OLS estimate \(\hat{\beta}_1\) will not converge to \(\beta_1\), even as \(n\) approaches \(\infty\).
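This formula is straightforward to verify by simulation. In the sketch below (Python; every parameter value is a hypothetical choice), \(x_2\) is constructed so that \(\mathop{\text{Cov}}(x_1, x_2) = 0.5\) and \(\mathop{\text{Var}}(x_1) = 1\), so the theory predicts \(\mathop{\text{plim}} \hat{\beta}_1 = 2 + 3 \times 0.5 = 3.5\):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
beta0, beta1, beta2 = 1.0, 2.0, 3.0  # hypothetical true coefficients

# Build x2 correlated with x1: Cov(x1, x2) = 0.5, Var(x1) = 1
x1 = rng.normal(0, 1, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)
u = rng.normal(0, 1, n)
y = beta0 + beta1 * x1 + beta2 * x2 + u

# OLS of y on x1 alone (omitting x2)
b1_hat = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)

# Theory: plim b1_hat = beta1 + beta2 * Cov(x1, x2) / Var(x1) = 3.5
print(b1_hat)
```

Even at \(n = 10^6\), the estimate sits at 3.5, not at \(\beta_1 = 2\): more data does not cure omitted-variable bias.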
Sometimes we’re stuck with omitted variable bias.†
When this happens, we can often at least know the direction of the inconsistency.
Begin with
\[ \mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \beta_2 \dfrac{ \mathop{\text{Cov}} \left( x_1,\, x_2 \right)}{\mathop{\text{Var}} \left( x_1 \right)} \]
We know \(\textcolor{#20B2AA}{\mathop{\text{Var}} \left( x_1 \right) > 0}\), so the direction of the inconsistency depends on the signs of \(\textcolor{#e64173}{\beta_2}\) and \(\textcolor{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right)}\).
If \(\textcolor{#e64173}{\beta_2 > 0}\) and \(\textcolor{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right) > 0}\): \(\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \textcolor{#e64173}{(+)} \dfrac{\textcolor{#FFA500}{(+)}}{\textcolor{#20B2AA}{(+)}} > \beta_1\) ∴ OLS is biased upward (estimates are too large).
If \(\textcolor{#e64173}{\beta_2 < 0}\) and \(\textcolor{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right) > 0}\): \(\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \textcolor{#e64173}{(-)} \dfrac{\textcolor{#FFA500}{(+)}}{\textcolor{#20B2AA}{(+)}} < \beta_1\) ∴ OLS is biased downward (estimates are too small).
If \(\textcolor{#e64173}{\beta_2 > 0}\) and \(\textcolor{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right) < 0}\): \(\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \textcolor{#e64173}{(+)} \dfrac{\textcolor{#FFA500}{(-)}}{\textcolor{#20B2AA}{(+)}} < \beta_1\) ∴ OLS is biased downward.
If \(\textcolor{#e64173}{\beta_2 < 0}\) and \(\textcolor{#FFA500}{\mathop{\text{Cov}} \left( x_1,\,x_2 \right) < 0}\): \(\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \textcolor{#e64173}{(-)} \dfrac{\textcolor{#FFA500}{(-)}}{\textcolor{#20B2AA}{(+)}} > \beta_1\) ∴ OLS is biased upward.
Thus, in cases where we have a sense of
the sign of \(\mathop{\text{Cov}} \left( x_1,\,x_2 \right)\)
the sign of \(\beta_2\)
we know in which direction inconsistency pushes our estimates.
Direction of bias
\[ \begin{matrix} \enspace & \textcolor{#FFA500}{\text{Cov}(x_1,\,x_2)> 0} & \textcolor{#FFA500}{\text{Cov}(x_1,\,x_2)< 0} \\ \textcolor{#e64173}{\beta_2 > 0} & \text{Upward} & \text{Downward} \\ \textcolor{#e64173}{\beta_2 < 0} & \text{Downward} & \text{Upward} \end{matrix} \]
Measurement error in our explanatory variables presents another case in which OLS is inconsistent.
Consider the population model: \(y_i = \beta_0 + \beta_1 z_i + u_i\)
We want to observe \(z_i\) but cannot.
Instead, we measure the variable \(x_i\), which is \(z_i\) plus some error (noise): \[ x_i = z_i + \omega_i \]
Assume \(\mathop{\boldsymbol{E}}\left[ \omega_i \right] = 0\), \(\mathop{\text{Var}} \left( \omega_i \right) = \sigma^2_\omega\), and \(\omega\) is independent of \(z\) and \(u\).
OLS regression of \(y\) on \(x\) will produce inconsistent estimates for \(\beta_1\).
\(y_i = \beta_0 + \beta_1 z_i + u_i\)
\(\quad= \beta_0 + \beta_1 \left( x_i - \omega_i \right) + u_i\)
\(\quad= \beta_0 + \beta_1 x_i + \left( u_i - \beta_1 \omega_i \right)\)
\(\quad= \beta_0 + \beta_1 x_i + \varepsilon_i\)
where \(\varepsilon_i = u_i - \beta_1 \omega_i\)
What happens when we estimate \(y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i\)?
\(\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x,\,\varepsilon \right)}{\mathop{\text{Var}} \left( x \right)}\)
We will derive the numerator and denominator separately…
The covariance of our noisy variable \(x\) and the disturbance \(\varepsilon\).
\[ \begin{aligned} \mathop{\text{Cov}} \left( x,\, \varepsilon \right) &= \mathop{\text{Cov}} \left( \left[ z + \omega \right],\, \left[ u - \beta_1 \omega \right] \right) \\ &= \mathop{\text{Cov}} \left( z,\,u \right) -\beta_1 \mathop{\text{Cov}} \left( z,\,\omega \right) + \mathop{\text{Cov}} \left( \omega,\, u \right) - \beta_1 \mathop{\text{Var}} \left( \omega \right) \\ &= 0 + 0 + 0 - \beta_1 \sigma_\omega^2 \\ &= - \beta_1 \sigma_\omega^2 \end{aligned} \]
Now for the denominator, \(\mathop{\text{Var}} \left( x \right)\).
\[ \begin{aligned} \mathop{\text{Var}} \left( x \right) &= \mathop{\text{Var}} \left( z + \omega \right) \\ &= \mathop{\text{Var}} \left( z \right) + \mathop{\text{Var}} \left( \omega \right) + 2\mathop{\text{Cov}} \left( z,\,\omega \right) \\ &= \sigma_z^2 + \sigma_\omega^2 \end{aligned} \]
Putting the numerator and denominator back together,
\[ \begin{align} \mathop{\text{plim}} \hat{\beta}_1 &= \beta_1 + \dfrac{\mathop{\text{Cov}} \left( x,\,\varepsilon \right)}{\mathop{\text{Var}} \left( x \right)} \\ &= \beta_1 + \dfrac{-\beta_1 \sigma_\omega^2}{\sigma_z^2 + \sigma_\omega^2} \\ &= \beta_1 - \beta_1 \dfrac{\sigma_\omega^2}{\sigma_z^2 + \sigma_\omega^2} \\ &= \beta_1 \dfrac{\sigma_z^2 + \sigma_\omega^2}{\sigma_z^2 + \sigma_\omega^2} - \beta_1 \dfrac{\sigma_\omega^2}{\sigma_z^2 + \sigma_\omega^2} \\ &= \beta_1 \dfrac{\sigma_z^2}{\sigma_z^2 + \sigma_\omega^2} \end{align} \]
∴ \(\mathop{\text{plim}} \hat{\beta}_1 = \beta_1 \dfrac{\sigma_z^2}{\sigma_z^2 + \sigma_\omega^2}\).
What does this equation tell us?
Measurement error in our explanatory variables biases the coefficient estimates toward zero.
This type of bias/inconsistency is often called attenuation bias.
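The attenuation formula can likewise be checked by simulation (Python; \(\sigma_z = 1\) and \(\sigma_\omega = 0.5\) are hypothetical choices), where the theory predicts \(\mathop{\text{plim}} \hat{\beta}_1 = 2 \times 1/(1 + 0.25) = 1.6\):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
beta0, beta1 = 1.0, 2.0
sigma_z, sigma_w = 1.0, 0.5  # hypothetical standard deviations

z = rng.normal(0, sigma_z, n)   # true regressor (unobserved)
w = rng.normal(0, sigma_w, n)   # classical measurement error
x = z + w                       # what we actually observe
y = beta0 + beta1 * z + rng.normal(0, 1, n)

# OLS of y on the noisy regressor x
b1_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Theory: plim b1_hat = beta1 * sigma_z^2 / (sigma_z^2 + sigma_w^2) = 1.6
print(b1_hat)
```

The estimate settles at 1.6, attenuated toward zero relative to \(\beta_1 = 2\).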
If the measurement error correlates with the explanatory variables, we have bigger problems with inconsistency/bias.
What about measurement error in the outcome variable?
Classical (mean-zero, independent) measurement error in \(y\) does not bias OLS; it is simply absorbed into the disturbance and increases our standard errors.
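A short simulation (Python; the noise levels and sample sizes are hypothetical) illustrates this claim: adding mean-zero noise to \(y\) leaves the slope estimates centered on \(\beta_1\) but widens their sampling distribution:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, beta1 = 1_000, 2_000, 2.0

def slope(noise_sd):
    x = rng.normal(0, 1, n)
    y = 1.0 + beta1 * x + rng.normal(0, 1, n)
    y_obs = y + rng.normal(0, noise_sd, n)  # measurement error in y
    return np.cov(x, y_obs)[0, 1] / np.var(x, ddof=1)

clean = np.array([slope(0.0) for _ in range(reps)])
noisy = np.array([slope(2.0) for _ in range(reps)])

print(clean.mean(), noisy.mean())  # both centered on beta1: no bias
print(clean.std(), noisy.std())    # noisy spread is larger: bigger SEs
```

Both averages sit on \(\beta_1 = 2\); only the spread (and hence the standard errors) grows with the outcome noise.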