Time series

EC 421, Set 7

Prologue

Schedule

Last Time

Asymptotics, probability limits, and consistency

Today

Time series

Soon

Problem set and midterm

Asymptotics and consistency

Review

  1. Compare/contrast the concepts expected value and probability limit.

  2. What does it mean if the estimator \(\hat{\theta}\) is consistent for \(\theta\)?

  3. What is required for an omitted variable to bias OLS estimates of \(\beta_j\)?

  4. Does omitted-variable bias affect the consistency of OLS for \(\beta_j\)?

  5. What can we know about the direction of omitted-variable bias?

  6. How does measurement error in an explanatory variable affect the OLS estimate for that variable’s effect on the outcome variable?

  7. How does measurement error in an outcome variable affect OLS?

Time-series data

Introduction

Up to this point, we focused on cross-sectional data.

  • Sampled across a population (e.g., people, counties, countries).
  • Sampled at one moment in time (e.g., Jan. 1, 2015).
  • We had \(n\) individuals, each indexed \(i\) in \(\left\{1,\,\ldots,\, n \right\}\).

Today, we focus on a different type of data: time-series data.

  • Sampled within one unit/individual (e.g., Oregon).
  • Observe multiple times for the same unit (e.g., Oregon: 1990–2020).
  • We have \(\textcolor{#e64173}{T}\) time periods, each indexed \(t\) in \(\left\{1,\,\ldots,\, T \right\}\).

US monthly births, 1933–2015: Classic time-series graph

US monthly births, 1933–2015: Newfangled time-series graph

US monthly births per 30 days, 1933–2015: Newfangled time-series graph

Time-series models

Introduction

Our model now looks something like

\[ \begin{align} \text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + u_t \end{align} \]

or perhaps

\[ \begin{align} \text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + \beta_2 \textcolor{#e64173}{\text{Income}_{t-1}} + u_t \end{align} \]

maybe even

\[ \begin{align} \text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + \textcolor{#e64173}{\beta_2 \text{Income}_{t-1}} + \beta_3 \textcolor{#6A5ACD}{\text{Births}_{t-1}} + u_t \end{align} \]

where \(t-1\) denotes the time period prior to \(t\) (lagged income or births).

Time-series models

Assumptions

  1. New: Weakly persistent outcomes. Essentially, \(x_{t+k}\) in the distant period \(t+k\) is only weakly correlated with \(x_t\) in period \(t\) (when \(k\) is “big”).

  2. \(y_t\) is a linear function of its parameters and disturbance.

  3. There is no perfect collinearity in our data.

  4. The \(u_t\) have conditional mean of zero (exogeneity), \(\mathop{\boldsymbol{E}}\left[ u_t \middle| X \right] = 0\).

  5. The \(u_t\) are homoskedastic with zero correlation between \(u_t\) and \(u_s\), i.e., \(\mathop{\text{Var}} \left( u_t | X \right) = \mathop{\text{Var}} \left( u_t \right) = \sigma^2\) and \(\mathop{\text{Cor}} \left( u_t,\,u_s \middle| X \right) = 0\).

  6. Normality of disturbances, i.e., \(u_t\overset{\text{iid}}{\sim}\mathop{N}\left( 0,\,\sigma^2 \right)\).

Time-series models

Model options

Time-series modeling boils down to two classes of models.

  1. Static models: Do not allow for persistent effects.

  2. Dynamic models: Allow for persistent effects.

  • Models with lagged explanatory variables

  • Autoregressive, distributed-lag (ADL) models

Time-series models

Model options

Option 1: Static models

Static models assume the outcome depends upon only the current period.

\[ \begin{align} \text{Births}_{\textcolor{#e64173}{t}} = \beta_0 + \beta_1 \text{Income}_{\textcolor{#e64173}{t}} + u_{\textcolor{#e64173}{t}} \end{align} \]

Here, we must believe that income immediately affects the number of births and does not affect the number of births in the future.

We also need to believe current births do not depend upon previous births.

Can be a very restrictive way to consider time-series data.

Time-series models

Model options

Option 2: Dynamic models

Dynamic models allow the outcome to depend upon other periods.

Time-series models

Model options

Option 2a: Dynamic models with lagged explanatory variables

These models allow the outcome to depend upon the explanatory variable(s) in other periods.

\[ \begin{align} \text{Births}_{\textcolor{#e64173}{t}} = &\beta_0 + \beta_1 \text{Income}_{\textcolor{#e64173}{t}} + \beta_2 \text{Income}_{\textcolor{#6A5ACD}{t-1}} + \\ &\beta_3 \text{Income}_{\textcolor{#6A5ACD}{t-2}} + \beta_4 \text{Income}_{\textcolor{#6A5ACD}{t-3}} + u_{\textcolor{#e64173}{t}} \end{align} \]

Here, income immediately affects the number of births and affects future numbers of births.

In other words: Births today depend on today’s income and lags of income, e.g., last month’s income, last year’s income, …

Estimate total effects by summing lags’ coefficients, e.g., \(\beta_1 + \beta_2 + \beta_3 + \beta_4\).

Note: We still assume current births don’t affect future births.
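To make the "total effect" concrete, here is a minimal Python sketch. The coefficient values are hypothetical (not estimates from any data), and the disturbance is set to zero. It traces how births respond when income rises permanently by one unit in the lagged-income model above.

```python
# Hypothetical coefficients for illustration only (not estimated from data).
beta0 = 10.0
betas = [0.5, 0.3, 0.15, 0.05]  # beta_1..beta_4: effects of Income_t, ..., Income_{t-3}


def births(income_path, t):
    """Births_t implied by the model, with the disturbance set to zero."""
    return beta0 + sum(b * income_path[t - lag] for lag, b in enumerate(betas))


# Income is 0 through period 4, then rises permanently to 1 from period 5 onward.
income = [0.0] * 5 + [1.0] * 6

# Change in births relative to the pre-shock level, period by period.
baseline = births(income, 4)
responses = [births(income, t) - baseline for t in range(5, 11)]
total_effect = sum(betas)

print(responses)
print(total_effect)
```

The response starts at \(\beta_1\) in the period of the change and reaches \(\beta_1 + \beta_2 + \beta_3 + \beta_4\) once every lag in the model sees the higher income.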

Time-series models

Model options

Option 2b: Autoregressive distributed-lag (ADL) models

These models allow the outcome to depend upon the explanatory variable(s) and/or the outcome variable in prior periods.

\[ \begin{align} \text{Births}_{\textcolor{#e64173}{t}} = \beta_0 + \beta_1 \text{Income}_{\textcolor{#e64173}{t}} + \beta_2 \text{Income}_{\textcolor{#6A5ACD}{t-1}} + \beta_3 \text{Births}_{\textcolor{#6A5ACD}{t-1}} + u_{\textcolor{#e64173}{t}} \end{align} \]

Here, current income affects current births and future births.

In addition, current births affect future births—we’re allowing lags of the outcome variable.

Autoregressive distributed-lag models

Numbers of lags

ADL models are often specified as \(\text{ADL}(\textcolor{#FFA500}{p},\,\textcolor{#e64173}{q})\), where

  • \(\textcolor{#FFA500}{p}\) is the (maximum) number of lags for the outcome variable.

  • \(\textcolor{#e64173}{q}\) is the (maximum) number of lags for explanatory variables.

Example: \(\text{ADL}(\textcolor{#FFA500}{1},\,\textcolor{#e64173}{0})\)

\[ \begin{aligned} \text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Births}_{\textcolor{#FFA500}{t-1}} + u_t \end{aligned} \]

Example: \(\text{ADL}(\textcolor{#FFA500}{2},\,\textcolor{#e64173}{2})\)

\[ \begin{aligned} \text{Births}_t = &\beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Income}_{\textcolor{#e64173}{t-1}} + \beta_3 \text{Income}_{\textcolor{#e64173}{t-2}} \\ & + \beta_4 \text{Births}_{\textcolor{#FFA500}{t-1}} + \beta_5 \text{Births}_{\textcolor{#FFA500}{t-2}} + u_t \end{aligned} \]
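As a rough illustration of what \(\text{ADL}(p,\,q)\) means for the data we hand to OLS, the sketch below builds the outcome and regressor rows for a given \(p\) and \(q\). The helper name `adl_rows` and the toy numbers are made up for this example.

```python
def adl_rows(y, x, p, q):
    """Build (outcome, regressors) pairs for an ADL(p, q) model.

    Each regressor row is [1, x_t, x_{t-1}, ..., x_{t-q}, y_{t-1}, ..., y_{t-p}].
    The first max(p, q) periods are dropped because their lags are unobserved.
    """
    start = max(p, q)
    rows = []
    for t in range(start, len(y)):
        x_terms = [x[t - j] for j in range(0, q + 1)]
        y_lags = [y[t - j] for j in range(1, p + 1)]
        rows.append((y[t], [1.0] + x_terms + y_lags))
    return rows


# Toy series (made-up numbers) to show the layout for ADL(2, 2).
births = [100.0, 102.0, 101.0, 105.0, 107.0]
income = [50.0, 51.0, 53.0, 52.0, 55.0]

for outcome, regressors in adl_rows(births, income, p=2, q=2):
    print(outcome, regressors)
```

Note that each additional lag costs one usable observation at the start of the sample.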

Autoregressive distributed-lag models

Complexity

Due to their lags, ADL models actually estimate even more complex relationships than you might first guess.

Consider ADL(1, 0): \(\text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Births}_{t-1} + u_t\)

Write out the model for period \(t-1\):

\[ \begin{align} \text{Births}_{t-1} = \beta_0 + \beta_1 \text{Income}_{t-1} + \beta_2 \text{Births}_{t-2} + u_{t-1} \end{align} \]

which we can substitute in for \(\text{Births}_{t-1}\) in the first equation, i.e.,

\[ \begin{align} \text{Births}_t = &\beta_0 + \beta_1 \text{Income}_t + \\ &\beta_2 \underbrace{\left( \beta_0 + \beta_1 \text{Income}_{t-1} + \beta_2 \text{Births}_{t-2} + u_{t-1} \right)}_{\text{Births}_{t-1}} + u_t \end{align} \]

Autoregressive distributed-lag models

Complexity

Continuing…

\[ \begin{align} \text{Births}_t = &\beta_0 + \beta_1 \text{Income}_t + \\ &\beta_2 \underbrace{\left( \beta_0 + \beta_1 \text{Income}_{t-1} + \beta_2 \text{Births}_{t-2} + u_{t-1} \right)}_{\text{Births}_{t-1}} + u_t \\ =& \beta_0 \left(1 + \beta_2 \right) + \beta_1 \text{Income}_t + \beta_1 \beta_2 \text{Income}_{t-1} + \\ &\beta_2^2 \text{Births}_{t-2} + u_{t} + \beta_2 u_{t-1} \end{align} \]

We could then substitute in the equation for \(\text{Births}_{t-2}\), \(\text{Births}_{t-3}\), …

Autoregressive distributed-lag models

Complexity

Eventually we arrive at

\[ \begin{align} \text{Births}_t = &\beta_0 \left( 1 + \beta_2 + \beta_2^2 + \beta_2^3 + \cdots \right) + \\ &\beta_1 \left( \text{Income}_t + \beta_2 \text{Income}_{t-1} + \beta_2^2 \text{Income}_{t-2} + \cdots \right) +\\ & u_t + \beta_2 u_{t-1} + \beta_2^2 u_{t-2} + \cdots \end{align} \]

The point?

By including just one lag of the dependent variable, as in an ADL(1, 0), we implicitly include many lags of the explanatory variables and disturbances.
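The substitution argument can be checked numerically. The sketch below uses hypothetical parameter values and made-up series, and it assumes births start from zero before the first period, so the expansion stops at period 0 rather than running forever. It compares the recursive ADL(1, 0) form with the expanded form.

```python
# Hypothetical parameter values and made-up data, for illustration only.
b0, b1, b2 = 2.0, 0.5, 0.6
T = 60
income = [10.0 + (t % 7) for t in range(T)]          # arbitrary income path
u = [((t * 37) % 11 - 5) / 10.0 for t in range(T)]   # arbitrary "disturbances"

# Recursive form: Births_t = b0 + b1*Income_t + b2*Births_{t-1} + u_t,
# starting from Births = 0 before the first period (an assumption of this sketch).
births = []
prev = 0.0
for t in range(T):
    prev = b0 + b1 * income[t] + b2 * prev + u[t]
    births.append(prev)

# Expanded form: weight b2**k on the k-th lag of the intercept, income, and disturbance.
t = T - 1
expanded = sum(b2**k * (b0 + b1 * income[t - k] + u[t - k]) for k in range(t + 1))

print(births[t], expanded)
```

The two numbers agree up to floating-point error, which is the sense in which one lag of births carries the whole history of income and disturbances.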

Autoregressive distributed-lag models

The partial-adjustment model

There are times when we actually want to model an individual’s desired amount, rather than her actual amount, but we are unable to observe the desired level.

Partial-adjustment models help us model this situation.

Autoregressive distributed-lag models

The partial-adjustment model

Example

We want to know how the desired number of cigarettes, \(\textcolor{#e64173}{\widetilde{\text{Cig}}_t}\), changes with the current period’s cigarette tax, e.g.,

\[ \begin{align} \textcolor{#e64173}{\widetilde{\text{Cig}}_t} = \beta_0 + \beta_1 \text{Tax}_t + u_t \tag{A} \end{align} \]

Imagine actual cigarette consumption, \(\textcolor{#6A5ACD}{\text{Cig}_t}\), doesn’t change immediately (e.g., habit persistence). Instead, consumption depends upon the current desired level and the previous consumption level:

\[ \begin{align} \textcolor{#6A5ACD}{\text{Cig}_t} = \lambda \textcolor{#e64173}{\widetilde{\text{Cig}}_t} + \left( 1-\lambda \right) \textcolor{#FFA500}{\text{Cig}_{t-1}} \tag{B} \end{align} \]

Autoregressive distributed-lag models

The partial-adjustment model

Example, continued

\[ \begin{align} \textcolor{#e64173}{\widetilde{\text{Cig}}_t} &= \beta_0 + \beta_1 \text{Tax}_t + u_t \tag{A} \\[0.3em] \textcolor{#6A5ACD}{\text{Cig}_t} &= \lambda \textcolor{#e64173}{\widetilde{\text{Cig}}_t} + \left( 1-\lambda \right) \textcolor{#FFA500}{\text{Cig}_{t-1}} \tag{B} \end{align} \]

Substituting \(\textcolor{#e64173}{\widetilde{\text{Cig}}_t}\) from \((\text{A})\) into \((\text{B})\) yields

\[ \begin{align} \textcolor{#6A5ACD}{\text{Cig}_t} &= \lambda \left( \beta_0 + \beta_1 \text{Tax}_t + u_t \right) + \left( 1-\lambda \right) \textcolor{#FFA500}{\text{Cig}_{t-1}} \\[0.3em] &= \lambda\beta_0 + \lambda\beta_1 \text{Tax}_t + \left( 1-\lambda \right) \textcolor{#FFA500}{\text{Cig}_{t-1}} + \lambda u_t \tag{C} \end{align} \]

The equation in \((\text{C})\) is ADL(1, 0).

We can also estimate/recover the speed-of-adjustment coefficient \(\lambda\).
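One way to see how \(\lambda\) is recovered: in \((\text{C})\), the coefficient on \(\text{Cig}_{t-1}\) equals \(1-\lambda\), so \(\lambda\) is one minus that coefficient, and dividing the other two coefficients by \(\lambda\) returns \(\beta_0\) and \(\beta_1\). The sketch below applies this mapping. The function name and the input values are hypothetical, not estimates from data.

```python
def recover_partial_adjustment(a0, a1, a2):
    """Map the coefficients of (C) back to the parameters of (A) and (B).

    Equation (C): Cig_t = a0 + a1*Tax_t + a2*Cig_{t-1} + error,
    where a0 = lambda*beta0, a1 = lambda*beta1, and a2 = 1 - lambda.
    """
    lam = 1.0 - a2
    if lam == 0:
        raise ValueError("lambda = 0: no adjustment, so beta0 and beta1 are not identified")
    return {"lambda": lam, "beta0": a0 / lam, "beta1": a1 / lam}


# Hypothetical coefficients on the intercept, Tax_t, and Cig_{t-1} (not real estimates).
params = recover_partial_adjustment(a0=8.0, a1=-0.5, a2=0.75)
print(params)
```

Here a coefficient of 0.75 on lagged consumption implies \(\lambda = 0.25\): each period closes one quarter of the gap between actual and desired consumption.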

OLS in time series

Unbiased coefficients

As before, the unbiasedness of OLS is going to depend upon our exogeneity assumption, i.e., \(\mathop{\boldsymbol{E}}\left[ u_t \middle| X \right] = 0\).

We can split this assumption into two parts.

  1. The disturbance \(u_t\) is independent of the explanatory variables in the same period (i.e., \(X_t\)).
  2. The disturbance \(u_t\) is independent of the explanatory variables in the other periods (i.e., \(X_s\) for \(s\neq t\)).

We need both of these parts to be true for OLS to be unbiased.

OLS in time series

Unbiased coefficients

We need both parts of our exogeneity assumption for OLS to be unbiased:

\[ \begin{align} \mathop{\boldsymbol{E}}\left[ \hat{\beta}_1 \middle| X \right] &= \beta_1 + \mathop{\boldsymbol{E}}\left[ \dfrac{\sum_t \left( x_t - \overline{x} \right) u_t}{\sum_t \left( x_t - \overline{x} \right)^2} \middle| X \right] \end{align} \]

I.e., to guarantee that the conditional expectation of the second term equals zero, we need \(\mathop{\boldsymbol{E}}\left[ u_t | X \right] = 0\), which requires both \(\mathop{\boldsymbol{E}}\left[ u_t | X_t \right] = 0\) and \(\mathop{\boldsymbol{E}}\left[ u_t | X_{s} \right] = 0\) \((s\neq t)\).

The second part of our exogeneity assumption—requiring that \(u_t\) is independent of all regressors in other periods—fails with dynamic models with lagged outcome variables.

Thus, OLS is biased for dynamic models with lagged outcome variables.

OLS in time series

Unbiased coefficients

To see why dynamic models with lagged outcome variables violate our exogeneity assumption, consider two periods of our simple ADL(1, 0) model.

\[ \begin{align} \textcolor{#e64173}{\text{Births}_t} &= \beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Births}_{t-1} + \textcolor{#e64173}{u_t} \tag{1}\\[0.3em] \text{Births}_{t+1} &= \beta_0 + \beta_1 \text{Income}_{t+1} + \beta_2 \textcolor{#e64173}{\text{Births}_t} + u_{t+1} \tag{2} \end{align} \]

In \((1)\), \(\textcolor{#e64173}{u_t}\) clearly correlates with \(\textcolor{#e64173}{\text{Births}_t}\).

However, \(\textcolor{#e64173}{\text{Births}_t}\) is a regressor in \((2)\) (lagged dependent variable).

∴ The disturbance in \(t\) \(\left(\textcolor{#e64173}{u_t}\right)\) correlates with a regressor in \(t+1\) \(\left(\textcolor{#e64173}{\text{Births}_t}\right)\).

This correlation violates the second part of our exogeneity requirement.

OLS in time series

Consistent coefficients

All is not lost.

For OLS to be consistent, we only need contemporaneous exogeneity.

Contemporaneous exogeneity: each disturbance is uncorrelated with the explanatory variables in the same period, i.e.,

\[ \begin{align} \mathop{\boldsymbol{E}}\left[ u_t \middle| X_t \right] = 0 \end{align} \]

With contemporaneous exogeneity, OLS estimates for the coefficients in a time series model are consistent.

OLS in time series

Consistent coefficients

To see why OLS is consistent with contemporaneous exogeneity, consider the OLS estimate for \(\beta_1\) in

\[ \begin{align} \text{Births}_t &= \beta_0 + \beta_1 \text{Births}_{t-1} + u_t \end{align} \]

which we’ve shown (a few times) can be written

\[ \begin{align} \hat{\beta}_1 &= \beta_1 + \dfrac{\sum_t \left( \text{Births}_{t-1} - \overline{\text{Births} } \right)u_t}{\sum_t \left( \text{Births}_{t-1} - \overline{\text{Births} } \right)^2} \end{align} \]

OLS in time series

Consistent coefficients

\[ \begin{align} \mathop{\text{plim}} \hat{\beta}_1 &= \mathop{\text{plim}} \left( \beta_1 + \dfrac{\sum_t \left( \text{Births}_{t-1} - \overline{\text{Births} } \right)u_t}{\sum_t \left( \text{Births}_{t-1} - \overline{\text{Births} } \right)^2} \right) \\[0.3em] &= \beta_1 + \dfrac{\mathop{\text{plim}} \left[\sum_t \left( \text{Births}_{t-1} - \overline{\text{Births} } \right)u_t/T\right]}{\mathop{\text{plim}} \left[\sum_t \left( \text{Births}_{t-1} - \overline{\text{Births} } \right)^2/T\right]} \\[0.3em] &= \beta_1 + \dfrac{\textcolor{#e64173}{\mathop{\text{Cov}} \left( \text{Births}_{t-1},\, u_t \right)}}{\mathop{\text{Var}} \left( \text{Births}_{t-1} \right)} \end{align} \]

\(\hspace{8em}=\beta_1\hspace{1em}\) if \(\textcolor{#e64173}{\mathop{\text{Cov}} \left( \text{Births}_{t-1},\, u_t \right)=0}\)

Contemporaneous exogeneity gives us \(\textcolor{#e64173}{\mathop{\text{Cov}} \left( \text{Births}_{t-1},\, u_t \right)=0}\).

OLS in time series

Consistent coefficients

Thus, if we assume contemporaneous exogeneity, OLS is consistent for the coefficients, even for models with lagged dependent variables.
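A small simulation can illustrate both results: bias in short samples and consistency as \(T\) grows. This is only a sketch under assumed values (\(\beta_0 = 1\), \(\beta_1 = 0.5\), standard normal disturbances, and an arbitrary seed). It regresses \(\text{Births}_t\) on \(\text{Births}_{t-1}\) many times and averages the OLS slope.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


def mean_ar1_estimate(T, reps, beta0=1.0, beta1=0.5, seed=421):
    """Average OLS estimate of beta1 across simulated samples of length T."""
    rng = random.Random(seed)
    estimates = []
    for _rep in range(reps):
        y = [beta0 / (1.0 - beta1)]  # start at the equilibrium level
        for _t in range(T):
            y.append(beta0 + beta1 * y[-1] + rng.gauss(0.0, 1.0))
        estimates.append(ols_slope(y[:-1], y[1:]))  # regress y_t on y_{t-1}
    return sum(estimates) / reps


small_T = mean_ar1_estimate(T=20, reps=2000)
large_T = mean_ar1_estimate(T=2000, reps=200)
print(small_T, large_T)
```

With a short series, the average estimate should sit noticeably below 0.5 (the bias). With a long series, it should be very close to 0.5 (consistency).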

The end.

Autoregressive distributed-lag models

Equilibrium effects

ADL models also offer interesting insights for long-run/equilibrium effects. \[ \begin{aligned} \text{Births}_t = \beta_0 + \textcolor{#e64173}{\beta_1} \text{Income}_t + \beta_2 \text{Births}_{t-1} + u_t \end{aligned} \]

In this ADL(1, 0) model, \(\beta_1\) gives the short-run effect of income on the number of births.

I.e., how income in time \(t\) affects births in time \(t\).

Autoregressive distributed-lag models

Equilibrium effects

Starting with

\[ \begin{aligned} \text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Births}_{t-1} + u_t \end{aligned} \]

we move into equilibrium, i.e., \(\text{Births}_t=\text{Births}^\star\), i.e.,

\[ \begin{aligned} \text{Births}^\star &= \beta_0 + \beta_1 \text{Income}^\star + \beta_2 \text{Births}^\star \end{aligned} \]

Now rearrange…

\[ \begin{align} \text{Births}^\star - \beta_2 \text{Births}^\star &= \beta_0 + \beta_1 \text{Income}^\star \\ \left(1 - \beta_2\right) \text{Births}^\star &= \beta_0 + \beta_1 \text{Income}^\star \\ \text{Births}^\star &= \dfrac{\beta_0}{\left(1 - \beta_2\right)} + \dfrac{\beta_1}{\left(1 - \beta_2\right)} \text{Income}^\star \end{align} \]

Autoregressive distributed-lag models

Equilibrium effects

Short-run effect of income on births: \[ \begin{aligned} \text{Births}_t = \beta_0 + \textcolor{#e64173}{\beta_1} \text{Income}_t + \beta_2 \text{Births}_{t-1} + u_t \end{aligned} \]

Long-run effect of income on births: \[ \begin{align} \text{Births}^\star = \dfrac{\beta_0}{\left(1 - \beta_2\right)} + \textcolor{#6A5ACD}{\dfrac{\beta_1}{\left(1 - \beta_2\right)}} \text{Income}^\star \end{align} \]
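To compare the two effects numerically, the sketch below uses hypothetical ADL(1, 0) coefficients with \(-1<\beta_2<1\). It computes the long-run effect from the formula and also by iterating the model, without disturbances, until births settle at their equilibrium level.

```python
# Hypothetical ADL(1, 0) coefficients, for illustration only.
b0, b1, b2 = 2.0, 0.5, 0.6


def steady_state_births(income_star, periods=200):
    """Iterate Births_t = b0 + b1*Income* + b2*Births_{t-1} with no disturbances."""
    births = 0.0
    for _ in range(periods):
        births = b0 + b1 * income_star + b2 * births
    return births


short_run = b1                # effect of Income_t on Births_t
long_run = b1 / (1.0 - b2)    # effect of Income* on Births*

# Raise equilibrium income by one unit and compare the two steady states.
simulated_long_run = steady_state_births(11.0) - steady_state_births(10.0)
print(short_run, long_run, simulated_long_run)
```

Because \(\beta_2\) is positive in this example, the long-run effect exceeds the short-run effect: part of each period's change in births carries into the next period.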

Autoregressive distributed-lag models

Equilibrium effects

Another way to see this result:

We already showed \[ \begin{align} \text{Births}_t =& \beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Births}_{t-1} + u_t \end{align} \]

gives us

\[ \begin{align} \text{Births}_t = &\beta_0 \left( 1 + \beta_2 + \beta_2^2 + \beta_2^3 + \cdots \right) + \\ &\beta_1 \left( \text{Income}_t + \beta_2 \text{Income}_{t-1} + \beta_2^2 \text{Income}_{t-2} + \cdots \right) +\\ & u_t + \beta_2 u_{t-1} + \beta_2^2 u_{t-2} + \cdots \end{align} \]

In equilibrium: \(\text{Income}_t=\text{Income}_{t-k}=\text{Income}^\star\) for all \(k\).

Autoregressive distributed-lag models

Equilibrium effects

Substituting \(\text{Income}_{t-k}=\text{Income}^\star\) for all \(k\)
(and assuming no disturbances in equilibrium):

\[ \begin{align} \text{Births}_t = &\beta_0 \left( 1 + \beta_2 + \beta_2^2 + \beta_2^3 + \cdots \right) + \\ &\beta_1 \left( \text{Income}^\star + \beta_2 \text{Income}^\star + \beta_2^2 \text{Income}^\star + \cdots \right) \end{align} \]

Autoregressive distributed-lag models

Equilibrium effects

Substituting \(\text{Income}_{t-k}=\text{Income}^\star\) for all \(k\)
(and assuming no disturbances in equilibrium): \[ \begin{align} \text{Births}_t = &\beta_0 \left( 1 + \beta_2 + \beta_2^2 + \beta_2^3 + \cdots \right) + \\ &\beta_1 \left( \text{Income}^\star + \beta_2 \text{Income}^\star + \beta_2^2 \text{Income}^\star + \cdots \right) \\ = &\beta_0 \left( \dfrac{1}{1 - \beta_2} \right) + \\ &\beta_1 \left( \dfrac{1}{1 - \beta_2} \right) \text{Income}^\star \end{align} \]

So long as \(-1<\beta_2<1\).
