class: center, middle, inverse, title-slide .title[ # Time series ] .subtitle[ ## EC 421, Set 7 ] .author[ ### Edward Rubin ] --- class: inverse, middle # Prologue --- name: schedule # Schedule ## Last Time Asymptotics, probability limits, and consistency ## Today Time series --- # EC 421 ## About our class 1. EC 421 is often a .hi[difficult class]. 1. EC 421 requires .hi[more math/theory] than most other classes. 1. This .hi[theory is important]—why/when you can trust OLS/regression. 1. With all of this theory, we get .hi[fewer traditional examples]. Proofs and simulations *are* our examples. --- layout: true # Asymptotics and consistency ## Review --- class: inverse, middle --- name: review 1. Compare/contrast the concepts *expected value* and *probability limit*. 1. What does it mean if the estimator `\(\hat{\theta}\)` is consistent for `\(\theta\)`? 1. What is required for an omitted variable to bias OLS estimates of `\(\beta_j\)`? 1. Does omitted-variable bias affect the consistency of OLS for `\(\beta_j\)`? 1. What can we know about the direction of omitted-variable bias? 1. How does measurement error in an explanatory variable affect the OLS estimate for that variable's effect on the outcome variable? 1. How does measurement error in an outcome variable affect OLS? --- layout: false class: inverse, middle # Time-series data --- layout: true # Time-series data ## Introduction --- name: ts-intro Up to this point, we focused on .hi-slate[cross-sectional data]. - Sampled *across* a population (_e.g._, people, counties, countries). - Sampled at *one moment* in time (_e.g._, Jan. 1, 2015). - We had `\(n\)` *individuals*, each indexed `\(i\)` in `\(\left\{1,\,\ldots,\, n \right\}\)`. -- Today, we focus on a different type of data: .hi[time-series data]. - Sampled within .pink[one unit/individual] (_e.g._, Oregon). - Observe .pink[multiple times] for the same unit (_e.g._, Oregon: 1990–2020). - We have `\(\color{#e64173}{T}\)` .pink[time periods], each indexed `\(t\)` in `\(\left\{1,\,\ldots,\, T \right\}\)`. --- layout: false class: clear .hi[US monthly births, 1933–2015]: Classic time-series graph <br> <img src="slides_files/figure-html/ts ex births-1.svg" style="display: block; margin: auto;" /> --- class: clear .hi[US monthly births, 1933–2015]: Newfangled time-series graph <br> <img src="slides_files/figure-html/ts ex2 births-1.svg" style="display: block; margin: auto;" /> --- class: clear .hi[US monthly births per 30 days, 1933–2015]: Newfangled time-series graph <br> <img src="slides_files/figure-html/ts ex3 births-1.svg" style="display: block; margin: auto;" /> --- layout: false class: inverse, middle # Time-series models --- layout: true # Time-series models ## Introduction --- Our model now looks something like $$ `\begin{align} \text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + u_t \end{align}` $$ -- or perhaps $$ `\begin{align} \text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + \beta_2 \color{#e64173}{\text{Income}_{t-1}} + u_t \end{align}` $$ -- maybe even $$ `\begin{align} \text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + \color{#e64173}{\beta_2 \text{Income}_{t-1}} + \beta_3 \color{#6A5ACD}{\text{Births}_{t-1}} + u_t \end{align}` $$ -- where `\(t-1\)` denotes the time period prior to `\(t\)` (*lagged* income or births). --- name: assumptions layout: false # Time-series models ## Assumptions 1. .hi[New:] **Weakly persistent outcomes**—essentially, `\(x_{t+k}\)` in the distant period `\(t+k\)` is weakly correlated with period `\(x_t\)` (when `\(k\)` is "big"). 1. `\(y_t\)` is a **linear function** of its parameters and disturbance. 1. There is **no perfect collinearity** in our data. 1. The `\(u_t\)` have conditional mean of zero (**exogeneity**), `\(\mathop{\boldsymbol{E}}\left[ u_t \middle| X \right] = 0\)`. 1. The `\(u_t\)` are **homoskedastic** with **zero correlation** between `\(u_t\)` and `\(u_s\)`, _i.e._, `\(\mathop{\text{Var}} \left( u_t | X \right) = \mathop{\text{Var}} \left( u_t \right) = \sigma^2\)` and `\(\mathop{\text{Cor}} \left( u_t,\,u_s \middle| X \right) = 0\)`. 1. **Normality of disturbances**, _i.e._, `\(u_t\overset{\text{iid}}{\sim}\mathop{N}\left( 0,\,\sigma^2 \right)\)`. --- layout: true # Time-series models ## Model options --- name: static_dynamic Time-series modeling boils down to two classes of models. 1. .hi[Static models:] Do not allow for persistent effect. 2. .hi-purple[Dynamic models:] Allow for persistent effects. -- - Models with .hi-purple[lagged explanatory] variables - .hi-purple[Autoregressive, distributed-lag] (ADL) models --- **Option 1:** .hi[Static models] .hi[Static models] assume the outcome depends upon .pink[only the current period]. $$ `\begin{align} \text{Births}_{\color{#e64173}{t}} = \beta_0 + \beta_1 \text{Income}_{\color{#e64173}{t}} + u_{\color{#e64173}{t}} \end{align}` $$ -- Here, we must believe that income .hi[immediately] affects the number of births and does not affect on the numbers of births in the future. -- We also need to believe current births do not depend upon previous births. -- Can be a very restrictive way to consider time-series data. --- **Option 2:** .hi-purple[Dynamic models] .hi-purple[Dynamic models] allow the outcome to depend upon .purple[other periods]. --- **Option 2a:** .hi-purple[Dynamic models] with .purple[lagged explanatory variables] These models allow the outcome to depend upon the .purple[explanatory variable(s) in other periods]. $$ `\begin{align} \text{Births}_{\color{#e64173}{t}} = &\beta_0 + \beta_1 \text{Income}_{\color{#e64173}{t}} + \beta_2 \text{Income}_{\color{#6A5ACD}{t-1}} + \\ &\beta_3 \text{Income}_{\color{#6A5ACD}{t-2}} + \beta_4 \text{Income}_{\color{#6A5ACD}{t-3}} + u_{\color{#e64173}{t}} \end{align}` $$ -- Here, income .hi[immediately] affects the number of births *and* affects .hi-purple[future] numbers of births. -- In other words: Births today depend today's income and *lags* of income—_e.g._, last month's income, last year's income, ... -- Estimate *total* effects by summing lags' coefficients, _e.g._, `\(\beta_1 + \beta_2 + \beta_3 + \beta_4\)`. -- *Note:* We still assume current births don't affect future births. --- **Option 2b:** .hi-purple[Autoregressive distributed-lag (ADL) models] These models allow the outcome to depend upon the .purple[explanatory variable(s) and/or the outcome variable in prior periods]. $$ `\begin{align} \text{Births}_{\color{#e64173}{t}} = \beta_0 + \beta_1 \text{Income}_{\color{#e64173}{t}} + \beta_2 \text{Income}_{\color{#6A5ACD}{t-1}} + \beta_3 \text{Births}_{\color{#6A5ACD}{t-1}} + u_{\color{#e64173}{t}} \end{align}` $$ -- Here, current income affects affects .hi[current] births and .hi-purple[future] births. -- In addition, .purple[current births affect future births]—we're allowing lags of the outcome variable. --- layout: true # Autoregressive distributed-lag models --- name: adl ## Numbers of lags ADL models are often specified as `\(\text{ADL}(\color{#FFA500}{p},\,\color{#e64173}{q})\)`, where - `\(\color{#FFA500}{p}\)` is the (maximum) number of **lags** for .orange[the outcome variable.] - `\(\color{#e64173}{q}\)` is the (maximum) number of **lags** for .pink[explanatory variables.] -- *Example:* `\(\text{ADL}(\color{#FFA500}{1},\,\color{#e64173}{0})\)` -- $$ `\begin{aligned} \text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Births}_{\color{#FFA500}{t-1}} + u_t \end{aligned}` $$ -- *Example:* `\(\text{ADL}(\color{#FFA500}{2},\,\color{#e64173}{2})\)` -- $$ `\begin{aligned} \text{Births}_t = &\beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Income}_{\color{#e64173}{t-1}} + \beta_3 \text{Income}_{\color{#e64173}{t-2}} \\ & + \beta_4 \text{Births}_{\color{#FFA500}{t-1}} + \beta_5 \text{Births}_{\color{#FFA500}{t-2}} + u_t \end{aligned}` $$ --- layout: true # Autoregressive distributed-lag models ## Complexity --- name: adl_complexity Due to their lags, ADL models actually estimate even more complex relationships than you might first guess. -- Consider ADL(1, 0): `\(\text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Births}_{t-1} + u_t\)` -- Write out the model for period `\(t-1\)`: $$ `\begin{align} \text{Births}_{t-1} = \beta_0 + \beta_1 \text{Income}_{t-1} + \beta_2 \text{Births}_{t-2} + u_{t-1} \end{align}` $$ -- which we can substitute in for `\(\text{Births}_{t-1}\)` in the first equation, _i.e._, $$ `\begin{align} \text{Births}_t = &\beta_0 + \beta_1 \text{Income}_t + \\ &\beta_2 \underbrace{\left( \beta_0 + \beta_1 \text{Income}_{t-1} + \beta_2 \text{Births}_{t-2} + u_{t-1} \right)}_{\text{Births}_{t-1}} + u_t \end{align}` $$ --- Continuing... $$ `\begin{align} \text{Births}_t = &\beta_0 + \beta_1 \text{Income}_t + \\ &\beta_2 \underbrace{\left( \beta_0 + \beta_1 \text{Income}_{t-1} + \beta_2 \text{Births}_{t-2} + u_{t-1} \right)}_{\text{Births}_{t-1}} + u_t \\ =& \beta_0 \left(1 + \beta_2 \right) + \beta_1 \text{Income}_t + \beta_1 \beta_2 \text{Income}_{t-1} + \\ &\beta_2^2 \text{Births}_{t-2} + u_{t} + \beta_2 u_{t-1} \end{align}` $$ -- We could then substitute in the equation for `\(\text{Births}_{t-2}\)`, `\(\text{Births}_{t-3}\)`, ... --- Eventually we arrive at $$ `\begin{align} \text{Births}_t = &\beta_0 \left( 1 + \beta_2 + \beta_2^2 + \beta_2^3 + \cdots \right) + \\ &\beta_1 \left( \text{Income}_t + \beta_2 \text{Income}_{t-1} + \beta_2^2 \text{Income}_{t-2} + \cdots \right) +\\ & u_t + \beta_2 u_{t-1} + \beta_2^2 u_{t-2} + \cdots \end{align}` $$ -- **The point?** -- By including just **one lag of the dependent variable**—as in a ADL(1, 0)—we implicitly include for *many lags* of the explanatory variables and disturbances.<sup>.pink[†]</sup> .footnote[.pink[†] These lags enter into the equation in a very specific way—not the most flexible specification.] --- layout: true # Autoregressive distributed-lag models ## The partial-adjustment model --- name: adl_partial There are times that actually want to model an individual's .hi[desired amount], rather than her .hi-purple[*actual* amount] -- , but we are unable to observe the .pink[desired level]. -- *Partial-adjustment models* help us model this situation. --- *Example* We want to know how the .hi[desired number of cigarettes], `\(\color{#e64173}{\widetilde{\text{Cig}}_t}\)` , changes with the current period's cigarette tax, _e.g._, $$ `\begin{align} \color{#e64173}{\widetilde{\text{Cig}}_t} = \beta_0 + \beta_1 \text{Tax}_t + u_t \tag{A} \end{align}` $$ -- Imagine .purple[actual cigarette consumption], `\(\color{#6A5ACD}{\text{Cig}_t}\)`, doesn't change immediately (_e.g._, habit persistence). Instead, consumption depends upon .pink[current desired level] and .orange[previous consumption level] $$ `\begin{align} \color{#6A5ACD}{\text{Cig}_t} = \lambda \color{#e64173}{\widetilde{\text{Cig}}_t} + \left( 1-\lambda \right) \color{#FFA500}{\text{Cig}_{t-1}} \tag{B} \end{align}` $$ --- *Example, continued* $$ `\begin{align} \color{#e64173}{\widetilde{\text{Cig}}_t} &= \beta_0 + \beta_1 \text{Tax}_t + u_t \tag{A} \\[0.3em] \color{#6A5ACD}{\text{Cig}_t} &= \lambda \color{#e64173}{\widetilde{\text{Cig}}_t} + \left( 1-\lambda \right) \color{#FFA500}{\text{Cig}_{t-1}} \tag{B} \end{align}` $$ -- Substituting `\(\color{#e64173}{\widetilde{\text{Cig}}_t}\)` from `\((\text{A})\)` into `\((\text{B})\)` yields $$ `\begin{align} \color{#6A5ACD}{\text{Cig}_t} &= \lambda \left( \beta_0 + \beta_1 \text{Tax}_t + u_t \right) + \left( 1-\lambda \right) \color{#FFA500}{\text{Cig}_{t-1}} \\[0.3em] &= \lambda\beta_0 + \lambda\beta_1 \text{Tax}_t + \left( 1-\lambda \right) \color{#FFA500}{\text{Cig}_{t-1}} + \lambda u_t \tag{C} \end{align}` $$ -- The equation in `\((\text{C})\)` is ADL(1, 0). -- We can also estimate/recover the speed-of-adjustment coefficient `\(\lambda\)`. --- layout: false class: inverse, middle # OLS in time series --- layout: true # OLS in time series ## Unbiased coefficients --- name: ols_unbiased As before, the unbiased-ness of OLS is going to depend upon our exogeneity assumption, _i.e._, `\(\mathop{\boldsymbol{E}}\left[ u_t \middle| X \right] = 0\)`. -- We can split this assumption into two parts. -- 1. The disturbance `\(u_t\)` is independent of the explanatory variables in the **same period** (_i.e._, `\(X_t\)`). -- 1. The disturbance `\(u_t\)` is independent of the explanatory variables in the **other periods** (_i.e._, `\(X_s\)` for `\(s\neq t\)`). -- We need both of these parts to be true for OLS to be unbiased. --- We need both parts of our exogeneity assumption for OLS to be unbiased: $$ `\begin{align} \mathop{\boldsymbol{E}}\left[ \hat{\beta}_1 \middle| X \right] &= \beta_1 + \mathop{\boldsymbol{E}}\left[ \dfrac{\sum_t \left( x_t - \overline{x} \right) u_t}{\sum_t \left( x_t - \overline{x} \right)^2} \middle| X \right] \end{align}` $$ _I.e._, to guarantee the numerator equals zero, we need `\(\mathop{\boldsymbol{E}}\left[ u_t | X \right] = 0\)`—for both `\(\mathop{\boldsymbol{E}}\left[ u_t | X_t \right] = 0\)` *and* `\(\mathop{\boldsymbol{E}}\left[ u_t | X_{s} \right] = 0\)` `\((s\neq t)\)`. -- .pink[The second part of our exogeneity assumption]—requiring that `\(u_t\)` is independent of all regressors in other periods—.pink[fails with dynamic models with lagged outcome variables]. -- Thus, .hi[OLS is biased for dynamic models with lagged outcome variables]. --- To see why dynamic models with lagged outcome variables violate our exogeneity assumption, consider two periods of our simple ADL(1, 0) model. $$ `\begin{align} \color{#e64173}{\text{Births}_t} &= \beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Births}_{t-1} + \color{#e64173}{u_t} \tag{1}\\[0.3em] \text{Births}_{t+1} &= \beta_0 + \beta_1 \text{Income}_{t+1} + \beta_2 \color{#e64173}{\text{Births}_t} + u_{t+1} \tag{2} \end{align}` $$ -- In `\((1)\)`, `\(\color{#e64173}{u_t}\)` clearly correlates with `\(\color{#e64173}{\text{Births}_t}\)`. -- However, `\(\color{#e64173}{\text{Births}_t}\)` is a regressor in `\((2)\)` (lagged dependent variable). -- ∴ The disturbance in `\(t\)` `\(\left(\color{#e64173}{u_t}\right)\)` correlates with a regressor in `\(t+1\)` `\(\left(\color{#e64173}{\text{Births}_t}\right)\)`. -- This correlation violates the second part of our exogeneity requirement. --- layout: true # OLS in time series ## Consistent coefficients --- name: ols_consistency All is not lost. -- For OLS to be **consistent**, we only need .hi[contemporaneous exogeneity]. .hi[Contemporaneous exogeneity:] each disturbance is uncorrelated with the explanatory variables .pink[in the same period], _i.e._, $$ `\begin{align} \mathop{\boldsymbol{E}}\left[ u_t \middle| X_t \right] = 0 \end{align}` $$ -- With contemporaneous exogeneity, OLS estimates for the coefficients in a time series model are consistent. --- To see why OLS is consistent with contemporaneous exogeneity, consider the OLS estimate for `\(\beta_1\)` in $$ `\begin{align} \text{Births}_t &= \beta_0 + \beta_1 \text{Births}_{t-1} + u_t \end{align}` $$ which we've shown (a few times) can be written $$ `\begin{align} \hat{\beta}_1 &= \beta_1 + \dfrac{\sum_t \left( \text{Births}_{t-1} - \overline{\text{Births} } \right)u_t}{\sum_t \left( \text{Births}_{t-1} - \overline{\text{Births} } \right)^2} \end{align}` $$ --- $$ `\begin{align} \mathop{\text{plim}} \hat{\beta}_1 &= \mathop{\text{plim}} \left( \beta_1 + \dfrac{\sum_t \left( \text{Births}_{t-1} - \overline{\text{Births} } \right)u_t}{\sum_t \left( \text{Births}_{t-1} - \overline{\text{Births} } \right)^2} \right) \\[0.3em] &= \beta_1 + \dfrac{\mathop{\text{plim}} \left[\sum_t \left( \text{Births}_{t-1} - \overline{\text{Births} } \right)u_t/T\right]}{\mathop{\text{plim}} \left[\sum_t \left( \text{Births}_{t-1} - \overline{\text{Births} } \right)^2/T\right]} \\[0.3em] &= \beta_1 + \dfrac{\color{#e64173}{\mathop{\text{Cov}} \left( \text{Births}_{t-1},\, u_t \right)}}{\mathop{\text{Var}} \left( \text{Births}_{t} \right)} \end{align}` $$ -- `\(\hspace{8em}=\beta_1\hspace{1em}\)` **if** `\(\color{#e64173}{\mathop{\text{Cov}} \left( \text{Births}_{t-1},\, u_t \right)=0}\)` -- .hi[Contemporaneous exogeneity] gives us `\(\color{#e64173}{\mathop{\text{Cov}} \left( \text{Births}_{t-1},\, u_t \right)=0}\)`. --- Thus, if we assume .hi[contemporaneous exogeneity], **OLS is consistent** for the coefficients, *even for models with lagged dependent variables*. --- layout: false class: inverse, middle # The end. --- layout: false # Table of contents .pull-left[ ### Admin 1. [Schedule](#schedule) 1. [Example questions](#examples) 1. [Review: Asymptotics](#review) ] .pull-right[ ### Time series 1. [Introduction](#ts-intro) 1. [Assumptions](#assumptions) 1. [Static *vs.* dynamic models](#static_dynamic) 1. ["ADL" models](#adl) - [Underlying complexity](#adl_complexity) - [Partial-adjustment models](#adl_partial) 1. [Unbiasedness of OLS](#ols_unbiased) 1. [Consistency of OLS](#ols_consistency) 1. *Extra:* [ADL in equilibrium](#adl_eq) ] --- layout: true # Autoregressive distributed-lag models ## Equilibrium effects --- name: adl_eq ADL models also offer interesting insights for long-run/equilibrium effects. $$ `\begin{aligned} \text{Births}_t = \beta_0 + \color{#e64173}{\beta_1} \text{Income}_t + \beta_2 \text{Births}_{t-1} + u_t \end{aligned}` $$ In this ADL(1, 0) model, `\(\beta_1\)` gives the .hi[short-run effect] of income on the number of births. -- _I.e._, how income in time `\(t\)` affects births in time `\(t\)`. --- Starting with $$ `\begin{aligned} \text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Births}_{t-1} + u_t \end{aligned}` $$ -- we move into equilibrium, _i.e._, `\(\text{Births}_t=\text{Births}^\star\)`, _i.e._, -- $$ `\begin{aligned} \text{Births}^\star &= \beta_0 + \beta_1 \text{Income}^\star + \beta_2 \text{Births}^\star \end{aligned}` $$ -- Now rearrange... $$ `\begin{align} \text{Births}^\star - \beta_2 \text{Births}^\star &= \beta_0 + \beta_1 \text{Income}^\star \\ \left(1 - \beta_2\right) \text{Births}^\star &= \beta_0 + \beta_1 \text{Income}^\star \\ \text{Births}^\star &= \dfrac{\beta_0}{\left(1 - \beta_2\right)} + \dfrac{\beta_1}{\left(1 - \beta_2\right)} \text{Income}^\star \end{align}` $$ --- .hi[Short-run] .pink[effect of income on births:] $$ `\begin{aligned} \text{Births}_t = \beta_0 + \color{#e64173}{\beta_1} \text{Income}_t + \beta_2 \text{Births}_{t-1} + u_t \end{aligned}` $$ .hi-purple[Long-run] .purple[effect of income on births:] $$ `\begin{align} \text{Births}^\star = \dfrac{\beta_0}{\left(1 - \beta_2\right)} + \color{#6A5ACD}{\dfrac{\beta_1}{\left(1 - \beta_2\right)}} \text{Income}^\star \end{align}` $$ --- Another way to see this result: We already showed $$ `\begin{align} \text{Births}_t =& \beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Births}_{t-1} \end{align}` $$ gives us $$ `\begin{align} \text{Births}_t = &\beta_0 \left( 1 + \beta_2 + \beta_2^2 + \beta_2^3 + \cdots \right) + \\ &\beta_1 \left( \text{Income}_t + \beta_2 \text{Income}_{t-1} + \beta_2^2 \text{Income}_{t-2} + \cdots \right) +\\ & u_t + \beta_2 u_{t-1} + \beta_2^2 u_{t-2} + \cdots \end{align}` $$ -- In equilibrium: `\(\text{Income}_t=\text{Income}_{t-k}=\text{Income}^\star\)` for all `\(k\)`. --- Substituting `\(\text{Income}_{t}=\text{Income}^\star\)` for all `\(k\)`<br>(and assuming no disturbances in equilibrium): -- $$ `\begin{align} \text{Births}_t = &\beta_0 \left( 1 + \beta_2 + \beta_2^2 + \beta_2^3 + \cdots \right) + \\ &\beta_1 \left( \text{Income}^\star + \beta_2 \text{Income}^\star + \beta_2^2 \text{Income}^\star + \cdots \right) +\\ \end{align}` $$ --- count: false Substituting `\(\text{Income}_{t}=\text{Income}^\star\)` for all `\(k\)`<br>(and assuming no disturbances in equilibrium): $$ `\begin{align} \text{Births}_t = &\beta_0 \left( 1 + \beta_2 + \beta_2^2 + \beta_2^3 + \cdots \right) + \\ &\beta_1 \left( \text{Income}^\star + \beta_2 \text{Income}^\star + \beta_2^2 \text{Income}^\star + \cdots \right) +\\ = &\beta_0 \left( \dfrac{1}{\beta_2} \right) + \\ &\beta_1 \left( \dfrac{1}{\beta_2} \right) \text{Income}^\star \end{align}` $$ So long as `\(-1<\beta_2<1\)`.<sup>.pink[†]</sup> .footnote[ .pink[†] This simplification comes from `\(\sum_{k=0}^\infty p^k = \dfrac{1}{p}\)` for `\(-1<k<1\)`. ] --- exclude: true