Problem Set 2: Time Series

EC 421: Introduction to Econometrics

Author

Edward Rubin

1 Instructions

Due Upload your answers to Canvas before midnight on Tuesday, 03 March 2026.

Important You must submit your answers as an HTML or PDF file, built from an RMarkdown (.Rmd) or Quarto (.qmd) file. Do not submit the .Rmd or .qmd file. You will not receive credit for it.

If we ask you to create a figure or run a regression, then the figure or regression results should be in the document that you submit (not just the code—we want the actual figure or regression output with coefficients, standard errors, etc.).

Integrity If you are suspected of cheating, then you will receive a zero—for the assignment and possibly for the course.

README! The data for this problem set come from the Federal Reserve Bank of St. Louis FRED database. The dataset contains quarterly U.S. macroeconomic variables.

The table below describes each variable.

Variable names and descriptions
Variable name	Description
`time`	Time in the dataset (starting 1970 Q1)
`date`	Quarter of observation (e.g., ‘1970-01-01’)
`q`	Quarter of the year (1, 2, 3, or 4)
`country`	Country (always ‘United States’)
`unemp_rate`	Unemployment rate (percent)
`cpi`	Consumer Price Index (1982–84 = 100)
`gdppc`	Real GDP per capita (chained dollars)
`rec_prob`	Probability of recession (percent, 0–100)

Objective This problem set has four purposes:

Strengthen your understanding of time-series properties (trend, persistence, stationarity).
Practice dynamic regression modeling.
Learn to implement and interpret Newey–West standard errors using fixest.
Develop your judgment about which regression specifications are reasonable in macroeconomic time-series settings.

2 Setup

[01] Load your R packages and the dataset (data-ps2.csv). You will likely want tidyverse, here, and fixest.

[02] What time period do the data cover? How many quarters are in the sample?

Note that (1) each row represents a quarter and (2) the data are already sorted by date.

3 Time-series plots

[03] Create time-series plots for:

Unemployment rate (unemp_rate)
Real GDP per capita (gdppc)
CPI (cpi)
Probability of recession (rec_prob)

Use properly labeled axes and titles.
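
As a rough illustration of what "properly labeled" can look like, here is a minimal sketch using ggplot2 on a small made-up dataset. The names toy_df, toy_plot, and y, and all of the values, are hypothetical and are not part of the problem-set data.

```r
library(ggplot2)

# Toy quarterly data (made-up values, NOT the problem-set data)
toy_df <- data.frame(
  date = seq(as.Date("2000-01-01"), by = "quarter", length.out = 8),
  y    = c(4.0, 4.1, 3.9, 4.3, 4.6, 4.4, 4.2, 4.1)
)

# Line plot over time with labeled axes and a title
toy_plot <- ggplot(toy_df, aes(x = date, y = y)) +
  geom_line() +
  labs(
    x     = "Quarter",
    y     = "Toy series (percent)",
    title = "A toy quarterly time series"
  )

# Display the figure
toy_plot
```

The same pattern should carry over to each of the four variables once you swap in the real data frame, variable, and labels.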

[04] For each variable, describe whether it appears to:

Trend upward or downward
Exhibit strong persistence (i.e., is strongly autocorrelated)
Exhibit any other notable features

Be specific. Do not just say “yes” or “no.”

[05] Based on the plots alone, which variables appear likely to be non-stationary in levels (in their means)? Explain your reasoning.

4 Transformations

Macroeconomic time series are often transformed before analysis.

[06] Create the following new variables:

GDP per capita growth, g_gdp: \(100\times\Delta\log(\texttt{gdppc})_t\)
This variable is often called the growth rate of GDP per capita, and it is approximately equal to the percent change in GDP per capita.
Inflation, infl: \(100\times\dfrac{\Delta\texttt{cpi}_t}{\texttt{cpi}_{t-1}}\)
This variable is the percent change in the CPI, which is a common measure of inflation (units are percent).
Change in unemployment, d_unemp: \(\Delta\texttt{unemp}_t\)
This variable is the change in the unemployment rate, which is often used to study short-run dynamics in the labor market. Its units are percentage points.

Important: Remember that \(\Delta x_t\) means the difference between \(x_t\) and its lag. So, \(\Delta x_t = x_t - x_{t-1}\) and \(\Delta\log(x_t) = \log(x_t) - \log(x_{t-1})\). It may also be helpful to remember that differences in logs equal the log of the ratio: \(\Delta\log(x_t) = \log(x_t) - \log(x_{t-1}) = \log\left(\frac{x_t}{x_{t-1}}\right)\).
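
As a quick numerical check of the log approximation, take the made-up values \(x_{t-1} = 100\) and \(x_t = 102\) (chosen only for illustration):

\[ 100\times\Delta\log(x_t) = 100\times\log\left(\frac{102}{100}\right) \approx 1.98 \qquad \text{versus} \qquad 100\times\frac{\Delta x_t}{x_{t-1}} = 100\times\frac{2}{100} = 2 \]

The two measures are close when changes are small, and the gap widens as changes get larger.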

Hint: You can use dplyr::mutate() to create new variables. You can use lag() to access lagged values. It may be helpful to create intermediate (lagged) variables before moving on to the final transformations.
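
The hint above can be sketched on a tiny made-up series. This is only an illustration of the mutate() and lag() mechanics: toy_df, x, x_lag, d_x, pct_x, and g_x are hypothetical names, and the code relies on the rows already being sorted by time.

```r
library(dplyr)

# Toy data (made-up values), already sorted by time
toy_df <- tibble(
  time = 1:4,
  x    = c(100, 102, 101, 105)
)

toy_df <- toy_df |>
  mutate(
    x_lag = lag(x),                        # previous period's value (NA in the first row)
    d_x   = x - x_lag,                     # first difference
    pct_x = 100 * d_x / x_lag,             # percent change
    g_x   = 100 * (log(x) - log(x_lag))    # 100 times the log difference
  )

toy_df
```

Note that the first row of every transformed variable is NA, because there is no earlier observation to difference against.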

[07] Plot these three transformed variables. How do their properties differ from the original series?

[08] For studying short-run macroeconomic relationships: would you recommend using levels (the untransformed variables) or changes/growth (the transformed variables)? Explain your answer carefully.

5 A static model

We begin with a simple static/contemporaneous model (using our transformed variables):

\[ \text{(Change in unemployment)}_t = \beta_0 + \beta_1 \text{(GDPpc growth)}_t + \beta_2 \text{Inflation}_t + u_t \]

[09] Estimate the regression above using OLS. Report your results.

[10] Interpret the intercept and the coefficients on GDP growth and inflation. Are the signs economically reasonable?

[11] Why might OLS standard errors be unreliable in this time-series setting?

6 Newey–West standard errors

Time-series disturbances are often autocorrelated.

[12] Re-estimate the regression in [09] using fixest and Newey–West standard errors. Report the results.

Hint: You will want to set vcov = 'NW' and panel.id = ~ country + time in feols(). The panel.id argument is required to use Newey-West standard errors (even though we do not really have a panel dataset). The panel ID tells fixest which variable defines a unit (here: country) and time (here: time).
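
A minimal sketch of the syntax in this hint, run on simulated data rather than the problem-set data. Everything here is an assumption made for illustration: the names toy_df, unit, x, y, m_iid, and m_nw, the AR(1) disturbance with coefficient 0.6, and the sample size.

```r
library(fixest)

set.seed(421)
n <- 120

# Simulated single-unit "panel": one unit observed for n periods
toy_df <- data.frame(
  unit = "A",
  time = 1:n,
  x    = rnorm(n)
)

# AR(1) disturbance: u_t = 0.6 * u_{t-1} + noise
u <- as.numeric(stats::filter(rnorm(n), filter = 0.6, method = "recursive"))
toy_df$y <- 1 + 0.5 * toy_df$x + u

# Same regression, two variance estimators
m_iid <- feols(y ~ x, data = toy_df, vcov = "iid")
m_nw  <- feols(y ~ x, data = toy_df, vcov = "NW", panel.id = ~ unit + time)

# Side-by-side table: identical coefficients, different standard errors
etable(m_iid, m_nw)
```

The point estimates are the same in both columns because only the variance estimator changes. Here fixest picks the number of Newey–West lags by default.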

[13] Do the standard errors meaningfully change? What does this suggest?

[14] Formally test for autocorrelation using the residuals from the OLS regression. What do you find? (I’m fine with you testing whichever order of autocorrelation you think is reasonable.)

Note: If you are going to add the residuals to the dataset, you will need to do one of the following to make sure you get the right number of observations:

use the na.rm = FALSE argument in the residuals() function, e.g., residuals(model_static, na.rm = FALSE), which will return NA for the first observation (since the first observation of the dependent variable is a difference, it cannot be calculated for the first quarter);
directly add the NA to the residuals vector on your own, e.g., c(NA, residuals(model_static)).
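
The two options above can be compared on a small made-up example in which the first value of the outcome is missing, mimicking a differenced variable. The names toy_df, toy_model, e_short, e_padded, and e_manual, and all of the values, are hypothetical.

```r
library(fixest)

# Toy data (made-up values); the first outcome is NA, as with a differenced series
toy_df <- data.frame(
  y = c(NA, 0.3, -0.2, 0.5, 0.1, -0.4),
  x = c(1.0, 0.8, 1.3, 0.4, 0.9, 1.6)
)

# feols drops the row with the missing outcome (and prints a note saying so)
toy_model <- feols(y ~ x, data = toy_df)

e_short  <- residuals(toy_model)                 # one residual per observation used
e_padded <- residuals(toy_model, na.rm = FALSE)  # one entry per row of toy_df
e_manual <- c(NA, residuals(toy_model))          # pad the missing first row by hand

# The padded vector lines up with the rows of the dataset
toy_df$e <- e_padded
```

Padding by hand only works when the dropped observation is the first row. The na.rm = FALSE route does not depend on where the missing rows are.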

Another note/hint: To regress a variable on its own lag, you can do one of the following:

Create a new variable that is the lag of the residuals, e.g., lag_e = lag(e), and then regress e on lag_e.
Use the l() function in fixest to create the lag directly in the regression, e.g., feols(e ~ l(e, 1), data = ps_df, panel.id = ~ country + time), which will regress the residuals on their own first lag. Notice that you need to define the panel.id argument in this case.
Use the lag function (from tidyverse) within lm, e.g., lm(e ~ lag(e, 1), data = ps_df).
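
To see that the three options above amount to the same regression, here is a sketch on made-up residuals. The names toy_df, m1, m2, m3, the country label, and the values of e are hypothetical. The lag() calls rely on the rows being sorted by time.

```r
library(fixest)
library(dplyr)  # attached last so that lag() refers to dplyr::lag()

# Toy "residuals" (made-up values), sorted by time
toy_df <- tibble(
  country = "Toyland",
  time    = 1:8,
  e       = c(0.5, 0.3, -0.1, -0.4, 0.2, 0.6, 0.1, -0.3)
)

# Option 1: create the lag as its own variable, then regress
toy_df <- toy_df |> mutate(lag_e = lag(e))
m1 <- lm(e ~ lag_e, data = toy_df)

# Option 2: fixest's l() inside the formula (requires panel.id)
m2 <- feols(e ~ l(e, 1), data = toy_df, panel.id = ~ country + time)

# Option 3: dplyr's lag() inside lm()
m3 <- lm(e ~ lag(e, 1), data = toy_df)

# The slope on the lagged residual is the same in all three
c(coef(m1)[2], coef(m2)[2], coef(m3)[2])
```

Option 3 depends on lag() meaning dplyr::lag(). If only base R is loaded, stats::lag() does not shift a plain vector, and the regression will not be what you intended.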

7 A dynamic model

Macroeconomic relationships rarely occur instantaneously.

[15] Estimate a dynamic model that includes one lag of GDP growth and inflation, i.e.,

\[\begin{align*} \text{(Change in unemployment)}_t = \beta_0 &+ \beta_1 \text{(GDPpc growth)}_t + \beta_2 \text{Inflation}_t \\ &+ \beta_3 \text{(GDPpc growth)}_{t-1} + \beta_4 \text{Inflation}_{t-1} + u_t \end{align*}\]

Report your results using Newey-West standard errors. (You do not need to interpret the coefficients.)

[16] Are lagged terms statistically significant? What is the total effect of GDP growth? Is it meaningfully different from the contemporaneous effect alone?

[17] Compare the static model from [09] and the distributed-lag (dynamic) model from [15]. Which seems more appropriate? Why?

8 ADL model

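Now we’ll consider the potential for unemployment changes to affect themselves over time (persistence). In other words, we will include a lagged dependent variable in the regression, which gives an ADL(1,1) model:

\[\begin{align*} \text{(Change in unemployment)}_t = \beta_0 &+ \beta_1 \text{(GDPpc growth)}_t + \beta_2 \text{Inflation}_t \\ &+ \beta_3 \text{(GDPpc growth)}_{t-1} + \beta_4 \text{Inflation}_{t-1} \\ &+ \beta_5 \text{(Change in unemployment)}_{t-1} + u_t \end{align*}\]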

[18] Estimate this model (and use Newey-West standard errors).

[19] Is the lagged dependent variable statistically significant? What does this result suggest about the nature of quarterly changes in the unemployment rate?

[20] How do the GDP and inflation coefficients change when including the lagged dependent variable? Why might that occur?

[21] How does OLS perform in this setting—i.e., for ADL(1,1) models? Make sure you explain how the presence of autocorrelation in the disturbance affects OLS here.

9 Back to levels?

[22] Suppose we had not transformed the data and instead estimated the following regression:

\[ \text{(Unemployment rate)}_t = \beta_0 + \beta_1 \text{(Real GDP per capita)}_t + \beta_2 \text{(CPI)}_t + u_t \]

Would this regression be appropriate? Why or why not? Be explicit about non-stationarity and spurious regression.

[23] Run the regression above (using levels). How do the results compare to the regression using transformed variables? Do the results make economic sense?

[24] One approach to deal with trending variables (mean non-stationary variables) is to directly control for time (similar to trying to bring in the omitted variable). What happens to the regression above if we add a time trend? I.e., estimate \[ \text{(Unemployment rate)}_t = \beta_0 + \beta_1 \text{(Real GDP per capita)}_t + \beta_2 \text{(CPI)}_t + \beta_3 t + u_t \]

Hint: We already have a time variable in the dataset (time), so you can just include that variable in the regression.

10 Reflection

[25] If your goal were forecasting unemployment changes, which model would you choose and why? On the other hand, if your goal were causal inference about the effect of GDP growth on unemployment, which model would you choose and why?

11 Bonus

This is a bonus question. You do not have to answer it.

[26] Split the sample into two parts: (1) before the year 2000, and (2) from 2000 onward. Estimate the dynamic model from [15] separately for each subsample.

How do the results differ across the two subsamples? Does the relationship between GDP growth, inflation, and unemployment changes appear to have changed over time? Explain your findings.
Does the model fit better in one subsample than the other?
How could you incorporate such changes over time into a single regression model without splitting the sample (and test for such changes)?