Due Upload your answer on Canvas before midnight on Tuesday, 03 March 2026.
Important You must submit your answers as an HTML or PDF file, built from an RMarkdown (.Rmd) or Quarto (.qmd) file. Do not submit the .Rmd or .qmd file. You will not receive credit for it.
If we ask you to create a figure or run a regression, then the figure or regression results should be in the document that you submit (not just the code—we want the actual figure or regression output with coefficients, standard errors, etc.).
Integrity If you are suspected of cheating, then you will receive a zero—for the assignment and possibly for the course.
README! The data for this problem set come from the Federal Reserve Bank of St. Louis FRED database. The dataset contains quarterly U.S. macroeconomic variables.
Strengthen your understanding of time-series properties (trend, persistence, stationarity).
Practice dynamic regression modeling.
Learn to implement and interpret Newey–West standard errors using fixest.
Develop your judgment about which regression specifications are reasonable in macroeconomic time-series settings.
2 Setup
[01] Load your R packages and the dataset (data-ps2.csv). You will likely want tidyverse, here, and fixest.
Answer I’m using read_csv() from the tidyverse package to load the data.
# Load packages using 'pacman'
library(pacman)
p_load(tidyverse, patchwork, scales, fixest, here)

# Load the data
ps_df = here('problem-sets', '002', 'data-ps2.csv') %>% read_csv()
Rows: 224 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): country
dbl (6): time, q, unemp_rate, cpi, gdppc, rec_prob
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
[02] What time period do the data cover? How many quarters are in the sample?
Note that (1) each row represents a quarter and (2) the data are already sorted by date.
Answer The data cover Q1 of 1970 through Q4 of 2025. There are 224 quarters in the sample.
# First observation
ps_df |> head(1)
# A tibble: 1 × 8
time date q country unemp_rate cpi gdppc rec_prob
<dbl> <date> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 1970-01-01 1 US 4.17 38.1 26.0 50.2
# Last observation
ps_df |> tail(1)
# A tibble: 1 × 8
time date q country unemp_rate cpi gdppc rec_prob
<dbl> <date> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 224 2025-10-01 4 US 4.45 326. 70.4 0.76
# Number of quarters
nrow(ps_df)
[1] 224
3 Time-series plots
[03] Create time-series plots for:
Unemployment rate (unemp_rate)
Real GDP per capita (gdppc)
CPI (cpi)
Probability of recession (rec_prob)
Use properly labeled axes and titles.
Answer I’m using ggplot2 to create the time-series plots.
# Unemployment
ue = ggplot(ps_df, aes(x = date, y = unemp_rate)) +
  geom_line(linewidth = .1) +
  geom_point(size = 1) +
  labs(
    title = "Unemployment rate",
    x = "Date",
    y = "Percent"
  ) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  theme_minimal(base_family = 'Fira Sans Condensed')

# GDP per capita
gdp = ggplot(ps_df, aes(x = date, y = gdppc)) +
  geom_line(linewidth = .1) +
  geom_point(size = 1) +
  labs(
    title = "Real GDP per capita",
    x = "Date",
    y = "Chained dollars (thousands)"
  ) +
  scale_y_continuous(labels = scales::dollar_format(scale = 1)) +
  theme_minimal(base_family = 'Fira Sans Condensed')

# CPI
cpi = ggplot(ps_df, aes(x = date, y = cpi)) +
  geom_line(linewidth = .1) +
  geom_point(size = 1) +
  labs(
    title = "Consumer Price Index",
    x = "Date",
    y = "1982–84 = 100"
  ) +
  theme_minimal(base_family = 'Fira Sans Condensed')

# Recession probability
rec = ggplot(ps_df, aes(x = date, y = rec_prob)) +
  geom_line(linewidth = .1) +
  geom_point(size = 1) +
  labs(
    title = "Probability of recession",
    x = "Date",
    y = "Percent"
  ) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  theme_minimal(base_family = 'Fira Sans Condensed')

# Combine the plots
ue / gdp / cpi / rec
[04] For each variable, describe whether it appears to:
Trend upward or downward
Exhibit strong persistence (i.e., is strongly autocorrelated)
Exhibit any other notable features
Be specific. Do not just say “yes” or “no.”
Answer Here are my descriptions of the time-series properties of each variable:
Unemployment rate: The unemployment rate appears to be quite persistent, with periods of elevated unemployment during recessions. It does not exhibit a clear upward or downward trend over the entire sample.
Real GDP per capita: Real GDP per capita exhibits a strong upward trend over the sample period, reflecting economic growth. It also appears to be persistent, as growth tends to build on previous growth.
Consumer Price Index: The CPI also shows a clear upward trend, reflecting inflation over time. It appears to be persistent as well, with periods of more rapid inflation during certain decades (e.g., post-2020).
Probability of recession: The probability of recession does not exhibit a clear trend. It spikes during recessions, indicating that it is responsive to economic conditions, but it does not show strong persistence outside of those periods.
[05] Based on the plots alone, which variables appear likely to be non-stationary in levels (in their means)? Explain your reasoning.
Answer Based on the plots, real GDP per capita and the CPI appear likely to be non-stationary in levels, as they both exhibit strong upward trends over time. The unemployment rate and probability of recession do not show clear trends, so they may be stationary in levels.
4 Transformations
Macroeconomic time series are often transformed before analysis.
[06] Create the following new variables:
GDP per capita growth, g_gdp: \(100\times\Delta\log(\texttt{gdppc})_t\) This variable is often called the growth rate of GDP per capita, and it is approximately equal to the percent change in GDP per capita.
Inflation, infl: \(100\times\dfrac{\Delta\texttt{cpi}_t}{\texttt{cpi}_{t-1}}\) This variable is the percent change in the CPI, which is a common measure of inflation (units are percent).
Change in unemployment, d_unemp: \(\Delta\texttt{unemp}_t\) This variable is the change in the unemployment rate, which is often used to study short-run dynamics in the labor market. Its units are percentage points.
Important: Remember that \(\Delta x_t\) means the difference between \(x_t\) and its lag. So, \(\Delta x_t = x_t - x_{t-1}\) and \(\Delta\log(x_t) = \log(x_t) - \log(x_{t-1})\). It may also be helpful to remember that differences in logs equal the log of the ratio: \(\Delta\log(x_t) = \log(x_t) - \log(x_{t-1}) = \log\left(\frac{x_t}{x_{t-1}}\right)\).
Hint: You can use dplyr::mutate() to create new variables. You can use lag() to access lagged values. It may be helpful to create intermediate (lagged) variables before moving on to the final transformations.
Answer I’m using mutate() and lag() to create the new variables.
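A sketch of those transformations with mutate() and lag(), shown on a tiny toy data frame (toy_df, an illustrative stand-in for ps_df) so the snippet runs on its own:

```r
library(dplyr)

# toy_df is a hypothetical stand-in for ps_df; the formulas match the
# definitions in [06]
toy_df = tibble::tibble(
  gdppc      = c(26.0, 26.5, 26.3),
  cpi        = c(38.1, 39.0, 39.5),
  unemp_rate = c(4.17, 4.30, 4.25)
)

toy_df = toy_df %>%
  mutate(
    g_gdp   = 100 * (log(gdppc) - log(lag(gdppc))),  # GDP per capita growth
    infl    = 100 * (cpi - lag(cpi)) / lag(cpi),     # inflation (percent)
    d_unemp = unemp_rate - lag(unemp_rate)           # change in unemployment (pp)
  )
```

Note that the first observation of each new variable is NA (there is no lag for the first quarter), which is why later regressions drop one observation.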
[07] Plot these three transformed variables. How do their properties differ from the original series?
Answer Here are the plots of the transformed variables:
# GDP growth
gdp_growth = ggplot(ps_df, aes(x = date, y = g_gdp)) +
  geom_line(linewidth = .1) +
  geom_point(size = 1) +
  labs(
    title = "GDP per capita growth",
    x = "Date",
    y = "Percent"
  ) +
  theme_minimal(base_family = 'Fira Sans Condensed')

# Inflation
inflation = ggplot(ps_df, aes(x = date, y = infl)) +
  geom_line(linewidth = .1) +
  geom_point(size = 1) +
  labs(
    title = "Inflation",
    x = "Date",
    y = "Percent"
  ) +
  theme_minimal(base_family = 'Fira Sans Condensed')

# Change in unemployment
d_unemp = ggplot(ps_df, aes(x = date, y = d_unemp)) +
  geom_line(linewidth = .1) +
  geom_point(size = 1) +
  labs(
    title = "Change in unemployment",
    x = "Date",
    y = "Percentage points"
  ) +
  theme_minimal(base_family = 'Fira Sans Condensed')

# Combine the plots
gdp_growth / inflation / d_unemp
The transformed variables differ from the original series in that they do not exhibit strong trends in levels. The three time series fluctuate around their means, which suggests that the transformed variables may be more (mean) stationary than the original levels.
There is some evidence that the variance of the transformed variables may not be constant over time, especially for inflation.
[08] For studying short-run macroeconomic relationships: would you recommend using levels (the untransformed variables) or changes/growth (the transformed variables)? Explain your answer carefully.
Answer For studying short-run macroeconomic relationships, I would recommend using the transformed variables (changes/growth) rather than the levels. The levels of macroeconomic variables often exhibit strong trends—suggesting non-stationarity—which can lead to spurious regression results if not properly addressed. The transformed variables, which are more likely to be stationary, allow us to focus on the short-run dynamics and relationships between the variables without being confounded by long-term trends.
5 A static model
We begin with a simple static/contemporaneous model (using our transformed variables):
[10] Interpret the intercept and the coefficients on GDP growth and inflation. Are the signs economically reasonable?
Answer There are a few ways to interpret the coefficients on GDP growth and inflation.
The intercept, \(\beta_0\), represents the expected change in the unemployment rate when both GDP growth and inflation are zero. In this context, it may not have a meaningful economic interpretation, since it is unlikely that both real GDP per capita growth and inflation would be exactly zero. Still, the estimate says we would expect the unemployment rate to increase by 0.302 percentage points in a quarter with zero GDP growth and zero inflation.
In the most literal interpretation, the coefficient on GDP growth suggests that a one-percentage-point increase in the growth rate of real GDP per capita reduces the change in the unemployment rate by 0.568 percentage points. You could also say that a one-percentage-point increase in GDP growth is associated with a 0.568 percentage point decrease in the unemployment rate. The coefficient is significantly different from zero at the five-percent level. Both interpretations suggest that higher GDP growth is associated with a decrease in the unemployment rate, which is economically reasonable.
For the coefficient on inflation, the interpretation is similar: a one-percentage point increase in inflation is associated with a 0.048 percentage point decrease in the unemployment rate. This result is less economically reasonable (but also not significantly different from zero).
[11] Why might OLS standard errors be unreliable in this time-series setting?
Answer If factors that affect unemployment changes persist over time, then the error term may be autocorrelated. If the errors are autocorrelated, then the OLS standard errors will be biased and inconsistent, leading to incorrect inference.
6 Newey–West standard errors
Time-series disturbances are often autocorrelated.
[12] Re-estimate the regression in [09] using fixest and Newey–West standard errors. Report the results.
Hint: You will want to set vcov = 'NW' and panel.id = ~ country + time in feols(). The panel.id argument is required to use Newey-West standard errors (even though we do not really have a panel dataset). The panel ID tells fixest which variable defines a unit (here: country) and time (here: time).
Answer I’m using fixest::feols() with Newey-West standard errors to re-estimate the regression.
model_static_nw = feols(
  d_unemp ~ g_gdp + infl,
  data = ps_df,
  vcov = 'NW',
  panel.id = ~ country + time
)
NOTE: 1 observation removed because of NA values (LHS: 1, RHS: 1).
[13] Do the standard errors meaningfully change? What does this suggest?
Answer The Newey-West standard errors are substantially larger than the OLS standard errors; for GPD growth, the standard error is approximately five-times larger (though the coefficient is still statistically significant at the five-percent level).
This results suggests that that autocorrelation’s impact on (biasing) the standard errors is quite large in this setting.
[14] Formally test for autocorrelation using the residuals from the OLS regression. What do you find? (I’m fine with you testing whichever order of autocorrelation you think is reasonable.)
Note: If you are going to add the residuals to the dataset, you will need to do one of the following to make sure you get the right number of observations:
use the na.rm = FALSE argument in residuals() function, e.g., residuals(model_static, na.rm = FALSE), which will return NA for the first observation (since the first observation of the dependent variable is a difference, it cannot be calculated for the first quarter);
directly add the NA to the residuals vector on your own, e.g., c(NA, residuals(model_static)).
Another note/hint: To regress a variable on its own lag, you can do one of the following:
Create a new variable that is the lag of the residuals, e.g., lag_e = lag(e), and then regress e on lag_e.
Use the l() function in fixest to create the lag directly in the regression, e.g., feols(e ~ l(e, 1), data = ps_df, panel.id = ~ country + time), which will regress the residuals on their own first lag. Notice that you need to define the panel.id argument in this case.
Use the lag function (from tidyverse) within lm, e.g., lm(e ~ lag(e, 1), data = ps_df).
Answer Let’s test for first-order autocorrelation and then third-order autocorrelation.
# Get the residuals from the OLS regression
ps_df$e = residuals(model_static, na.rm = FALSE)

# Test for first-order autocorrelation
test_ar1 = feols(e ~ l(e, 1), data = ps_df, vcov = 'NW', panel.id = ~ country + time)
NOTE: 2 observations removed because of NA values (LHS: 1, RHS: 2).
# Test for third-order autocorrelation
test_ar3 = feols(e ~ l(e, 1) + l(e, 2) + l(e, 3), data = ps_df, vcov = 'NW', panel.id = ~ country + time)
NOTE: 4 observations removed because of NA values (LHS: 1, RHS: 4).
# p-values for the joint (LM) tests
pchisq(test_ar1$sq.cor * nrow(ps_df), df = 1, lower.tail = FALSE)
pchisq(test_ar3$sq.cor * nrow(ps_df), df = 3, lower.tail = FALSE)
Neither of these tests finds statistically significant evidence of autocorrelation in the residuals at the five-percent level—though the third-order test is marginally significant (significant at the ten-percent level).
[16] Are lagged terms statistically significant? What is the total effect of GDP growth? Is it meaningfully different from the contemporaneous effect alone?
Answer The lagged term for GDP growth is statistically significant at the five-percent level, while the lagged term for inflation is not statistically significant at the five-percent level.
The total effect of GDP growth is the sum of the contemporaneous and lagged effects: \(\beta_1 + \beta_2\). The estimated total effect is -0.663, which is meaningfully larger than the contemporaneous effect alone.
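To make the total-effect computation concrete, here is a small simulated example in base R (the problem set itself uses feols(); the variable names and true coefficients below are made up): the total effect of a regressor in a distributed lag model is the sum of its contemporaneous and lagged coefficients.

```r
# Simulated distributed-lag data: y_t = 0.3 - 0.5 x_t - 0.2 x_{t-1} + u_t
set.seed(101)
n = 500
x = rnorm(n)
x_lag = c(NA, x[-n])
y = 0.3 - 0.5 * x - 0.2 * x_lag + rnorm(n)

# Estimate the distributed lag model and sum the two coefficients on x
fit = lm(y ~ x + x_lag)
total_effect = sum(coef(fit)[c('x', 'x_lag')])
total_effect  # close to the true total effect of -0.7
```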
[17] Compare the static model and distributed lag model. Which seems more appropriate? Why?
Answer The distributed lag model seems more appropriate, as it allows for the possibility that changes in GDP growth and inflation may affect unemployment changes with a lag. The static model assumes that the effects of GDP growth and inflation on unemployment changes occur instantaneously, which may not be realistic in a macroeconomic context. The statistical significance of the lagged GDP growth term in the distributed lag model also suggests that including lags provides additional explanatory power.
8 ADL model
Now we’ll consider the potential for unemployment changes to affect themselves over time (persistence). In other words, we will include a lagged dependent variable in the regression—an ADL(1,1) model:
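The ADL(1,1) model described above can be estimated with feols() along these lines (a sketch; the object name model_adl is my assumption, not from the original):

```r
# ADL(1,1): one lag of the dependent variable plus contemporaneous and
# lagged regressors; 'model_adl' is a hypothetical name
model_adl = feols(
  d_unemp ~ l(d_unemp, 1) + g_gdp + l(g_gdp, 1) + infl + l(infl, 1),
  data = ps_df,
  vcov = 'NW',
  panel.id = ~ country + time
)
```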
[19] Is the lagged dependent variable statistically significant? What does this result suggest about the nature of quarterly changes in the unemployment rate?
Answer The lagged dependent variable is not statistically significant at the five-percent level (or any conventional level). This result suggests that conditional on the contemporaneous and lagged values of GDP growth and inflation, changes in the unemployment rate do not exhibit strong persistence from one quarter to the next. It also suggests that we don’t need an ADL model.
[20] How do the GDP and inflation coefficients change when including the lagged dependent variable? Why might that occur?
Answer The coefficients on GDP growth and inflation (both contemporaneous and lags) change very little when including the lagged dependent variable. Using our omitted-variable bias intuition/formula, this lack of change suggests that (1) (changes in) lagged unemployment does not strongly correlate with the other regressors, and/or (2) changes in lagged unemployment do not have a strong effect on current changes in unemployment.
[21] How does OLS perform in this setting—i.e., for ADL(1,1) models? Make sure you explain how the presence of autocorrelation in the disturbance affects OLS here.
Answer Because we’re considering ADL(1,1) models, we know there is a lagged dependent variable. In cases with a lagged dependent variable, we cannot guarantee OLS is unbiased—regardless of whether the disturbance is autocorrelated or not. If the disturbance is autocorrelated, then OLS will be both biased and inconsistent. If the disturbance is not autocorrelated, then OLS is biased but consistent (assuming the other OLS assumptions hold).
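A quick base-R simulation (not part of the assignment; all numbers are made up) illustrates the inconsistency: when y is regressed on its own lag and the disturbance follows an AR(1), the OLS slope converges to something other than the true value.

```r
# True model: y_t = 0.5 y_{t-1} + u_t, where u_t follows an AR(1) with rho = 0.5.
# Because y_{t-1} is correlated with u_t, OLS overstates the persistence.
set.seed(101)
n = 5000
u = as.numeric(arima.sim(list(ar = 0.5), n))  # AR(1) disturbance
y = numeric(n)
for (t in 2:n) y[t] = 0.5 * y[t - 1] + u[t]

b_ols = coef(lm(y[-1] ~ y[-n]))[2]
b_ols  # noticeably above the true value of 0.5, even in a large sample
```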
9 Back to levels?
[22] Suppose we had not transformed the data and instead estimated the following regression:
\[
\text{(Unemployment rate)}_t = \beta_0 + \beta_1 \text{(Real GDP per capita)}_t + \beta_2 \text{(CPI)}_t + u_t
\]
Would this regression be appropriate? Why or why not? Be explicit about non-stationarity and spurious regression.
Answer This regression would likely be inappropriate due to the non-stationarity of the levels of real GDP per capita and the CPI. Non-stationary variables can lead to spurious regression results, where the estimated relationships between the variables may appear statistically significant even when there is no true underlying relationship.
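The spurious-regression point is easy to demonstrate with a short base-R simulation (not from the problem set): regress one random walk on another, independent random walk, and the fit will often look impressively "significant."

```r
# Two independent random walks of the same length as our sample
set.seed(101)
n = 224
rw_x = cumsum(rnorm(n))
rw_y = cumsum(rnorm(n))

# Despite no true relationship, the levels regression frequently produces
# a sizable R-squared and large t-statistics
fit_spurious = lm(rw_y ~ rw_x)
summary(fit_spurious)$r.squared
```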
[23] Run the regression above (using levels). How do the results compare to the regression using transformed variables? Do the results make economic sense?
Answer
model_levels = feols(
  unemp_rate ~ gdppc + cpi,
  data = ps_df,
  vcov = 'NW',
  panel.id = ~ country + time
)
etable(model_static_nw, model_levels)
The regression using levels shows a strong negative relationship between real GDP per capita and the unemployment rate, which is consistent with economic theory. The relationship between the CPI and the unemployment rate is positive, implying that higher prices are associated with higher unemployment. Given the strong trends in both regressors, however, these estimates should be viewed with suspicion: they may partly reflect spurious correlation rather than genuine short-run relationships.
[24] One approach to deal with trending variables (mean non-stationary variables) is to directly control for time (similar to trying to bring in the omitted variable). What happens to the regression above if we add a time trend? I.e., estimate \[
\text{(Unemployment rate)}_t = \beta_0 + \beta_1 \text{(Real GDP per capita)}_t + \beta_2 \text{(CPI)}_t + \beta_3 t + u_t
\]
Hint: We already have a time variable in the dataset (time), so you can just include that variable in the regression.
Answer
model_levels_trend = feols(
  unemp_rate ~ gdppc + cpi + time,
  data = ps_df,
  vcov = 'NW',
  panel.id = ~ country + time
)
etable(model_static_nw, model_levels, model_levels_trend)
Adding a time trend to the regression changes the estimated coefficients on real GDP per capita and CPI. The coefficient on real GDP per capita increases substantially. The coefficient on CPI becomes smaller and is no longer significantly different from zero. This suggests that the time trend is capturing some of the variation in the unemployment rate that was previously attributed to real GDP per capita and CPI, which is consistent with the idea that the original regression was picking up spurious relationships due to non-stationarity.
10 Reflection
[25] If your goal were forecasting unemployment changes, which model would you choose and why? On the other hand, if your goal were causal inference about the effect of GDP growth on unemployment, which model would you choose and why?
Answer This question is pretty open-ended.
If my goal were forecasting unemployment changes, then I would focus on a model that includes lags of the regressors, as the contemporaneous values of GDP growth and inflation may not be available at the time that we need a forecast/prediction. I might still use a lag of the dependent variable if it increases model performance sufficiently. While OLS might be biased/inconsistent for estimating the coefficients in an ADL model, prediction performance may still be good (prediction doesn’t care about biased coefficient estimates). I would probably add some interactions and generally make the model specification a bit more flexible.
If my goal were causal inference about the effect of GDP growth on unemployment, I would not use a lagged dependent variable, as we know that can generate bias and inconsistency for OLS. I would probably still use lags of the other regressors because it appears that they are important/significant. In fact, if we omit the lagged versions of the variables, we can induce omitted variable bias—something we’re generally trying to avoid in a causal inference setting.
11 Bonus
This is a bonus question. You do not have to answer it.
[26] Split the sample into two parts: (1) before the year 2000, and (2) from 2000 onward. Estimate the dynamic model from [15] separately for each subsample.
How do the results differ across the two subsamples? Does the relationship between GDP growth, inflation, and unemployment changes appear to have changed over time? Explain your findings.
Does the model fit better in one subsample than the other?
How could you incorporate such changes over time into a single regression model without splitting the sample (and test for such changes)?
Answer I’m going to create an indicator variable for post-2000 years (post_2000). Then I will use that indicator and the fsplit argument in feols() to estimate the model on the full data and on the two subsets (pre- and post-2000).
# Create an indicator variable for pre- or post-2000
ps_df = ps_df %>% mutate(post_2000 = year(date) >= 2000)

# Estimate the dynamic model on the full sample and the two subsamples
model_dynamic_split = feols(
  d_unemp ~ g_gdp + l(g_gdp, 1) + infl + l(infl, 1),
  data = ps_df,
  fsplit = ~ post_2000,
  vcov = 'NW',
  panel.id = ~ country + time
)
NOTE: 2 observations removed because of NA values (LHS: 1, RHS: 2).
The results differ across the two subsamples—and each differs from the full sample.
The contemporaneous effect of GDP growth on unemployment changes is much larger (in magnitude) in the post-2000 subsample than in the pre-2000 subsample. The lagged effect of GDP growth is slightly larger in the post-2000 subsample.
Only in the pre-2000 subsample is the contemporaneous effect of inflation on unemployment changes statistically significant. That said, the coefficient estimate is quite similar to those in the other samples.
Finally, the post-2000 model explains much more of the variation in unemployment changes than the pre-2000 model (R-squared of .82 vs. .60).
Combining all years into a single model hides some of the differences across time.
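On the last question: instead of splitting the sample, you can interact a post-2000 indicator with the regressors and then test the interaction terms (individually or jointly). A runnable base-R illustration with simulated data (the variable names and coefficients are made up; with the real data you would interact post_2000 with the terms of the model from [15]):

```r
# Simulate a relationship whose slope changes in the later period
set.seed(101)
n = 400
post = rep(c(0, 1), each = n / 2)  # indicator for the later subsample
x = rnorm(n)
y = 1 - 0.4 * x - 0.4 * post * x + rnorm(n)

# The x:post coefficient estimates the change in the slope across periods;
# its t-test (or a joint F-test when there are several interactions)
# tests whether the relationship changed over time
fit_int = lm(y ~ x * post)
coef(summary(fit_int))['x:post', ]
```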