We first clear the workspace using rm(list = ls())and
then include all packages we need. If a package is missing in your R
distribution (which is quite likely initially), just use
install.packages("package_name") with the respective
package name to install it on your system. If you execute the code in
the file install_packages.R, then all necessary packages
will be installed into your R distribution. If the variable
export_graphs is set to TRUE, then the graphs
will be exported as pdf-files. In addition, we define a set of colors
here to make graphs look more beautiful. Furthermore, we define some
axis breaks for graphs to look nicer.
rm(list = ls())
library(zoo)
library(xts)
library(dynlm)
library(reshape2)
library(base)
library(ggplot2)
library(grid)
library(scales)
library(stringr)
library(tidyverse)
library(fredr)
library(mFilter)
library(formattable)
# should graphs be exported to pdf
export_pdf <- FALSE
# define some colors
mygreen <- "#00BA38"
myblue <- "#619CFF"
myred <- "#F8766D"
# define plotting breaks
xbreaks <- c(seq(from = as.Date("1950-01-01"), to = as.Date("2020-01-01"),by = "10 years"))
We first want to be able to download time series data from the Frederal Reserve Economic Database
(FRED). This can be done using the fredr package that
we already included in the above library-statements. In order to be able
to download data from FRED, you have to go to https://fred.stlouisfed.org/docs/api/api_key.html.
Create an account and request an API key for downloading data. Then
insert your API key into the fredr_set_key code line.
We can download data from the FRED using the fredr
function. This function takes as input the ID of the series we want to
download. You can find this ID on the respective series website, in this
case here.. The series we want to download here is real quarterly
GDP. In addition to the series, we can specify the start and end dates
we want to use. If you don’t specify and of these dates,
fredr is going to use the maximum available time span. The
result of the call of fredr is a data frame
gdp_real with 5 columns, date,
series_id, value, realtime_start,
and realtime_end. For us, the most important columns are
date and value. Note that R
measures time and dates as numeric variables in days. The baseline day
is January 1st 1970, which has a numeric value of 0. Starting from this
point, the day scale continues in both positive and negative direction.
Finally, we take the value of real GDP, convert it into log-GDP, and
store it in the column log_gdp in the data frame.
#fredr_set_key("type-you-key")
# extract real GDP for the US (quarterly)
gdp_real <- fredr(series_id = "GDPC1",
observation_start = as.Date("1948-01-01"),
observation_end = as.Date("2023-12-31")
)
# calculate log gdp
gdp_real$log_gdp = log(gdp_real$value)
The first thing we want to do is plot log real GDP. After doing this, we can clearly see that there is a growth trend as well as cyclical movements around the trend.
# Plot GDP and estimated trends
myplot <- ggplot(data = gdp_real) +
geom_line(aes(x=date, y=log_gdp), color="darkblue", linewidth=1) +
labs(x = "Year t",
y = "Log of real GDP") +
scale_x_date(breaks = xbreaks, date_labels = "%Y", expand=c(0, 0)) +
theme_bw() +
theme(legend.position="bottom")
# print the plot
print(myplot)
Now we want to decompose our time series of log real GDP into a trend
and a cycle component \(y_t = g_t +
c_t\). To this end, we have to convert our data into a time
series object first. This can be done using, for example, the
xts function from the xts package. This
function takes as x variable the respective
log_gdp column from our data frame. In addition, we have to
specify the time dimension of the variable which is stored in
gdp_real$date. By wrapping the as.Date
function around this, we make sure that R recognizes this
as a date.
# create time series for log GDP
gdp_ts <- xts(x = gdp_real$log_gdp, order.by = as.Date(gdp_real$date))
The first trend component we use is a linear trend \(g_t = a + b\cdot t\). To estimate this
trend, we run a simple linear regression of gdp_ts on the
time index, which we can access through index(gdp_ts)). For
this to make sense to R, we convert the date stored in this
index to numeric variables with as.numeric, see the
discussion of the numeric representation of dates above. We can look at
the regression result using the summary statement. Finally,
we want to extract the linear trend from this regression. To do so, we
can call the predict function. This function takes as input
the regression result and gives us a vector of predicted linear trends
\(g_t = a + b\cdot t\). We store this
in the column g_lin of our data frame. The cyclical
component then just is the difference \(c_t =
y_t - g_t\).
res_lin <- lm(gdp_ts ~ as.numeric(index(gdp_ts)))
summary(res_lin)
##
## Call:
## lm(formula = gdp_ts ~ as.numeric(index(gdp_ts)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.20960 -0.06498 0.03582 0.06178 0.10501
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.523e+00 5.167e-03 1649.6 <2e-16 ***
## as.numeric(index(gdp_ts)) 8.400e-05 5.224e-07 160.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07299 on 302 degrees of freedom
## Multiple R-squared: 0.9885, Adjusted R-squared: 0.9884
## F-statistic: 2.586e+04 on 1 and 302 DF, p-value: < 2.2e-16
# calculate trend and cycle
gdp_real$g_lin <- predict(res_lin)
gdp_real$c_lin <- gdp_real$log_gdp - gdp_real$g_lin
Next to the linear trend, we want to calculate the filter
proposed by Hamilton (2018). To this end, we regress the 8 quarter ahead
lead of gdp_ts on the current and the past three quarters
of log real GDP. We can again summarize and predict the trend. Note
that, because of the 8 quarter lead difference and the fact that we need
four quarters for prediction, we lose the first 12 observations of the
time series.
# run regression with h = 8 and p = 4
h <- 8
res_ham <- lm(stats::lag(gdp_ts, k=-8) ~ stats::lag(gdp_ts, k=0) + stats::lag(gdp_ts, k=1) + stats::lag(gdp_ts, k=2) + stats::lag(gdp_ts, k=3))
summary(res_ham)
##
## Call:
## lm(formula = stats::lag(gdp_ts, k = -8) ~ stats::lag(gdp_ts,
## k = 0) + stats::lag(gdp_ts, k = 1) + stats::lag(gdp_ts, k = 2) +
## stats::lag(gdp_ts, k = 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.095566 -0.019086 0.006932 0.020286 0.088381
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.27410 0.02745 9.985 < 2e-16 ***
## stats::lag(gdp_ts, k = 0) 0.91386 0.17292 5.285 2.49e-07 ***
## stats::lag(gdp_ts, k = 1) -0.11190 0.25733 -0.435 0.664
## stats::lag(gdp_ts, k = 2) -0.08198 0.25728 -0.319 0.750
## stats::lag(gdp_ts, k = 3) 0.25683 0.17274 1.487 0.138
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03302 on 288 degrees of freedom
## (11 observations deleted due to missingness)
## Multiple R-squared: 0.9974, Adjusted R-squared: 0.9974
## F-statistic: 2.77e+04 on 4 and 288 DF, p-value: < 2.2e-16
# calculate trend and cycle
gdp_real$g_ham <- NA
gdp_real$c_ham <- NA
gdp_real$g_ham[12:length(gdp_real$log_gdp)] <- predict(res_ham)
gdp_real$c_ham[12:length(gdp_real$log_gdp)] = gdp_real$log_gdp[12:length(gdp_real$log_gdp)] - gdp_real$g_ham[12:length(gdp_real$log_gdp)]
Lastly, we apply the filter proposed by Hodrick and Prescott
(1997). The package mFilter contains a function
hpfilter that does the HP-filtering for us. We specify a
value of \(\lambda = 1600\) by setting
the variables freq and type accordingly. The
HP-filter function directly stores the trend and the cycle in two
columns of the resulting data frame res_hpf.
res_hpf <- hpfilter(gdp_ts, freq=1600, type="lambda")
summary(res_hpf)
##
## Title:
## Hodrick-Prescott Filter
##
## Call:
## hpfilter(x = gdp_ts, freq = 1600, type = "lambda")
##
## Method:
## hpfilter
##
## Filter Type:
## lambda
##
## Series:
## gdp_ts
##
## Descriptive Statistics:
##
## gdp_ts Trend Cycle
## Min. : 7.714 Min. : 7.694 Min. :-0.089297
## 1st Qu.: 8.489 1st Qu.: 8.478 1st Qu.:-0.008645
## Median : 9.072 Median : 9.063 Median : 0.001432
## Mean : 9.010 Mean : 9.010 Mean : 0.000000
## 3rd Qu.: 9.662 3rd Qu.: 9.660 3rd Qu.: 0.010194
## Max. :10.042 Max. :10.035 Max. : 0.037210
##
## In-sample error measures:
## ME MSE MAE MPE MAPE
## 5.307e-19 2.683e-04 1.219e-02 -4.571e-06 1.388e-03
# calculate trend and cycle
gdp_real$g_hpf <- res_hpf$trend
gdp_real$c_hpf <- res_hpf$cycle
Next we plot the time series of log real GDP as well as the three estimated trends. We can see clear differences between the three approaches. Because of the clear trend break in the GDP series, the linear trend performs the worse in filtering the GDP growth trend. But the Hamilton-filter has problems with quickly recognizing the trend break around 2010.
myplot <- ggplot(data = gdp_real) +
geom_line(aes(x=date, y=log_gdp), color="darkblue", linewidth=1) +
geom_line(aes(x=date, y=g_lin, color="linear"), linewidth=0.5) +
geom_line(aes(x=date, y=g_ham, color="hamilton"), linewidth=0.5) +
geom_line(aes(x=date, y=g_hpf, color="hpfilter"), linewidth=0.5) +
labs(x = "Year t",
y = "Log of real GDP") +
scale_color_manual(breaks = c("linear", "hamilton", "hpfilter"),
labels = c("Linear", "Hamilton", "HP-Filter"),
values = c(mygreen, myblue, myred), name='') +
scale_x_date(breaks = xbreaks, date_labels = "%Y", expand=c(0, 0)) +
theme_bw() +
theme(legend.position="bottom")
# print the plot
print(myplot)
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).
Now we want to take a look at the estimated cyclical component of
GDP. To see how the cyclicality of GDP coincides with the official
recessions of the United States, we also extract the official NBER
recession dates in the series USRECQP in FRED. This series
gives us a sequence of \(0\)s and \(1\)s. It takes a value of \(1\) every time the economy is in recession.
By calculating the difference between two successive values of the
series, we are able to determine the start and end dates of the
recessions in the US, which we store in r_start and
r_end. We create a separate data frame from these two
series.
# extract NBER recession indicator (including peak quarter)
recession <- fredr(series_id = "USRECQP",
observation_start = as.Date("1948-01-01"),
observation_end = as.Date("2023-12-31")
)
# add quarter numbers to recession dataset
recession$quarter <- c(1:length(recession$date))
# generate differences and trace out recession dates
recession$diff <- 0
recession$diff[2:length(recession$diff)] = diff(recession$value)
# date at beginning and end of recessions
rstart <- recession$date[recession$diff == 1]
rend <- recession$date[recession$diff == -1]
rec_df <- data.frame(rstart,rend)
Now we can generate a plot of the estimated cyclical components of
GDP together with the official recessions in the US. We indicate the
recession dates as gray areas using the geom_rect function.
This function creates a shaded are between a minimum and maximum value.
We use the recession start and end dates stored in the data frame
rec_df to draw these areas.
# set minimum and maximum
ymin <- -0.25
ymax <- 0.15
dy <- (abs(ymin)+abs(ymax))/2
# plot cycle
myplot <- ggplot(data = gdp_real) +
geom_rect(data=rec_df, aes(xmin=rstart, xmax=rend, ymin=ymin-0.05*dy, ymax=ymax+0.05*dy), fill='gray', alpha=0.5) +
geom_hline(yintercept=0, color="gray", linewidth=0.5) +
geom_line(aes(x=date, y=c_lin, color="linear"), linewidth=0.5) +
geom_line(aes(x=date, y=c_ham, color="hamilton"), linewidth=0.5) +
geom_line(aes(x=date, y=c_hpf, color="hpfilter"), linewidth=0.5) +
scale_x_date(breaks = xbreaks, date_labels = "%Y", expand=c(0, 0)) +
scale_y_continuous(breaks=seq(ymin, ymax, 0.05), expand=c(0, 0)) +
labs(x = "Year t",
y = "Log of real GDP (cyclical component)") +
scale_color_manual(breaks = c("linear", "hamilton", "hpfilter"),
labels = c("Linear", "Hamilton", "HP-Filter"),
values = c(mygreen, myblue, myred), name='') +
theme_bw() +
theme(legend.position="bottom")
# print the plot
print(myplot)
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).
The following graph only depicts the NBER recession dates as well as the cyclical component of GDP estimated from the HP filter.
# set minimum and maximum
ymin <- -0.1
ymax <- 0.05
dy <- (abs(ymin)+abs(ymax))/2
# plot cycle
myplot <- ggplot(data = gdp_real) +
geom_rect(data=rec_df, aes(xmin=rstart, xmax=rend, ymin=ymin-0.05*dy, ymax=ymax+0.05*dy), fill='gray', alpha=0.5) +
geom_hline(yintercept=0, color="gray", linewidth=0.5) +
geom_line(aes(x=date, y=c_hpf), color=myred, linewidth=0.5) +
scale_x_date(breaks = xbreaks, date_labels = "%Y", expand=c(0, 0)) +
scale_y_continuous(breaks=seq(ymin, ymax, 0.025), expand=c(0, 0)) +
labs(x = "Year t",
y = "Log of real GDP (cyclical component)") +
theme_bw()
# print the plot
print(myplot)
As a last piece of information, we summarize the recessions in the US
in a table. We therefore determine the recession starting quarters as
well as the recession duration from the recession data
frame. In addition, we calculate the decline in the cyclical component
of log real GDP in each of the recessions from the gdp_real
data frame. We store all this data in a new data frame
rec_data. The function formattable from the
corresponding library with the same name can generate a nice table from
this.
# calculate NBER dates recession statistics
rec_data <- data.frame("Starting Quarter"=recession$date[recession$diff == 1])
names(rec_data)[1] <- "Starting Quarter"
rec_data <- cbind(rec_data, "Number of Quarters" =
(recession$quarter[recession$diff == -1]-recession$quarter[recession$diff == 1]))
rec_data <- cbind(rec_data, "Log-point decline in cyclical component (in %)" =
percent(gdp_real$c_hpf[recession$diff == -1]-gdp_real$c_hpf[recession$diff == 1]))
# output table
formattable(rec_data,
align= c("c", "c", "c"))
| Starting Quarter | Number of Quarters | Log-point decline in cyclical component (in %) |
|---|---|---|
| 1948-10-01 | 4 | -6.62% |
| 1953-04-01 | 4 | -5.96% |
| 1957-07-01 | 3 | -4.96% |
| 1960-04-01 | 3 | -2.99% |
| 1969-10-01 | 4 | -3.11% |
| 1973-10-01 | 5 | -6.60% |
| 1980-01-01 | 2 | -3.26% |
| 1981-07-01 | 5 | -5.74% |
| 1990-07-01 | 2 | -2.52% |
| 2001-01-01 | 3 | -1.61% |
| 2007-10-01 | 6 | -5.10% |
| 2019-10-01 | 2 | -10.74% |