We first clear the workspace using rm(list = ls()) and
then load all packages we need. If a package is missing in your R
distribution (which is quite likely initially), just use
install.packages("package_name") with the respective
package name to install it on your system. Alternatively, executing the code in
the file install_packages.R installs all necessary packages
into your R distribution. If the variable
export_pdf is set to TRUE, then the graphs
will be exported as PDF files. In addition, we define a set of colors
here to make the graphs look nicer. Finally, we load the Penn World
Table data and extract all data for 2019 from it.
rm(list = ls())
library(reshape2)
library(base)
library(ggplot2)
library(grid)
library(scales)
library(stringr)
library(tidyverse)
library(pwt10)
# should graphs be exported to pdf
export_pdf <- FALSE
# define some colors
mygreen <- "#00BA38"
myblue <- "#619CFF"
myred <- "#F8766D"
# load data and extract 2019 data
data("pwt10.01")
pwt_sub <- subset(pwt10.01, year=="2019")
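The installation step described above can be sketched as follows. This is only a minimal illustration and not the content of install_packages.R; the vector name needed is an assumption, and its entries are taken from the library() calls above (base and grid ship with R and are left out).

```r
# packages used in this tutorial (taken from the library() calls above)
needed <- c("reshape2", "ggplot2", "scales", "stringr", "tidyverse", "pwt10")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # uncomment to install them (requires an internet connection)
  # install.packages(missing_pkgs)
}
```

Checking first and installing only what is missing avoids reinstalling packages every time the script runs.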
We first want to look at cross-country statistics of important
macroeconomic variables. Let us start with output per worker. Before
creating a histogram, we calculate some quantiles of the
distribution of output per worker using the R function
quantile. Note that the data contain a couple of
missing values in output per worker (even in 2019). Hence, we restrict
our sample to the countries for which a value is available.
# calculate output per worker
pwt_sub$output_per_worker <- pwt_sub$rgdpe/pwt_sub$emp
# calculate the 10th, 30th, 50th, 70th and 90th percentiles of output per worker
qo <- quantile(pwt_sub[!is.na(pwt_sub$output_per_worker), ]$output_per_worker,
               probs = seq(.1, .9, by = .2), names = TRUE)
# generate plot
myplot <- ggplot(data = pwt_sub[!is.na(pwt_sub$output_per_worker), ]) +
  geom_histogram(aes(x=output_per_worker), bins = 20, color=mygreen, fill=mygreen, alpha = 0.4, boundary = 0) +
  labs(x = expression(paste("GDP per worker ", Y[2019],"/", L[2019], " (in 2017 USD PPP)")),
       y = "Frequency") +
  coord_cartesian(xlim=c(0, 250000), ylim=c(0, 40)) +
  theme_bw()
# print the plot
print(myplot)
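The setup mentions that graphs are exported as PDF files when the export switch is TRUE, but the export call itself is not shown. One possible way to do this is sketched below with ggsave from ggplot2. The toy plot, the file name and the figure size are assumptions made for illustration only.

```r
library(ggplot2)

export_pdf <- FALSE  # same switch as in the setup code

# toy histogram standing in for the plot created above
toy <- data.frame(x = c(1, 2, 2, 3, 3, 3))
myplot <- ggplot(data = toy) +
  geom_histogram(aes(x = x), bins = 3) +
  theme_bw()

# write a PDF only when requested; the file name is a hypothetical placeholder
if (export_pdf) {
  ggsave("fig_output_per_worker.pdf", plot = myplot, width = 6, height = 4)
}
```

ggsave infers the output device from the file extension, so no separate call to pdf() is needed in this sketch.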
Output per worker differs widely across
countries: values spread over almost the entire range from 0 to 250000 shown in
this graph. But how does this compare to the distribution of capital per
worker?
# calculate capital per worker
pwt_sub$capital_per_worker <- pwt_sub$rnna/pwt_sub$emp
# calculate the 10th, 30th, 50th, 70th and 90th percentiles of capital per worker
qk <- quantile(pwt_sub[!is.na(pwt_sub$capital_per_worker), ]$capital_per_worker,
               probs = seq(.1, .9, by = .2), names=TRUE)
# generate plot
myplot <- ggplot(data = pwt_sub[!is.na(pwt_sub$capital_per_worker), ]) +
  geom_histogram(aes(x=capital_per_worker), bins = 20, color=myblue, fill=myblue, alpha = 0.4, boundary = 0) +
  labs(x = expression(paste("Capital per worker ", K[2019],"/", L[2019], " (in 2017 USD PPP)")),
       y = "Frequency") +
  coord_cartesian(xlim=c(0, 1000000), ylim=c(0, 40)) +
  theme_bw()
# print the plot
print(myplot)
The distribution of capital per worker is wider than the
distribution of output per worker. Eyeballing the two histograms suggests a
factor of about 4: capital per worker spans roughly 0 to
1000000, compared with 0 to 250000 for output per worker. We can be a bit more precise by looking at the quantiles of the
two distributions.
sprintf("Distribution of GDP per worker:")
print(qo)
sprintf("Distribution of capital per worker:")
print(qk)
sprintf("Percentile ratios : Y/L = %5.3f, K/L = %5.3f", qo["90%"]/qo["10%"], qk["90%"]/qk["10%"])
## [1] "Distribution of GDP per worker:"
## 10% 30% 50% 70% 90%
## 5732.407 17965.327 39693.631 64538.545 104359.221
## [1] "Distribution of capital per worker:"
## 10% 30% 50% 70% 90%
## 14980.33 62815.24 132389.67 275452.68 522416.23
## [1] "Percentile ratios : Y/L = 18.205, K/L = 34.873"
When we compare the 90/10 ratios of the two distributions,
i.e. the ratio between the 90th and the 10th percentile, we find them to
be substantial. For GDP per worker the 90/10 ratio is around 18, whereas
for capital per worker it is around 35, so roughly twice as large.
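The arithmetic behind these ratios can be read directly off the printed quantiles. The symbols \(q_{10}\) and \(q_{90}\) for the 10th and 90th percentile are notation introduced here for illustration:

\[
\frac{q_{90}^{Y/L}}{q_{10}^{Y/L}} = \frac{104359.221}{5732.407} \approx 18.2,
\qquad
\frac{q_{90}^{K/L}}{q_{10}^{K/L}} = \frac{522416.23}{14980.33} \approx 34.9,
\qquad
\frac{34.873}{18.205} \approx 1.92 .
\]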
We now do a growth accounting exercise for the US. To this end, we load the PWT data again, extract the US observations and calculate the relevant statistics, i.e. output per worker, capital per worker as well as the capital share.
# load data and extract US data
data("pwt10.01")
pwt_sub <- subset(pwt10.01, isocode=="USA")
# calculate output, capital per worker and capital share
pwt_sub$output_per_worker <- pwt_sub$rgdpe/pwt_sub$emp
pwt_sub$capital_per_worker <- pwt_sub$rnna/pwt_sub$emp
pwt_sub$capital_share <- 1 - pwt_sub$labsh
With the above variables at hand, we can calculate the
year-by-year growth rate of output per worker. Note that the function
lead allows us to access the next period's level of output
per worker, which is essential for calculating growth rates. Furthermore,
we can decompose the growth rate into input growth coming from the increase
in the capital stock and the Solow residual. Having derived the
decomposition, we plot the growth rates as well as the decomposition
over time.
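Written out, the decomposition computed in the code reads as follows. The symbols \(y_t\), \(k_t\) and \(\alpha_t\) for output per worker, capital per worker and the capital share (one minus the labor share) are notation chosen here; weighting capital growth by \(\alpha_t\) is the usual step under a Cobb-Douglas production function, which is assumed rather than stated in this section:

\[
g^y_t = \frac{y_{t+1}}{y_t} - 1,
\qquad
g^k_t = \alpha_t \left( \frac{k_{t+1}}{k_t} - 1 \right),
\qquad
r_t = g^y_t - g^k_t .
\]

By construction \(g^y_t = g^k_t + r_t\), so the Solow residual \(r_t\) collects all growth in output per worker that is not attributed to growth in capital per worker.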
pwt_sub$gy <- lead(pwt_sub$output_per_worker)/pwt_sub$output_per_worker - 1
pwt_sub$gk <- pwt_sub$capital_share*(lead(pwt_sub$capital_per_worker)/pwt_sub$capital_per_worker - 1)
pwt_sub$rt <- pwt_sub$gy - pwt_sub$gk
# now create a plot to show growth and its components
myplot <- ggplot(data = pwt_sub[!is.na(pwt_sub$gy), ]) +
  geom_ribbon(aes(x=year, ymin=0, ymax=gk*100, fill= "1gk", color="1gk") , alpha=0.4) +
  geom_ribbon(aes(x=year, ymin=gk*100, ymax=(gk+rt)*100, fill= "2rt", color="2rt") , alpha=0.4) +
  geom_line(aes(x=year, y=gy*100), color="darkblue", linewidth=1) +
  coord_cartesian(xlim=c(1950, 2020), ylim=c(-3, 5)) +
  scale_x_continuous(breaks=seq(1950, 2020, 20), expand=c(0, 0)) +
  labs(x = "Year t",
       y = "Growth Decomposition for GDP per Worker (in %)") +
  scale_fill_manual(breaks = c("1gk", "2rt"), name = "",
                    labels = c("Growth in Capital per Worker", "Solow Residual"),
                    values = c(myblue, mygreen)) +
  scale_color_manual(breaks = c("1gk", "2rt"),
                     values = c(myblue, mygreen)) +
  guides(colour = "none") +
  theme_bw() +
  theme(legend.position="bottom")
# print the plot
print(myplot)
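To see how lead produces the growth rates and the decomposition used above, the following toy example may help. The three-period series and the constant capital share are invented numbers, not PWT data.

```r
library(dplyr)  # provides lead()

# hypothetical levels for three consecutive years (not PWT data)
y     <- c(100, 110, 121)        # output per worker
k     <- c(300, 315, 330.75)     # capital per worker
alpha <- c(0.4, 0.4, 0.4)        # capital share

# lead() shifts a vector one period ahead and pads the end with NA
gy <- lead(y) / y - 1                # growth of output per worker from t to t+1
gk <- alpha * (lead(k) / k - 1)      # contribution of capital per worker
rt <- gy - gk                        # Solow residual

print(data.frame(gy, gk, rt))
```

In this toy series output per worker grows by 10 percent per period, of which 2 percentage points are attributed to capital and 8 to the residual. The last row is NA because no following period exists, which is why the plotting code drops rows with a missing gy.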
By and large, we find the Solow residual to be the more
important driving factor of economic growth. Movements in the capital
stock only add modestly to economic performance. In a way, this is the
same result we already discussed in the previous section using
cross-country data.
To test the convergence hypothesis, we use all available data from the PWT for the years 1970 and 2015 and order the dataset by country and time. We then calculate output per capita in log terms. Next, we reshape the dataset such that every country occupies exactly one row, with the 1970 and 2015 values next to each other. This makes it easier to calculate growth rates and create a graph. Once this is done, we exclude all countries that have missing data for 1970, for 2015 or for both. This gives us a complete sample from which we can calculate the change in log output per capita over the sample period.
# use 1970 and 2015 data for all available countries
data("pwt10.01")
pwt_sub <- subset(pwt10.01, year=="1970" | year=="2015")
# sort the data by country and year
pwt_sub <- pwt_sub[order(pwt_sub$isocode, pwt_sub$year), ]
# calculate output per capita (not worker due to data availability)
pwt_sub$output_per_capita <- log(pwt_sub$rgdpe/pwt_sub$pop)
# reorganize dataset in wide form
data <- reshape(pwt_sub[c("year", "country", "isocode", "output_per_capita")],
                idvar = c("country", "isocode"), timevar = "year", direction = "wide")
# drop NA cases
data <- data[!is.na(data$output_per_capita.1970) & !is.na(data$output_per_capita.2015), ]
# calculate growth in output per capita (difference in logs)
data$gy <- data$output_per_capita.2015 - data$output_per_capita.1970
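The reshaping step can be illustrated on a tiny made-up dataset. The country codes AAA and BBB and the column name log_y are hypothetical; only the base R function reshape and its arguments correspond to the code above.

```r
# hypothetical long-format data: two countries, two years each
long <- data.frame(
  isocode = c("AAA", "AAA", "BBB", "BBB"),
  year    = c(1970, 2015, 1970, 2015),
  log_y   = c(8.0, 9.5, 9.0, NA)
)

# one row per country; the value columns become log_y.1970 and log_y.2015
wide <- reshape(long, idvar = "isocode", timevar = "year", direction = "wide")

# keep only countries observed in both years
wide <- wide[!is.na(wide$log_y.1970) & !is.na(wide$log_y.2015), ]

# change in log output over the sample period
wide$gy <- wide$log_y.2015 - wide$log_y.1970
print(wide)
```

Country BBB is dropped because its 2015 value is missing, which leaves a single row for AAA with a log change of 1.5.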
To test the extent of convergence, we regress growth in output
per capita on the 1970 level of (log) output per capita in each country.
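In equation form, the regression can be written as below, where \(y_{i,t}\) denotes output per capita of country \(i\) in year \(t\); the symbols \(a\), \(b\) and \(\varepsilon_i\) are notation introduced here:

\[
\ln y_{i,2015} - \ln y_{i,1970} = a + b \, \ln y_{i,1970} + \varepsilon_i .
\]

The slope \(b\) measures convergence. With \(b = 0\), growth is unrelated to the initial level. With \(b = -1\), the initial level cancels and \(\ln y_{i,2015} = a + \varepsilon_i\), so the 2015 level no longer depends on where a country started.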
# run regression
reg <- lm(gy ~ data$output_per_capita.1970, data)
summary(reg)
##
## Call:
## lm(formula = gy ~ data$output_per_capita.1970, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1635 -0.4685 0.1045 0.4355 1.7887
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.12144 0.47956 4.424 1.82e-05 ***
## data$output_per_capita.1970 -0.13656 0.05659 -2.413 0.017 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.773 on 155 degrees of freedom
## Multiple R-squared: 0.03621, Adjusted R-squared: 0.02999
## F-statistic: 5.824 on 1 and 155 DF, p-value: 0.01698
The results are quite clear: the point estimate is very small
in absolute value at around \(-0.14\). For
full convergence, the coefficient should be close to \(-1.00\). Furthermore, the \(R^2\) value of about \(0.04\) is very small, meaning that our
linear regression explains hardly any of the variance we find in the data. We
can do a scatter plot to get a more complete picture of the data and
analysis.
lab <- paste("b = ", format(round(reg$coefficients[2], 2), nsmall=2), " (", format(round(summary(reg)$coefficients[2, 2], 2), nsmall=2), ")",
             " / R2 = ", format(round(summary(reg)$r.squared, 2), nsmall=2))
myplot <- ggplot(data = data) +
  geom_hline(yintercept = 0, color="black", linewidth=0.5) +
  geom_point(aes(x=output_per_capita.1970, y=gy), color="darkblue", fill="darkblue", size=1) +
  geom_smooth(aes(x=output_per_capita.1970, y=gy), method="lm", formula="y ~ x", se=FALSE, color=myred) +
  geom_label(aes(x = 13, y = 3, label = lab),
             hjust = 1, vjust = 1, label.r = unit(0, "lines"), label.padding = unit(0.35, "lines")) +
  coord_cartesian(xlim=c(6, 13), ylim=c(-1.5, 3)) +
  labs(x = "Level of output per capita 1970 (in logs)",
       y = "Growth in output per capita 1970-2015") +
  theme_bw()
# print the plot
print(myplot)
## Warning in geom_label(aes(x = 13, y = 3, label = lab), hjust = 1, vjust = 1, : All aesthetics have length 1, but the data has 157 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
## a single row.
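The warning appears because geom_label receives the full dataset although only a single label is drawn. A possible alternative, as the warning itself suggests, is annotate. The sketch below uses simulated data in place of the cross-country sample; the data frame toy and its coefficients are invented for illustration.

```r
library(ggplot2)

# simulated data standing in for the cross-country sample
set.seed(1)
toy <- data.frame(x = runif(50, 6, 11))
toy$gy <- 1 - 0.1 * toy$x + rnorm(50, sd = 0.5)

reg <- lm(gy ~ x, data = toy)
lab <- paste0("b = ", format(round(coef(reg)[2], 2), nsmall = 2))

# annotate() draws one label without mapping it to the rows of the data
myplot <- ggplot(data = toy, aes(x = x, y = gy)) +
  geom_point(color = "darkblue", size = 1) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  annotate("label", x = 11, y = 3, label = lab, hjust = 1, vjust = 1) +
  coord_cartesian(xlim = c(6, 11), ylim = c(-1.5, 3)) +
  theme_bw()
print(myplot)
```

Because annotate takes fixed values instead of an aesthetic mapping, the label is drawn once rather than once per row of the data.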
The scatter plot immediately shows that there is nearly no
convergence going on in the data. What is more, it seems that the slope of
the regression line is largely determined by three countries (outliers)
that started with a very high level of GDP per capita already in 1970.
Investigating the data a bit further, we find that these countries are
Brunei, the United Arab Emirates and Qatar, all major oil producers that
probably follow somewhat different economic rules. Excluding them from
the analysis paints an even clearer picture.
# now drop Brunei, United Arab Emirates and Qatar
data <- data[data$output_per_capita.1970 < 11, ]
# run regression
reg <- lm(gy ~ data$output_per_capita.1970, data)
summary(reg)
##
## Call:
## lm(formula = gy ~ data$output_per_capita.1970, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.15619 -0.43498 0.04124 0.37153 1.82095
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.46376 0.51932 2.819 0.00547 **
## data$output_per_capita.1970 -0.05515 0.06186 -0.892 0.37407
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7569 on 152 degrees of freedom
## Multiple R-squared: 0.005202, Adjusted R-squared: -0.001343
## F-statistic: 0.7948 on 1 and 152 DF, p-value: 0.3741
# generate scatter plot
lab <- paste("b = ", format(round(reg$coefficients[2], 2), nsmall=2), " (", format(round(summary(reg)$coefficients[2, 2], 2), nsmall=2), ")",
             " / R2 = ", format(round(summary(reg)$r.squared, 2), nsmall=2))
myplot <- ggplot(data = data) +
  geom_hline(yintercept = 0, color="black", linewidth=0.5) +
  geom_point(aes(x=output_per_capita.1970, y=gy), color="darkblue", fill="darkblue", size=1) +
  geom_smooth(aes(x=output_per_capita.1970, y=gy), method="lm", formula="y ~ x", se=FALSE, color=myred) +
  geom_label(aes(x = 11, y = 3, label = lab),
             hjust = 1, vjust = 1, label.r = unit(0, "lines"), label.padding = unit(0.35, "lines")) +
  coord_cartesian(xlim=c(6, 11), ylim=c(-1.5, 3)) +
  labs(x = "Level of output per capita 1970 (in logs)",
       y = "Growth in output per capita 1970-2015") +
  theme_bw()
# print the plot
print(myplot)
## Warning in geom_label(aes(x = 11, y = 3, label = lab), hjust = 1, vjust = 1, : All aesthetics have length 1, but the data has 154 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
## a single row.
In this subset, the regression coefficient drops to virtually zero (about \(-0.06\)) and becomes statistically insignificant (p-value of about 0.37). The scatter plot now shows no sign of convergence in the data.