Loading packages, defining colors and using data

We first clear the workspace using rm(list = ls())and then include all packages we need. If a package is missing in your R distribution (which is quite likely initially), just use install.packages("package_name") with the respective package name to install it on your system. If you execute the code in the file install_packages.R, then all necessary packages will be installed into your R distribution. If the variable export_graphs is set to TRUE, then the graphs will be exported as pdf-files. In addition, we define a set of colors here to make graphs look more beautiful. Finally, we load the Penn World Table data and extract all data for 2019 from it.

rm(list = ls())
library(reshape2)
library(base)
library(ggplot2)
library(grid)
library(scales)
library(stringr)
library(tidyverse)
library(pwt10)

# should graphs be exported to pdf
export_pdf <- FALSE

# define some colors
mygreen <- "#00BA38"
myblue  <- "#619CFF"
myred   <- "#F8766D"

# load data and extract 2019 data
data("pwt10.01")
pwt_sub <- subset(pwt10.01, year=="2019")

Output per worker across countries

We first want to look at cross country statistics of important macroeconomic variables. Let us start with output per worker. Note that before creating a histogram plot, we calculate some quantiles of the distribution of output per worker using the R function quantile. Note further that the data contains a couple of missing values in output per worker (even in 2019). Hence, we restrict our sample to the available subset of values.

# calculate output per worker
pwt_sub$output_per_worker <- pwt_sub$rgdpe/pwt_sub$emp

# calculate deciles of the output-per-worker distribution
qo <- quantile(pwt_sub[!is.na(pwt_sub$output_per_worker), ]$output_per_worker, 
               probs = seq(.1, .9, by = .2), names = TRUE)

# generate plot
myplot <- ggplot(data = pwt_sub[!is.na(pwt_sub$output_per_worker), ]) + 
  geom_histogram(aes(x=output_per_worker), bins = 20, color=mygreen, fill=mygreen, alpha = 0.4, boundary = 0) +
  labs(x = expression(paste("GDP per worker ", Y[2019],"/", L[2019], " (in 2017 USD PPP)")),
       y = "Frequency") +
  coord_cartesian(xlim=c(0, 250000), ylim=c(0, 40)) + 
  theme_bw()

# print the plot
print(myplot)


There are a lot of differences in output per worker across countries. It seems that all values between 0 and 250000 are present in this graph. But how does this compare to the distribution of capital per worker?

# calculate output per worker
pwt_sub$capital_per_worker <- pwt_sub$rnna/pwt_sub$emp

# calculate deciles of the output-per-worker distribution
qk <- quantile(pwt_sub[!is.na(pwt_sub$capital_per_worker), ]$capital_per_worker, 
               probs = seq(.1, .9, by = .2), names=TRUE)

# generate plot
myplot <- ggplot(data = pwt_sub[!is.na(pwt_sub$capital_per_worker), ]) + 
  geom_histogram(aes(x=capital_per_worker), bins = 20, color=myblue, fill=myblue, alpha = 0.4, boundary = 0) +
  labs(x = expression(paste("Capital per worker ", K[2019],"/", L[2019], " (in 2017 USD PPP)")),
       y = "Frequency") +
  coord_cartesian(xlim=c(0, 1000000), ylim=c(0, 40)) + 
  theme_bw()

# print the plot
print(myplot)


The distribution of capital per worker is wider than the distribution of output per worker. From eyeballing we would get that the corresponding factor is about 4, i.e. the distribution spans from 0 to 1000000. We can be a bit more precise by looking at the quantiles of the two distributions.

sprintf("Distribution of GDP per worker:")
print(qo)
sprintf("Distribution of capital per worker:")
print(qk)
sprintf("Percentile ratios : Y/L = %5.3f, K/L = %5.3f", qo["90%"]/qo["10%"], qk["90%"]/qk["10%"])
## [1] "Distribution of GDP per worker:"
##        10%        30%        50%        70%        90% 
##   5732.407  17965.327  39693.631  64538.545 104359.221 
## [1] "Distribution of capital per worker:"
##       10%       30%       50%       70%       90% 
##  14980.33  62815.24 132389.67 275452.68 522416.23 
## [1] "Percentile ratios : Y/L = 18.205, K/L = 34.873"


When we compare that 90/10 ratios of the two distributions, i.e. the ratio between the 90th and the 10th percentile, we find them to be substantial. For GDP per worker the 90/10 ratio is around 18, where for capital per worker it is around 36, so twice as large.

A growth accounting exercise for the US

We now do a growth accounting exercise for the US. We therefore have to load the US dataset again and calculate the relevant statistics, i.e. GDP per worker, output per worker as well as the capital share.

# load data and extract US data
data("pwt10.01")
pwt_sub <- subset(pwt10.01, isocode=="USA")

# calculate output, capital per worker and capital share
pwt_sub$output_per_worker  <- pwt_sub$rgdpe/pwt_sub$emp
pwt_sub$capital_per_worker <- pwt_sub$rnna/pwt_sub$emp
pwt_sub$capital_share      <- 1 - pwt_sub$labsh


With the above variables at hand, we can calculate the year-by-year growth rate of the economy. Note the the function lead allows us to access the next periods level of output per worker, which is essential for calculating growth rates. Furthermore we can decompose the growth rate input growth coming from the increase in the capital stock as well as the Solow residual. Having derived the decomposition, we plot the growth rates as well as the decomposition over time.

pwt_sub$gy <- lead(pwt_sub$output_per_worker)/pwt_sub$output_per_worker - 1
pwt_sub$gk <- pwt_sub$capital_share*(lead(pwt_sub$capital_per_worker)/pwt_sub$capital_per_worker - 1)
pwt_sub$rt <- pwt_sub$gy - pwt_sub$gk

# now create a plot to show growth and its components
myplot <- ggplot(data = pwt_sub[!is.na(pwt_sub$gy), ]) + 
  geom_ribbon(aes(x=year, ymin=0, ymax=gk*100,    fill= "1gk", color="1gk") , alpha=0.4) +
  geom_ribbon(aes(x=year, ymin=gk*100, ymax=(gk+rt)*100, fill= "2rt", color="2rt")  , alpha=0.4) +
  geom_line(aes(x=year, y=gy*100), color="darkblue", linewidth=1) +
  coord_cartesian(xlim=c(1950, 2020), ylim=c(-3, 5)) + 
  scale_x_continuous(breaks=seq(1950, 2020, 20), expand=c(0, 0)) +
  labs(x = "Year t",
       y = "Growth Decomposition for GDP per Worker (in %)") +
  scale_fill_manual(breaks = c("1gk", "2rt"), name = "", 
                    labels = c("Growth in Capital per Worker", "Solow Residual"),
                    values = c(myblue, mygreen)) +
  scale_color_manual(breaks = c("1gk", "2rt"),
                     values = c(myblue, mygreen)) +
  guides(colour = "none") +
  theme_bw() + 
  theme(legend.position="bottom")

# print the plot
print(myplot)


By an large, we find the Solow residual to me the more important driving factor of economic growth. Movements in the capital stock only add modestly to economic performance. In a way, this is the same result we already discussed in the previous section using cross-country data.

A test for convergence (1970 to 2015)

To test the convergence hypothesis, we use all available data from the PWT for the years 1970 and 2015 and order the dataset by country and time. We then calculate output per capita in log terms. Next, we reshape the dataset such that every country just is one line and the 1970 and 2015 values stand next to each other. This makes it easier to calculate growth rates and create a graph. Once this is done, we exclude all countries that have missing data either for 1970 or for 2015 or both. This gives us a complete sample from which we can calculate the change in log-output per capita over the sample period.

# use 1970 and 2015 data for all available countries
data("pwt10.01")
pwt_sub <- subset(pwt10.01, year=="1970" | year=="2015")

# make sure you are in the right order
pwt_sub <- pwt_sub[order(pwt_sub$isocode, pwt_sub$year), ]

# calculate output per capita (not worker due to data availability)
pwt_sub$output_per_capita <- log(pwt_sub$rgdpe/pwt_sub$pop)

# reorganize dataset in wide form
data <- reshape(pwt_sub[c("year", "country", "isocode", "output_per_capita")], 
             idvar = c("country", "isocode"), timevar = "year", direction = "wide")

# drop NA cases
data <- data[!is.na(data$output_per_capita.1970) & !is.na(data$output_per_capita.2015), ]

# calculate growth in output per worker
data$gy <- data$output_per_capita.2015 - data$output_per_capita.1970


To test the extent of convergence, we regress growth in output per worker on the 1970 level of output per worker in each country.

# run regression
reg <- lm(gy ~ data$output_per_capita.1970, data)
summary(reg)
## 
## Call:
## lm(formula = gy ~ data$output_per_capita.1970, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1635 -0.4685  0.1045  0.4355  1.7887 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  2.12144    0.47956   4.424 1.82e-05 ***
## data$output_per_capita.1970 -0.13656    0.05659  -2.413    0.017 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.773 on 155 degrees of freedom
## Multiple R-squared:  0.03621,    Adjusted R-squared:  0.02999 
## F-statistic: 5.824 on 1 and 155 DF,  p-value: 0.01698


The results are quite clear: the point estimate is very small at around a value of \(-0.14\). For full convergence, the coefficient should be in the range of \(-1.00\). Furthermore, the \(R^2\) value is very small, meaning that our linear regression hardly explains the variance we find in the data. We can do a scatter plot to get a more complete picture of the data and analysis.

lab  <- paste("b = ", format(round(reg$coefficients[2], 2), nsmall=2), " (", format(round(summary(reg)$coefficients[2, 2], 2), nsmall=2), ")",
              " / R2 = ", format(round(summary(reg)$r.squared, 2), nsmall=2))

myplot <- ggplot(data = data) + 
  geom_hline(yintercept = 0, color="black", linewidth=0.5) + 
  geom_point(aes(x=output_per_capita.1970, y=gy), color="darkblue", fill="darkblue", size=1) +
  geom_smooth(aes(x=output_per_capita.1970, y=gy), method="lm", formula="y ~ x", se=FALSE, color=myred) +
  geom_label(aes(x = 13, y = 3, label = lab), 
             hjust = 1, vjust = 1, label.r = unit(0, "lines"), label.padding = unit(0.35, "lines")) +
  coord_cartesian(xlim=c(6, 13), ylim=c(-1.5, 3)) + 
  labs(x = "Level of output per capita 1970 (in logs)",
       y = "Growth in output per capita 1970-2015") +
  theme_bw()

# print the plot
print(myplot)
## Warning in geom_label(aes(x = 13, y = 3, label = lab), hjust = 1, vjust = 1, : All aesthetics have length 1, but the data has 157 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.


The scatter plot immediately shows that there is nearly no convergence going on the data. What is more, it seems that the shape of the regression line is majorly determined by three countries (outliers) that started with a very high level of GDP already in 1970. Investigating the data a bit further, we find that these countries are Brunei, the United Arab Emirates and Qatar, all major oil producers who probably follow somewhat different economic rules. Excluding them from the analysis paints an even clearer picture.

# now drop Brunei, United Arab Emirates and Qatar
data <- data[data$output_per_capita.1970 < 11, ]

# run regression
reg <- lm(gy ~ data$output_per_capita.1970, data)
summary(reg)
## 
## Call:
## lm(formula = gy ~ data$output_per_capita.1970, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.15619 -0.43498  0.04124  0.37153  1.82095 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                  1.46376    0.51932   2.819  0.00547 **
## data$output_per_capita.1970 -0.05515    0.06186  -0.892  0.37407   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7569 on 152 degrees of freedom
## Multiple R-squared:  0.005202,   Adjusted R-squared:  -0.001343 
## F-statistic: 0.7948 on 1 and 152 DF,  p-value: 0.3741
# generate scatter plot
lab  <- paste("b = ", format(round(reg$coefficients[2], 2), nsmall=2), " (", format(round(summary(reg)$coefficients[2, 2], 2), nsmall=2), ")",
              " / R2 = ", format(round(summary(reg)$r.squared, 2), nsmall=2))

myplot <- ggplot(data = data) + 
  geom_hline(yintercept = 0, color="black", linewidth=0.5) + 
  geom_point(aes(x=output_per_capita.1970, y=gy), color="darkblue", fill="darkblue", size=1) +
  geom_smooth(aes(x=output_per_capita.1970, y=gy), method="lm", formula="y ~ x", se=FALSE, color=myred) +
  geom_label(aes(x = 11, y = 3, label = lab), 
             hjust = 1, vjust = 1, label.r = unit(0, "lines"), label.padding = unit(0.35, "lines")) +
  coord_cartesian(xlim=c(6, 11), ylim=c(-1.5, 3)) + 
  labs(x = "Level of output per capita 1970 (in logs)",
       y = "Growth in output per capita 1970-2015") +
  theme_bw()

# print the plot
print(myplot)
## Warning in geom_label(aes(x = 11, y = 3, label = lab), hjust = 1, vjust = 1, : All aesthetics have length 1, but the data has 154 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.

In this subset, the regression coefficient drops to virtually zero and becomes insignificant. The scatter plot now clearly shows that there seems to be no convergence going on in the data.