International trade datasets
NBER International Trade and Geography Data - Redding
Downloading and looking into WIOD data
For the final course project we will use the WIOD table, (Timmer et al. 2015). This table contains bilateral trade flows and underlying data, covering 43 countries, and a model for the rest of the world for the period 2000-2014. Data for 56 sectors are classified according to the International Standard Industrial Classification revision 4 (ISIC Rev. 4).
Download the 2016 release of WIOD table from the link above.
Let’s import the R version into a folder called input
. Once
you have downloaded and unzip the contents, you will have a file called
WIOT2014_October16_ROW.RData
in your input
folder.
library(data.table)
library(fst)
load("input/WIOT2014_October16_ROW.RData")
setnames(
wiot,
c("IndustryCode", "IndustryDescription", "Country", "RNr"),
c("ind_code", "ind_name", "out_country", "out_ind")
)
wiot[sample(5), .(ind_code, out_country, out_ind, AUS1, BRA4)]
## ind_code out_country out_ind AUS1 BRA4
## 1: A02 AUS 2 83.02964 0.012315142
## 2: C10-C12 AUS 5 1590.84059 0.070672046
## 3: A01 AUS 1 12924.17969 0.018449419
## 4: B AUS 4 115.92985 24.198535555
## 5: A03 AUS 3 19.14773 0.009592908
id_vars <- c("out_country", "out_ind")
wiot <- melt(wiot[, -c("ind_code", "ind_name", "Year")],
id.vars = id_vars,
variable.name = "in_country",
variable.factor = FALSE
)
wiot[, `:=`(
in_ind = sub("[[:alpha:]]+(\\d+)", "\\1", in_country),
in_country = sub("([[:alpha:]]+)\\d+", "\\1", in_country)
)]
wiot[, in_ind := as.integer(in_ind)]
setcolorder(
wiot,
c("out_country", "out_ind", "in_country", "in_ind", "value")
)
wiot[sample(5), 1:5]
## out_country out_ind in_country in_ind value
## 1: AUS 2 AUS 1 83.02964
## 2: AUS 5 AUS 1 1590.84059
## 3: AUS 3 AUS 1 19.14773
## 4: AUS 4 AUS 1 115.92985
## 5: AUS 1 AUS 1 12924.17969
Let’s check what are the countries included in the dataset. We can
see from the code below we have 45 “coutries”, but only 43 of them are
really countries. ROW
stands for Rest Of the World, while
TOT
is a “Total” column that came with the raw data and we
should drop it.
## [1] "AUS" "AUT" "BEL" "BGR" "BRA" "CAN" "CHE" "CHN" "CYP" "CZE" "DEU" "DNK"
## [13] "ESP" "EST" "FIN" "FRA" "GBR" "GRC" "HRV" "HUN" "IDN" "IND" "IRL" "ITA"
## [25] "JPN" "KOR" "LTU" "LUX" "LVA" "MEX" "MLT" "NLD" "NOR" "POL" "PRT" "ROU"
## [37] "RUS" "SVK" "SVN" "SWE" "TUR" "TWN" "USA" "ROW" "TOT"
From previous data analysis (not shown here), we also know that industry #56 - Activities of extraterritorial organizations and bodies is comprised of all zeros and, all other industries with codes greater than 56 are related to the Total aggregation. We will drop all these observations.
#' Drop industry U (56) since most countries do not expend anything on this
wiot <- wiot[!(out_country %chin% c("TOT") |
in_country %chin% c("TOT") |
out_ind > 55 |
in_ind > 55)]
This is the cleaned dataset we shall work through this series of classes, save it to a file for posterior loading.
Now we must get socio-economic data from this link. This data is fully compatible with WIOD, but comes in Microsoft Excel format.
library(openxlsx)
wiod_sea <- as.data.table(
read.xlsx("input/WIOD_SEA_Nov16.xlsx", sheet = "DATA")
)
#' Employed population in thousands
employed <- wiod_sea[variable == "EMP",
by = "country",
.(employed = sum(`2014`))
]
setnames(
employed,
c("country", "employed"),
c("out_country", "employed_i")
)
#' ATTENTION: socio-economic data from WIOD does not include ROW! I will just
#' input the employed in ROW having the same value as the sum of all countries!
employed <- rbind(employed, list("ROW", employed[, sum(employed_i)]))
#' Write employed to fst file
write_fst(employed, "output/employed.fst")
What about Python?
import numpy as np
import pandas as pd
wiot = pd.read_csv("output/wiot.csv")
wiod_sea = pd.read_excel("input/WIOD_SEA_Nov16.xlsx", sheet_name = "DATA",
engine = "openpyxl")
employed = (wiod_sea
.query("variable == 'EMP'")
.groupby('country')[2014]
.sum()
.reset_index()
.rename(columns={'country': 'out_country', 2014: 'employed_i'})
)
row = pd.DataFrame({'out_country': ['ROW'],
'employed_i': [employed['employed_i'].sum()]})
employed = employed.append(row)
What about Matlab?
You are on your own!
Sorry 😣