Theoretical Framework
In this section we will briefly describe the theoretical model as in Costinot, Donaldson, and Komunjer (2012), which is based on Eaton and Kortum (2002). It consists of an economy with multiple-countries, multiple industries, and labor as the solely production factor. Intra-industry heterogeneity is allowed, as in Eaton and Kortum (2002), each good is available in many varieties.
Consider a world economy comprising \(i=1, \ldots, I\) countries and labor as the only factor of production. There are \(k=1, \ldots, K\) industries or goods. Labor is perfectly mobile across industries and immobile across countries. We denote by \(L_{i}\) and \(w_{i}\) the number of workers and the wage in country \(i\), respectively.
Technology. Each good \(k\) may come in an infinite number of varieties indexed by \(\omega \in \Omega\) We denote by \(z_{i}^{k}(\omega)\) the number of units of the \(\omega\) th variety of good \(k\) that can be produced with one unit of labor in country \(i\). Assume the following:
A1. For all countries \(i\), goods \(k\), and their varieties \(\omega\), \(z_{i}^{k}(\omega)\) is a random variable drawn independently for each triplet \((i, k, \omega)\) from a Fréchet distribution \(F_{i}^{k}(\cdot)\) such that \[ F_{i}^{k}(z)=\exp \left[-\left(z / z_{i}^{k}\right)^{-\theta}\right], \quad \text { for all } z \geq 0 \] where \(z_{i}^{k}>0\) and \(\theta>1\). Technological differences across countries and industries only depend on two parameters, \(z_{i}^{k}\) and \(\theta\). We refer to \(z_{i}^{k}\) as the fundamental productivity of country \(i\) in industry \(k\) while \(\theta\) is the intra-industry heterogeneity.
A2. For each unit of good \(k\) shipped from country \(i\) to country \(j\), only \(1 / d_{i j}^{k} \leq 1\) units arrive, with \(d_{i j}^{k}\) such that
\[ \begin{cases} d_{i i}^{k}=1 \text{, and}\\ d_{i l}^{k} \leq d_{i j}^{k} \cdot d_{j l}^{k} \text{, for any third country } l. \end{cases} \]
Market structure. Markets are assumed to be perfectly competitive. Together with constant returns to scale in production, perfect competition implies the following:
A3. In any country \(j\), the price \(p_{j}^{k}(\omega)\) paid by buyers of variety \(\omega\) of good \(k\) is \[ p_{j}^{k}(\omega)=\min _{1 \leq i \leq I}\left[c_{i j}^{k}(\omega)\right] \] where \(c_{i j}^{k}(\omega)=\left(d_{i j}^{k} \cdot w_{i}\right) / z_{i}^{k}(\omega)\) is the cost of producing and delivering one unit of this variety from country \(i\) to country \(j\).
Buyers in country \(j\) are “shopping around the world” for the best price available. In what follows, we let \(c_{i j}^{k}=\left(d_{i j}^{k} \cdot w_{i}\right) / z_{i}^{k}>0\).
Preferences. The upper tier utility function is Cobb-Douglas, while the lower tier is CES1. Accordingly, expenditures are such that:
A4. In any country \(j\), total expenditure on variety \(\omega\) of good \(k\) is \[ x_{j}^{k}(\omega)=\left[p_{j}^{k}(\omega) / p_{j}^{k}\right]^{1-\sigma_{j}^{k}} \cdot \alpha_{j}^{k} w_{j} L_{j} \]
where \(p_j^k\equiv[\sum_{\omega'\in\Omega} p_j^k(\omega)^{1-\sigma_j^k}]^{1/(1-\sigma_j^k)}\) is the price index within industry \(k\) of country \(j\). The country \(j\)’s consumer price index is defined as: \(p_j\equiv\Pi_{k=1}^K(p_j^k)^{\alpha_j^k}\).
Trade balance. Denote by \(x_{i j}^{k} \equiv \sum_{\omega \in \Omega_{i j}^{k}} x_{j}^{k}(\omega)\) the value of total exports from country \(i\) to country \(j\) in industry \(k\), where \(\Omega_{i j}^{k}\) is the set of varieties exported by country \(i\) to country \(j\) in industry \(k\). Similarly, denote by \(\pi_{i j}^{k} \equiv x_{i j}^{k} / \sum_{i^{\prime}=1}^{I} x_{i^{\prime} j}^{k}\) the share of exports from country \(i\) in country \(j\) and industry \(k\). The final assumption is the following:
A5. For any country \(i\), trade is balanced \[ \sum_{j=1}^{I} \sum_{k=1}^{K} \pi_{i j}^{k} \alpha_{j}^{k} \gamma_{j}=\gamma_{i}, \] where \(\gamma_{i} \equiv w_{i} L_{i} / \sum_{i^{\prime}=1}^{I} w_{i^{\prime}} L_{i^{\prime}}\) is the share of country \(i\) in world income.
Calibration
We start the calibration by choosing an appropriate value for \(\theta\). From Table 3 in Costinot, Donaldson, and Komunjer (2012) I selected column (3), \(\theta = 6.534\), where the method of estimation was an IV, to be used throughout this project.
Next, we need the trade flows, \(x_{ij}^{k}\) by source country \(i\), destination \(j\) and industry \(k\). We will read the wiot.fst
file from last class and aggregate the flows.
library(data.table)
library(fst)
trade_flows <- read_fst("output/wiot.fst", as.data.table = TRUE)
trade_flows <- trade_flows[, by = c("out_ind", "out_country", "in_country"),
.(value = sum(value))]
head(trade_flows)
## out_ind out_country in_country value
## 1: 1 AUS AUS 45599.088
## 2: 2 AUS AUS 2085.468
## 3: 3 AUS AUS 1297.064
## 4: 4 AUS AUS 36547.446
## 5: 5 AUS AUS 33248.842
## 6: 6 AUS AUS 1593.366
Estimating revealed productivities
We have to estimate the following equation:
\[ \begin{equation} \ln x_{ij}^k=\delta_{ij}+\delta_{j}^k+\delta_{i}^k+\varepsilon_{ij}^k \end{equation} \]
and then invert the fixed effect of exporter-industry to get the productivity \(z_i^k\). I extract the revealed productivity of country \(i\) in industry \(k\) as \(z_{i}^k=\exp(\delta_i^k / \theta)\), notice we have normalized all USA productivities and industry A01 Crop and animal production, hunting and related service activities in all other countries to 1, similar to what the original authors have done in Table 2.
Let’s create the dummies by ourselves, so they are named when
fixef
is called
trade_flows[, `:=`(
delta_ij = paste0(out_country, "_", in_country),
delta_jk = paste0(in_country, "_", out_ind),
delta_ik = paste0(out_country, "_", out_ind)
)]
Run the regression of \(\log(1+x)\)
(But NOT YOU! You should use the PML method to get full grade) in order
to have a complete matrix of productivities. Rearrange the USA to be the
first country such that the fixed effects are normalized by it. Fixed
effects set to zero are the references (fixest
documentation), you should check that your fixed-effects are
correctly normalized.
library(fixest)
eq19 <- feols(
log1_value~1|delta_ij+delta_jk+delta_ik,
data = trade_flows[order(-out_country)])
#
fes <- fixef(eq19)
## NOTE: The fixed-effects are not regular, they cannot be straightforwardly interpreted.
## Fixed_effects coefficients
## delta_ij delta_jk delta_ik
## Number of fixed-effects 1936 2420 2420
## Number of references 0 44 98
## Mean 2.29 -0.814 0.316
## Standard-deviation 1.99 1.58 1.52
##
## COEFFICIENTS:
## delta_ij: AUS_AUS AUS_AUT AUS_BEL AUS_BGR AUS_BRA
## 9.025 1.366 2.607 1.332 2.155 ... 1,931 remaining
## -----
## delta_jk: AUS_1 AUS_10 AUS_11 AUS_12 AUS_13
## 0 1.517 2.063 0.665 0.8964 ... 2,415 remaining
## -----
## delta_ik: AUS_1 AUS_10 AUS_11 AUS_12 AUS_13
## 0 -2.472 -1.493 -1.245 -0.6142 ... 2,415 remaining
Now, we extract the revealed productivity of country \(i\) in industry \(k\) as \(z_{i}^k=\exp(\delta_i^k / \theta)\)
theta <- 6.534
z_ik <- data.table(delta_ik = names(fes$delta_ik), fe = fes$delta_ik)
# Save data.table z_ik for python users
fwrite(z_ik, "output/z_ik.csv")
z_ik[, `:=`(
z_ik = exp(fe / theta),
out_country = sub("([[:alpha:]]+)_.*", "\\1", delta_ik),
out_ind = as.integer(sub("[[:alpha:]]+_(.*)", "\\1", delta_ik)))]
#' Merge back into trade_flows as this is our main dataset. Both z_ik and z_jk
trade_flows <- merge(trade_flows, z_ik[, .(out_country, out_ind, z_ik)],
by = c("out_country", "out_ind"),
all.x = TRUE)
trade_flows <- merge(trade_flows, z_ik[, .(out_country, out_ind, z_ik)],
by.x = c("in_country", "out_ind"),
by.y = c("out_country", "out_ind"),
all.x = TRUE)
setnames(trade_flows, c("z_ik.x", "z_ik.y"), c("z_ik", "z_jk"))
#' Write trade_flows to fst file
write_fst(trade_flows, "output/trade_flows.fst")
Python code
import numpy as np
import pandas as pd
wiot = pd.read_csv("output/wiot.csv").drop(columns='Unnamed: 0')
wiod_sea = pd.read_excel("input/WIOD_SEA_Nov16.xlsx",
sheet_name="DATA",
engine="openpyxl")
employed = (wiod_sea
.query("variable == 'EMP'")
.groupby('country', as_index=False)[2014]
.sum()
.rename(columns={'country': 'out_country', 2014: 'employed_i'})
)
row = pd.DataFrame({'out_country': ['ROW'],
'employed_i': [employed['employed_i'].sum()]})
employed = employed.append(row)
# Starting calibration
theta = 6.534
trade_flows = wiot.copy() # Just to keep names the same as in R code
# Helps in cleaner code
ijk_cols = ['out_country', 'in_country', 'out_ind']
ij_cols = ['out_country', 'in_country']
ik_cols = ['out_country', 'out_ind']
jk_cols = ['in_country', 'out_ind']
trade_flows = (trade_flows
.groupby(ijk_cols, as_index=False)[['value']]
.sum()
)
trade_flows['log1_value'] = np.log(trade_flows['value'] + 1)
# Run a fixed-effect regression to extract the revealed productivities
# Could not get it done in Python!
#
# This takes 1 minute and crashes
# eq19 = smf.ols('log1_value ~ C(delta_ij)+C(delta_jk)+C(delta_ik)',
# data=trade_flows).fit()
# # FixedEffect library
# mod = fixedeffect(data_df=trade_flows,
# dependent=['log1_value'],
# exog_x=[],
# category=['delta_ij', 'delta_jk', 'delta_ik'],
# cluster=['delta_ij', 'delta_jk', 'delta_ik']).fit()
# mod.summary()
# fes = getfe(mod, normalize=True) # Error
# # pyhdfe library
# ids = trade_flows[['delta_ij', 'delta_jk', 'delta_ik']]
# variables = trade_flows[['log1_value']]
# algorithm = pyhdfe.create(ids)
# residualized = algorithm.residualize(variables)
# residualized
# reg = sm.OLS(residualized, np.ones(len(residualized))).fit(cov_type='HC1')
# reg.summary() # Dead-end, cannot retrieve FEs
# SOLUTION: make the regression in R and load the fixed-effects back
z_ik = pd.read_csv("output/z_ik.csv")
z_ik['z_ik'] = np.exp(z_ik['fe']/theta)
z_ik['out_country'] = z_ik['delta_ik'].str.split('_').apply(lambda x: x[0])
z_ik['out_ind'] = (z_ik['delta_ik'].str.split('_').apply(lambda x: x[1])
.astype('int64'))
# Merge back into trade_flows as this is our main dataset. Both z_ik and z_jk
trade_flows = trade_flows.merge(z_ik[['out_country', 'out_ind', 'z_ik']],
how='left',
left_on=jk_cols,
right_on=ik_cols)
trade_flows.rename(columns={'z_ik': 'z_jk', 'out_country_x': 'out_country'},
inplace=True)
trade_flows.drop('out_country_y', axis=1, inplace=True)
trade_flows = trade_flows.merge(z_ik[['out_country', 'out_ind', 'z_ik']],
how='left',
on=ik_cols)