Theoretical Framework

In this section we will briefly describe the theoretical model as in Costinot, Donaldson, and Komunjer (2012), which is based on Eaton and Kortum (2002). It consists of an economy with multiple-countries, multiple industries, and labor as the solely production factor. Intra-industry heterogeneity is allowed, as in Eaton and Kortum (2002), each good is available in many varieties.

Consider a world economy comprising \(i=1, \ldots, I\) countries and labor as the only factor of production. There are \(k=1, \ldots, K\) industries or goods. Labor is perfectly mobile across industries and immobile across countries. We denote by \(L_{i}\) and \(w_{i}\) the number of workers and the wage in country \(i\), respectively.

Technology. Each good \(k\) may come in an infinite number of varieties indexed by \(\omega \in \Omega\) We denote by \(z_{i}^{k}(\omega)\) the number of units of the \(\omega\) th variety of good \(k\) that can be produced with one unit of labor in country \(i\). Assume the following:

A1. For all countries \(i\), goods \(k\), and their varieties \(\omega\), \(z_{i}^{k}(\omega)\) is a random variable drawn independently for each triplet \((i, k, \omega)\) from a Fréchet distribution \(F_{i}^{k}(\cdot)\) such that \[ F_{i}^{k}(z)=\exp \left[-\left(z / z_{i}^{k}\right)^{-\theta}\right], \quad \text { for all } z \geq 0 \] where \(z_{i}^{k}>0\) and \(\theta>1\). Technological differences across countries and industries only depend on two parameters, \(z_{i}^{k}\) and \(\theta\). We refer to \(z_{i}^{k}\) as the fundamental productivity of country \(i\) in industry \(k\) while \(\theta\) is the intra-industry heterogeneity.

A2. For each unit of good \(k\) shipped from country \(i\) to country \(j\), only \(1 / d_{i j}^{k} \leq 1\) units arrive, with \(d_{i j}^{k}\) such that

\[ \begin{cases} d_{i i}^{k}=1 \text{, and}\\ d_{i l}^{k} \leq d_{i j}^{k} \cdot d_{j l}^{k} \text{, for any third country } l. \end{cases} \]

Market structure. Markets are assumed to be perfectly competitive. Together with constant returns to scale in production, perfect competition implies the following:

A3. In any country \(j\), the price \(p_{j}^{k}(\omega)\) paid by buyers of variety \(\omega\) of good \(k\) is \[ p_{j}^{k}(\omega)=\min _{1 \leq i \leq I}\left[c_{i j}^{k}(\omega)\right] \] where \(c_{i j}^{k}(\omega)=\left(d_{i j}^{k} \cdot w_{i}\right) / z_{i}^{k}(\omega)\) is the cost of producing and delivering one unit of this variety from country \(i\) to country \(j\).

Buyers in country \(j\) are “shopping around the world” for the best price available. In what follows, we let \(c_{i j}^{k}=\left(d_{i j}^{k} \cdot w_{i}\right) / z_{i}^{k}>0\).

Preferences. The upper tier utility function is Cobb-Douglas, while the lower tier is CES1. Accordingly, expenditures are such that:

A4. In any country \(j\), total expenditure on variety \(\omega\) of good \(k\) is \[ x_{j}^{k}(\omega)=\left[p_{j}^{k}(\omega) / p_{j}^{k}\right]^{1-\sigma_{j}^{k}} \cdot \alpha_{j}^{k} w_{j} L_{j} \]

where \(p_j^k\equiv[\sum_{\omega'\in\Omega} p_j^k(\omega)^{1-\sigma_j^k}]^{1/(1-\sigma_j^k)}\) is the price index within industry \(k\) of country \(j\). The country \(j\)’s consumer price index is defined as: \(p_j\equiv\Pi_{k=1}^K(p_j^k)^{\alpha_j^k}\).

Trade balance. Denote by \(x_{i j}^{k} \equiv \sum_{\omega \in \Omega_{i j}^{k}} x_{j}^{k}(\omega)\) the value of total exports from country \(i\) to country \(j\) in industry \(k\), where \(\Omega_{i j}^{k}\) is the set of varieties exported by country \(i\) to country \(j\) in industry \(k\). Similarly, denote by \(\pi_{i j}^{k} \equiv x_{i j}^{k} / \sum_{i^{\prime}=1}^{I} x_{i^{\prime} j}^{k}\) the share of exports from country \(i\) in country \(j\) and industry \(k\). The final assumption is the following:

A5. For any country \(i\), trade is balanced \[ \sum_{j=1}^{I} \sum_{k=1}^{K} \pi_{i j}^{k} \alpha_{j}^{k} \gamma_{j}=\gamma_{i}, \] where \(\gamma_{i} \equiv w_{i} L_{i} / \sum_{i^{\prime}=1}^{I} w_{i^{\prime}} L_{i^{\prime}}\) is the share of country \(i\) in world income.

Calibration

We start the calibration by choosing an appropriate value for \(\theta\). From Table 3 in Costinot, Donaldson, and Komunjer (2012) I selected column (3), \(\theta = 6.534\), where the method of estimation was an IV, to be used throughout this project.

Next, we need the trade flows, \(x_{ij}^{k}\) by source country \(i\), destination \(j\) and industry \(k\). We will read the wiot.fst file from last class and aggregate the flows.

library(data.table)
library(fst)

trade_flows <- read_fst("output/wiot.fst", as.data.table = TRUE)
trade_flows <- trade_flows[, by = c("out_ind", "out_country", "in_country"),
                           .(value = sum(value))]
head(trade_flows)
##    out_ind out_country in_country     value
## 1:       1         AUS        AUS 45599.088
## 2:       2         AUS        AUS  2085.468
## 3:       3         AUS        AUS  1297.064
## 4:       4         AUS        AUS 36547.446
## 5:       5         AUS        AUS 33248.842
## 6:       6         AUS        AUS  1593.366
#' Creates the log(1+) of value
trade_flows[, `:=`(
  log1_value = log(1 + value)
  )]

Estimating revealed productivities

We have to estimate the following equation:

\[ \begin{equation} \ln x_{ij}^k=\delta_{ij}+\delta_{j}^k+\delta_{i}^k+\varepsilon_{ij}^k \end{equation} \]

and then invert the fixed effect of exporter-industry to get the productivity \(z_i^k\). I extract the revealed productivity of country \(i\) in industry \(k\) as \(z_{i}^k=\exp(\delta_i^k / \theta)\), notice we have normalized all USA productivities and industry A01 Crop and animal production, hunting and related service activities in all other countries to 1, similar to what the original authors have done in Table 2.

Let’s create the dummies by ourselves, so they are named when fixef is called

trade_flows[, `:=`(
  delta_ij = paste0(out_country, "_", in_country),
  delta_jk = paste0(in_country, "_", out_ind),
  delta_ik = paste0(out_country, "_", out_ind)
)]

Run the regression of \(\log(1+x)\) (But NOT YOU! You should use the PML method to get full grade) in order to have a complete matrix of productivities. Rearrange the USA to be the first country such that the fixed effects are normalized by it. Fixed effects set to zero are the references (fixest documentation), you should check that your fixed-effects are correctly normalized.

library(fixest)
eq19 <- feols(
  log1_value~1|delta_ij+delta_jk+delta_ik,
  data = trade_flows[order(-out_country)])
# 
fes <- fixef(eq19)
## NOTE: The fixed-effects are not regular, they cannot be straightforwardly interpreted.
summary(fes)
## Fixed_effects coefficients
##                         delta_ij delta_jk delta_ik
## Number of fixed-effects     1936     2420     2420
## Number of references           0       44       98
## Mean                        2.29   -0.814    0.316
## Standard-deviation          1.99     1.58     1.52
## 
## COEFFICIENTS:
##   delta_ij: AUS_AUS AUS_AUT AUS_BEL AUS_BGR AUS_BRA                    
##               9.025   1.366   2.607   1.332   2.155 ... 1,931 remaining
## -----
##   delta_jk: AUS_1 AUS_10 AUS_11 AUS_12 AUS_13                    
##                 0  1.517  2.063  0.665 0.8964 ... 2,415 remaining
## -----
##   delta_ik: AUS_1 AUS_10 AUS_11 AUS_12  AUS_13                    
##                 0 -2.472 -1.493 -1.245 -0.6142 ... 2,415 remaining

Now, we extract the revealed productivity of country \(i\) in industry \(k\) as \(z_{i}^k=\exp(\delta_i^k / \theta)\)

theta <- 6.534
z_ik <- data.table(delta_ik = names(fes$delta_ik), fe = fes$delta_ik)
# Save data.table z_ik for python users
fwrite(z_ik, "output/z_ik.csv")

z_ik[, `:=`(
  z_ik = exp(fe / theta),
  out_country = sub("([[:alpha:]]+)_.*", "\\1", delta_ik),
  out_ind = as.integer(sub("[[:alpha:]]+_(.*)", "\\1", delta_ik)))]
#' Merge back into trade_flows as this is our main dataset. Both z_ik and z_jk
trade_flows <- merge(trade_flows, z_ik[, .(out_country, out_ind, z_ik)],
                     by = c("out_country", "out_ind"),
                     all.x = TRUE)
trade_flows <- merge(trade_flows, z_ik[, .(out_country, out_ind, z_ik)],
                     by.x = c("in_country", "out_ind"),
                     by.y = c("out_country", "out_ind"),
                     all.x = TRUE)
setnames(trade_flows, c("z_ik.x", "z_ik.y"), c("z_ik", "z_jk"))
#' Write trade_flows to fst file
write_fst(trade_flows, "output/trade_flows.fst")

Python code

import numpy as np
import pandas as pd

wiot = pd.read_csv("output/wiot.csv").drop(columns='Unnamed: 0')

wiod_sea = pd.read_excel("input/WIOD_SEA_Nov16.xlsx",
                         sheet_name="DATA",
                         engine="openpyxl")
employed = (wiod_sea
            .query("variable == 'EMP'")
            .groupby('country', as_index=False)[2014]
            .sum()
            .rename(columns={'country': 'out_country', 2014: 'employed_i'})
            )
row = pd.DataFrame({'out_country': ['ROW'],
                    'employed_i': [employed['employed_i'].sum()]})
employed = employed.append(row)

# Starting calibration
theta = 6.534

trade_flows = wiot.copy()  # Just to keep names the same as in R code
# Helps in cleaner code
ijk_cols = ['out_country', 'in_country', 'out_ind']
ij_cols = ['out_country', 'in_country']
ik_cols = ['out_country', 'out_ind']
jk_cols = ['in_country', 'out_ind']

trade_flows = (trade_flows
               .groupby(ijk_cols, as_index=False)[['value']]
               .sum()
               )
trade_flows['log1_value'] = np.log(trade_flows['value'] + 1)

# Run a fixed-effect regression to extract the revealed productivities
# Could not get it done in Python! 
#
# This takes 1 minute and crashes
# eq19 = smf.ols('log1_value ~ C(delta_ij)+C(delta_jk)+C(delta_ik)',
#                data=trade_flows).fit()

# # FixedEffect library
# mod = fixedeffect(data_df=trade_flows,
#                   dependent=['log1_value'],
#                   exog_x=[],
#                   category=['delta_ij', 'delta_jk', 'delta_ik'],
#                   cluster=['delta_ij', 'delta_jk', 'delta_ik']).fit()
# mod.summary()
# fes = getfe(mod, normalize=True) # Error

# # pyhdfe library
# ids = trade_flows[['delta_ij', 'delta_jk', 'delta_ik']]
# variables = trade_flows[['log1_value']]
# algorithm = pyhdfe.create(ids)
# residualized = algorithm.residualize(variables)
# residualized
# reg = sm.OLS(residualized, np.ones(len(residualized))).fit(cov_type='HC1')
# reg.summary() # Dead-end, cannot retrieve FEs

# SOLUTION: make the regression in R and load the fixed-effects back
z_ik = pd.read_csv("output/z_ik.csv")
z_ik['z_ik'] = np.exp(z_ik['fe']/theta)
z_ik['out_country'] = z_ik['delta_ik'].str.split('_').apply(lambda x: x[0])
z_ik['out_ind'] = (z_ik['delta_ik'].str.split('_').apply(lambda x: x[1])
                   .astype('int64'))
# Merge back into trade_flows as this is our main dataset. Both z_ik and z_jk
trade_flows = trade_flows.merge(z_ik[['out_country', 'out_ind', 'z_ik']],
                                how='left',
                                left_on=jk_cols,
                                right_on=ik_cols)
trade_flows.rename(columns={'z_ik': 'z_jk', 'out_country_x': 'out_country'},
                   inplace=True)
trade_flows.drop('out_country_y', axis=1, inplace=True)
trade_flows = trade_flows.merge(z_ik[['out_country', 'out_ind', 'z_ik']],
                                how='left',
                                on=ik_cols)

References

Costinot, Arnaud, Dave Donaldson, and Ivana Komunjer. 2012. “What Goods Do Countries Trade? A Quantitative Exploration of Ricardo’s Ideas.” The Review of Economic Studies 79 (2): 581–608.
Eaton, Jonathan, and Samuel Kortum. 2002. “Technology, Geography, and Trade.” Econometrica 70 (5): 1741–79.

  1. Demand comes from the usual Dixit-Stiglitz aggregator model. The basics of this model can be seen here.↩︎