The aipwML package computes causal effects using outcome and propensity score functions estimated with linear / logistic regression, regularised regression (fit with glmnet), random forests (fit with ranger), and gradient-boosted trees (fit with xgboost). It is written to be as modular as possible, so that users can mix and match choices for the outcome and propensity score models.
This writeup demonstrates the estimation functions on the LaLonde observational dataset, where the experimental controls are replaced with control units from the PSID and standard estimators are badly biased relative to the experimental benchmark of \(\approx\) $1700.
data(lalonde.psid); df = lalonde.psid
y = 're78'; w = 'treat'
x = setdiff(colnames(df), c(y, w))
# outcome model formula
fo = re78 ~ (age + education + black + hispanic + married + nodegree +
  re74 + re75 + u74 + u75)
# pscore formula
fp = treat ~ (age + education + black + hispanic + married + nodegree +
  re74 + re75 + u74 + u75)
We have data \(\{ y_i, w_i, x_i \}_{i=1}^N \in \mathbb{R} \times \{0, 1\} \times \mathbb{R}^k\). Under selection-on-observables assumptions, we can compute the ATE by imputing the missing potential outcome for each unit, where \(\hat{m}_w(x)\) estimates \(\mathbb{E}[Y \mid W = w, X = x]\):
\[ \hat{\tau}_{\text{reg}}^{\text{ATE}} = \frac{1}{N} \sum_{i=1}^N ( \hat{m}_1 (x_i) - \hat{m}_0 (x_i) ) \]
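As a point of reference, here is a minimal hand-rolled version of this estimator using two arm-specific OLS fits; it is purely illustrative (the names m1_hat and m0_hat are mine, and the ate_reg calls below do not necessarily share these internals).

# illustrative only: regression adjustment with arm-specific OLS fits
m1_hat = lm(fo, data = df[df$treat == 1, ])
m0_hat = lm(fo, data = df[df$treat == 0, ])
mean(predict(m1_hat, newdata = df) - predict(m0_hat, newdata = df))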
regadjusts = c(
  ate_reg('ols', w = w, y = y, df = df, fml = fo),
  ate_reg('lasso', w = w, y = y, df = df, fml = fo),
  ate_reg('ridge', w = w, y = y, df = df, fml = fo),
  ate_reg('rforest', w = w, y = y, df = df, fml = fo),
  ate_reg('xgboost', w = w, y = y, df = df, fml = fo)
)
regadjusts |> round(3)
## [1] -8746 -13739 -11427 -9377 -11623
Pretty bad; every regression-adjustment estimate is far from the experimental benchmark.
\[ \hat{\tau}_{\text{ipw}}^{\text{ATE}} = \frac{1}{N} \sum_{i=1}^N \frac{y_i (w_i - \hat{e}(x_i)) }{\hat{e}(x_i) (1 - \hat{e}(x_i)) } \]
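Again as a reference point, the same estimator by hand with a plain logit propensity score (illustrative only, not the ate_ipw internals; ehat is my name for the fitted pscores):

# illustrative only: IPW with a logistic-regression pscore
ehat = glm(fp, data = df, family = binomial())$fitted.values
mean(df$re78 * (df$treat - ehat) / (ehat * (1 - ehat)))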
ipws = c(
  ate_ipw('logit', w = w, y = y, df = df, fml = fp),
  ate_ipw('lasso', w = w, y = y, df = df, fml = fp),
  ate_ipw('ridge', w = w, y = y, df = df, fml = fp),
  ate_ipw('rforest', w = w, y = y, df = df, fml = fp),
  ate_ipw('xgboost', w = w, y = y, df = df, fml = fp)
)
ipws |> round(3)
## [1] -10454 -16261 -17703 -19566 -19442
Still pretty bad; extreme propensity scores blow up the weights. Now trim extreme pscores.
psr = c(0.05, 0.95)
ipws2 = c(
  ate_ipw('logit', w = w, y = y, df = df, fml = fp, psrange = psr),
  ate_ipw('lasso', w = w, y = y, df = df, fml = fp, psrange = psr),
  ate_ipw('ridge', w = w, y = y, df = df, fml = fp, psrange = psr),
  ate_ipw('rforest', w = w, y = y, df = df, fml = fp, psrange = psr),
  ate_ipw('xgboost', w = w, y = y, df = df, fml = fp, psrange = psr)
)
ipws2 |> round(3)
## [1] -1356 -1927 -1624 328 2971
Better.
\[ \hat{\tau}_{\mathrm{AIPW}}^{\text{ATE}} = \frac{1}{N} \sum_{i=1}^{N} \left[\left( \hat{m}_{1}\left(x_{i}\right)+\frac{w_{i}}{\hat{e}\left(x_{i}\right)} \left(y_{i}-\hat{m}_{1}\left(x_{i}\right)\right)\right) - \left(\hat{m}_{0}\left(x_{i}\right)+\frac{1-w_{i}}{1-\hat{e}\left(x_{i}\right)} \left(y_{i}-\hat{m}_{0}\left(x_{i}\right)\right)\right)\right] \]
We need to choose how to estimate \(e\) and \(m\), so we perform an exhaustive search: for each choice of outcome model, we try every available fitter for the propensity score.
The double-robustness property comes from the fact that getting either one of the two nuisance functions right yields consistency (see, e.g., Stefan Wager's causal inference lecture notes).
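To see why, take the treated-arm term of the AIPW formula and average over \(y_i\) and \(w_i\) given \(x_i\), treating \(\hat{m}_1\) and \(\hat{e}\) as fixed (which is what cross-fitting buys us); this is a standard calculation, sketched here rather than quoted from the notes:

\[ \mathbb{E}\left[ \hat{m}_1(x_i) + \frac{w_i}{\hat{e}(x_i)} \big( y_i - \hat{m}_1(x_i) \big) \right] = \mathbb{E}\big[ m_1(x_i) \big] + \mathbb{E}\left[ \big( m_1(x_i) - \hat{m}_1(x_i) \big) \, \frac{e(x_i) - \hat{e}(x_i)}{\hat{e}(x_i)} \right] \]

The error term is a product of the two nuisance errors, so it vanishes whenever either \(\hat{m}_1 = m_1\) or \(\hat{e} = e\); the control-arm term behaves symmetrically.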
The fit_me function fits the outcome model and the propensity score model (cross-fit by default: each nuisance model is trained on one partition of the data and used to predict on the held-out partition, so as to avoid overfitting bias). Output from this function can then be passed to ate_aipw to estimate the ATE, or subset for further analysis (e.g. for CATEs).
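As a rough picture of what cross-fitting does, here is a minimal 2-fold sketch for the pscore with a plain logit; the names fold and ehat_cf are mine, and this is purely illustrative rather than fit_me's actual implementation.

# illustrative only: 2-fold cross-fitting of the propensity score
set.seed(42)
fold = sample(rep(1:2, length.out = nrow(df)))
ehat_cf = numeric(nrow(df))
for (k in 1:2) {
  # fit on the other fold, predict on fold k
  fit = glm(fp, data = df[fold != k, ], family = binomial())
  ehat_cf[fold == k] = predict(fit, newdata = df[fold == k, ], type = 'response')
}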
ols_mean = c(
  ate_aipw(fit_me(meanfn = 'ols', pscorefn = 'logit',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'ols', pscorefn = 'lasso',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'ols', pscorefn = 'ridge',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'ols', pscorefn = 'rforest',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'ols', pscorefn = 'xgboost',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr)
)
ridge_mean = c(
  ate_aipw(fit_me(meanfn = 'ridge', pscorefn = 'logit',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'ridge', pscorefn = 'lasso',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'ridge', pscorefn = 'ridge',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'ridge', pscorefn = 'rforest',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'ridge', pscorefn = 'xgboost',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr)
)
lasso_mean = c(
  ate_aipw(fit_me(meanfn = 'lasso', pscorefn = 'logit',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'lasso', pscorefn = 'lasso',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'lasso', pscorefn = 'ridge',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'lasso', pscorefn = 'rforest',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'lasso', pscorefn = 'xgboost',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr)
)
rforest_mean = c(
  ate_aipw(fit_me(meanfn = 'rforest', pscorefn = 'logit',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'rforest', pscorefn = 'lasso',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'rforest', pscorefn = 'ridge',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'rforest', pscorefn = 'rforest',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'rforest', pscorefn = 'xgboost',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr)
)
xgboost_mean = c(
  ate_aipw(fit_me(meanfn = 'xgboost', pscorefn = 'logit',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'xgboost', pscorefn = 'lasso',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'xgboost', pscorefn = 'ridge',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'xgboost', pscorefn = 'rforest',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr),
  ate_aipw(fit_me(meanfn = 'xgboost', pscorefn = 'xgboost',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = df), psrange = psr)
)
# stack estimates
aipw_estimates = rbind(ols_mean, lasso_mean, ridge_mean, rforest_mean, xgboost_mean)
colnames(aipw_estimates) = c('PS: logit', 'PS: lasso', 'PS: ridge',
  'PS: rforest', 'PS: xgboost')
rownames(aipw_estimates) = c('Outcome: ols', 'Outcome: lasso', 'Outcome: ridge',
  'Outcome: rforest', 'Outcome: xgboost')
library(kableExtra)
aipw_estimates |> kbl() %>% kable_styling()
|                   | PS: logit | PS: lasso | PS: ridge | PS: rforest | PS: xgboost |
|-------------------|----------:|----------:|----------:|------------:|------------:|
| Outcome: ols      |     56.49 |     471.5 |    -516.0 |       472.4 |        3227 |
| Outcome: lasso    |   -236.96 |    -228.7 |   -1073.6 |       370.0 |        3217 |
| Outcome: ridge    |   -187.52 |     466.4 |    -606.6 |       488.9 |        3248 |
| Outcome: rforest  |   -169.92 |    -148.3 |    -406.7 |       543.4 |        3206 |
| Outcome: xgboost  |   -320.48 |    -280.0 |   -1029.0 |       387.3 |        3249 |
A relatively stable row or column in the above table suggests that we got one of the two nuisance functions ‘right’. In this case, it looks like the GBM pscore function yields stable estimates across all choices of outcome models.
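One quick way to quantify this stability is to look at the spread of estimates within each column and within each row (a sanity check on the table above, not part of the package):

# spread across outcome models for each pscore fitter, and vice versa
apply(aipw_estimates, 2, sd) |> round(1)
apply(aipw_estimates, 1, sd) |> round(1)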
The fit_me function evaluates the fitted \(\hat{m}_0\), \(\hat{m}_1\), and \(\hat{e}\) functions at each observation and returns a dataset that can then be used for manual calculations.
library(data.table)
fit_mod = fit_me(meanfn = 'xgboost', pscorefn = 'xgboost',
  mean_fml = fo, psc_fml = fp, y = y, w = w, df = df)
setDT(fit_mod)
fit_mod |> head()
## y w m0 m1 ehat
## <num> <num> <num> <num> <num>
## 1: 9930.0 1 9017 9929.8 0.9657
## 2: 3595.9 1 11787 3593.9 1.0012
## 3: 24909.5 1 3748 24906.3 0.9887
## 4: 7506.1 1 8709 2685.8 1.0012
## 5: 289.8 1 2021 291.1 0.9965
## 6: 4056.5 1 6751 4060.6 0.9889
fit_mod |> ate_aipw(c(0.1, 0.9)) |> round(3)
## [1] 2893
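Because fit_mod carries \(y\), \(w\), \(\hat{m}_0\), \(\hat{m}_1\), and \(\hat{e}\) for every observation, the AIPW formula above can also be evaluated by hand. The sketch below assumes that trimming simply drops units whose \(\hat{e}\) falls outside the range, which may differ from ate_aipw's exact rule:

# illustrative only: hand-rolled AIPW on a trimmed sample
fit_mod[ehat %between% c(0.1, 0.9),
  mean((m1 + w / ehat * (y - m1)) - (m0 + (1 - w) / (1 - ehat) * (y - m0)))]

Standard errors for the AIPW estimate can be obtained by bootstrapping the whole fit-and-estimate pipeline: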
library(boot); library(MASS)
boot.fn <- function(data, ind){
  d = data[ind, ]
  fit_mod = fit_me(meanfn = 'ols', pscorefn = 'logit',
    mean_fml = fo, psc_fml = fp, y = y, w = w, df = d) |>
    ate_aipw(c(0.1, 0.9))
}
out = boot(df, boot.fn, R = 100)
out |> print()
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = df, statistic = boot.fn, R = 100)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* -65.82 105.8 1039
The ATT estimator subsets to treated units and averages the difference between the realised \(Y\) and the imputed \(Y(0)\), which is easy to compute from our fitted values.
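In the notation above, writing \(N_1\) for the number of treated units, this is

\[ \hat{\tau}^{\text{ATT}} = \frac{1}{N_1} \sum_{i : w_i = 1} \big( y_i - \hat{m}_0(x_i) \big) \]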
fit_mod[w == 1, mean(y - m0)] |> round(3)
## [1] 1796