Manski Bounds

Many times it is hard for a researcher to motivate a specific functional form (i.e. linear with constant slope) of treatment response. Sometimes all we have is the shape of response functions, like monotonicity. The assumption of monotone treatment response (MTR) reads as, Charles F. Manski and Pepper (2000):

MTR: For all units \(j \in J\) and all treatment pairs \((w_2, w_1)\),

\[\begin{equation} w_2\geq w_1\implies Y_j(w_2)\geq Y_j(w_1). \end{equation}\]

The MTR assumption permits each unit to have a different response, although, it requires monotonicity of all responses. This assumption may be combined with the monotone treatment selection (MTS) to provide sharp bounds on treatment effect.

MTS: For each \(w \in W\),

\[ \begin{equation} u_2\geq u_1\implies E[Y_j(w)|z=u_2]\geq E[Y_j(w)|z=u_1]. \end{equation} \]

where \(w\) denotes the potential treatment status, \(Y_j\) is the potential outcome for person \(j \in J\), the realized treatment status is denoted by \(z\in W\).

Consider the variation of wages with schooling example. The MTS assumption asserts that persons who select higher levels of schooling have weakly higher mean wage functions than do those who select lower levels of schooling. The MTR interpretation is that each person’s wage function is weakly increasing in conjectured years of schooling. Those interpretations are not mutually exclusive.

Upper bound

First we will derive results for binary treatment, following the MTR and MTS assumptions from equations (1) and (2) to get an intuition on how those bounds can be derived. When we impose the MTR assumption in (1), we get a zero lower bound on ATE, while the MTS assumption yields an upper bound that is equal to the naive difference of means between the two groups. Putting the two assumptions together we have that:

\[\begin{equation} 0\leq\text{ATE}\leq E[Y_j|z=w_2]-E[Y_j|z=w_1] \end{equation}\]

We begin with the proof that MTR assumption implies the zero upper bound.

Proposition: Given the MTR assumption on (1) the lower bound – LB – on the average treatment effect – ATE – is zero.

Proof: Suppose the treatment is binary with two levels, \(w_2\geq w_1\). The probability of being assigned to \(w_2\) is \(\pi\). Then the MTR assumption implies that \(Y_j(w_2)\geq Y_j(w_1)\), and we have the following two inequalities,

\(E[Y_j(w_2)|z=w_1]\geq E[Y_j|z=w_1]\) and \(E[Y_j|z=w_2]\geq E[Y_j(w_1)|z=w_2]\)

The ATE has the following observational-counterfactual decomposition,

\(E[Y_j(w_2)-Y_j(w1)]=\pi E[Y_j|w_2]-(1-\pi)E[Y_j|w_1]-\pi E[Y_j(w_1)|w_2]+(1-\pi)E[Y_j(w_2)|w_1]\)

Making use of the inequalities in the observational-counterfactual decomposition to obtain an upper bound for the ATE we have that:

\[\begin{align*} \text{ATE}=E[Y_j(w_2)-Y_j(w1)]&\geq\pi E[Y_j|w_2]-(1-\pi)E[Y_j|w_1]-\pi E[Y_j|w_2]+(1-\pi)E[Y_j|w_1]\\ &\geq 0 \end{align*}\]

\[\tag*{$\blacksquare$}\]

Now we prove that the MTS assumption implies an upper bound on the ATE equal to \(E[Y_j|w_2]-E[Y_j|w_1]\).

Proposition: Given the MTS assumption, the upper bound – UB – on the average treatment effect is equal to the difference in means between the treatment groups, \(\text{ATE}\leq E[Y_j|w_2]-E[Y_j|w_1]\).

Proof: Suppose once again the treatment is binary with two levels, \(w_2\geq w_1\). The probability of being assigned to \(w_2\) is \(\pi\). Then the MTS assumption in (2) implies that potential outcomes for treatment group at \(z=w_2\) are higher than in group with \(z=w_1\), and we have the two inequalities,

\(E[Y_j(w_1)|w_2]\geq E[Y_j|w_1]\) and \(E[Y_j|w_2]\geq E[Y_j(w_2)|w_1]\)

by the observational-counterfactual decomposition the ATE has a lower bound given by:

\[\begin{align*} \text{ATE}=E[Y_j(w_2)-Y_j(w1)]&=\pi E[Y_j|w_2]-(1-\pi)E[Y_j|w_1]+(1-\pi) E[Y_j(w_2)|w_1] -\pi E[Y_j(w_1)|w_2]\\ &\leq E[Y_j|w_2]-E[Y_j|w_1] \end{align*}\]

\[\tag*{$\blacksquare$}\]

Therefore, for a multi-level treatment we can expect similar bounds to occur, the lower bound will be zero, but the upper bound must take into account all combinations of treatment levels. Therefore, the upper bound can be computed by eq. 9.19 from Charles F. Manski (2009).

\[\begin{align} \Delta(s, t)\leq &\sum_{t^\prime>t}E(y|z=t^\prime)P(z=t^\prime)+E(y|z=t)P(z\leq t)\nonumber\\ &-\sum_{s^\prime < s}E(y|z=s^\prime)P(z=s^\prime)-E(y|z=s)P(z\geq s) \end{align}\]

for \(s, t \in W\) and \(t>s\).

An important fact is that the combined assumption is refutable. When both assumptions hold, the observable (sample counterpart) value \(E[Y_j|z=t]\) is weakly increasing in \(t\).

Causal Machine Learning

Link to the presentation: Causal Forests

References

Manski, Charles F. 2009. Identification for Prediction and Decision. Harvard University Press.
Manski, Charles F., and John V. Pepper. 2000. “Monotone Instrumental Variables: With an Application to the Returns to Schooling.” Econometrica 68 (4): 997–1010. http://www.jstor.org/stable/2999533.