class: inverse, center, middle

<br>
<br>
<br>

<h1 class="title-own">Difference-in-Differences<br/>using Mixed-Integer Programming Matching Approach</h1>

<br>
<br>

.small[Magdalena Bennett <br>*McCombs School of Business, The University of Texas at Austin* ]

<br>

.small[AEFP 50th Conference, Washington DC<br>March 13th, 2025]

---

# Diff-in-Diff as an identification strategy

.center[  ]

---

# Parallel trend assumption (PTA)

.center[  ]

---

# Estimate Average Treatment Effect on the Treated (ATT)

.center[  ]

---

# But what if the PTA doesn't hold?

<br>
<br>

.pull-left[
.center[  ]
]

---

# But what if the PTA doesn't hold?

.box-6trans[We can potentially remove [part of] the bias by matching on <i>X<sup>s</sup><sub>it</sub>=X<sub>i</sub></i>]

.pull-left[
.center[  ]
]

.pull-right[
.center[  ]
]

---

# This paper

- Identify contexts in which matching can recover causal estimates under **.darkorange[certain violations of the parallel trend assumption]**.

- Overall <u>bias reduction</u> and increase in <u>robustness for sensitivity analysis</u>.

- Use **.darkorange[mixed-integer programming (MIP) matching]** to balance covariates directly.

--

<br/>
<br/>

.pull-left[
.box-6trans[**Simulations:**<br/>Different DGP scenarios]
]

.pull-right[
.box-6trans[**Application:**<br/>School segregation & vouchers]
]

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
Let's set up the problem
<br>
<br>
]

---

# DD Setup

- Let `\(Y_{it}(z)\)` be the potential outcome for unit `\(i\)` in period `\(t\)` under treatment `\(z\)`.

- Intervention implemented in `\(T_0\)` `\(\rightarrow\)` No units are treated in `\(t\leq T_0\)`

--

- Difference-in-Differences (DD) focuses on ATT for `\(t>T_0\)`:

`$$ATT(t) = E[Y_{it}(1) - Y_{it}(0)|Z=1]$$`

---

# DD Setup

- Let `\(Y_{it}(z)\)` be the potential outcome for unit `\(i\)` in period `\(t\)` under treatment `\(z\)`.
- Intervention implemented in `\(T_0\)` `\(\rightarrow\)` No units are treated in `\(t\leq T_0\)`

- Difference-in-Differences (DD) focuses on ATT for `\(t>T_0\)`:

`$$ATT(t) = E[Y_{it}(1) - Y_{it}(0)|Z=1]$$`

- Under the PTA:

`$$\begin{align} \hat{\tau}^{DD} = &\color{#FFC857}{\overbrace{\color{black}{E[Y_{i1}|Z=1] - E[Y_{i1}|Z=0]}}^{\color{#FFC857}{\Delta_{post}}}} - \\
&\color{#CBB3BF}{\underbrace{\color{black}{(E[Y_{i0}|Z=1] - E[Y_{i0}|Z=0])}}_{\color{#CBB3BF}{\Delta_{pre}}}} \end{align}$$`

---

# Bias in a DD setting

Bias can be introduced to DD in different ways:

--

1) **.darkorange[Time-invariant covariates with time-varying effects]**: *Obs. Bias*

  - e.g. Effect of gender on salaries.

--

2) **.darkorange[Differential time-varying effects]**: *Obs. Diff. Bias*

  - e.g. The effect of race on salaries evolves differently over time by group.

--

3) **.darkorange[Observed or unobserved time-varying covariates]**: *Unobs. Bias*

  - e.g. Test scores

---

# If the PTA holds...

`$$\begin{array}{rcc} \overbrace{(\bar{\gamma}_1(X^1,t') - \bar{\gamma}_1(X^0,t')) - (\bar{\gamma}_1(X^1,t) - \bar{\gamma}_1(X^0,t))}^{Obs. Bias} +& \\
\underbrace{(\bar{\gamma}_2(X^1,t') - \bar{\gamma}_2(X^1,t))}_{Obs. Diff. Bias} + \underbrace{(\lambda_{t'1}-\lambda_{t'0}) - (\lambda_{t1} - \lambda_{t0})}_{Unobs. Bias}&= 0 \\
\end{array}$$`

--

.small[
**.darkorange[One of the two]** conditions needs to hold:

1) No effect or constant effect of `\(X\)` on `\(Y\)` over time: `\(\mathbb{E}[\gamma_1(X,t)] = \mathbb{E}[\gamma_1(X)]\)`

2) Equal distribution of observed covariates between groups: `\(X_i|Z=1 \overset{d}{=} X_i|Z=0\)`
]

--

.small[
**.darkorange[in addition to]**:

3) No differential time effect of `\(X\)` on `\(Y\)` by treatment group: `\(\mathbb{E}[\gamma_2(X,t)] = 0\)`

4) No unobserved time-varying effects: `\(\lambda_{t1} = \lambda_{t0}\)`
]

--

<br>

.small[
.pull-left[
.box-6trans[**Cond. 2** can hold through **matching**]
]]

--

.small[
.pull-right[
.box-6trans[**Cond. 
3 and 4** can be tested with **sensitivity analysis**]
]
]

---

# Sensitivity analysis for Diff-in-Diff

- In an event study `\(\rightarrow\)` null effects prior to the intervention:

.center[  ]

---

# Honest approach to test pretrends

- One main issue with the previous test `\(\rightarrow\)` **.darkorange[Underpowered]**

--

- Rambachan & Roth (2023) propose **.darkorange[sensitivity bounds]** to allow for pre-trend violations:

  - E.g. Violations in the post-intervention period can be _at most_ `\(M\)` times the max violation in the pre-intervention period.

--

.center[ ]

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
Simulations
<br>
<br>
]

---

# Different scenarios

For linear and quadratic functions:

<br>

.box-1trans[S1: No interaction between X and t]

.box-2trans[S2: Equal interaction between X and t]

.box-3trans[S3: Differential interaction between X and t]

.box-4trans[S4: S3 + Bias cancellation]

--

<br>

- For all scenarios, differential distribution of covariates `\(X\)` between groups

---

# Parameters:

.center[
Parameter                            | Value
-------------------------------------|----------------------------------------------
Number of obs (N)                    | 1,000
`Pr(Z=1)`                            | 0.5
Time periods (T)                     | 8
Last pre-intervention period (T_0)   | 4
Matching PS                          | Nearest neighbor (using calipers)
MIP Matching tolerance               | 0.01 SD
Number of simulations                | 1,000
]

- Estimate compared to sample ATT (_can be different for matching_)

---

# S1 - No interaction between X and t

.center[ ]

---

# S2 - Equal interaction between X and t by treatment

.center[ ]

---

# S3 - Differential interaction between X and t by treatment

.center[ ]

---

# Why is this bias reduction important?

- Example of S2 (Quadratic) with no true effect:

.center[  ]

---

# Why is this bias reduction important?

- Even under modest bias, we would incorrectly reject the null 20% of the time.

.center[ ]

---

# Why is this bias reduction important?

- Sensitivity analysis results are skewed by the magnitude of the bias.
.center[ ]

---

# S4: Bias cancellation

.center[  ]

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
Application
<br>
<br>
]

---

# Preferential Voucher Scheme in Chile

- Universal **.darkorange[flat voucher]** scheme `\(\stackrel{\mathbf{2008}}{\mathbf{\longrightarrow}}\)` Universal + **.darkorange[preferential voucher]** scheme

- Preferential voucher scheme:
  - Targeted to the bottom 40% of vulnerable students
  - Additional 50% of voucher per student
  - Additional money for concentration of SEP students.

--

<br/>

.pull-left[
.center[
.box-6trans[**Students:**<br/>- Verify SEP status<br/>- Attend a SEP school]
]
]

.pull-right[
.center[
.box-6trans[**Schools:**<br/>- Opt into the policy<br/>- No selection, no fees<br/>- Resources ~ performance]
]
]

---

# Before matching: Household income

.pull-left[  ]

.pull-right[  ]

---

# Before matching: Average SIMCE

.pull-left[  ]

.pull-right[  ]

---

# Matching + DD

- **.darkorange[Prior to matching]**: No parallel pre-trend

- **.darkorange[Different types of schools]**:
  - Schools that charge high co-payment fees.
  - Schools with a low number of SEP students enrolled.

- **.darkorange[MIP Matching]** using constant or "sticky" covariates:
  - Mean balance (0.025 SD): Enrollment, average yearly subsidy, number of voucher schools in county, charges add-on fees
  - Exact balance: Geographic province

---

# Groups are balanced in specific characteristics

.center[ ]

---

# Matching in 16 out of 53 provinces

.center[ ]

---

# After matching: Household income

.pull-left[  ]

.pull-right[  ]

---

# After matching: Average SIMCE

.pull-left[  ]

.pull-right[  ]

---

# Results

- **.darkorange[Matched schools]**:
  - More vulnerable and lower test scores than the population mean.

--

- **.darkorange[9pp increase in the income gap]** between SEP and non-SEP schools in matched DD:
  - SEP schools attracted even more vulnerable students.
  - Non-SEP schools increased their average family income.
--

- **.darkorange[No evidence of an increase in SIMCE scores]**:
  - Could be a longer-term outcome.

--

- Findings on segregation are **.darkorange[moderately robust to hidden bias]** (Keele et al., 2019):
  - `\(\Gamma_c = 1.76\)` `\(\rightarrow\)` Unobserved confounder would have to change the probability of assignment from 50% vs 50% to 32.7% vs 67.3%.
  - Allows up to 70% of the maximum deviation in the pre-intervention period (*M = 0.7*) vs 50% without matching (Rambachan & Roth, 2023)

---

# Potential reasons?

- The probability of becoming SEP in 2009 **.darkorange[jumps discontinuously at 60%]** of SEP student concentration in 2008 (4.7 pp; SE = 0.024)

.center[ ]

---
background-position: 50% 50%
class: left, bottom, inverse

.big[
Let's wrap it up
<br>
<br>
]

---

# Conclusions and Next Steps

.pull-left[
- Matching can be an important tool to address **.darkorange[violations of the PTA]**.

- **.darkorange[Bias reduction]** is very important for sensitivity analysis.

- **.darkorange[Serial correlation]** also plays an important role: Don't match on random noise.

- Next steps: Partial identification using time-varying covariates]

.pull-right[
.center[ ]
]

---
class: inverse, center, middle

<br>
<br>
<br>

<h1 class="title-own">Difference-in-Differences<br/>using Mixed-Integer Programming Matching Approach</h1>

<br>
<br>

.small[Magdalena Bennett <br>*McCombs School of Business, The University of Texas at Austin* ]

<br>

.small[AEFP 50th Conference, Washington DC<br>March 13th, 2025]

---

# Honest approach to test pretrends

- One drawback of the previous method is that it can **.darkorange[overstate]** (or understate) the robustness of findings if the point estimate is biased.

- Honest CIs depend on the **.darkorange[magnitude of the point estimate]** as well as the **.darkorange[pre-trend violations]**.

--

<br>

- Matching can **.darkorange[reduce the overall bias]** of the point estimate

--

.center[  ]

---

# How do we match?

- Match on covariates or outcomes? 
Levels or trends?

- Propensity score matching? Optimal matching? etc.

--

This paper:

- **.darkorange[Match on time-invariant covariates]** that could make groups behave differently.

- Use distribution of covariates to match on a template.

- Use **.darkorange[Mixed-Integer Programming (MIP) Matching]** .small[(Zubizarreta, 2015; Bennett, Zubizarreta, & Vielma, 2020)]:
  - Balances covariates directly
  - Yields the largest matched sample under balancing constraints (cardinality matching)
  - Works fast with large samples

---

# Data Generating Processes

.center[  ]

---

# SEP adoption over time

.center[  ]
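
---

# Sketch: DD bias with and without matching

.small[
The 2x2 DD estimator and the role of matching on a time-invariant covariate can be illustrated with a minimal Python sketch. This is not the paper's simulation code: the DGP, the function names (`simulate`, `dd`, `exact_match`), and all parameter values are hypothetical, and exact matching on a single binary covariate is used as a simple stand-in for MIP matching.
]

```python
import random
from statistics import mean


def simulate(n=20000, seed=1):
    """Hypothetical two-period DGP with a true treatment effect of zero."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        # Covariate distribution differs by group: Pr(x=1) is 0.7 (treated) vs 0.3 (control).
        x = int(rng.random() < (0.7 if z else 0.3))
        # Time-invariant x with a time-varying effect: 1 in the pre-period, 2 in the post-period.
        y_pre = 1.0 * x + rng.gauss(0, 1)
        y_post = 2.0 * x + 0.5 + rng.gauss(0, 1)
        units.append((z, x, y_pre, y_post))
    return units


def dd(units):
    """2x2 DD: post-period gap between groups minus pre-period gap."""
    treated = [u for u in units if u[0] == 1]
    control = [u for u in units if u[0] == 0]
    delta_post = mean(u[3] for u in treated) - mean(u[3] for u in control)
    delta_pre = mean(u[2] for u in treated) - mean(u[2] for u in control)
    return delta_post - delta_pre


def exact_match(units, seed=2):
    """Keep equal numbers of treated and control units within each x stratum."""
    rng = random.Random(seed)
    matched = []
    for x in (0, 1):
        treated = [u for u in units if u[0] == 1 and u[1] == x]
        control = [u for u in units if u[0] == 0 and u[1] == x]
        k = min(len(treated), len(control))
        matched += rng.sample(treated, k) + rng.sample(control, k)
    return matched


units = simulate()
raw = dd(units)
matched = dd(exact_match(units))
print(f"raw DD: {raw:.3f} | matched DD: {matched:.3f} | true effect: 0")
```

.small[
Under these assumed values, the unmatched DD is biased by roughly (2 - 1) x (0.7 - 0.3) = 0.4, because the effect of `x` grows over time and `x` is distributed differently across groups. After matching, both groups share the same distribution of `x`, so the estimate is centered on the true effect of zero.
]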