class: center, middle, inverse, title-slide

.title[
# 12 - Prediction Policy Problems
]
.subtitle[
## ml4econ, HUJI 2025
]
.author[
### Itamar Caspi
]
.date[
### June 29, 2025 (updated: 2025-06-29)
]

---

# Outline

- [Prediction Policy Problems](#prediction)
- [Prediction and Fairness](#fairness)

---
class: title-slide-section-blue, center, middle
name: prediction

# Prediction Policy Problems

---

# Motivation

Consider the following toy example from Kleinberg, Ludwig, Mullainathan, and Obermeyer (AER 2015):

- `\(Y=\{\text{rain},\text{no rain}\}\)`: the outcome
- `\(X\)`: atmospheric conditions
- `\(D\)`: a binary policy decision
- `\(\Pi(Y,D)\)`: payoff (utility)

The change in payoff resulting from a policy decision is given by

`$$\underbrace{\frac{d\Pi(Y,D)}{dD}}_{\text{total effect of the decision}} \;=\; \underbrace{\left.\frac{\partial \Pi}{\partial D}\right|_{Y}}_{\substack{\text{direct / policy term}\\ (\text{needs a \textit{prediction} of } Y)}} \;+\; \underbrace{\frac{\partial \Pi}{\partial Y}\,\frac{\partial Y}{\partial D}}_{\substack{\text{indirect term}\\ (\text{needs the \textit{causal} effect of } D \text{ on } Y)}}$$`

---

# Prediction-Policy Problems

`$$\frac{d\Pi}{dD} = \underbrace{\left.\frac{\partial \Pi}{\partial D}\right|_{Y=\hat Y}}_{\text{direct (prediction)}} + \underbrace{\frac{\partial \Pi}{\partial Y}\, \frac{\partial Y}{\partial D}}_{\text{indirect (causal)}}$$`

**Interpretation**

* The *direct* term asks: *if the outcome `\(Y\)` were frozen at its forecast `\(\hat Y\)`, what is the marginal payoff of changing `\(D\)`?* It requires an accurate **prediction** of `\(Y\)`, but no causal identification.

* The *indirect* term multiplies how much an extra unit of `\(Y\)` is worth, `\(\partial \Pi/\partial Y\)`, by how much `\(D\)` *causes* `\(Y\)` to move, `\(\partial Y/\partial D\)`. It requires **causal** evidence.

---

# Edge Cases vs. Reality

`$$\frac{d\Pi}{dD}=\color{steelblue}{\underbrace{\left.\frac{\partial \Pi}{\partial D}\right|_{Y=\hat Y}}_{\text{direct (prediction)}}}+\color{darkorange}{\underbrace{\frac{\partial \Pi}{\partial Y}\,\frac{\partial Y}{\partial D}}_{\text{indirect (causal)}}}$$`

* **Pure prediction**, `\(\partial Y/\partial D = 0\)`: the decision cannot move the outcome, e.g., the *umbrella choice* (your umbrella won't change the weather).
* **Pure causation**, `\(\left.\partial \Pi/\partial D\right|_{Y} = 0\)`: the action matters *only* via its effect on `\(Y\)`, e.g., a *vaccine dose* (ignoring side effects).
* **Most real decisions**: both terms are nonzero, so we need **good forecasts** *and* **causal evidence**.

---

# Rain dance vs. umbrella

<img src="figs/rain.png" width="70%" style="display: block; margin: auto;" />

---

# Prediction-Policy Problems

In the past two lectures we focused on assessing policy with causal inference and treatment effects. Some decisions, however, depend *only* on prediction (Kleinberg, Ludwig, Mullainathan & Obermeyer 2015):

- Which applicants will be the **most effective teachers?** (hiring and promotion)
- **How long will a worker stay unemployed?** (setting optimal savings)
- **Which restaurants are likeliest to violate health codes?** (targeting inspections)
- **Which youths face the highest risk of re-offending?** (allocating interventions)
- **How credit-worthy is a loan applicant?** (approval decisions)

When the action cannot influence the outcome, causal methods add no value: the planner's job is to forecast as accurately as possible. The next slide sketches the decomposition numerically for such pure-prediction cases.
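---

# The decomposition in numbers

A minimal base-R sketch of the two terms. The payoff function, the rain probability, and the decision value below are toy assumptions chosen for illustration; they are not taken from Kleinberg et al. (2015).

```r
# Toy payoff: highest when the decision D tracks the outcome Y
pi_fun <- function(y, d) -(y - d)^2

# Symmetric numerical derivative
num_deriv <- function(f, x, eps = 1e-6) (f(x + eps) - f(x - eps)) / (2 * eps)

decompose <- function(y_of_d, d) {
  y        <- y_of_d(d)
  direct   <- num_deriv(function(dd) pi_fun(y, dd), d)    # dPi/dD, holding Y fixed
  indirect <- num_deriv(function(yy) pi_fun(yy, d), y) *  # dPi/dY ...
              num_deriv(y_of_d, d)                        # ... times dY/dD
  c(direct = direct, indirect = indirect, total = direct + indirect)
}

decompose(function(d) 0.8,           d = 0.5)  # umbrella: dY/dD = 0, only prediction matters
decompose(function(d) 0.8 - 0.3 * d, d = 0.5)  # the decision also moves Y: both terms are nonzero
```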
---

# Real-world prediction-policy problem: Joint replacement

- 750,000 hip or knee replacements are performed in the United States each year.
- Benefits: substantial gains in mobility and pain relief.
- Costs: about $15,000 per procedure plus a painful, months-long recovery.

> **Working assumption:** The payoff, `\(\Pi\)`, from surgery rises with postoperative longevity, `\(Y\)`.

> **Key question:** Using only information available before the operation, can we forecast which surgeries will be futile and redirect those resources?

---

# Joint replacement DAG

<img src="figs/joint.png" width="30%" style="display: block; margin: auto;" />

_Note_: Kleinberg et al. (2015) abstract from surgical risk.

---

# Data

- A 20% sample of Medicare claims, covering 7.4 million beneficiaries, was analyzed; 98,090 individuals (1.3%) had a claim for joint replacement in 2010.
- Among these patients, 1.4% died within one month of surgery, likely due to complications, and an additional 4.2% died within the subsequent 1–12 months.
- The average mortality rate is approximately 5%, so on average surgeries are not futile.
- However, this average may be misleading. A more relevant question is whether surgeries performed on the predictably highest-risk patients were futile.

---

# Predicting mortality risk

Setup:

- **Outcome:** Mortality within 1–12 months post-surgery
- **Features:** Medicare claims data prior to joint replacement, including patient demographics (age, sex, geography); comorbidities, symptoms, injuries, acute conditions, and their progression over time; and healthcare utilization
- **Sample:** Training set: 65,000 observations; test set: 33,000 observations
- **ML algorithm:** Lasso regression

The playbook:

- Rank test-set beneficiaries by model-predicted mortality-risk percentile
- Assign to each percentile its corresponding share of surgeries
- Demonstrate that the algorithm outperforms physician decision-making

---

# The riskiest people receiving joint replacement

.pull-left[
<img src="figs/table2_hip.png" width="60%" style="display: block; margin: auto;" />

_Source_: Kleinberg et al. (2015).
]

.pull-right[
How to read the table:

- Col 1: Model-predicted mortality percentile (1 = riskiest 1%).
- Col 2: Actual 12-month mortality for that percentile (standard errors in parentheses).
- Col 3: Number of joint-replacement surgeries performed each year in that bin.

Example: the top 1% risk group received 4,905 surgeries, yet 56.2% died within a year, so most of those operations were likely futile.
]

---

# Can an algorithm beat physicians?

**Econometric hurdle: selective-labels bias**

We only observe post-operative mortality for people *who actually received a joint replacement*; the counterfactual outcome for untreated, yet eligible, patients is missing.

**Counterfactual construction**

1. Pull the pool of patients who satisfied Medicare eligibility but **did not** undergo surgery.
2. Working assumption: surgeons schedule operations roughly in order of *increasing* risk (lowest-risk first).
3. Reallocate a fixed number of surgeries from the highest-risk treated patients to the lowest-risk untreated eligibles and compare predicted mortality.

If this simulated swap lowers the expected death rate, the algorithm outperforms physician judgment. The next slide sketches the exercise in code.
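---

# Sketch: risk ranking and the simulated swap

A minimal sketch of the exercise on simulated data, using `glmnet` for the lasso. The data-generating process, sample sizes, and "eligibility" below are made up for illustration; only the logic (rank by predicted risk, then swap the riskiest treated patients for untreated eligibles) follows Kleinberg et al. (2015).

```r
library(glmnet)

set.seed(1)
n <- 10000; p <- 50
x       <- matrix(rnorm(n * p), n, p)               # stand-in for claims-based features
died    <- rbinom(n, 1, plogis(-3 + x[, 1] + 0.5 * x[, 2]))
treated <- runif(n) < 0.5                            # stand-in for "received surgery"

# 1. Fit a lasso on the treated sample, then predict 12-month mortality risk for everyone
fit   <- cv.glmnet(x[treated, ], died[treated], family = "binomial", alpha = 1)
p_hat <- as.numeric(predict(fit, newx = x, type = "response", s = "lambda.min"))

# 2. Sort treated patients into predicted-risk percentiles
treated_idx <- which(treated)
pct <- cut(rank(p_hat[treated_idx]) / length(treated_idx),
           breaks = seq(0, 1, 0.01), labels = FALSE)

# 3. Simulated swap: take surgeries away from the riskiest 10% of treated patients
#    and give them to untreated "eligibles" near the median of predicted risk
riskiest <- treated_idx[pct > 90]
elig     <- which(!treated)
swapped  <- elig[order(abs(p_hat[elig] - median(p_hat[elig])))][seq_along(riskiest)]

mean(p_hat[riskiest])  # predicted mortality of the surgeries we reallocate away
mean(p_hat[swapped])   # predicted mortality of their replacements
```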
---

# So, can the lasso beat physicians?

.pull-left[
<img src="figs/table2_hip_cont_2.png" width="90%" style="display: block; margin: auto;" />

_Source_: Kleinberg et al. (2015).

*Simulation logic:* For each risk percentile, swap surgeries from the highest-risk treated patients with untreated, median-risk eligibles (50th percentile), holding total surgeries fixed.
]

.pull-right[
The table reports:

- Futile procedures averted (col 4): operations reallocated away from patients with at least 50% predicted 12-month mortality.
- Annual savings (col 5): hospital costs avoided (about $15k per surgery).

Key line: replacing the top 10% risk group prevents 13,350 likely futile operations and saves roughly $200 million per year, evidence that the algorithm screens better than current physician judgment.
]

???

Note that the numbers on this slide, taken from Susan Athey's presentation on "Policy Prediction Problems", are slightly different from those that appear in Kleinberg et al. (2015).

---

# What can still go wrong?

.pull-left[
**Econometric concern #2: omitted-payoff bias**

Physicians might observe benefits, such as pain relief, that are not captured in our data.

**Testing the hypothesis**

Use proxies for post-operative pain relief:

- Physical therapy (PT) and joint injection claims
- Follow-up visits for osteoarthritis

Empirically, these proxies are flat across predicted-risk percentiles. High-mortality patients do not appear to experience greater pain relief, so reallocating their surgeries is unlikely to reduce welfare.
]

.pull-right[
<img src="figs/table2_hip_cont.png" width="90%" style="display: block; margin: auto;" />
]

---

# Leveling the Playing Field

To fairly compare physicians (or any human experts) with a machine learning model, both must operate under identical inputs and incentives:

- **Same information**: access to the full pre-decision dataset
- **Same objective**: a shared payoff or loss function
- **Same constraints**: identical budgetary, temporal, and policy limitations

> Only after aligning these conditions can we meaningfully ask: who allocates resources more effectively?

---

# Key take-aways

- A more accurate model does not automatically lead to better decisions. Alignment with payoffs and real-world constraints is essential.
- Selective-labels and omitted-payoff bias can obscure algorithmic gains, or generate illusory ones.
- Robust policy design requires marrying machine learning with social science: incentives, fairness, and welfare metrics must guide implementation.

---
class: title-slide-section-blue, center, middle
name: fairness

# Prediction and Fairness

---

# Blind Algorithms

Algorithmic Fairness (Kleinberg, Ludwig, Mullainathan, and Rambachan, AER 2018): Can we make algorithms fairer by ignoring variables, such as race, age, or sex, that may induce bias?

Short answer: Not necessarily.

---

# The basic setup

The context: student admission to college.

Data: `\(\{Y_i, X_i, R_i\}_{i=1}^N\)`, where

- `\(Y_i\)` is performance
- `\(X_i\)` is a set of features
- `\(R_i\)` is a binary race indicator, with `\(R_i=1\)` for individuals who belong to the minority group and `\(R_i=0\)` otherwise.

Predictors:

- __"Aware"__: `\(\hat{f}(X_i,R_i)\)`
- __"Blind"__: `\(\hat{f}(X_i)\)`
- __"Orthogonalized"__: `\(\hat{f}(\widetilde{X}_i)\)`, where `\(\widetilde{X}_i \perp R_i\)`.

The next slide sketches how each predictor can be constructed.
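---

# Sketch: constructing the three predictors

A minimal sketch on simulated data. The data-generating process and the use of OLS are illustrative assumptions; the point is only what each predictor feeds into `\(\hat{f}\)`.

```r
set.seed(1)
n <- 5000
r <- rbinom(n, 1, 0.2)                  # minority-group indicator
x <- rnorm(n, mean = 0.5 * r)           # a feature whose distribution differs by group
y <- 1 + 0.8 * x - 0.3 * r + rnorm(n)   # performance

# "Aware": use the features and the group indicator
f_aware <- lm(y ~ x + r)

# "Blind": drop the group indicator (x still carries information about r)
f_blind <- lm(y ~ x)

# "Orthogonalized": residualize the features on r before fitting,
# so the inputs are uncorrelated with group membership
x_tilde <- resid(lm(x ~ r))
f_orth  <- lm(y ~ x_tilde)

# Predicted performance under each rule
preds <- data.frame(aware = fitted(f_aware),
                    blind = fitted(f_blind),
                    orth  = fitted(f_orth))
head(preds)
```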
---

# Definitions

Let `\(S\)` denote the set of admitted students and `\(\phi(S)\)` denote a function that depends only on the predicted performance, measured by `\(\hat{f}\)`, of the students in `\(S\)`.

__Compatibility condition:__ If `\(S\)` and `\(S^\prime\)` are two sets of students of the same size, sorted in descending order of predicted performance `\(\widehat{f}(X,R)\)`, and the predicted performance of the `\(i^\text{th}\)` student in `\(S\)` is at least as large as the predicted performance of the `\(i^\text{th}\)` student in `\(S^\prime\)` for all `\(i\)`, then `\(\phi(S)\geq\phi(S^\prime)\)`.

Intuition: if every member of class `\(S\)` is no worse on paper than the counterpart in `\(S^\prime\)`, any planner who claims to care only about student performance shouldn't prefer `\(S^\prime\)`.

- The _efficient_ planner maximizes `\(\phi(S)\)`, where `\(\phi(S)\)` is compatible with `\(\hat{f}\)`.
- The _equitable_ planner maximizes `\(\phi(S)+\gamma(S)\)`, where `\(\phi(S)\)` is compatible with `\(\hat{f}\)` and `\(\gamma(S)\)` is monotonically increasing in the number of students in `\(S\)` who have `\(R=1\)`.

---

# Main result: Keep `\(R\)` in

Kleinberg et al.'s (2018) main result:

> THEOREM 1: _For some choice of `\(K_0\)` and `\(K_1\)` with `\(K_0 + K_1 = K\)`, the equitable planner's problem can be optimized by choosing the `\(K_0\)` applicants in the `\(R = 0\)` group with the highest `\(\widehat{f}(X,R)\)`, and the `\(K_1\)` applicants in the `\(R=1\)` group with the highest `\(\widehat{f}(X,R)\)`._

(See Kleinberg et al., 2018, for a sketch of the proof.)

**In words:** If you want both quality and a fair share of minority students, first decide how many seats each group should get, then simply admit the highest-scoring applicants within each group, using the race-aware score.

---

# Intuition

- A good ranking of applicants is desirable for both types of planners.
- Equitable planners still care about the ranking _within_ groups.
- Achieving a more balanced acceptance rate is a _post_-prediction step: it can be adjusted by changing the group-wise threshold.

---

# Illustration of the result

Say that we have `\(10\)` open slots, `\(100\)` applicants from the majority group `\((R=0)\)`, and `\(20\)` from the minority group `\((R=1)\)`. In addition, assume that the minority group's share of admitted students is set to `\(30\%\)`.

An equitable planner should:

1. Rank candidates within each group according to `\(\widehat{f}(X_i,R_i)\)`.
2. Accept the top `\(7\)` from the `\(R=0\)` group and the top `\(3\)` from the `\(R=1\)` group.

The next slide sketches this rule in code.
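---

# Sketch: group-wise top-K admission

A minimal sketch of the rule in Theorem 1 on simulated scores. The score distributions and group sizes are made up; `k0` and `k1` encode the planner's chosen seats per group (7 and 3 in the illustration above).

```r
set.seed(1)
# Race-aware predicted performance for 100 majority and 20 minority applicants
score_r0 <- rnorm(100)
score_r1 <- rnorm(20, mean = -0.2)

k0 <- 7  # seats allocated to the R = 0 group
k1 <- 3  # seats allocated to the R = 1 group

# Rank within each group and admit the top-k of each
admit_r0 <- order(score_r0, decreasing = TRUE)[1:k0]
admit_r1 <- order(score_r1, decreasing = TRUE)[1:k1]

admitted <- rbind(
  data.frame(group = "R=0", id = admit_r0, score = score_r0[admit_r0]),
  data.frame(group = "R=1", id = admit_r1, score = score_r1[admit_r1])
)
admitted[order(-admitted$score), ]  # the admitted class of K = 10
```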
---

# Empirical application

.pull-left[
__DATA:__ A panel following a representative sample of students who entered eighth grade in the fall of 1988, with follow-ups in 1990, 1992, 1994, and again in their mid-20s.

__OUTCOME:__ GPA `\(\geq\)` 2.75.

__FEATURES:__ High school grades, course-taking patterns, extracurricular activities, standardized test scores, etc.

__RACE:__ White `\((N_0=4,274)\)` and Black `\((N_1=469)\)`.

__PREDICTORS:__ OLS (random forest for robustness)
]

.pull-right[
__RESULT:__ The "aware" predictor dominates for both the efficient and the equitable planner.

<img src="figs/fig1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Sources of disagreement

.pull-left[
- On the right: the distribution of Black students in the sample across predicted-outcome deciles according to the race-blind and race-aware predictors.
- How to read this: if the race-blind and race-aware predictors agreed, all values would lie on the main diagonal; disagreement shows up as nonzero off-diagonal entries.
- Bottom line (again): adding race to the equation improves within-group ranking.
]

.pull-right[
<img src="figs/fig2.png" width="90%" style="display: block; margin: auto;" />
]

---

# Main takeaways

- Turning algorithms blind might actually do harm.
- What matters is the ranking within groups.
- Caveat: this is a very specific setup and source of bias.

---
class: .title-slide-final, center, inverse, middle

# `slides |> end()`

[<i class="fa fa-github"></i> Source code](https://github.com/ml4econ/lecture-notes-2025/tree/master/12-prediction-policy)

---

# Selected references

Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine Bias. _ProPublica_, May 23. [https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)

Athey, S. (2018). The Impact of Machine Learning on Economics.

Athey, S., & Wager, S. (2018). Efficient Policy Learning.

Kleinberg, J., Ludwig, J., Mullainathan, S., & Obermeyer, Z. (2015). Prediction Policy Problems. _American Economic Review: Papers & Proceedings_, 105(5), 491–495.

Kleinberg, J., Ludwig, J., Mullainathan, S., & Rambachan, A. (2018). Algorithmic Fairness. _American Economic Review: Papers & Proceedings_, 108, 22–27.

---

# Selected references

Kleinberg, J., Lakkaraju, H., Leskovec, J., Ludwig, J., & Mullainathan, S. (2018). Human Decisions and Machine Predictions. _Quarterly Journal of Economics_, 133(1), 237–293.

Kleinberg, J., Mullainathan, S., & Raghavan, M. (2017). Inherent Trade-Offs in the Fair Determination of Risk Scores. _Proceedings of the 8th Conference on Innovation in Theoretical Computer Science_, 43, 1–23.

Mullainathan, S., & Spiess, J. (2017). Machine Learning: An Applied Econometric Approach. _Journal of Economic Perspectives_, 31(2), 87–106.