class: center, middle, inverse, title-slide .title[ # Causality ] .subtitle[ ## EC 421, Set 10 ] .author[ ### Edward Rubin ] --- class: inverse, middle # Prologue --- name: schedule # Schedule ## Last Time Autocorrelation and nonstationarity ## Today Causality --- layout: true # Causality --- class: inverse, middle name: intro --- name: intro ## Intro Most tasks in econometrics boil down to one of two goals: $$ `\begin{align} y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u \end{align}` $$ -- 1. .hi-purple[Prediction:] Accurately and dependably .purple[predict/forecast] `\(\color{#6A5ACD}{y}\)` using some set of explanatory variables—doesn't need to be `\(x_1\)` through `\(x_k\)`. Focuses on `\(\color{#6A5ACD}{\hat{y}}\)`. `\(\beta_j\)` doesn't really matter. -- 1. .hi[Causal estimation:].super[.pink[†]] Estimate the actual data-generating process—learning about the true, population model that explains .pink[how] `\(\color{#e64173}{y}\)` .pink[changes when we change] `\(\color{#e64173}{x_j}\)`—focuses on `\(\color{#e64173}{\beta_j}\)`. Accuracy of `\(\hat{y}\)` is not important. .footnote[ .pink[†] Often called *causal identification*. ] -- For the rest of the term, we will focus on .hi[causally estimating] `\(\color{#e64173}{\beta_j}\)`. --- name: challenges ## The challenges As you saw in the data-analysis exercise, determining and estimating the true model can be pretty difficult—both .purple[practically] and .pink[econometrically]. -- .pull-left[.purple[ **Practical challenges** - Which variables? - Which functional form(s)? - Do data exist? How much? - Is the sample representative? ]] -- .pull-right[.pink[ **Econometric challenges** - Omitted-variable bias - Reverse causality - Measurement error - How precise can/must we be? ]] -- Many of these challenges relate to .hi-slate[exogeneity], _i.e._, `\(\color{#314f4f}{\mathop{\boldsymbol{E}}\left[ u_i | X \right] = 0}\)`. -- <br>Causality requires us to .hi-slate[hold all else constant] (*ceteris paribus*). --- ## It's complicated Occasionally, .hi[*causal*] relationships are simple and easily understood, _e.g._, -- - What .pink[caused] the forest fire? - .pink[How] did this baby get here? -- Generally, .hi[*causal*] relationships are complex and challenging to untangle, _e.g._, -- - What .pink[causes] some countries to grow and others to decline? - What .pink[caused] the Capitol riot? - Did lax regulation .pink[cause] Texas's recent energy problems? - .pink[How] does the number of police officers affect crime? - What is the .pink[effect] of better air quality on test scores? - Do longer prison sentences .pink[decrease] crime? - How did cannabis legalization .pink[affect] mental health/opioid addiction? --- ## Correlation ≠ Causation You've likely heard the saying > Correlation is not causation. The saying is just pointing out that correlation can arise even when exogeneity is violated. -- Although correlation is not causation, .hi[causation *requires* correlation]. -- .hi-slate[New saying:] > Correlation plus exogeneity is causation. --- layout: false class: clear, middle Let's work through a few examples. --- layout: true # Causation --- name: fertilizer ## Example: The causal effect of fertilizer.super[.pink[†]] .footnote[ .pink[†] Many of the early statistical and econometric studies involved agricultural field trials. ] Suppose we want to know the causal effect of fertilizer on corn yield. -- **Q:** Could we simply regress yield on fertilizer? -- <br>**A:** Probably not (if we want the causal effect). -- <br><br>**Q:** Why not?
-- <br>**A:** Omitted-variable bias: Farmers may apply less fertilizer in areas that are already worse on other dimensions that affect yield (soil, slope, water).<br>.pink[Violates *all else equal* (exogeneity). Biased and/or spurious results.] -- <br><br>**Q:** So what *should* we do? -- <br>**A:** .hi[Run an experiment!] -- 💩 --- ## Example: The causal effect of fertilizer Randomized experiments help us maintain *all else equal* (exogeneity). -- We often call these experiments .hi[*randomized control trials*] (RCTs)..super[.pink[†]] .footnote[ .pink[†] Econometrics (and statistics) borrows this language from biostatistics and pharmaceutical trials. ] -- Imagine an RCT where we have two groups: - .hi-slate[Treatment:] We apply fertilizer. - .hi-slate[Control:] We do not apply fertilizer. -- By randomizing plots of land into .hi-slate[treatment] or .hi-slate[control], we will, on average, include all kinds of land (soil, slope, water, *etc.*) in both groups. -- *All else equal*! --- class: clear .hi-slate[54 equal-sized plots] <img src="slides_files/figure-html/fertilizer_plot1-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] <img src="slides_files/figure-html/fertilizer_plot2-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] .hi-orange[plus randomly assigned treatment] <img src="slides_files/figure-html/fertilizer_plot3_1-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] .hi-orange[plus randomly assigned treatment] <img src="slides_files/figure-html/fertilizer_plot3_2-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] .hi-orange[plus randomly assigned treatment] <img src="slides_files/figure-html/fertilizer_plot3_3-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] .hi-orange[plus randomly assigned treatment] <img src="slides_files/figure-html/fertilizer_plot3_4-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] .hi-orange[plus randomly assigned treatment] <img src="slides_files/figure-html/fertilizer_plot3_5-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] .hi-orange[plus randomly assigned treatment] <img src="slides_files/figure-html/fertilizer_plot3_6-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] .hi-orange[plus randomly assigned treatment] <img src="slides_files/figure-html/fertilizer_plot3_7-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] .hi-orange[plus randomly assigned treatment] <img src="slides_files/figure-html/fertilizer_plot3_8-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] .hi-orange[plus randomly assigned treatment] <img src="slides_files/figure-html/fertilizer_plot3_9-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] .hi-orange[plus randomly assigned
treatment] <img src="slides_files/figure-html/fertilizer_plot3_10-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] .hi-orange[plus randomly assigned treatment] <img src="slides_files/figure-html/fertilizer_plot3_11-1.svg" style="display: block; margin: auto;" /> --- class: clear count: false .hi-slate[54 equal-sized plots] .hi[of varying quality] .hi-orange[plus randomly assigned treatment] <img src="slides_files/figure-html/fertilizer_plot3_12-1.svg" style="display: block; margin: auto;" /> --- ## Example: The causal effect of fertilizer We can estimate the .hi[causal effect] of fertilizer on crop yield by comparing the average yield in the treatment group (💩) with the control group (no 💩). $$ `\begin{align} \overline{\text{Yield}}_\text{Treatment} - \overline{\text{Yield}}_\text{Control} \end{align}` $$ -- Alternatively, we can use the regression -- $$ `\begin{align} \text{Yield}_i = \beta_0 + \beta_1 \text{Trt}_i + u_i \tag{1} \end{align}` $$ where `\(\text{Trt}_i\)` is a binary variable (=1 if plot `\(i\)` received the fertilizer treatment). -- **Q:** Should we expect `\((1)\)` to satisfy exogeneity? Why? -- <br>**A:** On average, .hi[randomly assigning treatment should balance] the trt. and control groups across the other dimensions that affect yield (soil, slope, water). --- layout: true # Causality ## Example: Returns to education --- name: returns Labor economists, policy makers, parents, and students are all interested in the (monetary) *return to education.* -- .hi-slate[Thought experiment:] - Randomly select an individual. - Give her an additional year of education. - How much do her earnings increase? This change in earnings gives the .hi-slate[causal effect] of education on earnings. --- **Q:** Could we simply regress earnings on education? -- <br>**A:** Again, probably not if we want the true, causal effect. -- 1. People *choose* education based upon many factors, *e.g.*, ability. 1. Education likely reduces experience (time out of the workforce). 1. Education is .hi[*endogenous*] (violates *exogeneity*). -- Point (2) above also illustrates the difficulty of learning about education while *holding all else constant*. Many important variables have the same challenge—gender, race, income. --- **Q:** So how can we estimate the returns to education? -- .hi-slate[Option 1:] Run an .hi[experiment]. -- - Randomly .pink[assign education] (might be difficult). - Randomly .pink[encourage education] (might work). - Randomly .pink[assign programs] that affect education (*e.g.*, mentoring). -- .hi-slate[Option 2:] Look for a .hi-purple[*natural experiment*]—a policy or accident in society that arbitrarily increased education for one subset of people. -- - Admissions .purple[cutoffs] - .purple[Lottery] enrollment and/or capacity .purple[constraints] --- layout: true # Causality --- name: real ## Real-world experiments Both examples consider .hi-slate[real experiments] that isolate causal effects. .hi-slate[Characteristics] - .purple[Feasible]—we can actually (potentially) run the experiment. - .purple[Compare individuals] randomized into treatment against individuals randomized into control. - .purple[Require "good" randomization] to get *all else equal* (exogeneity). -- *Note:* Your experiment's results are only as good as your randomization.
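---

## Real-world experiments: a simulated example

Here is a minimal R sketch (my own simulation, not data or code from the course): land quality drives yield, fertilizer is assigned at random, and regressing yield on the treatment dummy, as in equation (1) from the fertilizer example, recovers the true effect on average because randomization delivers *all else equal*.

```r
# A sketch: simulate a fertilizer RCT (all names and numbers here are made up)
set.seed(421)
n = 54                                   # 54 equal-sized plots
quality = runif(n, min = 0, max = 10)    # unobserved land quality (soil, slope, water)
trt = sample(rep(c(0, 1), each = n / 2)) # randomly assign half of the plots to fertilizer
yield = 20 + 2 * quality + 5 * trt + rnorm(n, sd = 2) # true treatment effect = 5

# Difference in group means...
mean(yield[trt == 1]) - mean(yield[trt == 0])
# ...which (with a binary regressor) equals the OLS estimate of beta_1 in
# Yield_i = beta_0 + beta_1 Trt_i + u_i
lm(yield ~ trt)
```

With a binary regressor, the OLS slope equals the difference in group means exactly; randomization is what makes that difference an unbiased estimator of the true effect (here, 5). A single unlucky draw, like the unfortunate randomization on the next slide, can still leave the groups imbalanced.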
--- class: clear count: false .hi-slate[Unfortunate randomization] <img src="slides_files/figure-html/fertilizer_plot3_bad-1.svg" style="display: block; margin: auto;" /> --- layout: true # Causality ## The ideal experiment --- name: ideal The .hi[ideal experiment] would be subtly different. Rather than comparing units randomized as .pink[treatment] vs. .pink[control], the ideal experiment would compare treatment and control .hi[for the same, exact unit]. -- $$ `\begin{align} y_{\text{Treatment},i} - y_{\text{Control},i} \end{align}` $$ -- which we will write (for simplicity) as $$ `\begin{align} y_{1,i} - y_{0,i} \end{align}` $$ -- This .pink[*ideal experiment*] is clearly infeasible.super[.pink[†]], but it creates nice notation for causality (the Rubin causal model/Neyman potential outcomes framework). .footnote[ .pink[†] Without (1) God-like abilities and multiple universes or (2) a time machine. ] --- .pull-left[ The *ideal* data for 10 people ``` #> i trt y1i y0i #> 1 1 1 5.01 2.56 #> 2 2 1 8.85 2.53 #> 3 3 1 6.31 2.67 #> 4 4 1 5.97 2.79 #> 5 5 1 7.61 4.34 #> 6 6 0 7.63 4.15 #> 7 7 0 4.75 0.56 #> 8 8 0 5.77 3.52 #> 9 9 0 7.47 4.49 #> 10 10 0 7.79 1.40 ``` ] -- .pull-right[ Calculate the causal effect of trt. $$ `\begin{align} \tau_i = y_{1,i} - y_{0,i} \end{align}` $$ for each individual `\(i\)`. ] --- count: false .pull-left[ The *ideal* data for 10 people ``` #> i trt y1i y0i effect_i #> 1 1 1 5.01 2.56 2.45 #> 2 2 1 8.85 2.53 6.32 #> 3 3 1 6.31 2.67 3.64 #> 4 4 1 5.97 2.79 3.18 #> 5 5 1 7.61 4.34 3.27 #> 6 6 0 7.63 4.15 3.48 #> 7 7 0 4.75 0.56 4.19 #> 8 8 0 5.77 3.52 2.25 #> 9 9 0 7.47 4.49 2.98 #> 10 10 0 7.79 1.40 6.39 ``` ] .pull-right[ Calculate the causal effect of trt. $$ `\begin{align} \tau_i = y_{1,i} - y_{0,i} \end{align}` $$ for each individual `\(i\)`. ] --- count: false .pull-left[ The *ideal* data for 10 people ``` #> i trt y1i y0i effect_i #> 1 1 1 5.01 2.56 2.45 #> 2 2 1 8.85 2.53 6.32 #> 3 3 1 6.31 2.67 3.64 #> 4 4 1 5.97 2.79 3.18 #> 5 5 1 7.61 4.34 3.27 #> 6 6 0 7.63 4.15 3.48 #> 7 7 0 4.75 0.56 4.19 #> 8 8 0 5.77 3.52 2.25 #> 9 9 0 7.47 4.49 2.98 #> 10 10 0 7.79 1.40 6.39 ``` ] .pull-right[ Calculate the causal effect of trt. $$ `\begin{align} \tau_i = y_{1,i} - y_{0,i} \end{align}` $$ for each individual `\(i\)`. The mean of `\(\tau_i\)` is the<br>.hi[average treatment effect] (.pink[ATE]). Thus, `\(\color{#e64173}{\overline{\tau} = 3.82}\)` ] --- This model highlights the fundamental problem of causal inference. $$ `\begin{align} \tau_i = \color{#e64173}{y_{1,i}} &- \color{#6A5ACD}{y_{0,i}} \end{align}` $$ -- .hi-slate[The challenge:] If we observe `\(\color{#e64173}{y_{1,i}}\)`, then we cannot observe `\(\color{#6A5ACD}{y_{0,i}}\)`. <br>If we observe `\(\color{#6A5ACD}{y_{0,i}}\)`, then we cannot observe `\(\color{#e64173}{y_{1,i}}\)`. --- So a dataset that we actually observe for 6 people will look something like .pull-left[ ``` #> i trt y1i y0i #> 1 1 1 5.01 NA #> 2 2 1 8.85 NA #> 3 3 1 6.31 NA #> 4 4 1 5.97 NA #> 5 5 1 7.61 NA #> 6 6 0 NA 4.15 #> 7 7 0 NA 0.56 #> 8 8 0 NA 3.52 #> 9 9 0 NA 4.49 #> 10 10 0 NA 1.40 ``` ] -- .pull-right[ We can't observe `\(\color{#e64173}{y_{1,i}}\)` and `\(\color{#6A5ACD}{y_{0,i}}\)`. But, we do observe - `\(\color{#e64173}{y_{1,i}}\)` for `\(i\)` in 1, 2, 3, 4, 5 - `\(\color{#6A5ACD}{y_{0,j}}\)` for `\(j\)` in 6, 7, 8, 9, 10 ] -- **Q:** How do we "fill in" the `NA`s and estimate `\(\overline{\tau}\)`? 
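---

Before answering, here is a minimal base-R sketch of the bookkeeping above, using the ten observations printed on these slides (the object names `ideal_df`, `observed`, and `effect_i` are mine, not course code):

```r
# A sketch: the potential-outcomes bookkeeping for the 10 observations above
ideal_df = data.frame(
  i   = 1:10,
  trt = rep(c(1, 0), each = 5),
  y1i = c(5.01, 8.85, 6.31, 5.97, 7.61, 7.63, 4.75, 5.77, 7.47, 7.79),
  y0i = c(2.56, 2.53, 2.67, 2.79, 4.34, 4.15, 0.56, 3.52, 4.49, 1.40)
)

# With the ideal (infeasible) data: individual effects and their mean (the ATE)
effect_i = ideal_df$y1i - ideal_df$y0i
mean(effect_i)
#> [1] 3.815
# (the slide reports this rounded to 3.82)

# What we actually observe: only one potential outcome per individual
observed = transform(ideal_df,
  y1i = ifelse(trt == 1, y1i, NA),
  y0i = ifelse(trt == 0, y0i, NA)
)

# A candidate estimator: the difference in the groups' observed means
mean(observed$y1i, na.rm = TRUE) - mean(observed$y0i, na.rm = TRUE)
#> [1] 3.926
```

The naive difference (about 3.93) is not the ATE (about 3.82): the treated and control groups also differ in their untreated outcomes, which is the selection-bias term derived on the coming slides (plus, in a sample this small, differences in the individual effects across the two groups).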
--- layout: true # Causality ## Causally estimating the treatment effect --- name: estimation .hi-slate[Notation:] Let `\(D_i\)` be a binary indicator variable such that - `\(\color{#e64173}{D_i=1}\)` .pink[if individual] `\(\color{#e64173}{i}\)` .pink[is treated]. - `\(\color{#6A5ACD}{D_i=0}\)` .purple[if individual] `\(\color{#6A5ACD}{i}\)` .purple[is not treated (*control* group).] -- Then, rephrasing the previous slide, - We only observe `\(\color{#e64173}{y_{1,i}}\)` when `\(\color{#e64173}{D_{i}=1}\)`. - We only observe `\(\color{#6A5ACD}{y_{0,i}}\)` when `\(\color{#6A5ACD}{D_{i}=0}\)`. -- **Q:** How can we estimate `\(\overline{\tau}\)` using only `\(\left(\color{#e64173}{y_{1,i}|D_i=1}\right)\)` and `\(\left(\color{#6A5ACD}{y_{0,i}|D_i=0}\right)\)`? --- **Q:** How can we estimate `\(\overline{\tau}\)` using only `\(\left(\color{#e64173}{y_{1,i}|D_i=1}\right)\)` and `\(\left(\color{#6A5ACD}{y_{0,i}|D_i=0}\right)\)`? -- **Idea:** What if we compare the groups' means? _I.e._, $$ `\begin{align} \color{#e64173}{\mathop{Avg}\left( y_i\mid D_i = 1 \right)} - \color{#6A5ACD}{\mathop{Avg}\left( y_i\mid D_i =0 \right)} \end{align}` $$ -- **Q:** When does this simple difference in groups' means provide information on the .hi-slate[causal effect] of the treatment? -- **Q.sub[2.0]:** Is `\(\color{#e64173}{\mathop{Avg}\left( y_i\mid D_i = 1 \right)} - \color{#6A5ACD}{\mathop{Avg}\left( y_i\mid D_i =0 \right)}\)` a *good* estimator for `\(\overline{\tau}\)`? -- Time for math! .bigger[🎉] --- .hi-slate[Assumption:] Let `\(\tau_i = \tau\)` for all `\(i\)`. This assumption says that the treatment effect is equal (constant) across all individuals `\(i\)`. -- .hi-slate[Note:] We defined $$ `\begin{align} \tau_i = \tau = \color{#e64173}{y_{1,i}} - \color{#6A5ACD}{y_{0,i}} \end{align}` $$ which implies $$ `\begin{align} \color{#e64173}{y_{1,i}} = \color{#6A5ACD}{y_{0,i}} + \tau \end{align}` $$ --- layout: false class: clear name: derivation **Q.sub[3.0]:** Is `\(\color{#e64173}{\mathop{Avg}\left( y_i\mid D_i = 1 \right)} - \color{#6A5ACD}{\mathop{Avg}\left( y_i\mid D_i =0 \right)}\)` a *good* estimator for `\(\tau\)`? -- Difference in groups' means -- <br> `\(\quad \color{#ffffff}{\Bigg|}=\color{#e64173}{\mathop{Avg}\left( y_i\mid D_i = 1 \right)} - \color{#6A5ACD}{\mathop{Avg}\left( y_i\mid D_i =0 \right)}\)` -- <br> `\(\quad \color{#ffffff}{\Bigg|}=\color{#e64173}{\mathop{Avg}\left( y_{1,i}\mid D_i = 1 \right)} - \color{#6A5ACD}{\mathop{Avg}\left( y_{0,i}\mid D_i =0 \right)}\)` -- <br> `\(\quad \color{#ffffff}{\Bigg|}=\color{#e64173}{\mathop{Avg}\left( \color{#000000}{\tau \: +} \: \color{#6A5ACD}{y_{0,i}} \mid D_i = 1 \right)} - \color{#6A5ACD}{\mathop{Avg}\left( y_{0,i}\mid D_i =0 \right)}\)` -- <br> `\(\quad \color{#ffffff}{\Bigg|}=\tau + \color{#e64173}{\mathop{Avg}\left(\color{#6A5ACD}{y_{0,i}} \mid D_i = 1 \right)} - \color{#6A5ACD}{\mathop{Avg}\left( y_{0,i}\mid D_i =0 \right)}\)` -- <br> `\(\quad \color{#ffffff}{\Bigg|}= \text{Average causal effect} + \color{#FFA500}{\text{Selection bias}}\)` -- So our proposed group-difference estimator gives us the sum of 1. `\(\tau\)`, the .hi-slate[causal, average treatment effect] that we want 2. .hi-orange[Selection bias:] How much the trt. and control groups differ (on average). --- class: clear, middle .hi-slate[Next time:] Solving selection bias. --- layout: false # Table of contents .pull-left[ ### Admin .smallest[ 1. [Schedule](#schedule) <!-- 1.
[.mono[R] showcase](#r_showcase) - [Strategizing](#r_showcase) - [`gather`-ing](#gather) - [Results](#results) --> ] ] .pull-right[ ### Causality .smallest[ 1. [Introduction](#intro) 1. [The challenges](#challenges) 1. Examples - [Fertilizer](#fertilizer) - [Returns to education](#returns) 1. [*Real* experiments](#real) 1. [The ideal experiment](#ideal) 1. [Estimation](#estimation) 1. [Derivation](#derivation) ] ] --- exclude: true