class: center, middle, inverse, title-slide

# Problem Set 4
## Nonstationarity, Causality, Instrumental Variables
### <strong>EC 421:</strong> Introduction to Econometrics
### <br>Due <em>before</em> midnight (11:59pm) on Wednesday, 05 June 2019

---

class: clear

.mono[DUE] Your solutions to this problem set are due *before* midnight on Wednesday, 05 June 2019. Your files must be uploaded to [Canvas](https://canvas.uoregon.edu/).

.mono.b[IMPORTANT] Your submission must include (1) **your responses/answers to the questions in a PDF, Word, or similar file** and (2) the .mono[R] script you used to generate your answers. **The .mono[R] script is just for your code. To receive credit, your answers/figures/*etc.* must be in the PDF/Word document.** Each student must turn in her/his own answers.

.mono[OBJECTIVE] This problem set has three purposes: (1) reinforce econometrics topics from class; (2) build your .mono[R] toolset; (3) strengthen your intuition on causality and time series.

## Problem 1: Nonstationarity—the Basics

**1a.** Define stationarity.

*Note:* You can define it using math or words (or both).

**1b.** If our disturbance term `$u_t$` follows a .pink[random walk], *i.e.*,
$$
`\begin{align}
  u_{t} = u_{t-1} + \varepsilon_t
\end{align}`
$$
then its variance is `$\mathop{\text{Var}} \left( u_t \right) = t \sigma_{\varepsilon}^2$`. Explain how this expression for the variance shows that the disturbance is .purple[nonstationary] (*i.e.*, it violates .pink[stationarity]).
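
*Note:* Here is a sketch of where this variance expression comes from. It relies on assumptions that are not stated above: `$u_0 = 0$`, and the `$\varepsilon_t$` are mean-zero, uncorrelated across periods, and share the variance `$\sigma_{\varepsilon}^2$`. Substituting the random walk into itself repeatedly gives
$$
`\begin{align}
  u_{t} &= u_{t-1} + \varepsilon_t = u_{t-2} + \varepsilon_{t-1} + \varepsilon_t = \cdots = u_0 + \sum_{s=1}^{t} \varepsilon_s = \sum_{s=1}^{t} \varepsilon_s \\
  \mathop{\text{Var}} \left( u_t \right) &= \sum_{s=1}^{t} \mathop{\text{Var}} \left( \varepsilon_s \right) = t \sigma_{\varepsilon}^2
\end{align}`
$$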

**1c.** We previously discussed autocorrelated disturbances, *e.g.*, an AR(1) process such that
$$
`\begin{align}
  u_{t} = \rho u_{t-1} + \varepsilon_t
\end{align}`
$$
Under which circumstances would this AR(1) process become a random walk?

*Hint:* Consider the values of `$\rho$`.

---
class: clear

## Problem 2: Nonstationarity—the Simulation

In this problem, we are going to create two independent, .hi-purple[nonstationary] time series. Specifically, we'll create two random walks. Then, we'll regress the first random walk on the second random walk.

*Hint:* Generating random walks is *nearly* identical to generating AR(1) processes, as you did in lab.

**2a.** Generate the first 50-period random walk. We will name it `v`.
$$
`\begin{align}
  v_t = v_{t-1} + \varepsilon_t
\end{align}`
$$
where `$\varepsilon_t$` comes from a normal distribution with mean 0 and standard deviation 1.

Here is some .mono[R] to help.

```r
# Set a seed (so your results stay the same)
set.seed(1234)
# Generate the initial number (this will be v[1])
v <- rnorm(1, mean = 0, sd = 1)
# For loop to create the random walk
for (t in 2:50) {
  # Create the 'next' observation
  ...
}
```
While you're filling in the `for` loop, keep in mind (**1**) our equation for the random walk at the beginning of this question (meaning `$v_t$` depends upon `$v_{t-1}$` and `$\varepsilon_t$`) and (**2**) the fact that you can reference different observations in .mono[R], *e.g.*,

- `v[t]` refers to the `$t$`.super[th] observation
- `v[t-1]` refers to the `$(t-1)$`.super[th] observation
- `v[3]` refers to the `$3$`.super[rd] observation
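
To see this indexing at work, here is a minimal sketch of a `for` loop in which each element depends on the previous one. The vector `x` and the doubling rule are hypothetical (they are not part of this problem) and are only meant to show how `x[t]` and `x[t - 1]` behave inside a loop.

```r
# Hypothetical example (not the random walk): each element doubles the previous one
x <- 1
for (t in 2:5) {
  # Assigning to x[t] adds a t-th element built from the (t-1)-th element
  x[t] <- 2 * x[t - 1]
}
# The full vector
x
# The 3rd observation
x[3]
```

Note that assigning to `x[t]` when `x` only has `t - 1` elements extends the vector by one element.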

If you need more help on for loops, don't forget there are lab materials on Canvas and resources online (*e.g.*, [datamentor.io](https://www.datamentor.io/r-programming/for-loop/) and [datacamp.com](https://www.datacamp.com/community/tutorials/tutorial-on-loops-in-r) have lots of resources).

**2b.** Generate a second 50-period random walk called `w`. This part is exactly the same as (2a), but you **use a different seed** (*i.e.*, `set.seed(456)`) and **name the variable** `w`.

**2c.** We .orange[independently] generated these two time series. Ideally (from a statistical point of view), should we find a statistically significant relationship between the two series? Explain.

**2d.** Regress `w` on `v`. Report the results from the `$t$` test. Do they match your expectations from (2c)? Explain.
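
If you need a reminder on running a regression and finding the `$t$` test in .mono[R], here is a minimal sketch. It uses .mono[R]'s built-in `cars` dataset (not the series from this problem), and the object names `fit` and `coef_table` are arbitrary.

```r
# Illustration with R's built-in 'cars' data (stopping distance and speed)
fit <- lm(dist ~ speed, data = cars)
# The coefficient table: estimates, standard errors, t statistics, p-values
coef_table <- summary(fit)$coefficients
coef_table
# The row for the slope coefficient on 'speed'
coef_table["speed", ]
```

The columns `t value` and `Pr(>|t|)` give the `$t$` statistic and the p-value for the null hypothesis that the coefficient equals zero.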

**2e.** As we've mentioned, one (simple) way you can work with the nonstationarity from random walks is to take differences, _i.e._, `$v_t - v_{t-1}$`. The interpretation of the relationship does not change, whether we regress `$w$` on `$v$` or `$\Delta w$` on `$\Delta v$` (where `$\Delta w = w_t - w_{t-1}$`). In .mono[R], you can use the `diff()` function to take differences, _i.e._, `diff(v)` will calculate the differences for the variable `v`.
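
As a quick illustration of what `diff()` returns, here is a small sketch with a made-up vector (`x` and `dx` are hypothetical names, unrelated to `v` and `w`).

```r
# A made-up vector of five observations
x <- c(1, 4, 9, 16, 25)
# First differences: x[t] - x[t-1] for t = 2, ..., 5
dx <- diff(x)
dx
# The differenced series has one fewer observation than the original
length(dx)
```

Because the first observation has no predecessor, differencing a 5-observation series leaves 4 observations.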

Regress the differenced `w` on the differenced `v`. Does it change your results from **2d**?

---
class: clear

## Problem 3: Causality

Following the Rubin causal model, imagine that we observe the following data (which would be impossible to observe in real life):

.center[
.bold[Table: Imaginary dataset]
]
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;"> .math[i] </th>
   <th style="text-align:right;"> Trt. </th>
   <th style="text-align:right;"> y.sub[1] </th>
   <th style="text-align:right;"> y.sub[0] </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;background-color: white;"> 1 </td>
   <td style="text-align:right;background-color: white;"> 0 </td>
   <td style="text-align:right;background-color: white;"> 25 </td>
   <td style="text-align:right;background-color: white;"> 17 </td>
  </tr>
  <tr>
   <td style="text-align:right;background-color: white;"> 2 </td>
   <td style="text-align:right;background-color: white;"> 0 </td>
   <td style="text-align:right;background-color: white;"> 15 </td>
   <td style="text-align:right;background-color: white;"> 11 </td>
  </tr>
  <tr>
   <td style="text-align:right;background-color: white;"> 3 </td>
   <td style="text-align:right;background-color: white;"> 1 </td>
   <td style="text-align:right;background-color: white;"> 11 </td>
   <td style="text-align:right;background-color: white;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;background-color: white;"> 4 </td>
   <td style="text-align:right;background-color: white;"> 1 </td>
   <td style="text-align:right;background-color: white;"> 13 </td>
   <td style="text-align:right;background-color: white;"> 9 </td>
  </tr>
</tbody>
</table>

**3a.** Calculate the treatment effect **for each individual** (*i.e.*, `$\tau_i$`).

**3b.** **[T/F]** The treatment effect is constant across individuals.

**3c.** Calculate the **average treatment effect**.

**3d.** **Estimate the average treatment effect** by comparing the **mean of the treatment group** to the **mean of the control group**.
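
*Note:* As a reminder of the notation behind (3a), (3c), and (3d), here is a sketch in standard Rubin-causal-model notation. The symbols `$D_i$` (the treatment indicator, labeled Trt. in the table), `$n$`, `$n_1$`, and `$n_0$` (the numbers of all, treated, and control individuals) are assumptions for this sketch and may differ from the notation used in class.
$$
`\begin{align}
  \tau_i &= y_{1,i} - y_{0,i} \\
  \overline{\tau} &= \dfrac{1}{n} \sum_{i=1}^{n} \tau_i \\
  \hat{\tau} &= \dfrac{1}{n_1} \sum_{i:\, D_i = 1} y_{1,i} - \dfrac{1}{n_0} \sum_{i:\, D_i = 0} y_{0,i}
\end{align}`
$$
The first line is the individual treatment effect, the second is the average treatment effect, and the third is the difference between the treatment-group mean and the control-group mean.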

**3e.** Should we expect our estimator in (3d) to provide unbiased estimates? **Explain.**

**3f.** Why would it be impossible to actually observe all of the data in the table (in real life)?

**3g.** How does your answer in (3f) relate to *the fundamental problem of causal inference*?

## Problem 4: Instrumental Variables

**4a.** What are the two requirements for a valid instrument?

We're interested in estimating `$\beta_1$` in
$$
`\begin{align}
  \text{Wage}_i = \beta_0 + \beta_1 \text{Education}_i + u_i
\end{align}`
$$
but we have a problem with omitted-variable bias. Instrumental variables can potentially help.
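
To make the idea concrete, here is a minimal sketch of two-stage least squares done by hand on simulated data. Everything in it is hypothetical: the variables `z`, `ability`, `educ`, and `wage`, the coefficients, and the "true" effect of 2 are all made up for illustration, and `z` is constructed to be a valid instrument (which is exactly what you cannot take for granted with real data).

```r
# Simulated (hypothetical) data: nothing here comes from a real dataset
set.seed(421)
n <- 10000
# Instrument: shifts educ but does not appear in the wage equation
z <- rnorm(n)
# Omitted variable: affects both educ and wage
ability <- rnorm(n)
# Endogenous regressor
educ <- 0.8 * z + ability + rnorm(n)
# Outcome: the true coefficient on educ is 2 (by construction)
wage <- 1 + 2 * educ + 3 * ability + rnorm(n)

# OLS that omits ability
b_ols <- coef(lm(wage ~ educ))["educ"]

# Two-stage least squares by hand
# First stage: regress educ on the instrument and keep the fitted values
educ_hat <- fitted(lm(educ ~ z))
# Second stage: regress wage on the first-stage fitted values
b_iv <- coef(lm(wage ~ educ_hat))["educ_hat"]

# Compare the two estimates of the coefficient on educ
c(ols = unname(b_ols), iv = unname(b_iv))
```

In this simulation the OLS estimate is pushed away from 2 by the omitted `ability`, while the IV estimate lands close to 2. The standard errors reported by the by-hand second stage are not correct, so treat this as a sketch of the point estimate only.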

**4b.** As we've discussed, we need an instrument for (endogenous) education. Do you think the number of children would be a valid instrument? Explain why it passes/fails each of the two requirements for a valid instrument.

**4c.** Which estimates would you trust more—OLS or IV, where number-of-children is your instrument? Explain.