class: center, middle, inverse, title-slide

.title[
# ScPoEconometrics: Advanced
]
.subtitle[
## Intro and Recap 1
]
.author[
### Florian Oswald
]
.date[
### SciencesPo Paris 2024-01-25
]

---

layout: true

<div class="my-footer"><img src="../../img/logo/ScPo-shield.png" style="height: 60px;"/></div>

---

# Welcome to *ScPoEconometrics: Advanced*!

.pull-left[
## Today

1. Communication: Slack Intro
1. Who Am I
1. This Course
1. Recap 1 of topics from the intro course
]

.pull-right[
### Next time

* Recap 2
]

---

# Who Am I

.pull-left[
* I'm a Professor in the Dept of Economics at SciencesPo Paris.
* I work on Econometrics, and I am especially interested in Causal Inference with Machine Learning (ML):
  1. What are the benefits of using ML for economists?
  1. What are the consequences of using ML for our statistics?
  1. How can we use ML in our Economics models?
  1. How can we develop new algorithms to conduct sound inference in high-dimensional contexts?
]

--

.pull-right[
* I do mostly theory, but, as an economist, I am interested in the practical implications of my work for economic/policy questions.
]

---

# This Course

## Prerequisites

* This course is the *follow-up* to [Introduction to Econometrics with R](https://github.com/argafacu/IntrotoEconometricswithR).
* You are supposed to be familiar with all the econometrics material from [the slides](https://github.com/argafacu/IntrotoEconometricswithR) of that course and/or chapters 1-9 in our [textbook](https://scpoecon.github.io/ScPoEconometrics/).
* We also assume you have basic `R` working knowledge at the level of the intro course:
  * basic `data.frame` manipulation with `dplyr`
  * simple linear models with `lm`
  * basic plotting with `ggplot2`

---

# This Course

.pull-left[
## Grading

1. There will be ***on-the-spot*** quizzes => 15%.
1. There will be a ***midterm take-home exam*** => 35%.
1. There will be a ***final project*** => 50%.
]

--

.pull-right[
## Course Materials

1. 
[Book](https://scpoecon.github.io/ScPoEconometrics/) chapter 9 onwards.
1. The [Slides](https://github.com/argafacu/Advanced-Metrics-slides).
]

---

# Syllabus

.pull-left[
0\. Intro, Difference-in-Differences

1\. Intro, Recap 1

2\. Recap 2

3\. Tools: `Rmarkdown` and `data.table`

4\. Instrumental Variables 1

5\. Instrumental Variables 2 (*Midterm Exam?*)
]

.pull-right[
6\. Panel Data 1

7\. Panel Data 2

8\. Discrete Outcomes

9\. Intro to Machine Learning 1

10\. Intro to Machine Learning 2 (*Final Project*)

11\. Recap / Buffer

12\. Recap / Buffer
]

---

class: separator, middle

# Recap 1

Let's get cracking! 💪

---

# Population *vs.* sample

## Models and notation

We write our (simple) population model

$$ y_i = \beta_0 + \beta_1 x_i + u_i $$

and our sample-based estimated regression model as

$$ y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i $$

An estimated regression model produces estimates for each observation:

$$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i $$

which gives us the _best-fit_ line through our dataset.

(A lot of these slides - in particular the pictures! - have been taken from [Ed Rubin's](https://edrub.in/index.html) outstanding material. Thanks Ed 🙏)

---

class: inverse

# Task 1: Run Simple OLS (4 minutes)

1. Load the data from [here](https://www.dropbox.com/s/wwp2cs9f0dubmhr/grade5.dta?dl=1), which is in `dta` format. (Hint: use `haven::read_dta("filename")` to read this format.)
1. Obtain common summary statistics for the variables `classize`, `avgmath` and `avgverb`. (Hint: use the `skimr` package.)
1. Estimate the linear model `$$\text{avgmath}_i = \beta_0 + \beta_1 \text{classize}_i + u_i$$`

---

class: inverse

# Task 1: Solution

1. Load the data:
```r
grades = haven::read_dta(file = "https://www.dropbox.com/s/wwp2cs9f0dubmhr/grade5.dta?dl=1")
```
1. Describe the dataset:
```r
library(dplyr)
grades %>%
  select(classize, avgmath, avgverb) %>%
  skimr::skim()
```
1. Run OLS to estimate the relationship between class size and student achievement:
```r summary(lm(formula = avgmath ~ classize, data = grades)) ``` --- layout: true # **Question:** Why do we care about *population vs. sample*? --- .pull-left[ <img src="recap1_files/figure-html/pop1-1.svg" style="display: block; margin: auto;" /> .center[**Population**] ] -- .pull-right[ <img src="recap1_files/figure-html/scatter1-1.svg" style="display: block; margin: auto;" /> .center[**Population relationship**] $$ y_i = 2.53 + 0.57 x_i + u_i $$ $$ y_i = \beta_0 + \beta_1 x_i + u_i $$ ] --- .pull-left[ <img src="recap1_files/figure-html/sample1-1.svg" style="display: block; margin: auto;" /> .center[**Sample 1:** 30 random individuals] ] -- .pull-right[ <img src="recap1_files/figure-html/sample1 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 2.36 + 0.61 x_i\)` ] ] --- count: false .pull-left[ <img src="recap1_files/figure-html/sample2-1.svg" style="display: block; margin: auto;" /> .center[**Sample 2:** 30 random individuals] ] .pull-right[ <img src="recap1_files/figure-html/sample2 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 2.79 + 0.56 x_i\)` ] ] --- count: false .pull-left[ <img src="recap1_files/figure-html/sample3-1.svg" style="display: block; margin: auto;" /> .center[**Sample 3:** 30 random individuals] ] .pull-right[ <img src="recap1_files/figure-html/sample3 scatter-1.svg" style="display: block; margin: auto;" /> .center[ **Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)` **Sample relationship** <br> `\(\hat{y}_i = 3.21 + 0.45 x_i\)` ] ] --- layout: false class: clear, middle Let's repeat this **10,000 times**. (This exercise is called a (Monte Carlo) simulation.) 
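---
class: clear

# Simulating the sampling distribution

Such a simulation is easy to code up. Here is a minimal sketch in `R` (the population coefficients 2.53 and 0.57 and the sample size of 30 match the example above; the uniform `x` and standard-normal `u` are illustrative assumptions):

```r
# Draw 10,000 samples of n = 30 from y = 2.53 + 0.57 x + u,
# run OLS on each, and collect the estimated slopes.
set.seed(1)
slopes <- replicate(10000, {
  x <- runif(30, 0, 10)       # regressor
  u <- rnorm(30)              # mean-zero disturbance
  y <- 2.53 + 0.57 * x + u
  coef(lm(y ~ x))["x"]        # slope estimate in this sample
})

mean(slopes)  # centered on the population slope 0.57
sd(slopes)    # sampling variation across individual samples
```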
---
count: false

<img src="recap1_files/figure-html/simulation scatter-1.png" style="display: block; margin: auto;" />

---

# Population *vs.* sample

**Question:** Why do we care about *population vs. sample*?

.pull-left[
<img src="recap1_files/figure-html/simulation scatter2-1.png" style="display: block; margin: auto;" />
]

.pull-right[
- On **average**, our regression lines match the population line very nicely.
- However, **individual lines** (samples) can really miss the mark.
- Differences between individual samples and the population lead to **uncertainty** for the econometrician.
]

---

layout: false

# Population *vs.* sample

**Question:** Why do we care about *population vs. sample*?

--

**Answer:** Uncertainty matters.

.pull-left[
* Every random sample of data is different.
* Our (OLS) estimators are computed from those samples of data.
* If there is sampling variation, there is variation in our estimates.
]

--

.pull-right[
* OLS inference depends on certain assumptions.
* If these assumptions are violated, our estimates will be biased or imprecise.
* Or both. 😧
]

---

# Linear regression

## The estimator

We can estimate a regression line in `R` (`lm(y ~ x, my_data)`) and `Stata` (`reg y x`). But where do these estimates come from?

A few slides back:

> $$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i $$
> which gives us the *best-fit* line through our dataset.

But what do we mean by "best-fit line"?

---

layout: false

# Being the "best"

**Question:** What do we mean by *best-fit line*?

**Answers:**

- In general (econometrics), *best-fit line* means the line that minimizes the sum of squared errors (SSE):

.center[
`\(\text{SSE} = \sum_{i = 1}^{n} e_i^2\quad\)` where `\(\quad e_i = y_i - \hat{y}_i\)`
]

- Ordinary **least squares** (**OLS**) minimizes the sum of the squared errors.
- Based upon a set of (mostly palatable) assumptions, OLS - Is unbiased (and consistent) - Is the *best* (minimum variance) linear unbiased estimator (BLUE) --- layout: true # OLS *vs.* other lines/estimators --- Let's consider the dataset we previously generated. <img src="recap1_files/figure-html/ols vs lines 1-1.svg" style="display: block; margin: auto;" /> --- count: false For any line `\(\left(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\right)\)` <img src="recap1_files/figure-html/vs lines 2-1.svg" style="display: block; margin: auto;" /> --- count: false For any line `\(\left(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\right)\)`, we can calculate errors: `\(e_i = y_i - \hat{y}_i\)` <img src="recap1_files/figure-html/ols vs lines 3-1.svg" style="display: block; margin: auto;" /> --- count: false For any line `\(\left(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\right)\)`, we can calculate errors: `\(e_i = y_i - \hat{y}_i\)` <img src="recap1_files/figure-html/ols vs lines 4-1.svg" style="display: block; margin: auto;" /> --- count: false For any line `\(\left(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\right)\)`, we can calculate errors: `\(e_i = y_i - \hat{y}_i\)` <img src="recap1_files/figure-html/ols vs lines 5-1.svg" style="display: block; margin: auto;" /> --- count: false SSE squares the errors `\(\left(\sum e_i^2\right)\)`: bigger errors get bigger penalties. <img src="recap1_files/figure-html/ols vs lines 6-1.svg" style="display: block; margin: auto;" /> --- count: false The OLS estimate is the combination of `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` that minimize SSE. 
<img src="recap1_files/figure-html/ols vs lines 7-1.svg" style="display: block; margin: auto;" />

---

layout: false
class: middle

```r
ScPoApps::launchApp("reg_simple")
```

---

layout: true

# OLS

## Formally

---

In simple linear regression, the OLS estimator comes from choosing the `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` that minimize the sum of squared errors (SSE), _i.e._,

$$ \min_{\hat{\beta}_0,\, \hat{\beta}_1} \text{SSE} $$

--

but we already know `\(\text{SSE} = \sum_i e_i^2\)`. Now use the definitions of `\(e_i\)` and `\(\hat{y}\)`.

$$
`\begin{aligned}
  e_i^2 &= \left( y_i - \hat{y}_i \right)^2 = \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2 \\
        &= y_i^2 - 2 y_i \hat{\beta}_0 - 2 y_i \hat{\beta}_1 x_i + \hat{\beta}_0^2 + 2 \hat{\beta}_0 \hat{\beta}_1 x_i + \hat{\beta}_1^2 x_i^2
\end{aligned}`
$$

--

**Recall:** Minimizing a multivariate function requires (**1**) setting the first derivatives equal to zero (the *1.super[st]-order conditions*) and (**2**) checking the second-order conditions (convexity).

---

layout: false

# OLS

## Interactively

```r
ScPoApps::launchApp("SSR_cone")
```

<img src="../../img/photos/SSR_cone.png" width="3363" style="display: block; margin: auto;" />

---

# OLS

## The estimators

We skipped the maths. We now have the OLS estimators for the slope

$$ \hat{\beta}_1 = \dfrac{\sum_i (x_i - \overline{x})(y_i - \overline{y})}{\sum_i (x_i - \overline{x})^2} $$

and the intercept

$$ \hat{\beta}_0 = \overline{y} - \hat{\beta}_1 \overline{x} $$

Remember that *those* two formulae are among the very few from the intro course that you should know by heart! ❤️

--

We now turn to the assumptions and (implied) properties of OLS.

---

layout: true

# OLS: Assumptions and properties

---

**Question:** What properties might we care about for an estimator?

--

**Tangent:** Let's review statistical properties first.
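---
class: clear

Before the statistics refresher, a quick numerical check of the slope and intercept formulae from a few slides back: compute `\(\hat{\beta}_1\)` and `\(\hat{\beta}_0\)` by hand and compare with `lm()`. (A sanity-check sketch on made-up data; the data-generating process is purely illustrative.)

```r
# Simulate a simple dataset.
set.seed(42)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

# Closed-form OLS estimators.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# lm() produces the same numbers.
fit <- lm(y ~ x)
all.equal(c(b0, b1), unname(coef(fit)))
```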
---

**Refresher:** Density functions

Recall that we use **probability density functions** (PDFs) to describe the probability that a **continuous random variable** takes on a range of values. (The total area under a PDF equals 1.)

These PDFs characterize probability distributions, and the most common/famous/popular distributions get names (_e.g._, normal, *t*, Gamma).

Here is the definition of a *PDF* `\(f_X\)` for a *continuous* RV `\(X\)`:

$$ \Pr[a \leq X \leq b] \equiv \int_a^b f_X (x) \, dx $$

---

**Refresher:** Density functions

The probability that a standard normal random variable takes on a value between -2 and 0: `\(\mathop{\text{P}}\left(-2 \leq X \leq 0\right) = 0.48\)`

<img src="recap1_files/figure-html/example: pdf-1.svg" style="display: block; margin: auto;" />

---

**Refresher:** Density functions

The probability that a standard normal random variable takes on a value between -1.96 and 1.96: `\(\mathop{\text{P}}\left(-1.96 \leq X \leq 1.96\right) = 0.95\)`

<img src="recap1_files/figure-html/example: pdf 2-1.svg" style="display: block; margin: auto;" />

---

**Refresher:** Density functions

The probability that a standard normal random variable takes on a value beyond 2: `\(\mathop{\text{P}}\left(X > 2\right) = 0.023\)`

<img src="recap1_files/figure-html/example: pdf 3-1.svg" style="display: block; margin: auto;" />

---

Imagine we are trying to estimate an unknown parameter `\(\beta\)`, and we know the distributions of three competing estimators. Which one would we want? How would we decide?

<img src="recap1_files/figure-html/competing pdfs-1.svg" style="display: block; margin: auto;" />

---

**Question:** What properties might we care about for an estimator?

--

**Answer one: Bias.**

On average (after *many* samples), does the estimator tend toward the correct value?

**More formally:** Does the mean of the estimator's distribution equal the parameter it estimates?
$$ \mathop{\text{Bias}}_\beta \left( \hat{\beta} \right) = \mathop{\boldsymbol{E}}\left[ \hat{\beta} \right] - \beta $$ --- **Answer one: Bias.** .pull-left[ **Unbiased estimator:** `\(\mathop{\boldsymbol{E}}\left[ \hat{\beta} \right] = \beta\)` <img src="recap1_files/figure-html/unbiased pdf-1.svg" style="display: block; margin: auto;" /> ] -- .pull-right[ **Biased estimator:** `\(\mathop{\boldsymbol{E}}\left[ \hat{\beta} \right] \neq \beta\)` <img src="recap1_files/figure-html/biased pdf-1.svg" style="display: block; margin: auto;" /> ] --- **Answer two: Variance.** The central tendencies (means) of competing distributions are not the only things that matter. We also care about the **variance** of an estimator. $$ \mathop{\text{Var}} \left( \hat{\beta} \right) = \mathop{\boldsymbol{E}}\left[ \left( \hat{\beta} - \mathop{\boldsymbol{E}}\left[ \hat{\beta} \right] \right)^2 \right] $$ Lower variance estimators mean we get estimates closer to the mean in each sample. --- count: false **Answer two: Variance.** <img src="recap1_files/figure-html/variance pdf-1.svg" style="display: block; margin: auto;" /> --- **Answer one: Bias.** **Answer two: Variance.** **Subtlety:** The bias-variance tradeoff. Should we be willing to take a bit of bias to reduce the variance? In econometrics, we generally stick with unbiased (or consistent) estimators. But other disciplines (especially computer science) think a bit more about this tradeoff. --- layout: false # The bias-variance tradeoff. <img src="recap1_files/figure-html/variance bias-1.svg" style="display: block; margin: auto;" /> --- # OLS: Assumptions and properties ## Properties As you might have guessed by now, - OLS is **unbiased**. - OLS has the **minimum variance** of all unbiased linear estimators. --- # OLS: Assumptions and properties ## Properties But... these (very nice) properties depend upon a set of assumptions: 1. The population relationship is linear in parameters with an additive disturbance. 2. 
Our `\(X\)` variable is **exogenous**, _i.e._, `\(\mathop{\boldsymbol{E}}\left[ u |X \right] = 0\)`.
3. The `\(X\)` variable has variation. And if there are multiple explanatory variables, they are not perfectly collinear.
4. The population disturbances `\(u_i\)` are independently and identically distributed as normal random variables with mean zero `\(\left( \mathop{\boldsymbol{E}}\left[ u \right] = 0 \right)\)` and variance `\(\sigma^2\)` (_i.e._, `\(\mathop{\boldsymbol{E}}\left[ u^2 \right] = \sigma^2\)`). Independently distributed and mean zero jointly imply `\(\mathop{\boldsymbol{E}}\left[ u_i u_j \right] = 0\)` for any `\(i\neq j\)`.

---

# OLS: Assumptions and properties

## Assumptions

Different assumptions guarantee different properties:

- Assumptions (1), (2), and (3) make OLS unbiased.
- Assumption (4) gives us an unbiased estimator for the variance of our OLS estimator.

We will discuss solutions to **violations of these assumptions**. See also our discussion [in the book](https://scpoecon.github.io/ScPoEconometrics/std-errors.html#class-reg).

- Non-linear relationships in our parameters/disturbances (or misspecification).
- Disturbances that are not identically distributed and/or not independent.
- Violations of exogeneity (especially omitted-variable bias).

---

# OLS: Assumptions and properties

## Conditional expectation

For many applications, our most important assumption is **exogeneity**, _i.e._,

$$
`\begin{align}
  \mathop{E}\left[ u | X \right] = 0
\end{align}`
$$

but what does it actually mean?

--

One way to think about this definition:

> For *any* value of `\(X\)`, the mean of the error term `\(u\)` must be zero.
- _E.g._, `\(\mathop{E}\left[ u | X=1 \right]=0\)` *and* `\(\mathop{E}\left[ u | X=100 \right]=0\)` - _E.g._, `\(\mathop{E}\left[ u | X_2=\text{Female} \right]=0\)` *and* `\(\mathop{E}\left[ u | X_2=\text{Male} \right]=0\)` - Notice: `\(\mathop{E}\left[ u | X \right]=0\)` is more restrictive than `\(\mathop{E}\left[ u \right]=0\)` --- layout: false class: clear, middle Graphically... --- exclude: true --- class: clear Valid exogeneity, _i.e._, `\(\mathop{E}\left[ u | X \right] = 0\)` <img src="recap1_files/figure-html/ex_good_exog-1.svg" style="display: block; margin: auto;" /> --- class: clear Invalid exogeneity, _i.e._, `\(\mathop{E}\left[ u | X \right] \neq 0\)` <img src="recap1_files/figure-html/ex_bad_exog-1.svg" style="display: block; margin: auto;" /> --- layout: false class: title-slide-final, middle background-image: url(../../img/logo/ScPo-econ.png) background-size: 250px background-position: 9% 19% # END | | | | :--------------------------------------------------------------------------------------------------------- | :-------------------------------- | | <a href="mailto:facundo.arganaraz@sciencespo.fr">.ScPored[<i class="fa fa-paper-plane fa-fw"></i>] | facundo.arganaraz@sciencespo.fr | | <a href="https://github.com/argafacu/Advanced-Metrics-slides">.ScPored[<i class="fa fa-link fa-fw"></i>] | Slides | | <a href="https://scpoecon.github.io/ScPoEconometrics">.ScPored[<i class="fa fa-link fa-fw"></i>] | Book | | <a href="http://twitter.com/ScPoEcon">.ScPored[<i class="fa fa-twitter fa-fw"></i>] | @ScPoEcon | | <a href="http://github.com/ScPoEcon">.ScPored[<i class="fa fa-github fa-fw"></i>] | @ScPoEcon |
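---
class: clear

# Appendix: exogeneity failure in action

The "invalid exogeneity" picture above can be reproduced with a small simulation: an omitted variable `z` that drives both `x` and `y` ends up in the error term, so `\(\mathop{E}\left[ u | X \right] \neq 0\)` and the OLS slope is biased. (A sketch with made-up numbers; the true slope here is 0.5.)

```r
set.seed(7)
n <- 1e5
z <- rnorm(n)          # omitted variable
x <- z + rnorm(n)      # x is correlated with z
u <- 2 * z + rnorm(n)  # z ends up in the error term, so E[u|x] != 0
y <- 1 + 0.5 * x + u   # true slope: 0.5

b_ols <- unname(coef(lm(y ~ x))["x"])
b_ols  # well above 0.5: OLS also picks up z's effect on y
```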