class: center, middle, inverse, title-slide

.title[
# Introduction and Overview
]
.subtitle[
## EC 421, Set 1
]
.author[
### Edward Rubin
]

---

class: inverse, middle

# Prologue

---
# Why?

## Motivation

Let's start with a few __basic, general questions:__

1. What is the goal of econometrics?

2. Why do economists (or other people) study or use econometrics?

__One simple answer:__ Learn about the world using data.

- _Learn about the world_ = Raise, answer, and challenge questions, theories, assumptions.

- _data_ = Plural of datum.

---
# Why?

## Example

One might (reasonably) guess a company's .purple[sales] are a function of its .pink[advertising spending, price, and the intensity of competition].

So, one might hypothesize a model `$\color{#6A5ACD}{\text{Sales}} = f(\color{#e64173}{\text{Ad}, \text{Price}, \text{Comp}})$`

where

- `$\color{#e64173}{\text{Ad}}$` represents dollars spent on advertising,
- `$\color{#e64173}{\text{Price}}$` is the product's price,
- `$\color{#e64173}{\text{Comp}}$` measures the competition the product faces (_e.g._, its number of competitors).

We expect that .purple[sales] `$\uparrow$` with .pink[advertising] and `$\downarrow$` with .pink[price] and .pink[competition].

---
class: clear, middle

But who needs to .grey[_expect_]?

We can .orange[_test_] these hypotheses __using regression__.

.white[
_More importantly:_ Regression estimates the _size_ of these effects

- *How much* does an additional dollar of advertising increase sales?
- *How much* does a one-dollar increase in price decrease sales?
- *How much* does an additional competitor reduce sales?

These (causal) questions are central to efficient decision-making and are the bread and butter of econometrics.
]

---
class: clear, middle

But who needs to .grey[_expect_]?

We can .orange[_test_] these hypotheses __using regression__.

.grey-vlight[_More importantly:_] Regression estimates the .orange[_size_] of these effects

- *How much* does an additional dollar of .pink[advertising] increase .purple[sales]?
- *How much* does a one-dollar increase in .pink[price] decrease .purple[sales]?
- *How much* does an additional .pink[competitor] reduce .purple[sales]?

These (causal) questions are central to efficient decision-making and are the bread and butter of econometrics.

---
layout: true
# Why?

## Example, cont.

__Regression model:__

$$ \color{#6A5ACD}{\text{Sales}}_i = \beta_0 + \beta_1 \color{#e64173}{\text{Ad}}_i + \beta_2 \color{#e64173}{\text{Price}}_i + \beta_3 \color{#e64173}{\text{Comp}}_i + \varepsilon_i $$

---

With this basic regression model, we can test/estimate/quantify the (linear) relationship between sales and advertising, price, and competition.
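
---

A minimal sketch of what estimating this model looks like, using simulated data and plain Python (no packages). The coefficient values, sample size, distributions, and the helper `ols()` are all illustrative assumptions, not estimates from real sales data.

```python
import random

def ols(X, y):
    """Return OLS coefficients by solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

random.seed(421)
n = 5000
beta = [50.0, 2.0, -3.0, -4.0]  # hypothetical (beta_0, beta_1, beta_2, beta_3)

X, sales = [], []
for _ in range(n):
    ad = random.uniform(0, 100)    # advertising dollars (assumed range)
    price = random.uniform(5, 15)  # product price (assumed range)
    comp = random.randint(0, 10)   # number of competitors (assumed range)
    eps = random.gauss(0, 10)      # disturbance, independent of the regressors
    row = [1.0, ad, price, comp]   # leading 1.0 is the intercept column
    X.append(row)
    sales.append(sum(b * x for b, x in zip(beta, row)) + eps)

beta_hat = ols(X, sales)
for name, b, b_hat in zip(["Intercept", "Ad", "Price", "Comp"], beta, beta_hat):
    print(f"{name:<9} true = {b:6.2f}   estimate = {b_hat:6.2f}")
```

Because the simulated disturbance is independent of the regressors by construction, the estimates should land close to the assumed coefficients. Real data offer no such guarantee.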

---

### (Review) Questions

- __Q:__ How do we interpret `$\beta_1$`?
--

- __A:__ An additional dollar of .pink[advertising] corresponds with a `$\beta_1$`-unit change in .purple[sales] (holding .pink[price] and .pink[competition] fixed).
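
One way to write this interpretation, assuming the linear model above and `$E[\varepsilon_i | X] = 0$`:

$$ \frac{\partial E\left[\color{#6A5ACD}{\text{Sales}}_i \mid \color{#e64173}{\text{Ad}}_i, \color{#e64173}{\text{Price}}_i, \color{#e64173}{\text{Comp}}_i\right]}{\partial \color{#e64173}{\text{Ad}}_i} = \beta_1 $$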

---

### (Review) Questions

- __Q:__ Are the `$\beta_k$` terms population parameters or sample statistics?
--

- __A:__ Greek letters denote __population parameters__. Their estimates get hats, _e.g._, `$\hat{\beta}_k$`. Population parameters represent the __average__ behavior across the population.

---

### (Review) Questions

- __Q:__ Can we interpret the estimates for `$\beta_2$` as causal?
--

- __A:__ Not without making more assumptions and/or knowing more about the data-generating process.

---

### (Review) Questions

- __Q:__ What is `$\varepsilon_i$`?
--

- __A:__ An individual's random deviation/disturbance from the population parameters.

Population parameters are averages; individuals are rarely average.

---

### (Review) Questions

- __Q:__ Which assumptions do we impose when estimating with OLS?
--

- __A:__
  - The relationship between .purple[sales] and the .pink[explanatory variables] is linear in parameters, and `$\varepsilon$` enters additively.
  - The .pink[explanatory variables] are __exogenous__, _i.e._, `$E[\varepsilon|X] = 0$`.
  - You've also typically assumed something along the lines of: `$E[\varepsilon_i] = 0$`, `$E[\varepsilon_i^2] = \sigma^2$`, `$E[\varepsilon_i \varepsilon_j] = 0$` for `$i \neq j$`.
  - And (maybe) `$\varepsilon_i$` is distributed normally.

---
layout: false

# Assumptions

## How important can they be?

You've learned how **powerful and flexible** ordinary least squares (**OLS**) regression can be.

However, the results you learned required assumptions.

**Real life often violates these assumptions.**

---
class: clear, middle

EC 421 asks "**What happens when we violate these assumptions?**"
- Can we find a fix? (Especially: How/when is `$\beta$` *causal*?)
- What happens if we don't (or can't) apply a fix?

OLS still does some amazing things—but you need to know when to be **cautious, confident, or dubious**.

---
exclude: true

# Not everything is causal

---
# Not everything is causal
## But what _is_?

Suppose you estimate our sales model for your boss.

$$ \color{#6A5ACD}{\text{Sales}}_i = \hat{\beta}_0 + \hat{\beta}_1 \color{#e64173}{\text{Ad}}_i + \hat{\beta}_2 \color{#e64173}{\text{Price}}_i + \hat{\beta}_3 \color{#e64173}{\text{Comp}}_i + e_i $$

Can you trust that `$\hat{\beta}_2$` gives you the actual effect of .pink[price] on .purple[sales]?

You should be asking several questions...

1. .it[Where] does the .it.pink[variation in price] come from?
  - Is it .it.pink[random] (.it[exogenous])?
  - .it.pink[Why] are some products (or times) .it.pink[more] expensive than others?
1. .it.orange[Whom] do the data represent? Are they .it.orange[relevant] to your setting?
1. How .it.orange[confident] are you in your answer?
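
---
# Not everything is causal
## But what _is_?

The first question (where does the price variation come from?) can be illustrated with a small simulation. This is a sketch under assumed numbers: the true price effect of -3, the unobserved `quality` variable, and the helper `slope()` are hypothetical and only meant to show the mechanism.

```python
import random

def slope(x, y):
    """OLS slope from regressing y on x (with an intercept): cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

random.seed(421)
n = 10_000
true_effect = -3.0  # hypothetical causal effect of price on sales

# Unobserved product quality raises sales directly
quality = [random.gauss(0, 1) for _ in range(n)]

def simulate_sales(price):
    """Sales depend on price (causally), on quality, and on noise."""
    return [100 + true_effect * p + 5 * q + random.gauss(0, 1)
            for p, q in zip(price, quality)]

# Case 1: price varies randomly (exogenous), e.g., set by an experiment
price_random = [10 + random.gauss(0, 1) for _ in range(n)]

# Case 2: higher-quality products charge more (endogenous price)
price_endog = [10 + q + random.gauss(0, 1) for q in quality]

b_random = slope(price_random, simulate_sales(price_random))
b_endog = slope(price_endog, simulate_sales(price_endog))

print(f"True effect of price:        {true_effect:.2f}")
print(f"Estimate, random price:      {b_random:.2f}")
print(f"Estimate, endogenous price:  {b_endog:.2f}")
```

When price is random, regressing sales on price recovers something close to the assumed effect. When price also reflects quality, the same regression mixes the two channels. Under these assumed parameters the estimate is pulled toward zero (to roughly -0.5 in large samples), even though the true effect never changed.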

---

# Econometrics

Applied econometrics, data science, and analytics require:

1. Intuition for the __theory__ behind statistics/econometrics (assumptions, results, strengths, weaknesses).

1. Practical knowledge of how to __apply theoretical methods__ to data.

1. Efficient methods for __working with data__ (cleaning, aggregating, joining, visualizing).

__This course__ aims to deepen your knowledge in each of these three areas.

- 1: As before.
- 2–3: __R__

---
# Econometrics

My ".b[big-picture takeaways]" (the .it.slab[intuition] that I hope you form)

- .it[most] interesting questions are .slab.pink[causal];
- .slab.pink[selection into treatment] dominates correlation (esp. cross-sectional);
- .slab.pink[measurement error] can too;
- causality comes from .slab.pink[design]—not from models/assumptions;
- ask about the .slab.pink[counterfactual];
- .slab.pink[non-stationary] time series will lead you to bad conclusions;

--
- quantifying .slab.purple[uncertainty] is just as important as the effect estimate;
- consider .slab.purple[.it[which] population] your data represent;
- the .slab.purple[mean] is only one of many ways to summarize a population;
- don’t mistake .slab.purple[mean reversion] for treatment effects/heterogeneity;

--
- many .slab.orange[maps] are just .it[population];
- .slab.orange[graphs] should clearly communicate a .it[message]... beautifully.

---
layout: false
class: clear, middle

.it.slab[Next:] .mono[R] basics + (More) Metrics review(s)