Introduction and overview

EC421, Set 01

Ed Rubin

edwardr@uoregon.edu

Prologue

Why?

Motivation

Let’s start with a few basic, general questions:

What is the goal of econometrics?
Why do economists (or other people) study or use econometrics?

One simple answer: Learn about the world using data.

Learn about the world = Raise, answer, and challenge questions, theories, assumptions.
data = Plural of datum.

Why?

Example

One might (reasonably) guess a company’s sales are a function of its advertising spending, price, and intensity of competitors.

So, one might hypothesize a model \(\textcolor{#6A5ACD}{\text{Sales}} = f(\textcolor{#e64173}{\text{Ad}, \text{Price}, \text{Comp}})\)

where

\(\textcolor{#e64173}{\text{Ad}}\) represents dollars spent on advertising,
\(\textcolor{#e64173}{\text{Price}}\) is the product’s price,
\(\textcolor{#e64173}{\text{Comp}}\) gives the product’s competition.

We expect that sales \(\uparrow\) with advertising and \(\downarrow\) with price and competition.

But who needs to expect?

We can test these hypotheses using regression.

More importantly: Regression estimates the size of these effects.

How much does an additional dollar of advertising increase sales?
How much does a one-dollar increase in price decrease sales?
How much does an additional competitor reduce sales?

These (causal) questions are central to efficient decision-making and are the bread and butter of econometrics.

Regression model:

\[ \textcolor{#6A5ACD}{\text{Sales}}_i = \beta_0 + \beta_1 \textcolor{#e64173}{\text{Ad}}_i + \beta_2 \textcolor{#e64173}{\text{Price}}_i + \beta_3 \textcolor{#e64173}{\text{Comp}}_i + \varepsilon_i \]

With this basic regression model, we can test/estimate/quantify the (linear) relationship between sales and advertising, price, and competition.
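To make the estimation step concrete, here is a minimal sketch in Python (standard library only) that simulates data from the sales model and recovers the coefficients with OLS. The parameter values, the variable ranges, and the helper names `solve` and `ols` are all assumptions made up for illustration; none of them come from the course materials.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical population parameters (beta_0, beta_1, beta_2, beta_3).
TRUE_BETA = [50.0, 2.0, -3.0, -1.5]

rng = random.Random(421)
rows, sales = [], []
for _ in range(5000):
    ad = rng.uniform(0, 10)          # advertising dollars (assumed range)
    price = rng.uniform(1, 5)        # product price (assumed range)
    comp = float(rng.randint(0, 8))  # number of competitors (assumed range)
    x = [1.0, ad, price, comp]       # leading 1.0 is the intercept column
    eps = rng.gauss(0, 2)            # disturbance, independent of the regressors
    rows.append(x)
    sales.append(sum(b * v for b, v in zip(TRUE_BETA, x)) + eps)

beta_hat = ols(rows, sales)
for name, b, bh in zip(["Intercept", "Ad", "Price", "Comp"], TRUE_BETA, beta_hat):
    print(f"{name:>9}: true {b:6.2f}, estimate {bh:6.2f}")
```

Because the simulated disturbance is drawn independently of the regressors, the estimates should land close to the assumed parameters. Real data rarely come with that guarantee.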

(Review) Questions

Q: How do we interpret \(\beta_1\)?

A: An additional dollar of advertising corresponds with a \(\beta_1\)-unit change in sales (holding price and competition fixed).
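One way to see this interpretation is to compare expected sales at two advertising levels that differ by one dollar, holding price and competition at the same values. The sketch below assumes \(E[\varepsilon_i \mid \text{Ad}_i, \text{Price}_i, \text{Comp}_i] = 0\); the values \(a\), \(p\), and \(c\) are arbitrary placeholders.

\[ E[\textcolor{#6A5ACD}{\text{Sales}}_i \mid \text{Ad}_i = a + 1,\, \text{Price}_i = p,\, \text{Comp}_i = c] - E[\textcolor{#6A5ACD}{\text{Sales}}_i \mid \text{Ad}_i = a,\, \text{Price}_i = p,\, \text{Comp}_i = c] \]

\[ = \left[ \beta_0 + \beta_1 (a + 1) + \beta_2 p + \beta_3 c \right] - \left[ \beta_0 + \beta_1 a + \beta_2 p + \beta_3 c \right] = \beta_1 \]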

Q: Are the \(\beta_k\) terms population parameters or sample statistics?

A: Greek letters denote population parameters. Their estimates get hats, e.g., \(\hat{\beta}_k\). Population parameters represent the average behavior across the population.
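The distinction between \(\beta_k\) and \(\hat{\beta}_k\) can be illustrated with a small simulation. The sketch below is a simplified, single-regressor version of the sales model (an assumption made to keep the code short), with made-up population parameters. It draws many samples from the same population and computes the OLS slope in each one.

```python
import random
import statistics

# Hypothetical population parameters for Sales = BETA_0 + BETA_1 * Ad + error.
BETA_0, BETA_1 = 50.0, 2.0


def estimate_slope(rng, n=100):
    """Draw one sample of size n and return the OLS slope estimate."""
    ad = [rng.uniform(0, 10) for _ in range(n)]
    sales = [BETA_0 + BETA_1 * a + rng.gauss(0, 5) for a in ad]
    mean_ad, mean_sales = statistics.fmean(ad), statistics.fmean(sales)
    sxy = sum((a - mean_ad) * (s - mean_sales) for a, s in zip(ad, sales))
    sxx = sum((a - mean_ad) ** 2 for a in ad)
    return sxy / sxx


rng = random.Random(1)
estimates = [estimate_slope(rng) for _ in range(1000)]

print(f"population slope:      {BETA_1:.3f}")
print(f"mean of estimates:     {statistics.fmean(estimates):.3f}")
print(f"std. dev. of estimates: {statistics.stdev(estimates):.3f}")
print(f"smallest / largest:    {min(estimates):.3f} / {max(estimates):.3f}")
```

The population slope is a single fixed number. Each sample yields a different estimate, and the estimates scatter around the population value.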

Q: Can we interpret the estimates for \(\beta_2\) as causal?

A: Not without making more assumptions and/or knowing more about the data-generating process.

Q: What is \(\varepsilon_i\)?

A: An individual’s random deviation/disturbance from the population parameters.

Population parameters are averages; individuals are rarely average.

Q: Which assumptions do we impose when estimating with OLS?

A:
- The relationship between sales and the explanatory variables is linear in parameters, and \(\varepsilon\) enters additively.
- The explanatory variables are exogenous, i.e., \(E[\varepsilon|X] = 0\).
- You’ve also typically assumed something along the lines of:
  \(E[\varepsilon_i] = 0\), \(E[\varepsilon_i^2] = \sigma^2\), \(E[\varepsilon_i \varepsilon_j] = 0\) for \(i \neq j\).
- And (maybe) \(\varepsilon_i\) is distributed normally.
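A short derivation suggests why the first two assumptions do so much work. It is sketched in matrix notation, which these slides do not otherwise use: assume \(y\) stacks the \(\text{Sales}_i\), \(X\) stacks a column of ones alongside \(\text{Ad}_i\), \(\text{Price}_i\), and \(\text{Comp}_i\), and \(X'X\) is invertible.

\[ \hat{\beta} = (X'X)^{-1} X'y = (X'X)^{-1} X'(X\beta + \varepsilon) = \beta + (X'X)^{-1} X'\varepsilon \]

\[ E[\hat{\beta} \mid X] = \beta + (X'X)^{-1} X' E[\varepsilon \mid X] = \beta \]

The first line uses linearity with an additive disturbance. The second uses exogeneity, \(E[\varepsilon|X] = 0\). If exogeneity fails, the last term does not vanish and \(\hat{\beta}\) is generally biased.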

Assumptions

How important can they be?

You’ve learned how powerful and flexible ordinary least squares (OLS) regression can be.

However, the results you learned required assumptions.

Real life often violates these assumptions.

EC421 asks “What happens when we violate these assumptions?”

Can we find a fix? (Especially: How/when is \(\beta\) causal?)
What happens if we don’t (or can’t) apply a fix?

OLS still does some amazing things, but you need to know when to be cautious, confident, or dubious.

Not everything is causal

But what is?

Suppose you estimate our sales model for your boss.

\[ \textcolor{#6A5ACD}{\text{Sales}}_i = \hat{\beta}_0 + \hat{\beta}_1 \textcolor{#e64173}{\text{Ad}}_i + \hat{\beta}_2 \textcolor{#e64173}{\text{Price}}_i + \hat{\beta}_3 \textcolor{#e64173}{\text{Comp}}_i + e_i \]

Can you trust that \(\hat{\beta}_2\) gives you the actual effect of price on sales?

You should be asking several questions…

Where does the variation in price come from?
- Is it random (exogenous)?
- Why are some products (or times) more expensive than others?
Whom do the data represent? Are they relevant to your setting?
How confident are you in your answer?
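The question about where price variation comes from can be made concrete with a simulation. The sketch below is a hypothetical setup, not anything from the course data: an unobserved product quality raises sales and, in one scenario, also raises price. All numbers and the names `slope` and `simulate` are assumptions chosen for illustration, and the model is reduced to a single regressor to keep the code short.

```python
import random

TRUE_PRICE_EFFECT = -2.0  # hypothetical causal effect of price on sales


def slope(x, y):
    """Simple-regression OLS slope: sample cov(x, y) / sample var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n, price_depends_on_quality, seed=421):
    """Simulate prices and sales; quality is never seen by the analyst."""
    rng = random.Random(seed)
    price, sales = [], []
    for _ in range(n):
        quality = rng.gauss(0, 1)     # unobserved driver of sales
        p = 10 + rng.gauss(0, 1)      # random (exogenous) price variation
        if price_depends_on_quality:
            p += quality              # better products are priced higher
        s = 100 + TRUE_PRICE_EFFECT * p + 3 * quality + rng.gauss(0, 1)
        price.append(p)
        sales.append(s)
    return price, sales


exog = slope(*simulate(20000, price_depends_on_quality=False))
endog = slope(*simulate(20000, price_depends_on_quality=True))

print(f"true effect of price:               {TRUE_PRICE_EFFECT:.2f}")
print(f"estimate, price set at random:      {exog:.2f}")
print(f"estimate, price depends on quality: {endog:.2f}")
```

When price varies at random, the slope lands near the assumed effect of -2. When price also moves with unobserved quality, the slope instead converges to \(-2 + 3 \cdot \text{Cov}(\text{Price}, \text{Quality}) / \text{Var}(\text{Price}) = -2 + 3 \cdot (1/2) = -0.5\) in this setup, which badly understates how much a price increase reduces sales.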

Econometrics

Applied econometrics, data science, and analytics require:

1. Intuition for the theory behind statistics/econometrics (assumptions, results, strengths, weaknesses).
2. Practical knowledge of how to apply theoretical methods to data.
3. Efficient methods for working with data (cleaning, aggregating, joining, visualizing).

This course aims to deepen your knowledge in each of these three areas.

Area 1: As before.
Areas 2-3: R

My “big-picture takeaways” (the intuition that I hope you form)

most interesting questions are causal;
selection into treatment dominates correlation (esp. cross-sectional);
measurement error can too;
causality comes from design, not from models/assumptions;
ask about the counterfactual;
non-stationary time series will lead you to bad conclusions;

quantifying uncertainty is just as important as the effect estimate;
consider which population your data represent;
the mean is only one of many ways to summarize a population;
don’t mistake mean reversion for treatment effects/heterogeneity;

many maps are just population maps;
graphs should clearly communicate a message… beautifully.

Next: R basics + (More) Metrics review(s)