class: center, middle, inverse, title-slide

.title[
# Introduction and Overview
]
.subtitle[
## EC 421, Set 1
]
.author[
### Edward Rubin
]

---
class: inverse, middle

# Prologue

---
# Why?

## Motivation

Let's start with a few __basic, general questions:__

--

1. What is the goal of econometrics?

2. Why do economists (or other people) study or use econometrics?

--

__One simple answer:__

Learn about the world using data.

--

- _Learn about the world_ = Raise, answer, and challenge questions, theories, assumptions.

- _data_ = Plural of datum.

---
# Why?

## Example

One might (reasonably) guess a company's .purple[sales] are a function of its .pink[advertising spending, price, and intensity of competitors].

--

So, one might hypothesize a model

`\(\color{#6A5ACD}{\text{Sales}} = f(\color{#e64173}{\text{Ad}, \text{Price}, \text{Comp}})\)`

where

- `\(\color{#e64173}{\text{Ad}}\)` represents dollars spent on advertising,
- `\(\color{#e64173}{\text{Price}}\)` is the product's price,
- `\(\color{#e64173}{\text{Comp}}\)` measures the intensity of the product's competition.

--

We expect that .purple[sales] `\(\uparrow\)` with .pink[advertising] and `\(\downarrow\)` with .pink[price] and .pink[competition].

---
class: clear, middle

But who needs to .grey[_expect_]? We can .orange[_test_] these hypotheses __using regression__.

.white[
_More importantly:_ Regression estimates the _size_ of these effects

- *How much* does an additional dollar of advertising increase sales?
- *How much* does a one-dollar increase in price decrease sales?
- *How much* does an additional competitor reduce sales?

These (causal) questions are central to efficient decision-making<br>and are the bread and butter of econometrics.
]

---
class: clear, middle

But who needs to .grey[_expect_]? We can .orange[_test_] these hypotheses __using regression__.

.grey-vlight[_More importantly:_] Regression estimates the .orange[_size_] of these effects

- *How much* does an additional dollar of .pink[advertising] increase .purple[sales]?
- *How much* does a one-dollar increase in .pink[price] decrease .purple[sales]?
- *How much* does an additional .pink[competitor] reduce .purple[sales]?

These (causal) questions are central to efficient decision-making<br>and are the bread and butter of econometrics.

---
layout: true
# Why?

## Example, cont.

__Regression model:__

$$
\color{#6A5ACD}{\text{Sales}}_i = \beta_0 + \beta_1 \color{#e64173}{\text{Ad}}_i + \beta_2 \color{#e64173}{\text{Price}}_i + \beta_3 \color{#e64173}{\text{Comp}}_i + \varepsilon_i
$$

---

With this basic regression model, we can test/estimate/quantify the (linear) relationship between .purple[sales] and .pink[advertising], .pink[price], and .pink[competition].

---
### (Review) Questions

--

- __Q:__ How do we interpret `\(\beta_1\)`?

--

- __A:__ An additional dollar of .pink[advertising] corresponds with a `\(\beta_1\)`-unit change in .purple[sales] (holding .pink[price] and .pink[competition] fixed).

---
### (Review) Questions

- __Q:__ Are the `\(\beta_k\)` terms population parameters or sample statistics?

--

- __A:__ Greek letters denote __population parameters__. Their estimates get hats, _e.g._, `\(\hat{\beta}_k\)`. Population parameters represent the __average__ behavior across the population.

---
### (Review) Questions

- __Q:__ Can we interpret the estimates for `\(\beta_2\)` as causal?

--

- __A:__ Not without making more assumptions and/or knowing more about the data-generating process.

---
### (Review) Questions

- __Q:__ What is `\(\varepsilon_i\)`?

--

- __A:__ An individual's random deviation/disturbance from the population parameters. Population parameters are averages; individuals are rarely average.

---
### (Review) Questions

- __Q:__ Which assumptions do we impose when estimating with OLS?

--

- __A:__
  - The relationship between .purple[sales] and the .pink[explanatory variables] is linear in parameters, and `\(\varepsilon\)` enters additively.
  - The .pink[explanatory variables] are __exogenous__, _i.e._, `\(E[\varepsilon|X] = 0\)`.
  - You've also typically assumed something along the lines of:<br> `\(E[\varepsilon_i] = 0\)`, `\(E[\varepsilon_i^2] = \sigma^2\)`, `\(E[\varepsilon_i \varepsilon_j] = 0\)` for `\(i \neq j\)`.
  - And (maybe) `\(\varepsilon_i\)` is distributed normally.

---
layout: false
# Assumptions

## How important can they be?

You've learned how **powerful and flexible** ordinary least squares (**OLS**) regression can be.

--

However, the results you learned required assumptions.

--

**Real life often violates these assumptions.**

---
class: clear, middle

EC 421 asks "**What happens when we violate these assumptions?**"

- Can we find a fix? (Especially: How/when is `\(\beta\)` *causal*?)

- What happens if we don't (or can't) apply a fix?

OLS still does some amazing things, but you need to know when to be **cautious, confident, or dubious**.

---
exclude: true
# Not everything is causal

<img src="slides_files/figure-html/spurious-1.svg" style="display: block; margin: auto;" />

---
# Not everything is causal

## But what _is_?

Suppose you estimate our sales model for your boss.

$$
\color{#6A5ACD}{\text{Sales}}_i = \hat{\beta}_0 + \hat{\beta}_1 \color{#e64173}{\text{Ad}}_i + \hat{\beta}_2 \color{#e64173}{\text{Price}}_i + \hat{\beta}_3 \color{#e64173}{\text{Comp}}_i + e_i
$$

Can you trust that `\(\hat{\beta}_2\)` gives you the actual effect of .pink[price] on .purple[sales]?

--

You should be asking several questions...

--

1. .it[Where] does the .it.pink[variation in price] come from?
   - Is it .it.pink[random] (.it[exogenous])?
   - .it.pink[Why] are some products (or times) .it.pink[more] expensive than others?

1. .it.orange[Whom] do the data represent? Are they .it.orange[relevant] to your setting?

1. How .it.orange[confident] are you in your answer?

---
# Econometrics

Applied econometrics, data science, and analytics require:

1. Intuition for the __theory__ behind statistics/econometrics<br>(assumptions, results, strengths, weaknesses).

1. Practical knowledge of how to __apply theoretical methods__ to data.

1. Efficient methods for __working with data__<br>(cleaning, aggregating, joining, visualizing).

--

__This course__ aims to deepen your knowledge in each of these three areas.

--

- 1: As before.
- 2–3: __R__

---
# Econometrics

My ".b[big-picture takeaways]" (the .it.slab[intuition] that I hope you form)

--

- .it[most] interesting questions are .slab.pink[causal];
- .slab.pink[selection into treatment] dominates correlation (esp. cross-sectional);
- .slab.pink[measurement error] can too;
- causality comes from .slab.pink[design], not from models/assumptions;
- ask about the .slab.pink[counterfactual];
- .slab.pink[non-stationary] time series will lead you to bad conclusions;

--

- quantifying .slab.purple[uncertainty] is just as important as the effect estimate;
- consider .slab.purple[.it[which] population] your data represent;
- the .slab.purple[mean] is only one of many ways to summarize a population;
- don’t mistake .slab.purple[mean reversion] for treatment effects/heterogeneity;

--

- many .slab.orange[maps] are just .it[population];
- .slab.orange[graphs] should clearly communicate a .it[message]... beautifully.

---
layout: false
class: clear, middle

.it.slab[Next:] .mono[R] basics + (More) Metrics review(s)
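---
# Appendix

## The sales regression, simulated

A minimal sketch of what "estimating the sales regression with OLS" involves, written in Python rather than the course's .mono[R]. It simulates hypothetical data in which the population coefficients are known and `\(E[\varepsilon|X] = 0\)` holds by construction, then solves the normal equations `\((X'X)\hat{\beta} = X'y\)`. The helper names (`simulate_sales`, `ols`), the coefficient values, the variable ranges, and the seed are all made up for illustration; this is not the course's own code.

```python
import random


def simulate_sales(n, beta, sigma, seed=421):
    """Draw n hypothetical observations from
    Sales = b0 + b1*Ad + b2*Price + b3*Comp + eps."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        ad = rng.uniform(0, 10)       # hypothetical ad spending
        price = rng.uniform(5, 15)    # hypothetical price
        comp = rng.randint(0, 5)      # hypothetical number of competitors
        eps = rng.gauss(0, sigma)     # drawn independently of X, so E[eps|X] = 0
        row = [1.0, ad, price, float(comp)]  # leading 1.0 is the intercept
        X.append(row)
        y.append(sum(b * x for b, x in zip(beta, row)) + eps)
    return X, y


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y via Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        # Zero out this column in every other row
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    # The left block of A is now diagonal, so each coefficient is a ratio
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical "true" population parameters: b0, b1 (Ad), b2 (Price), b3 (Comp)
beta_true = [50.0, 2.0, -3.0, -4.0]
X, y = simulate_sales(5000, beta_true, sigma=5.0)
beta_hat = ols(X, y)
print([round(b, 2) for b in beta_hat])
```

Because `eps` is drawn independently of Ad, Price, and Comp, the estimates should land close to `beta_true`. If price were instead set in response to demand shocks (so that it is correlated with `eps`), the same code would no longer recover `\(\beta_2\)`: that is the "where does the variation in price come from?" question.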