class: center, middle, inverse, title-slide

.title[
# Lecture .mono[000]
]
.subtitle[
## Why are we here?
]
.author[
### Edward Rubin
]

---
exclude: true

---
class: inverse, middle

# Admin

---
name: admin

# Admin

.hi-slate[In-class today]

- .note[Course website:] [https://github.com/edrubin/EC524W25/](https://github.com/edrubin/EC524W25/)
- [Syllabus](https://raw.githack.com/edrubin/EC524W25/master/syllabus/syllabus.pdf) (on website)

.hi-slate[.mono[TODO] list]

- .note[Today:] Sign up for [Kaggle](https://www.kaggle.com)
- Upcoming readings:
  - ISL Ch1–Ch2
  - [Prediction Policy Problems](https://www.aeaweb.org/articles?id=10.1257/aer.p20151023) by Kleinberg .it[et al.] (2015)
- .note[Assignment:] This week? (Getting to know prediction and Kaggle)

---
class: inverse, middle

# What's the goal?

---
layout: true

# What's the goal?

---
name: different

## What's different?

We've got a whole class on .hi-purple[prediction]. Why?

--

Up to this point, we've focused on causal .hi[identification/inference] of `\(\beta\)`, _i.e._,

`$$\color{#6A5ACD}{\text{Y}_{i}} = \text{X}_{i} \color{#e64173}{\beta} + u_i$$`

meaning we want an unbiased (consistent) and precise estimate `\(\color{#e64173}{\hat\beta}\)`.

--

With .hi-purple[prediction], we shift our focus to accurately estimating outcomes. In other words, how can we best construct `\(\color{#6A5ACD}{\hat{\text{Y}}_{i}}\)`?

---

## ... so?

So we want "nice"-performing estimates `\(\hat y\)` instead of `\(\hat\beta\)`.

.qa[Q] Can't we just use the same methods (_i.e._, OLS)?

--

.qa[A] It depends.

--

How well does your .hi[linear]-regression model approximate the underlying data? (And how do you plan to select your model?)

--

.note[Recall] Least-squares regression is a great .hi[linear] estimator.

---
layout: false
class: clear, middle

Data can be tricky.super[.pink[†]]—as can understanding many relationships.

.footnote[
<br>.pink[†] A typo previously had this slide saying "Data data be tricky", which I really like.
"Tricky" might mean nonlinear... or many other things...
]

---
layout: true
class: clear

---
exclude: true

---
name: graph-example

.white[blah]

<img src="slides_files/figure-html/plot points-1.svg" style="display: block; margin: auto;" />

---

.pink[Linear regression]

<img src="slides_files/figure-html/plot ols-1.svg" style="display: block; margin: auto;" />

---

.pink[Linear regression], .turquoise[linear regression] `\(\color{#20B2AA}{\left( x^4 \right)}\)`

<img src="slides_files/figure-html/plot ols poly-1.svg" style="display: block; margin: auto;" />

---

.pink[Linear regression], .turquoise[linear regression] `\(\color{#20B2AA}{\left( x^4 \right)}\)`, .purple[KNN (100)]

<img src="slides_files/figure-html/plot knn-1.svg" style="display: block; margin: auto;" />

---

.pink[Linear regression], .turquoise[linear regression] `\(\color{#20B2AA}{\left( x^4 \right)}\)`, .purple[KNN (100)], .orange[KNN (10)]

<img src="slides_files/figure-html/plot knn more-1.svg" style="display: block; margin: auto;" />

---

.pink[Linear regression], .turquoise[linear regression] `\(\color{#20B2AA}{\left( x^4 \right)}\)`, .purple[KNN (100)], .orange[KNN (10)], .slate[random forest]

<img src="slides_files/figure-html/plot rf-1.svg" style="display: block; margin: auto;" />

---
class: clear, middle

.note[Note] That example only had one predictor...

---
layout: false
name: tradeoffs

# What's the goal?

## Tradeoffs

In prediction, we constantly face many tradeoffs, _e.g._,

- .hi[flexibility] and .hi-slate[parametric structure] (and interpretability)
- performance in .hi[training] and .hi-slate[test] samples
- .hi[variance] and .hi-slate[bias]

--

As your economic training should have predicted, in each setting, we need to .b[balance the additional benefits and costs] of adjusting these tradeoffs.

--

Many machine-learning (ML) techniques/algorithms are designed to navigate these tradeoffs, but the practitioner (you) still needs to be careful.

---
name: more-goals

# What's the goal?
There are many reasons to step outside the world of linear regression...

--

.hi-slate[Multi-class] classification problems

- Rather than {0,1}, we need to classify `\(y_i\)` into one of `\(K\)` classes
- _E.g._, ER patients: {heart attack, drug overdose, stroke, nothing}

--

.hi-slate[Text analysis] and .hi-slate[image recognition]

- Comb through sentences (pixels) to glean insights from relationships
- _E.g._, detect sentiment in tweets or roof-top solar in satellite imagery

--

.hi-slate[Unsupervised learning]

- You don't know the groupings, but you think there are relevant groups
- _E.g._, classify spatial data into groups

---
layout: true
class: clear, middle

---
name: example-articles

<img src="images/ml-xray.png" width="90%" style="display: block; margin: auto;" />

---

<img src="images/ml-cars.png" width="90%" style="display: block; margin: auto;" />

---

<img src="images/ml-oil.png" width="90%" style="display: block; margin: auto;" />

---

<img src="images/ml-methane.png" width="90%" style="display: block; margin: auto;" />

---

<img src="images/ml-writing.png" width="90%" style="display: block; margin: auto;" />

---

<img src="images/ml-issues.jpeg" width="90%" style="display: block; margin: auto;" />

---

And of course... [**OpenAI**](https://openai.com/), [**ChatGPT**](https://openai.com/blog/chatgpt/), [**Copilot**](https://github.com/features/copilot), [**Claude**](https://claude.ai/), [**Midjourney**](https://www.midjourney.com/) ...

---
layout: false

# Takeaways?

Any main takeaways/thoughts from these examples?

--

.note[Mine]

- Interactions and nonlinearities likely matter
- .it[Engineering] features/variables can be important
- .it[Related:] We might not even know the features that matter
- Flexibility is huge—but we still want to avoid overfitting

---
class: clear, middle

.note[Next time] Start formal building blocks of prediction.
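---
class: clear

.note[Aside] The training-_vs._-test tradeoff above can be sketched in a few lines of code. This is a minimal illustration, not anything from the course materials: the simulated data (a sine curve plus noise) and the hand-rolled models are my own assumptions, chosen only to show that a maximally flexible estimator (1-nearest neighbor) fits the .hi[training] sample perfectly while a .hi[linear] fit cannot, and that the comparison can flip out of sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nonlinear data: y = sin(3x) + noise
n = 200
x = rng.uniform(-2, 2, n)
y = np.sin(3 * x) + rng.normal(0, 0.3, n)
x_tr, x_te = x[:150], x[150:]
y_tr, y_te = y[:150], y[150:]

def ols_fit(x, y):
    # Least-squares line: y = a + b*x
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def ols_predict(coef, x):
    return coef[0] + coef[1] * x

def knn_predict(x_train, y_train, x_new, k):
    # Average the k nearest training outcomes for each new point
    preds = np.empty_like(x_new)
    for i, x0 in enumerate(x_new):
        nearest = np.argsort(np.abs(x_train - x0))[:k]
        preds[i] = y_train[nearest].mean()
    return preds

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

coef = ols_fit(x_tr, y_tr)
# KNN(1) predicts each training point with itself, so its train MSE is 0
print("Train MSE, OLS:   ", mse(y_tr, ols_predict(coef, x_tr)))
print("Train MSE, KNN(1):", mse(y_tr, knn_predict(x_tr, y_tr, x_tr, 1)))
print("Test MSE,  OLS:   ", mse(y_te, ols_predict(coef, x_te)))
print("Test MSE,  KNN(1):", mse(y_te, knn_predict(x_tr, y_tr, x_te, 1)))
```

Training performance rewards flexibility without limit; only the held-out test sample reveals the cost side of the tradeoff.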
---
name: sources
layout: false

# Sources

Sources (articles) of images

- [Deep learning and radiology](https://www.smart2zero.com/news/algorithm-beats-radiologists-diagnosing-x-rays)
- [Parking lot detection](https://www.smart2zero.com/news/algorithm-beats-radiologists-diagnosing-x-rays)
- [.it[New Yorker] writing](https://www.newyorker.com/magazine/2019/10/14/can-a-machine-learn-to-write-for-the-new-yorker)
- [Oil surplus](https://www.wired.com/2015/03/orbital-insight/)
- [Methane leaks](https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-5P/Monitoring_methane_emissions_from_gas_pipelines)
- [Gender Shades](http://gendershades.org/overview.html)

---

# Table of contents

.col-left[
.small[

#### Admin

- <a href="#admin">Today and upcoming</a>

#### What's the goal?

- <a href="#different">What's different?</a>
- <a href="#graph-example">Graphical example</a>
- <a href="#tradeoffs">Tradeoffs</a>
- <a href="#more-goals">More goals</a>
- <a href="#example-articles">Examples</a>

#### Other

- <a href="#sources">Image sources</a>

]
]