class: center, middle, inverse, title-slide

.title[
# Lecture .mono[000]
]
.subtitle[
## Why are we here?
]
.author[
### Edward Rubin
]

---
exclude: true

---
class: inverse, middle

# Admin

---
name: admin

# Admin

.hi-slate[In-class today]

- .note[Course website:] [https://github.com/edrubin/EC524W25/](https://github.com/edrubin/EC524W25/)
- [Syllabus](https://raw.githack.com/edrubin/EC524W25/master/syllabus/syllabus.pdf) (on website)

.hi-slate[.mono[TODO] list]

- .note[Today:] Sign up for [Kaggle](https://www.kaggle.com)
- Upcoming readings:
  - ISL Ch1–Ch2
  - [Prediction Policy Problems](https://www.aeaweb.org/articles?id=10.1257/aer.p20151023) by Kleinberg .it[et al.] (2015)
- .note[Assignment:] This week? (Getting to know prediction and Kaggle)

---
class: inverse, middle

# What's the goal?

---
layout: true

# What's the goal?

---
name: different

## What's different?

We've got a whole class on .hi-purple[prediction]. Why?

--

Up to this point, we've focused on causal .hi[identification/inference] of `\(\beta\)`, _i.e._,

`$$\color{#6A5ACD}{\text{Y}_{i}} = \text{X}_{i} \color{#e64173}{\beta} + u_i$$`

meaning we want an unbiased (consistent) and precise estimate `\(\color{#e64173}{\hat\beta}\)`.

--

With .hi-purple[prediction], we shift our focus to accurately estimating outcomes. In other words, how can we best construct `\(\color{#6A5ACD}{\hat{\text{Y}}_{i}}\)`?

---

## ... so?

So we want "nice"-performing estimates `\(\hat y\)` instead of `\(\hat\beta\)`.

.qa[Q] Can't we just use the same methods (_i.e._, OLS)?

--

.qa[A] It depends.

--

How well does your .hi[linear]-regression model approximate the underlying data? (And how do you plan to select your model?)

--

.note[Recall] Least-squares regression is a great .hi[linear] estimator.

---
layout: false
class: clear, middle

Data can be tricky.super[.pink[†]]—as can understanding many relationships.

.footnote[
<br>.pink[†] A typo previously had this slide saying "Data data be tricky", which I really like.
"Tricky" might mean nonlinear... or many other things...
]

---
layout: true
class: clear

---
exclude: true

---
name: graph-example

.white[blah]

<img src="slides_files/figure-html/plot points-1.svg" style="display: block; margin: auto;" />

---

.pink[Linear regression]

<img src="slides_files/figure-html/plot ols-1.svg" style="display: block; margin: auto;" />

---

.pink[Linear regression], .turquoise[linear regression] `\(\color{#20B2AA}{\left( x^4 \right)}\)`

<img src="slides_files/figure-html/plot ols poly-1.svg" style="display: block; margin: auto;" />

---

.pink[Linear regression], .turquoise[linear regression] `\(\color{#20B2AA}{\left( x^4 \right)}\)`, .purple[KNN (100)]

<img src="slides_files/figure-html/plot knn-1.svg" style="display: block; margin: auto;" />

---

.pink[Linear regression], .turquoise[linear regression] `\(\color{#20B2AA}{\left( x^4 \right)}\)`, .purple[KNN (100)], .orange[KNN (10)]

<img src="slides_files/figure-html/plot knn more-1.svg" style="display: block; margin: auto;" />

---

.pink[Linear regression], .turquoise[linear regression] `\(\color{#20B2AA}{\left( x^4 \right)}\)`, .purple[KNN (100)], .orange[KNN (10)], .slate[random forest]

<img src="slides_files/figure-html/plot rf-1.svg" style="display: block; margin: auto;" />

---
class: clear, middle

.note[Note] That example only had one predictor...

---
layout: false
name: tradeoffs

# What's the goal?

## Tradeoffs

In prediction, we constantly face many tradeoffs, _e.g._,

- .hi[flexibility] and .hi-slate[parametric structure] (and interpretability)
- performance in .hi[training] and .hi-slate[test] samples
- .hi[variance] and .hi-slate[bias]

--

As your economic training should have predicted, in each setting, we need to .b[balance the additional benefits and costs] of adjusting these tradeoffs.

--

Many machine-learning (ML) techniques/algorithms are designed to navigate these tradeoffs, but the practitioner (you) still needs to be careful.

---
name: more-goals

# What's the goal?
There are many reasons to step outside the world of linear regression...

--

.hi-slate[Multi-class] classification problems

- Rather than {0,1}, we need to classify `\(y_i\)` into one of `\(K\)` classes
- _E.g._, ER patients: {heart attack, drug overdose, stroke, nothing}

--

.hi-slate[Text analysis] and .hi-slate[image recognition]

- Comb through sentences (pixels) to glean insights from relationships
- _E.g._, detect sentiment in tweets or roof-top solar in satellite imagery

--

.hi-slate[Unsupervised learning]

- You don't know the groupings, but you think there are relevant groups
- _E.g._, classify spatial data into groups

---
layout: true
class: clear, middle

---
name: example-articles

<img src="images/ml-xray.png" width="90%" style="display: block; margin: auto;" />

---

<img src="images/ml-cars.png" width="90%" style="display: block; margin: auto;" />

---

<img src="images/ml-oil.png" width="90%" style="display: block; margin: auto;" />

---

<img src="images/ml-methane.png" width="90%" style="display: block; margin: auto;" />

---

<img src="images/ml-writing.png" width="90%" style="display: block; margin: auto;" />

---

<img src="images/ml-issues.jpeg" width="90%" style="display: block; margin: auto;" />

---

And of course... [**OpenAI**](https://openai.com/), [**ChatGPT**](https://openai.com/blog/chatgpt/), [**Copilot**](https://github.com/features/copilot), [**Claude**](https://claude.ai/), [**Midjourney**](https://www.midjourney.com/) ...

---
layout: false

# Takeaways?

Any main takeaways/thoughts from these examples?

--

.note[Mine]

- Interactions and nonlinearities likely matter
- .it[Engineering] features/variables can be important
- .it[Related:] We might not even know the features that matter
- Flexibility is huge—but we still want to avoid overfitting

---
class: clear, middle

.note[Next time] Start formal building blocks of prediction.
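---
class: clear

.note[Aside] The training-_vs._-test tradeoff above can be sketched in a few lines of code. This is a minimal illustration, not anything from the course materials: the simulated data (a sine curve plus noise) and the hand-rolled models are my own assumptions, chosen only to show that a maximally flexible estimator (1-nearest neighbor) fits the .hi[training] sample perfectly while a .hi[linear] fit cannot, and that the comparison can flip out of sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nonlinear data: y = sin(3x) + noise
n = 200
x = rng.uniform(-2, 2, n)
y = np.sin(3 * x) + rng.normal(0, 0.3, n)
x_tr, x_te = x[:150], x[150:]
y_tr, y_te = y[:150], y[150:]

def ols_fit(x, y):
    # Least-squares line: y = a + b*x
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def ols_predict(coef, x):
    return coef[0] + coef[1] * x

def knn_predict(x_train, y_train, x_new, k):
    # Average the k nearest training outcomes for each new point
    preds = np.empty_like(x_new)
    for i, x0 in enumerate(x_new):
        nearest = np.argsort(np.abs(x_train - x0))[:k]
        preds[i] = y_train[nearest].mean()
    return preds

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

coef = ols_fit(x_tr, y_tr)
# KNN(1) predicts each training point with itself, so its train MSE is 0
print("Train MSE, OLS:   ", mse(y_tr, ols_predict(coef, x_tr)))
print("Train MSE, KNN(1):", mse(y_tr, knn_predict(x_tr, y_tr, x_tr, 1)))
print("Test MSE,  OLS:   ", mse(y_te, ols_predict(coef, x_te)))
print("Test MSE,  KNN(1):", mse(y_te, knn_predict(x_tr, y_tr, x_te, 1)))
```

Training performance rewards flexibility without limit; only the held-out test sample reveals the cost side of the tradeoff.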
---
name: sources
layout: false

# Sources

Sources (articles) of images

- [Deep learning and radiology](https://www.smart2zero.com/news/algorithm-beats-radiologists-diagnosing-x-rays)
- [Parking lot detection](https://www.smart2zero.com/news/algorithm-beats-radiologists-diagnosing-x-rays)
- [.it[New Yorker] writing](https://www.newyorker.com/magazine/2019/10/14/can-a-machine-learn-to-write-for-the-new-yorker)
- [Oil surplus](https://www.wired.com/2015/03/orbital-insight/)
- [Methane leaks](https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-5P/Monitoring_methane_emissions_from_gas_pipelines)
- [Gender Shades](http://gendershades.org/overview.html)

---

# Table of contents

.col-left[
.small[

#### Admin

- <a href="#admin">Today and upcoming</a>

#### What's the goal?

- <a href="#different">What's different?</a>
- <a href="#graph-example">Graphical example</a>
- <a href="#tradeoffs">Tradeoffs</a>
- <a href="#more-goals">More goals</a>
- <a href="#example-articles">Examples</a>

#### Other

- <a href="#sources">Image sources</a>

]
]