class: center, middle, inverse, title-slide .title[ # 01 - Course Overview ] .subtitle[ ## ml4econ, HUJI 2024 ] .author[ ### Itamar Caspi ] .date[ ### May 5, 2024 (updated: 2024-05-05) ] --- # An aside: about the structure of these slides - The course's slide decks are created using the [xaringan](https://slides.yihui.name/xaringan/#44) (/ʃæ.'riŋ.ɡæn/) R package and [Rmarkdown](https://rmarkdown.rstudio.com/). - Some slides include hidden comments. To view them, press __p__ on your keyboard <img src="figs/comments.gif" width="80%" style="display: block; margin: auto;" /> ??? Here is a comment --- class: title-slide-section-gray # Outline 1. [Logistics](#logistics) 2. [About the Course](#about) 3. [To Do List](#todo) --- class: title-slide-section-blue, center, middle name: logistics # Logistics --- # ml4econ GitHub repository The class's GitHub repository: [https://github.com/ml4econ](https://github.com/ml4econ) <img src="figs/repo.gif" width="80%" style="display: block; margin: auto;" /> --- # Posit Cloud workspace [Posit Cloud](https://posit.cloud/) is a hosted version of RStudio in the cloud that will make it easy for R and Python novices to learn data science and machine learning using R and Python. <img src="figs/posit.jpg" width="80%" style="display: block; margin: auto;" /> --- # People - __Itamar Caspi__ - email: [caspi.itamar@gmail.com](mailto:caspi.itamat@gmail.com) - homepage: [itamarcaspi.rbind.io](https://itamarcaspi.rbind.io/) - __Inbar Avni (TA)__ - email: [TBA]() * Meeting hours: after class/zoom, on demand. --- # Feedback Your continuous feedback is important! Please feel free to contact us by - email - in person - or open an issue in our discussion forum --- class: title-slide-section-blue, center, middle name: about # About the Course --- # Prerequisites - Advanced course in econometrics. - Some early experience with R (or another programming language) are a plus. --- # This course is .pull-left[ .big[__About__] How and when to apply ML methods in economics - estimate treatment effects. - prediction policy. - work with new types of data (e.g., text). To do that we will need to understand - what is ML? - how it relates to stuff you already know? - how it differs? ] .pull-right[ .big[__Not about__] - Cutting-edge ML techniques (e.g., generative AI) - Computational aspects (e.g., gradient descent) - Data wrangling (a.k.a. "feature engineering") - Distributed file systems (e.g., Hadoop, Spark) ] --- # Tentative schedule | Week | Topic | |:----------------------|:-------------------------------------------| | [**1**](#week-1) | Course Overview & ML Basics | | [**2**](#week-2) | Reproducibility and ML Workflow | | [**3**](#week-3) | Regression and Regularization | | [**4**](#week-4) | Classification | | [**5**](#week-5) | Non-parametrics | | [**6**](#week-6) | Unsupervised Learning | | [**7**](#week-7) | Text analysis | | [**8**](#week-7) | Causal Inference | | [**9**](#week-8) | Lasso and Average Treatment Effects | | [**10**](#week-9) | Trees and Heterogeneous Treatment Effects | | [**11**](#week-10) | Prediction Policy Problems | | [**12**](#week-11) | Large Language Models | > __NOTE__: This schedule can (and probably will) go through changes! --- # Readings on ML for economists > All materials and lecture notes will be available on the [class website](https://ml4econ.github.io/course-spring2024/lectures.html). .pull-left[ Please read the following excellent surveys: - [The impact of machine learning on economics](https://www.nber.org/chapters/c14009.pdf) Athey (2018) In _The Economics of Artificial Intelligence: An Agenda_. University of Chicago Press. - [Machine learning: an applied econometric approach](https://www.aeaweb.org/articles?id=10.1257/jep.31.2.87) Mullainathan and Spiess (2017) _Journal of Economic Perspectives_, 31(2), 87-106. ] .pull-right[ <img src="figs/susan_sendhil.png" width="100%" style="display: block; margin: auto;" /> ] --- # Readings on ML > All materials and lecture notes will be available on the [course repo](https://ml4econ.github.io/course-spring2023/lectures.html). .pull-left[ There are __no__ required textbooks. A couple of suggestions: - [An Introduction to Statistical Learning with Applications in R/Python (ISLR), 2 ed.](https://www.statlearning.com/) James, Hastie, Witten et al. (2013) __PDF available online__ - [The Elements of Statistical Learning (ELS)](https://hastie.su.domains/ElemStatLearn/download.html) Hastie, Tibshirani, and Friedman (2009) __PDF available online__ ] .pull-right[ <img src="figs/books.png" width="100%" style="display: block; margin: auto;" /> ] --- # Textbooks (optional) > All materials and lecture notes will be available on the [course repo](https://ml4econ.github.io/course-spring2023/lectures.html). .pull-left[ There are __no__ required textbooks. A couple of suggestions: - [Business Data Science by Matt Taddy](http://taddylab.com/BDS.html) __No free version available__ - [Econometrics by Bruce Hansen, Ch. 29](https://www.ssc.wisc.edu/~bhansen/econometrics/) __PDF available online__ ] .pull-right[ <img src="figs/bds.png" width="50%" style="display: block; margin: auto;" /> ] --- # More resources Can be found at our GitHub repo: [https://github.com/ml4econ/lecture-notes-2024/blob/master/resources.md](https://github.com/ml4econ/lecture-notes-2020/blob/master/resources.md) --- # Programming * Two of the most popular open-source programming languages for data science: - [<i class="fab fa-r-project"></i>](https://www.r-project.org/) - [<i class="fab fa-python"></i> Python](https://www.python.org/) * This course: Mostly R. * Why R? See presentation notes and the [FAQ section](https://ml4econ.github.io/course-spring2019/faq.html) of our class website. * We do encourage you to try out Python. However, I will only be able to provide limited support for Python users. Inbar on the other hand, will be able to provide more support. --- # Catching up with R [ChatGPT](https://chatgpt.com) is a great tool to learn R. [Posit Recipes](https://posit.cloud/learn/recipes) is a great (old school) resource to learn R. --- # Econometrics with R [Introduction to Econometrics with R (Hanck, Arnold, Gerber, and Schmelzer)](https://www.econometrics-with-r.org/index.html) <img src="figs/introR.png" width="25%" style="display: block; margin: auto;" /> --- # Large Language Models (LLMs) We encourage you to use [ChatGPT](https://openai.com/blog/chatgpt), [Claude](https://claude.ai/), or any other LLM in this course, as it is an **essential skill to acquire**. It is important you understand the (current) limitations of LLMs: - Prompt engineering is necessary for quality outcomes. - Always assume that it is wrong. - Acknowledge its use in assignments and explain what prompts were used. Three useful resources: - Follow [@emollick](https://twitter.com/emollick) (Ethan Mollick) - Read ["Generative AI for Economic Research: Use Cases and Implications for Economists"](https://www.aeaweb.org/articles?id=10.1257/jel.20231736) by Korinek (2023 JEL). **Share your discoveries with us and your classmates!** --- # Grading Assignments: - Submit 4 out of a total of 6 Problem sets. Two projects: - Kaggle prediction competition. - Conduct a replication study based on one of the datasets included in the [experimentdatar](https://itamarcaspi.github.io/experimentdatar/) package, or a paper of your choice. .center[ __GRADING:__ Assingments __20%__, kaggle __30%__, project __50%__. ] ??? * One of your tasks in this course will be a [Kaggle](https://www.kaggle.com) competition. In this competition, you will rely on the “Boston Housing Data” to train and test machine learning models learned in the course. In particular, you will be required to apply the tools introduced in the course in order to predict ... [website](https://www.kaggle.com/t/97eb0edcbe7c406882c7c067076bedd3). --- # Kaggle <img src="figs/kaggle.gif" width="100%" style="display: block; margin: auto;" /> --- # experimentdatar We will also make use of he [`experimentdatar`](https://itamarcaspi.github.io/experimentdatar/) data package that contains publicly available datasets that were used in Susan Athey and Guido Imbens' course ["Machine Learning and Econometrics"](https://www.aeaweb.org/conference/cont-ed/2018-webcasts) (AEA continuing Education, 2018). - You can install the **development** version from [GitHub](https://github.com/itamarcaspi/experimentdatar/) ```r # install.packages("devtools") devtools::install_github("itamarcaspi/experimentdatar") ``` - __EXAMPLE:__ Load the [`experimentdatar`](https://itamarcaspi.github.io/experimentdatar/) package and the `social` dataset: ```r library(experimentdatar) data(social) ``` - Tips: 1. Runnig `?social` privides variable definitions. 2. Running `dataDetails("social")` will open a link to the paper associated with `social`. --- class: title-slide-section-blue, center, middle name: todo # To Do List --- # Homework <i class="fas fa-check-square"></i> Download and install [Git](https://git-scm.com/downloads). <i class="fas fa-check-square"></i> Download and install [R and RStudio](https://posit.co/download/rstudio-desktop/). <i class="fas fa-check-square"></i> Create an account on [GitHub](http://github.com/) <i class="fas fa-check-square"></i> Download and install [GitHub Desktop](https://desktop.github.com/). --- class: .title-slide-final, center, inverse, middle # `slides %>% end()` [<i class="fa fa-github"></i> Source code](https://github.com/ml4econ/lecture-notes-2024/blob/master/01-overview/01-overview.Rmd) --- # References <a name=bib-athey2018the></a>[[1]](#cite-athey2018the) S. Athey. "The impact of machine learning on economics". In: _The Economics of Artificial Intelligence: An Agenda_. University of Chicago Press, 2018. [2] T. Hastie, R. Tibshirani, and J. Friedman. _The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition_. Springer, פבר. 2009. ISBN: 9780387848570. [3] G. James, T. Hastie, D. Witten, et al. _An Introduction to Statistical Learning: With Applications in R_. Springer Texts in Statistics. Springer London, Limited, 2013. ISBN: 9781461471370. [4] A. Korinek. "Language Models and Cognitive Automation for Economic Research". In: _NBER Working Paper_ 30957 (2023). [5] S. Mullainathan and J. Spiess. "Machine learning: an applied econometric approach". In: _Journal of Economic Perspectives_ 31.2 (2017), pp. 87-106.