class: center, middle, inverse, title-slide # Course Overview ### Itamar Caspi ### March 10, 2019 (updated: 2019-03-18) --- class: .title-slide-final, center, inverse, middle # 10-Year challenge .big[ __2009: ML = Maximum Likelihood__ __2019: ML = Machine Learning__ ] --- # An aside: about the structure of these slides - This slide deck was created using the R package [xaringan](https://slides.yihui.name/xaringan/#44) (/ʃæ.'riŋ.ɡæn/) and [Rmarkdown](https://rmarkdown.rstudio.com/). - Some slides include hidden comments. To view them, press __p__ on your keyboard <img src="figs/comments.gif" width="80%" style="display: block; margin: auto;" /> ??? Here is a comment --- class: title-slide-section-gray # Outline 1. [Logistics](#logistics) 2. [About the Course](#about) 3. [To Do List](#todo) --- class: title-slide-section-blue, center, middle name: logistics # Logistics --- # Class website The (unofficial) class website: [https://ml4econ.github.io/course-spring2019](https://ml4econ.github.io/course-spring2019/index.html) <img src="figs/homepage.gif" width="80%" style="display: block; margin: auto;" /> --- # Discussion forum We will use a [GitHub discussion repository](https://github.com/ml4econ/discussion-spring2019). To use it, you'll need to create a GitHub account and ask for an invitations from Itamar. <img src="figs/discussions.gif" width="80%" style="display: block; margin: auto;" /> ??? Google and [StackExchange](https://stackexchange.com/) are typically the first places to look for answers for computational related questions. It is safe to always assume that you are not the first one to encounter your problem. Nevertheless, we will also be using a private GitHub repository for class discussion. Please post there any questions or requests you may have that are relevant to all course participants, rather than emailing us. All of your classmates will benefit from the public discussions. You can access our GitHub discussion repository [here](https://github.com/ml4econ/discussion-spring2019), but before you do, make sure that you have created a GitHub account (see instruction [here](https://ml4econ.github.io/course-spring2019/git.html)) and that you have received a collaboration invitations from me (Itamar). Shoot me an email if you haven't received an invitation. --- # People - __Itamar Caspi__ - Head of Monetary Analysis Unit, Research Department, Bank of Israel. - email: [caspi.itamar@gmail.com](mailto:caspi.itamat@gmail.com) - homepage and blog: [https://itamarcaspi.rbind.io/](https://itamarcaspi.rbind.io/) - __Ariel Mansura__ - Head of Statistical Methodology Unit, Information and Statistics Department, Bank of Israel. - email: [ariel.mansura@boi.org.il](mailto:ariel.mansura@boi.org.il) * Meeting hours: after class, on demand. --- # Feedback This is the first time we run this course `\(\Rightarrow\)` your continuous feedback is important! Please contact us by - email - in person - or open an issue in our discussion forum --- class: title-slide-section-blue, center, middle name: about # About the Course --- # Prerequisites - Advanced course in econometrics. - Some experience with R (or another programming language) are a plus. --- # This course is .pull-left[ .big[__About__] How and when to apply ML methods in economics - estimate treatment effects - prediction policy - generate new data To do that we will need to understand - what is ML? - How it relates to stuff we already know? - How it differs? ] .pull-right[ .big[__Not about__] - Cutting-edge ML techniques (e.g., deep learning) - Computational aspects (e.g., gradient descent) - Data wrangling (a.k.a. "feature engeneering") - Distributed file systems (e.g., Hadoop, Spark) ] --- # Syllabus | Week | Who | Topic | |:----------------------|:--------------------|:---------------------------------------------------------------------------------| | [**1**](#week-1) | Itamar | Course Overview | | [**2**](#week-2) | Itamar | R, Rstudio, git, and GitHub | | [**3**](#week-3) | Ariel | Regression and K Nearest Neighbors| | [**4**](#week-4) | Ariel | Classification | | [**5**](#week-5) | Ariel | Dimenssionality Reduction | | [**6**](#week-6) | Ariel | Decision Trees and Random Forests| | [**7**](#week-7) | Ariel | Unsupervised Learning| | [**8**](#week-8) | Itamar | Prediction in Aid of Estimation I | | [**9**](#week-9) | Itamar | Prediction in Aid of Estimation II | | [**10**](#week-10) | Itamar | Prediction Policy Problems | | [**11**](#week-11) | Itamar | Text as Data | | [**12**](#week-12) | Itamar | TBA | > __NOTE__: This schedule can (and probably will) go through changes! --- # Readings on ML > All materials and lecture notes will be available on the [class website](https://ml4econ.github.io/course-spring2019/lectures.html). .pull-left[ There are __no__ required textbooks. A couple of suggestions: - [An Introduction to Statistical Learning with Applications in R (ISLR)](http://www-bcf.usc.edu/~gareth/ISL) James, Hastie, Witten, and Tibshirani (2013) __PDF available online__ - [The Elements of Statistical Learning (ELS)](http://statweb.stanford.edu/~tibs/ElemStatLearn) Hastie, Tibshirani, and Friedman (2009) __PDF available online__ ] .pull-right[ <img src="figs/books.png" width="100%" style="display: block; margin: auto;" /> ] --- # Readings on ML for economists > All materials and lecture notes will be available on the [class website](https://ml4econ.github.io/course-spring2019/lectures.html). .pull-left[ Read these excellent surveys: - [The impact of machine learning on economics](https://www.nber.org/chapters/c14009.pdf) Athey (2018) In _The Economics of Artificial Intelligence: An Agenda_. University of Chicago Press. - [Machine learning: an applied econometric approach](https://www.aeaweb.org/articles?id=10.1257/jep.31.2.87) Mullainathan and Spiess (2017) _Journal of Economic Perspectives_, 31(2), 87-106. ] .pull-right[ <img src="figs/susan_sendhil.png" width="100%" style="display: block; margin: auto;" /> ] --- # Programming language * Two of the most popular open-source programming languages for data science: - [<i class="fab fa-r-project"></i>](https://www.r-project.org/) - [<i class="fab fa-python"></i> Python](https://www.python.org/) * Our recommendation: R. * Why R? See presentation notes and the [FAQ section](https://ml4econ.github.io/course-spring2019/faq.html) of our class website. * We do encourage you to try out Python. However, we will be able to provide limited support for Python users. ??? __Why R (and not Python)?__ First things first, there is nothing that [R can do that Pyhton can't](https://www.quora.com/What-can-R-do-that-Python-cant) and [vice verca](https://www.quora.com/What-can-Python-do-that-R-cant). From our perspective, the main difference between the two is their purpose - Python is a general purpose programming language, whereas R was designed by statisticians (and a like) for statisticians. This means that R comes with thousands of built-in and user contributed libraries that help with statistical and econometric analysis. Python can do all that, but in many cases that are important to us, you'll need to code R procedures in Python from scratch. For instance, during this course we will make use of the [__causalTree__](https://github.com/susanathey/causalTree) and [__grf__](https://github.com/grf-labs/grf) R packages (created by Susan Athey et al.). These packages use trees and forests in order to estimate _heterogeneous_ treatment effects. __causalTree__ and __grf__ extend the popular __rpart__ R package, to deal with the task of of estimating causal effects. If you want to perform a causal tree analysis with Python, you'll have to program the whole thing by yourself. Though this could serve as a valuable exercise (and it even appears as an open [issue](https://github.com/grf-labs/grf/issues/257) in the __grf__ project repository), it is way beyond the scope of this course. --- # DataCamp in the classroom <img src="figs/datacamp1.gif" width="100%" style="display: block; margin: auto;" /> --- # Grading Assignments: - [DataCamp Classroom](https://www.datacamp.com/enterprise/machine-learning-for-economists): you will be assigned with specific courses that will teach you essential R programming skills. - Problem sets. Projects: - [Kaggle](https://www.kaggle.com/c/55750-machine-learning-for-economists-huji-2019#) prediction competition: predict median house value in Boston area. - Conduct a replication study based on one of the datasets included in the [experimentdatar](https://itamarcaspi.github.io/experimentdatar/) package. .center[ __GRADING:__ Assingments __20%__, project __40%__, final exam __40%__. ] ??? * DataCamp for the Classroom offer of high-quality, self-paced content and straightforward, easy-to-use interface for both students and instructors. In particular, enrolling to our ml4e DataCamp for the Classroom account will enable you 100% free access to all of DataCamp’s content, including their premium content. * One of your tasks in this course will be a [Kaggle](https://www.kaggle.com) competition. In this competition, you will rely on the “Boston Housing Data” to train and test machine learning models learned in the course. In particular, you will be required to apply the tools introduced in the course in order to predict the median house value based on a set of area specific housing market features. For more details, please visit the competition’s [website](https://www.kaggle.com/t/97eb0edcbe7c406882c7c067076bedd3). --- # Kaggle <img src="figs/kaggle.gif" width="100%" style="display: block; margin: auto;" /> --- # experimentdatar We will also make use of he [`experimentdatar`](https://itamarcaspi.github.io/experimentdatar/) data package that contains publicly available datasets that were used in Susan Athey and Guido Imbens' course ["Machine Learning and Econometrics"](https://www.aeaweb.org/conference/cont-ed/2018-webcasts) (AEA continuing Education, 2018). - You can install the **development** version from [GitHub](https://github.com/itamarcaspi/experimentdatar/) ```r # install.packages("devtools") devtools::install_github("itamarcaspi/experimentdatar") ``` - __EXAMPLE:__ Load the [`experimentdatar`](https://itamarcaspi.github.io/experimentdatar/) package and the `social` dataset: ```r library(experimentdatar) data(social) ``` - Tips: 1. Runnig `?social` privides variable definitions. 2. Running `dataDetails("social")` will open a link to the paper associated with `social`. --- class: title-slide-section-blue, center, middle name: todo # To Do List --- # Homework<sup>*</sup> <i class="fas fa-check-square"></i> Download and install [Git](https://git-scm.com/downloads). <i class="fas fa-check-square"></i> Download and install [R](https://cloud.r-project.org/) and [RStudio](https://www.rstudio.com/). <i class="fas fa-check-square"></i> Create an account on [GitHub](http://github.com/) <i class="fas fa-check-square"></i> Download and install [GitHub Desktop](https://desktop.github.com/). <i class="fas fa-check-square"></i> Create an account on [DataCamp](https://www.datacamp.com/home) and ask Itamar to invite you to [DataCamp Classroom](https://www.datacamp.com/enterprise/machine-learning-for-economists). <i class="fas fa-check-square"></i> Create an account on [Kaggle](www.kaggle.com) and ask Itamar to invite you the course's [prediction competition](https://www.kaggle.com/t/97eb0edcbe7c406882c7c067076bedd3). .footnote[ [*] Please consult the [Guides](https://ml4econ.github.io/course-spring2019/RnRStudio.html) section in our course's website. ] --- class: .title-slide-final, center, inverse, middle # `slides %>% end()` [<i class="fa fa-github"></i> Source code](https://github.com/ml4econ/notes-spring2019/tree/master/01-overview) --- # References <a name=bib-athey2018the></a>[[1]](#cite-athey2018the) S. Athey. "The impact of machine learning on economics". In: _The Economics of Artificial Intelligence: An Agenda_. University of Chicago Press, 2018. [2] T. Hastie, R. Tibshirani and J. Friedman. _The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition_. Springer, Feb. 2009. ISBN: 9780387848570. [3] G. James, T. Hastie, D. Witten, et al. _An Introduction to Statistical Learning: With Applications in R_. Springer Texts in Statistics. Springer London, Limited, 2013. ISBN: 9781461471370. [4] S. Mullainathan and J. Spiess. "Machine learning: an applied econometric approach". In: _Journal of Economic Perspectives_ 31.2 (2017), pp. 87-106.