class: center, middle, inverse, title-slide .title[ # ScPoEconometrics ] .subtitle[ ## Introduction ] .author[ ### Mylène Feuillade, Gustave Kenedi, Florian Oswald and Junnan He ] .date[ ### Sciences Po Paris 2023-02-09 ] --- layout: true <div class="my-footer"><img src="../img/logo/ScPo-shield.png" style="height: 60px;"/></div> --- # Welcome to ScPoEconometrics! -- - In this course you will learn the core tools of ***econometrics***. -- - You will also learn to use the `R` programming language! -- ## What is *econometrics*? - A set of ***techniques and methods*** to answer (economic) questions with ***data***. - Some examples! --- # Answering Important Questions with Econometrics [<ru-blockquote> Does immigration *lead to* lower wages and/or higher unemployment for locals? </ru-blockquote>](http://davidcard.berkeley.edu/papers/mariel-impact.pdf) -- [<ru-blockquote> Does raising the minimum wage *reduce* employment for the low-skilled? </ru-blockquote>](http://davidcard.berkeley.edu/papers/njmin-aer.pdf) -- [<ru-blockquote> Does getting a college degree *afford* higher wages? </ru-blockquote>](http://davidcard.berkeley.edu/papers/causal_educ_earnings.pdf) -- [<ru-blockquote> Do higher public debt levels *lead to* lower economic growth? </ru-blockquote>](https://www.imf.org/external/pubs/ft/wp/2014/wp1434.pdf) -- [<ru-blockquote> Does the neighborhood you grew up in have an *impact* on your life outcomes? </ru-blockquote>](https://academic.oup.com/qje/article/133/3/1107/4850660) --- # Causality * Notice that ***many other factors could have caused*** each of the outcomes mentioned. -- * Often, we'll want to focus on the ***causal impact*** of just one of these factors (immigration, minimum wage, education ,etc.) -- * Econometrics is about spelling out ***conditions*** under which we can ***claim to measure causal relationships***. * We will encounter the most basic of those conditions, and talk about some potential pitfalls. * ["Credibility Revolution"](https://www.aeaweb.org/articles?id=10.1257/jep.24.2.3) in econometrics over the past 30 years ([2022 Economics Nobel](https://www.nobelprize.org/prizes/economic-sciences/2021/press-release/) awarded to some of the main protagonists of this "revolution") --- layout: false class: title-slide-section-red, middle # R --- layout: true <div class="my-footer"><img src="../img/logo/ScPo-shield.png" style="height: 60px;"/></div> --- ## What is `R`? `R` is a __programming language__ with powerful statistical and graphic capabilities. -- ## Why are we using `R`?<sup>1</sup> .footnote[ [1]: This list has been inspired by [Ed Rubin's](https://github.com/edrubin/EC421S19). <span style="visibility:hidden">[2]: Learning `R` definitely requires time and effort but it's worth it, trust me! .</span> ] -- 1. `R` is __free__ and __open source__—saving both you and the university 💰💵💰. -- 1. `R` is very __flexible and powerful__—adaptable to nearly any task, (data cleaning, data visualization, econometrics, spatial data analysis, machine learning, web scraping, etc.) -- 1. `R` has a vibrant, [thriving online community](https://stackoverflow.com/questions/tagged/r) that will (almost) always have a solution to your problem. -- 1. If you put in the work<sup>2</sup>, you will come away with a __very valuable and useful__ tool. .footnote[ <span style="visibility:hidden">[1]: This list has been inspired by [Ed Rubin's](https://github.com/edrubin/EC421S19).</span> [2]: Learning `R` definitely requires time and effort but it's worth it, trust me! ] <!-- --- --> <!-- # Why can't we just use Excel? --> <!-- Many reasons but here are just a few: --> <!-- -- --> <!-- - Not ***reproducible***. --> <!-- -- --> <!-- - Not straightforward to ***merge*** datasets together. --> <!-- -- --> <!-- - Very fastidious to ***clean*** data. --> <!-- -- --> <!-- - Limited to ***small datasets***. --> <!-- -- --> <!-- - Not designed for proper ***econometric analyses***, maps, complex visualisations, etc. --> --- layout: false class: title-slide-section-red, middle # First Taste of R --- layout: true <div class="my-footer"><img src="../img/logo/ScPo-shield.png" style="height: 60px;"/></div> --- # In Practice: Data Wrangling -- * You will spend a lot of time preparing data for further analysis. -- * The `gapminder` dataset contains data on life expectancy, GDP per capita and population by country between 1952 and 2007. * Suppose we want to know the average life expectancy and average GDP per capita for each continent in each year. -- * We need to group the data by continent *and* year, then compute the average life expectancy and average GDP per capita -- .pull-left[ ```r # load gapminder package library(gapminder) # load the dataset from the gapminder package data(gapminder, package = "gapminder") # show first 4 lines of this dataframe head(gapminder,n = 4) ``` ] .pull-right[ ``` ## # A tibble: 4 × 6 ## country continent year lifeExp pop gdpPercap ## <fct> <fct> <int> <dbl> <int> <dbl> ## 1 Afghanistan Asia 1952 28.8 8425333 779. ## 2 Afghanistan Asia 1957 30.3 9240934 821. ## 3 Afghanistan Asia 1962 32.0 10267083 853. ## 4 Afghanistan Asia 1967 34.0 11537966 836. ``` ] --- # In Practice: Data Wrangling * There are always several ways to achieve a goal. (As in life 😁) * Here we will only focus on the `dplyr` way: ```r # compute the required statistics gapminder_dplyr <- gapminder %>% group_by(continent, year) %>% summarise(count = n(), mean_lifeexp = mean(lifeExp), mean_gdppercap = mean(gdpPercap)) ``` ```r # show first 5 lines of the new data head(gapminder_dplyr, n = 5) ``` ``` ## # A tibble: 5 × 5 ## # Groups: continent [1] ## continent year count mean_lifeexp mean_gdppercap ## <fct> <int> <int> <dbl> <dbl> ## 1 Africa 1952 52 39.1 1253. ## 2 Africa 1957 52 41.3 1385. ## 3 Africa 1962 52 43.3 1598. ## 4 Africa 1967 52 45.3 2050. ## 5 Africa 1972 52 47.5 2340. ``` --- # Visualisation .pull-left[ * Now we could *look* at the result in `gapminder_dplyr`, or compute some statistics from it. * Nothing beats a picture, though: ```r ggplot(data = gapminder_dplyr, mapping = aes(x = mean_lifeexp, y = mean_gdppercap, color = continent, size = count)) + geom_point(alpha = 1/2) + labs(x = "Average life expectancy", y = "Average GDP per capita", color = "Continent", size = "Nb of countries") + theme_bw() ``` ] .pull-right[ <img src="chapter_intro_files/figure-html/gampminder_plot-1.svg" style="display: block; margin: auto;" /> ] --- # Animated Plotting 👌 <sup>1</sup> .center[] .footnote[ [1]: This animation is taken from [Ed Rubin](https://raw.githack.com/edrubin/EC421S19/master/LectureNotes/01Intro/01_intro.html#40). ] --- layout: false class: title-slide-section-red, middle # R 101: Here Is Where You Start --- layout: true <div class="my-footer"><img src="../img/logo/ScPo-shield.png" style="height: 60px;"/></div> --- # Start your `RStudio`! ## First Glossary of Terms * `R`: a programming language. * `RStudio`: an integrated development environment (IDE) to work with `R`. -- * *command*: user input (text or numbers) that `R` *understands*. * *script*: a list of commands collected in a text file, each separated by a new line, to be run one after the other. -- * To run a script, you need to highlight the relevant code lines and hit `Ctrl`+`Enter` (Windows) or `Cmd`+`Enter` (Mac). --- # `RStudio` Layout <img src="chapter_intro_files/figure-html/rstudio.png" width="600px" style="display: block; margin: auto;" /> --- # R as a Calculator * You can use the `R` console like a calculator * Just type an arithmetic operation after `>` and hit `Enter`! -- * Some basic arithmetic first: ```r 4 + 1 ``` ``` ## [1] 5 ``` ```r 8 / 2 ``` ``` ## [1] 4 ``` * Great! What about this? ```r 2^3 ``` ``` ## [1] 8 ``` ```r # by the way: this is a comment! R therefore disregards it ``` --- class: inverse # Task 1 ```r countdown(minutes = 5, top = 0) ```
−
+
05
:
00
1. Create a new R script (File `\(\rightarrow\)` New File `\(\rightarrow\)` R Script). Save it somewhere as `lecture_intro.R`. 1. Type the following code in your script and run it. To run the code press `Ctrl` or `Cmd` + `Enter` (you can either highlight the code or just put your cursor at the end of the line) ```r 4 * 8 ``` 1. Type the following code in your script and run it. What happens if you only run the first line of the code? ```r x = 5 # equivalently x <- 5 x ``` Congratulations, you have created your first `R` "object"! Everything is an object in R! Objects are assigned using `=` or `<-`. 1. Create a new object named `x_3` to which you assign the cube of `x`. Note that to assign you need to use `=` or `<-`. Use code to compute the cube, not a calculator. --- # Where to get Help? .pull-left[ `R` built-in `help`: ```r ?log #? in front of function help(lm) # help() is equivalent ??plot # get all help on keyword "plot" ``` ] -- .pull-right[ In practice:  ] --- # Collaborate! <img src="chapter_intro_files/figure-html/gator_error.jpg" alt="Gator collaboration" width="900" style="display: block; margin-left: auto; margin-right: auto"/> --- # R Packages * `R` users contribute add-on data and functions as *packages* * Installing packages is easy! Just use the `install.packages` function: ```r install.packages("ggplot2") ``` * To *use* the contents of a packge, we must load it from our library using `library`: ```r library(ggplot2) ``` --- # Vectors .pull-left[ * The `c` function creates vectors, i.e. *one-dimensional arrays*. ```r c(1, 3, 5, 7, 8, 9) ``` ``` ## [1] 1 3 5 7 8 9 ``` * Coercion to unique types: ```r (v <- c(42, "Statistics", TRUE)) ``` ``` ## [1] "42" "Statistics" "TRUE" ``` ] -- .pull-right[ * Creating a *range* ```r 1:10 ``` ``` ## [1] 1 2 3 4 5 6 7 8 9 10 ``` * get vector elements with square bracket operator `[index]`: ```r v[c(1,3)] ``` ``` ## [1] "42" "TRUE" ``` ] --- # `data.frame`'s `data.frame`s represent **tabular data**. Like spreadsheets. ```r example_data = data.frame(x = c(1, 3, 5, 7), y = c(rep("Hello", 3), "Goodbye"), z = c("one", 2, "three", 4)) example_data ``` ``` ## x y z ## 1 1 Hello one ## 2 3 Hello 2 ## 3 5 Hello three ## 4 7 Goodbye 4 ``` * A `data.frame` has 2 dimensions: *rows* and *columns*. Like a *matrix*. Can get elements with `[row_index,col_index]`. * In practice, you will be importing files that contain the data into `R` rather than creating `data.frame`s by hand. --- class: inverse # Task 2
−
+
07
:
00
1. Find out (using `help()` or google) how to import a .csv file. Do NOT use the "Import Dataset" button, nor install a package. 1. Import [gun_murders.csv](https://www.dropbox.com/s/zuk0qcfm3kyzs4e/gun_murders.csv?dl=1)<sup>1</sup> in a new object `murders`. This file contains data on gun murders by US state in 2010. (Hint: objects are created using `=` or `<-`). 1. Ensure that `murders` is a data.frame by running: ```r class(murder) # check class ``` 1. Find out what variables are contained in `murders` by running: ```r names(murders) # obtain variable names ``` 1. View the contents of `murders` by clicking on `murders` in your workspace. What does the `total` variable correspond to? .footnote[ [1]: This dataset is taken from the `dslabs` package. ] --- # `data.frame`s Useful functions to describe a dataframe: ```r str(murders) # `str` describes structure of any R object ``` ``` ## 'data.frame': 51 obs. of 5 variables: ## $ state : chr "Alabama" "Alaska" "Arizona" "Arkansas" ... ## $ abb : chr "AL" "AK" "AZ" "AR" ... ## $ region : chr "South" "West" "West" "South" ... ## $ population: int 4779736 710231 6392017 2915918 37253956 5029196 3574097 897934 601723 19687653 ... ## $ total : int 135 19 232 93 1257 65 97 38 99 669 ... ``` -- ```r names(murders) # column names ``` ``` ## [1] "state" "abb" "region" "population" "total" ``` -- ```r nrow(murders) # number of rows ``` ``` ## [1] 51 ``` -- ```r ncol(murders) # number of columns ``` ``` ## [1] 5 ``` --- # Accessing `data.frame` Columns * To extract one column **as a vector** we can use the `$` operator (as in `murders$state`), or the square bracket operator `[which_index]` with name or position index: ```r first5 <- murders[1:5, ] # take first 5 states only first5$state # extract with $ operator ``` ``` ## [1] "Alabama" "Alaska" "Arizona" "Arkansas" "California" ``` ```r first5[ ,"state"] # extract with column name ``` ``` ## [1] "Alabama" "Alaska" "Arizona" "Arkansas" "California" ``` ```r first5[ ,1] # get first column ``` ``` ## [1] "Alabama" "Alaska" "Arizona" "Arkansas" "California" ``` -- .pull-left[ * Check `class` of an object: ```r class(murders) ``` ``` ## [1] "data.frame" ``` ] -- .pull-right[ * `typeof` gives the R-internal data type: ```r typeof(murders) ``` ``` ## [1] "list" ``` ] --- # Subsetting `data.frames` * Subsetting a data.frame: `murders[row condition, column number]` or `murders[row condition, "column name"]` ```r # Only keep states with over 500 gun murders and keep only the "state" and "total" variables murders[murders$total > 500, c("state", "total")] ``` ``` ## state total ## 5 California 1257 ## 10 Florida 669 ## 33 New York 517 ## 44 Texas 805 ``` ```r # Only keep California and Texas and keep only the "state" and "total" variables murders[murders$state %in% c("California", "Texas"), c("state", "total")] ``` ``` ## state total ## 5 California 1257 ## 44 Texas 805 ``` --- class: inverse # Task 3
−
+
10
:
00
1. How many observations are there in `murders`? 1. How many variables? What are the data types of each variable? 1. Remember that the colon operator `1:10` is just short for *construct a sequence from `1` to `10`* (i.e. 1, 2, 3, etc). Create a new object `murders_2` containing the rows 10 to 25 of `murders`. 1. Create a new object `murders_3` which only contains the columns `state` and `total`. (Recall that `c` creates vectors.) 1. Create a `total_percap` variable equal to the number of murders per 10,000 inhabitants by running the following code. ```r murders$total_percap = (murders$total / murders$population) * 10000 ``` Congratulations, you've created your first variable! Click on the `murders` object to see the new variable. --- layout: false class: title-slide-section-red, middle # Course Details --- layout: true <div class="my-footer"><img src="../img/logo/ScPo-shield.png" style="height: 60px;"/></div> --- # This Course - Teach you the basics of ***linear regression***, ***statistical inference*** and ***impact evaluation***. -- - Equip you with a framework to think more deeply about ***causality***. -- - Introduce you to the `R` software environment. -- - ⚠️ This is *not* a course about `R`. -- ## Grading 1. There will be ***five quizzes*** on Moodle roughly every two weeks `\(\rightarrow\)` 40% -- 1. There will be ***two take home exams / case studies*** `\(\rightarrow\)` 60% -- 1. There will be _no_ final exam 😅 --- # Syllabus 🤞 .pull-left[ Lecture 1 (today): ***Introduction*** *Quiz 1 (after lecture 2)* Lectures 2 and 3: ***Tidying, Visualising and Summarising Data*** *Quiz 2* Lecture 4: ***Simple Linear Regression*** Lecture 5: ***Introduction to Causality*** *Midterm Project* Lecture 6: ***Multiple Linear Regression*** Lecture 7: ***Linear Regression Extensions*** ] .pull-right[ *Quiz 3* Lecture 8: ***Sampling*** Lecture 9: ***Confidence Intervals and Hypothesis Testing*** *Quiz 4* Lecture 10: ***Statistical Inference*** Lecture 11: ***Regression Discontinuity*** *Quiz 5* Lecture 12: ***Recap*** *Final Project* ] --- # Course Materials and Useful Resources [Book](https://scpoecon.github.io/ScPoEconometrics/) | [Slides](https://github.com/ScPoEcon/ScPoEconometrics-Slides) | [Reading Suggestions](https://github.com/ScPoEcon/ScPoEconometrics-Slides/blob/master/syllabus.md) | [Interactive R Apps](https://github.com/ScPoEcon/ScPoApps) -- | -- | -- -- .pull-left[ ## Econometrics - [*Mastering Metrics* by Angrist and Pischke](https://uspc-spo.primo.exlibrisgroup.com/permalink/33USPC_SPO/1d4eftp/alma991003947949705808) - [*Introductory Econometrics: A Modern Approach* by Wooldridge](https://uspc-spo.primo.exlibrisgroup.com/permalink/33USPC_SPO/mgsi7h/alma991003442469705808) - [*Introduction to Econometrics* by Stock and Watson](https://uspc-spo.primo.exlibrisgroup.com/permalink/33USPC_SPO/mgsi7h/alma991003436429705808) - [*Causal Inference: The Mixtape* by Cunningham](https://mixtape.scunning.com/) - [Ben Lambert's youtube channel](https://www.youtube.com/user/SpartacanUsuals) ] .pull-right[ ## Metrics and `R` - [ModernDive](https://moderndive.com/) - [Introduction to Econometrics with R](https://www.econometrics-with-r.org/) - [R for Data Science](https://r4ds.had.co.nz/) - [Awesome R Learning Resources](https://github.com/iamericfletcher/awesome-r-learning-resources) ] --- # Course Policies *Be nice. Be honest. Don't cheat.* -- (BTW you should apply these principles outside of the classroom as well ***#lifeadvice*** 😉) -- .pull-left[ ***Slack*** We will ***exclusively*** use Slack for our interactions. Please ***DO NOT*** contact me by email (unless there's a legitimate reason to). I will be checking Slack sparingly so please help one another! ] -- .pull-right[ ***Class conduct and expectations*** *Late work:* Won't be accepted unless you have a very good reason. *Cheating:* You will get 0. Just don't cheat, it's honestly not worth it. *Work in groups:* You can/should work in groups of 2-3 on the quizzes. ] --- # Mental Health > ***Pay attention to your mental health in the same way you would your physical health.*** -- Seeking help is a sign of perceptive self-awareness, [not weakness](https://www.nytimes.com/2020/09/24/well/mind/cord-jefferson-emmy-black-mental-health.html). -- Don't hesitate to ***reach out*** to your friends, family, any professor (including myself!) if you feel the need. ***Sciences Po*** offers free and confidential [psychological support services](https://www.sciencespo.fr/students/en/living/health/health-centre) with professionals. --- # Last But Not Least: Class Rep .center[] --- class: title-slide-final, middle background-image: url(../img/logo/ScPo-econ.png) background-size: 250px background-position: 9% 19% # SEE YOU NEXT WEEK! | | | | :--------------------------------------------------------------------------------------------------------- | :-------------------------------- | | <a href="https://github.com/ScPoEcon/ScPoEconometrics-Slides">.ScPored[<i class="fa fa-link fa-fw"></i>] | Slides | | <a href="https://scpoecon.github.io/ScPoEconometrics">.ScPored[<i class="fa fa-link fa-fw"></i>] | Book | | <a href="http://twitter.com/ScPoEcon">.ScPored[<i class="fa fa-twitter fa-fw"></i>] | @ScPoEcon | | <a href="http://github.com/ScPoEcon">.ScPored[<i class="fa fa-github fa-fw"></i>] | @ScPoEcon |