class: center, middle, inverse, title-slide

.title[
# Econometrics
]
.subtitle[
## Introduction
]
.author[
### Florian Oswald
]
.date[
### UniTo ESOMAS 2025-09-27
]

---
layout: true

<div class="my-footer"><img src="../img/logo/unito-shield.png" style="height: 60px;"/></div>

---

# Welcome to Econometrics @ ESOMAS UniTo!

## Team

.pull-left[

* My name is Florian Oswald, I'm a Professor at ESOMAS. Check out my [website](https://floswald.github.io)!

* I work on urban, macro and IO topics.

* Our TA this year is [Kacper Krasowski](https://kacperkrasowski.github.io), PhD student at Collegio Carlo Alberto.

]

--

.pull-right[

* I do a lot of computation (who doesn't). I like `R`, `python` and [`julia`](https://julialang.org) - I teach computational econ to our PhD students.

* I profited *a lot* from the open source software (OSS) community

* OSS is key to reproducible research. I see that every day as [Data Editor](https://jpedataeditor.github.io)

* I try to use and teach my students tools which enable greater reproducibility.

]

---

# Welcome to Econometrics @ ESOMAS UniTo!

- In this course you will learn the core tools of ***econometrics***.

--

- You will also learn to use the `R` programming language!

--

## What is *econometrics*?

- A set of ***techniques and methods*** to answer (economic) questions with ***data***.

- Some examples!

---

# Answering Important Questions with Econometrics

<!-- -- -->

<!-- [<ru-blockquote> -->
<!-- Does refugee migration *impact* Electoral Outcomes? -->
<!-- </ru-blockquote>](https://academic.oup.com/restud/article-abstract/86/5/2035/5112970) -->

--

[<ru-blockquote>
Does raising the minimum wage *reduce* employment for the low-skilled?
</ru-blockquote>](http://davidcard.berkeley.edu/papers/njmin-aer.pdf)

--

[<ru-blockquote>
Does mandating a 40% representation of each gender on the board of public limited liability companies increase the number of women in top jobs?
</ru-blockquote>](https://academic.oup.com/restud/article-abstract/86/1/191/5042274)

--

[<ru-blockquote>
Does the neighborhood you grew up in have an *impact* on your life outcomes?
</ru-blockquote>](https://academic.oup.com/qje/article/133/3/1107/4850660)

--

[<ru-blockquote>
Does giving a work permit to immigrants *cause* them to commit fewer crimes?
</ru-blockquote>](https://www.aeaweb.org/articles?id=10.1257/aer.20150355)

---

# Causality

* Notice that ***many other factors could have caused*** each of the outcomes mentioned.

--

* Often, we'll want to focus on the ***causal impact*** of just one of these factors (immigration, minimum wage, education, etc.)

--

* Econometrics is about spelling out ***conditions*** under which we can ***claim to measure causal relationships***.

* We will encounter the most basic of those conditions, and talk about some potential pitfalls.

* ["Credibility Revolution"](https://www.aeaweb.org/articles?id=10.1257/jep.24.2.3) in econometrics over the past 30 years ([2021 Economics Nobel](https://www.nobelprize.org/prizes/economic-sciences/2021/press-release/) awarded to some of the main protagonists of this "revolution")

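---

# Causality: A Simulated Illustration

A minimal sketch of why "other factors" matter, using made-up data. The variables `z`, `x`, `y` and all coefficients are hypothetical and chosen only for illustration; `lm` fits a linear regression, a tool this course introduces later.

```r
# Hypothetical simulation: names and numbers are invented for illustration only.
set.seed(1)
n <- 10000
z <- rnorm(n)              # an unobserved "other factor"
x <- 0.8 * z + rnorm(n)    # the factor we care about, partly driven by z
y <- 2 * z + rnorm(n)      # the outcome depends on z only: the true effect of x is zero

naive      <- coef(lm(y ~ x))["x"]       # ignores z
controlled <- coef(lm(y ~ x + z))["x"]   # holds z fixed

naive        # close to 1, although x has no causal effect on y
controlled   # close to 0
```

In this simulated world `x` does nothing to `y`, yet the naive comparison suggests a strong relationship because `z` drives both.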
---

# Welcome to the Post-Truth Age - Bullshit Rules the World

.pull-left[

- "Alternative facts" [Kellyanne Conway, Meet the Press, January 22, 2017]

]

.pull-right[

]

---

# Welcome to the Post-Truth Age - Bullshit Rules the World

.pull-left[

- "Alternative facts" [Kellyanne Conway, Meet the Press, January 22, 2017]

- "Trade wars are good and easy to win" [Donald Trump, 2018]

]

.pull-right[

]

---

# Welcome to the Post-Truth Age - Bullshit Rules the World

.pull-left[

- "Alternative facts" [Kellyanne Conway, Meet the Press, January 22, 2017]

- "Trade wars are good and easy to win" [Donald Trump, 2018]

- "The concept of global warming was created by and for the Chinese" [Donald Trump, Twitter, 2012]

]

.pull-right[

]

---

# Welcome to the Post-Truth Age - Bullshit Rules the World

.pull-left[

- "Alternative facts" [Kellyanne Conway, Meet the Press, January 22, 2017]

- "Trade wars are good and easy to win" [Donald Trump, 2018]

- "The concept of global warming was created by and for the Chinese" [Donald Trump, Twitter, 2012]

- Brexit: "We send the EU £350 million a week" [Vote Leave campaign bus, 2016]

]

.pull-right[

]

---

# Welcome to the Post-Truth Age - Bullshit Rules the World

.pull-left[

- "Alternative facts" [Kellyanne Conway, Meet the Press, January 22, 2017]

- "Trade wars are good and easy to win" [Donald Trump, 2018]

- "The concept of global warming was created by and for the Chinese" [Donald Trump, Twitter, 2012]

- Brexit: "We send the EU £350 million a week" [Vote Leave campaign bus, 2016]

]

.pull-right[

[<ru-blockquote>The law of Brandolini.</ru-blockquote>](https://en.wikipedia.org/wiki/Brandolini%27s_law)

]

---

# The Motto Of the USA (and This Course)

---

# The Motto Of the USA (and This Course)

---

# The Motto Of the USA (and This Course)

## "In God We Trust, All Others Must Bring Data"

*W. Edwards Deming (attribution uncertain, often credited to him)*

---

# This Course

- Teach you the basics of ***linear regression***, ***statistical inference*** and ***impact evaluation***.

--

- Equip you with a framework to think more deeply about ***causality***.

--

- Introduce you to the `R` software environment.

--

- ⚠️ This is *not* a course about `R`.

--

**Grading. Two Options:**

.pull-left[

The **Good** Way:

* come to class
* take 5(?) quizzes on moodle during the semester (0% of grade).
* take closed book exam (100%) early December 2025.

]

--

.pull-right[

The **Other** Way

* (come to class?)
* take closed book exam (100%) later.
* (do worse on the exam.)

]

---

# Communication: Slack

.pull-left[

Questions like

* *I don't understand x*
* *y does not work for me*
* *when is the exam*
* *can I come to office hours?*

will *only* be answered on Slack.

All other questions via email to `florian.oswald@unito.it` or `kacper.krasowki@carloalberto.org`

What is *Slack*?

]

--

.pull-right[

]

---

# Your To-Do List for Tomorrow

1. Sign up on moodle: [https://elearning.unito.it/sme/course/view.php?id=8568](https://elearning.unito.it/sme/course/view.php?id=8568)

2. From moodle, sign up on slack

3. Update your laptop OS and install `R`: [https://cloud.r-project.org](https://cloud.r-project.org)

4. Install `RStudio` at [https://posit.co/download/rstudio-desktop/](https://posit.co/download/rstudio-desktop/)

---

# Class Conduct and My Expectations

--

1. Come to class: You will understand better.

--

2. Be on time, be polite, don't use your phone. `#respect #reciprocity`

--

3. Open/Close your laptop when I say so. (take notes on paper)

--

4. Ask questions ***any time*** by raising your hand.

--

5. Work in groups: You can/should work in groups of 2-3 on the quizzes.

--

6. Don't cheat in the exam. No phones. Penalties are severe.

---
layout: false
class: title-slide-section-red, middle

# Notation in Slides

---
layout: true

<div class="my-footer"><img src="../img/logo/unito-shield.png" style="height: 60px;"/></div>

---

# Notation is Important

1. Simple Text

2. ***Important*** text is in italic red. **Very Important** text is in boldface red.

3. Maths looks like this: `\(\int f(x) dx\)`

4. `R` Code inline has pink background: `data(gapminder, package = "dslabs")`

---
class: inverse

# Notation is Important

1. Simple Text

2. ***Important*** text is in italic red. **Very Important** text is in boldface red.

3. Maths looks like this: `\(\int f(x) dx\)`

4. `R` Code inline has pink background: `data(gapminder, package = "dslabs")`

5. In-class tasks for you have a pink background. Do the tasks!

---
layout: false
class: title-slide-section-red, middle

# R

---
layout: true

<div class="my-footer"><img src="../img/logo/unito-shield.png" style="height: 60px;"/></div>

---

## What is `R`?

`R` is a __programming language__ with powerful statistical and graphic capabilities.

--

## Why are we using `R`?<sup>1</sup>

.footnote[
[1]: This list has been inspired by [Ed Rubin's](https://github.com/edrubin/EC421S19).

<span style="visibility:hidden">[2]: Learning `R` definitely requires time and effort but it's worth it, trust me!</span>
]

--

1. `R` is __free__ and __open source__, saving both you and the university 💰💵💰.

--

1. `R` is very __flexible and powerful__, adaptable to nearly any task (data cleaning, data visualization, econometrics, spatial data analysis, machine learning, web scraping, etc.)

--

1. `R` has a vibrant, [thriving online community](https://stackoverflow.com/questions/tagged/r) that will (almost) always have a solution to your problem.

--

1. If you put in the work<sup>2</sup>, you will come away with a __very valuable and useful__ tool.

.footnote[
<span style="visibility:hidden">[1]: This list has been inspired by [Ed Rubin's](https://github.com/edrubin/EC421S19).</span>

[2]: Learning `R` definitely requires time and effort but it's worth it, trust me!
]

???

* Single user Stata/SE annual license costs 485 USD for education.
* Student lab for 25 students costs 4135 USD per year.

<!-- --- -->

<!-- # Why can't we just use Excel? -->

<!-- Many reasons but here are just a few: -->

<!-- -- -->

<!-- - Not ***reproducible***. -->

<!-- -- -->

<!-- - Not straightforward to ***merge*** datasets together. -->

<!-- -- -->

<!-- - Very fastidious to ***clean*** data. -->

<!-- -- -->

<!-- - Limited to ***small datasets***. -->

<!-- -- -->

<!-- - Not designed for proper ***econometric analyses***, maps, complex visualisations, etc. -->

---
layout: false
class: title-slide-section-red, middle

# First Taste of R

---
layout: true

<div class="my-footer"><img src="../img/logo/unito-shield.png" style="height: 60px;"/></div>

---

# In Practice: Data Wrangling

--

* You will spend a lot of time preparing data for further analysis.

--

* The `gapminder` dataset contains data on life expectancy, GDP per capita and population by country between 1952 and 2007.

* Let's first discuss some basics, and then try to answer a simple question.

---

# Loading a Dataset

``` r
# load gapminder package
library(gapminder)
# load the dataset from the gapminder package
data(gapminder, package = "gapminder")
# show first 10 lines of this dataframe
head(gapminder, n = 10)
```

```
## # A tibble: 10 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
```

---

# What is a *Dataset*?

## Cross Sectional Data

**One index only:** `country`

.pull-left[

### Cross Section of Countries in 1952

```
## # A tibble: 10 × 5
##    country      year lifeExp      pop gdpPercap
##    <fct>       <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan  1952    28.8  8425333      779.
##  2 Albania      1952    55.2  1282697     1601.
##  3 Algeria      1952    43.1  9279525     2449.
##  4 Angola       1952    30.0  4232095     3521.
##  5 Argentina    1952    62.5 17876956     5911.
##  6 Australia    1952    69.1  8691212    10040.
##  7 Austria      1952    66.8  6927772     6137.
##  8 Bahrain      1952    50.9   120447     9867.
##  9 Bangladesh   1952    37.5 46886859      684.
## 10 Belgium      1952    68    8730405     8343.
```

]

--

.pull-right[

### Cross Section of Countries in 2007

```
## # A tibble: 10 × 5
##    country      year lifeExp       pop gdpPercap
##    <fct>       <int>   <dbl>     <int>     <dbl>
##  1 Afghanistan  2007    43.8  31889923      975.
##  2 Albania      2007    76.4   3600523     5937.
##  3 Algeria      2007    72.3  33333216     6223.
##  4 Angola       2007    42.7  12420476     4797.
##  5 Argentina    2007    75.3  40301927    12779.
##  6 Australia    2007    81.2  20434176    34435.
##  7 Austria      2007    79.8   8199783    36126.
##  8 Bahrain      2007    75.6    708573    29796.
##  9 Bangladesh   2007    64.1 150448339     1391.
## 10 Belgium      2007    79.4  10392226    33693.
```

]

---

# What is a *Dataset*?

## Panel (or longitudinal) Data

**two indices:** `country` and `year`

.pull-left[

```
## # A tibble: 18 × 6
##    country continent  year lifeExp        pop gdpPercap
##    <fct>   <fct>     <int>   <dbl>      <int>     <dbl>
##  1 India   Asia       1952    37.4  372000000      547.
##  2 India   Asia       1962    43.6  454000000      658.
##  3 India   Asia       1972    50.7  567000000      724.
##  4 India   Asia       1982    56.6  708000000      856.
##  5 India   Asia       1992    60.2  872000000     1164.
##  6 India   Asia       2002    62.9 1034172547     1747.
##  7 Italy   Europe     1952    65.9   47666000     4931.
##  8 Italy   Europe     1962    69.2   50843200     8244.
##  9 Italy   Europe     1972    72.2   54365564    12269.
## 10 Italy   Europe     1982    75.0   56535636    16537.
## 11 Italy   Europe     1992    77.4   56840847    22014.
## 12 Italy   Europe     2002    80.2   57926999    27968.
## 13 Poland  Europe     1952    61.3   25730551     4029.
## 14 Poland  Europe     1962    67.6   30329617     5339.
## 15 Poland  Europe     1972    70.8   33039545     8007.
## 16 Poland  Europe     1982    71.3   36227381     8452.
## 17 Poland  Europe     1992    71.0   38370697     7739.
## 18 Poland  Europe     2002    74.7   38625976    12002.
```

]

--

.pull-right[

* Can you tell me what the definition of an `index` is in this context?

* Why are `year` and `continent` not a valid index?

* 🤔

]

---

# What is a *Dataset*?

.pull-left[

### Italy

``` r
library(dplyr)  # provides %>% and filter()
gapminder %>% filter(country == "Italy")
```

```
## # A tibble: 12 × 6
##    country continent  year lifeExp      pop gdpPercap
##    <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Italy   Europe     1952    65.9 47666000     4931.
##  2 Italy   Europe     1957    67.8 49182000     6249.
##  3 Italy   Europe     1962    69.2 50843200     8244.
##  4 Italy   Europe     1967    71.1 52667100    10022.
##  5 Italy   Europe     1972    72.2 54365564    12269.
##  6 Italy   Europe     1977    73.5 56059245    14256.
##  7 Italy   Europe     1982    75.0 56535636    16537.
##  8 Italy   Europe     1987    76.4 56729703    19207.
##  9 Italy   Europe     1992    77.4 56840847    22014.
## 10 Italy   Europe     1997    78.8 57479469    24675.
## 11 Italy   Europe     2002    80.2 57926999    27968.
## 12 Italy   Europe     2007    80.5 58147733    28570.
```

]

--

.pull-right[

### Poland

``` r
gapminder %>% filter(country == "Poland")
```

```
## # A tibble: 12 × 6
##    country continent  year lifeExp      pop gdpPercap
##    <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Poland  Europe     1952    61.3 25730551     4029.
##  2 Poland  Europe     1957    65.8 28235346     4734.
##  3 Poland  Europe     1962    67.6 30329617     5339.
##  4 Poland  Europe     1967    69.6 31785378     6557.
##  5 Poland  Europe     1972    70.8 33039545     8007.
##  6 Poland  Europe     1977    70.7 34621254     9508.
##  7 Poland  Europe     1982    71.3 36227381     8452.
##  8 Poland  Europe     1987    71.0 37740710     9082.
##  9 Poland  Europe     1992    71.0 38370697     7739.
## 10 Poland  Europe     1997    72.8 38654957    10160.
## 11 Poland  Europe     2002    74.7 38625976    12002.
## 12 Poland  Europe     2007    75.6 38518241    15390.
```

]

---

# In Practice: Data Wrangling

* Suppose we want to know the average life expectancy and average GDP per capita for each **continent** in each year.

--

* We need to group the data by continent *and* year, then compute the average life expectancy and average GDP per capita

--

* There are always several ways to achieve a goal. (As in life.)

* Here we will only focus on the `dplyr` way:

.pull-left[

``` r
library(dplyr)  # provides %>%, group_by() and summarise()
# compute the required statistics
# average life exp and gdp per cap
gapminder_dplyr = gapminder %>%
  group_by(continent, year) %>%
  summarise(count = n(),
            mean_lifeexp = mean(lifeExp),
            mean_gdppercap = mean(gdpPercap))
```

]

.pull-right[

``` r
# show first 5 lines of this new dataset
head(gapminder_dplyr, n = 5)
```

```
## # A tibble: 5 × 5
## # Groups:   continent [1]
##   continent  year count mean_lifeexp mean_gdppercap
##   <fct>     <int> <int>        <dbl>          <dbl>
## 1 Africa     1952    52         39.1          1253.
## 2 Africa     1957    52         41.3          1385.
## 3 Africa     1962    52         43.3          1598.
## 4 Africa     1967    52         45.3          2050.
## 5 Africa     1972    52         47.5          2340.
```

]

---

# Visualisation

.pull-left[

* Now we could *look* at the result in `gapminder_dplyr`, or compute some statistics from it.

* Nothing beats a picture, though:

``` r
library(ggplot2)  # provides ggplot() and friends
ggplot(data = gapminder_dplyr,
       mapping = aes(x = mean_lifeexp,
                     y = mean_gdppercap,
                     color = continent,
                     size = count)) +
  geom_point(alpha = 1/2) +
  labs(x = "Average life expectancy",
       y = "Average GDP per capita",
       color = "Continent",
       size = "Nb of countries") +
  theme_bw()
```

]

--

.pull-right[

<img src="chapter_intro_files/figure-html/gampminder_plot-1.svg" style="display: block; margin: auto;" />

]

???

* We map different features of the data to different ways of representing it
    * color
    * size of point
    * different scales for each

---

# Animated Plotting <sup>1</sup>

.center[]

.footnote[
[1]: This animation is taken from [Ed Rubin](https://raw.githack.com/edrubin/EC421S19/master/LectureNotes/01Intro/01_intro.html#40).
]

---
layout: false
class: title-slide-section-red, middle

# R 101: Here Is Where You Start

---
layout: true

<div class="my-footer"><img src="../img/logo/unito-shield.png" style="height: 60px;"/></div>

---

# Start your `RStudio`!

## First Glossary of Terms

* `R`: a programming language.

* `RStudio`: an integrated development environment (IDE) to work with `R`.

--

* *command*: user input (text or numbers) that `R` *understands*.

* *script*: a list of commands collected in a text file, each separated by a new line, to be run one after the other.

--

* To run a script, you need to highlight the relevant code lines and hit `Ctrl`+`Enter` (Windows) or `Cmd`+`Enter` (Mac).

---

# `RStudio` Layout

<img src="chapter_intro_files/figure-html/rstudio.png" width="600px" style="display: block; margin: auto;" />

---

# R as a Calculator

* You can use the `R` console like a calculator

* Just type an arithmetic operation after `>` and hit `Enter`!

--

* Some basic arithmetic first:

``` r
4 + 1
```

```
## [1] 5
```

``` r
8 / 2
```

```
## [1] 4
```

* Great! What about this?

``` r
2^3
```

```
## [1] 8
```

``` r
# by the way: this is a comment! R therefore disregards it
```

---
class: inverse

# Task 1

1. Create a new R script (File `\(\rightarrow\)` New File `\(\rightarrow\)` R Script). Save it somewhere as `lecture_intro.R`.

1. Type the following code in your script and run it. To run the code press `Ctrl` or `Cmd` + `Enter` (you can either highlight the code or just put your cursor at the end of the line)

``` r
4 * 8
```

1. Type the following code in your script and run it. What happens if you only run the first line of the code?

``` r
x = 5 # equivalently x <- 5
x
```

Congratulations, you have created your first `R` "object"! Everything is an object in R! Objects are assigned using `=` or `<-`.

1. Create a new object named `x_3` to which you assign the cube of `x`. Note that to assign you need to use `=` or `<-`. Use code to compute the cube, not a calculator.

---

# Where to get Help?

.pull-left[

`R` built-in `help`:

``` r
?log         # ? in front of function
help(lm)     # help() is equivalent
??plot       # get all help on keyword "plot"
```

]

--

.pull-right[

In practice:

]

---

# Collaborate!

<img src="chapter_intro_files/figure-html/gator_error.jpg" alt="Gator collaboration" width="900" style="display: block; margin-left: auto; margin-right: auto"/>

---

# R Packages

* `R` users contribute add-on data and functions as *packages*

* Installing packages is easy! Just use the `install.packages` function:

``` r
install.packages("ggplot2")
```

* To *use* the contents of a package, we must load it from our library using `library`:

``` r
library(ggplot2)
```

---

# Data *Types*.

What kinds of **Data** are there actually?

--

Numbers, text, categories, images, ...

--

Unfortunately, your (mine, everybody's) computer only "speaks" `0` and `1`.
That's why software **encodes** different kinds of data differently:

<br>

| Data               | R Type    | Binary Encoding              |
|--------------------|-----------|------------------------------|
| `42`               | double    | `101010` (integer in base-2) |
| `"A"`              | character | `01000001` (ASCII)           |
| `TRUE`             | logical   | `1`                          |
| `FALSE`            | logical   | `0`                          |
| `factor("Male")`   | integer   | `00000001`                   |
| `factor("Female")` | integer   | `00000010`                   |

---

# Vectors

.pull-left[

* The `c` function creates vectors, i.e. *one-dimensional arrays*.

``` r
c(1, 3, 5, 7, 8, 9)
```

```
## [1] 1 3 5 7 8 9
```

* Coercion to unique types:

``` r
(v <- c(42, "Statistics", TRUE))
```

```
## [1] "42"         "Statistics" "TRUE"
```

]

--

.pull-right[

* Creating a *range*

``` r
1:10
```

```
##  [1]  1  2  3  4  5  6  7  8  9 10
```

* get vector elements with square bracket operator `[index]`:

``` r
v[c(1,3)]
```

```
## [1] "42"   "TRUE"
```

]

---

# `data.frame`s

`data.frame`s represent **tabular data**. Like spreadsheets.

``` r
example_data = data.frame(x = c(1, 3, 5, 7),
                          y = c(rep("Hello", 3), "Goodbye"),
                          z = c("one", 2, "three", 4))
example_data
```

```
##   x       y     z
## 1 1   Hello   one
## 2 3   Hello     2
## 3 5   Hello three
## 4 7 Goodbye     4
```

* A `data.frame` has 2 dimensions: *rows* and *columns*. Like a *matrix*. Can get elements with `[row_index,col_index]`.

* In practice, you will be importing files that contain the data into `R` rather than creating `data.frame`s by hand.

---
layout: false
class: title-slide-section-red, middle

# Go to https://tinyurl.com/metrics-task2

### You need some real data for the next task.

---
layout: true

<div class="my-footer"><img src="../img/logo/unito-shield.png" style="height: 60px;"/></div>

---
class: inverse

# Task 2

1. Find out (using `help()` or google) how to import a `.csv` file. Do NOT use the "Import Dataset" button, nor install a package.

1. Import [gun_murders.csv](https://www.dropbox.com/scl/fi/uq8xlecjczy2t2vu50h7l/gun_murders.csv?rlkey=4zr1t5o7jsi9pgoey4tep467w&dl=1)<sup>1</sup> in a new object `murders`. This file contains data on gun murders by US state in 2010. (Hint: objects are created using `=` or `<-`).

1. Ensure that `murders` is a data.frame by running:

``` r
class(murders) # check class
```

1. Find out what variables are contained in `murders` by running:

``` r
names(murders) # obtain variable names
```

1. View the contents of `murders` by clicking on `murders` in your workspace. What does the `total` variable correspond to?

.footnote[
[1]: This dataset is taken from the `dslabs` package.
]

---

# `data.frame`s

Useful functions to describe a dataframe:

``` r
str(murders) # `str` describes structure of any R object
```

```
## 'data.frame':	51 obs. of  5 variables:
##  $ state     : chr  "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ abb       : chr  "AL" "AK" "AZ" "AR" ...
##  $ region    : chr  "South" "West" "West" "South" ...
##  $ population: int  4779736 710231 6392017 2915918 37253956 5029196 3574097 897934 601723 19687653 ...
##  $ total     : int  135 19 232 93 1257 65 97 38 99 669 ...
```

--

``` r
names(murders) # column names
```

```
## [1] "state"      "abb"        "region"     "population" "total"
```

--

``` r
nrow(murders) # number of rows
```

```
## [1] 51
```

--

``` r
ncol(murders) # number of columns
```

```
## [1] 5
```

---

# Accessing `data.frame` Columns

* To extract one column **as a vector** we can use the `$` operator (as in `murders$state`), or the square bracket operator `[which_index]` with name or position index:

``` r
first5 <- murders[1:5, ]  # take first 5 states only
first5$state              # extract with $ operator
```

```
## [1] "Alabama"    "Alaska"     "Arizona"    "Arkansas"   "California"
```

``` r
first5[ ,"state"]         # extract with column name
```

```
## [1] "Alabama"    "Alaska"     "Arizona"    "Arkansas"   "California"
```

``` r
first5[ ,1]               # get first column
```

```
## [1] "Alabama"    "Alaska"     "Arizona"    "Arkansas"   "California"
```

--

.pull-left[

* Check `class` of an object:

``` r
class(murders)
```

```
## [1] "data.frame"
```

]

--

.pull-right[

* `typeof` gives the R-internal data type:

``` r
typeof(murders)
```

```
## [1] "list"
```

]

---

# Subsetting `data.frames`

* Subsetting a data.frame: `murders[row condition, column number]` or `murders[row condition, "column name"]`

``` r
# Only keep states with over 500 gun murders and keep only the "state" and "total" variables
murders[murders$total > 500, c("state", "total")]
```

```
##         state total
## 5  California  1257
## 10    Florida   669
## 33   New York   517
## 44      Texas   805
```

``` r
# Only keep California and Texas and keep only the "state" and "total" variables
murders[murders$state %in% c("California", "Texas"), c("state", "total")]
```

```
##         state total
## 5  California  1257
## 44      Texas   805
```

---
class: inverse

# Task 3

1. How many observations are there in `murders`?

1. How many variables? What are the data types of each variable?

1. Remember that the colon operator `1:10` is just short for *construct a sequence from `1` to `10`* (i.e. 1, 2, 3, etc). Create a new object `murders_2` containing the rows 10 to 25 of `murders`.

1. Create a new object `murders_3` which only contains the columns `state` and `total`. (Recall that `c` creates vectors.)

1. Create a `total_percap` variable equal to the number of murders per 10,000 inhabitants by running the following code.

``` r
murders$total_percap = (murders$total / murders$population) * 10000
```

Congratulations, you've created your first variable! Click on the `murders` object to see the new variable.

---
class: title-slide-final, middle
background-image: url(../img/logo/esomas.png)
background-size: 250px
background-position: 9% 19%

# That's it for this lesson!

| | |
| :--- | :--- |
| <a href="https://github.com/floswald/Econometrics-Slides">.ScPored[<i class="fa fa-link fa-fw"></i>]</a> | Slides |
| <a href="https://floswald.github.io">.ScPored[<i class="fa fa-link fa-fw"></i>]</a> | My Homepage |
| <a href="https://scpoecon.github.io/ScPoEconometrics/">.ScPored[<i class="fa fa-github fa-fw"></i>]</a> | Book |
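
---

# Appendix: Grouped Averages in Base R

The Data Wrangling slides compute averages by group "the `dplyr` way" and note that there are always several ways to reach a goal. As a sketch of one alternative, base R's `aggregate` can do the same kind of grouping without extra packages. The `toy` data frame and its numbers are made up for illustration (it is not the gapminder data).

```r
# Toy data: hypothetical values, NOT taken from gapminder.
toy <- data.frame(
  continent = c("A", "A", "B", "B", "B"),
  year      = c(2000, 2000, 2000, 2000, 2005),
  lifeExp   = c(50, 60, 70, 80, 75)
)

# Mean of lifeExp within each continent-year combination.
# The formula reads: "lifeExp, grouped by continent and year".
toy_means <- aggregate(lifeExp ~ continent + year, data = toy, FUN = mean)
toy_means
```

The result has one row per continent-year pair that appears in the data, just like the output of `group_by` followed by `summarise`.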