ScPoProgramming

.title[
# ScPoProgramming
]
.subtitle[
## Introduction to R
]
.author[
### Florian Oswald
]
.date[
### Sciences Po Paris </br> 2024-01-23
]

---

---

# R

---

---

## What is `R`?

`R` is a __programming language__ with powerful statistical and graphic capabilities.

## Why are we using `R`?<sup>1</sup>

.footnote[
[1]: This list has been inspired by [Ed Rubin's](https://github.com/edrubin/EC421S19).  
<span style="visibility:hidden">[2]: Learning `R` definitely requires time and effort but it's worth it, trust me! .</span>
]

1. `R` is __free__ and __open source__—saving both you and the university 💰💵💰.

1. `R` is very __flexible and powerful__—adaptable to nearly any task, (data cleaning, data visualization, econometrics, spatial data analysis, machine learning, web scraping, etc.)

1. `R` has a vibrant, [thriving online community](https://stackoverflow.com/questions/tagged/r) that will (almost) always have a solution to your problem.

1. If you put in the work<sup>2</sup>, you will come away with a __very valuable and useful__ tool.

.footnote[
<span style="visibility:hidden">[1]: This list has been inspired by [Ed Rubin's](https://github.com/edrubin/EC421S19).</span>  
[2]: Learning `R` definitely requires time and effort but it's worth it, trust me! 
]

---

# First Taste of R

---

---

# In Practice: Data Wrangling

* You will spend a lot of time preparing data for further analysis.

* The `gapminder` dataset contains data on life expectancy, GDP per capita and population by country between 1952 and 2007.

* Suppose we want to know the average life expectancy and average GDP per capita for each continent in each year.

* We need to group the data by continent *and* year, then compute the average life expectancy and average GDP per capita

```r
# load gapminder package
library(gapminder)
# load the dataset from the gapminder package
data(gapminder, package = "gapminder") 
# show first 4 lines of this dataframe
head(gapminder,n = 4)
```
]

```
## # A tibble: 4 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
```
]

---

# In Practice: Data Wrangling

* There are always several ways to achieve a goal. (As in life 😁)

* Here we will only focus on the `dplyr` way:

```r
# compute the required statistics
gapminder_dplyr <- gapminder %>% 
  group_by(continent, year) %>% 
  summarise(count = n(),
            mean_lifeexp = mean(lifeExp),
            mean_gdppercap = mean(gdpPercap))
```

```r
# show first 5 lines of the new data
head(gapminder_dplyr, n = 5)
```

```
## # A tibble: 5 × 5
## # Groups:   continent [1]
##   continent  year count mean_lifeexp mean_gdppercap
##   <fct>     <int> <int>        <dbl>          <dbl>
## 1 Africa     1952    52         39.1          1253.
## 2 Africa     1957    52         41.3          1385.
## 3 Africa     1962    52         43.3          1598.
## 4 Africa     1967    52         45.3          2050.
## 5 Africa     1972    52         47.5          2340.
```

---

# Visualisation

.pull-left[
* Now we could *look* at the result in `gapminder_dplyr`, or compute some statistics from it.

* Nothing beats a picture, though:

```r
ggplot(data = gapminder_dplyr, 
       mapping = aes(x = mean_lifeexp,
                     y = mean_gdppercap,
                     color = continent,
                     size = count)) +
  geom_point(alpha = 1/2) +
  labs(x = "Average life expectancy",
       y = "Average GDP per capita",
       color = "Continent",
       size = "Nb of countries") +
  theme_bw()
```
]

.pull-right[
<img src="chapter_intro_files/figure-html/gampminder_plot-1.svg" style="display: block; margin: auto;" />
]

---

# Animated Plotting 👌 <sup>1</sup>

.footnote[
[1]: This animation is taken from [Ed Rubin](https://raw.githack.com/edrubin/EC421S19/master/LectureNotes/01Intro/01_intro.html#40).
]

---

# R 101: Here Is Where You Start

---

---

# Start your `RStudio`!

## First Glossary of Terms

* `R`: a programming language.

* `RStudio`: an integrated development environment (IDE) to work with `R`.

* *command*: user input (text or numbers) that `R` *understands*.

* *script*: a list of commands collected in a text file, each separated by a new line, to be run one after the other.

* To run a script, you need to highlight the relevant code lines and hit `Ctrl`+`Enter` (Windows) or `Cmd`+`Enter` (Mac).

---

# `RStudio` Layout

---

# R as a Calculator

* You can use the `R` console like a calculator

* Just type an arithmetic operation after `>` and hit `Enter`!

* Some basic arithmetic first:

```r
4 + 1
```

```
## [1] 5
```

```r
8 / 2
```

```
## [1] 4
```

* Great! What about this?

```r
2^3
```

```
## [1] 8
```

```r
# by the way: this is a comment! R therefore disregards it
```

---

# Task 1

1. Create a new R script (File `$\rightarrow$` New File `$\rightarrow$` R Script). Save it somewhere as `lecture_intro.R`.

1. Type the following code in your script and run it. To run the code press `Ctrl` or `Cmd` + `Enter` (you can either highlight the code or just put your cursor at the end of the line)
    
    ```r
    4 * 8
    ```

1. Type the following code in your script and run it. What happens if you only run the first line of the code?
    
    ```r
    x = 5 # equivalently x <- 5
    x
    ```
Congratulations, you have created your first `R` "object"! Everything is an object in R! Objects are assigned using `=` or `<-`.

1. Create a new object named `x_3` to which you assign the cube of `x`. Note that to assign you need to use `=` or `<-`. Use code to compute the cube, not a calculator.

---

# Where to get Help?

```r
?log #? in front of function
help(lm)   # help() is equivalent
??plot  # get all help on keyword "plot"
```
]

---

# Collaborate!

---

# R Packages

* `R` users contribute add-on data and functions as *packages*

* Installing packages is easy! Just use the `install.packages` function:
    
    ```r
    install.packages("ggplot2")
    ```

* To *use* the contents of a packge, we must load it from our library using `library`:
    
    ```r
    library(ggplot2)
    ```

---

# Vectors

.pull-left[
* The `c` function creates vectors, i.e. *one-dimensional arrays*.
    
    ```r
    c(1, 3, 5, 7, 8, 9)
    ```
    
    ```
    ## [1] 1 3 5 7 8 9
    ```
    
* Coercion to unique types:
    
    ```r
    (v <- c(42, "Statistics", TRUE))
    ```
    
    ```
    ## [1] "42"         "Statistics" "TRUE"
    ```
]

* Creating a *range*
    
    ```r
    1:10
    ```
    
    ```
    ##  [1]  1  2  3  4  5  6  7  8  9 10
    ```

* get vector elements with square bracket operator `[index]`:
    
    ```r
    v[c(1,3)]
    ```
    
    ```
    ## [1] "42"   "TRUE"
    ```
]

---

# `data.frame`'s

`data.frame`s represent **tabular data**. Like spreadsheets.

```r
example_data = data.frame(x = c(1, 3, 5, 7),
                          y = c(rep("Hello", 3), "Goodbye"),
                          z = c("one", 2, "three", 4))
example_data
```

```
##   x       y     z
## 1 1   Hello   one
## 2 3   Hello     2
## 3 5   Hello three
## 4 7 Goodbye     4
```

* A `data.frame` has 2 dimensions: *rows* and *columns*. Like a *matrix*. Can get elements with `[row_index,col_index]`.

* In practice, you will be importing files that contain the data into `R` rather than creating `data.frame`s by hand.

---

1. Find out (using `help()` or google) how to import a .csv file. Do NOT use the "Import Dataset" button, nor install a package.

1. Download the csv file [brexit.csv](https://vincentarelbundock.github.io/Rdatasets/csv/dslabs/brexit_polls.csv) and create a new object called `brexit` (Hint: objects are created using `=` or `<-`).

1. Ensure that `brexit` is a data.frame by running:
    
    ```r
    class(brexit) # check class
    ```

1. Find out what variables are contained in `brexit` by running:
    
    ```r
    names(brexit) # obtain variable names
    ```

1. View the contents of `brexit` by clicking on `brexit` in your workspace. What does the `remain` variable correspond to?

---

# `data.frame`s

Useful functions to describe a dataframe:

```r
str(brexit) # `str` describes structure of any R object
```

```
## 'data.frame':	127 obs. of  10 variables:
##  $ rownames  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ startdate : chr  "2016-06-23" "2016-06-22" "2016-06-20" "2016-06-20" ...
##  $ enddate   : chr  "2016-06-23" "2016-06-22" "2016-06-22" "2016-06-22" ...
##  $ pollster  : chr  "YouGov" "Populus" "YouGov" "Ipsos MORI" ...
##  $ poll_type : chr  "Online" "Online" "Online" "Telephone" ...
##  $ samplesize: int  4772 4700 3766 1592 3011 1032 1032 2320 1003 1652 ...
##  $ remain    : num  0.52 0.55 0.51 0.49 0.44 0.54 0.48 0.41 0.45 0.42 ...
##  $ leave     : num  0.48 0.45 0.49 0.46 0.45 0.46 0.42 0.43 0.44 0.44 ...
##  $ undecided : num  0 0 0 0.01 0.09 0 0.11 0.16 0.11 0.13 ...
##  $ spread    : num  0.04 0.1 0.02 0.03 -0.01 ...
```

```r
names(brexit) # column names
```

```
##  [1] "rownames"   "startdate"  "enddate"    "pollster"   "poll_type" 
##  [6] "samplesize" "remain"     "leave"      "undecided"  "spread"
```

```r
nrow(brexit) # number of rows
```

```
## [1] 127
```

```r
ncol(brexit) # number of columns
```

```
## [1] 10
```

---

# Accessing `data.frame` Columns
    
* To extract one column **as a vector** we can use the `$` operator (as in `brexit$state`), or the square bracket operator `[which_index]` with name or position index:
    
    ```r
    first5 <- brexit[1:5, ]  # take first 5 states only
    first5$pollster  # extract with $ operator
    ```
    
    ```
    ## [1] "YouGov"     "Populus"    "YouGov"     "Ipsos MORI" "Opinium"
    ```
    
    ```r
    first5[ ,"pollster"]  # extract with column name
    ```
    
    ```
    ## [1] "YouGov"     "Populus"    "YouGov"     "Ipsos MORI" "Opinium"
    ```
    
    ```r
    first5[ ,2] # get second column
    ```
    
    ```
    ## [1] "2016-06-23" "2016-06-22" "2016-06-20" "2016-06-20" "2016-06-20"
    ```

.pull-left[
* Check `class` of an object:
    
    ```r
    class(brexit)
    ```
    
    ```
    ## [1] "data.frame"
    ```
]

.pull-right[
* `typeof` gives the R-internal data type:
    
    ```r
    typeof(brexit)
    ```
    
    ```
    ## [1] "list"
    ```
]

---

# Subsetting `data.frames`

* Subsetting a data.frame: `brexit[row condition, column number]` or `brexit[row condition, "column name"]`
    
    ```r
    # Only keep polls with more than 1000 observations and keep 2 columns
    brexit[brexit$samplesize > 3900, c("remain", "leave")]
    ```
    
    ```
    ##    remain leave
    ## 1    0.52  0.48
    ## 2    0.55  0.45
    ## 95   0.48  0.45
    ```
    
    ```r
    # Only keep polls from YouGov and Opinium
    brexit[brexit$pollster %in% c("YouGov", "Opinium") & brexit$samplesize > 3000, c("remain", "leave", "pollster")]
    ```
    
    ```
    ##     remain leave pollster
    ## 1     0.52  0.48   YouGov
    ## 3     0.51  0.49   YouGov
    ## 5     0.44  0.45  Opinium
    ## 32    0.41  0.45   YouGov
    ## 56    0.42  0.40   YouGov
    ## 73    0.40  0.39   YouGov
    ## 79    0.39  0.38   YouGov
    ## 105   0.37  0.38   YouGov
    ```

---

# Task 3

<div class="countdown" id="timer_d6eb272e" data-update-every="1" tabindex="0" style="top:0;right:0;">
<div class="countdown-controls"><button class="countdown-bump-down">−</button><button class="countdown-bump-up">+</button></div>
<code class="countdown-time"><span class="countdown-digits minutes">05</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>
1. How many observations are there in `brexit`?

1. How many variables? What are the data types of each variable?

1. Remember that the colon operator `1:10` is just short for *construct a sequence from `1` to `10`* (i.e. 1, 2, 3, etc). Create a new object `brexit_2` containing the rows 10 to 25 of `brexit`.

1. Create a new object `brexit_3` which only contains the columns `poll_type` and `spread`. (Recall that `c` creates vectors.)

1. Create a `remainers` variable equal to the number of total remain-voters in each poll by running the following code.
    
    ```r
    brexit$remainers = brexit$remain * brexit$samplesize
    ```

Congratulations, you've created your first variable! Click on the `brexit` object to see the new variable.

---

class: title-slide-final, middle
background-image: url(../img/logo/ScPo-econ.png)
background-size: 250px
background-position: 9% 19%

# The END!

|                                                                                                            |                                   |
| :--------------------------------------------------------------------------------------------------------- | :-------------------------------- |
| <a href="https://github.com/ScPoEcon/ScPoEconometrics-Slides">.ScPored[<i class="fa fa-link fa-fw"></i>] | Slides |
| <a href="https://scpoecon.github.io/ScPoEconometrics">.ScPored[<i class="fa fa-link fa-fw"></i>] | Book |
| <a href="http://twitter.com/ScPoEcon">.ScPored[<i class="fa fa-twitter fa-fw"></i>]                          | @ScPoEcon                         |
| <a href="http://github.com/ScPoEcon">.ScPored[<i class="fa fa-github fa-fw"></i>]                          | @ScPoEcon                       |