Big Data and Economics

class: center, middle, inverse, title-slide

.title[
# Big Data and Economics
]
.subtitle[
## Lecture 1: Introduction
]
.author[
### Kyle Coombs (he/him/his)
]
.date[
### Bates College | <a href="https://github.com/big-data-and-economics">EC/DCS 368</a>
]

---

# Table of contents

1. [Prologue](#prologue)

2. [Veggies: Syllabus highlights](#syllabus)

2. [Dessert: What is data science?](#what_is_data_science)

4. [Appetizer: Getting started](#started)

5. [Entree: R for data science](#r4ds)

6. [Second Dessert: Data visualization with ggplot2](#ggplot2)

---
class: inverse, center, middle
name: prologue

# Prologue

---
# Introductions

### Course

You'll soon receive access to this GitHub organization, where we submit assignments, upload presentations, etc.

### Me

---
class: inverse, center, middle
name: syllabus

# Syllabus highlights

(Read the full document [here](https://github.com/big-data-and-economics/syllabus/blob/master/syllabus.pdf).)

---
# Why this course?

Fill in the gaps left by traditional econometrics and methods classes.

- Practical skills that tools that will benefit your thesis and future career.
- Neglected skills like how to actually find datasets in the wild and clean them.
- Apply skills to analyze empirical questions on economic and social problems.

Data science skills are largely distinct from (and complementary to) the core 'metrics familiar to economists.

- Acquiring data; scraping; maintaining databases; etc. 
- Data viz, cleaning and wrangling; programming; cloud computation; relational databases; machine learning; etc.

> *"In short, we will cover things that I wish someone had taught me when I was starting out in college."*

---

# Caveat

- This course will be **_hard._** You will need to:
  - Teach yourself new skills I cannot cover in 12 weeks
  - Be entrepreneurial: If you find a better way to do something, do (and share) it!
  - Be patient: You will encounter bugs and errors, and you will need to learn how to fix them

- This course will also be **_rewarding_**
  - You can avoid the mistakes you make here on your thesis and in your career
  - You will learn skills that employers, pre-doc programs, and grad schools want
  - You will learn how to be a better researcher and citizen
    - Seriously, a little data science goes a long way in helping you see through BS

---
# Tips for coding fruitfully

You're gonna write a lot of code for this class, which means you're gonna troubleshoot a lot of bugs.

- Some of these will be bugs of your own making, some will be bugs of open source R tools

1. Try to describe in plain words/simple pictures what you want code to do before you write it
  - Read in a CSV file with variables for annual population at the county level and calculate the change in population from the previous year

2. Break this description into smaller steps (1: Read in data, 2: Drop rows with NA County, etc.)

3. Write code "modularly" to do each step

4. Then you can troubleshoot modularly too

5. The `help` documentation for R functions is the best place to start for troubleshooting

6. Google is your dictionary, ChatGPT is your weird friend who knows a lot of words but sometimes uses them wrong<sup>1</sup>
  - Be precise in your Google searches and ChatGPT instructions

7. If you're stuck, ask for help from me and classmates on GitHub Discussions

<sup>1</sup> GitHub CoPilot (ChatGPT's cousin) wrote that sentence fragment!

---
# You, at the end of this course

---
# Syllabus readthrough

- Read over the class [ReadMe](https://github.com/big-data-and-economics/big-data-class-materials) for 3 minutes
- Identify at least one word you're unfamiliar with
- Anyone have guesses for how to define them?
- [Mentimeter](https://www.mentimeter.com/app/presentation/bldvy2hwjfb483btveenvon9k7mbtkdm/xqvbqbhy8hzo)

#### Highlights

1. Final project: Paper replication

2. 6 problem sets, drop the two lowest, grade improvement

3. 5-minute presentations

4. GitHub Participation

5. _Optional_ Lewiston Hack-a-thon: 1-day data science competition at end of semester
  - Details TBD
  - I will soon be asking your availability for 24-hour periods in late March/April
  - Does 24 hours lead to immediate red flags for anyone's schedule? If so please let me know now.

---
# Class outline

.pull-left[
### Data science basics

- Version control with Git and GitHub
- R language basics
- Data cleaning and wrangling
- Webscraping 
- Data visualization

### Analysis and Programming

- Regression analysis in R
- Spatial analysis in R
- Functions in R
- Parallel programming
]

.pull-right[
### Causal inference

- Regression discontinuity design
- Panel data and fixed effects
- Field experiments

### Scaling up: Big data, ML, and cloud computation

- High performance computing (Leavitt cluster) 
- Databases: SQL(ite) and BigQuery
- Machine Learning techniques
- Text analysis
]

---
class: inverse, center, middle
name: what_is_data_science

# What is Data Science?

---
# What is Data Science?

- .hi[Data science (DS):] The scientific discipline that deals with transforming data into useful information ("insights") using a variety of stats/ML techniques

- Facebook: Collects data on search history, friendship links, site clicks, occupation, etc.

- Chetty et al. used FB data to estimate users' SES and social network ([Social Capital Atlas](https://www.socialcapital.org/))

- The rise of data science has come because of the so-called Big Data revolution

- The rise of the internet in the late-1990s and 2000s `$\Rightarrow \,\uparrow$` opportunities for companies and governments to collect data on consumers & citizens

- Spread of mobile devices & social media from late 2000s until now generated even more data
    
#### Pillars of data science

- Programming (automation of data collection, manipulation, cleaning, visualization, and modeling)

- Visualization & exploration

- Causal inference (to be able to make a policy prescription)

- Machine learning (to select models, compress data, predict outcomes)

...Assuming one has the appropriate foundation of basic calculus and statistics

---
# The data science workflow

.center[
<img src="https://d33wubrfki0l68.cloudfront.net/571b056757d68e6df81a3e3853f54d3c76ad6efc/32d37/diagrams/data-science.png" width="85%" style="display: block; margin: auto;" />

Source: [R for Data Science](http://r4ds.had.co.nz/introduction.html)
]

---
# Big Data
.center[
<img src="pics/frisch.jpg" width="90%" style="display: block; margin: auto;" />
]

Source: Frisch, Ragnar. 1933. "Editor's Note" _Econometrica_ 1(1): 1-4

---
# What is Big Data?

It depends on who you ask. It could mean:

1. "Wild" data (unstructured; happenstance; collected without a particular intention; e.g. twitter, contrast with Census surveys)

2. "Wide" data (a.k.a. "Big-K" data because `$K>N$`, customer data sets where each click is a variable)

3. "Long" data (a.k.a. "Big-N" data because `$N$` very, very large [and may not all fit onto a single hard drive!], government tax records, Medicare claims data, etc.)

4. Any data set that cannot be analyzed with classical methods like OLS (e.g. all combinations of the above three types)

"Big Data" not so much about size of data, but about whether or not "small data" (read: classical) methods can be used

---
# Long data
<img src="pics/longdata.png" width="90%" style="display: block; margin: auto;" />

- Main application: *identifying causal effects*
- Example: effects of improving schools on income

---
# Wide data

- Main application: *prediction*
- Example: predicting income to target ads from tons of information like location, links clicked, etc.

---
# Why does data shape matter?

1. Data shape determines how much memory is required to store information

2. Data shape determines what method you can use to analyze the data

- A regression model with year fixed effects requires that year changes between observations, so you need long data
  - You cannot have a wide data set with one row per unit and one column per year

---
# What can we measure with data?

- Data can measure a lot of information about the world, but not everything

- Yes, that sounds like buzzword soup, but it is true

- Any dataset, no matter how big, has simplified the world in some way

- You want the simplification to match the question

- How do you record where a person is? 
  - County? Lots of people have same location. 
  - IP address? Changes frequently 
  - GPS coordinates? Too precise, and changes every second!

- The solution? Decide why you need location
  - Are you curious about the effect of local government policies or firms on people?
  - Are you looking to measure the effect of air pollution on health?
  - Do you want to see how people change their commute patterns over time? When there is a road closure?

- Different questions require different levels of precision

---
# Big data & machine learning

- You'll often hear the phrase "big data and machine learning"

- This is because many machine learning algorithms are helpful for big data problems:

- Selecting which `$k<K$` covariates should enter your model

- Streamlined techniques for processing "wild" data

- New modeling approaches that can leverage the greater amount of information that Big Data has

- ML also gave us generative AI ([ChatGPT](https://chat.openai.com/) and [GitHub CoPilot](https://github.com/features/copilot)), which I encourage you to use heavily to help you learn R and Git

- Think of it as a more interactive version of Googling for the solution to a bug
  - It is not (yet) a replacement for thinking through the best way to code something
  - It will actually perform better if you think through the basic coding tasks first, then ask it to fill in the blanks

---
# What is machine learning? What is AI?

- .hi[Machine learning (ML):] Allowing computers to learn for themselves without explicitly being programmed

- USPS: Computer to read handwriting on envelopes

- Google: AlphaGo, computer that defeated world champion Go player

- Apple/Amazon/Microsoft: Siri, Alexa, Cortana, Talon voice assistants

- .hi[Artificial intelligence (AI):] Constructing machines (robots, computers) to think and act like human beings

- ML is a subset of AI

---
class: inverse, center, middle
name: started

# Getting started

---
# Software installation and registration

1. Download [R](https://www.r-project.org/).

2. Download [RStudio](https://www.rstudio.com/products/rstudio/download/preview/).

3. Download [Git](https://git-scm.com/downloads).

4. Create an account on [GitHub](https://github.com/) and register for a student/educator [discount](https://education.github.com/discount_requests/new).
  - We will use GitHub to disseminate and submit assignments, receive feedback and grading, etc.

5. Make a folder on your computer for this class. Any and all repositories for this class should be cloned into this folder.

If you had trouble completing any of these steps, please raise your hand.
- My go-to place for installation guidance and troubleshooting is Jenny Bryan's http://happygitwithr.com.

---
# Some OS-specific extras

I'll detail further software requirements as and when the need arises. However, to help smooth some software installation issues further down the road, please also do the following (depending on your OS):

- **Windows:** Install [Rtools](https://cran.r-project.org/bin/windows/Rtools/). I also recommend that you install [Chocolately](https://chocolatey.org/) and [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/install).
- **Mac:** Install [Homebrew](https://brew.sh/). I also recommend that you configure/open your C++ toolchain (see [here](https://github.com/rmacoslib/r-macos-rtools#installer-package-for-macos-r-toolchain-).)
- **Linux:** None (you should be good to go).

---
# Checklist

☑ Do you have the most recent version of R?
  
  ```r
  version$version.string
  ```
  
  ```
  ## [1] "R version 4.3.1 (2023-06-16 ucrt)"
  ```
☑ Do you have the most recent version of RStudio? (The [preview version](https://www.rstudio.com/products/rstudio/download/preview/) is fine.)
  
  ```r
  RStudio.Version()$version
  ## Requires an interactive session but should return something like "[1] ‘1.4.1100’"
  ```

☑ Have you updated all of your R packages? 
  
  ```r
  update.packages(ask = FALSE, checkBuilt = TRUE)
  ```
  
---
# Checklist (cont.)

Open up the [shell](http://happygitwithr.com/shell.html#shell).
- Windows users, make sure that you installed a Bash-compatible version of the shell. If you installed [Git for Windows](https://gitforwindows.org), then you should be good to go.

☑ Which version of Git have you installed?
  
  ```bash
  git --version
  ```
  
  ```
  ## git version 2.34.1
  ```

☑ Did you introduce yourself to Git? (Substitute in your details.)
  
  ```bash
  git config --global user.name 'kgcsport'
  git config --global user.email 'kcoombs@bates.edu'
  git config --global --list
  ```

☑ Did you register an account in GitHub?

---
# Checklist (cont.)

- Navigate to the [class materials repository](https://github.com/big-data-and-economics/big-data-class-materials)

- Click the green "Code" button and copy the HTTPS link.

- Under Codespaces, click `Create codespace on main`. This will create a cloud-based server for you to work on.

- It may take a few minutes to get up and running.

- Once inside, navigate to `PORTS` at the bottom, and click the link under "Local Address" for the RStudio Port. The username and password are rstudio/rstudio.

- You should now be in RStudio!

- Open up test.Rmd and click `Knit` at the top to test that it works!

- Navigate back to the main GitHub repository page and click `Code`, the three dots next to the codespace name, and `Delete codespace`.

- I'll encourage you to use Codespaces on some problem sets when I want to hit the ground running instead of troubleshooting package installation issues.

---
# Checklist (cont.)

We will make sure that everything is working properly with your R and GitHub setup next lecture.

For the rest of today's lecture, I want to go over some very basic ChatGPT and R concepts.

</br>

PS — Just so you know where we're headed: We'll return to these R concepts (and delve much deeper) next week after a brief, but important detour to the lands of coding best practices and Git(Hub).

---
class: inverse, center, middle
name: r4ds

# R and Generative AI for data science
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

---
name: whyr
# Why R and RStudio?

### Data science positivism

- Alongside Python, R has become the *de facto* language for data science.
  - See: [*The Impressive Growth of R*](https://stackoverflow.blog/2017/10/10/impressive-growth-r/), [*The Popularity of Data Science Software*](http://r4stats.com/articles/popularity/)
- Open-source (free!) with a global user-base spanning academia and industry.
  - "Do you want to be a profit source or a cost center?"

### Bridge to applied economics and other tools

- Already has all of the statistics and econometrics support, and is amazingly adaptable as a “glue” language to other
programming languages and APIs.
- The RStudio IDE and ecosystem allow for further, seemless integration.

### Path dependency

- It's also the language that I know (second) best.
- (Learning multiple languages is a good idea, though.)

---
# R code example (linear regression)

```r
fit = lm(mpg ~ wt, data = mtcars)
summary(fit)
```

```
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528,	Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
```

---
# Base R plot

```r
par(mar = c(4, 4, 1, .1)) ## Just for nice plot margins on this slide deck
plot(mtcars$wt, mtcars$mpg)
abline(fit, col = "red")
```

---
# ggplot2

```r
library(ggplot2)
ggplot(data = mtcars, aes(x = wt, y = mpg)) + 
  geom_smooth(method = "lm", col = "red") + 
  geom_point() 
```

---
# Do `$\uparrow$` in GDP cause life expectancy to `$\uparrow$`?

- Let's use our new-found R knowledge to try to separate correlation from causation for a critical question in economics:
  - Does increasing the economic pie (GDP) lead to longer lives (life expectancy)?

- We can use the gapminder dataset to explore this question
- The [gapminder](https://github.com/jennybc/gapminder) dataset contains panel data on life expectancy, population size, and GDP per capita for 142 countries since the 1950s
- Any predictions about what we'll learn?

- Also, we can tap into generative AI to help us climb the steep learning curve of coding in a new language

---
class: inverse, center, middle
name: ggplot2

# More ggplot2
<html><div style='float:left'></div><hr color='#EB811B' size=1px width=796px></html>

---
# Install and load

Open up your laptops. For the remainder of this first lecture, we're going to play around with [ggplot2](https://ggplot2.tidyverse.org/) (i.e. livecoding) to answer our question about GDP and life expectancy.

If you don't have them already, install the `ggplot2` and `gapminder` packages via either:
- **Console:** Enter `install.packages(c("ggplot2", "gapminder"), dependencies=T)`.
- **RStudio:** Click the "Packages" tab in the bottom-right window pane. Then click "Install" and search for these two packages.
  
![](pics/install.gif)

---
# Install and load (cont.)

Once the packages are installed, load them into your R session with the `library()` function.

```r
library(ggplot2)
library(gapminder) ## We're just using this package for the gapminder data
```

Notice too that you don't need quotes around the package names any more. Reason: R now recognises these packages as defined objects with given names.

PS — A convenient way to combine the package installation and loading steps is with the [pacman package's](https://github.com/trinker/pacman) `p_load()` function. If you run `pacman::p_load(ggplot, gapminder)` it will first look to see whether it needs to install either package before loading them. Clever.
- We'll get to this later, but if you want to run a function from an (installed) package without loading it, you can use the `PACKAGE::package_function()` syntax.

---
name: ai
# Leverage ChatGPT and GitHub CoPilot

- I'm going to ask you to use a number of new technical tools in this class

- The documentation will be concise and oftentimes you won't know where to start

- For example, how in the world are you supposed to make a ggplot plot of the gapminder data if you've never used ggplot before?
  - What if you've never even opened R? How do you even look at the gapminder data?

- I strongly encourage you to use ChatGPT and GitHub CoPilot in all aspects of this class

- ChatGPT is a large language model (LLM) developed by OpenAI

- Create an account at http://chat.openai.com/

- ChatGPT is excellent at producing basic code and helping resolve bugs

- It serves as a bridge between the ideas in your head and crisp instructions for a computer (code)

---
# ChatGPT example

First, submit a [prompt on Mentimeter](https://www.mentimeter.com/app/presentation/bldvy2hwjfb483btveenvon9k7mbtkdm/emjehaixxc7c) 
that loads the gapminder data, displays some key checks, then makes a scatterplot of GDP per capita and life expectancy

**Human:**

Write an R script that does the following:

(a) Reads in the gapminder dataset using the library gapminder.

(b) Prints out the first 10 rows of the gapminder dataset.

(d) Plots a scatterplot of the gdpPercap by lifeExp using the gapminder dataset using ggplot.

---
# ChatGPT example (continued)

**Assistant:**

Certainly! Here's an R script that accomplishes these tasks using the gapminder dataset:

.scroll-box-8[

```r
# Load necessary libraries
library(gapminder)
library(ggplot2)

# Step (a) - Read in the gapminder dataset
data("gapminder")
gapminder_df <- gapminder

# Step (b) - Print first 10 rows
print(head(gapminder_df, 10))

# Step (c) - List unique countries
unique_countries <- unique(gapminder_df$country)
print(unique_countries)

# Step (d) - Plot scatterplot of gdpPercap by lifeExp
ggplot(gapminder_df, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  labs(x = "GDP per Capita", y = "Life Expectancy") +
  ggtitle("Relationship between GDP per Capita and Life Expectancy")
```
]

This script assumes that the gapminder package is installed and loaded. It reads in the gapminder dataset, prints the first 10 rows, lists unique countries present in the dataset, and then creates a scatterplot of gdpPercap (GDP per Capita) against lifeExp (Life Expectancy) using ggplot2. Adjust the plot aesthetics as needed to suit your preferences.
***

Exported on 1/8/2024.

---
# How's the code run?

```
## # A tibble: 10 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
```

```
##   [1] Afghanistan              Albania                  Algeria                 
##   [4] Angola                   Argentina                Australia               
##   [7] Austria                  Bahrain                  Bangladesh              
##  [10] Belgium                  Benin                    Bolivia                 
##  [13] Bosnia and Herzegovina   Botswana                 Brazil                  
##  [16] Bulgaria                 Burkina Faso             Burundi                 
##  [19] Cambodia                 Cameroon                 Canada                  
##  [22] Central African Republic Chad                     Chile                   
##  [25] China                    Colombia                 Comoros                 
##  [28] Congo, Dem. Rep.         Congo, Rep.              Costa Rica              
##  [31] Cote d'Ivoire            Croatia                  Cuba                    
##  [34] Czech Republic           Denmark                  Djibouti                
##  [37] Dominican Republic       Ecuador                  Egypt                   
##  [40] El Salvador              Equatorial Guinea        Eritrea                 
##  [43] Ethiopia                 Finland                  France                  
##  [46] Gabon                    Gambia                   Germany                 
##  [49] Ghana                    Greece                   Guatemala               
##  [52] Guinea                   Guinea-Bissau            Haiti                   
##  [55] Honduras                 Hong Kong, China         Hungary                 
##  [58] Iceland                  India                    Indonesia               
##  [61] Iran                     Iraq                     Ireland                 
##  [64] Israel                   Italy                    Jamaica                 
##  [67] Japan                    Jordan                   Kenya                   
##  [70] Korea, Dem. Rep.         Korea, Rep.              Kuwait                  
##  [73] Lebanon                  Lesotho                  Liberia                 
##  [76] Libya                    Madagascar               Malawi                  
##  [79] Malaysia                 Mali                     Mauritania              
##  [82] Mauritius                Mexico                   Mongolia                
##  [85] Montenegro               Morocco                  Mozambique              
##  [88] Myanmar                  Namibia                  Nepal                   
##  [91] Netherlands              New Zealand              Nicaragua               
##  [94] Niger                    Nigeria                  Norway                  
##  [97] Oman                     Pakistan                 Panama                  
## [100] Paraguay                 Peru                     Philippines             
## [103] Poland                   Portugal                 Puerto Rico             
## [106] Reunion                  Romania                  Rwanda                  
## [109] Sao Tome and Principe    Saudi Arabia             Senegal                 
## [112] Serbia                   Sierra Leone             Singapore               
## [115] Slovak Republic          Slovenia                 Somalia                 
## [118] South Africa             Spain                    Sri Lanka               
## [121] Sudan                    Swaziland                Sweden                  
## [124] Switzerland              Syria                    Taiwan                  
## [127] Tanzania                 Thailand                 Togo                    
## [130] Trinidad and Tobago      Tunisia                  Turkey                  
## [133] Uganda                   United Kingdom           United States           
## [136] Uruguay                  Venezuela                Vietnam                 
## [139] West Bank and Gaza       Yemen, Rep.              Zambia                  
## [142] Zimbabwe                
## 142 Levels: Afghanistan Albania Algeria Angola Argentina Australia ... Zimbabwe
```

---
# How's the code run? (cont.)
<img src="01-Intro_files/figure-html/gapminder-chatgpt-eval2-1.png" style="display: block; margin: auto;" />

---
# Tips for using ChatGPT

- Be as specific as possible in your instructions
  - If you know the name of the variables in your dataset, use them

- Think of it as a more interactive version of Googling for the solution to a bug

- Try things iteratively and in small steps
  - If you're not sure how to do something, try to break it down into smaller steps
  - This is a good tip for coding in general

- Your brain is still the most powerful tool you have
  - ChatGPT is a tool to help you, not replace you
  - You will not get much mileage if you say, "Read in the gapminder dataset and do something interesting with it"

- Often it only "skeleton code" for you, so you'll need to fill in the blanks

---
# Elements of ggplot2

[Hadley Wickham's](http://hadley.nz/) ggplot2 is one of the most popular packages in the entire R canon. 
- It also happens to be built upon some deep visualization theory: i.e. Leland Wilkinson's [*The Grammar of Graphics*](https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448).

There's a lot to say about ggplot2's implementation of this "grammar of graphics" approach, but the three key elements are:

1. Your plot ("the visualization") is linked to your variables ("the data") through various **aesthetic mappings**.

2. Once the aesthetic mappings are defined, you can represent your data in different ways by choosing different **geoms** (i.e. "geometric objects" like points, lines or bars).

3. You build your plot in **layers**.

</br>

That's kind of abstract. Let's break down the elements of ggplot2 in turn with some actual plots.

- As a shortcut, we'll use AI to write the basic code for us then we'll fill in the blanks.

---
# 1. Aesthetic mappings

```r
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) + 
  geom_point()
```

<img src="01-Intro_files/figure-html/aesthetics0-1.png" style="display: block; margin: auto;" />
---
# 1. Aesthetic mappings (cont.)

```r
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) + 
  geom_point()
```

Focus on the top line, which contains the initialising `ggplot()` function call. This function accepts various arguments, including:
- Where the data come from (i.e. `data = gapminder`).
- What the aesthetic mappings are (i.e. `mapping = aes(x = gdpPercap, y = lifeExp)`).

The aesthetic mappings here are pretty simple: They just define an x-axis (GDP per capita) and a y-axis (life expecancy).
- To get a sense of the power and flexibility that comes with this approach, however, consider what happens if we add more aesthetics to the plot call...