Using R for Common Scientific Tasks

# Using R for Common Scientific Tasks
### Eliot McKinley
### October 20, 2021

---

# Getting Started

.left[1) Download [R Studio Desktop](https://www.rstudio.com/products/rstudio/)]
.left[2a) Go to [my github](https://github.com/etmckinley/Coffey-Lab-R-Tutorial) and download the code]
.left[2b) Follow along here]

<br>

---

# Introduction

This tutorial gives examples of some common statistical and data visualization techniques that you may be used to in Excel or Prism using R. The advantages of using R include the ability to work with larger data sets, better control of you analysis methods, the availability of online tutorials and primers for almost anything you'd like to do, and, most importantly, everything will look better than using Excel or Prism.

Topics include:
1. Loading Data
2. Exploring and Summarizing Data
3. Saving Data
4. Basic Data Visualization
5. T-Tests, Wilcoxon Tests, and ANOVA
6. Correlations
7. PCA and Dimensionality Reduction
8. (Slightly More) Advanced Data Visualization
9. Making Publication Quality Plots and Panels

---
# Tidyverse

We will be working in the tidyverse, which is an ecosystem for data science. https://www.tidyverse.org/

We will be making all plots using the package {ggplot2} rather than the base R plotting. {ggplot2} provides an intuitive and powerful system for data visualization that is more flexible and beautiful than base R.

In the R Markdown file at [my github](https://github.com/etmckinley/Coffey-Lab-R-Tutorial) I go through all the packages you will need and install them. For this presentation, I'm going to assume that they are already installed.

```r
library(tidyverse)
```

```
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
```

```
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
```

```
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
```

---
class: inverse, center, middle
# Loading Data
How to get your data into R

---
# Loading Data
We are going to use the Palmer Penguins data set which is included in the package {palmerpenguins}. When the package is loaded you have access to the variable "penguins" which contains this data.

```r
library(palmerpenguins)
```
--
You can assign it to a different variable name.

```r
my_penguins = penguins
```
---
# Loading Data
### Reading a csv

More often your data is not available from a package so you need to read the data in to R. If your data is saved as a csv file you can pass the file path to your data into the read_csv function in order to import into R.

```r
my_penguins_csv = read_csv("./penguins.csv")
```

```
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   species = col_character(),
##   island = col_character(),
##   bill_length_mm = col_double(),
##   bill_depth_mm = col_double(),
##   flipper_length_mm = col_double(),
##   body_mass_g = col_double(),
##   sex = col_character(),
##   year = col_double()
## )
```
---

# Loading Data
### Reading an Excel file
If your data is saved as an Excel file, you have to install and load another library first, {redxl}. Then call the read_excel funtion

```r
library(readxl)

my_penguins_excel = read_excel("penguins.xlsx")
```

--
<br>
There are packages available to read all types of data, including fcs files for flow cytometry {flowCore} and google sheets {googlesheets4}.

---
class: inverse, center, middle
# Exploring and Summarizing Data

Looking at your data

---
# Exploring and Summarizing Data

Now that our data is loaded into R, we can do stuff with it. As all the data sets that we imported are identical, we will just stick with *my_penguins*. These are in the format of data frames.

There are a few different ways to view your data.

---

# Exploring and Summarizing Data
head() will show you the first 6 rows of data

```r
head(my_penguins)
```

```
## # A tibble: 6 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           NA            NA                 NA          NA <NA> 
## 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 6 Adelie  Torge…           39.3          20.6              190        3650 male 
## # … with 1 more variable: year <int>
```
---

# Exploring and Summarizing Data
glimpse() gives you a bit more information

```r
glimpse(my_penguins)
```

```
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex               <fct> male, female, female, NA, female, male, female, male…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
```

---

# Exploring and Summarizing Data
summary() will give you some general summary info for each of your variables

```r
summary(my_penguins)
```

```
##       species          island    bill_length_mm  bill_depth_mm  
##  Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
##  Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
##  Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
##                                  Mean   :43.92   Mean   :17.15  
##                                  3rd Qu.:48.50   3rd Qu.:18.70  
##                                  Max.   :59.60   Max.   :21.50  
##                                  NA's   :2       NA's   :2      
##  flipper_length_mm  body_mass_g       sex           year     
##  Min.   :172.0     Min.   :2700   female:165   Min.   :2007  
##  1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007  
##  Median :197.0     Median :4050   NA's  : 11   Median :2008  
##  Mean   :200.9     Mean   :4202                Mean   :2008  
##  3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009  
##  Max.   :231.0     Max.   :6300                Max.   :2009  
##  NA's   :2         NA's   :2
```

---

# Exploring and Summarizing Data
You can also do your own summaries. This will utilize the {dplyr} which provides many powerful tools to manipulate your data frames. In this case let's say we want to group the penguins by species and then calculate the mean and standard deviation for each species:

```r
my_penguins %>%  # %>% is a pipe, it applies a function to the data prior, in this case "my_penguins"
  group_by(species) %>% #this specifies species as the group
  summarize( #summarize specifies that you want to summarise each group
    mean_bill = mean(bill_length_mm, na.rm = TRUE), #na.rm will ignore any NAs in the data
    std_bill = sd(bill_length_mm, na.rm=TRUE))
```

```
## # A tibble: 3 × 3
##   species   mean_bill std_bill
##   <fct>         <dbl>    <dbl>
## 1 Adelie         38.8     2.66
## 2 Chinstrap      48.8     3.34
## 3 Gentoo         47.5     3.08
```
--
dplyr has many operators summarized in this cheat sheet: https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf
---
# Exploring and Summarizing Data

Let's say you want to summarize based upon species and year, it is as simple as adding year to your group_by() function call.

```r
my_penguins %>%  # %>% is a pipe, it applies a function to the data prior, in this case "my_penguins"
  group_by(species, year) %>% #this specifies species and year as the group
  summarize( #summarize specifies that you want to summarise each group
    mean_bill = mean(bill_length_mm, na.rm = TRUE), #na.rm will ignore any NAs in the data
    std_bill = sd(bill_length_mm, na.rm=TRUE))
```

```
## `summarise()` has grouped output by 'species'. You can override using the `.groups` argument.
```

```
## # A tibble: 9 × 4
## # Groups:   species [3]
##   species    year mean_bill std_bill
##   <fct>     <int>     <dbl>    <dbl>
## 1 Adelie     2007      38.8     2.47
## 2 Adelie     2008      38.6     2.98
## 3 Adelie     2009      39.0     2.56
## 4 Chinstrap  2007      48.7     3.47
## 5 Chinstrap  2008      48.7     3.62
## 6 Chinstrap  2009      49.1     3.10
## 7 Gentoo     2007      47.0     3.27
## 8 Gentoo     2008      46.9     2.64
## 9 Gentoo     2009      48.5     3.19
```

---
# Exploring and Summarizing Data

There are plenty of other things you can do with dplyr. For example, if you want to get rid of data earlier than 2008, you can filter by year.

```r
my_penguins %>%  # %>% is a pipe, it applies a function to the data prior, in this case "my_penguins"
  filter(year >= 2008) %>% #this filters out years prior to 2008
  group_by(species, year) %>% #this specifies species as the group
  summarize(#summarize specifies that you want to summarize each group
    mean_bill = mean(bill_length_mm, na.rm = TRUE),#na.rm will ignore any NAs in the data
    std_bill = sd(bill_length_mm, na.rm = TRUE)
  )
```

```
## `summarise()` has grouped output by 'species'. You can override using the `.groups` argument.
```

```
## # A tibble: 6 × 4
## # Groups:   species [3]
##   species    year mean_bill std_bill
##   <fct>     <int>     <dbl>    <dbl>
## 1 Adelie     2008      38.6     2.98
## 2 Adelie     2009      39.0     2.56
## 3 Chinstrap  2008      48.7     3.62
## 4 Chinstrap  2009      49.1     3.10
## 5 Gentoo     2008      46.9     2.64
## 6 Gentoo     2009      48.5     3.19
```

---
class: inverse, center, middle
# Saving Data

How to get you data out of R

---
# Saving Data

When creating summary tables above, we didn't assign them to a variable, so they were printed to the console. If we assign them to *my_penguins_summary* we now have a new variable that you may want to save for later use.

```r
my_penguins_summary = my_penguins %>%  # %>% is a pipe, it applies a function to the data prior, in this case "my_penguins"
  filter(year >= 2008) %>% #this filters out years prior to 2008
  group_by(species, year) %>% #this specifies species as the group
  summarize(#summarize specifies that you want to summarize each group
    mean_bill = mean(bill_length_mm, na.rm = TRUE),#na.rm will ignore any NAs in the data
    std_bill = sd(bill_length_mm, na.rm = TRUE)
  )
```

```
## `summarise()` has grouped output by 'species'. You can override using the `.groups` argument.
```
--
The most common way to save data is as a csv file. A csv file is versatile and can be read in any number of programs including Excel. In order to save a csv file we will use the write_csv function. You need to specify which variable you want to save and a path and/or filename to save it to.

```r
write_csv(my_penguins_summary, "penguins summary.csv")
```
---

# Saving Data

If you have large data sets and will continue working in R with them, it may be advantageous to save as an RDS file. RDS files compress data better than csv for large files so can save space and can also be opened quicker than a csv using readRDS().

```r
saveRDS(my_penguins_summary, "penguins summary.rds")
```
---
class: inverse, center, middle
# Basic Data Visualization

How to plot your data
---
# Basic Data Visualization
As mentioned before, we will be skipping plotting with base R and using {ggplot2} instead. ggplot2 works by layering plots (geoms) and annotations on a common coordinate system. There are many different built in plot types available including scatter plots, bar plots, box plots, dot plots, and violin plots.

There are many resources, including a fantastic [ggplot2 tutorial](https://cedricscherer.netlify.app/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/) by Cedric Scherer, a [cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/data-visualization.pdf) of available geoms, and a full [reference book](https://ggplot2-book.org/index.html).

We will start just plotting each replicate of bill length for each penguin species.

---
# Basic Data Visualization

.panelset[
.panel[.panel-name[Comments]
When calling ggplot, we have to give it some aesthetics. In this case we tell it the x-axis will be "species" and y-axis will be "bill_length_mm". We also have to tell it what type of plot we want, in this case geom_point() will give us some points.

You can call other geoms to get different plots and layer geoms on top of each other. 
]
.panel[.panel-name[Code]

```r
my_penguins %>%  # start with the data you want to plot
  ggplot(aes(x = species, y = bill_length_mm)) + # you have to establish aesthetics, in this case the x and y variables, to add another layer you use a "+"
  geom_point() # this adds each data point, NAs will be excluded
```

```
## Warning: Removed 2 rows containing missing values (geom_point).
```
]
.panel[.panel-name[Points]
<img src="R-tutorial-xaringnan_files/figure-html/unnamed-chunk-16-1.png" width="50%" height="50%" />
]
.panel[.panel-name[Code]

```r
my_penguins %>%  # start with the data you want to plot
  ggplot(aes(x = species, y = bill_length_mm)) + # you have to establish aesthetics, in this case the x and y variables, to add another layer you use a "+"
  geom_boxplot() # this adds a boxplot, NAs will be excluded
```

```
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
```
]
.panel[.panel-name[Boxplot]
<img src="R-tutorial-xaringnan_files/figure-html/unnamed-chunk-18-1.png" width="50%" height="50%" />
]
.panel[.panel-name[Code]

```r
my_penguins %>%  # start with the data you want to plot
  ggplot(aes(x = species, y = bill_length_mm)) + # you have to establish aesthetics, in this case the x and y variables, to add another layer you use a "+"
  geom_violin() # this adds a violin plot, NAs will be excluded
```

```
## Warning: Removed 2 rows containing non-finite values (stat_ydensity).
```
]
.panel[.panel-name[Violin]
<img src="R-tutorial-xaringnan_files/figure-html/unnamed-chunk-20-1.png" width="50%" height="50%" />
]
.panel[.panel-name[Code]

```r
my_penguins %>%  # start with the data you want to plot
  ggplot(aes(x = species, y = bill_length_mm)) + # you have to establish aesthetics, in this case the x and y variables, to add another layer you use a "+"
  geom_violin()+ # this adds a violin plot, NAs will be excluded
  geom_jitter()+ # this adds a jittered points plot
  geom_boxplot() # this adds a boxplot
```

```
## Warning: Removed 2 rows containing non-finite values (stat_ydensity).
```

```
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
```

```
## Warning: Removed 2 rows containing missing values (geom_point).
```
]
.panel[.panel-name[Combined]
<img src="R-tutorial-xaringnan_files/figure-html/unnamed-chunk-22-1.png" width="50%" height="50%" />
]
]
---

# Basic Data Visualization

.panelset[
.panel[.panel-name[Comments]
If you try to make a bar plot with geom_bar(), you will get an error since you can't have an x and y aesthetics without specifying a transformation or "stat". Stat identity will add up all the bill lenghts, which is likely not what you want.
]
.panel[.panel-name[Code]

```r
my_penguins %>%  
  ggplot(aes(x = species, y = bill_length_mm)) + 
  geom_bar(stat = "identity")
```

```
## Warning: Removed 2 rows containing missing values (position_stack).
```
]
.panel[.panel-name[Bar]
<img src="R-tutorial-xaringnan_files/figure-html/unnamed-chunk-24-1.png" width="50%" height="50%" />
]
.panel[.panel-name[Fixing the Bars]
To get a bar plot like you normally expect, with averages on the y-axis instead of sums, you use stat_summary which defaults to the mean.
]
.panel[.panel-name[Code]

```r
my_penguins %>%  
  ggplot(aes(x = species, y = bill_length_mm)) + 
  stat_summary(geom = "bar")
```

```
## Warning: Removed 2 rows containing non-finite values (stat_summary).
```

```
## No summary function supplied, defaulting to `mean_se()`
```
]
.panel[.panel-name[Bar]
<img src="R-tutorial-xaringnan_files/figure-html/unnamed-chunk-26-1.png" width="50%" height="50%" />
]
.panel[.panel-name[Add Errorbars]
You can add an error bar with another call to stat_summary to make dynamite plots, here it defaults to mean and standard error. Note that I put it above the bar function call so that the bar would be plotted on top of the errorbar. Order matters with {ggplot2}.
]
.panel[.panel-name[Code]

```r
my_penguins %>%  
  ggplot(aes(x = species, y = bill_length_mm)) + 
  stat_summary(geom = "errorbar") +
  stat_summary(geom = "bar")
```

```
## Warning: Removed 2 rows containing non-finite values (stat_summary).
```

```
## No summary function supplied, defaulting to `mean_se()`
```

```
## Warning: Removed 2 rows containing non-finite values (stat_summary).
```

```
## No summary function supplied, defaulting to `mean_se()`
```
]
.panel[.panel-name[Dynamite]
<img src="R-tutorial-xaringnan_files/figure-html/unnamed-chunk-28-1.png" width="50%" height="50%" />
]
]

---
class: inverse, center, middle

<br>

.left[However, [I highly recommend you do not use dynamite plots](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128), especially in publications. Bar plots hide the real distribution and number of replicates of the data and there are better ways to visualize data that will be discussed later.]
---
# Basic Data Visualization

.panelset[
.panel[.panel-name[Comments]
For time series data, you can change the x-axis to "year" and can add a color aesthetic to differentiate the penguin species and a line connecting the mean values.
]
.panel[.panel-name[Code]

```r
my_penguins %>%  
  ggplot(aes(x = year, y = bill_length_mm, color = species)) + 
  geom_point() + 
  geom_line(stat = "summary")
```

```
## Warning: Removed 2 rows containing non-finite values (stat_summary).
```

```
## No summary function supplied, defaulting to `mean_se()`
```

```
## Warning: Removed 2 rows containing missing values (geom_point).
```
]
.panel[.panel-name[Time Series Plot]
<img src="R-tutorial-xaringnan_files/figure-html/unnamed-chunk-30-1.png" width="50%" height="50%" />
]
]
---
# Basic Data Visualization

To save a plot, use the ggsave(). It will automatically save the last plot generated, or if you save a plot to a variable, you can specify which plot to save. You can specify image size as well.

```r
ggsave("last plot.png")
```

```
## Saving 5.5 x 4 in image
```

```
## No summary function supplied, defaulting to `mean_se()`
```

```r
plot_to_save = my_penguins %>%  # start with the data you want to plot
  ggplot(aes(x = species, y = bill_length_mm)) + # you have to establish aesthetics, in this case the x and y variables, to add another layer you use a "+"
  geom_violin(aes(color = species)) # this creates violin plots, NAs will be excluded

ggsave(
  "specific plot.png",
  plot = plot_to_save,
  width = 5,
  height = 5
)
```

---

How to test your data for significance
---
# T-tests, Wilcoxon Tests, and ANOVA
To run a t-test, you need two vectors to compare. If these are already columns in your data, then this is easy. Just use t.test() and pass the two columns.

```r
t.test(my_penguins$bill_length_mm, my_penguins$body_mass_g, paired = TRUE)
```

---
# T-tests, Wilcoxon Tests, and ANOVA
If the data is paired, you add "paired = TRUE".

```r
t.test(my_penguins$bill_length_mm, my_penguins$body_mass_g, paired = TRUE)
```

```
## 
## 	Paired t-test
## 
## data:  my_penguins$bill_length_mm and my_penguins$body_mass_g
## t = -96.269, df = 341, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4242.784 -4072.881
## sample estimates:
## mean of the differences 
##               -4157.832
```
---
# T-tests, Wilcoxon Tests, and ANOVA
Above, you can see that bill length is highly significantly different than body mass. But that is a nonsensical test, since one is a length and one is a weight. If you wanted to compare bill length between two species, you can just make two new vectors and then run t.test().

```r
gentoo = penguins %>% filter(species == "Gentoo") %>% pull(body_mass_g) #pull gives you a vector
chinstrap = penguins %>% filter(species == "Chinstrap") %>% pull(body_mass_g)

t.test(gentoo, chinstrap)
```

```
## 
## 	Welch Two Sample t-test
## 
## data:  gentoo and chinstrap
## t = 20.628, df = 170.4, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1214.416 1471.440
## sample estimates:
## mean of x mean of y 
##  5076.016  3733.088
```
---
# T-tests, Wilcoxon Tests, and ANOVA
Alternatively, you can use parwise_t_test from {rstatix} to get pairwise comparisons between groups, which also allows for adjusting for multiple testing (e.g. Bonferroni).
--

```r
library(rstatix)
```

```r
my_penguins %>%
  pairwise_t_test(body_mass_g ~ species,
                  p.adjust.method = "bonferroni")
```

```
## # A tibble: 3 × 9
##   .y.         group1    group2       n1    n2        p p.signif    p.adj p.adj.signif
## * <chr>       <chr>     <chr>     <int> <int>    <dbl> <chr>       <dbl> <chr>       
## 1 body_mass_g Adelie    Chinstrap   152    68 6.31e- 1 ns       1   e+ 0 ns          
## 2 body_mass_g Adelie    Gentoo      152   124 5.42e-77 ****     1.63e-76 ****        
## 3 body_mass_g Chinstrap Gentoo       68   124 3.21e-56 ****     9.63e-56 ****
```
---
# T-tests, Wilcoxon Tests, and ANOVA
To use a non-parametric test, you do the same thing, but use wilcox.test() or pairwise_wilcox_test()

```r
wilcox.test(gentoo, chinstrap)
```

```
## 
## 	Wilcoxon rank sum test with continuity correction
## 
## data:  gentoo and chinstrap
## W = 8233, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
```

```r
my_penguins %>%
  pairwise_wilcox_test(body_mass_g ~ species)
```

```
## # A tibble: 3 × 9
##   .y.         group1    group2       n1    n2 statistic        p    p.adj p.adj.signif
## * <chr>       <chr>     <chr>     <int> <int>     <dbl>    <dbl>    <dbl> <chr>       
## 1 body_mass_g Adelie    Chinstrap   152    68     4831  4.85e- 1 4.85e- 1 ns          
## 2 body_mass_g Adelie    Gentoo      152   124      400. 2.98e-42 8.94e-42 ****        
## 3 body_mass_g Chinstrap Gentoo       68   124      131  1.67e-28 3.34e-28 ****
```
---
# T-tests, Wilcoxon Tests, and ANOVA
ANOVA is similar, however you have to use summary to see the results

```r
penguin_anova = aov(body_mass_g ~ species, data = my_penguins)
summary(penguin_anova)
```

```
##              Df    Sum Sq  Mean Sq F value Pr(>F)    
## species       2 146864214 73432107   343.6 <2e-16 ***
## Residuals   339  72443483   213698                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness
```

---
class: inverse, center, middle
# Correlations

How to see if your variables are related
---
# Correlations
.panelset[
.panel[.panel-name[Comments]
Correlations with cor.test() work similarly to t-tests, but you need have the same number of measurements for each vector. You can also specify which correlation method you'd like to calculate, Pearson's is the default.  
]
.panel[.panel-name[Pearson]

```r
cor.test(my_penguins$bill_length_mm, my_penguins$body_mass_g, use = "complete.obs") #"complete.obs here specifies that you want to use only the data with measurements for each variable
```

```
## 
## 	Pearson's product-moment correlation
## 
## data:  my_penguins$bill_length_mm and my_penguins$body_mass_g
## t = 13.654, df = 340, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5220040 0.6595358
## sample estimates:
##       cor 
## 0.5951098
```
]
.panel[.panel-name[Spearman]

```r
cor.test(
  my_penguins$bill_length_mm,
  my_penguins$body_mass_g,
  use = "complete.obs",
  method = "spearman"
)
```

```
## Warning in cor.test.default(my_penguins$bill_length_mm,
## my_penguins$body_mass_g, : Cannot compute exact p-value with ties
```

```
## 
## 	Spearman's rank correlation rho
## 
## data:  my_penguins$bill_length_mm and my_penguins$body_mass_g
## S = 2774758, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.5838003
```
]
]
---
#Correlations
.panelset[
.panel[.panel-name[Comments]
You can quickly get visualizations and correlations from each variable using ggpairs from {GGgally}.
]
.panel[.panel-name[Code]

```r
library(GGally)
```

```
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
```

```r
ggpairs(my_penguins)
```

```
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).

## Warning: Removed 2 rows containing non-finite values (stat_boxplot).

## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

```
## Warning: Removed 2 rows containing non-finite values (stat_bin).
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

```
## Warning: Removed 2 rows containing non-finite values (stat_bin).
```

```
## Warning: Removed 2 rows containing non-finite values (stat_density).
```

```
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 2 rows containing missing values

## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 2 rows containing missing values

## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 2 rows containing missing values
```

```
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
```

```
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 2 rows containing missing values
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

```
## Warning: Removed 2 rows containing non-finite values (stat_bin).
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

```
## Warning: Removed 2 rows containing non-finite values (stat_bin).
```

```
## Warning: Removed 2 rows containing missing values (geom_point).
```