Bending data with tidyverse to answer your questions
2024-03-07
“Happy families are all alike; every unhappy family is unhappy in its own way.” -- Leo Tolstoy
“Tidy datasets are all alike, but every messy dataset is messy in its own way.” -- Hadley Wickham
Loading required package: tidyverse
Warning: package 'ggplot2' was built under R version 4.3.1
Warning: package 'purrr' was built under R version 4.3.1
Warning: package 'dplyr' was built under R version 4.3.2
Warning: package 'stringr' was built under R version 4.3.2
Warning: package 'lubridate' was built under R version 4.3.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Depending on the context, tidy can mean neat and well-organized, clean and orderly in appearance, methodical and efficient, or sufficient and adequate.
There are three principles that we can follow to make a dataset tidy.
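In the formulation commonly used in the tidyverse documentation, the principles are: each variable is a column, each observation is a row, and each value is a cell. The sketch below illustrates them on a small made-up table; the data, the column names, and the choice of `separate_wider_delim()` are assumptions for illustration only.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical messy data: the `rate` column packs two variables into one cell.
messy <- tibble(
  country = c("A", "B"),
  year    = c(2000, 2000),
  rate    = c("10/200", "30/400")
)

# Split `rate` so that every variable has its own column
# and every cell holds exactly one value.
tidy <- messy %>%
  separate_wider_delim(rate, delim = "/", names = c("cases", "population")) %>%
  mutate(across(c(cases, population), as.numeric))

tidy
```

After the split, each row is still one country-year observation, but `cases` and `population` can now be used directly in calculations.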
Tip
The tidyr package is designed to help you restructure data frames into tidy form.
Many datasets that you receive are untidy and will require some work on your end. There are several reasons why a dataset is messy.
We are going to look at the functions that are regularly used to tidy data frames. These include:
%>%
The pipe operator %>% lets you combine multiple operations in R into a single sequential chain of actions.
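As a minimal sketch of what chaining looks like, the example below runs the same three steps twice, once as nested calls and once with the pipe. The `scores` table and its column names are hypothetical.

```r
library(tibble)
library(dplyr)

# Hypothetical example data.
scores <- tibble(
  student = c("Ana", "Ben", "Cal", "Dee"),
  group   = c("x", "x", "y", "y"),
  score   = c(80, 90, 70, 50)
)

# Without the pipe: nested calls have to be read from the inside out.
nested <- summarise(
  group_by(filter(scores, score >= 60), group),
  mean_score = mean(score)
)

# With the pipe: the same steps read top to bottom, one action per line.
piped <- scores %>%
  filter(score >= 60) %>%
  group_by(group) %>%
  summarise(mean_score = mean(score))

piped
```

Both versions give the same result; the piped version simply passes the output of each step as the first argument of the next.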
Wide format organizes data so that each observation is represented by a single row and each variable has its own column.
Long format organizes data so that each observation is represented by multiple rows, and the columns represent variables.
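The two layouts can be converted into each other with tidyr's `pivot_longer()` and `pivot_wider()`. The following is a small sketch on made-up data; the `site`, `year`, and `count` names are assumptions chosen for illustration.

```r
library(tibble)
library(tidyr)

# Hypothetical wide data: one row per site, one column per year.
wide <- tibble(
  site   = c("s1", "s2"),
  `2022` = c(5, 7),
  `2023` = c(6, 9)
)

# Wide -> long: each site now spans several rows, one per year.
long <- pivot_longer(
  wide,
  cols      = c(`2022`, `2023`),
  names_to  = "year",
  values_to = "count"
)

# Long -> wide: spread the years back out into separate columns.
wide_again <- pivot_wider(long, names_from = year, values_from = count)

long
wide_again
```

Here `long` has four rows (two sites times two years), and `wide_again` has the same layout as the original `wide` table.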