Let's try something

Exercise 1

  1. Write code to calculate the average age of children in the data using
    • data/raw/mother_survey.csv
    • data/tidy/child.csv
  2. Write code to calculate the average age of mothers in the data using
    • data/raw/mother_long.csv
    • data/tidy/mother.rds
    • data/analysis/child.csv

Tidy data

Some semantics

  1. Data come in all shapes and sizes
  2. In our line of work, they are mostly stored as one or more data tables, like the ones we were just working on
  3. Data tables organize information into rows and columns
  4. This kind of data is often called tabular or rectangular data
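
As a rough illustration of what "rectangular" means, the sketch below reads a small table and checks that every row has a value for every column. The CSV text, column names, and values are invented for this example and are not taken from the course files.

```python
import csv
import io

# Hypothetical CSV text, invented for illustration (not the course data).
raw = """child_id,age,height_cm
1,4,102.5
2,7,121.0
3,5,110.2
"""

# Each row becomes a dict that maps column name -> value (as text).
rows = list(csv.DictReader(io.StringIO(raw)))
columns = list(rows[0].keys())

# Rectangular: every row has exactly one value for every column.
# csv.DictReader marks short rows with None values and long rows with a None key.
is_rectangular = all(
    len(row) == len(columns) and None not in row and None not in row.values()
    for row in rows
)

print(columns)
print(len(rows), "rows x", len(columns), "columns")
print("rectangular:", is_rectangular)
```

A nested structure, such as a list of records where each record has a different set of fields, would fail this check.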

Can you think of an example of non-tabular data?

Some semantics

  • A dataset is a collection of values
  • Every value belongs to a variable and an observation
  • A variable contains all values that measure the same underlying attribute (like height, temperature, duration) across units
  • An observation contains all values measured on the same unit (like a person, a day, or a race) across attributes
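
These definitions can be made concrete with a minimal sketch. Here each dict is one observation and each key is one variable; the helper names `variable` and `observation`, as well as all values, are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical values, invented for illustration.
# Each dict is one observation (a person); each key is one variable.
dataset = [
    {"person": "A", "height": 1.62, "temperature": 36.8},
    {"person": "B", "height": 1.75, "temperature": 37.1},
    {"person": "C", "height": 1.80, "temperature": 36.6},
]

def variable(data, name):
    """All values that measure the same attribute, across units."""
    return [row[name] for row in data]

def observation(data, unit):
    """All values measured on the same unit, across attributes."""
    return next(row for row in data if row["person"] == unit)

heights = variable(dataset, "height")   # one variable
person_b = observation(dataset, "B")    # one observation

# The value 1.75 belongs to both the variable "height" and the observation "B".
print(heights)
print(person_b)
print(mean(heights))
```

With the data laid out this way, a summary such as an average only needs one variable, which is why the layout of a table matters for the exercise at the start.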

Tidying data

Why tidy data?

Questions?