Drinking from the R hose

Harness Data Coding with the R language

Masumbuko Semba

2024-01-23

Prerequisites

  1. Read R for Data Science, a practical guide that introduces R programming for data analysis and visualization, emphasizes the tidyverse approach, and provides hands-on examples.

  2. Modern R with the tidyverse teaches modern R instead of “just” R: you will learn how to use modern packages (mostly those from the tidyverse) and concepts.

  3. Practical Spatial Data orients you to the basics of R, with a focus on spatial data in coastal and marine environments.

  4. Geospatial Technology and Spatial Analysis in R covers the new tools and packages for modern spatial data handling and manipulation.

Key concepts

  • R, a statistical programming language, empowers us to craft engaging stories through data, but mastery requires a gradual learning curve like any language.

  • Our project workflow encompasses planning, simulating, acquiring, exploring, and sharing.

  • Effective learning in R involves starting with a small project, breaking it down into steps, studying others’ code, and moving on to the next project once each one is complete, so that skills build gradually.

Software and Packages

  • Base R (R Core Team 2023)

  • Core tidyverse (Wickham et al. 2019)

    1. dplyr (Wickham et al. 2023)
    2. ggplot2 (Wickham 2016)
    3. tidyr (Wickham, Vaughan, and Girlich 2023)
    4. stringr (Wickham 2023)
    5. readr (Wickham, Hester, and Bryan 2023)

  • janitor (Firke 2023)

  • knitr (Xie 2015)

  • lubridate (Grolemund and Wickham 2011)

  • wbstats (Piburn 2020)
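
One way to check that the packages listed above are available before a session is sketched below. It is a minimal sketch, not a required setup step: the variable names `pkgs` and `installed` are illustrative, and it assumes that loading the tidyverse bundle is how you want to attach dplyr, ggplot2, tidyr, stringr, and readr together.

```r
# Packages named in the list above; attaching "tidyverse" also attaches
# dplyr, ggplot2, tidyr, stringr, and readr.
pkgs <- c("tidyverse", "janitor", "knitr", "lubridate", "wbstats")

# Check which packages are installed, without attaching them yet.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything missing instead of installing it silently.
if (any(!installed)) {
  message(
    "Missing packages, install with: install.packages(c(",
    paste0('"', pkgs[!installed], '"', collapse = ", "),
    "))"
  )
}

# Attach the packages that are available for this session.
for (p in pkgs[installed]) {
  library(p, character.only = TRUE)
}
```

Reporting the missing packages rather than calling install.packages() directly keeps the script free of side effects; whether to install automatically is a matter of preference.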

The concept

  • The concept is based on five items: plan, simulate, acquire, explore, and share.
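
The five items can be walked through in a few lines of R. The sketch below is only an illustration under stated assumptions: the question, the two sites, the temperature values, and the file names are all hypothetical, and it uses base R so that it runs without extra packages (the tidyverse packages listed earlier offer equivalents for each step).

```r
# Plan: a hypothetical question - does temperature differ between two sites?
set.seed(2024)  # make the simulation reproducible

# Simulate: 50 made-up readings per site, so the analysis can be drafted
# before any real data exist.
simulated <- data.frame(
  site = rep(c("A", "B"), each = 50),
  temperature = c(rnorm(50, mean = 26, sd = 1), rnorm(50, mean = 28, sd = 1))
)

# Acquire: a real project would read collected data here. As a stand-in,
# the simulated data are written to a temporary CSV file and read back.
csv_path <- file.path(tempdir(), "temperature.csv")
write.csv(simulated, csv_path, row.names = FALSE)
acquired <- read.csv(csv_path)

# Explore: summarise the readings by site.
site_means <- aggregate(temperature ~ site, data = acquired, FUN = mean)
print(site_means)

# Share: save the summary table so that others can reuse it.
summary_path <- file.path(tempdir(), "site_means.csv")
write.csv(site_means, summary_path, row.names = FALSE)
```

Simulating before acquiring means the exploration code is already written and tested by the time real data arrive; only the acquire step needs to change.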

Learning Agenda

  1. Get familiar with R and RStudio

  2. Data structure and data types

  3. Reading and writing data in RStudio

  4. Tidying and data manipulation with the tidyverse

  5. Plotting and Visualization

  6. Descriptive Statistics

  7. Inferential Statistics

  8. Modelling and simulation

  9. Spatial Handling and Analysis

  10. Further topics

    1. Git and GitHub
    2. Reproducibility with Quarto
    3. Websites and blogs
    4. Using Python from RStudio
    5. Generating HTML, PDF and Word Reports

References

Firke, Sam. 2023. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. https://www.jstatsoft.org/v40/i03/.
Piburn, Jesse. 2020. Wbstats: Programmatic Access to the World Bank API. Oak Ridge, Tennessee: Oak Ridge National Laboratory. https://doi.org/10.11578/dc.20171025.1827.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2023. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2023. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2023. Tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.org/knitr/.