Prepare a simple R package for distributing documented functions
Explain the terms Repository, Dependency, and Namespace
Implement testing in an R package
Collaboratively work on an R package on GitHub
Imagine you are analyzing some bio data. You have written some nifty scripts that have sped up your analysis significantly. Wouldn’t it be great if you could:
Share these with your colleagues?
Document them for your future self?
Make them accessible to the entire scientific community?
Welcome to the world of R packages!
R packages are an industry-wide practice for ensuring reproducibility and consistency in data analysis.
An R package is a shareable collection of documented code and/or data.
Some examples you might be familiar with:
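Tidyverse: dplyr, tibble, tidyr, ggplot2
Attaching a package with library() makes its functions/objects available without a prefix.
Not attaching it requires prefixing the function/object with the package name: package::function().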
Key Point: Attaching makes calling functions easy but risks conflicts with function names from other packages. Using :: is explicit and safer.
Note: Never use library() inside your package! It changes the user's search path and can lead to unexpected behavior.
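The difference between prefixing and attaching can be tried at the console with the tools package, which ships with R but is usually not attached at startup. This is only a minimal sketch, and the file name is made up for illustration.

```r
# Call a function with the package prefix, without attaching the package
extension <- tools::file_ext("analysis_script.R")

# The namespace is now loaded, but loading and attaching are separate steps
"tools" %in% loadedNamespaces()
"package:tools" %in% search()

# Attaching puts the package on the search path, so the prefix becomes optional
library(tools)
file_ext("analysis_script.R")
```

The library() call here is for interactive use only; inside a package the prefixed form is the one to keep.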
Reusable Code: Avoid rewriting the same code for different projects.
Standardized Work: Organize your analysis and code neatly.
Easy Documentation: Maintain detailed documentation for every function and dataset.
Sharing & Collaboration: Share your tools, analysis, and workflows seamlessly with peers.
At its core, an R package is essentially a collection of functions and/or data.
Functions are reusable blocks of code designed to perform a specific task.
They accept parameter inputs (arguments) and, after processing, return an output.
Properly defined functions enhance code clarity, facilitate debugging, and foster modularity.
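As a small illustration of a function with one argument and one return value, here is a sketch of a helper for bio data. The name gc_content and the example sequence are hypothetical and not taken from the course material.

```r
# Hypothetical helper: fraction of G and C bases in a DNA sequence
gc_content <- function(sequence) {
  # Split the string into single upper-case characters
  bases <- strsplit(toupper(sequence), "")[[1]]
  # Count G and C, then divide by the sequence length
  sum(bases %in% c("G", "C")) / length(bases)
}

gc_content("ATGCGC")
```

Keeping the task this narrow makes the function easy to document and test later on.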
Descriptive function names improve clarity.
Avoid overwriting other function names.
To resolve naming conflicts, utilize namespaces.
A namespace is a scoped environment where each package’s functions, data, and other objects reside.
Avoid Clashes: Ensures that functions or objects from one package won’t accidentally reference or override those from another package.
Isolation: Each package’s contributions are isolated, ensuring they work as intended.
Using library() lets R know which package’s tools you intend to use.
However, if multiple packages have tools with the same name, the most recently attached package takes precedence.
To prevent such overlaps, explicitly call functions using their namespaces:
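A minimal sketch of such explicit calls, using filter() as the example, since both stats and dplyr export a function with that name. The input values and the data frame are made up, and the dplyr call is guarded because dplyr may not be installed.

```r
# stats::filter() applies a linear filter; here a 3-point moving average
values <- c(1, 2, 3, 4, 5)
moving_avg <- stats::filter(values, rep(1 / 3, 3))

# dplyr::filter() keeps the rows of a data frame that match a condition.
# Guarded: this part only runs if dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  samples <- data.frame(id = c("a", "b", "c"), count = c(5, 12, 8))
  high_counts <- dplyr::filter(samples, count > 6)
}
```

With the prefix in place, the result does not depend on which of the two packages was attached last.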
See how R’s environment changes when packages are attached.
# Search path before attaching packages
search()
[1] ".GlobalEnv" "package:stats" "package:graphics"
[4] "package:grDevices" "package:utils" "package:datasets"
[7] "package:methods" "Autoloads" "package:base"
# Attach the 'MASS' package
library("MASS")
# Attach the 'dplyr' package
library("dplyr")
# Search path after attaching packages
search()
[1] ".GlobalEnv" "package:dplyr" "package:MASS"
[4] "package:stats" "package:graphics" "package:grDevices"
[7] "package:utils" "package:datasets" "package:methods"
[10] "Autoloads" "package:base"
Observation: As you attach packages, they are added to the search path right after .GlobalEnv, affecting how R finds functions and objects.
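To check where R actually finds a given function, a few base R helpers can be used. The sketch below assumes a default session in which stats is attached, and uses sd() purely as an example.

```r
# Which entry on the search path provides sd()?
sd_location <- utils::find("sd")

# The namespace in which the function was defined
sd_namespace <- environmentName(environment(stats::sd))

# Names that exist in more than one place on the search path
masked_names <- conflicts(detail = FALSE)
```

After attaching packages such as dplyr, conflicts() is one way to see which names now exist more than once.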
Avoids Conflicts: Multiple packages might have functions with the same name. Namespaces ensure there’s no confusion.
Explicit Code: Clearly indicates the origin of functions, enhancing readability and clarity.
Ensures Stability: Your code behaves as expected, even if you load multiple packages.
Roxygen skeleton
R now knows that stringr is a dependency in your package.
Including @importFrom stringr str_c in the function description lets you use str_c in your package with no issues. But keep stringr:: for explicit code.
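A filled-in Roxygen skeleton could look roughly like the sketch below. The function make_sample_label and its arguments are hypothetical; only the @importFrom stringr str_c tag and the stringr:: prefix come from the text above.

```r
#' Build a label for a sample
#'
#' @param sample_id Character vector of sample identifiers.
#' @param condition Character vector of experimental conditions.
#'
#' @return A character vector with one label per sample.
#' @importFrom stringr str_c
#' @export
#'
#' @examples
#' make_sample_label("S1", "control")
make_sample_label <- function(sample_id, condition) {
  # The explicit prefix keeps the origin of str_c() visible to the reader
  stringr::str_c(sample_id, condition, sep = "_")
}
```

Running devtools::document() turns these comments into the help page and the NAMESPACE entries.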
Now what exactly is a dependency?
A Dependency is a package that another package relies on. It ensures that all functions and features run as expected.
They help maintain the integrity of a package when sharing or collaborating.
Do not build what is already built!
All hard dependencies (packages listed under Imports or Depends) are installed with your package. This can lead to bloating.
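To get a feeling for how many packages come along with a dependency, the declared imports of an installed package can be read from its DESCRIPTION file. A minimal sketch; the helper name declared_imports is hypothetical, and stats is used only because it ships with R.

```r
# Hypothetical helper: list what an installed package declares under Imports
declared_imports <- function(pkg) {
  imports <- utils::packageDescription(pkg)$Imports
  # Packages without an Imports field give an empty result
  if (is.null(imports)) {
    return(character(0))
  }
  # Entries are comma separated; version requirements stay part of each entry
  trimws(strsplit(imports, ",")[[1]])
}

declared_imports("stats")
```

Each listed package may declare imports of its own, which is how a short dependency list can still pull in many packages.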
Repositories are storage locations for packages.
The two main repositories for R packages are CRAN (Comprehensive R Archive Network) and Bioconductor.
Many developers also use GitHub as a platform to host and share their development versions of packages.
install.packages("devtools") # CRAN: The Comprehensive R Archive Network
devtools::install_bioc("pairedGSEA") # Bioconductor (but use BiocManager::install())
devtools::install_github("biosurf/cyCombine") # GitHub (format: "owner/repo")
devtools::install_cran("dplyr") # CRAN again
# Side note: devtools uses the "remotes" package, i.e., remotes::install_<repo> does the same
What if you want to include non-R packages/code?
In R, you can integrate other programming languages to take advantage of their specific capabilities and packages, for example C++ through Rcpp or Python through reticulate.
Building packages with devtools, usethis, roxygen2, and testthat
# Create the package
devtools::create("package name")
# Create function script
usethis::use_r("function name")
# Include dependencies
usethis::use_package("package name")
# Include data in your package
usethis::use_data(object) # set internal = TRUE if data should be internal
usethis::use_data_raw("object", open = TRUE) # describe how it was cleaned
# Create test for your function
usethis::use_test("function name")
# Automatically write package documentation
devtools::document()
# Simulate library("your package")
devtools::load_all()
# Check that your package is installable
devtools::check()
The central dogma of molecular biology
R for Bio Data Science