Prepare a simple R package for distributing documented functions
Explain the terms Repository, Dependency, and Namespace
Implement testing in an R package
Collaboratively work on an R package on GitHub
Imagine you are analyzing some bio data. You have written some nifty scripts that have sped up your analysis significantly. Wouldn’t it be great if you could:
Share these with your colleagues?
Document them for your future self?
Make them accessible to the entire scientific community?
Welcome to the world of R packages!
Building R packages is an industry-wide practice for ensuring reproducibility and consistency in data analysis.
An R package is a shareable collection of documented code and/or data.
- source
Some examples you might be familiar with:
Tidyverse
dplyr
tibble
tidyr
ggplot2
Loading a package makes its functions/objects available.
It requires prefixing each function/object with the package name: package::function().
Key Point: Attaching makes calling functions easy but risks conflicts with function names from other packages. Using :: is explicit and safer.
Note: Never use library() inside your package, because it changes the user's search path and can lead to unexpected behavior.
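As a minimal sketch of the difference between loading and attaching, the lines below call a function through :: and then check where the package ended up. The base package tools and the file name "reads.fastq" are only chosen for illustration.

```r
# Calling a function with :: loads the namespace but does not attach the package
ext <- tools::file_ext("reads.fastq")

# The namespace is now loaded ...
is_loaded <- "tools" %in% loadedNamespaces()

# ... but "package:tools" only appears on the search path if it was attached
# with library(tools) at some earlier point in the session
is_attached <- "package:tools" %in% search()

print(ext)
print(is_loaded)
print(is_attached)
```

In a fresh session the last value is expected to be FALSE, since nothing has called library(tools).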
Reusable Code: Avoid rewriting the same code for different projects.
Standardized Work: Organize your analysis and code neatly.
Easy Documentation: Maintain detailed documentation for every function and dataset.
Sharing & Collaboration: Share your tools, analysis, and workflows seamlessly with peers.
- source
At its core, an R package is essentially a collection of functions and/or data.
Functions are reusable blocks of code designed to perform a specific task.
They accept parameter inputs (arguments) and, after processing, return an output.
Properly defined functions enhance code clarity, facilitate debugging, and foster modularity.
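To make the idea of arguments and return values concrete, here is a small sketch of a function. The name reverse_complement() and its behavior are hypothetical and not taken from the course material.

```r
# Hypothetical example: return the reverse complement of a DNA sequence
reverse_complement <- function(dna) {
  # Split the sequence into single upper-case bases
  bases <- strsplit(toupper(dna), "")[[1]]
  # Look up the complement of each base by name
  complement <- c(A = "T", T = "A", G = "C", C = "G")
  # Reverse the complemented bases and glue them back together
  paste(rev(complement[bases]), collapse = "")
}

reverse_complement("ATGC")
```

The function takes one argument (dna), does its processing inside the body, and returns a single string as output.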
Careful function naming improves clarity: avoid overwriting other function names.
To resolve naming conflicts, utilize namespaces.
A namespace is a scoped environment where each package’s functions, data, and other objects reside.
Avoid Clashes: Ensures that functions or objects from one package won’t accidentally reference or override those from another package.
Isolation: Each package’s contributions are isolated, ensuring they work as intended.
Using library() lets R know which package’s tools you intend to use.
However, if multiple packages have tools with the same name, the most recently attached package takes precedence.
To prevent such overlaps, explicitly call functions using their namespaces, in the form package::function().
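The masking behavior can be sketched without installing anything. Here a hypothetical user-defined median() stands in for a function from a more recently attached package, because R searches the global environment first and then the attached packages in order.

```r
# Hypothetical user-defined function that shares its name with stats::median()
median <- function(x) "masked version"

# A plain call finds the first match along the search path
masked <- median(c(1, 5, 9))

# The :: call always uses the function from the stats namespace
explicit <- stats::median(c(1, 5, 9))

print(masked)
print(explicit)

# Remove the masking definition again
rm(median)
```

The same lookup rule applies when two attached packages export the same name, which is why the explicit form is the safer one.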
See how R’s environment changes when packages are attached.
# Search path before attaching packages
search()
[1] ".GlobalEnv" "package:stats" "package:graphics"
[4] "package:grDevices" "package:utils" "package:datasets"
[7] "package:methods" "Autoloads" "package:base"
# Attach the 'MASS' package
library("MASS")
# Attach the 'dplyr' package
library("dplyr")
# Search path after attaching packages
search()
[1] ".GlobalEnv" "package:dplyr" "package:MASS"
[4] "package:stats" "package:graphics" "package:grDevices"
[7] "package:utils" "package:datasets" "package:methods"
[10] "Autoloads" "package:base"
Observation: As you attach packages, they get added to the search path, affecting how R finds functions and objects.
Avoids Conflicts: Multiple packages might have functions with the same name. Namespaces ensure there’s no confusion.
Explicit Code: Clearly indicates the origin of functions, enhancing readability and clarity.
Ensures Stability: Your code behaves as expected, even if you load multiple packages.
Roxygen skeleton
R now knows that stringr is a dependency in your package.
Including @importFrom stringr str_c in the function description lets you use str_c in your package with no issues. But keep stringr:: for explicit code.
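A documented function with such an import tag might look like the sketch below. The function make_label() and its arguments are hypothetical, and the call at the end assumes stringr is installed.

```r
#' Build sample labels
#'
#' Hypothetical helper that joins a sample name and a replicate identifier.
#'
#' @param sample Character vector of sample names.
#' @param replicate Character vector of replicate identifiers.
#' @return A character vector of labels of the form "sample_replicate".
#' @importFrom stringr str_c
#' @export
make_label <- function(sample, replicate) {
  # The stringr:: prefix is kept so the origin of str_c() stays explicit
  stringr::str_c(sample, replicate, sep = "_")
}

# Only run the example call if stringr is available
if (requireNamespace("stringr", quietly = TRUE)) {
  print(make_label("liver", "r1"))
}
```

When devtools::document() is run inside a package, roxygen2 turns these comment tags into the help file and the corresponding NAMESPACE entries.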
Now what exactly is a dependency?
A Dependency is a package that another package relies on. It ensures that all functions and features run as expected.
They help maintain the integrity of a package when sharing or collaborating.
Do not build what is already built!
All dependencies are installed with your package. This can lead to bloating.
Repositories are storage locations for packages.
The two main repositories for R packages are CRAN (Comprehensive R Archive Network) and Bioconductor.
Many developers also use GitHub as a platform to host and share their development versions of packages.
install.packages("devtools") # CRAN: The Comprehensive R Archive Network
devtools::install_bioc("pairedGSEA") # Bioconductor (but prefer BiocManager::install())
devtools::install_github("biosurf/cyCombine") # GitHub: given as "user/repository"
devtools::install_cran("dplyr") # CRAN again
# Side note: devtools uses the "remotes" package, i.e., remotes::install_<repo> does the same
What if you want to include non-R packages/code?
In R, you can integrate other programming languages to take advantage of their specific capabilities and packages.
Building packages with
devtools
usethis
roxygen2
testthat
# Create the package (the name may contain only letters, numbers, and periods)
devtools::create("packagename")
# Create function script
usethis::use_r("function name")
# Include dependencies
usethis::use_package("package name")
# Include data in your package
usethis::use_data(object) # set internal = TRUE if data should be internal
usethis::use_data_raw("object", open = TRUE) # describe how it was cleaned
# Create test for your function
usethis::use_test("function name")
# Automatically write package documentation
devtools::document()
# Simulate library("your package")
devtools::load_all()
# Check that your package is installable
devtools::check()
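The file created by usethis::use_test() could be filled in roughly as sketched below. Both gc_content() and the test are hypothetical examples, and the test only runs if testthat is installed.

```r
# Hypothetical function that would live in R/gc_content.R
gc_content <- function(sequence) {
  # Split the sequence into single upper-case bases
  bases <- strsplit(toupper(sequence), "")[[1]]
  # Fraction of bases that are G or C
  sum(bases %in% c("G", "C")) / length(bases)
}

# Hypothetical test that would live in tests/testthat/test-gc_content.R
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("gc_content() returns the GC fraction", {
    testthat::expect_equal(gc_content("GGCC"), 1)
    testthat::expect_equal(gc_content("atgc"), 0.5)
  })
}
```

Inside a package, the function and the test would sit in separate files, and devtools::test() or devtools::check() would run the tests.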
The central dogma of molecular biology
- source
R for Bio Data Science