Lecture Lab 8

Søren Helweg Dam

R packages

Lab 8 Learning Objectives

  • Prepare a simple R package for distributing documented functions

  • Explain the terms Repository, Dependency, and Namespace

  • Implement testing in an R package

  • Collaboratively work on an R package on GitHub

Why R Packages?

Imagine you are analyzing some bio data. You have written some nifty scripts that have sped up your analysis significantly. Wouldn’t it be great if:

  • You could easily share these with your colleagues?
  • Document them for your future self?
  • Make them accessible to the entire scientific community?

Welcome to the world of R packages!


  • In fact, R packages are an industry-wide practice for ensuring reproducibility and consistency in data analysis.

Today’s lab

  • What is an R package?
  • Using an R package
  • Building an R package
  • Namespace
  • Dependencies
  • Repositories
  • R package in 1-2-3
  • The exercises

What is an R package?

  • A shareable collection of documented code and/or data



R package examples

Some examples you might be familiar with:

  • Tidyverse
    • dplyr
    • tibble
    • tidyr
    • ggplot2

Using an R Package

Loading

  • Makes functions/objects available.

  • Requires prefixing function/object with the package name: ::.

dplyr::mutate(iris, ratio = Sepal.Length / Sepal.Width)

Attaching

  • Adds the package to the R search path.

  • Functions/objects can be used directly without using ::.

library("dplyr")
mutate(iris, ratio = Sepal.Length / Sepal.Width)


Key Point: Attaching makes calling functions easy but risks conflicts with function names from other packages. Using :: is explicit and safer.

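OBS! Never use library() inside your package! Attaching packages from package code changes the user's search path and can lead to unexpected behavior. Declare the package as a dependency and call its functions with :: instead.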
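A quick way to see the difference between loading and attaching is to compare loadedNamespaces() (namespaces in memory) with search() (the attached search path). This sketch uses the base-R tools package only because it ships with R but is not attached by default in a fresh session; any installed, not-yet-attached package behaves the same way.

```r
# Calling a function with :: loads the package namespace...
tools::file_ext("data.csv")
"tools" %in% loadedNamespaces()  # TRUE: the namespace is loaded
"package:tools" %in% search()    # FALSE: but it is not on the search path

# ...whereas library() also attaches it
library("tools")
"package:tools" %in% search()    # TRUE: functions can now be called without tools::
file_ext("data.csv")

detach("package:tools")          # undo the attachment for this demo
```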

Building an R package

R package benefits

  • Reusable Code: Avoid rewriting the same code for different projects.

  • Standardized Work: Organize your analysis and code neatly.

  • Easy Documentation: Maintain detailed documentation for every function and dataset.

  • Sharing & Collaboration: Share your tools, analysis, and workflows seamlessly with peers.

R package structure
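One way to look at the standard layout is to let base R generate it. The sketch below uses utils::package.skeleton() to create a package called mypkg (a hypothetical name) in a temporary folder, producing a DESCRIPTION file, a NAMESPACE file, an R/ folder for code, and a man/ folder for help pages. In practice, the devtools/usethis workflow later in this lab is the more common way to start a package.

```r
# A small function to put into the package
fun_name <- function(param1, param2 = 2) {
  paste(param1, param2)
}

# Generate a skeleton package "mypkg" (hypothetical name) in a temporary folder
pkg_root <- tempdir()
utils::package.skeleton(name = "mypkg", list = "fun_name",
                        environment = environment(),
                        path = pkg_root, force = TRUE)

# List the generated files: DESCRIPTION, NAMESPACE, R/, man/, ...
list.files(file.path(pkg_root, "mypkg"), recursive = TRUE)
```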



Building an R package

At its core, an R package is a collection of functions and/or data.

Introduction to Functions

  • Functions are reusable blocks of code designed to perform a specific task.

  • They accept parameter inputs (arguments) and, after processing, return an output.

  • Properly defined functions enhance code clarity, facilitate debugging, and foster modularity.

fun_name <- function(param1, param2 = 2){
  # Do stuff
  output <- paste(param1, param2)
  # Return stuff
  return(output)
}

Using functions in a package

  • Naming arguments explicitly improves clarity:
# Good practice
fun_name(param1 = "something",
         param2 = 2)
[1] "something 2"
  • Using default arguments:
# Often fine practice
# Here param1 = "something_else" and param2 = 2
fun_name("something_else")
[1] "something_else 2"
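Because R matches named arguments first, naming them also lets you pass them in any order, while unnamed arguments are matched by position. A small sketch with the same fun_name() as above, redefined so the block runs on its own:

```r
fun_name <- function(param1, param2 = 2) {
  output <- paste(param1, param2)
  return(output)
}

# Named arguments can be given in any order
fun_name(param2 = 3, param1 = "something")
# [1] "something 3"

# Positional arguments are matched left to right: param1 = "a", param2 = "b"
fun_name("a", "b")
# [1] "a b"
```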

Caution with function names

Avoid overwriting other function names

mean(1:5)
[1] 3
mean <- function(vector){
  result <- sum(vector)
  return(result)
}
mean(1:5)
[1] 15

To resolve naming conflicts, utilize namespaces.

base::mean(1:5)
[1] 3
# Use namespaces with package::function()
# Note that "base" is an R package
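To check which copy of a name R will pick, find() lists the attached environments that contain that name, in search-path order. Removing the global copy with rm() makes the base version visible again. A minimal sketch, run at the top level of a session:

```r
# Mask base::mean with a (misleading) global definition
mean <- function(vector) {
  sum(vector)
}

find("mean")            # environments defining "mean", in search order (.GlobalEnv first)
masked <- mean(1:5)     # 15: the global copy is found first
safe <- base::mean(1:5) # 3: the namespaced call is unaffected

rm(mean)                # delete the global copy
restored <- mean(1:5)   # 3: R now falls back to base::mean
```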

Namespace

Namespace: An Introduction

  • Definition:
    • A namespace in R defines a scoped environment where each package’s functions, data, and other objects reside.
  • Purpose:
    • Avoid Clashes: Ensures that functions or objects from one package won’t accidentally reference or override those from another package.
    • Isolation: Each package’s contributions are isolated, ensuring they work as intended.
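Isolation can be seen directly: a function inside a package looks up other names in its own namespace first, so a global object with the same name does not break it. This sketch masks var() in the global environment. stats::sd(), which in current R versions computes its result via var(), still returns the correct standard deviation.

```r
# A global function that shadows stats::var at the console
var <- function(...) "not a variance"
console_var <- var(c(1, 2, 3))   # "not a variance": the global copy is found first

# sd() lives in the stats namespace, so its internal call to var() finds stats::var
pkg_sd <- stats::sd(c(1, 2, 3))  # 1

# The namespace a function belongs to can be inspected directly
environmentName(environment(stats::sd))  # "stats"

rm(var)                          # clean up the global copy
```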

Seeing Namespace in action

Using library() lets R know which package’s tools you intend to use.

However, if multiple packages have tools with the same name, the most recently attached package takes precedence.

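# Scenario 1 (fresh R session): MASS is attached last
library("dplyr")
library("MASS")
environmentName(environment(select)) # "MASS": select() is MASS::select()

# Scenario 2 (fresh R session): dplyr is attached last
# (library() does not reorder a package that is already attached)
library("MASS")
library("dplyr")
environmentName(environment(select)) # "dplyr": select() is dplyr::select()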

To prevent such overlaps, explicitly call functions using their namespaces:

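dplyr::select(iris, Species)  # dplyr: keep selected columns of a data frame
MASS::select(MASS::lm.ridge(mpg ~ wt + hp, data = mtcars,
                            lambda = seq(0, 10, by = 0.1)))  # MASS: pick a ridge-regression lambda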

The Namespace Search Path

See how R’s environment changes when packages are attached.

# Initial search path
search()
[1] ".GlobalEnv"        "package:stats"     "package:graphics" 
[4] "package:grDevices" "package:utils"     "package:datasets" 
[7] "package:methods"   "Autoloads"         "package:base"     
# Attach the 'MASS' package
library("MASS")

# Attach the 'dplyr' package
library("dplyr")

# Search path after attaching packages
search()
 [1] ".GlobalEnv"        "package:dplyr"     "package:MASS"     
 [4] "package:stats"     "package:graphics"  "package:grDevices"
 [7] "package:utils"     "package:datasets"  "package:methods"  
[10] "Autoloads"         "package:base"     

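Observation: Attaching a package with library() inserts it into the search path directly after .GlobalEnv, so R looks there before it looks in packages attached earlier. This is why the most recently attached package wins a name clash.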

So why are namespaces important?

  1. Avoids Conflicts: Multiple packages might have functions with the same name. Namespaces ensure there’s no confusion.

  2. Explicit Code: Clearly indicates the origin of functions, enhancing readability and clarity.

  3. Ensures Stability: Your code behaves as expected, even if you load multiple packages.

Specifying Namespace in your package

Roxygen skeleton

#' Title
#'
#' @param param1 
#' @param param2 
#'
#' @return
#' @export
#'
#' @examples
fun_name <- function(param1, param2 = 2){
  # Do stuff
  output <- stringr::str_c(param1, param2, sep = " ")
  # Return stuff
  return(output)
}

Specifying Namespace in your package

Roxygen skeleton

#' Title
#'
#' @param param1 
#' @param param2 
#' @importFrom stringr str_c
#'
#' @return A character string.
#' @export
fun_name <- function(param1, param2 = 2){
  # Do stuff
  output <- stringr::str_c(param1, param2, sep = " ")
  # Return stuff
  return(output)
}

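After you run devtools::document(), roxygen2 writes importFrom(stringr,str_c) to the NAMESPACE file, so your package imports str_c from stringr. stringr must also be listed under Imports in the DESCRIPTION file (usethis::use_package("stringr") adds it), otherwise R CMD check will complain.

Including @importFrom stringr str_c in the roxygen block lets you call str_c() inside your package without the stringr:: prefix. Keeping stringr:: anyway makes it explicit where each function comes from.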

Now what exactly is a dependency?

Dependencies

Dependencies: Why They Matter

  • A Dependency is a package that another package relies on. It ensures that all functions and features run as expected.

  • They help maintain the integrity of a package when sharing or collaborating.

    • They are installed with your package.
  • Do not build what is already built!

    • Unless…

Dependencies: A word of caution

Every dependency is installed along with your package, together with that dependency's own dependencies. This can lead to bloated installations.

Dependency network - Tidyverse
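To get a feel for how dependencies pile up, tools::package_dependencies() can list a package's dependencies recursively. This sketch looks them up among the packages installed on your machine. It uses stats only because stats is always installed; swap in, for example, "dplyr" (if installed) to see a much longer list.

```r
# Metadata for every package installed on this machine
db <- utils::installed.packages()

pkg <- "stats"  # try "dplyr" or "ggplot2" if you have them installed
deps <- tools::package_dependencies(pkg, db = db,
                                    which = c("Depends", "Imports", "LinkingTo"),
                                    recursive = TRUE)

deps[[pkg]]          # every package that would be pulled in alongside pkg
length(deps[[pkg]])  # how many there are
```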

Repositories

Repositories: A Brief Overview

  • Repositories are storage locations for packages.

  • The two main repositories for R packages are CRAN (Comprehensive R Archive Network) and Bioconductor.

  • Many developers also use GitHub as a platform to host and share their development versions of packages.

Repositories: Installing packages

install.packages("devtools")          # CRAN: The Comprehensive R Archive Network

devtools::install_bioc("pairedGSEA")  # Bioconductor (the recommended route is BiocManager::install("pairedGSEA"))

devtools::install_github("biosurf/cyCombine") # GitHub: takes "owner/repo"

devtools::install_cran("dplyr")       # CRAN again

# Side note: devtools uses the "remotes" package, i.e., remotes::install_<repo> does the same
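A common pattern, shown here as a small hypothetical helper, is to install a package only when it is missing. requireNamespace() returns FALSE instead of throwing an error when a package is not installed, which makes it a convenient check.

```r
# Hypothetical helper: install a CRAN package only if it is not installed yet
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # TRUE if the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

install_if_missing("stats")  # ships with R, so nothing is downloaded

# The same idea with the recommended Bioconductor installer:
# install_if_missing("BiocManager")
# BiocManager::install("pairedGSEA")
```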


What if you want to include non-R packages/code?

Integrating Python and C++ in Your R Package

In R, you can integrate other programming languages to take advantage of their specific capabilities and packages.


Python in R

library("reticulate")

py_run_string("import numpy as np")

py_run_string("result = np.mean([1, 2, 3, 4, 5])")

py_run_string("print('Mean:', result)")
Mean: 3.0
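Values created on the Python side can be pulled back into R through reticulate's py object, which converts them to R types. A sketch, assuming reticulate and NumPy are installed:

```r
library("reticulate")

py_run_string("import numpy as np")
py_run_string("result = np.mean([1, 2, 3, 4, 5])")

# Access the Python variable 'result' from R; it arrives as an R numeric
r_result <- py$result
r_result * 2
```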

C++ in R

library("Rcpp")
cppFunction('
  int sumC(int a, int b) {
    return a + b;
  }
')
sumC(5, 6)
[1] 11
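Rcpp also maps R vectors to C++ types such as Rcpp::NumericVector, which is where compiled code tends to pay off (loops over many elements). The function name sumVecC is made up for illustration. Inside a package, the C++ code would live in src/ rather than in a cppFunction() string (usethis::use_rcpp() sets that up).

```r
library("Rcpp")

# Sum a numeric vector with an explicit C++ loop
cppFunction('
  double sumVecC(Rcpp::NumericVector x) {
    double total = 0;
    for (R_xlen_t i = 0; i < x.size(); ++i) {
      total += x[i];
    }
    return total;
  }
')

sumVecC(c(1, 2, 3.5))
# [1] 6.5
```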

Building an R package as easy as 1-2-3

Standing on the shoulders of giants

Building packages with

  • devtools
  • usethis
  • roxygen2
  • testthat

The 1-2-3 of R packages

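# Hypothetical names used below: mypackage, fun_name, my_data
# Create the package (names may contain only letters, numbers, and dots, starting with a letter)
devtools::create("mypackage")
# Create a function script: R/fun_name.R
usethis::use_r("fun_name")
# Declare a dependency (added to Imports in DESCRIPTION)
usethis::use_package("stringr")
# Include data in your package
usethis::use_data(my_data) # set internal = TRUE if data should be internal
usethis::use_data_raw("my_data", open = TRUE) # document how the data were created/cleaned
# Create a test file for your function: tests/testthat/test-fun_name.R
usethis::use_test("fun_name")
# Generate documentation (man/ pages and NAMESPACE) from roxygen comments
devtools::document()
# Simulate library("mypackage") without installing
devtools::load_all()
# Run your tests
devtools::test()
# Run R CMD check: builds the package and checks code, documentation, and tests
devtools::check()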
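usethis::use_test("fun_name") creates tests/testthat/test-fun_name.R. Below is a sketch of what such a file might contain, using testthat's test_that() and expect_*() functions. So that the block runs on its own, it attaches testthat and defines fun_name() inline. In a real package you would leave both out, because devtools::test() loads your package code and testthat for you.

```r
library("testthat")

# Stand-in for the package function (normally defined in R/fun_name.R)
fun_name <- function(param1, param2 = 2) {
  paste(param1, param2)
}

test_that("fun_name() pastes its arguments with a space", {
  expect_equal(fun_name("something"), "something 2")
  expect_equal(fun_name("a", param2 = 3), "a 3")
  expect_type(fun_name("a"), "character")
})
```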

Exercises

Build your own R package

The central dogma of molecular biology



Break, then exercises!