About

This tutorial demonstrates the R package cpam for the analysis of time series omics data. It serves as a basic introduction to the package. There are also two detailed case studies using real world data:

These case studies and several simulation studies are presented in the accompanying manuscript by Yates et al. (2024).

Data

The data for the following examples have been simulated based on empirical RNA-seq data. These data are gene-level counts from a case-only experiment with 6 time points and 5 replicates per time point. The code to reproduce the data is available in this repository R/simulate_counts_example.R.

Installation

You can install cpam from GitHub using:

devtools::install_github("l-a-yates/cpam")

Getting started

Load packages

library(cpam)  
library(dplyr)
library(readr)
library(tidyr)
library(stringr)
library(ggplot2)

Experimental design

The experimental design for this example loads is included in the cpam package. Simply type the folowing to see the design.

exp_design_example

Count matrix

The count data for this example are provided as a matrix. Let’s take a look at the first few rows.

# convert to tibble (for display only)
as.data.frame(count_matrix_example) %>% head %>% tibble::rownames_to_column("target_id")

Fitting `cpam`

To fit the models, we first prepare the cpam object, then compute p-values, estimate changepoints, and select the shape for each gene In this simple example, the simulated data are gene-level, that is we do not have isoform-level counts. As such, we leave NULL the transcript-to-gene mapping (t2g) and we set gene_level = T.

  cpo <- prepare_cpam(exp_design = exp_design_example,
                      count_matrix = count_matrix_example,
                      model_type = "case-only",
                      t2g = NULL,
                      gene_level = T,
                      num_cores = 4)
  cpo <- compute_p_values(cpo) # 6 seconds
  cpo <- estimate_changepoint(cpo) # 4 seconds
  cpo <- select_shape(cpo) # 5 seconds

We can look at a summary of the fitted cpam object

cpo
## cpam object
## -----------
## case-only time series
## 30 samples
## 6 time points
## Counts aggregated for gene-level inference

Although it cannot be launched here within the online tutorial, you can launch the Shiny app on your own computer to visualise the results interactively using visualise(cpo).

Result tables

The results of the analysis are summarised using the results function.

results(cpo)

The generated results can be filtered by specifying minimum counts, minimum log-fold changes, and maximum p-values. For example, to return only the transcripts with a log-fold change greater than 2, at least 10 counts, and a \(p\)-value less than 0.01, we can run

results(cpo, min_count = 10, min_lfc = 1, p_threshold = 0.01)

Plotting genes and transcripts

A single gene can be plotted using the plot_cpam function. Here we plot the gene g063

plot_cpam(cpo, gene_id = "g063")

The subtitle shows (1,cv) indicating a changepoint at first time point (i.e., the gene responds immediately) and a convex ('cv') shaped trend.

Let’s look for a gene that has a more complex trend. Unconstrained shapes in cpam are denoted by 'tp'. We can filter the results for genes with unconstrained shapes and plot one of them.

results(cpo) %>% 
  filter(shape == "tp")

We plot the first gene in the list.

plot_cpam(cpo, gene_id = "g210")

This selection of 'tp' suggests that the trend for this gene does not conform to one of the simpler shape types that cpam uses. We can exclude 'tp' as an option and force cpam to choose among the simpler forms by setting shape_type = "shape2" in the plot_cpam function ("shape1", the default, allows the 'tp'). This choice can be useful for analysis such as clustering. For example:

plot_cpam(cpo, gene_id = "g210",shape_type = "shape2")

Here a monotonic increasing concave shape (‘micv’) is chosen, and we can see this trend deviates from the data substantially more that the unconstrained shape. See the manuscript for more details on the shape types available in cpam.

Next we look for a gene with a changepoint by filtering for genes with changepoints at the third time point.

results(cpo) %>% 
  filter(cp == 3)

Again, we plot the first gene in the list.

plot_cpam(cpo, gene_id = "g013")

Clusters

The results function can be used to generate clusters according to selected filters. The plot_cluster function can then be used to visualise the clusters. With such a small simulated data set, we don’t have many genes in each cluster, but we can try a few different clustering options to get an idea of how the function works.

res <- results(cpo)
plot_cluster(cpo, res, changepoints = 1, shapes = c("cv"))

There are 19 genes with a concave shape and a changepoint at the first time point.

More than one shape or changepoint can be provided. For example:

plot_cluster(cpo, res, changepoints = 2, shapes = c("dlin","mdcx"))

There are just four genes with decreasing linear or monotonic decreasing convex shapes which have a changepoint at the second time point.

Clustering can be further refined based on, for example, the rate at which the above transcripts attain their maximum values. We illustrate advanced refinements such as this case study.

Many more options

This is just a simple example to get you started. The package has many more features and options. Check out the two case studies to see how cpam can be used to analyse real-world data:

Session Info

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Pop!_OS 22.04 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
##  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8    LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Australia/Hobart
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base     
## 
## other attached packages:
## [1] ggplot2_3.5.1   stringr_1.5.1   tidyr_1.3.1     readr_2.1.5     dplyr_1.1.4     cpam_0.0.0.9000
## 
## loaded via a namespace (and not attached):
##  [1] SummarizedExperiment_1.35.1 gtable_0.3.6                xfun_0.46                   bslib_0.9.0                
##  [5] lattice_0.22-6              Biobase_2.65.0              bspm_0.5.7                  tzdb_0.4.0                 
##  [9] vctrs_0.6.5                 tools_4.4.2                 generics_0.1.3              stats4_4.4.2               
## [13] parallel_4.4.2              pbmcapply_1.5.1             tibble_3.2.1                highr_0.11                 
## [17] pkgconfig_2.0.3             Matrix_1.7-1                RColorBrewer_1.1-3          S4Vectors_0.43.2           
## [21] scam_1.2-18                 lifecycle_1.0.4             GenomeInfoDbData_1.2.12     farver_2.1.2               
## [25] compiler_4.4.2              statmod_1.5.0               munsell_0.5.1               DESeq2_1.45.3              
## [29] codetools_0.2-20            GenomeInfoDb_1.41.1         htmltools_0.5.8.1           sass_0.4.9                 
## [33] yaml_2.3.10                 pillar_1.10.1               crayon_1.5.3                jquerylib_0.1.4            
## [37] BiocParallel_1.39.0         limma_3.61.8                DelayedArray_0.31.10        cachem_1.1.0               
## [41] abind_1.4-8                 nlme_3.1-166                aggregation_1.0.1           locfit_1.5-9.10            
## [45] tidyselect_1.2.1            digest_0.6.37               stringi_1.8.4               purrr_1.0.4                
## [49] labeling_0.4.3              splines_4.4.2               fastmap_1.2.0               grid_4.4.2                 
## [53] SparseArray_1.5.30          colorspace_2.1-1            cli_3.6.4                   magrittr_2.0.3             
## [57] S4Arrays_1.5.5              edgeR_4.3.7                 withr_3.0.2                 scales_1.3.0               
## [61] UCSC.utils_1.1.0            rmarkdown_2.27              XVector_0.45.0              httr_1.4.7                 
## [65] matrixStats_1.5.0           hms_1.1.3                   evaluate_0.24.0             knitr_1.48                 
## [69] GenomicRanges_1.57.1        IRanges_2.39.2              mgcv_1.9-1                  rlang_1.1.5                
## [73] Rcpp_1.0.14                 glue_1.8.0                  BiocGenerics_0.51.0         rstudioapi_0.17.0          
## [77] jsonlite_1.8.9              R6_2.6.1                    MatrixGenerics_1.17.0       zlibbioc_1.51.1

Yates, Luke A., Jazmine L. Humphreys, Michael A. Charleston, and Steven M. Smith. 2024. “Shape-Constrained, Changepoint Additive Models for Time Series Omics Data with cpam,” December. https://doi.org/10.1101/2024.12.22.630003.

`cpam`: an introductory example

Luke Yates

2025-02-24

About

Data

Installation

Getting started

Load packages

Experimental design

Count matrix

Fitting `cpam`

Result tables

Plotting genes and transcripts

Clusters

Many more options

Session Info

cpam: an introductory example

Luke Yates

2025-02-24

About

Data

Installation

Getting started

Load packages

Experimental design

Count matrix

Fitting cpam

Result tables

Plotting genes and transcripts

Clusters

Many more options

Session Info

`cpam`: an introductory example

Fitting `cpam`