cpam
: an introductory
exampleThis tutorial demonstrates the R package cpam
for the
analysis of time series omics data. It serves as a basic introduction to
the package. There are also two detailed case studies using real world
data:
These case studies and several simulation studies are presented in the accompanying manuscript by Yates et al. (2024).
The data for the following examples have been simulated based on
empirical RNA-seq data. The code to reproduce the data is available in
this repository R/simulate_counts_example.R
.
The experimental design for this example loads with the
cpam
package.
The design comprises 30 case-only samples from 6 time points with 5 replicates per time point.
The count data for this example are provided as a matrix. Let’s take a look at the first few rows.
# convert to tibble (for display only)
as.data.frame(count_matrix_example) %>% head %>% tibble::rownames_to_column("target_id")
cpam
To fit the models, we first prepare the cpam
object,
then compute p-values, estimate changepoints, and select the shape for
each gene In this simple example, the simulated data are gene-level,
that is we do not have isoform-level counts. As such, we leave NULL the
transcript-to-gene mapping (t2g
) and we set
gene_level = T
.
cpo <- prepare_cpam(exp_design = exp_design_example,
count_matrix = count_matrix_example,
model_type = "case-only",
t2g = NULL,
gene_level = T,
num_cores = 4)
cpo <- compute_p_values(cpo) # 6 seconds
cpo <- estimate_changepoint(cpo) # 4 seconds
cpo <- select_shape(cpo) # 5 seconds
We can look at a summary of the fitted cpam object
cpo
## cpam object
## -----------
## case-only time series
## 30 samples
## 6 time points
## Counts aggregated for gene-level inference
Although it cannot be launched here within the online tutorial, you
can launch the Shiny app on your own computer to visualise the results
interactively using visualise(cpo)
.
The results of the analysis are summarised using the
results
function.
The generated results can be filtered by specifying minimum counts, minimum log-fold changes, and maximum p-values. For example, to return only the transcripts with a log-fold change greater than 2, at least 50 counts, and a p-value less than 0.01, we can run
A single gene can be plotted using the plot_cpam
function. Here we plot the gene AT1G64140
The subtitle shows
(1,tp)
indicating a changepoint at first
time point (i.e., no changepoint) and a convex (‘cv’) shaped trend.
Let’s look for a gene that has a more complex trend. Unconstrained
shapes in cpam
are denoted by ‘tp’ (for thinplate spline).
We can filter the results for genes with unconstrained shapes and plot
one of them.
We plot the first gene in the list.
This selection of ‘tp’ suggests that the trend for this gene does not
conform to one of the simpler shape types that
cpam
uses.
We can exclude ‘tp’ as an option and force cpam
to choose
among the simpler forms by setting shape_type = "shape2"
in
the plot_cpam
function ("shape1"
, the default,
allows the ‘tp’). For example:
Here a monotonic increasing concave shape (‘micv’) is chosen, and we can
see this trend deviates from the data substantially more that the
unconstrained shape. See the manuscript for more details on the shape
types available in
cpam
.
Next we look for a gene with a changepoint.
Again, we plot the first gene in the list.
The results function can be used to generate clusters according to
selected filters. The plot_cluster
function can then be
used to visualise the clusters. With such a small simulated data set, we
don’t have many genes in each cluster, but we can try a few different
clustering options to get an idea of how the function works.
res <- results(cpo)
#plot_cluster(cpo, res, changepoints = 2, shapes = c("micv","cv","ilin","cx","mdcx","dlin"))
plot_cluster(cpo, res, changepoints = 1, shapes = c("cv"))
There are 19 genes with a concave shape and a changepoint at the first
time point.
More than one shape or changepoint can be provided. For example:
There are just four genes with decreasing linear or monotonic decreasing
convex shapes which have a changepoint at the second time point.
Clustering can be further refined based on, for example, the rate at which the above transcripts attain their maximum values. We illustrate advanced refinements such as this case study.
This is just a simple example to get you started. The package has
many more features and options. Check out the two case studies to see
how cpam
can be used to analyse real-world data:
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Pop!_OS 22.04 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-openmp/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Australia/Hobart
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.5.1 stringr_1.5.1 tidyr_1.3.1 readr_2.1.5 dplyr_1.1.4 cpam_0.0.0.9000
##
## loaded via a namespace (and not attached):
## [1] SummarizedExperiment_1.36.0 gtable_0.3.6 xfun_0.50 bslib_0.9.0
## [5] Biobase_2.66.0 lattice_0.22-6 tzdb_0.4.0 vctrs_0.6.5
## [9] tools_4.4.1 generics_0.1.3 stats4_4.4.1 parallel_4.4.1
## [13] pbmcapply_1.5.1 tibble_3.2.1 pkgconfig_2.0.3 Matrix_1.7-2
## [17] RColorBrewer_1.1-3 S4Vectors_0.44.0 scam_1.2-18 lifecycle_1.0.4
## [21] GenomeInfoDbData_1.2.13 farver_2.1.2 compiler_4.4.1 statmod_1.5.0
## [25] munsell_0.5.1 DESeq2_1.46.0 codetools_0.2-20 GenomeInfoDb_1.42.3
## [29] htmltools_0.5.8.1 sass_0.4.9 yaml_2.3.10 pillar_1.10.1
## [33] crayon_1.5.3 jquerylib_0.1.4 BiocParallel_1.40.0 limma_3.62.2
## [37] DelayedArray_0.32.0 cachem_1.1.0 abind_1.4-8 nlme_3.1-167
## [41] aggregation_1.0.1 locfit_1.5-9.11 tidyselect_1.2.1 digest_0.6.37
## [45] stringi_1.8.4 purrr_1.0.4 labeling_0.4.3 splines_4.4.1
## [49] fastmap_1.2.0 grid_4.4.1 SparseArray_1.6.1 colorspace_2.1-1
## [53] cli_3.6.4 magrittr_2.0.3 S4Arrays_1.6.0 edgeR_4.4.2
## [57] withr_3.0.2 scales_1.3.0 UCSC.utils_1.2.0 rmarkdown_2.29
## [61] XVector_0.46.0 httr_1.4.7 matrixStats_1.5.0 hms_1.1.3
## [65] evaluate_1.0.3 knitr_1.49 GenomicRanges_1.58.0 IRanges_2.40.1
## [69] mgcv_1.9-1 rlang_1.1.5 Rcpp_1.0.14 glue_1.8.0
## [73] BiocGenerics_0.52.0 rstudioapi_0.17.1 jsonlite_1.8.9 R6_2.6.1
## [77] MatrixGenerics_1.18.1 zlibbioc_1.52.0