Abstract
Although both copy number variations (CNVs) and single nucleotide variations (SNVs) detected by scRNA-seq are used to study intratumor heterogeneity and detect clonal groups, a software to integrate two types of data in the same cells is unavailable. We have developed CAISC, an R package for scRNA-seq data analysis that clusters single cells into distinct subclones through integrating CNV and SNV genotype matrices with an entropy weighted approach. The CAISC package presents a powerful application for integration of CNV and SNV data from scRNA-seq to identify clonal clusters, with elevated accuracy compared to using one type of data. It also allows users to interactively examine clonal assignments.Install all packages in the latest version of R. The package works for R >=4.1.
If you have any questions or problems, please feel free to open a new issue here. You can also email the maintainers of the package – the contact information is below.
Shouguo Gao (shouguo dot gao at nih dot gov)
NHLBI Hematopoiesis and Bone Marrow Failure Lab, Bethesda, MD
Jeerthi Kannan (jeerthi 98 at gmail dot com)
NHLBI Hematopoiesis and Bone Marrow Failure Lab, Bethesda, MD
Liza Mathews (liza mathews 5 at gmail dot com)
NHLBI Hematopoiesis and Bone Marrow Failure Lab, Bethesda, MD
Figure 1. A flowchart outlining the procedures for CAISC.
Create a matrix of SNV mutations from a VCF file of mutations that appear in a minimal number of cells.
Then, create a matrix of CNV mutations from a raw counts matrix containing assigned read counts, a sample annotation file, and a gene ordering file. Run only on a HPC cluster or a computer with at least 8 CPUs per task, 10GB of RAM, and 10GB of local scratch space
Next, create an integrated matrix with the created SNV and CNV matrices as input.
#load snv and cnv matrices
snv_matrix <- as.matrix(read.csv("snvDistanceMatrix.csv", row.names = 1, header = TRUE))
cnv_matrix <- as.matrix(read.csv("CNVdistMatrix.zscore.csv", row.names = 1, header = TRUE))
# create and save integrated matrix
integrated_matrix <- createIntegratedMatrix(cnv_matrix, snv_matrix)
write.csv(integrated_matrix, file="integratedMatrix.csv", quote = FALSE)
Create an interactive complex heatmap from the SNV, CNV, integrated matrices, infercnv object, and the same gene ordering file used to create the CNV matrix.
# create interactive complex heatmap
heatmap <- createComplexHeatmap(snv_matrix, cnv_matrix, integrated_matrix, "GSE57872.RData", genePath)
Use Shiny web app to interactively examine the complex heatmap.
The following is a static heatmap, but the interactive heatmap can be opened in the Shiny web app.
Create tanglegram of SNV and CNV clustering results, and calculate cophenetic correlation
# find cells in both snv and cnv matrices
snv_matrix<-(snv_matrix-mean(snv_matrix))/sd(snv_matrix)
overlapCells<-intersect(row.names(snv_matrix), rownames(cnv_matrix))
# find clusters in snv matrix
clust1=hclust(as.dist(snv_matrix[overlapCells,overlapCells]),method='ward.D')
dend1=as.dendrogram(clust1)
# find clusters in cnv matrix
clust2=hclust(as.dist(cnv_matrix[overlapCells,overlapCells]),method='ward.D')
dend2=as.dendrogram(clust2)
# create tanglegram and find cophenetic correlation
tanglegram(dend1, dend2, margin_inner = 10)
sessionInfo()
#> R Under development (unstable) (2021-02-27 r80043)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dendextend_1.14.0 factoextra_1.0.7
#> [3] cluster_2.1.1 forcats_0.5.1
#> [5] stringr_1.4.0 dplyr_1.0.5
#> [7] purrr_0.3.4 readr_1.4.0
#> [9] tidyr_1.1.3 tibble_3.0.6
#> [11] ggplot2_3.3.3 tidyverse_1.3.0
#> [13] InteractiveComplexHeatmap_0.99.12 leidenAlg_0.1.1
#> [15] Matrix_1.3-2 leiden_0.3.7
#> [17] igraph_1.2.6 CAISC_1.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] colorspace_2.0-0 grr_0.9.5 rjson_0.2.20
#> [4] ellipsis_0.3.1 circlize_0.4.12 GlobalOptions_0.1.2
#> [7] fs_1.5.0 clue_0.3-58 rstudioapi_0.13
#> [10] ggrepel_0.9.1 fansi_0.4.2 lubridate_1.7.10
#> [13] xml2_1.3.2 codetools_0.2-18 doParallel_1.0.16
#> [16] knitr_1.31 jsonlite_1.7.2 Cairo_1.5-12.1
#> [19] broom_0.7.5 dbplyr_2.1.0 png_0.1-7
#> [22] shiny_1.6.0 compiler_4.1.0 httr_1.4.2
#> [25] backports_1.2.1 assertthat_0.2.1 fastmap_1.1.0
#> [28] cli_2.3.1 later_1.1.0.1 htmltools_0.5.1.1
#> [31] tools_4.1.0 gtable_0.3.0 glue_1.4.2
#> [34] rappdirs_0.3.3 Rcpp_1.0.6 cellranger_1.1.0
#> [37] jquerylib_0.1.3 vctrs_0.3.6 svglite_1.2.3.1
#> [40] iterators_1.0.13 sccore_0.1.2 xfun_0.21
#> [43] rvest_1.0.0 mime_0.10 lifecycle_1.0.0
#> [46] gtools_3.8.2 scales_1.1.1 clisymbols_1.2.0
#> [49] hms_1.0.0 promises_1.2.0.1 parallel_4.1.0
#> [52] RColorBrewer_1.1-2 ComplexHeatmap_2.7.8 yaml_2.2.1
#> [55] reticulate_1.18 gridExtra_2.3 Matrix.utils_0.9.8
#> [58] gdtools_0.2.3 sass_0.3.1 stringi_1.5.3
#> [61] highr_0.8 S4Vectors_0.29.7 foreach_1.5.1
#> [64] BiocGenerics_0.37.1 shape_1.4.5 rlang_0.4.10
#> [67] pkgconfig_2.0.3 systemfonts_1.0.1 matrixStats_0.58.0
#> [70] evaluate_0.14 lattice_0.20-41 tidyselect_1.1.0
#> [73] magrittr_2.0.1 R6_2.5.0 IRanges_2.25.6
#> [76] generics_0.1.0 DBI_1.1.1 pillar_1.5.1
#> [79] haven_2.3.1 withr_2.4.1 modelr_0.1.8
#> [82] crayon_1.4.1 utf8_1.2.1 rmarkdown_2.7
#> [85] viridis_0.5.1 GetoptLong_1.0.5 grid_4.1.0
#> [88] readxl_1.3.1 reprex_2.0.0 digest_0.6.27
#> [91] webshot_0.5.2 xtable_1.8-4 httpuv_1.5.5
#> [94] stats4_4.1.0 munsell_0.5.0 viridisLite_0.3.0
#> [97] kableExtra_1.3.4 bslib_0.2.4