1. Installation

1.1 Install cTP-net

1.1.1 Support Python package

First, install the supporting Python package, ctpnetpy (its source code is published separately). It is installed from PyPI under the name cTPnet:

pip install cTPnet

If you run into problems with PyTorch, refer to the PyTorch website for installation details.

1.1.2 R package

Next, open R and install the R package cTPnet:

devtools::install_github("zhouzilu/cTPnet")
# If you do not want to update dependencies, use the following command instead
devtools::install_github("zhouzilu/cTPnet", dependencies = FALSE)

1.1.3 Pretrained model

Download the pretrained model weights. The path to the downloaded file is passed to cTPnet later as model_file_path.

2. Questions & issues

If you have any questions or problems when using cTPnet or ctpnetpy, please feel free to open a new issue on the corresponding project's issue tracker. You can also email the maintainers of the corresponding packages.

3. cTP-net analysis pipeline

To accurately impute surface protein abundance from scRNA-seq data, cTP-net employs two steps: (1) denoising of the scRNA-seq count matrix and (2) imputation based on the denoised data through a transcriptome-protein mapping (Figure 1). The initial denoising, by SAVER-X, produces more accurate estimates of the relative abundance of RNA transcripts in each cell. Compared to the raw counts, the denoised relative expression values correlate significantly better with their cognate protein measurements.

Figure 1. (a) Overview of cTP-net analysis pipeline, which learns a mapping from the denoised scRNA-seq data to the relative abundance of surface proteins, capturing multi-gene features that reflect the cellular environment and related processes. (b) For three example proteins, cross-cell scatter and correlation of CITE-seq measured abundances vs. (1) raw RNA count, (2) SAVER-X denoised RNA level, and (3) cTP-net predicted protein abundance.
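
To make the correlation comparison in panel (b) concrete, the following is a small simulated illustration of why sampling noise in raw counts weakens the cross-cell correlation between RNA and protein. It is only a sketch: the data are simulated (not the CITE-seq data), the variable names and distributions are assumptions chosen for illustration, and it does not run SAVER-X or cTP-net.

```r
set.seed(1)  # make the toy simulation reproducible
n_cells <- 200

# Simulated "true" relative expression of one gene across cells (toy data).
true_expr <- rgamma(n_cells, shape = 2, rate = 0.5)

# Simulated protein abundance that tracks the true expression plus noise.
protein <- true_expr + rnorm(n_cells, sd = 1)

# Observed raw counts: Poisson sampling noise around the true expression.
raw_counts <- rpois(n_cells, lambda = true_expr)

# Cross-cell Pearson correlation with the protein measurement.
cor_raw  <- cor(raw_counts, protein)
cor_true <- cor(true_expr, protein)

cat("raw counts vs protein:     ", round(cor_raw, 2), "\n")
cat("noise-free RNA vs protein: ", round(cor_true, 2), "\n")
```

In this toy setting the noise-free expression tracks the protein more closely than the counts do; a denoiser aims to recover an estimate closer to the former.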

3.1 Raw counts denoising with SAVER-X

Please refer to the SAVER-X package for detailed instructions. For this vignette, we load a demo data set (17,009 genes × 2,000 cells) of bone marrow mononuclear cells that has already been denoised with SAVER-X.

# Set python path and virtual environment using reticulate
# The above line has to be called right after loading the reticulate library!

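
A minimal sketch of the kind of reticulate setup these comments describe. The environment name, the weights path, and the data_type value are all assumptions made for illustration; substitute your own values and check the cTPnet documentation for the accepted data_type options.

```r
# Hypothetical values: replace with your own environment name and weights path.
venv_name       <- "ctpnet-env"                  # assumed virtualenv holding the Python package
model_file_path <- "path/to/pretrained_weights"  # assumed location of the downloaded model
data_type       <- "Seurat2"                     # assumed value for a Seurat v2 object

if (requireNamespace("reticulate", quietly = TRUE)) {
  library(reticulate)
  # Bind reticulate to the environment before any Python code runs;
  # once Python has been initialized, the choice cannot be changed.
  if (virtualenv_exists(venv_name)) {
    use_virtualenv(venv_name, required = TRUE)
  } else {
    message("Virtual environment not found: ", venv_name)
  }
}
```

Selecting the environment first matters because reticulate binds to a single Python installation per R session.
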
3.2 Immunophenotype (surface protein) imputation

3.2.1 Seurat v2 pipeline

Let's create a Seurat object, demo, and generate the predictions.

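# demo_data is the SAVER-X denoised expression matrix loaded above
demo = CreateSeuratObject(demo_data)
# data_type tells cTPnet what kind of input it receives (here a Seurat v2 object);
# model_file_path points to the downloaded pretrained model
demo = cTPnet(demo,data_type,model_file_path)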
#> Start data preprocessing...
#> Start imputation. Running python ...
#> Postprocess...
#> Done!

3.2.2 Downstream analysis (modified from Seurat v2.4)

# standard log-normalization
demo <- NormalizeData(demo,display.progress = FALSE)
# choose ~1k variable features
demo <- FindVariableGenes(demo, do.plot = FALSE, y.cutoff = 0.5)

# standard scaling (no regression)
demo <- ScaleData(demo, display.progress = FALSE)

# Run PCA, then use the first 20 PCs for graph-based clustering and tSNE visualization
demo <- RunPCA(demo, pcs.print = 0)

demo <- FindClusters(demo, dims.use = 1:20, print.output = FALSE, k.param = 20, resolution = 0.8)
demo <- RunTSNE(demo, dims.use = 1:20)
TSNEPlot(demo, do.label = TRUE, pt.size = 0.5)

With the surface protein levels predicted by cTP-net, it is much easier to determine the cell types.

# in this plot, the first row shows predicted protein levels and the second row
# the RNA levels of the corresponding genes; the remaining rows show further
# predicted proteins
FeaturePlot(demo, features.plot = c(
  "ctpnet_CD34", "ctpnet_CD4", "ctpnet_CD8",
  "CD34", "CD4", "CD8A",
  "ctpnet_CD16", "ctpnet_CD11c", "ctpnet_CD19",
  "ctpnet_CD45RA", "ctpnet_CD45RO", "ctpnet_CD27"
  ), min.cutoff = "q25", max.cutoff = "q95",
  nCol = 3, cols.use = c("lightgrey", "blue"), pt.size = 0.5)
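
The "q25" and "q95" cutoffs clip each feature's values at its 25th and 95th percentiles before the colors are assigned, so a few extreme cells do not dominate the color scale. A minimal base R sketch of that clipping on a toy vector (the values are made up for illustration, and this is not Seurat's internal code):

```r
# Toy expression values for one feature across ten cells (made-up numbers).
x <- c(0, 0, 1, 2, 3, 5, 8, 13, 21, 100)

# Quantile-based cutoffs, analogous to min.cutoff = "q25" and max.cutoff = "q95".
lo <- unname(quantile(x, 0.25))
hi <- unname(quantile(x, 0.95))

# Values below lo are raised to lo; values above hi are lowered to hi.
clipped <- pmin(pmax(x, lo), hi)

print(rbind(original = x, clipped = clipped))
```

After clipping, the outlying value of 100 no longer stretches the color gradient.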

The cell type information can be easily determined by canonical immunophenotypes (i.e. surface protein markers).

current.cluster.ids <- 0:13
# CD4 and CD8 are markers for CD4 T cells and CD8 T cells
# CD45RA and CD45RO are markers for naive T cells and differentiated T cells
# CD19 is a marker for B cells
# CD27 is a marker for memory B cells
# CD16 is a marker for NK cells
# CD34 is a marker for developing precursor cells
# CD11c is a marker for traditional monocytes
new.cluster.ids <- c("Mono","naive CD4 T", "naive CD8 T", "CD4 T", "CD8 T", "Mono", "Pre.", "B", "NK", "Pre.", "memory B", "CD16+ Mono", "Unknown", "Unknown")
demo@ident <- plyr::mapvalues(x = demo@ident, from = current.cluster.ids, to = new.cluster.ids)
TSNEPlot(demo, do.label = TRUE, pt.size = 0.5)
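
The relabeling step above maps each numeric cluster ID to a cell type name, with several clusters allowed to share one label. A minimal base R sketch of the same idea on a toy factor, using a named lookup vector in place of plyr::mapvalues (the six cluster assignments are hypothetical; the ID-to-label pairs follow the mapping above):

```r
# Toy cluster assignments for six cells (hypothetical, for illustration only).
cluster_ids <- factor(c(0, 1, 1, 5, 2, 0))

# Lookup table from cluster ID to cell type; IDs 0 and 5 share the label "Mono".
lookup <- c("0" = "Mono", "1" = "naive CD4 T", "2" = "naive CD8 T", "5" = "Mono")

# Index the lookup by the ID (as a character string), then rebuild the factor.
cell_types <- factor(unname(lookup[as.character(cluster_ids)]))

print(table(cell_types))
```

Because two IDs map to "Mono", the four original clusters collapse into three cell type levels.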

RidgePlot(demo, features.plot = c("ctpnet_CD3", "ctpnet_CD11c", "ctpnet_CD8", "ctpnet_CD16"),
    nCol = 2)

4. Session info

5. References

Surface protein imputation from single cell transcriptomes by deep neural networks

Zilu Zhou, Chengzhong Ye, Jingshu Wang, Nancy R. Zhang

bioRxiv 671180; doi: https://doi.org/10.1101/671180