library(spacexr)
library(Matrix)
library(doParallel)
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
library(ggplot2)
<- system.file("extdata",'SpatialRNA/Vignette',package = 'spacexr') # directory for sample Slide-seq dataset
datadir if(!dir.exists(datadir))
dir.create(datadir)
<- 'RCTD_results'
savedir if(!dir.exists(savedir))
dir.create(savedir)
Cell type-Specific Inference of Differential Expression, or CSIDE, is part of the spacexr R package for learning cell type-specific differential expression from spatial transcriptomics data. In this Vignette, we will use CSIDE to test for differential expression in a toy cerebellum Slide-seq dataset. First, we will first use RCTD to assign cell types to a cerebellum Slide-seq dataset. We will define cell type profiles using an annotated single nucleus RNA-sequencing (snRNA-seq) cerebellum dataset. We will test for differential expression across a random explanatory variable.
First, we run RCTD on the data to annotated cell types. Please note that this follows exactly the content of the spatial transcriptomics RCTD vignette (doublet mode). Please refer to the spatial transcriptomics vignette for more explanation on the RCTD algorithm.
### Load in/preprocess your data, this might vary based on your file type
if(!file.exists(file.path(savedir,'myRCTD.rds'))) {
<- read.csv(file.path(datadir,"MappedDGEForR.csv")) # load in counts matrix
counts <- read.csv(file.path(datadir,"BeadLocationsForR.csv"))
coords rownames(counts) <- counts[,1]; counts[,1] <- NULL # Move first column to rownames
rownames(coords) <- coords$barcodes; coords$barcodes <- NULL # Move barcodes to rownames
<- colSums(counts) # In this case, total counts per pixel is nUMI
nUMI <- SpatialRNA(coords, counts, nUMI)
puck <- colnames(puck@counts) # pixels to be used (a list of barcode names).
barcodes plot_puck_continuous(puck, barcodes, puck@nUMI, ylimit = c(0,round(quantile(puck@nUMI,0.9))),
title ='plot of nUMI')
<- system.file("extdata",'Reference/Vignette',package = 'spacexr') # directory for the reference
refdir <- read.csv(file.path(refdir,"dge.csv")) # load in counts matrix
counts rownames(counts) <- counts[,1]; counts[,1] <- NULL # Move first column to rownames
<- read.csv(file.path(refdir,"meta_data.csv")) # load in meta_data (barcodes, clusters, and nUMI)
meta_data <- meta_data$cluster; names(cell_types) <- meta_data$barcode # create cell_types named list
cell_types <- as.factor(cell_types) # convert to factor data type
cell_types <- meta_data$nUMI; names(nUMI) <- meta_data$barcode # create nUMI named list
nUMI <- Reference(counts, cell_types, nUMI)
reference <- create.RCTD(puck, reference, max_cores = 2)
myRCTD <- run.RCTD(myRCTD, doublet_mode = 'doublet')
myRCTD saveRDS(myRCTD,file.path(savedir,'myRCTD.rds'))
}
Now that we have successfully run RCTD, we can create a explanatory
variable (i.e. covariate) used for predicting differential expression in
CSIDE. In this case, the variable, explanatory.variable
,
will be randomly generated, but in general one should set the
explanatory variable to biologically relevant predictors of gene
expression such as spatial position. The explanatory variable itself is
a vector of values, constrained between 0 and 1, with names matching the
pixel names of the myRCTD
object.
Here, we also artifically upregulate the expression of the Lsamp gene (in regions of high explanatory variable) to see whether CSIDE can detect this differentially expressed gene.
### Create SpatialRNA object
<- readRDS(file.path(savedir,'myRCTD.rds'))
myRCTD set.seed(12345)
<- runif(length(myRCTD@spatialRNA@nUMI))
explanatory.variable names(explanatory.variable) <- names(myRCTD@spatialRNA@nUMI) # currently random explanatory variable
print(head(explanatory.variable))
#Differentially upregulate one gene
<- 'Lsamp'
change_gene <- names(explanatory.variable[explanatory.variable > 0.5])
high_barc <- names(explanatory.variable[explanatory.variable < 0.5])
low_barc @originalSpatialRNA@counts[change_gene, high_barc] <- myRCTD@spatialRNA@counts[change_gene, high_barc] * 3
myRCTD
plot_puck_continuous(myRCTD@spatialRNA, names(explanatory.variable), explanatory.variable, ylimit = c(0,1), title ='plot of explanatory variable')
After creating the explanatory variable, we are now ready to run
CSIDE using the run.CSIDE.single
function. We will use two
cores, and a false discovery rate of 0.25. Next, we will set a gene
threshold (i.e. minimum gene expression) of 0.01, and we will set a
cell_type_threshold (minimum instances per cell type) of 10.
Warning: On this toy dataset, we have made several
choices of parameters that are not recommended for regular use. On real
datasets, we recommend first consulting the CSIDE default parameters.
This includes gene_threshold
(default 5e-5),
cell_type_threshold
(default 125), and fdr
(default 0.01). Please see ?run.CSIDE.single
for more
information on these parameters.
#de
@config$max_cores <- 2
myRCTD<- run.CSIDE.single(myRCTD, explanatory.variable, gene_threshold = .01,
myRCTD cell_type_threshold = 10, fdr = 0.25)
#> Warning in run.CSIDE.general(myRCTD, X1, X2, barcodes, cell_types,
#> cell_type_threshold = cell_type_threshold, : run.CSIDE.general: some parameters
#> are set to the CSIDE vignette values, which are intended for testing but
#> not proper execution. For more accurate results, consider using the default
#> parameters to this function.
#> run.CSIDE.general: running CSIDE with cell types 10, 18
#> run.CSIDE.general: configure params_to_test = 2,
#> filter_genes: filtering genes based on threshold = 0.01
saveRDS(myRCTD,file.path(savedir,'myRCTDde.rds'))
After running CSIDE, it is time to examine CSIDE results. For each
cell type, a results dataframe for significant genes is stored in
myRCTD@de_results$sig_gene_list
. In particular, notice the
columns Z_score
(Z-score), log_fc
(estimated
DE loge-fold-change), and p_val
(p-value). We also have the
mean and standard errors of loge expression in each of the two regions
(mean_0
, mean_1
, sd_0
, and
sd_1
). We will focus on cell type 10. Furthermore, we will
examine the original Lsamp gene, which was detected to be significantly
differentially expressed in cell type 10. CSIDE model fits for all genes
are stored in myRCTD@de_results$gene_fits
, and we
demonstrate below how to access the point estimates, standard errors,
and convergence.
#print results for cell type '18'
<- '10'
cell_type <- myRCTD@de_results$sig_gene_list[[cell_type]]
results_de print(results_de)
#> Z_score log_fc se paramindex_best conv p_val mean_0
#> Rora 2.741861 -2.853199 1.0406068 2 TRUE 0.006109228 -4.306653
#> Kcnd2 2.399268 1.280853 0.5338516 2 TRUE 0.016427889 -4.670806
#> Lsamp 1.760034 1.644275 0.9342294 2 TRUE 0.078402081 -5.228156
#> Calm2 1.676232 1.037308 0.6188334 2 TRUE 0.093692839 -5.485009
#> mean_1 sd_0 sd_1
#> Rora -7.159851 0.4153836 0.9541063
#> Kcnd2 -3.389953 0.3124028 0.4328995
#> Lsamp -3.583881 0.5354939 0.7655265
#> Calm2 -4.447701 0.3611977 0.5024849
<- change_gene
sig_gene print(paste("following results hold for", sig_gene))
#> [1] "following results hold for Lsamp"
print("check for covergence of each cell type")
#> [1] "check for covergence of each cell type"
print(myRCTD@de_results$gene_fits$con_mat[sig_gene, ])
#> 10 18
#> TRUE TRUE
print('estimated DE')
#> [1] "estimated DE"
print(myRCTD@de_results$gene_fits$mean_val[sig_gene, ])
#> 10 18
#> 1.644275 2.691020
print('standard errors for non-intercept terms')
#> [1] "standard errors for non-intercept terms"
print(myRCTD@de_results$gene_fits$s_mat[sig_gene, c(2,4)])
#> 2_2_10 2_2_18
#> 0.9342294 2.7821257
Finally, we will plot CSIDE results in the savedir
directory!
The following plot shows a spatial visualization of the Lsamp gene, which was determined to be differentially expressed.
The function make_all_de_plots
will automatically
generate several types of plots displaying the CSIDE results across all
genes and cell types.
<- readRDS(file.path(savedir,'myRCTDde.rds'))
myRCTD plot_gene_two_regions(myRCTD, sig_gene, cell_type, min_UMI = 10)
make_all_de_plots(myRCTD, savedir)