library(spacexr)
library(Matrix)
library(doParallel)
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
library(ggplot2)
datadir <- system.file("extdata",'SpatialRNA/Vignette',package = 'spacexr')# directory for sample Slide-seq dataset
datadir2 <- system.file("extdata",'SpatialRNA/Vignette_rep2',package = 'spacexr')
datadir3 <- system.file("extdata",'SpatialRNA/Vignette_rep3',package = 'spacexr')
datadir4 <- system.file("extdata",'SpatialRNA/Vignette_rep4',package = 'spacexr')
if(!dir.exists(datadir))
dir.create(datadir)
savedir <- 'RCTD_results'
if(!dir.exists(savedir))
dir.create(savedir)
Cell type-Specific Inference of Differential Expression, or CSIDE, is part of the spacexr R package for learning cell type-specific differential expression from spatial transcriptomics data. In this Vignette, we will demonstrate running RCTD and CSIDE on multiple replicates (slices) of cerebellum Slide-seq toy data.
The ultimate goal will be to test for differential expression across two regions in the cerbellum. First, we will first use RCTD to assign cell types to a cerebellum Slide-seq dataset. We will define cell type profiles using an annotated single nucleus RNA-sequencing (snRNA-seq) cerebellum dataset. We will test for differential expression between the nodulus and anterior lobe regions.
First, we run RCTD on the data to annotated cell types (doublet
mode). Please refer to the spatial
transcriptomics vignette for more explanation on the RCTD algorithm,
such as precise details for loading in the SpatialRNA
object for this dataset. Since we need to load in the
SpatialRNA
object for four replicates, we will apply the
function read.SpatialRNA
to the directories of each of the
four datasets to generate puck1
, puck2
,
puck3
, and puck4
. Please note that in general,
we recommend applying to each replicate a custom function that can load
in a single replicate.
### Load in/preprocess your data, this might vary based on your file type
puck1 <- read.SpatialRNA(datadir, count_file = "MappedDGEForR.csv", coords_file = 'BeadLocationsForR.csv')
puck2 <- read.SpatialRNA(datadir2, count_file = "MappedDGEForR.csv", coords_file = 'BeadLocationsForR.csv')
puck3 <- read.SpatialRNA(datadir3, count_file = "MappedDGEForR.csv", coords_file = 'BeadLocationsForR.csv')
puck4 <- read.SpatialRNA(datadir4, count_file = "MappedDGEForR.csv", coords_file = 'BeadLocationsForR.csv')
In order to run CSIDE, we will for each replicate create a
explanatory variable (i.e. covariate) used for predicting differential
expression in CSIDE. In general one should set the explanatory variable
to biologically relevant predictors of gene expression such as spatial
position. In this case, we will use the explanatory variable two divide
each cerebellum replicate into two regions (left and right). In general,
one can also create a custom function that will compute the explanatory
variable on each SpatialRNA
replicate. We will pass in a
list of the explanatory variables to the
run.CSIDE.replicates
function.
Here, we also artifically upregulate the expression of the Kcnip4 gene in each dataset (in regions of high explanatory variable) to see whether CSIDE can detect this differentially expressed gene.
explanatory.variable1 <- as.integer(puck1@coords$x > 3000)
names(explanatory.variable1) <- rownames(puck1@coords)
explanatory.variable2 <- as.integer(puck2@coords$x > 3000)
names(explanatory.variable2) <- rownames(puck2@coords)
explanatory.variable3 <- as.integer(puck3@coords$x > 3000)
names(explanatory.variable3) <- rownames(puck3@coords)
explanatory.variable4 <- as.integer(puck4@coords$x > 3000)
names(explanatory.variable4) <- rownames(puck4@coords)
exvar_list <- list(explanatory.variable1, explanatory.variable2, explanatory.variable3, explanatory.variable4)
# Differentially upregulate one gene
change_gene <- 'Kcnip4'
high_barc1 <- names(explanatory.variable1[explanatory.variable1 > 0.5])
puck1@counts[change_gene, high_barc1] <- puck1@counts[change_gene, high_barc1] * 2
high_barc2 <- names(explanatory.variable2[explanatory.variable2 > 0.5])
puck2@counts[change_gene, high_barc2] <- puck1@counts[change_gene, high_barc2] * 2
high_barc3 <- names(explanatory.variable3[explanatory.variable3 > 0.5])
puck3@counts[change_gene, high_barc3] <- puck3@counts[change_gene, high_barc3] * 2
high_barc4 <- names(explanatory.variable4[explanatory.variable4 > 0.5])
puck4@counts[change_gene, high_barc4] <- puck4@counts[change_gene, high_barc4] * 2
We also load in the Reference
object, as follows.
refdir <- system.file("extdata",'Reference/Vignette',package = 'spacexr') # directory for the reference
counts <- read.csv(file.path(refdir,"dge.csv")) # load in counts matrix
rownames(counts) <- counts[,1]; counts[,1] <- NULL # Move first column to rownames
meta_data <- read.csv(file.path(refdir,"meta_data.csv")) # load in meta_data (barcodes, clusters, and nUMI)
cell_types <- meta_data$cluster; names(cell_types) <- meta_data$barcode # create cell_types named list
cell_types <- as.factor(cell_types) # convert to factor data type
nUMI <- meta_data$nUMI; names(nUMI) <- meta_data$barcode # create nUMI named list
reference <- Reference(counts, cell_types, nUMI)
Now, we are in a position to create a RCTD.replicates
object, an object that can store multiple SpatialRNA
objects allowing for joint processing with RCTD and CSIDE. We pass in
spatial.replicates
, a list of the four
SpatialRNA
objects, replicate_names
, a list of
names for each replicate, and group_ids
, a list of the
group number for each replicate. In this case, all replicates will come
from the same group. However, we have the option to assign different
replicates to different groups, which will allow for extra variation to
be modeled across groups.
spatial.replicates <- list(puck1, puck2, puck3, puck4)
replicate_names <- c('1','2','3','4')
group_ids <- c(1,1,1,1)
myRCTD.reps <- create.RCTD.replicates(spatial.replicates, reference, replicate_names, group_ids = group_ids, max_cores = 2)
#>
#> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
#> 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25
We are now ready to run RCTD with the
run.RCTD.replicates
function to assign cell types to these
replicates. This function runs RCTD on each replicate in the
RCTD.replicate
object.
myRCTD.reps <- run.RCTD.replicates(myRCTD.reps)
#> run.RCTD.replicates: running RCTD for replicate 1
#> fitBulk: decomposing bulk
#> chooseSigma: using initial Q_mat with sigma = 1
#> Likelihood value: 1972.07376580346
#> Sigma value: 0.84
#> Likelihood value: 1966.20345908295
#> Sigma value: 0.84
#> run.RCTD.replicates: running RCTD for replicate 2
#> fitBulk: decomposing bulk
#> chooseSigma: using initial Q_mat with sigma = 1
#> Likelihood value: 1974.77542702319
#> Sigma value: 0.84
#> Likelihood value: 1969.28192633095
#> Sigma value: 0.84
#> run.RCTD.replicates: running RCTD for replicate 3
#> fitBulk: decomposing bulk
#> chooseSigma: using initial Q_mat with sigma = 1
#> Likelihood value: 1973.42952348242
#> Sigma value: 0.84
#> Likelihood value: 1967.73067452082
#> Sigma value: 0.84
#> run.RCTD.replicates: running RCTD for replicate 4
#> fitBulk: decomposing bulk
#> chooseSigma: using initial Q_mat with sigma = 1
#> Likelihood value: 1972.59047252235
#> Sigma value: 0.84
#> Likelihood value: 1966.71685085192
#> Sigma value: 0.84
After creating the explanatory variables and running RCTD, we are now
ready to run CSIDE using the run.CSIDE.replicates
function.
Here, we pass in exvar_list
, the list of explanatory
variables for each replicate. We will use cell types 10 and 18 and a
false discovery rate of 0.25. Next, we will set a gene threshold
(i.e. minimum gene expression) of 0.01, and we will set a
cell_type_threshold (minimum instances per cell type) of 3.
Warning: On this toy dataset, we have made several
choices of parameters that are not recommended for regular use. On real
datasets, we recommend first consulting the CSIDE default parameters.
This includes gene_threshold
(default 5e-5),
cell_type_threshold
(default 125), and fdr
(default 0.01). Please see ?run.CSIDE.replicates
for more
information on these parameters.
#de
cell_types <- c('10','18', '6')
myRCTD.reps <- run.CSIDE.replicates(myRCTD.reps, cell_types, exvar_list, gene_threshold = .01, cell_type_threshold = 3, fdr = 0.25, weight_threshold = 0.8)
#> run.CSIDE.replicates: running CSIDE for replicate 1
#> Warning in run.CSIDE.general(myRCTD, X1, X2, barcodes, cell_types,
#> cell_type_threshold = cell_type_threshold, : run.CSIDE.general: some parameters
#> are set to the CSIDE vignette values, which are intended for testing but
#> not proper execution. For more accurate results, consider using the default
#> parameters to this function.
#> run.CSIDE.general: running CSIDE with cell types 10, 18, 6
#> run.CSIDE.general: configure params_to_test = 2,
#> filter_genes: filtering genes based on threshold = 0.01
#> set_global_Q_all: begin
#> set_global_Q_all: finished
#> run.CSIDE.replicates: running CSIDE for replicate 2
#> Warning in run.CSIDE.general(myRCTD, X1, X2, barcodes, cell_types,
#> cell_type_threshold = cell_type_threshold, : run.CSIDE.general: some parameters
#> are set to the CSIDE vignette values, which are intended for testing but
#> not proper execution. For more accurate results, consider using the default
#> parameters to this function.
#> run.CSIDE.general: running CSIDE with cell types 10, 18, 6
#> run.CSIDE.general: configure params_to_test = 2,
#> filter_genes: filtering genes based on threshold = 0.01
#> set_global_Q_all: begin
#> set_global_Q_all: finished
#> run.CSIDE.replicates: running CSIDE for replicate 3
#> Warning in run.CSIDE.general(myRCTD, X1, X2, barcodes, cell_types,
#> cell_type_threshold = cell_type_threshold, : run.CSIDE.general: some parameters
#> are set to the CSIDE vignette values, which are intended for testing but
#> not proper execution. For more accurate results, consider using the default
#> parameters to this function.
#> run.CSIDE.general: running CSIDE with cell types 10, 18, 6
#> run.CSIDE.general: configure params_to_test = 2,
#> filter_genes: filtering genes based on threshold = 0.01
#> set_global_Q_all: begin
#> set_global_Q_all: finished
#> run.CSIDE.replicates: running CSIDE for replicate 4
#> Warning in run.CSIDE.general(myRCTD, X1, X2, barcodes, cell_types,
#> cell_type_threshold = cell_type_threshold, : run.CSIDE.general: some parameters
#> are set to the CSIDE vignette values, which are intended for testing but
#> not proper execution. For more accurate results, consider using the default
#> parameters to this function.
#> run.CSIDE.general: running CSIDE with cell types 10, 18, 6
#> run.CSIDE.general: configure params_to_test = 2,
#> filter_genes: filtering genes based on threshold = 0.01
#> set_global_Q_all: begin
#> set_global_Q_all: finished
saveRDS(myRCTD.reps,file.path(savedir,'myRCTDde_reps.rds'))
Note that the RCTD and CSIDE results on individual replicates are
stored in myRCTD.reps@RCTD.reps
, which is a list of
RCTD
objects containing individual CSIDE results as
discussed in other vignettes.
After running CSIDE on each replicate, the next step is to compute
population-level differential expression across all samples using
CSIDE.population.inference
. There are two ways to run this
function, and we will explore both.
The default mode for CSIDE.population.inference
is to
aggregate differential expression estimates across samples to form a
population “average” of differential expression:
myRCTD.reps <- CSIDE.population.inference(myRCTD.reps, fdr = 0.25)
#> CSIDE.population.inference: running population DE inference with use.groups=FALSE, and meta = FALSE
#> [1] "one_ct_genes: population inference on cell type, 10"
#> [1] "done"
#> [1] "one_ct_genes: population inference on cell type, 18"
#> [1] "done"
#> [1] "one_ct_genes: population inference on cell type, 6"
#> [1] "done"
We will focus on cell type 10. Furthermore, we will examine the
original Kcnip4 gene, which was detected to be significantly
differentially expressed in cell type 10. Other genes, including
nonsignificant genes, can be observed in
myRCTD.reps@population_de_results
. As shown below, DE
results on significant genes are stored in
population_sig_gene_df
. In particular, notice the columns
Z_est
(Z-score), log_fc_est
(estimated DE
coefficient), sd_est
(standard error of DE coefficient),
p
(p-value), and q_val
(q-value). Other
columns of interest include tau
, the estimated variance
across samples, and mean i
and sd i
, which
represent the DE point estimate and standard error on each of the four
replicates.
#print results for cell type '18'
cell_type <- '10'
sig_gene <- change_gene
print(paste("following results hold for", sig_gene))
#> [1] "following results hold for Kcnip4"
print(myRCTD.reps@population_sig_gene_df[[cell_type]])
#> tau log_fc_est sd_est Z_est p_cross ct_prop expr
#> Kcnip4 0.00000000 1.0554374 0.2758673 3.825888 0.7345539 1 0.019770883
#> Rora 0.42836080 0.6876000 0.3963400 1.734874 0.1014808 1 0.002325036
#> Kcnd2 0.09014169 -0.4050107 0.1781208 -2.273799 0.3290451 1 0.014091530
#> q_val p mean 1 mean 2 mean 3 mean 4
#> Kcnip4 0.000912112 0.0001303017 0.8401400 1.5089602 1.25066697 0.7088606
#> Rora 0.144835438 0.0827631075 1.0187850 2.4330355 0.54854271 -0.1151301
#> Kcnd2 0.080423340 0.0229780973 -0.2303601 -0.8156025 0.03587612 -0.5407490
#> sd 1 sd 2 sd 3 sd 4
#> Kcnip4 0.5837606 0.6737911 0.4744952 0.5298941
#> Rora 0.6652035 1.0544742 0.5611421 0.5648168
#> Kcnd2 0.3620835 0.3392899 0.3558457 0.3247958
Finally, we will save population CSIDE results in the
savedir
directory:
if(!dir.exists(file.path(savedir,'population')))
dir.create(file.path(savedir,'population'))
save.CSIDE.replicates(myRCTD.reps, file.path(savedir,'population'))
The second way to run CSIDE population inference is meta regression,
which is more flexible than aggregating across samples. For example, one
may control for covariates such as age and gender, as well as test
across covariates such as timepoints or case vs. control. In this
example, we will assume that the four samples came from two cases and
two controls. Meta regression works by including a design matrix,
meta.design.matrix
, of named covariates across samples. In
CSIDE.population.inference
, we set the option
meta = TRUE
, and meta.test_var = 'treat'
, the
variable we would like to test.
meta.design.matrix <- data.frame('treat' = c(0,1,0,1))
print(meta.design.matrix)
#> treat
#> 1 0
#> 2 1
#> 3 0
#> 4 1
myRCTD.reps_meta <- CSIDE.population.inference(myRCTD.reps, fdr = 0.25,
meta = TRUE,
meta.design.matrix = meta.design.matrix,
meta.test_var = 'treat')
#> CSIDE.population.inference: running population DE inference with use.groups=FALSE, and meta = TRUE
#> [1] "one_ct_genes: population inference on cell type, 10"
#> [1] "done"
#> [1] "one_ct_genes: population inference on cell type, 18"
#> [1] "done"
#> [1] "one_ct_genes: population inference on cell type, 6"
#> [1] "done"
head(myRCTD.reps_meta@population_de_results$`10`)
#> tau log_fc_est sd_est Z_est p_cross ct_prop
#> Snap25 0.000000 0.51359922 0.3103883 1.65469895 0.82539758 1.000000
#> Kcnip4 0.000000 -0.07273497 0.5559337 -0.13083389 0.55737760 1.000000
#> Kcnd2 0.000000 -0.57725078 0.3456309 -1.67013660 0.73438010 1.000000
#> Rora 0.884643 0.05103254 1.1293624 0.04518704 0.08938231 1.000000
#> Lsamp 0.000000 0.50366348 0.4924901 1.02268751 0.97895051 1.000000
#> Calm2 1.199270 -1.09960186 1.3023646 -0.84431184 0.00118826 0.859978
#> expr q_val p
#> Snap25 0.0016977534 0.3429495 0.09798558
#> Kcnip4 0.0197708833 0.9639582 0.89590671
#> Kcnd2 0.0140915301 0.3429495 0.09489234
#> Rora 0.0023250360 0.9639582 0.96395822
#> Lsamp 0.0065846141 0.6973665 0.30645562
#> Calm2 0.0005844512 0.6973665 0.39849517
Compared to the default analysis, this population-level analysis is measuring something different, the effect of treatment on differential expression. For example, the lowest p-value gene (yet not < 0.05), Snap25, has an estimated effect of treatment of 0.56. Notice that Kcnip4 is no longer estimated to have a large effect, because the question of difference between cases and controls is fundamentally different from estimating the average across samples.
Finally, we will save population CSIDE results in the
savedir
directory:
if(!dir.exists(file.path(savedir,'population_meta')))
dir.create(file.path(savedir,'population_meta'))
save.CSIDE.replicates(myRCTD.reps_meta, file.path(savedir,'population_meta'))