library(spacexr)
library(Matrix)
library(doParallel)
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
library(ggplot2)
datadir <- system.file("extdata",'SpatialRNA/VisiumLymphVignette',package = 'spacexr') # directory for downsampled Visium dataset
if(!dir.exists(datadir))
  dir.create(datadir)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  cache = TRUE,
  out.width = "100%"
)

Introduction

Cell type-specific inference of differential expression, or CSIDE, is part of the spacexr R package for learning cell type-specific differential expression from spatial transcriptomics data. In this Vignette, we will use CSIDE to test for genes that have cell type-specific differential expression in B cell-rich vs. B cell-poor regions in publicly available Visium data, located here. First, we will first use RCTD to assign cell types to a human lymph node data set. We will define cell type profiles using an annotated single cell RNA-sequencing (scRNA-seq) lymph node dataset.

NOTE: for the purposes of this vignette, the original data as well as the reference data have been greatly downsampled to provide a small test example.

Data Preprocessing and running RCTD

First, we create the RCTD object and run RCTD on the data to annotate cell types in full mode.

refdir <- system.file("extdata",'Reference/Visium_Lymph_Ref',package = 'spacexr') 
ref <- readRDS(file.path(refdir, 'ref_downsampled.rds'))
coords <- readRDS(file.path(datadir, 'coords.rds'))
expr <- readRDS(file.path(datadir, 'expr.rds'))
srna <- SpatialRNA(coords, expr)
myRCTD <- create.RCTD(srna, ref, max_cores=1)
#> Begin: process_cell_type_info
#> process_cell_type_info: number of cells in reference: 330
#> process_cell_type_info: number of genes in reference: 10237
#> 
#>           B    B_Plasma          DC        Endo         ILC Macrophages 
#>          30          30          30          30          30          30 
#>   Monocytes          NK       T_CD4       T_CD8        VSMC 
#>          30          30          30          30          30
#> End: process_cell_type_info
#> create.RCTD: getting regression differentially expressed genes:
#> get_de_genes: B found DE genes: 6
#> get_de_genes: B_Plasma found DE genes: 4
#> get_de_genes: DC found DE genes: 2
#> get_de_genes: Endo found DE genes: 14
#> get_de_genes: ILC found DE genes: 4
#> get_de_genes: Macrophages found DE genes: 12
#> get_de_genes: Monocytes found DE genes: 8
#> get_de_genes: NK found DE genes: 5
#> get_de_genes: T_CD4 found DE genes: 4
#> get_de_genes: T_CD8 found DE genes: 4
#> get_de_genes: VSMC found DE genes: 12
#> get_de_genes: total DE genes: 58
#> create.RCTD: getting platform effect normalization differentially expressed genes:
#> get_de_genes: B found DE genes: 18
#> get_de_genes: B_Plasma found DE genes: 9
#> get_de_genes: DC found DE genes: 11
#> get_de_genes: Endo found DE genes: 37
#> get_de_genes: ILC found DE genes: 18
#> get_de_genes: Macrophages found DE genes: 30
#> get_de_genes: Monocytes found DE genes: 18
#> get_de_genes: NK found DE genes: 21
#> get_de_genes: T_CD4 found DE genes: 9
#> get_de_genes: T_CD8 found DE genes: 17
#> get_de_genes: VSMC found DE genes: 31
#> get_de_genes: total DE genes: 146
myRCTD <- run.RCTD(myRCTD, doublet_mode='full')
#> fitBulk: decomposing bulk
#> chooseSigma: using initial Q_mat with sigma =  1
#> Likelihood value: 102935.601280532
#> Sigma value:  0.84
#> Likelihood value: 99601.360299997
#> Sigma value:  0.69
#> Likelihood value: 96546.5000174303
#> Sigma value:  0.61
#> Likelihood value: 94997.7767346109
#> Sigma value:  0.53
#> Likelihood value: 93546.6087101471
#> Sigma value:  0.45
#> Likelihood value: 92235.63947342
#> Sigma value:  0.37
#> Likelihood value: 91115.3377090666
#> Sigma value:  0.29
#> Likelihood value: 90239.2544273687
#> Sigma value:  0.21

Running CSIDE

The first step is to create our explanatory variable of interest. For example, we might be interested in finding genes that have differential expression in one cell type in the presence of another cell type, e.g. B cells. We can address this by creating our explanatory variable from the B cell weights from the RCTD object created above. In this case, we create a binary variable indicating B cell enrichment, using the median weight as the cutoff. We use the plot_puck_continuous function to visualize our variable on the spatial data.

B_weight <- myRCTD@results$weights[,'B']
B_med <- median(B_weight)
B_weight[B_weight>=B_med] <- 1
B_weight[B_weight<B_med] <- 0
plot_puck_continuous(myRCTD@spatialRNA, rownames(myRCTD@spatialRNA@coords), B_weight)

We are now ready to run CSIDE using the run.CSIDE.single function. We will use 1 core, and a false discovery rate of 0.25. Next, we will set a gene threshold (i.e. minimum gene expression) of 0.01, and we will set a cell_type_threshold (minimum instances per cell type) of 10.

Warning: On this toy dataset, we have made several choices of parameters that are not recommended for regular use. On real datasets, we recommend first consulting the CSIDE default parameters. This includes gene_threshold (default 5e-5), cell_type_threshold (default 125), fdr (default 0.01), and weight_threshold (default NULL).

#de
myRCTD@config$max_cores <- 1
cell_types <- c('Macrophages','T_CD4')
myRCTD <- run.CSIDE.single(myRCTD, B_weight, cell_types = cell_types, gene_threshold = .01, cell_type_threshold = 10, fdr = 0.25, doublet_mode = F, weight_threshold = 0.1)
#> choose_cell_types: running CSIDE with cell types Macrophages, T_CD4
#> run.CSIDE.general: configure params_to_test = 2
#> 1
#> 2
#> 3
#> 4
#> 5
#> 6
#> 7
#> 8
#> 9
#> 10

CSIDE results

The results of the run can be found in myRCTD@de_results$sig_gene_list. Again, note that these lists may be inaccurate since all steps of this analysis have been run with downsampled data.

print(myRCTD@de_results$sig_gene_list)
#> $Macrophages
#>        Z_score    log_fc         se paramindex_best conv        p_val
#> HMGN2 10.92224 0.7006541 0.06414932               2 TRUE 9.024486e-28
#> CFL1   7.74698 0.4274667 0.05517849               2 TRUE 9.410382e-15
#> 
#> $T_CD4
#>           Z_score    log_fc         se paramindex_best conv        p_val
#> PFN1     6.418563 0.4591616 0.07153650               2 TRUE 1.375664e-10
#> RPL15    6.366316 0.4439999 0.06974204               2 TRUE 1.936227e-10
#> CFL1     6.107601 0.4801002 0.07860701               2 TRUE 1.011397e-09
#> HMGN2    5.920119 0.5363842 0.09060362               2 TRUE 3.217088e-09
#> HSP90AB1 5.231480 0.4259819 0.08142665               2 TRUE 1.681580e-07