This vignette demonstrates how MOFA can be used to integrate scRNA-seq and scATAC-seq data from the Chromium Single Cell Multiome ATAC + Gene Expression assay recently commercialised by 10x Genomics. This vignette results from a collaboration between the 10x Genomics R&D team and the MOFA team. The data set consists of the conventional Peripheral Blood Mononuclear Cells (PBMC) from a single healthy donor, which is available here.
MOFA is a factor analysis model that provides a general framework for the integration of multi-omic data sets in an unsupervised fashion. Intuitively, it can be viewed as a versatile and statistically rigorous generalisation of principal component analysis (PCA) to multi-omics data. Briefly, the model performs unsupervised dimensionality reduction simultaneously across multiple data modalities, thereby capturing the global sources of cell-to-cell variability via a small number of inferred factors. Importantly, it distinguishes variation that is shared between assays from variation that is unique to a specific assay. Thus, in this data set MOFA can be useful to disentangle the RNA and the ATAC activity of the different cellular populations that exist in PBMCs.
Make sure that MOFA2
is imported last, to avoid collisions with functions from other packages
library(data.table)
library(ggplot2)
library(Seurat)
library(Signac)
# for GSEA analysis
library(msigdbr)
# For motif enrichment analysis
library(JASPAR2020)
library(TFBSTools)
library(BSgenome.Hsapiens.UCSC.hg38)
# MOFA
library(MOFA2)
We have created a Seurat object with RNA and ATAC data modalities stored as different assays. Here is the script to create the Seurat object from the CellRanger output
# run this to download the data set
# seurat <- readRDS(url("ftp://ftp.ebi.ac.uk/pub/databases/mofa/10x_rna_atac_vignette/seurat.rds"))
seurat <- readRDS("/Users/ricard/data/10x_rna_atac/PBMC/seurat.rds")
seurat
## An object of class Seurat
## 138109 features across 11909 samples within 2 assays
## Active assay: RNA (29732 features, 0 variable features)
## 1 other assay present: ATAC
The metadata slot contains the cell type annotations that have been done a priori by the 10x Genomics R&D team. This will be useful to characterise the MOFA factors. One could employ the MOFA factors to perform clustering and cell type annotation, but we will demonstrate this in a future vignette.
head(seurat@meta.data[,c("celltype","broad_celltype","pass_rnaQC","pass_accQC")])
## celltype broad_celltype pass_rnaQC pass_accQC
## AAACAGCCAAGGAATC naive CD4 T cells Lymphoid TRUE TRUE
## AAACAGCCAATCCCTT memory CD4 T cells Lymphoid TRUE TRUE
## AAACAGCCAATGCGCT naive CD4 T cells Lymphoid TRUE TRUE
## AAACAGCCACACTAAT <NA> <NA> FALSE FALSE
## AAACAGCCACCAACCG <NA> <NA> FALSE FALSE
## AAACAGCCAGGATAAC <NA> <NA> FALSE FALSE
table(seurat@meta.data$celltype)
##
## CD56 (bright) NK cells CD56 (dim) NK cells classical monocytes
## 407 472 1929
## effector CD8 T cells intermediate monocytes MAIT T cells
## 385 664 106
## memory B cells memory CD4 T cells myeloid DC
## 420 1611 242
## naive B cells naive CD4 T cells naive CD8 T cells
## 295 1462 1549
## non-classical monocytes plasmacytoid DC
## 383 107
table(seurat@meta.data$broad_celltype)
##
## Lymphoid Myeloid
## 6814 3218
Keep cells that pass QC for both omics
seurat <- seurat %>%
.[,seurat@meta.data$pass_accQC==TRUE & seurat@meta.data$pass_rnaQC==TRUE]
seurat
## An object of class Seurat
## 138109 features across 10032 samples within 2 assays
## Active assay: RNA (29732 features, 0 variable features)
## 1 other assay present: ATAC
The RNA expression consists of 29,732 genes and 10,032 cells
seurat@assays[["RNA"]]
## Assay data with 29732 features for 10032 cells
## First 10 features:
## AL627309.1, AL627309.2, AL627309.5, AL627309.4, AP006222.2, AL669831.2,
## LINC01409, FAM87B, LINC01128, LINC00115
The ATAC expression consists of 108,377 peaks and 10,032 cells
seurat@assays[["ATAC"]]
## Assay data with 108377 features for 10032 cells
## First 10 features:
## chr1:10109-10357, chr1:180730-181630, chr1:191491-191736,
## chr1:267816-268196, chr1:586028-586373, chr1:629721-630172,
## chr1:633793-634264, chr1:777634-779926, chr1:816881-817647,
## chr1:819912-823500
Collect a list of position-specific weight matrices (PWM) from the JASPAR database, we’ll use this in the downstream analysis.
pfm <- getMatrixSet(JASPAR2020,
opts = list(species = "Homo sapiens")
)
Feature metadata contains genomic information for both RNA expression and ATAC peaks
feature_metadata <- fread("ftp://ftp.ebi.ac.uk/pub/databases/mofa/10x_rna_atac_vignette/filtered_feature_bc_matrix/features.tsv.gz") %>%
setnames(c("ens_id","gene","view","chr","start","end"))
Fetch RNA metadata
feature_metadata.rna <- feature_metadata[view=="Gene Expression"]
head(feature_metadata.rna,n=3)
## ens_id gene view chr start end
## 1: ENSG00000243485 MIR1302-2HG Gene Expression chr1 29553 30267
## 2: ENSG00000237613 FAM138A Gene Expression chr1 36080 36081
## 3: ENSG00000186092 OR4F5 Gene Expression chr1 65418 69055
Fetch ATAC metadata
feature_metadata.atac <- feature_metadata[view=="Peaks"] %>%
.[,ens_id:=NULL] %>% setnames("gene","peak")
head(feature_metadata.atac,n=3)
## peak view chr start end
## 1: chr1:10109-10357 Peaks chr1 10109 10357
## 2: chr1:180730-181630 Peaks chr1 180730 181630
## 3: chr1:191491-191736 Peaks chr1 191491 191736
Classify peaks into promoter-overlapping and distal
foo <- fread("ftp://ftp.ebi.ac.uk/pub/databases/mofa/10x_rna_atac_vignette/atac_peak_annotation.tsv") %>%
.[,c("peak","peak_type")] %>%
.[peak_type%in%c("distal", "promoter")]
feature_metadata.atac <- feature_metadata.atac %>%
merge(foo,by="peak",all.x=TRUE)
table(feature_metadata.atac$peak_type)
##
## distal promoter
## 91822 16074
Split ATAC matrix depending on the peak type and create a ChromatinAssay
for each modality using the Signac package. This object requires a GRanges object with the peak metadata. We also provide an optional motif matrix that we’ll use for downstream analysis.
for (i in c("distal","promoter")) {
# Create GRanges
peaks.granges <- feature_metadata.atac %>%
.[peak_type==i] %>%
.[,c("chr","start","end","peak")] %>%
makeGRangesFromDataFrame(keep.extra.columns = TRUE, ignore.strand = TRUE)
# Scan motifs throughout the DNA sequence of each peak and create a binary matrix of motif-peak presence.
motif.matrix <- CreateMotifMatrix(
features = peaks.granges,
pwm = pfm,
genome = 'hg38',
use.counts = FALSE
) %>% as.matrix
# AddChromatinAssay to the Seurat object
seurat@assays[[paste0("ATAC_",i)]] <- CreateChromatinAssay(
seurat@assays$ATAC@counts[peaks.granges$peak,],
ranges = peaks.granges,
motifs = CreateMotifObject(motif.matrix, pfm)
)
}
seurat
## An object of class Seurat
## 245401 features across 10032 samples within 4 assays
## Active assay: RNA (29732 features, 0 variable features)
## 3 other assays present: ATAC, ATAC_distal, ATAC_promoter
Standard Log normalisation, nothing fancy. One could possibly regress out the nFeature_RNA
covariate or use the alternative SCTransform
normalisation
seurat <- NormalizeData(seurat, normalization.method = "LogNormalize", assay = "RNA")
seurat <- ScaleData(seurat, do.center = TRUE, do.scale = FALSE)
TFIDF normalisation, as implemented in Signac
. This is important to transform the binary measurements to continuous readouts, which is a mandatory step to use the Gaussian likelihood in MOFA. Keep in mind that other strategies are possible (and perhaps even more recommended). One could also try to scale the data and regress out nFeature_ATAC
for (i in c("ATAC_distal","ATAC_promoter")) {
seurat <- RunTFIDF(seurat, assay = i)
}
Select most variable genes using the VST procedure
seurat <- FindVariableFeatures(seurat,
selection.method = "vst",
nfeatures = 5000,
assay = "RNA",
verbose = FALSE
)
Select the most variable peaks using the FindTopFeatures
function
for (i in c("ATAC_distal","ATAC_promoter")) {
seurat <- FindTopFeatures(seurat, assay=i, min.cutoff = 2000)
print(length(seurat[[i]]@var.features))
}
## [1] 8456
## [1] 11198
MOFA can take as input a multi-assay Seurat object.
mofa <- create_mofa(seurat, assays = c("RNA","ATAC_distal","ATAC_promoter"))
mofa
## Untrained MOFA model with the following characteristics:
## Number of views: 3
## Views names: RNA ATAC_distal ATAC_promoter
## Number of features (per view): 5000 8456 11198
## Number of groups: 1
## Groups names: group1
## Number of samples (per group): 10032
The most important hyperparameter is the number of factors. It needs to be specified by the user and it generally depends on the complexity of the data set. It is usually recommended to provide a large enough number and trim factors a posteriori based on a minimum variance explained criteria. Here let’s define K=15 factors
model_opts <- get_default_model_options(mofa)
model_opts$num_factors <- 15
There also other options that can be tuned, see ?get_default_model_options
, ?get_default_training_options
and ?get_default_data_options
for details.
mofa <- prepare_mofa(mofa,
model_options = model_opts
)
This step can take around 2h in a MacBook Pro, but it can be speed up using multiple cores (~2-3x) or using GPUs (~10x). The training is done in Python, and the connection with R is done via reticulate. If this step fails please read our FAQ, as this is the most common source of errors when running MOFA
# mofa <- run_mofa(mofa)
Load pretrained MOFA object
# run this to download the data set
# mofa <- readRDS(url("ftp://ftp.ebi.ac.uk/pub/databases/mofa/10x_rna_atac_vignette/mofa.rds"))
mofa <- readRDS("/Users/ricard/data/10x_rna_atac/PBMC/mofa/mofa.rds")
mofa
## Trained MOFA with the following characteristics:
## Number of views: 3
## Views names: RNA ATAC_distal ATAC_promoter
## Number of features (per view): 5000 8456 11198
## Number of groups: 1
## Groups names: group1
## Number of samples (per group): 10032
## Number of factors: 13
The sample metadata must be provided as a data.frame and it must contain a column named sample with the sample IDs.
samples_metadata(mofa) <- seurat@meta.data %>%
tibble::rownames_to_column("sample") %>%
as.data.table
A good sanity check is to verify that the Factors are largely uncorrelated. In MOFA there are no orthogonality constraints such as in Principal Component Analysis, but if there is a lot of correlation between Factors this suggests a poor model fit
plot_factor_cor(mofa)
The variance decomposition analysis is my favourite MOFA functionality. It calculates the percentage of variance explained by each factor and data modality. What insight can we learn just from visualising this? Factor 1 has a very strong signal from all data modalities, so it must be the most important source of variability in the data. Factor 2 and Factor 3 are weaker but also shared between all data modalities, so they are also likely to be impotant sources of variability. Interestingly, most Factors have a stronger signal in distal peaks than in promoter peaks.
plot_variance_explained(mofa, max_r2 = 4)
A reasonable question is whether the model is providing a good fit to the data. To assess this we can plot the total variance explained (using all factors). The resulting values will depend on the nature of the data set, the number of samples, the number of factors, etc. For single-cell data the values tend to be quite low. Here we have a 30% for RNA, a very reasonable value, versus ~8-10% for ATAC data, which is also a decent value taking into account how noisy scATAC data is. In fact, compared to my previous analysis on the SNARE-seq data set these are very good values. In the SNARE-seq analysis the the ATAC modality was extremely noisy and we had to rely on denoising strategies to get meaningful signal. As you’ll see in this vignette, the 10x Genomics technology provides much better quality measurements and no denoising strategy is required.
plot_variance_explained(mofa, plot_total = TRUE)[[2]]
There are a few systematic strategies to characterise the molecular etiology underlying each MOFA Factor:
As a first approach, let’s correlate the standard quality control metrics with the Factor values. We see that Factor 4 is strongly associated with the number of expressed genes per cell and the Factors 1 and 7 are associated with the number of accessible peaks per cell. Some of these are likely technical factors that should be removed with a better normalisation strategy.
correlate_factors_with_covariates(mofa,
covariates = c("nFeature_RNA","nFeature_ATAC")
)
The variance expalined plot tells us that Factor 1 is the strongest source of variation and has a large contribution from all data modalities.
How do we interpret the factor values?
Mathematically, each Factor is defined by a linear combination of the input features. As the data is centered prior to running MOFA, each Factor ordinates cells along a one-dimensional axis that is centered at zero. Samples with different signs manifest opposite “effects” along the inferred axis of variation, with higher absolute value indicating a stronger effect. Note that the interpretation of MOFA factors is analogous to the interpretation of the principal components in PCA.
Let’s plot the Factor 1 values grouped by cell type and coloured by lineage of origin (Lymphoid or Myeloid). Clearly Factor 1 captures the molecular variation associated with lineage of origin.
plot_factor(mofa, factors=1, group_by = "celltype", color_by="broad_celltype") +
theme(
axis.text.x = element_text(color="black", angle=40, vjust=1, hjust=1)
)
how do we interpret the weights?
The weights provide a score for each feature on each factor. Features with no association with the corresponding factor are expected to have values close to zero, whereas features with strong association with the factor are expected to have large absolute values. The sign of the weights indicates the direction of the effect: a positive weights indicates that the feature has higher levels in the cells with positive factor values, and vice-versa.
Let’s plot the associated RNA weights. These are all good markers of myeloid (LYN, SLC8A1, LYZ, etc.) vs lymphoid fate (BCL11B)
plot_weights(mofa,
view = "RNA",
factors = 1,
nfeatures = 20,
text_size = 4
)
The weights are useful to get an idea of which genes are driving each factor. However, to get an idea of how good the association between features and factors is we can generate a scatterplot of the Factor values against mRNA expression for the genes with the largest weights:
plot_data_scatter(mofa,
view = "RNA",
factor = 1,
features = 6,
color_by = "broad_celltype",
add_lm = T,
dot_size = 1
)
Now let’s visualise the ATAC signatures by inspecting the weights, although they are not very interpretable. We’ll have to rely on motif-based approaches to get TF/motif signatures.
plot_top_weights(mofa,
view = "ATAC_distal",
factors = 1,
sign = "positive",
nfeatures = 15,
)
Again, we can visualise the ATAC patterns recovered by MOFA on the high-dimensional space. Instead of scatterplots we can use heatmaps:
plot_data_heatmap(mofa,
view = "ATAC_promoter",
factor = 1,
features = 50,
show_rownames = F, show_colnames = F,
cluster_rows = T, cluster_cols = F,
annotation_samples = "broad_celltype"
)
Again, we can visualise the ATAC patterns recovered by MOFA on the high-dimensional space. Instead of scatterplots we can also use heatmaps:
MOFA has an interesting option to denoise/smooth the data by reconstructing the data modalities using the latent factors. We can check how this works in the plot_data_heatmap
function using denoise = TRUE
. Instead of plotting the (noisy) input data, this plots the data reconstructed by the model:
plot_data_heatmap(mofa,
view = "ATAC_promoter",
factor = 1,
features = 50,
show_rownames = F, show_colnames = F,
cluster_rows = T, cluster_cols = F,
annotation_samples = "broad_celltype",
denoise = TRUE
)
Let’s proceed now in a similar way to characterise Factor 2. It is clearly capturing B cell commitment
plot_factor(mofa, factors=2, group_by = "celltype") +
theme(
axis.text.x = element_text(color="black", angle=40, vjust=1, hjust=1)
)
Let’s plot the associated RNA weights. We see good B cell markers, including CD74 and BANK1
plot_weights(mofa,
view = "RNA",
factors = 2,
nfeatures = 10,
text_size = 4
)
Let’s color the factor values by BANK1 expression
plot_factor(mofa, factors=2, group_by = "celltype", color_by="BANK1") +
theme(
axis.text.x = element_text(color="black", angle=40, vjust=1, hjust=1)
)
Again, we can visualise the ATAC signatures:
plot_data_heatmap(mofa,
view = "ATAC_promoter",
factor = 2,
features = 50,
show_rownames = F, show_colnames = F,
cluster_rows = T, cluster_cols = F,
annotation_samples = "celltype",
denoise = TRUE
)
In the correlation analysis we found that Factor 4 is strongly correlated to the total number of ATAC peaks, so it is likely a technical source of variation that has not been removed in the normalisation steps. The signal is clear, and it affects all cell types in a similar fashion:
plot_factor(mofa, factors=4, group_by = "celltype", color_by="nFeature_ATAC") +
theme(
axis.text.x = element_text(color="black", angle=40, vjust=1, hjust=1)
)
If we plot the associated ATAC weights notice that the distribution is massively shifted towards negative values. A typical signature of a technical factor is that most of the weights tend to non-zero:
plot_weights(mofa,
view = "ATAC_promoter",
factors = 4,
nfeatures = 0,
text_size = 4
)
The latent space inferred by MOFA can replace the PCA embedding as input to algorithms that learn non-linear manifolds such as t-SNE or UMAP. This can be very useful to identify cellular populations and reconstruct complex pseudotime trajectories. The advantage of MOFA is that (1) we use information from all available omics, and (2) we can characterise the Factors and remove the technical ones.
factors <- 1:get_dimensions(mofa)[["K"]]
factors <- factors[!factors%in%c(4,7)]
mofa <- run_umap(mofa,
factors = factors,
n_neighbors = 15,
min_dist = 0.30
)
plot_dimred(mofa,
method = "UMAP",
color_by = "celltype",
label = TRUE,
stroke=0.05,
dot_size = 1,
legend = FALSE
) + scale_fill_manual(values=colors)
We can try to add some interpretatibility on the UMAP by visualising the contribution of each Factor on the different groups of cells.
for (i in paste0("Factor",1:3)) {
p <- plot_dimred(mofa,
method = "UMAP",
color_by = i,
stroke = 0.05,
dot_size = 1
)
print(p)
}
In this case however we notice that the resulting UMAP is not very different from the one you could obtain using the RNA expression or the ATAC data alone. This is because most, if not all, biological factors have contributions from both the RNA and ATAC modalities.
DefaultAssay(seurat) <- "RNA"
seurat <- RunPCA(seurat, npcs = K, verbose = FALSE)
seurat <- RunUMAP(seurat, reduction = 'pca', dims = 1:K, verbose = FALSE)
DimPlot(seurat, label = TRUE, reduction="umap") +
NoLegend() + NoAxes() + scale_fill_manual(values=colors)
DefaultAssay(seurat) <- "ATAC_distal"
seurat <- RunSVD(seurat, n = K, verbose = FALSE)
seurat <- RunUMAP(seurat, reduction = 'lsi', dims = 1:K, verbose = FALSE)
DimPlot(seurat, label = TRUE, reduction="umap") +
NoLegend() + NoAxes() + scale_fill_manual(values=colors)
In addition to exploring the individual weights for each factor, we can use enrichment analysis to look for significant associations of factors to genesets. For more details on how GSEA works in MOFA we encourage the users to read the GSEA vignette.
First we define the gene set matrix. We’ll use the C5 category and the Biological Process subcategory from the MSigDB data base:
msgidb.matrix <- msigdbr(
species = "Homo sapiens",
category = "C5",
subcategory = "BP"
) %>% as.data.table %>% .[,id:=1] %>%
dcast(gs_name~gene_symbol, value.var="id", fill=0) %>%
matrix.please
Second, we run the enrichment analysis with default options. An important consideration when running GSEA is that MOFA Factors contain positive (+) and negative (-) weights. There will be cases where the genes with (-) weights all belong to a specific pathway but genes with (+) weights belong to other pathways. If this is true, doing GSEA with all of them together could dilute the signal. Hence, we recommend the user to do GSEA separately for (+) and (-) weights.
# GSEA on positive weights
gsea.positive <- run_enrichment(mofa,
feature.sets = msgidb.matrix,
view = "RNA",
sign = "positive"
)
# GSEA on negative weights
gsea.negative <- run_enrichment(mofa,
feature.sets = msgidb.matrix,
view = "RNA",
sign = "negative"
)
The enrichment analysis returns a list of 5 elements:
names(gsea.positive)
## [1] "feature.sets" "pval" "pval.adj"
## [4] "feature.statistics" "set.statistics" "sigPathways"
Factor 1 captured lineage of origin, where lymphoid cells wer linked to (-) factor values and myeloid cells to (+) factor values. Consistently, when plotting the pathways enriched in Factor 1 for (+) weight values we can see Myeloid-associated pathways:
plot_enrichment(gsea.positive, factor = 1, max.pathways = 15)
and when plotting the pathways enriched in Factor 1 for (-) weight values we can see Lymphoid-associated pathways:
plot_enrichment(gsea.negative, factor = 1, max.pathways = 15)
It is always advised to not rely only on the p-values and to visualise which genes are driving the enrichment within each pathways. This can be done with the plot_enrichment_detailed
function.
plot_enrichment_detailed(gsea.positive,
factor = 1,
max.genes = 10,
max.pathways = 5
)
Although peak annotation provides functional interpretation, it does not directly explain the underlying mechanism. Open chromatin can affect transcription through TFs, which facilitate transcription by recognizing and binding to specific DNA motifs. Let’s attempt to do motif enrichment analysis using the weights of the ATAC modality.
Here we can use the GSEA functionality implemented in MOFA. We just replace the gene set annotations by peak-motif annotations.
The variance explained plot suggests that there is more signal in distal peaks than in promoter peaks in the first two factors. Let’s do motif enrichment on distal peaks. Again, we compare positive weights (for Factor 1, peaks more open and motifs more enriched in Lymphoid cells) versus negative weights (for Factor 1, peaks more open and motifs more enriched in Myeloid cells)
# define motif matrix
motif.matrix <- t(as.matrix(seurat[["ATAC_distal"]]@motifs@data))
# Run GSEA enrichment analysis using the motif-peak matrix, (+) weights
motif.enrichment.positive <- run_enrichment(mofa,
view = "ATAC_distal",
factors = 1:2,
feature.sets = motif.matrix,
sign = "positive"
)
# Run GSEA enrichment analysis using the motif-peak matrix, (-) weights
motif.enrichment.negative <- run_enrichment(mofa,
view = "ATAC_distal",
factors = 1:2,
feature.sets = motif.matrix,
sign = "negative"
)
Plot motif enrichment results for Factor 1
plot_enrichment(motif.enrichment.positive, factor = 1, max.pathways = 15)
plot_enrichment(motif.enrichment.negative, factor = 1, max.pathways = 15)
We can visualise the motifs using the MotifPlot
function from the Signac
package.
Myeloid motifs (positive Factor values): CEBP
s are one of the most studied and important regulator of myeloid differentiation.
sig.motifs.positive <- motif.enrichment.positive$pval.adj[,"Factor1"] %>%
sort %>% head(n=6) %>% names
MotifPlot(seurat[["ATAC_distal"]], motifs = sig.motifs.positive)
Lymphoid motifs (negative Factor values):
LEF1
is essential for T-cell functionality.TCFL2
also known as TCF4, is a T-cell specific TF.RUNX3
is highly expressed in cells of lymphoid origin.TCF7
is highly expressed in cells of lymphoid origin.Seems like we are getting sensible results!
sig.motifs.negative <- motif.enrichment.negative$pval.adj[,"Factor1"] %>%
sort %>% head(n=6) %>% names
MotifPlot(seurat[["ATAC_distal"]], motifs = sig.motifs.negative)
To validate the motif enrichment results, we can calculate motif activity scores per cell using chromVAR. Briefly, this method computes an accessibility z-score for each motif in each cell that is adjusted by technical confounders. Positive values denote more accessible than background, and negative values denote less accessible than background
seurat <- RunChromVAR(
object = seurat,
assay = "ATAC_distal",
genome = BSgenome.Hsapiens.UCSC.hg38,
new.assay.name = "chromvar"
)
DefaultAssay(seurat) <- "chromvar"
Indeed, the top hits for Factor 1 have very clear chromVAR patterns that distinguish lymphoid versus myeloid cells
motifs.to.plot <- c(sig.motifs.positive[1:2], sig.motifs.negative[1:2])
FeaturePlot(seurat,
features = motifs.to.plot,
reduction = "umap",
combine = TRUE
) & NoLegend() & NoAxes()
This vignette demonstrates how MOFA can be used to integrate the RNA and ATAC modalities from the Chromium Single Cell Multiome ATAC + Gene Expression assay recently introduced by 10x Genomics. I want to thank Vijay, Shamoni, Paul and Kamila from the 10x R&D team for this collaboration and for feedback on the analysis.
If you have questions, suggestions about MOFA or the vignette you can contact me by email (ricard@ebi.ac.uk). Also, we have a Slack group where we provide quick and personalised advise if you are interested in MOFA.
Session Info
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] MOFA2_1.0 BSgenome.Hsapiens.UCSC.hg38_1.4.3
## [3] BSgenome_1.56.0 rtracklayer_1.48.0
## [5] Biostrings_2.56.0 XVector_0.28.0
## [7] GenomicRanges_1.40.0 GenomeInfoDb_1.24.2
## [9] IRanges_2.22.2 S4Vectors_0.26.1
## [11] BiocGenerics_0.34.0 TFBSTools_1.26.0
## [13] JASPAR2020_0.99.10 msigdbr_7.1.1
## [15] Signac_1.0.0 Seurat_3.2.0
## [17] ggplot2_3.3.2 data.table_1.13.0
## [19] BiocStyle_2.16.0
##
## loaded via a namespace (and not attached):
## [1] rappdirs_0.3.1 SnowballC_0.7.0
## [3] GGally_2.0.0 R.methodsS3_1.8.1
## [5] nabor_0.5.0 tidyr_1.1.2
## [7] bit64_4.0.5 knitr_1.29
## [9] irlba_2.3.3 DelayedArray_0.14.1
## [11] R.utils_2.10.1 rpart_4.1-15
## [13] KEGGREST_1.28.0 RCurl_1.98-1.2
## [15] AnnotationFilter_1.12.0 generics_0.0.2
## [17] GenomicFeatures_1.40.1 cowplot_1.0.0
## [19] RSQLite_2.2.0 RANN_2.6.1
## [21] chromVAR_1.10.0 future_1.18.0
## [23] bit_4.0.4 spatstat.data_1.4-3
## [25] httpuv_1.5.4 SummarizedExperiment_1.18.2
## [27] assertthat_0.2.1 DirichletMultinomial_1.30.0
## [29] xfun_0.16 hms_0.5.3
## [31] evaluate_0.14 promises_1.1.1
## [33] progress_1.2.2 readxl_1.3.1
## [35] caTools_1.18.0 dbplyr_1.4.4
## [37] tmvnsim_1.0-2 igraph_1.2.5
## [39] DBI_1.1.0 htmlwidgets_1.5.1
## [41] reshape_0.8.8 purrr_0.3.4
## [43] ellipsis_0.3.1 RSpectra_0.16-0
## [45] corrplot_0.84 ggpubr_0.4.0
## [47] dplyr_1.0.2 backports_1.1.9
## [49] bookdown_0.20 annotate_1.66.0
## [51] biomaRt_2.44.1 deldir_0.1-28
## [53] vctrs_0.3.4 Biobase_2.48.0
## [55] ensembldb_2.12.1 ROCR_1.0-11
## [57] abind_1.4-5 withr_2.2.0
## [59] checkmate_2.0.0 sctransform_0.2.1
## [61] GenomicAlignments_1.24.0 prettyunits_1.1.1
## [63] mnormt_2.0.2 goftest_1.2-2
## [65] cluster_2.1.0 ape_5.4-1
## [67] lazyeval_0.2.2 seqLogo_1.54.3
## [69] crayon_1.3.4 labeling_0.3
## [71] pkgconfig_2.0.3 nlme_3.1-148
## [73] vipor_0.4.5 ProtGenerics_1.20.0
## [75] nnet_7.3-14 rlang_0.4.7
## [77] globals_0.12.5 lifecycle_0.2.0
## [79] miniUI_0.1.1.1 BiocFileCache_1.12.1
## [81] rsvd_1.0.3 dichromat_2.0-0
## [83] cellranger_1.1.0 ggrastr_0.1.9
## [85] polyclip_1.10-0 matrixStats_0.56.0
## [87] lmtest_0.9-37 graph_1.66.0
## [89] Matrix_1.2-18 ggseqlogo_0.1
## [91] carData_3.0-4 Rhdf5lib_1.10.1
## [93] zoo_1.8-8 base64enc_0.1-3
## [95] beeswarm_0.2.3 pheatmap_1.0.12
## [97] ggridges_0.5.2 png_0.1-7
## [99] viridisLite_0.3.0 bitops_1.0-6
## [101] R.oo_1.24.0 KernSmooth_2.23-17
## [103] blob_1.2.1 stringr_1.4.0
## [105] qvalue_2.20.0 rstatix_0.6.0
## [107] readr_1.3.1 jpeg_0.1-8.1
## [109] ggsignif_0.6.0 CNEr_1.24.0
## [111] scales_1.1.1 memoise_1.1.0
## [113] magrittr_1.5 plyr_1.8.6
## [115] ica_1.0-2 zlibbioc_1.34.0
## [117] compiler_4.0.2 RColorBrewer_1.1-2
## [119] fitdistrplus_1.1-1 Rsamtools_2.4.0
## [121] listenv_0.8.0 patchwork_1.0.1
## [123] pbapply_1.4-3 htmlTable_2.0.1
## [125] Formula_1.2-3 MASS_7.3-51.6
## [127] mgcv_1.8-31 tidyselect_1.1.0
## [129] forcats_0.5.0 stringi_1.4.6
## [131] yaml_2.2.1 askpass_1.1
## [133] latticeExtra_0.6-29 ggrepel_0.8.2
## [135] grid_4.0.2 VariantAnnotation_1.34.0
## [137] fastmatch_1.1-0 tools_4.0.2
## [139] rio_0.5.16 future.apply_1.6.0
## [141] rstudioapi_0.11 TFMPvalue_0.0.8
## [143] foreign_0.8-80 lsa_0.73.2
## [145] gridExtra_2.3 farver_2.0.3
## [147] Rtsne_0.15 digest_0.6.25
## [149] BiocManager_1.30.10 shiny_1.5.0
## [151] pracma_2.2.9 motifmatchr_1.10.0
## [153] Rcpp_1.0.5 car_3.0-9
## [155] broom_0.7.0 later_1.1.0.1
## [157] RcppAnnoy_0.0.16 OrganismDbi_1.30.0
## [159] httr_1.4.2 AnnotationDbi_1.50.3
## [161] ggbio_1.36.0 biovizBase_1.36.0
## [163] psych_2.0.8 colorspace_1.4-1
## [165] XML_3.99-0.5 tensor_1.5
## [167] reticulate_1.16 splines_4.0.2
## [169] uwot_0.1.8 RBGL_1.64.0
## [171] RcppRoll_0.3.0 spatstat.utils_1.17-0
## [173] plotly_4.9.2.1 xtable_1.8-4
## [175] jsonlite_1.7.0 poweRlaw_0.70.6
## [177] spatstat_1.64-1 R6_2.4.1
## [179] Hmisc_4.4-1 pillar_1.4.6
## [181] htmltools_0.5.0 mime_0.9
## [183] DT_0.15 glue_1.4.2
## [185] fastmap_1.0.1 BiocParallel_1.22.0
## [187] codetools_0.2-16 lattice_0.20-41
## [189] tibble_3.0.3 curl_4.3
## [191] ggbeeswarm_0.6.0 leiden_0.3.3
## [193] gtools_3.8.2 zip_2.1.1
## [195] openxlsx_4.1.5 GO.db_3.11.4
## [197] openssl_1.4.2 survival_3.1-12
## [199] rmarkdown_2.3 munsell_0.5.0
## [201] rhdf5_2.32.2 GenomeInfoDbData_1.2.3
## [203] HDF5Array_1.16.1 haven_2.3.1
## [205] reshape2_1.4.4 gtable_0.3.0