1 Description

This vignette demonstrates the simultaneous multi-view and multi-group integration framework of MOFA+.

We consider a dataset where scNMT-seq was used to simultaneously profile RNA expression, DNA methylation and chromatin accessibility in 1,828 cells at multiple stages of mouse development. MOFA+ provides a method for delineating coordinated variation between the transcriptome and the epigenome and for detecting at which stage(s) of development it occurs.

As input to the model we quantified DNA methylation and chromatin accessibility values over different sets of regulatory elements. Here we considered gene promoters and enhancer elements (distal H3K27ac sites). RNA expression was quantified over protein-coding genes. After data processing, separate views were defined for the RNA expression and for each combination of genomic context and epigenetic readout. Cells were grouped according to their developmental stage (E5.5, E6.5 and E7.5). For details on the data processing, see the following GitHub repository

The data set we use here is a simplified version of the original data set published in Nature. The full data set can be downloaded from this FTP.

2 Load libraries

Load dependencies. Make sure that MOFA is loaded last, so that its functions are not masked by same-named functions from other packages.
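A minimal sketch of the loading order, assuming the packages used are ggplot2 and MOFA2 (the package list is an assumption; adjust it to your own session). Packages attached later mask same-named functions from packages attached earlier, which is why MOFA2 goes at the end of the list.

```r
# Assumed package list for illustration; MOFA2 is deliberately last.
pkgs <- c("ggplot2", "MOFA2")

for (pkg in pkgs) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    # Attach in the listed order; later packages mask earlier ones
    suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  } else {
    message("Package not installed, skipping: ", pkg)
  }
}

# Names that are defined in more than one attached package or environment
head(conflicts(detail = FALSE))
```

If a collision remains, calling the function with an explicit namespace prefix (`package::function`) removes the ambiguity.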

Define cell type colors for the visualisations
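One way to do this is with a named character vector, sketched below. The hex codes are placeholders chosen for illustration, not the palette used for the original figures, and only the two lineage names that appear in the cell metadata preview are listed.

```r
# Placeholder colours (assumed values), one entry per lineage name
lineage_colors <- c(
  "Epiblast"     = "#1B9E77",
  "ExE Endoderm" = "#D95F02"
)

# Check that every lineage to be plotted has a colour assigned
lineages_present <- c("Epiblast", "ExE Endoderm")
missing_colors <- setdiff(lineages_present, names(lineage_colors))
stopifnot(length(missing_colors) == 0)
```

Naming the vector by lineage keeps the colour assignment stable even when a plot shows only a subset of the lineages.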

In this vignette we skip the data processing and model training and focus on the downstream characterisation of the model. For details on the data preparation and on setting up the MOFA object, we refer to the following GitHub repository

3 Load pre-computed model

MOFA models are saved in hdf5 format and can be loaded into R with the function load_model. In this case, however, we provide the trained model as an RData file, which already contains the cell metadata.

# MOFAmodel <- load_model("(...)/model.hdf5")

# load("/Users/ricard/data/mofa2_vignettes/gastrulation_scnmt_mofa.RData")

Explore the cell metadata:
* sample: cell identity.
* stage: developmental stage.
* lineage: cell type annotation (derived from mapping the cells to the 10x atlas).
* pass_rnaQC: did the cell pass QC for RNA expression?
* pass_metQC: did the cell pass QC for DNA methylation? NA if the cell was only profiled for RNA.
* pass_accQC: did the cell pass QC for chromatin accessibility? NA if the cell was only profiled for RNA.
* group: the grouping used for MOFA, corresponds to the stage.

##                    sample               plate pass_rnaQC pass_metQC pass_accQC
## 1 E4.5-5.5_new_Plate1_E09 E4.5-5.5_new_Plate1       TRUE      FALSE      FALSE
## 2 E4.5-5.5_new_Plate1_D09 E4.5-5.5_new_Plate1       TRUE         NA         NA
## 3 E4.5-5.5_new_Plate1_G09 E4.5-5.5_new_Plate1       TRUE       TRUE       TRUE
## 4 E4.5-5.5_new_Plate1_F09 E4.5-5.5_new_Plate1       TRUE         NA         NA
## 5 E4.5-5.5_new_Plate1_A09 E4.5-5.5_new_Plate1       TRUE      FALSE      FALSE
## 6 E4.5-5.5_new_Plate1_C09 E4.5-5.5_new_Plate1       TRUE       TRUE       TRUE
##   stage     stage_lineage group      lineage
## 1  E5.5     E5.5_Epiblast  E5.5     Epiblast
## 2  E5.5 E5.5_ExE Endoderm  E5.5 ExE Endoderm
## 3  E5.5     E5.5_Epiblast  E5.5     Epiblast
## 4  E5.5 E5.5_ExE Endoderm  E5.5 ExE Endoderm
## 5  E5.5     E5.5_Epiblast  E5.5     Epiblast
## 6  E5.5     E5.5_Epiblast  E5.5     Epiblast
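The QC columns can be summarised with base R. The sketch below rebuilds the six rows printed above as a small data frame (a subset of columns, for illustration only) and tabulates QC status per lineage. With the loaded model, the full table would presumably come from `samples_metadata(MOFAmodel)` instead.

```r
# The six preview rows shown above, rebuilt by hand for illustration
meta <- data.frame(
  sample     = paste0("E4.5-5.5_new_Plate1_",
                      c("E09", "D09", "G09", "F09", "A09", "C09")),
  pass_rnaQC = rep(TRUE, 6),
  pass_metQC = c(FALSE, NA, TRUE, NA, FALSE, TRUE),
  pass_accQC = c(FALSE, NA, TRUE, NA, FALSE, TRUE),
  stage      = rep("E5.5", 6),
  lineage    = c("Epiblast", "ExE Endoderm", "Epiblast",
                 "ExE Endoderm", "Epiblast", "Epiblast"),
  stringsAsFactors = FALSE
)

# Cells per lineage by methylation QC status; NA = profiled for RNA only
qc_by_lineage <- table(meta$lineage, meta$pass_metQC, useNA = "ifany")
print(qc_by_lineage)

# Number of cells in which all three readouts passed QC
n_complete <- sum(meta$pass_rnaQC & meta$pass_metQC & meta$pass_accQC,
                  na.rm = TRUE)
print(n_complete)
```

Keeping `useNA = "ifany"` makes the RNA-only cells visible in the table rather than silently dropping them.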

4 Overview of training data

The function plot_data_overview can be used to obtain an overview of the input data. It shows how many views (rows) and how many groups (columns) exist, what their dimensionalities are and how much missing information each one contains (grey bars).

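# One colour per view: accessibility views share a colour, as do methylation views
view.colors <- c(
  "RNA expression" = "#3CB54E",
  "Enhancer accessibility" = "#00BFC4",
  "Promoter accessibility" = "#00BFC4",
  "Enhancer methylation" = "#F37A71",
  "Promoter methylation" = "#F37A71"
)
# Reorder the colours to match the order of the views in the model
view.colors <- view.colors[views_names(MOFAmodel)]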

plot_data_overview(MOFAmodel, colors = view.colors)
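The quantities summarised in this plot can also be computed by hand. The sketch below uses simulated matrices, not the vignette data, and assumes a nested list layout `data[[view]][[group]]` in which each entry is a features x cells matrix and a cell missing from a view is an all-NA column. The helper `make_view` is hypothetical.

```r
set.seed(1)

# Hypothetical helper: simulate a features x cells matrix in which the
# first `n_missing_cells` cells were not profiled for this view
make_view <- function(n_features, n_cells, n_missing_cells) {
  m <- matrix(rnorm(n_features * n_cells), nrow = n_features)
  if (n_missing_cells > 0) m[, seq_len(n_missing_cells)] <- NA
  m
}

toy <- list(
  "RNA expression" = list(
    E5.5 = make_view(50, 10, 0),
    E6.5 = make_view(50, 20, 0)
  ),
  "Promoter methylation" = list(
    E5.5 = make_view(30, 10, 4),
    E6.5 = make_view(30, 20, 5)
  )
)

# One row per view/group combination: dimensions and fraction of missing cells
overview <- do.call(rbind, lapply(names(toy), function(v) {
  do.call(rbind, lapply(names(toy[[v]]), function(g) {
    m <- toy[[v]][[g]]
    data.frame(
      view = v, group = g,
      features = nrow(m), cells = ncol(m),
      frac_cells_missing = mean(colSums(!is.na(m)) == 0)
    )
  }))
}))
print(overview)
```

In this toy example the RNA view is complete, while 40% and 25% of cells lack methylation data at E5.5 and E6.5, which is the kind of pattern the grey bars make visible.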

As a sanity check, verify that the factors are largely uncorrelated. Strong correlation between factors suggests that the model has not converged or that too many factors were used.

cor <- plot_factor_cor(MOFAmodel)
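The same check can be made numerically. The sketch below runs on a simulated factor matrix (cells x factors) in which one factor is deliberately made redundant; the 0.5 cut-off is an arbitrary assumption. With a trained model, the factor values would presumably be taken from `get_factors(MOFAmodel)`, which returns one matrix per group.

```r
set.seed(42)
n_cells <- 200

# Simulated factor values: four independent factors
Z <- matrix(rnorm(n_cells * 4), ncol = 4,
            dimnames = list(NULL, paste0("Factor", 1:4)))

# Make Factor4 nearly redundant with Factor1 to mimic a poorly fitted model
Z[, "Factor4"] <- Z[, "Factor1"] + rnorm(n_cells, sd = 0.3)

# Absolute pairwise correlations between factors, ignoring the diagonal
r <- abs(stats::cor(Z))
diag(r) <- 0

# Flag each pair once (upper triangle) if it exceeds the assumed cut-off
threshold <- 0.5
flagged <- which(r > threshold & upper.tri(r), arr.ind = TRUE)

data.frame(
  factor_a = colnames(Z)[flagged[, "row"]],
  factor_b = colnames(Z)[flagged[, "col"]],
  abs_cor  = r[flagged]
)
```

Only the Factor1/Factor4 pair is flagged here; in a well-behaved model the resulting table would be empty.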

5 Plot variance explained per factor

The variance explained per group and view is probably the most important plot that MOFA+ generates. It summarises the (latent) signal from a complex heterogeneous data set in a single figure.
In a multi-group, multi-view setting, it is advisable to plot one factor at a time:

Factor 1

plot_variance_explained(MOFAmodel, x="group", y="view", factor=1, legend = TRUE)
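To make the plotted quantity concrete, the sketch below computes a coefficient of determination for a single factor in a single view and group, using simulated data. It assumes centred data and the form R² = 1 - SS_residual / SS_total; the exact computation inside MOFA+ may differ in detail, and `r2_factor` is a hypothetical helper, not a package function.

```r
set.seed(7)
n_cells <- 100
n_features <- 20

# One factor: values per cell (centred) and weights per feature
z <- rnorm(n_cells)
z <- z - mean(z)
w <- rnorm(n_features)

# Simulated view for one group: factor signal plus unit-variance noise
Y <- tcrossprod(z, w) + matrix(rnorm(n_cells * n_features), nrow = n_cells)
Y <- scale(Y, center = TRUE, scale = FALSE)

# Hypothetical helper: fraction of variance in Y reconstructed by z %*% t(w)
r2_factor <- function(Y, z, w) {
  residual <- Y - tcrossprod(z, w)
  1 - sum(residual^2, na.rm = TRUE) / sum(Y^2, na.rm = TRUE)
}

r2 <- r2_factor(Y, z, w)
print(r2)
```

Repeating this for every combination of view and group gives the grid of values that the plot displays for one factor.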