library(MOFA2)
library(magrittr)
library(tidyverse)
library(cowplot)
library(reshape2)

1 Load data

The data for this vignette can be downloaded from here and placed in a data/ directory. You should then have two files:

  • data/evodevo.RData containing the expression data used as input
  • data/evodevo_model.hdf5 containing the pre-trained MEFISTO model

We first load the data frame that contains the gene expression data for all organs (views) and species (groups), together with the (unmatched) time annotation for each sample.

load("data/evodevo.RData")
head(evodevo)
## # A tibble: 6 x 6
##   group view  sample      feature               value  time
##   <chr> <chr> <chr>       <chr>                 <dbl> <dbl>
## 1 Human Brain 10wpc_Human ENSG00000000457_Brain  8.57     7
## 2 Human Brain 10wpc_Human ENSG00000001084_Brain  8.88     7
## 3 Human Brain 10wpc_Human ENSG00000001167_Brain 11.3      7
## 4 Human Brain 10wpc_Human ENSG00000001461_Brain  7.37     7
## 5 Human Brain 10wpc_Human ENSG00000001561_Brain  7.31     7
## 6 Human Brain 10wpc_Human ENSG00000001617_Brain  9.96     7
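
Before building the model, it can help to check that the long-format table has the structure shown above. The following is a minimal sketch in base R on a toy stand-in for `evodevo`, not part of the original workflow: the object `toy`, the sample name `s1_Mouse`, and the feature names `geneA_*` are hypothetical, and the values are only illustrative. It checks that the columns used in this vignette are present, that each sample carries a single time value, and how many distinct samples each group contains.

```r
# Toy stand-in for the long-format input, with the same column names as `evodevo`.
# Sample "s1_Mouse" and the feature names are hypothetical.
toy <- data.frame(
  group   = c("Human", "Human", "Human", "Mouse", "Mouse"),
  view    = c("Brain", "Brain", "Heart", "Brain", "Heart"),
  sample  = c("10wpc_Human", "11wpc_Human", "10wpc_Human", "s1_Mouse", "s1_Mouse"),
  feature = c("geneA_Brain", "geneA_Brain", "geneA_Heart", "geneA_Brain", "geneA_Heart"),
  value   = c(8.57, 8.88, 7.31, 9.96, 7.37),
  time    = c(7, 8, 7, 1, 1),
  stringsAsFactors = FALSE
)

# Columns used in this vignette; report any that are absent.
expected_cols <- c("group", "view", "sample", "feature", "value", "time")
missing_cols <- setdiff(expected_cols, colnames(toy))

# Each sample should carry exactly one time value across all its rows.
times_per_sample <- tapply(toy$time, toy$sample, function(x) length(unique(x)))

# Number of distinct samples per group (species).
samples_per_group <- tapply(toy$sample, toy$group, function(x) length(unique(x)))

print(missing_cols)
print(times_per_sample)
print(samples_per_group)
```

Replacing `toy` with `evodevo` should run the same checks on the real data, assuming it has the columns printed by `head(evodevo)`.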

2 Prepare and train MOFA

Create the MOFA object

First, we need to create a MOFA object from this data. This step works the same way as in MOFA without time information: calling create_mofa returns an untrained MOFA object.

# create the MOFA object (the time covariate is added in the next step)
MOFAobject_untrained <- create_mofa(data = evodevo)
MOFAobject_untrained
## Untrained MOFA model with the following characteristics: 
##  Number of views: 5 
##  Views names: Brain Cerebellum Heart Liver Testis 
##  Number of features (per view): 7696 7696 7696 7696 7696 
##  Number of groups: 5 
##  Groups names: Human Mouse Opossum Rabbit Rat 
##  Number of samples (per group): 23 14 15 15 16 
## 

Next, we add the time information for each sample using set_covariates. Because the time column was already part of the data frame passed to create_mofa, it is stored in the sample metadata of the object, and we only need to name this column as the covariate. Alternatively, we could supply a new matrix or data frame that provides the sample names and covariates. See ?set_covariates.

head(samples_metadata(MOFAobject_untrained))
##                  sample group time
## 10wpc_Human 10wpc_Human Human    7
## 11wpc_Human 11wpc_Human Human    8
## 12wpc_Human 12wpc_Human Human    9
## 13wpc_Human 13wpc_Human Human   10
## 16wpc_Human 16wpc_Human Human   11
## 18wpc_Human 18wpc_Human Human   12
MOFAobject_untrained <- set_covariates(MOFAobject_untrained, covariates = "time")
MOFAobject_untrained
## Untrained MEFISTO model with the following characteristics: 
##  Number of views: 5 
##  Views names: Brain Cerebellum Heart Liver Testis 
##  Number of features (per view): 7696 7696 7696 7696 7696 
##  Number of groups: 5 
##  Groups names: Human Mouse Opossum Rabbit Rat 
##  Number of samples (per group): 23 14 15 15 16 
##  Number of covariates per sample: 1 
## 
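
The alternative mentioned above, supplying the covariates as a separate table instead of naming a metadata column, can be sketched as follows. This is a hedged illustration in base R: the layouts (a data frame with columns `sample`, `covariate`, `value`, or a matrix with covariates in rows and samples in columns) are assumptions about what `set_covariates` accepts and should be checked against ?set_covariates. The objects `meta`, `covariates_df`, and `covariates_mat` are hypothetical names, and the values mirror the first rows of the sample metadata shown above.

```r
# Toy sample metadata mirroring the first rows printed by samples_metadata().
meta <- data.frame(
  sample = c("10wpc_Human", "11wpc_Human", "12wpc_Human"),
  group  = "Human",
  time   = c(7, 8, 9),
  stringsAsFactors = FALSE
)

# Assumed long layout: one row per sample and covariate.
covariates_df <- data.frame(
  sample    = meta$sample,
  covariate = "time",
  value     = meta$time,
  stringsAsFactors = FALSE
)

# Assumed matrix layout: covariates in rows, samples in columns.
covariates_mat <- matrix(
  meta$time,
  nrow = 1,
  dimnames = list("time", meta$sample)
)

print(covariates_df)
print(covariates_mat)

# Not run here (requires MOFA2 and the untrained object from above):
# MOFAobject_untrained <- set_covariates(MOFAobject_untrained,
#                                        covariates = covariates_df)
```

Naming the existing `time` column, as done above, avoids having to keep a separate table in sync with the sample names, so a separate table is mainly useful when the covariate is not already part of the input data.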

Show data and time covariate

Before moving on to training, let's first take a look at the data in the untrained object. For each species (group), the plot shows which organs (views) are available for which samples. The panels on the right show the time value of each sample.

gg_input <- plot_data_overview(MOFAobject_untrained,
                               show_covariate = TRUE,
                               show_dimensions = TRUE) 
gg_input