In the MOFA2 R package we provide a wide range of downstream analysis to visualise and interpret the model output. Here we provide a brief description of the main functionalities. This vignette is made of simulated data and we do not highlight biologically relevant results.
library(ggplot2)
library(MOFA2)
filepath <- system.file("extdata", "model.hdf5", package = "MOFA2")
print(filepath)
## [1] "/Library/Frameworks/R.framework/Versions/4.0/Resources/library/MOFA2/extdata/model.hdf5"
model <- load_model(filepath)
## 1 factors were found to explain little or no variance and they were removed for downstream analysis. You can disable this option by setting load_model(..., remove_inactive_factors = F)
The function plot_data_overview
can be used to obtain an overview of the input data.
It shows how many views (rows) and how many groups (columns) exist, what are their corresponding dimensionalities and how many missing information they have (grey bars).
plot_data_overview(model)
The metadata is stored as a data.frame object in model@samples_metadata
, and it requires at least the column sample
. The column group
is required only if you are doing multi-group inference.
The number of rows must match the total number of samples in the model (sum(model@dimensions$N)
).
Let’s add some artifical metadata…
Nsamples = sum(model@dimensions$N)
sample_metadata <- data.frame(
sample = samples_names(model)[[1]],
condition = sample(c("A","B"), size = Nsamples, replace = T),
age = sample(1:100, size = Nsamples, replace = T)
)
samples_metadata(model) <- sample_metadata
head(model@samples_metadata, n=3)
## sample condition age group
## 1 sample_0_group_0 A 49 single_group
## 2 sample_1_group_0 B 26 single_group
## 3 sample_2_group_0 B 52 single_group
The first step in the MOFA analysis is to quantify the amount of variance explained (\(R^2\)) by each factor in each data modality.
The variance explained estimates are stored in the hdf5 file and loaded in model@cache[["variance_explained"]]
:
# Total variance explained per view and group
head(model@cache$variance_explained$r2_total[[1]]) # group 1
## view_0 view_1
## 75.61104 76.02184
# Variance explained for every factor in per view and group
head(model@cache$variance_explained$r2_per_factor[[1]]) # group 1
## view_0 view_1
## Factor1 17.58253404 20.61236579
## Factor2 17.35265544 16.29387175
## Factor3 16.11976392 15.28539277
## Factor4 12.91324749 12.89727555
## Factor5 11.56900625 10.74459381
## Factor6 0.01365075 0.03250978
Variance explained estimates can be plotted using plot_variance_explained(model, ...)
. Options:
plot_variance_explained(model, x="view", y="factor")
plot_variance_explained(model, x="group", y="factor", plot_total = T)[[2]]
Each MOFA factor captures a different dimension of heterogeneity in the data. Mathematically, each factor ordinates cells along a one-dimensional axis centered at zero. Samples with different signs manifest opposite phenotypes along the inferred axis of variation, with higher absolute value indicating a stronger effect. Note that the interpretation of factors is analogous to the interpretation of the principal components in PCA.
Factors can be plotted using plot_factor
(for beeswarm plots of individual factors) or plot_factors
(for scatter plots of factor combinations)
plot_factor(model,
factor = 1:3,
color_by = "age",
shape_by = "condition"
)
Adding more options
p <- plot_factor(model,
factors = c(1,2,3),
color_by = "condition",
dot_size = 3, # change dot size
dodge = T, # dodge points with different colors
legend = F, # remove legend
add_violin = T, # add violin plots,
violin_alpha = 0.25 # transparency of violin plots
)
# The output of plot_factor is a ggplot2 object that we can edit
p <- p +
scale_color_manual(values=c("A"="black", "B"="red")) +
scale_fill_manual(values=c("A"="black", "B"="red"))
print(p)
Scatter plots
plot_factors(model,
factors = 1:3,
color_by = "condition"
)
The weights provide a score for how strong each feature relates to each factor. Features with no association with the factor have values close to zero, while features with strong association with the factor have large absolute values. The sign of the weight indicates the direction of the effect: a positive weight indicates that the feature has higher levels in the cells with positive factor values, and vice versa.
Weights can be plotted using plot_weights
(beeswarm plots) or plot_top_weights
(scatter plots)
plot_weights(model,
view = "view_0",
factor = 1,
nfeatures = 10, # Number of features to highlight
scale = T, # Scale weights from -1 to 1
abs = F # Take the absolute value?
)
plot_top_weights(model,
view = "view_0",
factor = 1,
nfeatures = 10
)
Instead of looking at weights, it is useful to observe the coordinated heterogeneity that MOFA captures in the original data. This can be done using the plot_data_heatmap
and plot_data_scatter
function.
Heatmap of observations. Top features are selected by its weight in the selected factor. By default, samples are ordered according to their corresponding factor value.
plot_data_heatmap(model,
view = "view_1", # view of interest
factor = 1, # factor of interest
features = 20, # number of features to plot (they are selected by weight)
# extra arguments that are passed to the `pheatmap` function
cluster_rows = TRUE, cluster_cols = FALSE,
show_rownames = TRUE, show_colnames = FALSE
)
Scatter plots of observations vs factor values. It is useful to add a linear regression estimate to visualise if the relationship between (top) features and factor values is linear.
plot_data_scatter(model,
view = "view_1", # view of interest
factor = 1, # factor of interest
features = 5, # number of features to plot (they are selected by weight)
add_lm = TRUE, # add linear regression
color_by = "condition"
)
The MOFA factors are linear (as in Principal Component analysis). Nevertheless, the MOFA factors can be used as input to other methods that learn compact nonlinear manifolds (t-SNE or UMAP).
Run UMAP and t-SNE
set.seed(42)
# model <- run_umap(model)
model <- run_tsne(model)
Plot non-linear dimensionality reduction
plot_dimred(model,
method = "TSNE", # method can be either "TSNE" or "UMAP"
color_by = "condition"
)
The user can rename the dimensions of the model
views_names(model) <- c("Transcriptomics", "Proteomics")
factors_names(model) <- paste("Factor", 1:model@dimensions[["K"]], sep=" ")
views_names(model)
## [1] "Transcriptomics" "Proteomics"
The user can extract the feature weights, the data and the factors to generate their own plots.
Extract factors
# "factors" is a list of matrices, one matrix per group with dimensions (nsamples, nfactors)
factors <- get_factors(model, factors = "all")
lapply(factors,dim)
## $single_group
## [1] 200 14
Extract weights
# "weights" is a list of matrices, one matrix per view with dimensions (nfeatures, nfactors)
weights <- get_weights(model, views = "all", factors = "all")
lapply(weights,dim)
## $Transcriptomics
## [1] 1000 14
##
## $Proteomics
## [1] 1000 14
Extract data
# "data" is a nested list of matrices, one matrix per view and group with dimensions (nfeatures, nsamples)
data <- get_data(model)
lapply(data, function(x) lapply(x, dim))[[1]]
## $single_group
## [1] 1000 200
For convenience, the user can extract the data in long data.frame format:
factors <- get_factors(model, as.data.frame = T)
head(factors, n=3)
## sample factor value group
## 1 sample_0_group_0 Factor 1 -1.67059230 single_group
## 2 sample_1_group_0 Factor 1 -1.34170336 single_group
## 3 sample_2_group_0 Factor 1 0.06668732 single_group
weights <- get_weights(model, as.data.frame = T)
head(weights, n=3)
## feature factor value view
## 1 feature_0_view_0 Factor 1 0.418919906 Transcriptomics
## 2 feature_1_view_0 Factor 1 -0.002412198 Transcriptomics
## 3 feature_2_view_0 Factor 1 0.001715417 Transcriptomics
data <- get_data(model, as.data.frame = T)
head(data, n=3)
## view group feature sample value
## 1 Transcriptomics single_group feature_0_view_0 sample_0_group_0 -2.05
## 2 Transcriptomics single_group feature_1_view_0 sample_0_group_0 0.83
## 3 Transcriptomics single_group feature_2_view_0 sample_0_group_0 -2.71
sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] MOFA2_0.99.0 ggplot2_3.3.2 BiocStyle_2.16.0
##
## loaded via a namespace (and not attached):
## [1] tidyr_1.1.0 splines_4.0.0 jsonlite_1.7.0
## [4] carData_3.0-4 BiocManager_1.30.10 stats4_4.0.0
## [7] cellranger_1.1.0 vipor_0.4.5 yaml_2.2.1
## [10] ggrepel_0.8.2 corrplot_0.84 pillar_1.4.4
## [13] backports_1.1.8 lattice_0.20-41 glue_1.4.1
## [16] reticulate_1.16 digest_0.6.25 RColorBrewer_1.1-2
## [19] ggsignif_0.6.0 colorspace_1.4-1 cowplot_1.0.0
## [22] htmltools_0.5.0 Matrix_1.2-18 plyr_1.8.6
## [25] pkgconfig_2.0.3 pheatmap_1.0.12 broom_0.5.6
## [28] haven_2.3.1 magick_2.4.0 bookdown_0.20
## [31] purrr_0.3.4 scales_1.1.1 HDF5Array_1.16.1
## [34] Rtsne_0.15 openxlsx_4.1.5 ggrastr_0.1.9
## [37] rio_0.5.16 tibble_3.0.1 mgcv_1.8-31
## [40] generics_0.0.2 farver_2.0.3 IRanges_2.22.2
## [43] car_3.0-8 ellipsis_0.3.1 ggpubr_0.4.0
## [46] withr_2.2.0 BiocGenerics_0.34.0 readxl_1.3.1
## [49] magrittr_1.5 crayon_1.3.4 evaluate_0.14
## [52] GGally_2.0.0 nlme_3.1-148 rstatix_0.6.0
## [55] forcats_0.5.0 foreign_0.8-80 beeswarm_0.2.3
## [58] tools_4.0.0 data.table_1.12.8 hms_0.5.3
## [61] lifecycle_0.2.0 matrixStats_0.56.0 stringr_1.4.0
## [64] Rhdf5lib_1.10.0 S4Vectors_0.26.1 munsell_0.5.0
## [67] zip_2.0.4 DelayedArray_0.14.0 compiler_4.0.0
## [70] rlang_0.4.6 rhdf5_2.32.1 grid_4.0.0
## [73] labeling_0.3 rmarkdown_2.3 gtable_0.3.0
## [76] abind_1.4-5 reshape_0.8.8 curl_4.3
## [79] reshape2_1.4.4 R6_2.4.1 knitr_1.29
## [82] dplyr_1.0.0 stringi_1.4.6 ggbeeswarm_0.6.0
## [85] parallel_4.0.0 Rcpp_1.0.4.6 vctrs_0.3.1
## [88] tidyselect_1.1.0 xfun_0.15