1 Data set overview

Traditionally, viruses were considered solely pathogens; however, growing evidence suggests a more dynamic relationship between the virome and host, essentially through direct interactions with the bacterial microbiome. Much as the virome, intestinal fungi have only recently been acknowledged as a small but potentially important part of the intestinal ecosystem. As these findings provide clues that specific cross-kingdom interactions potentially contribute to or exacerbate disease, there remains a large paucity on the composition, interactions and function of fungi and viruses. Hence, there is an increasing need for unsupervised integrative computational frameworks that can robustly and systematically identify underlying patterns of variation across these communities in health and disease.

To examine the extent of these transkingdom interactions during critical illness, we collected faecal samples from 33 patients admitted to the Intensive Care Unit (ICU). Of these patients, 24 were admitted with sepsis while nine patients had a non-infectious diagnosis (non-septic ICU). All patients were treated with broad-spectrum antibiotics. In addition. 13 healthy volunteers were evaluated as controls. 6 healthy subjects received oral broad-spectrum antibiotics for 7 days, whereas 7 volunteers did not receive antibiotics. Subjects were asked to collect faecal samples before antibiotic treatment and one day after the course of antibiotics.
We performed parallel sequencing of bacterial 16S ribosomal RNA, and the fungal ITS rDNA gene and sought to examine community compositions by characterizing fungal and bacterial sequences into exact amplicon-sequencing variants. In addition, we also obtained Virus composition using VIDISCA-NGS.

An overview of the data set is shown below. Link to the paper

2 Load libraries

library(data.table)
library(purrr)
library(ggplot2)
library(ggpubr)
library(MOFA2)

3 Load data

The data is already processed, filtered and normalised.

dt <- fread("ftp://ftp.ebi.ac.uk/pub/databases/mofa/microbiome/data.txt.gz")
metadata <- fread("ftp://ftp.ebi.ac.uk/pub/databases/mofa/microbiome/metadata.txt.gz")

The data consists of three views (no groups): quantification of bacterial, fungi and viral composition, respectively. They all have been normalised using a central log ratio transformation with the compositions::clr function. Let’s explore the data:

head(dt,n=3)

##     sample         feature      value     view
## 1:  TKI_F1 Acidaminococcus -0.7167634 Bacteria
## 2: TKI_F10 Acidaminococcus  0.8680834 Bacteria
## 3: TKI_F11 Acidaminococcus  1.3049681 Bacteria

# Views names
unique(dt$view)

## [1] "Bacteria" "Fungi"    "Viruses"

# Number of samples
length(unique(dt$sample))

## [1] 59

# Number of features per view
dt[,length(unique(feature)),by="view"]

##        view  V1
## 1: Bacteria 180
## 2:    Fungi  18
## 3:  Viruses  42

The bacteria and fungi data display real-valued distributions that are appropiately modelled using the gaussian likelihood. The virus data is very sparse, possibly zero-inflated. Better normalisation schemes that the central log ratio transformation that we applied may be appropriate:

ggdensity(dt, x="value", fill="gray70") +
  facet_wrap(~view, nrow=1, scales="free")

Let’s explore the sample metadata

colnames(metadata)

##  [1] "sample"          "Age"             "Sexs"            "Diagnosis"      
##  [5] "Category"        "Penicillins"     "Cephalosporins"  "Carbapenems"    
##  [9] "Macrolides"      "Aminoglycosides" "Quinolones"      "Co_trimoxazole" 
## [13] "Metronidazole"   "Vancomycin"      "Acetate"         "Propionate"     
## [17] "Butyrate"

The main column is “Category”, which classifies samples into “Healthy, no antibiotics”, “Healthy, antibiotics”, “Sepsis”, “Non septic ICU”.

table(metadata$Category)

## 
##    Healthy, antibiotics Healthy, no antibiotics          Non septic ICU 
##                       6                      20                       9 
##                  Sepsis 
##                      24

Then we also have information about the antibiotic treatment that they received:

antibiotics <- c(
  "Penicillins", "Cephalosporins", "Carbapenems", "Macrolides", "Aminoglycosides", 
  "Quinolones", "Co_trimoxazole", "Metronidazole", "Vancomycin"
)
head(metadata[,head(antibiotics,n=4),with=F], n=3)

##    Penicillins Cephalosporins Carbapenems Macrolides
## 1:       FALSE          FALSE       FALSE      FALSE
## 2:        TRUE          FALSE       FALSE      FALSE
## 3:       FALSE           TRUE       FALSE       TRUE

as well as the short chain fatty acid (SCFA) metabolite concentration in the feces:

metabolites <- c("Butyrate", "Acetate", "Propionate")
head(metadata[,metabolites,with=F], n=3)

##     Butyrate   Acetate Propionate
## 1: 0.2542837  6.442785   2.396135
## 2: 2.8831111 18.455667   3.592333
## 3: 2.4570000 10.499412   5.320059

4 Train MOFA model

4.1 Create MOFA object

mofa <- create_mofa(dt)
mofa

## Untrained MOFA model with the following characteristics: 
##  Number of views: 3 
##  Views names: Bacteria Fungi Viruses 
##  Number of features (per view): 180 18 42 
##  Number of groups: 1 
##  Groups names: single_group 
##  Number of samples (per group): 59

Visualise data structure

plot_data_overview(mofa)

4.2 Prepare MOFA object

We’ll use all default options, with K=10 factors. For details please see ?prepare_mofa:

model_opts <- get_default_model_options(mofa)
model_opts$num_factors <- 10

mofa <- prepare_mofa(mofa, model_options = model_opts)

4.3 Train the model

If you train the model from scratch you will get slightly different results from the ones we obtained when fitting the model, mainly because of some updates in the MOFA code. For reproducibility we provide below the pre-trained model that we used for the article:

# Don't run
# mofa <- run_mofa(mofa)

# Load pre-computed model
mofa <- readRDS(url("ftp://ftp.ebi.ac.uk/pub/databases/mofa/microbiome/model.rds"))

# Add sample metadata to the model
samples_metadata(mofa) <- metadata

5 Downstream analysis

5.1 Variance decomposition

MOFA identified six factors with a minimum explained variance of 5% (see Methods). All together, the latent representation explained 39% of the sample heterogeneity in bacteria, 38% for fungi and 19% for viral composition:

plot_variance_explained(mofa, plot_total = T)[[2]]

When plotting the variance explained per factor, we observe that Factor 1 and Factor 3 are very interesting because they capture coordinated variability across all three kingdoms.
Factor 2 identified sample heterogeneity that is exclusively driven by changes in fungal composition:

plot_variance_explained(mofa, max_r2=15)

We can also plot the cumulative variance explained per view:

r2 <- mofa@cache$variance_explained$r2_per_factor[[1]]

r2.dt <- r2 %>%
  as.data.table %>% .[,factor:=as.factor(1:mofa@dimensions$K)] %>%
  melt(id.vars=c("factor"), variable.name="view", value.name = "r2") %>%
  .[,cum_r2:=cumsum(r2), by="view"]

ggline(r2.dt, x="factor", y="cum_r2", color="view") +
  labs(x="Factor number", y="Cumulative variance explained (%)") +
  theme(
    legend.title = element_blank(), 
    legend.position = "top",
    axis.text = element_text(size=rel(0.8))
  )

5.2 Visualise the MOFA Factors

Define color code

category.colors <- c(
  "Healthy, no antibiotics" = "#66C2A5", 
  "Healthy, antibiotics" = "#8DA0CB",
  "Sepsis" = "#E78AC3",
  "Non septic ICU" = "#FC8D62"
)

MOFA Factor 1 and 3 are capable of completely partitioning transkingdom signatures pertaining to critical illness, antibiotic perturbation and health:

p <- plot_factors(mofa, 
  factors = c(1,3), 
  color_by = "Category", 
  dot_size = 4
) + scale_fill_manual(values=category.colors)

p + 
  # geom_density_2d(aes_string(color="color_by")) +
  stat_ellipse(aes(color=color_by), geom = "polygon", alpha=0.25) +
  scale_color_manual(values=category.colors)

5.3 Characterisation of Factor 1

Factor 1, the major source of variation, captures a transkingdom microbiome signature in response to antibiotics. Interestingly, this signature is shared by healthy and diseased patients.

5.3.1 Factor values

Beeswarm plot of Factor 1 values, grouped by Category

plot_factor(mofa, 
  factor = 1, 
  color_by = "Category", 
  dot_size = 4,
  dodge = TRUE,
  stroke = 0.4,
  add_violin = T,
  add_boxplot = T
) +
  scale_fill_manual(values=category.colors) +
  theme(
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank()
  )

5.3.2 Plot Weights

Define a helper function to plot weights, it will reduce the amount of code in the vignette

plot_weights_fn <- function(mofa, factor=1, view=1, nfeatures=10) {
  p1 <- plot_weights(mofa, 
    factors = factor, 
    view = view,
    nfeatures = nfeatures,
    text_size = 4
  )
  
  p2 <- plot_top_weights(mofa, 
    factors = factor, 
    view = view,
    nfeatures = nfeatures
  )
  
  p <- cowplot::plot_grid(plotlist=list(p1,p2), nrow=1)
  return(p)
}

Bacteria: bacterial taxa positively associated with this factor are facultative aerobic bacterial pathobionts that have been previously associated with critical illness, such as Staphylococcus, Enterococcus, Klebsiella, Escherichia/Shigella and Enterobacter spp. Bacterial taxa that were negatively associated with this factor consisted predominantly of members of the obligately anaerobic families Lachnospiraceae and Ruminococcaceae, which have been identified as markers of a healthy microbiota and are linked to colonization resistance against bacterial pathobionts.

plot_weights_fn(mofa, factor=1, view="Bacteria", nfeatures=8)

Fungi: Fungal taxa positively associated with this factor were characterized by yeasts capable of causing invasive disease, such as Candida, Aspergillus, and Debaryomyces, with a relative absence of the common gut constituents Filobasidium, Malassezia and Dipodascus

plot_weights_fn(mofa, factor=1, view="Fungi", nfeatures=10)

Viruses: the majority of viral contigs that were associated with Factor consist of bacteriophages that significantly correlated with the presence of the corresponding bacterial targets in the same factor. This is interesting, as it suggests that the expansion of aerobic bacterial species during critical illness and following antibiotics can therefore potentially facilitate the enrichment of their corresponding bacteriophages.

plot_weights_fn(mofa, factor=1, view="Viruses", nfeatures=9)

5.3.3 Plot feature values

Heatmaps of bacterial concentration, no denoising:

plot_data_heatmap(mofa, 
  factor = 1, 
  view = "Bacteria", 
  features = 20,
  denoise = FALSE,
  cluster_rows = T, cluster_cols = F,
  show_colnames = F, show_rownames = T,
  annotation_samples = "Category",  
  annotation_colors = list("Category"=category.colors), 
  annotation_legend = F,
  scale = "row"
)

Heatmaps of bacterial concentration, after denoising:

plot_data_heatmap(mofa, 
  factor = 1, 
  view = "Bacteria", 
  features = 20,
  denoise = TRUE,
  cluster_rows = T, cluster_cols = F,
  show_colnames = F, show_rownames = T,
  annotation_samples = "Category",  
  annotation_colors = list("Category"=category.colors), 
  annotation_legend = F,
  scale = "row"
)

Scatter plots of the bacterial concentration for the bacteria with the largest weight in Factor 1, plotted against the corresponding factor values

plot_data_scatter(mofa, 
  factor = 1, 
  view = "Bacteria", 
  features = 4,
  dot_size = 3,
  color_by = "Category",
  legend = F
)

5.4 Association with clinical covariates

5.4.1 Antibiotic treatment

Factor 1 is correlated to pretty much all antibiotics, which is consistent with the observation that it separates the “Healthy non-antibiotic” individuals from all the others.
Interestingly, Factor 4 is strongly correlated to Cephalosporin exposure.

correlate_factors_with_covariates(mofa, 
  covariates = antibiotics,
  plot = "r",  # use "log_pval" to plot log p-values 
)

5.4.2 Fecal metabolites

Factor 1 is strongly associated with differences in Short Chain Fatty Acids between individuals that did not receive antibiotics and the rest. This suggests that the microbiome changes induced by antibiotic treatment have profound effects in the metabolite concentrations generated by the microbiome.

correlate_factors_with_covariates(mofa, 
  covariates = metabolites,
  plot = "log_pval",
)

Plot factor values coloured by metabolite levels. Notice the gradient captured by Factor 1:

plot_factors(mofa, 
  factors = c(1,3), 
  color_by = "Butyrate", 
  shape_by = "Category", 
  dot_size = 3.5
) + scale_fill_gradient(low = "white", high = "#BF3EFF")

6 Conclusion

These findings further illustrate the complexity of transkingdom interactions within the intestinal environment, and show that modulation of the bacterial component of the microbiome has implications that extend beyond this kingdom alone. The short- and long-term impact of these disruptions will be an important focus of future investigations.

7 Contact

Please contact Bas (bashaak@live.nl) if you have questions regarding the experimental design and data interpretation.

8 sessionInfo

sessionInfo()

## R version 4.0.0 (2020-04-24)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] MOFA2_0.99.0      ggpubr_0.4.0      ggplot2_3.3.2     purrr_0.3.4      
## [5] data.table_1.12.8 BiocStyle_2.16.0 
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-148        matrixStats_0.56.0  RColorBrewer_1.1-2 
##  [4] tools_4.0.0         backports_1.1.8     R6_2.4.1           
##  [7] HDF5Array_1.16.1    vipor_0.4.5         BiocGenerics_0.34.0
## [10] mgcv_1.8-31         colorspace_1.4-1    withr_2.2.0        
## [13] tidyselect_1.1.0    mnormt_2.0.0        ggrastr_0.1.9      
## [16] curl_4.3            compiler_4.0.0      DelayedArray_0.14.0
## [19] labeling_0.3        bookdown_0.20       scales_1.1.1       
## [22] psych_1.9.12.31     stringr_1.4.0       digest_0.6.25      
## [25] foreign_0.8-80      rmarkdown_2.3       R.utils_2.9.2      
## [28] rio_0.5.16          pkgconfig_2.0.3     htmltools_0.5.0    
## [31] rlang_0.4.6         readxl_1.3.1        generics_0.0.2     
## [34] farver_2.0.3        jsonlite_1.7.0      dplyr_1.0.0        
## [37] zip_2.0.4           car_3.0-8           R.oo_1.23.0        
## [40] magrittr_1.5        Matrix_1.2-18       Rcpp_1.0.4.6       
## [43] ggbeeswarm_0.6.0    munsell_0.5.0       S4Vectors_0.26.1   
## [46] Rhdf5lib_1.10.0     abind_1.4-5         reticulate_1.16    
## [49] lifecycle_0.2.0     R.methodsS3_1.8.0   stringi_1.4.6      
## [52] yaml_2.2.1          carData_3.0-4       MASS_7.3-51.6      
## [55] rhdf5_2.32.1        plyr_1.8.6          grid_4.0.0         
## [58] parallel_4.0.0      ggrepel_0.8.2       forcats_0.5.0      
## [61] crayon_1.3.4        lattice_0.20-41     haven_2.3.1        
## [64] cowplot_1.0.0       splines_4.0.0       hms_0.5.3          
## [67] magick_2.4.0        tmvnsim_1.0-2       knitr_1.29         
## [70] pillar_1.4.4        ggsignif_0.6.0      reshape2_1.4.4     
## [73] stats4_4.0.0        glue_1.4.1          evaluate_0.14      
## [76] BiocManager_1.30.10 vctrs_0.3.1         cellranger_1.1.0   
## [79] gtable_0.3.0        tidyr_1.1.0         xfun_0.15          
## [82] openxlsx_4.1.5      broom_0.5.6         rstatix_0.6.0      
## [85] tibble_3.0.1        pheatmap_1.0.12     beeswarm_0.2.3     
## [88] IRanges_2.22.2      corrplot_0.84       ellipsis_0.3.1

MOFA+: analysis of multi-modal microbiome data

2020-09-03

Contents