Traditionally, viruses were considered solely pathogens; however, growing evidence suggests a more dynamic relationship between the virome and host, essentially through direct interactions with the bacterial microbiome. Much as the virome, intestinal fungi have only recently been acknowledged as a small but potentially important part of the intestinal ecosystem. As these findings provide clues that specific cross-kingdom interactions potentially contribute to or exacerbate disease, there remains a large paucity on the composition, interactions and function of fungi and viruses. Hence, there is an increasing need for unsupervised integrative computational frameworks that can robustly and systematically identify underlying patterns of variation across these communities in health and disease.
To examine the extent of these transkingdom interactions during critical illness, we collected faecal samples from 33 patients admitted to the Intensive Care Unit (ICU). Of these patients, 24 were admitted with sepsis while nine patients had a non-infectious diagnosis (non-septic ICU). All patients were treated with broad-spectrum antibiotics. In addition. 13 healthy volunteers were evaluated as controls. 6 healthy subjects received oral broad-spectrum antibiotics for 7 days, whereas 7 volunteers did not receive antibiotics. Subjects were asked to collect faecal samples before antibiotic treatment and one day after the course of antibiotics.
We performed parallel sequencing of bacterial 16S ribosomal RNA, and the fungal ITS rDNA gene and sought to examine community compositions by characterizing fungal and bacterial sequences into exact amplicon-sequencing variants. In addition, we also obtained Virus composition using VIDISCA-NGS.
An overview of the data set is shown below. Link to the paper
library(data.table)
library(purrr)
library(ggplot2)
library(ggpubr)
library(MOFA2)
The data is already processed, filtered and normalised.
dt <- fread("ftp://ftp.ebi.ac.uk/pub/databases/mofa/microbiome/data.txt.gz")
metadata <- fread("ftp://ftp.ebi.ac.uk/pub/databases/mofa/microbiome/metadata.txt.gz")
The data consists of three views (no groups): quantification of bacterial, fungi and viral composition, respectively. They all have been normalised using a central log ratio transformation with the compositions::clr
function. Let’s explore the data:
head(dt,n=3)
## sample feature value view
## 1: TKI_F1 Acidaminococcus -0.7167634 Bacteria
## 2: TKI_F10 Acidaminococcus 0.8680834 Bacteria
## 3: TKI_F11 Acidaminococcus 1.3049681 Bacteria
# Views names
unique(dt$view)
## [1] "Bacteria" "Fungi" "Viruses"
# Number of samples
length(unique(dt$sample))
## [1] 59
# Number of features per view
dt[,length(unique(feature)),by="view"]
## view V1
## 1: Bacteria 180
## 2: Fungi 18
## 3: Viruses 42
The bacteria and fungi data display real-valued distributions that are appropiately modelled using the gaussian likelihood. The virus data is very sparse, possibly zero-inflated. Better normalisation schemes that the central log ratio transformation that we applied may be appropriate:
ggdensity(dt, x="value", fill="gray70") +
facet_wrap(~view, nrow=1, scales="free")
Let’s explore the sample metadata
colnames(metadata)
## [1] "sample" "Age" "Sexs" "Diagnosis"
## [5] "Category" "Penicillins" "Cephalosporins" "Carbapenems"
## [9] "Macrolides" "Aminoglycosides" "Quinolones" "Co_trimoxazole"
## [13] "Metronidazole" "Vancomycin" "Acetate" "Propionate"
## [17] "Butyrate"
The main column is “Category”, which classifies samples into “Healthy, no antibiotics”, “Healthy, antibiotics”, “Sepsis”, “Non septic ICU”.
table(metadata$Category)
##
## Healthy, antibiotics Healthy, no antibiotics Non septic ICU
## 6 20 9
## Sepsis
## 24
Then we also have information about the antibiotic treatment that they received:
antibiotics <- c(
"Penicillins", "Cephalosporins", "Carbapenems", "Macrolides", "Aminoglycosides",
"Quinolones", "Co_trimoxazole", "Metronidazole", "Vancomycin"
)
head(metadata[,head(antibiotics,n=4),with=F], n=3)
## Penicillins Cephalosporins Carbapenems Macrolides
## 1: FALSE FALSE FALSE FALSE
## 2: TRUE FALSE FALSE FALSE
## 3: FALSE TRUE FALSE TRUE
as well as the short chain fatty acid (SCFA) metabolite concentration in the feces:
metabolites <- c("Butyrate", "Acetate", "Propionate")
head(metadata[,metabolites,with=F], n=3)
## Butyrate Acetate Propionate
## 1: 0.2542837 6.442785 2.396135
## 2: 2.8831111 18.455667 3.592333
## 3: 2.4570000 10.499412 5.320059
mofa <- create_mofa(dt)
mofa
## Untrained MOFA model with the following characteristics:
## Number of views: 3
## Views names: Bacteria Fungi Viruses
## Number of features (per view): 180 18 42
## Number of groups: 1
## Groups names: single_group
## Number of samples (per group): 59
Visualise data structure
plot_data_overview(mofa)
We’ll use all default options, with K=10 factors. For details please see ?prepare_mofa
:
model_opts <- get_default_model_options(mofa)
model_opts$num_factors <- 10
mofa <- prepare_mofa(mofa, model_options = model_opts)
If you train the model from scratch you will get slightly different results from the ones we obtained when fitting the model, mainly because of some updates in the MOFA code. For reproducibility we provide below the pre-trained model that we used for the article:
# Don't run
# mofa <- run_mofa(mofa)
# Load pre-computed model
mofa <- readRDS(url("ftp://ftp.ebi.ac.uk/pub/databases/mofa/microbiome/model.rds"))
# Add sample metadata to the model
samples_metadata(mofa) <- metadata
MOFA identified six factors with a minimum explained variance of 5% (see Methods). All together, the latent representation explained 39% of the sample heterogeneity in bacteria, 38% for fungi and 19% for viral composition:
plot_variance_explained(mofa, plot_total = T)[[2]]
When plotting the variance explained per factor, we observe that Factor 1 and Factor 3 are very interesting because they capture coordinated variability across all three kingdoms.
Factor 2 identified sample heterogeneity that is exclusively driven by changes in fungal composition:
plot_variance_explained(mofa, max_r2=15)
We can also plot the cumulative variance explained per view:
r2 <- mofa@cache$variance_explained$r2_per_factor[[1]]
r2.dt <- r2 %>%
as.data.table %>% .[,factor:=as.factor(1:mofa@dimensions$K)] %>%
melt(id.vars=c("factor"), variable.name="view", value.name = "r2") %>%
.[,cum_r2:=cumsum(r2), by="view"]
ggline(r2.dt, x="factor", y="cum_r2", color="view") +
labs(x="Factor number", y="Cumulative variance explained (%)") +
theme(
legend.title = element_blank(),
legend.position = "top",
axis.text = element_text(size=rel(0.8))
)
Define color code
category.colors <- c(
"Healthy, no antibiotics" = "#66C2A5",
"Healthy, antibiotics" = "#8DA0CB",
"Sepsis" = "#E78AC3",
"Non septic ICU" = "#FC8D62"
)
MOFA Factor 1 and 3 are capable of completely partitioning transkingdom signatures pertaining to critical illness, antibiotic perturbation and health:
p <- plot_factors(mofa,
factors = c(1,3),
color_by = "Category",
dot_size = 4
) + scale_fill_manual(values=category.colors)
p +
# geom_density_2d(aes_string(color="color_by")) +
stat_ellipse(aes(color=color_by), geom = "polygon", alpha=0.25) +
scale_color_manual(values=category.colors)
Factor 1, the major source of variation, captures a transkingdom microbiome signature in response to antibiotics. Interestingly, this signature is shared by healthy and diseased patients.
Beeswarm plot of Factor 1 values, grouped by Category
plot_factor(mofa,
factor = 1,
color_by = "Category",
dot_size = 4,
dodge = TRUE,
stroke = 0.4,
add_violin = T,
add_boxplot = T
) +
scale_fill_manual(values=category.colors) +
theme(
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
Define a helper function to plot weights, it will reduce the amount of code in the vignette
plot_weights_fn <- function(mofa, factor=1, view=1, nfeatures=10) {
p1 <- plot_weights(mofa,
factors = factor,
view = view,
nfeatures = nfeatures,
text_size = 4
)
p2 <- plot_top_weights(mofa,
factors = factor,
view = view,
nfeatures = nfeatures
)
p <- cowplot::plot_grid(plotlist=list(p1,p2), nrow=1)
return(p)
}
Bacteria: bacterial taxa positively associated with this factor are facultative aerobic bacterial pathobionts that have been previously associated with critical illness, such as Staphylococcus, Enterococcus, Klebsiella, Escherichia/Shigella and Enterobacter spp. Bacterial taxa that were negatively associated with this factor consisted predominantly of members of the obligately anaerobic families Lachnospiraceae and Ruminococcaceae, which have been identified as markers of a healthy microbiota and are linked to colonization resistance against bacterial pathobionts.
plot_weights_fn(mofa, factor=1, view="Bacteria", nfeatures=8)
Fungi: Fungal taxa positively associated with this factor were characterized by yeasts capable of causing invasive disease, such as Candida, Aspergillus, and Debaryomyces, with a relative absence of the common gut constituents Filobasidium, Malassezia and Dipodascus
plot_weights_fn(mofa, factor=1, view="Fungi", nfeatures=10)
Viruses: the majority of viral contigs that were associated with Factor consist of bacteriophages that significantly correlated with the presence of the corresponding bacterial targets in the same factor. This is interesting, as it suggests that the expansion of aerobic bacterial species during critical illness and following antibiotics can therefore potentially facilitate the enrichment of their corresponding bacteriophages.
plot_weights_fn(mofa, factor=1, view="Viruses", nfeatures=9)
Heatmaps of bacterial concentration, no denoising:
plot_data_heatmap(mofa,
factor = 1,
view = "Bacteria",
features = 20,
denoise = FALSE,
cluster_rows = T, cluster_cols = F,
show_colnames = F, show_rownames = T,
annotation_samples = "Category",
annotation_colors = list("Category"=category.colors),
annotation_legend = F,
scale = "row"
)
Heatmaps of bacterial concentration, after denoising:
plot_data_heatmap(mofa,
factor = 1,
view = "Bacteria",
features = 20,
denoise = TRUE,
cluster_rows = T, cluster_cols = F,
show_colnames = F, show_rownames = T,
annotation_samples = "Category",
annotation_colors = list("Category"=category.colors),
annotation_legend = F,
scale = "row"
)
Scatter plots of the bacterial concentration for the bacteria with the largest weight in Factor 1, plotted against the corresponding factor values
plot_data_scatter(mofa,
factor = 1,
view = "Bacteria",
features = 4,
dot_size = 3,
color_by = "Category",
legend = F
)
Factor 1 is correlated to pretty much all antibiotics, which is consistent with the observation that it separates the “Healthy non-antibiotic” individuals from all the others.
Interestingly, Factor 4 is strongly correlated to Cephalosporin exposure.
correlate_factors_with_covariates(mofa,
covariates = antibiotics,
plot = "r", # use "log_pval" to plot log p-values
)
Factor 1 is strongly associated with differences in Short Chain Fatty Acids between individuals that did not receive antibiotics and the rest. This suggests that the microbiome changes induced by antibiotic treatment have profound effects in the metabolite concentrations generated by the microbiome.
correlate_factors_with_covariates(mofa,
covariates = metabolites,
plot = "log_pval",
)
Plot factor values coloured by metabolite levels. Notice the gradient captured by Factor 1:
plot_factors(mofa,
factors = c(1,3),
color_by = "Butyrate",
shape_by = "Category",
dot_size = 3.5
) + scale_fill_gradient(low = "white", high = "#BF3EFF")
These findings further illustrate the complexity of transkingdom interactions within the intestinal environment, and show that modulation of the bacterial component of the microbiome has implications that extend beyond this kingdom alone. The short- and long-term impact of these disruptions will be an important focus of future investigations.
Please contact Bas (bashaak@live.nl) if you have questions regarding the experimental design and data interpretation.
sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] MOFA2_0.99.0 ggpubr_0.4.0 ggplot2_3.3.2 purrr_0.3.4
## [5] data.table_1.12.8 BiocStyle_2.16.0
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-148 matrixStats_0.56.0 RColorBrewer_1.1-2
## [4] tools_4.0.0 backports_1.1.8 R6_2.4.1
## [7] HDF5Array_1.16.1 vipor_0.4.5 BiocGenerics_0.34.0
## [10] mgcv_1.8-31 colorspace_1.4-1 withr_2.2.0
## [13] tidyselect_1.1.0 mnormt_2.0.0 ggrastr_0.1.9
## [16] curl_4.3 compiler_4.0.0 DelayedArray_0.14.0
## [19] labeling_0.3 bookdown_0.20 scales_1.1.1
## [22] psych_1.9.12.31 stringr_1.4.0 digest_0.6.25
## [25] foreign_0.8-80 rmarkdown_2.3 R.utils_2.9.2
## [28] rio_0.5.16 pkgconfig_2.0.3 htmltools_0.5.0
## [31] rlang_0.4.6 readxl_1.3.1 generics_0.0.2
## [34] farver_2.0.3 jsonlite_1.7.0 dplyr_1.0.0
## [37] zip_2.0.4 car_3.0-8 R.oo_1.23.0
## [40] magrittr_1.5 Matrix_1.2-18 Rcpp_1.0.4.6
## [43] ggbeeswarm_0.6.0 munsell_0.5.0 S4Vectors_0.26.1
## [46] Rhdf5lib_1.10.0 abind_1.4-5 reticulate_1.16
## [49] lifecycle_0.2.0 R.methodsS3_1.8.0 stringi_1.4.6
## [52] yaml_2.2.1 carData_3.0-4 MASS_7.3-51.6
## [55] rhdf5_2.32.1 plyr_1.8.6 grid_4.0.0
## [58] parallel_4.0.0 ggrepel_0.8.2 forcats_0.5.0
## [61] crayon_1.3.4 lattice_0.20-41 haven_2.3.1
## [64] cowplot_1.0.0 splines_4.0.0 hms_0.5.3
## [67] magick_2.4.0 tmvnsim_1.0-2 knitr_1.29
## [70] pillar_1.4.4 ggsignif_0.6.0 reshape2_1.4.4
## [73] stats4_4.0.0 glue_1.4.1 evaluate_0.14
## [76] BiocManager_1.30.10 vctrs_0.3.1 cellranger_1.1.0
## [79] gtable_0.3.0 tidyr_1.1.0 xfun_0.15
## [82] openxlsx_4.1.5 broom_0.5.6 rstatix_0.6.0
## [85] tibble_3.0.1 pheatmap_1.0.12 beeswarm_0.2.3
## [88] IRanges_2.22.2 corrplot_0.84 ellipsis_0.3.1