Siyuan Ma1,2,3, Boyu Ren2,3, Himel Mallick2,3, Yo Sup Moon2, Emma Schwager2, Sagun Maharjan1,2,3, Timothy L Tickle2,3, Yiren Lu2, Rachel N Carmody4, Eric A Franzosa1,2,3, Lucas Janson5, Curtis Huttenhower1,2,3,6.
Abstract
Many methods have been developed for statistical analysis of microbial community profiles, but due to the complex nature of typical microbiome measurements (e.g. sparsity, zero-inflation, non-independence, and compositionality) and of the associated underlying biology, it is difficult to compare or evaluate such methods within a single systematic framework. To address this challenge, we developed SparseDOSSA (Sparse Data Observations for the Simulation of Synthetic Abundances): a statistical model of microbial ecological population structure, which can be used to parameterize real-world microbial community profiles and to simulate new, realistic profiles of known structure for methods evaluation. Specifically, SparseDOSSA's model captures marginal microbial feature abundances as a zero-inflated log-normal distribution, with additional model components for absolute cell counts, the sequence read generation process, and microbe-microbe and microbe-environment interactions. Together, these allow fully known covariance structure between synthetic features (i.e. "taxa") or between features and "phenotypes" to be simulated for method benchmarking. Here, we demonstrate SparseDOSSA's performance for 1) accurately modeling human-associated microbial population profiles; 2) generating synthetic communities with controlled population and ecological structures; 3) spiking-in true positive synthetic associations to benchmark analysis methods; and 4) recapitulating an end-to-end mouse microbiome feeding experiment. Together, these represent the most common analysis types in assessment of real microbial community environmental and epidemiological statistics, thus demonstrating SparseDOSSA's utility as a general-purpose aid for modeling communities and evaluating quantitative methods. An open-source implementation is available at http://huttenhower.sph.harvard.edu/sparsedossa2.
Year: 2021 PMID: 34516542 PMCID: PMC8491899 DOI: 10.1371/journal.pcbi.1008913
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. A hierarchical model for microbial community feature profiles.
A) SparseDOSSA comprises a hierarchical model to capture the generation mechanism of microbial sequencing counts, including components for "hidden" absolute abundances, sequencing depth (and thus compositional relative abundances), zero inflation, and feature-feature and feature-environment interactions. Notations not defined in the figure: : cumulative distribution function (CDF) for the absolute abundance of feature A. μ, : mean and variance of the log-normal sequencing depth distribution. B) SparseDOSSA can be fitted to varied microbial community types by users, using cross-validation procedures; pre-trained models are also provided for human microbiome template datasets. This allows for C) simulation of either null or "true positive" association spiked-in synthetic datasets, to facilitate microbiome benchmarking or power analysis studies.
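
The generative hierarchy in panel A can be illustrated with a minimal Python sketch. This is not SparseDOSSA's implementation: it assumes independent features (no feature-feature or feature-environment interactions), assumes reads are drawn multinomially from the relative abundances, and uses hypothetical names throughout (`simulate_counts`, `pi0`, `mu`, `sigma`, `depth_mu`, `depth_sigma`). It only shows how zero-inflated log-normal absolute abundances, normalization, and a log-normal sequencing depth could combine into count data.

```python
import numpy as np


def simulate_counts(n_samples, pi0, mu, sigma, depth_mu, depth_sigma, seed=0):
    """Toy hierarchical simulation of sequencing counts (illustrative only).

    pi0         : per-feature probability of a structural zero (assumed name)
    mu, sigma   : per-feature log-scale mean and sd of nonzero abundances
    depth_mu,
    depth_sigma : log-scale mean and sd of the sequencing depth
    """
    rng = np.random.default_rng(seed)
    pi0, mu, sigma = (np.asarray(x, dtype=float) for x in (pi0, mu, sigma))
    if np.all(pi0 >= 1.0):
        raise ValueError("at least one feature needs pi0 < 1")
    n_features = pi0.size

    def draw_absolute(n):
        # Zero inflation: a feature is present with probability 1 - pi0.
        present = rng.random((n, n_features)) >= pi0
        # Log-normal absolute abundance for present features.
        abundance = rng.lognormal(mean=mu, sigma=sigma, size=(n, n_features))
        return np.where(present, abundance, 0.0)

    absolute = draw_absolute(n_samples)
    # Assumption of this sketch: redraw samples in which every feature is zero,
    # since they cannot be normalized to relative abundances.
    empty = absolute.sum(axis=1) == 0
    while empty.any():
        absolute[empty] = draw_absolute(int(empty.sum()))
        empty = absolute.sum(axis=1) == 0

    # Compositionality: only relative abundances are observable.
    relative = absolute / absolute.sum(axis=1, keepdims=True)

    # Log-normal sequencing depth, rounded to a positive integer read total.
    raw_depth = rng.lognormal(depth_mu, depth_sigma, n_samples)
    depth = np.maximum(1, np.round(raw_depth)).astype(np.int64)

    # Read generation: multinomial draw per sample (an assumption here).
    counts = np.stack([rng.multinomial(d, p) for d, p in zip(depth, relative)])
    return counts, relative, depth


counts, relative, depth = simulate_counts(
    n_samples=5,
    pi0=[0.1, 0.5, 0.8, 0.3],
    mu=[2.0, 1.0, 0.0, 1.5],
    sigma=[1.0, 1.0, 1.0, 1.0],
    depth_mu=np.log(10_000),
    depth_sigma=0.5,
)
print(counts)
```

Each row of `counts` sums to that sample's drawn depth, and features that received a structural zero stay at zero reads. Introducing correlation between features or a dependence on sample covariates, as the full model does, would require replacing the independent draws in `draw_absolute`.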