| Literature DB >> 33634824 |
Fangting Zhou1, Kejun He2, Qiwei Li3, Robert S Chapkin4, Yang Ni5.
Abstract
High-throughput sequencing technology provides unprecedented opportunities to quantitatively explore human gut microbiome and its relation to diseases. Microbiome data are compositional, sparse, noisy, and heterogeneous, which pose serious challenges for statistical modeling. We propose an identifiable Bayesian multinomial matrix factorization model to infer overlapping clusters on both microbes and hosts. The proposed method represents the observed over-dispersed zero-inflated count matrix as Dirichlet-multinomial mixtures on which latent cluster structures are built hierarchically. Under the Bayesian framework, the number of clusters is automatically determined and available information from a taxonomic rank tree of microbes is naturally incorporated, which greatly improves the interpretability of our findings. We demonstrate the utility of the proposed approach by comparing to alternative methods in simulations. An application to a human gut microbiome data set involving patients with inflammatory bowel disease reveals interesting clusters, which contain bacteria families Bacteroidaceae, Bifidobacteriaceae, Enterobacteriaceae, Fusobacteriaceae, Lachnospiraceae, Ruminococcaceae, Pasteurellaceae, and Porphyromonadaceae that are known to be related to the inflammatory bowel disease and its subtypes according to biological literature. Our findings can help generate potential hypotheses for future investigation of the heterogeneity of the human gut microbiome.Entities:
Keywords: Bayesian nonparametric prior; Compositional data analysis; Feature allocation; Mixture model; Phylogenetic Indian buffet process
Mesh:
Year: 2022 PMID: 33634824 PMCID: PMC9291645 DOI: 10.1093/biostatistics/kxab002
Source DB: PubMed Journal: Biostatistics ISSN: 1465-4644 Impact factor: 5.279