Jun Chen (1,2), Emily King (1,3), Rebecca Deek (4), Zhi Wei (5), Yue Yu (1,6), Diane Grill (1), Karla Ballman (7), Oliver Stegle
1. Division of Biomedical Statistics and Informatics.
2. Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA.
3. Department of Statistics, Iowa State University, Ames, IA 50011, USA.
4. Department of Biostatistics, Columbia University, New York, NY 10032, USA.
5. Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA.
6. College of Medical Informatics, Chongqing Medical University, Chongqing, China.
7. Department of Healthcare Policy and Research, Weill Cornell Medical College, New York, NY 10065, USA.
Abstract
Motivation: One objective of human microbiome studies is to identify differentially abundant microbes across biological conditions. Previous statistical methods focus on detecting shifts in the abundance and/or prevalence of the microbes and treat the dispersion (the spread of the data) as a nuisance parameter. These methods also assume that the dispersion is the same across conditions, an assumption that may not hold in the presence of sample heterogeneity. Moreover, outliers are widespread in microbiome sequencing data and undermine the robustness of existing parametric models. A robust and powerful method that allows covariate-dependent dispersion and addresses outliers is therefore still needed for differential abundance analysis.
Results: We introduce a novel test for differential distribution analysis of microbiome sequencing data that jointly tests abundance, prevalence and dispersion. The test is built on a zero-inflated negative binomial regression model, which accounts for zero-inflation, and on winsorized count data, which limits the influence of outliers. Using simulated data and real microbiome sequencing datasets, we show that our test is robust across various biological conditions and overall more powerful than previous methods.
Availability and implementation: An R package is available at https://github.com/jchen1981/MicrobiomeDDA.
Contact: chen.jun2@mayo.edu or zhiwei@njit.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
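To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch of upper-tail winsorization of counts and of a zero-inflated negative binomial (ZINB) log-probability. It is an illustration under stated assumptions, not the MicrobiomeDDA implementation: the function names, the nearest-rank quantile rule, the default cutoff `q=0.97`, and the mean/dispersion parametrization (variance = mu + phi * mu^2) are all choices made here for illustration and are not taken from the abstract.

```python
import math


def winsorize_upper(counts, q=0.97):
    """Cap values above the q-th quantile (nearest-rank rule; an assumption)."""
    ordered = sorted(counts)
    # Nearest-rank index of the q-th quantile, clamped to a valid position.
    k = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    cap = ordered[k]
    return [min(c, cap) for c in counts]


def nb_logpmf(y, mu, phi):
    """Negative binomial log-pmf with mean mu and dispersion phi.

    Assumed parametrization: variance = mu + phi * mu**2, size r = 1 / phi.
    """
    r = 1.0 / phi
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def zinb_logpmf(y, mu, phi, pi0):
    """ZINB log-pmf: a point mass at zero (weight pi0) mixed with an NB."""
    if y == 0:
        # Zeros arise from the point mass or from the NB component.
        return math.log(pi0 + (1.0 - pi0) * math.exp(nb_logpmf(0, mu, phi)))
    if pi0 >= 1.0:
        return float("-inf")  # no mass on positive counts
    return math.log(1.0 - pi0) + nb_logpmf(y, mu, phi)


if __name__ == "__main__":
    raw = [0, 0, 1, 2, 3, 4, 5, 6, 7, 1000]  # one extreme outlier
    print(winsorize_upper(raw, q=0.9))
    print(math.exp(zinb_logpmf(0, mu=5.0, phi=0.5, pi0=0.3)))
```

In this parametrization, `mu` corresponds to abundance, `pi0` relates to prevalence (a larger `pi0` means more zeros), and `phi` corresponds to dispersion. A joint test of the kind the abstract describes would let all three depend on covariates; the regression and testing machinery is not sketched here.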