Gregory B Gloor (1), Jia Rong Wu (2), Vera Pawlowsky-Glahn (3), Juan José Egozcue (4)
1. Department of Biochemistry, University of Western Ontario, London, Ontario, Canada. Electronic address: ggloor@uwo.ca.
2. Department of Biochemistry, University of Western Ontario, London, Ontario, Canada.
3. Department of Computer Science, Applied Mathematics, and Statistics, University of Girona, Spain.
4. Department of Civil and Environmental Engineering, Technical University of Catalonia, Spain.
Abstract
PURPOSE: The ability to properly analyze and interpret large microbiome data sets has lagged behind our ability to acquire such data sets from environmental or clinical samples. Sequencing instruments impose a structure on these data: the natural sample space of a 16S rRNA gene sequencing data set is a simplex, which is a part of real space that is restricted to nonnegative values with a constant sum. Such data are compositional and should be analyzed using compositionally appropriate tools and approaches. However, most of the tools for 16S rRNA gene sequencing analysis assume these data are unrestricted.
METHODS: We show that existing tools for compositional data (CoDa) analysis can be readily adapted to analyze high-throughput sequencing data sets.
RESULTS: The Human Microbiome Project tongue versus buccal mucosa data set shows how the CoDa approach can address the major elements of microbiome analysis. Reanalysis of a publicly available autism microbiome data set shows that the CoDa approach, in concert with multiple hypothesis test corrections, prevents false positive identifications.
CONCLUSIONS: The CoDa approach is readily scalable to microbiome-sized analyses. We provide example code and make recommendations to improve the analysis and reporting of microbiome data sets.
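To make the simplex constraint concrete, the following is a minimal sketch, not the authors' example code, of two standard CoDa operations: closure (rescaling counts to a constant sum) and the centered log-ratio (clr) transform. The function names `closure` and `clr`, the `pseudocount` parameter, and the two toy samples are assumptions introduced here for illustration; real data sets contain zeros, which need a replacement strategy before logs can be taken.

```python
import math


def closure(counts):
    """Rescale nonnegative counts to sum to 1, i.e. a point on the simplex."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive sum")
    return [c / total for c in counts]


def clr(counts, pseudocount=0.0):
    """Centered log-ratio: log of each part minus the mean of the logs.

    `pseudocount` is a hypothetical, simplistic stand-in for zero handling;
    with the default of 0.0 every count must be strictly positive.
    """
    shifted = [c + pseudocount for c in counts]
    if any(v <= 0 for v in shifted):
        raise ValueError("all parts must be positive before taking logs")
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Two hypothetical samples: identical proportions, tenfold different read depth.
shallow = [10, 20, 70]
deep = [100, 200, 700]

print(closure(shallow))
print(clr(shallow))
print(clr(deep))
```

Because the clr values depend only on ratios between parts, the two samples give the same transformed vector even though their total read counts differ, which is the sense in which sequencing depth carries no information about absolute abundance.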