Matthew C B Tsilimigras (1), Anthony A Fodor (2). (1) Department of Bioinformatics and Genomics, UNC Charlotte, Bioinformatics Building, The University of North Carolina, Charlotte, 9201 University City Blvd, Charlotte. (2) Department of Bioinformatics and Genomics, UNC Charlotte, Bioinformatics Building, The University of North Carolina, Charlotte, 9201 University City Blvd, Charlotte. Electronic address: anthony.fodor@gmail.com.
Abstract
PURPOSE: Human microbiome studies produce compositional data: the absolute abundances of microbes cannot be recovered from sequence data alone. In compositional data analysis, each sample consists of proportions of various organisms whose sum is constrained to a constant. This simple feature can cause traditional statistical methods, when naively applied, to produce erroneous results and spurious correlations. METHODS: We review the origins of compositionality in microbiome data, the theory and usage of compositional data analysis in this setting, and some recent attempts at solutions to these problems. RESULTS: Microbiome sequence data sets are typically high dimensional, with the number of taxa much greater than the number of samples, and sparse, as most taxa are observed in only a small number of samples. These features of microbiome sequence data interact with compositionality to produce additional challenges in analysis. CONCLUSIONS: Despite sophisticated approaches to statistical transformation, the analysis of compositional data may remain a partially intractable problem, limiting inference. We suggest that current research needs include better generation of simulated data and further study of how the severity of compositional effects changes when sampling microbial communities of widely differing diversity.
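The sum constraint described in the abstract can be illustrated with a small numerical sketch. The data below are invented for illustration and are not taken from the review: two taxa whose absolute abundances are exactly uncorrelated become strongly correlated once each sample is rescaled to proportions, because a third, highly variable taxon drives the totals. The helper names `closure` and `clr` are hypothetical, and the centered log-ratio is shown only as one commonly used compositional transformation, not necessarily the approach the authors favor.

```python
import numpy as np


def closure(counts):
    """Rescale each sample (row) so that its entries sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def clr(proportions):
    """Centered log-ratio: log of each part minus the row mean of the logs.

    Requires strictly positive entries; sparse real data would need some
    zero-handling step first (an assumption not covered here).
    """
    log_p = np.log(proportions)
    return log_p - log_p.mean(axis=1, keepdims=True)


# Invented absolute abundances: rows are samples, columns are taxa A, B, C.
# A and B are uncorrelated by construction; C varies over orders of magnitude.
absolute = np.array([
    [10.0, 10.0, 10.0],
    [20.0, 10.0, 100.0],
    [10.0, 20.0, 1000.0],
    [20.0, 20.0, 10000.0],
])

relative = closure(absolute)

# Pearson correlation between taxa A and B before and after closure.
r_absolute = np.corrcoef(absolute[:, 0], absolute[:, 1])[0, 1]
r_relative = np.corrcoef(relative[:, 0], relative[:, 1])[0, 1]

clr_values = clr(relative)

print("A vs B, absolute abundances:", round(r_absolute, 3))
print("A vs B, proportions:", round(r_relative, 3))
print("CLR row sums:", clr_values.sum(axis=1))
```

Each row of the CLR-transformed values sums to zero, so the constraint is re-expressed rather than removed, which is consistent with the abstract's caution that transformation may not fully resolve the problem.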