Abhishek Kaul, Siddhartha Mandal, Ori Davidov, Shyamal D Peddada.
Abstract
Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy to handle these excess zeros is to add a small number, called a pseudo-count (e.g., 1). Other strategies include using various probability models to model the excess zero counts. Although adding a pseudo-count is simple and widely used, as demonstrated in this paper, it is not ideal. On the other hand, methods that model excess zeros using a probability model often make an implicit assumption that all zeros can be explained by a common probability model. As described in this article, this is not always recommended, as there are potentially three types/sources of zeros in microbiome data. The purpose of this paper is to develop a simple methodology to identify and accommodate three different types of zeros and to test hypotheses regarding the relative abundance of taxa in two or more experimental groups. Another major contribution of this paper is to perform constrained (directional or ordered) inference when there are more than two ordered experimental groups (e.g., subjects ordered by diet or age groups or environmental exposure groups). As far as we know, this is the first paper that addresses such problems in the analysis of microbiome data.
Keywords: Aitchison's log-ratio; Microbiome data; bootstrap; covariates; cross-sectional data; false discovery rate (FDR)
Year: 2017 PMID: 29163406 PMCID: PMC5682008 DOI: 10.3389/fmicb.2017.02114
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. Illustration of hypotheses H1 and H2 testing for trends amongst groups.
Figure 2. FDR (Left) and Power (Right) comparisons among ANCOM II, DESeq2, Prop-T, and Pseudo-C. Power comparisons are for δ = 0.5.
Figure 3. Power comparisons among ANCOM II, DESeq2, Prop-T, and Pseudo-C, for different values of δ ∈ (0, 0.5).
Figure 4. FDR (Left) and Power (Right) comparisons among ANCOM II, DESeq2, Prop-T, and Pseudo-C for simulation based on negative binomial distribution.
Figure 5. (Left) Venn diagram illustrating overlapping features detected by different procedures. (Right) Overlapping features detected by assuming Bifidobacterium as normalizer or the geometric mean of all taxa as the normalizer.