James T Morton, Clarisse Marotz, Alex Washburne, Justin Silverman, Livia S Zaramela, Anna Edlund, Karsten Zengler, Rob Knight.
Abstract
Differential abundance analysis is controversial throughout microbiome research. Gold standard approaches require laborious measurements of total microbial load, or absolute number of microorganisms, to accurately determine taxonomic shifts. Therefore, most studies rely on relative abundance data. Here, we demonstrate common pitfalls in comparing relative abundance across samples and identify two solutions that reveal microbial changes without the need to estimate total microbial load. We define the notion of "reference frames", which provide deep intuition about the compositional nature of microbiome data. In an oral time series experiment, reference frames alleviate false positives and produce consistent results on both raw and cell-count normalized data. Furthermore, reference frames identify consistent, differentially abundant microbes previously undetected in two independent published datasets from subjects with atopic dermatitis. These methods allow reassessment of published relative abundance data to reveal reproducible microbial changes from standard sequencing output without the need for new assays.
Year: 2019 PMID: 31222023 PMCID: PMC6586903 DOI: 10.1038/s41467-019-10656-5
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Illustration demonstrating statistical limitations inherent in compositional datasets. a Two different biological scenarios can yield the exact same proportions of taxa in samples from a population pre- and post-treatment. b Simulated datasets plotting the true differential obtained using absolute abundance data on the x-axis, versus the inferred differential obtained using relative abundance data on the y-axis. Each dot represents a taxon in the dataset, and the colors represent datasets with various ratios of total microbial load (K) between before and after samples. The red line represents the optimal scenario where the samples have equal microbial load. This illustrates the prevalence of either false positives (FP) or false negatives (FN) when performing differential abundance analysis on samples with unequal total microbial load. The presence of either FPs or FNs is dictated by a nonlinear function of the true differential (see online methods). c An illustration of differential proportions of bacterial species before and after treatment. d Same data as b but plotting the rank of the differentials, demonstrating that ranks are equivalent regardless of differences in microbial load.
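The rank invariance described in panel d can be checked numerically. The following is a minimal sketch, not the authors' simulation: the abundance values and the helper names `log_fold_changes` and `ranks` are hypothetical and chosen only for illustration. It computes log-fold changes from absolute counts and from the corresponding proportions, and shows that the two differ by one constant shared by all taxa, so their ordering is unchanged.

```python
import numpy as np


def log_fold_changes(before, after):
    """Per-taxon log-fold change between two abundance vectors."""
    return np.log(after) - np.log(before)


def ranks(values):
    """Rank of each entry (0 = smallest); assumes no ties."""
    return np.argsort(np.argsort(values))


# Hypothetical absolute abundances for four taxa (illustrative values only).
before_abs = np.array([100.0, 200.0, 300.0, 400.0])
after_abs = np.array([50.0, 300.0, 900.0, 320.0])

# True differentials, from absolute abundances.
true_diff = log_fold_changes(before_abs, after_abs)

# Inferred differentials, from proportions (total load is discarded).
before_rel = before_abs / before_abs.sum()
after_rel = after_abs / after_abs.sum()
rel_diff = log_fold_changes(before_rel, after_rel)

# The two sets of differentials differ by the same constant for every taxon:
# minus the log of the ratio of total loads.
offset = rel_diff - true_diff

print("true differentials:    ", np.round(true_diff, 3))
print("relative differentials:", np.round(rel_diff, 3))
print("per-taxon offset:      ", np.round(offset, 3))
print("ranks agree:", np.array_equal(ranks(true_diff), ranks(rel_diff)))
```

In this toy example the second taxon increases in absolute terms but appears to decrease in relative terms, because the total load grew faster than it did. The sign of an individual differential is therefore unreliable, while the ranking is preserved.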
Fig. 2 Analysis of salivary microbiota before and after brushing teeth. a Flow-cytometry-quantified microbial load in unstimulated saliva collected for 5 min, normalized to the load before brushing teeth. Each line corresponds to a different volunteer. Error bars represent the standard deviation from duplicate flow-cytometry measurements. b Microbial ranks estimated from multinomial regression with Actinomyces and Haemophilus highlighted. The y-axis represents the log-fold change that is known up to some bias constant K, and the x-axis numerically orders the ranks of each taxon in the analysis. c A comparison of t-statistics (left) and p-values (right) between before and after samples where each dot is an individual taxon (top graphs) or the ratio of each taxon to Actinomyces (bottom graphs) calculated from relative abundance data (x-axis) and absolute abundance data (y-axis). The 1-1 correspondence in the ratio graphs is a result of the microbial loads cancelling out, as described in Eq. (3). d A comparison of relative abundance vs absolute abundance data of Actinomyces, Haemophilus and log(Actinomyces: Haemophilus) before and after brushing teeth. Error bars represent standard error of the mean. e Comparison of the multinomial coefficients used for DR with the ALDEx2 and ANCOM outputs. The test statistics generated from ALDEx2 and ANCOM are sorted in the same order as the multinomial coefficients to provide a consistent comparison. All taxa that passed the significance tests are highlighted in red.
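The cancellation of microbial load noted for panel c can be illustrated with a small sketch. The counts, the taxon ordering and the helper `log_ratio` below are assumptions made for illustration and are not taken from the study's data. Dividing every taxon in a sample by the same total leaves the ratio of any two taxa unchanged, so a log-ratio computed from proportions equals the one computed from absolute counts.

```python
import numpy as np


def log_ratio(abundances, numerator, denominator):
    """Log-ratio of two taxa within one sample, using the second as reference."""
    return np.log(abundances[numerator] / abundances[denominator])


# Hypothetical absolute counts for three taxa in one sample.
# Index 0 plays the role of the taxon of interest, index 1 the reference taxon.
absolute = np.array([120.0, 30.0, 50.0])

# Proportions: the same sample after the total load is divided out.
relative = absolute / absolute.sum()

lr_absolute = log_ratio(absolute, 0, 1)
lr_relative = log_ratio(relative, 0, 1)

# The total load appears in both numerator and denominator, so it cancels.
print("log-ratio from absolute counts:", round(float(lr_absolute), 6))
print("log-ratio from proportions:    ", round(float(lr_relative), 6))
```

Because this holds for every sample separately, a test statistic computed on such log-ratios does not depend on whether the input was relative or absolute abundance data.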
Fig. 3 DR analysis of skin in two atopic dermatitis studies. Panels a–c represent data from Byrd et al.[27], and panels d, e represent data from Leung et al.[28]. Both studies compare lesioned (L) to non-lesioned (NL) skin. a Microbial ranks estimated from multinomial regression applied to shotgun metagenomics from Byrd et al.[27] with key genera highlighted. The y-axis represents the log-fold change that is known up to some bias constant K. b Proportions of S. aureus, S. epidermidis, M. globosa, and P. acnes in lesioned (blue) and non-lesioned (orange) skin (left) and correlation of relative abundance with SCORAD score (right). c Log-ratios of (S. aureus: P. acnes), (S. epidermidis: P. acnes), and (M. globosa: P. acnes) (left) and correlation of ratio with SCORAD score (right). Error bars represent standard deviation across participants (n = 20). d Change in log-ratio of (M. globosa: P. acnes) from Leung et al.[28]. e Change in relative abundance of M. globosa between lesioned and non-lesioned skin from Leung et al.[28]. Presented p-values are from paired t-test statistics.
Fig. 4 DR analysis of the Central Park dataset. a Microbes ranked with respect to their association with nitrogen. b Microbes ranked with respect to their association with pH. Putative hits against an acidophile, an ammonia oxidizer and a nitrogen reducer are highlighted.