Christine Hehnly1,2, Lijun Zhang1, Steven J Schiff3, Joseph N Paulson4,5, M Senthil Kumar6,7, Eric V Slud8,9, James Broach1,2, Rafael A Irizarry10,11.
Abstract
BACKGROUND: Individual and environmental health outcomes are frequently linked to changes in the diversity of associated microbial communities. Thus, deriving health indicators based on microbiome diversity measures is essential. While microbiome data generated using high-throughput 16S rRNA marker gene surveys are appealing for this purpose, 16S surveys also generate a plethora of spurious microbial taxa.
Entities:
Keywords: Differential richness; False discoveries; Microbiome; Richness; Species misclassification; Spurious; Surveys
Year: 2022 PMID: 35915508 PMCID: PMC9344657 DOI: 10.1186/s13059-022-02722-x
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 17.906
Fig. 1 Within-genus false taxa accumulation structure. A Sequences in input samples are subjected to various technical steps during 16S sequencing (gray shade). The output reads from 16S sequencing are clustered by sequence similarity using a methodology of choice. Of the taxa (clusters) thus reconstructed, some are true, i.e., equal in sequence to those in the input sample; the rest are spurious, i.e., false (red). B For every genus, the accumulation is determined as a function of its recovered abundance. Notation: n0, the true number of taxa in the genus (true richness); y, the genus recovered abundance; f(·), the abundance-dependent technical component driving false taxa accumulation within the genus
Fig. 2 Concordant taxa accumulations across genera, confounded differential richness inference, and the Prokounter strategy. A Sample-wide taxa accumulations are visualized with respect to sample depth (left). Within-genus taxa accumulations are visualized for two genera with respect to the total recovered genus abundance, i.e., the sum of the abundances of all taxa within the genus. B Differential richness log-fold changes (LFC, y-axis) track differential relative abundance log-fold changes (LFC, x-axis) in the waste-water treatment survey. C Prokounter exploits within-genus accumulation data to model false taxa accumulation rates. When these rates are used in a standard Poisson regression setting, the resulting differential richness fold changes are uncorrelated with genus-wide differential abundance statistics (right). Dashed lines represent confidence intervals. Points colored in red are the genus-specific differential richness inferences for the waste-water treatment survey
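The confounding described in Fig. 2B and the offset-style correction in Fig. 2C can be illustrated with a minimal Python sketch. It is not the Prokounter implementation: the square-root accumulation trend (`toy_trend`), the abundances, and the two-group design are all hypothetical. The sketch relies only on the standard fact that, in a Poisson log-linear model with an intercept, one binary covariate, and a log offset, the maximum likelihood estimate of the group log-fold change is the log ratio of the per-group totals of observed over expected counts.

```python
import math


def poisson_lfc(counts, group, expected=None):
    """Closed-form MLE of the group log-fold change in the Poisson model
    log E[count_i] = log(expected_i) + b0 + b1 * group_i.

    With a single binary covariate, b1 = log((Y1 / E1) / (Y0 / E0)),
    where Y and E are per-group sums of counts and of expected values.
    """
    if expected is None:
        # No offset: every sample contributes the same unit exposure.
        expected = [1.0] * len(counts)
    totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    for c, g, e in zip(counts, group, expected):
        totals[g][0] += c
        totals[g][1] += e
    rate0 = totals[0][0] / totals[0][1]
    rate1 = totals[1][0] / totals[1][1]
    return math.log(rate1 / rate0)


def toy_trend(abundance, scale=2.0):
    """Hypothetical within-genus accumulation trend (an assumption for
    illustration only): taxa count grows with the square root of abundance."""
    return scale * math.sqrt(abundance)


# Hypothetical genus abundances; group 1 simply has higher abundance.
abundance = [100, 400, 900, 1600, 2500, 3600]
group = [0, 0, 0, 1, 1, 1]

# Taxa counts follow the technical trend exactly, so there is no
# richness difference between groups beyond what abundance explains.
taxa = [round(toy_trend(y)) for y in abundance]

naive_lfc = poisson_lfc(taxa, group)
adjusted_lfc = poisson_lfc(taxa, group, [toy_trend(y) for y in abundance])

print(f"naive LFC:    {naive_lfc:.3f}")     # tracks the abundance difference
print(f"adjusted LFC: {adjusted_lfc:.3f}")  # offset removes the abundance effect
```

In this toy setting the naive estimate reports a large richness difference that is entirely driven by abundance, while the offset-adjusted estimate is zero. A real analysis would estimate the trend from within-genus accumulation data rather than assume its form.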
Relative to study variables, within-genus taxa accumulation trends capture the bulk of the systematic variation in 16S surveys’ genus-specific taxa accumulations. For each 16S survey in column 1, column 2 lists the year of publication, and column 3 specifies the partial 16S segment targeted, the machine technology, and the sequence clustering approach used. FST.x refers to sequence clustering at an a priori fixed sequence similarity threshold of x%. McFadden's pseudo-R2 values for explaining genus-specific taxa accumulations with two negative binomial (NB) regressions are listed in columns 4 and 5. Column 4 is obtained when the NB regression includes the within-genus taxa accumulation trend (see Methods) as the only predictor. Column 5 additionally includes the genus identifier, the total sample depth, and the experimental design matrix for each dataset as predictors (see Methods). The corresponding Akaike Information Criterion (AIC) values are listed in columns 6 and 7
Relative to study variables, within-genus taxa accumulation trends capture the bulk of the systematic variation in 16S surveys’ sample-wide taxa accumulations. For each 16S survey in column 1, column 2 lists the year of publication, and column 3 specifies the partial 16S segment targeted, the machine technology, and the sequence clustering approach used. FST.x refers to sequence clustering at an a priori fixed sequence similarity threshold of x%. McFadden's pseudo-R2 values for explaining sample-wide taxa accumulations with two negative binomial (NB) regressions are listed in columns 4 and 5. Column 4 is obtained when the NB regression includes the within-genus taxa accumulation trend (see Methods) as the only predictor. Column 5 additionally includes the total sample depth and the experimental design matrix for each dataset as predictors (see Methods). The corresponding Akaike Information Criterion (AIC) values are listed in columns 6 and 7
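The two model-comparison statistics named in the table captions above follow standard definitions: McFadden's pseudo-R2 is one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model, and AIC is twice the number of parameters minus twice the log-likelihood. The Python sketch below computes both from log-likelihoods. The log-likelihood values and parameter counts are made-up placeholders, not results from any survey in the tables.

```python
def mcfadden_pseudo_r2(loglik_model, loglik_null):
    """McFadden's pseudo-R2: 1 - lnL(model) / lnL(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null


def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2.0 * n_params - 2.0 * loglik


# Placeholder log-likelihoods (hypothetical, for illustration only).
ll_null = -500.0   # intercept-only regression
ll_trend = -200.0  # accumulation trend as the only predictor
ll_full = -190.0   # trend plus additional study variables

r2_trend = mcfadden_pseudo_r2(ll_trend, ll_null)
r2_full = mcfadden_pseudo_r2(ll_full, ll_null)

# Hypothetical parameter counts: 3 for the trend-only fit, 10 for the full fit.
aic_trend = aic(ll_trend, n_params=3)
aic_full = aic(ll_full, n_params=10)

print(f"pseudo-R2: trend only {r2_trend:.2f}, full {r2_full:.2f}")
print(f"AIC:       trend only {aic_trend:.0f}, full {aic_full:.0f}")
```

Comparing the two columns in this way shows how much explanatory power the additional predictors contribute once the accumulation trend is already in the model.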
Fig. 3 False microbial discoveries accumulate along the recovered abundance axis in the Pseudomonas dilution study. A For each taxa clustering method, the observed variation in within-genus Pseudomonas taxa accumulations is driven by experimental and technical parameters. Contaminant Pseudomonas are expected to fall with input loads, indicating false discovery accumulations at higher recovered Pseudomonas abundances. B The genus recovered abundance axis offers a succinct representation for taxa accumulations. The average and the 95% point-wise confidence intervals for the logged within-Pseudomonas taxa accumulation trends are shown with colored lines for each method, with colored circles indicating the respective observations. C An overlay of taxa accumulations across multiple detected genera in the study. Colors indicate genera