Alice Risely1, Mark A F Gillingham1, Arnaud Béchet2, Stefan Brändel1,3, Alexander C Heni1,3, Marco Heurich4,5,6, Sebastian Menke1, Marta B Manser7, Marco Tschapka1,3, Simone Sommer1.
Abstract
The filtering of gut microbial datasets to retain high-prevalence taxa is often performed to identify a common core gut microbiome that may be important for host biological functions. However, the prevalence thresholds used to identify a common core are highly variable, and it remains unclear how they affect diversity estimates and whether insights stemming from core microbiomes are comparable across studies. We hypothesized that if macroecological patterns in gut microbiome prevalence and abundance are similar across host species, then increasing prevalence thresholds should yield similar changes to alpha diversity and beta dissimilarity scores across host species datasets. We analyzed eight gut microbiome datasets, based on 16S rRNA gene amplicon sequencing and collected from different host species, to (1) compare macroecological patterns across datasets, including amplicon sequence variant (ASV) detection rate with sequencing depth and sample size, occupancy-abundance curves, and rank-abundance curves; (2) test whether increasing prevalence thresholds generate universal or host-species-specific effects on alpha and beta diversity scores; and (3) test whether diversity scores from prevalence-filtered core communities correlate with those from unfiltered data. We found that gut microbiomes collected from diverse hosts demonstrated similar ASV detection rates with sequencing depth, yet required different sample sizes to sufficiently capture rare ASVs across the host population. This suggests that sample size, rather than sequencing depth, tends to limit the ability of studies to detect rare ASVs across the host population. Despite differences in the distribution and detection of rare ASVs, microbiomes exhibited similar occupancy-abundance and rank-abundance curves. Consequently, increasing prevalence thresholds generated remarkably similar trends in standardized alpha diversity and beta dissimilarity across species datasets until thresholds rose above 70%, at which point diversity scores tended to become unpredictable for some diversity measures. Moreover, high prevalence thresholds tended to generate diversity scores that correlated poorly with the original unfiltered data. Overall, we recommend that prevalence thresholds over 70% be avoided, and we promote the use of diversity measures that account for both phylogeny and abundance (balance-weighted phylogenetic diversity and weighted UniFrac for alpha and beta diversity, respectively), because we show that these measures are insensitive to prevalence filtering and therefore allow for the consistent comparison of core gut microbiomes across studies without the need for prevalence filtering.
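As a minimal sketch of the prevalence filtering described above, the Python below keeps only ASVs detected in at least a chosen fraction of samples and reports observed richness before and after filtering. The per-sample dictionary layout, the function names (`prevalence`, `core_filter`, `observed_richness`) and the inclusive `>=` comparison are illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, List

# Hypothetical layout: one dict per sample, mapping ASV id -> read count.
AsvTable = List[Dict[str, int]]


def prevalence(table: AsvTable) -> Dict[str, float]:
    """Fraction of samples in which each ASV has at least one read."""
    hits: Dict[str, int] = {}
    for sample in table:
        for asv, reads in sample.items():
            if reads > 0:
                hits[asv] = hits.get(asv, 0) + 1
    return {asv: n / len(table) for asv, n in hits.items()}


def core_filter(table: AsvTable, threshold: float) -> AsvTable:
    """Drop every ASV whose prevalence is below `threshold` (a fraction, 0-1)."""
    prev = prevalence(table)
    keep = {asv for asv, p in prev.items() if p >= threshold}  # assumed inclusive
    return [{a: r for a, r in s.items() if a in keep and r > 0} for s in table]


def observed_richness(sample: Dict[str, int]) -> int:
    """Number of ASVs with at least one read."""
    return sum(1 for r in sample.values() if r > 0)


if __name__ == "__main__":
    table = [
        {"asv1": 50, "asv2": 3, "asv3": 1},
        {"asv1": 20, "asv2": 0, "asv4": 7},
        {"asv1": 35, "asv2": 9},
        {"asv1": 10, "asv3": 2},
    ]
    for t in (0.0, 0.5, 0.75):
        print(t, [observed_richness(s) for s in core_filter(table, t)])
```

Running it prints `0.0 [3, 2, 2, 2]`, `0.5 [3, 1, 2, 2]` and `0.75 [1, 1, 1, 1]`: in this toy table, raising the threshold first removes the rare `asv4` and then leaves only the ubiquitous `asv1`.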
Keywords: bioinformatics; community ecology; core microbiome; gut microbiota; host-microbe communities; methods
Year: 2021 PMID: 34046023 PMCID: PMC8144293 DOI: 10.3389/fmicb.2021.659918
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Definitions and descriptions for alpha and beta diversity measures applied in this study.
| Diversity measure | Index | Weighting | Description | References |
|---|---|---|---|---|
| Alpha | Observed richness | Not weighted | Number of ASVs detected per sample | NA |
| Alpha | Faith’s PD | Phylogeny-weighted | Sum of the branch lengths of the phylogenetic tree connecting all microbial taxa present within a sample | |
| Alpha | Shannon | Abundance-weighted | A diversity index based on the number of ASVs present and their abundance distribution (evenness) | |
| Alpha | BWPD | Phylogeny- and abundance-weighted | Abundance-weighted extension of phylogenetic diversity | |
| Beta | Jaccard | Not weighted | Variability in microbial composition among sampled communities, with composition measured by which ASVs are present or absent | |
| Beta | Unweighted UniFrac | Phylogeny-weighted | Variability in microbial composition among sampled communities based on the lineages they contain | |
| Beta | Morisita | Abundance-weighted | Variability in microbial composition among sampled communities based on ASV presence and abundance; sensitive to the most abundant ASVs | |
| Beta | Weighted UniFrac | Phylogeny- and abundance-weighted | Abundance-weighted extension of unweighted UniFrac | |
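Most of the measures in the table can be sketched in plain Python. This is an illustrative reimplementation under assumptions, not the code used in the study: samples are dictionaries of ASV read counts, the phylogeny is a hypothetical child-to-(parent, branch length) map, Morisita is computed as one minus the classic Morisita overlap index on integer counts, and weighted UniFrac is the branch-length formulation (raw, or normalized by root-to-tip distances). BWPD is omitted because its balance weighting needs a fuller tree model than fits in a short example.

```python
import math
from typing import Dict, Tuple

Sample = Dict[str, int]               # ASV id -> read count (hypothetical layout)
Tree = Dict[str, Tuple[str, float]]   # node -> (parent, branch length); the root has no entry


def observed_richness(s: Sample) -> int:
    """Number of ASVs with at least one read."""
    return sum(1 for r in s.values() if r > 0)


def shannon(s: Sample) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over ASVs with reads."""
    total = sum(s.values())
    return -sum((r / total) * math.log(r / total) for r in s.values() if r > 0)


def jaccard(a: Sample, b: Sample) -> float:
    """Presence/absence dissimilarity: 1 - |A ∩ B| / |A ∪ B|."""
    sa = {k for k, r in a.items() if r > 0}
    sb = {k for k, r in b.items() if r > 0}
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def morisita(a: Sample, b: Sample) -> float:
    """1 - Morisita overlap on integer counts (each sample needs more than one read).
    Identical samples can score slightly below 0, a known property of the index."""
    na, nb = sum(a.values()), sum(b.values())
    lam_a = sum(r * (r - 1) for r in a.values()) / (na * (na - 1))
    lam_b = sum(r * (r - 1) for r in b.values()) / (nb * (nb - 1))
    shared = sum(r * b.get(k, 0) for k, r in a.items())
    return 1.0 - 2.0 * shared / ((lam_a + lam_b) * na * nb)


def _path_to_root(tip: str, tree: Tree):
    """Yield each node whose branch lies between `tip` and the root."""
    node = tip
    while node in tree:
        yield node
        node = tree[node][0]


def faith_pd(s: Sample, tree: Tree) -> float:
    """Total length of the branches linking the sample's ASVs to the root."""
    nodes = {n for tip, r in s.items() if r > 0 for n in _path_to_root(tip, tree)}
    return sum(tree[n][1] for n in nodes)


def weighted_unifrac(a: Sample, b: Sample, tree: Tree, normalized: bool = True) -> float:
    """Sum over branches of length * |share of a's reads below - share of b's reads below|."""
    def shares(s: Sample) -> Dict[str, float]:
        total = sum(s.values())
        out: Dict[str, float] = {}
        for tip, r in s.items():
            for n in _path_to_root(tip, tree):
                out[n] = out.get(n, 0.0) + r / total
        return out

    def depth(tip: str) -> float:
        return sum(tree[n][1] for n in _path_to_root(tip, tree))

    pa, pb = shares(a), shares(b)
    raw = sum(length * abs(pa.get(n, 0.0) - pb.get(n, 0.0))
              for n, (_, length) in tree.items())
    if not normalized:
        return raw
    # Normalizing term: root-to-tip distance weighted by both samples' read shares.
    ta, tb = sum(a.values()), sum(b.values())
    denom = sum(depth(t) * (a.get(t, 0) / ta + b.get(t, 0) / tb) for t in set(a) | set(b))
    return raw / denom
```

Faith’s PD here counts every branch up to the root, which is one common convention; the normalized weighted UniFrac ranges from 0 (identical samples) to 1.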
Metadata associated with each dataset.
| Species | Latin name | Country | No. of samples | Sample type | Sample buffer | 16S primers | Mean read count per sample | Associated publication | Data availability |
|---|---|---|---|---|---|---|---|---|---|
| Humans | Homo sapiens | United States | 500 | Feces | None | 515F/806R (V4) | 33,454 | American Gut Project | NCBI BioProject |
| Meerkat | Suricata suricatta | South Africa | 137 | Feces | None/RNAlater | 515F/806R (V4) | 129,011 | NA | NCBI BioProject |
| Red deer | Cervus elaphus | Germany | 136 | Feces | RNAlater | 515F/806R (V4) | 48,667 | | |
| Seba’s short-tailed bat | Carollia perspicillata | Panama | 169 | Feces | RNAlater | 515F/806R (V4) | 36,549 | NA | NCBI BioProject |
| Tomes’s spiny rat | Proechimys semispinosus | Panama | 196 | Feces | RNAlater | 515F/806R (V4) | 25,045 | | NCBI BioProject |
| Gray-brown mouse lemur | Microcebus griseorufus | Madagascar | 182 | Feces | RNAlater | 515F/806R (V4) | 49,910 | | NCBI BioProject |
| Greater flamingo | Phoenicopterus roseus | France | 552 | Cloacal swab | RNAlater | 515F/806R (V4) | 27,970 | | NCBI BioProject |
| Red-necked stint | Calidris ruficollis | Australia | 98 | Cloacal swab | None | 27F/519R (V1-3) | 42,573 | | NCBI BioProject |
FIGURE 1. Comparison of ASV detection rates and macroecological patterns across species datasets. (A) Rarefaction curves per species dataset, showing ASV detection with increasing sequencing depth per sample. To facilitate comparison, the 200-ASV mark is represented by a dashed line, and the 10,000-read mark by a solid line. X-axis ticks mark every 10,000 reads. (B) ASV accumulation curves with sample size, showing the extent to which each additional sample increases the total number of ASVs detected per species dataset. Dashed lines represent extrapolations to the total number of ASVs predicted to be within the overall ASV pool, represented by end points. (C) Percent of total (predicted) ASVs detected with increasing sample size per species dataset. The dashed horizontal line marks 50% of ASVs detected, whilst the vertical dashed lines represent the sample size required to detect 50% of predicted ASVs. (D) The relationship between sample size and predictions of the overall ASV pool. Dashed lines represent the final ASV pool prediction per dataset, which match the end points shown in Figure 1B. (E) ASV prevalence distribution per dataset, ranging from the proportion of ASVs found in just one sample (dark blue) to the proportion found in more than eight samples (yellow). (F) ASV prevalence distribution per sample, ranging from the mean proportion of ASVs per sample found only in that sample (dark blue) to the proportion found in at least eight other samples (yellow). (G) Abundance-occupancy curves per dataset. (H) Rank-abundance curves per species dataset.
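The curves in Figure 1 rest on a few simple computations, sketched below under assumptions: rarefaction by subsampling reads without replacement (panel A), a sample-based accumulation curve averaged over random sample orders (panel B), per-ASV occupancy against mean relative abundance (panel G, where averaging relative abundance over all samples, zeros included, is an assumption) and a pooled rank-abundance curve (panel H). The extrapolation to the predicted ASV pool (the dashed lines in panels B to D) is not reproduced, since the estimator behind it is not stated here. All names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Sample = Dict[str, int]  # ASV id -> read count (hypothetical layout)


def rarefy(sample: Sample, depth: int, rng: random.Random) -> Counter:
    """Subsample `depth` reads without replacement (depth must not exceed the read count)."""
    reads = [asv for asv, r in sample.items() for _ in range(r)]
    return Counter(rng.sample(reads, depth))


def accumulation_curve(table: List[Sample], permutations: int,
                       rng: random.Random) -> List[float]:
    """Mean number of distinct ASVs seen after pooling 1, 2, ..., n samples in random order."""
    totals = [0.0] * len(table)
    for _ in range(permutations):
        order = list(range(len(table)))
        rng.shuffle(order)
        seen = set()
        for k, idx in enumerate(order):
            seen.update(a for a, r in table[idx].items() if r > 0)
            totals[k] += len(seen)
    return [t / permutations for t in totals]


def occupancy_abundance(table: List[Sample]) -> Dict[str, Tuple[float, float]]:
    """ASV -> (fraction of samples occupied, mean relative abundance over all samples)."""
    occ: Counter = Counter()
    rel: Counter = Counter()
    for s in table:
        total = sum(s.values())
        for a, r in s.items():
            if r > 0:
                occ[a] += 1
                rel[a] += r / total
    n = len(table)
    return {a: (occ[a] / n, rel[a] / n) for a in occ}


def rank_abundance(table: List[Sample]) -> List[float]:
    """Pooled relative abundances, sorted from the most to the least abundant ASV."""
    pooled: Counter = Counter()
    for s in table:
        pooled.update(s)  # adds read counts ASV by ASV
    total = sum(pooled.values())
    return sorted((r / total for r in pooled.values() if r > 0), reverse=True)
```

Repeating `rarefy` at increasing depths and counting the distinct ASVs traces one rarefaction curve per sample; the last point of `accumulation_curve` always equals the number of ASVs observed across the whole dataset.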
FIGURE 2. Effects of increasing prevalence thresholds on standardized alpha diversity and beta dissimilarity measures, colored by species dataset: (A) observed ASV richness; (B) Faith’s phylogenetic diversity; (C) Shannon index; (D) balance-weighted phylogenetic diversity (BWPD); (E) Jaccard index; (F) unweighted UniFrac; (G) Morisita; (H) weighted UniFrac.
FIGURE 3. Mean standardized alpha diversity and beta dissimilarity measures with increasing prevalence thresholds, colored by species dataset: (A) observed ASV richness; (B) Faith’s phylogenetic diversity; (C) Shannon index; (D) balance-weighted phylogenetic diversity (BWPD); (E) Jaccard index; (F) unweighted UniFrac; (G) Morisita; (H) weighted UniFrac.
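A threshold sweep of the kind plotted in Figures 2 and 3 can be sketched as follows. The captions do not say how scores were standardized, so dividing the mean filtered score by the mean unfiltered score, which makes every dataset start at 1, is an assumption made only for illustration; `core_filter`, `standardized_curve` and `richness` are hypothetical names. Passing a different per-sample function as `metric` (a Shannon index, for example) reuses the same sweep; beta dissimilarities would need a pairwise version.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Sample = Dict[str, int]  # ASV id -> read count (hypothetical layout)


def core_filter(table: List[Sample], threshold: float) -> List[Sample]:
    """Keep ASVs present (reads > 0) in at least `threshold` of samples."""
    hits: Dict[str, int] = {}
    for s in table:
        for a, r in s.items():
            if r > 0:
                hits[a] = hits.get(a, 0) + 1
    keep = {a for a, h in hits.items() if h / len(table) >= threshold}
    return [{a: r for a, r in s.items() if a in keep and r > 0} for s in table]


def standardized_curve(table: List[Sample], thresholds: Sequence[float],
                       metric: Callable[[Sample], float]) -> List[float]:
    """Mean metric at each threshold divided by the unfiltered mean (assumed scaling)."""
    baseline = mean(metric(s) for s in table)
    if baseline == 0:
        raise ValueError("unfiltered mean is zero; cannot standardize")
    return [mean(metric(s) for s in core_filter(table, t)) / baseline for t in thresholds]


def richness(s: Sample) -> float:
    """Observed richness as a float, so it can be averaged and scaled."""
    return float(sum(1 for r in s.values() if r > 0))


if __name__ == "__main__":
    table = [
        {"asv1": 50, "asv2": 3, "asv3": 1},
        {"asv1": 20, "asv2": 0, "asv4": 7},
        {"asv1": 35, "asv2": 9},
        {"asv1": 10, "asv3": 2},
    ]
    thresholds = [i / 10 for i in range(10)]  # 0%, 10%, ..., 90%
    for t, v in zip(thresholds, standardized_curve(table, thresholds, richness)):
        print(f"{t:.0%}: {v:.2f}")
```

For this toy table the curve stays at 1.00 up to 20%, drops to 0.89 from 30% to 50% once the rare `asv4` is removed, and falls to 0.44 from 60% upward, when only `asv1` remains.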
FIGURE 4. Spearman’s correlation (rho) between diversity scores from core microbiomes and scores from the original unfiltered data, colored by species dataset: (A) observed ASV richness; (B) Faith’s phylogenetic diversity; (C) Shannon index; (D) balance-weighted phylogenetic diversity (BWPD); (E) Jaccard index; (F) unweighted UniFrac; (G) Morisita; (H) weighted UniFrac. Negative values represent negative correlations; for ease of interpretation, a dashed line marks a correlation of 0.6. Circles represent significant correlations (p < 0.05), whilst squares represent non-significant correlations.
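Figure 4 compares each sample’s score after filtering with its unfiltered score using Spearman’s rank correlation. A minimal, dependency-free sketch is below: it ranks both sequences (tied values receive their average rank) and takes the Pearson correlation of those ranks. The function names and example numbers are hypothetical, and the p-values used to mark significance in the figure are not computed here.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of average ranks; NaN if either input is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return math.nan
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical per-sample richness before and after a prevalence filter.
    unfiltered = [120, 95, 143, 88, 130]
    core = [12, 10, 12, 9, 11]
    print(round(spearman_rho(unfiltered, core), 3))  # prints 0.821
```

The function returns NaN when every sample has the same score, which can happen when a strict filter leaves each sample with an identical core; a rank correlation is undefined in that case.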