Ryan J Newton¹, Sandra L McLellan¹, Deborah K Dila¹, Joseph H Vineis², Hilary G Morrison², A Murat Eren², Mitchell L Sogin³.
Abstract
Molecular characterizations of the gut microbiome from individual human stool samples have identified community patterns that correlate with age, disease, diet, and other human characteristics, but the resources required for marker gene studies of microbiome trends among human populations scale with the number of individuals sampled from each population. As an alternative strategy for sampling populations, we examined whether sewage accurately reflects the microbial community of a mixture of stool samples. We used oligotyping of high-throughput 16S rRNA gene sequence data to compare the bacterial distribution in a stool data set with that in a sewage influent data set from 71 U.S. cities. On average, only 15% of sewage sample sequence reads were attributed to human fecal origin, but sewage recaptured most (97%) human fecal oligotypes. The most common oligotypes in stool matched the most common and abundant oligotypes in sewage. After sequences of human fecal origin were informatically separated, sewage samples exhibited ~3× greater diversity than stool samples. Comparisons among municipal sewage communities revealed the ubiquitous and abundant occurrence of 27 human fecal oligotypes, representing an apparent core set of organisms in U.S. populations. The fecal community variability among U.S. populations was significantly lower than that among individuals. The population-level fecal communities clustered into three primary community structures distinguished by oligotypes from Bacteroidaceae, Prevotellaceae, or Lachnospiraceae/Ruminococcaceae. These distribution patterns reflected human population variation and predicted whether samples represented lean or obese populations with 81 to 89% accuracy. Our findings demonstrate that sewage represents the fecal microbial community of human populations and captures population-level traits of the human microbiome.

IMPORTANCE: The gut microbiota serves important functions in healthy humans. Numerous projects aim to define a healthy gut microbiome and its association with health states. However, financial considerations and privacy concerns limit the number of individuals who can be screened. By analyzing sewage from 71 cities, we demonstrate that geographically distributed U.S. populations share a small set of bacteria whose members represent various common community states within U.S. adults. Cities were differentiated by their sewage bacterial communities, and the community structures were good predictors of a city's estimated level of obesity. Our approach demonstrates the use of sewage as a means to sample the fecal microbiota from millions of people and its potential to elucidate microbiome patterns associated with human demographics.
Year: 2015 PMID: 25714718 PMCID: PMC4358014 DOI: 10.1128/mBio.02574-14
Source DB: PubMed Journal: mBio Impact factor: 7.867
Sequence data set statistics (values are % of total sequences in each sample type, except the total subsampled sequence count)

| Dataset | Human stool | Sewage |
|---|---|---|
| 15 most abundant bacterial families | 98.0 | 26.1 |
| 6 most abundant bacterial families | 90.8 | 21.6 |
| 6 most abundant bacterial families, following … | 78.3 | 18.7 |
| 6 most abundant families, oligotype data set, … | 78.3 | 11.7 |
| Total subsampled sequence count | 821,476 (…) | 11,302,794 (…) |
The most abundant bacterial families are ordered according to the mean number of sequences among human stool samples (n = 137).
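The percentages in the table can be illustrated with a short sketch: rank families by their mean sequence count across stool samples, keep the top N, and report what share of each pooled data set those families account for. This is a minimal illustration under stated assumptions, not the authors' pipeline: per-sample family counts are assumed to be plain dicts, the function names are hypothetical, and the counts are toy values rather than study data.

```python
from statistics import mean

# Hypothetical input: one dict per sample mapping bacterial family -> sequence count.

def rank_families_by_stool_mean(stool_samples):
    """Order families by mean sequence count across stool samples (highest first)."""
    families = {family for sample in stool_samples for family in sample}
    # A family absent from a sample contributes a count of 0 to that sample.
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(
        families,
        key=lambda f: (-mean(s.get(f, 0) for s in stool_samples), f),
    )

def percent_of_pooled(samples, families):
    """Percentage of all pooled sequences that belong to the given families."""
    total = sum(sum(sample.values()) for sample in samples)
    selected = sum(sample.get(f, 0) for sample in samples for f in families)
    return 100.0 * selected / total

# Toy counts for illustration only (not data from the study).
stool = [
    {"Bacteroidaceae": 50, "Ruminococcaceae": 30, "Other": 20},
    {"Bacteroidaceae": 10, "Ruminococcaceae": 60, "Other": 30},
]
sewage = [{"Bacteroidaceae": 20, "Ruminococcaceae": 5, "Other": 75}]

top2 = rank_families_by_stool_mean(stool)[:2]
stool_pct = percent_of_pooled(stool, top2)    # 75.0
sewage_pct = percent_of_pooled(sewage, top2)  # 25.0
```

Because the ranking is fixed by the stool samples, the same family list is applied to sewage, which is why the sewage column can be much lower than the stool column for the same row.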
FIG 1 (A) Bacterial family taxon assignments for the 15 most abundant bacterial families in the pooled human stool data set and the pooled sewage data set. The normalized sequence counts in the sewage data set represent the total proportion of sequences (98.0%) from the 15 families in the pooled stool data set. (B) Box plot depicting the relative abundance of oligotypes classified into the six most abundant bacterial families in the human stool data set for both the pooled human stool and pooled sewage data sets after removing non-fecal-matter-associated oligotypes. The normalized sequence count in the sewage data set represents the total proportion of sequences (78.3%) assigned to those oligotypes. Circles represent sample mean values, and the line vertices represent first and third quartiles. Note that for human stool Prevotellaceae, the first and third quartiles do not intersect the mean.
FIG 2 (A) Comparison of oligotype prevalence among the human stool samples (x axis) with the percentage of the oligotypes (y axis) that were also present at a specific prevalence level among the sewage samples (data series). Data are plotted as the percentage of oligotypes within a human stool prevalence category (e.g., 0 to 10%, 11 to 20%, etc.) that meet a specific prevalence requirement in sewage (1%, 10%, etc.). For example, 50% of human fecal oligotypes that were present in 71 to 80% of the stool samples were present in 100% of the sewage samples (see purple data series). (B) Comparison of Hill diversity values for multiple alpha parameters based on the human fecal oligotype community in the sewage data set versus the human stool data set. Higher alpha values place more weight on the most abundant organisms in the diversity calculation. Shannon and Simpson diversity indices are indicated on the plot. Sample mean diversity values are plotted on the x axis, and pooled diversity from all samples is plotted on the y axis. A one-to-one line is indicated, and lines connect equivalent alpha value results between the two data sets. For visualization, both axes are ordered from high to low diversity values.
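The Hill diversity values in panel B follow the standard family of Hill numbers, D = (Σ p_i^alpha)^(1/(1 - alpha)), where p_i is the relative abundance of oligotype i and alpha is the order (the caption's alpha). At alpha = 1 the formula is replaced by its limit, the exponential of Shannon entropy, and alpha = 2 gives the inverse Simpson index. The sketch below is a hedged illustration assuming each sample is a list of oligotype counts aligned on a shared oligotype index; `hill_number` and `mean_vs_pooled` are hypothetical names, and the counts are toy values.

```python
import math

def hill_number(counts, order):
    """Hill diversity of a given order (the caption's alpha) for abundance counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    if math.isclose(order, 1.0):
        # Order 1 is defined as the limit: the exponential of Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in props))
    # Order 0 gives richness; order 2 gives the inverse Simpson index.
    return sum(p ** order for p in props) ** (1.0 / (1.0 - order))

def mean_vs_pooled(samples, order):
    """Return (mean of per-sample diversity, diversity of the pooled counts).

    Each sample is a list of counts aligned on the same oligotype order.
    """
    per_sample = [hill_number(s, order) for s in samples]
    pooled = [sum(column) for column in zip(*samples)]
    return sum(per_sample) / len(per_sample), hill_number(pooled, order)

# Toy example: two samples, each dominated by a different oligotype.
samples = [[10, 0, 0], [0, 10, 0]]
results = {q: mean_vs_pooled(samples, q) for q in (0, 1, 2)}
```

In this toy case every sample has a diversity of 1 at all orders, while the pooled counts have a diversity of 2, illustrating why the sample-mean (x axis) and pooled (y axis) values in the plot can differ.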
FIG 3 Heat map comparing the human fecal oligotype compositions (Bray-Curtis similarity) among samples for the sewage and human stool data sets. Comparisons for the pooled human stool data set versus all individual samples (labeled as “pooled” on the plot) are shown in the space between the sewage and human stool samples.
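Bray-Curtis similarity between two abundance vectors x and y is 1 - Σ|x_i - y_i| / Σ(x_i + y_i). The sketch below shows how a heat map like this one could be filled, including the "pooled" comparisons, under the assumption that oligotype counts are converted to relative abundances first (the exact transformation used in the study is not stated here). All function names are hypothetical and the counts are illustrative only.

```python
def to_relative(counts):
    """Convert raw counts to relative abundances that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity (1 minus dissimilarity) between two abundance vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must cover the same oligotypes")
    denom = sum(a + b for a, b in zip(x, y))
    if denom == 0:
        raise ValueError("both samples are empty")
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / denom

def similarity_matrix(samples):
    """Pairwise similarities among samples after converting each to relative abundance."""
    rel = [to_relative(s) for s in samples]
    return [[bray_curtis_similarity(a, b) for b in rel] for a in rel]

def similarity_to_pooled(samples):
    """Similarity of each sample to the pooled (summed) profile of all samples."""
    pooled = to_relative([sum(column) for column in zip(*samples)])
    return [bray_curtis_similarity(to_relative(s), pooled) for s in samples]

# Toy oligotype counts for two stool samples (illustrative only).
stool = [[8, 2, 0], [2, 8, 0]]
matrix = similarity_matrix(stool)
vs_pooled = similarity_to_pooled(stool)
```

Multiplying the result by 100 gives similarity as a percentage, a convention some ecology packages use when reporting Bray-Curtis values.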
FIG 4 The primary axis scores of each sample from a constrained ordination of principal coordinates (CAP) for the temperature profile of each city are indicated via color coding on a U.S. map. Scores to the left on the ordination are plotted in shades of blue, and scores to the right on the ordination are plotted in shades of red. Samples ≥2 standard deviations from the primary axis origin are shown in the most saturated blue or red. Both the human fecal oligotype (bottom) and non-human-fecal oligotype (top) data sets were included in the CAP, and therefore, colors represent equivalent community variation related to the temperature profile of the cities for each data set. For comparison, sample periods are depicted in separate maps.
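The color coding described for FIG 4 can be sketched as a diverging scale that saturates at 2 standard deviations. This is a minimal sketch of the mapping only, not of the CAP ordination itself: it assumes the primary-axis scores are centered on the origin, that one standard deviation is computed over the human fecal and non-human-fecal samples together (so both maps share a scale), and that a white-to-blue and white-to-red ramp stands in for the published palette. `score_colors` is a hypothetical name and the scores are toy values.

```python
from statistics import pstdev

def score_colors(scores, max_sd=2.0):
    """Map primary-axis ordination scores to RGB colors (blue = left, red = right).

    Scores are assumed to be centered on the axis origin (0). Color intensity grows
    with distance from the origin in standard-deviation units and saturates at max_sd.
    """
    sd = pstdev(scores)
    colors = []
    for s in scores:
        # Distance from the origin in SD units, clipped to [-max_sd, max_sd].
        z = max(-max_sd, min(max_sd, s / sd)) if sd > 0 else 0.0
        t = z / max_sd  # rescaled to [-1, 1]
        if t < 0:
            rgb = (1 + t, 1 + t, 1.0)  # white fading to full blue
        else:
            rgb = (1.0, 1 - t, 1 - t)  # white fading to full red
        colors.append(tuple(round(255 * c) for c in rgb))
    return colors

# Hypothetical scores for human fecal and non-human-fecal samples combined,
# so one SD scale applies to both maps.
scores = [-100, 0, 0, 0, 0, 0, 0, 0, 0, 100]
colors = score_colors(scores)
```

Here the two extreme scores lie about 2.2 SD from the origin, so they are clipped to the fully saturated blue and red, matching the rule stated in the caption.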
Random forest classification statistics for the sewage bacterial community composition as a predictor of obesity levels in city populations
| Data set | Lean (no. correct/total no.) | Obese (no. correct/total no.) | Accuracy (% correct) |
|---|---|---|---|
| All samples, … | 44/54 | 45/54 | 82 |
| All samples, SD | 26/38 | 44/46 | 83 |
| Cities, … | 16/21 | 18/21 | 81 |
| Cities, SD | 14/17 | 17/18 | 89 |
City populations were classified as “lean” or “obese” based on the estimated percentage of obese people in each city.
All samples for a city were included in the model and classified separately.
Samples in the first (lean) and fourth (obese) quartiles of the distribution of city obesity percentages were included in the random forest classification model. A "lean city" versus "obese city" designation corresponds to populations with ≤22.8% obesity versus ≥30.4% obesity, respectively.
Samples >1 standard deviation from the mean city obesity percentage were included in the random forest classification model. A city was considered lean if ≤21.5% of its population was obese and obese if ≥31.3% of its population was obese.
Average bacterial community composition in all samples for a city.
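The class assignments in the footnotes and the accuracy column can be reproduced with simple arithmetic: accuracy is the number of correctly classified lean and obese samples divided by the total number classified. The sketch below covers only the labeling rules and that arithmetic, not the random forest model itself. The quartile method (Python's default `statistics.quantiles`), the use of the sample standard deviation, and the toy obesity percentages are assumptions; the function names are hypothetical.

```python
from statistics import mean, quantiles, stdev

def label_by_quartile(obesity_pct):
    """'lean' for the first quartile, 'obese' for the fourth, None otherwise."""
    q1, _, q3 = quantiles(obesity_pct, n=4)  # quartile method is an assumption
    return ["lean" if x <= q1 else "obese" if x >= q3 else None for x in obesity_pct]

def label_by_sd(obesity_pct, k=1.0):
    """'lean' / 'obese' for cities more than k sample SDs below / above the mean."""
    m, sd = mean(obesity_pct), stdev(obesity_pct)
    return ["lean" if x < m - k * sd else "obese" if x > m + k * sd else None
            for x in obesity_pct]

def accuracy(correct_lean, total_lean, correct_obese, total_obese):
    """Overall percentage of lean and obese samples classified correctly."""
    return 100.0 * (correct_lean + correct_obese) / (total_lean + total_obese)

# Correct/total counts for lean and obese, in the order of the table rows.
table_counts = [(44, 54, 45, 54), (26, 38, 44, 46), (16, 21, 18, 21), (14, 17, 17, 18)]
accuracies = [round(accuracy(*row)) for row in table_counts]  # [82, 83, 81, 89]

# Hypothetical city obesity percentages, for illustrating the labeling rules only.
toy = [20, 22, 25, 27, 30, 33]
quartile_labels = label_by_quartile(toy)
sd_labels = label_by_sd(toy)
```

Recomputing accuracy from the table's counts gives 82, 83, 81, and 89%, consistent with the reported values; cities labeled None fall between the thresholds and are left out of the model.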