Lin Zhang, Likai Chen, Xiaoqian Annie Yu, Claire Duvallet, Siavash Isazadeh, Chengzhen Dai, Shinkyu Park, Katya Frois-Moniz, Fabio Duarte, Carlo Ratti, Eric J Alm, Fangqiong Ling.
Abstract
The metagenome embedded in urban sewage is an attractive new data source for understanding urban ecology and assessing human health status at scales beyond a single host. Analyzing the viral fraction of wastewater during the ongoing COVID-19 pandemic has shown the potential of wastewater as an aggregated sample for early detection, prevalence monitoring, and variant identification of human diseases in large populations. However, using census-based population size instead of real-time population estimates can mislead the interpretation of data acquired from sewage, hindering assessment of representativeness, inference of prevalence, or comparisons of taxa across sites. Here, we show that taxon abundance and sub-species diversity in gut-associated microbiomes offer a new feature space for human population estimation. Using a population-scale human gut microbiome sample of over 1,100 people, we found that taxon-abundance distributions of gut-associated multi-person microbiomes exhibited generalizable relationships with respect to human population size. Here and throughout this paper, the human population size refers to the number of people contributing to a wastewater sample. We present a new algorithm, MicrobiomeCensus, for estimating human population size from sewage samples. MicrobiomeCensus harnesses the inter-individual variability in human gut microbiomes and performs maximum likelihood estimation based on the simultaneous deviation of multiple taxa's relative abundances from their population means. MicrobiomeCensus outperformed generic algorithms in data-driven simulation benchmarks and detected population size differences in field data. New theorems are provided to justify our approach. This research provides a mathematical framework for inferring population sizes in real time from sewage samples, paving the way for more accurate ecological and public health studies utilizing the sewage metagenome.
Year: 2022 PMID: 36149894 PMCID: PMC9534451 DOI: 10.1371/journal.pcbi.1010472
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.779
Fig 1. An ideal sewage mixture simulation shows the potential of microbiome taxon abundance profiles as population census information sources.
(A) We generated an “ideal sewage mixture” consisting of gut microbiomes from different numbers of people. (B) Rank abundance curves for gut microbiomes of one person and mixtures of multiple people exhibit different levels of dominance and diversity. Blue lines show the rank abundance curves in stool samples (one person), red lines show 10-person mixtures, and saffron lines show 100-person mixtures. In each scenario, ten examples are shown. All samples were rarefied to the same sequencing depth (4,000 seqs/sample). (C) The probability density function of the relative abundance of one taxon for different population sizes. OTU-2379, a Bifidobacterium taxon, was used as an example. Maroon dashed lines indicate the sample means. (D) Abundance variances of multiple taxa in one-person samples and 100-person samples. The dominant taxa (top 100) are shown, sorted by their ranks in variance. (E) The ratios of the variances of one-person samples to those of 100-person samples across dominant gut microbial taxa.
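The idea behind panels A, D, and E can be illustrated with a minimal sketch. It assumes an "ideal sewage mixture" is the equal-weight average of individual relative-abundance profiles, and it uses a hypothetical Dirichlet-generated cohort as a stand-in for real stool data. The function names, cohort size, and Dirichlet parameters are illustrative assumptions and are not taken from the paper; only the 4,000-read rarefaction depth comes from the caption.

```python
import numpy as np

def simulate_mixture(profiles, n_people, rng):
    """Average the relative-abundance profiles of n_people distinct random subjects."""
    idx = rng.choice(profiles.shape[0], size=n_people, replace=False)
    return profiles[idx].mean(axis=0)

def rarefy(rel_abund, depth, rng):
    """Subsample a profile to a fixed read depth with a multinomial draw."""
    return rng.multinomial(depth, rel_abund) / depth

def taxon_variances(profiles, n_people, rng, n_repeats=2000):
    """Per-taxon variance of relative abundance across repeated simulated mixtures."""
    mixes = np.array([simulate_mixture(profiles, n_people, rng) for _ in range(n_repeats)])
    return mixes.var(axis=0)

rng = np.random.default_rng(0)
# Hypothetical cohort: 1,000 subjects x 50 taxa (synthetic, not real microbiome data).
profiles = rng.dirichlet(np.full(50, 0.3), size=1000)

# One rarefied 10-person mixture at the depth quoted in the caption.
rarefied = rarefy(simulate_mixture(profiles, 10, rng), 4000, rng)

# Variance ratio between one-person samples and 100-person mixtures (cf. panel E).
ratio = taxon_variances(profiles, 1, rng) / taxon_variances(profiles, 100, rng)
print(np.median(ratio))
```

Under these assumptions the per-taxon variance shrinks roughly in proportion to the number of people mixed, so the median ratio lands near 100, pushed slightly higher because subjects are drawn without replacement from a finite cohort.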
Fig 2. Classifier performance of models utilizing gut microbiome taxon abundances.
Fig 3. MicrobiomeCensus statistic definition, model training, validation, and application.
(A) Example of computing the T statistic. (B) Simulation results for T with different population sizes. Grey points are simulation results. Red bars are means of 10,000 repeats performed for each population size. (C) Model training and tuning. We built the MicrobiomeCensus model using our T statistic and a maximum likelihood procedure. The training set consisted of 10,000 samples for population sizes ranging from 1–300, and 50% of the data were used to train and validate the model. Training and validation errors from different feature subsets are shown. Training errors are shown as blue lines, and validation errors are shown as red lines. (D) Model performance on simulation benchmark. After training and validation, the model utilized the top 120 abundant features. Model performance was tested on synthetic data generated from 550 different subjects not previously seen by the model. The training set consisted of 10,000 samples with population sizes from 1–300, and the testing set consisted of 10,000 repeats at the evaluated population sizes. The training error, testing error, and the error of the final model are shown. (E) Model performance evaluated using a testing set. Black solid dots indicate the means of the predicted values, and error bars indicate the standard deviations of the predicted values. (F) Application of the microbiome population model in sewage. Seventy-six composite samples (blue) were taken from three manholes on the MIT campus, and each sample was taken over 3 hours during the morning peak water usage hours. Twenty-five snapshot samples (grey) were taken using a peristaltic pump for 5 minutes at 1-hour intervals throughout a day.
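The definition of the T statistic is not reproduced in this record, so the following is only a simplified stand-in for the general idea of maximum likelihood estimation from multi-taxon deviations. It assumes that each taxon's relative abundance in an N-person mixture is independently Normal with mean mu_j and variance sigma2_j / N, which gives a closed-form estimate. The function names and the synthetic Dirichlet data are hypothetical, and this is not the MicrobiomeCensus implementation.

```python
import numpy as np

def fit_reference(profiles):
    """Per-taxon mean and variance across single-person profiles."""
    return profiles.mean(axis=0), profiles.var(axis=0, ddof=1)

def estimate_population_size(x, mu, sigma2):
    """Closed-form MLE of N assuming x_j ~ Normal(mu_j, sigma2_j / N), taxa independent.

    log L(N) = (p / 2) * log(N) - (N / 2) * S + const,
    where S = sum_j (x_j - mu_j)^2 / sigma2_j and p is the number of taxa.
    Setting the derivative with respect to N to zero gives N_hat = p / S.
    """
    s = np.sum((x - mu) ** 2 / sigma2)
    return len(mu) / s

rng = np.random.default_rng(1)
alpha = np.full(50, 0.3)  # hypothetical Dirichlet parameters, not fitted to real data
mu, sigma2 = fit_reference(rng.dirichlet(alpha, size=2000))

def median_estimate(n_people, n_repeats=500):
    """Median estimated population size over repeated synthetic mixtures."""
    estimates = [
        estimate_population_size(rng.dirichlet(alpha, size=n_people).mean(axis=0), mu, sigma2)
        for _ in range(n_repeats)
    ]
    return float(np.median(estimates))

med_10 = median_estimate(10)
med_100 = median_estimate(100)
print(med_10, med_100)
```

The sketch shows why larger contributing populations are detectable: the summed standardized squared deviation S shrinks as more people are mixed, so the estimate p / S grows. A single-sample estimate is noisy, which is why the example reports medians over repeats.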
Fig 4. Sub-species diversity in gut-associated bacterial species as a potential marker for human population size.
(A-F) Comparison of sub-species diversity of gut-associated bacteria in human gut microbiome samples (LifelinesDeep) and MIT sewage samples. Nucleotide diversity and numbers of polymorphic sites were computed from ten phylogenetic marker genes. (G) and (H) Simulation results showing intra-species diversity in response to increasing population size, as represented by the number of polymorphic sites (G) and nucleotide diversity (H).
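The two diversity measures named in the caption can be illustrated on a toy alignment. This sketch uses the classical definitions (number of alignment columns with more than one allele, and mean pairwise differences per site) applied to equal-length aligned sequences. The paper's actual pipeline for metagenomic reads is not described in this record, and the function names and example sequences are hypothetical.

```python
from itertools import combinations

def polymorphic_sites(seqs):
    """Count alignment columns that carry more than one distinct nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site across all sequence pairs."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

# Toy alignment of one marker gene from three hypothetical strains.
alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGA"]
n_poly = polymorphic_sites(alignment)
pi = nucleotide_diversity(alignment)
print(n_poly, round(pi, 4))  # 2 0.1667
```

Pooling strains from more hosts adds variants to the alignment, so both measures are expected to rise with the number of contributors, which is the trend panels G and H describe.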