Xiaoquan Su1,2,3, Gongchao Jing4,2,3, Daniel McDonald5, Honglei Wang4,2,3, Zengbin Wang4,2,3, Antonio Gonzalez5, Zheng Sun4,2,3, Shi Huang4,2,3, Jose Navas6, Rob Knight7,6,8,9, Jian Xu1,2,3.
Abstract
With the expansion of microbiome sequencing globally, a key challenge is to relate new microbiome samples to the existing space of microbiome samples. Here, we present Microbiome Search Engine (MSE), which enables the rapid search of query microbiome samples against a large, well-curated reference microbiome database organized by taxonomic similarity at the whole-microbiome level. Tracking the microbiome novelty score (MNS) over 8 years of microbiome depositions based on searching in more than 100,000 global 16S rRNA gene amplicon samples, we detected that the structural novelty of human microbiomes is approaching saturation and likely bounded, whereas that in environmental habitats remains 5 times higher. Via the microbiome focus index (MFI), which is derived from the MNS and microbiome attention score (MAS), we objectively track and compare the structural-novelty and attracted-attention scores of individual microbiome samples and projects, and we predict future trends in the field. For example, marine and indoor environments and mother-baby interactions are likely to receive disproportionate additional attention based on recent trends. Therefore, MNS, MAS, and MFI are proposed "alt-metrics" for evaluating a microbiome project or prospective developments in the microbiome field, both of which are done in the context of existing microbiome big data.
IMPORTANCE We introduce two concepts to quantify the novelty of a microbiome. The first, the microbiome novelty score (MNS), allows identification of microbiomes that are especially different from what is already sequenced. The second, the microbiome attention score (MAS), allows identification of microbiomes that have many close neighbors, implying that considerable scientific attention is devoted to their study. By computing a microbiome focus index based on the MNS and MAS, we objectively track and compare the novelty and attention scores of individual microbiome samples and projects over time and predict future trends in the field; i.e., we work toward yielding fundamentally new microbiomes rather than filling in the details. Therefore, MNS, MAS, and MFI can serve as "alt-metrics" for evaluating a microbiome project or prospective developments in the microbiome field, both of which are done in the context of existing microbiome big data.
Keywords: bioinformatics; community similarity; data mining; database search; microbial ecology; microbiome; microbiome novelty; novelty; search
Year: 2018 PMID: 30425147 PMCID: PMC6234870 DOI: 10.1128/mBio.02099-18
Source DB: PubMed Journal: MBio Impact factor: 7.867
FIG 1 Historical trend of microbiome novelty scores. (A) The MNSs of samples from 2010 to 2017 followed a normal distribution. In each subpanel, the bar chart represents the frequencies of samples and the curve is the simulated standard normal distribution. (B) Yearly accumulative curves of the total numbers of samples and novel samples. From 2010 to 2017, 15,501 samples were identified as novel microbiomes with an MNS of ≥0.15. (C) Yearly accumulative curves of sample numbers for human samples and nonhuman samples. (D) Yearly development of novel sample ratios (defined as the number of novel samples over the number of total samples) in each category. Thick dotted lines represent the ratios of novel samples in high-level categories (human, animal, and natural environments), while thin dotted lines are those in subcategories. (E) Slopes of linear fits to the novel sample ratio increases in each category. The color schemes are the same for panels D and E.
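The novel sample ratio and its fitted slope (panels D and E) can be illustrated with a minimal sketch. The function names and the toy counts below are assumptions made for illustration and are not data from the study; the caption does not state whether yearly or cumulative counts feed the ratio, and the sketch works with either.

```python
def novel_ratio(novel_count, total_count):
    """Novel sample ratio: number of novel samples over number of total samples."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return novel_count / total_count


def fit_slope(xs, ys):
    """Slope of an ordinary least-squares line fitted to ys against xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical (year, novel, total) counts for one category; NOT from the paper.
toy_counts = [(2010, 10, 100), (2011, 30, 200), (2012, 60, 300)]
years = [year for year, _, _ in toy_counts]
ratios = [novel_ratio(novel, total) for _, novel, total in toy_counts]
slope = fit_slope(years, ratios)  # change in novel ratio per year
```

A positive slope means the share of novel samples in that category is still growing; a slope near zero would be consistent with the saturation described for human microbiomes.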
FIG 2 Lack of correlation between the MNSs and Shannon indexes of alpha diversities at both the phylum level (A) and the genus level (B).
FIG 3 Microbiome attention scores of known microbiome samples. (A) The MAS threshold of 14 is determined based on the top 20% of MAS samples. (B) Distribution of samples by MNS (x axis) and MAS (y axis). With an MNS cutoff of ≥0.15 (novel samples) and an MAS cutoff of ≥14 (high-attention samples), a total of 2,238 microbiomes were identified as focus samples (dots under the shadows).
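The two cutoffs in this caption amount to a simple classification rule, sketched below. The cutoff values come from the caption; the function name, the label strings, and the toy (MNS, MAS) pairs are assumptions for illustration. How MNS and MAS themselves are computed is not covered here.

```python
MNS_CUTOFF = 0.15  # novel samples: MNS >= 0.15 (from the FIG 3 caption)
MAS_CUTOFF = 14    # high-attention samples: MAS >= 14 (from the FIG 3 caption)


def classify(mns, mas):
    """Label a sample by applying the two cutoffs; labels are illustrative."""
    novel = mns >= MNS_CUTOFF
    high_attention = mas >= MAS_CUTOFF
    if novel and high_attention:
        return "focus"
    if novel:
        return "novel"
    if high_attention:
        return "high-attention"
    return "other"


# Hypothetical (MNS, MAS) pairs; NOT data from the paper.
toy_samples = {
    "s1": (0.20, 30),
    "s2": (0.20, 3),
    "s3": (0.05, 30),
    "s4": (0.05, 3),
}
labels = {name: classify(mns, mas) for name, (mns, mas) in toy_samples.items()}
```

Only samples that pass both cutoffs count as focus samples, which is why the focus set is a subset of the novel set.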
FIG 4 Prediction of sleeping beauty (potential focus) microbiomes. (A) Numbers of focus microbiomes (beauties) that were awakened in the nth year after their birth; (B) principal-component analysis of 4-year MASs between beauty samples and still-asleep samples with a random-forest accuracy of 98.78%; (C) habitats of awakened beauties during 2010 to 2017 and of predicted sleeping beauties born since 2015.
Habitats of focus samples, or beauties, during 2010 to 2017 and of predicted potential focus samples, or sleeping beauties, that were born since 2015
| Environment | No. of focus samples | No. of predicted focus samples |
|---|---|---|
| Lake | 499 | 0 |
| Animal | 498 | 137 |
| Marine | 343 | 358 |
| Soil | 325 | 8 |
| Human | 212 | 23 |
| Building | 122 | 141 |
| River | 88 | 0 |
| Freshwater | 39 | 34 |
| Plant | 13 | 1 |
| Other | 99 | 0 |
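As a quick consistency check, the table can be totaled and the predicted focus samples ranked by habitat. The counts below are copied from the table; the variable names are illustrative.

```python
# (focus samples, predicted focus samples) per habitat, copied from the table.
habitat_counts = {
    "Lake": (499, 0),
    "Animal": (498, 137),
    "Marine": (343, 358),
    "Soil": (325, 8),
    "Human": (212, 23),
    "Building": (122, 141),
    "River": (88, 0),
    "Freshwater": (39, 34),
    "Plant": (13, 1),
    "Other": (99, 0),
}

total_focus = sum(focus for focus, _ in habitat_counts.values())
total_predicted = sum(pred for _, pred in habitat_counts.values())

# Share of all predicted focus samples that falls in each habitat.
predicted_share = {
    habitat: pred / total_predicted
    for habitat, (_, pred) in habitat_counts.items()
}
top_two = sorted(predicted_share, key=predicted_share.get, reverse=True)[:2]

print(total_focus, total_predicted, top_two)
# prints: 2238 702 ['Marine', 'Building']
```

The focus column sums to 2,238, matching the number of focus samples given in the FIG 3 caption. Marine and building habitats together account for roughly 71% of the 702 predicted focus samples, in line with the abstract's expectation of growing attention to marine and indoor environments.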