Feargal Joseph Ryan
Abstract
BACKGROUND: Research has found that human-associated microbial communities play a role in homeostasis, and that disruption of these communities may be important in an array of medical conditions. However, outside the human body many of these communities remain poorly studied. The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium is characterizing the microbiomes of urban environments with the aim of improving the design of mass transit systems. As part of the CAMDA 2018 MetaSUB Forensics Challenge, 311 city microbiome samples were provided to create urban microbial fingerprints, along with a further 3 mystery datasets for validation.
Keywords: Bioinformatics; Machine learning; Microbiome; Microbiota; Public health; Urban
Year: 2019 PMID: 31420049 PMCID: PMC6697990 DOI: 10.1186/s13062-019-0245-x
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Description of MetaSUB challenge dataset
| City | Country | Number of samples |
|---|---|---|
| Auckland | New Zealand | 15 |
| Hamilton | New Zealand | 16 |
| New York | U.S.A. | 126 |
| Ofa | Nigeria | 20 |
| Porto | Portugal | 60 |
| Sacramento | U.S.A. | 34 |
| Santiago | Chile | 20 |
| Tokyo | Japan | 20 |
Fig. 1 Barplots of relative abundance for domains of life per city in the MetaSUB challenge dataset
Fig. 2 Boxplots of relative abundance of the most abundant taxa in the primary CAMDA dataset of 311 samples. Relative abundance of (a) Acinetobacter, (b) Pseudomonas, (c) Stenotrophomonas and (d) Actinobacteria. Kruskal-Wallis P values are shown on each plot
Fig. 3 t-SNE output representing microbial profiles in two dimensions. Spearman dissimilarities were calculated from a set of 2239 taxonomic features, namely those present in at least 5% of samples with a minimum relative abundance of 0.1% in a single sample. The 70% confidence regions show surface type
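The feature filter and dissimilarity measure named in the t-SNE caption can be sketched in a few lines. This is an illustrative reading only, not the author's pipeline: it assumes that "present" means a non-zero relative abundance, that the 0.1% threshold applies to a taxon's highest value across samples, and that the Spearman dissimilarity is one minus the Spearman rank correlation between two sample profiles. The function names `filter_taxa`, `average_ranks` and `spearman_dissimilarity` are hypothetical.

```python
from math import sqrt


def filter_taxa(abund, min_prevalence=0.05, min_peak=0.001):
    """Keep taxa seen in >= min_prevalence of samples whose highest
    relative abundance reaches min_peak (fractions, so 0.001 == 0.1%).

    abund maps taxon name -> list of relative abundances, one per sample.
    The interpretation of both thresholds is an assumption.
    """
    kept = {}
    for taxon, values in abund.items():
        prevalence = sum(v > 0 for v in values) / len(values)
        if prevalence >= min_prevalence and max(values) >= min_peak:
            kept[taxon] = values
    return kept


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_dissimilarity(x, y):
    """Return 1 - Spearman rho for two equally long sample profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return 1 - cov / (sx * sy)
```

Under these assumptions the dissimilarity ranges from 0 (identical rank order) to 2 (exactly reversed rank order), and the resulting sample-by-sample matrix is the kind of input a t-SNE implementation that accepts precomputed distances would take.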
Confusion matrix showing number of correct and incorrect classifications per city from random forest analysis
| | Auckland | Hamilton | NY | Ofa | Porto | Sacramento | Santiago | Tokyo | class.error |
|---|---|---|---|---|---|---|---|---|---|
| Auckland | 9 | 5 | 0 | 0 | 1 | 0 | 0 | 0 | 0.4 |
| Hamilton | 4 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0.25 |
| NY | 0 | 0 | 124 | 1 | 0 | 1 | 0 | 0 | 0.01587302 |
| Ofa | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 |
| Porto | 0 | 0 | 0 | 0 | 60 | 0 | 0 | 0 | 0 |
| Sacramento | 0 | 0 | 0 | 0 | 1 | 33 | 0 | 0 | 0.02941176 |
| Santiago | 1 | 0 | 1 | 0 | 0 | 0 | 18 | 0 | 0.1 |
| Tokyo | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 |
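The class.error column can be reproduced from the counts in the table. The sketch below assumes that rows are the true city and columns the predicted city, which is consistent with the row totals matching the per-city sample counts in the dataset description. The helper names `class_errors` and `overall_error` are hypothetical.

```python
# Counts copied from the confusion matrix above.
# Assumption: rows are the true city, columns the predicted city.
cities = ["Auckland", "Hamilton", "NY", "Ofa",
          "Porto", "Sacramento", "Santiago", "Tokyo"]
confusion = [
    [9, 5, 0, 0, 1, 0, 0, 0],
    [4, 12, 0, 0, 0, 0, 0, 0],
    [0, 0, 124, 1, 0, 1, 0, 0],
    [0, 0, 0, 20, 0, 0, 0, 0],
    [0, 0, 0, 0, 60, 0, 0, 0],
    [0, 0, 0, 0, 1, 33, 0, 0],
    [1, 0, 1, 0, 0, 0, 18, 0],
    [0, 0, 0, 0, 0, 0, 0, 20],
]


def class_errors(matrix):
    """Per-class error: share of each row that falls off the diagonal."""
    return [1 - row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_error(matrix):
    """Share of all samples that fall off the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(row[i] for i, row in enumerate(matrix))
    return 1 - correct / total


if __name__ == "__main__":
    for city, err in zip(cities, class_errors(confusion)):
        print(f"{city}: {err:.4f}")
    print(f"overall: {overall_error(confusion):.4f}")
```

Read this way, 296 of the 311 samples sit on the diagonal, so 15 are misclassified overall, and 9 of those 15 are confusions between Auckland and Hamilton.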
Fig. 4 t-SNE output representing microbial profiles in two dimensions. Spearman dissimilarities were calculated from a set of 2463 taxonomic features, namely those present in at least 5% of samples with a minimum relative abundance of 0.1% in a single sample. This includes the “mystery” samples, which were initially unlabeled in the MetaSUB challenge. The 70% confidence regions show surface type. Samples labelled NY are those marked as New York for which no information was provided on which sample set (csd2016 or pilot) they belonged to