Alejandro R Walker, Tyler L Grimes, Somnath Datta, Susmita Datta.
Abstract
BACKGROUND: Microbial communities can be location specific, and the abundance of species within locations can influence our ability to determine whether a sample belongs to one city or another. As part of the 2017 CAMDA MetaSUB Inter-City Challenge, next generation sequencing (NGS) data was generated from swipe samples collected from subway stations in Boston, New York City (hereafter New York), and Sacramento. DNA was extracted and Illumina sequenced. Sequencing data was provided for all cities as part of the 2017 CAMDA contest challenge dataset.
Keywords: Bacterial 16S gene; Classifier; Machine learning; Microbiome; Network analysis; PCA
Year: 2018 PMID: 29789016 PMCID: PMC5964687 DOI: 10.1186/s13062-018-0215-8
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Sample counts per city (effective samples analyzed and total samples) and the resulting number of common entries for each of the taxonomic ranks included in this work. The Order, Family, and Genus counts are the entries common to all three cities, so they are the same in every row
| City | Effective samples | Total samples | Order | Family | Genus |
|---|---|---|---|---|---|
| New York | 777 | 1572 | 19 | 23 | 10 |
| Boston | 134 | 141 | 19 | 23 | 10 |
| Sacramento | 18 | 18 | 19 | 23 | 10 |
Fig. 1 Area-proportional Venn diagrams of discovered entries across all three taxonomic ranks. Panels a), b), and c) show the counts for taxonomic ranks “order”, “family”, and “genus”, respectively. The three-city intersection gives the count of common variables used for most of the analyses in this work. The total count for each city represents the effective number of species (S)
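The "common variables" in the caption are the taxa found in all three cities, which amounts to a three-way set intersection. The sketch below illustrates the idea only: the per-city sets are made-up examples (the names are real orders, but which city contains which is hypothetical), not the study's data.

```python
# Hypothetical per-city sets of detected orders (illustrative only).
boston = {"Bacillales", "Clostridiales", "Rhizobiales", "Vibrionales"}
new_york = {"Bacillales", "Clostridiales", "Rhizobiales", "Pasteurellales"}
sacramento = {"Bacillales", "Clostridiales", "Rickettsiales"}

# Taxa present in every city: the centre of the Venn diagram.
common = boston & new_york & sacramento

# Taxa seen in only one city: the outer, non-overlapping regions.
boston_only = boston - new_york - sacramento

print(sorted(common))
print(sorted(boston_only))
```

Only the taxa in `common` can serve as shared predictors, because a classifier needs the same variables measured in every city.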
Total amount of variance explained by principal components 1-3 for all three taxonomic ranks (“order”, “family”, and “genus”)
| Rank | PC1 | PC2 | PC3 | Total |
|---|---|---|---|---|
| order | 69.8% | 7.3% | 6.7% | 83.8% |
| family | 78.7% | 5.2% | 4.7% | 88.6% |
| genus | 86.8% | 4.7% | 3.1% | 94.6% |
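The percentages in this table are each component's share of the total variance, and the Total column is the sum of the first three shares. The following minimal sketch shows that arithmetic. The helper `explained_variance_percent` and its eigenvalue input are illustrative assumptions; only the percentages in `table` come from the table above.

```python
def explained_variance_percent(eigenvalues):
    """Convert PCA eigenvalues (component variances) to percent of total variance."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]

# Percent variance for PC1, PC2, PC3, copied from the table above.
table = {
    "order": (69.8, 7.3, 6.7),
    "family": (78.7, 5.2, 4.7),
    "genus": (86.8, 4.7, 3.1),
}

# Cumulative variance explained by the first three components per rank.
totals = {rank: round(sum(pcs), 1) for rank, pcs in table.items()}

# Hypothetical eigenvalues, only to show how a share is derived.
shares = explained_variance_percent([8.0, 1.0, 1.0])

print(totals)
print(shares)
```

The recomputed totals agree with the Total column (83.8%, 88.6%, 94.6%).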
Fig. 2 PCA bi-plots of principal components 1 and 2 are presented in a1, b1, and c1 for taxonomic ranks “order”, “family”, and “genus”, respectively. Three-dimensional plots of the first three components are presented in a2, b2, and c2 for taxonomic ranks “order”, “family”, and “genus”, respectively. Colors are: orange for Boston, green for New York, and blue for Sacramento
Random forest classification error of city across all taxonomic ranks “order”, “family”, and “genus”
| City | Order | Family | Genus |
|---|---|---|---|
| Number of predictors | 19 | 23 | 10 |
| Classification error | | | |
| Boston | 11% | 18% | 34% |
| New York | 1% | 1% | 2% |
| Sacramento | 6% | 0% | 11% |
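A per-city classification error of this kind is typically the fraction of samples from that city that the classifier assigns to another city. The sketch below computes it from a confusion matrix. The counts in `toy` are invented for illustration and are not the study's results; `per_class_error` is a hypothetical helper, not the authors' code.

```python
def per_class_error(confusion):
    """confusion[true_city][predicted_city] -> count.

    Returns, for each true city, the fraction of its samples that were
    assigned to a different city.
    """
    errors = {}
    for true_label, row in confusion.items():
        total = sum(row.values())
        correct = row.get(true_label, 0)
        errors[true_label] = (total - correct) / total if total else float("nan")
    return errors

# Made-up confusion matrix (rows: true city, columns: predicted city).
toy = {
    "Boston": {"Boston": 8, "New York": 2, "Sacramento": 0},
    "New York": {"Boston": 1, "New York": 19, "Sacramento": 0},
    "Sacramento": {"Boston": 0, "New York": 1, "Sacramento": 4},
}

errors = per_class_error(toy)
for city, err in errors.items():
    print(f"{city}: {err:.0%}")
```

Because the error is computed per true class, a large class such as New York can show a low error even when smaller classes are misclassified more often.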
Fig. 3 Variable importance for the random forest classifier, as determined by the mean decrease in accuracy. a), b), and c) are importance plots for taxonomic ranks “order”, “family”, and “genus”, respectively
Fig. 4 Ensemble results, in terms of Accuracy, Sensitivity, Specificity, and AUC for each taxonomic rank. a), b), and c) correspond to taxonomic ranks “order”, “family”, and “genus”, respectively. Each individual plot shows pairwise classification results for comparisons of Boston – New York, Boston – Sacramento, and New York – Sacramento
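For readers unfamiliar with the four measures named in the caption, the following sketch computes them for one pairwise (two-city) comparison using their standard definitions. The labels and scores are made up, and the function names are illustrative; this is not the ensemble classifier itself.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels (1 = positive city)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = first city of the pair, 0 = second city.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
area = auc(y_true, scores)
print(metrics, area)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes the ranking of scores over all thresholds.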
ANOVA results for taxonomic rank “order”. Over 5000 replications, the significant p-values (α = 0.01) from Tukey’s multiple comparison test were averaged and counted for each Tukey’s group pattern (Boston-New York-Sacramento). When two cities are compared, identical letters (‘a’, ‘b’, ‘c’) mean that their means are not significantly different, and different letters mean that the city means differ significantly in bacterial abundance. For example, for the order Enterobacteriales the table shows the minimum, average, and maximum p-value across the 5000 replications: in 4967 replications all three city means were significantly different (‘a’-‘b’-‘c’); in 30, Boston and New York had the same mean abundance while Sacramento differed (‘a’-‘a’-‘b’); and in only 3, Boston and Sacramento were the same while New York differed (‘a’-‘b’-‘a’), as judged by Tukey’s multiple comparison test. Orders are listed in the same sequence within every group (‘a’-‘b’-‘c’, ‘a’-‘a’-‘b’, ‘a’-‘b’-‘b’, ‘a’-‘b’-‘a’)
| Order | p-value [min, mean, max] | Count | Tukey’s group |
|---|---|---|---|
| Enterobacteriales | [8.45E-17, 3.08E-8, 1.46E-5] | 4967 | a-b-c |
| Clostridiales | [1.75E-30, 2.42E-19, 3.13E-16] | 4802 | a-b-c |
| Rhizobiales | [5.18E-16, 3.37E-6, 3.25E-4] | 4693 | a-b-c |
| Pseudomonadales | [5.37E-15, 6.63E-6, 1.27E-3] | 4432 | a-b-c |
| Xanthomonadales | [1.14E-16, 2.01E-11, 3.38E-9] | 3134 | a-b-c |
| Flavobacteriales | [1.24E-20, 1.46E-14, 7.18E-12] | 1939 | a-b-c |
| Vibrionales | [4.51E-19, 2.3E-13, 1.89E-11] | 1939 | a-b-c |
| Sphingobacteriales | [8.56E-16, 1.41E-11, 1.7E-9] | 1889 | a-b-c |
| Bacillales | [5.03E-19, 9.78E-14, 1.99E-11] | 1562 | a-b-c |
| Alteromonadales | [8.15E-23, 1.22E-16, 4.13E-14] | 1478 | a-b-c |
| Bacteroidales | [1.09E-26, 5.25E-18, 4.08E-15] | 1474 | a-b-c |
| Rhodobacterales | [9.85E-8, 2.91E-4, 4.53E-3] | 1332 | a-b-c |
| Pasteurellales | [2.74E-18, 1.04E-12, 2.15E-10] | 1187 | a-b-c |
| Rickettsiales | [4.03E-9, 7.4E-4, 5.75E-3] | 683 | a-b-c |
| Lactobacillales | [4.54E-27, 2.6E-17, 5.45E-15] | 667 | a-b-c |
| Actinomycetales | [1.9E-7, 8.94E-4, 4.56E-3] | 157 | a-b-c |
| Burkholderiales | [6.92E-8, 1.26E-3, 5.4E-3] | 95 | a-b-c |
| Sphingomonadales | [1.76E-8, 9.08E-4, 3.62E-3] | 43 | a-b-c |
| Rhodospirillales | [2.34E-6, 6.66E-4, 2.78E-3] | 24 | a-b-c |
| Enterobacteriales | [6.64E-11, 6.3E-7, 8.58E-6] | 30 | a-a-b |
| Clostridiales | [2.6E-24, 6.04E-19, 3.07E-17] | 198 | a-a-b |
| Rhizobiales | [3.29E-8, 8.12E-5, 1.8E-3] | 288 | a-a-b |
| Pseudomonadales | [2.58E-8, 9.45E-5, 2.22E-3] | 187 | a-a-b |
| Xanthomonadales | [6.43E-17, 4.36E-11, 4.84E-9] | 1866 | a-a-b |
| Flavobacteriales | [4.06E-21, 4.38E-14, 7.69E-12] | 3061 | a-a-b |
| Vibrionales | [4.48E-20, 5.02E-13, 7.33E-11] | 3061 | a-a-b |
| Sphingobacteriales | [1.01E-16, 4.25E-11, 3.17E-8] | 3111 | a-a-b |
| Bacillales | [6.69E-20, 1.19E-13, 9.96E-11] | 3438 | a-a-b |
| Alteromonadales | [1.87E-24, 3.04E-16, 7.47E-14] | 3522 | a-a-b |
| Bacteroidales | [7.59E-28, 1.91E-17, 2.23E-14] | 3526 | a-a-b |
| Rhodobacterales | [6.23E-8, 9.45E-4, 9.91E-3] | 3249 | a-a-b |
| Pasteurellales | [3.69E-19, 2.08E-12, 3.97E-10] | 3813 | a-a-b |
| Rickettsiales | [3.36E-7, 4.13E-3, 9.96E-3] | 1234 | a-a-b |
| Lactobacillales | [1.14E-25, 1.13E-16, 6.48E-14] | 4333 | a-a-b |
| Actinomycetales | [1.01E-5, 4.01E-3, 9.95E-3] | 1028 | a-a-b |
| Burkholderiales | [2.04E-5, 4.94E-3, 9.74E-3] | 110 | a-a-b |
| Sphingomonadales | | | a-a-b |
| Rhodospirillales | [4.49E-4, 5.1E-3, 9.91E-3] | 57 | a-a-b |
| Enterobacteriales | | | a-b-b |
| Clostridiales | | | a-b-b |
| Rhizobiales | [4.35E-9, 4.86E-5, 4.36E-4] | 19 | a-b-b |
| Pseudomonadales | | | a-b-b |
| Xanthomonadales | | | a-b-b |
| Flavobacteriales | | | a-b-b |
| Vibrionales | | | a-b-b |
| Sphingobacteriales | | | a-b-b |
| Bacillales | | | a-b-b |
| Alteromonadales | | | a-b-b |
| Bacteroidales | | | a-b-b |
| Rhodobacterales | | | a-b-b |
| Pasteurellales | | | a-b-b |
| Rickettsiales | [3.84E-7, 2.83E-3, 9.99E-3] | 568 | a-b-b |
| Lactobacillales | | | a-b-b |
| Actinomycetales | [1.18E-5, 2.72E-3, 9.32E-3] | 61 | a-b-b |
| Burkholderiales | [6.45E-9, 2.92E-3, 10E-3] | 1584 | a-b-b |
| Sphingomonadales | [1.63E-5, 3.33E-3, 9.99E-3] | 209 | a-b-b |
| Rhodospirillales | [5.42E-3, 7.22E-3, 10E-3] | 5 | a-b-b |
| Enterobacteriales | [6.46E-11, 6.52E-9, 1.94E-8] | 3 | a-b-a |
| Clostridiales | | | a-b-a |
| Rhizobiales | | | a-b-a |
| Pseudomonadales | [1.38E-12, 5.01E-6, 4.48E-4] | 381 | a-b-a |
| Xanthomonadales | | | a-b-a |
| Flavobacteriales | | | a-b-a |
| Vibrionales | | | a-b-a |
| Sphingobacteriales | | | a-b-a |
| Bacillales | | | a-b-a |
| Alteromonadales | | | a-b-a |
| Bacteroidales | | | a-b-a |
| Rhodobacterales | [9.6E-8, 1.02E-3, 9.81E-3] | 320 | a-b-a |
| Pasteurellales | | | a-b-a |
| Rickettsiales | [3.52E-4, 4.95E-3, 8.88E-3] | 27 | a-b-a |
| Lactobacillales | | | a-b-a |
| Actinomycetales | [1.01E-5, 3.83E-3, 9.88E-3] | 261 | a-b-a |
| Burkholderiales | [6.81E-3, 6.81E-3, 6.81E-3] | 1 | a-b-a |
| Sphingomonadales | [7.24E-5, 4.92E-3, 9.75E-3] | 62 | a-b-a |
| Rhodospirillales | [1.43E-6, 4.15E-3, 9.97E-3] | 559 | a-b-a |
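The Count column tallies how often each letter pattern occurred across replications. The sketch below shows one way such a tally could be built from pairwise test outcomes. It is a simplified illustration under stated assumptions: `tukey_pattern` is a hypothetical helper, the input flags are invented, and outcomes that do not map to a clean pattern are lumped together as "overlapping" rather than given a full compact letter display.

```python
from collections import Counter

def tukey_pattern(b_ny, b_sac, ny_sac):
    """Map three pairwise 'significantly different' flags to a letter pattern
    in the order Boston-New York-Sacramento."""
    patterns = {
        (True, True, True): "a-b-c",     # all three means differ
        (False, True, True): "a-a-b",    # Boston = New York, Sacramento differs
        (True, True, False): "a-b-b",    # New York = Sacramento, Boston differs
        (True, False, True): "a-b-a",    # Boston = Sacramento, New York differs
        (False, False, False): "a-a-a",  # no pair differs
    }
    return patterns.get((b_ny, b_sac, ny_sac), "overlapping")

# Made-up outcomes for seven replications:
# (Boston vs New York, Boston vs Sacramento, New York vs Sacramento).
replications = (
    [(True, True, True)] * 4
    + [(False, True, True)] * 2
    + [(True, False, True)]
)

counts = Counter(tukey_pattern(*flags) for flags in replications)
print(counts)
```

In the table, the counts for an order across the four patterns can add up to the number of replications, as they do for Enterobacteriales (4967 + 30 + 3 = 5000).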
Bootstrap results (replications = 2000) for mean species diversity across all taxonomic ranks. The table shows p-values for two values of the weight modifier q (0.5 and 2)
| q | Order | Family | Genus |
|---|---|---|---|
| 0.5 | 0.004 | 0.006 | 0.084 |
| 2 | 0.930 | 1.000 | 0.999 |
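To see why the choice of q matters, the sketch below computes a diversity of order q, assuming the index is the Hill number (effective number of species), $D_q = \left(\sum_i p_i^q\right)^{1/(1-q)}$. That assumption is not confirmed by the caption, and the abundance vectors are made up. Values of q below 1 give rare taxa more weight, while values above 1 emphasize dominant taxa.

```python
import math

def hill_number(abundances, q):
    """Diversity of order q (assumed Hill number) from raw abundance counts."""
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if abs(q - 1.0) < 1e-12:
        # Limit as q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Made-up communities with four taxa each.
even = [5, 5, 5, 5]      # all taxa equally abundant
uneven = [97, 1, 1, 1]   # one dominant taxon, three rare ones

even_q05 = hill_number(even, 0.5)
even_q2 = hill_number(even, 2)
uneven_q05 = hill_number(uneven, 0.5)
uneven_q2 = hill_number(uneven, 2)

print(even_q05, even_q2)
print(uneven_q05, uneven_q2)
```

For the even community both orders give about 4 effective taxa. For the uneven one, q = 0.5 yields a noticeably higher value than q = 2, which is one way a test at q = 0.5 could detect differences that a test at q = 2 does not.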
Fig. 5 Abundance association networks for the three cities based on bacterial fingerprints using common OTUs. The left column shows networks from Sacramento, CA; the middle column, networks from New York, NY; and the right column, networks from Boston, MA. The top row has networks for the taxonomic rank “order”, the middle row for “family”, and the bottom row for “genus”
Fig. 6 Workflow of the ensemble classifier (reproduced from Datta et al. [11])