J L Weissman1, Sonia Dogra1, Keyan Javadi1, Samantha Bolten1, Rachel Flint1, Cyrus Davati1, Jess Beattie1, Keshav Dixit1, Tejasvi Peesay1, Shehar Awan1, Peter Thielen2, Florian Breitwieser2, Philip L F Johnson1, David Karig3, William F Fagan1, Sharon Bewick4.
Abstract
BACKGROUND: Even when microbial communities vary wildly in their taxonomic composition, their functional composition is often surprisingly stable. This suggests that a functional perspective could provide much deeper insight into the principles governing microbiome assembly. Much work to date analyzing the functional composition of microbial communities, however, relies heavily on inference from genomic features. Unfortunately, output from these methods can be hard to interpret and often suffers from relatively high error rates.
Keywords: Functional community; Phylogenetic correction; Random forest; Trait database
Year: 2021 PMID: 34098872 PMCID: PMC8186035 DOI: 10.1186/s12859-021-04216-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Pairwise differences in trait values between body sites (difference in means weighted by taxon abundance). Interactions that were not significant in at least two phyla are left blank. Traits are separated into categories for readability: a qualitative with categorical values (split into dummy variables for multi-level traits) and b quantitative with continuous values. For carbon substrate use traits see Fig. 2
Fig. 2 Pairwise differences in carbon substrate use frequency between body sites (difference in means weighted by taxon abundance). Interactions that were not significant in at least two phyla are left blank. Shown here are binary traits indicating the ability to grow on specific carbon sources. For other traits see Fig. 1
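The quantity plotted in Figs. 1 and 2, a difference in abundance-weighted trait means between two body sites, can be illustrated with a minimal sketch. The function names and the toy numbers below are hypothetical and are not taken from the paper; the significance filtering across phyla mentioned in the captions is not reproduced here.

```python
def weighted_mean(values, weights):
    """Mean of trait values weighted by taxon abundance."""
    total = sum(weights)
    if total == 0:
        raise ValueError("total abundance is zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def site_difference(trait, abund_a, abund_b):
    """Weighted mean of a trait at site A minus the weighted mean at site B."""
    return weighted_mean(trait, abund_a) - weighted_mean(trait, abund_b)


# Hypothetical toy data: one trait value per taxon, and the relative
# abundance of the same three taxa at two sites.
optimal_ph = [6.0, 7.0, 8.0]
site_a = [0.5, 0.5, 0.0]
site_b = [0.0, 0.5, 0.5]

diff = site_difference(optimal_ph, site_a, site_b)
print(diff)
```

For a binary trait such as growth on a given carbon source (coded 0/1), the same function gives a difference in abundance-weighted frequencies.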
Table 1 Cohen’s κ for predicting sample source site
| Site | Actinobacteria (test) | Bacteroidetes (test) | Firmicutes (test) | Proteobacteria (test) | Mean |
|---|---|---|---|---|---|
| Stool | 0.350 | 0.468 | 0.948 | −0.021 | 0.436 |
| Posterior Fornix | 0.340 | 0.351 | 0.560 | 0.413 | 0.416 |
| Anterior Nares | 0.430 | 0.579 | 0.268 | 0.240 | 0.379 |
| Retroauricular Crease | 0.165 | 0 | 0.033 | 0.038 | 0.059 |
| Tongue Dorsum | 0 | 0 | 0.019 | 0 | 0.005 |
| Supragingival Plaque | 0 | 0 | 0.157 | 0 | 0.039 |
| Buccal Mucosa | 0 | 0 | 0 | 0 | 0 |
| Mouth (All) | 0 | 0 | 0.677 | 0.004 | 0.170 |
Briefly, the trait values associated with a set of three phyla in a sample were used to train a model to predict whether a sample was from a given site on the basis of a fourth “test” phylum. Values above zero indicate predictive ability in excess of a null model accounting for the number of samples from each site
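As a reminder of how this statistic behaves, here is a minimal sketch of Cohen's kappa for a one-site-versus-rest prediction, written from the standard definition (observed agreement corrected for the agreement expected from the label frequencies alone). The labels are hypothetical toy data, and returning 0 when chance agreement equals 1 is a convention assumed here, not something stated in the source.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement between truth and prediction.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    if p_e == 1:
        return 0.0  # assumed convention: no room to improve on chance
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels: 1 = sample is from the focal site, 0 = any other site.
y_true = [1, 1, 0, 0, 0, 0]
partly_right = [1, 0, 0, 0, 0, 0]
always_other = [0, 0, 0, 0, 0, 0]

print(cohens_kappa(y_true, partly_right))
print(cohens_kappa(y_true, always_other))
```

A model that always predicts the majority class scores zero even though its raw accuracy is 4/6, which is why values near zero in the table indicate no predictive ability beyond site frequencies.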
Fig. 3 Top predictors for stool with rank shown in upper left corner. Top predictors across phyla of sample source site, for which importance scores are above the average variable importance across all predictors for all four training sets (7). Shown are mean trait values across all samples in the dataset, split up by body site. See 11, 12, and 13 for top predictors of posterior fornix, anterior nares, and mouth respectively
Table 2 Inferred trait clusters with positive associations between body sites
| Cluster | Traits | Positive associations |
|---|---|---|
| 2 | Use of: alaninamide, histidine, leucine, pyruvic acid methyl ester | |
| 3 | Optimal pH, Facultative, Cocci; Use of: fructose, galactose, glucose, lactose, mannose, methyl beta D glucoside, N acetylglucosamine, sucrose | Anterior nares, Supragingival plaque |
| 4 | LengthMajorAxis; Enzyme Assays: esculin aesculin hydrolysis; Use of: cellobiose, glycogen, maltose, raffinose, salicin, starch, yeast extract | Posterior fornix |
| 5 | Max. Temp., Optimal NaCl, Min. NaCl, Max. NaCl, Genome Length, Anaerobe, Single, Clump, Rod; Enzyme Assays: urease, acid phosphatase, alkaline phosphatase, alpha galactosidase, beta galactosidase, acetoin, phosphatase, DNA degradation; Gas Production: indole, hydrogen sulfide; Use of: arabinose, propionate, rhamnose, succinate, Tween 80, xylose | Retroauricular crease, Anterior nares, Tongue |
| 7 | Use of: phenylacetate, putrescine, quinic acid | |
| 10 | Use of: arginine, glycine, phenylalanine, serine, threonine | Tongue |
| 11 | Min. pH, Max. pH | Posterior fornix |
| 12 | Enzyme Assays: tellurite reductase; Use of: citrate | |
| 14 | Optimal Temp.; Enzyme Assays: gelatinase, trypsin; Use of: acetate, galacturonate, glycerol, lactate, mannitol, melibiose, ornithine, ribose, sorbitol, trehalose | Supragingival plaque, Tongue |
| 15 | Use of: butanol, caprate | |
| 16 | GC Content, Min. Temp., Motile, Aerobe, Chain; Gas Production: ammonia, isovaleric acid; Enzyme Assays: catalase, oxidase, arylsulfatase, phosphohydrolase; Use of: aspartate, dextrin, formate, glutamate, malate, proline, pyruvate, suberate, urea, sugars | Retroauricular crease, Supragingival plaque, Tongue |
| 17 | Use of: adonitol, alanine | |
| 18 | Use of: valerate, 2 aminethanol, 2 ketogluconate, 2 3 butanediol, 3 hydroxybenzoate, 3 hydroxybutyrate, 4 hydroxybenzoate, 5 ketogluconate | Tongue |
Bold and starred (*) site names signify that a given cluster-site interaction is the strongest positive interaction observed for that site
Fig. 4 Bipartite site-cluster network, where clusters are groups of traits that frequently co-occur. Clusters are shown at the top as blue, numbered nodes. Each cluster corresponds to a group of co-occurring traits as listed in Table 2. Body sites are shown at the bottom as labeled, yellow nodes. Positive interactions (cluster common in body site) are represented by solid green lines and negative interactions (cluster uncommon in body site) are represented by dotted red lines. The strength of an interaction is represented by the width of an edge. See 16 for the same figure with positive and negative interactions separated out for ease of viewing
Fig. 5 Performance of random forest models predicting generalism (binary classification, present at more than one area or not). “All” means a blocked cross-validation with each phylum as a fold (Actinobacteria: red squares, Bacteroidetes: green triangles, Firmicutes: blue diamonds, Proteobacteria: purple circles). Within each phylum we performed blocked cross-validation using classes as folds, except in the case of Bacteroidetes where all species in the dataset were in the same class and order, so that families were used as the folds. Shown are two measures of performance ( and area under the precision-recall curve), as well as the prevalence of specialist species in a fold ( for “probability is a specialist”)
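The blocked cross-validation described in this caption holds out one whole taxonomic group per fold, so that a model is never tested on close relatives of its training species. A minimal sketch of the fold construction, assuming one group label per species, might look as follows; the helper `blocked_folds` and the example labels are hypothetical, and the random forest fitting itself is omitted.

```python
def blocked_folds(groups):
    """Yield (held_out_group, train_indices, test_indices), one fold per group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Hypothetical group label (here, phylum) for each of six species.
phyla = [
    "Actinobacteria",
    "Firmicutes",
    "Firmicutes",
    "Bacteroidetes",
    "Proteobacteria",
    "Actinobacteria",
]

folds = list(blocked_folds(phyla))
for name, train_idx, test_idx in folds:
    print(name, train_idx, test_idx)
```

The same helper covers the within-phylum case by passing class (or family) labels instead of phylum labels.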
Fig. 6 Diversity of carbon sources in a body site is largely mediated by taxonomic diversity. a Diversity of carbon sources used within samples. b Taxonomic diversity within a sample. c Values from (a, b) for each sample plotted against each other. Entropy (Shannon’s entropy) is a common diversity metric that integrates both the evenness and richness of items considered (carbon sources and species in panels (a, b) respectively)
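To make the link between entropy, richness, and evenness concrete, here is a minimal sketch of Shannon's entropy computed from item counts. The natural logarithm is an assumption (the caption does not state the base), and the counts are hypothetical.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy H = -sum(p_i * ln p_i) over items with nonzero count."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical abundance counts for the items in one sample
# (species, or carbon sources weighted by how many taxa use them).
even_four = shannon_entropy([10, 10, 10, 10])  # rich and even
uneven_four = shannon_entropy([37, 1, 1, 1])   # same richness, less even
even_two = shannon_entropy([10, 10])           # same evenness, less rich

print(even_four, uneven_four, even_two)
```

Entropy falls both when one item dominates and when fewer items are present, which is the sense in which it integrates evenness and richness.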
Table 3 Performance of random forest models of the number of carbon substrates a species can use
| Test | RMSE | ||
|---|---|---|---|
| Actinobacteria | 6.20 | 0.327 | 0.089 |
| Bacteroidetes | 6.07 | 0.354 | 0.108 |
| Firmicutes | 5.08 | 0.467 | 0.213 |
| Proteobacteria | 8.67 | 0.582 | 0.332 |
| Mean | 6.51 | 0.433 | 0.186 |