Beatriz García-Jiménez, Jorge Muñoz, Sara Cabello, Joaquín Medina, Mark D Wilkinson.
Abstract
MOTIVATION: Microbial communities influence their environment by modifying the availability of compounds, such as nutrients or chemical elicitors. Knowing the microbial composition of a site is therefore relevant to improve productivity or health. However, sequencing facilities are not always available, or may be prohibitively expensive in some cases. Thus, it would be desirable to computationally predict the microbial composition from more accessible, easily-measured features.
Year: 2021 PMID: 33289510 PMCID: PMC8208755 DOI: 10.1093/bioinformatics/btaa971
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schema of AE and final model architectures. (A) AE architecture with an OTU latent space. (B) AE architecture with a combined latent space (brown), which minimizes the distance between OTU (blue) and environmental (green) latent spaces during model training. (C) Final prediction model with environmental features as input, where the latent space and the decoder could come from the AE in panel A (OTU latent space) or the AE in panel B (combined one).
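To make the data flow of panel C concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a single linear layer for the environmental encoder and for the decoder, a `tanh` activation, a softmax output, and a latent size of 10; none of these choices is stated in this record. The weights are random stand-ins for trained parameters, and the input values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# 3 environmental inputs and 717 output taxa; the latent size is an assumption.
N_ENV, N_LATENT, N_OTU = 3, 10, 717

# Random stand-ins for trained weights (hypothetical, for illustration only).
W_env = rng.normal(size=(N_ENV, N_LATENT))  # environmental encoder
W_dec = rng.normal(size=(N_LATENT, N_OTU))  # decoder reused from the AE


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def predict_composition(env_features):
    """Map environmental features to a relative-abundance vector."""
    latent = np.tanh(env_features @ W_env)  # project into the latent space
    return softmax(latent @ W_dec)          # decode to abundances summing to 1


# One sample with hypothetical, already-scaled age, temperature and rain.
env = np.array([[0.5, 0.2, 0.1]])
composition = predict_composition(env)
```

The softmax output is one way to guarantee that the predicted abundances are non-negative and sum to one per sample; the original model may enforce this differently.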
Performance of evaluation metrics in the test set
| Input | Default (Pearson) | Default (Bray–Curtis) | Linear regression (Pearson) | Linear regression (Bray–Curtis) | MLP (Pearson) | MLP (Bray–Curtis) | OTU latent space (Pearson) | OTU latent space (Bray–Curtis) | Combined latent space (Pearson) | Combined latent space (Bray–Curtis) |
|---|---|---|---|---|---|---|---|---|---|---|
| Age, T, rain, line, variety | 0.5852 | 0.5140 | 0.6629 | 0.4659 | 0.7219 | 0.4169 | | 0.4222 | 0.7229 | |
| Age, T, rain | 0.5852 | 0.5140 | 0.6641 | 0.4638 | 0.6927 | 0.4527 | **0.7348** | 0.4181 | 0.7220 | **0.4072** |
| T and rain | 0.5852 | 0.5140 | 0.5881 | 0.5089 | 0.6323 | 0.5047 | 0.6087 | 0.4686 | | |
| Age and T | 0.5852 | 0.5140 | 0.6622 | 0.4664 | 0.6814 | 0.4591 | | 0.4204 | 0.7189 | |
| Age and rain | 0.5852 | 0.5140 | 0.6628 | 0.4728 | 0.7155 | 0.4211 | | 0.4200 | 0.7048 | |
Note: In Pearson, higher scores are better, because it is a correlation metric. In Bray–Curtis, lower scores are better, as it is a dissimilarity metric. Bold means the best model per metric and row. Underline means the best model per metric in the table.
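As a worked illustration of the two metrics, the sketch below computes both for a single pair of relative-abundance vectors. The function names and example values are hypothetical, and how the scores are aggregated across samples is not stated in this record.

```python
import numpy as np


def pearson(actual, predicted):
    """Pearson correlation between two abundance vectors (higher is better)."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(a, p)[0, 1])


def bray_curtis(actual, predicted):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for disjoint ones."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.abs(a - p).sum() / (a + p).sum())


# Hypothetical relative abundances of three taxa in one sample.
actual = np.array([0.5, 0.3, 0.2])
predicted = np.array([0.4, 0.4, 0.2])

r = pearson(actual, predicted)       # correlation of the two profiles
bc = bray_curtis(actual, predicted)  # 0.2 / 2.0 = 0.1
```

A perfect prediction gives a Pearson score of 1 and a Bray–Curtis score of 0, which is why the two columns move in opposite directions in the table.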
Fig. 2. Example of reconstruction and prediction of microbial composition. The center row shows the original microbial composition, allowing it to be compared with both the reconstructed composition (top) and the composition predicted from environmental features (bottom). One sample per column. Each Phylum taxonomic category is assigned a different color. Green/red boxes highlight examples of good/bad sample reconstructions or predictions, and their corresponding original microbial composition is denoted with black boxes.
Performance at different taxonomic levels
| Taxonomic level | No. | OTU latent space (Pearson) | OTU latent space (Bray–Curtis) | Combined latent space (Pearson) | Combined latent space (Bray–Curtis) |
|---|---|---|---|---|---|
| Phylum | 16 | 0.9576 | 0.1591 | 0.9451 | 0.1833 |
| Class | 45 | 0.8777 | 0.2514 | 0.8646 | 0.2610 |
| Order | 83 | 0.8264 | 0.3007 | 0.7983 | 0.3057 |
| Family | 144 | 0.8229 | 0.3239 | 0.7965 | 0.3229 |
| Genus | 222 | 0.8133 | 0.3414 | 0.7901 | 0.3408 |
| Species | 717 | 0.7348 | 0.4181 | 0.7220 | 0.4072 |
Note: Based on reference model configuration, with the three selected input variables.
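One plausible way to score a prediction at a coarser taxonomic level is to sum the abundances of all OTUs that share a taxon label before computing the metrics. The sketch below shows only that aggregation step. It is an assumption about the procedure, not a description of the authors' pipeline, and the OTU identifiers and phylum labels are hypothetical.

```python
from collections import defaultdict


def aggregate_by_taxon(abundances, taxonomy):
    """Sum OTU relative abundances that share the same taxon label."""
    totals = defaultdict(float)
    for otu, value in abundances.items():
        totals[taxonomy[otu]] += value
    return dict(totals)


# Hypothetical OTU identifiers and phylum assignments, for illustration only.
otu_to_phylum = {
    "otu_1": "Proteobacteria",
    "otu_2": "Proteobacteria",
    "otu_3": "Actinobacteria",
}
sample = {"otu_1": 0.5, "otu_2": 0.25, "otu_3": 0.25}

phylum_profile = aggregate_by_taxon(sample, otu_to_phylum)
# {'Proteobacteria': 0.75, 'Actinobacteria': 0.25}
```

Because errors among closely related OTUs cancel out when they are summed, an aggregation of this kind would be consistent with the scores improving from Species up to Phylum.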
Fig. 3. Prediction of microbial composition in different predicted climate change conditions, at distinct plant ages. Outcomes are reported at the Class taxonomic level. Each colored point and dashed line indicates a sample in a different temperature/precipitation condition. ‘actual’: 59°F and 1.5 inches of rain; ‘hot and dry’: 86°F and 0 inches of rain; ‘cold and rain’: 50°F and 5 inches of rain. Note the difference in the maximum of relative abundance between A/B (0.2) and C (0.6).
Prediction performance with transfer learning from the primary model to smaller datasets
| | Walters (Pearson) | Walters (Bray–Curtis) | Maarastawi (Pearson) | Maarastawi (Bray–Curtis) |
|---|---|---|---|---|
| Linear regression | 0.5436 | 0.5596 | 0.1588 | 0.6437 |
| MLP | 0.7114 | 0.4410 | 0.7230 | 0.3347 |
| OTU latent space | | | | |
| Combined latent space | 0.7266 | 0.4346 | 0.7060 | 0.3728 |
Note: Walters’s subset: 100 samples with the same input features as the primary model. Maarastawi: 123 samples with environmental features distinct from those in the primary model. Bold means the best model per column.