Wasin Poncheewin¹, Anne D van Diepeningen², Theo A J van der Lee², Maria Suarez-Diez¹, Peter J Schaap³,⁴.
Abstract
The rhizosphere, the region of soil surrounding plant roots, is colonized by a unique population of Plant Growth Promoting Rhizobacteria (PGPR). Many important PGPR, as well as plant pathogens, belong to the genus Pseudomonas. There is, however, uncertainty about the divide between beneficial and pathogenic strains, because genomic features previously thought to distinguish them have limited power to separate these strains. Here we used the Genome Properties (GP) annotation system for common biological pathways, together with Machine Learning (ML), to establish the relationship between genome-wide GP composition and the plant-associated lifestyle of 91 Pseudomonas strains isolated from the rhizosphere and the phyllosphere, representing both plant-associated phenotypes. GP enrichment analysis, Random Forest model fitting and feature selection revealed 28 discriminating features. A test set of 75 new strains confirmed the importance of the selected features for classification. The results suggest that GP annotations provide a promising computational tool to better classify the plant-associated lifestyle.
Year: 2022 PMID: 35760985 PMCID: PMC9237127 DOI: 10.1038/s41598-022-14913-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Workflow for GP-based functional genomics and classification. Genome sequences are analyzed using sequence similarity and protein domain content. (Colocalized) protein domain content is used to infer Genome Properties. Enrichment analysis and Random Forest feature selection were used to obtain genomic features. Classification performance was evaluated using a test set of 75 newly available genomes.
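The step that turns protein domain content into Genome Properties can be pictured with a simplified rule. This is an illustrative sketch only: the `Step` type, the `gp_status` function, the toy accessions and the "all / some / none of the required steps" rule are assumptions, not the exact Genome Properties evaluation, which has further details (such as colocalization) that are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of a hypothetical Genome Property definition."""
    name: str
    evidence: frozenset  # domain/family accessions that satisfy this step
    required: bool = True


def gp_status(steps, domains):
    """Classify a GP as 'complete', 'partial' or 'not detected' (assumed rule).

    complete: every required step has at least one matching domain
    partial: only some required steps are matched
    not detected: no required step is matched
    """
    required = [s for s in steps if s.required]
    found = sum(1 for s in required if s.evidence & domains)
    if required and found == len(required):
        return "complete"
    if found > 0:
        return "partial"
    return "not detected"


# Toy property with hypothetical accessions D1..D4 (D4 is an optional step).
toy_gp = [
    Step("s1", frozenset({"D1"})),
    Step("s2", frozenset({"D2", "D3"})),
    Step("s3", frozenset({"D4"}), required=False),
]
print(gp_status(toy_gp, {"D1", "D3"}))
print(gp_status(toy_gp, {"D1"}))
```

Optional steps are ignored when deciding the status in this sketch; a stricter rule could, for example, require a minimum fraction of steps before calling a property partial.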
Figure 2. Pairwise Average Nucleotide Identity (ANI) scores between coding regions. Scores were calculated from alignments that have 70% or more identity and at least 70% coverage of the shorter gene.
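The filtering described for Figure 2 can be sketched as follows, assuming ANI is the unweighted mean identity of the gene-level alignments that pass both thresholds. `GeneHit`, `ani` and the unweighted averaging are illustrative assumptions; the alignment hits themselves would come from an external aligner.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GeneHit:
    identity: float      # percent identity of the gene-level alignment
    aln_length: int      # aligned length (nt)
    query_length: int    # length of the query gene (nt)
    subject_length: int  # length of the subject gene (nt)


def ani(hits, min_identity=70.0, min_coverage=0.70):
    """Mean identity of alignments with >= min_identity % identity and
    >= min_coverage coverage of the shorter gene; None if nothing passes."""
    kept = [
        h.identity
        for h in hits
        if h.identity >= min_identity
        and h.aln_length / min(h.query_length, h.subject_length) >= min_coverage
    ]
    return mean(kept) if kept else None


hits = [
    GeneHit(98.0, 900, 1000, 950),    # passes both filters
    GeneHit(92.0, 700, 1000, 800),    # passes both filters
    GeneHit(65.0, 1000, 1000, 1000),  # identity too low
    GeneHit(99.0, 300, 1000, 1200),   # covers too little of the shorter gene
]
print(ani(hits))
```

A length-weighted mean is a common alternative to the unweighted mean used here.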
Average number of strain-specific Genome Property classes per approach.
| Approach | Complete | Partial | Not detected | Not presentᵃ |
|---|---|---|---|---|
| GP-PA | 440 ± 22 | 256 ± 14 | 590 ± 14 | 438 |
| GP-SND | 161 ± 11 | 362 ± 6 | 763 ± 12 | 596 |
| GP-SD | 158 ± 10 | 365 ± 7 | 763 ± 13 | 602 |
ᵃ Number of genome properties not present in any of the strains.
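Per-strain counts like those in the table could be summarized with a small helper. This sketch assumes a nested mapping of strain → GP → status, that "±" denotes the standard deviation, and that "not present" means "not detected" in every strain; `summarize` and `STATUSES` are hypothetical names.

```python
from collections import Counter
from statistics import mean, stdev

STATUSES = ("complete", "partial", "not detected")


def summarize(status_by_strain):
    """Summarize GP classes across strains.

    status_by_strain: {strain: {gp_id: status}}, status being one of STATUSES.
    Returns ({status: (mean, sd)}, number of GPs not detected in any strain).
    """
    per_strain = [Counter(gps.values()) for gps in status_by_strain.values()]
    summary = {}
    for status in STATUSES:
        counts = [c[status] for c in per_strain]
        sd = stdev(counts) if len(counts) > 1 else 0.0
        summary[status] = (mean(counts), sd)
    all_gps = set().union(*(gps.keys() for gps in status_by_strain.values()))
    # A GP missing from a strain's mapping is treated as "not detected" there.
    not_present = sum(
        1
        for gp in all_gps
        if all(gps.get(gp, "not detected") == "not detected"
               for gps in status_by_strain.values())
    )
    return summary, not_present


example = {
    "strain_A": {"gp1": "complete", "gp2": "partial", "gp3": "not detected"},
    "strain_B": {"gp1": "complete", "gp2": "complete", "gp3": "not detected"},
}
print(summarize(example))
```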
Figure 3. Principal component analysis based on GP-SND content as variables. The fraction of the variance is given in parentheses. P. cichorii JBC1 and two strains of P. cerasi lie outside the 95% confidence ellipse of the EPP group.
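Whether a strain lies outside a group's 95% ellipse can be checked on its first two principal-component scores with the Mahalanobis distance. This sketch assumes a normal-theory ellipse with a chi-square cutoff (2 degrees of freedom), whose quantile has the closed form −2·ln(1 − level); plotting tools may draw the ellipse with a slightly different (t- or F-based) radius. `outside_ellipse` is a hypothetical helper, and the PCA itself is assumed to have been computed already.

```python
import math


def outside_ellipse(group, points, level=0.95):
    """Return the points lying outside the `level` ellipse of `group`.

    group, points: lists of (pc1, pc2) scores.
    """
    n = len(group)
    mx = sum(x for x, _ in group) / n
    my = sum(y for _, y in group) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] of the group.
    sxx = sum((x - mx) ** 2 for x, _ in group) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in group) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in group) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Chi-square quantile with 2 degrees of freedom.
    threshold = -2.0 * math.log(1.0 - level)
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Squared Mahalanobis distance using the explicit 2x2 inverse.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 > threshold:
            out.append((x, y))
    return out


epp_like_group = [(1, 0), (-1, 0), (0, 1), (0, -1)]
print(outside_ellipse(epp_like_group, [(0.5, 0), (3, 0)]))
```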
Genome properties related to the plant-associated lifestyle: enrichment analysis.
| Genome property | Description | Adjusted p-value |
|---|---|---|
| GenProp0238ᵃ | 2-Aminoethylphosphonate catabolism to acetaldehyde | < 10⁻⁶ |
| GenProp0721ᵃ | 2-Aminoethylphosphonate (AEP) ABC transporter, type II | < 10⁻⁶ |
| GenProp0613ᵃ | Cytochrome c reductase | < 10⁻⁶ |
| GenProp0907 | Poly-beta-1,6 N-acetyl-D-glucosamine system, PgaABCD type | < 10⁻⁶ |
| GenProp0271 | Trehalose utilization | < 10⁻⁶ |
| GenProp1745 | GA12 biosynthesis | < 10⁻⁶ |
| GenProp1189 | MqsRA toxin-antitoxin complex | < 10⁻⁶ |
| GenProp1645 | Zeaxanthin biosynthesis | < 10⁻⁶ |
| GenProp0659 | Tryptophan degradation to anthranilate | 7.96 × 10⁻⁵ |
| GenProp0895 | Alcohol ABC transporter, PedABC-type | 7.01 × 10⁻⁴ |
| GenProp0902 | Quinohemoprotein amine dehydrogenase | 1.40 × 10⁻³ |
| GenProp1516 | Phosphatidylcholine biosynthesis V | 5.37 × 10⁻³ |
| GenProp0908ᵃ | 2,3-Diaminopropionic acid biosynthesis | < 10⁻⁶ |
| GenProp0813ᵃ | Pyrimidine utilization | < 10⁻⁶ |
| GenProp1165ᵃ | PhnGHIJKL complex | < 10⁻⁶ |
| GenProp1381 | Methylphosphonate degradation I | < 10⁻⁶ |
| GenProp0236 | Phosphonates ABC transport | 2.62 × 10⁻³ |
| GenProp0710 | Generic phosphonates utilization | 2.62 × 10⁻³ |
| GenProp1193 | RelBE toxin-antitoxin complex | 3.19 × 10⁻² |
| GenProp1566 | | 3.64 × 10⁻² |
ᵃ These Genome Properties are also important Random Forest features (Table 3).
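The table does not state which test produced the adjusted p-values. As an assumption, one common choice is a two-sided Fisher's exact test per GP on a 2×2 table of presence/absence counts in PGPR versus EPP strains, followed by Benjamini-Hochberg correction across all GPs. The sketch implements both with the standard library; the function names are illustrative.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables with the same margins that are no more likely than
    # the observed one (small tolerance for floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity (capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For one GP, `a` and `b` would be the numbers of PGPR strains with and without it, and `c` and `d` the same counts for EPP strains; the p-values for all GPs are then passed together to `benjamini_hochberg`.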
Random Forest feature importance of Genome Properties related to the plant-associated lifestyle.
| Genome property | Description | Predictive powerᵇ |
|---|---|---|
| GenProp0813ᵃ | Pyrimidine utilization | 500 |
| GenProp0908ᵃ | 2,3-Diaminopropionic acid biosynthesis | 500 |
| GenProp0721ᵃ | 2-Aminoethylphosphonate (AEP) ABC transporter, type II | 329 |
| GenProp0238ᵃ | 2-Aminoethylphosphonate catabolism to acetaldehyde | 328 |
| GenProp0615 | Cytochrome c based oxygen reduction and quinone re-oxidation | 251 |
| GenProp0613ᵃ | Cytochrome c reductase | 243 |
| GenProp1629 | Propanoyl-CoA degradation I | 215 |
| GenProp1572 | | 145 |
| GenProp1562 | Fatty acid salvage | 53 |
| GenProp1717 | Fatty acid beta-oxidation I (GenProp1308, GenProp1510 and GenProp1544) | 53 |
| GenProp1165ᵃ | PhnGHIJKL complex | 2 |
| GenProp1251 | | 2 |
| GenProp1281 | Hydrogen sulfide biosynthesis I | 1 |
| GenProp1681 | | 1 |
ᵃ GP also found in the enrichment analysis.
ᵇ Numbers were obtained using recursive feature elimination (500 iterations).
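One plausible reading of footnote b, assumed here, is that each number counts how many of 500 repeated recursive feature elimination (RFE) runs retained that GP. The sketch repeats RFE on bootstrap resamples and tallies the surviving features. To stay self-contained it uses a simple class-gap score as a stand-in; it is NOT Random Forest importance. In practice a Random Forest importance (for example scikit-learn's `RandomForestClassifier.feature_importances_`) would be wrapped to match the `importance(X, y, features)` signature. The bootstrap resampling and all function names are illustrative assumptions.

```python
import random


def class_gap_importance(X, y, features):
    """Stand-in score (not Random Forest importance): absolute difference in the
    fraction of strains carrying each feature in class 1 versus class 0."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def rate(rows, f):
        return sum(row[f] for row in rows) / len(rows) if rows else 0.0

    return {f: abs(rate(pos, f) - rate(neg, f)) for f in features}


def rfe(X, y, features, n_keep, importance=class_gap_importance):
    """Recursive feature elimination: drop the least important feature,
    re-score the remaining ones, and repeat until n_keep are left."""
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = importance(X, y, remaining)
        remaining.remove(min(remaining, key=scores.__getitem__))
    return remaining


def selection_counts(X, y, n_keep, n_iter=500, seed=0):
    """Run RFE on n_iter bootstrap resamples; count how often each feature survives."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    counts = [0] * n_features
    for _ in range(n_iter):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        kept = rfe([X[i] for i in idx], [y[i] for i in idx], range(n_features), n_keep)
        for f in kept:
            counts[f] += 1
    return counts


# Toy presence/absence matrix: feature 0 separates the classes,
# feature 1 alternates regardless of class, feature 2 is always absent.
y = [1] * 10 + [0] * 10
X = [[label, i % 2, 0] for i, label in enumerate(y)]
print(selection_counts(X, y, n_keep=1))
```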
Figure 4. Representative list of discriminating Genome Properties obtained with the GP-SND approach. Left panel: enrichment analysis; right panel: Random Forest feature selection. Red lines indicate the PGPR strains (vertical) and enriched traits (horizontal). Blue lines indicate the EPP strains (vertical) and enriched traits (horizontal). Newly sequenced strains are highlighted in yellow. Enriched GPs that were also highlighted in the RF feature importance analysis are indicated in green.
Figure 5. Analysis of the validation set. (a) Principal component analysis of test set 1, composed of PGPR strains (red squares), saprotroph strains (green squares) and EPP (orange square). (b) Principal component analysis of test set 2, composed of bioremediation strains (orange squares) and unclassified strains (purple squares). Variance is indicated in brackets. Previously analyzed Pseudomonas strains and the previously obtained 95% confidence ellipses are in gray. (c) Average Nucleotide Identity (ANI) scores among P. syringae strains. P. syringae isolate inb918 is at the top left.