Margarita Gomila, Arantxa Peña, Magdalena Mulet, Jorge Lalucat, Elena García-Valdés.
Abstract
The genus Pseudomonas currently contains 144 species, making it the genus of Gram-negative bacteria with the largest number of species. Multilocus sequence analysis (MLSA) is currently the preferred method for establishing the phylogeny between species and genera. Partial sequences of four housekeeping genes (16S rRNA, gyrB, rpoB, and rpoD) were obtained from 112 complete or draft genomes of strains related to the genus Pseudomonas that were available in databases. These genes were analyzed together with the corresponding sequences of 133 Pseudomonas type strains of validly published species to assess their correct phylogenetic assignments. We confirmed that 30% of the sequenced genomes of non-type strains were not correctly assigned at the species level in the accepted taxonomy of the genus and that 20% of the strains were not identified at the species level. Most of these strains had been isolated and classified several years ago, and their taxonomic status has not been updated by modern techniques. MLSA was also compared with indices based on the analysis of whole-genome sequences that have been proposed for species delineation, such as tetranucleotide usage patterns (TETRA), average nucleotide identity (ANIm, based on MUMmer, and ANIb, based on BLAST), and genome-to-genome distance (GGDC). TETRA was useful for discriminating Pseudomonas from other genera, whereas ANIb and GGDC clearly separated strains of different species. ANIb showed the strongest correlation with MLSA. Correct species classification is a prerequisite for most diversity and evolutionary studies. This work highlights the need for complete genomic sequences of type strains to build a phylogenomic taxonomy, and the need for all new genome sequences submitted to databases to be correctly assigned to species to avoid taxonomic inconsistencies.
Keywords: Pseudomonas; genomics; multilocus sequence analysis; systematics; taxonomy
Year: 2015 PMID: 26074881 PMCID: PMC4447124 DOI: 10.3389/fmicb.2015.00214
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. Phylogenetic tree of the 112 complete or draft genomes of strains related to the genus Pseudomonas. The strains analyzed in this study whose genomes have been sequenced are labeled in red. Distance matrices were calculated by the Jukes-Cantor method. Dendrograms were generated by neighbor-joining. Cellvibrio japonicum Ueda107 was used as the outgroup. The bar indicates sequence divergence. Only bootstrap values higher than 50% (from 1000 replicates) are indicated, as percentages, at the branching nodes of groups and subgroups.
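The caption states that distance matrices were calculated by the Jukes-Cantor method. As a minimal sketch of that correction, the Python below computes the distance d = -(3/4) ln(1 - (4/3) p) from the proportion p of differing sites between two aligned sequences. The function names and the choice to skip gaps and ambiguous bases are assumptions made for illustration; this is not the authors' pipeline.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns containing anything other than A, C, G, or T
    (gaps, ambiguity codes) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The logarithm is undefined once p reaches the saturation value 3/4.
        raise ValueError("p-distance >= 0.75: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

A full distance matrix would apply `jukes_cantor_distance` to every pair of concatenated alignments before tree building by neighbor-joining.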
Phylogenetic affiliation based on concatenated MLSA analysis for the 63 whole genome sequenced strains not assigned, or incorrectly assigned at the species level, including strains of .
| 96.17 | 96.17 | 95.03 | |||||
| 96.11 | 98.55 | 95.11 | |||||
| 96.00 | 98.55 | 95.03 | |||||
| 99.54 | 99.54 | 95.03 | |||||
| 98.47 | 98.47 | 96.26 | |||||
| 96.72 | 96.72 | 96.07 | |||||
| 96.24 | 99.65 | 96.12 | |||||
| 96.29 | 99.65 | 96.12 | |||||
| 96.41 | 98.36 | 95.98 | |||||
| 95.17 | 95.17 | 95.10 | |||||
| 97.89 | 98.72 | 97.89 | |||||
| 97.94 | 98.36 | 97.94 | |||||
| 97.66 | 97.70 | 97.66 | |||||
| 97.94 | 98.00 | 97.94 | |||||
| 97.97 | 98.72 | 97.97 | |||||
| 98.58 | 98.58 | 96.91 | |||||
| 96.46 | 96.55 | 96.46 | |||||
| 97.78 | 98.33 | 97.78 | |||||
| 97.53 | 98.33 | 97.53 | |||||
| 96.80 | 97.11 | 96.80 | |||||
| 96.66 | 97.11 | 95.97 | |||||
| 98.58 | 99.51 | 96.83 | |||||
| 97.67 | 97.84 | 96.49 | |||||
| 97.14 | 99.35 | 96.06 | |||||
| 97.36 | 97.36 | 96.94 | |||||
| 96.73 | 96.96 | 96.38 | |||||
| 98.47 | 99.54 | 96.71 | |||||
| 98.53 | 99.54 | 96.80 | |||||
| 97.42 | 99.35 | 96.38 | |||||
| 99.02 | 99.05 | 99.02 | |||||
| 98.83 | 99.05 | 98.83 | |||||
| 99.65 | 99.73 | 98.16 | |||||
| 99.86 | 99.86 | 98.00 | |||||
| 99.81 | 99.92 | 95.23 | |||||
| 96.88 | 97.05 | 95.03 | |||||
| 91.01 | 91.04 | – | – | n.a. SG | |||
| 99.02 | 99.02 | 96.18 | |||||
| 98.78 | 98.78 | 94.66 | |||||
| 96.80 | 99.35 | 95.44 | |||||
| 94.78 | 95.33 | 93.64 | |||||
| 96.80 | 97.09 | 95.93 | |||||
| 96.04 | 98.47 | 95.10 | |||||
| 94.97 | 94.97 | 90.32 | |||||
| 96.44 | 98.47 | 95.41 | |||||
| 96.61 | 99.35 | 95.13 | |||||
| 94.81 | 94.92 | 94.11 | |||||
| 96.47 | 96.47 | 93.58 | |||||
| 96.80 | 99.65 | 95.41 | |||||
| 96.80 | 99.87 | 95.50 | |||||
| 96.72 | 99.87 | 95.41 | |||||
| 96.89 | 99.68 | 95.38 | |||||
| 96.47 | 96.47 | 96.47 | |||||
| 91.63 | 91.63 | 88.67 | |||||
| 99.81 | 99.81 | 99.81 | |||||
| 99.54 | 99.54 | 99.40 | |||||
| 92.57 | 93.28 | 92.57 | |||||
| 94.81 | 94.81 | 88.72 | |||||
| 90.45 | 91.27 | 90.45 | |||||
| 90.55 | 92.27 | 90.55 | |||||
| 92.54 | 92.87 | 92.54 | |||||
| 92.06 | 95.18 | 92.06 | |||||
| 91.34 | 93.28 | 91.34 | |||||
| 91.94 | 95.18 | 91.94 |
n.a., not assigned.
Figure 2. Graphs representing the relationship between the TETRA (A), ANIb (B), ANIm (C), and GGDC (D) indices and MLSA sequence similarity for the genomes studied; (E) shows the relationship between the ANIb and GGDC indices. Each dot represents a pairwise comparison; the genomic indices are plotted against the corresponding MLSA sequence similarity. In the TETRA plot, black circles indicate TETRA < 0.99 and MLSA < 97%, or TETRA > 0.99 and MLSA > 97%; green triangles indicate TETRA > 0.99 and MLSA < 97%; and red triangles indicate TETRA < 0.99 and MLSA > 97%. In the ANIb and ANIm plots, black circles indicate genomic values <90% and MLSA < 97%, or genomic values >95% and MLSA > 97%; green triangles indicate genomic values between 90 and 95% and MLSA < 97%; red triangles indicate genomic values between 90 and 95% and MLSA > 97%; and blue circles indicate genomic values between 85 and 90% and MLSA > 97%. In the GGDC plots, black circles indicate GGDC < 70% and MLSA < 97%, or GGDC > 70% and MLSA > 97%; green triangles indicate GGDC < 70% and MLSA > 97%.
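The marker scheme for the ANIb and ANIm panels can be read as a small decision rule. The sketch below encodes it in Python under stated assumptions: the function name `ani_marker` is hypothetical, values that fall exactly on a threshold are assigned to the upper category (the caption does not say how boundaries were handled), and combinations the caption does not list return "unspecified" rather than a guessed marker.

```python
def ani_marker(ani_percent: float, mlsa_percent: float) -> str:
    """Return the marker used for one pairwise comparison in the ANI panels.

    Thresholds are taken from the caption; boundary handling is an assumption.
    """
    mlsa_high = mlsa_percent >= 97.0  # assumption: exactly 97% counts as high

    if ani_percent >= 95.0:
        # Caption lists only ANI > 95% together with MLSA > 97%.
        return "black circle" if mlsa_high else "unspecified"
    if ani_percent >= 90.0:
        # The 90-95% band is where the two methods can disagree.
        return "red triangle" if mlsa_high else "green triangle"
    if ani_percent >= 85.0:
        return "blue circle" if mlsa_high else "black circle"
    # Below 85%: caption covers only the MLSA < 97% case (ANI < 90%).
    return "unspecified" if mlsa_high else "black circle"
```

For example, a pair with ANIb 92% and MLSA 97.5% would be drawn as a red triangle, while the same ANIb with MLSA 96% would be a green triangle.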
Figure 3. Association table between ANIb values and MLSA sequence similarities. The number of strain pairs is displayed in each category square. Square A indicates ANIb values ≥95% and MLSA ≥97%; square B indicates ANIb values <90% and MLSA values <97%.
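An association table of this kind is a tally of strain pairs per category. The following sketch counts pairs falling in square A (ANIb ≥95% and MLSA ≥97%) and square B (ANIb <90% and MLSA <97%) as defined in the caption; every other combination is grouped under "other", which is a simplification for illustration, since the caption does not describe the remaining squares. The function name and the input values in the usage note are hypothetical, not data from the study.

```python
from collections import Counter
from typing import Iterable, Tuple


def association_counts(pairs: Iterable[Tuple[float, float]]) -> Counter:
    """Tally (ANIb %, MLSA %) pairs into squares A, B, and a catch-all."""
    counts: Counter = Counter()
    for anib, mlsa in pairs:
        if anib >= 95.0 and mlsa >= 97.0:
            counts["A"] += 1      # both methods place the pair in one species
        elif anib < 90.0 and mlsa < 97.0:
            counts["B"] += 1      # both methods separate the pair
        else:
            counts["other"] += 1  # squares not described in the caption
    return counts
```

Calling `association_counts([(96.0, 98.0), (80.0, 90.0), (92.0, 98.0)])` on these made-up values gives one pair each in "A", "B", and "other".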
Figure 4. Graphs representing relationships between ANIb values and MLSA sequence similarities of pairwise comparisons of strains assigned to the . Each dot represents a pairwise comparison, with the ANIb index plotted against the corresponding MLSA sequence similarity. In the ANIb range of 90–95%, green triangles indicate pairwise comparisons with MLSA values lower than 97% and red triangles indicate values between 97 and 98%; other values are indicated with black circles.