Camila Gazolla Volpiano, Fernando Hayashi Sant'Anna, Adriana Ambrosini, Jackson Freitas Brilhante de São José, Anelise Beneduzi, William B Whitman, Emanuel Maltempi de Souza, Bruno Brito Lisboa, Luciano Kayser Vargas, Luciane Maria Pereira Passaglia.
Abstract
Taxonomic decisions within the order Rhizobiales have relied heavily on the interpretations of highly conserved 16S rRNA sequences and DNA-DNA hybridizations (DDH). Currently, bacterial species are defined as including strains that share at least 95-96% average nucleotide identity (ANI) and 70% digital DDH (dDDH). Thus, ANI values from 520 genome sequences of type strains from species of the order Rhizobiales were computed. From the resulting 270,400 comparisons, a ≥95% cut-off was used to extract high-identity genome clusters by enumerating maximal cliques. Coupling this graph-based approach with dDDH from clusters of interest, it was found that: (i) there is synonymy between Aminobacter lissarensis and Aminobacter carboxidus, Aurantimonas manganoxydans and Aurantimonas coralicida, "Bartonella mastomydis" and Bartonella elizabethae, Chelativorans oligotrophicus and Chelativorans multitrophicus, Rhizobium azibense and Rhizobium gallicum, Rhizobium fabae and Rhizobium pisi, and Rhodoplanes piscinae and Rhodoplanes serenus; (ii) Chelatobacter heintzii is not a synonym of Aminobacter aminovorans; (iii) "Bartonella vinsonii" subsp. arupensis and "B. vinsonii" subsp. berkhoffii represent members of different species; (iv) the genome accessions GCF_003024615.1 ("Mesorhizobium loti LMG 6125T"), GCF_003024595.1 ("Mesorhizobium plurifarium LMG 11892T"), GCF_003096615.1 ("Methylobacterium organophilum DSM 760T"), and GCF_000373025.1 ("R. gallicum R-602 spT") are not from the genuine type strains used for the respective species descriptions; and (v) "Xanthobacter autotrophicus" Py2 and "Aminobacter aminovorans" KCTC 2477T represent cases of misuse of the term "type strain". Aminobacter heintzii comb. nov. and the reclassification of Aminobacter ciceronei as A. heintzii are also proposed.
To facilitate the downstream analysis of large ANI matrices, we introduce here ProKlust ("Prokaryotic Clusters"), an R package that uses a graph-based approach to obtain, filter, and visualize clusters on identity/similarity matrices, with settable cut-off points and the possibility of multiple input matrices.
Keywords: ANI; Rhizobium; dDDH; genome clustering; species-cluster
Year: 2021 PMID: 33841347 PMCID: PMC8026895 DOI: 10.3389/fmicb.2021.614957
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. A diagram showing the workflow of the study.
Genomes detected with high contamination and/or low completeness.
| Assembly accession | RefSeq category | Organism name | Length (Mb) | Contigs | Present markers | Completeness | Redundancy |
|---|---|---|---|---|---|---|---|
|  | NA |  | 7.18 | 612 | 70 | 0.7077 | 1.6195 |
|  | Representative |  | 7.38 | 1891 | 103 | 0.9813 | 1.1221 |
|  | Representative |  | 10.11 | 133 | 104 | 0.9814 | 1.1206 |
|  | Representative |  | 3.00 | 470 | 95 | 0.8782 | 1.0293 |
|  | Representative |  | 3.10 | 38 | 98 | 0.8961 | 1.0245 |
|  | NA |  | 6.40 | 1670 | 96 | 0.893 | 1.0415 |
|  | Representative |  | 5.02 | 59 | 95 | 0.9684 | 1.1036 |
|  | Representative |  | 6.21 | 598 | 104 | 0.9814 | 1.1927 |
|  | Representative |  | 4.42 | 35 | 105 | 1 | 1.1446 |
|  | NA |  | 4.88 | 52 | 103 | 0.9979 | 1.1806 |
|  | NA |  | 5.52 | 10 | 104 | 0.9814 | 1.1275 |
|  | NA |  | 2.26 | 131 | 90 | 0.8198 | 1.1681 |
16S rRNA copies extracted from genomes and incorrectly taxonomically assigned.
| Name | Accession | 16S rRNA gene locus tag | SILVA SSU r138 order | SILVA SSU r138 family | SILVA SSU r138 genus |
|---|---|---|---|---|---|
|  | GCF_003993795.1 | EJJ38_RS13400 |  |  |  |
|  | GCF_004346185.1 | EDC64_RS23195 |  |  | NA |
|  | GCF_004801285.1 | E6C48_RS19465 |  |  |  |
|  | GCF_003024615.1 | C7U62_RS19230 |  |  |  |
|  | GCF_003024595.1 | C7U60_RS18680 |  |  |  |
|  |  | C7U60_RS19135; C7U60_RS21155 |  |  |  |
|  |  | C7U60_RS18920 | NA | NA | NA |
|  | GCF_003574465.1 | DT057_RS35310 |  |  |  |
|  |  | DT057_RS35040 |  |  | NA |
|  |  | DT057_RS35290 |  | NA | NA |
|  |  | DT057_RS05830; DT057_RS35200 |  | NA | NA |
|  |  | DT057_RS35260 | NA | NA | NA |
|  | GCF_002759055.1 | CS379_RS09215 |  |  |  |
|  | GCF_001927285.1 | BUQ68_RS19420 |  |  |  |
|  | GCF_004362745.1 | EDD54_RS17735; EDD54_RS20245 |  |  |  |
|  | GCF_011317485.1 | GRZ53_RS14600; GRZ53_RS22395 |  |  |  |
|  | GCF_000705355.1 | EO99_RS0125160 |  |  |  |
|  | GCF_000732195.1 | GQ59_RS30420 |  |  |  |
|  |  | GQ59_RS30195 |  |  |  |
|  |  | GQ59_RS30200 |  |  | NA |
FIGURE 2. Graph-based clustering using ProKlust compared to hierarchical clustering. (A) The average of each reciprocal pair in the pairwise input matrix (or matrices) is obtained. Each averaged matrix is then converted to a Boolean matrix according to the cut-off value chosen by the user. If more than one matrix is used as input, the final matrix is obtained by multiplying the Boolean matrices element-wise. A graph is formed by connecting the nodes that present positive values. In this example, nodes correspond to genomes and edges correspond to ANI ≥95% with alignment coverage ≥50%. The data can be filtered to retain components containing more than one species name or unconnected nodes containing the same species name. In addition, filters to remove isolated nodes (“filterRemoveIsolated”) or to keep only the largest component (“filterOnlyLargerComponent”) are available. The tool generates four types of output: (i) “maxCliques,” the maximal cliques, i.e., subsets of nodes in which each node is directly connected to every other node and which cannot be enlarged by adding another node; these are all the possible species groups that could be delimited in the graph, and they may have genomes in common; (ii) “components,” which contains the isolated nodes and the connected groups of nodes; (iii) “graph,” an igraph object that can be further handled by the user; and (iv) “plot,” where the final graph can be visualized. (B) Overview of the hierarchical clustering approach, which returns tree-shaped diagrams with non-overlapping clusters.
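The steps described for panel (A) can be sketched in plain Python. This is not ProKlust itself (an R package that returns an igraph object), only a minimal stand-alone illustration of the same idea. The function names, the four genome labels, and the ANI and coverage values below are all hypothetical, and "components" is assumed here to mean connected components.

```python
from itertools import combinations

def symmetrize(matrix):
    """Average each reciprocal pair (i, j) and (j, i) of a square matrix."""
    n = len(matrix)
    return [[(matrix[i][j] + matrix[j][i]) / 2 for j in range(n)] for i in range(n)]

def boolean_matrix(matrix, cutoff):
    """True wherever the averaged value meets the cut-off."""
    return [[value >= cutoff for value in row] for row in symmetrize(matrix)]

def build_graph(names, matrices, cutoffs):
    """Combine the Boolean matrices element-wise (logical AND, the Boolean
    equivalent of multiplication) and return {node: set of neighbours}."""
    booleans = [boolean_matrix(m, c) for m, c in zip(matrices, cutoffs)]
    graph = {name: set() for name in names}
    for i, j in combinations(range(len(names)), 2):
        if all(b[i][j] for b in booleans):
            graph[names[i]].add(names[j])
            graph[names[j]].add(names[i])
    return graph

def maximal_cliques(graph):
    """Enumerate maximal cliques with Bron-Kerbosch (pivoting variant)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(graph), set())
    return sorted(cliques)

def components(graph):
    """Connected components found by depth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(graph[node] - group)
        seen |= group
        result.append(sorted(group))
    return sorted(result)

# Hypothetical, deliberately asymmetric pairwise values for four genomes.
names = ["g1", "g2", "g3", "g4"]
ani = [[100.0, 96.2, 95.5, 80.0],
       [96.0, 100.0, 94.1, 80.0],
       [95.7, 94.3, 100.0, 80.0],
       [80.0, 80.0, 80.0, 100.0]]
coverage = [[100.0, 80.0, 70.0, 10.0],
            [82.0, 100.0, 60.0, 10.0],
            [72.0, 58.0, 100.0, 10.0],
            [10.0, 10.0, 10.0, 100.0]]

graph = build_graph(names, [ani, coverage], [95, 50])
print(maximal_cliques(graph))  # [['g1', 'g2'], ['g1', 'g3'], ['g4']]
print(components(graph))       # [['g1', 'g2', 'g3'], ['g4']]
```

In this toy input, g2 and g3 fall just under the ANI cut-off with each other but both pass it with g1, so g1 appears in two maximal cliques while all three genomes share one component. This is the overlap that a tree from hierarchical clustering cannot represent.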
FIGURE 3. Genomic clusters detected using pairwise ANI values from 520 Rhizobiales genomes. Only a subset is shown, following filtering to retain (i) genome clusters containing more than one bacterial species name and (ii) unconnected genomes carrying the same species name (synonym strains). In the graph, nodes correspond to genomes and edges correspond to ANI values above the cut-off for species delineation. In the case of ANIb from pyANI, where an alignment coverage matrix was also generated, the edges additionally correspond to a reliable alignment between the genomes. The same graph structure was obtained using FastANI values. Different colors were used whenever possible to indicate different species names.
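The two filters named in this caption can be illustrated with a short Python sketch. It assumes the clusters are already available as lists of genome labels and that each genome maps to one species name. The function names, genome labels, and placeholder species names are hypothetical and are not taken from ProKlust or from the study data.

```python
from collections import defaultdict

def mixed_name_components(components, species_of):
    """Filter (i): clusters whose genomes carry more than one species name."""
    return [c for c in components if len({species_of[g] for g in c}) > 1]

def split_species(components, species_of):
    """Filter (ii): species names whose genomes sit in different,
    unconnected clusters."""
    where = defaultdict(set)
    for index, component in enumerate(components):
        for genome in component:
            where[species_of[genome]].add(index)
    return sorted(name for name, indexes in where.items() if len(indexes) > 1)

# Hypothetical clusters and placeholder species names.
example_components = [["g1", "g2"], ["g3"], ["g4"], ["g5"]]
species_of = {
    "g1": "Species A",
    "g2": "Species B",
    "g3": "Species C",
    "g4": "Species C",
    "g5": "Species D",
}

mixed = mixed_name_components(example_components, species_of)
split = split_species(example_components, species_of)
print(mixed)  # [['g1', 'g2']]
print(split)  # ['Species C']
```

The first result flags candidate synonyms (two names in one cluster). The second flags genomes that share a name but do not cluster together, as would happen when an assembly is not from the genuine type strain.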
Heterotypic synonyms found here that have already been detected by other authors.
| Species 1 | Species 2 | Basis | Proposition | References |
|---|---|---|---|---|
|  |  | Numerical taxonomic analysis showing that 24 strains of |  |  |
|  |  | dDDH (79.8%) and OrthoANIu (97.8%) |  |  |
|  |  | ANIb (97.55%), ANIm (98.25%), gANI (97.99%), OrthoANI (97.94%) and dDDH (83.9%) |  |  |
|  |  | dDDH (97.8%) |  |  |
|  |  | dDDH (81.2%) |  |  |
|  |  | dDDH (99.1%) |  |  |
|  |  | dDDH (80.5%) |  |  |
|  |  | dDDH (90.3%) |  |  |
|  |  | dDDH (92.2%) |  |  |
|  |  | dDDH (76.3%) | New subspecies of |  |
|  |  | dDDH (70.0%) | New subspecies of |  |
FIGURE 4. Genomic clusters detected using pairwise ANI values from Aminobacter genomes. In the graph, nodes correspond to genomes and edges correspond to ANI values above the cut-off for species delineation with a reliable alignment between the genomes. Isolated nodes were removed. Different colors were used whenever possible to indicate different species names.