Stacy Ciufo, Sivakumar Kannan, Shobha Sharma, Azat Badretdin, Karen Clark, Seán Turner, Slava Brover, Conrad L Schoch, Avi Kimchi, Michael DiCuccio.
Abstract
Average nucleotide identity analysis is a useful tool to verify taxonomic identities in prokaryotic genomes, for both complete and draft assemblies. Using optimum threshold ranges appropriate for different prokaryotic taxa, we have reviewed all prokaryotic genome assemblies in GenBank with regard to their taxonomic identity. We present the methods used to make such comparisons, the current status of GenBank verifications, and recent developments in confirming species assignments in new genome submissions.
Keywords: GenBank; RefSeq; taxonomy; type strains
Year: 2018 PMID: 29792589 PMCID: PMC6978984 DOI: 10.1099/ijsem.0.002809
Source DB: PubMed Journal: Int J Syst Evol Microbiol ISSN: 1466-5026 Impact factor: 2.747
No. of type strains (including co-identical strains and other kinds of type materials) from the NCBI Taxonomy Database and the DSMZ
| | NCBI taxonomy | DSMZ |
|---|---|---|
| No. of organisms | 31 078 | 14 238 |
| No. of type strains (and other type materials) | 103 533 | 45 362 |
Fig. 1. ANI process workflow for processing of pre-submission genomes.
Determination of the default cutoff of 96 %, based on current taxa for which we can determine both concordant and discordant ANI values
| ANI threshold | Count concordant below | Count discordant above |
|---|---|---|
| 98 | 175 | 7 |
| 97 | 112 | 9 |
| 96 | 77 | 9 |
| 95 | 55 | 12 |
| 94 | 40 | 16 |
| 93 | 23 | 22 |
| 92 | 18 | 31 |
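The trade-off in the table above can be tabulated with a short script. This is only an illustrative sketch: the counts are copied from the table, while the `error_profile` helper and the unweighted total it reports are assumptions made for the example. The source does not state how the two counts were weighed when 96 % was chosen as the default.

```python
# Counts copied from the table above:
# ANI threshold (%) -> (concordant pairs below it, discordant pairs above it).
COUNTS = {
    98: (175, 7),
    97: (112, 9),
    96: (77, 9),
    95: (55, 12),
    94: (40, 16),
    93: (23, 22),
    92: (18, 31),
}


def error_profile(counts):
    """Return (threshold, concordant_below, discordant_above, total) rows,
    ordered from the highest threshold to the lowest.

    The unweighted total is a hypothetical summary for illustration only.
    """
    return [
        (threshold, below, above, below + above)
        for threshold, (below, above) in sorted(counts.items(), reverse=True)
    ]


if __name__ == "__main__":
    for threshold, below, above, total in error_profile(COUNTS):
        print(f"{threshold} %: concordant below={below}, "
              f"discordant above={above}, total={total}")
```

Lowering the threshold reduces the number of concordant pairs that fall below it and increases the number of discordant pairs that fall above it, which is the tension the default cutoff has to balance.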
Exceptions to ANI cutoff values. For most taxonomic groups, the assembly is considered a match when the ANI value shows 96 % identity. For some taxa, the cutoff may vary to reflect a clearer or less well-defined relationship of species within a genus.
| TaxID | Scientific name | ANI cutoff |
|---|---|---|
| 34073 | | 88.00 % |
| 40324 | | 88.50 % |
| 1596 | Lactobacillus gasseri | 93.50 % |
| … | | |
| 67270 | | 99.99 % |
| 68178 | | 99.99 % |
| 68208 | | 99.99 % |
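A cutoff lookup of this kind can be sketched as a default value plus a table of per-taxon exceptions. The following is a minimal sketch under stated assumptions, not the production implementation: the function names are hypothetical, only the TaxIDs and cutoffs visible in the table are included, and treating "match" as ANI greater than or equal to the cutoff is an assumption.

```python
DEFAULT_ANI_CUTOFF = 96.0  # default cutoff (%) stated in the text

# Per-taxon exceptions keyed by NCBI TaxID; values (%) copied from the
# visible rows of the table above. The elided rows are not reproduced.
ANI_CUTOFF_EXCEPTIONS = {
    34073: 88.00,
    40324: 88.50,
    1596: 93.50,
    67270: 99.99,
    68178: 99.99,
    68208: 99.99,
}


def ani_cutoff(taxid: int) -> float:
    """Return the exception cutoff for a TaxID, or the default if none is listed."""
    return ANI_CUTOFF_EXCEPTIONS.get(taxid, DEFAULT_ANI_CUTOFF)


def is_match(taxid: int, ani_percent: float) -> bool:
    """Assumption: an assembly matches when its ANI is >= the applicable cutoff."""
    return ani_percent >= ani_cutoff(taxid)


if __name__ == "__main__":
    # 94 % ANI passes the relaxed 93.5 % cutoff listed for TaxID 1596 ...
    print(is_match(1596, 94.0))
    # ... but not the default cutoff applied to a TaxID absent from the table
    # (562 is used here only as an example of an unlisted TaxID).
    print(is_match(562, 94.0))
```

Keeping the exceptions in a plain mapping means a new taxon-specific cutoff is a data change rather than a code change.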
Fig. 2. Example of taxonomy correction markup on a GenBank record. This information was added after working closely with the submitter to correct the identification of the genome entry.
ANI equivalency groups
Pairs of species that cannot be separated by ANI analysis because of the high similarity of their genome sequences. Sometimes an equivalency group includes more than two species. In these cases, the species appear in more than one row of the lookup table.
| Species_1 | Species_2 |
|---|---|
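A pairwise lookup table of this shape can be queried symmetrically, as the sketch below illustrates. The species names are placeholders (`species_A` and so on), not entries from the actual table, and the helper names are hypothetical. A group of three species is represented by repeating its members across several pairs, as the caption describes.

```python
from collections import defaultdict

# Placeholder pairs for illustration only; these are NOT real table entries.
# The group {species_C, species_D, species_E} has three members, so its
# members are repeated across several rows.
PAIRS = [
    ("species_A", "species_B"),
    ("species_C", "species_D"),
    ("species_C", "species_E"),
    ("species_D", "species_E"),
]


def build_lookup(pairs):
    """Index the pair table in both directions so lookups are order-independent."""
    lookup = defaultdict(set)
    for first, second in pairs:
        lookup[first].add(second)
        lookup[second].add(first)
    return dict(lookup)


def are_equivalent(lookup, species_1, species_2):
    """True if the two names are identical or listed together as a pair."""
    return species_1 == species_2 or species_2 in lookup.get(species_1, set())


LOOKUP = build_lookup(PAIRS)

if __name__ == "__main__":
    print(are_equivalent(LOOKUP, "species_E", "species_C"))
    print(are_equivalent(LOOKUP, "species_A", "species_C"))
```

Because every pair within a larger group is listed explicitly, a single set-membership test is enough and no transitive closure has to be computed at query time.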
Fig. 3. K-mer tree showing the distribution of Shigella genomes among those of E. coli. The bar indicates the percentage nucleotide substitution rate over the length of the genome.
Fig. 4. K-mer tree showing genome variability amongst Lactobacillus gasseri assemblies. The ANI cutoff of 93.5 % includes both groups of assemblies (a and b), whilst a 96 % cutoff will separate them. Type assemblies are highlighted. The bar indicates the percentage nucleotide substitution rate over the length of the genome.