Sophie M Colston, Matthew S Fullmer, Lidia Beka, Brigitte Lamy, J Peter Gogarten, Joerg Graf.
Abstract
Prokaryotic taxonomy is the underpinning of microbiology, as it provides a framework for the proper identification and naming of organisms. The "gold standard" of bacterial species delineation is the overall genome similarity determined by DNA-DNA hybridization (DDH), a technically rigorous yet sometimes variable method that may produce inconsistent results. Improvements in next-generation sequencing have resulted in an upsurge of bacterial genome sequences and bioinformatic tools that compare genomic data, such as average nucleotide identity (ANI), correlation of tetranucleotide frequencies, and the genome-to-genome distance calculator, or in silico DDH (isDDH). Here, we evaluate ANI and isDDH in combination with phylogenetic studies, using Aeromonas as a test case; this taxonomically challenging genus has many described species and several strains that were reassigned to different species. We generated improved, high-quality draft genome sequences for 33 Aeromonas strains and combined them with 23 publicly available genomes. ANI and isDDH distances were determined and compared to phylogenies from multilocus sequence analysis of housekeeping genes, ribosomal proteins, and expanded core genes. The expanded core phylogenetic analysis suggested relationships between distant Aeromonas clades that were inconsistent with studies using fewer genes. ANI values of ≥96% and isDDH values of ≥70% consistently grouped genomes originating from strains of the same species together. Our study confirmed known misidentifications, validated the recent revisions in the nomenclature, and revealed that a number of genomes deposited in GenBank are misnamed. In addition, two strains were identified that may represent novel Aeromonas species.

IMPORTANCE: Improvements in DNA sequencing technologies have resulted in the ability to generate large numbers of high-quality draft genomes and led to a dramatic increase in the number of publicly available genomes. This has allowed researchers to characterize microorganisms using genome data. Advantages of genome sequence-based classification include data and computing programs that can be readily shared, facilitating the standardization of taxonomic methodology and resolving conflicting identifications by providing greater uniformity in an overall analysis. Using Aeromonas as a test case, we compared and validated different approaches. Based on our analyses, we recommend cutoff values for distance measures for identifying species. Accurate species classification is critical not only to obviate the perpetuation of errors in public databases but also to ensure the validity of inferences made on the relationships among species within a genus and proper identification in clinical and veterinary diagnostic laboratories.
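As an illustration of how the recommended cutoffs could be applied, the following minimal sketch groups genomes whose pairwise similarity meets a threshold. It assumes single-linkage grouping (any qualifying pair joins two groups), which the abstract does not specify; the function name `group_by_cutoff`, the genome labels, and the similarity values are hypothetical and are not taken from the study.

```python
from itertools import combinations

ANI_CUTOFF = 96.0    # percent, species cutoff stated in the abstract
ISDDH_CUTOFF = 70.0  # percent, species cutoff stated in the abstract


def group_by_cutoff(genomes, pairwise, cutoff):
    """Single-linkage grouping (an assumption): genomes are joined
    whenever any pair between them scores >= cutoff."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the group representative.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        score = pairwise.get((a, b), pairwise.get((b, a)))
        if score is not None and score >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical pairwise ANI values (percent), for illustration only.
genomes = ["g1", "g2", "g3", "g4"]
ani = {
    ("g1", "g2"): 97.2, ("g1", "g3"): 88.0, ("g2", "g3"): 88.4,
    ("g1", "g4"): 87.1, ("g2", "g4"): 87.5, ("g3", "g4"): 96.5,
}
print(group_by_cutoff(genomes, ani, ANI_CUTOFF))
```

The same function can be called with isDDH values and `ISDDH_CUTOFF`; comparing the two groupings is one way to spot pairs where the measures disagree.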
Year: 2014 PMID: 25406383 PMCID: PMC4251997 DOI: 10.1128/mBio.02136-14
Source DB: PubMed Journal: mBio Impact factor: 7.867
General features of the Aeromonas genomes
| Strain | Genome size (Mbp) | No. of scaffolds | Avg genome coverage (×) | N50 (nt) | G+C content (%) | No. of predicted CDSs | Accession no. | Reference |
|---|---|---|---|---|---|---|---|---|
| CECT 4199T | 4.66 | 120 | 87 | 114,541 | 58.4 | 4,173 | PRJEB7019 | This study |
| CECT 7289T | 4.69 | 78 | 117 | 163,504 | 61.7 | 4,266 | PRJEB7020 | This study |
| CECT 8023T | 4.11 | 113 | 128 | 95,095 | 58.1 | 3,733 | PRJEB7021 | This study |
| CECT 4227T | 4.68 | 41 | 53 | 237,067 | 60.5 | 4,223 | PRJEB7022 | This study |
| CECT 7113T | 4.28 | 69 | 30 | 149,050 | 62.3 | 3,909 | PRJEB7023 | This study |
| CECT 838T | 4.47 | 111 | 95 | 101,663 | 61.6 | 4,081 | PRJEB7024 | This study |
| CIP 107763T | 4.43 | 64 | 87 | 188,049 | 58.9 | 4,012 | PRJEB7047 | This study |
| CECT 4254T | 4.06 | 37 | 116 | 203,531 | 61.5 | 3,711 | PRJEB7026 | This study |
| CECT 4342T | 4.47 | 35 | 112 | 380,984 | 61.9 | 4,076 | PRJEB7027 | This study |
| CECT 4487T | 4.47 | 46 | 56 | 208,775 | 59.5 | 4,054 | PRJEB7028 | This study |
| CECT 4224T | 4.54 | 22 | 50 | 441,212 | 61.1 | 4,113 | PRJEB7029 | This study |
| LMG 24681T | 3.90 | 76 | 48 | 108,949 | 58.2 | 3,609 | PRJEB7030 | This study |
| CECT 4486T | 4.41 | 66 | 70 | 147,024 | 58.4 | 3,997 | PRJEB7050 | This study |
| CECT 4228T | 4.50 | 58 | 55 | 161,393 | 58.7 | 4,065 | PRJEB7031 | This study |
| CECT 839T | 4.74 | 1 | UNK | 4,744,448 | 61.5 | 4,119 | CP000462 | |
| CECT 4232T | 4.48 | 233 | 60 | 37,608 | 60.9 | 4,075 | PRJEB7032 | This study |
| CIP 108876T | 4.23 | 309 | 9 | 21,565 | 59.2 | 3,946 | AQGQ01 | |
| LMG 24783T | 5.18 | 91 | 99 | 150,424 | 59.0 | 4,713 | PRJEB7033 | This study |
| CIP 105493T | 4.76 | 105 | 67 | 113,495 | 58.4 | 4,331 | PRJEB7034 | This study |
| DSM 22539T | 4.53 | 102 | 99 | 155,151 | 60.0 | 4,149 | PRJEB7035 | This study |
| CIP 103209T | 4.74 | 128 | 117 | 89,543 | 58.5 | 4,442 | PRJEB7036 | This study |
| LMG 24682T | 4.19 | 98 | 121 | 82,664 | 63.1 | 3,828 | PRJEB7037 | This study |
| CECT 4240T | 4.13 | 111 | 260 | 108,810 | 61.7 | 3,808 | PRJEB7038 | This study |
| CIP 107798T | 3.99 | 100 | 86 | 73,112 | 61.1 | 3,654 | PRJEB7039 | This study |
| CECT 4245T | 4.68 | 52 | 34 | 188,072 | 58.6 | 4,160 | PRJEB7040 | This study |
| LMG 24683T | 4.24 | 106 | 66 | 85,294 | 62.8 | 3,884 | PRJEB7041 | This study |
| CECT 7082T | 4.76 | 51 | 89 | 238,229 | 60.1 | 4,278 | PRJEB7042 | This study |
| CECT 4255T | 4.34 | 27 | 66 | 640,249 | 60.0 | 3,917 | PRJEB7043 | This study |
| CECT 4257T | 4.52 | 52 | 59 | 181,171 | 58.8 | 4,070 | PRJEB7044 | This study |
| BVH88 | 4.71 | 131 | 204 | 74,486 | 58.6 | 4,295 | PRJEB7045 | This study |
| Ae398 | 4.44 | 149 | UNK | 76,364 | 61.4 | 3,866 | CACP01 | |
| CECT 4221 | 4.58 | 332 | 66 | 31,465 | 61.0 | 4,207 | PRJEB7046 | This study |
| AAK1 | 4.77 | 37 | 20 | 404,457 | 61.7 | 4,237 | PRJDB70 | |
| CIP 107500 | 4.71 | 73 | 84 | 165,885 | 61.8 | 4,284 | PRJEB7048 | This study |
| 173 | 4.79 | 74 | 46 | 119,625 | 61.6 | 4,134 | AOBN01 | |
| 277 | 4.79 | 41 | 76 | 282,384 | 61.6 | 4,213 | AOBQ01 | |
| 14 | 4.67 | 75 | 45 | 130,840 | 62 | UNK | AOBM01 | |
| 116 | 4.61 | 45 | 66 | 208,249 | 62 | 4,090 | ANPN01 | |
| 259 | 4.70 | 80 | 39 | 117,245 | 61.7 | 4,098 | AOBP01 | |
| 187 | 4.78 | 59 | 111 | 197,352 | 61.6 | 4,205 | AOBO01 | |
| SSU | 4.94 | 2 | 285 | 4,791,870 | 61.5 | 4,449 | AGWR01 | The Broad Institute |
| ML09_119 | 5.02 | UNK | UNK | UNK | 60.8 | 4,434 | CP005966.1 | |
| SNUFPC_A8 | 4.97 | 41 | 37 | 234,812 | 60.8 | 4,352 | AMQA01 | |
| CIP 107985 | 4.68 | 107 | 140 | 90,304 | 61.6 | 4,268 | PRJEB7049 | This study |
| WS | 4.78 | 1 | UNK | 4,788,430 | 60.7 | 4,385 | CP007567.1 | |
| AS03 | 4.96 | 69 | 21 | 124,543 | 58.3 | UNK | AMQG02 | |
| A449 | 5.04 | 1 | UNK | 5,040,536 | 58.2 | 4,436 | CP000644.1 | |
| 01-B526 | 4.92 | 604 | 40 | 83,743 | 58.4 | 4,529 | AGVO01 | |
| AH4 | 4.87 | 41 | 90 | 258,555 | 59.6 | 4,453 | PRJEB6940 | This study |
| AMC 34 | 4.58 | 1 | 288 | 4,578,728 | 58.5 | 4,117 | AGWU01 | The Broad Institute |
| B565 | 4.55 | 1 | UNK | 4,551,783 | 58.7 | 4,073 | CP002607 | |
| AER 39 | 4.42 | 4 | 283 | 1,516,045 | 58.9 | 3,948 | AGWT01 | The Broad Institute |
| Hm21 | 4.68 | 50 | 200 | 179,631 | 58.7 | 4,245 | ATFB01 | |
| LMG 13067 | 4.74 | 72 | 46 | 147,470 | 58.3 | 4,171 | PRJEB7051 | This study |
| AER 397 | 4.50 | 5 | 378 | 3,260,625 | 58.9 | 3,986 | AGWV01 | The Broad Institute |
| AMC 35 | 4.57 | 2 | 285 | 4,172,420 | 58.6 | 4,036 | AGWW01 | The Broad Institute |
Obtained from the EMBL Nucleotide Sequence Database.
Previously published names are indicated inside braces.
UNK, unknown.
Obtained from GenBank, National Center for Biotechnology Information.
The average genome coverage is expressed in bp sequenced divided by genome size.
The N50 (reported in nucleotides) represents the smallest of the largest contigs covering 50% of the total size of all contigs.
CDS, coding sequence.
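The footnote definitions of N50 and average genome coverage can be made concrete with a short sketch. It follows the definitions as worded above; the function names `n50` and `average_coverage` and the example inputs are hypothetical, and the assembly tools actually used in the study may handle edge cases differently.

```python
def n50(lengths):
    """Length of the shortest contig in the smallest set of largest
    contigs that together cover at least 50% of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one contig with nonzero length")


def average_coverage(bases_sequenced, genome_size):
    """Average coverage: bp sequenced divided by genome size."""
    return bases_sequenced / genome_size


# Hypothetical contig lengths (nt), for illustration only.
print(n50([100, 200, 300, 400, 500]))

# For a single closed sequence, N50 equals the sequence length,
# as in the one-scaffold rows of the table.
print(n50([4_744_448]))

# Hypothetical read yield: 400 Mbp sequenced over a 4.0-Mbp genome.
print(average_coverage(400_000_000, 4_000_000))
```

Because N50 depends only on the contig length distribution, a highly fragmented assembly can have a low N50 even when its coverage is high.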
FIG 1 (A) Maximum likelihood reconstruction of 16 single-copy housekeeping genes. Support values are represented by dots: red (90%+ bootstraps), orange (80%+), yellow (70%+). (B) Approximate maximum likelihood reconstruction of 2,710 orthologous groups found in 90% or more of the taxa. aLRT SH-like support values equal to or greater than 0.97 are represented by red dots. The species A. veronii, A. hydrophila, A. dhakensis, A. salmonicida, and A. caviae are color-coded in both trees. Additionally, two previously misidentified taxa, A. veronii AMC 34 and A. hydrophila AH4, are shown in red and teal, respectively. Eight well-supported clades were shared between the two reconstructions. They are shown by the colored bars and are numbered 1 through 8.
FIG 2 ANI and isDDH values. The lower triangle displays ANI values, and the upper triangle shows isDDH values. ANI values are colored according to three historical species cutoff values: 94% (yellow), 95% (orange), and 96%+ (red). The isDDH values displayed are the upper limits of the 95% confidence intervals and are colored red if they met the laboratory DDH species cutoff of 70% hybridization. An ANI of 96% correlates well with 70% isDDH values, with only the A. allosaccharophila isolates failing to match (68.7%).
FIG 3 Comparison of isDDH and ANI results. The pairwise percent similarities of 56 genomes were determined using either isDDH or ANI. The two approaches revealed a significant correlation, with an r² of 0.957. When testing samples with isDDH values of ≥50%, the r² was 0.9996.
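The r² values reported for the isDDH versus ANI comparison can be reproduced in principle from paired similarity values. The sketch below assumes r² is the squared Pearson correlation coefficient, which the caption does not state explicitly; the function name `r_squared` and the paired values are hypothetical and are not data from the study.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical (isDDH %, ANI %) pairs, for illustration only.
pairs = [(25.0, 82.0), (31.0, 86.0), (48.0, 92.0), (68.0, 95.8), (85.0, 98.1)]

isddh = [p[0] for p in pairs]
ani = [p[1] for p in pairs]
print(r_squared(isddh, ani))

# Restricting to pairs with isDDH >= 50%, as in the second value of the caption.
subset = [p for p in pairs if p[0] >= 50.0]
print(r_squared([p[0] for p in subset], [p[1] for p in subset]))
```

Restricting the comparison to closely related pairs removes the more divergent genomes from the calculation, which is the subset the caption describes for isDDH values of ≥50%.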