Mohammad Hamidian, Ryan R Wick, Rebecca M Hartstein, Louise M Judd, Kathryn E Holt, Ruth M Hall.
Abstract
The Acinetobacter baumannii global clone 1 isolate AB307-0294, recovered in the USA in 1994, and the global clone 2 (GC2) isolate ACICU, isolated in 2005 in Italy, were among the first A. baumannii isolates to be completely sequenced. AB307-0294 is susceptible to most antibiotics and has been used in many genetic studies, and ACICU belongs to a rare GC2 lineage. The complete genome sequences, originally determined using 454 pyrosequencing technology, which is known to generate sequencing errors, were re-determined using Illumina MiSeq and MinION (Oxford Nanopore Technologies) technologies and a hybrid assembly generated using Unicycler. Comparison of the resulting new high-quality genomes to the earlier 454-sequenced versions identified a large number of nucleotide differences affecting protein coding sequence (CDS) features, and allowed the sequences of the long and highly repetitive bap and blp1 genes to be properly resolved for the first time in ACICU. Comparisons of the annotations of the original and revised genomes revealed a large number of differences in the protein CDS features, underlining the impact of sequence errors on protein sequence predictions and core gene determination. On average, 400 predicted CDSs were longer or shorter in the revised genomes and about 200 CDS features were no longer present.
Keywords: AB307-0294; ACICU; Acinetobacter baumannii; complete genome sequence; global clone 1; global clone 2
Year: 2019 PMID: 31556867 PMCID: PMC6861863 DOI: 10.1099/mgen.0.000298
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Properties of early completed genomes
| Strain | Replicon | Country | Isolation date | GC | Original length (bp) | Original GenBank no. | Original sequencing technology | Original reference | Revised | Revised length (bp) | Revised GenBank no. | Revised sequencing technology | Revised reference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ATCC 17978 | Chromosome | nk | 1951 | – | 3 976 747 | CP000521.1 | 454 | [ | Yes | 3 857 743 | CP012004.1 | PacBio | [ |
| | pAB1 | | | | 13 408 | CP000522.1 | 454 | | No | Not present | na | nk | |
| | pAB2 | | | | 11 302 | CP000523.1 | 454 | | No | Not present | | nk | |
| | pAB3 | | | | Not present | – | 454 | | Yes | 148 955 | CP012005.1 | PacBio | |
| | Chromosome | France | 2001 | 1 | 3 936 291 | CU459141.1 | Illumina | [ | No | – | – | – | – |
| | p1ABAYE | | | | 5644 | CU459137.1 | Illumina | | No | – | – | – | – |
| | p2ABAYE | | | | 9661 | CU459138.1 | Illumina | | No | – | – | – | – |
| | p3ABAYE | | | | 94 413 | CU459140.1 | Illumina | | No | – | – | – | – |
| | p4ABAYE | | | | 2726 | CU459139.1 | Illumina | | No | – | – | – | – |
| AB307-0294 | Chromosome | USA | 1994 | 1 | 3 760 981 | CP001172.1 | 454 | [ | Yes | 3 759 495 | CP001172.2 | MinION and Illumina | This study |
| | Chromosome | USA | 2004 | 1 | 4 050 513 | CP001182.1 | 454 | [ | Yes | 4 055 148 | CP001182.2 | Illumina | [ |
| | pAB0057 | | | | 8729 | CP001183.1 | 454 | | Yes | 8731 | CP001183.2 | Illumina | |
| | Chromosome | South Korea | 2011* | 2 | 3 940 614 | CP001921.1 | 454 | [ | No | – | – | – | – |
| | ABKp1 | | | | 74 451 | CP001922.1 | 454 | | No | – | – | – | – |
| | ABKp2 | | | | 8041 | CP001923.1 | 454 | | No | – | – | – | – |
| ACICU | Chromosome | Italy | 2005 | 2 | 3 904 116 | CP000863.1 | 454 | [ | Yes | 3 919 274 | CP031380.1 | MinION and Illumina | This study |
| | pACICU1 | | | | 28 279 | CP000864.1 | 454 | | Yes | 24 268 | CP031381.1 | MinION and Illumina | This study |
| | pACICU2 | | | | 64 366 | CP000865.1 | 454 | | Yes | 70 101 | CP031382.1 | MinION and Illumina | This study |
| | Chromosome | Taiwan | 2007 | 2 | 4 130 792 | CP002522.1 | Illumina | [ | Yes | 4 138 388 | CP002522.2 | Illumina | – |
| | p1ABTCDC0715 | | | | 8731 | CP002523.1 | Illumina | | No | – | – | – | – |
| | p2ABTCDC0715 | | | | 70 894 | CP002524.1 | Illumina | | No | – | – | – | – |
| | Chromosome | China | 2006 | 2 | 3 991 133 | CP001937.1 | 454 | [ | Yes | 4 022 275 | CP001937.2 | PacBio | [ |
| | pMDR-ZJ06† | | | | 20 301 | CP001938.1 | 454 | | Yes | Not present† | | | |
| | Chromosome | Taiwan | 2008 | 2 | 3 957 368 | CP003856 | Illumina | [ | No | – | – | – | – |
| | Chromosome | China | 2012‡ | 2 | 3 964 912 | CP003500.1 | 454 | [ | No | – | – | – | – |
| | pABTJ1 | | | | 77 528 | CP003501.1 | 454 | | No | – | – | – | – |
| | pABTJ2 | | | | 110 967 | CP004359.1 | 454 | | No | – | – | – | – |
*Genome submission date; isolation date is not known.
†pMDR-ZJ06 is not present in the revised genome.
‡Recovered between 2007 and 2012.
na, not applicable; nk, not known.
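The size of each correction can be read off the table above by subtracting the original replicon length from the revised one. The following is a minimal Python sketch of that arithmetic for the four replicons marked "This study"; the lengths are copied from the table, while the dictionary layout and the function name `length_changes` are illustrative assumptions rather than anything used by the authors.

```python
# Lengths in bp, copied from the table above for the replicons marked "This study".
# Keys are (original accession, revised accession); the layout is illustrative only.
REPLICONS = {
    ("CP001172.1", "CP001172.2"): (3_760_981, 3_759_495),  # chromosome
    ("CP000863.1", "CP031380.1"): (3_904_116, 3_919_274),  # chromosome
    ("CP000864.1", "CP031381.1"): (28_279, 24_268),        # pACICU1
    ("CP000865.1", "CP031382.1"): (64_366, 70_101),        # pACICU2
}


def length_changes(replicons):
    """Map each revised accession to (revised length - original length) in bp."""
    return {
        revised: new_len - old_len
        for (_original, revised), (old_len, new_len) in replicons.items()
    }


for accession, delta in length_changes(REPLICONS).items():
    print(f"{accession}: {delta:+d} bp")
```

A positive value means the revised sequence is longer than the 454-based original, and a negative value means it is shorter.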
Comparison of bap and blp1 genes in early complete genomes and their revisions
| Genome | Version | Revision technology | bap size (bp) | bap locus ID | blp1 size (bp) | blp1 locus ID |
|---|---|---|---|---|---|---|
| ATCC 17978 | Original | | 6306 | A1S_2696 | –* | – |
| ATCC 17978 | Revised | PacBio | 6225 | ACX60_04030 | – | – |
| AB307-0294 | Original | | 22 920 | ABBFA_000776 | 10 071 | ABBFA_000810 |
| AB307-0294 | Revised | Nanopore | 25 863 | ABBFA_00771 | 10 089 | ABBFA_00802 |
| ACICU | Original | | 6420 | ACICU_02938-46 | 9510 | ACICU_02910 |
| ACICU | Revised | Nanopore | 22 212 | DMO12_08904 | 9813 | DM012_08811 |
| | Original | | 2115 | ABZJ_03124 | 9135 | ABZJ_03096 |
| | Revised | PacBio | 7947 | ABZJ_03955 | 9813 | ABZJ_03096 |
*ATCC 17978 does not contain the blp1 gene.
Fig. 1. Histograms of CDS lengths relative to the length of the top hit in UniProt, in the original versus revised genomes. (a) ACICU GenBank accession no. CP000863.1 (original) and CP031380 (revised), (b) AB307-0294 GenBank accession no. CP001172.1 (original) and CP001172.2 (revised), and (c) ATCC 17978 GenBank accession no. CP000521.1 (original) and CP012004.1 (revised). The x-axis shows the ratio of CDS length to the length of the closest hit in the UniProt TrEMBL database. The y-axis shows gene frequency and is truncated at 100 (the centre bar extends to ~3000 genes). A tight distribution around 1.0 indicates that the assembly’s CDSs match known proteins, supporting few indel errors in the assembly. A left-skewed distribution is characteristic of an assembly with indel errors that lead to premature stop codons.
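The length ratio described in the caption can be illustrated with a short Python sketch. It is not the authors' pipeline: it assumes each predicted protein has already been paired with the length of its closest UniProt TrEMBL hit, it collapses the histogram into three coarse bins, and the function name `classify_cds`, the 10 % tolerance, and the toy input values are all hypothetical.

```python
from collections import Counter

# Toy (CDS protein length, top-hit protein length) pairs in amino acids.
# These values are made up for illustration and are not taken from the study.
TOY_PAIRS = [(300, 300), (150, 300), (299, 300), (400, 300), (90, 300)]


def classify_cds(pairs, tolerance=0.1):
    """Count CDSs that are shorter than, matching, or longer than their top hit.

    A CDS counts as 'matching' when its length ratio to the hit lies within
    1 +/- tolerance. Many 'shorter' CDSs correspond to the left-skewed
    histogram that the caption associates with indel errors.
    """
    counts = Counter()
    for cds_len, hit_len in pairs:
        ratio = cds_len / hit_len
        if ratio < 1 - tolerance:
            counts["shorter"] += 1
        elif ratio > 1 + tolerance:
            counts["longer"] += 1
        else:
            counts["matching"] += 1
    return counts


print(classify_cds(TOY_PAIRS))
```

Running the same classification on the original and the revised annotation of a genome and comparing the "shorter" counts gives a rough numerical counterpart to the visual comparison in the figure.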