Koji Hayashi, Naoki Morooka, Yoshihiro Yamamoto, Katsutoshi Fujita, Katsumi Isono, Sunju Choi, Eiichi Ohtsubo, Tomoya Baba, Barry L Wanner, Hirotada Mori, Takashi Horiuchi.
Abstract
With the goal of solving the whole-cell problem with Escherichia coli K-12 as a model cell, highly accurate genomes were determined for two closely related K-12 strains, MG1655 and W3110. Completion of the W3110 genome and comparison with the MG1655 genome revealed differences at 267 sites, including 251 sites with short, mostly single-nucleotide, insertions or deletions (indels) or base substitutions (totaling 358 nucleotides), in addition to 13 sites with an insertion sequence element or defective prophage in only one strain and two sites for the W3110 inversion. Direct DNA sequencing of PCR products for the 251 regions with short indel and base disparities revealed that only eight sites are true differences. The other 243 discrepancies were due to errors in the original MG1655 sequence, including 79 frameshifts, one amino-acid residue deletion, five amino-acid residue insertions, 73 missense, and 17 silent changes within coding regions. Errors in the original MG1655 sequence (<1 per 13,000 bases) were mostly within portions sequenced with outdated technology based on radioactive chemistry.
Year: 2006 PMID: 16738553 PMCID: PMC1681481 DOI: 10.1038/msb4100049
Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429
Figure 1. E. coli K-12 pedigree. The relationships of E. coli K-12 MG1655 and W3110 with wild-type E. coli K-12 (EMG2 or WG1) have been described (Bachmann, 1972, 1996). Wild-type K-12 was cured of phage λ to make W1485 prior to 1954 (Step 1), which in turn was cured of the F+ factor to make W2637 (Step 2), from which W3110 was selected for a strongly galactose-fermenting strain in 1956 (Step 3). More recently, W1485 was cured of the F+ factor to make MG1655 (Guyer ). E. coli K-12 EMG2, W1485, W2637, and W3110 have the same rpoS396(Am) allele (codon 33, TAG (Am); Rod ; Atlung ; KA Datsenko and BL Wanner, unpublished data), while MG1655 has the pseudorevertant Q33 allele (Atlung ).
Figure 2. Resolution of E. coli K-12 W3110 and MG1655 sequence differences. See text.
Summary of E. coli K-12 MG1655 genome corrections
| Change | Location | No. sites |
|---|---|---|
| 1-nt substitution | Intergenic | 12 |
| 1-nt substitution | Coding | 56 |
| 2-nt substitution | Intergenic | 5 |
| 2-nt substitution | Coding | 26 |
| Multiple nt substitution | Intergenic | 2 |
| Multiple nt substitution | Coding | 2 |
| Multiple nt substitution | RNA | 1 |
| 1-nt indel | Intergenic | 48 (27) |
| 1-nt indel | Coding | 75 (50) |
| 1-nt indel | RNA | 1 (0) |
| 2-nt indel | Intergenic | 6 (5) |
| 2-nt indel | Coding | 3 (2) |
| 3-nt indel | Coding | 4 (3) |
| 4-nt indel | Coding | 1 (1) |
| 6-nt indel | Coding | 1 (1) |
| Total | | 243 (193) |
a The actual sequence corrections are in Supplementary Table 2.
b Most genes affected have only single corrections. Exceptions had five (yfjP, alx, ppiC), six (yhdZ), seven (yieP, yjgN), 11 (yigL), and 14 (yibJ) corrections.
c Totals are given with the number of insertions in parentheses. Indels changed not only the length of particular gene products but also the number of gene products, for example, corrections resulting in gene fusion event(s), or conversion from one to two genes.
d One multiple nt substitution changes coding of ebgA (CAAG to AGCA at nt 3 222 944); the other results in a frameshift in gntT due to a 2-nt addition (C to GCG at nt 3 544 358).
e A 1-codon deletion lies in yghG and 1-codon insertions lie in rffE, yiaY, and yieP.
f A 2-codon insertion lies in arcB.
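The counts in the corrections table can be cross-checked with a short tally. The sketch below is only an illustration: the row values are transcribed from the table, while the names `ROWS` and `tally` are hypothetical. It also assumes that rows without a parenthesized number are substitutions and rows with one are indels. Under that assumption the site counts sum to 243, which equals the 251 short-difference sites minus the eight true differences given in the abstract. The parenthesized total of 193 equals substitutions plus insertions (104 + 89); that is one possible reading of the figure, not something the footnotes state.

```python
# Rows transcribed from the corrections table:
# (change, location, number of sites, insertions or None for substitution rows).
ROWS = [
    ("1-nt substitution", "Intergenic", 12, None),
    ("1-nt substitution", "Coding", 56, None),
    ("2-nt substitution", "Intergenic", 5, None),
    ("2-nt substitution", "Coding", 26, None),
    ("Multiple nt substitution", "Intergenic", 2, None),
    ("Multiple nt substitution", "Coding", 2, None),
    ("Multiple nt substitution", "RNA", 1, None),
    ("1-nt indel", "Intergenic", 48, 27),
    ("1-nt indel", "Coding", 75, 50),
    ("1-nt indel", "RNA", 1, 0),
    ("2-nt indel", "Intergenic", 6, 5),
    ("2-nt indel", "Coding", 3, 2),
    ("3-nt indel", "Coding", 4, 3),
    ("4-nt indel", "Coding", 1, 1),
    ("6-nt indel", "Coding", 1, 1),
]


def tally(rows):
    """Sum sites by class; rows with insertions=None are treated as substitutions."""
    total = sum(sites for _, _, sites, _ in rows)
    substitutions = sum(sites for _, _, sites, ins in rows if ins is None)
    indels = total - substitutions
    insertions = sum(ins for _, _, _, ins in rows if ins is not None)
    return {
        "total": total,
        "substitutions": substitutions,
        "indels": indels,
        "insertions": insertions,
        "deletions": indels - insertions,
    }


if __name__ == "__main__":
    counts = tally(ROWS)
    print(counts)
    # Abstract: 251 short-difference sites, of which 8 are true differences.
    print("matches abstract:", counts["total"] == 251 - 8)
    # Assumed reading of the parenthesized total in the Total row.
    print("substitutions + insertions:", counts["substitutions"] + counts["insertions"])
```

Storing the insertion count as `None` for substitution rows keeps the distinction between "not applicable" and a genuine zero, as in the 1-nt RNA indel row.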
Figure 3. IS element and defective phage differences. Locus names and genome locations on the left side are based on the MG1655 genome. IS1A, IS1B, IS1C, etc. are named alphabetically to distinguish individual insertions (Supplementary Figure 1). IS elements, black arrows; sites, red arrowheads; six ISs disrupt ORFs, red bars (alsK, dcuA, gatA, rcsC, tdcD, and tnaB); and phage genes, green arrows.
Confirmed sequence differences between E. coli K-12 W3110 and MG1655
| Gene | b num | JW id | Function | Change in W3110 | Change in MG1655 | Ancestral type |
|---|---|---|---|---|---|---|
| | b1025 | JW5143 | Conserved membrane protein | V130 (GTA) | A130 (GCA) | MG1655 |
| | b1276 | JW1268 | Aconitate hydratase 1 | G522 (GGC) | A522 (AGC) | W3110 |
| | b1579 | JW1571 | Qin prophage; predicted defective integrase | L274 (CTC) | F274 (TTC) | W3110 |
| | b1942 | JW1926 | Conserved protein | V219 (GTT) | V219 (GTC) | MG1655 |
| | b2741 | JW5437 | RNA polymerase, sigma S (sigma38) factor | Stop33 (TAG) | Q33 (CAG) | W3110 |
| | b3357 | JW5702 | DNA-binding transcriptional dual regulator; cyclic AMP receptor protein | K29 (AAG) | T29 (ACG) | MG1655 |
| | b4009 | JWR109 | 23S ribosomal RNA | A2256 | G2256 | ND |
| | b4138 | JW5735 | C4-dicarboxylate antiporter; anaerobic | Frameshift | | MG1655 |

ND = not determined.
a The b number (b num) and JW identifier (JW id) are the locus tags in the MG1655 (GenBank™ U00096.2) and W3110 (DDBJ AP009048) genomes.
b Ancestral K-12 is EMG2 (Figure 1). EMG2 and W1485 were shown to be alike and to have the same allele as MG1655 or W3110, as indicated. For reasons given elsewhere (Atlung ), progenitor E. coli likely had the rpoS Q33 (CAG) allele, while a number of E. coli K-12 derivatives have a true reversion to Q33 or a mutant (E33, Y33, S33, or L33) allele.
c dcuA has a 2-nt insertion (TT) after nt 182 of its coding region.