Brian P Anton, Emmanuel F Mongodin, Sonia Agrawal, Alexey Fomenkov, Devon R Byrd, Richard J Roberts, Elisabeth A Raleigh.
Abstract
We report the complete sequence of ER2796, a laboratory strain of Escherichia coli K-12 that is completely defective in DNA methylation. Because of its lack of any native methylation, it is extremely useful as a host into which heterologous DNA methyltransferase genes can be cloned and the recognition sequences of their products deduced by Pacific Biosciences Single-Molecule Real Time (SMRT) sequencing. The genome was itself sequenced from a long-insert library using the SMRT platform, resulting in a single closed contig devoid of methylated bases. Comparison with K-12 MG1655, the first E. coli K-12 strain to be sequenced, shows an essentially co-linear relationship with no major rearrangements despite many generations of laboratory manipulation. The comparison revealed a total of 41 insertions and deletions, and 228 single base pair substitutions. In addition, the long-read approach facilitated the surprising discovery of four gene conversion events, three involving rRNA operons and one between two cryptic prophages. Such events thus contribute both to genomic homogenization and to bacteriophage diversification. As one of relatively few laboratory strains of E. coli to be sequenced, the genome also reveals the sequence changes underlying a number of classical mutant alleles including those affecting the various native DNA methylation systems.
Year: 2015 PMID: 26010885 PMCID: PMC4444293 DOI: 10.1371/journal.pone.0127446
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Laboratory strains of E. coli with finished (ungapped) genome sequences in GenBank.
| Strain | Ancestor | RefSeq or GenBank accession | Reference |
|---|---|---|---|
| MG1655 | K-12 | NC_000913 | [ |
| W3110 | K-12 | NC_007779 | [ |
| DH1 | K-12 | NC_017625, NC_017638 | [ |
| DH10B | K-12 | NC_010473 | [ |
| BW2952 [MC4100(MuLac)] | K-12 | NC_012759 | [ |
| MDS42 | K-12 | NC_020518 | unpublished |
| MC4100 | K-12 | | [ |
| BW25113 | K-12 | | unpublished |
| KLY | K-12 | | [ |
| ER2796 | K-12 | | this work |
| ER3413 | K-12 | | this work |
| REL606 | B | NC_012967 | [ |
| BL21(DE3) | B | NC_012971, NC_012892 | [ |
| BL21-Gold | B | NC_012947 | unpublished |
| W (ATCC 9637) | W | NC_017635, NC_017664 | [ |
| KO11 | W | NC_017660, NC_016902 | [ |
| LY180 | W | NC_022364 | unpublished |
| ATCC 8739 | Crooks | NC_010468 | unpublished |
Fig 1. Relationship of ER2796 and ER3413 to the nine other completely sequenced E. coli K-12 strains.
Completely sequenced strains are shown in bold type, and selected ancestral strains in Roman type. Most of the tree has been abstracted from Bachmann [28], except for the ancestries of MC4100 and DH10B back to Hfr Hayes, which are based on Laehnemann [63] and Durfee [4], respectively. Selected additional contributions of genetic material via crosses are shown by dotted lines. It appears based on genotype that Hfr 3000 U482 is the “U series” ancestor of DH10B, while Hfr 3000 U169 contributed genetic material in the ancestry of MC4100.
DNA restriction-modification genes in E. coli K-12 MG1655.
| Gene | Product | Activity | |
|---|---|---|---|
| DNA MTases | | | |
| | orphan MTase M.EcoKDam | Gm6A | |
| | orphan MTase M.EcoKDcm | Cm5CW | |
| | orphan MTase M.EcoKII | A | |
| | Type I MTase M.EcoKI | Am6ACNNNNNNG | |
| Other Genes | | | |
| | Type I restriction ENase R.EcoKI | | |
| | Type I specificity subunit S.EcoKI | | |
| | Type IV restriction ENase | | |
| | Type IV restriction ENase | | |
| | Type IV restriction ENase | | |
a All of these genes have been deleted or otherwise inactivated in ER2796 except for yhdJ, which is additionally inactivated in ER3413.
Fig 2. Lineage showing the construction of ER2796 from JC1552.
Selected intermediate genotypes are shown. Markers that were selected are shown, followed by those that were screened for in parentheses. The lineage from K-12 to JC1552 has been described previously [28]. Genotypes are shown here using the historic allele names, but we suggest an updated nomenclature for some of these in Table 3 based on the genome sequence.
Genotype markers in ER2796 and underlying sequence features.
| Allele | Old Allele Name (if changed) | Alteration | Genes Affected | ER2796 Sequence | MG1655 Sequence | Amino Acid Changes |
|---|---|---|---|---|---|---|
| | | | | 167920–169255 (169251–169255 is target site duplication) | between 167919–167920 | ER2796_149 (aa 1–145 + 13 aa), and ER2796_151 (aa 158–747) |
| | | deletion | | between 365092–365093 | 362419–364862 | Δ223–1024; adds 40 aa extension overlapping |
| | | tRNA transition | | 693302 (T) | 695693 (C) | (DNA nt) G34A |
| e | | excision | | between 1193386–1193387 | 1195598–1210801 | null; associated changes in |
| | | missense | | 1302017 (T) | 1319610 (C) | G454D (ER2796_1284) |
| | | silent | | 2012149 (T) | 2029184 (C) | E386 |
| | | nonsense (TGA) | | 2013172 (T) | 2030207 (C) | W45stop (ER2796_2013; also ER2796_2012 from internal start at aa 111) |
| | | | | 2021730–2030885 (2030877–2030885 is target site duplication) | between 2038764–2038765 | Δ87–211; ER2796_2025 (aa 1–86 + 15 aa), and ER2796_2035 (from internal start at aa 102) |
| | | deletion, in-frame | | between 2080789–2080790 | 2088669–2088704 | Δ152–163 (ER2796_2082) |
| | | –1 frameshift | | between 2798558–2798559 | 2812480 (A) | Δ92–171; ER2796_2763 (aa 1–90 + 20 aa), and ER2796_2762 (from internal start at aa 108) |
| | | silent | | 2798561 (C) | 2812483 (T) | L91 (ER2796_2762) |
| | | nonsense (TAG) | | 2851555 (A) | 2865477 (G) | E33stop ER2796_2821 (from internal start at aa 40) |
| | | –1 frameshift | | between 3304561–3304562 | 3317286 (C) | Δ210–447 ER2796_3265 (aa 1–209 + 13 aa), and ER2796_3266 (from internal start at aa 221) |
| | | missense | | 3443238 (G) | 3472313 (T) | K88Q |
| | | missense | | 3443372 (G) | 3472447 (T) | K43T (ER2796_3428) |
| | | deletion + 1266 bp KanR insertion | | 3484166–3485431 | 3513241–3513773 | Δ55–242 ER2796_3474 (from internal start at aa 242) |
| | | missense | | 3699368 (A) | 3726511 (G) | A295V (ER2796_3669) |
| | | missense | | 3700835 (A) | 3727978 (G) | H271Y (ER2796_3670) |
| | | silent | | 3700836 (G) | 3727979 (A) | N270 |
| | | insertion (IS1) | | 3702309–3703085 | between 3729451–3729452 | Δ100–330; adds 1 aa extension (ER2796_3671) |
| | | –2 frameshift | | between 3744696–3744697 | 3771063–3771064 (GG) | Δ254–637 ER2796_3708 (aa 1–253 + 60 aa), and ER2796_3709 (from internal start at aa 306) |
| | | +1 frameshift | | 3787535 (C) | between 3813902–3813903 | ER2796_3754 |
| | | –2 frameshift | | between 4100468–4100469 | 4126836–4126837 (CG) | Δ48–386; ER2796_4061 (8 aa + aa 48–386) |
| | | deletion of 60,679 bp + insertion of 2 | | 4511776–4514442 (4511776–4513104 and 4513114–4514442 are | 4537567–4595455 | null, except |
a The reassignment of glnV44 (supE44) was noted previously [43].
b The double mutation (one silent) is in agreement with a previous study [35].
c The sequence of luxS reported here is identical to that of a previous study [70], although our alignment differs slightly, moving the frameshift 3 nt and inferring a transition instead of a transversion. The steps that resulted in the shared luxS11 allele clearly include a base deletion and a base change, but exactly which deletion and which base change depend on the local alignment. Spontaneous unselected transitions are somewhat more frequent than transversions [71], so our alignment may be preferable. The mutation is present in DH1 [72] (see Table 1), an ancestor of the strain used in [70], and may have been present in sibling strains JC1552 (ancestral to ER2796; RecA+) and JC1553 (source of the recA1 allele of DH1 and its descendants [73, 74]). The luxS and recA genes are very close, about 8 kb apart, and introduction of recA1 was the last step in construction of DH1.
d This nonsense mutation, which is common in laboratory E. coli strains [75], was most likely ancestral, not introduced by transduction. It may be partially suppressed in this strain. The rpoS mutation and the accompanying supE44 mutation (identified here as glnX44) can be traced to strain Y10, very early in the K-12 pedigree [28].
e This frameshift mutation presumably restores the wild type state, reverting the frameshift present in early K-12 derivative strains MG1655 and W3110 [76].
f The position of the parental zjj202::Tn10 is inferred to be 4597466–4597474 of MG1655 (NC_000913.2), the nine base target sequence that is duplicated upon insertion.
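The coordinate entries and base changes in the genotype-marker table can be sanity-checked with a little arithmetic. The following is a minimal sketch, not part of the original analysis: it assumes the coordinate ranges are 1-based and inclusive (not stated explicitly here), and the helper names `span_length`, `is_in_frame` and `substitution_class` are hypothetical.

```python
# Illustrative helpers for reading the genotype-marker table.
# Assumption: ranges such as 2088669-2088704 are 1-based and inclusive.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def span_length(start: int, end: int) -> int:
    """Length in bp of an inclusive coordinate range."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def is_in_frame(length_bp: int) -> bool:
    """True if an indel of this length removes or adds whole codons."""
    return length_bp % 3 == 0


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Target site duplications listed in the table (ER2796 coordinates).
tsd_short = span_length(169251, 169255)
tsd_long = span_length(2030877, 2030885)

# The deletion labelled "in-frame" (MG1655 coordinates) removes aa 152-163.
in_frame_deletion = span_length(2088669, 2088704)
codons_removed = in_frame_deletion // 3

# The deletion at MG1655 362419-364862 is not a multiple of three.
large_deletion = span_length(362419, 364862)

print(tsd_short, tsd_long)                              # 5 9
print(in_frame_deletion, codons_removed)                # 36 12
print(large_deletion, is_in_frame(large_deletion))      # 2444 False
print(substitution_class("C", "T"))                     # transition
print(substitution_class("T", "G"))                     # transversion
```

Under the inclusive-coordinate assumption, the 36 bp deletion corresponds to 12 codons, which agrees with the Δ152–163 annotation, and the nine-base span agrees with the nine-base target duplication described in footnote f. The transition/transversion call does not depend on which strand is reported, because complementing both bases preserves the purine/pyrimidine class.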
Fig 3. Alignment of the MG1655 genome with ER2796 and DH10B, conducted with Progressive-Mauve.
Boundaries of the major contiguous blocks of sequence, labeled with capital letters, are formed by two major events specific to the DH10B lineage: block B results from deletion of a 34.6 kb region of MG1655 followed by partial restoration as part of a φ80Δ(lacZ)M15 mosaic prophage insertion in DH10B; and block E results from the IS10-mediated inversion of an 11 kb segment of MG1655, again in DH10B [4]. The following larger indels visible in the figure are labeled: prophage e14 lost in both ER2796 and DH10B; prophage CPZ-55 lost in ER2796; the 16 kb mtgA-yhcE region lost in ER2796 through IS5-mediated deletion; the ICR region deleted in both ER2796 and DH10B; Tn10 insertion at yedZ in ER2796; tandem duplication of a 113 kb region in DH10B, presumably IS5-mediated; the φ80Δ(lacZ)M15 mosaic prophage insertion in DH10B, including the lacZ region (part of block B).
Fig 4. Comparison of the lacZY regions of MG1655 and ER2796.
A. Schematic drawing showing the region of MG1655 lacZ and lacZY intergenic region that is deleted in ER2796. It is oriented forward with respect to the chromosomal sequence, with the operon reversed from the conventional representation. In ER2796, the lacZ ORF encodes amino acids 1–222 of MG1655 lacZ (white box) fused to 40 amino acids derived from the lacZY intergenic region, and overlapping with lacY (cross-hatched box). The putative lacY ribosome binding site (RBS) is preserved in ER2796. B. DNA and translated protein sequence of the lacZY junction, numbered from ER2796. Nucleotides and translated amino acids missing in ER2796 are shown in gray, and those present are shown in black. In ER2796, aa 1–222 of the translated ORF are shown in black, and the 40 aa derived from the intergenic region are shown in red. Start codons of lacZ and lacY are highlighted, and the putative RBS of lacY is underlined. 2160 bp (720 aa) of MG1655 lacZ have been removed at the indicated position for brevity.
Fig 5. Use of long reads to identify gene conversion events.
The schematic alignment shows the paralogous ribosomal gene clusters rrnB and rrnE from ER2796 (white genes) along with nonhomologous flanking genes (gray). The genes are marked with names and coordinates in ER2796. In ER2796, rrnB has been the apparent recipient of a gene conversion event in which rrnE served as donor (vertical arrows), and thus both regions are identical. As a result of this event, rrnB in ER2796 exhibits minor variations when compared with rrnB from its ancestor, MG1655: six SNPs (marked with *) and one indel (marked with †). Red tinted boxes indicate the regions of alteration (left and middle) and delineate the boundaries of the clusters (left and right). Sequencing reads internal to the clusters (i.e., between the outer two red boxes) cannot be mapped uniquely to one locus or the other unless they extend into the nonhomologous flanking regions, and the minor variants within (e.g., the middle red box) cannot be assigned to one cluster or the other without sequencing reads directly connecting them with a flanking region on one side or the other. The long-read library used in this analysis includes numerous reads that connect the unique flanking regions with the internal variants. The mapped coordinates of six example reads from the actual analysis are shown at the top, including some that span both sides of the 5 kb gene cluster. Arrows indicate where a read continues beyond the region shown here.
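The read-selection logic described in this caption can be illustrated with a short sketch. This is not the authors' pipeline: the function name `anchoring_reads`, the `min_flank` threshold and all coordinates below are hypothetical, chosen only to show why a read must both cover an internal variant and extend into unique flanking sequence before the variant can be assigned to one cluster.

```python
from typing import List, Tuple

Read = Tuple[int, int]  # inclusive (start, end) of a mapped read


def anchoring_reads(
    reads: List[Read],
    cluster_start: int,
    cluster_end: int,
    variant_pos: int,
    min_flank: int = 50,
) -> List[Read]:
    """Return reads that cover the variant and also extend at least
    `min_flank` bp into unique sequence on either side of the cluster."""
    informative = []
    for start, end in reads:
        covers_variant = start <= variant_pos <= end
        reaches_left = start <= cluster_start - min_flank
        reaches_right = end >= cluster_end + min_flank
        if covers_variant and (reaches_left or reaches_right):
            informative.append((start, end))
    return informative


# Hypothetical 5 kb repeated cluster with one internal variant.
cluster_start, cluster_end = 10_000, 15_000
variant_pos = 12_500

example_reads = [
    (11_000, 14_000),  # entirely internal: cannot be placed uniquely
    (8_000, 13_000),   # left flank plus variant: informative
    (12_000, 17_500),  # variant plus right flank: informative
    (7_500, 16_000),   # spans the whole cluster: informative
    (9_000, 12_000),   # reaches left flank but stops short of the variant
]

informative = anchoring_reads(
    example_reads, cluster_start, cluster_end, variant_pos
)
print(informative)
# [(8000, 13000), (12000, 17500), (7500, 16000)]
```

With short reads, nearly every read would fall into the first category, which is why variants inside paralogous clusters of this size are hard to assign without a long-read library.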