Complete Genome Sequence of Escherichia coli ML35.

Angeline Casale, Stephanie Clark, Melissa Grasso, Marta Kryschuk, Lukas Ritzer, Madyson Trudeau, Laura E Williams

Abstract

We report here the complete genome sequence of Escherichia coli strain ML35. We assembled PacBio reads into a single closed contig with 169× mean coverage and then polished this contig using Illumina MiSeq reads, yielding a 4,918,774-bp sequence with 50.8% GC content.
Copyright © 2018 Casale et al.

Year: 2018; PMID: 29449382; PMCID: PMC5814495; DOI: 10.1128/genomeA.00034-18

Source DB: PubMed; Journal: Genome Announc


GENOME ANNOUNCEMENT

Escherichia coli strain ML35 was isolated during studies of lac operon gene expression in the 1950s (1). ML35 does not synthesize lactose permease, but it constitutively expresses β-galactosidase (2). Since its isolation, ML35 has been used in a variety of experiments, including the investigation of interactions between E. coli and predatory bacteria (3). Williams and coworkers (4) are using ML35 and other E. coli strains to test the prey range of predatory bacteria. Comparative genomics will help us understand how genome variation within a prey species impacts variation in predation phenotypes.

We extracted genomic DNA from 3 ml of overnight culture grown in Trypticase soy broth at 37°C using the Wizard genomic DNA purification kit (Promega). Aliquots were used by the University of Maryland Institute for Genome Sciences to construct a PacBio library and by the University of Rhode Island Genomics and Sequencing Center to construct an Illumina library. Sequencing on a PacBio RS II instrument using P6-C4 chemistry yielded 93,133 subreads, with an N50 value of 12,583 bp, from two single-molecule real-time (SMRT) cells. For de novo assembly, we launched an Amazon EC2 instance of SMRT Portal version 2.3.0 and used the Hierarchical Genome Assembly Process version 3 (HGAP3) (5) with an estimated genome size of 4.5 Mb and a target coverage of 30×. This generated contigs of 4,964,530 bp and 18,915 bp, with 169× and 18× mean coverages, respectively. The small contig is highly similar to regions of the large contig. Combined with its low coverage, this suggests that the small contig is an assembly artifact; therefore, we discarded it.

To circularize the large contig, we used Gepard (6) to visualize overlap between the ends of the contig and BLAST (7) and EMBOSS extractseq (8) to specify coordinates and trim overlap, thereby generating a closed 4,918,091-bp contig.

To polish the closed contig, we processed 2 × 250-bp Illumina MiSeq reads using SolexaQA++ version 3.1.4 (9).
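The circularization step described above, in which overlap between the two ends of the linear contig is located and trimmed, can be illustrated with a minimal sketch. This is not the Gepard/BLAST/extractseq procedure itself: it assumes an exact end-to-end match, whereas overlaps in a real assembly usually contain mismatches and need an alignment tool. The function names and the toy sequence are hypothetical.

```python
def find_end_overlap(contig: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `contig` that exactly
    matches a prefix of `contig`, or 0 if none reaches `min_overlap`.

    Assumption: exact matching only. Real contig ends rarely overlap
    perfectly, which is why an aligner such as BLAST is used in practice.
    """
    max_len = len(contig) // 2  # an overlap longer than half the contig is not meaningful here
    for k in range(max_len, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def trim_overlap(contig: str, min_overlap: int = 3) -> str:
    """Remove the redundant copy of the overlap from the 3' end."""
    k = find_end_overlap(contig, min_overlap)
    return contig[: len(contig) - k] if k else contig


# Toy example: a circular sequence linearized with its first 5 bases
# repeated at the end, as an assembler might report it.
genome = "ACGTTGCAAGGCTA"
linear = genome + genome[:5]
closed = trim_overlap(linear)
print(len(linear), len(closed), closed == genome)
```

For scale, the two contig lengths reported above (4,964,530 bp before trimming and 4,918,091 bp after) differ by 46,439 bp.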
We removed bases that had a quality score of <13 with DynamicTrim and then discarded reads shorter than 100 bp with LengthSort. This yielded 5,366,007 read pairs. Using the Burrows-Wheeler aligner “mem” (BWA-mem) algorithm version 0.7.13 (10), we mapped 94.8% of these reads to the closed contig. We sorted and indexed the alignment file with SAMtools (11) and then used Pilon version 1.22 (12) to identify and correct 717 small indels, yielding a corrected 4,918,780-bp contig. To confirm this sequence, we used the same Illumina MiSeq reads and DynamicTrim quality score cutoff but adjusted the LengthSort cutoff to 75 bp. After aligning these reads to the corrected contig, Pilon identified eight discrepancies, which we manually examined and corrected to generate the final genome sequence of 4,918,774 bp with 50.8% GC content.

Annotation with the Prokaryotic Genome Annotation Pipeline (PGAP) predicted 4,782 protein-coding sequences, 757 of which are annotated as hypothetical proteins, along with 95 tRNAs and 7 rRNA operons. By comparing the ML35 genome to that of E. coli MG1655 (GenBank accession no. NC_000913), we identified an 11-bp insertion in ML35’s lacY gene that causes a frameshift and a nonsynonymous substitution in ML35’s lacI gene that causes a V24E replacement, which is reported to impact the repressor protein function (13). These mutations may explain the Lac phenotype observed for ML35.
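The 11-bp insertion in lacY shifts the reading frame because 11 is not a multiple of 3, so every codon downstream of the insertion site is read in a different register. The sketch below shows this on a toy sequence; the open reading frame, the inserted bases, and the insertion position are all hypothetical and are not taken from the ML35 or MG1655 genomes.

```python
def codons(seq: str) -> list[str]:
    """Split a coding sequence into complete codons, reading from position 0.
    Trailing bases that do not fill a codon are dropped."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def causes_frameshift(indel_length: int) -> bool:
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


# Hypothetical toy ORF and a hypothetical 11-bp insertion (illustration only).
reference = "ATGGCTGAACTGCGTTAA"
insertion = "GGGTTTCCCAA"   # 11 bp
position = 6                # insert after the second codon

mutant = reference[:position] + insertion + reference[position:]

print(causes_frameshift(len(insertion)))
print(codons(reference))
print(codons(mutant))
```

In the toy mutant, the first two codons are unchanged, but the codons after the insertion differ from the reference and the original in-frame stop codon is no longer read as a codon.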

Accession number(s).

This complete genome sequence has been deposited in GenBank under the accession no. CP025747. The version described in this paper is the first version, CP025747.1.

1.  EMBOSS: the European Molecular Biology Open Software Suite.

Authors:  P Rice; I Longden; A Bleasby
Journal:  Trends Genet       Date:  2000-06

2.  [Galactoside-permease of Escherichia coli].

Authors:  G Buttin; G N Cohen; J Monod; H V Rickenberg
Journal:  Ann Inst Pasteur (Paris)       Date:  1956-12

3.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05

4.  Gepard: a rapid and sensitive tool for creating dotplots on genome scale.

Authors:  Jan Krumsiek; Roland Arnold; Thomas Rattei
Journal:  Bioinformatics       Date:  2007-02-19

5.  Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

Authors:  Chen-Shan Chin; David H Alexander; Patrick Marks; Aaron A Klammer; James Drake; Cheryl Heiner; Alicia Clum; Alex Copeland; John Huddleston; Evan E Eichler; Stephen W Turner; Jonas Korlach
Journal:  Nat Methods       Date:  2013-05-05

6.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08

7.  Genetic studies of the lac repressor. XIV. Analysis of 4000 altered Escherichia coli lac repressors reveals essential and non-essential residues, as well as "spacers" which do not require a specific sequence.

Authors:  P Markiewicz; L G Kleina; C Cruz; S Ehret; J H Miller
Journal:  J Mol Biol       Date:  1994-07-29

8.  Early host damage in the infection cycle of Bdellovibrio bacteriovorus.

Authors:  S C Rittenberg; M Shilo
Journal:  J Bacteriol       Date:  1970-04

9.  Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement.

Authors:  Bruce J Walker; Thomas Abeel; Terrance Shea; Margaret Priest; Amr Abouelliel; Sharadha Sakthikumar; Christina A Cuomo; Qiandong Zeng; Jennifer Wortman; Sarah K Young; Ashlee M Earl
Journal:  PLoS One       Date:  2014-11-19

10.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18
