
Complete Genome Sequences of 47 Environmental Isolates of Escherichia coli.

Georgia Breckell, Olin K Silander.

Abstract

Escherichia coli is commonly considered a host-associated bacterium. However, there is evidence that some strains occupy environmental (non-host-associated) niches. Here, we report the complete genomes of 47 Escherichia coli environmental isolates. These will be useful for understanding the dynamics of plasmids, phages, and other repetitive genetic elements.
Copyright © 2020 Breckell and Silander.

Year:  2020        PMID: 32943554      PMCID: PMC7498420          DOI: 10.1128/MRA.00222-20

Source DB:  PubMed          Journal:  Microbiol Resour Announc        ISSN: 2576-098X


ANNOUNCEMENT

Escherichia coli has historically been considered a host-associated bacterium, although recent evidence suggests that many strains may persist and grow in the environment, and in some cases this may be the primary niche (1–4). E. coli is also well known for its prolific horizontal gene transfer (3, 5). To understand the rates of gene transfer, especially of mobile genetic elements (which are often repetitive in nature), complete genomes are required. To achieve this, we carried out whole-genome sequencing and assembly for 47 environmental strains of E. coli isolated from the shore of the St. Louis River in Minnesota, near Lake Superior (6). We grew all strains in LB medium and isolated genomic DNA using either the Promega Wizard kit or phenol-chloroform extraction (7). All strains were sequenced using both the Oxford Nanopore Technologies (ONT) and Illumina sequencing platforms. Illumina data were obtained from MicrobesNG with in-house quality control (adapter trimming with Trimmomatic v0.30, with a sliding window quality score cutoff value of Q15) using DNA extracted with the Promega Wizard kit. We prepared ONT sequencing libraries using the rapid barcoding kit (SQK-RBK004) and ran all libraries on R9.4 flow cells, multiplexing between 6 and 12 strains on each flow cell. We performed base calling using Guppy v2.3.7. We obtained at least 250 Mbp of sequence data for all strains except one (Table 1), with a median of 1,002 Mbp per strain (interquartile range [IQR], 670 Mbp to 1,296 Mbp). For all strains with more than 500 Mbp of sequence data, we used Filtlong v0.2.0 (https://github.com/rrwick/Filtlong) to retain only 500 Mbp in total, prioritizing read quality over length with the following parameters: min length set to 1,000, mean q weight set to 10, and split set to 500. The filtered read sets had a median read N50 value of 17.4 kbp (IQR, 13.4 kbp to 20.5 kbp). We also obtained at least 30-fold coverage of 2 × 250-bp paired-end Illumina reads for each genome.
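The read N50 values quoted above follow the usual definition: the length L such that reads of length L or longer together contain at least half of all sequenced bases. A minimal Python sketch of that calculation follows; the function name `read_n50` and the toy read lengths are illustrative assumptions and are not taken from the authors' pipeline.

```python
def read_n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths.

    The N50 is the length L such that reads of length >= L together
    contain at least half of the total number of bases.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical read lengths in bp, for illustration only.
    toy_reads = [2, 2, 2, 3, 3, 4, 8, 8]
    print(read_n50(toy_reads))  # 8
```

Because long reads contribute more bases than short ones, discarding short or low-quality reads, as Filtlong does, typically raises the N50, which is the pattern seen when comparing the raw and filtered N50 columns of Table 1.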
TABLE 1

Genome statistics for all 47 assemblies

| Strain ID [a] | Sample date (yr-mo-day) | Location | No. of Illumina reads | ONT extraction method [b] | Total ONT sequence size (bp) | No. of ONT reads | ONT read N50 (bp) | Filtered ONT read N50 (bp) [c] | Chromosome length (bp) [d] | Circular chromosome [e] | Genome length (bp) [f] | Total no. of contigs | rRNA orientation [g] | Read accession no. | Assembly accession no. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SC468 | 2005-8-15 | Upshore | 1,288,488 | Phenol-CHCl3 | 1,402,515,335 | 34,337 | 7,164 | 14,815 | 4,426,017 | Yes | 4,426,017 | 1 | Standard | SAMEA6595239 | GCA_902825195 |
| SC457 | 2005-8-15 | Upshore | 603,943 | Phenol-CHCl3 | 465,520,117 | 51,693 | 9,209 | 10,280 | 4,555,909 | Yes | 4,555,909 | 1 | Standard | SAMEA6595235 | GCA_902810185 |
| SC455 | 2005-8-15 | Upshore | 819,314 | Phenol-CHCl3 | 1,033,029,654 | 30,634 | 10,560 | 17,536 | 4,655,420 | Yes | 4,655,420 | 1 | Standard | SAMEA6595233 | GCA_902810345 |
| SC434 | 2005-8-15 | Waterline | 618,526 | Phenol-CHCl3 | 849,800,865 | 33,880 | 10,113 | 16,056 | 4,658,197 | Yes | 4,658,197 | 1 | Standard | SAMEA6595225 | GCA_902810315 |
| SC477 | 2005-9-19 | Surface water | 1,045,031 | Phenol-CHCl3 | 843,013,946 | 44,613 | 7,856 | 12,549 | 4,658,510 | Yes | 4,658,510 | 1 | Standard | SAMEA6595243 | GCA_902810395 |
| SC316 | 2005-6-15 | Surface water | 3,409,700 | Promega | 43,287,859 | 39,445 | 11,807 | 13,365 | 4,663,327 | Yes | 4,681,204 | 3 | Standard | SAMEA6595204 | GCA_902809975 |
| SC467 | 2005-8-15 | Upshore | 922,562 | Phenol-CHCl3 | 1,003,216,478 | 38,336 | 8,521 | 13,963 | 4,715,938 | Yes | 4,715,938 | 1 | Standard | SAMEA6595238 | GCA_902825205 |
| SC423 | 2005-8-15 | Sediment | 849,233 | Phenol-CHCl3 | 274,260,727 | 18,522 | 16,188 | 18,066 | 4,716,885 | Yes | 4,716,885 | 1 | Standard | SAMEA6595220 | GCA_902810325 |
| SC465 | 2005-8-15 | Upshore | 1,577,380 | Promega and phenol-CHCl3 | 1,386,503,691 | 35,047 | 9,489 | 15,195 | 4,722,586 | Yes | 4,785,019 | 2 | Standard | SAMEA6595237 | GCA_902810145 |
| SC452 | 2005-8-15 | Upshore | 417,390 | Phenol-CHCl3 | 2,207,184,721 | 16,959 | 11,909 | 30,889 | 4,723,951 | Yes | 4,755,166 | 2 | Standard | SAMEA6595230 | GCA_902810195 |
| SC431 | 2005-8-15 | Waterline | 722,122 | Phenol-CHCl3 | 1,774,332,186 | 25,144 | 3,326 | 21,138 | 4,727,732 | Yes | 4,866,254 | 2 | Standard | SAMEA6595223 | GCA_902810235 |
| SC475 | 2005-9-19 | Surface water | 413,920 | Phenol-CHCl3 | 907,422,842 | 36,085 | 8,263 | 14,996 | 4,729,401 | Yes | 4,729,401 | 1 | Standard | SAMEA6595241 | GCA_902810385 |
| SC492 | 2005-9-19 | Surface water | 544,889 | Phenol-CHCl3 | 718,894,150 | 30,095 | 13,767 | 18,501 | 4,736,913 | Yes | 4,736,913 | 1 | Standard | SAMEA6595248 | GCA_902810375 |
| SC480 | 2005-9-19 | Surface water | 614,748 | Phenol-CHCl3 | 1,009,767,146 | 25,686 | 12,892 | 20,491 | 4,741,504 | Yes | 4,741,504 | 1 | Standard | SAMEA6595245 | GCA_902810405 |
| SC476 | 2005-9-19 | Surface water | 661,043 | Promega and phenol-CHCl3 | 1,467,536,905 | 33,435 | 6,509 | 15,758 | 4,747,946 | Yes | 4,747,946 | 1 | Standard | SAMEA6595242 | GCA_902810415 |
| SC479 | 2005-9-19 | Surface water | 483,518 | Phenol-CHCl3 | 468,589,667 | 41,638 | 11,800 | 13,367 | 4,762,128 | Yes | 4,913,012 | 3 | Standard | SAMEA6595244 | GCA_902810125 |
| SC392 | 2005-8-24 | Upshore | 870,803 | Promega | 366,289,283 | 44,990 | 8,646 | 9,714 | 4,770,015 | Yes | 4,783,281 | 2 | Standard | SAMEA6595208 | GCA_902810015 |
| SC312 | 2005-6-15 | Surface water | 3,582,617 | Promega | 678,059,117 | 32,708 | 13,222 | 17,765 | 4,775,485 | Yes | 4,878,778 | 3 | Standard | SAMEA6595203 | GCA_902810065 |
| SC386 | 2005-8-18 | Upshore | 1,917,879 | Promega | 989,915,389 | 4,259 | 7,170 | 12,507 | 4,778,381 | Yes | 5,088,094 | 5 | Standard | SAMEA6595207 | GCA_902810055 |
| SC456 | 2005-8-15 | Upshore | 350,577 | Phenol-CHCl3 | 120,521,609 | 14,793 | 8,716 | 9,946 | 4,790,285 | Yes | 4,885,342 | 2 | Standard | SAMEA6595234 | GCA_902810135 |
| SC487 | 2005-9-19 | Surface water | 1,289,213 | Phenol-CHCl3 | 651,178,805 | 34,634 | 13,099 | 17,546 | 4,794,586 | Yes | 4,794,586 | 1 | Standard | SAMEA6595246 | GCA_902810365 |
| SC433 | 2005-8-15 | Waterline | 566,913 | Phenol-CHCl3 | 660,660,048 | 25,781 | 17,312 | 23,362 | 4,797,429 | Yes | 4,961,214 | 2 | Standard | SAMEA6595224 | GCA_902810215 |
| SC429 | 2005-8-15 | Waterline | 1,110,243 | Promega and phenol-CHCl3 | 1,081,882,331 | 28,668 | 9,271 | 18,780 | 4,797,468 | Yes | 4,961,244 | 2 | Standard | SAMEA6595221 | GCA_902810175 |
| SC430 | 2005-8-15 | Waterline | 2,171,974 | Promega and phenol-CHCl3 | 1,749,627,499 | 27,343 | 7,478 | 19,159 | 4,797,499 | Yes | 4,961,283 | 2 | Standard | SAMEA6595222 | GCA_902810205 |
| SC397 | 2005-8-15 | Surface water | 1,485,195 | Promega | 1,001,724,414 | 28,040 | 13,643 | 19,677 | 4,858,696 | No | 5,067,247 | 3 | Standard | SAMEA6595209 | GCA_902809965 |
| SC411 | 2005-8-15 | Surface water | 565,996 | Phenol-CHCl3 | 1,360,250,706 | 25,134 | 11,224 | 20,582 | 4,859,344 | Yes | 5,068,109 | 3 | Standard | SAMEA6595216 | GCA_902810035 |
| SC419 | 2005-8-15 | Sediment | 793,357 | Phenol-CHCl3 | 1,421,369,578 | 35,942 | 6,732 | 14,684 | 4,859,796 | Yes | 4,916,116 | 2 | Standard | SAMEA6595218 | GCA_902810255 |
| SC364 | 2005-7-27 | Surface water | 9,348,468 | Promega | 1,077,384,666 | 26,426 | 13,126 | 20,928 | 4,860,085 | Yes | 5,063,812 | 2 | Standard | SAMEA6595205 | GCA_902810045 |
| SC453 | 2005-8-15 | Upshore | 522,987 | Promega and phenol-CHCl3 | 1,231,172,695 | 50,059 | 7,010 | 11,072 | 4,863,138 | No | 5,308,239 | 4 | Standard | SAMEA6595231 | GCA_902810175 |
| SC307 | 2005-6-15 | Surface water | 2,670,106 | Promega | 1,216,450,680 | 26,661 | 10,595 | 19,865 | 4,892,106 | Yes | 5,221,106 | 5 | Standard | SAMEA6596823 | GCA_902809955 |
| SC400 | 2005-8-15 | Surface water | 9,809,258 | Promega | 1,138,079,055 | 21,521 | 16,543 | 25,690 | 4,924,724 | Yes | 5,065,688 | 2 | Standard | SAMEA6595210 | GCA_902809905 |
| SC489 | 2005-9-19 | Surface water | 779,700 | Phenol-CHCl3 | 530,260,839 | 73,424 | 5,806 | 7,779 | 4,929,025 | No | 5,008,168 | 4 | Standard | SAMEA6595247 | GCA_902810115 |
| SC469 | 2005-9-19 | Surface water | 399,522 | Phenol-CHCl3 | 723,447,743 | 40,512 | 9,772 | 13,766 | 4,940,057 | Yes | 5,129,818 | 3 | Standard | SAMEA6595240 | GCA_902810165 |
| SC402 | 2005-8-15 | Surface water | 3,305,579 | Promega | 1,103,528,651 | 25,927 | 14,104 | 20,875 | 4,944,324 | Yes | 5,085,287 | 2 | Standard | SAMEA6595211 | GCA_902810085 |
| SC406 | 2005-8-15 | Surface water | 700,028 | Promega and phenol-CHCl3 | 2,471,779,990 | 25,820 | 8,152 | 19,613 | 4,958,102 | Yes | 4,958,102 | 1 | Standard | SAMEA6595213 | GCA_902810285 |
| SC454 | 2005-8-15 | Upshore | 1,653,791 | Phenol-CHCl3 | 1,010,953,794 | 43,107 | 7,741 | 13,039 | 4,982,834 | Yes | 4,988,386 | 2 | Alternative | SAMEA6595232 | GCA_902810105 |
| SC441 | 2005-8-15 | Waterline | 738,159 | Phenol-CHCl3 | 988,454,259 | 46,048 | 6,225 | 11,435 | 4,986,040 | Yes | 5,022,479 | 2 | Alternative | SAMEA6595226 | GCA_902810245 |
| SC446 | 2005-8-15 | Waterline | 1,037,860 | Phenol-CHCl3 | 451,143,119 | 35,499 | 13,071 | 14,580 | 4,986,746 | Yes | 4,997,068 | 2 | Alternative | SAMEA6595229 | GCA_902810265 |
| SC445 | 2005-8-15 | Waterline | 490,093 | Phenol-CHCl3 | 260,701,609 | 14,322 | 19,183 | 21,437 | 4,987,469 | Yes | 5,033,261 | 2 | Alternative | SAMEA6595228 | GCA_902810225 |
| SC443 | 2005-8-15 | Waterline | 841,953 | Phenol-CHCl3 | 1,027,640,114 | 31,687 | 9,897 | 17,265 | 4,999,711 | No | 5,021,047 | 2 | Alternative | SAMEA6595227 | GCA_902810185 |
| SC410 | 2005-8-15 | Surface water | 436,740 | Phenol-CHCl3 | 981,338,093 | 25,375 | 10,873 | 22,404 | 5,001,654 | Yes | 5,008,324 | 2 | Standard | SAMEA6595215 | GCA_902809925 |
| SC422 | 2005-8-15 | Sediment | 514,475 | Phenol-CHCl3 | 611,712,027 | 98,763 | 4,900 | 5,623 | 5,003,951 | Yes | 5,003,951 | 1 | Standard | SAMEA6595219 | GCA_902810275 |
| SC464 | 2005-8-15 | Upshore | 874,611 | Promega and phenol-CHCl3 | 3,086,301,299 | 18,303 | 8,993 | 27,897 | 5,023,622 | Yes | 5,137,983 | 2 | Standard | SAMEA6595236 | GCA_902810155 |
| SC407 | 2005-8-15 | Surface water | 1,260,500 | Promega and phenol-CHCl3 | 2,703,637,269 | 24,419 | 7,660 | 20,853 | 5,088,866 | Yes | 5,088,866 | 1 | Standard | SAMEA6595214 | GCA_902810295 |
| SC403 | 2005-8-15 | Surface water | 1,257,142 | Promega | 765,980,798 | 33,276 | 12,830 | 17,205 | 5,089,116 | Yes | 5,145,436 | 2 | Standard | SAMEA6595212 | GCA_902809945 |
| SC368 | 2005-7-31 | Surface water | 4,766,180 | Promega | 910,447,798 | 48,174 | 6,454 | 11,026 | 5,101,998 | Yes | 5,101,998 | 1 | Standard | SAMEA6595206 | GCA_902810305 |
| SC418 | 2005-8-15 | Sediment | 665,222 | Promega and phenol-CHCl3 | 2,677,454,945 | 24,282 | 7,752 | 21,140 | 5,222,289 | Yes | 5,222,289 | 1 | Standard | SAMEA6595217 | GCA_902810335 |

[a] Strain identification (ID) and sampling information were taken from reference 6. Strains are sorted by chromosome length.

[b] Phenol-CHCl3 indicates that phenol-chloroform extraction was used; Promega indicates that the Promega Wizard DNA extraction kit was used.

[c] Filtered read N50 indicates the N50 value after Filtlong (https://github.com/rrwick/Filtlong) was used to retain only 500 Mbp from each strain.

[d] Chromosome length indicates the length of the longest contig, assumed to be the chromosome.

[e] Circular chromosome indicates whether the chromosome is a single circular contig.

[f] Genome length indicates the sum of the lengths of all contigs.

[g] rRNA orientation indicates the orientation of the seven ribosomal operons in E. coli, as assessed by socru (11).
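The table notes define chromosome length as the longest contig and genome length as the sum of all contigs. A short Python sketch of those two definitions follows. The names `AssemblySummary` and `summarize_assembly` are hypothetical, and the 62,433-bp second contig for SC465 is not reported directly in Table 1; it is derived here as genome length minus chromosome length (4,785,019 − 4,722,586), which is only valid because that assembly has exactly two contigs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblySummary:
    chromosome_length: int  # longest contig, assumed to be the chromosome
    genome_length: int      # sum of the lengths of all contigs
    n_contigs: int          # total number of contigs


def summarize_assembly(contig_lengths):
    """Summarize an assembly using the definitions from the Table 1 notes."""
    contig_lengths = list(contig_lengths)
    if not contig_lengths:
        raise ValueError("an assembly needs at least one contig")
    return AssemblySummary(
        chromosome_length=max(contig_lengths),
        genome_length=sum(contig_lengths),
        n_contigs=len(contig_lengths),
    )


if __name__ == "__main__":
    # SC465: chromosome length from Table 1; second contig length derived
    # as genome length minus chromosome length (an inference, see lead-in).
    sc465 = summarize_assembly([4_722_586, 62_433])
    print(sc465)
```

Note that the "longest contig is the chromosome" rule is an assumption stated by the table notes themselves; it would mislabel an assembly in which the chromosome is fragmented into pieces shorter than a large plasmid.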

We used the long-read assembler Flye v2.4.2 (8) for genome assembly. We polished the assemblies using four rounds of long-read polishing with Racon v1.3.2 (9) with the following parameter changes: gap penalty increased to −8, match score increased to 8, and mismatch score increased to −6, followed by two rounds of short-read polishing with Pilon v1.23 (10). The contigs were left as linear if not circularized by Flye, with no reorientation. We confirmed the structural accuracy of each genome using socru v2.1.7 (11) to assess the order and orientation of the seven rRNA operons (Table 1). In all cases, the rRNA operons were found in standard or known orientations, supporting the structural accuracy of these genomes. All software was run using default parameters, unless otherwise specified. The genomes range in length from 4.4 Mbp to 5.2 Mbp. Under the assumption that any nonchromosomal contigs are plasmids, 17 isolates contained no plasmids (i.e., only one chromosomal contig), 20 isolates contained a single plasmid, and 10 isolates contained multiple plasmids (Table 1). 
The 47 complete genomes produced in this study provide a resource for insight into environmental adaptation and the genome dynamics of repetitive mobile genetic elements in E. coli.
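The plasmid classification can be reproduced by tallying the "Total no. of contigs" column of Table 1, under the stated assumption that every nonchromosomal contig is a plasmid. The sketch below is illustrative rather than the authors' code: the list is transcribed from Table 1 in row order, and the helper name `plasmid_class` is hypothetical.

```python
from collections import Counter

# "Total no. of contigs" per isolate, transcribed from Table 1 in row order
# (SC468 first, SC418 last).
CONTIGS_PER_ISOLATE = [
    1, 1, 1, 1, 1, 3, 1, 1, 2, 2, 2, 1, 1, 1, 1, 3,
    2, 3, 5, 2, 1, 2, 2, 2, 3, 3, 2, 2, 4, 5, 2, 4,
    3, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 1, 1,
]


def plasmid_class(n_contigs):
    """Classify an isolate, assuming every nonchromosomal contig is a plasmid."""
    if n_contigs < 1:
        raise ValueError("an assembly needs at least one contig")
    if n_contigs == 1:
        return "no plasmid"
    if n_contigs == 2:
        return "single plasmid"
    return "multiple plasmids"


tally = Counter(plasmid_class(n) for n in CONTIGS_PER_ISOLATE)

if __name__ == "__main__":
    print(len(CONTIGS_PER_ISOLATE), "isolates")
    for label in ("no plasmid", "single plasmid", "multiple plasmids"):
        print(label, tally[label])
```

The three classes are mutually exclusive, so their counts must sum to the 47 isolates; that sum is a quick consistency check on any such breakdown.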

Data availability.

The complete sequences and reads for these isolates have been deposited in the ENA database, and the accession numbers are listed in Table 1.
  9 in total

1.  Satoshi Ishii, Winfried B Ksoll, Randall E Hicks, Michael J Sadowsky. Presence and growth of naturalized Escherichia coli in temperate soils from Lake Superior watersheds. Appl Environ Microbiol. 2006-01.

2.  Mikhail Kolmogorov, Jeffrey Yuan, Yu Lin, Pavel A Pevzner. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019-04-01.

3.  Chengwei Luo, Seth T Walk, David M Gordon, Michael Feldgarden, James M Tiedje, Konstantinos T Konstantinidis. Genome sequencing of environmental Escherichia coli expands understanding of the ecology and speciation of the model bacterial species. Proc Natl Acad Sci U S A. 2011-04-11.

4.  Guillaume Méric, E Katherine Kemsley, Daniel Falush, Elizabeth J Saggers, Sacha Lucchini. Phylogenetic distribution of traits associated with plant colonization in Escherichia coli. Environ Microbiol. 2012-08-30.

5.  D S Guttman, D E Dykhuizen. Clonal divergence in Escherichia coli as a result of recombination, not mutation. Science. 1994-11-25.

6.  Marie Touchon, Amandine Perrin, Jorge André Moura de Sousa, Belinda Vangchhia, Samantha Burn, Claire L O'Brien, Erick Denamur, David Gordon, Eduardo Pc Rocha. Phylogenetic background and habitat drive the genetic diversification of Escherichia coli. PLoS Genet. 2020-06-12.

7.  Bruce J Walker, Thomas Abeel, Terrance Shea, Margaret Priest, Amr Abouelliel, Sharadha Sakthikumar, Christina A Cuomo, Qiandong Zeng, Jennifer Wortman, Sarah K Young, Ashlee M Earl. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014-11-19.

8.  Robert Vaser, Ivan Sović, Niranjan Nagarajan, Mile Šikić. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 2017-01-18.

9.  Andrew J Page, Emma V Ainsworth, Gemma C Langridge. socru: typing of genome-level order and orientation around ribosomal operons in bacteria. Microb Genom. 2020-06-25.