
Draft Genome Sequences of Isolates of Diverse Host Origin from the E. coli Reference Center at Penn State University.

David W Lacher¹, Mark K Mammel², Jayanthi Gangiredla², Solomon T Gebru², Tammy J Barnaba², Sydney A Majowicz³, Edward G Dudley³,⁴.

Abstract

Escherichia coli strains exhibit vast genomic diversity. We report the draft genome sequences of 1,000 isolates from the E. coli Reference Center at Penn State University. These strains were originally isolated from multiple animal and environmental sources over the past 50 years.


Year: 2020 PMID: 32972949 PMCID: PMC7516160 DOI: 10.1128/MRA.01005-20

Source DB: PubMed Journal: Microbiol Resour Announc ISSN: 2576-098X

ANNOUNCEMENT

Members of the genus Escherichia, specifically Escherichia coli, include both pathogenic and nonpathogenic strains, and the ability to differentiate these two groups has implications for food safety. As part of the U.S. Food and Drug Administration’s efforts to expand state-of-the-art technology for identifying pathogenic E. coli strains, we are developing an in-depth phylogenetic landscape of E. coli that parses these bacteria into distinct clades. Whole-genome sequences are essential for expanding this landscape and adding depth to it. Here, we report the draft genome sequences of 1,000 isolates from the culture collection housed at Penn State University’s E. coli Reference Center. The diverse collection examined in this study contains isolates from animal, environmental, and food sources.

E. coli is commonly found as a member of the gut microbiota of warm-blooded organisms and has been isolated from a wide range of animal hosts (1, 2). Phylogenetic analyses have shown that E. coli can be divided into several phylogroups (3, 4), with pathogenic and nonpathogenic strains seemingly randomly distributed among them. This project focuses on whole-genome sequencing of E. coli isolates from nonhuman animal and environmental sources, which may reveal lineages leading from nonpathogenic to pathogenic strains. Understanding this evolutionary path may provide molecular insight into the acquisition of virulence attributes from an environmental source.

Pure cultures of each strain were grown aerobically overnight in Luria-Bertani broth at 37°C. Total genomic DNA was extracted from 1 ml of overnight culture using the DNeasy blood and tissue kit (Qiagen, Hilden, Germany) on the Qiagen QIAcube instrument, following the manufacturer’s Gram-negative bacterium protocol.
Sequencing libraries were prepared from 1 ng of DNA using the Nextera XT DNA sample prep kit (Illumina, San Diego, CA, USA) and sequenced on either the Illumina MiSeq or NextSeq platform. The resulting paired-end reads (2 × 250 bp for MiSeq, 2 × 150 bp for NextSeq) were quality assessed with FastQC v0.11.8 (5). Low-quality reads were trimmed to a quality threshold of Q > 30, and adapter sequences were removed using the NexteraPE adapter file in Trimmomatic v0.38 (6). The genomes were assembled de novo with SPAdes v3.13.0 (7) using a k-mer size of 55, and assembly quality was assessed with QUAST v5.0 (8). The genomes were annotated automatically with the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (9). Default parameters were used for all software unless otherwise specified. The depth of coverage of the draft genomes ranged from 17× to 161×, and genome sizes ranged from 4,291,381 to 5,764,740 bp. The number of contigs per assembly ranged from 50 to 741, and N50 values ranged from 16,761 to 315,275 bp.

The genomes were placed into one of six categories according to their source: avian, environmental, food, mammal, reptile, or unknown (Table 1). Most of the strains (n = 629) are of mammalian origin, with bovine, porcine, and canine sources the most common (n = 203, 168, and 92, respectively). Among the 270 isolates of avian origin, chicken and turkey were the most common sources (n = 68 and 60, respectively).

Phylogroups were assigned based on the single nucleotide polymorphisms (SNPs) present within 45 genes found in E. coli K-12 MG1655 (GenBank accession number U00096.3). Briefly, the 45 genes were extracted from each assembly and aligned to the corresponding K-12 MG1655 sequences using BLAST, and the resulting profile of 45 concatenated SNP sites was used to assign the phylogroup. Each of the established E. coli phylogroups is represented among the 1,000 genomes: phylogroups A (n = 180), B1 (n = 438), B2 (n = 220), D (n = 69), E (n = 38), and F (n = 23).
Twenty isolates belong to one of four known “cryptic” lineages of Escherichia (10, 11): lineage 1 (n = 3), lineage 3 (n = 4), lineage 4 (n = 2), and lineage 5 (n = 11). The remaining 12 isolates were classified as undetermined because their phylogroup could not be assigned using the panel of 45 SNP loci.
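
The N50 values reported for these assemblies summarize contiguity: N50 is the length of the contig at which, after sorting contigs from longest to shortest, the running total of lengths first reaches half of the total assembly size. QUAST (8) reports this statistic; the Python sketch below is an independent illustration of the definition only, not QUAST's code, and tools such as QUAST may additionally filter out short contigs before computing it, which this sketch does not do. The function name `n50` and the contig lengths in the example are hypothetical toy values, not data from this study.

```python
def n50(contig_lengths):
    """Return the N50 (in bp) of an assembly given its contig lengths.

    Contigs are sorted from longest to shortest and their lengths are summed
    in that order; N50 is the length of the contig at which the running total
    first reaches at least half of the total assembly length.
    """
    lengths = sorted((n for n in contig_lengths if n > 0), reverse=True)
    if not lengths:
        raise ValueError("need at least one contig with positive length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # not reached: the running total always reaches half


# Hypothetical five-contig assembly (toy numbers, not data from this study).
toy_contigs = [30_000, 100_000, 40_000, 80_000, 50_000]
print(n50(toy_contigs))  # → 80000
```

Here the total is 300,000 bp, so the threshold is 150,000 bp; the two longest contigs (100,000 + 80,000 bp) cross it, making 80,000 bp the N50.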
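
The SNP-based phylogroup assignment described above can be read as three steps: take each isolate's genes already aligned to K-12 MG1655, read the base at every diagnostic site, concatenate those bases into a profile, and compare the profile against reference profiles for each phylogroup. The sketch below illustrates that logic under loudly stated assumptions: the real 45 genes and diagnostic SNPs are listed in the figshare file (13) and are not reproduced here, so the gene names, the five-site `DIAGNOSTIC_SITES`, the `REFERENCE_PROFILES`, the helper names, and the exact-match rule are all hypothetical. Profiles that match no reference, or contain gaps or missing data, are reported as "undetermined", analogous to the 12 isolates in that category.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL diagnostic sites: (gene name, 0-based position in the gene's
# alignment to K-12 MG1655). Placeholders, not the published 45-site panel.
DIAGNOSTIC_SITES: List[Tuple[str, int]] = [
    ("geneA", 12), ("geneA", 87), ("geneB", 40), ("geneC", 5), ("geneC", 199),
]

# HYPOTHETICAL reference profiles: one base per diagnostic site, in the same
# order as DIAGNOSTIC_SITES. Invented for illustration only.
REFERENCE_PROFILES: Dict[str, str] = {
    "A": "AAGTT",
    "B1": "AAGTC",
    "B2": "GAGCT",
    "D": "GCGCT",
}


def build_profile(aligned_genes: Dict[str, str],
                  sites: List[Tuple[str, int]] = DIAGNOSTIC_SITES) -> str:
    """Concatenate the base at each diagnostic site into one profile string.

    A missing gene, or a position beyond the aligned sequence, yields 'N'.
    """
    bases = []
    for gene, pos in sites:
        seq = aligned_genes.get(gene, "")
        bases.append(seq[pos].upper() if pos < len(seq) else "N")
    return "".join(bases)


def assign_phylogroup(profile: str,
                      references: Dict[str, str] = REFERENCE_PROFILES) -> str:
    """Return the single phylogroup whose reference profile matches exactly.

    Assumption: exact matching. Profiles containing non-ACGT characters
    (gaps, 'N'), or matching zero or several references, are 'undetermined'.
    """
    if any(base not in "ACGT" for base in profile):
        return "undetermined"
    hits = [group for group, ref in references.items() if ref == profile]
    return hits[0] if len(hits) == 1 else "undetermined"


# Toy isolate whose aligned genes are uniform runs of a single base.
isolate = {"geneA": "A" * 100, "geneB": "G" * 50, "geneC": "T" * 200}
profile = build_profile(isolate)
print(profile, assign_phylogroup(profile))  # → AAGTT A
```

Exact matching keeps the example simple and makes "undetermined" explicit; a real panel might tolerate mismatches or use a different decision rule, which the source does not specify.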

TABLE 1

Summary of 1,000 genomes from the E. coli Reference Center

Category	No. of genomes	No. of source species or types	Phylogroup(s) observed	Cryptic lineage(s) observed
Avian	270	18	A, B1, B2, D, E, F	1, 3, 4, 5
Environmental	62	3	A, B1, B2, D, E, F	3, 4, 5
Food	37	6	A, B1, D	None
Mammal	629	41	A, B1, B2, D, E, F	1
Reptile	1	1	A	None
Unknown	1	1	B1	None
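
The counts in Table 1 and in the announcement text can be cross-checked with simple arithmetic: the six source categories should sum to the 1,000 genomes, and so should the phylogroup, cryptic-lineage, and undetermined assignments together. The short Python check below hard-codes the numbers reported in this announcement; the variable names are illustrative only.

```python
# Counts copied from Table 1 and the announcement text.
SOURCE_COUNTS = {"Avian": 270, "Environmental": 62, "Food": 37,
                 "Mammal": 629, "Reptile": 1, "Unknown": 1}
PHYLOGROUP_COUNTS = {"A": 180, "B1": 438, "B2": 220, "D": 69, "E": 38, "F": 23}
CRYPTIC_LINEAGE_COUNTS = {"1": 3, "3": 4, "4": 2, "5": 11}
UNDETERMINED = 12
TOTAL_GENOMES = 1000

# Every genome falls into exactly one source category...
source_total = sum(SOURCE_COUNTS.values())
# ...and into exactly one of: a phylogroup, a cryptic lineage, or undetermined.
typed_total = (sum(PHYLOGROUP_COUNTS.values())
               + sum(CRYPTIC_LINEAGE_COUNTS.values())
               + UNDETERMINED)

print(source_total, typed_total)  # → 1000 1000
assert source_total == TOTAL_GENOMES
assert typed_total == TOTAL_GENOMES
```

The phylogroups account for 968 genomes, the cryptic lineages for 20, and the undetermined isolates for the remaining 12.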

Data availability.

The draft genome assemblies were deposited at DDBJ/ENA/GenBank through the FDA’s GenomeTrakr pipeline under BioProject accession number PRJNA357722. The versions described in this announcement are the second versions. A full listing of the source and phylogroup information for the 1,000 genomes can be found at https://doi.org/10.6084/m9.figshare.12885527.v2 (12). A list of the 45 genes and diagnostic SNPs used for phylogroup assignment can be found at https://doi.org/10.6084/m9.figshare.12899765.v1 (13).

References

SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.

Authors: Anton Bankevich; Sergey Nurk; Dmitry Antipov; Alexey A Gurevich; Mikhail Dvorkin; Alexander S Kulikov; Valery M Lesin; Sergey I Nikolenko; Son Pham; Andrey D Prjibelski; Alexey V Pyshkin; Alexander V Sirotkin; Nikolay Vyahhi; Glenn Tesler; Max A Alekseyev; Pavel A Pevzner
Journal: J Comput Biol Date: 2012-04-16

QUAST: quality assessment tool for genome assemblies.

Authors: Alexey Gurevich; Vladislav Saveliev; Nikolay Vyahhi; Glenn Tesler
Journal: Bioinformatics Date: 2013-02-19

The "Cryptic" Escherichia.

Authors: Seth T Walk
Journal: EcoSal Plus Date: 2015

Escherichia coli from animal reservoirs as a potential source of human extraintestinal pathogenic E. coli.

Authors: Louise Bélanger; Amélie Garenaux; Josée Harel; Martine Boulianne; Eric Nadeau; Charles M Dozois
Journal: FEMS Immunol Med Microbiol Date: 2011-03-24

Phylogenetic distribution of branched RNA-linked multicopy single-stranded DNA among natural isolates of Escherichia coli.

Authors: P J Herzer; S Inouye; M Inouye; T S Whittam
Journal: J Bacteriol Date: 1990-11

Cryptic lineages of the genus Escherichia.

Authors: Seth T Walk; Elizabeth W Alm; David M Gordon; Jeffrey L Ram; Gary A Toranzos; James M Tiedje; Thomas S Whittam
Journal: Appl Environ Microbiol Date: 2009-08-21

The Clermont Escherichia coli phylo-typing method revisited: improvement of specificity and detection of new phylo-groups.

Authors: Olivier Clermont; Julia K Christenson; Erick Denamur; David M Gordon
Journal: Environ Microbiol Rep Date: 2012-12-24