Danielle J Ingle, Mary Valcanis, Alex Kuzevski, Marija Tauschek, Michael Inouye, Tim Stinear, Myron M Levine, Roy M Robins-Browne, Kathryn E Holt.
Abstract
The lipopolysaccharide (O) and flagellar (H) surface antigens of Escherichia coli are targets for serotyping that have traditionally been used to identify pathogenic lineages. These surface antigens are important for the survival of E. coli within mammalian hosts. However, traditional serotyping has several limitations, and public health reference laboratories are increasingly moving towards whole genome sequencing (WGS) to characterize bacterial isolates. Here we present a method to rapidly and accurately serotype E. coli isolates from raw, short read WGS data. Our approach bypasses the need for de novo genome assembly by directly screening WGS reads against a curated database of alleles linked to known and novel E. coli O-groups and H-types (the EcOH database) using the software package srst2. We validated the approach by comparing in silico results for 197 enteropathogenic E. coli isolates with those obtained by serological phenotyping in an independent laboratory. We then demonstrated the utility of our method to characterize isolates in public health and clinical settings, and to explore the genetic diversity of >1500 E. coli genomes from multiple sources. Importantly, we showed that transfer of O- and H-antigen loci between E. coli chromosomal backbones is common, with little evidence of constraints by host or pathotype, suggesting that E. coli 'strain space' may be virtually unlimited, even within specific pathotypes. Our findings show that serotyping is most useful when used in combination with strain genotyping to characterize microevolution events within an inferred population structure.
Keywords: E. coli; diversity; genotype; phenotype; serotype
Year: 2016 PMID: 28348859 PMCID: PMC5343136 DOI: 10.1099/mgen.0.000064
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
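The read-screening step described in the abstract can be sketched as one srst2 invocation per isolate. This is a minimal sketch, not the authors' pipeline. The `--input_pe`, `--output`, `--log` and `--gene_db` options are those given in the srst2 documentation. The helper name `build_srst2_command`, the read file names and the database path `EcOH.fasta` are assumptions made for illustration.

```python
import shlex


def build_srst2_command(reads_1, reads_2, output_prefix, gene_db="EcOH.fasta"):
    """Return the argument list for screening one paired-end read set.

    Hypothetical helper: the file names and database path are placeholders.
    """
    return [
        "srst2",
        "--input_pe", reads_1, reads_2,  # forward and reverse read files
        "--output", output_prefix,       # prefix for the result files
        "--log",                         # ask srst2 to write a log file
        "--gene_db", gene_db,            # allele database in FASTA format
    ]


cmd = build_srst2_command(
    "isolate_1.fastq.gz", "isolate_2.fastq.gz", "isolate_serotype"
)
print(shlex.join(cmd))
```

Where srst2 is installed, the returned list could be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a string avoids shell-quoting problems with unusual file names.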
Datasets used to assess accuracy and utility of the EcOH database
| Isolates | ENA or NCBI accession | No. of genomes | Population | Sequencing centre |
|---|---|---|---|---|
| EPEC (atypical EPEC) | ERP001141 (ENA) | 197 | Diverse | Sanger, UK |
| UPEC | ERP001354 (ENA) | 169 | Clonal, | UQ, Australia |
| GenomeTrakr | PRJNA183844 (NCBI) | 300 | Diverse | Food and Drug Administration, USA |
| GenomeTrakr | PRJNA183844 (NCBI) | 1000 | Diverse | Food and Drug Administration, USA |
| ETEC | Various, see | 362 | Diverse | Sanger, UK |
Comparison of serotype calls in 197 EPEC isolates from serological phenotyping and in silico serotyping using the EcOH database with srst2 and blast+
Phenotype, srst2 and blast+ refer to the method used to determine O-group or H-type. The srst2/blast+ column refers to isolates that received a confident genotype call using both the srst2 and blast+ methods. The breakdown of in silico analysis into Phenotype and No phenotype refers to the number of confident calls made for isolates with a serological phenotype and those without, respectively.
| Antigen | Isolates | Phenotype* | Genotype (EcOH)†: srst2 | Genotype (EcOH)†: blast+ | Genotype (EcOH)†: srst2/blast+ |
|---|---|---|---|---|---|
| O-group | All | 144 (73 %) | 182 (92 %) | 180 (91 %) | 179 (91 %) |
| O-group | Phenotype | | 137 | 135 | 134 |
| O-group | No phenotype | | 45 | 45 | 45 |
| H-type | All | 128 (65 %) | 194 (95 %) | 179 (91 %) | 179 (91 %) |
| H-type | Phenotype | | 127 | 120 | 120 |
| H-type | No phenotype | | 67 | 59 | 59 |
*O-group or H-type determined by serological phenotyping in a reference laboratory.
†O-group or H-type determined by in silico analysis using the EcOH database.
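The Phenotype / No phenotype breakdown in the table can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes the three genotype columns are, in order, srst2, blast+ and srst2/blast+ (the order in which the caption names them). The names `PHENOTYPED`, `CALLS` and `summarise` are hypothetical. For each method it sums the two strata to recover the total number of confident calls and reports the share of serologically phenotyped isolates that also received a confident in silico call.

```python
# Isolates with a serological phenotype, per antigen (Phenotype column).
PHENOTYPED = {"O-group": 144, "H-type": 128}

# antigen -> method -> (confident calls with a phenotype, without a phenotype)
# The column-to-method assignment is an assumption based on the caption order.
CALLS = {
    "O-group": {"srst2": (137, 45), "blast+": (135, 45), "srst2/blast+": (134, 45)},
    "H-type": {"srst2": (127, 67), "blast+": (120, 59), "srst2/blast+": (120, 59)},
}


def summarise(antigen):
    """Return total confident calls and the call rate among phenotyped isolates."""
    rows = {}
    for method, (with_pheno, without_pheno) in CALLS[antigen].items():
        rows[method] = {
            "total_calls": with_pheno + without_pheno,
            "called_among_phenotyped": with_pheno / PHENOTYPED[antigen],
        }
    return rows


for antigen in CALLS:
    for method, row in summarise(antigen).items():
        print(
            f"{antigen:8s} {method:13s} total={row['total_calls']:3d} "
            f"phenotyped with call={row['called_among_phenotyped']:.1%}"
        )
```

These figures count confident calls only. They do not measure whether the in silico call agrees with the serological result.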
Fig. 1. In silico prediction of serotype and antimicrobial resistance for UPEC ST131 isolates. Core genome SNP tree for 170 UPEC ST131 isolates, outgroup-rooted using ST95 isolates. Serotype and acquired antimicrobial resistance gene profiles detected using srst2 are indicated by rings surrounding the phylogeny. Low-confidence serotype calls, which are based on limited read depth, are shown in paler colours to indicate uncertainty.
Fig. 2. In silico prediction of serotype and ST from GenomeTrakr data. Shown is the kmer-based tree obtained from the GenomeTrakr project; common STs or CCs, identified via MLST analysis using srst2, are highlighted and labelled. Rings show predicted O-group (inner ring wzm/wzt genes and middle ring wzx/wzy genes) and H-type (outer ring).
Fig. 3. Population structure and collector’s curves showing genetic diversity among 1547 E. coli isolates. (a) eBURST analysis of MLST data for 1547 E. coli isolates identified 55 eBURST groups consisting of STs linked by SLVs (shown) and 178 unrelated STs (not shown). Circles indicate unique STs; SLVs (i.e. STs that share 6/7 alleles) are linked together to form eBURST groups. Circle sizes are proportionate to the number of isolates observed for each ST. The sources of isolates of each ST are shown in pie charts, using the colour code shown in the inset legend. (b) Collector’s curves illustrating the rise in the total number of distinct types observed as more E. coli isolates are included in the analysis. (c) Separate collector’s curves for two diarrhoeagenic E. coli pathotypes, aEPEC and ETEC.
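Two ideas in the Fig. 3 caption are easy to illustrate on toy data: linking STs that are single-locus variants (profiles that share 6 of 7 alleles), and a collector's curve that counts the distinct types seen as isolates are added. The sketch below is a simplification, not the eBURST algorithm itself, which also predicts founder genotypes. The function names, allelic profiles and serotype labels are all made up for illustration, and the curve follows a single fixed sampling order rather than an average over random orderings.

```python
def is_slv(profile_a, profile_b):
    """True if two allelic profiles differ at exactly one locus."""
    return sum(a != b for a, b in zip(profile_a, profile_b)) == 1


def slv_linked_groups(profiles):
    """Group STs connected by chains of SLV links, using union-find."""
    parent = {st: st for st in profiles}

    def find(st):
        while parent[st] != st:
            parent[st] = parent[parent[st]]  # path halving
            st = parent[st]
        return st

    sts = list(profiles)
    for i, a in enumerate(sts):
        for b in sts[i + 1:]:
            if is_slv(profiles[a], profiles[b]):
                parent[find(a)] = find(b)

    groups = {}
    for st in sts:
        groups.setdefault(find(st), set()).add(st)
    return list(groups.values())


def collectors_curve(types_in_sampling_order):
    """Cumulative number of distinct types after each isolate is added."""
    seen, curve = set(), []
    for isolate_type in types_in_sampling_order:
        seen.add(isolate_type)
        curve.append(len(seen))
    return curve


# Toy 7-locus profiles (hypothetical ST labels and allele numbers).
profiles = {
    "ST-A": (1, 1, 1, 1, 1, 1, 1),
    "ST-B": (1, 1, 1, 1, 1, 1, 2),  # SLV of ST-A
    "ST-C": (1, 1, 1, 1, 1, 3, 2),  # SLV of ST-B, two loci away from ST-A
    "ST-D": (9, 9, 9, 9, 9, 9, 9),  # unrelated to the others
}
groups = slv_linked_groups(profiles)
linked = [g for g in groups if len(g) > 1]
unrelated = [g for g in groups if len(g) == 1]
print(linked, unrelated)

# Toy serotype labels in an arbitrary sampling order.
curve = collectors_curve(["O1:H7", "O1:H7", "O25:H4", "O157:H7", "O25:H4"])
print(curve)
```

A curve that keeps climbing as isolates are added suggests the sampled diversity is not yet exhausted. A curve that flattens suggests most types have already been observed.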