Xiaomei Zhang, Michael Payne, Sandeep Kaur, Ruiting Lan.
Abstract
Shiga toxin-producing Escherichia coli (STEC) have more than 470 serotypes. The well-known STEC O157:H7 serotype is a leading cause of STEC infections in humans. However, the incidence of non-O157:H7 STEC serotypes associated with foodborne outbreaks and human infections has increased in recent years. Current detection and serotyping assays focus on O157 and the top six ("Big six") non-O157 STEC serogroups. In this study, we performed phylogenetic analysis of nearly 41,000 publicly available STEC genomes representing 460 different STEC serotypes and identified 19 major and 229 minor STEC clusters. STEC cluster-specific gene markers were then identified through comparative genomic analysis. We further identified serotype-specific gene markers for the top 10 most frequent non-O157:H7 STEC serotypes. The cluster- or serotype-specific gene markers had 99.54% accuracy and more than 97.25% specificity when tested using 38,534 STEC and 14,216 non-STEC E. coli genomes, respectively. In addition, we developed a freely available in silico serotyping pipeline named STECFinder that combines these robust gene markers with established E. coli serotype-specific O and H antigen genes and stx genes for accurate identification, cluster determination and serotyping of STEC. STECFinder can assign 99.85% and 99.83% of 38,534 STEC isolates to STEC clusters using assembled genomes and Illumina reads, respectively, and can simultaneously predict stx subtypes and STEC serotypes. Using shotgun metagenomic sequencing reads of STEC-spiked food samples from a published study, we demonstrated that STECFinder can accurately detect the spiked STEC serotypes. The cluster/serotype-specific gene markers could also be adapted for culture-independent typing, facilitating rapid STEC typing. STECFinder is available as an installable package (https://github.com/LanLab/STECFinder) and will be useful for in silico STEC cluster identification and serotyping using genome data.
Keywords: STEC O157:H7; STEC phylogenetic clusters; STEC serotyping; cluster/serotype-specific gene markers; in silico STEC typing pipeline STECFinder; metagenomics; non-O157:H7 STEC serotypes
Year: 2022 PMID: 35083165 PMCID: PMC8785982 DOI: 10.3389/fcimb.2021.772574
Source DB: PubMed Journal: Front Cell Infect Microbiol ISSN: 2235-2988 Impact factor: 5.293
Figure 1. In silico serotyping pipeline workflow. Schematic of in silico serotyping of STEC using cluster/serotype-specific genes combined with the ipaH gene, stx genes (including all available subtypes) and E. coli O antigen and H antigen genes, as implemented in STECFinder. Both assembled genomes and raw reads are accepted as data input.
Figure 2. The frequency of 460 STEC serotypes. The graph shows the frequency of 460 STEC serotypes. STEC O157:H7 and top 28 non-O157:H7 serotypes are listed separately. The number on top of each stacked column refers to the number of isolates for each serotype.
Figure 3. STEC cluster identification phylogenetic tree. Representative isolates from the identification dataset were used to construct a phylogenetic tree with Quicktree v1.3 to identify STEC (Shiga toxin-producing E. coli) clusters; the tree was visualised using Grapetree. The dendrogram shows the phylogenetic relationships of 2,567 STEC isolates represented in the identification dataset. Branch lengths are shown on a log scale for clarity. The scale bar represents 0.1 substitutions per site. STEC clusters are coloured. Numbers in square brackets after each cluster name are the number of isolates in that cluster. ECOLI is E. coli. EIEC is Enteroinvasive E. coli. MC indicates a minor STEC cluster.
Major STEC clusters identified in identification dataset.
| Cluster | No. of isolates | No. of serotypes | No. of STs | Top 28 non-O157:H7 serotypes* |
|---|---|---|---|---|
| O157:H7 | 356 | 1 | 83 | O157:H7 |
| C1 | 414 | 30 | 97 | 1-O26:H11, 3-O111:H8, 8-O118/O151:H16, 12-O71:H11, 15-O103:H11, 18-O69:H11 |
| C2 | 181 | 16 | 42 | 2-O103:H2, 6-O45:H2, 9-O123/O186:H2, 11-O118/O151:H2 |
| C3 | 45 | 18 | 12 | 19-O103:H25, 25-O156:H25, 6-O45:H2 |
| C4 | 89 | 14 | 21 | 13-O5:H9, 20-O165:H25, 24-O177:H25 |
| C5 | 29 | 1 | 5 | 4-O121:H19 |
| C6 | 41 | 1 | 6 | 5-O145:H28 |
| C7 | 40 | 2 | 13 | 7-O91:H14 |
| C8 | 40 | 1 | 14 | 10-O146:H21 |
| C9 | 4 | 1 | 1 | 10-O146:H21 |
| C10 | 50 | 2 | 15 | 14-O128ab:H2 |
| C11 | 27 | 1 | 6 | 16-O117:H7 |
| C12 | 21 | 1 | 6 | 17-O76:H19 |
| C13 | 10 | 1 | 7 | 21-O113:H21 |
| C14 | 16 | 2 | 2 | 22-O113:H4 |
| C15 | 5 | 1 | 1 | 23-O104:H4 |
| C16 | 14 | 1 | 4 | 26-O8:H19 |
| C17 | 24 | 11 | 7 | 27-O130:H11 |
| C18 | 6 | 1 | 1 | 28-O55:H7 |
*Each serotype in a non-O157:H7 cluster is listed as its rank by isolate frequency among the top 28 non-O157:H7 serotypes, followed by the serotype (for example, 1-O26:H11 is the most frequent).
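The rank-plus-serotype notation in the last column of the table can be split mechanically. The following is a minimal Python sketch for readers who want to work with the table programmatically; it is not part of STECFinder, and the function name `parse_ranked_serotypes` and the regular expression are assumptions made purely for illustration.

```python
import re

# Assumed pattern for one table entry, e.g. "8-O118/O151:H16":
# a numeric rank, a hyphen, an O-antigen label, a colon, an H-antigen label.
ENTRY_PATTERN = re.compile(r"^(\d+)-(O[^:]+):(H\d+)$")


def parse_ranked_serotypes(cell):
    """Split a table cell into (rank, serotype) tuples.

    Hypothetical helper: it only understands the notation used in the
    table above and raises ValueError on anything else.
    """
    entries = []
    for token in cell.split(","):
        token = token.strip()
        if not token:
            # Tolerate trailing commas left over from wrapped table rows.
            continue
        match = ENTRY_PATTERN.match(token)
        if match is None:
            raise ValueError(f"unrecognised entry: {token!r}")
        rank = int(match.group(1))
        serotype = f"{match.group(2)}:{match.group(3)}"
        entries.append((rank, serotype))
    return entries


if __name__ == "__main__":
    # Cell content taken from the C4 row of the table.
    print(parse_ranked_serotypes("13-O5:H9, 20-O165:H25, 24-O177:H25"))
```

Keeping the rank as an integer makes it straightforward to sort serotypes by frequency across clusters.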
Summary of identified STEC minor clusters in identification dataset.
| Phylogroup | No. of MC* | Name of MC | No. of isolates | No. of serotypes | No. of STs |
|---|---|---|---|---|---|
| A | 37 | AM1-AM37 | 139 | 64 | 42 |
| B1 | 126 | B1M1-B1M126 | 519 | 157 | 186 |
| B2 | 14 | B2M1-B2M14 | 35 | 20 | 17 |
| C | 7 | CM1-CM7 | 17 | 10 | 8 |
| D | 22 | DM1-DM22 | 67 | 26 | 29 |
| E | 19 | EM1-EM19 | 73 | 26 | 34 |
| G | 4 | GM1-GM4 | 27 | 12 | 12 |
*MC, minor clusters.
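The minor cluster names follow a regular scheme: phylogroup, then "M", then a running index. As a quick consistency check, the per-phylogroup counts can be expanded into names and summed. This is a minimal Python sketch; the helper `minor_cluster_names` is a hypothetical name, and the counts are copied from the table above.

```python
# Number of minor clusters (MC) per phylogroup, copied from the table above.
MINOR_CLUSTER_COUNTS = {
    "A": 37,
    "B1": 126,
    "B2": 14,
    "C": 7,
    "D": 22,
    "E": 19,
    "G": 4,
}


def minor_cluster_names(phylogroup, count):
    """Expand a range such as AM1-AM37 into the individual cluster names."""
    return [f"{phylogroup}M{index}" for index in range(1, count + 1)]


ALL_MINOR_CLUSTERS = {
    phylogroup: minor_cluster_names(phylogroup, count)
    for phylogroup, count in MINOR_CLUSTER_COUNTS.items()
}

TOTAL_MINOR_CLUSTERS = sum(len(names) for names in ALL_MINOR_CLUSTERS.values())

if __name__ == "__main__":
    print(ALL_MINOR_CLUSTERS["B2"][:3])  # ['B2M1', 'B2M2', 'B2M3']
    print(TOTAL_MINOR_CLUSTERS)          # 229
```

The total of 229 agrees with the number of minor STEC clusters stated in the abstract.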
Figure 4. The frequency of the top 28 non-O157:H7 STEC serotypes in STEC major clusters. The graph shows the frequency of top 28 non-O157:H7 serotypes in the 18 STEC major clusters. Clusters are shown per colour legend and also at the top of the bar. X-axis shows the serotype while y-axis shows the number of isolates.
The sensitivity and specificity of STEC cluster/serotype-specific gene markers in the identification dataset (3,258 isolates).
| Clusters | Cluster-specific gene marker sets | No. of isolates | Sensitivity (%) | Specificity (%)* |
|---|---|---|---|---|
| O157:H7 | Set of 6 genes | 356 | 100 | 99.72 |
| C1 | Set of 4 genes | 414 | 100 | 99.82 |
| C2 | Set of 4 genes | 181 | 100 | 99.97 |
| C3 | Set of 3 genes | 45 | 100 | 100 |
| C4 | Set of 3 genes | 89 | 100 | 99.97 |
| C5 | Set of 4 genes | 29 | 100 | 100 |
| C6 | Set of 3 genes | 41 | 100 | 99.88 |
| C7 | Set of 4 genes | 40 | 100 | 99.97 |
| C8 | Set of 5 genes | 40 | 100 | 99.97 |
| C9 | Set of 2 genes | 4 | 100 | 100 |
| C10 | Set of 2 genes | 50 | 100 | 100 |
| C11 | Single gene | 27 | 100 | 100 |
| C12 | Set of 2 genes | 21 | 100 | 100 |
| C13 | Set of 4 genes | 10 | 100 | 100 |
| C14 | Set of 4 genes | 16 | 100 | 99.97 |
| C15 | Set of 2 genes | 5 | 100 | 100 |
| C16 | Set of 4 genes | 14 | 100 | 99.97 |
| C17 | Set of 3 genes | 24 | 100 | 99.97 |
| C18 | Set of 3 genes | 6 | 100 | 99.97 |
| O26:H11 | Set of 6 genes | 204 | 100 | 99.41 |
| O103:H2 | Set of 4 genes | 121 | 100 | 99.87 |
| O111:H8 | Set of 3 genes | 96 | 100 | 100 |
| O45:H2 (C2) | Set of 5 genes | 22 | 100 | 99.97 |
| O45:H2 (C3) | Set of 3 genes | 1 | 100 | 100 |
| O118/O151:H16 | Set of 4 genes | 17 | 100 | 99.94 |
| O123/O186:H2 | Set of 3 genes | 21 | 100 | 100 |
*Where the specificity of a cluster-specific gene set is below 100%, at least one false positive was found for that set.
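For reference, the two metrics in the table can be computed from confusion counts using the standard definitions. The sketch below is in Python and makes two assumptions that the source does not state: that specificity is measured over the non-target isolates of the 3,258-isolate identification dataset, and that the O157:H7 marker set produced 8 false positives. The false-positive count is hypothetical and chosen only for illustration; the function names are likewise illustrative.

```python
def sensitivity_pct(true_positives, false_negatives):
    """Sensitivity = TP / (TP + FN), expressed as a percentage."""
    return 100.0 * true_positives / (true_positives + false_negatives)


def specificity_pct(true_negatives, false_positives):
    """Specificity = TN / (TN + FP), expressed as a percentage."""
    return 100.0 * true_negatives / (true_negatives + false_positives)


# Figures from the table: dataset size and the O157:H7 isolate count.
DATASET_SIZE = 3258
TARGET_ISOLATES = 356

# Hypothetical counts, for illustration only (not reported in the source).
FALSE_NEGATIVES = 0
FALSE_POSITIVES = 8

true_positives = TARGET_ISOLATES - FALSE_NEGATIVES
non_target_isolates = DATASET_SIZE - TARGET_ISOLATES  # assumed denominator
true_negatives = non_target_isolates - FALSE_POSITIVES

if __name__ == "__main__":
    print(round(sensitivity_pct(true_positives, FALSE_NEGATIVES), 2))
    print(round(specificity_pct(true_negatives, FALSE_POSITIVES), 2))
```

Because the non-target denominator is large, a handful of false positives lowers specificity by only a few hundredths of a percentage point, which fits the footnote's remark that values below 100% reflect at least one false positive.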