Dominic Lambert, Catherine D Carrillo, Adam G Koziol, Paul Manninger, Burton W Blais.
Abstract
The timely identification and characterization of foodborne bacteria for risk assessment purposes is a key operation in outbreak investigations. Current methods require several days and/or provide low-resolution characterization. Here we describe a whole-genome-sequencing (WGS) approach (GeneSippr) enabling same-day identification of colony isolates recovered from investigative food samples. The identification of colonies of priority Shiga-toxigenic Escherichia coli (STEC) (i.e., serogroups O26, O45, O103, O111, O121, O145 and O157) served as a proof of concept. Genomic DNA was isolated from single colonies, and sequencing was conducted on the Illumina MiSeq instrument, with raw data sampled from the instrument after 4.5 hrs of sequencing. Modeling experiments indicated that datasets comprising 21-nt reads representing approximately 4-fold coverage of the genome were sufficient to avoid significant gaps in sequence data. A novel bioinformatic pipeline was used to identify the presence of specific marker genes based on mapping of the short reads to reference sequence libraries, along with the detection of dispersed conserved genomic markers as a quality control metric to assure the validity of the analysis. STEC virulence markers were correctly identified in all isolates tested, and single colonies were identified within 9 hrs. This method has the potential to produce high-resolution characterization of STEC isolates, and whole-genome sequence data generated following the GeneSippr analysis could be used for isolate identification in place of lengthy biochemical characterization and typing methodologies. Significant advantages of this procedure include ease of adaptation to the detection of any gene marker of interest, as well as to the identification of other foodborne pathogens for which genomic markers have been defined.
Year: 2015 PMID: 25860693 PMCID: PMC4393293 DOI: 10.1371/journal.pone.0122928
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Timelines for detection of STEC in food testing laboratories.
In the standard approach (Days 1 to 3), samples taken from foods (e.g., ground beef) are added to enrichment broths developed to favor the growth of STEC. Following the enrichment procedure, broth cultures are screened for STEC by PCR, and positive samples are plated on agar media [4]. On the third day, putative STEC colonies are identified by PCR screening. The EHEC-7 CHAS (top line) is used to confirm the presence of genomic targets identifying colonies as STEC (e.g., O-type, Shiga-toxin, eae), and confirmed priority STEC are shipped to specialized facilities for typing by MLVA and/or PFGE. In the GeneSippr approach (lower line), presumptively positive STEC colonies are identified by whole genome sequencing within a time frame similar to that of the standard method. Following the completion of the sequencing run, whole genome sequence (WGS) data could be assembled and/or shared with public health agencies for use in high-resolution typing methods such as whole-genome MLST (wgMLST).
Sequences of e-probes used in this study.
| Reference | Gene target | e-probe Sequence |
|---|---|---|
| EHEC-7 CHAS [ | | |
| GDCS | | |
| GDCS | | |
| Synthetic construct, MyIC [ | My-IC 1 | |
| | My-IC 2 | |
a Genomically Dispersed Conserved Sequences (GDCS)
Strains and results of GeneSippr analysis.
| Strain | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EDL 933 | | - | - | - | - | - | - | + | + | + | + | 15 | 3.0 | 1.6 | 6.9 |
| EC20040078/Sakai | | - | - | - | - | - | - | + | + | + | + | 15 | 19.2 | 1.9 | 7.9 |
| OLC-464 | | + | - | - | - | - | - | - | + | - | + | 15 | 42.8 | 2.8 | 11.9 |
| OLC-683 | | + | - | - | - | - | - | - | - | - | + | 15 | 212 | 0.1 | 0.6 |
| OLC-731 | | + | - | - | - | - | - | - | + | + | + | 15 | 201 | 3.5 | 14.7 |
| OLC-716 | | - | + | - | - | - | - | - | + | - | + | 15 | 41.6 | 2.5 | 10.5 |
| OLC-975 | | - | + | - | - | - | - | - | - | - | - | 15 | 17.8 | 2.2 | 9.4 |
| OLC-679 | | - | - | + | - | - | - | - | + | - | + | 15 | 44.6 | 2.2 | 9.4 |
| OLC-728 | | - | - | + | - | - | - | - | + | - | + | 15 | 167 | 1.0 | 4.4 |
| OLC-455 | | - | - | - | + | - | - | - | + | - | + | 15 | 41.3 | 1.8 | 7.4 |
| OLC-715 | | - | - | - | + | - | - | - | + | + | + | 15 | 286 | 0.3 | 1.1 |
| OLC-682 | | - | - | - | + | - | - | - | - | - | + | 15 | 132 | 4.9 | 20.6 |
| OLC-710 | | - | - | - | - | + | - | - | - | + | + | 15 | 39.6 | 3.0 | 12.8 |
| OLC-791 | | - | - | - | - | + | - | - | - | + | + | 15 | 208 | 3.3 | 13.9 |
| OLC-675 | | - | - | - | - | - | + | - | + | - | + | 15 | 39.0 | 3.1 | 12.9 |
| OLC-684 | | - | - | - | - | - | + | - | - | - | + | 15 | 245 | 0.3 | 1.1 |
| OLC-469 | | - | - | - | - | - | - | + | + | + | + | 15 | 70.2 | 5.2 | 21.6 |
| OLC-797 | | - | - | - | - | - | - | + | + | + | + | 15 | 40.7 | 2.5 | 10.4 |
| OLC-1470 | | - | - | - | - | - | - | + | + | + | + | 15 | 8.3 | 1.8 | 7.4 |
| OLC-733 | | - | - | - | - | - | - | - | - | + | - | 15 | 46.8 | 0.8 | 3.4 |
| OLC-816 | | - | - | - | - | - | - | - | - | + | - | 15 | 121 | 1.6 | 6.5 |
| OLC-721 | | - | - | - | - | - | - | - | - | + | - | 15 | 171 | 4.1 | 17.0 |
| OLC-1051 | | - | - | - | - | - | - | - | + | - | - | 15 | 32.4 | 3.7 | 15.7 |
| OLC-732 | | - | - | - | - | - | - | - | - | + | + | 15 | 99.6 | 6.0 | 25.4 |
| OLC-1547 | generic | - | - | - | - | - | - | - | - | - | - | 15 | 41.3 | 2.1 | 9.0 |
| OLC-1555 | generic | - | - | - | - | - | - | - | - | - | - | 15 | 13.0 | 2.8 | 11.7 |
| OLC-1682 | | - | - | - | - | - | - | - | - | - | - | 5 | 15.1 | 1.4 | 6.1 |
| OLC-1683 | | - | - | - | - | - | - | - | - | - | - | 5 | 2.9 | 2.1 | 8.6 |
a Based on strain characterization/CHAS results in previous work [4]
b Total genomic DNA isolated from a single colony
c Number of reads generated (in millions)
d Estimated fold coverage of genome achieved with 21-nt reads
* Low coverage was observed for three strains in one run. Analysis was repeated every hour until QC targets were identified. For strains OLC-683, OLC-715, and OLC-684, QC and virulence targets were identified at cycle 125, cycle 175 and cycle 41, respectively.
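The fold-coverage estimate in footnote d can be approximated as total sequenced bases divided by genome length. The sketch below is illustrative only: the function name and the 5 Mb genome size are assumptions, since the source does not state the genome length used for its estimates.

```python
def fold_coverage(n_reads: float, read_length: int = 21,
                  genome_size: int = 5_000_000) -> float:
    """Estimated fold coverage = (number of reads * read length) / genome length.

    genome_size defaults to 5 Mb, an assumed round figure for E. coli;
    the source does not give the value behind its table.
    """
    return n_reads * read_length / genome_size


# 2.8 million 21-nt reads against the assumed 5 Mb genome
print(round(fold_coverage(2.8e6), 1))  # prints 11.8
```

Values computed this way will only approximate the tabulated coverage, because the read counts in the table are rounded and the true genome length may differ from the assumed one.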
Fig 2. Minimum read length required for accurate identification of target sequences.
The genome of E. coli Sakai (EC20040078) was used to randomly generate triplicate datasets of 0.5M (top left), 1M (top right), 1.5M (bottom left) and 2M simulated reads (bottom right) for twelve read lengths ranging from 18 to 50 nt (144 datasets). The reads from individual datasets were then mapped to the target sequences, and the mean percentage of sequence identity (MPI) was calculated for each dataset. The average and standard deviation of the three MPI values obtained for each dataset are shown. An MPI above 90% was used as the threshold for accurate identification.
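The simulation described in this caption can be sketched roughly as follows. This is a simplified stand-in, not the authors' pipeline: reads are drawn from one strand only, mapping is reduced to exact string matching, and the score is the percentage of target bases covered rather than the MPI itself. All function names and the toy genome are hypothetical.

```python
import random


def simulate_reads(genome: str, n_reads: int, read_length: int,
                   seed: int = 0) -> list[str]:
    """Draw fixed-length reads from uniformly random start positions (one strand only)."""
    if read_length > len(genome):
        raise ValueError("read_length exceeds genome length")
    rng = random.Random(seed)
    last_start = len(genome) - read_length
    starts = (rng.randint(0, last_start) for _ in range(n_reads))
    return [genome[s:s + read_length] for s in starts]


def percent_target_covered(target: str, reads: list[str], read_length: int) -> float:
    """Percentage of target bases overlapped by at least one exactly matching read.

    Simplification: exact matches only, a proxy for (not equal to) the MPI metric.
    """
    read_set = set(reads)
    covered = [False] * len(target)
    for start in range(len(target) - read_length + 1):
        if target[start:start + read_length] in read_set:
            for i in range(start, start + read_length):
                covered[i] = True
    return 100.0 * sum(covered) / len(target)


# Toy example: random 20 kb "genome", 1 kb target, roughly 4-fold coverage
rng = random.Random(1)
toy_genome = "".join(rng.choice("ACGT") for _ in range(20_000))
toy_target = toy_genome[5_000:6_000]
n_reads = 4 * len(toy_genome) // 21
reads = simulate_reads(toy_genome, n_reads, 21)
print(f"{percent_target_covered(toy_target, reads, 21):.1f}% of target covered")
```

Repeating the last few lines over a grid of read lengths and read counts, with three seeds each, would mimic the triplicate design of the figure under these simplifying assumptions.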
Fig 3. Impact of coverage depth on identification of target sequences using 21-nt simulated reads.
The genome of E. coli Sakai (EC20040078) was used to randomly generate triplicate datasets of simulated 21-nt reads for twelve depths of coverage (36 datasets). The reads from individual datasets were then mapped to the target sequences, and the mean percentage of sequence identity (MPI) was calculated for each dataset. The average and standard deviation of the three MPI values obtained for each dataset at different depths of coverage are shown. An MPI above 90% was used as the threshold for accurate identification.
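As a rough cross-check on why modest depths can suffice, the classical Lander-Waterman (Poisson) model gives the expected fraction of bases covered at least once as 1 - e^(-c) for depth c. This idealized model is brought in here as an assumption; it is not part of the source's analysis and does not compute the MPI.

```python
import math


def expected_fraction_covered(depth: float) -> float:
    """Expected fraction of bases covered at least once under a Poisson model.

    Assumes uniformly random read starts and reads that are short
    relative to the genome (idealized Lander-Waterman setting).
    """
    return 1.0 - math.exp(-depth)


for depth in (1, 2, 4, 8):
    print(depth, f"{100 * expected_fraction_covered(depth):.1f}%")
# prints:
# 1 63.2%
# 2 86.5%
# 4 98.2%
# 8 100.0%
```

Under this model, a depth of 4 corresponds to about 98% of bases covered, while real data can fall short of the idealized figure because read starts are not perfectly uniform.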
Fig 4. Comprehensive coverage of the E. coli Sakai reference genome with 21-nt reads.
A. Mapping of 21-nt sequencing reads to the E. coli Sakai (EC20040078) reference genome indicates that a depth of coverage of 4 ensures that >95% of the genome is covered. The 21-nt reads sampled in real time (×) during two sequencing runs, reads trimmed to 21 nt (open circle) from two experimental runs (and the result of combining the two), and simulated 21-nt reads were mapped against the closed E. coli Sakai (EC20040078) reference genome. The percentage of the reference genome covered by the reads from each sequencing run and simulations are shown. B. Mapping of 21-nt experimental reads sampled in real time to the E. coli Sakai reference genome demonstrates the absence of significantly large gaps. The 21-nt reads sampled in real time from the sequencing run that provided the best genome coverage were mapped against the E. coli Sakai reference genome (EC20040078) and gaps of all sizes (bp) were counted using a custom script. Frequency of each gap size is indicated. No gaps larger than 61 bases were observed.
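The gap counting in panel B was done with a custom script that is not shown in the source. A minimal sketch of one way to tally gap sizes from mapped read positions is given below; the function name, the 0-based start coordinates, and the treatment of the genome as linear (ignoring circularity) are all assumptions.

```python
from collections import Counter


def gap_size_counts(read_starts, read_length: int, genome_length: int) -> Counter:
    """Return {gap size in bp: number of gaps} for runs of uncovered bases.

    read_starts are assumed to be 0-based mapping positions on a linear genome.
    """
    # Difference array: +1 where a read begins, -1 just past where it ends
    diff = [0] * (genome_length + 1)
    for start in read_starts:
        diff[start] += 1
        diff[min(start + read_length, genome_length)] -= 1

    gaps = Counter()
    depth = 0
    run = 0  # length of the current uncovered run
    for pos in range(genome_length):
        depth += diff[pos]
        if depth == 0:
            run += 1
        elif run:
            gaps[run] += 1
            run = 0
    if run:  # gap extending to the end of the genome
        gaps[run] += 1
    return gaps


# Reads of length 5 at positions 0, 10 and 12 on a 20 bp toy genome
print(sorted(gap_size_counts([0, 10, 12], 5, 20).items()))  # prints [(3, 1), (5, 1)]
```

The largest key of the returned counter gives the maximum gap size, which is the quantity the caption reports as not exceeding 61 bases.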