Barry G Hall, Heliodoro Cardenas, Miriam Barlow.
Abstract
In clinical settings it is often important to know not just the identity of a microorganism, but also the danger posed by that particular strain. For instance, Escherichia coli can range from a harmless commensal to a very dangerous enterohemorrhagic (EHEC) strain. Determining pathogenic phenotypes can be both time consuming and expensive. Here we propose a simple, rapid, and inexpensive method of predicting pathogenic phenotypes on the basis of the presence or absence of short homologous DNA segments in an isolate. Our method compares completely sequenced genomes, without the need for genome alignments, to identify the presence or absence of each segment; the result is an automatic alignment of the binary strings that describe each genome. Analysis of the segment alignment allows identification of those segments whose presence strongly predicts a phenotype. Clinical application of the method requires nothing more than PCR amplification of each of the set of predictive segments. Here we apply the method to identifying EHEC strains of E. coli and to distinguishing E. coli from Shigella. We show in silico that with as few as 8 predictive sequences, if even three of those predictive sequences are amplified the probability of being EHEC or Shigella is >0.99. The method is thus very robust to the occasional amplification failure for spurious reasons. Experimentally, we apply the method to screening a set of 98 isolates to distinguish E. coli from Shigella, and EHEC from non-EHEC E. coli strains, and show that all isolates are correctly identified.
Year: 2013 PMID: 23935901 PMCID: PMC3720857 DOI: 10.1371/journal.pone.0068901
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Minimum spanning tree based on complete genomes excluding plasmids.
Pathogenicity phenotypes are indicated by colors. The pathogenicity of uncolored strains is not known. Full strain IDs and accession numbers of the genome sequence files are given in Table S1.
Figure 2. Minimum spanning tree based on complete genomes including plasmids.
Pathogenicity phenotypes are indicated by colors. The pathogenicity of uncolored strains is not known. Full strain IDs and accession numbers of the genome sequence files are given in Table S1.
Figure 3. Determination of segments from bops.
A: Part of a .scores file in which binary strings indicate the presence or absence of a series of bops for strains A–E. Numbered lines above the strings show contiguous bops that are identically distributed among strains A–E. B: A corresponding .segScores file in which the binary strings indicate the presence or absence of the segments shown in panel A.
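The collapsing step described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `collapse_bops`, the dictionary input format, and the example strings are hypothetical, and the actual .scores and .segScores file layouts are not reproduced here. The sketch treats every run of adjacent bop columns with the same presence/absence pattern across strains as one segment, and applies no minimum segment length.

```python
from itertools import groupby


def collapse_bops(scores):
    """Collapse contiguous, identically distributed bop columns into segments.

    scores: dict mapping strain name -> binary string, one character per bop.
    Returns (seg_scores, segments), where seg_scores maps strain name -> binary
    string with one character per segment, and segments lists the inclusive
    (first_bop, last_bop) index range covered by each segment.
    """
    strains = list(scores)
    n_bops = len(scores[strains[0]])
    if any(len(s) != n_bops for s in scores.values()):
        raise ValueError("all strains must have the same number of bops")

    # One presence/absence pattern per bop column, read down the strains.
    columns = ["".join(scores[s][i] for s in strains) for i in range(n_bops)]

    segments, patterns, start = [], [], 0
    # groupby merges only adjacent equal columns, i.e. contiguous bops.
    for pattern, run in groupby(columns):
        length = len(list(run))
        segments.append((start, start + length - 1))
        patterns.append(pattern)
        start += length

    seg_scores = {
        strain: "".join(p[k] for p in patterns)
        for k, strain in enumerate(strains)
    }
    return seg_scores, segments


# Hypothetical example data (not taken from the figure).
example = {"A": "111000", "B": "111011", "C": "000011", "D": "000100", "E": "111100"}
seg_scores, segments = collapse_bops(example)
```

In this made-up example the six bops collapse into three segments: bops 0 to 2, bop 3 alone, and bops 4 to 5.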
EHEC strains.
| Strain | Probability of being EHEC |
| EcoO157H7_EC4115 | 1.00 |
| EcoO157H7_EDL933 | 0.99 |
| EcoO157H7_Sakai | 0.99 |
| EcoO157H7_TW14359 | 1.00 |
| EcoO26H11_str11368 | 0.84 |
| EcoO103H2_12009 | 0.85 |
| EcoO111H-_11128 | 0.88 |
Segments used to search the nr database.
| Segment | Segment ID | Length (bp) | Number of EHEC hits | Number of non-EHEC hits | p | Comment |
| 1 | 10254 | 734 | 7 | 0 | 1.0 | + |
| 2 | 10258 | 6,924 | 7 | 1 | 0.875 | + |
| 3 | 10261 | 200 | 7 | 2 | 0.778 | |
| 4 | 10263 | 270 | 7 | 1 | 0.875 | + |
| 5 | 10314 | 600 | 7 | 1 | 0.875 | + |
| 6 | 10375 | 108 | 7 | 1 | 0.875 | |
| 7 | 10380 | 196 | 7 | 2 | 0.778 | |
| 8 | 10391 | 217 | 7 | 2 | 0.778 | |
| 9 | 10393 | 400 | 7 | 0 | 1.0 | + |
| 10 | 10396 | 800 | 7 | 0 | 1.0 | + |
| 11 | 10398 | 141 | 7 | 1 | 0.875 | |
| 12 | 10402 | 636 | 7 | 0 | 1.0 | + |
| 13 | 10545 | 12,387 | 7 | 12 | 0.368 | |
| 14 | 10547 | 1,744 | 7 | 3 | 0.700 | |
| 15 | 10549 | 376 | 7 | 1 | 0.875 | + |
| 16 | 10551 | 5,590 | 7 | 14 | 0.333 | |
| 17 | 10605 | 134 | 7 | 0 | 1.0 | |
| 18 | 11916 | 138 | 7 | 1 | 0.875 | |
Hits align over at least 50% of the query length.
p is the probability that a hit is EHEC.
Includes Citrobacter rodentium ICC168, a strain that is known to have acquired EHEC and EPEC associated sequences from E. coli [21].
Hits in bacteriophage genomes were not counted.
Plus sign indicates that length is ≥200 bp and p is >0.80. Segments indicated by + would be useful as PCR probes to detect EHEC strains.
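The tabulated p values are consistent with p being the fraction of all hits that are EHEC (for example, 7 EHEC hits and 1 non-EHEC hit give 7/8 = 0.875). That formula is inferred from the numbers in the table rather than stated in the text, so the following Python sketch should be read as an assumption. The function names are hypothetical; the thresholds come from the plus-sign footnote.

```python
def ehec_hit_probability(ehec_hits, non_ehec_hits):
    """Fraction of hits that are EHEC (assumed definition of p)."""
    total = ehec_hits + non_ehec_hits
    if total == 0:
        raise ValueError("segment has no hits")
    return ehec_hits / total


def is_useful_probe(length_bp, p, min_length=200, min_p=0.80):
    """Plus-sign rule from the footnote: length >= 200 bp and p > 0.80."""
    return length_bp >= min_length and p > min_p


# A few rows from the table: (segment ID, length in bp, EHEC hits, non-EHEC hits).
rows = [
    (10254, 734, 7, 0),
    (10261, 200, 7, 2),
    (10375, 108, 7, 1),
    (10545, 12387, 7, 12),
]

for seg_id, length_bp, ehec, non_ehec in rows:
    p = ehec_hit_probability(ehec, non_ehec)
    flag = "+" if is_useful_probe(length_bp, p) else ""
    print(seg_id, round(p, 3), flag)
```

Of the four rows shown, only segment 10254 passes both thresholds; segment 10375 has a high p but is shorter than 200 bp, and segment 10261 is long enough but has p below 0.80.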
Experimental reliability of EHEC-specific and Shigella-specific PCR probes.
| EHEC-specific PCR probes | | |
| Probe | Number of amplicons | Number of amplicons not in EHEC strain |
| EHEC 1 | 48 | 0 |
| EHEC 2 | 27 | 0 |
| EHEC 3 | 0 | 0 |
| EHEC 4 | 44 | 2 |
| EHEC 5 | 49 | 0 |
| EHEC 6 | 47 | 0 |
| EHEC 7 | 0 | 0 |
| EHEC 8 | 0 | 0 |
In silico reliability of EHEC-specific and Shigella-specific PCR probes.
| EHEC-specific PCR probes | | |
| Probe | Number of hits | Number of hits not in EHEC strain |
| EHEC 1 | 10 | 1 |
| EHEC 2 | 10 | 1 |
| EHEC 3 | 9 | 0 |
| EHEC 4 | 9 | 0 |
| EHEC 5 | 9 | 0 |
| EHEC 6 | 9 | 0 |
| EHEC 7 | 10 | 1 |
| EHEC 8 | 10 | 1 |
Hits in a BLAST search of the GenBank non-redundant nucleotide database.
Probe reliabilities.
| EHEC probe | Probability not EHEC | Shigella probe | Probability not Shigella |
| EHEC 1 | 0.018 | Shi 1 | 0.039 |
| EHEC 2 | Unreliable | Shi 2 | 0.04 |
| EHEC 3 | Unreliable | Shi 3 | 0.042 |
| EHEC 4 | 0.039 | Shi 4 | 0.107 |
| EHEC 5 | 0 | Shi 5 | 0.037 |
| EHEC 6 | 0 | Shi 6 | 0.04 |
| EHEC 7 | Unreliable | Shi 7 | 0.04 |
| EHEC 8 | Unreliable | Shi 8 | 0.039 |
Combined results from tables 3 and 4.
Probability that a strain with a hit or amplicon from this probe is not EHEC.
Probability that a strain with a hit or amplicon from this probe is not Shigella.
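A rough back-of-envelope check shows why a few positive probes can give high confidence. The sketch below assumes that spurious amplifications from different probes are independent, so the chance that several positive probes are all misleading is the product of their individual "probability not" values. The independence assumption and the function name are illustrative only; this is not presented as the calculation used to obtain the >0.99 figure in the abstract. The per-probe values are the Shigella column of the table above.

```python
from math import prod

# "Probability not Shigella" for each Shigella probe, from the table above.
shigella_not = {
    "Shi 1": 0.039, "Shi 2": 0.040, "Shi 3": 0.042, "Shi 4": 0.107,
    "Shi 5": 0.037, "Shi 6": 0.040, "Shi 7": 0.040, "Shi 8": 0.039,
}


def worst_case_confidence(not_probs, k):
    """Lower bound on confidence when any k probes amplify.

    Takes the k least reliable probes (largest 'probability not' values) and
    assumes their errors are independent, so the probability that all k are
    misleading is the product of those values.
    """
    worst = sorted(not_probs, reverse=True)[:k]
    return 1.0 - prod(worst)


confidence = worst_case_confidence(list(shigella_not.values()), 3)
print(f"{confidence:.5f}")
```

Even when the three positives come from the three least reliable Shigella probes (0.107, 0.042 and 0.040), the product is about 0.00018, so the confidence under this naive model stays above 0.99.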