Annemarie Pielaat, Martin P Boer, Lucas M Wijnands, Angela H A M van Hoek, El Bouw, Gary C Barker, Peter F M Teunis, Henk J M Aarts, Eelco Franz.
Abstract
The potential for using whole genome sequencing (WGS) data in microbiological risk assessment (MRA) has been discussed on several occasions since the beginning of this century. Still, the proposed heuristic approaches have never been applied in a practical framework. This is due to the non-trivial problem of mapping microbial information consisting of thousands of loci onto a probabilistic scale for risks. The paradigm change for MRA involves translation of multidimensional microbial genotypic information to much reduced (integrated) phenotypic information and onwards to a single measure of human risk (i.e. probability of illness). In this paper a first approach in methodology development is described for the application of WGS data in MRA; this is supported by a practical example. That is, combining genetic data (single nucleotide polymorphisms; SNPs) for Shiga toxin-producing Escherichia coli (STEC) O157 with phenotypic data (in vitro adherence to epithelial cells as a proxy for virulence) leads to hazard identification in a Genome Wide Association Study (GWAS). This application revealed practical implications when using SNP data for MRA. These can be summarized by considering the following main issues: optimum sample size for valid inference on population level, correction for population structure, quantification and calibration of results, reproducibility of the analysis, links with epidemiological data, anchoring and integration of results into a systems biology approach for the translation of molecular studies to human health risk. Future developments in genetic data analysis for MRA should aim at resolving the mapping problem of processing genetic sequences to come to a quantitative description of risk. The development of a clustering scheme focusing on biologically relevant information of the microbe involved would be a useful approach in molecular data reduction for risk assessment.
Keywords: GWAS; Microbiology; Risk assessment; SNP; STEC
Year: 2015 PMID: 25910947 PMCID: PMC4613885 DOI: 10.1016/j.ijfoodmicro.2015.04.009
Source DB: PubMed Journal: Int J Food Microbiol ISSN: 0168-1605 Impact factor: 5.277
Table 1. E. coli O157 strains (n = 38) used in this study and some of their genetic characteristics.
| Strain | Source | Year | LSPA | Intimin | tir (A255T) | Clade 8 | SBI | |
|---|---|---|---|---|---|---|---|---|
| H06 | Human | 2005 | I | + | T | − | na | |
| H07 | Human | 2005 | I | + | T | − | na | |
| H09 | Human | 2005 | I | + | T | − | na | |
| H13 | Human | 2006 | I | + | T | − | 3 | |
| H15 | Human | 2006 | I | + | T | − | 3 | |
| A42 | Bovine | 2002 | I | + | T | − | 3 | |
| H25 | Human | 2006 | I/II | + | T | − | 1 | |
| H27 | Human | 2006 | I/II | + | T | − | 1 | |
| H42 | Human | 2007 | I/II | + | T | + | 1 | |
| H44 | Human | 2007 | I/II | + | T | − | 21 | |
| H48 | Human | 2008 | I/II | + | T | − | 16 | |
| H49 | Human | 2008 | I/II | + | T | − | 6 | |
| H83 | Human | 2009 | I/II | + | T | − | 16 | |
| A25 | Bovine | 2008 | I/II | + | T | − | 21 | |
| A37 | Bovine | 2007 | I/II | + | T | − | 6 | |
| A40 | Bovine | 2002 | I/II | + | T | + | 1 | |
| A45 | Bovine | 2007 | I/II | + | T | − | 16 | |
| A48 | Bovine | 2008 | I/II | + | T | − | 6 | |
| A51 | Bovine | 2002 | I/II | + | T | − | 6 | |
| A60 | Bovine | 2003 | I/II | + | T | − | 1 | |
| A62 | Bovine | 2008 | I/II | + | T | − | 6 | |
| A63 | Bovine | 2008 | I/II | + | T | − | 6 | |
| A69 | Bovine | 2002 | I/II | + | T | − | 1 | |
| A72 | Bovine | 2002 | I/II | + | T | − | 1 | |
| A76 | Bovine | 2003 | I/II | + | T | − | 5 | |
| H02 | Human | 2003 | II | + | T | − | 11 | |
| H17 | Human | 2006 | II | + | A | − | 1 | |
| H19 | Human | 2006 | II | + | A | − | 1 | |
| H24 | Human | 2006 | II | + | A | − | 5 | |
| H32 | Human | 2006 | II | + | A | − | 5 | |
| H51 | Human | 2008 | II | + | A | − | 1 | |
| A12 | Bovine | 2004 | II | + | T | − | 5 | |
| A13 | Bovine | 2008 | II | + | A | − | 5 | |
| A16 | Bovine | 2006 | II | + | A | − | 5 | |
| A29 | Bovine | 2009 | II | + | A | − | 16 | |
| A30 | Bovine | 2009 | II | + | T | − | 5 | |
| A32 | Bovine | 2009 | II | + | A | − | 6 | |
| A34 | Bovine | 2009 | II | + | A | − | 5 |
Note:
LSPA: lineage-specific polymorphism assay (Yang et al., 2004).
tir: tir (A255T) polymorphism assay (Bono, 2009).
Clade 8: clade 8 status of isolates assessed by SNP analysis of ECs2357 (Riordan et al., 2008).
SBI: Shiga toxin-encoding bacteriophage insertion site assay (Shaikh and Tarr, 2003; Besser et al., 2007).
na: not applicable; the insertion site assay did not result in an SBI genotype.
Fig. 1. Frequency distributions of the fractional adhesion for the different STEC strains to Caco-2 cells (separated for human and animal strains).
Fig. 2. Principal coordinate plot of the similarity matrix K. For each pair of strains the similarity was calculated as the fraction of SNPs that have the same score. Black, blue and red dots represent lineage I, I/II and II STEC O157 strains respectively.
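The pairwise similarity described in the Fig. 2 caption can be illustrated with a minimal sketch. It assumes SNP scores are stored as equal-length lists of 0/1 values per strain; the function name `similarity_matrix`, the strain names and the toy scores are hypothetical and not taken from the study.

```python
from itertools import combinations


def similarity_matrix(snp_scores):
    """Return K, where K[a][b] is the fraction of SNPs with the same score.

    snp_scores maps a strain name to a list of SNP scores; all lists are
    assumed to have the same length (an assumption of this sketch).
    """
    strains = list(snp_scores)
    n_snps = len(snp_scores[strains[0]])
    # Every strain is identical to itself, so the diagonal is 1.0.
    k = {a: {a: 1.0} for a in strains}
    for a, b in combinations(strains, 2):
        same = sum(x == y for x, y in zip(snp_scores[a], snp_scores[b]))
        k[a][b] = k[b][a] = same / n_snps
    return k


# Hypothetical toy scores for three strains at four SNP positions.
toy_scores = {
    "strain_A": [0, 0, 1, 1],
    "strain_B": [0, 0, 1, 0],
    "strain_C": [1, 1, 0, 0],
}
K = similarity_matrix(toy_scores)
print(K["strain_A"]["strain_B"])  # 3 of 4 SNPs agree
```

A principal coordinate plot would then be obtained from an eigendecomposition of a centred version of K; that step is omitted here to keep the sketch free of third-party dependencies.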
Fig. 3. GWAS plot for the trait fractional adhesion to Caco-2 cells. This plot includes no correction for population structure, so the number of truly significant SNP effects is strongly overestimated.
Table 2. SNP number (ID 1–17) on the chromosome of the STEC O157 strains, locus on the reference genome (position (bp)), minor allele frequency (MAF), point estimate of the regression coefficient (β) and −log10 P-value for the SNPs in the test strains, compared with the reference strain, that have a positive relation with the fractional adhesion to Caco-2 cells. The results are for the regression model without correction for population structure, which overestimates the effects.
| ID | Position (bp) | MAF | β | −log10 (P) |
|---|---|---|---|---|
| 1 | 808,227 | 0.05 | 0.31 | 7.55 |
| 2 | 1,204,977 | 0.08 | 0.20 | 4.01 |
| 3 | 1,265,758 | 0.05 | 0.31 | 7.55 |
| 4 | 1,265,760 | 0.05 | 0.31 | 7.55 |
| 5 | 1,955,401 | 0.08 | 0.20 | 4.01 |
| 6 | 1,963,016 | 0.08 | 0.20 | 4.01 |
| 7 | 1,965,259 | 0.08 | 0.20 | 4.01 |
| 8 | 2,168,378 | 0.11 | 0.18 | 4.17 |
| 9 | 2,168,379 | 0.11 | 0.18 | 4.17 |
| 10 | 2,303,672 | 0.08 | 0.22 | 5.29 |
| 11 | 3,115,509 | 0.08 | 0.21 | 4.64 |
| 12 | 3,480,394 | 0.05 | 0.31 | 7.55 |
| 13 | 3,486,443 | 0.08 | 0.21 | 4.73 |
| 14 | 3,486,494 | 0.08 | 0.20 | 4.14 |
| 15 | 4,929,010 | 0.11 | 0.23 | 7.83 |
| 16 | 5,054,140 | 0.05 | 0.32 | 8.47 |
| 17 | 5,409,931 | 0.08 | 0.21 | 4.64 |
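The MAF and β columns above can be illustrated with a small sketch. It assumes each SNP is scored 0/1 per strain and that the uncorrected model is a simple linear regression of adhesion on that score; both are assumptions of this sketch, and the adhesion values in the regression example are hypothetical. With 38 strains, 2, 3 and 4 carriers of the minor allele give the rounded MAF values 0.05, 0.08 and 0.11 that appear in the table.

```python
from statistics import mean


def minor_allele_frequency(alleles):
    """Frequency of the less common allele among 0/1 scores."""
    freq_one = sum(alleles) / len(alleles)
    return min(freq_one, 1 - freq_one)


def ols_slope(x, y):
    """Least-squares slope of y on x for the model y = mu + beta * x + error."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# MAF for 2, 3 and 4 minor-allele carriers among 38 strains.
mafs = {}
for carriers in (2, 3, 4):
    alleles = [1] * carriers + [0] * (38 - carriers)
    mafs[carriers] = round(minor_allele_frequency(alleles), 2)
print(mafs)

# Hypothetical toy data: two carriers with high adhesion, four non-carriers.
snp = [1, 1, 0, 0, 0, 0]
adhesion = [0.8, 0.6, 0.1, 0.2, 0.1, 0.2]
beta = ols_slope(snp, adhesion)
print(round(beta, 2))
```

For a 0/1 predictor the least-squares slope equals the difference between the mean trait value of carriers and that of non-carriers, which is why a SNP carried by only two or three high-adhesion strains can produce a large β.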
Table 3. Biological information regarding the 17 significant SNPs obtained using a model without correction for population structure (Table 2): locus on the reference genome (position (bp)), function of this locus and description of the SNP.
| ID | Position (bp) | Locus tag | Function | SNP description |
|---|---|---|---|---|
| 1 | 808,227 | ECs0729 | RhsC protein | Synonymous; C219T |
| 2 | 1,204,977 | ECs1121 | Prophage CP-933R tail fiber protein; putative host specificity protein | Synonymous; C1741T |
| 3 | 1,265,758 | ECs1203 | Antitermination protein Q | Synonymous; C12T |
| 4 | 1,265,760 | ECs1203 | Encoded by prophage CP-933R | Non-synonymous; G14A |
| 5 | 1,955,401 | ECs1977 | Phage capsid and scaffold protein | Synonymous; C156T |
| 6 | 1,963,016 | ECs1987 | Tail assembly protein | Synonymous; G351C/T |
| 7 | 1,965,259 | ECs1990 | Prophage CP-933V tail fiber protein; putative host specificity protein | Synonymous; C1062T |
| 10 | 2,303,672 | ECs2332 | | Non-synonymous; C268A (H90N) |
| 11 | 3,115,509 | Intergenic | | G → A |
| 12 | 3,480,394 | ECs3489 | Phage tail fiber protein encoded by prophage CP-933P | Synonymous; G252A |
| 13 | 3,486,443 | ECs3499 | Hypothetical protein | Non-synonymous; T98C (L33S) |
| 14 | 3,486,494 | ECs3499 | Hypothetical protein | Non-synonymous; T149C (I50T) |
| 16 | 5,054,140 | ECs4969 | Putative portal protein | Non-synonymous; G190A (E64K) |
| 17 | 5,409,931 | ECs5283 | DNA-binding transcriptional repressor UxuR | Non-synonymous; C534A (N178K) |
Note:
SNPs having a MAF ≥ 0.05 and, in bold face, MAF ≥ 0.1, i.e. ID 8, 9 and 15.
SNPs are displayed by type and position in the locus, followed in parentheses by the effect on the amino acid sequence in case of a non-synonymous SNP.
Table 4. Test strains (second row) causing a ‘significant’ effect (MAF ≥ 0.05 in Table 3), with the accompanying fractional adhesion to Caco-2 cells (first row). For example, for ID 1 the contrast of strains H24 and H19 with all 36 other strains gives a significant effect. For each SNP, only the major frequent allele is shown; the empty cells refer to the minor frequent allele.
| ID | Fraction adhesion | 0.66 | 0.12 | 0.07 | 0.18 | 0.87 | 0.10 | 0.11 | 0.35 | 0.17 | 0.11 | 0.09 | 0.62 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Strain | H13 | H15 | H25 | H44 | H24 | H32 | H51 | A13 | A16 | H02 | H17 | H19 | |
| 1 | 1 | 1 | |||||||||||
| 2 | 1 | 1 | 1 | ||||||||||
| 3 | 1 | 1 | |||||||||||
| 4 | 1 | 1 | |||||||||||
| 5 | 1 | 1 | 1 | ||||||||||
| 6 | 1 | 1 | 1 | ||||||||||
| 7 | 1 | 1 | 1 | ||||||||||
| 8 | 1 | 1 | 1 | 1 | |||||||||
| 9 | 1 | 1 | 1 | 1 | |||||||||
| 10 | 1 | 1 | 1 | ||||||||||
| 11 | 1 | 1 | 1 | ||||||||||
| 12 | 1 | 1 | |||||||||||
| 13 | 1 | 1 | 1 | ||||||||||
| 14 | 1 | 1 | 1 | ||||||||||
| 15 | 0 | 0 | 0 | 0 | |||||||||
| 16 | 1 | 1 | |||||||||||
| 17 | 1 | 1 | 1 |
Fig. 4. GWAS plot for the trait fractional adhesion to Caco-2 cells, with correction for population structure using a mixed model. There is one significant SNP, at position 3,115,507, with effect β = 0.09, and − log10 (P-value) = 2.28.
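As a hedged sketch, the difference between the uncorrected and the corrected analysis can be written in the generic form commonly used for single-SNP GWAS models. The notation is an assumption of this sketch and is not taken from the paper: y is the vector of fractional adhesion values, x the 0/1 score vector of one SNP, and K a strain-by-strain similarity matrix, assumed here to play the role of the genetic covariance structure. The last line converts the reported −log10 (P-value) back to a P-value.

```latex
\begin{align*}
\text{without correction:}\quad & y = \mu \mathbf{1} + x\beta + e,
    & e &\sim N(0, \sigma_e^2 I) \\
\text{mixed model:}\quad & y = \mu \mathbf{1} + x\beta + u + e,
    & u &\sim N(0, \sigma_g^2 K),\; e \sim N(0, \sigma_e^2 I) \\
\text{reported significance:}\quad & -\log_{10} P = 2.28
    & &\Rightarrow\; P = 10^{-2.28} \approx 0.0052
\end{align*}
```

In this form the random effect u absorbs trait differences that follow the overall relatedness of the strains, so a SNP is flagged only if it explains variation beyond what lineage membership already explains.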