Camilla Sekse1, Arne Holst-Jensen1, Ulrich Dobrindt2, Gro S Johannessen1, Weihua Li3, Bjørn Spilsberg4, Jianxin Shi3.
Abstract
High-throughput sequencing (HTS) is becoming the state-of-the-art technology for typing of microbial isolates, especially in clinical samples. Yet, its application is still in its infancy for monitoring and outbreak investigations of foods. Here we review the published literature, covering not only bacterial but also viral and eukaryotic food pathogens, to assess the status and potential of HTS implementation to inform stakeholders, improve food safety and reduce outbreak impacts. The developments in sequencing technology and bioinformatics have outpaced the capacity to analyze and interpret the sequence data. The influence of sample processing, nucleic acid extraction and purification, harmonized protocols for generation and interpretation of data, and properly annotated and curated reference databases including non-pathogenic "natural" strains are other major obstacles to the realization of the full potential of HTS in analytical food surveillance, epidemiological and outbreak investigations, and in complementing preventive approaches for the control and management of foodborne pathogens. Despite significant obstacles, the achieved progress in capacity and broadening of the application range over the last decade is impressive and unprecedented, as illustrated with the chosen examples from the literature. Large consortia, often with broad international participation, are making coordinated efforts to cope with many of the mentioned obstacles. Further rapid progress can therefore be expected for the next decade.
Keywords: bacteria and viruses; fungi and parasites; metagenomics; metataxonomics; microbial profiling; outbreak investigation; surveillance; whole genome sequencing
Year: 2017 PMID: 29104564 PMCID: PMC5655695 DOI: 10.3389/fmicb.2017.02029
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. Four sectors are considered here as potential users of high throughput sequencing (HTS) technologies for detection and characterization of foodborne pathogens (FBPs). Research (upper left) is a knowledge driver providing exploitable reference data and detection methods among others to the other three sectors (green arrows), and receives valuable data and material back from the other sectors (yellow arrows). The food industry (upper right) is legally obliged to take preventive measures and to monitor its products and production systems to prevent contamination with FBPs, with economy as a main priority driver. Documentation of the systematic efforts to maintain low risk (goal = pathogen free) products must be available for inspection. The health sector (lower left) treats patients and is usually the first to isolate and characterize outbreak-associated strains, thereby providing key information necessary for the other sectors to investigate and minimize the impact of outbreaks. The competent authorities (lower right) enforce the food law and surveil the food industry and products, but also coordinate the outbreak investigations based on data provided by the other sectors. Epidemiological data, legal acts and quality control documents are the main information sources used and shared by the competent authorities (blue arrows). Outbreak investigations have a strong focus on specific source tracking (red arrow).
Users, criteria and limitations on use of analytical methods for detection of foodborne pathogens.
| Molecular targets | A narrow range of pre-defined risk indicators, e.g., particular species specific markers, serotype specific markers, virulence genes | A broader range of risk indicators, pre-defined or not. May also include focus on different sequence variants | At least all case specific markers identified from characterization of patient isolate(s). Often also including a broad range of other risk indicators, pre-defined or not |
| Purpose of analysis | Ensure that own food product is legally compliant and does not pose an unacceptable risk to consumer (can be perceived as safe) | Ensure that food production/products are legally compliant and safe, and monitoring of prevalence/distribution of contaminants on the market | Identify contaminated product(s) and retract the product(s) from the market as well as clearing non-contaminated products from suspicion |
| How? | Analytical verification: pre-defined risk indicators are not detectable at a pre-defined limit of detection or in a pre-defined sample size. Sampling based on HACCP | Analytical detection and typing of risk indicators, analysis of traceability documentation and epidemiological analysis. A pre-defined sample size or LOD | Interviews of patients/family |
| Required time from test to result | Minutes or usually several hours, but may be justifiable with up to days/weeks in case of re-emerging contamination problem | Exceptionally weeks, usually days, but may be shorter in case of products with short shelf life and high turnover | As soon as possible, preferably minutes to hours, but can be days |
| Resource limitations | Routine on-site detection. Price sensitive market | Routine laboratory analyses complemented with in-depth analyses in well-equipped (reference) laboratories. Testing costs per unit can be low to moderate, depending on specific control program. High to very high costs can be justified in particular cases | Emergency conditions: life and health of humans are at stake. Routine and in-depth analyses in advanced laboratories. Testing costs per unit can be moderate to very high |
| Representative analytical approach | Traditional culturing, PCR | Enrichment culturing, biochemical tests and PCR complemented with sequencing of genes | PCR complemented with gene sequencing and WGS |
| High throughput sequencing (HTS) applicable? When? | Usually not because of costs and time. May be justifiable to apply HTS based genome typing methods on isolates to investigate/unravel re-emerging contamination problems | Usually not because of costs and time. HTS based amplicon sequencing of one to a few genes in some cases justifiable. Selected isolates from public enforcement/control programs occasionally qualify for WGS | Yes, typically by WGS of selected isolates, possibly limiting bioinformatics to focusing on particular gene panels while facilitating successive in-depth analyses of genome evolution and epidemiology |
Research applications are essentially unlimited and therefore not included in this table.
Patient treatment and characterization of patient derived isolates are separate tasks not included in this table. However, source tracking relies in part on the data derived from characterization of patient derived isolates.
HACCP, hazard analysis (and) critical control point.
LOD, limit of detection.
PCR, polymerase chain reaction.
WGS, whole genome sequencing.
Figure 2. Metagenomics data analysis. At least three different approaches for analysis of HTS sequence reads can be selected, but combinations are often preferred. (A) Assembly of sequences (e.g., reads) into contigs (consensus sequences) requires alignment. Sets of contigs are often further assembled into scaffolds (not shown), where the relative position of contigs is known but gaps of approximately known size between the contigs remain to be closed. The example shows that four of the six reads can be assembled into a consensus contig while the two remaining reads cannot be assembled with any of the others. (B) Mapping of sequences (e.g., reads) to other sequences (e.g., in a database) also requires alignment. The example shows one perfect and one partial match between two query sequences (e.g., reads) and a reference sequence. The mismatch in the partial match is shown in red. (C) Any sequence larger than one nucleotide can be divided into subsequences (k-mers) of length k ≥ 1. The size of k will affect the likelihood of any random k-mer being unique to a data set. A small k will reduce the number of unique k-mers. This is demonstrated in the example, as for the given reference sequence two k-mers will not be unique with k = 3, while with k = 5 all k-mers are unique. Rare k-mers or k-mer frequencies can be used to estimate relationships between two sets of sequences (e.g., two shotgun metagenomes or a sequenced isolate and a reference genome).
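The effect of k on k-mer uniqueness described in panel (C) can be illustrated with a minimal Python sketch. The reference sequence below is hypothetical (the sequence drawn in the figure is not reproduced in the caption), and the helper names `kmer_counts` and `repeated_kmers` are illustrative only.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping subsequence (k-mer) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeated_kmers(seq: str, k: int) -> list[str]:
    """Return the k-mers that occur more than once, i.e., are not unique."""
    return sorted(kmer for kmer, n in kmer_counts(seq, k).items() if n > 1)


# Hypothetical reference sequence, chosen only for illustration.
reference = "ATGACATGTC"

for k in (3, 5):
    print(k, repeated_kmers(reference, k))
```

For this assumed sequence the 3-mer `ATG` occurs twice, whereas every 5-mer occurs only once, mirroring the trend described in the caption: a larger k makes it more likely that each k-mer is unique.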
Figure 3. Approaches to HTS sequence read analysis and their dependence on alignment, time and coverage. At least three different approaches for analysis of HTS sequence reads can be selected, but combinations are often preferred. Top right: Assembly of sequences into contigs (see also Figure 2), scaffolds and complete genome assemblies is alignment dependent and time consuming, and the success probability is usually correlated with the coverage. This approach is typically taken when time is not the limiting factor and a complete assembly is desired for successive analysis and reference applications. Bottom: Mapping (see also Figure 2) of reads to existing assembly/assemblies is also alignment dependent and time consuming but can be performed successfully even at low coverage (a single read can be mapped to a reference assembly). The size of the reference (e.g., database or genome) and the degree to which mismatches are accepted will have a significant impact on the time required for data analysis (olive arrows). This approach is typically taken to determine functional aspects of metagenome and transcriptome sequences and in metataxonomics. Top left: K-mer analysis (see also Figure 2) is a fast, alignment independent, statistical (probabilistic) approach to investigate properties of a sequenced genome such as its similarity and relationship to other (reference) genomes. It is typically used to screen sequenced genomes to identify genomes of particular interest for more comprehensive analysis.
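As a rough illustration of alignment independent screening, the sketch below compares the k-mer sets of two sequences. The use of the Jaccard index as the similarity measure is an assumption made here for illustration; the caption does not name a specific statistic. All sequences and function names are hypothetical.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(seq_a: str, seq_b: str, k: int) -> float:
    """Shared k-mers divided by all distinct k-mers (no alignment needed)."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy sequences standing in for a reference and two query genomes.
reference = "ATGACATGTC"
close_relative = "ATGACATGTA"   # differs from the reference in the last base
distant = "GGGGCCCCTT"          # shares no 4-mers with the reference

print(jaccard_similarity(reference, close_relative, k=4))
print(jaccard_similarity(reference, distant, k=4))
```

In a screening setting, sequences scoring above some chosen cutoff would be passed on to the slower, alignment dependent approaches (mapping or assembly). Real genomes would require a much larger k than this toy example.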
Figure 4. The difference between shotgun metagenomics and amplicon based metataxonomic sequencing. (A) Six different genomes (1–6) shown in different colors with five of the six (1 and 3–6) containing a shared genomic region. The shared genomic region in all five has a conserved motif on the left side (red circle = forward primer binding site), but one of them has a significant change in the conserved motif on the right side (yellow circle = reverse primer binding site) resulting in primer mismatch (black circle). (B) The sequenced fragments with shotgun metagenomics are random motifs from the six different genomes, and only one of the conserved (primer binding) motifs will exceptionally be included. (C) The sequenced fragments with amplicon sequencing are only those delimited both by a conserved left and right motif (red and yellow circles). The difference in mean coverage per nucleotide is significant. Assuming that each genome has the same length (e.g., $10^{8}$ bp) and is present in equal concentration in the original sample, a read length of 200 bp, and an invariant length of the shared genomic region delimited by the primer sites of 250 bp, the mean coverage per nucleotide of the targets will be: (B) $R \times L / (N \times G) = (10^{6} \times 200\ \text{bp}) / (6 \times 10^{8}\ \text{bp}) = 1/3$, where R = number of reads, L = read length, N = number of genomes and G = genome size. (C) $R \times L / (A \times D) = (10^{6} \times 200\ \text{bp}) / (4 \times 250\ \text{bp}) = 2 \times 10^{5}$, where R and L are the same as for B, while A = number of genomes flanked both by conserved forward and reverse primer sites, and D = the length of the shared genomic region delimited by the primer sites. In this example C is $6 \times 10^{5}$ times more sensitive than B. Shotgun reads may be analyzed applying all bioinformatics approaches (assembly, mapping and k-mer analysis, alignment dependent and alignment free; cf. Figures 2, 3). Amplicon reads are usually analyzed by mapping, clustering and phylogenetic approaches, while assembly is only exceptionally applied.
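The coverage arithmetic in the Figure 4 caption can be checked with a short Python sketch. The function and variable names are illustrative; the numerical inputs are the ones assumed in the caption (one million reads of 200 bp, six genomes of 100 Mbp each, four amplifiable genomes, and a 250 bp target region).

```python
def shotgun_coverage(reads: int, read_len: int, n_genomes: int, genome_size: int) -> float:
    """Mean coverage per nucleotide for shotgun sequencing: R x L / (N x G)."""
    return reads * read_len / (n_genomes * genome_size)


def amplicon_coverage(reads: int, read_len: int, n_amplifiable: int, region_len: int) -> float:
    """Mean coverage per nucleotide for amplicon sequencing: R x L / (A x D)."""
    return reads * read_len / (n_amplifiable * region_len)


# Values taken from the caption's worked example.
coverage_b = shotgun_coverage(reads=10**6, read_len=200, n_genomes=6, genome_size=10**8)
coverage_c = amplicon_coverage(reads=10**6, read_len=200, n_amplifiable=4, region_len=250)

print(f"shotgun (B):  {coverage_b:.3f}x")
print(f"amplicon (C): {coverage_c:.0f}x")
print(f"ratio C/B:    {coverage_c / coverage_b:.0f}")
```

The ratio reproduces the roughly 600,000-fold difference in per-nucleotide coverage stated in the caption, under the caption's simplifying assumption that all reads are of usable length and evenly distributed.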
Examples of published high throughput sequencing based investigations of foodborne pathogen (FBP) outbreaks.
| | Isolates | 70 isolates from two outbreaks complemented with isolates from sporadic cases | HiSeq2500 (Illumina), 200PE | Whole genome sequencing (WGS) of single isolates followed by mapping to reference genome and single nucleotide polymorphism (SNP) analysis | UK, 2013 | Jenkins et al., |
| O157:H7 | Isolates | 29 isolates, 24 isolates from an outbreak, 5 unrelated cases | MiSeq (Illumina), 150PE | WGS of single isolates followed by | US, 2001–2012 | Turabelidze et al., |
| O157:H7 | Isolates | 16 isolates, 8 from humans, 8 from animals | GS FLX (Roche) and GAIIx (Illumina) | WGS of single isolates followed by | UK, 2009 | Underwood et al., |
| O157:H7 | Isolates | 105 isolates, 10 human isolates from an outbreak, 95 human isolates from sporadic cases | Ion Torrent PGM (Life technologies) | WGS of single isolates followed by mapping to reference genome and single nucleotide polymorphism (SNP) analysis | UK, 2007–2012 | Holmes et al., |
| Multiple STEC serotypes | Isolates | 46 isolates, 6 isolates from humans from an outbreak, 40 human isolates from sporadic cases | Ion Torrent PGM | WGS of single isolates followed by multi-locus sequence typing (MLST), | DK, 2012 | Joensen et al., |
| O104:H4 | Communities | 45 clinical human fecal samples | MiSeq 151PE and HiSeq 2500 151PE rapid mode shotgun metagenomics yielding from $8.6 \times 10^{6}$ to $4.4 \times 10^{7}$ reads per sample | Retrospective study applying shotgun metagenomics, followed by | Germany/Europe, 2011 | Loman et al., |
| | Isolates | 2 isolates with one band difference on PFGE | GS FLX and GAIIx | WGS of single isolates followed by | Canada, 2008 | Gilmour et al., |
| | Isolates | 5 isolates, three from humans and two environmental isolates associated to one outbreak | Ion Torrent PGM | WGS of single isolates followed by | Australia, 2013 | Wang et al., |
| | Isolates | 18 isolates, 7 human, 10 food isolates, 1 control isolate | MiSeq, 75PE, 150PE and 250PE | WGS of single isolates followed by | Austria 2011–2013 | Schmid et al., |
| | Isolates | 57 isolates from 5 outbreaks | MiSeq, 250PE | WGS of single isolates followed by | Australia, 2006–2012 | Octavia et al., |
| Serovar Typhimurium | Isolates | 61 isolates, 21 human and 5 food isolates related to an outbreak and additional 35 isolates | HiSeq2500 | WGS of single isolates followed by mapping based SNP analysis | UK, 2012 | Ashton et al., |
| Serovar Enteritidis | Isolates | 55 isolates, 28 isolates from 7 outbreaks, 27 isolates from sporadic cases | MiSeq, 250PE | WGS of single isolates followed by mapping based SNP analysis | US, 2001–2012 | Taylor et al., |
| Serovar Enteritidis | Isolates | 6 isolates, 3 human and 3 food isolates | HiSeq2000 (Illumina), 100PE | WGS of single isolates followed by | Belgium, 2014 | Wuyts et al., |
| Serovars Typhimurium, Enteritidis and Derby | Isolates | 47 isolates, 26 isolates from 9 outbreaks, 21 isolates from sporadic cases | GAIIx, 101PE | WGS of single isolates followed by reference free | Denmark, 2000–2010 | Leekitcharoenphon et al., |
| Hepatitis A virus (HAV) | Isolated HAV communities | HAV communities from clinical serum and fecal samples from 120 patients associated with a single outbreak | No data on sequencing approach except fragment size (315 bp) | Metataxonomics, i.e., amplicon sequencing of VP1/P2B region of HAV, followed by mapping to HAV genotype reference sequences, complemented with phylogenetic analysis | USA, 2013 | Collier et al., |
| HAV | Isolates and communities | Two food samples from two apparently unrelated HAV outbreaks | RNAseq combined with amplicon sequencing (MiSeq; 250PE) | Four amplicons, partially overlapping, covered the entire HAV genome. Genome assembly followed by genotyping by mapping to HAV reference genomes | Italy, 2013 | Chiapponi et al., |
| Rotavirus | Community | One clinical fecal sample from a small outbreak | RNAseq on MiSeq (120PE), approximately $1 \times 10^{6}$ reads | Genotyping by mapping to rotavirus reference genomes | Japan, 2012 | Mizukoshi et al., |
| | Isolates | Seven isolates from outbreak associated food (yogurt) | WGS (HiSeq2000 100PE and 100MP) | Metataxonomic MLST approach to species and sub-species ( | USA, 2013 | Lee et al., |
| | Community | Frozen filet of outbreak associated food (fish), and vomit samples from patients associated with the outbreak | RNAseq on GAII (80PE) complemented with Sanger amplicon sequencing of DNA (1.1 kbp fragment of 18S rRNA gene) | | Japan, 2008–2010 | Kawai et al., |
The examples are further described and discussed in the main text of this review.
STEC, Shiga toxin-producing E. coli.
200PE refers to the use of paired-end sequencing with a read length of 200 bp.
100MP refers to the use of mate-pair sequencing with a read length of 100 bp.
Figure 5. Isogenic or non-isogenic isolates? The distance or number of observed differences between isolates, usually measured as single nucleotide polymorphisms (SNPs) in HTS studies, can provide clues to determine if isolates belong to the same strain, i.e., whether they are isogenic or not. This is important for outbreak investigations, epidemiology and to assess if a persistent strain is present in a food production system. Fewer than ten SNPs is often interpreted as evidence of an isogenic origin of bacterial isolates (see examples and discussion in the main text of this paper). Practice is currently not harmonized and also depends on the taxon in question, how SNPs or other differences are calculated, and which part of the genome the study covers (e.g., core or whole genome). An inferred phylogenetic relationship between nine isolates (A–I; terminal nodes) is shown. For each isolate, a blue letter (a–i) indicates the number of unique SNPs associated with each individual isolate. Internal nodes labeled X–Z connect three clusters of isolates, while internode N connects all isolates. Brown letters (x–z) indicate the number of shared SNPs separating each individual cluster of isolates from the others. The distance (Δ) between any pair of isolates is the sum of SNPs (i.e., blue and brown letters) separating them, e.g., if a = 3, d = 2, x = 2 and y = 4 then ΔAD = 3 + 2 + 2 + 4 = 11. The following two examples serve to illustrate the difference between putatively isogenic and non-isogenic clusters of isolates (with a threshold of 9 SNPs for isogenicity): If a = b = c = d = e = f = 2, g = h = i = 3, x = 2, and y = z = 3 then all the isolates A–F might be considered isogenic (internal distance between any pair of isolates Δmax ≤ 9), as might G–I (Δmax = 6), whereas A–F might not be considered isogenic with G–I (internal distance between any members from two different clusters Δmin ≥ 10). Similarly, if a = b = c = d = e = f = 6, g = h = i = 1, x = 1, and y = z = 3 then only isolates G–I (Δmax = 2) might be considered isogenic (any other pair of isolates would yield Δmin ≥ 11).
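The two worked examples in the Figure 5 caption can be reproduced with a small Python sketch. It assumes that isolates A–C belong to cluster X, D–F to cluster Y and G–I to cluster Z; this grouping is inferred from the caption's arithmetic (e.g., ΔAD includes both x and y) and is not stated explicitly. Function names are illustrative.

```python
from itertools import combinations, product

# Assumed cluster membership (inferred from the caption, not stated in it).
CLUSTER_OF = {
    **dict.fromkeys("ABC", "X"),
    **dict.fromkeys("DEF", "Y"),
    **dict.fromkeys("GHI", "Z"),
}


def snp_distance(i: str, j: str, unique: dict, shared: dict) -> int:
    """Sum the unique SNPs of both isolates, plus the shared SNPs of both
    clusters whenever the two isolates sit in different clusters."""
    distance = unique[i] + unique[j]
    if CLUSTER_OF[i] != CLUSTER_OF[j]:
        distance += shared[CLUSTER_OF[i]] + shared[CLUSTER_OF[j]]
    return distance


def max_within(group: str, unique: dict, shared: dict) -> int:
    """Largest pairwise distance inside one group of isolates."""
    return max(snp_distance(i, j, unique, shared) for i, j in combinations(group, 2))


def min_between(group_a: str, group_b: str, unique: dict, shared: dict) -> int:
    """Smallest distance between any member of group_a and any member of group_b."""
    return min(snp_distance(i, j, unique, shared) for i, j in product(group_a, group_b))


# First example from the caption: a-f = 2, g-i = 3, x = 2, y = z = 3.
unique_1 = {**dict.fromkeys("ABCDEF", 2), **dict.fromkeys("GHI", 3)}
shared_1 = {"X": 2, "Y": 3, "Z": 3}
print(max_within("ABCDEF", unique_1, shared_1))          # within A-F
print(max_within("GHI", unique_1, shared_1))             # within G-I
print(min_between("ABCDEF", "GHI", unique_1, shared_1))  # A-F versus G-I

# Second example from the caption: a-f = 6, g-i = 1, x = 1, y = z = 3.
unique_2 = {**dict.fromkeys("ABCDEF", 6), **dict.fromkeys("GHI", 1)}
shared_2 = {"X": 1, "Y": 3, "Z": 3}
other_pairs = [p for p in combinations("ABCDEFGHI", 2) if not set(p) <= set("GHI")]
print(max_within("GHI", unique_2, shared_2))
print(min(snp_distance(i, j, unique_2, shared_2) for i, j in other_pairs))
```

Comparing these values against a chosen threshold (9 SNPs in the caption) then gives the putative isogenic groups. As the caption notes, the threshold itself is not harmonized and depends on the taxon and on how SNPs are called.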
Figure 6. Reference guided metagenome sequencing based approach for identification and characterization of pathogenic and outbreak associated strain(s). In case of an outbreak, fecal samples from patients are subjected to culturing, in order to isolate the outbreak strain. Patients are also interviewed in order to try to identify food products that may be the source(s) of infection. The metagenomes of stool samples, food products and cultured strains can then be amplicon sequenced (metataxonomics) or shotgun sequenced (metagenomics) and the data mapped to reference databases for identification of virulence markers. Shotgun reads can also be assembled into larger contigs or genomes for identification of pathogenic strains. The latter is facilitated if the sequence data are derived from single isolates. Black arrows indicate forward flow direction of the analysis, while gray arrows indicate feedback changing the premises for earlier steps. Feedback from the sequencing analysis can be used to refine and narrow the search for a specific FBP. If successful, the outbreak will be terminated. This review includes multiple examples of the application of the described approach to outbreak investigations.
Figure 7. Reference independent shotgun metagenome sequencing based approach for identification and characterization of outbreak strain(s). In case of smaller outbreaks the possibility to compare metagenomes from affected people (patients) and healthy controls is limited. In these cases the availability of clinical isolates may be required to avoid exhaustive open ended bioinformatics (in silico) analyses, as exemplified by Brzuszkiewicz et al. (2011) and Rasko et al. (2011). Environmental gene tags (EGTs) from metagenomes of people affected by the outbreak and controls (people not affected) can be compared in case of a larger outbreak, as exemplified by Loman et al. (2013). In that study, EGTs present only in affected patients were characteristic of the outbreak strain and provided sufficient information for near complete characterization of its genome. Scaffolds and in particular assembled genomes may and should be uploaded to reference database(s), for successive use in analytical approaches like those described in Figure 6.
Examples of published high throughput sequencing based approaches to detection of foodborne pathogens for industrial and control purposes.
| | Naturally contaminated tomatoes (4 samples, 4 conditions) | Pathogen detection ( | Shotgun metagenomics, MiSeq (Illumina) 151PE | Raw data not described. After assembly: $1.5 \times 10^{7}$ sequences of variable length (mean = 210 bp), comprising a total of 2.6 Gbp | Taxonomic classification by reference-based analysis in MG-RAST. Pathogen detection by reference-based mapping to a database of | Ottesen et al., |
| | Naturally contaminated cilantro (coriander leaves). 91 samples included in metataxonomic study, 12 samples in shotgun metagenome study | Pathogen detection and quantification ( | Metataxonomics by 16S rDNA amplicon sequencing on GS FLX (Roche) and shotgun metagenome sequencing on MiSeq (75PE) with 1 to 6 samples per run | GS FLX raw data not described. Total number of shotgun reads ranged from $6.8 \times 10^{5}$ to $5.5 \times 10^{6}$ per sample depending on number of samples sequenced in one run | Taxonomic classification by reference-based analysis using the RDP classifier. Pathogen detection by reference-based mapping to genome sequences of cultured isolates of | Jarvis et al., |
| | Naturally contaminated ice cream. 3 enrichments with four replicates and 13 time points included in metataxonomic study, 6 samples included in shotgun metagenome study | Pathogen detection ( | Metataxonomics based on 16S rDNA amplicon sequencing on MiSeq (300PE), and shotgun metagenome sequencing on MiSeq (151PE) | Metataxonomic data not described. Total number of shotgun reads ranged from $1.5 \times 10^{7}$ to $8.2 \times 10^{7}$ per sample | Taxonomic classification by reference-based analysis using Resphera Insight and subtyping of | Ottesen et al., |
| STEC | Bagged spinach: spiked and unspiked samples, altogether 16 samples | Pathogen detection and quantification of STEC | Shotgun metagenome sequencing on MiSeq (250PE) | Total number of shotgun reads ranged from $8.0 \times 10^{6}$ to $1.5 \times 10^{7}$ per sample | Pathogen detection by reference-based mapping to virulence and serotype genes and STEC genomes. Microbial community analysis by | Leonard et al., |
| STEC and microbial community | Bagged spinach: 36 spiked samples (low level) | Pathogen detection and quantification (STEC), virulence gene detection and characterization of the microbial community before, during and after enrichment | Shotgun metagenome sequencing on MiSeq (250PE) | Total number of shotgun reads ranged from $1.2 \times 10^{6}$ to $5.2 \times 10^{6}$ per sample | Pathogen detection by reference-based analyses targeting STEC, molecular serotypes, virulence genes and SNPs complemented with phylogenetic analyses. Microbial community analysis using a | Leonard et al., |
| Viral community | Lettuce: 42 field grown samples and 54 retail samples | Pathogen detection and quantification, and characterization of viral community | RNAseq and shotgun metagenome sequencing, complemented with 16S rDNA metataxonomics to verify absence of bacterial DNA on HiSeq2500 (Illumina; 100PE) | Total number of reads approximately $1.5 \times 10^{9}$ (approximately $2.3 \times 10^{7}$ to $3.4 \times 10^{7}$ reads per sample) | Assembly of reads followed by mapping to viral reference genomes for taxonomic classification, including detection of putative human pathogens | Aw et al., |
| Viral community | Fresh parsley plants irrigated with fecally tainted water + water samples and negative controls | Pathogen detection and quantification, and characterization of viral community | RNAseq on MiSeq (300PE) | From $1.2 \times 10^{6}$ (irrigated parsley) to $8.1 \times 10^{6}$ (negative control plants) reads per sample | Assembly of reads followed by mapping to viral reference genomes for taxonomic classification, complemented with phylogenetic analysis | Fernandez-Cassi et al., |
| Norovirus | Commercial shellfish | Pathogen detection and typing to assess diversity | RNAseq on HiSeq (100PE) and metataxonomics (capsid VP1 gene) on MiSeq (300PE) | Approximately $2 \times 10^{7}$ reads per sample (RNAseq) and from $2 \times 10^{5}$ to $4 \times 10^{5}$ reads per sample (metataxonomics) | Mapping of reads to viral reference genomes for taxonomic classification and genotyping | Imamura et al., |
| Norovirus | Farmed oysters | Pathogen detection and typing to assess the performance of a commercially applied decontamination system | Metataxonomics (capsid VP1 gene) on MiSeq (300PE) | From $2 \times 10^{5}$ to $4 \times 10^{5}$ reads per sample | Assembly and mapping to norovirus reference genomes for genotyping | Imamura et al., |
The examples are further described and discussed in the main text of this review.
151PE refers to the use of paired-end sequencing with a read length of 151 bp.
STEC, Shiga toxin-producing E. coli.