
Application of a strain-level shotgun metagenomics approach on food samples: resolution of the source of a Salmonella food-borne outbreak.

Florence E Buytaers^1,2, Assia Saltykova^1,2, Wesley Mattheus³, Bavo Verhaegen⁴, Nancy H C Roosens², Kevin Vanneste², Valeska Laisnez⁵, Naïma Hammami⁵, Brigitte Pochet⁶, Vera Cantaert⁶, Kathleen Marchal^7,1,8, Sarah Denayer⁴, Sigrid C J De Keersmaecker².

Abstract

Food-borne outbreak investigation currently relies on the time-consuming and challenging bacterial isolation from food, to be able to link food-derived strains to more easily obtained isolates from infected people. When no food isolate can be obtained, the source of the outbreak cannot be unambiguously determined. Shotgun metagenomics approaches applied to the food samples could circumvent this need for isolation from the suspected source, but require downstream strain-level data analysis to be able to accurately link to the human isolate. Until now, this approach has not yet been applied outside research settings to analyse real food-borne outbreak samples. In September 2019, a Salmonella outbreak occurred in a hotel school in Bruges, Belgium, affecting over 200 students and teachers. Following standard procedures, the Belgian National Reference Center for human salmonellosis and the National Reference Laboratory for Salmonella in food and feed used conventional analysis based on isolation, serotyping and MLVA (multilocus variable number tandem repeat analysis) comparison, followed by whole-genome sequencing, to confirm the source of the contamination over 2 weeks after receipt of the sample, which was freshly prepared tartar sauce in a meal cooked at the school. Our team used this outbreak as a case study to deliver a proof of concept for a short-read strain-level shotgun metagenomics approach for source tracking. We received two suspect food samples: the full meal and some freshly made tartar sauce served with this meal, requiring the use of raw eggs. After analysis, we could prove, without isolation, that Salmonella was present in both samples, and we obtained an inferred genome of a Salmonella enterica subsp. enterica serovar Enteritidis that could be linked back to the human isolates of the outbreak in a phylogenetic tree. 
These metagenomics-derived outbreak strains were separated from sporadic cases as well as from another outbreak circulating in Europe at the same time period. This is, to our knowledge, the first Salmonella food-borne outbreak investigation uniquely linking the food source using a metagenomics approach and this in a fast time frame.


Keywords: SNP analysis; Salmonella; food surveillance; metagenomics; outbreak; strain-level

Year: 2021 PMID: 33826490 PMCID: PMC8208685 DOI: 10.1099/mgen.0.000547

Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858

Data Summary

The Salmonella enterica subsp. enterica serovar Enteritidis isolate sequencing data are described in the supplementary data. Shotgun metagenomics sequencing is still a relatively new approach that, until now, had not been applied to food samples in real time to resolve a food-borne outbreak to its food source. This work presents, as a case study, a Salmonella outbreak in a hotel school in Belgium that was analysed with a metagenomics workflow on food in parallel to the conventional outbreak investigation. This allowed us to relate the strains present in the food, analysed through shotgun metagenomics, to isolates from human cases, in a time frame theoretically at least 1 week shorter than that of conventional methods. As this is, to the best of our knowledge, the first study presenting the successful use of shotgun metagenomics for the study of contaminated food in a food-borne outbreak investigation, we believe that it will have an important impact on the public-health and research community and on the trust in this technology as a reliable, faster and cost-effective alternative. This study also provides a valuable dataset to further explore metagenomic data analysis tools.

Introduction

The detection and characterization of pathogens in food aims at avoiding contamination of consumers if carried out as a continuous screening, but also at putting an end to epidemics when consumers have already been infected. According to European Union legislation, typically the analysis of a suspect food sample involved in a food-borne outbreak includes an attempt at obtaining an isolate of the micro-organism, most often by the official control laboratories, such as the National Reference Laboratory (NRL), to further characterize it, e.g. by real-time PCR (qPCR) or whole-genome sequencing (WGS) [1-3]. To unambiguously identify the source of the outbreak, the food contaminant also has to be uniquely linked to the pathogens usually obtained from human cases by the National Reference Center (NRC). This strengthens the assumption on the food source based on epidemiological studies only. However, isolation from food samples is not straightforward nor always successful, as opposed to the human samples, which typically contain higher loads of the pathogen. In these cases, the relatedness to the human isolates cannot be obtained and the outbreak is never resolved to its food source. Indeed, the European Food Safety Authority (EFSA) reported that the causative agent was unknown in 23.8 % of outbreaks that occurred in 2018 [4, 5]. In some cases, the wrong foodstuff can even be blamed, leading to huge economic losses in the sector [6]. A novel approach, i.e. shotgun metagenomics, has been investigated in recent years in an attempt to characterize the pathogen but without the need to isolate it from the food matrix [7-10]; therefore, in a possibly shorter time frame and, most importantly, increasing the chance of finding the source of the outbreak. 
EFSA recently published an opinion on the use of WGS and metagenomics for outbreak investigation, confirming the possibility of typing and source attribution from shotgun metagenomics data, in particular if a draft reconstructed genome of the pathogen at the strain level can be obtained [11]. Until now, only a few studies have investigated the possibility of achieving strain-level characterization of pathogens in food samples; however, these did not link strains obtained from the food samples to isolates from the human cases, a prerequisite for tracing back the outbreak [12-15]. We have previously developed such a metagenomics approach for food-borne outbreak investigations [16, 17] using artificially contaminated samples, targeting Shiga toxin-producing Escherichia coli (STEC), and we were able to link the food-derived strain back to isolates from humans. This method has, however, not yet been implemented for another pathogen or during a real outbreak. Among food-borne outbreaks occurring in Europe, food contaminations due to Salmonella are the second most commonly reported cause of gastrointestinal infections [4]. Salmonelloses are caused by thousands of different serovars, of which Salmonella enterica subsp. enterica serovar Enteritidis accounts for over 40 % of all infections for which the serovar has been identified. These infections are most often related to eggs and have been associated with a high proportion of food-borne outbreaks, due to the use of the raw product in several food preparations [4]. The standard protocol for analysing food products potentially contaminated with Salmonella according to European Union legislation is to isolate the pathogen through several enrichment and plating steps (ISO 6579 : 2017 [18]). The isolated strain is then characterized through biochemical and/or serological testing, as well as multilocus variable number tandem repeat analysis (MLVA) to infer phylogeny against a well-characterized background. 
However, EFSA has now recommended WGS of isolates, particularly when linked to outbreaks [19]. WGS offers the possibility to study the full genome of the isolate, including potential virulence and antimicrobial-resistance (AMR) genes [20, 21]. It also allows the highest level of precision in relatedness studies based on SNP differences between strains, and allows sporadic bacteria to be distinguished from persistent bacteria in a food-production environment [22, 23]. Using metagenomics, Salmonella has thus far only been characterized in faeces [24, 25] or in food after selective concentration of genomic DNA by immunomagnetic separation [26]. However, food samples contaminated with this species have not yet been tested with an open metagenomics approach in the scope of a real outbreak. From September 5th 2019 until September 14th 2019, over 200 students and teachers at a hotel and tourism school in Belgium suffered from food poisoning, with symptoms such as abdominal pain, headache, diarrhoea and fever [27, 28]. The outbreak was thoroughly investigated by the local authorities [regional health agency Zorg en Gezondheid, the Federal Agency for the Security of the Food Chain (FASFC) and the NRL (food and feed) and NRC (human)]. Laboratory analyses were conducted on 65 samples obtained from food leftovers and kitchen surfaces, as well as isolates from infected patients. This resulted in the identification of the contaminant as Salmonella enterica subsp. enterica serovar Enteritidis, found in a meal prepared on September 5th 2019 by students and served in the school restaurant. The meal consisted of fish sticks with mashed potatoes and freshly made tartar sauce. After WGS of isolates from food and human origins, the source of the contamination was established as being the sauce, prepared with raw eggs [27-29]. A rare MLVA profile, i.e. 3-12-5-5-1, was determined for the human and food isolates by the NRC [28]. 
After disinfection of the kitchen and kitchen equipment, Salmonella was no longer detected in environmental samples and no new cases were recorded. The outbreak was reported through the European Epidemic Intelligence Information System (EPIS) ('Urgent Inquiry' UI-608) and the Rapid Alert System for Food and Feed (RASFF, 2019.3675), which allowed the outbreak to be traced back to an egg-producing farm in Spain, considered as the source of the contamination [27, 28]. During the same time period, another outbreak (ongoing since 2016) was circulating in Europe and was linked to eggs of Polish origin. However, this strain of Salmonella enterica subsp. enterica serovar Enteritidis was distinct from the isolates from the hotel-school outbreak and was characterized by MLVA profiles 2-9-7-3-2, 2-9-6-3-2, 2-9-10-3-2, 2-10-6-3-2, 2-10-8-3-2 or 2-11-8-3-2 [30, 31]. As this was an ideal case study for applying our previously developed strain-level metagenomics approach to contaminated food samples during a food-borne outbreak, we received from the Belgian NRL, in parallel to the conventional investigation, two samples that were positive for S. Enteritidis and linked to the hotel-school outbreak. Both samples were processed with a metagenomics workflow described previously [17]. After short-read sequencing, we conducted data analysis in order to infer the pathogenic strain’s genome, characterize it and link it back to the human isolates to resolve the outbreak. The food strain obtained from metagenomics reads was included in a SNP-level phylogenetic tree containing human and food isolates from the hotel-school outbreak, as well as strains related to another outbreak circulating in Europe during the same time period [30, 31] and other sporadic strains that occurred in Belgium in 2019. The analysis time of this shotgun metagenomics approach was then compared to the time necessary to elucidate the outbreak with data from food isolates.

Methods

Sample preparation

Two aliquots of cultured food samples (i.e. a mixture of the meal components and the sauce as a separate component) linked to the outbreak were received from the NRL after a first non-selective enrichment according to ISO 6579 [18] (i.e. 25 g foodstuff was mixed with 225 ml buffered peptone water and incubated for 18±2 h at 37±1 °C). The sample dish was an aluminium tray with three compartments, one for each component (mashed potatoes, fish stick, tartar sauce). The tartar sauce was tested separately as well, after confirmation that it was the probable source of the contamination. The food enrichments had been tested for the presence of Salmonella prior to their selection for this study, using the iQ-Check II PCR detection kit (Bio-Rad) according to the manufacturer’s instructions, and showed positive results (Cq of 18 and 17, respectively), in contrast to the blanks and to other samples collected in the school during the investigation. Aliquots of 4–15 ml of the two cultured food samples were stored in the fridge until metagenomics DNA extraction was carried out.

DNA extraction and qPCR

The sample preparation was carried out according to Buytaers et al. [17]. Briefly, 1 ml of each aliquot was centrifuged at 6000 g for 10 min and the cell pellets were used for DNA extraction using a Nucleospin food kit (Macherey-Nagel). In order to confirm the presence of the contaminant (Salmonella) in the DNA extracts, a qPCR was performed for the genes invA and rpoD, according to Barbau-Piednoir et al. [32].

Shotgun metagenomics sequencing

The quality and quantity of all DNA extracts were evaluated [17] using the NanoDrop 2000 (Thermo Fisher Scientific), Qubit 3.0 fluorometer (Thermo Fisher Scientific) and 4200 TapeStation (Agilent). All DNA extracts were further processed using the Nextera XT library preparation kit (Illumina) before sequencing on the Illumina MiSeq, generating paired-end 250 bp reads with the reagent kit v3. The samples were sequenced in one run of eight libraries. The number of (paired-end) reads sequenced per metagenomics sample is presented in Table 1. Sequencing metrics were obtained using FastQC version 0.11.7 [33].

Table 1.

Quality metrics of the metagenomics sequencing and metagenomics assemblies

Metric	Sauce	Meal
Sequencing metrics
Total reads	2 653 700	4 857 796
Sequences flagged as poor quality	0	0
Sequence length	35–251	35–251
G+C (mol%)	49	47
Mean quality score	35.83	36.1
Median quality score	30	31
Strain assembly metrics*
No. of contigs	78	75
Largest contig	325 096	325 086
Total length	4 703 829	4 704 090
G+C (mol%)	52.13	52.13
N50	106 626	128 74
Mean coverage	93.9	88.35
Median coverage	73.5	65.5

*Statistics based on contigs of size ≥500 bp.
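
As a reading aid for the N50 values in Table 1, the following is a minimal sketch of how an N50 is conventionally computed from contig lengths: the length of the contig at which the cumulative length of the contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the toy contig lengths are hypothetical and are not data from this study; the 500 bp filter mirrors the table footnote.

```python
def n50(contig_lengths, min_length=500):
    """Return the N50 of the contigs that are at least `min_length` bp long."""
    # Keep only contigs passing the length filter, longest first.
    lengths = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not lengths:
        raise ValueError("no contigs pass the length filter")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (not data from this study).
toy_contigs = [400, 1000, 2000, 3000, 4000]
print(n50(toy_contigs))
```

In the toy example the 400 bp contig is discarded by the filter before the cumulative sum is taken.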


Isolate data

Sequencing data from Salmonella enterica subsp. enterica serovar Enteritidis isolates (see Table S1, available with the online version of this article) included data from five isolates of food origin from the hotel-school outbreak (the leftover meal and the three components of this meal, which were all probably contaminated through spreading of the sauce between the compartments, and a chicken-based meal consumed on September 24th 2019 at the hotel school that was probably contaminated in the rubbish bin) and from five isolates of human origin linked to the hotel-school outbreak, obtained following conventional methods [34]. These 10 isolates showed the same MLVA profile. As background for the phylogenetic analysis, data were also included from isolates linked to the still ongoing Polish outbreak [30, 31], presenting distinct MLVA profiles: seven Belgian isolates of food origin, five Belgian isolates of human origin and four isolates from public databases representing the different outbreak clusters defined by the Public Health England SNP pipeline, as described in an outbreak assessment from the European Centre for Disease Prevention and Control (ECDC) and EFSA [31]. This background was supplemented with ten isolates of human origin from Belgian sporadic cases from 2019, which also presented an MLVA profile different from that of the hotel-school outbreak.

Data analysis

The metagenomics sequencing data were analysed through the workflow presented by Buytaers et al. [17]. After trimming, a taxonomic classification of all reads to the genus level was performed using Kraken2 [35] (same databases as previously described [17]) in order to obtain an overview of the taxa present in the sample. The taxonomic classification results from Kraken2 were verified using the online tool PathogenFinder (designed for isolate WGS) [36], with the model created for all bacteria, as well as CCMetagen [37], used with the National Center for Biotechnology Information (NCBI) nucleotide database. Then, a strain-level read classification was performed using Sigma [38] on a database of 787 complete genome assemblies of Salmonella (all serovars) from NCBI (list available upon request), using the default parameters as described by Saltykova et al. [16], to obtain the reads of the pathogenic strain, as Salmonella was the only pathogen detected after analysis of the taxonomic classification results. These reads, as well as the sequencing reads from all isolates, were assembled using SPAdes 3.13.0 [39]. Quality metrics from the assemblies (Table 1) were obtained using QUAST version 5.0.2 [40]. All assemblies from isolates and metagenomics samples were then typed (serovar prediction) using the online Salmonella In Silico Typing Resource (SISTR) [41], and the presence of AMR genes was detected using BLAST 2.6.0 on the ResFinder database [42], with a minimum identity threshold of 90 % and a minimum length of 60 % for metagenomics assemblies, and 90 % minimum identity and 90 % minimum length for isolate assemblies [43]. The length threshold was lowered for the metagenomics assemblies compared to the parameters (90 % gene coverage and 90 % nucleotide identity) chosen for the study of isolates, considering the lower depth obtained with metagenomics sequencing. For phylogenetic analysis, SNP calling was carried out on the classified (unassembled) reads as previously described [16], with Salmonella enterica subsp. enterica serovar Enteritidis strain EC20120200 (Enterobacteria) as a reference (GenBank accession no. CP007434.2). Maximum-likelihood substitution model selection and phylogenetic tree inference were done with MEGA [44], using the NNI (nearest-neighbour-interchange) heuristic method, keeping all informative sites and using a bootstrap method with 100 replicates. The model selected to build the phylogenetic tree was that of Tamura and Nei [45]. iTOL [46] was used for the representation of the tree, with the percentage of the reference genome covered annotated on each branch.
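
The two sets of AMR detection thresholds described above can be illustrated with a small filter. This is a minimal sketch, not the authors' implementation: the `Hit` record, the gene names and the alignment values are hypothetical, and the "minimum length" criterion is assumed here to mean the fraction of the reference gene length covered by the alignment.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment of a resistance gene against an assembly (hypothetical record)."""
    gene: str
    percent_identity: float
    aligned_length: int
    gene_length: int

    @property
    def percent_coverage(self) -> float:
        # Assumption: coverage is the aligned fraction of the reference gene length.
        return 100.0 * self.aligned_length / self.gene_length


# (minimum identity %, minimum coverage %) as stated in the text.
THRESHOLDS = {"metagenome": (90.0, 60.0), "isolate": (90.0, 90.0)}


def passing_hits(hits, assembly_type):
    """Keep the hits that meet both thresholds for the given assembly type."""
    min_identity, min_coverage = THRESHOLDS[assembly_type]
    return [
        h for h in hits
        if h.percent_identity >= min_identity and h.percent_coverage >= min_coverage
    ]


# Hypothetical hits for illustration only.
hits = [
    Hit("geneA", 96.35, 438, 438),  # full-length, high identity
    Hit("geneB", 95.0, 300, 400),   # 75 % coverage: passes only the relaxed setting
    Hit("geneC", 85.0, 400, 400),   # identity too low for both settings
]
print([h.gene for h in passing_hits(hits, "metagenome")])
print([h.gene for h in passing_hits(hits, "isolate")])
```

The relaxed coverage setting lets a partially assembled gene still be reported for a metagenomics-derived assembly, which is the rationale the text gives for lowering the length threshold.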

Results

Taxonomic classification of the metagenomics samples

Two food samples (meal and sauce component) that could be related to the outbreak after a first screening (culture and qPCR) were tested using a shotgun metagenomics approach in parallel to the conventional outbreak investigation carried out at the NRL. After culture-based enrichment of the food matrices, the DNA was extracted and sequenced. The reads obtained were then taxonomically classified to determine the genera that were present in the food matrices. Only bacteria could be detected in both samples (89 and 96 % of the sequenced reads for the meal and the sauce, respectively), although the meal consisted of fish, mashed potatoes and sauce, and the sauce was made with fresh eggs. This was expected, as the species of these ingredients (fish, potato and chicken) are not represented in the taxonomic databases used and their reads should, therefore, fall in the unclassified fraction (Fig. 1). The same bacterial genera were detected in both matrices, albeit at different relative abundances, except for one genus that was only present in the meal sample. This agreement in detected bacterial genera was to be anticipated, since the sauce was sampled from the meal. Salmonella, the genus implicated in the outbreak, was detected at a high percentage in both matrices (70 % in the sauce, 40 % in the meal). This is consistent with the qPCR detection of the Salmonella-specific invA and rpoD genes in the DNA extracts of both samples (Table S2). However, some of the other detected genera may also include pathogenic species. Therefore, in an attempt to use the taxonomic classification as an agnostic tool to identify the causative food-borne pathogen, two other data analysis tools were used to determine the presence of a pathogen in the sample (CCMetagen and PathogenFinder). 
CCMetagen and PathogenFinder identified Salmonella as the main or only pathogen in the two samples (the results are shown in Table S3), after analysis based on KMA sequence alignments on the NCBI nucleotide database (CCMetagen) or prediction of pathogenicity based on the detection of groups of genes associated with human pathogenic bacteria (PathogenFinder). The output of the three tools, which rely on different bioinformatics approaches, confirmed that Salmonella was the only pathogen meriting further investigation in this study.

Fig. 1.

Percentages of reads classified to the genus level using a taxonomic classification tool (Kraken2) from metagenomics samples (full meal and sauce) with in-house databases of mammals, archaea, bacteria, fungi, human, protozoa and viruses. Red represents the proportion of ‘Salmonella’ in the samples. The reads that could not be classified to the genus level for mammals, archaea, bacteria, fungi, human, protozoa or viruses are represented in grey.

Salmonella strain inference from metagenomics samples and in silico typing

Obtaining strains from the metagenomics reads is necessary to mimic the recovery and characterization of an isolate with conventional methods. This was done for each metagenomic sample following a previously reported metagenomics strain-level analysis pipeline [16, 17]. After classification of the reads against a database of Salmonella genomes, 1 843 873 and 1 618 032 reads were classified as ASM303203v1 [Salmonella enterica subsp. enterica serovar Enteritidis (enterobacteria), RefSeq accession no. GCF_003032035.1] for the meal and the sauce, respectively (Table S4). This represents 38 % of the total sequenced reads for the meal and 61 % of the reads for the sauce. Fewer than 7000 reads (<0.5 % of the total reads) were classified to other genomes in each sample, indicating that most probably only one strain of this species was present in each sample and that the reads assigned to ASM303203v1 correspond to that strain. Subsequently, a sequence-based characterization can be performed on the reads of each inferred strain, corresponding to the characterization of the isolate with conventional methods. The reads were assembled (Table 1) and then typed in silico. The results (Table S5) confirmed that the strains obtained are indeed Salmonella enterica subsp. enterica serovar Enteritidis, based on O- and H-type prediction (serogroup D1, H1 g,m, H2 -), multilocus sequence typing (MLST) clustering (ST11) and matches to their closest public genome. Compared to the in silico typing of sequenced isolates of food and human origin from the outbreak (Table S5), the results were identical, except that all 330 whole-genome MLST alleles were detected in the isolates versus 329 identical alleles in the metagenomics-based strains (one allele was only partially present). Other isolates obtained from the NRC, the NRL and from another outbreak circulating in Europe (not related to the hotel-school outbreak) were typed with the same tool. These were also identified as Salmonella enterica subsp. enterica serovar Enteritidis, but were related to other genomes from public databases (Table S5). 
The presence of AMR genes was also investigated in the assembled contigs of the metagenomics-based strains (Table S6), to follow the analysis that is usually performed on isolates (using the technique of microdilutions in broth), but then at the genotype level. The locus aac(6′)-Iaa_1, linked to resistance to aminoglycoside due to a chromosomally encoded aminoglycoside acetyltransferase, was detected in all strains from the hotel-school outbreak, including strains derived from metagenomics sequencing, as well as all non-outbreak-related strains included in this study with 96.35 % identity and 100 % coverage (Table S6). The prevalence of this gene in WGS from NCBI is 29 % [47]. No other AMR genes were detected in any strain.
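
The read fractions quoted in this section follow directly from the strain-level read counts and the total read counts in Table 1. The short check below is only a worked arithmetic example; the variable and function names are illustrative.

```python
# Read counts taken from the text (strain-level classification) and Table 1 (totals).
reads = {
    "meal": {"total": 4_857_796, "strain": 1_843_873},
    "sauce": {"total": 2_653_700, "strain": 1_618_032},
}


def strain_read_percentage(strain_reads, total_reads):
    """Percentage of all sequenced reads assigned to the inferred strain."""
    return 100.0 * strain_reads / total_reads


for sample, counts in reads.items():
    pct = strain_read_percentage(counts["strain"], counts["total"])
    print(f"{sample}: {pct:.0f} % of reads assigned to the inferred strain")
```

Rounded to whole percentages, this gives the 38 % (meal) and 61 % (sauce) reported above.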

Metagenomics-based trace back investigation of the outbreak to its food source

Finally, in order to relate cases of food and human origin, MLVA profiles can be compared using traditional methods, but EFSA now recommends WGS of isolates and uses core-genome MLST in data sharing platforms such as EPIS. In our analysis, all isolates and metagenomics-derived strains were compared using SNP calling and reconstruction of a phylogenetic tree (Fig. 2). SNP calling offers the possibility of comparing the full genome and is considered better suited to metagenomics-derived strains [16]. The cluster corresponding to the hotel-school outbreak (represented in blue in Fig. 2) includes the isolates from patients and suspect food vehicles obtained by the NRC and NRL, as well as the two inferred strains obtained from direct sequencing of two food samples (suspect meal and sauce) using a shotgun metagenomics approach. The breadth of coverage of the reference genome for the two reconstructed strains from metagenomics samples is 97 and 85 % for the sauce and the meal, respectively. These values are in the same range as the values obtained for the isolates of the same outbreak. All strains of the hotel-school outbreak cluster, including the strains from the metagenomics samples, have 0 SNP differences per million genomic positions (Table S7). Other Salmonella enterica subsp. enterica serovar Enteritidis strains circulating in Europe during the same time period, including isolates linked to an outbreak of Polish origin that started in 2016 but was still ongoing (shown in purple in Fig. 2), were included in the analysis, and could be separated from both the isolates and the metagenomics strains of the hotel-school outbreak.
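
To make the unit "SNP differences per million genomic positions" concrete, the sketch below counts differences between two aligned sequences and scales the count by the number of positions that could actually be compared. It is a simplified illustration under stated assumptions, not the SNP-calling pipeline of [16]: the sequences are hypothetical toy strings, and positions with a missing call (`N` or `-`) in either sequence are assumed to be excluded from the comparison.

```python
def snp_differences(seq_a, seq_b, missing="N-"):
    """Count differing positions between two aligned sequences.

    Returns (differences, compared_positions); positions where either
    sequence has a missing call are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


def snps_per_million(seq_a, seq_b):
    """Scale the SNP count to differences per million compared positions."""
    differences, compared = snp_differences(seq_a, seq_b)
    if compared == 0:
        raise ValueError("no comparable positions")
    return 1_000_000 * differences / compared


# Hypothetical toy sequences (not data from this study).
isolate = "ACGTACGTAC"
metagenome_strain = "ACGTNCGTAC"  # one uncovered position, otherwise identical
unrelated = "ACGAACGTAC"          # one substitution

print(snp_differences(isolate, metagenome_strain))
print(snps_per_million(isolate, unrelated))
```

Normalising by the compared positions matters for metagenomics-derived strains, whose breadth of coverage of the reference (85 to 97 % here) is lower than the full alignment length.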

Fig. 2.

SNP-based phylogenetic tree representing the isolates and metagenomics-derived strains from food samples linked to the hotel-school outbreak (UI-608, in blue) in the global context of Salmonella enterica subsp. enterica serovar Enteritidis circulating in Belgium and in Europe during the same time period. Isolates linked to the Polish outbreak (UI-367) are indicated in purple, and isolates from sporadic cases in Belgium in 2019 in black. The percentage of the reference genome covered is presented next to each branch. Bar, nucleotide substitutions per 100 nucleotide sites. Node values represent bootstrap support values.

Timing for a conventional and a metagenomics-based approach to resolve outbreak investigation to the food source

A schematic representation of the theoretical timeline of the conventional analysis conducted at the NRL on food samples, in parallel to the investigation on human samples conducted at the NRC, is presented in Fig. 3 (upper line). After receipt of the samples, the presence of Salmonella in the food is first confirmed with qPCR on the food matrices; isolates are then normally obtained after approximately 1 week (if isolates can be produced from the food samples), and characterized for serotype and MLVA profile. Once the MLVA profile is confirmed to be identical to the one detected in the patients’ isolates, the DNA of the food isolates is extracted for WGS analysis. At the Belgian NRL, the serotyping and MLVA profile of the food isolates, if obtained, are currently prerequisites for sequencing, to prove that the strains have a high chance of being linked to the outbreak, as only outbreak cases are eligible for budget and priority for WGS. Notably, the isolates of human origin are usually already characterized at that stage, as they are most often detected and isolated more easily and earlier in the investigation process. Together with library preparation, the sequencing takes approximately 4 days. The sequencing typically occurs 2 to 3 weeks after receipt of the samples, depending on the isolation time, the time necessary to gather sufficient isolates to be cost-efficient for multiplexing in a single sequencing run, and the time to perform the sequencing run. Data analysis is then conducted, followed by sharing of the information with national and international authorities (in this case: RASFF 2019.3675 on October 16th 2019 and EPIS UI-608 updated on October 24th 2019 with the NGS data). In this outbreak, it allowed determination of the source of the contamination as an egg-producing farm in Spain and detection of 13 related human cases from France and 2 human cases in both the Netherlands and the UK [27]. 
In the same time period, an outbreak was reported in the Netherlands involving eggs originating from Spain (RASFF 2019.3069, UI-601). However, the strains of S. Enteritidis had distinct MLVA profiles (2-11-7-3-2, 3-10-5-4-1, 2-10-7-3-2 and 3-11-5-4-1) and 170 core-genome MLST allelic differences from our outbreak strain. The UK also reported an outbreak linked to eggs (RASFF 2019.1412, UI-602), but again no link with the Belgian outbreak strain was established. The WGS data of these strains were not publicly available and, therefore, could not be added to the phylogenetic analysis in this study.

Fig. 3.

Comparison of theoretical processing time for the conventional approach (upper level) and the shotgun metagenomics approach (lower level) for Salmonella-contaminated food samples from receipt of the samples to strain typing and trace back between human and food strains. A range of days (D x–y) accounts for a range of duration of some laboratory analyses, which can vary due to the presence of technicians during weekends, success in the isolation process or cost-effectiveness (start of the sequencing run with sufficient samples).

This timeline was compared to that of a metagenomics-based analysis of the food samples. DNA from the meal and the sauce was extracted from a small fraction of the cultured food matrices for subsequent metagenomics analysis after suspicion of the contamination with qPCR (not necessary for a metagenomics-only workflow). From the time of the DNA extraction, depending on the availability of a sequencing instrument and the preparation of the libraries, the sequenced reads could be obtained in a minimum of 4 days (Fig. 3, lower line). Thereafter, a taxonomic classification was obtained in a few minutes and, after 1 day, a pathogenic strain was obtained and fully typed. In less than a week after receipt of the samples in the laboratory, the pathogen was fully described and related to other cases from the outbreak (of food and human origin) in a phylogenetic tree. This already corresponds to the mean time necessary merely to obtain an isolate from food in routine analysis, if one is obtained at all, with no information about relatedness of the cases at that stage of the conventional analysis. Indeed, in the conventional analysis, obtaining a food isolate is a prerequisite for performing the molecular analysis, including WGS, to be able to determine relatedness.

Discussion

In the present study, we deliver a proof of concept for applying the shotgun metagenomics approach, previously developed on food samples artificially spiked with STEC (Shiga toxin-producing Escherichia coli) [17], to resolve a Salmonella outbreak in Belgium up to the food source. We described the analysis of an outbreak that affected over 200 students and teachers at a hotel school in Belgium, using a strain-level shotgun metagenomics-based approach in parallel to the investigation based on WGS of isolates performed by the NRL and NRC. Two suspect samples, leftovers of the meal and the tartar sauce included in this dish, were analysed with a shotgun metagenomics workflow in a relatively short time frame, and the pathogenic strain was inferred from the sequenced metagenomics reads and characterized as a Salmonella enterica subsp. enterica serovar Enteritidis that was related with 0 SNP differences to the isolates of human origin from the same outbreak. Therefore, the outbreak could be resolved, i.e. the source attributed, using metagenomics data for the food samples. As this was a proof of concept, isolates were also obtained and characterized from the food samples through conventional analysis, and were also related to the metagenomics strains with 0 SNP differences, as a validation of the obtained results. Moreover, the outbreak cluster was placed in the global perspective of salmonelloses in Belgium and Europe using a phylogenetic tree including other strains circulating during the same time period. The timing of an outbreak investigation is a critical factor in limiting the propagation of the contamination. Shotgun metagenomics is an alternative to the conventional approaches that circumvents the need for isolation, which is time-consuming and, most importantly, not always achievable in routine analysis. 
This study showed the potential of metagenomics to be used during outbreak investigations on food samples for obtaining the same level of information as from food isolates, in a time frame reduced by over 1 week. Moreover, this constitutes a pathogen-agnostic approach dependent on a non-selective enrichment, which allows the detection of the pathogenic strains (here Salmonella) and the characterization of this contaminant without prior knowledge of the species or the number of different species and/or strains present in the sample [17], in contrast to conventional methods where the assumption of the species to test for is based on the symptoms of the patients. Therefore, this metagenomics approach is also advantageous when only a limited quantity of food leftovers is available, because no choice of the pathogen that best fits the symptoms has to be made, as is the case for conventional methods. Hence, this approach can potentially increase the range of pathogens detected in a mixed sample, and help to further reduce the economic burden of such food-borne pathogens, as was already stated for WGS of isolates [48]. Our approach still relies on the isolation of the pathogen from the human samples and is not a stand-alone metagenomics approach. As the bacterial load is generally higher in human samples, isolation is not reported as a challenge in these matrices. Moreover, the isolation from the human samples is often not a limiting factor for the timing of food-borne outbreak investigation, as these samples are often obtained before the food samples in the case of outbreaks. Nevertheless, metagenomics studies of stool samples, including during outbreaks, have been published previously [25, 49, 50], and such an approach could be performed in parallel to the one we present, in the corresponding institution (NRC). However, this would represent a higher cost, and the sequencing of human DNA might lead to ethical and privacy issues, in particular in Europe.
At a national scale, the typing data of food and human isolates are shared between the NRL and NRC, and matches are reported at the European level, i.e. EFSA and ECDC [27]. No shared database is publicly available at the moment, and access to these data or the samples must go through contact between both national entities. Communication concerning human health at the international level for outbreaks in Europe is done through a communication and data-sharing platform for public-health experts, by 'Urgent Inquiries' at the EPIS platform. For food safety, communications are done by the competent authorities through the RASFF system. These tools were used in the hotel-school outbreak investigation and helped to trace back and link the outbreak to eggs originating from Spain and other human cases in France, the UK and the Netherlands [27]. However, for confidentiality reasons, these data were not made publicly available and, therefore, could not be included in our presented phylogenetic tree. Our study highlights that access to scientific data, including both raw WGS data and processed results, from public-health and food-safety authorities at both the national and international level will help to strengthen analyses of international outbreaks such as the one presented in this study, and consequently should be considered in line with data-sharing systems that have proven their efficiency.
The shotgun metagenomics approach has proven its potential for outbreak investigation through studies like this one, yet additional research could help with the actual further implementation of this method in routine settings. First, the culture of the food matrix as currently specified in the ISO (International Organization for Standardization) method could be adapted to suit a larger number of species concurrently for pathogen-agnostic metagenomics studies.
Second, the optimal quality-control metrics for metagenomic sequencing have not yet been established, in contrast to ongoing efforts for WGS of isolates (e.g. ISO/DIS 23418 [51]). In the current analysis, eight metagenomic food samples (six were not related to this study) were multiplexed in a single MiSeq run, with a relatively high cost per sample as a result. This yielded a sequencing depth of >85× for the single detected strain in both metagenomic samples, which is comparable to values typically achieved for isolates and is more than sufficient for the reconstruction of the pathogen’s genome. This indicates that, in the future, sequencing of a higher number of samples simultaneously can be attempted, lowering the cost. The observed coverage is, however, much higher than in our previous work, where multiplexing of 12 minced meat samples resulted in sequencing depths between 0.9× and 10× for detected strain(s) [17]. Leonard et al. [12, 13] reported that multiplexing of 12 enriched spinach samples yielded coverages between 5× and 145× for an Escherichia coli reference genome, with 4 samples having coverages less than 30×. Therefore, the minimal required sequencing depth will likely differ for each sample type, and will depend on biological factors such as the initial load of contamination, the efficiency of the enrichment procedure and the expected number of bacterial strains. Generally, we have observed that coverages of over 5–10× can be sufficient for detection of virulence genes and phylogenetic placement of bacterial strains when reference-based assembly is used [16]. However, there is a need to precisely establish the reliability of the strain characterization and subtyping results obtained using data of different sequencing depths. Third, user-friendly pipelines need to be developed that can be used directly in the laboratory by staff who are not expert bioinformaticians.
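The trade-off between multiplexing and sequencing depth discussed above can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that reads are split evenly across samples and that a fixed fraction of each sample's reads originates from the target strain. The function names `expected_depth` and `max_samples`, the run yield, the target fraction and the approximate genome size are hypothetical values chosen for illustration, not figures from this study.

```python
def expected_depth(run_yield_bp: float, n_samples: int,
                   target_fraction: float, genome_size_bp: float) -> float:
    """Approximate depth for one target strain in one multiplexed sample.

    depth = (bases per sample * fraction of bases from the target) / genome size
    Assumes an even split of the run yield across samples.
    """
    if n_samples < 1 or genome_size_bp <= 0:
        raise ValueError("n_samples must be >= 1 and genome_size_bp > 0")
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must be between 0 and 1")
    bases_per_sample = run_yield_bp / n_samples
    return bases_per_sample * target_fraction / genome_size_bp


def max_samples(run_yield_bp: float, target_fraction: float,
                genome_size_bp: float, min_depth: float) -> int:
    """Largest number of samples per run that still reaches min_depth each."""
    if min_depth <= 0 or genome_size_bp <= 0:
        raise ValueError("min_depth and genome_size_bp must be > 0")
    return int(run_yield_bp * target_fraction // (genome_size_bp * min_depth))


# Illustrative assumptions only (not measured in the study):
RUN_YIELD_BP = 7.5e9      # hypothetical total yield of one sequencing run
TARGET_FRACTION = 0.5     # hypothetical share of reads from the pathogen after enrichment
GENOME_SIZE_BP = 4.8e6    # rough genome size assumed for the target strain

depth_8_samples = expected_depth(RUN_YIELD_BP, 8, TARGET_FRACTION, GENOME_SIZE_BP)
samples_at_30x = max_samples(RUN_YIELD_BP, TARGET_FRACTION, GENOME_SIZE_BP, 30)
print(f"depth with 8 samples: {depth_8_samples:.1f}x")
print(f"samples per run at >=30x: {samples_at_30x}")
```

Because the fraction of reads from the pathogen depends on the initial contamination load and on the enrichment, it is the least predictable input here, which is consistent with the wide range of depths reported for different food matrices.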
Moreover, bioinformatics taxonomic identification tools should be further tested and improved, so that different tools, each with their advantages and limitations, provide the same results, and to avoid misclassifications [52]. However, the focus of this study was not to present a benchmarking of bioinformatics tools for strain-level shotgun metagenomics, but rather a proof of concept based on previously developed bioinformatics methodologies [16, 17]. Other approaches and tools might still improve the results (accuracy, speed of analysis) and could be evaluated in further studies [53, 54]. This confirms the need for studies such as this one to produce data that make benchmarking analyses possible or help in the design of new tools. Another perspective for the implementation of this method in routine analysis is the reduction of the analysis cost. As elaborated above, shotgun metagenomics analyses imply runs with a very limited number of samples on Illumina sequencers in order to maximize the sequencing depth. Other sequencing devices, such as those manufactured by Oxford Nanopore Technologies, offer real-time long-read sequencing of one sample at a time, at a low price if the Flongle flow cell is used. Such fast sequencing could also further reduce the turnaround time of a metagenomics-based outbreak investigation [55]. However, its applicability for strain-level characterization in complex samples remains to be demonstrated. In 2019, the EFSA published an opinion on the use of metagenomics for outbreak investigation [11], describing the possibilities offered by an isolation-free method. However, at that time, metagenomics had not yet been used to resolve a food-borne outbreak investigation to its food source and was considered experimental. Moreover, it was considered technically challenging to obtain a draft genome of the pathogenic strain in order to assign particular genetic determinants to the causative agent.
This study has shown that a Salmonella outbreak caused by a complex food matrix could be resolved at strain level using shotgun metagenomics, in a shorter time frame than needed for isolation of the strain, paving the way for future studies to use this method outside the experimental scope and to support the EFSA opinion.

39 in total

1. Application of metagenomic sequencing to food safety: detection of Shiga Toxin-producing Escherichia coli on fresh bagged spinach.

Authors: Susan R Leonard; Mark K Mammel; David W Lacher; Christopher A Elkins
Journal: Appl Environ Microbiol Date: 2015-09-18 Impact factor: 4.792

2. Rapid detection of food-borne pathogens by using molecular techniques.

Authors: Rambabu Naravaneni; Kaiser Jamil
Journal: J Med Microbiol Date: 2005-01 Impact factor: 2.472

3. An economic analysis of salmonella detection in fresh produce, poultry, and eggs using whole genome sequencing technology in Canada.

Authors: Sonali Jain; Kakali Mukhopadhyay; Paul J Thomassin
Journal: Food Res Int Date: 2018-09-24 Impact factor: 6.475

4. Whole genome sequencing and metagenomics for outbreak investigation, source attribution and risk assessment of food-borne microorganisms.

Authors: Kostas Koutsoumanis; Ana Allende; Avelino Alvarez-Ordóñez; Declan Bolton; Sara Bover-Cid; Marianne Chemaly; Robert Davies; Alessandra De Cesare; Friederike Hilbert; Roland Lindqvist; Maarten Nauta; Luisa Peixe; Giuseppe Ru; Marion Simmons; Panagiotis Skandamis; Elisabetta Suffredini; Claire Jenkins; Burkhard Malorny; Ana Sofia Ribeiro Duarte; Mia Torpdahl; Maria Teresa da Silva Felício; Beatriz Guerra; Mirko Rossi; Lieve Herman
Journal: EFSA J Date: 2019-12-03

5. A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of Shiga-toxigenic Escherichia coli O104:H4.

Authors: Nicholas J Loman; Chrystala Constantinidou; Martin Christner; Holger Rohde; Jacqueline Z-M Chan; Joshua Quick; Jacqueline C Weir; Christopher Quince; Geoffrey P Smith; Jason R Betley; Martin Aepfelbacher; Mark J Pallen
Journal: JAMA Date: 2013-04-10 Impact factor: 56.272

6. A complete bacterial genome assembled de novo using only nanopore sequencing data.

Authors: Nicholas J Loman; Joshua Quick; Jared T Simpson
Journal: Nat Methods Date: 2015-06-15 Impact factor: 28.547

7. Identification of acquired antimicrobial resistance genes.

Authors: Ea Zankari; Henrik Hasman; Salvatore Cosentino; Martin Vestergaard; Simon Rasmussen; Ole Lund; Frank M Aarestrup; Mette Voldby Larsen
Journal: J Antimicrob Chemother Date: 2012-07-10 Impact factor: 5.790

8. The use of taxon-specific reference databases compromises metagenomic classification.

Authors: Vanessa R Marcelino; Edward C Holmes; Tania C Sorrell
Journal: BMC Genomics Date: 2020-02-27 Impact factor: 3.969

9. Metagenomics-Based Proficiency Test of Smoked Salmon Spiked with a Mock Community.

Authors: Claudia Sala; Hanne Mordhorst; Josephine Grützke; Annika Brinkmann; Thomas N Petersen; Casper Poulsen; Paul D Cotter; Fiona Crispie; Richard J Ellis; Gastone Castellani; Clara Amid; Mikhayil Hakhverdyan; Soizick Le Guyader; Gerardo Manfreda; Jöel Mossong; Andreas Nitsche; Catherine Ragimbeau; Julien Schaeffer; Joergen Schlundt; Moon Y F Tay; Frank M Aarestrup; Rene S Hendriksen; Sünje Johanna Pamp; Alessandra De Cesare
Journal: Microorganisms Date: 2020-11-25