Literature DB >> 26950338

Whole Genome Sequence Analysis Using JSpecies Tool Establishes Clonal Relationships between Listeria monocytogenes Strains from Epidemiologically Unrelated Listeriosis Outbreaks.

Laurel S Burall¹, Christopher J Grim¹, Mark K Mammel¹, Atin R Datta¹.

Abstract

In an effort to build a comprehensive genomic approach to food safety challenges, the FDA has implemented a whole genome sequencing effort, GenomeTrakr, which involves the sequencing and analysis of genomes of foodborne pathogens. As a part of this effort, we routinely sequence whole genomes of Listeria monocytogenes (Lm) isolates associated with human listeriosis outbreaks, as well as those isolated through other sources. To rapidly establish genetic relatedness of these genomes, we evaluated tetranucleotide frequency analysis via the JSpecies program to provide a cursory analysis of strain relatedness. The JSpecies tetranucleotide (tetra) analysis plots standardized (z-score) tetramer word frequencies of two strains against each other and uses linear regression analysis to determine similarity (r2). This tool was able to validate the close relationships between outbreak related strains from four different outbreaks. Included in this study was the analysis of Lm strains isolated during the recent caramel apple outbreak and stone fruit incident in 2014. We identified that many of the isolates from these two outbreaks shared a common 4b variant (4bV) serotype, also designated as IVb-v1, using a qPCR protocol developed in our laboratory. The 4bV serotype is characterized by the presence of a 6.3 Kb DNA segment normally found in serotype 1/2a, 3a, 1/2c and 3c strains but not in serotype 4b or 1/2b strains. We decided to compare these strains at a genomic level using the JSpecies Tetra tool. Specifically, we compared several 4bV and 4b isolates and identified a high level of similarity between the stone fruit and apple 4bV strains, but not the 4b strains co-identified in the caramel apple outbreak or other 4b or 4bV strains in our collection. This finding was further substantiated by a SNP-based analysis. Additionally, we were able to identify close relatedness between isolates from clinical cases from 1993-1994 and a single case from 2011 as well as links between two isolates from over 30 years ago. The identification of these potential links shows that JSpecies Tetra analysis can be a useful tool in rapidly assessing genetic relatedness of Lm isolates during outbreak investigations and for comparing historical isolates. Our analyses led to the identification of a highly related clonal group involved in two separate outbreaks, stone fruit and caramel apple, and suggests the possibility of a new genotype that may be better adapted for certain foods and/or environment.

Entities: CellLine Chemical Disease Species

Mesh：

Year: 2016 PMID： 26950338 PMCID： PMC4780826 DOI： 10.1371/journal.pone.0150797

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

Listeria monocytogenes is the causative agent of listeriosis, an invasive disease associated with high hospitalization and mortality rates. Human listeriosis is typically caused by the ingestion of ready-to eat (RTE) foods contaminated with L. monocytogenes. Historically, foods typically associated with invasive listeriosis have included deli meat and soft cheeses; however, there has been a recent expansion to include ice cream and fresh produce, like cantaloupes, celery, and apples [1-3]. RTE foods pose higher risk for listeriosis as they are ingested without any further processing, such as cooking, that would kill L. monocytogenes. Many of these foods use refrigeration, among other methods, to restrict bacterial growth during their shelf-life. While these standard practices work well for most bacteria, they are not adequate for Listeria control as the organism is capable of growth at refrigeration temperature and is often tolerant to freezing temperature, high salt and low pH. While the proactive approach to illness prevention is restricting human exposure to contaminated foods, during ongoing outbreak epidemiological investigations, timely removal of the suspected contaminated foods is essential to limit the size of the outbreak. Historically, identification of the suspected foods has been done via patient interviews (exposure history) and genome based comparisons, such as Pulsed Field Gel Electrophoresis (PFGE), to match clinical isolates with food and/or environmental isolates. These processes are lengthy and resource intensive and certain PFGE profiles are more common than others due to PFGE’s comparatively low discriminatory power, when compared to WGS, whole genome MLST (wgMLST), and core genome MLST (cgMLST) among other techniques [4], which hinders the linkage of cases that cannot be connected via traditional epidemiological approaches. In the case of human listeriosis outbreaks, epidemiological links can be obscured by low overall incidence and a wide range of incubation times (3–90 days) before the onset of illness. Coupled with these drawbacks, cases with an unknown link can go unidentified if they are infrequent enough to appear as sporadic cases. This possibility is even more likely after taking into account limitations in patient memory after the protracted incubation time or the inability to conduct patient interviews in some cases. In order to improve our resolution of genetic relatedness, the CDC and U.S. FDA have started utilizing whole genome sequencing (WGS)to compare patient, food and environmental isolates[5, 6]. This type of approach has already assisted in linking cases associated with a specific vehicle and source [7-10]. Additionally, other groups have been able to use these approaches to identify linkages between historical isolates, aiding in source attribution [8, 11, 12]. Analysis of WGS data, however, can be difficult due to the intensive computational needs and bioinformatics skills for genomic comparisons and determining a threshold to establish relatedness. Different computational tools can also lead to some variation in the interpretation of the same data. While the overall results will likely allow similar conclusions, certain differences can be critical. Considerations such as whether to use a reference-based approach and which reference sequence to use [13] as well as differences in software algorithms can lead to different error rates, and variability leading to different interpretations [14-16]. Software algorithms may be optimized based to different assumptions either in the code or in settings chosen by the user. While in a broad analysis these differences may be unimportant, evaluation of more closely related strains could be shifted by slight differences in called SNPs. Also, many other methods can have a false high discovery rate of SNPs [16]. Understanding of these issues can make it difficult for a user to choose the right parameters when doing their analyses. We evaluated the tetranucleotide usage pattern analysis tool, JSpecies Tetra [17], initially as a quick quality assurance/control tool for WGS and then as an initial assessment of strain relatedness, identifying isolates and potential clusters for further higher resolution comparisons. This tool is capable of rapidly analyzing several genomes on a standard personal computer (PC) via a user-friendly graphical user interface (GUI) and provides data output that is easily interpreted and communicated with a broad cross-section of the scientific and regulatory community. Further comparisons, such as average nucleotide identity (ANI), core-genome or whole genome MLST, and high quality single nucleotide polymorphisms (SNP) comparisons, can then be done to verify any relationships of interest. Using this tool coupled with sero-grouping via qPCR [18] to further streamline the process, we were able to verify previously established relationships among Listeria isolates, and discovered new ones including stone fruit and caramel apple outbreaks [3, 19] that may prove of use in future efforts. While addition of a qPCR sero-grouping method may seem out of date, we have found that the use of this information helps us better select comparison genomes given the distinct genomic separation of the various L. monocytogenes serotypes [20]. We can use this sero-group information to focus our comparisons on a somewhat broad group, blind to PFGE pattern similarities that can be obscured by phage variations, while still narrowing it to a dataset that a normal workstation can process. Additionally, this qPCR can be completed before a sequence run is completed or even started, providing immediate guidance to a front line analyst as to which group to compare the new data with when performing the initial JSpecies comparison. The use of the JSpecies tool will enhance our ability to quickly identify isolates with a high degree of relatedness during outbreak investigations, improving our response during attribution studies and in detecting incidents with any potential links to historic or otherwise epidemiologically unlinked Listeria isolates.

Materials and Methods

Strains, media and reagents

Strains used in this study are listed in S1 Table and Table 1. Strains examined in this study were from our lab collection, as well as a few well sequenced reference genomes from NCBI. Strains from our collection were included in this study if they were serotype 4b and had already been sequenced as part of active outbreak investigations or the GenomeTrakr project with SRA files publicly available for analysis [5] or during outbreak follow-up investigations, as with the stone fruit isolates. Strains sequenced were randomly chosen from isolates obtained during various isolation efforts, whether by our lab or the originating lab. Cultures were stored at -800 C in presence of 20% glycerol and routinely grown at 370 C using brain heart infusion (BHI) (Sigma-Aldrich, USA) agar or broth.

Table 1

Strains analyzed in the various phylogenetic comparisons highlighted in this paper.

Strain	Accession #	Alternate Designation	Source Type	Originating Lab^a	Country^c	Isolation Year	Serotype
CLIP80459	FM242711.1[21]		cheese	n/a^b			4b
F2365	AE017262.2[22]		patient	n/a			4b
H7858	AADR00000000[22]		meat	n/a			4b
HPB2262	AATL00000000[23]		patient	n/a			4b
J1-220	CP006047.2[24]		patient	n/a			4b
J1776	CP006598		patient	n/a			4b
J1816	CP006046.2[24]		environmental	n/a			4b
J1817	CP006599		environmental	n/a			4b
J1926	CP006600		meat	n/a			4b
L312	FR733642.2		cheese	n/a			4b
LL195	HF558398.1		Patient	n/a			4b
ScottA	CM001159.1[25]		patient	n/a	USA	1983	4b
LS41	LNOC00000000	CEB 1883	patient	Pasteur Institute		pre-1986	4bde
LS42	LNNL00000000	CEB 1890	patient	Pasteur Institute		pre-1986	4bde
LS47	LNOD00000000	CEB 2015	environmental	Pasteur Institute		pre-1986	4bde
LS50	LNOE00000000	CEB 2776	animal	Pasteur Institute		pre-1986	4bde
LS114	LNNY00000000	SLCC 4527	patient		Germany	1975	4b
LS267	LNNM00000000	BE4-79	patient	TX DOH	USA	1994	4bde
LS268	LNNN00000000	BE4-80	patient	TX DOH	USA	1994	4bde
LS269	LNNO00000000	BE4-81	patient	TX DOH	USA	1994	4bde
LS270	LNNP00000000	BE4-148	patient	TX DOH	USA	1994	4bde
LS271	LNNQ00000000	BE4-288	patient	TX DOH	USA	1994	4bde
LS272	LNNR00000000	BE4-769	patient	TX DOH	USA	1994	4bde
LS275	LNNS00000000	BR4-1259	patient	TX DOH	USA	1994	4bde
LS276	LNNT00000000	BR4-1292	patient	TX DOH	USA	1994	4bde
LS542	AVQQ01	501928	environmental	WEAC	USA	2010	4bV
LS642	AVQM01	10M – 127	clinical		Australia		4bV
LS643	AVQN01	10M – 130	clinical		Australia		4bV
LS644	AVQO01	10M – 138A	clinical		Australia		4bV
LS645	AVQP01	10M – 198	clinical		Australia		4bV
LS651	LNNU00000000	HUM 2011015611	patient	Denver HMC	USA	2011	4b
LS988	LNNV00000000	869223-5c1	yellow nectarine	OARSA	USA	2014	4bV
LS990	LNNW00000000	869224-1a2	white peach	OARSA	USA	2014	4bV
LS993	LNNX00000000	869224-2b1	white peach	OARSA	USA	2014	4bV
LS996	LNNZ00000000	869224-2c2	white peach	OARSA	USA	2014	4bV
LS997	LNOA00000000	869224-3a1	white peach	OARSA	USA	2014	4bV
LS1000	LNOB00000000	869224-3b2	white peach	OARSA	USA	2014	4bV
LS1012*	LNPM00000000	824917–3	granny smith apple	Den-DO	USA	2015	4b
LS1013*	LNPN00000000	824917–4	granny smith apple	Den-DO	USA	2015	4b
LS1014*	LNPO00000000	824917–5	granny smith apple	Den-DO	USA	2015	4bV
LS1015*	LNPP00000000	824917–6	granny smith apple	Den-DO	USA	2015	4b
LS1016*	LNPQ00000000	824917–7	granny smith apple	Den-DO	USA	2015	4b
LS1017*	LNSR00000000	824917–10	granny smith apple	Den-DO	USA	2015	4bV
LS1023*	LNSL00000000		patient	CDC	USA	2014	4b
LS1024*	LNSM00000000		patient	CDC	USA	2014	4bV
LS1064*	LNSN00000000	Lm 2014 L-6268	patient	CDC	USA	2014	4bV
LS1065*	LNSO00000000	Lm 2014 L-6561	patient	CDC	USA	2014	4bV
LS1066*	LNSP00000000	Lm 2014 L-6390	patient	CDC	USA	2014	4bV
LS1067*	LNSQ00000000	Lm 2014 L-6470	patient	CDC	USA	2014	4bV

aOriginating Lab refers to the group that provided the isolates to our collection.

bn/a, Genomes were accessed through NCBI and the strain was not used from our collection.

cCountry, refers to the original country of origin of the isolate.

*Indicates strains sequenced at their originating labs prior to inclusion on our collection.

aOriginating Lab refers to the group that provided the isolates to our collection. bn/a, Genomes were accessed through NCBI and the strain was not used from our collection. cCountry, refers to the original country of origin of the isolate. *Indicates strains sequenced at their originating labs prior to inclusion on our collection.

Sero-grouping of isolates

A qPCR method for sero-grouping has been developed based on a modification of a previously published conventional PCR method [18, 26]. The qPCR primers and probes are shown in Table 2. The reaction was performed using Quantifast Multiplex PCR kit without ROX (Qiagen, Germantown, MD) in two separate multiplex reactions. The first reaction verifies that the isolate is Listeria monocytogenes via Listeria genus and L. monocytogenes species specific targets. The second reaction determines which serogroup the isolate belongs to, if any (Table 3). Results of the sero-grouping analysis are reported in Table 1.

Table 2

qPCR Primers and Probes.

Name	Sequence	5’ Fluorescence Label	3’ Quencher
IAC-PCF610	TTACAACGGGAGAAGACAATGCCACCA	CAL Fluor Red 610	BHQ-2
Lall-PFAM	ATGTCATGGAATAAT	FAM	MGB-NFQ
Hly3-PTET	TGCACTGGTTTAGCTTGGGAATGGT	TET	BHQ-2
lmo0737-Q670	TTTGCAAGTCAGGGTCTTGTCCGA	Quasar 670	BHQ-2
lmo1118-TET	AGGCGTATACACTCAGGAGAAGATAAAGGT	TET	BHQ-1
ORF2110-Q705	AGTATGACTTCGGGCACAGTTGGC	Quasar 705	BHQ-2
ORF2819-FAM	TGGCAGTTTCCAGGACTTCACTTGT	FAM	BHQ-1
IAC_F	GGCGCGCCTAACACATCT	n/a	n/a
IAC_R	TGGAAGCAATGCCAAATGTGTA	n/a	n/a
qhly3_F	GCTCATTTCACATCGTCCATCTA	n/a	n/a
qhly3_R	CCGGTCATCAATTACCGTTCTC	n/a	n/a
qiap_F	GTTAAAAGCGGYGAYACWATTTGG	n/a	n/a
qiap_R	TTTGACCYACATAAATAGAAGAAGAAGATAA	n/a	n/a
qlmo0737_F	AGATGAACGGCAGAGACTTAAA	n/a	n/a
qlmo0737_R	CCGATCCGAATGCTGCTAATA	n/a	n/a
qlmo1118_F	TGCTTAATAACAGATGAAGAGGATG	n/a	n/a
qlmo1118_R	CTTGTTCCTTAGTATTCCAGGATTT	n/a	n/a
qORF2110_F	CAGAATACGGCATCCCTGATAA	n/a	n/a
qORF2110_R	AGCTCCACGTCCAAAGTAAG	n/a	n/a
qORF2819_F	CATCACTAAAGCCTCCCATTGA	n/a	n/a
qORF2819_R	CCCTCCAACATATACGGAAAGAG	n/a	n/a

Table 3

Serogroup Pattern Identification.

	2a Complex	2b Complex	2c Complex	4b Complex	4bV
Indicated Serotype	1/2a, 3a	1/2b, 3b	1/2c	4b, 4d, 4e	4bV
lmo1118	-	-	+	-	-
lmo0737	+	-	+	-	+
ORF2110	-	-	-	+	+
ORF2819	-	+	-	+	+

Genomic DNA isolation and sequencing

Genomic DNA was isolated on a QiaCube (Qiagen, Germantown, Maryland) using Qiagen’s DNeasy Blood & Tissue kit and the Gram Positive extraction protocol with the pre-lysis step incubated for 1 hour at 37°C. The DNA concentration was determined using a Qubit 2.0 fluorometer (ThermoFisher Scientific) and then diluted to 0.2ng/uL. Genomic libraries were prepared with a Nextera XT DNA sample preparation kit (Illumina, San Diego, CA). A 2x250 paired-end sequencing run was performed on an Illumina MiSeq benchtop sequencer and reads were trimmed and assembled using the CLC Genomics Workbench v7.0 (CLC Bio, Aarhus, Denmark). The assembled contigs were exported as fasta files for JSpecies analyses. Assembly files for strains involved in the detailed analyses have been submitted to NCBI (Table 1).

Genomic analyses by JSpecies

Sequences were imported into the JSpecies workspace and a Tetra analysis was performed [17]. A subset of the isolates was also analyzed in the JSpecies workspace using the ANIb (average nucleotide identity via BLAST) analysis tool to further verify the results. The data file was exported to Microsoft Excel and sorted numerically to rank pairs by their r2 value for Tetra analysis or percent identity for ANI comparisons.

Genome comparison by SNP analysis

Two versions of SNP analyses were performed on the data. The first was performed using a BLAST-based SNP analysis that first identified a core set of genes (n = 2052). This core set of Listeria monocytogenes genes was selected using 92 whole genome sequences (11 closed genomes and 81 shotgun sequences) that were incorporated in a BLAST database. All annotated genes of the reference strain Listeria monocytogenes F2365 (GenBank AE017262) were BLASTed against the database. Each gene that was determined to be present in each of the 92 genomes exactly once was retained as a core gene. A gene was considered present in a genome if it matched the genome at least 90% sequence identity over at least 90% of the gene length. As a result, 1,852 genes were selected. To identify SNPs in genes within the test strains, the core reference genes were BLASTed against the database, and for each genome matched, the matching bases to the reference sequence were aligned in one file for each core gene. Each core gene alignment was then scanned for nucleotide variation at each position. A fasta sequence was generated for each genome sequence by concatenating all of the 23,545 variable bases from the core gene alignments. These fasta sequences were aligned using MEGA6 [27] and this alignment was used to construct a phylogenetic tree using the Neighbor-Joining method with node confidence assessed by 1000 bootstrap replicates [28]. A second SNP analysis was performed using the CFSAN, FDA SNP pipeline, run locally on a Linux Ubuntu workstation. Reference strains were selected based on serotype information so that 4bV strains were mapped against an epidemiologically unrelated, but phylogenetically close, 4bV strain, LS642, and 4b strains were mapped against F2365, an unrelated 4b strain. The source code and further description is available at https://github.com/CFSAN-Biostatistics/snp-pipeline [16]. The SNP FASTA file output was used to reconstruct phylogenetic relationships using MEGA among strains and to specifically determine SNP differences within a given cluster of strains by exporting the non-conserved SNPs to a Microsoft Excel file for analysis.

Results

Evaluation of the JSpecies tool

JSpecies is a species-level genome sequence comparison suite that is comprised of three different peer-reviewed bioinformatics tools [17]. The focus of this work is the Tetra tool, an alignment free tool that performs pairwise analyses of genome sequences by examining the frequency of the 256 possible nucleotide tetramers. This frequency is compared with the expected frequency based on the size of the sequence and deviation from the expected frequency provides a z-score that is used to compare two strains in a linear regression analysis. This calculation allows genomes of different sizes to be compared without the results being skewed by genome completeness, which is a key consideration when comparing draft genome sequences. Each pair within a dataset is compared via a linear regression analysis which provides an r2 value. In linear regression analyses, an r2 value of 1 indicates that the data of one strain is considered to have “predicted” perfectly the data of the other strain. This means that as the r2 value approaches one the strains can be considered more closely related as the results for one predict/match the results for the other. From our experience, values of greater than 0.99998 are indicative of a highly probable clonal relationship. A value of 0.99998 is indicative of slight differences, some of which are real and others that may be due to differences in the sequence reads and their assemblies. It should be noted that correlation coefficients between genomes that are sequenced on the same platform to equivalent coverage levels and assembled with the same software should more accurately reflect true genomic differences. As the r2 value continues to decrease, genomic divergence between the isolates becomes greater. Given the familiarity of this type of statistical method and the interpretation of r2 values, the results can provide an easily communicable approximation of the measure of genetic relatedness. Additionally, the analysis can be performed rapidly on any computer with Java installed with extremely limited training of the analyst. Given the limited barrier to entry in using this tool, analysts generating new sequence data would be able to compare their suspect strains against previously acquired strains, allowing more rapid identification of clusters. The presumptive linkages could then be selected for further, more intensive analysis to verify the linkage. Additionally, the use of this tool prior to submission of sequence data to NCBI would provide a quality assurance step, preventing submissions containing contaminating sequence. Test analyses of randomly selected genomes were performed to determine how quickly JSpecies Tetra could process unbiased datasets of varying sizes. Random selection from a pool of 203 Lm genomes (S1 Table) from various serotypes, isolation dates and sources was performed using a random number generator (http://www.mathgoodies.com/calculators/random_no_custom.html). JSpecies Tetra analyses of randomly selected ten genomes (3Mb average length) could be analyzed reliably in 67 seconds with one of the six trials taking 69 seconds. Analyses of 25 genomes could be performed in 183 to 200 seconds. We were able to analyze data sets of up to 80 genomes in 30 minutes or less. The analysis did stall periodically in larger datasets or those involving more distantly related strains; however, reinitiating the process resulted in progression through the data set rather than starting from the initial point. Various factors resulted in increased analysis time including strain variability as noted as well as the operation of other programs on the computer. These initial analyses identified earlier confirmed relationships between epidemiologically linked strains involved in various previously investigated outbreaks, verifying its ability to detect genetic links. Also, in general, isolates known to be unrelated based on earlier analyses were similarly not linked in these works. However, two epidemiologically unlinked serotype 4b strains, LS47 and LS114, were found to be highly related in these analyses, which will be further investigated in the next section.

JSpecies Tetra analysis identified unknown links

During routine qPCR-based serotype analysis based on an adaptation of a previously published method [26] of new isolates obtained from the 2014 stone fruit outbreak/incident [9, 19] and the 2014 caramel apple outbreak [3], both related to L. monocytogenes contamination, we found a portion of these isolates belonged to the serotype 4bV (Table 1)[29]. The 4bV isolates, also termed as IVb-v1, are serotype 4b by standard antigen-antibody based serology [30] but differed from 4b by PCR-based serotyping [18, 26, 30] due to acquisition of a 6.3kb DNA found in serotype 1/2a, 3a, 1/2c and 3c strains[29, 31]. Based on this information, a small panel of 4b and 4bV isolates was selected for comparison with each other via JSpecies Tetra analysis (data not shown). This analysis indicated a high degree of relatedness between the stone fruit incident/outbreak and caramel apple outbreak isolates. To investigate whether this linkage would persist in a larger, more diverse dataset, the test dataset was expanded to include 31 4b and 19 4bV strains (Table 1). This Tetra analysis confirmed that the 4bV isolates obtained during the stone fruits investigation and the caramel apple outbreak were highly related with r2 values of 0.99999 or 1 (S2 Table, Cluster 3). Additionally, this analysis identified a previously unknown genetic relatedness between isolates from several cases of human listeriosis linked to frozen vegetables in the state of Texas, USA that spanned December 1993 to January 1994 and an isolate from a clinical case in Colorado, USA, from 2011 with no identified source (S2 Table, Cluster 1). This analysis also verified the previously mentioned link between LS47 and LS114 (S2 Table, Cluster 2), two isolates, one clinical and one environmental, from Europe dating back approximately 30 years. We then decided to analyze these strains and a subset of unrelated strains via JSpecies ANIb (average nucleotide identity by BLAST). Analysis of the two strains in cluster 2 (LS47 and LS114) (S2 Table) showed 99.94% identity via ANIb analysis, indicating that the strains show some divergence perhaps due to genetic drift due to temporal factors and the disparate isolation sources (silage and a human patient). Alternatively, this could be due to genetic change during lab passage and storage over the following ~30 years. The strains indicated in cluster 1 showed 99.9–100% identity by ANIb, providing further support that there is a potential link between the 1994 isolates and the 2011 patient isolate. If this information had been available in 2011, it likely would have provided valuable guidance to investigators by suggesting a possible link with the 1994 cases and associated food products. The patient could have then been queried to determine if there was an exposure history for those products. In 2014, FDA reported a voluntary recall by a packing company in California of whole white and yellow peaches, white and yellow nectarines, plums and pluots due to the potential of the products being contaminated with Listeria monocytogenes. Later investigations by CDC revealed that these products could be associated with one or more cases of human listeriosis [9]. Investigations by FDA at the time of the recall resulted in L. monocytogenes isolates from several of these fruits [19]. During the latter part of 2014 and early 2015, CDC reported a multistate listeriosis outbreak involving pre-packed caramel apples made from Granny Smith and Gala apples from a packing company in California [3]. Out of 35 illnesses, 11 were pregnancy associated and three invasive illnesses (meningitis) occurred among otherwise healthy children aged 5–15 years. L. monocytogenes isolates from the environment and apples from the packing facility were closely matched with the outbreak strains. Although these outbreaks/incidents occurred during 2014 and both were associated with fruits processed in California, no link between these outbreaks was ever identified. The results of the JSpecies analysis of isolates from these two outbreaks (S2 Table, Cluster 3) clearly show a very close genetic linkage among the recent 4bV isolates. The JSpecies ANIb tool further verified this relationship showing 99.9–100% identity between a subset of the 4bV isolates obtained from stone fruits and Granny Smith apples in both the initial and reciprocal comparisons. Importantly, comparisons with five unrelated 4bV strains identified no significant link either by JSpecies Tetra or ANIb analysis (S2 Table & Table 4).

Table 4

JSpecies ANIb of a subset of the 4bV Strains again shows a higher level of relatedness between the apple and stone fruit linked isolates and 4bV strains from unrelated sources.

	LS997	LS1017	LS1065	LS1014	LS1064	LS996	LS1024	LS642	LS644	LS542
LS997	---	99.99	99.98	99.99	99.99	99.98	99.98	99.56	99.56	99.52
LS1017	99.99	---	99.98	99.99	99.99	99.99	100	99.57	99.56	99.53
LS1065	99.99	99.99	---	99.99	99.99	99.98	99.99	99.58	99.56	99.53
LS1014	99.99	100	99.99	---	99.99	99.98	99.99	99.56	99.55	99.53
LS1064	99.99	99.99	99.97	99.99	---	99.99	99.99	99.56	99.56	99.52
LS996	99.99	99.99	99.99	99.99	99.99	---	99.99	99.57	99.56	99.53
LS1024	99.98	99.99	99.98	99.98	99.98	99.99	---	99.55	99.56	99.49
LS642	99.56	99.56	99.56	99.57	99.57	99.56	99.57	---	99.54	99.5
LS644	99.55	99.55	99.54	99.55	99.55	99.54	99.55	99.57	---	99.56
LS542	99.47	99.47	99.45	99.47	99.47	99.47	99.46	99.49	99.52	---

To corroborate the results from the JSpecies Tetra and ANIb analyses, a core genome SNP-based analysis was performed on the same panel of strains (Table 1). As part of this analysis, 14 genomes sequenced at other labs were also included in this dataset as they provided more complete genomic sequences and annotation, compared to the draft assemblies in this work, enabling better identification of the core genome. The BLAST-based core genome SNP comparison also showed similar links between the previously discussed strains (Fig 1). As seen with the ANIb analysis, LS47 and LS114 showed some differentiation from each other. The SNP-based tree also shows no noticeable divergence between the 1994 and 2011 isolates within the diverse dataset. A more focused comparison of the fruit 4bV strains indicates a slight divergence between the stone fruit and apple isolates (Fig 2). The outliers noted in the CDC report, which are referred to as LS1064 and LS1067 in this study, again fall outside the other clusters observed in this tree. However, the data still support a high likelihood of a recent common source for all of these isolates that warrants further consideration.

Fig 1

Evolutionary relationships of taxa.

The evolutionary history was inferred using the Neighbor-Joining method based on the data from the BLAST-SNP analysis [22]. The optimal tree with the sum of branch length = 30986.27311707is shown. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches [32]. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the number of differences method [33] and are in the units of the number of base differences per site. The analysis involved 49 nucleotide sequences of 4b and 4bV strains indicated in Table 1. Codon positions included were 1st+2nd+3rd+Noncoding. All ambiguous positions were removed for each sequence pair. There were a total of 23545 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 [27]. The strains highlighted in Table 1 are similarly noted here with cluster 1 in purple, cluster 2 in green, and cluster 3 in blue.

Fig 2

An optimal tree focusing on cluster 3 containing the stone fruit and apple 4bV isolates.

Evolutionary relationships of taxa.

An optimal tree focusing on cluster 3 containing the stone fruit and apple 4bV isolates.

The tree was generated as in Fig 1 with a sum of branch length = 129.46875000 is shown. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) are shown next to the branches [32]. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the number of differences method [33] and are in the units of the number of base differences per sequence. The analysis involved 13 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All ambiguous positions were removed for each sequence pair. There were a total of 23545 positions in the final dataset. Evolutionary analyses were conducted in MEGA6. Analysis of the strains was also conducted using the CFSAN SNP pipeline [16] to further validate our observations. In this case, careful consideration of a closely related reference strain was required. Given that the 4bV strains have a unique 6.3 kb island, we decided to use LS642 [29, 34], a clinical 4bV isolate from an Australian source as the reference strain for comparing the strains in cluster 3 (Fig 3A) and F2365, a serotype 4b isolate from the 1985 cheese outbreak for analysis of clusters 1 and 2 (Fig 3B) [22, 35]. As with prior analyses, we again see the strains in cluster 3 forming a clade with each other and separated from the other 4bV strains available for comparison (Fig 3A). We defined SNPs as stone fruit (SF) specific if they were present in all ten SF isolates included in this analysis and none of the apple isolates and SF biased if they were present in at least seven of the ten analyzed SF isolates and no more than one of the four apple isolates. This resulted in 22 SF specific SNPs and 25 SF biased SNPs. The converse analysis for apple specific and apple biased SNPs identified no apple specific SNPs and two apple biased SNPs, relative to the LS642 sequence. However, it was interesting to observe that in each instance of a SF specific SNP, the apple strains had the reference strain(LS642) sequence conserved at that site in 75% of the isolates with LS1014 consistently diverging (S3 Table). We also found that in all but ten of the SF biased SNPs the apple strains again had maintained the reference strain sequence. Overall, this data supports the observation that the isolates from the apple and stone fruit incidents are highly related, though not identical.

Fig 3

Neighbor-Joining trees based on the CFSAN SNP pipeline results.

Neighbor-Joining trees based on the CFSAN SNP pipeline results.

A) This tree shows the evolutionary relationships between the 4b and 4bV strains, using LS642 as the reference strain. B) This tree examines 4b and 4bV strains, using F2365 as the reference strain. Clusters 1, 2 and 3 are highlighted in purple, green and blue, respectively, in both. The use of arrows used in 3a show the alteration that can occur using a more distantly related reference strain. We also examined the apple associated serotype 4b strains; however, we used F2365 as the reference genome as this genome represented a closer relative than a 4bV isolate. Using this analysis, we found 11 SNPs relative to F2365, with seven of them confined to one isolate (FDA00008714). We also examined the SNP differences between LS47 and LS114 in Cluster 2 and found 126 SNPs between the two strains. The results underscore the critical need for selection of a proper reference genome as when LS642 was used as the reference strain, we found over 1000 SNPs between these two strains. However, based on the comparison using F2365 as the reference, we can conclude that the two strains are remarkably related, especially when you consider that they were originally isolated in the 1970s and have undergone an unknown number of passages, media types and storage conditions that could result in the accumulation of SNPs. A study comparing isolates from a single processing facility spanning 12 years in their isolation dates identified one SNP, though there was evidence of plasmid and prophage alterations [12]. It is important to consider that while there was less change in these processing facilities isolates, they were known to be from a related source that, while uncontrolled, may have provided more similar selective pressures than the environments for LS47 and LS114 which were isolated from different sources (clinical vs. environmental) and underwent subsequent laboratory passage. We also compared the isolates identified as Cluster 1 (S2 Table) using the CFSAN SNP Pipeline with F2365 as the reference genome. This analysis identified 156 SNPs in this cluster relative to F2365; however, while three were completely unique to LS651 with the F2365 nucleotide conserved in the older isolates, an additional 21 SNPs had a point mutation that differed from those found at that site in the older isolates. For example, LS267 encoding a G while LS651 encoded a T and the others had maintained the F2365 residue at a given location. This data supports a remarkable level of relatedness and given the time span involved, could again represent divergence of the LS651 strain from the original population represented by the 1994 isolates.

Discussion

Serotyping of Lm strains during outbreak investigations is the first step towards subtyping. Although a vast majority of the strains linked to outbreaks are grouped into three serotypes, 1/2a, 1/2b and 4b, occasional deviation from this pattern may provide a very important clue during early phases of outbreak investigations. For example, a listeriosis outbreak in Finland involving butter was caused by serotype 3a strains [36]. We have been using a real-time PCR-based assay developed in our laboratory for serotyping of Lm strains that is based on a previously published PCR protocol for sero-grouping strains [18, 26]. The protocol identifies the major disease causing serotypes and a few other rare serotypes including 1/2c, 3a, 3b, 3c, when coupled with a basic agglutination assay, and is uniquely capable of identifying 4bV isolates. Identification of serotype 4bV among the stone fruit and caramel apple outbreak isolates provided the first clue that these strains could be related. Although 4bV isolates of Lm have been reported from different parts of the world [30], including isolates in Australia [34], these variants are not frequently detected. The unique acquisition of a 6.3 kb DNA fragment in their 4b genome backbone indicate that these strains may have acquired newer traits from other L. monocytogenes serotypes which could affect their pathophysiology and environmental adaptation[29]. The WGS comparison studies of these 4bV strains performed in this study, based on the serotype information, resulted in the identification of an intriguing link that may indicate the need for further evaluation. Genomic comparison using WGS data has been a very useful and powerful tool for establishing potential links between clinical, food and environmental isolates of L. monocytogenes. Finding such links during outbreak investigations could lead to identification of the source of contamination, removal of contaminated foods from circulation thereby saving lives and reducing other burdens associated with such outbreaks. Several methods of varying degree of discriminatory power are currently in use for analysis of WGS data. These bioinformatics tools, however, are resource intensive and can take a long time particularly if one has to compare dozens of isolate genomes. In this communication, we evaluated a relatively easier tool, JSpecies Tetra, to compare genomes of several L. monocytogenes strains obtained from various listeriosis outbreaks. As a first step towards this goal, we compared the genomes of L. monocytogenes isolates, which had been already established to be linked with each other by other genomics tools and epidemiological investigation. The results with the limited numbers of epidemiologically linked outbreak strains clearly showed that JSpecies Tetra analysis could be a useful tool to indicate a genomic relationship. The ready identification of previously known links between Listeria monocytogenes isolates, including three 4bV strains from Australia [29], as well as unknown links between Listeria monocytogenes isolates, shows that the JSpecies tool may provide a useful initial approach for the rapid assessment of genomic relatedness that can be readily performed on most PCs, making it more user accessible. As a superficial comparison of the three approaches, the JSpecies analysis of the subset in S2 Table took less than 10 minutes while the BLAST-based SNP approach took 16 minutes. The CFSAN SNP Pipeline of this dataset took about 4 hours to run locally, though this analysis can be done much more rapidly on a high-performance computing cluster or with cloud computing. While this time difference is minimal at least for the first two, a key factor to consider is that both SNP analyses required detailed knowledge of bioinformatics tools and the use of a Linux machine, as well as the identification of the core genome. Conversely, the JSpecies tool can be done as both a QA/QC measure verifying that the genomic DNA isn’t contaminated and as a rapid check for relatedness with other strains by any researcher with access to a standard PC, though assembly data can also provide clues on the sequence quality. Furthermore, the easy interpretation of the results would improve the ability of investigators to identify links and communicate them across a wide range of disciplines as well as to individuals with the skills and tools to perform the more rigorous methods to determine the actual degree of relatedness. Use of the JSpecies Tetra analysis tool would improve the quality of genomic DNA sequence being submitted into GenBank and enable the scientist to conduct their own independent detailed analysis or to alert investigators of any noteworthy links for further more detailed analysis. While evaluating this tool, we identified two cases of genetic relatedness that would have been relevant if the information had been available to investigators during the active response in either the 2011 case or the caramel apple outbreak. In the case of the links between the 2014 isolates, the question remains as to why these highly related strains were found in two unrelated food vehicles, as well as in clinical isolates from the summer of 2014 that have not been linked to a food source [19]. The production facilities for these foods are located about 70–80 miles from each other. It is possible that the strains implicated in these incidents is present natively within the region and may have adapted to be more competitive in this environment with a coincidental increased fitness in survival within packing facilities. Alternatively, it is possible that a cross-contamination event occurred between the facilities. Determination of which of these explanations is more likely would allow better control of future incidents. The slight divergence between the fruit 4bV isolates suggests the possibility of an environmental niche favorable to the expansion of these strains within the growing region or within the processing facilities. It is interesting that the stone fruit isolates had a higher frequency and specificity of SNPs than the apple isolates suggesting the possibility of a shift selective pressure in the stone fruit environment driving the accumulation of mutations. Given that both foods were fruits, it should be investigated whether this 4bV strain is uniquely adapted to produce or, specifically, fruit contamination and whether there are factors unique to the stone fruits or their processing environment that could explain the greater divergence from the reference strain. In summary, this evaluation of the JSpecies tool has shown that it can rapidly and easily compare WGS of Lm strains, establish genetic relatedness, especially when aided by a simple PCR based serogrouping, and provide useful information guiding source attribution efforts and improving outbreak response.

Strains for JSpecies Base Assessment.

This table provides the information on the strains, as well as their biosample number, used to assess JSpecies performance. (XLSX) Click here for additional data file.

JSpecies Comparison of 4b and 4bV Isolates.

Highly related groups are highlighted with previously known clusters highlighted grey and the new clusters in purple (Cluster 1), green (Cluster 2) and blue (Cluster 3). The blue highlighted text shows the r2 values for previously characterized 4bV isolates. (XLSX) Click here for additional data file.

CFSAN Pipeline identified SNPs for Cluster 3.

This table lists the SNPs found in the cluster 3 strains relative to LS642 that occurred that were found more frequently in one of the two strain subset. A period (.) indicates the nucleotide was unaltered in the test strain relative to LS642 while a dash (-) indicates a gap. Each SNP was assigned as SF specific, SF biased or apple biased as noted by the position of the plus symbol. (XLSX) Click here for additional data file.

28 in total

1. Observation of a new pattern in serogroup-related PCR typing of Listeria monocytogenes 4b isolates.

Authors: Bixing Huang; Ningxia Fang; Karolina Dimovski; Xian Wang; Geoff Hogg; John Bates
Journal: J Clin Microbiol Date: 2010-11-03 Impact factor: 5.948

2. Genome sequence of Listeria monocytogenes Scott A, a clinical isolate from a food-borne listeriosis outbreak.

Authors: Yves Briers; Jochen Klumpp; Markus Schuppler; Martin J Loessner
Journal: J Bacteriol Date: 2011-06-17 Impact factor: 3.490

3. The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors: N Saitou; M Nei
Journal: Mol Biol Evol Date: 1987-07 Impact factor: 16.240

4. Multistate outbreak of listeriosis associated with Jensen Farms cantaloupe--United States, August-September 2011.

Authors:
Journal: MMWR Morb Mortal Wkly Rep Date: 2011-10-07 Impact factor: 17.586

5. Characterization of the novel Listeria monocytogenes PCR serogrouping profile IVb-v1.

Authors: Alexandre Leclercq; Viviane Chenal-Francisque; Hélène Dieye; Thomas Cantinelli; Rezak Drali; Sylvain Brisse; Marc Lecuit
Journal: Int J Food Microbiol Date: 2011-03-21 Impact factor: 5.277

6. Whole genome comparisons of serotype 4b and 1/2a strains of the food-borne pathogen Listeria monocytogenes reveal new insights into the core genome components of this species.

Authors: Karen E Nelson; Derrick E Fouts; Emmanuel F Mongodin; Jacques Ravel; Robert T DeBoy; James F Kolonay; David A Rasko; Samuel V Angiuoli; Steven R Gill; Ian T Paulsen; Jeremy Peterson; Owen White; William C Nelson; William Nierman; Maureen J Beanan; Lauren M Brinkac; Sean C Daugherty; Robert J Dodson; A Scott Durkin; Ramana Madupu; Daniel H Haft; Jeremy Selengut; Susan Van Aken; Hoda Khouri; Nadia Fedorova; Heather Forberger; Bao Tran; Sophia Kathariou; Laura D Wonderling; Gaylen A Uhlich; Darrell O Bayles; John B Luchansky; Claire M Fraser
Journal: Nucleic Acids Res Date: 2004-04-28 Impact factor: 16.971

7. Epidemic listeriosis associated with Mexican-style cheese.

Authors: M J Linnan; L Mascola; X D Lou; V Goulet; S May; C Salminen; D W Hird; M L Yonekura; P Hayes; R Weaver
Journal: N Engl J Med Date: 1988-09-29 Impact factor: 91.245

8. Short-term genome evolution of Listeria monocytogenes in a non-controlled environment.

Authors: Renato H Orsi; Mark L Borowsky; Peter Lauer; Sarah K Young; Chad Nusbaum; James E Galagan; Bruce W Birren; Reid A Ivy; Qi Sun; Lewis M Graves; Bala Swaminathan; Martin Wiedmann
Journal: BMC Genomics Date: 2008-11-13 Impact factor: 3.969

9. Genomic characterization of novel Listeria monocytogenes serotype 4b variant strains.

Authors: Pongpan Laksanalamai; Bixing Huang; Jonathan Sabo; Laurel S Burall; Shaohua Zhao; John Bates; Atin R Datta
Journal: PLoS One Date: 2014-02-19 Impact factor: 3.240

10. Choice of reference-guided sequence assembler and SNP caller for analysis of Listeria monocytogenes short-read sequence data greatly influences rates of error.

Authors: Arthur W Pightling; Nicholas Petronella; Franco Pagotto
Journal: BMC Res Notes Date: 2015-12-08

17 in total

1. Societal Implications of the Internet of Pathogens.

Authors: Alexander L Greninger
Journal: J Clin Microbiol Date: 2019-05-24 Impact factor: 5.948

2. Whole-Genome Sequencing and Concordance Between Antimicrobial Susceptibility Genotypes and Phenotypes of Bacterial Isolates Associated with Bovine Respiratory Disease.

Authors: Joseph R Owen; Noelle Noyes; Amy E Young; Daniel J Prince; Patricia C Blanchard; Terry W Lehenbauer; Sharif S Aly; Jessica H Davis; Sean M O'Rourke; Zaid Abdo; Keith Belk; Michael R Miller; Paul Morley; Alison L Van Eenennaam
Journal: G3 (Bethesda) Date: 2017-09-07 Impact factor: 3.154

3. A clade of Listeria monocytogenes serotype 4b variant strains linked to recent listeriosis outbreaks associated with produce from a defined geographic region in the US.

Authors: Laurel S Burall; Christopher J Grim; Atin R Datta
Journal: PLoS One Date: 2017-05-02 Impact factor: 3.240

4. Genomic characterization of eight Ensifer strains isolated from pristine caves and a whole genome phylogeny of Ensifer (Sinorhizobium).

Authors: Heerman Kumar Sandra Kumar; Han Ming Gan; Mun Hua Tan; Wilhelm Wei Han Eng; Hazel A Barton; André O Hudson; Michael A Savka
Journal: J Genomics Date: 2017-01-18

5. A Comprehensive Evaluation of the Genetic Relatedness of Listeria monocytogenes Serotype 4b Variant Strains.

Authors: Laurel S Burall; Christopher J Grim; Mark K Mammel; Atin R Datta
Journal: Front Public Health Date: 2017-09-13

6. Genomic Diversity of Common Sequence Types of Listeria monocytogenes Isolated from Ready-to-Eat Products of Animal Origin in South Africa.

Authors: Itumeleng Matle; Rian Pierneef; Khanyisile R Mbatha; Kudakwashe Magwedere; Evelyn Madoroba
Journal: Genes (Basel) Date: 2019-12-04 Impact factor: 4.096

7. Molecular Typing of Listeria monocytogenes IVb Serogroup Isolated from Food and Food Production Environments in Poland.

Authors: Beata Lachtara; Jacek Osek; Kinga Wieczorek
Journal: Pathogens Date: 2021-04-15

8. New Aspects on Listeria monocytogenes ST5-ECVI Predominance in a Heavily Contaminated Cheese Processing Environment.

Authors: Meryem Muhterem-Uyar; Luminita Ciolacu; Karl-Heinz Wagner; Martin Wagner; Stephan Schmitz-Esser; Beatrix Stessl
Journal: Front Microbiol Date: 2018-02-01 Impact factor: 5.640

9. Core Genome Multilocus Sequence Typing for Identification of Globally Distributed Clonal Groups and Differentiation of Outbreak Strains of Listeria monocytogenes.

Authors: Yi Chen; Narjol Gonzalez-Escalona; Thomas S Hammack; Marc W Allard; Errol A Strain; Eric W Brown
Journal: Appl Environ Microbiol Date: 2016-09-30 Impact factor: 4.792

10. Microevolution and Gain or Loss of Mobile Genetic Elements of Outbreak-Related Listeria monocytogenes in Food Processing Environments Identified by Whole Genome Sequencing Analysis.

Authors: Helen Yang; Maria Hoffmann; Marc W Allard; Eric W Brown; Yi Chen
Journal: Front Microbiol Date: 2020-05-29 Impact factor: 5.640