Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Whole-Genome Sequence-Based Typing of Listeria monocytogenes.

Werner Ruppitsch¹, Ariane Pietzka², Karola Prior³, Stefan Bletz⁴, Haizpea Lasa Fernandez², Franz Allerberger², Dag Harmsen³, Alexander Mellmann⁴.

Abstract

Whole-genome sequencing (WGS) has emerged today as an ultimate typing tool to characterize Listeria monocytogenes outbreaks. However, data analysis and interlaboratory comparability of WGS data are still challenging for most public health laboratories. Therefore, we have developed and evaluated a new L. monocytogenes typing scheme based on genome-wide gene-by-gene comparisons (core genome multilocus sequence typing [cgMLST]) to allow for a unique typing nomenclature. Initially, we determined the breadth of the L. monocytogenes population based on MLST data with a Bayesian approach. Based on the genome sequence data of representative isolates for the whole population, cgMLST target genes were defined and reappraised with 67 L. monocytogenes isolates from two outbreaks and serotype reference strains. The Bayesian population analysis generated five L. monocytogenes groups. Using all available NCBI RefSeq genomes (n = 36) and six additionally sequenced strains, all genetic groups were covered. Pairwise comparisons of these 42 genome sequences resulted in 1,701 cgMLST targets present in all 42 genomes with 100% overlap and ≥90% sequence similarity. Overall, ≥99.1% of the cgMLST targets were present in 67 outbreak and serotype reference strains, underlining the representativeness of the cgMLST scheme. Moreover, cgMLST enabled clustering of outbreak isolates with ≤10 alleles difference and unambiguous separation from unrelated outgroup isolates. In conclusion, the novel cgMLST scheme not only improves outbreak investigations but also enables, due to the availability of the automatically curated cgMLST nomenclature, interlaboratory exchange of data that are crucial, especially for rapid responses during transsectorial outbreaks.

Year: 2015 PMID: 26135865 PMCID: PMC4540939 DOI: 10.1128/JCM.01193-15

Source DB: PubMed Journal: J Clin Microbiol ISSN: 0095-1137 Impact factor: 5.948

INTRODUCTION

Listeria monocytogenes is a facultative anaerobe, a Gram-positive, psychrophilic and salt-tolerant, facultative intracellular pathogen of humans and animals, causing clinical manifestations like gastroenteritis, encephalitis, meningitis, and septicemia. A high hospitalization rate of >90% and a case-fatality rate up to 30% make L. monocytogenes an important human pathogen (1). The characteristic traits (growth at low temperatures, survival of freezing and high-salt and nitrite preservation methods, and biofilm formation) of L. monocytogenes represent a major issue for industrialized food production and facilitate food contamination at several stages of food production (2). Nearly all cases of listeriosis are caused by consumption or use of contaminated food or feed. The traditional L. monocytogenes serotyping scheme allows the differentiation of 12 serotypes of which 4b, 1/2a, and 1/2b isolates cause about 96% of all reported human listeriosis cases (3). Low discriminatory power, insufficient reproducibility, and antigen sharing between serotypes impede the value of serotyping in outbreak investigations and necessitate more accurate and more discriminatory typing solutions (4). Pulsed-field gel electrophoresis (PFGE) has been established as the current “gold standard” for L. monocytogenes typing by PulseNet (5, 6) and has been essential for outbreak investigation worldwide (7). However, PFGE is time-consuming, expensive, and difficult to standardize (8, 9). Methods based on DNA sequence analysis appear more promising for fast, accurate, and reproducible strain typing (10). Whereas multilocus sequence typing (MLST) (11, 12) and multi-virulence-locus sequence typing (MVLST) (13) schemes for L. monocytogenes share the characteristics of sequence-based methods, they both lack the discriminative power needed for outbreak investigation of this clonal pathogen (7, 14). 
Nowadays, the recent and ongoing evolution of sequencing technologies from Sanger sequencing to next-generation sequencing enables sequence analysis on a whole-genome level. Several studies on a variety of bacterial species have already shown that whole-genome sequence (WGS)-based typing, based either on single nucleotide variants (SNVs) (15, 16) or on gene-by-gene allelic profiling of core genome genes, frequently named core genome MLST (cgMLST) or MLST+ (17, 18), currently represents the ultimate diagnostic tool for strain typing. Recently, we successfully applied a cgMLST typing approach to L. monocytogenes (19). Nevertheless, the broad use of WGS-based approaches is still hampered by the lack of standardized nomenclature that would facilitate global exchange of data, as has already been the reality for classical MLST data (20) for more than a decade. To achieve a stable cgMLST scheme for L. monocytogenes that can form the basis of a standardized nomenclature for WGS-based L. monocytogenes typing, first we defined an L. monocytogenes core genome gene set representing the genetic diversity within the L. monocytogenes population based on well-characterized reference strains, and second we challenged this scheme for suitability in outbreak investigations using isolates from two outbreaks and sporadic cases.

MATERIALS AND METHODS

Microorganisms and DNA extraction.

All strains and genome sequences used for the development of the novel cgMLST L. monocytogenes scheme are listed in Table 1. For subsequent evaluation of the scheme, a total of 67 L. monocytogenes isolates were used (Table 2): isolates from sporadic cases (n = 8; these also served as outgroups for the outbreaks, with matching serotypes and highly similar or even identical PFGE patterns), isolates from two outbreaks (n = 42) (21–23), and reference strains for all serotypes (n = 17). All strains were cultured overnight at 37°C on RAPID'L.Mono agar (Bio-Rad, Vienna, Austria) for species confirmation and subcultivated on Columbia blood agar plates (bioMérieux, Marcy l'Etoile, France) prior to DNA extraction using the GenElute Bacterial Genomic DNA kit (Sigma, St. Louis, MO, USA) according to the manufacturer's instructions.

TABLE 1

List of L. monocytogenes strains and genomes used for SeqSphere cgMLST L. monocytogenes target definition

Strain	MLST ST^a	Lineage^b	BAPS partition	Serogroup	Average coverage (no. contigs)	NCBI RefSeq or ENA SRA accession number(s)
EGD-e (reference genome)	35	II	Lm02	1/2a	NA^c	NC_003210
07PF0776	4	I	Lm01	4b	NA	NC_017728
08-5578	292	II	Lm02	1/2a	NA	NC_013766
08-5923	120	II	Lm02	1/2a	NA	NC_013768
10403S	85	II	Lm02	1/2a	NA	NC_017544
ATCC 19117	2	I	Lm01	4d	NA	NC_018584
C1-387	155	II	Lm02	1/2a	NA	NC_021823
Clip81459	4	I	Lm01	4b	NA	NC_012488
F2365	1	I	Lm01	4b	NA	NC_002973
Finland 1998	155	II	Lm02	3a	NA	NC_017547
FSL R2-561	9	II	Lm02	1/2c	NA	NC_017546
HCC23	201	III	Lm04	4a	NA	NC_011660
J0161	11	II	Lm02	1/2a	NA	NC_017545
J1-220	2	I	Lm01	4b	NA	NC_021830
J1776	6	I	Lm01	4b	NA	NC_021839
J1816	6	I	Lm01	4b	NA	NC_021829
J1817	6	I	Lm01	4b	NA	NC_021827
J1926	6	I	Lm01	4b	NA	NC_021840
J2-031	394	II	Lm02	1/2c	NA	NC_021837
J2-064	5	I	Lm01	1/2b	NA	NC_021824
J2-1091	1	I	Lm01	1/2a	NA	NC_021825
L312	4	I	Lm01	4b	NA	NC_018642
L99	201	III	Lm04	4a	NA	NC_017529
LL195	1	I	Lm01	4b	NA	NC_019556
M7	201	III	Lm04	4a	NA	NC_017537
N1-011A	3	I	Lm01	1/2b	NA	NC_021826
R2-502	3	I	Lm01	1/2b	NA	NC_021838
SLCC0717	518	III	Lm03	1/2a	163 (21)	ERR664778
SLCC0759	481	III	Lm03	1/2a	156 (23)	ERR664779
SLCC1042	18	III	Lm03	1/2a	124 (20)	ERR664780
SLCC2372	122	II	Lm02	1/2c	NA	NC_018588
SLCC2376	71	III	Lm04	4c	NA	NC_018590
SLCC2378	73	I	Lm01	4e	NA	NC_018585
SLCC2479	9	II	Lm02	3c	NA	NC_018589
SLCC2482	3	I	Lm01	7	NA	NC_018591
SLCC2540	617	I	Lm01	3b	NA	NC_018586
SLCC2755	66	I	Lm01	1/2b	NA	NC_018587
SLCC3287	427	III	Lm03	1/2a	132 (18)	ERR664782
SLCC4771	467	IV	Lm07	4c	162 (25)	ERR664786, ERR664787
SLCC5850	12	II	Lm02	1/2a	NA	NC_018592
SLCC6263	466	III	Lm03	1/2a	180 (16)	ERR664785
SLCC7179	91	II	Lm02	3a	NA	NC_018593

^a MLST typing in accordance with http://www.pasteur.fr/recherche/genopole/PF8/mlst/Lmono.html.

^b Lineage designation in accordance with Haase et al. (7).

^c NA, not applicable.

TABLE 2

List of L. monocytogenes isolates used for evaluation of the SeqSphere cgMLST L. monocytogenes scheme

Sample identification	Country of isolation	Origin	Collection year	Serotype	MLST ST^b	Lineage^c	BAPS partition	% good cgMLST targets	Coverage (no. of contigs)	ENA accession no.	Reference(s) or study	Comment^d
L3308	Austria	Human	2008	4b	1	Lineage I	Lm01	99.4	180 (25)	ERR664375	21	JPO
L3808	Austria	Human	2008	4b	1	Lineage I	Lm01	99.4	164 (29)	ERR664376	21	JPO
L3908	Austria	Human	2008	4b	1	Lineage I	Lm01	99.3	139 (31)	ERR664377	21	JPO
L4008	Austria	Human	2008	4b	1	Lineage I	Lm01	99.4	133 (29)	ERR664378	21	JPO
L4508	Austria	Human	2008	4b	1	Lineage I	Lm01	99.4	174 (30)	ERR664379	21	JPO
L6708	Austria	Human	2008	4b	1	Lineage I	Lm01	99.4	180 (34)	ERR664394, ERR664395	21	JPO
L6808	Austria	Human	2008	4b	1	Lineage I	Lm01	99.4	160 (29)	ERR664380	21	JPO
W9508	Austria	Food	2008	4b	1	Lineage I	Lm01	99.4	180 (24)	ERR664382	21	JPO
W9708	Austria	Food	2008	4b	1	Lineage I	Lm01	99.4	180 (27)	ERR664384	21	JPO
L2708	Austria	Human	2008	4b	249	Lineage I	Lm01	99.4	151 (33)	ERR664374	21	Outgroup of JPO
L7508	Austria	Human	2008	4b	4	Lineage I	Lm01	99.5	174 (22)	ERR664381	21	Outgroup of JPO
3230TP5	Austria	Food	2010	1/2a	403	Lineage II	Lm02	99.8	106 (26)	ERS482542	22, 23	ACCO I
L20-09	Austria	Human	2009	1/2a	403	Lineage II	Lm02	99.9	120 (23)	ERS482565	22, 23	ACCO I
L21-09	Austria	Human	2009	1/2a	403	Lineage II	Lm02	99.8	120 (45)	ERS482567	22, 23	ACCO I
L23-09	Austria	Human	2009	1/2a	403	Lineage II	Lm02	99.8	120 (25)	ERS482568	22, 23	ACCO I
L27-09	Austria	Human	2009	1/2a	777	Lineage II	Lm02	99.8	117 (24)	ERS482569	22, 23	ACCO I
L29-09	Austria	Human	2009	1/2a	403	Lineage II	Lm02	99.8	120 (46)	ERS482570	22, 23	ACCO I
L31-09	Austria	Human	2009	1/2a	777	Lineage II	Lm02	99.7	120 (56)	ERS482572	22, 23	ACCO I
L32-09	Austria	Human	2009	1/2a	777	Lineage II	Lm02	99.5	64 (76)	ERS482573	22, 23	ACCO I
L33-09	Austria	Human	2009	1/2a	403	Lineage II	Lm02	99.9	120 (21)	ERS482575	22, 23	ACCO I
L34-09	Austria	Human	2009	1/2a	403	Lineage II	Lm02	99.8	113 (36)	ERS482577	22, 23	ACCO I
L35-09	Austria	Human	2009	1/2a	777	Lineage II	Lm02	99.9	98 (23)	ERS482578	22, 23	ACCO I
L68-09	Austria	Human	2009	1/2a	777	Lineage II	Lm02	99.8	113 (35)	ERS482582	22, 23	ACCO I
L71-09	Austria	Human	2009	1/2a	403	Lineage II	Lm02	99.9	120 (22)	ERS482583	22, 23	ACCO I
L9-10	Austria	Human	2010	1/2a	403	Lineage II	Lm02	99.9	120 (19)	ERS482585	22, 23	ACCO I
LD27-12	Germany	Human	2012	1/2a	403	Lineage II	Lm02	99.7	68 (49)	ERS482587	This study	Outgroup of ACCO I
MRL-13-00230	Germany	Food	2013	1/2a	403	Lineage II	Lm02	99.8	120 (34)	ERS482588	This study	Outgroup of ACCO I
Ro-015	Unknown	Unknown	2010	1/2a	403	Lineage II	Lm02	99.8	120 (22)	ERS482589	This study	Outgroup of ACCO I
16132	Austria	Food	2009	1/2a	398	Lineage II	Lm02	99.8	136 (17)	ERS482539	This study	ACCO II
2010-00770	Austria	Food	2010	1/2a	398	Lineage II	Lm02	99.8	120 (20)	ERS482540	22, 23	ACCO II
3230TP3	Austria	Food	2010	1/2a	398	Lineage II	Lm02	99.8	146 (17)	ERS482541	22, 23	ACCO II
4548TP4	Austria	Food	2010	1/2a	398	Lineage II	Lm02	99.8	160 (20)	ERS482543	22, 23	ACCO II
K70-10	Unknown	Food	2010	1/2a	398	Lineage II	Lm02	99.8	120 (21)	ERS482558	22, 23	ACCO II
L10-10	Austria	Human	2010	1/2a	398	Lineage II	Lm02	99.8	120 (16)	ERS482559	22, 23	ACCO II
L14-10	Austria	Human	2010	1/2a	398	Lineage II	Lm02	99.8	113 (19)	ERS482560	22, 23	ACCO II
L16-10	Austria	Human	2010	1/2a	398	Lineage II	Lm02	99.8	120 (19)	ERS482561	22, 23	ACCO II
L17-10	Austria	Human	2010	1/2a	398	Lineage II	Lm02	99.8	120 (21)	ERS482562	22, 23	ACCO II
L18-10	Austria	Human	2010	1/2a	398	Lineage II	Lm02	99.8	120 (20)	ERS482563	22, 23	ACCO II
L19-10	Austria	Human	2010	1/2a	398	Lineage II	Lm02	99.8	120 (20)	ERS482564	22, 23	ACCO II
L20-10	Austria	Human	2010	1/2a	398	Lineage II	Lm02	99.7	120 (19)	ERS482566	22, 23	ACCO II
L30-10	Austria	Human	2010	1/2a	398	Lineage II	Lm02	99.8	120 (18)	ERS482571	22, 23	ACCO II
L32-10	Austria	Human	2010	1/2a	398	Lineage II	Lm02	99.5	120 (18)	ERS482574	22, 23	ACCO II
L33-10	Austria	Human	2010	1/2a	398	Lineage II	Lm02	99.8	120 (18)	ERS482576	22, 23	ACCO II
L4-10	Austria	Human	2009	1/2a	398	Lineage II	Lm02	99.8	120 (17)	ERS482580	22, 23	ACCO II
L42-10	Austria	Human	2010	1/2a	398	Lineage II	Lm02	99.8	120 (20)	ERS482581	22, 23	ACCO II
L75-09	Austria	Human	2009	1/2a	398	Lineage II	Lm02	99.7	120 (26)	ERS482584	22, 23	ACCO II
LD12-10	Germany	Human	2010	1/2a	398	Lineage II	Lm02	99.8	120 (18)	ERS482586	22, 23	ACCO II
12025641	Austria	Food	2012	1/2a	398	Lineage II	Lm02	99.8	152 (18)	ERS482537	This study	Outgroup of ACCO II
12025647	Austria	Food	2012	1/2a	398	Lineage II	Lm02	99.8	142 (17)	ERS482538	This study	Outgroup of ACCO II
L38-11	Austria	Human	2012	1/2a	398	Lineage II	Lm02	99.7	113 (17)	ERS482579	This study	Outgroup of ACCO II
ATCC15313	United Kingdom	Animal	Unknown	1/2a	107	Lineage II	Lm02	99.5	167 (15)	ERS482544	This study	Reference strain
CIP104794	United Kingdom	Animal	1924	1/2a	12	Lineage II	Lm02	99.4	150 (16)	ERS482545	This study	Reference strain
CIP105448	United Kingdom	Human	1935	1/2c	122	Lineage II	Lm02	99.8	112 (22)	ERS482546	This study	Reference strain
CIP105449	Unknown	Animal	1967	1/2b	66	Lineage I	Lm01	99.4	180 (22)	ERS482547	This study	Reference strain
CIP105457	New Zealand	Animal	1931	4a	202	Lineage III	Lm04	99.1	100 (29)	ERS482548	This study	Reference strain
CIP105458	USA	Food	1971	4d	2	Lineage I	Lm01	99.5	119 (27)	ERS482549	This study	Reference strain
CIP105459	USA	Food	1959	4e	73	Lineage I	Lm01	99.2	101 (28)	ERS482550	This study	Reference strain
CIP59-53	Germany	Human	1953	4b	145	Lineage I	Lm01	99.5	90 (29)	ERS482551	This study	Reference strain
CIP78-34	Denmark	Human	1937	3a	98	Lineage II	Lm02	99.4	120 (17)	ERS482552	This study	Reference strain
CIP78-35	USA	Human	1956	3b	617	Lineage I	Lm01	99.5	120 (28)	ERS482553	This study	Reference strain
CIP78-36	Unknown	Unknown	1966	3c	9	Lineage II	Lm01	99.9	112 (29)	ERS482554	This study	Reference strain
CIP78-39	United Kingdom	Food	Unknown	4c	71	Lineage III	Lm04	99.4	120 (12)	ERS482555	This study	Reference strain
CIP78-43	Unknown	Human	1966	7	3	Lineage I	Lm01	99.5	97 (28)	ERS482556	This study	Reference strain
SLCC3280	Unknown	Unknown	Unknown	1/2a	18	Lineage III	Lm03	99.6	117 (23)	ERR664781	This study	Reference strain
SLCC3961	Unknown	Unknown	Unknown	1/2a	18	Lineage III	Lm03	99.7	141 (18)	ERR664783	This study	Reference strain
SLCC4163	Unknown	Unknown	Unknown	1/2a	18	Lineage III	Lm03	99.8	159 (27)	ERR664784	This study	Reference strain
W9608	Austria	Food	2008	1/2b	5	Lineage I	Lm01	99.6	178 (43)	ERR664383	This study	Reference strain

Epidemiological data with results of classical typing approaches and the percentage of good cgMLST targets (of all 1,701 cgMLST targets; naming of the cgMLST targets is in accordance with L. monocytogenes reference strain EGD-e locus tags [GenBank accession number NC_003210]) are given.

^b MLST typing in accordance with http://www.pasteur.fr/recherche/genopole/PF8/mlst/Lmono.html.

^c Lineage designation in accordance with Haase et al. (7).

^d JPO, jellied pork outbreak; ACCO I, acid curd cheese outbreak clone I; ACCO II, acid curd cheese outbreak clone II.

Whole-genome sequencing and assembly.

Sequencing libraries were prepared using Nextera XT chemistry (Illumina Inc., San Diego, CA, USA) for a 250-bp paired-end sequencing run on an Illumina MiSeq sequencer. Samples were sequenced to aim for minimum coverage of 100-fold using Illumina's recommended standard protocols. The resulting FASTQ files were first quality trimmed and then de novo assembled using the Velvet assembler (24) integrated in Ridom SeqSphere+ software (25) (version 2.3; Ridom GmbH, Münster, Germany). Here, reads were trimmed at their 5′ and 3′ ends until an average base quality of 30 was reached in a window of 20 bases, and the assembly was performed with Velvet version 1.1.04 using optimized k-mer size and coverage cutoff values based on the average length of contigs with >1,000 bp.
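The end-trimming rule described above can be illustrated with a short sketch. It is one plausible reading of the rule (bases are removed from each end until a 20-base window reaches a mean Phred quality of 30); the function name and the exact window handling are assumptions for illustration and are not taken from SeqSphere+.

```python
def trim_read(quals, window=20, min_avg=30):
    """Return (start, end) of the retained slice of a read.

    quals   : list of per-base Phred quality scores
    window  : window length in bases (20 in the text)
    min_avg : required mean quality within the window (30 in the text)

    Assumption: a read is kept from the first window that meets the
    threshold (scanning from the 5' end) to the last window that meets
    it (scanning from the 3' end). Reads with no such window are dropped,
    which is signalled by returning (0, 0).
    """
    n = len(quals)
    if n < window:
        return (0, 0)

    needed = min_avg * window  # compare sums to avoid float rounding

    # 5' end: slide right until a window is good enough
    start = None
    for i in range(n - window + 1):
        if sum(quals[i:i + window]) >= needed:
            start = i
            break
    if start is None:
        return (0, 0)

    # 3' end: slide left until a window is good enough
    end = n
    for j in range(n, window - 1, -1):
        if sum(quals[j - window:j]) >= needed:
            end = j
            break
    return (start, end)
```

The retained bases are then `read[start:end]`; a production trimmer would also apply this to the matching quality string and handle paired reads together.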

BAPS.

To determine the overall L. monocytogenes species variation, we applied a Bayesian analysis of population structure (BAPS) (26, 27). All multilocus sequence typing (MLST) data available as of 24 July 2014 (673 sequence types [STs]) were downloaded from the MLST website (14), and all allelic gene sequences per locus were multiple aligned using MUSCLE (28) and finally concatenated for each ST. The BAPS was carried out using the clustering of linked molecular data functionality. Ten runs were performed, setting an upper limit of 20 partitions. Admixture analysis was performed using the following parameters: minimum population size considered, 5; iterations, 50; number of reference individuals simulated from each population, 50; and number of iterations for each reference individual, 10.

cgMLST target gene definition.

To determine the cgMLST gene set (named MLST+ in the SeqSphere+ software), a genome-wide gene-by-gene comparison was performed using the MLST+ target definer (version 1.1) function of SeqSphere+ with default parameters. These parameters comprise the following filters to exclude certain genes of the EGD-e reference genome (GenBank accession number NC_003210.1, dated 26 March 2015) from the cgMLST scheme: a minimum length filter that discards all genes shorter than 50 bp; a start codon filter that discards all genes that contain no start codon at the beginning of the gene; a stop codon filter that discards all genes that contain no stop codon or more than one stop codon or that do not have the stop codon at the end of the gene; a homologous gene filter that discards all genes with fragments that occur in multiple copies within a genome (with identity of 90% and >100 bp overlap); and a gene overlap filter that discards the shorter gene from the cgMLST scheme if the two genes affected overlap >4 bp. The remaining genes were then used in a pairwise comparison with BLAST version 2.2.12 (parameters used were word size 11, mismatch penalty −1, match reward 1, gap open costs 5, and gap extension costs 2) with the query L. monocytogenes chromosomes. All genes of the reference genome that were present in all query genomes with a sequence identity of ≥90% and 100% overlap formed the final cgMLST scheme. The default stop codon percentage filter was turned on; this filter discards all genes that have internal stop codons in >20% of the query genomes.
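As an illustration of the reference-genome filters listed above, the following sketch implements the minimum length, start codon, stop codon, and gene overlap filters. It is not the SeqSphere+ implementation: the `Gene` class, the function names, and the set of accepted start codons (bacterial translation table 11) are assumptions, and the homologous gene filter is omitted because it requires a BLAST search.

```python
from dataclasses import dataclass

START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: bacterial translation table 11
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Gene:
    locus_tag: str
    start: int  # 0-based, inclusive, on the reference chromosome
    end: int    # 0-based, exclusive
    seq: str    # coding-strand sequence, 5' to 3'


def passes_sequence_filters(gene, min_len=50):
    """Minimum length, start codon, and stop codon filters."""
    seq = gene.seq.upper()
    if len(seq) < min_len:
        return False                      # shorter than 50 bp
    if seq[:3] not in START_CODONS:
        return False                      # no start codon at the beginning
    if len(seq) % 3 != 0:
        return False                      # stop codon cannot sit in frame at the end
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    stops = [i for i, codon in enumerate(codons) if codon in STOP_CODONS]
    # exactly one in-frame stop codon, and it must be the last codon
    return stops == [len(codons) - 1]


def drop_overlapping(genes, max_overlap=4):
    """Discard the shorter gene of any pair overlapping by more than max_overlap bp."""
    removed = set()
    ordered = sorted(genes, key=lambda g: g.start)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break                     # later genes start even further right
            overlap = min(a.end, b.end) - b.start
            if overlap > max_overlap:
                shorter = a if (a.end - a.start) < (b.end - b.start) else b
                removed.add(shorter.locus_tag)
    return [g for g in genes if g.locus_tag not in removed]
```

Genes that survive these filters would then enter the pairwise BLAST comparison against the query genomes.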

Evaluation of the cgMLST target gene set.

To evaluate the applicability and representativeness of the L. monocytogenes cgMLST target gene set, a total of 67 isolates (Table 2) were subsequently analyzed to determine the presence of these target genes. It was assumed that a well-defined cgMLST scheme should cover at least 95% of the cgMLST genes present in all isolates. To extract the target genes, the default parameters were used in the SeqSphere+ software: (i) for processing options, “Ignore contigs shorter than 200 bases”; (ii) for scanning options, “Matching scanning thresholds for creating targets from assembled genomes” with “required identity to reference sequence of 90%” and “required alignment to reference sequence with 100%”; and (iii) for BLAST options, word size 11, mismatch penalty −1, match reward 1, gap open costs 5, and gap extension costs 2. In addition, the target genes were assessed for quality, i.e., the absence of frame shifts and ambiguous nucleotides. A core genome gene was considered a “good target” only if all of the above criteria were met, in which case the complete sequence was analyzed in comparison to the EGD-e reference. Alleles for each gene were assigned automatically by the SeqSphere+ software to ensure unique nomenclature. The combination of all alleles in each strain formed an allelic profile that was used to generate minimum spanning trees (MST) using the parameter “pairwise ignore missing values” during distance calculation. In order to maintain backwards compatibility with classical L. monocytogenes MLST, sequences of the seven genes comprising the allelic profile of the MLST scheme were extracted separately from the genome sequences and queried against the L. monocytogenes MLST database in order to assign classical STs in silico.
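The two quantities used in this evaluation, the percentage of good targets per isolate and the allelic distance computed with missing values ignored pairwise, can be sketched as follows. The profile representation (a dictionary from locus tag to allele number, with `None` for a target that failed the quality criteria) and the function names are assumptions made for illustration.

```python
def percent_good_targets(profile, n_targets):
    """Percentage of scheme targets with a called allele.

    profile   : dict mapping locus tag -> allele number, or None if the
                target was missing or failed quality checks
    n_targets : number of targets in the scheme (1,701 in the text)
    """
    called = sum(1 for allele in profile.values() if allele is not None)
    return 100.0 * called / n_targets


def allelic_distance(p, q):
    """Number of loci with differing alleles between two profiles.

    Loci that are missing (absent or None) in either profile are skipped,
    mirroring the "pairwise ignore missing values" setting.
    """
    distance = 0
    for locus in p.keys() & q.keys():
        if p[locus] is None or q[locus] is None:
            continue
        if p[locus] != q[locus]:
            distance += 1
    return distance
```

A matrix of such pairwise distances is the input from which a minimum spanning tree can be built. Because missing loci are skipped per pair, two isolates with many uncalled targets can appear closer than they are, which is one reason to check the percentage of good targets first.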

Nucleotide sequence accession number.

All raw reads generated were submitted to the European Nucleotide Archive (http://www.ebi.ac.uk/ena/) under the study accession number PRJEB6551.

RESULTS

BAPS partition and admixture analysis based on 673 STs resulted in seven partitions (see Table S1 in the supplemental material). As BAPS partitions Lm05 and Lm06 comprised exclusively Listeria innocua species isolates of 43 STs with significant admixtures, these two partitions were excluded from further analysis. For the remaining five partitions, three (partitions Lm01, Lm02, and Lm04) were among the available NCBI RefSeq genome sequences of L. monocytogenes. To achieve complete coverage of the L. monocytogenes population, we sequenced six additional strains from Seeliger's Listeria culture collection (SLCC0717, SLCC0759, SLCC1042, SLCC3287, SLCC4771, and SLCC6263), representing the missing BAPS partitions Lm03 and Lm07 (Table 1). In total, 42 genome sequences, including L. monocytogenes strain EGD-e as reference for core genome gene definition, were fed into the MLST+ target definer and resulted in 1,701 genes out of 2,867 genes of strain EGD-e (53.2% of the EGD-e strain chromosome nucleotides) (see Table S2 in the supplemental material). The cgMLST scheme was then challenged with two sets of strains: the first contained 17 serotype reference strains representing all serotypes, genetic lineages, and BAPS partitions to determine its ability to cover the whole L. monocytogenes diversity and the second consisted of 48 isolates from two published outbreaks, including eight outgroup isolates (Table 2). All 17 serotype reference strains had ≥99.1% good cgMLST targets (mean, 99.5%), and for all serotype representatives the correct MLST was obtained. Similarly, for the two outbreaks, all isolates had ≥99.3% good cgMLST targets (mean, 99.7%). The results are summarized in Table 2. The cgMLST scheme was further evaluated for its usability in outbreak investigation, i.e., whether outbreak isolates could be attributed to the same clone, named cluster type (CT) in the context of cgMLST typing, and clearly separated from the outgroup isolates.
Therefore, we determined the maximum number of differing genes within each outbreak that reflect putative microevolutionary events. To facilitate cluster investigations in the future, we finally defined the so-called CT threshold that gives the maximum number of differing alleles that are shared by the same CT. In the two retrospectively analyzed outbreaks, a jellied pork outbreak (JPO) in Austria in the year 2008 and two epidemiologically linked clusters forming the acid curd cheese (Quargel) outbreak (ACCO) in Austria, the Czech Republic, and Germany in the years 2009/2010 (Table 2), detailed analysis resulted in a maximum number of 10 differing alleles (see Table S3 in the supplemental material). cgMLST of seven human and two food isolates from the JPO correctly grouped these isolates together with a maximum of four allelic differences (Fig. 1). Outgroup isolates L2708 (ST249) and L7508 (ST4) exhibited more than 1,000 allelic differences, and reference strains F2365 and LL195 (both ST1) exhibited ≥32 allelic differences (Fig. 1). Extraction of classical MLST targets resulted in STs of all outbreak isolates that were identical to those of ST1 and confirmed the previous Sanger sequencing (Table 2).
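To make the role of the CT threshold concrete, the sketch below groups isolates whose allelic profiles differ at no more than 10 loci. Single-linkage grouping is an assumption chosen for simplicity; the text does not state which linkage rule the software applies, and the function names are hypothetical.

```python
def allelic_distance(p, q):
    """Count loci with differing alleles; loci missing (None) in either profile are skipped."""
    return sum(
        1 for a, b in zip(p, q)
        if a is not None and b is not None and a != b
    )


def cluster_types(profiles, threshold=10):
    """Group isolates by single linkage at the given allelic distance threshold.

    profiles  : dict mapping isolate name -> list of allele numbers,
                all lists in the same locus order
    threshold : maximum number of differing alleles for linking two
                isolates (10 in the text)
    Returns a sorted list of clusters, each a sorted list of isolate names.
    """
    names = list(profiles)
    parent = {name: name for name in names}

    def find(x):
        # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if allelic_distance(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

With single linkage, two isolates more than 10 alleles apart can still share a cluster if a chain of intermediate isolates connects them, so cluster membership should be read together with the epidemiological data.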

FIG 1

Minimum-spanning tree based on cgMLST allelic profiles of 9 L. monocytogenes isolates (all share ST1) from the jellied pork outbreak (21) and two outgroup isolates L2708 (ST249) and L7508 (ST4) in comparison to reference strains F2365 (GenBank accession number NC_002973) and LL195 (NC_019556) (both ST1) exhibiting the same serotype 4b. Each circle represents an allelic profile based on sequence analysis of 1,701 cgMLST target genes. The numbers on the connecting lines illustrate the numbers of target genes with differing alleles. The different groups of strains are distinguished by the colors of the circles. Closely related genotypes (≤10 allele difference) are shaded in gray. NCBI RefSeq strains are marked with an asterisk.

cgMLST of 33 isolates from the ACCO correctly identified the two different clones (ACCO I and ACCO II) that caused this outbreak (Fig. 2). Within the ACCO I clone, nine isolates were ST403 and five were ST777, a single locus bglA variant of ST403. cgMLST revealed the same dichotomy as classical MLST; the right branch of the ACCO I tree comprised all ST777 isolates (L27-09, L31-09, L32-09, L35-09, and L68-09). All outgroup isolates (MRL-13-00230, LD27-12, and Ro-015) were ST403 with at least 16 allelic differences compared to the ACCO I isolates. ACCO I isolates displayed a maximum of 10 allelic differences from each other (Fig. 2). ACCO II isolates had a maximum of two allelic differences from each other. All ACCO II isolates were correctly assigned to ST398. The three epidemiologically unrelated outgroup isolates (L38-11, 12025641, and 12025647) with an identical PFGE band pattern (data not shown) also exhibited ST398 and had ≥23 allelic differences compared to the ACCO II food isolates (Fig. 2). ACCO I and ACCO II isolates differed in >1,000 alleles from each other (Fig. 2).

FIG 2

Minimum-spanning tree illustrating the phylogenetic relationship based on the cgMLST allelic profiles of 33 L. monocytogenes isolates from the outbreak associated with acid curd cheese (ACCO) (22, 23) consisting of two clones (ACCO I and ACCO II). Three outgroup isolates per outbreak (with identical PFGE profiles and serotypes) are shown in comparison to the reference strain EGD-e (GenBank accession number NC_003210; ST35). ACCO I isolates L27-09, L31-09, L32-09, L35-09, and L68-09 were ST777; the remaining isolates, including the three ACCO I outgroup isolates were ST403. ACCO II isolates, including the three ACCO II outgroup isolates were all ST398. Each circle represents an allelic profile based on sequence analysis of 1,701 genes. The numbers on the connecting lines illustrate the numbers of target genes with differing alleles. The different groups of strains are distinguished by the colors of the circles. Closely related genotypes (≤10 allele difference) are shaded in gray. The NCBI RefSeq strain is marked with an asterisk.

DISCUSSION

In outbreak situations, a rapid, accurate, and standardized classification of bacterial isolates is essential. Since its introduction in 1998, MLST has become a proof-of-principle method for sequence-based typing methods with a unique centrally curated and thereby standardized nomenclature (20). Building on these experiences, it is now possible to analyze thousands of genes using next-generation sequencing, which dramatically increases discriminatory power and thereby enables outbreak investigations (18, 19, 29–32). In our study, we were able not only to show that our cgMLST typing scheme is representative for the breadth of the L. monocytogenes population with ≥99.1% successfully extracted cgMLST targets but also to differentiate outbreak from nonoutbreak isolates clearly. The microevolutionary events within each outbreak and the CT threshold of ≤10 differences warrant further comments. Within the first outbreak, the JPO (21), very few allelic changes were detected and the maximum allelic distance within the outbreak was only four alleles. This high similarity reflects the outbreak situation without much time for intraoutbreak microevolution, because all patients belonged to one travel group and became ill after consuming contaminated jellied pork at an Austrian tavern (21). The ACCO cluster of listeriosis occurred from 2009 until 2010 in Austria, Germany, and the Czech Republic and was caused by contaminated acid curd cheese (Quargel) (22, 23). Further epidemiological and molecular outbreak investigations revealed that two different serotype 1/2a clones with distinct PFGE patterns and inlB STs were responsible for this outbreak (33). Interestingly, a recent study focusing on the comparative genomics of the two outbreak clones revealed significant differences in virulence (34). Again, cgMLST analysis corroborated these findings, and the number of differing alleles among the outbreak clones again reflected the outbreak length.
Whereas the ACCO I isolates were found over a period of 8 months and up to 10 different alleles were detected, isolates of the ACCO II were found only during a 3-month period, in which at most two different alleles were recorded. Therefore, we assume that ACCO I isolates are a representative microevolutionary model for the CT threshold determination to facilitate outbreak investigations using cgMLST. Although the software supports outbreak investigation by providing the CT, this does not release the epidemiologist from thorough investigation. MLST and cgMLST both use alleles and not nucleotide polymorphisms as units of comparison. Irrespective of the number of nucleotide polymorphisms involved, each allelic change is counted as a single event; i.e., an allelic change is related to at least one point mutation but can also contain several nucleotide changes. This principle covers the conflicting signals of horizontal and vertical transfer of genetic material and considers the higher frequency of recombination than point mutations in bacteria (30, 32). One major advantage of such an allele-based approach is the easy storage and curation of the nomenclature in a central database, which is obligatory to guarantee universal nomenclature. For classical MLST, this scenario was one of the key factors to success. However, manual curation of the current MLST databases frequently hampers the rapid use of novel allelic sequences as human intervention is necessary to assign new alleles and STs. With the software solution used here, it was already possible to automatically assign novel cgMLST alleles, after dedicated quality control of the read and assembly data. This automation is crucial, as the vast amount of sequencing data can no longer be reviewed by humans within the time frame needed for effective implementation of hygiene measures during outbreaks.
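The allele-based principle and the automated assignment of novel alleles can be illustrated with a minimal registry. This is a sketch only: the class name and its interface are hypothetical, allele numbers are assigned in order of first appearance, and the quality control that precedes assignment in a real nomenclature service is not modeled.

```python
class AlleleRegistry:
    """Toy nomenclature store: one allele number per distinct sequence per locus."""

    def __init__(self):
        self._alleles = {}  # locus -> {sequence: allele number}

    def assign(self, locus, sequence):
        """Return the allele number for a sequence, registering it if it is new."""
        known = self._alleles.setdefault(locus, {})
        seq = sequence.upper()
        if seq not in known:
            known[seq] = len(known) + 1  # next free number for this locus
        return known[seq]


def nucleotide_differences(a, b):
    """Count mismatched positions between two equal-length sequences."""
    return sum(1 for x, y in zip(a, b) if x != y)
```

For example, two sequences of the same locus that differ at three positions receive two different allele numbers, and a comparison of the resulting profiles counts that locus as one allelic difference, exactly as it would for sequences differing at a single position.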
The immediate and automated assignment of novel alleles also enables any software user to access identical nomenclature for L. monocytogenes cgMLST typing, a prerequisite for successful interlaboratory exchange of data. In the future, it is desirable to have an open Internet-based nomenclature server that can be interrogated by any software or user (35). The SpaServer (http://spaserver.ridom.de), which automatically hosts the nomenclature of the Staphylococcus aureus protein A gene typing (spa) and now contains >300,000 typing entries originating from >100 countries, might serve as a blueprint for such a service (36). Our approach has one limitation. The analysis is reduced to coding regions only because the second-generation sequencing instruments currently in use produce only relatively short reads that do not assemble the frequently highly repetitive intergenic regions well, leading to faulty assemblies. Therefore, when second-generation sequencing machines are used, focusing on coding regions helps to improve the analytical quality. This might change when third-generation sequencing instruments that produce much longer reads from a single molecule are widely available, preferably as benchtop systems. Nevertheless, the current cgMLST approach will be sustainable, as expanded typing schemes will remain backward compatible with present typing, as we see today with the in silico extraction of classical MLST STs from WGS data. In conclusion, we established a highly representative cgMLST scheme for WGS-based typing of L. monocytogenes and demonstrated both a high discriminatory power and concordance to previous findings in different outbreak scenarios. The remaining challenge is to establish an Internet-based nomenclature server that can be interrogated like the current MLST servers to facilitate universal global nomenclature for any user.
