Stefanie Lüth, Carlus Deneke, Sylvia Kleta, Sascha Al Dahouk.
Abstract
Where classical epidemiology has proven to be inadequate for surveillance and control of foodborne pathogens, molecular epidemiology, using genomic typing methods, can add value. However, the analysis of whole genome sequencing (WGS) data varies widely and is not yet fully harmonised. We used genomic data on 494 Listeria monocytogenes isolates from ready-to-eat food products and food processing environments deposited in the strain collection of the German National Reference Laboratory to compare various procedures for WGS data analysis and to evaluate compatibility of results. Two different core genome multilocus sequence typing (cgMLST) schemes, different reference genomes in single nucleotide polymorphism (SNP) analysis and commercial as well as open-source software were compared. Correlation of allele distances from the different cgMLST approaches was high, ranging from 0.97 to 1, and unified thresholds yielded higher clustering concordance than scheme-specific thresholds. The number of detected SNP differences could be increased up to a factor of 3.9 using a specific reference genome compared with a general one. Additionally, specific reference genomes improved comparability of SNP analysis results obtained using different software tools. The use of a closed or a draft specific reference genome did not make a difference. The harmonisation of WGS data analysis will finally guarantee seamless data exchange, but, in the meantime, knowledge on threshold values that lead to comparable clustering of isolates by different methods may improve communication between laboratories. We therefore established a translation code between commonly applied cgMLST and SNP methods based on optimised clustering concordances. This code can work as a first filter to identify WGS-based typing matches resulting from different methods, which opens up a new perspective for data exchange and thereby accelerates time-critical analyses, such as in outbreak investigations.
Keywords: core genome MLST; genomic epidemiology; outbreak; single nucleotide polymorphism; standardisation; whole genome sequencing
Year: 2020 PMID: 33275089 PMCID: PMC8115905 DOI: 10.1099/mgen.0.000491
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. Correlations of generally applicable typing methods, based on linearized distance matrices. Colour scale indicates the strength of correlation.
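The idea of correlating linearized distance matrices can be illustrated with a minimal sketch. It assumes that "linearized" means taking each isolate pair once (the strict upper triangle of a symmetric matrix) and that a Pearson correlation is used; the record does not state the correlation measure, so that choice is an assumption. The function names and the toy distances are hypothetical.

```python
from math import sqrt


def linearize(dist):
    """Flatten a symmetric distance matrix to one value per isolate pair."""
    n = len(dist)
    return [dist[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def matrix_correlation(dist_a, dist_b):
    """Correlate two distance matrices whose isolates are in the same order."""
    return pearson(linearize(dist_a), linearize(dist_b))


# Hypothetical toy distances for three isolates (not data from the study).
allele_dist = [[0, 2, 9], [2, 0, 8], [9, 8, 0]]
snp_dist = [[0, 3, 20], [3, 0, 17], [20, 17, 0]]
print(round(matrix_correlation(allele_dist, snp_dist), 3))
```

Both matrices must list the isolates in the same order, otherwise the pairwise values do not line up.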
MLST CCs and references used for MLST CC-specific analyses (sorted by frequency in our dataset)

| MLST CC | Closed reference (GenBank accession) | Draft reference | Coverage | Contigs |
|---|---|---|---|---|
| CC121 | HG813249 | 16-LI01132-0 | 91 | 21 |
| CC9 | FR733649 | 16-LI00873-0 | 77 | 17 |
| CC8 | CP006862 | 16-LI00415-0 | 84 | 19 |
| CC2 | CP006046 | 16-LI01038-0 | 119 | 25 |
| CC3 | CP006594 | 16-LI00227-0 | 148 | 27 |
| CC1 | AE017262 | 16-LI00258-0 | 61 | 19 |
| CC37 | CP011397 | 16-LI00295-0 | 113 | 20 |
| CC6 | CP006047 | 16-LI00782-0 | 85 | 16 |
| CC5 | CP006592 | 16-LI00750-0 | 133 | 21 |
| CC101 | CP025221 | 16-LI00284-0 | 117 | 20 |
| CC18 | CP020830 | 16-LI00319-0 | 119 | 15 |
| CC155 | CP002004 | 16-LI00862-0 | 90 | 25 |
| CC224 | CP016629 | 16-LI00391-0 | 91 | 24 |
| CC7 | CP002002 | 17-LI00007-0 | 112 | 21 |
| CC4 | FM242711 | 16-LI00480-0 | 93 | 27 |

Fig. 2. Boxplot of SNP distances from BioNumerics and Snippy using different reference genomes for SNP analysis (applied to a subset of 394 isolates of 15 different MLST CCs), based on linearized distance matrices.
Fig. 3. Correlations of MLST CC-specific typing methods (applied to a subset of 394 isolates of 15 different MLST CCs), based on linearized distance matrices. Colour scale indicates the strength of correlation.
Fig. 4. Matrix of adjusted Wallace coefficients (direction-dependent values) for cgMLST methods at common thresholds (seven and ten alleles). Colour scale indicates percentage of concordance.
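The direction dependence of the adjusted Wallace coefficient can be made concrete with a short sketch. The record does not spell out the formula; the code below assumes the commonly used definition, in which the Wallace coefficient W(A→B) is the share of isolate pairs clustered together by method A that are also together under method B, corrected for the agreement expected by chance (the share of all pairs that B places together). The function names and toy cluster labels are hypothetical.

```python
from collections import Counter


def pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def adjusted_wallace(labels_a, labels_b):
    """Adjusted Wallace coefficient for the direction A -> B."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same isolates")
    total = pairs(len(labels_a))
    same_a = sum(pairs(c) for c in Counter(labels_a).values())
    same_b = sum(pairs(c) for c in Counter(labels_b).values())
    same_both = sum(pairs(c) for c in Counter(zip(labels_a, labels_b)).values())
    if same_a == 0 or same_b == total:
        raise ValueError("coefficient is undefined for these clusterings")
    wallace = same_both / same_a   # observed agreement, A -> B
    expected = same_b / total      # agreement expected by chance
    return (wallace - expected) / (1 - expected)


# Hypothetical cluster labels for six isolates under two methods.
method_a = [0, 0, 0, 0, 1, 1]   # coarser clustering
method_b = [0, 0, 1, 1, 2, 2]   # finer clustering
print(adjusted_wallace(method_a, method_b))  # A -> B
print(adjusted_wallace(method_b, method_a))  # B -> A
```

In this toy case every pair that the finer method groups together is also grouped by the coarser one, so B→A is 1, while A→B is much lower. This asymmetry is why such matrices have to be read in one direction.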
Table 2. Adjusted thresholds for optimised clustering concordance between cgMLST methods and between cgMLST and SNP methods. Clustering by cgMLST methods at published thresholds [12, 13] (in bold type on the left) was set as reference for the adjustment of clustering thresholds for other methods. The columns show the different comparison methods and the threshold values (alleles or SNPs) at which the greatest possible agreement among the clustering with the respective reference method was achieved based on adjusted Wallace coefficients presented in Fig. 5. As cluster comparison is direction-dependent, the table must be read from the left to the right.

| Reference method | Ridom_Ruppitsch (cgMLST) | chewBBACA_Ruppitsch (cgMLST) | BioNumerics_Moura (cgMLST) | Snippy_EGDe (SNP, general reference) | BioNumerics_EGDe (SNP, general reference) | Snippy_closed (SNP, MLST CC-specific reference) | Snippy_draft (SNP, MLST CC-specific reference) | BioNumerics_closed (SNP, MLST CC-specific reference) | BioNumerics_draft (SNP, MLST CC-specific reference) |
|---|---|---|---|---|---|---|---|---|---|
| Ridom_Ruppitsch | **7** | 6 | 6 | 9–10 | 4 | 11 | 12 | 12 | 11–12 |
| chewBBACA_Ruppitsch | 7 | **7** | 7 | 9–10 | 4 | 11 | 12 | 12 | 11–12 |
| BioNumerics_Moura | 8 | 7 | **7** | 13 | 4 | 12 | 12 | 11 | 11–12 |
| Ridom_Ruppitsch | **10** | 10 | 10 | 15 | 5 | 19 | 19 | 18 | 18 |
| chewBBACA_Ruppitsch | 10 | **10** | 10 | 15 | 5 | 20 | 18–19 | 18 | 18 |
| BioNumerics_Moura | 11 | 11 | **10** | 18 | 6 | 20 | 22 | 19 | 20 |

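Used as a first filter, the threshold table above amounts to a lookup from a reference method and threshold to the adjusted threshold of a comparison method. The sketch below encodes two of its rows as a dictionary. It assumes that the first block of rows belongs to the seven-allele reference threshold and the second block to the ten-allele one; ranges such as 11–12 are stored as (low, high) pairs. The names `TRANSLATION`, `translate` and `is_candidate_match` are hypothetical.

```python
# (reference method, reference threshold in alleles) -> comparison thresholds.
# Values are copied from the threshold table; single values are stored as
# (value, value), ranges as (low, high).
TRANSLATION = {
    ("BioNumerics_Moura", 7): {
        "Ridom_Ruppitsch": (8, 8),
        "chewBBACA_Ruppitsch": (7, 7),
        "Snippy_EGDe": (13, 13),
        "BioNumerics_EGDe": (4, 4),
        "Snippy_closed": (12, 12),
        "Snippy_draft": (12, 12),
        "BioNumerics_closed": (11, 11),
        "BioNumerics_draft": (11, 12),
    },
    ("Ridom_Ruppitsch", 10): {
        "chewBBACA_Ruppitsch": (10, 10),
        "BioNumerics_Moura": (10, 10),
        "Snippy_EGDe": (15, 15),
        "BioNumerics_EGDe": (5, 5),
        "Snippy_closed": (19, 19),
        "Snippy_draft": (19, 19),
        "BioNumerics_closed": (18, 18),
        "BioNumerics_draft": (18, 18),
    },
}


def translate(reference_method, reference_threshold, target_method):
    """Return the adjusted (low, high) threshold for the target method."""
    return TRANSLATION[(reference_method, reference_threshold)][target_method]


def is_candidate_match(distance, reference_method, reference_threshold, target_method):
    """First-filter check: is a distance within the translated threshold?

    Using the upper end of a range is an assumption made for this sketch.
    """
    _, high = translate(reference_method, reference_threshold, target_method)
    return distance <= high


print(translate("BioNumerics_Moura", 7, "BioNumerics_draft"))
print(is_candidate_match(12, "BioNumerics_Moura", 7, "Snippy_closed"))
```

Because the comparison is direction-dependent, the lookup is keyed by the reference method and is not meant to be inverted.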
Fig. 5. Adjusted Wallace coefficients (direction-dependent values) at optimised clustering thresholds. (a) Threshold seven alleles, (b) Threshold ten alleles. Grey text colour indicates that the method was used as the reference for threshold adjustment. Percentage values of concordance are presented. Each method has a specific colour and rows and columns of the same colour represent the two directions of cluster comparison. adj.: adjusted threshold from Table 2.
Fig. 6. Practical test of the translation code taking a cgMLST cluster of 16 isolates belonging to MLST CC121 as an example. The cgMLST dataset retrieved from BioNumerics_Moura at an allele threshold of seven (grey text colour) was used as reference method for clustering. Labelling on the right, ‘method_threshold’. Upper labels: isolate identifiers. An asterisk indicates the isolate that was used for cluster search in the different methods. Members of a cluster are coloured. Corresponding distance matrices can be found in Supplementary file S3.
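A cluster search starting from one query isolate, as in the practical test, could be sketched as follows. The record does not state the clustering algorithm, so single-linkage clustering (an isolate joins the cluster if it is within the threshold of any current member) is an assumption here, and the function name, isolate names and distances are hypothetical.

```python
from collections import deque


def cluster_of(query, names, dist, threshold):
    """Return the single-linkage cluster that contains the query isolate."""
    index = {name: i for i, name in enumerate(names)}
    seen = {index[query]}
    queue = deque(seen)
    while queue:
        i = queue.popleft()
        for j in range(len(names)):
            # Add every isolate within the threshold of a current member.
            if j not in seen and dist[i][j] <= threshold:
                seen.add(j)
                queue.append(j)
    return sorted(names[i] for i in seen)


# Hypothetical allele distances between four isolates.
names = ["iso_A", "iso_B", "iso_C", "iso_D"]
dist = [
    [0, 3, 9, 40],
    [3, 0, 6, 41],
    [9, 6, 0, 38],
    [40, 41, 38, 0],
]
print(cluster_of("iso_A", names, dist, threshold=7))
print(cluster_of("iso_A", names, dist, threshold=5))
```

With a threshold of 7, iso_C joins through iso_B although it is 9 alleles from the query itself, which is the chaining behaviour typical of single linkage.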