Balamurugan Jagadeesan, Leen Baert, Martin Wiedmann, Renato H Orsi.
Abstract
As whole genome sequencing (WGS) is increasingly used by the food industry to characterize pathogen isolates, users are challenged by the variety of analysis approaches available, ranging from methods that require extensive bioinformatics expertise to commercial software packages. This study aimed to assess the impact of analysis pipelines (i.e., different hqSNP pipelines, a cg/wgMLST pipeline) and of reference genome selection on analysis results (i.e., hqSNP and allelic differences as well as tree topologies) and on the conclusions drawn. For these comparisons, whole genome sequences were obtained for 40 Listeria monocytogenes isolates collected over 18 years from a cold-smoked salmon facility and 2 other isolates obtained from different facilities as part of academic research activities; WGS data were analyzed with three hqSNP pipelines and two MLST pipelines. After initial clustering using a k-mer based approach, hqSNP pipelines were run using two types of reference genomes: (i) closely related closed genomes ("closed references") and (ii) high-quality de novo assemblies of the dataset isolates ("draft references"). All hqSNP pipelines identified similar hqSNP difference ranges among isolates in a given cluster; the choice of reference genome had minimal impact on the hqSNP differences identified between isolate pairs. Allelic differences obtained by wgMLST showed ranges similar to those of hqSNP differences among isolates in a given cluster; cgMLST consistently showed fewer differences than wgMLST. However, phylogenetic trees and dendrograms based on hqSNP and cg/wgMLST data did show some incongruences, typically linked to clades supported by low bootstrap values in the trees.
When a hqSNP cutoff was used to classify isolates as "related" or "unrelated," use of different pipelines yielded a considerable number of discordances; this finding indicates that cutoff values provide a useful starting point for an investigation, but that supporting and epidemiological evidence should be used to interpret WGS data. Overall, our data suggest that cgMLST-based data analyses provide appropriate subtype differentiation and can be used without the need for preliminary data analyses (e.g., k-mer based clustering) or external closed reference genomes, which simplifies data analysis. hqSNP or wgMLST analyses can then be performed on the isolate clusters identified by cgMLST to determine the genomic similarity between isolates more precisely.
Keywords: CFSAN pipeline; Listeria monocytogenes (L. monocytogenes); Lyve-SET; core genome MLST (cgMLST); high quality single nucleotide polymorphism (hqSNP); smoked salmon; whole genome MLST (wgMLST); whole genome sequence (WGS)
Year: 2019 PMID: 31143162 PMCID: PMC6521219 DOI: 10.3389/fmicb.2019.00947
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Listeria monocytogenes isolates collected over a period of 18 years from a cold-smoked salmon facility, along with two additional isolates from two different facilities.
| Isolate | Source | Specific source | Year | Month | Ribotype | MLST ST |
|---|---|---|---|---|---|---|
| FSL N1-0013 | Food | Whitefish | 1998 | May | DUP-1062 | ST121 |
| FSL N1-0051 | Environment | Equipment | 1998 | August | DUP-1062 | ST321 |
| FSL N1-0053 | Environment | Drain, raw area | 1998 | August | DUP-1062 | ST321 |
| FSL N1-0110 | Food | Salmon brine | 1998 | August | DUP-1062A | ST321 |
| FSL N1-0254 | Environment | Drain | 1998 | October | DUP-1062A | ST321 |
| FSL N1-0255 | Environment | NA | 1998 | October | DUP-1062A | ST321 |
| FSL N1-0256 | Environment | Drain | 1998 | October | DUP-1062A | ST321 |
| FSL N1-0400 | Environment | Drain | 1998 | October | DUP-1062A | ST321 |
| FSL H1-0081 | Environment | Crates | 2000 | February | DUP-1062A | ST321 |
| FSL H1-0159 | Environment | Drain 54 | 2000 | March | DUP-1062C | ST371 |
| FSL H1-0193 | Environment | Drain 18 | 2000 | March | DUP-1062A | ST321 |
| FSL H1-0221 | Environment | Door handle | 2000 | February | DUP-1062A | ST321 |
| FSL H1-0258 | Food | West coast | 2000 | July | DUP-1062A | ST321 |
| FSL H1-0322 | Environment | Drain 54 | 2000 | July | DUP-1062C | ST371 |
| FSL H1-0328 | Environment | Cold smoking-room floor | 2000 | July | DUP-1062A | ST321 |
| FSL H1-0506 | Food | West coast | 2000 | August | DUP-1062D | ST121 |
| FSL T1-0027 | Environment | Floor | 2001 | March | DUP-1062A | ST321 |
| FSL T1-0029 | Environment | Apron | 2001 | March | DUP-1062A | ST321 |
| FSL T1-0077 | Environment | Apron | 2001 | March | DUP-1062A | ST321 |
| FSL T1-0261 | Food | Norwegian Salmon | 2001 | July | DUP-1062A | ST321 |
| FSL T1-0938 | Environment | White tubs | 2001 | November | DUP-1062A | ST321 |
| FSL L4-0166 | Environment | Drain near filet table | 2002 | October | DUP-1062A | ST321 |
| FSL H6-0175 | Environment | Cutting board, trimming area | 2004 | June | DUP-1062A | ST321 |
| FSL R6-0665 | Food | RTE product | 2007 | September | DUP-1062C | ST121 |
| FSL R6-0670 | Environment | Hand truck (dolly) | 2007 | September | DUP-1062C | ST121 |
| FSL R6-0682 | Environment | After skinning machine | 2007 | September | DUP-1062C | ST121 |
| FSL R6-0909 | Environment | Drain 15 | 2007 | December | DUP-1062A | ST321 |
| FSL V1-0034 | Environment | Drain 2 | 2009 | February | DUP-1062A | ST321 |
| FSL V1-0142 | Environment | Drain 4 | 2009 | October | DUP-1062A | ST321 |
| FSL M6-0150 | Environment | Cutting table drain | 2011 | May | DUP-1062A | ST321 |
| FSL M6-0204 | Environment | Food contact surface | 2011 | May | DUP-1062D | ST121 |
| FSL M6-0296 | Environment | Drain in cooler room 9 | 2011 | June | DUP-1062D | ST321 |
| FSL M6-0306 | Environment | Drain in cooler room 8 | 2011 | June | DUP-1062A | ST321 |
| FSL M6-0594 | Environment | Drain in sturgeon room | 2011 | September | DUP-1062A | ST321 |
| FSL M6-0755 | Environment | Drain in sturgeon room | 2011 | November | DUP-1062A | ST321 |
| FSL M6-0810 | Environment | Drain in sturgeon room | 2011 | December | DUP-1062A | ST321 |
| FSL M6-0958 | Environment | Drain in sturgeon room | 2012 | February | DUP-1062A | ST321 |
| FSL M6-1133 | Environment | NA | 2012 | November | DUP-1062A | ST321 |
| FSL M6-1145 | Environment | NA | 2012 | November | DUP-1062A | ST321 |
| FSL R9-4003 | NA | NA | 2015 | ≤May | DUP-1062A | ST199 |
| FSL R9-4438 | Environment | Door handle | 2015 | ≤April | DUP-1062A | ST321 |
| FSL R9-4443 | NA | NA | 2015 | ≤May | DUP-1062A | ST321 |
Selection of reference genomes (closed and draft references) for hqSNP analysis.
| Cluster | Reference | Length | Average coverage | Contigs | N50 |
|---|---|---|---|---|---|
| Cluster 1 closed | NZ_HG813249.1 | 3,072,826 | 150× | 1 | NA |
| Cluster 1 draft | FSL N1-0013 | 3,112,177 | 102× | 27 | 462,476 |
| Cluster 2 closed | NZ_CP019617.1 | 2,989,685 | 182× | 1 | NA |
| Cluster 2 draft | FSL H1-0159 | 3,034,949 | 87× | 113 | 108,201 |
| Cluster 3 closed | NZ_CP019623.1 | 2,940,913 | 181× | 1 | NA |
| Cluster 3a draft | FSL T1-0027 | 3,045,313 | 142× | 54 | 235,624 |
| Cluster 3b draft | FSL T1-0077 | 3,112,454 | 201× | 31 | 586,189 |
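The N50 column above summarizes the contiguity of the draft references. As a reminder of how that statistic is defined, here is a minimal sketch; the function name `n50` and the contig lengths in the example are hypothetical and are not taken from the study's assemblies.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the length L such that contigs of
    length >= L together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in base pairs (illustration only).
example_contigs = [400_000, 300_000, 150_000, 100_000, 50_000]
example_n50 = n50(example_contigs)
```

A closed genome consists of a single contig, which is presumably why the table reports NA rather than an N50 for the closed references.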
FIGURE 1. Maximum parsimony tree based on k-mer-based SNP analysis. The tree was built using kSNP3 with the core SNPs identified among the set of 42 isolates in the study dataset plus 140 L. monocytogenes closed genomes downloaded from the NCBI RefSeq database. Lineages (I, II, and III), the three clusters (1, 2, and 3) and sub-clusters (3a and 3b), as well as the unclustered isolate are annotated. Percentages of consensus clustering agreement across up to 100 equally parsimonious trees are shown for the clusters identified in this study and main nodes representing the L. monocytogenes lineages. The tree was midpoint rooted.
Summary of pairwise hqSNP and allelic differences observed with different methods.
| Cluster or sub-cluster | hqSNP range: CFSAN/closed | hqSNP range: CFSAN/draft | hqSNP range: Lyve-SET/closed | hqSNP range: Lyve-SET/draft | hqSNP range: BN/closed | hqSNP range: BN/draft | Allelic range: cgMLST | Allelic range: wgMLST |
|---|---|---|---|---|---|---|---|---|
| 1 | 2–163 | 2–157 | 2–148 | 2–179 | 1–150 | 1–151 | 0–70 | 0–134 |
| 2 | 4 | 4 | 5 | 5 | 5 | 5 | 4 | 4 |
| 3a | ND | 1–37 | ND | 1–30 | ND | 1–30 | 2–20 | 2–46 |
| 3b | ND | 0–28 | ND | 0–29 | ND | 0–27 | 0–15 | 0–32 |
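The allelic ranges in the table above are minimum and maximum pairwise differences between allele profiles. A minimal sketch of that calculation follows. It assumes that each profile is a mapping from locus name to allele number and that loci with a missing call (`None`) are skipped; both are assumptions made for illustration, not a description of the MLST software used in the study. The isolate names, loci, and allele numbers are hypothetical.

```python
from itertools import combinations


def allelic_difference(profile_a, profile_b):
    """Count loci with different allele numbers, skipping loci that are
    absent or missing (None) in either profile (an assumed convention)."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def pairwise_range(profiles):
    """Return (min, max) allelic differences over all isolate pairs."""
    diffs = [
        allelic_difference(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    ]
    return min(diffs), max(diffs)


# Hypothetical profiles: isolate -> {locus: allele number}.
profiles = {
    "isolate_A": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 7},
    "isolate_B": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 7},
    "isolate_C": {"locus1": 1, "locus2": 5, "locus3": None, "locus4": 9},
}
low, high = pairwise_range(profiles)
```

Because a core-genome scheme compares a subset of the loci in a whole-genome scheme, a count of this kind over core loci cannot exceed the count over all loci, which is consistent with cgMLST showing fewer differences than wgMLST.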
Pairwise unweighted and weighted Robinson-Foulds (RF) distances for different methods.
| Method 1 (pipeline/reference type) | Method 2 (pipeline/reference type) | Unweighted RF distance | Weighted RF distance |
|---|---|---|---|
| Cluster 1 | | | |
| Lyve-SET/draft | Lyve-SET/closed | 2 | 1.00 |
| Lyve-SET/draft | CFSAN/closed | 0 | 0.00 |
| Lyve-SET/draft | CFSAN/draft | 0 | 0.00 |
| Lyve-SET/draft | BN/closed | 0 | 0.00 |
| Lyve-SET/draft | BN/draft | 0 | 0.00 |
| Lyve-SET/draft | cgMLST/NA | 4 | ND |
| Lyve-SET/draft | wgMLST/NA | 4 | ND |
| Lyve-SET/closed | CFSAN/closed | 2 | 1.00 |
| Lyve-SET/closed | CFSAN/draft | 2 | 1.00 |
| Lyve-SET/closed | BN/closed | 2 | 1.00 |
| Lyve-SET/closed | BN/draft | 2 | 1.00 |
| Lyve-SET/closed | cgMLST/NA | 2 | ND |
| Lyve-SET/closed | wgMLST/NA | 2 | ND |
| CFSAN/draft | BN/closed | 0 | 0.00 |
| CFSAN/draft | BN/draft | 0 | 0.00 |
| CFSAN/draft | cgMLST/NA | 4 | ND |
| CFSAN/draft | wgMLST/NA | 4 | ND |
| CFSAN/closed | CFSAN/draft | 0 | 0.00 |
| CFSAN/closed | BN/closed | 0 | 0.00 |
| CFSAN/closed | BN/draft | 0 | 0.00 |
| CFSAN/closed | cgMLST/NA | 4 | ND |
| CFSAN/closed | wgMLST/NA | 4 | ND |
| BN/draft | cgMLST/NA | 4 | ND |
| BN/draft | wgMLST/NA | 4 | ND |
| BN/closed | BN/draft | 0 | 0.00 |
| BN/closed | cgMLST/NA | 4 | ND |
| BN/closed | wgMLST/NA | 4 | ND |
| wgMLST/NA | cgMLST/NA | 0 | ND |
| Sub-cluster 3a | | | |
| Lyve-SET/draft | CFSAN/draft | 2 | 0.72 |
| Lyve-SET/draft | BN/draft | 2 | 0.72 |
| Lyve-SET/draft | cgMLST/NA | 2 | ND |
| Lyve-SET/draft | wgMLST/NA | 6 | ND |
| CFSAN/draft | BN/draft | 0 | 0.00 |
| CFSAN/draft | cgMLST/NA | 2 | ND |
| CFSAN/draft | wgMLST/NA | 4 | ND |
| BN/draft | cgMLST/NA | 2 | ND |
| BN/draft | wgMLST/NA | 4 | ND |
| cgMLST/NA | wgMLST/NA | 6 | ND |
| Sub-cluster 3b | | | |
| Lyve-SET/draft | CFSAN/draft | 18 | 4.51 |
| Lyve-SET/draft | BN/draft | 16 | 3.80 |
| Lyve-SET/draft | cgMLST/NA | 24 | ND |
| Lyve-SET/draft | wgMLST/NA | 18 | ND |
| CFSAN/draft | BN/draft | 14 | 1.62 |
| CFSAN/draft | cgMLST/NA | 22 | ND |
| CFSAN/draft | wgMLST/NA | 20 | ND |
| BN/draft | cgMLST/NA | 22 | ND |
| BN/draft | wgMLST/NA | 18 | ND |
| cgMLST/NA | wgMLST/NA | 22 | ND |
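The unweighted RF distance counts the bipartitions (splits of the isolates into two groups by an internal branch) that appear in one tree but not in the other, so a value of 0 means the two topologies are identical. The sketch below illustrates that definition for trees written as nested tuples of isolate names. It is an illustration only, not the tool used in the study; the function names and example trees are hypothetical, and the weighted variant, which also uses branch lengths, is not shown.

```python
def leaf_set(tree):
    """Collect the leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of a tree, treated as unrooted.

    Each bipartition is stored as the side that does NOT contain an
    arbitrary anchor leaf, so that both trees share one canonical form."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Unweighted Robinson-Foulds distance: size of the symmetric
    difference between the two trees' bipartition sets."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("both trees must contain the same isolates")
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical five-isolate trees (illustration only).
tree_x = (("iso1", "iso2"), ("iso3", ("iso4", "iso5")))
tree_y = (("iso1", "iso3"), ("iso2", ("iso4", "iso5")))
tree_x_rerooted = ((("iso1", "iso2"), "iso3"), ("iso4", "iso5"))

distance_xy = rf_distance(tree_x, tree_y)
distance_same = rf_distance(tree_x, tree_x_rerooted)
```

Here `tree_x` and `tree_y` disagree on one split each, giving a distance of 2, while re-rooting `tree_x` leaves its bipartitions unchanged and gives a distance of 0.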
Impact of changing the reference genome in hqSNP analysis on the inferences drawn.
| Cluster | Pipeline | Concordant results: equal SNP values (%) | Concordant results: different SNP values (%) | Discordant results (%) |
|---|---|---|---|---|
| 1 | CFSAN | 27 | 73 | 0 |
| 1 | Lyve-SET | 20 | 80 | 0 |
| 1 | BN | 67 | 33 | 0 |
| 3a | CFSAN | 2 | 33 | 64 |
| 3a | Lyve-SET | 9 | 67 | 24 |
| 3a | BN | 80 | 20 | 0 |
| 3b | CFSAN | 1 | 80 | 19 |
| 3b | Lyve-SET | 23 | 65 | 12 |
| 3b | BN | 91 | 9 | 0 |
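The three categories in the table above can be read as follows: an isolate pair is concordant with equal values, concordant with different values (both values fall on the same side of the cutoff), or discordant (the related/unrelated call changes with the reference). The sketch below shows one way to compute such percentages. The cutoff value is not stated in this excerpt, so it is left as a parameter; the "distance at or below the cutoff means related" convention, the function name, and the example distances are all assumptions for illustration.

```python
def reference_impact(dist_closed, dist_draft, cutoff):
    """Percentages of isolate pairs whose hqSNP values are equal, different
    but concordant, or discordant between two reference genomes.

    A pair is called "related" when its distance is <= cutoff (assumed
    convention)."""
    pairs = dist_closed.keys() & dist_draft.keys()
    if not pairs:
        raise ValueError("no isolate pairs in common")
    counts = {"equal": 0, "different": 0, "discordant": 0}
    for pair in pairs:
        a, b = dist_closed[pair], dist_draft[pair]
        if (a <= cutoff) != (b <= cutoff):
            counts["discordant"] += 1
        elif a == b:
            counts["equal"] += 1
        else:
            counts["different"] += 1
    return {key: round(100 * n / len(pairs)) for key, n in counts.items()}


# Hypothetical pairwise hqSNP distances and cutoff (illustration only).
closed = {("A", "B"): 3, ("A", "C"): 18, ("B", "C"): 25, ("A", "D"): 40}
draft = {("A", "B"): 3, ("A", "C"): 22, ("B", "C"): 27, ("A", "D"): 40}
summary = reference_impact(closed, draft, cutoff=20)
```

In this toy input the pair A/C moves from 18 to 22 differences and crosses the hypothetical cutoff of 20, so it is the only discordant pair. Pairs whose distances lie close to the cutoff are the ones most exposed to this effect.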
Discordant conclusions (%) obtained due to changing approaches for pairwise comparison between strains.
| Cluster | Reference genome | Pipeline | CFSAN | Lyve-SET | BN | wgMLST |
|---|---|---|---|---|---|---|
| 1 | Draft | CFSAN | – | 0 | 0 | 0 |
| 1 | Draft | Lyve-SET | 0 | – | 0 | 0 |
| 1 | Draft | BN | 0 | 0 | – | 0 |
| 1 | Draft | wgMLST | 0 | 0 | 0 | – |
| 1 | Closed | CFSAN | – | 0 | 0 | 0 |
| 1 | Closed | Lyve-SET | 0 | – | 0 | 0 |
| 1 | Closed | BN | 0 | 0 | – | 0 |
| 1 | Closed | wgMLST | 0 | 0 | 0 | – |
| 3a | Draft | CFSAN | – | 36 | 33 | 13 |
| 3a | Draft | Lyve-SET | 36 | – | 11 | 18 |
| 3a | Draft | BN | 33 | 11 | – | 18 |
| 3a | Draft | wgMLST | 13 | 18 | 18 | – |
| 3b | Draft | CFSAN | – | 7 | 14 | 16 |
| 3b | Draft | Lyve-SET | 7 | – | 9 | 17 |
| 3b | Draft | BN | 14 | 9 | – | 24 |
| 3b | Draft | wgMLST | 16 | 17 | 24 | – |
FIGURE 2. Maximum likelihood phylogenetic tree based on hqSNP analysis using the BioNumerics (BN) pipeline. The tree was constructed with RAxML using core hqSNPs identified within cluster 1 (A), sub-cluster 3a (B), and sub-cluster 3b (C). Bootstrap values greater than 70% are shown above the branches. Clades within (sub-) clusters are shown with the hqSNP ranges identified with the three hqSNP methods (CFSAN, Lyve-SET, and BN).