Assia Saltykova1,2, Véronique Wuyts1, Wesley Mattheus3, Sophie Bertrand3, Nancy H C Roosens1, Kathleen Marchal2,4,5, Sigrid C J De Keersmaecker1.
Abstract
Whole genome sequencing represents a promising new technology for subtyping of bacterial pathogens. Besides the technological advances that have pushed the approach forward, recent years have been marked by considerable evolution of whole genome sequencing data analysis methods. Before the technology can be applied as a routine epidemiological typing tool, however, reliable and efficient data analysis strategies need to be identified among the wide variety of methodologies that have emerged. In this work, we compared three existing SNP-based subtyping workflows using a benchmark dataset of 32 Salmonella enterica subsp. enterica serovar Typhimurium and serovar 1,4,[5],12:i:- isolates, including five isolates from a confirmed outbreak and three isolates obtained from the same patient at different time points. The analysis was carried out using the original (high-coverage) dataset and a down-sampled (low-coverage) dataset, and two different reference genomes. All three tested workflows, namely the CSI Phylogeny-based workflow, the CFSAN-based workflow and the PHEnix-based workflow, correctly grouped the confirmed outbreak isolates and the isolates from the same patient with all combinations of reference genomes and datasets. However, the workflows differed strongly with respect to the SNP distances between isolates and their sensitivity to sequencing coverage, which could be linked to the specific data analysis strategies used therein. To demonstrate the effect of particular data analysis steps, several modifications of the existing workflows were also tested. This allowed us to propose data analysis schemes most suitable for routine SNP-based subtyping of S. Typhimurium and S. 1,4,[5],12:i:-. The results presented in this study illustrate the importance of using correct data analysis strategies, and of defining benchmarks and fine-tuning the parameters applied within routine data analysis pipelines, to obtain optimal results.
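The abstract compares a high-coverage dataset with a down-sampled one. A common way to down-sample is to keep each read at random with probability equal to the target coverage divided by the observed coverage. The sketch below illustrates that idea under stated assumptions: reads are plain strings, coverage is estimated as total sequenced bases divided by a genome length you supply, and the helper names are hypothetical. It is not the tool or procedure used in the study.

```python
import random
from typing import Iterable, Sequence


def estimate_coverage(read_lengths: Iterable[int], genome_length: int) -> float:
    """Mean depth estimated as total sequenced bases / genome length."""
    return sum(read_lengths) / genome_length


def downsample_reads(reads: Sequence[str], genome_length: int,
                     target_coverage: float, seed: int = 0) -> list[str]:
    """Keep each read independently with probability target / observed coverage."""
    observed = estimate_coverage((len(r) for r in reads), genome_length)
    if observed <= target_coverage:
        return list(reads)  # already at or below the target: keep everything
    fraction = target_coverage / observed
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    return [r for r in reads if rng.random() < fraction]
```

For example, 1,000 reads of 150 bp on a hypothetical 1,500 bp genome give 100X, and down-sampling to 30X keeps roughly 300 reads. With real paired-end FASTQ data, both mates of a pair must be kept or dropped together.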
Year: 2018 PMID: 29408896 PMCID: PMC5800660 DOI: 10.1371/journal.pone.0192504
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Overview and classical typing data of S. Typhimurium and S. 1,4,[5],12:i:- isolates in the study.
| ID | Serovar | Isolation date | MOL-PCR profile | Phage type | MLVA profile | ST | Additional information |
|---|---|---|---|---|---|---|---|
| 11–0596 | 1,4,[5],12:i:- | 15/01/2011 | 15-1-1 | DT138 | 3-13-11-NA-211 | ST-34 | Outbreak |
| 11–1163 | 1,4,[5],12:i:- | 28/03/2011 | 15-1-1 | DT138 | 3-13-11-NA-211 | ST-34 | Outbreak |
| 11–1164 | 1,4,[5],12:i:- | 28/03/2011 | 15-1-1 | DT138 | 3-13-11-NA-211 | ST-34 | Outbreak |
| 11–1165 | 1,4,[5],12:i:- | 28/03/2011 | 15-1-1 | DT138 | 3-13-11-NA-211 | ST-34 | Outbreak |
| 11–1166 | 1,4,[5],12:i:- | 26/03/2011 | 15-1-1 | DT138 | 3-13-11-NA-211 | ST-34 | Outbreak |
| 11–0600 | Typhimurium | 04/02/2011 | 15-3-1 | RDNC | 3-14-11-NA-211 | ST-34 | Out-group outbreak |
| 11–1160 | Typhimurium | 10/04/2011 | 1.21×10¹⁸-2.58×10¹¹-4199 | DT104 | 3-14-18-14-311 | ST-19 | Out-group outbreak |
| S13BD00332 | Typhimurium | 02/03/2013 | 1.50×10⁸-1.69×10¹¹-1155 | ND | 3-14-14-5-311 | ST-19 | One patient |
| S13BD00591 | Typhimurium | 25/03/2013 | 1.50×10⁸-1.69×10¹¹-1155 | ND | 3-14-14-5-311 | ST-19 | One patient |
| S13BD00844 | Typhimurium | 19/04/2013 | 1.50×10⁸-1.69×10¹¹-1155 | ND | 3-14-14-5-311 | ST-19 | One patient |
| 12–2003 | 1,4,[5],12:i:- | 29/06/2012 | 15-1-1 | DT120 | 3-12-10-NA-211 | ST-34 | Background |
| 12–2203 | 1,4,[5],12:i:- | 11/07/2012 | 15-1-1 | DT120 | 3-12-10-NA-211 | ST-34 | Background |
| 12–2460 | 1,4,[5],12:i:- | 22/06/2012 | 15-1-1 | DT120 | 3-12-10-NA-211 | ST-34 | Background |
| 12–2455 | 1,4,[5],12:i:- | 26/07/2012 | 15-1-1 | DT193 | 3-12-10-NA-211 | ST-34 | Background |
| 12–2599 | 1,4,[5],12:i:- | 07/08/2012 | 15-1-1 | DT193 | 3-12-10-NA-211 | ST-34 | Background |
| 12–2730 | 1,4,[5],12:i:- | 14/08/2012 | 15-1-1 | DT193 | 3-12-10-NA-211 | ST-34 | Background |
| 12–1558 | 1,4,[5],12:i:- | 22/05/2012 | 15-1-1 | DT193 | 3-13-11-NA-211 | ST-34 | Background |
| 12–2314 | 1,4,[5],12:i:- | 17/07/2012 | 15-1-1 | DT193 | 3-13-11-NA-211 | ST-34 | Background |
| 12–2379 | 1,4,[5],12:i:- | 29/07/2012 | 15-1-1 | DT193 | 3-13-11-NA-211 | ST-34 | Background |
| 12–3792 | Typhimurium | 08/10/2012 | 15-3-1 | DT120 | 3-12-10-NA-211 | ST-34 | Background |
| 12–3907 | Typhimurium | 14/10/2012 | 15-3-1 | DT120 | 3-12-10-NA-211 | ST-34 | Background |
| 12–3990 | Typhimurium | 23/10/2012 | 15-3-1 | DT120 | 3-12-10-NA-211 | ST-34 | Background |
| 12–0084 | Typhimurium | 13/01/2012 | 15-3-1 | DT193 | 3-12-10-NA-211 | ST-34 | Background |
| 12–0161 | Typhimurium | 21/01/2012 | 15-3-1 | DT193 | 3-12-10-NA-211 | ST-34 | Background |
| 12–3663 | Typhimurium | 30/09/2012 | 15-3-1 | DT193 | 3-12-10-NA-211 | ST-34 | Background |
| 12–3558 | 1,4,[5],12:i:- | 25/09/2012 | 15-1-1 | DT138 | 3-12-11-NA-211 | ST-34 | Background |
| 12–3582 | 1,4,[5],12:i:- | 11/09/2012 | 15-1-1 | DT138 | 3-12-11-NA-211 | ST-34 | Background |
| 12–3583 | 1,4,[5],12:i:- | 11/09/2012 | 15-1-1 | DT138 | 3-12-11-NA-211 | ST-34 | Background |
| 12–2984 | 1,4,[5],12:i:- | 27/08/2012 | 15-1-1 | RDNC | 3-12-11-NA-211 | ST-34 | Background |
| 12–2998 | 1,4,[5],12:i:- | Not known | 15-1-1 | RDNC | 3-12-11-NA-211 | ST-34 | Background |
| 12–3067 | 1,4,[5],12:i:- | 04/09/2012 | 15-1-1 | RDNC | 3-12-11-NA-211 | ST-34 | Background |
| 12–3444 | Typhimurium | 22/09/2012 | 1.21×10¹⁸-2.58×10¹¹-4199 | DT104 | 3-16-16-13-311 | ST-19 | Background |
Background: background isolates; MLVA: multiple-locus variable-number of tandem repeats analysis; MOL-PCR: multiplex oligonucleotide ligation-PCR; NA: absence of a PCR amplicon in MLVA; ND: not determined, as phage typing was replaced by MLVA at the NRCSS; One patient: isolates obtained from one patient at different time points; Outbreak: isolates related to an epidemiologically confirmed outbreak; Out-group outbreak: out-group isolates that were collected during the same period as the outbreak and that were used in the original outbreak investigation; RDNC: reacts but does not conform, used for isolates that give lysis reactions with the bacteriophages, but whose reactions do not match any of the patterns that define a certain phage type [44]; ST: sequence type
(*) background isolates of identical phage type, and with an identical or a frequently occurring MOL-PCR profile as observed for the outbreak and out-group isolates
(**) background isolates with frequently occurring combinations of MOL-PCR profiles, phage types and MLVA profiles.
Fig 1. Schematic representation of the tested SNP-based subtyping workflows.
Left: data analysis steps of a SNP-based subtyping workflow; Right: details of variant calling and SNP matrix construction steps of the tested SNP-based subtyping workflows. AF: allele frequency, COV: coverage, GQ: genotype quality, MQ: mapping quality, Rel. COV: relative coverage.
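The filters abbreviated in the Fig 1 legend (COV, AF, GQ, MQ, Rel. COV) act as per-site checks on each candidate SNP. The sketch below shows how such checks combine; it is illustrative only. All threshold values are hypothetical placeholders rather than the parameters of CSI Phylogeny, CFSAN or PHEnix, AF is assumed to be the fraction of reads supporting the called allele, and relative coverage is assumed to be site depth divided by the isolate's mean depth.

```python
from dataclasses import dataclass


@dataclass
class SiteCall:
    """One candidate SNP position in one isolate (fields follow the Fig 1 abbreviations)."""
    depth: int               # COV: number of reads covering the site
    alt_fraction: float      # AF: assumed fraction of reads supporting the called allele
    genotype_quality: float  # GQ
    mapping_quality: float   # MQ: e.g. mean mapping quality of the covering reads


@dataclass
class Thresholds:
    """Hypothetical cut-offs, for illustration only."""
    min_depth: int = 10
    min_af: float = 0.9
    min_gq: float = 30.0
    min_mq: float = 30.0
    min_rel_cov: float = 0.1  # site depth relative to the isolate's mean depth


def passes_filters(call: SiteCall, mean_depth: float, t: Thresholds) -> bool:
    """Return True only if the call survives every filter."""
    rel_cov = call.depth / mean_depth if mean_depth > 0 else 0.0
    return (call.depth >= t.min_depth
            and call.alt_fraction >= t.min_af
            and call.genotype_quality >= t.min_gq
            and call.mapping_quality >= t.min_mq
            and rel_cov >= t.min_rel_cov)
```

An absolute depth cut-off removes more sites as coverage drops, which is one plausible way a workflow's SNP count can become sensitive to down-sampling.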
Fig 2. Phylogenetic tree generated with the tested SNP-based subtyping workflows.
The workflows were run using the original dataset and LT2 as the reference genome. (A) CSI-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based workflow. Isolates are coloured according to the MLVA profile. The minimal and maximal SNP distances observed among the five outbreak isolates and among the three isolates obtained from the same patient are indicated near the respective clusters. The two stably recurring groups of isolates mentioned in the text consist of (1) 12–3582, 12–3583, 12–2998, 12–2984, 12–3558, 12–3067 (yellow) and (2) 12–2314, 12–2460, 12–2599, 12–2455, 12–2379, 12–1558 (green and red). The trees are drawn to scale, with branch lengths measured in the number of substitutions per site. The scale axis is provided below each tree. BS: bootstrap values.
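The minimal and maximal SNP distances shown next to a cluster are the extremes of all pairwise distances between the isolates of that cluster. Assuming a distance matrix stored as a nested dictionary keyed by isolate ID (a hypothetical layout), the range can be computed as below. The isolate labels and distances in the test are made up.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple

DistMatrix = Dict[str, Dict[str, int]]  # assumed layout: dist[a][b] = SNP distance


def within_group_range(dist: DistMatrix, group: Iterable[str]) -> Tuple[int, int]:
    """Minimal and maximal pairwise SNP distance among the isolates in `group`."""
    members = list(group)
    if len(members) < 2:
        raise ValueError("need at least two isolates to form a pair")
    values = [dist[a][b] for a, b in combinations(members, 2)]
    return min(values), max(values)
```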
Performance metrics of the tested SNP-based subtyping workflows.
| Metric | CSI-based workflow, OD | CSI-based workflow, 30X | PHEnix-based workflow, OD | PHEnix-based workflow, 30X | CFSAN-based workflow, OD | CFSAN-based workflow, 30X |
|---|---|---|---|---|---|---|
| | 100% | 100% | 100% | 100% | 100% | 100% |
| | 999 | 537 | 1649 | 1056 | 2008 | 1732 |
| | 28 | 23 | 29 | 28 | 32 | 32 |
| DP | 0.992 | 0.972 | 0.994 | 0.990 | 1.00 | 1.00 |
| | 0.982–1.00 | 0.947–0.996 | 0.985–1.00 | 0.976–1.00 | 1.00–1.00 | 1.00–1.00 |
| SNPs outbreak isolates | 0–3 | 0–2 | 1–3 | 1–3 | 2–14 | 7–17 |
| SNPs isolates from one patient | 2–4 | 0 | 3–6 | 1–4 | 12–22 | 16–23 |
Performance metrics of the workflows were assessed using the original dataset (OD) and the dataset down-sampled to 30X coverage (30X), with LT2 as the reference genome. DP: discriminative power. SNPs outbreak isolates: minimal and maximal number of SNPs observed between the outbreak isolates. SNPs isolates from one patient: minimal and maximal number of SNPs observed between isolates obtained from the same patient.
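The table does not give the DP formula. Assuming DP is Simpson's index of diversity in the Hunter-Gaston form commonly used for typing methods, it is the probability that two isolates drawn at random without replacement fall into different types: $D = 1 - \frac{1}{N(N-1)}\sum_j n_j(n_j-1)$. The sketch computes it from a list of type labels, one per isolate. The test uses a hypothetical partition of 32 isolates into 28 types (four pairs plus 24 singletons) that yields 0.992; this partition is chosen for illustration and is not taken from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def simpsons_dp(type_labels: Sequence[Hashable]) -> float:
    """Hunter-Gaston discriminatory power from one type label per isolate."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    # Sum of n_j * (n_j - 1) over all types j: ordered same-type pairs.
    same_type_pairs = sum(c * (c - 1) for c in Counter(type_labels).values())
    return 1.0 - same_type_pairs / (n * (n - 1))
```

When every isolate has its own type, no pair shares a type and DP is exactly 1.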
Fig 3. SNP distance matrices generated with the tested SNP-based subtyping workflows.
The workflows were run using the original dataset and LT2 as a reference genome. (A) CSI Phylogeny-based workflow, (B) PHEnix-based workflow, (C) CFSAN-based workflow. Colours indicate pairwise SNP distances between isolates. Outbreak isolates are shown in bold and isolates obtained from the same patient are underlined.
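A pairwise SNP distance matrix like the one in Fig 3 can be derived by counting differing positions between isolates in a SNP alignment. This is a minimal sketch under assumptions: each isolate is a string of called bases over the same SNP positions, and a position with a missing or ambiguous call ('N' or '-', assumed symbols) in either isolate is skipped for that pair. The tested workflows may handle such positions differently, for example by dropping them for all isolates, which changes the resulting distances.

```python
from itertools import combinations
from typing import Dict

MISSING = {"N", "-"}  # assumed symbols for missing or ambiguous calls


def snp_distance(a: str, b: str) -> int:
    """Count positions where both isolates have a call and the calls differ."""
    if len(a) != len(b):
        raise ValueError("SNP sequences must cover the same positions")
    return sum(1 for x, y in zip(a, b)
               if x not in MISSING and y not in MISSING and x != y)


def distance_matrix(snps: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Symmetric pairwise SNP distance matrix keyed by isolate ID."""
    ids = list(snps)
    dist = {i: {i: 0} for i in ids}  # zero distance on the diagonal
    for i, j in combinations(ids, 2):
        d = snp_distance(snps[i], snps[j])
        dist[i][j] = dist[j][i] = d
    return dist
```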