
Impact of the choice of reference genome on the ability of the core genome SNV methodology to distinguish strains of Salmonella enterica serovar Heidelberg.

Valentine Usongo1,2, Chrystal Berry3, Khadidja Yousfi1, Florence Doualla-Bell1, Genevieve Labbé4, Roger Johnson4, Eric Fournier1, Celine Nadon3, Lawrence Goodridge2, Sadjia Bekal1,5.   

Abstract

Salmonella enterica serovar Heidelberg (S. Heidelberg) is one of the top serovars causing human salmonellosis. The core genome single nucleotide variant pipeline (cgSNV) is one of several whole genome based sequence typing methods used for the laboratory investigation of foodborne pathogens. SNV detection using this method requires a reference genome. The purpose of this study was to investigate the impact of the choice of the reference genome on the cgSNV-informed phylogenetic clustering and inferred isolate relationships. We found that using a draft or closed genome of S. Heidelberg as reference did not impact the ability of the cgSNV methodology to differentiate among 145 S. Heidelberg isolates involved in foodborne outbreaks. We also found that using a distantly related genome such as S. Dublin as choice of reference led to a loss in resolution since some sporadic isolates were found to cluster together with outbreak isolates. In addition, the genetic distances between outbreak isolates as well as between outbreak and sporadic isolates were overall reduced when S. Dublin was used as the reference genome as opposed to S. Heidelberg.


Year:  2018        PMID: 29401524      PMCID: PMC5798827          DOI: 10.1371/journal.pone.0192233

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Nontyphoidal Salmonella enterica (NTS) serovars are the most important causes of bacterial gastroenteritis. Among the NTS serovars, Heidelberg ranks as the second and third most frequent serovar recovered from clinical cases in Québec and Canada, respectively [1]. In Québec between 2004 and 2014, 23% of S. Heidelberg clinical isolates were from blood specimens, compared to 7% for S. enterica serovar Enteritidis and 5% for S. enterica serovar Typhimurium, suggesting an increased capacity of this serovar to cause invasive systemic disease [2]. Pulsed-field gel electrophoresis (PFGE) has been the gold standard method used by PulseNet Canada (PNC) since the 1990s for the molecular typing of Salmonella during outbreak investigations. However, a major drawback of PFGE in outbreak investigation is its low resolving power, which is further exacerbated when it is applied to S. Heidelberg owing to the extremely low genetic diversity of this serovar. For example, 70% of S. Heidelberg isolated in Québec belonged to pulsovar 2 [2]. Owing to their growing availability and high genomic resolution, whole genome sequence (WGS) based methods are rapidly replacing traditional typing methods such as PFGE within major public health laboratories including PNC [3]. Two popular methods that are increasingly applied in the field of bacterial genomic epidemiology are the gene-by-gene methods, which extend the 7-gene MLST typing technique to encompass the entire genome (whole genome MLST, wgMLST) or just the core genome (core genome MLST, cgMLST) [4, 5], and the single nucleotide variant (SNV) methods, which identify single nucleotide variants by comparing a population of target genomes against a reference [6, 7]. We recently found that the cgSNV method provided greater discriminatory power than traditional methods during outbreak investigations involving Salmonella Heidelberg [2].
The choice of the reference genome has previously been proposed as a potential consideration affecting core genome SNV (cgSNV)–based analysis and outcomes. For example, choosing a distantly related strain as the reference may tend to cluster isolates that are otherwise genetically distant. Another concern is the sequencing status of the reference genome. It is generally perceived that high-quality complete genomes are preferred to ensure accurate and epidemiologically concordant phylogenetic analysis and outbreak investigation. These concerns were not addressed in our previous work on the cgSNV method [2]. Here, using draft (de novo assembled) and completely sequenced (closed) genomes of S. Heidelberg, as well as a distantly related S. Dublin genome, as references, we assessed the ability of the cgSNV methodology to differentiate among 145 S. Heidelberg strains involved in four distinct outbreaks and sporadic cases of salmonellosis in Québec.

Materials and methods

Collection and characterization of bacterial isolates

The 145 S. Heidelberg clinical isolates described in this study were collected as part of the Québec surveillance program on human salmonellosis, established in 2003 to ensure rapid detection of outbreaks. The food isolates were collected by the Ministère de l'Agriculture, des Pêcheries et de l'Alimentation du Québec (MAPAQ) during routine food-poisoning investigations. Isolates were grown on triple sugar iron agar at 37°C and stored at -80°C in trypticase soy spiked with 10% glycerol. PFGE and serotyping were performed at the Laboratoire de Santé Publique du Québec (LSPQ) following PNC guidelines.

Whole genome sequencing

Frozen bacterial isolates were cultured overnight at 37°C in brain heart infusion broth and genomic DNA was extracted using the Metagenomic DNA isolation Kit for Water (Epicentre, Madison, WI). Samples were prepared using Nextera XT chemistry (Illumina, Inc., San Diego, CA) and sequenced using Illumina MiSeq paired-end read technology with 300-base read lengths. Five strains were selected from the outbreak isolates to serve as references and their reads were de novo assembled using SPAdes v. 3.9 [8]. The complete genome counterparts of these strains have been reported previously [9]. Two draft and three complete, unrelated reference genomes were also downloaded from NCBI and included in this analysis, making a total of 15 assessed reference genomes (Table 1).
Table 1

General features of the strains used as reference genomes for the cgSNV analysis of 145 S. Heidelberg isolates.

Strain ID | Genome status | NCBI accession no. | Source | Serovar | Genome used as reference in tree number
ID117795 | Draft | NA | Human | Heidelberg | 1
ID117795 | Completely sequenced | CP016507 | Human | Heidelberg | 2
ID128787 | Draft | NA | Human | Heidelberg | 3
ID128787 | Completely sequenced | CP016586 | Human | Heidelberg | 4
ID128902 | Draft | NA | Human | Heidelberg | 5
ID128902 | Completely sequenced | CP016579 | Human | Heidelberg | 6
ID134609 | Draft | NA | Human | Heidelberg | 7
ID134609 | Completely sequenced | CP016581 | Human | Heidelberg | 8
ID135140 | Draft | NA | Food | Heidelberg | 9
ID135140 | Completely sequenced | CP016510 | Food | Heidelberg | 10
SL486 | Draft | NZ_ABEL00000000 | Human | Heidelberg | 11
CFSAN024776 | Draft | NC_JWQE00000000 | Human | Heidelberg | 12
SL476 | Completely sequenced | NC_011083 | Ground turkey | Heidelberg | 13
B182 | Completely sequenced | CP003416 | Bovine feces | Heidelberg | 14
SL477 | Completely sequenced | CP001144 | Human | Dublin | NA

NA, Not available.


CgSNV typing

CgSNV analysis was performed using the SNVPhyl pipeline [10] v.1.0 integrated within the NML instance of the Galaxy platform [11]. Briefly, paired-end sequence reads from the 145 isolates were aligned against each of the 15 reference genomes using SMALT v.0.7.5 (http://www.sanger.ac.uk/science/tools/smalt-0). MUMmer v.3.23 [12] and PHAST [13] were used to identify repeat and prophage regions, respectively, in each reference genome, and these regions were excluded from the analysis. Variants were called using two independent variant calling algorithms, FreeBayes v.0.9.20 and SAMtools [14]/BCFtools, based on predefined criteria described elsewhere [10]. To infer the relationships between these isolates, minimum spanning trees were constructed from the SNVPhyl output data using the geoBURST algorithm built into PHYLOViZ v2.0 [15].
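The tree-building step can be illustrated with a minimal sketch. It does not reproduce geoBURST or PHYLOViZ: it uses plain Prim's algorithm, which yields a minimum spanning tree but applies none of geoBURST's tie-breaking rules. The isolate names and the SNV distance matrix are hypothetical and serve only to show the idea.

```python
def minimum_spanning_tree(names, dist):
    """Return MST edges (isolate_a, isolate_b, snv_distance) via Prim's algorithm.

    names: list of isolate labels
    dist:  symmetric matrix (list of lists) of pairwise SNV counts
    """
    n = len(names)
    in_tree = {0}          # start from the first isolate
    edges = []
    while len(in_tree) < n:
        best = None
        # Look for the cheapest edge that links the tree to a new isolate.
        for i in sorted(in_tree):
            for j in range(n):
                if j in in_tree:
                    continue
                if best is None or dist[i][j] < best[2]:
                    best = (i, j, dist[i][j])
        i, j, d = best
        in_tree.add(j)
        edges.append((names[i], names[j], d))
    return edges


# Hypothetical isolates and SNV distances, for illustration only.
names = ["A", "B", "C", "D"]
dist = [
    [0, 1, 50, 52],
    [1, 0, 49, 51],
    [50, 49, 0, 2],
    [52, 51, 2, 0],
]
mst_edges = minimum_spanning_tree(names, dist)
total_weight = sum(d for _, _, d in mst_edges)
```

In this toy matrix, isolates A and B (1 SNV apart) and C and D (2 SNVs apart) form two tight groups joined by a single long edge, which is the pattern expected when outbreak clusters are well separated.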

Topological similarity

We assessed the topological similarity of the phylogenetic trees using the Robinson and Foulds (RF) test [16]. This widely used metric for tree-to-tree distances is defined as the minimum number of operations needed to transform one tree into the other. Briefly, Newick tree files from the SNVPhyl pipeline generated using each of the 14 S. Heidelberg genomes as references were concatenated and the resulting file was submitted to the online phylogenetic tool T-REX [17] to compute the topological distances between the trees.
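As a rough illustration of the RF metric, the sketch below computes it in its usual split-based form: the number of non-trivial bipartitions present in one unrooted tree but not in the other. This is not the T-REX implementation. Trees are written as nested tuples of leaf names rather than Newick strings, and the two example trees are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits induced by the internal edges of a tree."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixed leaf used to give each split one canonical side
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - clade
        if len(clade) > 1 and len(other) > 1:
            # Store the side that does not contain the anchor leaf.
            splits.add(other if anchor in clade else clade)
        return clade

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-leaf trees, for illustration only.
tree_1 = (("A", "B"), "C", ("D", "E"))
tree_2 = (("A", "C"), "B", ("D", "E"))
rf_same = robinson_foulds(tree_1, tree_1)
rf_diff = robinson_foulds(tree_1, tree_2)
```

A distance of 0 means the two trees contain exactly the same splits, which is how the draft/closed tree pairs with RF = 0 should be read.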

Nucleotide sequence accession numbers

The sequence data supporting the results of this article have been deposited in the NCBI Sequence Read Archive under accession number SRP098783.

Results

Epidemiological characteristics and PFGE subtyping results of the 145 S. Heidelberg isolates

Epidemiologic and PFGE fingerprinting results of the 145 S. Heidelberg isolates used in this study are presented in Table 2.
Table 2

Epidemiologic and subtyping results of the 145 S. Heidelberg clinical and food isolates used in this study.

Isolate No.SourceIsolation dateOutbreak codePulsotypePhage typeNCBI accession no.
ID117793Human05–20121219SH12-001
ID117794Human05–20121219SH12-002
ID117795Human05–20121219SH12-003
ID117796Human05–20121219SH12-004
ID117797Human05–20121219SH12-005
ID117798Human05–20121219SH12-006
ID117799Human05–20121219SH12-007
ID117800Human05–20121219SH12-008
ID118040Food05–20121219SH12-009
ID117870Food05–20121219SH12-010
ID128696Human11–20132226SH13-001
ID128783Human11–20132226SH13-002
ID128786Human11–20132226SH13-003
ID128787Human11–20132226SH13-004
ID128808Human11–20132226SH13-005
ID128902Human11–20132226SH13-006
ID128908Human11–20132226SH13-007
ID128910Human11–20132226SH13-008
ID134557Human07–20143219SH14-001
ID134930Human08–20143219SH14-002
ID134612Human08–20143219SH14-003
ID134421Human08–20143219SH14-004
ID134719Human08–20143219SH14-005
ID134608Human08–20143219SH14-006
ID135122Human08–20143219SH14-007
ID134610Human08–20143219SH14-008
ID134609Human08–20143219SH14-009
ID134565Human08–20143217SH14-010
ID134559Human08–20143217SH14-011
ID134929Human08–201432ATHE-35SH14-012
ID134879Food08–20143219SH14-013
ID134880Food08–20143219SH14-014
ID134881Food08–20143219SH14-015
ID134882Food08–20143219SH14-016
ID134883Food08–20143219SH14-017
ID134884Food08–20143219SH14-018
ID134885Food08–20143219SH14-019
ID134886Food08–20143219SH14-020
ID134887Food08–20143219SH14-021
ID134888Food08–20143219SH14-022
ID134889Food08–20143219SH14-023
ID134890Food08–20143219SH14-024
ID135137Food08–20143219SH14-025
ID135138Food08–20143219SH14-026
ID135139Food08–20143219SH14-027
ID135140Food08–20143219SH14-028
ID148030Human03–20164219SRR5228105
ID148149Human03–20164219SRR5228097
ID148230Human03–20164219SRR5228082
ID148231Human03–20164219SRR5228079
ID148280Human03–20164219SRR5228104
ID148286Human03–20164219SRR5228087
ID148337Human03–20164219SRR5228078
ID148338Human03–20164219SRR5228091
ID094525Human12–2007NA219SRR5227118
ID095996Human04–2008NA311SRR5227171
ID097320Human07–2008NA219SRR5227121
ID099254Human10–2008NA229SRR5227124
ID099787Human12–2008NA219SRR5228101
ID100344Human01–2009NA219SRR5227119
ID100753Human02–2009NA219SRR5227155
ID101488Human04–2009NA12216SRR5227148
ID102666Human07–2009NA226SRR5227120
ID102743Human08–2009NA219SRR5227166
ID102860Human08–2009NA1719SRR5227126
ID102963Human08–2009NA226SRR5228093
ID103472Human09–2009NA219SRR5227163
ID103849Human10–2009NA12SRR5227117
ID103978Human10–2009NA13816SRR5227128
ID104279Human11–2009NA1401SRR5227146
ID104398Human12–2009NA12SRR5227169
ID105089Human02–2010NA632SRR5227122
ID105144Human02–2010NA219SRR5227127
ID106827Human06–2010NA8732SH12-013
ID107176Human07–2010NA229SRR5228100
ID107454Human07–2010NA12SRR5227152
ID108191Human08–2010NA219SH10-001
ID108221Human08–2010NA226SH10-014
ID108759Human09–2010NA8626SRR5227162
ID108677Human09–2010NA226SH10-015
ID110275Human01–2011NA16535SRR5227139
ID110331Human01–2011NA10722SRR5227156
ID110403Human01–2011NA219SRR5227141
ID110674Human02–2011NA168atypicalSRR5227174
ID110801Human02–2011NA226SH11-002
ID111466Human04–2011NA219SRR5227167
ID113160Human08–2011NA6629SRR5227130
ID113273Human08–2011NA17547SRR5227140
ID113787Human09–2011NA178atypicalSRR5227173
ID114520Human10–2011NA229SRR5227135
ID114593Human11–2011NA229SRR5227157
ID115377Human12–2011NA219SRR5227165
ID115568Human01–2012NA8629SRR5227133
ID116136Human02–2012NA229SRR5227164
ID116271Human02–2012NA8732SH10-014
ID116824Human03–2012NA107ATHE-10SRR5227129
ID117211Human04–2012NA229SRR5227151
ID117095Human04–2012NA229SRR5227172
ID117506Human04–2012NA219SH10-002
ID117683Human04–2012NA219SRR5227143
ID117578Human05–2012NA219SH12-011
ID118209Human05–2012NA45SRR5227123
ID118236Human05–2012NA8629SRR5227154
ID118280Human05–2012NA107ATHE-10SRR5227150
ID118551Human06–2012NA5210SRR5227136
ID118532Human06–2012NA5210SRR5227175
ID118700Human06–2012NA5210SRR5227158
ID118759Human06–2012NA218SH12-012
ID118979Human07–2012NA219SRR5227159
ID119224Human07–2012NA229SRR5227125
ID119366Human07–2012NA18610SRR5227138
ID119464Human08–2012NA217SRR5227145
ID119539Human08–2012NA219SRR5227147
ID119674Human08–2012NA219SRR5227170
ID119888Human08–2012NA5210SRR5227161
ID119967Human08–2012NA219SRR5227144
ID120403Human09–2012NA219SRR5227160
ID120598Human09–2012NA219SRR5227132
ID120747Human09–2012NA219SRR5227131
ID121956Human11–2012NA18910SRR5227149
ID122356Human12–2012NA219SRR5227176
ID124024Human03–2013NA219SRR5227168
ID124305Human04–2013NA229SRR5227153
ID124498Human04–2013NA435SRR5227142
ID125378Human06–2013NA219SRR5227137
ID126392Human07–2013NA219SRR5227134
ID126712Human08–2013NA219SRR5228099
ID126777Human08–2013NA219SRR5228083
ID126776Human08–2013NA219SRR5228080
ID126696Human08–2013NA219SRR5228081
ID126825Human08–2013NA229SRR5228085
ID147047Human02–2016NA45SRR5228102
ID147120Human02–2016NA23132SRR5228084
ID147091Human02–2016NA5210SRR5228088
ID147129Human02–2016NA219SRR5228092
ID147253Human02–2016NA19429SRR5228086
ID147255Human02–2016NA22910SRR5228090
ID147457Human02–2016NA22519SRR5228089
ID147462Human02–2016NA21429SRR5228094
ID147990Human03–2016NA5210SRR5228106
ID147796Human03–2016NA5210SRR5228095
ID147816Human03–2016NA219SRR5228103
ID147910Human03–2016NA21429SRR5228077
ID147841Human03–2016NA219SRR5228096
ID148066Human03–2016NA219SRR5228098

NA, Not available.

Isolates from four distinct outbreaks that occurred in Québec between 2012 and 2016 were also included in the analysis. These outbreaks were designated as follows: outbreak 1, 2012 (n = 10; 8 human and 2 food isolates); outbreak 2, 2013 (n = 8 human isolates); outbreak 3, 2014 (n = 28; 12 human and 16 food isolates); and outbreak 4, 2016 (n = 8 human isolates). All human cases and food items linked to these outbreaks were confirmed by epidemiological data. Outbreak 1 was linked to a wedding party, outbreaks 2 and 3 were traced to separate restaurants and outbreak 4 was associated with a daycare catering service. In addition to the outbreak isolates (n = 54), we included 91 sporadic clinical isolates collected in Québec between 2007 and 2016.

Whole genome sequencing results

An average of 983,919 reads was obtained per isolate (range, 339,270–2,974,917) for the set of 145 S. Heidelberg isolates, corresponding to an estimated average genome coverage of 121x (range, 42x–365x). The number of SPAdes-assembled contigs for the five outbreak isolates selected to act as references ranged from 24 to 27 (Table 3). The completely sequenced genome equivalents of these isolates have been published in a previous study [9].
Table 3

Assembly statistics for the 5 S. Heidelberg isolates that served as reference genomes.

Strain | Total length (bp) | No. of contigs | N50 (bp) | Coverage
ID117795 | 4,751,241 | 24 | 694,161 | 37
ID128787 | 4,747,971 | 27 | 363,273 | 43
ID128902 | 4,746,565 | 27 | 412,159 | 64
ID134609 | 4,853,519 | 27 | 412,096 | 130
ID135140 | 4,753,550 | 27 | 412,162 | 116
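
For readers unfamiliar with the N50 statistic reported in Table 3, the following sketch shows the standard calculation: the contig length at which the running total of contigs, sorted from longest to shortest, first reaches half of the assembly length. The contig lengths are hypothetical; the per-contig lengths of the assemblies in this study are not given here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    The N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths (bp), for illustration only.
example_lengths = [500, 400, 300, 200, 100]
example_n50 = n50(example_lengths)
```

Here the total is 1,500 bp, so the half-way point of 750 bp is reached within the second-longest contig, giving an N50 of 400 bp.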

Core genome single nucleotide analysis

After removing repeats and prophages as well as SNV-dense regions from all the reference genomes, an average of 4,008,254 genomic positions (range 3,681,444–4,049,343) representing an average of 86% of the reference genomes (range 84.63–86.72%) had sufficient coverage (≥15x) across all 145 isolates for reference mapping. For all the S. Heidelberg reference genomes, an average of 769 high-quality consensus SNVs (range 751–819) were identified by both variant callers as common to all isolates and used for subsequent phylogenetic clustering. For the distantly related S. Dublin genome, 18,155 high-quality core genome SNVs were used to construct the phylogeny. In total, 15 minimum spanning trees were generated with 14 of these trees representing the 14 S. Heidelberg reference genomes. All outbreak isolates formed distinct clusters with all the 14 S. Heidelberg reference genomes and the topologies of these trees were highly similar (Fig 1A and 1B).
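
The core-position filter described above can be sketched as follows. This is not SNVPhyl's code, only a minimal illustration of the stated rule: a reference position is kept when it is not masked (repeat, prophage or SNV-dense region) and every isolate has at least 15x coverage there. The isolate names, depth values and masked positions are hypothetical.

```python
MIN_DEPTH = 15  # coverage threshold stated in the text


def core_positions(depth_by_isolate, excluded=frozenset(), min_depth=MIN_DEPTH):
    """Return reference positions covered at >= min_depth in every isolate.

    depth_by_isolate: dict mapping isolate name -> list of per-position depths
    excluded:         positions masked as repeat, prophage or SNV-dense regions
    """
    n_positions = len(next(iter(depth_by_isolate.values())))
    core = []
    for pos in range(n_positions):
        if pos in excluded:
            continue
        if all(depths[pos] >= min_depth for depths in depth_by_isolate.values()):
            core.append(pos)
    return core


# Hypothetical read depths over a six-position reference, for illustration only.
depths = {
    "iso1": [30, 40, 12, 25, 50, 18],
    "iso2": [22, 35, 20, 15, 45, 60],
    "iso3": [18, 14, 33, 27, 41, 16],
}
masked = {4}  # e.g. a position inside a repeat region
core = core_positions(depths, excluded=masked)
core_fraction = len(core) / 6
```

The ratio of core positions to reference length corresponds to the "% of genomic positions" figures reported in the results; a single poorly covered isolate is enough to drop a position from the core.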
Fig 1

Minimum spanning phylogenetic tree of the core genome of 145 S. Heidelberg sequenced isolates generated using, as an example, the A) draft or B) closed reference genome of ID117795.

Isolates in the same circle have 0 hqSNVs and the size of each circle is proportional to the number of isolates in the circle.

Using S. Dublin as the reference genome led to a loss in resolving power. In fact, sporadic isolates clustered with outbreak 1 isolates (Fig 2).
Fig 2

Minimum spanning phylogenetic tree of the core genome of 145 S. Heidelberg sequenced isolates generated using the distantly related reference S. Dublin (SL477).

Isolates in the same circle have 0 hqSNVs and the size of each circle is proportional to the number of isolates in the circle.

The phylogenetic features revealed by the minimum spanning trees were nearly identical across all the 14 S. Heidelberg reference genomes (Table 4).
Table 4

Phylogenetic observations of the minimum spanning tree topology built from cgSNV analysis of 145 S. Heidelberg isolates and comparison of draft and closed genomes as references.

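Reference groups: ID117795, ID128787, ID128902, ID134609 and ID135140 were selected from within the outbreak isolates; SL486, JWQE01.1, SL476 and CP003416.1 were downloaded from NCBI; CP001144 is the distantly related genome downloaded from NCBI.

Minimum spanning tree features | ID117795 Draft | ID117795 Closed | ID128787 Draft | ID128787 Closed | ID128902 Draft | ID128902 Closed | ID134609 Draft | ID134609 Closed | ID135140 Draft | ID135140 Closed | SL486 Draft | JWQE01.1 Draft | SL476 Closed | CP003416.1 Closed | CP001144 Closed
Total # of nodes on the tree | 92 | 92 | 92 | 92 | 92 | 92 | 93 | 93 | 93 | 93 | 92 | 93 | 93 | 93 | 86
Total # of nodes representing 1 isolate | 79 | 79 | 79 | 79 | 79 | 79 | 80 | 80 | 80 | 80 | 78 | 79 | 80 | 80 | 72
Total # of nodes with more than one isolate | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 14 | 13 | 13 | 15
Total # of OB nodes | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 10
Total # of nodes with one isolate | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 6
Total # of OB nodes with more than one isolate | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4
Total # of sporadic nodes | 79 | 79 | 79 | 79 | 79 | 79 | 80 | 80 | 80 | 80 | 78 | 79 | 80 | 80 | 76
Total # of sporadic nodes linked to main OB clusters | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10
Number of sites used to generate phylogeny | 762 | 762 | 761 | 760 | 762 | 760 | 766 | 761 | 764 | 763 | 777 | 799 | 751 | 819 | 18,155
Number of genomic positions used for reference mapping | 4,046,580 | 4,037,652 | 4,046,584 | 4,037,362 | 4,047,586 | 4,037,102 | 4,049,343 | 4,037,849 | 4,046,484 | 4,037,788 | 4,008,004 | 4,048,063 | 3,925,643 | 4,036,335 | 3,681,444
% of genomic positions representing the genome | 86.04% | 86.68% | 86.07% | 86.7% | 86.08% | 86.69% | 85% | 86.72% | 86% | 86.72% | 85.95% | 86.12% | 85.72% | 86.67% | 84.63%
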
The genetic distances observed between outbreak isolates as well as between outbreak and sporadic isolates were nearly identical across the 14 S. Heidelberg reference genomes. Using S. Dublin as the reference genome led to a marked reduction in genetic distances between sporadic and outbreak isolates, with some sporadic isolates differing by as few as 0 and 3 SNVs from outbreak 1 and outbreak 3 isolates, respectively, whereas the genetic distances between isolates within these outbreaks ranged from 0 to 3 SNVs (Table 5).
Table 5

Comparison of the number of high quality SNVs between 145 S. Heidelberg sporadic and outbreak isolates using a draft, closed and distantly related reference genomes.

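Draft and Closed indicate the status of the reference genome. Each cell gives the range of high-quality SNVs between the outbreak named in the row and the group named in the column.

Reference(a) | Outbreak | Outbreak 1 Draft | Outbreak 1 Closed | Outbreak 2 Draft | Outbreak 2 Closed | Outbreak 3 Draft | Outbreak 3 Closed | Outbreak 4 Draft | Outbreak 4 Closed | Sporadic Draft | Sporadic Closed
ID117795 | 1 | 0–3 | 0–3 | 72–74 | 71–73 | 48–51 | 48–52 | 18–21 | 18–21 | 1–93 | 1–93
ID117795 | 2 | 72–74 | 71–73 | 0 | 0 | 66–67 | 65–67 | 80–81 | 79–80 | 4–105 | 4–104
ID117795 | 3 | 48–51 | 48–52 | 66–67 | 65–67 | 0–2 | 0–3 | 56–58 | 56–59 | 6–86 | 6–87
ID117795 | 4 | 18–21 | 18–21 | 80–81 | 79–80 | 56–58 | 56–59 | 0–1 | 0–1 | 10–100 | 10–100
ID128787 | 1 | 0–3 | 0–3 | 71–73 | 71–73 | 47–50 | 47–50 | 17–20 | 17–20 | 1–92 | 1–92
ID128787 | 2 | 71–73 | 71–73 | 0 | 0 | 66–67 | 66–67 | 80–81 | 80–81 | 4–105 | 4–105
ID128787 | 3 | 47–50 | 47–50 | 66–67 | 66–67 | 0–2 | 0–2 | 56–58 | 56–58 | 6–86 | 6–86
ID128787 | 4 | 17–20 | 17–20 | 80–81 | 80–81 | 56–58 | 56–58 | 0–1 | 0–1 | 10–100 | 10–100
ID128902 | 1 | 0–3 | 0–3 | 72–74 | 71–73 | 48–52 | 47–50 | 17–20 | 17–20 | 1–92 | 1–92
ID128902 | 2 | 72–74 | 71–73 | 0 | 0 | 66–68 | 66–67 | 81–82 | 80–81 | 4–105 | 4–105
ID128902 | 3 | 48–52 | 47–50 | 66–68 | 66–67 | 0–3 | 0–2 | 57–60 | 56–58 | 6–86 | 6–86
ID128902 | 4 | 17–20 | 17–20 | 81–82 | 80–81 | 57–60 | 56–58 | 0–1 | 0–1 | 10–101 | 10–100
ID135140 | 1 | 0–3 | 0–3 | 72–74 | 71–73 | 48–52 | 47–51 | 17–20 | 17–20 | 1–93 | 1–92
ID135140 | 2 | 72–74 | 71–73 | 0 | 0 | 66–68 | 66–68 | 81–82 | 80–81 | 4–105 | 4–105
ID135140 | 3 | 48–52 | 47–51 | 66–68 | 66–68 | 0–3 | 0–3 | 58–60 | 58–59 | 6–87 | 6–87
ID135140 | 4 | 17–20 | 17–20 | 81–82 | 80–81 | 58–60 | 58–59 | 0–1 | 0–1 | 10–100 | 10–100
ID134609 | 1 | 0–3 | 0–3 | 72–74 | 71–73 | 48–52 | 47–50 | 17–20 | 17–20 | 1–93 | 1–92
ID134609 | 2 | 72–74 | 71–73 | 0 | 0 | 66–68 | 66–67 | 81–82 | 80–81 | 4–105 | 4–105
ID134609 | 3 | 48–52 | 47–50 | 66–68 | 66–67 | 0–3 | 0–2 | 57–60 | 56–58 | 6–86 | 6–86
ID134609 | 4 | 17–20 | 17–20 | 81–82 | 80–81 | 57–60 | 56–58 | 0–1 | 0–1 | 10–101 | 10–100
SL486 | 1 | 0–3 | ND | 71–73 | ND | 46–49 | ND | 17–20 | ND | 1–92 | ND
SL486 | 2 | 71–73 | ND | 0 | ND | 65–66 | ND | 80–81 | ND | 4–105 | ND
SL486 | 3 | 46–49 | ND | 65–66 | ND | 0–2 | ND | 55–57 | ND | 6–85 | ND
SL486 | 4 | 17–20 | ND | 80–81 | ND | 55–57 | ND | 0–1 | ND | 13–100 | ND
JWQE01.1 | 1 | 0–3 | ND | 71–73 | ND | 46–49 | ND | 17–20 | ND | 1–91 | ND
JWQE01.1 | 2 | 70–72 | ND | 0 | ND | 66–67 | ND | 79–80 | ND | 4–105 | ND
JWQE01.1 | 3 | 46–49 | ND | 66–67 | ND | 0–2 | ND | 55–57 | ND | 6–86 | ND
JWQE01.1 | 4 | 17–20 | ND | 79–80 | ND | 55–57 | ND | 0–1 | ND | 10–99 | ND
SL476 | 1 | ND | 0–3 | ND | 68–70 | ND | 46–49 | ND | 17–20 | ND | 1–92
SL476 | 2 | ND | 68–70 | ND | 0 | ND | 65–66 | ND | 80–81 | ND | 4–105
SL476 | 3 | ND | 46–49 | ND | 65–66 | ND | 0–2 | ND | 55–57 | ND | 6–85
SL476 | 4 | ND | 17–20 | ND | 80–81 | ND | 55–57 | ND | 0–1 | ND | 13–100
CP003416.1 | 1 | ND | 0–3 | ND | 72–74 | ND | 47–50 | ND | 17–20 | ND | 1–92
CP003416.1 | 2 | ND | 72–74 | ND | 0 | ND | 67–68 | ND | 81–82 | ND | 4–106
CP003416.1 | 3 | ND | 47–50 | ND | 67–68 | ND | 0–2 | ND | 56–58 | ND | 6–86
CP003416.1 | 4 | ND | 17–20 | ND | 81–82 | ND | 56–58 | ND | 0–1 | ND | 10–100
CP001144 | 1 | ND | 0–2 | ND | 45–47 | ND | 28–31 | ND | 10–12 | ND | 0–60
CP001144 | 2 | ND | 45–47 | ND | 0 | ND | 39–40 | ND | 49–49 | ND | 2–67
CP001144 | 3 | ND | 28–31 | ND | 39–40 | ND | 0–2 | ND | 32–34 | ND | 3–53
CP001144 | 4 | ND | 10–12 | ND | 49–49 | ND | 32–34 | ND | 0 | ND | 7–62

(a) The reference genomes were obtained from: S. Heidelberg outbreak isolates [ID117795, ID128787, ID128902, ID135140, and ID134609]; publicly available S. Heidelberg references from NCBI [SL486, JWQE01.1, SL476, CP003416.1]; and a distantly related S. Dublin reference from NCBI [CP001144].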

ND, Not Determined.
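
The ranges in Table 5 are minimum and maximum pairwise SNV counts, either within one outbreak or between an outbreak and another group. A minimal sketch of that summary is shown below; the isolate labels and distances are hypothetical and are not taken from the study data.

```python
from itertools import combinations, product


def distance_range(dist, group_a, group_b=None):
    """Return (min, max) pairwise SNV distance.

    With one group, all pairs inside the group are compared; with two
    groups, every member of group_a is compared with every member of group_b.
    dist maps frozenset({isolate_x, isolate_y}) -> SNV count.
    """
    if group_b is None:
        pairs = combinations(group_a, 2)
    else:
        pairs = product(group_a, group_b)
    values = [dist[frozenset(pair)] for pair in pairs]
    return min(values), max(values)


# Hypothetical isolates and SNV distances, for illustration only.
outbreak = ["ob1_a", "ob1_b", "ob1_c"]
sporadic = ["spor_x", "spor_y"]
dist = {
    frozenset(("ob1_a", "ob1_b")): 0,
    frozenset(("ob1_a", "ob1_c")): 3,
    frozenset(("ob1_b", "ob1_c")): 2,
    frozenset(("ob1_a", "spor_x")): 10,
    frozenset(("ob1_b", "spor_x")): 11,
    frozenset(("ob1_c", "spor_x")): 12,
    frozenset(("ob1_a", "spor_y")): 93,
    frozenset(("ob1_b", "spor_y")): 92,
    frozenset(("ob1_c", "spor_y")): 91,
}
within_outbreak = distance_range(dist, outbreak)
outbreak_vs_sporadic = distance_range(dist, outbreak, sporadic)
```

Resolution is lost when the lower bound of the outbreak-versus-sporadic range falls inside the within-outbreak range, as reported for the S. Dublin reference.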


Topological similarity of the phylogenetic trees

To confirm the similarities in tree topologies, we performed the RF test. The computed RF topological distances between the 14 trees ranged from 0 to 24 (Table 6).
Table 6

Robinson-Foulds topological distances between trees generated with 145 S. Heidelberg sequenced isolates using draft and closed genomes as references during cgSNV analysis.

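Tree | Tree 2 | Tree 3 | Tree 4 | Tree 5 | Tree 6 | Tree 7 | Tree 8 | Tree 9 | Tree 10 | Tree 11 | Tree 12 | Tree 13 | Tree 14
Tree 1 | 0 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 7 | 11 | 15 | 11
Tree 2 |  | 20 | 20 | 20 | 20 | 20 | 20 | 20 | 10 | 7 | 11 | 15 | 11
Tree 3 |  |  | 0 | 0 | 0 | 24 | 0 | 24 | 24 | 13 | 11 | 15 | 11
Tree 4 |  |  |  | 0 | 0 | 24 | 0 | 24 | 24 | 13 | 11 | 15 | 11
Tree 5 |  |  |  |  | 0 | 24 | 0 | 24 | 24 | 13 | 11 | 15 | 11
Tree 6 |  |  |  |  |  | 24 | 0 | 24 | 24 | 13 | 11 | 15 | 11
Tree 7 |  |  |  |  |  |  | 24 | 0 | 0 | 13 | 15 | 19 | 15
Tree 8 |  |  |  |  |  |  |  | 24 | 24 | 13 | 11 | 15 | 11
Tree 9 |  |  |  |  |  |  |  |  | 0 | 13 | 15 | 19 | 15
Tree 10 |  |  |  |  |  |  |  |  |  | 13 | 15 | 19 | 15
Tree 11 |  |  |  |  |  |  |  |  |  |  | 4 | 8 | 4
Tree 12 |  |  |  |  |  |  |  |  |  |  |  | 4 | 0
Tree 13 |  |  |  |  |  |  |  |  |  |  |  |  | 4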

Discussion

In this study we assessed the impact of the choice of reference genome on the resolution and clustering of S. Heidelberg outbreak and sporadic isolates using the cgSNV methodology. Our results revealed that using draft or completely sequenced S. Heidelberg genomes as references did not affect the ability of the cgSNV method to distinguish among isolates from four epidemiologically well-characterized S. Heidelberg outbreaks and to separate these isolates from sporadic or background strains. In fact, all outbreak and sporadic isolates clustered on distinct branches (Fig 1A and 1B). In contrast, using S. Dublin as the reference resulted in a tree with less resolution (Fig 2). Sporadic isolates clustered together with outbreak 1 isolates and, in addition, the genetic distances observed within outbreaks as well as between outbreak and sporadic isolates were overall reduced when S. Dublin was used as the reference as opposed to the S. Heidelberg reference genomes (Figs 1 and 2, Table 5). This finding is in agreement with a recent study on Salmonella, which found that using a distantly related genome as the reference failed to cluster S. Enteritidis outbreak strains concordantly [18]. These observations emphasize the need to choose an appropriate reference genome during laboratory investigations of foodborne outbreaks involving reference-based methods. The loss in resolution observed in this study may be due to the following reasons. Firstly, the SNVPhyl pipeline uses only core genome SNVs to build the phylogeny, implying that a reference genome with high similarity to the isolates under investigation would have a larger core genome, from which more SNVs can be identified to build a phylogeny, than a dissimilar reference such as S. Dublin. However, the smallest core genome among all the reference genomes used in this study was still equivalent to 84.63% of the total genome (Table 4).
Secondly, as the reference genome grows more distant from the sequences under analysis, more variation is observed between the core regions of the reference genome and all other sequences, leading to a long branch separating the reference genome from the other isolates. This is indeed what we observed using S. Dublin as the reference (Fig 2), since the majority (17,666) of the 18,155 positions used to construct the phylogeny were variations between the reference and the other isolates. The RF values for the trees constructed using an S. Heidelberg draft genome and its corresponding closed genome were zero, with the exception of the reference genome pair ID134609 (tree 7 vs. tree 8), for which the RF topological distance was 24. This difference could be attributed to misidentification of SNVs linked to the presence of repetitive regions in the draft genome that were not properly detected. By nature, draft genomes are not entirely accurate representations of the genome, and it is possible that the SPAdes assembly for this isolate collapsed repetitive regions larger than the read or read-pair size into a single contig. Aligning reads to repetitive regions is problematic and has been reported to lead to potential misidentification of SNVs [19]. Although we used MUMmer to identify and filter the repetitive regions of each reference genome, this approach is likely to be more successful for a closed genome, whose sequence is intact and completely mapped, than for a draft genome, in which several copies of a repeat region may be unresolved. Reads from these unresolved repeats can map to a single contig, resulting in alignment problems and the inclusion of additional, false SNV calls. Interestingly, in this reference pair, 766 sites were used to generate the phylogeny with the draft genome as reference, as opposed to 761 sites with its closed counterpart.
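The long-branch effect can be made concrete with a small sketch. It separates variant sites at which the isolates differ among themselves (the only sites that can resolve isolates from one another) from sites at which all isolates agree with each other but differ from the reference. If the 17,666 positions reported above are of the second kind, 18,155 − 17,666 = 489 sites would remain to distinguish the isolates. The sequences below are hypothetical, and this is not how SNVPhyl itself reports its results.

```python
def classify_snv_sites(reference, isolates):
    """Split variant sites into informative and reference-specific sites.

    informative:        isolates carry more than one base at the site
    reference_specific: all isolates share one base that differs from the reference
    """
    informative, reference_specific = [], []
    for pos, ref_base in enumerate(reference):
        bases = {seq[pos] for seq in isolates.values()}
        if len(bases) > 1:
            informative.append(pos)
        elif bases != {ref_base}:
            reference_specific.append(pos)
    return informative, reference_specific


# Hypothetical aligned core positions, for illustration only.
reference = "ACGTACGT"
isolates = {
    "iso1": "ACGTTCGA",
    "iso2": "ACGTTCGA",
    "iso3": "ACCTTCGA",
}
informative, reference_specific = classify_snv_sites(reference, isolates)
```

In this toy alignment two of the three variant sites only lengthen the branch leading to the reference, and a single site separates iso3 from the other two isolates.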
Despite the slight differences in topology between this reference pair as well as between other tree pairs, both the draft and closed genomes were still able to differentiate the outbreak from sporadic isolates in concordance with epidemiological data. In agreement with our observations, a recent study on Listeria monocytogenes also demonstrated that phylogenetic clustering based on SNV analysis using a de novo assembled draft genome selected from within the group was similar to the phylogeny obtained using a closely related closed genome [6]. Despite the increasing accessibility and decreasing costs of WGS-based technologies, the implementation of WGS for routine surveillance and outbreak investigation may still present many challenges, as sequencing a genome to completion remains a costly and time-consuming endeavour that is not a viable option for many public health laboratories [20]. The results of our study indicate that draft genomes can be relied upon as a suitable reference choice during laboratory investigations of foodborne outbreaks using the SNVPhyl pipeline. In fact, a recent study evaluated the quality scores of 32,000 genomes located in public repositories and concluded that most of these genomes were of sufficient quality for analysis and that only 10% of draft genomes were of poor quality and unsuitable for downstream analysis [21]. In conclusion, our results provide strong evidence that the choice between a draft and a closed S. Heidelberg reference genome does not impact the ability of the cgSNV methodology to distinguish between S. Heidelberg isolates involved in foodborne outbreaks. Our results also demonstrate that using a distantly related genome as reference could lead to a loss in resolution during cgSNV analysis. Although wgMLST was recently recommended as the primary subtyping tool moving forward by PNC and other foodborne surveillance networks [22], the cgSNV approach remains an important method in the PNC molecular toolbox.
In fact, this method was recently used by PNC in collaboration with provincial public health laboratories to identify the source of a multi-provincial outbreak of S. Enteritidis [23]. Despite the advantages provided by cg/wgMLST approaches, such as ease of curation and standardization, the development and validation of schemas for each organism remains a daunting task, both financially and technically, for public health laboratories operating with limited resources. The implementation of the cgSNV methodology described here could be a viable alternative for monitoring S. Heidelberg. Whether this method applies to other Salmonella serovars remains to be determined.
References

1.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

2.  T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks.

Authors:  Alix Boc; Alpha Boubacar Diallo; Vladimir Makarenkov
Journal:  Nucleic Acids Res       Date:  2012-06-06       Impact factor: 16.971

3.  Automated reconstruction of whole-genome phylogenies from short-sequence reads.

Authors:  Frederic Bertels; Olin K Silander; Mikhail Pachkov; Paul B Rainey; Erik van Nimwegen
Journal:  Mol Biol Evol       Date:  2014-03-05       Impact factor: 16.240

4.  Quality scores for 32,000 genomes.

Authors:  Miriam L Land; Doug Hyatt; Se-Ran Jun; Guruprasad H Kora; Loren J Hauser; Oksana Lukjancenko; David W Ussery
Journal:  Stand Genomic Sci       Date:  2014-12-08

5.  Complete Genome Sequences of 17 Canadian Isolates of Salmonella enterica subsp. enterica Serovar Heidelberg from Human, Animal, and Food Sources.

Authors:  Geneviève Labbé; Kim Ziebell; Sadjia Bekal; Kimberley A Macdonald; E Jane Parmley; Agnes Agunos; Andrea Desruisseau; Danielle Daignault; Durda Slavic; Linda Hoang; Danielle Ramsay; Frank Pollari; James Robertson; John H E Nash; Roger P Johnson
Journal:  Genome Announc       Date:  2016-09-15

6.  Multi-provincial Salmonellosis Outbreak Related to Newly Hatched Chicks and Poults: A Genomics Perspective.

Authors:  Matthew A Croxen; Kimberley A Macdonald; Matthew Walker; Nancy deWith; Erin Zabek; Christy Peterson; Aleisha Reimer; Linda Chui; Lorelee Tschetter; Linda Hoang; Robin K King
Journal:  PLoS Curr       Date:  2017-08-09

Review 7.  PulseNet International: Vision for the implementation of whole genome sequencing (WGS) for global food-borne disease surveillance.

Authors:  Celine Nadon; Ivo Van Walle; Peter Gerner-Smidt; Josefina Campos; Isabel Chinen; Jeniffer Concepcion-Acevedo; Brent Gilpin; Anthony M Smith; Kai Man Kam; Enrique Perez; Eija Trees; Kristy Kubota; Johanna Takkinen; Eva Møller Nielsen; Heather Carleton
Journal:  Euro Surveill       Date:  2017-06-08

8.  PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods.

Authors:  Alexandre P Francisco; Cátia Vaz; Pedro T Monteiro; José Melo-Cristino; Mário Ramirez; Joäo A Carriço
Journal:  BMC Bioinformatics       Date:  2012-05-08       Impact factor: 3.169

9.  PHASTER: a better, faster version of the PHAST phage search tool.

Authors:  David Arndt; Jason R Grant; Ana Marcu; Tanvir Sajed; Allison Pon; Yongjie Liang; David S Wishart
Journal:  Nucleic Acids Res       Date:  2016-05-03       Impact factor: 16.971

10.  SNVPhyl: a single nucleotide variant phylogenomics pipeline for microbial genomic epidemiology.

Authors:  Aaron Petkau; Philip Mabon; Cameron Sieffert; Natalie C Knox; Jennifer Cabral; Mariam Iskander; Mark Iskander; Kelly Weedmark; Rahat Zaheer; Lee S Katz; Celine Nadon; Aleisha Reimer; Eduardo Taboada; Robert G Beiko; William Hsiao; Fiona Brinkman; Morag Graham; Gary Van Domselaar
Journal:  Microb Genom       Date:  2017-06-08
