Daniel A Yara1, David R Greig2,3, David L Gally2, Timothy J Dallman2,3, Claire Jenkins3. 1. Norwich Medical School, University of East Anglia, Norwich, UK. 2. Division of Infection and Immunity, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush EH25 9RG, UK. 3. National Infection Service, Public Health England, London NW9 5EQ, UK.
Abstract
Over the last 35 years in the UK, the burden of Shiga toxin-producing Escherichia coli (STEC) O157:H7 infection has, during different periods of time, been associated with five different sub-lineages (1983-1995, Ia, I/IIa and I/IIb; 1996-2014, Ic; and 2015-2018, IIb). The acquisition of a stx2a-encoding bacteriophage by these five sub-lineages appears to have coincided with their respective emergences. The Oxford Nanopore Technologies (ONT) system was used to sequence, characterize and compare the stx-encoding prophages harboured by each sub-lineage to investigate the integration of this key virulence factor. The stx2a-encoding prophages from each of the lineages causing clinical disease in the UK were all different, including the two UK sub-lineages (Ia and I/IIa) circulating concurrently and causing severe disease in the early 1980s. Comparisons between the stx2a-encoding prophage in sub-lineages I/IIb and IIb revealed similarity to the prophage commonly found to encode stx2c, and the same site of bacteriophage integration (sbcB) as stx2c-encoding prophage. These data suggest independent acquisition of previously unobserved stx2a-encoding phage is more likely to have contributed to the emergence of STEC O157:H7 sub-lineages in the UK than intra-UK lineage to lineage phage transmission. In contrast, the stx2c-encoding prophage showed a high level of similarity across lineages and time, consistent with the model of stx2c being present in the common ancestor to extant STEC O157:H7 and maintained by vertical inheritance in the majority of the population. Studying the nature of the stx-encoding bacteriophage contributes to our understanding of the emergence of highly pathogenic strains of STEC O157:H7.
Over the last 35 years in the UK, the burden of Shiga toxin-producing Escherichia coli (STEC) O157:H7infection has, during different periods of time, been associated with five different sub-lineages (1983-1995, Ia, I/IIa and I/IIb; 1996-2014, Ic; and 2015-2018, IIb). The acquisition of a stx2a-encoding bacteriophage by these five sub-lineages appears to have coincided with their respective emergences. The Oxford Nanopore Technologies (ONT) system was used to sequence, characterize and compare the stx-encoding prophages harboured by each sub-lineage to investigate the integration of this key virulence factor. The stx2a-encoding prophages from each of the lineages causing clinical disease in the UK were all different, including the two UK sub-lineages (Ia and I/IIa) circulating concurrently and causing severe disease in the early 1980s. Comparisons between the stx2a-encoding prophage in sub-lineages I/IIb and IIb revealed similarity to the prophage commonly found to encode stx2c, and the same site of bacteriophage integration (sbcB) as stx2c-encoding prophage. These data suggest independent acquisition of previously unobserved stx2a-encoding phage is more likely to have contributed to the emergence of STEC O157:H7 sub-lineages in the UK than intra-UK lineage to lineage phage transmission. In contrast, the stx2c-encoding prophage showed a high level of similarity across lineages and time, consistent with the model of stx2c being present in the common ancestor to extant STEC O157:H7 and maintained by vertical inheritance in the majority of the population. Studying the nature of the stx-encoding bacteriophage contributes to our understanding of the emergence of highly pathogenic strains of STEC O157:H7.
All fastq files and assemblies of samples sequenced in this project have been submitted to the National Center for Biotechnology Information (NCBI) under BioProject accession number PRJNA315192 – https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA315192. Strain specific details can be found in Methods under 'Data deposition'. Publicly available data used in this project can be found via Table 1 and in Data Bibliography.
Table 1.
Summary of the PHE archived and publicly available strains used within this study, with their strain ID, lineage, PT, Stx profile, chromosome size (bp), number of prophages in the chromosome, stx-encoding prophages with SBI and size (bp), assembly accession numbers and NCBI BioProject accession numbers
Strain ID
Lineage
Phage type
Stx profile
Chromosome size (bp)
No. of prophages
stx1a prophage SBI and size (bp)
stx2a prophage SBI and size (bp)
stx2c prophage SBI and size (bp)
Reference
BioProject accession no.
Assembly accession no.
PHE archive
E30228
Ia
PT4
Stx1a/2a
5 416 109
15
yehV (47 594)
wrbA (62 890)
–
[41]
PRJNA315192
VXJO00000000
E34500
I/IIa
PT2
Stx2a/2c
5 359 964
14
–
argW (62 149)
sbcB (57 463)
[2]
PRJNA315192
VXJN00000000
E45000
I/IIb
PT49
Stx2a
5 386 698
17
–
sbcB (44 014)
–
This study
PRJNA315192
VXJM00000000
E116508
Ic
PT21/28
Stx2a/2c
5 571 891
17
–
argW (71 870)
sbcB (59 105)
This study
PRJNA315192
VXJP00000000
315176
IIb
PT8
Stx2a
5 579 120
16
–
sbcB (61 851)
–
[10]
PRJNA315192
VXJQ00000000
267849
IIa
PT34
Stx2a/2c
5 510 912
16
–
yecE (47 242)
sbcB (61 840)
[43]
PRJNA315192
VXJR00000000
Publicly available
9000
Ic
PT21/28
Stx2a/2c
5 516 497
17
–
argW (65 158)
sbcB (57 408)
[33]
PRJNA336330
CP018252
397404
Ic
PT21/28
Stx2a/2c
5 618 435
13
–
argW (70 472)
sbcB (59 098)
PRJNA315192
CP043019
155
Ic
PT32
Stx2a
5 513 008
18
–
yecE (50 015)
–
[33]
PRJNA336330
CP018237
350
IIc
PT8
Stx1a/2c
5 411 823
16
yehV (49 867)
–
sbcB (57 747)
[11]
PRJNA336330
CP018243
272
I/IIa
PT2
Stx2a
5 474 193
16
–
argW (65 675)
–
[13]
PRJNA336330
CP018239
644
IIc
PT8
Stx1a/1a/2c
5 831 209
18
yehV (49 544) argW (64 569)
–
sbcB (58 210)
[12]
PRJNA321984
CP015831
180
IIc
PT54
Stx1a/1a/2c
5 509 528
15
yehV (49 544) argW (64 569)
–
sbcB (61 558)
[12]
PRJNA321984
CP015832
Sakai
Ia
–
Stx1a/2a
5 498 450
18
yehV (47 650)
wrbA (62 142)
–
[16]
PRJNA57781
NC_002695
EDL933
Ia
–
Stx1a/2a
5 547 323
14
yehV (47 596)
wrbA (61 066)
–
[14]
PRJNA253471
CP008957
EC4115
I/IIa
–
Stx2a/2c
5 572 075
17
–
argW (71 540)
sbcB (60 476)
[15]
PRJNA224116
NC_011353
TW14359
I/IIa
–
Stx2a/2c
5 528 136
17
–
argW (71 540)
sbcB (60 476)
[15]
PRJNA224116
NC_013008
Summary of the PHE archived and publicly available strains used within this study, with their strain ID, lineage, PT, Stx profile, chromosome size (bp), number of prophages in the chromosome, stx-encoding prophages with SBI and size (bp), assembly accession numbers and NCBI BioProject accession numbersStrain IDLineagePhage typeStx profileChromosome size (bp)No. of prophagesstx1a prophage SBI and size (bp)stx2a prophage SBI and size (bp)stx2c prophage SBI and size (bp)ReferenceBioProject accession no.Assembly accession no.PHE archiveE30228IaPT4Stx1a/2a5 416 10915yehV (47 594)wrbA (62 890)[41]PRJNA315192VXJO00000000E34500I/IIaPT2Stx2a/2c5 359 96414argW (62 149)sbcB (57 463)[2]PRJNA315192VXJN00000000E45000I/IIbPT49Stx2a5 386 69817sbcB (44 014)This studyPRJNA315192VXJM00000000E116508IcPT21/28Stx2a/2c5 571 89117argW (71 870)sbcB (59 105)This studyPRJNA315192VXJP00000000315176IIbPT8Stx2a5 579 12016sbcB (61 851)[10]PRJNA315192VXJQ00000000267849IIaPT34Stx2a/2c5 510 91216yecE (47 242)sbcB (61 840)[43]PRJNA315192VXJR00000000Publicly available9000IcPT21/28Stx2a/2c5 516 49717argW (65 158)sbcB (57 408)[33]PRJNA336330CP018252397404IcPT21/28Stx2a/2c5 618 43513argW (70 472)sbcB (59 098)PRJNA315192CP043019155IcPT32Stx2a5 513 00818yecE (50 015)[33]PRJNA336330CP018237350IIcPT8Stx1a/2c5 411 82316yehV (49 867)sbcB (57 747)[11]PRJNA336330CP018243272I/IIaPT2Stx2a5 474 19316argW (65 675)[13]PRJNA336330CP018239644IIcPT8Stx1a/1a/2c5 831 20918yehV (49 544) argW (64 569)sbcB (58 210)[12]PRJNA321984CP015831180IIcPT54Stx1a/1a/2c5 509 52815yehV (49 544) argW (64 569)sbcB (61 558)[12]PRJNA321984CP015832SakaiIa–Stx1a/2a5 498 45018yehV (47 650)wrbA (62 142)[16]PRJNA57781NC_002695EDL933Ia–Stx1a/2a5 547 32314yehV (47 596)wrbA (61 066)[14]PRJNA253471CP008957EC4115I/IIa–Stx2a/2c5 572 07517argW (71 540)sbcB (60 476)[15]PRJNA224116NC_011353TW14359I/IIa–Stx2a/2c5 528 13617argW (71 540)sbcB (60 476)[15]PRJNA224116NC_013008The application of the Oxford Nanopore Technologies system to sequence UK epidemic strains of Shiga toxin-producing (STEC) O157:H7 revealed stx2a-encoding prophages exhibit a high level of diversity. There was little evidence of geographical or temporal patterns of relatedness, or of intra-UK transmission of stx2a-encoding prophage between indigenous strains. The stx2a-encoding prophages in the UK lineages associated with severe disease appear to be acquired independently and most likely from different geographical and/or environmental sources. These data provide supporting evidence for the existence of a dynamic environmental reservoir of stx2a-encoding prophages that pose a threat to public health due to their potential for integration into competent, indigenous sub-lineages of STEC O157:H7. We also provide further evidence that stx2c-encoding prophages exhibit a high level of similarity across lineages, geographical regions and time, and have likely been maintained and inherited vertically.
Introduction
Shiga toxin-producing (STEC) serotype O157:H7 is a zoonotic pathogen that causes gastrointestinal symptoms in humans. A sub-set of patients (mainly children and the elderly) are at risk of developing haemolytic uraemic syndrome (HUS), a potentially fatal systemic condition primarily associated with acute renal failure, and cardiac and neurological complications [1]. STEC O157:H7 emerged as a public-health concern during the early 1980s and was first isolated in the UK in July 1983 from three cases linked to an outbreak of HUS [2]. Throughout the 1980s, the increasing number of outbreaks of gastrointestinal disease, and HUS associated with this serotype, stimulated the development of sub-typing methods that provided a higher level of strain discrimination than serotyping. In the late 1980s, a phage typing scheme, developed by the Canadian Public Health Laboratory Service, was adopted by Public Health England (PHE; then the Public Health Laboratory Service) [3], and is still used today. In 2015, PHE implemented whole-genome sequencing for routine surveillance of STEC O157:H7 in England [4].The primary STEC virulence factor is the Shiga toxin (Stx), which targets cells expressing the glycolipidglobotriaosylceramide, disrupting host protein synthesis and causing apoptotic cell death. Strains of STEC O157:H7 in the UK produce stx1a, stx2a and stx2c, either individually or in any combination [5]. Strains harbouring stx2a, either alone or in combination with stx1a and/or stx2c, are significantly associated with causing severe disease, including HUS [5, 6], and are associated with more efficient transmission within the ruminant reservoir [7]. The genes encoding the stx subtypes are located on active bacteriophage that can be acquired and integrated into the chromosome of STEC O157:H7 strains. There is evidence that the different prophage backgrounds that harbour stx genes can contribute to differential toxin production and may ultimately affect clinical outcome [8].There are three main lineages of STEC O157:H7 (I, II and I/II) and eight sub-lineages (Ia, Ib, Ic, IIa, IIb, IIc, I/IIa and I/IIb). In the UK, the outbreaks of STEC O157:H7 in the 1980s were caused by strains belonging to sub-lineage Ia [mainly comprising phage type (PT)1 and PT4], sub-lineage I/IIa (comprising PT2) and sub-lineage I/IIb (comprising PT49) [9]. Throughout the 1990s, these three lineages declined and almost disappeared. Concurrently, we observed a dramatic rise of sub-lineage Ic (mainly comprising PT21/28), in addition to a steady increase in the number of cases of sub-lineage IIc (mainly comprising PT8) [5, 9]. Since 2012, the number of cases of PT21/28 has declined and an unusual PT8 variant belonging to sub-lineage IIb has emerged [10].With the exception of sub-lineage IIc (PT8), which is not associated with HUS cases in the UK [5], all the dominant UK sub-lineages over time encode stx2a, and the acquisition of a stx2a-encoding bacteriophage appears to have coincided with their respective emergences [5, 10]. The aim of this investigation was to use the Oxford Nanopore Technologies system to sequence, characterize and compare the stx-encoding prophage harboured by each of the UK sub-lineages to determine the similarity of the stx-encoding prophage acquired by each lineage. Studying the nature of the stx-encoding bacteriophage will contribute to our understanding of the emergence of highly pathogenic strains of STEC O157:H7.
Methods
Bacterial strains
Six strains of STEC O157:H7 were selected for sequencing from the PHE archive on the basis of being the earliest representative of each of the sub-lineages that acquired the stx2a-encoding prophage (Table 1, Fig. 1). Eleven publicly available sequences were also included in the analysis for context. Of these, seven originated from the UK, five were the cause of four published outbreaks [11-13], three were from the USA [14, 15] and one was from Japan [16] (Table 1, Fig. 1).
Fig. 1.
Maximum-likelihood phylogenetic tree of 105 genomes including the 17 (labelled) publicly available genomes and nanopore sequenced genomes produced during this study. Sub-lineages are coloured as follows: I/II, grey; Ia, yellow; Ib, orange; Ic, red; IIa, purple; IIb, blue; IIc, green. Scale bar indicates kbp.
Maximum-likelihood phylogenetic tree of 105 genomes including the 17 (labelled) publicly available genomes and nanopore sequenced genomes produced during this study. Sub-lineages are coloured as follows: I/II, grey; Ia, yellow; Ib, orange; Ic, red; IIa, purple; IIb, blue; IIc, green. Scale bar indicates kbp.
Short-read sequencing on the Illumina HiSeq 2500
Genomic DNA was extracted from cultures of STEC O157:H7 using the QIAsymphony system (Qiagen). The sequencing library was prepared using the Nextera XP kit (Illumina) for sequencing on the HiSeq 2500 instrument (Illumina), run with the fast protocol. fastq reads were processed using Trimmomatic v0.27 [17] to remove bases with a PHRED score of <30 from the leading and trailing ends, with reads <50 bp after quality trimming discarded.
Long-read sequencing and data processing
Genomic DNA was extracted and purified using the Qiagen genomic tip, midi 100/G, with minor alterations including no vigorous mixing steps (mixing performed by inversion instead) and elution into 100 µl double processed nuclease-free water (Sigma-Aldrich). Genomic DNA for each extract was quantified using a Qubit and the HS (high sensitivity) dsDNA assay kit (Thermofisher Scientific), following the manufacturer’s instructions. Library preparation was performed for several instances using both rapid barcoding [SQK-RBK00(1/4)] and native barcoding kits (SQK-LSK108 and EXP-NBD103) (Oxford Nanopore Technologies). The prepared libraries were loaded onto FLO-MIN106 R9.4.1 flow cells (Oxford Nanopore Technologies) and sequenced using the MinION (Oxford Nanopore Technologies) for 48 h.Data produced in a raw fast5 format was basecalled and de-multiplexed using Albacore v2.3.3 (Oxford Nanopore Technologies) into fastq format and grouped in each samples’ respective barcode. De-multiplexing was performed using Deepbinner v0.2.0 [18]. Run metrics were generated using Nanoplot v1.8.1 [19]. The barcode and y-adapter from each sample’s reads were trimmed, and chimeric reads split using Porechop v0.2.4 [20]. Finally, the trimmed reads were filtered using Filtlong v0.1.1 [21] with the following parameters, min length=1000 bp, keep per cent=90 and target bases=550 Mbp, to generate approximately 100× coverage of the STEC genome with the longest and highest-quality reads.
De novo assembly, polishing, reorientation and annotation
Trimmed nanopore fastq files were assembled using Canu v1.7 [22] and the filtered nanopore fastq files were assembled using both Unicycler v0.4.2 [23], with the following parameters min_fasta_length=1000 bp, mode=normal, and Flye v2.4.2 [24], using default parameters. The assembly for each sample that had the highest N50 and lowest number of contigs with the assembly size (between 5.3–6.0 Mbp) were taken forward. Polishing of the assemblies was performed in a three-step process. Firstly, polishing was initiated using Nanopolish v0.11.1 [25] using both the trimmed nanopore fastqs and fast5s for each respective sample accounting for methylation using the --methylation-aware=dcm and --min-candidate-frequency=0.5. Secondly, the polishing was continued with Pilon v1.22 [26] using Illumina fastq reads as the query dataset with the use of bwa v0.7.17 [27] and Samtools v1.7 [28]. Finally, Racon v1.2.1 [29] also using bwa v0.7.17 [27] and Samtools v1.7 [28] was used with the Illumina reads for two cycles to produce a final assembly for each of the samples. As the chromosome from each assembly was circularized and closed, they were re-orientated to start at the dnaA gene (GenBank accession no. NC_000913) from K12, using the --fixstart parameter in Circlator v1.5.5 [30]. Prokka v1.13 [31] with the use of a personalized database (an amino acid fasta that included all genes annotated in the publicly available samples used in this study) was used to annotate the final assemblies.
Prophage detection, excision and processing
Prophages across all samples were detected and extracted using the updated Phage Search Tool (phaster) [32]. Prophage extraction from the genome occurred regardless of prophage size or phaster quality score, and any detected prophages separated by less than 4 kbp were conjoined into a single phage using Propi v0.9.0, as described elsewhere [33]. From here, the prophages were trimmed to remove any non-prophage genes and were again annotated using Prokka v 1.13 [31] with the use of a personalized database (an amino acid fasta that included all genes annotated in the publicly available samples used in this study).
Mash and Stx-encoding prophage phylogeny
Mash v2.2 [34] was used to sketch (sketch length 1000 bp, kmer length 21) the extracted prophages in the samples sequenced in this study and all Stx-encoding prophages found in the publicly available STEC genomes in Table 1. The pairwise Jaccard distance between the prophages was calculated and a neighbour-joining tree computed and visualized using FigTree v1.4.4 [35].
Visualization tools and phylogenetic context
To provide context for the 17 nanopore sequenced (n=6) and publicly available (n=11) samples, a maximum-likelihood tree was recreated combining these 17 genomes with 88 genomes (105 in total) from PHE’s STEC collection (clonal complex 11). Each of the 88 selected genomes is from a unique 250 single-linkage hierarchical cluster that was calculated using SnapperDB v0.2.6 [36]. SnapperDB was also used to generate a whole-genome alignment of all 105 genomes that was processed though Gubbins v2.00 [37] to identify any recombinant sequences. The tree was recreated by RAxML v8.2.8 [38]. Phylogenetic trees were visualized and annotated using FigTree v1.4.4 [35]. All gene diagrams were constructed using Easyfig v2.2.3 [39].
Data deposition
Illumina fastq files are available from National Center for Biotechnology Information (NCBI) BioProject PRJNA315192 under the following SRA (sequence read archive) accession numbers: E30228, SRR10290290; E34500, SRR10290289; E45000, SRR10290288; E116508, SRS941727; 315176, SRR6051955; and 267849, SRR3742262. Nanopore fastq files are available from BioProject PRJNA315192 under the following SRA accession numbers: E30228, SRR10103064; E34500, SRR10103063; E45000, SRR10103062; E116508, SRR10103065; 315176, SRR10103066; and 267849, SRR10103067. Assemblies can be found under BioProject PRJNA315192 under the following accession numbers: E30228, VXJO00000000; E34500, VXJN00000000; E45000, VXJM00000000; E116508, VXJP00000000; 315176, VXJQ00000000; and 267849, VXJR00000000.
Results and Discussion
Genomic features of the samples sequenced in this study
All six isolates, selected for sequencing from the PHE archive on the basis of being the earliest representative of each of the sub-lineages that acquired the stx2a-encoding prophage, assembled into closed chromosomes with one or more plasmids. The isolates belonging to sub-lineage Ia PT4 (E30228) and sub-lineage IIb PT8 (315176) each assembled into a chromosome (5 416 109 and 5 579 120 bp, respectively) and two plasmids (Table 1). The sequence data from the other four isolates each assembled into a chromosome of between 5 359 964 and 5 571 891 bp and a single plasmid (Table 1). The pO157 (IncFIB) plasmid was found in all samples sequenced in this study. The number of prophages in each of the genomes of the six isolates varied from 14 in the isolate belonging to sub-lineage I/IIa PT2 to 17 from the isolates belonging to sub-lineages I/IIb PT49 and Ic PT21/28 (Fig. 2).
Fig. 2.
Easyfig diagram representing the chromosome and prophage content within the samples sequenced in this study (in descending order PT4, PT2, PT49, PT21/28, PT8 and PT34). stx2a-encoding, stx2c-encoding and stx1-encoding prophages are highlighted in red, orange and yellow, respectively. Non-stx-encoding prophages are coloured black. Prophage-like elements are coloured blue and the locus of enterocyte effacement is shown in green.
Easyfig diagram representing the chromosome and prophage content within the samples sequenced in this study (in descending order PT4, PT2, PT49, PT21/28, PT8 and PT34). stx2a-encoding, stx2c-encoding and stx1-encoding prophages are highlighted in red, orange and yellow, respectively. Non-stx-encoding prophages are coloured black. Prophage-like elements are coloured blue and the locus of enterocyte effacement is shown in green.
Comparison of the stx1a-encoding prophage
Six of the isolates analysed in this study contained a prophage encoding stx1a (Table 1, Figs 3 and 4). The stx1a-encoding prophage from the isolate belonging to sub-lineage Ia PT4 (E30228), among the first to be isolated in the UK in 1983, shared similarity with stx1a-encoding prophage found in EDL933 and Sakai, two international outbreak strains that also belong to sub-lineage Ia (Table 1, Figs 3 and 4). EDL933 caused an outbreak in the USA in 1982 linked to contaminated hamburgers [14], and was temporally but not geographically linked to the UK isolate. The outbreak in Sakai City, Japan, associated with contaminated radish sprouts, occurred in 1996 [16], and was both temporally and geographically distinct from EDL933 and E30228 (Figs 3 and 4). Previous analysis of isolates of sub-lineage Ia harbouring stx1a-encoding prophage indicate the stx1a prophage is likely ancestral and inherited vertically [5]. This is consistent with the strains analysed in this study encoding a similar stx1a prophage, despite being isolated at different times and geographical locations.Mid-rooted tree of stx-encoding prophages based on Jaccard distance produced from Mash. Strains are annotated with strain ID, length (bp), stx profile and SBI. Strains sequenced during this study have prophages that are hown preceded by an *. Strains are coloured by sub-lineage: green, Ia; red, Ic; blue, I/IIa; grey, I/IIb; orange, IIa; black, IIb; purple, IIc. Scale bar indicates Jaccard distance.Two Easyfig plots comparing the stx2a-encoding prophages from E45000 with E116508 (above) and 155 and 267849 (below), in descending order. Arrows indicate gene directions. stx genes are shown in red; recombination/replication genes shown in light blue; regulation-associated genes are shown in dark blue; effector genes are shown in pink; structure- and lysis-associated genes are shown in light and dark green, respectively; tRNAs are shown as purple lines; finally, hypothetical genes are shown in grey.Easyfig plot comparing the stx1a-encoding prophages from 644 (×2), 180 (×2), 350, E30288, EDL933 and Sakai. Arrows indicate gene directions. stx genes are shown in red; recombination/replication genes are shown in light blue; regulation-associated genes are shown in dark blue; effector genes are shown in pink; structure- and lysis-associated genes are shown in light and dark green, respectively; tRNAs are shown as purple lines; finally, hypothetical genes are shown in grey.The stx1a-encoding prophages from three isolates belonging to sub-lineage IIc associated with foodborne outbreaks in the UK [11, 12] cluster together based on Mash distance, but were distinct from the stx1a-encoding prophages harboured by the sub-lineage Ia strains described above. As previously described [33, 40], two of these strains (664 PT8 and 180 PT54), linked to a foodborne outbreak in Northern Ireland in 2013 [12], had an additional but different stx1a-encoding prophage within the same chromosome (Fig. 4). Therefore, three different stx1a-encoding prophages, in two different lineages (Ia and IIc), were identified in this study (Figs 3 and 4).
Fig. 4.
Easyfig plot comparing the stx1a-encoding prophages from 644 (×2), 180 (×2), 350, E30288, EDL933 and Sakai. Arrows indicate gene directions. stx genes are shown in red; recombination/replication genes are shown in light blue; regulation-associated genes are shown in dark blue; effector genes are shown in pink; structure- and lysis-associated genes are shown in light and dark green, respectively; tRNAs are shown as purple lines; finally, hypothetical genes are shown in grey.
Comparison of stx2c-encoding prophage
Nine isolates from four different sub-lineages (Ic, I/IIa, IIa and IIc) contained stx2c-encoding prophage. The stx2c-encoding prophage from each of the isolates clustered together based on Mash distance and also aligned across the length of the prophage with few structural variations (Fig. 5). The stx2c prophage from strains within the same sub-lineage were more similar based on Mash distance than stx2c prophage in strains from different lineages (Table 1, Figs 3 and 5). These strains were isolated over a wide time frame from 1983 to 2016, and in different countries including the UK, Ireland and the USA, providing further evidence that stx2c-encoding prophages show a high level of similarity across lineages, time and geographical regions [33] (Table 1). This is consistent with the model of stx2c being present in the common ancestor to extant STEC O157:H7 and maintained by vertical inheritance in the majority of the population.
Fig. 5.
Easyfig plot comparing the stx2c-encoding prophages from all samples in the study, including two stx2a prophages that are in a stx2c-associated prophage structure (315126 and E45000). Arrows indicate gene directions. stx genes are shown in red; recombination/replication genes are shown in light blue; regulation-associated genes are shown in dark blue; effector genes are shown in pink; structure- and lysis-associated genes are shown in light and dark green, respectively; tRNAs are shown as purple lines; finally, hypothetical genes are shown in grey.
Easyfig plot comparing the stx2c-encoding prophages from all samples in the study, including two stx2a prophages that are in a stx2c-associated prophage structure (315126 and E45000). Arrows indicate gene directions. stx genes are shown in red; recombination/replication genes are shown in light blue; regulation-associated genes are shown in dark blue; effector genes are shown in pink; structure- and lysis-associated genes are shown in light and dark green, respectively; tRNAs are shown as purple lines; finally, hypothetical genes are shown in grey.
Comparison of stx2a-encoding prophage
Certain strains that shared lineage, PT and geography harboured similar stx2a-encoding prophages. Examples included (i) the two sub-lineage Ic PT21/28 isolates from the UK, (ii) the two sub-lineage I/IIa PT2 isolates from the UK and (iii) the two isolates from sub-lineage I/IIa from the USA (Table 1, Figs 3 and 6). Isolates designated E30228 and EDL933, both sub-lineage Ia and temporally related but geographically distinct, also encoded similar stx2a-encoding prophage (Table 1, Figs 3 and 6), as did isolates 155 (sub-lineage Ic PT32) and 267849 (sub-lineage IIa PT34), which were unrelated temporally and geographically.Compared to stx2c prophage, however, the stx2a-encoding prophage found in 11 of the isolates in this study exhibited a greater diversity both based on Mash distance and whole-prophage alignment. The stx2a-encoding prophage from each of the lineages causing severe clinical disease in the UK were all distinct, including the two UK sub-lineages (Ia and I/IIa) circulating concurrently and causing outbreaks of HUS in the early 1980s [2, 41] (Fig. 3). Throughout the 1980s, the number of sub-lineage Ia strains (mainly PT1 and PT4) declined and a new sub-lineage, I/IIb PT49, emerged. The stx2a in the emerging sub-lineage I/IIb PT49 strain was encoded on a bacteriophage that was again distinct from either of the two stx2a-encoding prophages found in the representative isolates from the early contemporary sub-lineages Ia and I/IIa. Comparisons between the stx2a-encoding prophages in sub-lineage I/IIb revealed similarity to the prophages commonly found to encode stx2c (Figs 3 and 5). Furthermore, sub-lineage I/IIb stx2a-encoding prophages had the same site of bacteriophage integration (SBI) as sub-lineage I/IIa stx2c-encoding prophages, specifically the sbcB gene.
Fig. 3.
Mid-rooted tree of stx-encoding prophages based on Jaccard distance produced from Mash. Strains are annotated with strain ID, length (bp), stx profile and SBI. Strains sequenced during this study have prophages that are hown preceded by an *. Strains are coloured by sub-lineage: green, Ia; red, Ic; blue, I/IIa; grey, I/IIb; orange, IIa; black, IIb; purple, IIc. Scale bar indicates Jaccard distance.
During the 1990s, all three of the dominant 1980s sub-lineages (Ia, I/IIa and I/IIb) declined as a cause of humangastrointestinal disease, and a new sub-lineage emerged. STEC O157:H7stx2c PT32 belonging to sub-lineage Ic had been circulating in UK and Irish cattle populations for many decades, but had not been linked to cases of human disease [5]. However, following acquisition of a stx2a-encoding prophage (into the SBI argW), which resulted in a change in PT to PT21/28 [5, 33], sub-lineage Ic became the most common STEC O157:H7 sub-lineage causing gastrointestinal disease and HUS in humans in the UK for the next two decades. The stx2c-encoding prophage in lineage Ic had high sequence similarity to stx2c-encoding prophages in the other isolates analysed in this study and shared the same SBI, sbcB (Table 1, Figs 2 and 5). However, the stx2a-encoding prophage acquired by sub-lineage Ic once again differed from those found in the three sub-lineages circulating in the previous decade (Table 1, Figs 2 and 6).Recently, in the UK, there has been a decrease in the number of cases caused by STEC O157:H7 belonging to sub-lineage Ic, and an emergence of sub-lineage IIb PT8 that appears to be associated with the acquisition of a prophage encoding stx2a [9]. Strains belonging to this sub-lineage have caused foodborne outbreaks linked to contaminated mixed-leaf salad, lamb-based meat products including sausages and mince [42], and an environmental exposure linked to participation in a mud-based obstacle event [42]. Like the stx2a-encoding prophage described in sub-lineage I/IIb, the stx2a-encoding prophage in sub-lineage IIb was similar to the stx2c-encoding prophage, and likely the result of horizontal exchange of the stx2a gene into a previously stx2c-encoding prophage. This is also corroborated by the stx2a-encoding prophage in sub-lineage IIb integrating at sbcB associated with stx2c-encoding prophages (Table 1, Figs 3, 5 and 6).Importation of STEC O157:H7 strains from outside the UK via contaminated food products is a constant threat. In 2016, a large national outbreak of STEC O157:H7stx2a/stx2c PT34 belonging to sub-lineage IIa occurred in the UK [43]. Epidemiological investigations concluded that contaminated red Batavia salad leaves from a non-domestic source was the most plausible vehicle of infection. Analysis of the nanopore data from the outbreak strain demonstrated that the stx2a-encoding prophage was different from all the stx2a-encoding prophages identified in the five major UK sub-lineages. However, this prophage shared sequence similarity with the stx2a-encoding prophage in STEC O157:H7 PT32 belonging to sub-lineage Ic, associated with cases of severe gastrointestinal disease in Ireland [5] (Fig. 6). Unlike the previously described stx2a-encoding prophage, the stx2a-encoding prophage in both of these strains share the SBI yecE. This prophage also had similarity to the stx2a-encoding prophage found in a strain of STEC O55:H7 causing recurrent, seasonal outbreaks of HUS in England [40].
Fig. 6.
Two Easyfig plots comparing the stx2a-encoding prophages from E45000 with E116508 (above) and 155 and 267849 (below), in descending order. Arrows indicate gene directions. stx genes are shown in red; recombination/replication genes shown in light blue; regulation-associated genes are shown in dark blue; effector genes are shown in pink; structure- and lysis-associated genes are shown in light and dark green, respectively; tRNAs are shown as purple lines; finally, hypothetical genes are shown in grey.
Summary
Currently, the application of nanopore technology for extensive characterization of STEC O157:H7 genomes at PHE is still under development; therefore, the number of sequences analysed in this study was limited. stx2a-encoding prophages exhibited a higher level of diversity and there was little evidence of geographical or temporal patterns of relatedness, or of intra-UK transmission of stx2a-encoding prophage between indigenous strains. The stx2a-encoding prophages in the UK lineages associated with severe disease, therefore, appear to be acquired independently and most likely from different geographical and/or environmental sources. These data provide supporting evidence for the existence of a dynamic environmental reservoir of stx2a-encoding prophages that pose a threat to public health due to their potential for integration into competent, indigenous sub-lineages of O157:H7. Finally, we provide further evidence that, compared to stx2a-encoding prophages, stx2c-encoding prophages exhibit a high level of similarity across lineages, geographical regions and time, and have likely been maintained and inherited vertically.
Data Bibliography
1. Yara DA, Greig DR, Gharbia SE, Gally DL, Dallman TJ, Jenkins C. BioProject: PRJNA315192, VXJO00000000–VXJR00000000 (2019).2. Greig DR, Jenkins C, Gally DL, Gharbia SE, Dallman TJ. BioProject: PRJNA315192–CP043019 (2019).3. Cowley LA, Dallman TJ, Fitzgerald S, Irvine N, Rooney PJ, McAteer SP, Day M, Perry NT, Bono JL, Jenkins C, Gally DL. GenBank accession numbers CP015831 and CP015832 (2016).4. Dallman TJ, Byrne L, Ashton PM, Cowley LA, Perry NT, Adak G, Petrovska L, Ellis RJ, Elson R, Underwood A, Green J, Hanage WP, Jenkins C, Grant K, Wain J. BioProject PRJNA248042 (2015).5. Eppinger M, Sebastian Y, Ravel J. NCBI accession number NC_011353 (2016).6. Latif H, Aziz RK, Charusanti P, Palsson BO. GenBank accession number CP008957 (2014).7. Makino K, Yokoyama K, Kubota Y, Yutsudo CH, Kimura S, Kurokawa K, Ishii K, Hattori M, Tatsuno I, Abe H, Iida T, Yamamoto K, Onishi M, Hayashi T, Yasunaga T, Honda T, Sasakawa C, Shinagawa H. NCBI accession number NC_002695 (2016).8. Brittnacher M, Jacobs M, Zhou Y, Chang J, Fong C, Gillett W, Haugen E, Hayden H, Kulasekara B, Larson Freeman T, Radey M, Rohmer L, Sims E, Wu Z, Whittam T, Kaul R, Olson MV, Miller SI. NCBI accession number NC_013008 (2016).9. Shaaban S, Cowley L, McAteer SP, Jenkins C, Dallman TJ, Bono JL, Gally DL. GenBank accession numbers CP018252 and CP018237 (2016).10. Launders N, Locking ME, Hanson M, Willshaw G, Charlett A, Salmon R, Cowden J, Harker KS, Adak GK. GenBank accession numbers CP018243 and CP018239 (2016).
Authors: Timothy J Dallman; Lisa Byrne; Philip M Ashton; Lauren A Cowley; Neil T Perry; Goutam Adak; Liljana Petrovska; Richard J Ellis; Richard Elson; Anthony Underwood; Jonathan Green; William P Hanage; Claire Jenkins; Kathie Grant; John Wain Journal: Clin Infect Dis Date: 2015-04-17 Impact factor: 9.079
Authors: L W Riley; R S Remis; S D Helgerson; H B McGee; J G Wells; B R Davis; R J Hebert; E S Olcott; L M Johnson; N T Hargrett; P A Blake; M L Cohen Journal: N Engl J Med Date: 1983-03-24 Impact factor: 91.245
Authors: Sergey Koren; Brian P Walenz; Konstantin Berlin; Jason R Miller; Nicholas H Bergman; Adam M Phillippy Journal: Genome Res Date: 2017-03-15 Impact factor: 9.043
Authors: David Arndt; Jason R Grant; Ana Marcu; Tanvir Sajed; Allison Pon; Yongjie Liang; David S Wishart Journal: Nucleic Acids Res Date: 2016-05-03 Impact factor: 16.971
Authors: Stephen F Fitzgerald; Amy E Beckett; Javier Palarea-Albaladejo; Sean McAteer; Sharif Shaaban; Jason Morgan; Nur Indah Ahmad; Rachel Young; Neil A Mabbott; Liam Morrison; James L Bono; David L Gally; Tom N McNeilly Journal: PLoS Pathog Date: 2019-10-03 Impact factor: 6.823
Authors: C Sawyer; B Vishram; C Jenkins; F Jorgensen; L Byrne; A F W Mikhail; T J Dallman; K Carroll; L Ahyow; Q Vahora; G Godbole; S Balasegaram Journal: Epidemiol Infect Date: 2021-04-19 Impact factor: 2.451