Lu Ya Ruth Wang, Cassandra C Jokinen, Chad R Laing, Roger P Johnson, Kim Ziebell, Victor P J Gannon.
Abstract
Verotoxigenic Escherichia coli (VTEC) are food- and water-borne pathogens associated with both sporadic illness and outbreaks of enteric disease. While cattle are known reservoirs of VTEC, little is known about the genomic variation of VTEC in cattle, or about whether the genome variation reported for human outbreak strains is consistent with individual animal or group/herd sources of infection. A previous study of VTEC prevalence identified serotypes carried persistently by three consecutive cohorts of heifers within a closed herd of cattle. The present study aimed to: (i) determine whether the genomic relatedness of bovine isolates is similar to that reported for human strains associated with single-source outbreaks, (ii) estimate the rates of genome change among dominant serotypes over time within a cattle herd, and (iii) identify genomic features of serotypes associated with persistence in cattle. Illumina MiSeq genome sequencing and genotyping based on allelic and single nucleotide variations were completed, while genome change over time was measured using Bayesian evolutionary analysis sampling trees (BEAST). The accessory genomes of representative persistent and sporadic cattle strains, including non-protein-encoding intergenic regions (IGRs), virulence factors, antimicrobial-resistance genes and plasmid gene content, were compared using Fisher's exact test corrected for multiple comparisons. Herd strains from serotypes O6:H34 (n=22), O22:H8 (n=30), O108:H8 (n=39), O139:H19 (n=44) and O157:H7 (n=106) were readily distinguishable from epidemiologically unrelated strains of the same serotype using a similarity threshold of 10 or fewer allele differences between adjacent nodes. Temporal-cohort clustering within each serotype was supported by date randomization analysis. Substitutions per site per year were consistent with previously reported values for E. coli; however, there was low branch support for these values.
Acquisition of the phage-encoded Shiga toxin 2 gene in serotype O22:H8 was observed. Pan-genome analyses identified accessory regions that were more prevalent in persistent serotypes (P≤0.05) than in sporadic serotypes. These results suggest that VTEC serotypes from a specific cattle population are highly clonal, with a level of relatedness similar to that of human single-source outbreak-associated strains, but that changes in the genome occur gradually over time. Additionally, elements in the accessory genomes may provide a selective advantage for persistence of VTEC within cattle herds.
Keywords: Escherichia coli; evolutionary rate; genomics; persistence; relatedness; toxin
Year: 2020 PMID: 32496181 PMCID: PMC7371104 DOI: 10.1099/mgen.0.000376
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Number of herd strains analysed in this study
2012–2013 = cohort-year 1; 2013–2014 = cohort-year 2; 2014–2015 = cohort-year 3 (Wang et al., 2018) [39].

| Serotype | No. of herd strains | Year | No. of strains per year |
|---|---|---|---|
| O6:H34 | 22 | 2012–2013 | 13 |
| | | 2013–2014 | 3 |
| | | 2014–2015 | 6 |
| O22:H8 | 30 | 2012–2013 | 9 |
| | | 2013–2014 | 6 |
| | | 2014–2015 | 15 |
| O108:H8 | 39 | 2012–2013 | 14 |
| | | 2013–2014 | 9 |
| | | 2014–2015 | 16 |
| O139:H19 | 44 | 2012–2013 | 18 |
| | | 2013–2014 | 11 |
| | | 2014–2015 | 15 |
| O157:H7 | 106 | 1995 | 4 |
| | | 1996 | 45 |
| | | 1997 | 46 |
| | | 1998 | 11 |

Quality statistics for hybrid assemblies of VTEC reference genomes

| Strain | Serotype | No. of contigs | Largest contig (bp) | Total length (bp) | mol% G+C | N50, N75 |
|---|---|---|---|---|---|---|
| ECI-3359 | O6:H34 | 2 | 5 003 273 | 5 008 659 | 50.66 | 5 003 273 |
| ECI-2866 | O22:H8 | 8 | 5 026 381 | 5 180 090 | 50.79 | 5 026 381 |
| ECI-3462 | O108:H8 | 4 | 5 045 369 | 5 190 307 | 50.76 | 5 045 369 |
| ECI-3929 | O139:H19 | 3 | 4 964 669 | 5 147 626 | 50.79 | 4 964 669 |
| ECI-0907 | O157:H7 | 2 | 5 440 825 | 5 551 185 | 50.42 | 5 440 825 |

Fig. 1. MSTs based on wgMLST profiles of persistent serotypes from cattle: (a) O6:H34, (b) O22:H8, (c) O108:H8, (d) O139:H19, (e) O157:H7. Main panels: distribution of genotypes among cohort-years relative to epidemiologically unrelated strains. Sub-panels: distribution of genotypes among individual cattle for which at least two isolates were obtained. Branch labels denote the number of allele differences. Isolates that share <10 allele differences with adjacent nodes are included in the partition (grey).
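
The partition rule used in these trees can be sketched as follows: cut every MST branch whose allele difference exceeds a threshold and treat each remaining connected group as one cluster. This is a minimal illustration only, not the BioNumerics implementation; the function name, node names and branch values are hypothetical, and the default threshold of 10 follows the "10 or fewer" wording of the abstract.

```python
def partition_mst(nodes, edges, max_diff=10):
    """Group MST nodes into clusters by keeping only branches <= max_diff.

    nodes: iterable of node labels.
    edges: iterable of (node_a, node_b, allele_differences) MST branches.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, diff in edges:
        if diff <= max_diff:
            parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    # Largest cluster first, ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical toy tree: three herd isolates and one unrelated strain.
nodes = ["herd_A", "herd_B", "herd_C", "outgroup_X"]
edges = [
    ("herd_A", "herd_B", 3),
    ("herd_B", "herd_C", 10),
    ("herd_C", "outgroup_X", 75),
]
clusters = partition_mst(nodes, edges)
```

Passing `max_diff=9` instead applies the stricter "<10" reading given in the caption.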
Number of wgMLST allele differences and similarity values using UPGMA and MSTs. na, Not applicable.

| Serotype | UPGMA, within herd: no. of allele differences | UPGMA, within herd: similarity value* | UPGMA, versus closest outgroup strain: no. of allele differences | UPGMA, versus closest outgroup strain: similarity value* | MST, within herd: no. of clusters† | MST, versus closest outgroup strain: no. of allele differences |
|---|---|---|---|---|---|---|
| O6:H34 | 10.6‡ | 89.4‡ | 96.0 | 4.0 | 1 | 75 |
| O22:H8 | 28.7 | 71.3 | 164.2 | −64.2 | 2 | 87 |
| O108:H8 | 42.8 | 57.2 | | | 6 | |
| O139:H19 | 106.0 | −6.0 | >200 | −100.0 | 1 | 400 |
| O157:H7 | 11.7 | 88.3 | 47.9§ | 52.1§ | 1 | 37‖ |

*Similarity value as calculated by BioNumerics v7.6.2 (−100 = 200 loci differences; 100 = identical wgMLST profiles).
†Cluster defined by 10 or fewer allele differences between any two adjacent nodes.
‡For comparison, within-herd similarity of O6:H34 cattle isolates from the USA (n=23): 9.8 allele differences (90.2 similarity).
§Versus EDL933, 93.3 allele differences (6.7 similarity); versus Sakai, 99.1 allele differences (0.9 similarity).
‖Versus EDL933, 72 allele differences; versus Sakai, 75 allele differences.
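
The tabulated pairs are consistent with a simple linear mapping between allele differences and similarity value: 100 minus the number of differences, bounded below at −100. The sketch below encodes that reading. It is an inference from the two anchor points in the first footnote and from the table values, not a documented BioNumerics formula, and the function name is hypothetical.

```python
def similarity_value(allele_differences):
    """Similarity on the -100..100 scale, assuming a linear mapping.

    Assumption: 0 differences -> 100, 200 differences -> -100, and
    anything beyond 200 differences is capped at -100.
    """
    return round(max(-100.0, 100.0 - allele_differences), 1)


# UPGMA allele differences copied from the table above.
within_herd = {"O6:H34": 10.6, "O22:H8": 28.7, "O108:H8": 42.8,
               "O139:H19": 106.0, "O157:H7": 11.7}
similarities = {s: similarity_value(d) for s, d in within_herd.items()}
```

Under this assumption the ">200" outgroup entry for O139:H19 maps to the floor value of −100.0 shown in the table.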
Fig. 2. Bayesian inference of time-scaled phylogenies. (a) O6:H34. (b) O22:H8; pink highlighted strains, stx2c+ clade. (c) O108:H8. (d) O139:H19. Blue, cohort-year 1; green, cohort-year 2; red, cohort-year 3. Node and branch labels: posterior support. Node bars: height_95%_HPD.
Summary of BEAST analysis of SNVPhyl SNV alignments. na, Not applicable.

| Serotype | No. of strains* | Root-to-tip regression (correlation coefficient)† | Root-to-tip regression (R²) | tMRCA‡ | Model§ | Mean clock rate | 95 % HPD interval | No. of SNV sites | No. of sites in reference core | Substitutions per site (core) per year‖ | SNVs per genome per year¶ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| O6:H34 | 23 | 0.6934 | 0.4808 | 2010.32 | GTR + γ, strict, skyline | 0.0756 | 0.0385, 0.1146 | 43 | 4 588 778 | 7.08×10−7 | 3.25 |
| O22:H8 | 30 | 0.9266 | 0.8586 | 2009.87 | GTR + γ, strict, constant | 0.0712 | 0.0438, 0.1004 | 65 | 4 671 513 | 9.91×10−7 | 4.63 |
| O108:H8 | 40 | 0.8028 | 0.6445 | 2011.32 | GTR + γ, strict, skyline | 0.0474 | 0.0312, 0.0633 | 140 | 4 846 133 | 1.37×10−6 | 6.64 |
| O139:H19 | 45 | 0.7530 | 0.5670 | 2009.42 | HKY + γ, strict, constant | 0.0307 | 0.0244, 0.0392 | 196 | 4 825 351 | 1.25×10−6 | 6.02 |
| O157:H7 | 36# | −0.8477 | 0.7186 | | Insufficient temporal signal for | | | | | | |

*Includes duplicate of reference strain; subtract one for the number of unique strains.
†Using ‘best root’ option in TempEst for most clock-like phylogeny.
‡tMRCA, estimated date of most recent common ancestor.
§Nucleotide substitution model (GTR, HKY); clock type (strict, uncorrelated relaxed lognormal); tree prior (coalescent constant, coalescent Bayesian skyline).
‖Mean clock rate × (no. of SNV sites/no. of sites in reference core).
¶Mean clock rate × no. of SNV sites.
#Representative subset (n=36) of strains used to assess the temporal signal.
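
The two derived columns follow directly from the footnote formulas, so they can be recomputed from the mean clock rate, the number of SNV sites and the size of the reference core. The sketch below does that for the four serotypes with a usable temporal signal; the function and variable names are illustrative, and the inputs are copied from the table.

```python
def subs_per_site_per_year(mean_clock_rate, n_snv_sites, n_core_sites):
    """Mean clock rate x (no. of SNV sites / no. of sites in reference core)."""
    return mean_clock_rate * n_snv_sites / n_core_sites


def snvs_per_genome_per_year(mean_clock_rate, n_snv_sites):
    """Mean clock rate x no. of SNV sites."""
    return mean_clock_rate * n_snv_sites


# (mean clock rate, no. of SNV sites, no. of sites in reference core)
table = {
    "O6:H34": (0.0756, 43, 4_588_778),
    "O22:H8": (0.0712, 65, 4_671_513),
    "O108:H8": (0.0474, 140, 4_846_133),
    "O139:H19": (0.0307, 196, 4_825_351),
}

derived = {
    serotype: (
        subs_per_site_per_year(rate, snvs, core),
        snvs_per_genome_per_year(rate, snvs),
    )
    for serotype, (rate, snvs, core) in table.items()
}

for serotype, (per_site, per_genome) in derived.items():
    print(f"{serotype}: {per_site:.2e} subs/site/year, "
          f"{per_genome:.2f} SNVs/genome/year")
```

Because the clock rate is estimated on an alignment of variable sites only, rescaling by the ratio of SNV sites to core sites is what puts it on a per-core-site basis.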
Fig. 3. Bayesian estimates of mean clock rates from date-randomization analysis. x-axes: 1–10, date randomization replicates; 11, accurate collection dates. y-axes: clock rate (substitutions per site). The strength of the temporal structure was rated based on the proportion of randomization replicates (n=10) for which the 95 % HPDs overlapped with that estimated using accurate collection dates: 0, ‘strong’; 0–0.5, ‘moderate’; >0.5, ‘low’ [23].
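
The rating rule in this caption can be written down as a small function: count the randomized replicates whose 95 % HPD interval overlaps the interval obtained with the accurate dates, then map that proportion to a label. This is a sketch under the assumption that touching endpoints count as overlap; the function names and the randomized intervals in the example are hypothetical, and only the true-date interval is taken from the O6:H34 row of the BEAST summary table.

```python
def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


def rate_temporal_signal(true_hpd, randomized_hpds):
    """Label temporal structure from the proportion of overlapping replicates.

    0 -> 'strong'; above 0 and up to 0.5 -> 'moderate'; above 0.5 -> 'low'.
    """
    n_overlap = sum(intervals_overlap(true_hpd, h) for h in randomized_hpds)
    proportion = n_overlap / len(randomized_hpds)
    if proportion == 0:
        return "strong"
    if proportion <= 0.5:
        return "moderate"
    return "low"


# 95 % HPD for O6:H34 with accurate dates (from the summary table).
true_hpd = (0.0385, 0.1146)
# Hypothetical HPDs for 10 date-randomized replicates.
randomized = [(0.001, 0.020)] * 8 + [(0.010, 0.050)] * 2
rating = rate_temporal_signal(true_hpd, randomized)
```

In this made-up example two of ten replicates overlap the true interval, a proportion of 0.2.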
Fig. 4. Manhattan plot (−log10P) of a GWAS of persistent (n=13) and sporadic (n=11) VTEC strains. Only genes with a naïve P value <0.05 are plotted. Genome-wide significance thresholds of 0.05 and 0.005 (Benjamini–Hochberg corrected) are plotted as dashed lines. Data points above these thresholds were significantly overrepresented in persistent strains. Coloured data points indicate separate gene fragments within the pangenome identified by Roary.
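
The statistics behind such a plot can be illustrated with a per-gene Fisher's exact test on presence/absence counts, followed by Benjamini–Hochberg adjustment across genes. The sketch below uses only the Python standard library. It assumes a one-sided test for overrepresentation in persistent strains, which may differ from the test actually run in the study, and the gene names and counts are hypothetical; only the group sizes (13 persistent, 11 sporadic) come from the caption.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    a, b: persistent strains with / without the gene.
    c, d: sporadic strains with / without the gene.
    Returns P(at least `a` carriers are persistent) under the
    hypergeometric null.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical presence counts: (persistent with gene, sporadic with gene).
n_persistent, n_sporadic = 13, 11
counts = {"gene_1": (13, 0), "gene_2": (10, 3), "gene_3": (7, 6)}

genes = list(counts)
naive_p = [
    fisher_greater(p, n_persistent - p, s, n_sporadic - s)
    for p, s in (counts[g] for g in genes)
]
adjusted_p = dict(zip(genes, benjamini_hochberg(naive_p)))
```

Plotting −log10 of the naive P values against gene position, with the adjusted thresholds drawn as horizontal lines, would give a figure of the same kind.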
Fig. 5. Accessory genes that were more prevalent in persistent serotypes mapped to reference strain O91 RM7190. The highlighted region (pink) indicates a high density of persistence-associated genomic features, including genes encoding: porin, sulfatase, vitamin B12/cobalamin outer-membrane transporter, transposase and hypothetical proteins.