Global evolutionary dynamics and resistome analysis of Clostridioides difficile ribotype 017.

Korakrit Imwattana¹,², Papanin Putsathit³, Deirdre A Collins³, Teera Leepattarakit², Pattarachai Kiratisin², Thomas V Riley¹,³,⁴,⁵, Daniel R Knight¹,⁴.

Entities: Chemical

Keywords: AMR; Clostridioides difficile; evolution; outbreaks; population; ribotype 017

Substances: Anti-Bacterial Agents

Year: 2022 PMID: 35316173 PMCID: PMC9176289 DOI: 10.1099/mgen.0.000792

Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858

Impact Statement

This study utilizes genomic sequence data from 282 non-clonal ribotype (RT) 017 isolates collected from around the world to delineate the origin and spread of this epidemic lineage, as well as explore possible factors that have driven its success. It also reports on a focused epidemiological investigation of a cluster of RT 017 in a tertiary hospital in Thailand to identify possible sources of transmission in this specific setting.

Data Summary

All new whole-genome sequence data generated in this study, highlighted in the Supplementary Document, have been submitted to the European Nucleotide Archive under BioProject PRJEB44406 (sample accessions ERS6268756–ERS6268798). The complete genome of MAR286 was submitted to GenBank under BioProject PRJNA679085 (accession CP072118). Details of genomes included in the final analyses (282 genomes in the global analysis and an additional 13 genomes from the smaller analysis), as well as records of the phenotypic analyses, are available in the Supplementary Document at https://www.doi.org/10.6084/m9.figshare.14544792. An interactive version of the Bayesian phylogenetic tree in Fig. 1 is available at https://microreact.org/project/v89tzQ8rii55PkAGF5Jo2r/64c80194.

Introduction

PCR ribotype (RT) 017, or sequence type (ST) 37, ranks among the most successful strains of Clostridioides difficile. Despite producing only one functional toxin (toxin B), RT 017 has spread widely and caused outbreaks globally [1]. The severity of C. difficile infection (CDI) caused by RT 017 has been comparable to that of infection caused by strains producing two or three toxins [2–4]. One factor that may have contributed to the success of RT 017 is antimicrobial resistance (AMR) [5]. The evolutionary origins of RT 017 remain contentious [1]. Possible contributing factors include the early erroneous dismissal of RT 017 as non-pathogenic due to its lack of toxin A [6], and the use of diagnostic tests that only detected toxin A [7]. By the time that the pathogenicity of RT 017 was recognized (1995) [8], the strain had already spread across the globe [1]. Based on multi-locus sequence type, RT 017 is a member of evolutionary clade 4 [1]. This clade comprises several non-toxigenic C. difficile strains and toxigenic strains that produce only toxin B (A⁻B⁺CDT⁻) [9]. Epidemiological evidence suggested that clade 4 originated in Asia and that RT 017 later spread globally. First, RT 017 has been the dominant strain in Asia for decades [10–14] and has only appeared sporadically in other regions [8, 15–20]. Second, reports of other clade 4 strains have been exclusively from Asian countries, such as ST 81 in China [21, 22], RT 369 in Japan [23] and, most importantly, the high diversity of clade 4 non-toxigenic C. difficile in Southeast Asia [24]. However, few historical RT 017 strains have been available from the region to verify this hypothesis [10–14]. In 2017, Cairns et al. analysed the whole-genome sequence (WGS) data of 277 RT 017 strains from around the world. Their results suggested an alternative hypothesis, that RT 017 instead originated in North America, spread to Europe in the 1990s and later spread to other regions [25].
A more recent study based mainly on the same dataset agreed with this hypothesis but estimated the time of spread to be before the 1970s [26]. Despite the large dataset, this conclusion might have been influenced by a strain selection bias, as the North American strains included in the study were relatively older than strains from other regions [25, 26]. To improve upon the previous analyses, the present study included a larger number of strains, with a few early European strains and a greater diversity of Asian strains. We aimed to explore the origin and spread of RT 017, as well as the key genetic factors driving its success.

Methods

C. difficile RT 017 genomes

This study started with 929 RT 017 strains from three collections: a set of 45 clinical RT 017 strains from Thailand [32 phenotypically multidrug-resistant (MDR) and 13 non-MDR], some of which have been described previously [27]; 97 previously unpublished RT 017 strains from our laboratory’s collection; and 787 RT 017 genomes publicly available at the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra/) as of January 2020. These collections included genomes of three RT 017 strains isolated in the early 1980s (courtesy of Dr Jon Vernon and Prof. Mark Wilcox in Leeds, UK, but originally part of Prof. S. P. Borriello’s collection) [28]. Multilocus sequence typing (MLST) was performed directly from sequence read files using SRST2 v0.2.0 and the PubMLST database (https://pubmlst.org/organisms/clostridioides-difficile/) as previously described [29, 30]. After excluding clonal and redundant strains (see below), 282 strains remained in the final dataset.

Assembly of a new complete RT 017 genome from Southeast Asia

To facilitate phylogenomic analysis of strains from Thailand, a Thai strain was selected for hybrid assembly of a closed reference genome. MAR286 was a non-MDR strain as opposed to the existing MDR reference strain M68 [31]. Short-read sequencing was performed on an Illumina HiSeq sequencing platform (Illumina) using 150 bp paired-end chemistry to a depth of 39× coverage as previously described [30]. Long-read sequencing was performed on the MinION Mk1C machine (Nanopore). The sequencing libraries were prepared using the Ligation Sequencing Kit (SQK-LSK109) and run on a FLO-MIN106 (R9.4.1) flow cell, according to the manufacturer’s instructions, for 24 h. Hybrid assembly was performed with Unicycler v0.4.8 using a conservative mode [32]. The final assembly graph was visualized and polished with Bandage v0.8.1 [33]. Genome annotation was performed using the NCBI Prokaryotic Genomes Annotation Pipeline [34].

AMR genotyping

AMR genotyping was performed as previously described [24]. Briefly, all read files were interrogated using SRST2 [29, 35] against two databases: the ARGannot database (for known accessory AMR genes), supplemented with two additional genes recently described in C. difficile, erm(52) and mefH [27], and a customized database of gyrA, gyrB and rpoB alleles (for known resistance-conferring point mutations). Strains that were positive for either ermB or tetM were interrogated for known transposons using SRST2 as previously described [29].

Evolutionary analysis of RT 017

To investigate the evolution and spread of RT 017, core genome SNP (cgSNP) and Bayesian evolutionary analyses were performed. All paired-end reads were trimmed using TrimGalore v0.6.4 to remove low-quality and adapter sequences (https://github.com/FelixKrueger/TrimGalore) and mapped to the genome of M68, and variants were identified using Snippy v4.4.5 (https://github.com/tseemann/snippy). The resulting VCF file was then screened to exclude variants occurring in the repetitive region using SnpSift v4.3t [36] and to exclude indels using VCF-annotate v0.1.15 [37]. Gubbins v2.4.1 was used to identify and remove recombination sites [38]. SNP-dists v0.7.0 was used to generate a pairwise cgSNP table (https://github.com/tseemann/snp-dists). Following the approach of Eyre et al. [39] and Didelot et al. [40], a threshold of 0–2 cgSNPs was used to determine if groups of two or more strains were clonally related. To facilitate the Bayesian analysis, clonal strains were removed from the dataset, leaving only one representative for each clonal cluster (n=282). Bayesian evolutionary analysis was performed using BactDating v1.0.1 [41]. In our previous study, BactDating allowed the comparison of a longer sequence alignment than the conventional BEAST software, which led to more precise confidence intervals [9]. It is also compatible with Gubbins, which was used in the previous step [41]. BactDating was run using a Gubbins recombination-adjusted phylogenetic tree from the previous analysis (1455 sites) as an input with the following settings: Markov chain Monte Carlo (MCMC) chains of 5×10⁸ iterations sampled every 5×10⁵ iterations with a 50 % burn-in and a strict model with a rate of 1.4 mutations per genome per year as published by Didelot et al. [40]. These parameters were first tested on a smaller dataset (n=45, see below) and produced the best model.
The molecular clock used in this study is best suited for the microevolutionary analysis and the investigation of transmission [40], which was the main focus of this study. However, this clock estimate does not take into account the quiescence of spores and the analysis may have underestimated the time of spread (see Discussion). Traces were inspected to ensure convergence and the effective sample size (ESS) values for all estimated continuous variables were >200. The final Bayesian tree was annotated using iTOL v6 [42]. An interactive version of the Bayesian phylogenetic tree in Fig. 1 was uploaded to Microreact [43].
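The clonal de-duplication step described above can be illustrated with a short sketch. It is not the authors' script: it assumes single-linkage grouping (any two strains within 2 cgSNPs are linked, and links are chained), and it keeps the alphabetically first strain of each cluster as the representative. The text specifies neither choice. The strain names and distances are hypothetical.

```python
from itertools import combinations

def clonal_clusters(names, dist, threshold=2):
    """Group strains by single linkage: pairs within `threshold` cgSNPs are joined.

    `dist` maps frozenset({a, b}) to the pairwise cgSNP distance (assumed layout).
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())

def representatives(clusters):
    """Keep one strain per cluster (here simply the first name; an assumption)."""
    return [c[0] for c in clusters]

# Hypothetical pairwise cgSNP table, e.g. parsed from a snp-dists matrix.
names = ["s1", "s2", "s3", "s4"]
dist = {
    frozenset(("s1", "s2")): 1,
    frozenset(("s1", "s3")): 3,
    frozenset(("s2", "s3")): 2,
    frozenset(("s1", "s4")): 40,
    frozenset(("s2", "s4")): 41,
    frozenset(("s3", "s4")): 39,
}
clusters = clonal_clusters(names, dist)
print(clusters)                   # [['s1', 's2', 's3'], ['s4']]
print(representatives(clusters))  # ['s1', 's4']
```

Note that s1 and s3 differ by 3 cgSNPs but still share a cluster because both are within 2 cgSNPs of s2; a stricter rule (complete linkage) would split them.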

Fig. 1.

Bayesian tree of 282 non-clonal RT 017 genomes from around the world. The RT 017 population could be divided into non-epidemic (NE; sublineages NE1–NE3) and epidemic (E) lineages. *The region of origin for each strain. Important genotypic AMR determinants are displayed on the right (A–E). The red star represents M68, the reference genome in this analysis.

Bayesian analysis was also performed on a subset of 45 Thai genomes, for which patient data and phenotypic AMR results were available [27, 44]. With this small dataset, several Bayesian evolutionary analyses were performed with different parameters, including the use of different input phylogenetic trees (Gubbins [41] versus PhyML [45]), strict versus relaxed clock, inclusion versus omission of collection dates, as well as different MCMC parameters. The cgSNP analysis was performed using the following reference genomes: MAR286 (this study; accession CP072118.1), M68 (accession FN668375.1), 630 (accession AM180355.1) and M120 (accession FN665653.1), to evaluate whether the choice of reference genome had any effect on downstream analysis. A pairwise whole-genome average nucleotide identity (ANI) between each strain and the reference strains was generated using FastANI [46], and the results were used to compare strain relatedness with each reference.

Pangenome-wide association study

The cgSNP and Bayesian analyses identified two distinct RT 017 sublineages. To determine significant genetic loci associated with each lineage, all genomes were assembled and a pangenome-wide association study (pan-GWAS) was performed as previously described [9]. Briefly, Panaroo v1.1.0 was run with default settings on the annotated genomes [47], and the results were used as an input for Scoary v1.6.16 to identify the significant genetic loci associated with each lineage [48].
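The sensitivity and specificity figures reported for lineage-associated loci can be understood with a minimal sketch. It treats gene presence as a "test" for lineage membership and is not Scoary's implementation; the function name and the toy presence/absence vectors are hypothetical.

```python
def sensitivity_specificity(gene_present, in_lineage):
    """Return (sensitivity %, specificity %) of gene presence as a lineage marker.

    Both arguments are equal-length sequences of booleans, one entry per genome.
    """
    pairs = list(zip(gene_present, in_lineage))
    tp = sum(1 for g, m in pairs if g and m)          # gene present, in lineage
    fn = sum(1 for g, m in pairs if not g and m)      # gene absent, in lineage
    tn = sum(1 for g, m in pairs if not g and not m)  # gene absent, outside lineage
    fp = sum(1 for g, m in pairs if g and not m)      # gene present, outside lineage
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one genome inside and outside the lineage")
    return 100 * tp / (tp + fn), 100 * tn / (tn + fp)

# Hypothetical toy data: 4 genomes in lineage E, 4 outside it.
gene_present = [True, True, True, False, False, False, False, True]
in_lineage_e = [True, True, True, True, False, False, False, False]
sens, spec = sensitivity_specificity(gene_present, in_lineage_e)
print(f"sensitivity {sens:.1f} %, specificity {spec:.1f} %")
```

With these toy vectors, three of four lineage E genomes carry the gene and three of four other genomes lack it, so both values are 75 %.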

Assessment of motility and cell aggregation

We also evaluated motility and cell aggregation in RT 017 strains from the two lineages: strain 1470 [ATCC 43598, non-epidemic lineage (NE)], MAR006 [epidemic lineage (E)], MAR024 (lineage E) and MAR286 (lineage NE). First, a motility assay was performed as described by Tasteyre et al. in four separate batches [49]. In each batch, the growth diameter was recorded three times at different angles. Second, cell aggregation was assessed by measuring the optical density at 600 nm (OD600) of the undisturbed and disturbed 48-h-old growth in brain heart infusion broth [50]. These tests were performed with at least three biological replicates. C. difficile strain IS58 (RT 033, non-motile) was included as a negative control [51].

Statistical analysis

All statistical analyses were performed using online tools by Social Science Statistics available at https://www.socscistatistics.com/. A P-value of ≤0.05 was considered to be statistically significant.

Results

The epidemic RT 017 lineage emerged from Asia in the middle of the 20th century

To study the global population structure of RT 017, cgSNP and Bayesian evolutionary analyses were performed on 282 non-clonal RT 017 genomes collected worldwide between 1981 and 2019 (Fig. S1, available in the online version of this article). The overall median year of isolation for this dataset was 2011 [quartile range (QR): 2008–2014]. The median years of isolation for the three main continents were as follows: Asia, 2014 (2010–2016); Europe, 2010 (2006–2012); and North America, 2009 (2004–2017). The Bayesian tree of RT 017 could be divided into two parts based on the tree branching and topology (Fig. 1). The first part had deep temporal branches with a small number of strains, indicating an ancient lineage with limited spreading. This lineage was thus called the non-epidemic lineage (NE) and could be further divided into three sublineages (NE1, NE2 and NE3). The second part stemmed from sublineage NE3 and had shallow temporal branches with a large number of strains, indicating a rapid expansion of the lineage. This lineage was thus called the epidemic lineage (E). Table 1 summarizes the 11 lineage-defining SNPs identified. None of the mutations was on the pathogenicity locus (PaLoc), the genetic region containing the toxin genes tcdA and tcdB. All RT 017 strains carried the same tcdB allele (tcdB_9 according to the PubMLST database) and were expected to produce variant type toxin B [1]. Sublineages NE1, NE2 and NE3 consisted mainly of strains from Europe, North America and Asia, respectively, and the common ancestor of the three sublineages was estimated to have emerged in 1588 [95 % confidence interval (CI): 758–1858]. Sublineages NE1 and NE2 split around 1860 (95 % CI: 1622–1954). Lineage E was estimated to have split from sublineage NE3 around 1958 (95 % CI: 1920–1977) and later spread globally around 1970 (95 % CI: 1953–1983).

Table 1.

List of lineage-defining cgSNPs

				Lineages
Position*	Strand†	Product	N/S‡	NE₁	NE₂	NE₃	E
Lineage NE vs. lineage E
867 703	F	Diguanylate kinase signalling protein	N	G§	G§	G§	T
Sublineages NE₁ and NE₂ vs. sublineage NE₃
263 571	F	FlgG	N	T§	T§	C	C
480 088	R	UvrA	S	A§	A§	G	G
1 486 937	F	Gfo/Idh/MocA family oxidoreductase	N	T§	T§	G	G
1 789 300	F	Serine O-acetyltransferase	S	C§	C§	T	T
3 254 867	R	ABC transporter	N	T§	T§	C	C
3 808 791	n/a	Non-coding region	–	G§	G§	A	A
Sublineage NE₁ vs. sublineage NE₂
1 299 679	F	Penicillin-binding protein 2	N	G	T§	G	G
1 486 584	F	Gfo/Idh/MocA family oxidoreductase	N	C	T§	C	C
2 928 003	R	ABC transporter	N	G	T§	G	G
3 066 957	R	Thioether cross-link-forming SCIFF peptide maturase	N	C	T§	C	C

*Position on C. difficile M68 genome.

†Coding strand (F, forward; R, reverse).

‡Non-synonymous substitutions (N) and synonymous substitutions (S).

§Different from the reference genome.

The acquisition of ermB was probably the driving factor of the epidemic RT 017 lineage

After incorporating genotypic AMR data, an association between the acquisition of AMR genotype and the spread of RT 017 was evident. Genotypically MDR RT 017 strains were in the lower part of sublineage NE3 and lineage E, and only emerged around 1935 (95 % CI: 1851–1969). There had been multiple acquisition events for the two most common accessory AMR determinants: tetM and ermB. The earliest acquisition of tetM was probably through gaining Tn916, which occurred around 1914 (95 % CI: 1799–1964), while the earliest acquisition of ermB was probably through gaining Tn6194, which occurred around 1958 (95 % CI: 1920–1977), notably the same timeframe as the predicted time of emergence of lineage E. Non-synonymous substitutions in RpoB (H502N, conferring rifamycin resistance) and in GyrA (T82I, conferring fluoroquinolone resistance) were found scattered throughout the population. In contrast, an R505K substitution in RpoB was found only in strains from sublineage NE3 and lineage E and was more common among Asian strains (37.2 % vs. 8.9 %, P<0.0001). The only European strains with an R505K substitution in RpoB were from an outbreak in Portugal [18]. Three independent GyrB substitution events were identified in this dataset: two D426N substitution events in North America around 2008 (95 % CI: 1998–2011) and 2015 (95 % CI: 2012–2016), and one D426V substitution event in Ireland (C. difficile M68, the reference strain) around 2004 (95 % CI: 2001–2005) (star in Fig. 1). In addition to the important AMR determinants described above, the aac6-aph2 gene was also common among RT 017, found in 73 strains in this dataset (25.9 %), and more commonly among Asian strains (43.4 % vs. 14.2 %, P<0.0001).

The epidemic RT 017 lineage expresses higher motility

The cgSNP that differentiated between lineages NE and E resulted in a substitution in a diguanylate kinase signalling protein, which may play a role in motility and biofilm formation in C. difficile [50, 52]. Thus, motility and cell aggregation assays were performed (Fig. 2). Strains from lineage E had a larger growth diameter than strains from lineage NE (average diameter 7.7 vs. 5.9 mm, Wilcoxon rank sum P<0.0001, Fig. 2a), with a marginally significant change in the level of cell aggregation, as shown by the lower change in OD600 between undisturbed and disturbed cultures (0.88 vs. 0.99, Wilcoxon rank sum P=0.031; for comparison, the non-motile IS58 had a 1.84-fold change in OD600, Fig. 2b).
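For readers unfamiliar with the Wilcoxon rank sum test used for these comparisons, the sketch below shows one common form of it: a two-sided test using the normal approximation with midranks and a tie correction, without continuity correction. It is not the online tool used in the study, and the diameters are made-up illustrative values, not the study's measurements.

```python
import math
from collections import Counter

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (W, z, p), where W is the sum of the ranks of sample x.
    """
    pooled = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1..j and all receive the average rank.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(x), len(y)
    n = n1 + n2
    w = sum(ranks[v] for v in x)
    mean = n1 * (n + 1) / 2
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, z, p

# Hypothetical growth diameters in mm (illustrative only).
lineage_e = [7.5, 8.0, 7.9, 7.2, 8.1, 7.6]
lineage_ne = [5.8, 6.1, 5.5, 6.0, 5.9, 6.3]
w, z, p = rank_sum_test(lineage_e, lineage_ne)
print(f"W = {w}, z = {z:.2f}, p = {p:.4f}")
```

With small samples an exact test is preferable to the normal approximation; the approximation is used here only to keep the sketch dependency-free.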

Fig. 2.

Comparison of motility and cell aggregation between Lineages E (pink) and NE (lilac). (a) Lineage E had a larger growth diameter in semi-solid media. (b) Lineage E displayed a lower cell aggregation as measured by the difference in OD600 between undisturbed and disturbed broths. (c) The semi-solid media for all tested strains. IS58 (RT 033, dark grey) was used as a negative control. All error bars display 95 % confidence intervals.

In addition to the lineage-specific cgSNPs (Table 1) and the difference in the prevalence of genotypic AMR, pan-GWAS was performed to identify other significant lineage-specific genetic loci. A total of 32 863 genes were identified in the dataset, 3560 (10.8 %) of which were found in more than 95 % of strains and classified as core genes. Based on the GWAS, the locus most significantly associated with lineage E was the aminoglycoside resistance locus [containing aac6-aph2 and a gene resembling ant6(Ib) (72 % identity, E-value=5.01e-157); sensitivity 85.3 %, specificity 97.8 %]. Apart from AMR-related loci, lineage E was associated with a truncation of the formate dehydrogenase FdhF protein (sensitivity 75.3 %, specificity 97.8 %). A comparison of the FdhF protein is shown in Fig. S2 [53]. In an analysis of 260 representative genomes across eight evolutionary clades [9], this truncated FdhF protein was not found in other strains.

RT 017 strains in Thailand were probably acquired outside of the hospital

In this study, a smaller subset of RT 017 genomes from a single hospital in Thailand (n=45) was analysed to determine the best parameters to be used in the analyses above (Gubbins tree model, strict model with a rate of 1.4 mutations per genome per year, the inclusion of collection dates and the MCMC parameters described in the Methods). First, a local reference genome (MAR286; GenBank accession CP072118.1) was generated to evaluate the effect of the different reference genomes on the downstream analysis. Comparison of the MAR286 genome with that of M68, a commonly used reference genome of RT 017, is shown in Table 2. Pairwise whole-genome ANI and cgSNP analyses were performed on Thai RT 017 genomes against different reference genomes and the results are summarized in Table 3. Thai strains were closest to M68. Using M68 as a reference resulted in the longest average mapped length, significantly longer than MAR286, the second closest reference genome (P<0.0001). Accordingly, M68 was chosen as a reference for subsequent analysis. The average number of pairwise cgSNP differences based on M68 and MAR286 was 0.49 SNPs (95 % CI: 0.44–0.54). The difference between strains in this study and the other two reference genomes was more pronounced, resulting in a greater number of pairwise cgSNP differences compared to M68: 5.42 SNPs (95 % CI: 5.15–5.69) for 630 and 9.39 SNPs (95 % CI: 9.05–9.72) for M120.
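To give a feel for what a strict clock of 1.4 mutations per genome per year implies, the following sketch converts between time and expected SNP differences. It is a back-of-the-envelope illustration, not part of the BactDating model: it assumes the rate applies uniformly along both branches and ignores spore quiescence. The function names and example years are hypothetical.

```python
RATE = 1.4  # mutations per genome per year, the strict-clock rate quoted in the text

def expected_snps(tmrca_year, year_a, year_b, rate=RATE):
    """Expected SNP differences between two tips sharing an ancestor at tmrca_year.

    Mutations accumulate independently on both branches, so branch lengths add.
    """
    return rate * ((year_a - tmrca_year) + (year_b - tmrca_year))

def branch_years(snps, rate=RATE):
    """Combined branch length (in years) implied by a given SNP distance."""
    return snps / rate

# Hypothetical pair: common ancestor in 2012, isolates collected in 2015 and 2017.
print(round(expected_snps(2012, 2015, 2017), 1))  # 11.2
# The 2-cgSNP clonality cut-off corresponds to roughly 1.4 combined branch-years.
print(round(branch_years(2), 2))                  # 1.43
```

Under these assumptions, two isolates differing by no more than 2 cgSNPs are separated by less than about a year and a half of combined evolution, which is why such pairs are treated as clonal.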

Table 2.

Comparison of two RT 017 reference genomes

Parameter	M68	MAR286
Accession	FN668375.1	CP072118.1
Genome size (bp)	4 308 325	4 242 261
Genes	3983	3892
Coding sequences	3830	3761
rRNAs	40	35
tRNAs	109	92
Non-coding RNAs	4	4
CRISPR arrays	4	6
GC content	28.9 %	28.8 %
AMR loci	ermB (Tn6194) [MLS_B], tetM (Tn6190) [tetracyclines], D426V (GyrB) [fluoroquinolones]	ermB (Tn6194) [MLS_B], tetM (Tn916) [tetracyclines]
Pairwise ANI	99.92 %

Table 3.

Effect of the choice of reference genome on cgSNP analysis

Reference	ST (clade)	Accession	Average mapped length (bp)	No. of SNPs	ANI (%)
MAR286	37 (4)	CP072118.1	4 134 703.82	311	99.88
M68	37 (4)	FN668375.1	4 176 850.73	308	99.93
630	54 (1)	AM180355.1	3 836 370.82	267	97.98
M120	11 (5)	FN665653.1	3 579 796.21	235	96.11

ANI, average nucleotide identity; SNPs, single nucleotide polymorphisms; ST, sequence type.
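The reference-selection logic can be expressed as a short sketch using the values in Table 3. Ranking by average mapped length first and ANI second is an assumption made for illustration; the `Reference` type is hypothetical.

```python
from typing import NamedTuple

class Reference(NamedTuple):
    name: str
    mapped_length_bp: float  # average mapped length across the 45 Thai genomes
    n_snps: int
    ani_percent: float

# Values transcribed from Table 3.
TABLE_3 = [
    Reference("MAR286", 4134703.82, 311, 99.88),
    Reference("M68", 4176850.73, 308, 99.93),
    Reference("630", 3836370.82, 267, 97.98),
    Reference("M120", 3579796.21, 235, 96.11),
]

# Prefer the reference that the reads cover most completely, then the highest ANI.
best = max(TABLE_3, key=lambda r: (r.mapped_length_bp, r.ani_percent))
print(best.name)  # M68

# Mapped length of each reference relative to the best one.
for ref in TABLE_3:
    print(f"{ref.name}: {100 * ref.mapped_length_bp / best.mapped_length_bp:.1f} %")
```

Both criteria point to M68 here, consistent with the choice described in the text.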

Using M68 as a reference, 308 high-quality cgSNPs were identified across 45 strains. The final Bayesian phylogenetic tree is shown in Fig. 3. Based on this phylogeny, 44 RT 017 strains, excluding the outlier, could be classified roughly into three groups: the oldest group (G1, n=13), most of which were non-MDR RT 017; a group of early MDR RT 017 (G2, n=15); and the most recent and rapidly expanding clade of MDR RT 017 (G3, n=16). The common ancestor of all Thai RT 017 was estimated to have arisen around 1988 (95 % CI: 1949–2000). The common ancestors of the three groups were estimated to have arisen around 1999 (1993–2004), 2003 (1995–2007) and 2012 (2009–2013), respectively.

Fig. 3.

Bayesian tree of 45 Thai RT 017 strains. ‘THP’ refers to strains isolated in 2015 and ‘MAR’ to strains isolated in 2017–2018. Red boxes indicate that the patients were in the same department when the strains were isolated. Blue boxes indicate that the strains were isolated from the same patient within 2–8 weeks.

Seven small clonal groups (CGs) were identified across the tree (CG1–CG7 in Fig. 3), three of which (CG2, CG5 and CG7) were from different patients who were in the hospital during the same period, suggesting possible direct patient–patient transmission (red boxes). Two CGs (CG1 and CG3), and two small CGs in CG5, included strains that were isolated from the same patients within 2 months, suggesting recurrence of CDI (blue boxes). The other two CGs (CG4 and CG6) included strains isolated from different patients without an obvious epidemiological link, one of which included strains from two specimens collected 3 years apart, suggesting contamination in the hospital environment (red asterisks in Fig. 3). The remaining strains were non-clonal.

Discussion

Although RT 017 is one of the most successful strains of C. difficile, very little is known about its evolution and spread. This study addresses this knowledge gap using high-resolution phylogenomic analyses on a comprehensive and diverse dataset of 282 global RT 017 genomes. We found that the population of RT 017 can be divided into two lineages, agreeing with the previous study by Cairns et al. [25]. However, the data disagree on the geographical origin of RT 017. Our study suggests that RT 017 may have originated in Asia, supporting the epidemiological studies [1], then spread to Europe and North America. This difference probably resulted from the inclusion of a few older European strains (isolated between 1981 and 1985) to reduce the gap in collection years between the two continents (P=0.6745 in this dataset) and a large diversity of Asian strains from 11 countries and administrative regions. Based on the difference in structure, the two lineages of RT 017 were classified as non-epidemic (NE, a small number of strains with little population expansion) and epidemic (E, a larger number of strains with rapid population expansion) lineages. Although not exclusively containing strains from one continent, the NE lineage could be divided into three sublineages predominantly containing strains from Asia, Europe and North America. This suggests that the spread of RT 017 between these continents has occurred since the end of the 16th century. This roughly coincides with the estimated time of PaLoc acquisition ~500 years ago [54]. Sublineages NE1 (Europe) and NE2 (North America) were more closely related to one another than to sublineage NE3 (Asia). In turn, sublineage NE3 was more closely related to sublineage NE1 than to sublineage NE2, as demonstrated by fewer cgSNP differences (Table 1). Thus, the spread of RT 017 probably began with population movement between Asia and Europe (1588, 95 % CI: 758–1858) before spreading from Europe to North America (1860, 95 % CI: 1622–1954).
The direction of the spread between Asia and Europe cannot be determined from this analysis; however, based on the high prevalence and diversity of clade 4 strains in Asia [10–13, 24], it is likely that RT 017, as well as other strains in clade 4, originated in Asia, travelled to Europe and subsequently crossed the Atlantic to North America. Even though RT 017 could be found in at least three continents by the end of the 19th century, the Bayesian analysis suggests that the epidemic lineage E emerged solely from Asia (sublineage NE3) following the acquisition of ermB-positive Tn6194 in 1958 (95 % CI: 1920–1977), before spreading globally in 1970 (95 % CI: 1953–1983). The time of acquisition of the ermB element coincides with the introduction of clindamycin into clinical practice in the 1960s [55]. This pattern of spread is similar to that of RT 027, another epidemic strain that spread in and from North America in the early 2000s [56], driven by the acquisition of fluoroquinolone resistance in 1993/94 [56], following the widespread use of levofloxacin for community-acquired pneumonia [57]. This provides supporting evidence that the use of antimicrobials and the acquisition of AMR determinants are significant in the spread of C. difficile. Although the prevalence of fluoroquinolone and rifamycin resistance was also high in C. difficile, the widespread resistance across all lineages suggests the independent acquisition of resistance after the spread of the strain. The analyses were first performed on a small dataset of Thai clinical RT 017 isolates (n=45) with complete metadata to evaluate the performance of the pipeline. These analyses accurately identified four pairs of strains isolated from the same patients, provided good correlations between AMR phenotypes and genotypes [27], as well as between AMR genotypes and cgSNP population structure. When performed on the global dataset (n=282), the analyses accurately predicted the emergence of M68 (2001–2005), a strain from a 2003 outbreak in Ireland [31].
Also, appropriate timelines for the emergence of Argentinian (1996–2000) and Portuguese (2003–2011) clusters [18, 20] were estimated, supporting the accuracy of the analyses. Besides the aforementioned AMR genes, the epidemic lineage E was also associated with the presence of an aminoglycoside resistance locus and a truncated FdhF protein. Being a strictly anaerobic bacterium, C. difficile is intrinsically resistant to aminoglycosides, and the presence of an additional aminoglycoside-resistance locus is unlikely to have provided any advantage to the bacterium [58]. However, it may suggest that the epidemic strains were from an area with a high prevalence of aminoglycoside-resistant enteric bacteria, especially enterococci [59]. Formate dehydrogenase is an enzyme involved in the reoxidation of NAD [60]. Based on the prediction by the UniProt database [61], the truncated region is the coiled-coil domain that probably serves as a binding site for NAD. Thus the truncated protein is probably non-functional, although C. difficile has several pathways for oxidizing NAD and the truncated FdhF may not ultimately have any effect on growth or virulence [60]. Another significant genetic variant associated with lineage E was a point substitution (W366L) in the diguanylate kinase signalling protein (Table 1). This protein is involved in the regulation of cyclic dimeric guanosine monophosphate (c-di-GMP), which plays a role in motility and biofilm formation [50, 52]. In our preliminary assessment, strains from lineage E had increased motility in vitro. This provides a foundation for further in vivo studies to determine the effect of these phenotypes on the virulence and transmissibility of the epidemic strains. Analyses of the Thai clinical strains provided information on disease transmission in the country that differs from a previous report from the UK [16].
The UK study reported a cluster of closely related RT 017 strains in a single hospital in London that was different to strains from other parts of the city, suggesting an intra-hospital outbreak [16]. In the current study, all Thai strains were isolated in a single tertiary hospital over 4 years (2015–2018), but most of them were not closely related. Overall, these strains were more closely related to M68, a strain isolated in Ireland in a different decade [31], than to a non-epidemic strain from the same hospital. This suggests that the high prevalence of RT 017 in the hospital was not due to an ongoing outbreak. Indeed, evidence of direct patient–patient transmission could be identified in only a few cases. The remaining cases acquired RT 017 elsewhere, probably from the community [62, 63]. Although evidence of RT 017, or other clade 4 strains, in the environment in Asia has yet to be provided [24, 64], it may be inferred from the persistence of RT 017 in the human population [10–14]. This mimics the situation in North America, where successful public health interventions have led to a significant decrease in the burden of the epidemic RT 027, although complete eradication has not been achieved [65]. By contrast, RT 027 has almost completely disappeared in Europe [66]. The persistence of RT 027 in North America was linked strongly to continuous spillover from several environmental sources, including household environments and companion animals [67, 68]. Similar studies in Asia are needed to verify the presence of RT 017 in the environment, which would further suggest that RT 017 has long been integrated into the Asian community. This study also demonstrates the effect of reference genome selection on downstream analysis (Table 3). The results were comparable when a reference from the same ST was used (an average difference of 0.49 SNPs, clonality cut-off point of 2 SNPs) [40].
Differences became more pronounced as the reference strain became less related, suggesting that a reference genome from the same ST should be used to ensure accurate cgSNP results. With the introduction of ONT, it is now possible to assemble a complete genome of a local reference strain, using a combination of short- and long-read sequences, to maximize the accuracy of cgSNP analysis. A limitation of this study is the relatively low number of early RT 017 strains in general and the lack of older strains from Asia. This probably led to some uncertainty in the estimations, as reflected by wide 95 % CIs, especially around the root of the Bayesian tree. Also, the molecular clock of 1.4 mutations per genome per year used in this study did not account for the presence of spores, the genomes of which may remain unchanged for decades or centuries [40]. This may affect the estimated time at the root of the tree, which could be earlier than the current estimate. The inclusion of more early strains will help adjust the model, leading to a more accurate estimate. Although it may be difficult to acquire old clinical strains, it may be possible to obtain historical strains from other sources. Soil is one promising source for ancient C. difficile, as it is a reservoir for spores and several methods have been developed to measure the age of the soil [69], which can be used as a substitute for the collection date in a Bayesian evolutionary analysis. In conclusion, RT 017 had been circulating between Asia and Europe for centuries before spreading to North America. The epidemic lineage of RT 017 emerged from Asia in the middle of the 20th century following the acquisition of ermB. A focused investigation of contemporary RT 017 in Thailand revealed that the population was highly diverse and that community reservoirs/sources may have played an important role in the transmission of disease in this country.
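
The two numerical quantities discussed above, the clonality cut-off of 2 SNPs and the clock rate of 1.4 mutations per genome per year, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The function names, the reading of the cut-off as "2 SNPs or fewer", the pairwise approximation of divergence time as d / (2 × rate), and the `active_fraction` parameter (a hypothetical way to represent time spent as a dormant spore) are all assumptions made for illustration only.

```python
CLOCK_RATE = 1.4    # mutations per genome per year (value quoted in the text)
CLONAL_CUTOFF = 2   # SNPs; read here as "2 SNPs or fewer" (assumption)


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing sites between two aligned core-genome sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def is_clonal(snps: int, cutoff: int = CLONAL_CUTOFF) -> bool:
    """Flag a pair of isolates as clonal if they differ by at most `cutoff` SNPs."""
    return snps <= cutoff


def years_to_common_ancestor(
    snps: int, rate: float = CLOCK_RATE, active_fraction: float = 1.0
) -> float:
    """Rough time back to the common ancestor of two isolates.

    Both lineages accumulate mutations, hence the factor of 2.
    `active_fraction` (hypothetical) is the share of time spent replicating
    rather than dormant as a spore; values below 1 push the estimate earlier.
    """
    if not 0 < active_fraction <= 1:
        raise ValueError("active_fraction must be in (0, 1]")
    return snps / (2 * rate * active_fraction)


if __name__ == "__main__":
    d = snp_distance("ACGTACGTAC", "ACGTACGAAC")
    print(d, is_clonal(d))
    print(years_to_common_ancestor(14))
    print(years_to_common_ancestor(14, active_fraction=0.5))
```

Under these assumptions, lowering `active_fraction` lengthens the estimated time to the common ancestor, which is consistent with the point above that ignoring spore dormancy could make the true root of the tree earlier than the current estimate.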

Additional information

The Supplementary Data Sheet 1 is available at DOI: 10.6084/m9.figshare.14544792.
