Literature DB >> 29617810

Two Groups of Cocirculating, Epidemic Clostridiodes difficile Strains Microdiversify through Different Mechanisms.

Tatiana Murillo1, Gabriel Ramírez-Vargas1, Thomas Riedel2,3, Jörg Overmann2,3, Joakim M Andersen4, Caterina Guzmán-Verri5, Esteban Chaves-Olarte1, César Rodríguez1.   

Abstract

Clostridiodes difficile strains from the NAPCR1/ST54 and NAP1/ST01 types have caused outbreaks despite of their notable differences in genome diversity. By comparing whole genome sequences of 32 NAPCR1/ST54 isolates and 17 NAP1/ST01 recovered from patients infected with C. difficile we assessed whether mutation, homologous recombination (r) or nonhomologous recombination (NHR) through lateral gene transfer (LGT) have differentially shaped the microdiversification of these strains. The average number of single nucleotide polymorphisms (SNPs) in coding sequences (NAPCR1/ST54 = 24; NAP1/ST01 = 19) and SNP densities (NAPCR1/ST54 = 0.54/kb; NAP1/ST01 = 0.46/kb) in the NAPCR1/ST54 and NAP1/ST01 isolates was comparable. However, the NAP1/ST01 isolates showed 3× higher average dN/dS rates (8.35) that the NAPCR1/ST54 isolates (2.62). Regarding r, whereas 31 of the NAPCR1/ST54 isolates showed 1 recombination block (3,301-8,226 bp), the NAP1/ST01 isolates showed no bases in recombination. As to NHR, the pangenome of the NAPCR1/ST54 isolates was larger (4,802 gene clusters, 26% noncore genes) and more heterogeneous (644 ± 33 gene content changes) than that of the NAP1/ST01 isolates (3,829 gene clusters, ca. 6% noncore genes, 129 ± 37 gene content changes). Nearly 55% of the gene content changes seen among the NAPCR1/ST54 isolates (355 ± 31) were traced back to MGEs with putative genes for antimicrobial resistance and virulence factors that were only detected in single isolates or isolate clusters. Congruently, the LGT/SNP rate calculated for the NAPCR1/ST54 isolates (26.8 ± 2.8) was 4× higher than the one obtained for the NAP1/ST1 isolates (6.8 ± 2.0). We conclude that NHR-LGT has had a greater role in the microdiversification of the NAPCR1/ST54 strains, opposite to the NAP1/ST01 strains, where mutation is known to play a more prominent role.

Entities:  

Mesh:

Year:  2018        PMID: 29617810      PMCID: PMC5888409          DOI: 10.1093/gbe/evy059

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

Clostridioides difficile infections (CDI) are the main cause of hospital-acquired diarrhea after antibiotic treatment and the most common type of nosocomial infections in high-income countries (Slimings and Riley 2014; Knight et al. 2015). They vary from mild to moderate diarrhea to severe pseudomembranous colitis, toxic megacolon, and death (Hunt and Ballard 2013; Knight et al. 2015) and have a strong impact on healthcare systems, affecting millions of patients worldwide (McGlone et al. 2012; Lessa et al. 2015). These infections are mostly acquired through the exposure of patients to spores in hospital environments, although the number of CDI community-acquired cases is also on the rise (Gupta and Khanna 2014; Knight et al. 2015). The toxins TcdA and TcdB have been traditionally regarded as the main virulence factors of C. difficile (Hunt and Ballard 2013). They inactivate small GTPases through their glucosyltransferase activity and thereby damage the actin cytoskeleton of intestinal epithelial cells, among other deleterious host cell effects (Just et al. 1995; Chaves-Olarte et al. 1997). In most C. difficile strains, the genes encoding TcdA and TcdB are found in a so-called pathogenicity locus (PaLoc) (Braun et al. 1996). Other virulence factors described for this species include the binary toxin CDT, which affects the dynamics of epithelial microtubules as consequence of its ADP ribosyltransferase activity (Perelle et al. 1997; Schwan et al. 2009), as well as adhesins, fimbriae, and flagellin for host colonization (Goulding et al. 2009; Reynolds et al. 2011), and the surface layer protein (SlpA), which has been linked to inflammation and adherence to host cells (Calabi et al. 2001; Merrigan et al. 2013). As the virulence and epidemic potential of strains differ significantly, several methods, including Pulsed Field Gel Electrophoresis (PFGE), ribotyping, toxinotyping, and Multilocus Sequence Typing (MLST) have been applied to type C. difficile isolates (Knight et al. 2015). Among the different types, NAP1/ST01 strains are particularly notorious and caused nosocomial outbreaks linked to high morbidity and mortality rates in the United States, Canada, the United Kingdom, Australia, and Latin America (Quesada-Gómez et al. 2010; Hunt and Ballard 2013). Other strains of C. difficile, such as the NAPCR1/ST54, have also caused outbreaks (Quesada-Gómez et al. 2015). The NAPCR1/ST54 strains show high virulence in animal models despite their close phylogenetic relationship to nonepidemic ST54 isolates such as the C. difficile reference strain 630 (CD630) (Quesada-Gómez et al. 2015; López-Ureña et al. 2016). Moreover, they are multidrug-resistant and their genomes are unusually diverse, as indicated by their classification in at least 10 different SmaI macrorestriction patterns (López-Ureña et al. 2016; Ramírez-Vargas et al. 2017). The rates at which different types of genomic change occur are of fundamental importance to understanding prokaryote genome evolution (Vos et al. 2015) and the emergence of new or more virulent strains (Knight et al. 2015). Bacterial genomes may evolve through accumulation of mutations (m), homologous recombination (r), or nonhomologous recombination (NHR) (Mugal et al. 2014; Vos et al. 2015). Mutations give rise to single nucleotide polymorphisms (SNPs), which are termed nonsynonymous (dN) if they affect the coded protein or synonymous (dS) if they do not (Kryazhimskiy and Plotkin 2008; Mugal et al. 2014). Homologous recombination, in turn, is the exchange of genetic information between identical or highly similar DNA molecules, even between the same bacterial chromosome (Vos and Didelot 2009; Hanage 2016). One recombination event, unlike a mutation, simultaneously substitutes several nucleotides (Guttman and Dykhuizen 1994; Hanage 2016). The r/m rate compares the effect of r and m in bacterial diversification by calculating the rate of nucleotides per generation substituted by each process (Guttman and Dykhuizen 1994; Croucher et al. 2015). NHR is the acquisition of dissimilar genetic content by mechanisms of lateral gene transfer (LGT), including transformation, conjugation, transduction, and gene transfer agents (Dagan et al. 2008; Darmon and Leach 2014). NHR is harder to measure, but can be detected and estimated through pangenome analyses and comparative genomics (Vos et al. 2015; McInerney et al. 2017). Most studies on the diversification of C. difficile have so far focused on its core genome and only a few investigations have addressed the contribution of LGT to this process (He et al. 2010; Roberts et al. 2014) despite the recognized role of this parasexual process and the pangenome in bacterial niche adaptation and genome diversification (Hehemann et al. 2016; McInerney et al. 2017). Previous work on the NAP1/ST01 genome indicate that mutation, rather than homologous recombination, drives the microevolution of C. difficile (He et al. 2010; Dingle et al. 2011). In line with these studies, most C. difficile clades studied so far show dN/dS >1 and r/m rates below or close to 1 (He et al. 2010; Dingle et al. 2011). Other authors, by contrast, have reported that homologous recombination might play a strong role in the evolution of this species. For instance, Lemée et al. (2005) detected large SNPs blocks in cwp66, slpA, and flagellar genes among isolates from different MLST clades, Castillo-Ramírez et al. (2011) identified large recombinational blocks in NAP1/ST01 genomes, and Didelot et al. (2012) found that strains from certain STs have r/m ratios > 2. A major outbreak of CDI in a Costa Rican hospital was caused by NAP1/ST01 and NAPCR1/ST54 strains (Wong-McClure et al. 2012; Quesada-Gómez et al. 2015). Thus, we compared the core and accessory genomes of 17 NAP1/ST01 and 32 NAPCR1/ST54 isolates that cocirculated in Costa Rican hospitals to explore whether these two groups of strains display different signatures of mutation-, recombination-, and MGE-driven diversification in the context of genome evolution. Whereas the effect of mutation was appraised through the estimation of dN/dS rates, we calculated r/m rates and gene content changes to delimitate the contribution of homologous recombination and MGE-driven NHR in microdiversification, respectively. The results presented contribute to the ongoing debate about which and how evolutionary mechanisms shape microbial diversification processes and indicate that strains from the same species may microdiversify through different mechanisms.

Materials and Methods

Isolates and Whole Genome Sequences

Clostridiodes difficile clinical isolates of the NAPCR1/ST54 (n = 32) and NAP1/ST01 (n = 17) groups were recovered between 2003 and 2012 from patients with CDI in the Costa Rican hospitals San Juan de Dios (HSJD), México (HMX), Blanco Cervantes (HBC), Calderón Guardia (HCG), San Vicente de Paul (HSVP), and the National Centre for Rehabilitation (CENARE) (supplementary tables 1 and 2, Supplementary Material online). Whole genome sequences (WGS) for these isolates were obtained at the Wellcome Trust Sanger Institute using HiSeq 2500 instruments (Illumina). Velvet v.1.1 (Zerbino 2010) or Edena V3.131028 (Hernandez et al. 2008) were used for sequence assembly and the corresponding assembly statistics are presented in the supplementary table 3, Supplementary Material online. To resolve the structure of some MGEs, the genomes of selected NAPCR1 isolates were also sequenced using Single Molecule Real Time (SMRT) sequencing technology at the Leibniz Institute DSMZ (Germany). To this end, PacBio RSII long-read sequencing reads (P6 chemistry) were assembled with the “RS_HGAP_Assembly.3” protocol included in the SMRT Portal version 2.3.0. Sequencing data from the NAPCR1/ST54 and NAP1/ST01 isolates is available from the European Nucleotide Archive (Study PRJEB5034). Moreover, MGEs from selected NAPCR1/ST54 isolates were deposited under the accession numbers MF547662, MF547663, MF547664, MF547665, and MF547666.

Core Genome SNP Analyses

Breseq (Barrick et al. 2014) was used to call core genome SNPs using the annotated genomes of C. difficile R20291 (Acc. No.: FN545816) and C. difficile 630 (Acc. No.: AM180355) as reference genomes for the NAP1/ST01 and NAPCR1/ST54 isolates, respectively. A minimum coverage of 20 reads was used to define a SNP to avoid errors from misassemblies or bad alignments. Blocks of two or more SNPs, SNPs located within MGEs, and SNPs in intergenic regions were excluded from downstream analyses to disregard the influence of SNPs arising from recombination and to focus only on coding sequences (CDS). After this selection, we calculated the total number of SNPs and the SNP density for each isolate and classified the SNPs as dN or dS to estimate dN/dS rates. These dN/dS rates were compared using Mann–Whitney U tests. Additionally, we constructed maximum-likelihood bootstrapped trees from concatenated core SNP alignments generated by the CFSAN SNP pipeline (Gouy et al. 2010) with Seaview (Davis et al. 2015), and visualized with FigTree (http://tree.bio.ed.ac.uk/software/figtree/; last accessed March 21, 2018). The results of the Breseq and CFSAN pipelines differ because of thresholds used for SNP detection, since CFSAN considers SNPs in intergenic regions and MGE. All software was run with default parameters.

Analysis of Feature Frequency Profiles

Illumina WGS were compared using feature frequency profiles (FFP) to detect differences at the pangenome level. FFP is an alignment-free method that calculates distance scores based on differences in relative l-mer frequencies, being an l-mer a string of a defined amount of nucleotides. Since it is an alignment free method, it can be applied to WGS with dissimilar gene content and therefore used to determine differences in accessory genomes (Sims and Kim 2011). We used I-mers of 20 nt to reach a compromise between discrimination potential and computational capacity. Comparison matrices were transformed with the neighbor-joining method into trees that were visualized with FigTree.

Homologous Recombination Analyses in Core Genome

The alignments generated with the CFSAN SNP pipeline were analyzed with Gubbins (Croucher et al. 2015) to identify recombination blocks, detect SNPs within recombination blocks, and calculate r/m rates.

Estimation of NHR through Pangenome Comparisons

To compare the pangenomes of the NAPCR1/ST54 and NAP1/ST01 isolates and to facilitate MGE detection, Roary (Page et al. 2015) and Get_Homologues (Contreras-Moreira and Vinuesa 2013) were used to predict unique gene clusters. A unique gene cluster was defined a group of genes found only in a certain isolate. Additionaly, Roary was employed to estimate the size of the core and accessory genomes and to generate gene presence/absence spreadsheets and maximum likelihood phylogenetic trees from the accessory genomes. This pipeline classifies genes in four categories according to their frequency of occurrence in the data set: core genes (>99% of the isolates), soft-core genes (95% ≤ isolates < 99%), shell genes (15% ≤ isolates < 95%), and cloud genes (0% ≤ isolates < 15%). Get_Homologues, in turn, produces pangenomic matrices from which parsimony-based pangenomic trees can be derived. These trees were visualized as described earlier.

MGE Detection

According to their location and branching distances in the trees generated with Get_Homologues, draft genomes of four NAPCR1/ST54 isolates and six NAP1/ST01 were selected for further analyses. To spot unshared regions resembling MGEs, contigs containing unique gene clusters were compared with cognate contigs from reference genomes using WebACT/ACT (Carver et al. 2005). For MGE delimitation, we considered criteria such as presence of genes from known MGEs (i.e., phage-related proteins or recombinases), % GC skews, and atypical codon usages. Putative MGEs were annotated using Prokka v.1.11 (Seemann 2014) and manually curated using BLAST (Gish and States 1993) or InterPro (Finn et al. 2017) searches. A list of differentially distributed MGEs was created, and to measure their role in microdiversification, the Roary analyses were repeated with modified WGS in which these discriminative MGEs were deliberately removed.

Calculation of Gene Content Changes and LGT/SNPs Rates

The pangenome comparisons done with Roary and Get_Homologues provide a list of all accessory genes and the isolates in which they are present. These lists were used to calculate the number of gene changes (gain or loss) between the isolates and their corresponding reference genome. We also determined the amount of gene content changes linked to the MGEs that show a differential distribution among each group of isolates (MGE-driven LGT). To calculate LGT/SNP and MGE-driven LGT/SNP rates, we divided the number of gene content changes by the number of SNPs calculated from the Breseq output.

Comparison of CRISPR Arrays

CRISPR spacer arrays were predicted using the CRISPR Recognition Tool and thereafter manually curated for false positive repeats (Andersen 2016). In short, CRISPR loci were predicted and manually curated for false positive repeats. CRISPR loci were visualized by representing spacers with unique colored boxes that contain icons representing different spacer lengths. CRISPR loci were numbered from the ancestral end at the right hand side. Spacer deletions, showed by a crossed-out box, were deduced through spacer ordering across strains.

Results

SNPs Analyses

Compared with the NAP1/ST01 isolates, a smaller proportion of the reads obtained for the NAPCR1/ST54 isolates mapped to the corresponding reference genome (tables 1 and 2). Though both sets of genomes consist of very closely related isolates separated by only dozens of SNPs, the NAPCR1/ST54 isolates showed more SNPs in CDS and a higher average SNP density in their core genome than the NAP1/ST01 isolates (tables 1 and 2). The branching distance calculated for the NAPCR1/ST54 isolates in an unrooted SNP-based tree was almost 2.5-fold higher than that obtained for the NAP1/ST01 isolates, confirming that the core genome of the NAPCR1/ST54 group of isolates is more diverse (fig. 1). The topology of this tree did not match metadata such as the year or hospital of isolation and presented low bootstrap values. Most nonsynonymous SNPs identified in the NAPCR1/ST54 and NAP1/ST01 isolates (fig. 2; supplementary tables 4 and 5, Supplementary Material online) were found in genes encoding metabolic enzymes (NAPCR1: n = 7; NAP1: n = 11) or antibiotic resistance or virulence traits (NAPCR1: n = 6; NAP1: n = 3).
Table 1

SNPs in the Core Genome of the NAPCR1/ST54 Isolates

IsolatePFGE SmaI PatternGenome Size (Mb)% Reads Mapped to CD630Total Number of SNPsAverage Number of SNPsSNP Density (per 100 kb)Average SNP Density (per SmaI)Number of Nonsynonymous Mutations (dN)Number of Synonymous Mutations (dS)dN/dS RateAverage dN/dS Rate
31474424.5490.224240.530.531682.002.00
57014474.5192.028240.620.5318101.802.77
57114.5490.5230.511762.83
57674.5590.1230.511762.83
57714.5590.3230.511853.60
27844484.5191.223240.510.531672.292.59
31254.5590.4220.481572.14
31374.5192.2230.511672.29
54344.5191.1240.511863.00
57044.5591.0250.551782.13
57074.5191.3240.531772.43
57334.5590.2250.551963.17
57514.5590.8230.511672.29
57744.5590.2230.511853.60
62754.5291.7290.642182.63
31294494.5490.622240.480.531662.672.76
57194.5489.5260.571972.71
57554.5590.2230.511853.60
57724.5590.5250.551872.57
62764.5390.0250.551872.57
62894.6289.9240.521772.43
57344524.5191.124240.530.601863.003.00
29454874.6087.024250.520.532045.004.08
57634.6188.3250.541963.17
29924884.5490.023230.510.511581.881.88
57614894.5090.426280.580.611972.712.67
57624.5091.3290.642182.63
31455584.5590.325260.550.561691.782.01
62854.5590.2260.571882.25
31445784.5590.022220.480.481662.672.41
31504.5590.5250.551691.78
54364.5590.0190.421452.80
Average4.5490.424240.530.541772.622.62
Table 2

SNPs in the Core Genome of the NAP1/ST01 Isolates

IsolateGenome Size (Mb)% Reads Mapped to R20291Total Number of SNPsAverage Number of SNPsSNP Density (per 100 kb)Average SNP DensityNumber of Nonsynonymous Mutations (dN)Number of Synonymous Mutations (dS)dN/dS RateAverage dN/dS Rate
57004.1896.220190.480.461829.008.35
57034.1896.2210.501929.50
57054.1298.1170.411527.50
57064.1297.0210.511836.00
57084.1396.5190.461728.50
57094.1397.8190.461728.50
57104.1397.4190.461728.50
57134.1396.4210.511929.50
57144.0999.3190.461728.50
57184.1397.1190.461728.50
57204.1895.2200.481829.00
57494.1396.9180.441628.00
57584.1398.0170.411527.50
57594.1397.7190.461728.50
57644.1099.5200.491829.00
57654.1497.6190.461728.50
57684.1397.4170.411527.50
Average4.1397.319190.460.461728.358.35
. 1.

—Unrooted phylogenomic maximum likelihood trees of NAPCR1/ST54 (A) and NAP1/ST01 (B) isolates generated from core genome SNP alignments. Compared with the NAP1/ST01 isolates, the NAPCR1/ST54 isolates showed larger distances and were supported by higher bootstrap values. Core genome SNPs were called and aligned using the CFSAN SNP pipeline. The resulting distance matrixes were used as input by Seaview to build trees using the PhyML algorithm and a bootstrap value of 100. Bootstrap values are indicated above the branches. Scales correspond to average number of substitutions per site. Different hospitals are shown by different colors. Different symbols denote different years of isolation.

. 2.

—Genomic localization, score quality, and predicted function of nonsynonymous SNPs detected in core genome of the NAPCR1/ST54 (A) and the NAP1/ST01 (B) isolates. The genomes of strains 630 or R20291 were used as references for the NAPCR1/ST54 and the NAP1/ST01 isolates, respectively. The SmaI pattern of the isolates in panel A is shown in the Y axis. The diameter of the circles represents the score assigned by Breseq to each SNP and the different colors depict the predicted function of the genes with SNPs. The used color code refers to the functional annotation.

SNPs in the Core Genome of the NAPCR1/ST54 Isolates SNPs in the Core Genome of the NAP1/ST01 Isolates —Unrooted phylogenomic maximum likelihood trees of NAPCR1/ST54 (A) and NAP1/ST01 (B) isolates generated from core genome SNP alignments. Compared with the NAP1/ST01 isolates, the NAPCR1/ST54 isolates showed larger distances and were supported by higher bootstrap values. Core genome SNPs were called and aligned using the CFSAN SNP pipeline. The resulting distance matrixes were used as input by Seaview to build trees using the PhyML algorithm and a bootstrap value of 100. Bootstrap values are indicated above the branches. Scales correspond to average number of substitutions per site. Different hospitals are shown by different colors. Different symbols denote different years of isolation. —Genomic localization, score quality, and predicted function of nonsynonymous SNPs detected in core genome of the NAPCR1/ST54 (A) and the NAP1/ST01 (B) isolates. The genomes of strains 630 or R20291 were used as references for the NAPCR1/ST54 and the NAP1/ST01 isolates, respectively. The SmaI pattern of the isolates in panel A is shown in the Y axis. The diameter of the circles represents the score assigned by Breseq to each SNP and the different colors depict the predicted function of the genes with SNPs. The used color code refers to the functional annotation. The average dN/dS rates calculated for both groups of isolates were >1 (tables 1 and 2), but the rate calculated from the NAPCR1/ST54 isolates was 3.2 times lower than that obtained for for the NAP1/ST01 isolates. An exception to this observation was the dN/dS rate of NAPCR1/ST54 isolates from the 487 SmaI pattern, which was 4.08 and therefore unusually high (table 1).

Homologous Recombination Analyses

With a single exception (isolate 5763), the NAPCR1/ST54 isolates showed between 3,301 and 8,226 bases in recombination (supplementary table 6, Supplementary Material online). Additionally, the WGS of the NAPCR1/ST54 isolates 3150 and 5734 had one recombination block of 13 or 12 SNPs each and were therefore linked to r/m ratios >3 (supplementary table 6, Supplementary Material online). The NAP1/ST01 genomes, by contrast, did not have bases in recombination or recombination blocks.

Accessory Genome and Pangenome Comparisons for Assessment of NHR

The NAPCR1/ST54 genomes (4.50–4.62 Mb) were on an average 0.41 Mb larger than their NAP1/ST01 counterparts (4.09–4.18 Mb) (tables 1 and 2). Roary predicted 4,802 gene clusters for the NAPCR1/ST54 isolates of which 74% were catalogued as core genome, 7% as soft-core genome, 8% as shell genome, and 11% as cloud genome (fig. 3). In contrast, only 3,829 gene clusters and a core genome of 94% was predicted for the NAP1/ST01 isolates (fig. 3). The shell and cloud genomes of this group of isolates only included 4.7% and 1.6% of the predicted gene clusters, respectively.
. 3.

—Comparison of the pangenomes of the analyzed NAPCR1/ST54 (A) and NAP1/ST01 isolates (B). According to their frequency of finding, these Roary pie charts show the amount of genes clustered in the categories core (99% ≤ strains ≤ 100%), soft-core (95% ≤ strains < 99%), shell (15% ≤ strains < 95%), and cloud (0% ≤ strains < 15%).

—Comparison of the pangenomes of the analyzed NAPCR1/ST54 (A) and NAP1/ST01 isolates (B). According to their frequency of finding, these Roary pie charts show the amount of genes clustered in the categories core (99% ≤ strains ≤ 100%), soft-core (95% ≤ strains < 99%), shell (15% ≤ strains < 95%), and cloud (0% ≤ strains < 15%). The root-to-tip distance of the NAPCR1/ST01 isolates in a parsimony-based pangenomic tree was larger than that determined for isolates of the NAP1 pulsotype (fig. 4). Comparable results were observed in FFP-based trees (supplementary fig. 1, Supplementary Material online).
. 4.

—Unrooted parsimony-based pangenomic trees calculated for NAPCR1/ST54 (A) and NAP1/ST01 isolates (B). Three distinct groups of NAPCR1/ST54 and six distinct groups of NAP1/ST01 isolates were defined, respectively. These groups appear purple (I), green (II), and blue (III) in panel A or in teal (I), brown (II), purple (III), green (IV), dark red (V), and blue (VI) in panel B. Trees were derived from binary matrixes summarizing the presence–absence of gene clusters in proteome predictions generated with Get_Homologues. Isolates selected for downstream pangenome analyses were marked with block arrows.

—Unrooted parsimony-based pangenomic trees calculated for NAPCR1/ST54 (A) and NAP1/ST01 isolates (B). Three distinct groups of NAPCR1/ST54 and six distinct groups of NAP1/ST01 isolates were defined, respectively. These groups appear purple (I), green (II), and blue (III) in panel A or in teal (I), brown (II), purple (III), green (IV), dark red (V), and blue (VI) in panel B. Trees were derived from binary matrixes summarizing the presence–absence of gene clusters in proteome predictions generated with Get_Homologues. Isolates selected for downstream pangenome analyses were marked with block arrows. Based on the clustering of the isolates in the parsimony-based pangenomic tree, the accessory genomes of the isolates depicted with block arrows in figure 4 were studied in more detail with regard to the detection of unique gene clusters and therefore possible NHR. The selected NAPCR1/ST54 isolates had many more unique gene clusters than the NAP1/ST01 isolates. In the NAPCR1 pulsotype, isolate 2945 from Cluster I showed the greatest number of unique gene clusters (n = 376), followed by isolates 6276 and 6289 from Cluster III (n = 104), and isolate 5761 from Cluster II (n = 62). Within the NAP1 genotype, isolate 5703 from Cluster V had the largest number of unique gene clusters (n = 85) (supplementary table 7, Supplementary Material online). All other representative NAP1 isolates only had between 10 and 17 unique gene clusters (supplementary table 7, Supplementary Material online).

Role of Differentially Distributed MGEs in Microdiversification

The majority of the unique genes of the NAPCR1/ST54 isolates were associated with MGEs, which are absent in the closely related strain C. difficile 630 (table 3). The MGEs that were differentially represented among the NAPCR1/ST54 isolates include: 1) a putative prophage of 56 kb in isolates with the SmaI pattern 487, 2) two putative big phages related to phiCDIF1296T (Wittmann et al. 2015), 3) a putative plasmid of 69 kb exclusively found in isolate 6289 from Cluster III, and 4) a mobilizable transposon similar to Tn4001 not seen in isolates with the SmaI pattern 487 (Cluster I). Three NAPCR1/ST54 isolates lacked two well-described MGEs from CD630 and other C. difficile genotypes, namely, isolate 6276 from Cluster III, which lacks the skinCd element, and isolates 5761 and 5762 from Cluster II, which do not encode Tn5397 (fig. 5) (Haraldsen and Sonenshein 2003; Dannheim et al. 2017). A very different picture was derived from the comparison of the NAP1/ST01 pangenomes, as only the isolates 5703, 5720, and 5700 from Cluster V had a distinctive MGE. This element is identical to a previously reported plasmid-like sequence of the C. difficile type strain DSM 1296 T (Riedel et al. 2015). The topology of the parsimony-based pangenomic trees mirrored the distribution of these MGEs in the data set (fig. 5).
Table 3

Presence–Absence Matrix of MGEs Differentially Distributed among the NAPCR1/ST54 Isolates

Strain/IsolatePFGE SmaI PatternMGE
56-kb ProphageBig Phage (variant 1)Big Phage (variant 2)Putative PlasmidmobTn withTn4001skinCdTn5397
CD630++
3147442++++
5701, 5711, 5767, 5771447++++
2784, 3125, 3137, 5434, 5704, 5707, 5733, 5751, 5774, 6275448++++
3129, 5719, 5755, 5772, 6276, 6289449+a++b+
5734452++++
2945, 5763487++++
2992488++++
5761, 5762489+++
3145, 6285558++++
3144, 3150, 5436578++++

Present in isolate 6289.

Absent in isolate 6276.

. 5.

—Localization of discriminative MGEs of NAPCR1/ST54 (A) and NAP1/ST01 isolates (B) in unrooted, parsimony-based pangenomic trees. MGEs found in certain but not all isolates were highlighted with colors. The NAPCR1/ST54 isolates from Cluster I were characterized by the carriage of a putative big phage (v2, orange) and a putative prophage of 56 kb (blue). Isolates from Clusters II and III have another type of big phage (v1, pink) and a predicted mobilizable transposon with a Tn4001-like element (green). Isolates 5761 and 5762 from Cluster II lack Tn5397. Moreover, isolate 6289 has a putative conjugative plasmid (teal) and isolate 6276 lacks the skinCd element (brown). Only the NAP1/ST01 isolates from Cluster V have a differentially distributed MGE. This element gave a perfect BLAST hit to an episomal sequence with bacteriophage functions previously found in the Clostridiodes difficile type strain DSM 1296 T. These trees were derived from binary matrixes summarizing the presence–absence of gene clusters in proteome predictions generated with Get_Homologues.

Presence–Absence Matrix of MGEs Differentially Distributed among the NAPCR1/ST54 Isolates Present in isolate 6289. Absent in isolate 6276. —Localization of discriminative MGEs of NAPCR1/ST54 (A) and NAP1/ST01 isolates (B) in unrooted, parsimony-based pangenomic trees. MGEs found in certain but not all isolates were highlighted with colors. The NAPCR1/ST54 isolates from Cluster I were characterized by the carriage of a putative big phage (v2, orange) and a putative prophage of 56 kb (blue). Isolates from Clusters II and III have another type of big phage (v1, pink) and a predicted mobilizable transposon with a Tn4001-like element (green). Isolates 5761 and 5762 from Cluster II lack Tn5397. Moreover, isolate 6289 has a putative conjugative plasmid (teal) and isolate 6276 lacks the skinCd element (brown). Only the NAP1/ST01 isolates from Cluster V have a differentially distributed MGE. This element gave a perfect BLAST hit to an episomal sequence with bacteriophage functions previously found in the Clostridiodes difficile type strain DSM 1296 T. These trees were derived from binary matrixes summarizing the presence–absence of gene clusters in proteome predictions generated with Get_Homologues. The MGEs that differentiate the clusters of NAPCR1/ST54 isolates include genes linked to antibiotic resistance or virulence (supplementary table 8, Supplementary Material online). For instance, the putative conjugative plasmid of the NAPCR1 isolate 6289 of Cluster III harbors a von Willebrand factor type A protein, a putative ADP-ribosyltransferase exoenzyme, and what seems to be a Fic/DOC toxin. Likewise, the mobTn with a Tn4001-like element and the 56 kb prophage inserted in some NAPCR1 isolates, carry genes that likely confer resistance to aminoglycosides (Ramírez-Vargas et al. 2017). When the sequences of the putative plasmid, the big bacteriophages, and the 56 kb prophage of the NAPCR1/ST54 isolates were deliberatively removed from their draft WGS and the Roary pangenome calculations were repeated, the number of gene clusters in the NAPCR1/ST54 WGS decreased 4% from 4,802 to 4,595 and, except for isolates from the 487 SmaI macrorestriction pattern, the branching of the resulting pangenomic tree collapsed (fig. 6, panels A and B). When this reanalysis was performed removing the putative plasmid-like sequence from the draft genomes of the NAP1/ST01 isolates 5700, 5703, and 5720, the number of predicted gene clusters was reduced by only 2%, from 3,829 to 3,755 (fig. 6, panels C and D).
. 6.

—Roary analysis of WGS of NAPCR1/ST54 and NAP1/ST01 strains with and without selected MGEs. (A) Original NAPCR1/ST54 pangenome. (B) Pangenome analysis of NAPCR1/ST54 genomes from which the putative plasmid, the two big phages, and the 56-kb prophage were removed. (C) Original NAP1/ST01 pangenome. (D) Pangenome analysis of NAP1/ST01 WGS lacking the putative plasmid-like sequence carrying bacteriophage genes. The trees show the clustering of isolates according to gene presence–absence matrixes. The blue and white bars represent shared and unshared gene clusters, respectively. Red squares delineate the gene clusters associated with the MGEs removed in the reanalysis. Tree distances were more notably reduced among the NAPCR1 WGS when the differential MGE were eliminated.

—Roary analysis of WGS of NAPCR1/ST54 and NAP1/ST01 strains with and without selected MGEs. (A) Original NAPCR1/ST54 pangenome. (B) Pangenome analysis of NAPCR1/ST54 genomes from which the putative plasmid, the two big phages, and the 56-kb prophage were removed. (C) Original NAP1/ST01 pangenome. (D) Pangenome analysis of NAP1/ST01 WGS lacking the putative plasmid-like sequence carrying bacteriophage genes. The trees show the clustering of isolates according to gene presence–absence matrixes. The blue and white bars represent shared and unshared gene clusters, respectively. Red squares delineate the gene clusters associated with the MGEs removed in the reanalysis. Tree distances were more notably reduced among the NAPCR1 WGS when the differential MGE were eliminated.

Gene Content Changes and LGT to SNP Rates

When the WGS of the NAPCR1/ST54 isolates were compared with the genome of the reference strain CD630, the number of gene content changes ranged between 586 and 730 (average: 644 ± 33) (table 4). Up to 55 ± 3% of this acquired genetic material (n = 346–494 CDS, 355 ± 31 on an average) was associated with the aforementioned discriminative MGEs (table 4). In agreement with this observation, the isolates that gained more genes (2945, 5763, 6289) had larger MGEs. A similar comparison of NAP1/ST01 isolates and the genome of the reference strain R20291 only revealed 68–194 gene content changes (average: 129 ± 37 CDS) (table 5).
Table 4

Gene Content Changes between the NAPCR1/ST54 Isolates and the Reference Strain CD630

IsolatePFGE SmaI PatternGene Content Changes (n)AverageGene Content Changes Linked to Differentially Distributed MGEs (n)Average% of Gene Content Changes Linked to Differentially Distributed MGEsAverage
31474426496493463465353
57014475866413463465954
571166334652
576765734653
577165834653
27844485996243463465856
312565534653
313759934658
543459434658
570464434654
570761134657
573364634654
575163634654
577465134653
627560134658
31294496696673463715255
571965734653
575564034654
577264234654
627666434652
628973049468
57344526046043463465757
29454877097074204205959
576370442060
29924886646643463465252
57614896166233463465656
576262934655
31455586466443463465454
628564134654
31445786396463463465454
315065434653
543664534654
Average644±33644±33355±31355±3155±355±3
Table 5

Gene Content Changes between the NAP1/ST01 Isolates and the Reference Strain R20291

IsolateGene Content Changes (n)Gene Content Changes Linked to Differentially Distributed MGEs (n)% of Gene Content Changes Linked to Differentially Distributed MGEs
570019411660
570318411663
570518700
570611800
570811900
570912000
571011200
571311900
57147100
571811600
572018711662
574912100
575812100
575912500
57646800
576511700
576811500
Average129±3720±4610±24
Gene Content Changes between the NAPCR1/ST54 Isolates and the Reference Strain CD630 Gene Content Changes between the NAP1/ST01 Isolates and the Reference Strain R20291 As seen in tables 6 and 7, the NAPCR1/ST54 isolates had a 4-fold higher average LGT/SNP rate (range: 20.7–33.9, average: 26.8 ± 2.8) than the NAP1/ST01 isolates (range: 3.4–11.0, average: 6.8 ± 2.0). Similar results were obtained when the calculation of LGT/SNP rates was restricted to gene content changes linked to the MGEs differentially distributed among both groups of strains (tables 6 and 7).
Table 6

LGT/SNP Rates Calculated for the NAPCR1/ST54 Isolates

IsolatePFGE SmaI PatternLGT/SNP RateaAverageMGE-Driven LGT/SNP RatebAverage
314744227.027.014.414.4
570144720.926.712.414.4
571128.815.0
576728.615.0
577128.615.0
278444826.026.015.014.4
312529.815.7
313726.015.0
543424.814.4
570425.813.8
570725.514.4
573325.813.8
575127.715.0
577428.315.0
627520.711.9
312944930.427.715.715.4
571925.313.3
575527.815.0
577225.713.8
627626.613.8
628930.420.6
573445225.225.214.414.4
294548729.528.917.517.2
576328.216.8
299248828.928.915.015
576148923.722.713.312.6
576221.711.9
314555825.825.213.813.6
628524.713.3
314457829.029.715.715.9
315026.213.8
543633.918.2
Average26.8±2.826.8±2.814.8±1.714.8±1.7

Defined as number of gene content changes with respect to strain CD630/number SNPs identified by Breseq.

Defined as number of gene content changes linked to differentially distributed MGEs/number SNPs identified by Breseq.

Table 7

LGT/SNP Rates Calculated for the NAP1/ST01 Isolates

IsolateLGT/SNP RateaMGE-Driven LGT/SNP Rateb
57009.75.8
57038.85.5
570511.0NA
57065.6NA
57086.3NA
57096.3NA
57105.9NA
57135.7NA
57143.7NA
57186.1NA
57209.45.8
57496.7NA
57587.1NA
57596.6NA
57643.4NA
57656.2NA
57686.8NA
Average6.8±2.05.7±0.2

not applicable.

Defined as number of gene content changes with respect to strain R20291/number SNPs identified by Breseq.

Defined as number of gene content changes linked to differentially distributed MGEs/number SNPs identified by Breseq.

LGT/SNP Rates Calculated for the NAPCR1/ST54 Isolates Defined as number of gene content changes with respect to strain CD630/number SNPs identified by Breseq. Defined as number of gene content changes linked to differentially distributed MGEs/number SNPs identified by Breseq. LGT/SNP Rates Calculated for the NAP1/ST01 Isolates not applicable. Defined as number of gene content changes with respect to strain R20291/number SNPs identified by Breseq. Defined as number of gene content changes linked to differentially distributed MGEs/number SNPs identified by Breseq.

CRISPR Arrays

Based on the assumption that frequent exposure to MGEs will translate into a large diversity and number of CRISPR spacers, we compared the CRISPR-arrays of the NAPCR1/ST54 isolates and NAP1/ST01 isolates with those of the reference strains CD630 and R20291, respectively. The NAPCR1/ST54 isolates had eight of the 12 CRISPR arrays of strain CD630 and showed spacer variations in the loci 3, 4, 5, 6, 7, 8, 9, and 12 (supplementary fig. 2A, Supplementary Material online). From the missing arrays, arrays 1 and 2 are reported to be present in MGEs (Sebaihia et al. 2006). On the contrary, the analyzed NAP1/ST01 isolates have the nine CRISPR arrays that characterize the reference strain C. difficile R20291 (supplementary fig. 2B, Supplementary Material online). In this data set, only isolates 5708 and 5709 deviated from the R20291 CRISPR profile, namely through to the lack of one spacer in locus 8 (supplementary fig. 2B, Supplementary Material online).

Discussion

Our results show that the acquisition/loss of MGEs and homologous recombination, rather than mutation, has had a stronger influence in the microdiversification of the NAPCR1/ST54 isolates compared with the NAP1/ST01 isolates, which—as previously reported—is a pathogenic clone whose microdiversification is primarily driven by mutations in its core genome (He et al. 2010; Didelot et al. 2012) rather than by recombination (Dingle et al. 2011; Stabler et al. 2012). The dN/dS rates calculated for the core genomes of both groups of bacteria were >1 with the NAP1/ST01 having the higher values. Rates >1 can be attributed to purifying selection not having enough time to eliminate deleterious changes, and is a phenomenon usually seen in closely related lineages (Rocha et al. 2006; Castillo-Ramírez et al. 2011). Thus, the higher rates calculated in the NAP1/ST01 group could represent the greater proximity between the isolates as compared with the NAPCR1/ST54 group, and not neccesarily positive selection. However, it is possible that the large number of dN mutations detected among the NAP1/ST01 isolates reflects a greater effect of mutation in its diversfication (Rocha et al. 2006; Kryazhimskiy and Plotkin 2008). Mainly, when previous publications have already stated that the NAP1/ST01 lineage is clonal and microdiversifies through accumulation of mutations in the core genomes rather than recombination (Dingle et al. 2011; Stabler et al. 2012). In addition, positive selection, which is likely to be favored by fine tuned pathogenic strains, has been proposed for other outbreak-causing C. difficile strains from the ST37 from Clade IV (Dingle et al. 2011; Didelot et al. 2012). By contrast, the high number of dS mutations seen among the NAPCR1/ST54 isolates might have derived from unnoticed recombination events (Castillo-Ramírez et al. 2011). In both groups of isolates, we identified SNPs that are noteworthy due to their potential influence on virulence or the regulation of virulence-related phenotypes. In particular, there were SNPs in the precursor of the S-layer protein SlpA, which is related to bacterial adhesion and immune response, and in putative exosporium proteins, which protect the spores in aerobic environments outside of the host as well as from the host immune system (Merrigan et al. 2013; Paredes-Sabja et al. 2014). In addition, we observed SNPs in genes related to the carbohydrate phosphotransferase system PTS, which is relevant for toxin production through catabolite repression (Martin-Verstraete et al. 2016). Previous research has claimed that some sequence types from the Clade I microdiversify through homologous recombination (Stabler et al. 2012). For instance, Didelot et al. (2012) determined a higher r/m ratio for ST54 isolates from other geographic regions (2.54) than for ST01 isolates (0.04). Given that no bases in recombinations were detected for the NAP1/ST01 isolates, that the NAPCR1/ST54 isolates had different amounts of bases in recombination, and that two of the NAPCR1/ST54 isolates had an unshared recombination block, our results coincide with these previous reports. We therefore conclude that the effect of homologous recombination in microdiversification was greater for the NAPCR1/ST54 isolates than for the NAP1/ST01 isolates. The NAPCR1/ST54 isolates gained more CDS, obtained a larger number of CDS through acquisition of differentially distributed MGEs, and were characterized by higher LGT/SNP rates than the NAP1/ST01 isolates. This indicates that the NAPCR1/ST54 isolates are more prone than the NAP1/ST01 isolates to acquire genetic information by LGT. This trait is expected for organisms that thrive in heterogeneous and changing conditions, hence it seems likely that the NAPCR1/ST54 and NAP1/ST01 strains take advantage of distinctive strategies to adapt and colonize the human gut and cause disease and/or outbreaks (Rouli et al. 2015; McInerney et al. 2017). Further supporting the concept that the NAPCR1 pangenome is open, the NAPCR1/ST54 isolates were not distributed in the branches of a pangenomic tree according to their macrorestriction patterns or hospital/year of isolation. Instead, the topology of this tree was dictated by the gain or loss of MGEs that included most of the unique gene clusters. We acknowledge that the disparity in the number of isolates from each genotype can affect our pangenome estimations. However, it is unlikely that the size of the NAP1/ST01 pangenome calculated for our isolates will depart from that of the global NAP1 population, as indicated by the lower SNP counts, the very high percentage of reads that mapped to the reference genome selected, and the already recognized clonality of this strain (Stabler et al. 2006, 2009). MGEs are generally unstable and tend to be eliminated to reduce their burden (Karcagi et al. 2016; McInerney et al. 2017), yet under some circumstances greater pangenomes and the acquisition of MGEs provide advantageous traits for certain bacterial species (Vos et al. 2015; McInerney et al. 2017). Five of the differential MGEs found among the NAPCR1/ST54 isolates are absent in the closely related C. difficile strain 630, suggesting that the biological differences between this reference strain and the more virulent NAPCR1 genotype could be due to laterally transferred DNA (Quesada-Gómez et al. 2015). Although these MGEs await functional characterization, we hypothesize that they are mobilizable or conjugative based on the predicted functions of some of their genes. Our data confirm the enhanced capability of the NAPCR1/ST54 isolates to acquire MGEs and explains the large size of the pangenomes of this clade. This feature is not fully understood, although it could be related to the accuracy and efficiency of restriction-modification systems, CRISPR-Cas systems, and DNA repair mechanisms to cite possible mechanisms (Darmon and Leach 2014). Whether the NAP1/ST01 isolates have active barriers for LGT that are absent in the NAPCR1 isolates remains to be determined. Our results demonstrate that highly virulent, outbreak-causing C. difficile strains from two different ST groups and MLST clades microdiversify through different mechanisms and emphasize the importance of MGE as drivers of bacterial diversification also for ST54 isolates. Future studies addressing the evolution of C. difficile should consider the role of MGEs and the pangenome along with investigations of the core genome because accessory genes may mediate clinically relevant phenotypes such as antimicrobial resistance and virulence. We also acknowledge that the genomic plasticity of the NAPCR1/ST54 isolates poses a threat, as it suggests that MGE gain/loss events may lead to the emergence of non-NAP1 lineages with increased virulence and outbreak potential that cannot be distinguished from ordinary strains through MLST, ribotyping, or core genome-based typing.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online. Click here for additional data file.
  62 in total

1.  Toxins A and B from Clostridium difficile differ with respect to enzymatic potencies, cellular substrate specificities, and surface binding to cultured cells.

Authors:  E Chaves-Olarte; M Weidmann; C Eichel-Streiber; M Thelestam
Journal:  J Clin Invest       Date:  1997-10-01       Impact factor: 14.808

2.  Prokka: rapid prokaryotic genome annotation.

Authors:  Torsten Seemann
Journal:  Bioinformatics       Date:  2014-03-18       Impact factor: 6.937

3.  Identification of protein coding regions by database similarity search.

Authors:  W Gish; D J States
Journal:  Nat Genet       Date:  1993-03       Impact factor: 38.330

4.  Manual curation and reannotation of the genomes of Clostridium difficile 630Δerm and C. difficile 630.

Authors:  Henning Dannheim; Thomas Riedel; Meina Neumann-Schaal; Boyke Bunk; Isabel Schober; Cathrin Spröer; Cynthia Maria Chibani; Sabine Gronow; Heiko Liesegang; Jörg Overmann; Dietmar Schomburg
Journal:  J Med Microbiol       Date:  2017-03       Impact factor: 2.472

Review 5.  Antibiotics and hospital-acquired Clostridium difficile infection: update of systematic review and meta-analysis.

Authors:  Claudia Slimings; Thomas V Riley
Journal:  J Antimicrob Chemother       Date:  2013-12-08       Impact factor: 5.790

6.  The Clostridium difficile cell wall protein CwpV is antigenically variable between strains, but exhibits conserved aggregation-promoting function.

Authors:  Catherine B Reynolds; Jenny E Emerson; Lucia de la Riva; Robert P Fagan; Neil F Fairweather
Journal:  PLoS Pathog       Date:  2011-04-21       Impact factor: 6.823

7.  Clinical Clostridium difficile: clonality and pathogenicity locus diversity.

Authors:  Kate E Dingle; David Griffiths; Xavier Didelot; Jessica Evans; Alison Vaughan; Melina Kachrimanidou; Nicole Stoesser; Keith A Jolley; Tanya Golubchik; Rosalind M Harding; Tim E Peto; Warren Fawley; A Sarah Walker; Mark Wilcox; Derrick W Crook
Journal:  PLoS One       Date:  2011-05-19       Impact factor: 3.240

8.  Macro and micro diversity of Clostridium difficile isolates from diverse sources and geographical locations.

Authors:  Richard A Stabler; Lisa F Dawson; Esmeralda Valiente; Michelle D Cairns; Melissa J Martin; Elizabeth H Donahue; Thomas V Riley; J Glenn Songer; Ed J Kuijper; Kate E Dingle; Brendan W Wren
Journal:  PLoS One       Date:  2012-03-02       Impact factor: 3.240

9.  Roary: rapid large-scale prokaryote pan genome analysis.

Authors:  Andrew J Page; Carla A Cummins; Martin Hunt; Vanessa K Wong; Sandra Reuter; Matthew T G Holden; Maria Fookes; Daniel Falush; Jacqueline A Keane; Julian Parkhill
Journal:  Bioinformatics       Date:  2015-07-20       Impact factor: 6.937

10.  The bacterial pangenome as a new tool for analysing pathogenic bacteria.

Authors:  L Rouli; V Merhej; P-E Fournier; D Raoult
Journal:  New Microbes New Infect       Date:  2015-06-26
View more
  3 in total

1.  Higher genome variability within metabolism genes associates with recurrent Clostridium difficile infection.

Authors:  Maria Kulecka; Edyta Waker; Filip Ambrozkiewicz; Agnieszka Paziewska; Karolina Skubisz; Patrycja Cybula; Łukasz Targoński; Michał Mikula; Jan Walewski; Jerzy Ostrowski
Journal:  BMC Microbiol       Date:  2021-01-28       Impact factor: 3.605

2.  First genotypic characterization of toxigenic Clostridioides difficile in Lithuanian hospitals reveals the prevalence of the hypervirulent ribotype 027/ST1.

Authors:  Simona Tratulyte; Jolanta Miciuleviciene; Nomeda Kuisiene
Journal:  Eur J Clin Microbiol Infect Dis       Date:  2019-07-20       Impact factor: 3.267

3.  Origin, genomic diversity and microevolution of the Clostridium difficile B1/NAP1/RT027/ST01 strain in Costa Rica, Chile, Honduras and Mexico.

Authors:  Enzo Guerrero-Araya; Claudio Meneses; Eduardo Castro-Nallar; Ana M Guzmán D; Manuel Álvarez-Lobos; Carlos Quesada-Gómez; Daniel Paredes-Sabja; César Rodríguez
Journal:  Microb Genom       Date:  2020-03-16
  3 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.