Literature DB >> 33208516

Homologous Recombination in Clostridioides difficile Mediates Diversification of Cell Surface Features and Transport Systems.

Hannah D Steinberg1, Evan S Snitkin2.   

Abstract

Illness caused by the pathogen Clostridioides difficile is widespread and can range in severity from mild diarrhea to sepsis and death. Strains of C. difficile isolated from human infections exhibit great genetic diversity, leading to the hypothesis that the genetic background of the infecting strain at least partially determines a patient's clinical course. However, although certain strains of C. difficile have been suggested to be associated with increased severity, strain typing alone has proved insufficient to explain infection severity. The limited explanatory power of strain typing has been hypothesized to be due to genetic variation within strain types, as well as genetic elements shared between strain types. Homologous recombination is an evolutionary mechanism that can result in large genetic differences between two otherwise clonal isolates, and also lead to convergent genotypes in distantly related strains. More than 400 C. difficile genomes were analyzed here to assess the effect of homologous recombination within and between C. difficile clades. Almost three-quarters of single nucleotide variants in the C. difficile phylogeny are predicted to be due to homologous recombination events. Furthermore, recombination events were enriched in genes previously reported to be important to virulence and host-pathogen interactions, such as flagella, cell wall proteins, and sugar transport and metabolism. Thus, by exploring the landscape of homologous recombination in C. difficile, we identified genetic loci whose elevated rates of recombination mediated diversification, making them strong candidates for being mediators of host-pathogen interaction in diverse strains of C. difficile IMPORTANCE Infections with C. difficile result in up to half a million illnesses and tens of thousands of deaths annually in the United States. The severity of C. difficile illness is dependent on both host and bacterial factors. Studying the evolutionary history of C. difficile pathogens is important for understanding the variation in pathogenicity of these bacteria. This study examines the extent and targets of homologous recombination, a mechanism by which distant strains of bacteria can share genetic material, in hundreds of C. difficile strains and identifies hot spots of realized recombination events. The results of this analysis reveal the importance of homologous recombination in the diversification of genetic loci in C. difficile that are significant in its pathogenicity and host interactions, such as flagellar construction, cell wall proteins, and sugar transport and metabolism.
Copyright © 2020 Steinberg and Snitkin.

Entities:  

Keywords:  Clostridioides difficile; S layer; flagella; homologous recombination; phosphotransferase system

Year:  2020        PMID: 33208516      PMCID: PMC7677006          DOI: 10.1128/mSphere.00799-20

Source DB:  PubMed          Journal:  mSphere        ISSN: 2379-5042            Impact factor:   4.389


INTRODUCTION

Clostridium difficile is a significant cause of nosocomial infections and can result in clinical presentations ranging from mild diarrhea to severe colitis, sepsis, and death. C. difficile causes disease when it can colonize and grow in the gastrointestinal tract, most often after antibiotic treatment depletes a patient’s normal gut microbiota (1, 2). Two major toxins, toxin A and toxin B, are associated with the virulence of C. difficile, and their secretion results in the destruction of intestinal epithelial cells and recruitment of inflammatory mediators and neutrophils (2–5). However, other factors, such as adhesins, cell surface capsule proteins, hydrolytic enzymes, and flagella, have also been suggested to play an important role in C. difficile colonization, pathogenesis, and virulence (2, 6). Five major phylogenetic clades of C. difficile have been described (7), and upward of 100 PCR-ribotypes circulate and cause disease in U.S. hospitals (8). Some studies have suggested that epidemic ribotypes 027 (clade 2) and 078 (clade 5) cause more severe disease than other C. difficile strains (9, 10), while others fail to find an association between ribotype and disease severity (11). Thus, more research is needed to explore pathogen factors important in disease manifestation and heterogeneity in severity. Bacteria can acquire genetic diversity through vertically inherited mutations, recombination via transformation of free DNA, transduction by phage, and conjugation of plasmids (12). Recombination can either be homologous, in which a genomic region is exchanged for an allelic variant, or nonhomologous, in which accessory genes can be transferred from one organism to another (12). Studying bacterial recombination is important in genomics studies since it can both weaken phylogenetic signals and reveal important selective pressures (13). Genomic hot spots for recombination have been hypothesized to be due to both mechanistic and evolutionary reasons. For example, lower rates of recombination have been observed at the terminus compared to the origin of replication, and conserved core genes often flank highly recombinant genes in Escherichia coli (14). However, recombination events that are disadvantageous to the organism will be lost via negative selection, and the events that remain to be observed are likely to be advantageous or neutral in effect (13). Homologous recombination has been shown to result in high rates of genetic diversity in adhesion, invasion, and colonization factors, as well as cell membrane/surface structures in various pathogenic bacteria (13, 15–17). These factors each play a role in the pathogenesis and/or host interactions of these pathogens. Elevated levels of genetic diversity in these features could lead to adaptions in virulence, transmission, or evasion of the host immune system. In C. difficile, the S-layer cassette that encodes surface proteins important for its antigenicity has been shown to undergo frequent recombination (13, 18). This study utilizes more than 400 C. difficile whole-genome sequences to unveil the contribution of homologous recombination within and between clades of this pathogen, and to elucidate specific functions associated with higher rates of realized recombination events.

RESULTS AND DISCUSSION

Strain selection.

A total of 412 strains were selected that cluster into five distinct clades (Fig. 1a), as described previously (7). Of the 412 strains, 306 isolates are in clade 1, 76 in clade 2, 15 in clade 3, 10 in clade 2, and 5 in clade 5. Clade 5 is the most distantly related from the rest of the phylogeny, with only 97% sequence identity to reference strain 630 (calculated as number of identical nucleotides over the length of the aligned sequence). Since the majority of isolates are in clades 1 and 2 and since excluding clades 3, 4, and 5 increased the size of the core genome (from 1.66 to 1.75 Mbp) and allowed more genes to be studied, gene and pathway analyses were performed on just the 382 isolates in clades 1 and 2. Moreover, besides ribotype 078, which is in clade 5 and is relatively distantly related to clades 1 and 2, the majority of C. difficile isolates that cause human disease in the United States are in clade 1 (includes ribotype 014/020) and clade 2 (includes ribotype 027) (8), so the results of these analyses can be applicable to much of pathogenic C. difficile.
FIG 1

Recombination in five C. difficile clades. (a) Unrooted phylogeny of genomes used in study, colored by previously defined clade designations: clade 1 (blue, n = 306), clade 2 (yellow, n = 76), clade 3 (green; n = 15), clade 4 (purple, n = 10), and clade 5 (red, n = 5). (b) Distribution of SNVs in recombination events per SNV outside recombination events (r/m) on each branch of each C. difficile clade. Boxplots represent the first to third quartiles of values, with a line representing the median r/m value for each clade, and whiskers are 1.5 times the interquartile range above the third quartile. Values for each branch are plotted on top of the boxplots for each clade. There is a statistically significant difference in r/m values between at least 2 of the clades (Kruskal-Wallis χ2df=4 = 36.8; P < 0.001). (c) Distribution of lengths of recombination events in each C. difficile clade, colored as in panel a.

Recombination in five C. difficile clades. (a) Unrooted phylogeny of genomes used in study, colored by previously defined clade designations: clade 1 (blue, n = 306), clade 2 (yellow, n = 76), clade 3 (green; n = 15), clade 4 (purple, n = 10), and clade 5 (red, n = 5). (b) Distribution of SNVs in recombination events per SNV outside recombination events (r/m) on each branch of each C. difficile clade. Boxplots represent the first to third quartiles of values, with a line representing the median r/m value for each clade, and whiskers are 1.5 times the interquartile range above the third quartile. Values for each branch are plotted on top of the boxplots for each clade. There is a statistically significant difference in r/m values between at least 2 of the clades (Kruskal-Wallis χ2df=4 = 36.8; P < 0.001). (c) Distribution of lengths of recombination events in each C. difficile clade, colored as in panel a.

Homologous recombination in five C. difficile clades.

There is a great deal of heterogeneity in the contribution of recombination to the evolution across the C. difficile phylogeny, as measured by the proportion of single-nucleotide variants (SNVs) inside compared to outside predicted recombination events (r/m) (Fig. 1b). Most branches have more single nucleotide substitution events than recombination events (see Fig. S1 in the supplemental material); however, each recombination event can lead to the transfer of multiple SNVs. Throughout the phylogeny, 72% of SNVs fall within predicted recombination events, even though there are about 15 times as many inferred nucleotide substitution events as there are recombination events. In addition to variance on branches of the tree, there is also as a significant amount of heterogeneity in r/m values between the clades (Kruskal-Wallis χ2df=4 P < 0.001; Fig. 1b), suggesting a potential difference in evolutionary mechanisms between the clades. Clades 2 and 4 have the highest median r/m values, while clades 1 and 5 have the lowest. It is important to note the variance in sample sizes between the clades, which may be biasing these interclade comparisons. Distribution recombination events per single nucleotide substitution event (ρ/θ) on each branch of each C. difficile clade. Boxplots represent the first to the third quartile of values, with a line representing the median ρ/θ value for each clade, and whiskers are 1.5 times the interquartile range above the third quartile. Values for each branch are plotted on top of the boxplots for each clade. There is a statistically significant difference in ρ/θ values between at least of 2 the clades (Kruskal-Wallis χ2df=4 = 36.8; P < 0.001). Colors: clade 1 (blue, n = 306); clade 2 (yellow, n = 76); clade 3 (green; n = 15); clade 4 (purple, n = 10); clade 5 (red, n = 5). Download FIG S1, PDF file, 0.3 MB. The median length of recombination events in all C. difficile clades is ∼6.7 kb, and the maximum length of a predicted recombination event was >100 kb in clade 1. Overall, most recombination events are <10 kb, with an exponential decay in events with increasing length (Fig. 1c). This distribution is observed in all C. difficile clades, though there are more recombination events observed overall in clades 1 and 2 than in clades 3, 4, and 5, likely because the genome sample size is greatest in clades 1 and 2.

Areas of enriched homologous recombination.

Twelve percent (486/3,980) of C. difficile genes analyzed were significantly enriched for homologous recombination (P < 0.05) according to a permutation test (see Table S2 in the supplemental material), and 40 of these genes were enriched for recombination with a fold change of ≥3 (Table 1). The majority of these highly enriched (≥3-fold) genes are involved in flagellar activity. Overall, the set of 486 significant genes were significantly overrepresented in four KEGG pathways, and one COG term involved in flagella, sugar transport and metabolism, and the cell envelope and outer membrane (Table 2). The S-layer cassette, which has previously been shown to be highly recombinant in C. difficile (13, 18), also appears in a high-recombination zone (Fig. 2), with many of the genes involved in S-layer structure significantly enriched for recombination (see Table S2 and Fig. S2 in the supplemental material). There is no KEGG pathway specific to S-layer proteins, but the COG ontology “cell envelope biogenesis, outer membrane,” whose genes are enriched for homologous recombination (Table 2), does include some S-layer proteins.
TABLE 1

C. difficile genes with the greatest increase in recombination events in the clade 1 and 2 core genome

GeneAnnotationPR eventsaOR eventsbFold changecPd
fliGeFlagellar motor switch protein FliG11585.4<0.001
fliHeFlagellar assembly protein FliH11585.3<0.001
fliJeFlagellar protein FliJ10525.0<0.001
fliIeATP synthase subunit beta FliI11524.9<0.001
fliFeFlagellar MS-ring protein12564.8<0.001
fliEeFlagellar hook-basal body protein FliE11504.7<0.001
flgDeBasal-body rod modification protein FlgD10494.7<0.001
fliKeFlagellar hook-length control protein FliK10484.6<0.001
flgEeFlagellar hook protein FlgE11514.6<0.001
flgBeFlagellar basal-body rod protein FlgB11484.5<0.001
flgCeFlagellar basal-body rod protein FlgC11474.3<0.001
CD630_27960fCell surface protein12524.2<0.001
flbDeFlagellar protein FlbD11444.2<0.001
fliS1hFlagellar protein FliS111434.0<0.001
motBeFlagellar motor rotation protein MotB11443.9<0.001
motAeFlagellar motor rotation protein MotA11443.9<0.001
csrAhCarbon storage regulator CsrA10413.9<0.001
fliLeFlagellar basal body-associated protein FliL11413.9<0.001
fliZeFlagellar protein FliZ10403.8<0.001
fliS2hFlagellar protein FliS211413.8<0.001
fliQeFlagellar biosynthetic protein FliQ10403.8<0.001
flhBeBifunctional flagellar biosynthesis protein FliR/FlhB12453.8<0.001
fliPeFlagellar biosynthesis protein FliP11413.7<0.001
CD630_27970fCalcium-binding adhesion protein17623.7<0.001
CD630_36440Hypothetical protein11403.6<0.001
flhAeFlagellar biosynthesis protein FlhA12423.5<0.001
fliChFlagellin C11393.4<0.001
CD630_30830iPTS operon transcription antiterminator11383.4<0.001
fliDhflagellar hook-associated protein FliD12403.3<0.001
CD630_02380hhypothetical protein11353.3<0.001
fliWhFlagellar assembly factor FliW11343.2<0.001
flhFeFlagellar biosynthesis regulator FlhF11363.1<0.001
CD630_02410gPhosphoserine phosphatase10323.1<0.001
CD630_02420gHypothetical protein10323.1<0.001
CD630_02430gHypothetical protein10323.1<0.001
CD630_02440gCDP-glycerol:poly(glycerophosphate) glycerophosphotransferase10323.1<0.001
CD630_22420Hypothetical protein11323.0<0.001
flgLhFlagellar hook-associated protein FlgL11343.0<0.001
CD630_22430jMembrane protein11323.0<0.001
fliAeFlagellar operon RNA polymerase σ28 factor11333.0<0.001

PR events, the predicted number of overlapping recombination events with each gene, calculated as the mean number of recombination events in each of 10,000 permutations randomly placing the identified recombination events.

OR events, the observed number of overlapping recombination events with each gene, as identified by Gubbins.

Ratio (fold change) of observed recombination events compared to predicted recombination events.

Probability (P) that the number of observed recombination events is observed by chance based on 10,000 permutations randomly placing the identified recombination events throughout the clades 1 and 2 core genome.

Gene in F3 regulon.

Gene in S layer.

Gene in F2 regulon.

Gene in F1 regulon.

PTS gene.

Membrane protein gene.

TABLE 2

Functional categories enriched for recombination in C. difficile clades 1 and 2

Functional categoryFold enrichmentP
UnadjustedBenjamini
KEGG pathway
 Flagellar assembly7.8<0.001<0.001
 Phosphotransferase system (PTS)2.7<0.001<0.001
 Amino sugar and nucleotide sugar metabolism3.0<0.001<0.001
 Fructose and mannose metabolism2.5<0.0010.0086
 Pentose and glucuronate interconversions3.10.0320.28
 Pentose phosphate pathway2.40.0390.28
 Bacterial chemotaxis2.60.0640.38
COG ontology
 Cell envelope biogenesis, outer membrane3.1<0.001<0.001
 Defense mechanisms1.60.0780.66
FIG 2

Distribution of recombination events throughout the C. difficile genome. From innermost circle outward: positions in the core genome of clades 1 and 2 (purple) used as input for recombination detection; histogram of recombination events overlapping with each position in the C. difficile genome (dark gray represents more recombination than average, light gray represents less recombination than average); flagellar genes (turquoise); S-layer cassette genes (teal), phosphotransferase system (PTS) genes (dark red); genes involved in amino sugar and nucleotide sugar metabolism (salmon); genes involved in fructose and mannose metabolism (orange), and genes annotated as membrane proteins (navy). Areas highlighted in light blue represent genes that have more recombination than expected with a P value of <0.05 as determined by a permutation test. Numbers around the circular genome plot mark the nucleotide position in the reference genome C. difficile 630.

Distribution of recombination events throughout the C. difficile genome. From innermost circle outward: positions in the core genome of clades 1 and 2 (purple) used as input for recombination detection; histogram of recombination events overlapping with each position in the C. difficile genome (dark gray represents more recombination than average, light gray represents less recombination than average); flagellar genes (turquoise); S-layer cassette genes (teal), phosphotransferase system (PTS) genes (dark red); genes involved in amino sugar and nucleotide sugar metabolism (salmon); genes involved in fructose and mannose metabolism (orange), and genes annotated as membrane proteins (navy). Areas highlighted in light blue represent genes that have more recombination than expected with a P value of <0.05 as determined by a permutation test. Numbers around the circular genome plot mark the nucleotide position in the reference genome C. difficile 630. Recombination in the S-layer genes in C. difficile clades 1 and 2. All variants in the S-layer genes of the C. difficile clades 1 and 2 core genome are plotted on the x axis based on genomic position. For each genome on the y axis, the variant is shown as black (variant present), or white (variant absent). The phylogeny in panel a is a maximum-likelihood tree based on all vertically inherited core variants, and the phylogeny in panel b is a neighbor-joining tree based on only the variants in the S-layer genes. For both panels a and b, clade 1 genomes are shown in shades of green to blue and clade 2 genomes are shown in shades of yellow to red based on their position in the maximum-likelihood phylogeny in panel a. Download FIG S2, PDF file, 2.2 MB. C. difficile genes with the greatest increase in recombination events in the clade 1 and 2 core genome PR events, the predicted number of overlapping recombination events with each gene, calculated as the mean number of recombination events in each of 10,000 permutations randomly placing the identified recombination events. OR events, the observed number of overlapping recombination events with each gene, as identified by Gubbins. Ratio (fold change) of observed recombination events compared to predicted recombination events. Probability (P) that the number of observed recombination events is observed by chance based on 10,000 permutations randomly placing the identified recombination events throughout the clades 1 and 2 core genome. Gene in F3 regulon. Gene in S layer. Gene in F2 regulon. Gene in F1 regulon. PTS gene. Membrane protein gene. Functional categories enriched for recombination in C. difficile clades 1 and 2 Genomes used in analyses and GenBank accession numbers. Download Table S1, CSV file, 0.04 MB. All C. difficile genes with statistically significant enrichment of recombination events in the clades 1 and 2 core genome. Download Table S2, XLSX file, 0.1 MB. The genomic region from approximately 2.6 to 0.7 Mb appears to be enriched for recombination compared to the other half of the genome (Fig. 2). This could suggest the mechanistic feasibility of recombination in this area and/or grouping of genes where genetic diversity is more evolutionarily advantageous. Low recombination at the terminus of the genome is consistent with findings in Escherichia coli (14).

Flagella.

There have been reports of large-scale structural variations of flagella genes in C. difficile, but the contribution of homologous recombination is not as well described. It is clear that unique flagellar variant patterns are present in distantly related isolates (Fig. 3). In particular, within each clade there appears to be a set of distinct flagellar genotypes distributed sporadically throughout the phylogeny, suggesting that large regions of the flagella operons may have been swapped in homologous recombination events. Further, there appears to be a distinct recombination pattern in the two flagellar regulatory units present in the clades 1 and 2 core genome, with there being more F3 genotypes than F1 genotypes present in these isolates. The F3 and F1 operons are responsible for early-stage and late-stage flagellar genes, respectively (19). F1 genes, such as fliC and fliD, code for the flagellin filament and protein cap (19) that may be involved in protein-protein interactions and recombinant FliC proteins have been shown to be involved in C. difficile immunogenicity (20). F3 genes are associated with other structural components of the flagella, such as the basal body, hook, and motor proteins (19), and the F3 gene fliA (sigD) has been suggested to be important not only in regulating late-stage flagellar genes but also in the production of C. difficile toxins (21).
FIG 3

Recombination in flagellar genes in C. difficile clades 1 and 2. All variants in the flagellar region of the C. difficile clades 1 and 2 core genome are plotted on the x axis based on genomic position. For each genome on the y axis, the variant is shown as black (F1 gene variant present), blue (F3 gene variant present), or white (variant absent). (a and b) The phylogeny in panel a is a maximum-likelihood tree based on all vertically inherited core variants, and the phylogeny in panel b is a neighbor-joining tree based on only the variants in the flagellar region. For panels a and b, clade 1 genomes are shown in shades of green to blue and clade 2 genomes are shown in shades of yellow to red based on their position in the maximum-likelihood phylogeny presented in panel a.

Recombination in flagellar genes in C. difficile clades 1 and 2. All variants in the flagellar region of the C. difficile clades 1 and 2 core genome are plotted on the x axis based on genomic position. For each genome on the y axis, the variant is shown as black (F1 gene variant present), blue (F3 gene variant present), or white (variant absent). (a and b) The phylogeny in panel a is a maximum-likelihood tree based on all vertically inherited core variants, and the phylogeny in panel b is a neighbor-joining tree based on only the variants in the flagellar region. For panels a and b, clade 1 genomes are shown in shades of green to blue and clade 2 genomes are shown in shades of yellow to red based on their position in the maximum-likelihood phylogeny presented in panel a. Genomic structural diversity of C. difficile flagella has been described before. For instance, C. difficile ribotype 078 (clade 5) strains lack the F3 regulon completely, rendering that ribotype immotile (19). Furthermore, many strains of C. difficile also have an F2 operon between F1 and F3, although these operational units are so diverse that they do not show up in the core genome for clades 1 and 2, and thus homologous recombination in the F2 operon was not analyzed in this study. Supporting the existence of selective pressure for diversification of F2 are previous reports of nonhomologous recombination in this region. C. difficile 630, for example, has four genes in its F2 region, including genes involved in glycosylation, while ribotype 027 isolate CD196 has six seemingly different genes in this area (19, 22). These results suggest that homologous recombination, in addition to nonhomologous recombination, may be an important mechanism of flagellar diversity in C. difficile. It has been reported that flagella may have a role in motility, colonization, adherence, virulence, and immunogenicity in mucosal pathogens, including C. difficile (23), and in vivo studies show that mutants of late-stage flagellar genes fliC and fliD in C. difficile 630 exhibited greater virulence than did the wild type (24). The enrichment in flagellar diversity in closely related strains observed here suggests an evolutionary advantage, such as antigenic diversification or a unique role in virulence or pathogenicity. Of note, five genes involved in type IV pilus biosynthesis were also shown to be significantly enriched for homologous recombination (see Table S2), which suggests a further role of homologous recombination in the diversification of cell surface structures that act as colonization factors in C. difficile (25).

PTS and sugar metabolism.

The phosphotransferase system (PTS) is responsible for the active import of certain sugars into bacteria (26). It involves many transmembrane proteins that interact with the extracellular environment, and could be important in host interactions and immunogenicity. The combination of enriched recombination in both PTS and the metabolism of certain sugars (fructose, mannose, and pentose; Table 2) also suggests that recombination may play a role in the ability of C. difficile strains to occupy unique niches via diverse metabolic capabilities. Interestingly, a pleiotropic effect of fliC on genes involved in a number of mechanisms, including PTS and carbon metabolism has been observed (6). Variation in the interaction of flagella, PTS, and metabolism proteins itself may be being selected for via homologous recombination.

Cell membrane and the S layer.

Genes that code for components on the outer layers of bacteria that interact with the outside world and host environment have previously been shown to undergo a relatively large amount of homologous recombination (13). That appears to be the case here as well, since flagella are environment-exposed appendages, and the PTS system involves a number of transmembrane proteins that regulate what sugars can come into the cell. The COG ontology that includes genes involved in cell envelope biogenesis and the outer membrane, including S-layer proteins, is significantly enriched for recombination (Table 2). These data are in line with previous studies that show high recombination in the S-layer surface proteins of C. difficile (18). The majority of genes involved in the S layer (27) were significantly enriched for homologous recombination in this study (Table 1; see also Table S2 and Fig. S2 in the supplemental material).

Conclusions.

Homologous recombination is an important contributor to the evolution of C. difficile, accounting for an estimated 72% of SNVs. Furthermore, there are genomic regions and functional pathways especially enriched for homologous recombination. Genes involved in the production of proteins that interact with the host and outside environment (i.e., flagella, cell surface proteins, and PTS transporters) and genes important in the metabolism and import of certain sugars are more likely to be highly recombinant. The genes that have been shown here to undergo high rates of homologous recombination have also been documented elsewhere to be associated with toxin production, virulence, and pathogenicity (2, 6, 19–21, 23, 28). These data suggest that homologous recombination may play a significant role in intraclade variation in virulence. There are a number of hypothetical proteins seen here with recombination much higher than expected (Table 1; see also Table S2). These genes should be further investigated in order to determine their potential role in C. difficile pathogenesis and/or virulence, since there appears to be an evolutionary advantage to sustained heterogeneity in these proteins throughout clinically important strains of the bacteria. Previous work that explored the impact of homologous recombination on C. difficile highlighted the enrichment of recombination in the S layer (13) but failed to find significance in other areas, such as the flagella and PTS. Limitation to only clades 1 and 2 in our analyses allowed for an increased core genome that includes many important genes in these systems that would be missed if only working with the entire C. difficile core genome. Another implication of this research is the importance of examining the contribution of homologous recombination and allele variants when designing vaccine and drug targets. Both flagellar proteins and S-layer proteins have been suggested as potential vaccine targets for C. difficile (29); however, since the genomic evidence reveals multiple allelic types of these structures, multivalent vaccines may need to be considered. The analysis of homologous recombination can be conducted on publicly available genomes, and reveal a great deal about pathogen evolution and selective pressures. Here, it has suggested a potentially important role of genetic diversity in flagella, the S layer, PTS, and metabolic pathways to the survival and pathogenesis of C. difficile. Further in vitro and in vivo assays should be conducted to confirm these roles.

MATERIALS AND METHODS

C. difficile isolates with whole-genome sequence data were selected from a set of 917 genomes previously analyzed in our lab and 4,443 previously published genomes (30) to maximize representation of clinically important strains. To reduce redundancy of nearly identical genomes, isolates that were differentiated from another isolate in the sample by less than six coding mutations were excluded. All genomes used in this analysis were sequenced previously and are publicly available (see Table S1).

Genome alignments.

Whole-genome alignments were created for each clade, as well as the combination of clades 1 and 2 and the entire set of genomes. Whole-genome alignments were produced as previously described (31). Briefly, each assembly was first concatenated into a pseudochromosome by using mauve contig mover to order and orient large contigs relative to a reference (32). The finished chromosome of the C. difficile 630 strain (included in clade 1) was used as a reference genome where possible, and the most complete whole-genome sequence in each alignment (determined by minimum number of contigs) was chosen as the reference genome otherwise. Each pseudochromosome was aligned to the reference genome using nucmer, and variants were identified relative to the reference genome using the show-snps command (33). Single-nucleotide variants (SNVs) were filtered to remove those SNVs that were likely to be due to alignment or sequencing errors. SNVs were filtered out if (i) they resided in genes annotated as phage, transposase, or integrase; (ii) they resided within 20 bp of the start or end of a contig; (iii) they resided in tandem repeats of total length >20 bp, as determined by the exact-tandem program associated with MUMmer; or (iv) they resided in large inexact repeats as determined by nucmer. Core genomes were calculated based on positions in the reference genome that were aligned in every genome using nucmer.

Recombination identification.

Gubbins v1.3.4 (Genealogies Unbiased By recom-Binations In Nucleotide Sequences [34]) was used to identify areas likely introduced under homologous recombination in C. difficile. Gubbins was run on core genome alignments of each C. difficile clade, as well as clades 1 and 2 collectively, and on all 412 representative C. difficile genomes used in this study. Genomic regions identified by Gubbins as having densities of SNVs statistically different from background SNV densities in the core genome were inferred to be homologous recombination events, although it is possible that other molecular mechanisms could also have played a role in some of these regions. Maximum-likelihood recombination-filtered phylogenetic trees for the entire set of genomes, as well as just clades 1 and 2, were produced using RAxML v8.2.9 (35) assuming a general time reversible model of nucleotide variations (-m GTRCAT). The bootstrap convergence test and the autoMRE convergence criteria (-N autoMRE) were used to determine the number of bootstrap replicates. The best-scoring maximum-likelihood tree was identified with rapid bootstrap analysis (-f a).

Identification of genes and pathways with enriched recombination.

A permutation test was implemented and run in R v3.4.0 (36) to randomly place each observed recombination event in the C. difficile clades 1 and 2 core genome. Only clades 1 and 2 were used in this analysis since that is where the majority of recombination events were found and, since these two clades are closely related, it allowed for an expanded core genome compared to the entire set. Furthermore, epidemic ribotypes 014 and 027 fall within clades 1 and 2, respectively. The permutation test was run 10,000 times. Information about all deposited C. difficile genes in the reference genome C. difficile 630 were downloaded from the National Center for Biotechnology Information (NCBI) gene database. A recombination event was considered overlapping with a gene if they overlapped in at least one nucleotide position. One-sided P values were calculated for each gene as the number of permutations that predicted a number greater than or equal to the number of recombination events observed plus 1, divided by 10,001. Genes with a P value of <0.05 were input as a gene list to the DAVID v6.8 (david.ncifcrf.gov) functional annotation tool to identify the KEGG pathways and COG ontologies associated with increased homologous recombination in C. difficile.

Data visualization.

An unrooted phylogenetic tree (Fig. 1a) of all 412 isolates was plotted in R v3.4.0 with the ape package v4.1. Box plots and histograms (Fig. 1b and c; see also Fig. S1 in the supplemental material) were made in R v3.4.0. Genome visualization and a recombination map (Fig. 2) was made in DNAPlotter v1.0 (37). Variant heatmaps (Fig. 3; see also Fig. S2) were made with the ggtree package v1.8.1 (38) in R v3.4.0.

Data availability.

All of the genomes discussed here, as well as their accession numbers, are listed in Table S1 in the supplemental material.
  36 in total

1.  Why do bacteria engage in homologous recombination?

Authors:  Michiel Vos
Journal:  Trends Microbiol       Date:  2009-05-20       Impact factor: 17.079

2.  Emergence of Clostridium difficile infection due to a new hypervirulent strain, polymerase chain reaction ribotype 078.

Authors:  Abraham Goorhuis; Dennis Bakker; Jeroen Corver; Sylvia B Debast; Celine Harmanus; Daan W Notermans; Aldert A Bergwerff; Frido W Dekker; Ed J Kuijper
Journal:  Clin Infect Dis       Date:  2008-11-01       Impact factor: 9.079

3.  Clostridium difficile ribotype 027: relationship to age, detectability of toxins A or B in stool with rapid testing, severe infection, and mortality.

Authors:  Krishna Rao; Dejan Micic; Mukil Natarajan; Spencer Winters; Mark J Kiel; Seth T Walk; Kavitha Santhosh; Jill A Mogle; Andrzej T Galecki; William LeBar; Peter D R Higgins; Vincent B Young; David M Aronoff
Journal:  Clin Infect Dis       Date:  2015-03-31       Impact factor: 9.079

Review 4.  Phosphoenolpyruvate:carbohydrate phosphotransferase system of bacteria.

Authors:  P W Postma; J W Lengeler
Journal:  Microbiol Rev       Date:  1985-09

5.  The central, surface-exposed region of the flagellar hook protein FlgE of Campylobacter jejuni shows hypervariability among strains.

Authors:  E Lüneberg; E Glenn-Calvo; M Hartmann; W Bär; M Frosch
Journal:  J Bacteriol       Date:  1998-07       Impact factor: 3.490

6.  The flagellin FliC of Clostridium difficile is responsible for pleiotropic gene regulation during in vivo infection.

Authors:  Amira Barketi-Klai; Marc Monot; Sandra Hoys; Sylvie Lambert-Bordes; Sarah A Kuehne; Nigel Minton; Anne Collignon; Bruno Dupuy; Imad Kansau
Journal:  PLoS One       Date:  2014-05-19       Impact factor: 3.240

7.  Population genomics of Chlamydia trachomatis: insights on drift, selection, recombination, and population structure.

Authors:  Sandeep J Joseph; Xavier Didelot; James Rothschild; Henry J C de Vries; Servaas A Morré; Timothy D Read; Deborah Dean
Journal:  Mol Biol Evol       Date:  2012-08-13       Impact factor: 16.240

8.  Genomic insights into the fate of colistin resistance and Acinetobacter baumannii during patient treatment.

Authors:  Evan S Snitkin; Adrian M Zelazny; Jyoti Gupta; Tara N Palmore; Patrick R Murray; Julia A Segre
Journal:  Genome Res       Date:  2013-04-05       Impact factor: 9.043

9.  DNAPlotter: circular and linear interactive genome visualization.

Authors:  Tim Carver; Nick Thomson; Alan Bleasby; Matthew Berriman; Julian Parkhill
Journal:  Bioinformatics       Date:  2008-11-05       Impact factor: 6.937

10.  Effects of control interventions on Clostridium difficile infection in England: an observational study.

Authors:  Kate E Dingle; Xavier Didelot; T Phuong Quan; David W Eyre; Nicole Stoesser; Tanya Golubchik; Rosalind M Harding; Daniel J Wilson; David Griffiths; Alison Vaughan; John M Finney; David H Wyllie; Sarah J Oakley; Warren N Fawley; Jane Freeman; Kirsti Morris; Jessica Martin; Philip Howard; Sherwood Gorbach; Ellie J C Goldstein; Diane M Citron; Susan Hopkins; Russell Hope; Alan P Johnson; Mark H Wilcox; Timothy E A Peto; A Sarah Walker; Derrick W Crook
Journal:  Lancet Infect Dis       Date:  2017-01-25       Impact factor: 71.421

View more
  2 in total

1.  Clostridioides difficile strain-dependent and strain-independent adaptations to a microaerobic environment.

Authors:  Andy Weiss; Christopher A Lopez; William N Beavers; Jhoana Rodriguez; Eric P Skaar
Journal:  Microb Genom       Date:  2021-12

2.  Assessment of the Transmission Dynamics of Clostridioides difficile in a Farm Environment Reveals the Presence of a New Toxigenic Strain Connected to Swine Production.

Authors:  Frederico Alves; Alexandra Nunes; Rita Castro; António Sequeira; Olga Moreira; Rui Matias; João Carlos Rodrigues; Leonor Silveira; João Paulo Gomes; Mónica Oleastro
Journal:  Front Microbiol       Date:  2022-04-14       Impact factor: 6.064

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.