Adriana Cabal, Se-Ran Jun, Piroon Jenjaroenpun, Visanu Wanchai, Intawat Nookaew, Thidathip Wongsurawat, Mary J Burgess, Atul Kothari, Trudy M Wassenaar, David W Ussery.
Abstract
Infections due to Clostridioides difficile (previously known as Clostridium difficile) are a major problem in hospitals, where cases can be caused by community-acquired strains as well as by nosocomial spread. Whole genome sequences from clinical samples contain a wealth of information, but this information needs to be analyzed and compared in a way that makes the outcome useful for clinicians and epidemiologists. Here, we compare 663 publicly available complete genome sequences of C. difficile using average amino acid identity (AAI) scores. This analysis revealed that most of these genomes (640, 96.5%) clearly belong to the same species, while the remaining 23 genomes form four distinct clusters within the Clostridioides genus. The main C. difficile cluster can be further divided into sub-clusters, depending on the chosen cutoff. We demonstrate that MLST, whether based on partial or full-length gene sequences, results in biased estimates of genetic differences and does not capture the true degree of similarity or difference between complete genomes. The presence of genes coding for C. difficile toxins A and B (ToxA/B), as well as the binary C. difficile toxin (CDT), was deduced from their unique PfamA domain architectures. Out of the 663 C. difficile genomes, 535 (80.7%) contained at least one copy of ToxA or ToxB, while these genes were missing from 128 genomes. Although some clusters were enriched for toxin presence, these genes are variably present in a given genetic background. The CDT genes were found in 191 genomes, which were restricted to a few clusters only, and only one cluster consistently lacked the toxin A/B genes. A total of 310 genomes (47%) contained ToxA/B without CDT. Further, published metagenomic data from stools were used to assess the presence of C. difficile sequences in blinded cases of C. difficile infection (CDI) and controls, to test whether metagenomic analysis is sensitive enough to detect the pathogen, and to establish strain relationships between cases from the same hospital. We conclude that metagenomics can contribute to the identification of CDI and can assist in characterization of the most probable causative strain in CDI patients.
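The dependence of sub-clusters on the chosen AAI cutoff can be illustrated with a minimal sketch. It assumes that pairwise AAI scores are already available and groups genomes by single-linkage clustering, which is a stand-in technique and not necessarily how the trees in this study were collapsed. The genome names, AAI values, and the function `cluster_by_aai` are hypothetical.

```python
from itertools import combinations


def cluster_by_aai(genomes, aai, cutoff):
    """Single-linkage clustering: two genomes end up in the same cluster
    if they are connected by a chain of pairs with AAI >= cutoff."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        # Scores may be stored under either ordering of the pair.
        score = aai.get((a, b), aai.get((b, a)))
        if score is not None and score >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    # Largest clusters first, then alphabetical, for stable output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical toy data: pairwise AAI scores in percent (not from the study).
genomes = ["g1", "g2", "g3", "g4"]
aai = {
    ("g1", "g2"): 99.1,
    ("g1", "g3"): 96.4,
    ("g2", "g3"): 96.8,
    ("g1", "g4"): 91.0,
    ("g2", "g4"): 90.7,
    ("g3", "g4"): 90.9,
}

clusters_95 = cluster_by_aai(genomes, aai, 95.0)
clusters_98 = cluster_by_aai(genomes, aai, 98.0)
print(clusters_95)
print(clusters_98)
```

In this toy example a stricter cutoff splits one cluster into smaller ones, which mirrors the observation that the number of sub-clusters depends on the cutoff.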
Keywords: AAI; C. difficile; Community-acquired infections; Comparative genomics; MLST
Year: 2018 PMID: 29445826 PMCID: PMC6132499 DOI: 10.1007/s00248-018-1155-7
Source DB: PubMed Journal: Microb Ecol ISSN: 0095-3628 Impact factor: 4.552
Fig. 1 Amino acid identity (AAI) tree of 25 Firmicutes type strains. Members of the Clostridia are colored brown, Bacilli green, and Negativicutes blue
Fig. 2 AAI tree of 234 completely sequenced genomes belonging to the Clostridia class. The light blue shading (top left) identifies the 8 included Clostridioides difficile genomes. For clarity, branches are colored according to their main clusters, with same-color descriptions added. Branch labels giving only GCA numbers refer to genomes from strains described as ‘Clostridium sp.’, unless indicated otherwise with species-specific descriptions outside the tree
Fig. 3 AAI trees of 663 C. difficile genomes collapsed at a range of cutoffs from 95% (a) to 98% (d). The number of members of a cluster is indicated in brackets (only shown for clusters containing 4 or more members, not shown in d). Two clusters that are discussed in the text are marked in blue and green. Numbers in red (not shown in d) give the confidence of nodes and clusters, with an asterisk for value 100. An unscaled version with all confidence values added is available as supplementary Fig. S1
Fig. 4 In silico MLST of 607 C. difficile genomes. a A non-redundant NJ tree with bootstrap values in red, where sequence types (ST) are given with the number of members in brackets. The arrow points to the ST01 branch with 146 members. ST00 indicates that no sequence type is assigned to that allele combination. The four MLST clades are indicated, and the four genomes shaded in blue are the same ones that formed the four-member cluster in Fig. 3. b Part of an NJ tree based on the complete coding sequences of the MLST genes. The red line separates clade 1 and clade 2 STs. A number of clade 1 STs are more similar to clade 2 sequences (thus colored green, left of the red line), and ST2 is split between the two clusters
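The ST assignment described in the Fig. 4 caption, including the ST00 label for unassigned allele combinations, can be sketched as a lookup of a seven-locus allele profile. This is an illustrative sketch only: the locus names are assumed to follow the commonly used seven-gene C. difficile MLST scheme, which this record does not list, and the profile table, ST labels, and `assign_st` function are hypothetical.

```python
# Loci assumed here to follow the commonly used seven-gene
# C. difficile MLST scheme; the record itself does not list them.
LOCI = ("adk", "atpA", "dxr", "glyA", "recA", "sodA", "tpi")

# Hypothetical allele profiles mapped to hypothetical ST labels.
KNOWN_PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST_x",
    (1, 2, 1, 3, 1, 1, 2): "ST_y",
}


def assign_st(alleles, table=KNOWN_PROFILES):
    """Return the ST for an allele profile, or 'ST00' when the
    combination of alleles has no assigned sequence type."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return table.get(profile, "ST00")


known = {"adk": 1, "atpA": 2, "dxr": 1, "glyA": 3, "recA": 1, "sodA": 1, "tpi": 2}
novel = {"adk": 9, "atpA": 2, "dxr": 1, "glyA": 3, "recA": 1, "sodA": 1, "tpi": 2}

st_known = assign_st(known)
st_novel = assign_st(novel)
print(st_known, st_novel)
```

Because the lookup only compares allele numbers, two genomes with the same ST can still differ elsewhere in the genome, which is one reason MLST can misjudge whole-genome similarity.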
Fig. 5 Distribution of toxin genes in the clusters of C. difficile collapsed at 97%. Pie charts are only shown for clusters containing 9 members or more, and those shown outside the radial tree are scaled. The distribution in all 663 genomes combined is shown inside the radial tree
Summary statistics for each of the metagenomic datasets for the four patients with C. difficile infection
| Patient | C1 (Canada) | C2 (Canada) | C3 (Canada) | I1 (Italy) | I2 (Italy) |
|---|---|---|---|---|---|
| SRA ID of human gut metagenomic sequences | SRR2565933 | SRR2565934 | SRR2565548 | SRR2582247 | SRR2582248 |
| Total metagenomic reads ( | 12,529,978 | 15,032,540 | 54,725,506 | 1,216,877 | 1,002,345 |
| Metagenomic reads mapped to | 148,740 (1.19%) | 45,218 (0.30%) | 139,248 (0.25%) | 24,568 (2.01%) | 90,741 (9.05%) |
| Strain name, assembly ID (origin) | Strain 5.3, GCA_000586575 (Australia) | Strain VL_0181, GCA_900012755 (Canada) | Strain VL_0083, GCA_900011925 (Canada) | Strain IT1118, GCA_001497755 (Italy) | Strain Y384, GCA_000451045 (USA) |
| Genome size (Mb) | 4.00932 | 3.97192 | 4.23194 | 4.23893 | 6.91120 |
| GC (%) | 28.3 | 28.6 | 28.8 | 28.5 | 32.5 |
| Coverage of reads (%) on best matching | 98.76 | 96.51 | 9.69 | 52.63 | 51.04 |
| Presence of | no | no | n.a. | yes | yes |
| Presence of | no | no | n.a. | yes | yes |
n.a., not applicable
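The mapped-read percentages in the table can be recomputed from the read counts as a quick consistency check. The sketch below uses only the counts given in the table; the variable and function names are hypothetical. The recomputed values agree with the tabulated percentages to within 0.01 percentage points (for I1 the ratio works out to about 2.02%).

```python
# (mapped reads, total metagenomic reads) per patient, taken from the table.
samples = {
    "C1": (148_740, 12_529_978),
    "C2": (45_218, 15_032_540),
    "C3": (139_248, 54_725_506),
    "I1": (24_568, 1_216_877),
    "I2": (90_741, 1_002_345),
}


def mapped_percent(mapped, total):
    """Percentage of all metagenomic reads that mapped."""
    return 100.0 * mapped / total


percentages = {
    patient: round(mapped_percent(mapped, total), 2)
    for patient, (mapped, total) in samples.items()
}

for patient, pct in percentages.items():
    print(f"{patient}: {pct:.2f}%")
```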
Fig. 6 Genome coverage for the four toxin operons within C. difficile reference genome NC_009089.1. a Positions on the reference genome of the regions shown in b and c. b Read coverage of the tcd locus (underlined in red) and its immediate surroundings for the four metagenomic samples, with heatmap colors in gray/blue as indicated. Gene locations are shown in green. c The cdt locus (underlined in red) and its immediate surroundings for the same metagenomic samples. Data taken from BioProjects PRJNA297252 and PRJNA297269