Yun Luo1, Chen Huang1, Julian Ye1, Weijia Fang2, Wanjun Gu3, Zhiping Chen1, Hui Li4, XianJun Wang5, Dazhi Jin1.
Abstract
Peptoclostridium difficile (Clostridium difficile) is the major pathogen associated with infectious diarrhea in humans. Concomitant with the increased incidence of C. difficile infection worldwide, there is increasing concern regarding this infection type. This study reports a draft assembly and detailed sequence analysis of C. difficile strain ZJCDC-S82. The de novo assembled genome was 4.19 Mb in size and includes 4,013 protein-coding genes, 41 rRNA genes, and 84 tRNA genes. Along with the chromosome, we also assembled a single plasmid consisting of 11,930 nucleotides. Comparative genomic analysis of C. difficile ZJCDC-S82 and two previously published strains, M120 and CD630, showed extensive similarity. Phylogenetic analysis revealed that genetic diversity among C. difficile strains was not influenced by geographic location. Evolutionary analysis suggested that four genes encoding surface proteins exhibited positive selection in C. difficile ZJCDC-S82. Codon usage analysis indicated that C. difficile ZJCDC-S82 had high codon usage bias toward A/U-ended codons. Furthermore, codon usage patterns in C. difficile ZJCDC-S82 were predominantly affected by mutation pressure. Our results provide detailed information pertaining to the C. difficile genome associated with a strain from mainland China. This analysis will facilitate the understanding of genomic diversity and evolution of C. difficile strains in this region.
Keywords: C. difficile ZJCDC-S82; codon usage; evolutionary analysis; genome sequencing; phylogenetic analysis
Year: 2016 PMID: 26823648 PMCID: PMC4727486 DOI: 10.4137/EBO.S32476
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
General features of the ZJCDC-S82 genome.
| GENETIC ELEMENT | SIZE (bp) | NO. OF CONTIGS | N50 (bp) | GC CONTENT (%) | NO. OF CODING SEQUENCES | NO. OF rRNA GENES | NO. OF tRNA GENES |
|---|---|---|---|---|---|---|---|
| Chromosome | 4190365 | 19 | 598695 | 29.1 | 4013 | 41 | 84 |
| Plasmid | 11930 | 1 | 11930 | 26.4 | 19 | 0 | 0 |
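
The N50 column can be read as follows: sort the contigs from longest to shortest and walk down the list until the running total reaches half of the assembly size; the length of the contig at which that happens is the N50. The sketch below illustrates that definition only. The function name `n50` and the five-contig example are hypothetical, since the source does not list individual chromosome contig lengths; the single-contig case matches the plasmid row.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in bp).

    N50 is the length of the contig at which the cumulative length,
    counted from the longest contig downward, first reaches at least
    half of the total assembly size.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# The plasmid is a single 11,930-bp contig, so its N50 equals its size.
plasmid_n50 = n50([11930])

# Hypothetical toy assembly (not from the source), for illustration only.
toy_n50 = n50([2, 3, 4, 5, 6])
```

For a one-contig element such as the plasmid, N50 and total size coincide, which is why both columns read 11930 in that row.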
Figure 1. General features of the C. difficile strain ZJCDC-S82 genome. (A) Genes connected to subsystems and their distribution in different categories within the genome of ZJCDC-S82. The results were obtained using SEED viewer (http://rast.nmpdr.org). (B) Circular representation of the C. difficile strain ZJCDC-S82 chromosome. From the outside: circles 1 and 2 show the positions of all genes transcribed in the clockwise and counterclockwise directions, respectively; circle 3 shows RNA genes (cyan, rRNA genes; red, tRNA genes); circle 4 shows GC content (plotted using a 10-kb window); and circle 5 shows GC deviation (plotted using a 10-kb window; orange, >0%; red, <0%).
Figure 2. Comparative genome analysis results of C. difficile strains ZJCDC-S82, M120, and CD630. (A) Unique and shared CDSs of C. difficile strains ZJCDC-S82, M120, and CD630. The Venn diagram shows the number of unique and shared genes among the three strains. (B) Genome alignment of C. difficile strains ZJCDC-S82, M120, and CD630. Pairwise comparison of the three genomes as visualized using Mauve is shown.
Figure 3. Phylogenetic tree based on genome sequences of 11 C. difficile strains.
Figure 4. The distribution of dN/dS estimates for genes with ortholog relationships between C. difficile strains ZJCDC-S82, M120, and CD630.
Positively selected genes in C. difficile strain ZJCDC-S82.
| GENE | dN/dS | SIGNIFICANCE | ANNOTATION | GENOMES COMPARED |
|---|---|---|---|---|
| CD0410 | 2.12 | n.s | Hypothetical protein | ZJCDC-S82:CD630 |
| CD2317 | 2.06 | n.s | Hypothetical protein | ZJCDC-S82:CD630 |
| CD3205 | 1.98 | n.s | Hypothetical protein | ZJCDC-S82:CD630 |
| CD2768 | 1.58 | n.s | Membrane protein | ZJCDC-S82:CD630 |
| CD0855 | 1.57 | n.s | Hypothetical protein | ZJCDC-S82:CD630 |
| CD1509 | 1.22 | n.s | Hypothetical protein | ZJCDC-S82:CD630 |
| CD3922 | 1.11 | n.s | Membrane protein | ZJCDC-S82:CD630 |
| CD0789 | 2.00 | n.s | 50S ribosomal protein L21 | ZJCDC-S82:M120 |
| CD3380 | 2.00 | n.s | 50S ribosomal protein L36 | ZJCDC-S82:M120 |
| CD3544 | 1.72 | n.s | Fema-like peptidoglycan biosynthesis protein | ZJCDC-S82:M120 |
| CD3135 | 1.11 | n.s | Lipoprotein | ZJCDC-S82:M120 |
Notes:
Gene name of C. difficile strain ZJCDC-S82.
CD630 – C. difficile strain CD630; M120 – C. difficile strain M120; ZJCDC-S82 – C. difficile strain ZJCDC-S82.
Abbreviation: n.s, nonsignificant.
Toxin genes (tcdA and tcdB) LRTs for PAML M7 and M8 site models.
| GENE | MODEL | lnL (null) | lnL (alternative) | 2ΔlnL | SIGNIFICANCE | POSITIVELY SELECTED SITES |
|---|---|---|---|---|---|---|
| tcdA | M7 vs M8 | −12224.17 | −12218.05 | 12.24 | | 2032G (0.966), 2180N (0.968), 2219G (0.968), 2361L (0.967), 2428N (0.966), 2581N (0.964) |
| tcdB | M7 vs M8 | −11112.81 | −11111.21 | 3.20 | n.s. | none |
Notes:
log-likelihood scores.
LRT to detect positive selection.
Positively selected sites: posterior probabilities >0.95 in the BEB analyses.
Abbreviation: n.s, nonsignificant.
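
The likelihood ratio test in this table can be checked by hand: the statistic is 2ΔlnL = 2 × (lnL of the alternative model − lnL of the null model), compared against a chi-square distribution. The sketch below assumes 2 degrees of freedom, which is the conventional choice for the M7 vs M8 comparison but is not stated in the source; with 2 degrees of freedom the chi-square tail probability has the closed form exp(−x/2), so no statistics library is needed. The function name `lrt_m7_vs_m8` is hypothetical.

```python
import math


def lrt_m7_vs_m8(lnl_null, lnl_alt):
    """Likelihood ratio test for nested site models.

    Returns (2*delta_lnL, p_value). The p-value assumes a chi-square
    distribution with 2 degrees of freedom (an assumption, see lead-in),
    whose survival function is exactly exp(-x / 2).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    p_value = math.exp(-stat / 2.0) if stat > 0 else 1.0
    return stat, p_value


# Log-likelihoods copied from the two rows of the table above.
stat_row1, p_row1 = lrt_m7_vs_m8(-12224.17, -12218.05)
stat_row2, p_row2 = lrt_m7_vs_m8(-11112.81, -11111.21)

print(f"row 1: 2dlnL = {stat_row1:.2f}, p = {p_row1:.4f}")
print(f"row 2: 2dlnL = {stat_row2:.2f}, p = {p_row2:.4f}")
```

Under that assumption the first row gives a p-value of about 0.002 and the second about 0.2, consistent with only the second row being marked n.s.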
Codon usage for the C. difficile strain ZJCDC-S82 genome.
| CODON | COUNT | RSCU | CODON | COUNT | RSCU |
|---|---|---|---|---|---|
| GCU(A) | 25956 | 1.68 | CCU(P) | 11631 | 1.49 |
| GCC(A) | 3204 | 0.21 | CCC(P) | 940 | 0.12 |
| GCA(A) | 30118 | | CCA(P) | 17659 | |
| GCG(A) | 2356 | 0.15 | CCG(P) | 1066 | 0.14 |
| UGU(C) | 12217 | | CAA(Q) | 21948 | |
| UGC(C) | 2296 | 0.32 | CAG(Q) | 5597 | 0.41 |
| GAU(D) | 53559 | | CGU(R) | 2912 | 0.47 |
| GAC(D) | 11932 | 0.36 | CGC(R) | 612 | 0.1 |
| GAA(E) | 66979 | | CGA(R) | 1320 | 0.21 |
| GAG(E) | 17683 | 0.42 | CGG(R) | 289 | 0.05 |
| UUU(F) | 42831 | | AGA(R) | 28026 | |
| UUC(F) | 6766 | 0.27 | AGG(R) | 4031 | 0.65 |
| GGU(G) | 27083 | 1.5 | UCU(S) | 21829 | 1.74 |
| GGC(G) | 4279 | 0.24 | UCC(S) | 2177 | 0.17 |
| GGA(G) | 34321 | | UCA(S) | 21418 | 1.7 |
| GGG(G) | 6319 | 0.35 | UCG(S) | 1624 | 0.13 |
| CAU(H) | 13007 | | AGU(S) | 23118 | |
| CAC(H) | 2749 | 0.35 | AGC(S) | 5323 | 0.42 |
| AUU(I) | 38103 | 1 | ACU(T) | 24691 | 1.74 |
| AUC(I) | 6539 | 0.17 | ACC(T) | 2701 | 0.19 |
| AUA(I) | 69662 | | ACA(T) | 26932 | |
| AAA(K) | 79658 | | ACG(T) | 2583 | 0.18 |
| AAG(K) | 25995 | 0.49 | GUU(V) | 30086 | 1.57 |
| UUA(L) | 53307 | | GUC(V) | 3465 | 0.18 |
| UUG(L) | 13573 | 0.78 | GUA(V) | 35899 | |
| CUU(L) | 21357 | 1.22 | GUG(V) | 7162 | 0.37 |
| CUC(L) | 1446 | 0.08 | UGG(W) | 7235 | 1 |
| CUA(L) | 12061 | 0.69 | UAU(Y) | 38643 | |
| CUG(L) | 2941 | 0.17 | UAC(Y) | 8586 | 0.36 |
| AUG(M) | 30486 | 1 | UAA(*) | 2702 | |
| AAU(N) | 60832 | | UAG(*) | 1029 | 0.76 |
| AAC(N) | 12095 | 0.33 | UGA(*) | 339 | 0.25 |
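
Relative synonymous codon usage (RSCU) is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that standard definition, not the authors' pipeline; the function name `rscu` and the dictionary layout are hypothetical. It uses the alanine counts from the table above, and the rounded results for GCU, GCC, and GCG agree with the tabulated RSCU values.

```python
from collections import defaultdict


def rscu(counts, codon_to_aa):
    """Compute RSCU for every codon in `counts`.

    RSCU = count * (number of synonymous codons) / (total count of the
    synonymous family). Assumes every family has a nonzero total.
    """
    family_total = defaultdict(int)
    family_size = defaultdict(int)
    for codon, n in counts.items():
        aa = codon_to_aa[codon]
        family_total[aa] += n
        family_size[aa] += 1
    return {
        codon: n * family_size[codon_to_aa[codon]] / family_total[codon_to_aa[codon]]
        for codon, n in counts.items()
    }


# Alanine codon counts copied from the table above.
ala_counts = {"GCU": 25956, "GCC": 3204, "GCA": 30118, "GCG": 2356}
ala_map = {codon: "A" for codon in ala_counts}
ala_rscu = rscu(ala_counts, ala_map)

for codon, value in ala_rscu.items():
    print(codon, round(value, 2))
```

Values above 1 mark codons used more often than expected under equal usage; within each family the RSCU values sum to the number of synonymous codons, which gives a quick consistency check on any row.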
Figure 5. ENc of each CDS plotted against GC3s. ENc is the effective number of codons, and GC3s is the GC content at the synonymously variable third position of sense codons. The red points denote total CDSs; the red solid line represents the expected ENc.
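
An expected-ENc curve of this kind is commonly drawn from Wright's (1990) approximation, ENc = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s expressed as a fraction. The source does not say which formula produced the solid line, so the sketch below is an assumption about the usual approach; the function name `expected_enc` is hypothetical.

```python
def expected_enc(gc3s):
    """Expected effective number of codons when codon usage is shaped
    only by GC3s composition (Wright's 1990 approximation).

    `gc3s` is the GC fraction at synonymous third positions, in [0, 1].
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


# Sample the curve at a few GC3s values to trace the expected line.
curve = [(s / 10, expected_enc(s / 10)) for s in range(0, 11)]
for s, enc in curve:
    print(f"GC3s = {s:.1f}  expected ENc = {enc:.2f}")
```

CDSs that fall on or near this curve are consistent with codon usage driven mainly by mutational bias, while points well below it suggest additional influences such as selection.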