Darina Cejková, Marie Zobaníková, Lei Chen, Petra Pospíšilová, Michal Strouhal, Xiang Qin, Lenka Mikalová, Steven J Norris, Donna M Muzny, Richard A Gibbs, Lucinda L Fulton, Erica Sodergren, George M Weinstock, David Smajs.
Abstract
BACKGROUND: The yaws treponemes, Treponema pallidum ssp. pertenue (TPE) strains, are closely related to syphilis-causing strains of Treponema pallidum ssp. pallidum (TPA). Yaws and syphilis are distinguished on the basis of epidemiological characteristics, clinical symptoms, and several genetic signatures of the corresponding causative agents.
Year: 2012 PMID: 22292095 PMCID: PMC3265458 DOI: 10.1371/journal.pntd.0001471
Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN: 1935-2727
Distribution of identified differences between TPA and TPE proteins encoded by different functional gene groups.
| Functional gene group | No. of genes encoding identical proteins (%); statistical significance | No. of genes encoding proteins with 1 aa substitution (%); statistical significance | No. of genes encoding proteins with 2–5 aa substitutions (%); statistical significance | No. of genes encoding proteins with 6 and more changes and/or MSC | Total no. of genes (%) |
| General metabolism | | | | | |
| Cell processes; Cell structure | | | | | |
| DNA replication, repair, recombination | | | | | |
| Regulation; Transcription; Translation | | | | | |
| Transport | | | | | |
| Virulence; Potential virulence factor | | | | | |
| Unknown | | | | | |
| No. of genes in the category | | | | | |
Genetic differences were characterized in 983 similarly annotated genes and pseudogenes in both T. p. ssp. pertenue (TPE; Samoa D, CDC-2, and Gauthier) and T. p. ssp. pallidum strains (TPA; Nichols, DAL-1, SS14, Chicago). Only changes present in all investigated TPE strains when compared to all investigated TPA strains are shown in this table.
Statistically significant result when compared to the genes encoding components of general metabolism. The significance threshold was adjusted by Bonferroni correction for multiple comparisons to p = 0.008, and p-values lower than 0.008 were considered statistically significant. p-values lower than 0.05 are also shown.
MSC (major sequence changes) were defined as 15 or more dispersed amino acid substitutions, indels longer than 10 amino acids, truncated and full length proteins (results of frameshift and reverted frameshift mutations, nonsense and start codon changing mutations).
not statistically significant.
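The corrected threshold quoted in the footnote above can be illustrated with a short sketch. The number of comparisons is not stated in the source. The value of six used here is an assumption (six functional gene groups each compared against general metabolism), chosen because 0.05/6 rounds to the reported 0.008. The labels returned by `classify_p` are illustrative names only.

```python
# Sketch of the Bonferroni threshold described in the table footnote.
# ASSUMPTION: six comparisons (six functional groups vs. general metabolism);
# the source reports only the corrected threshold p = 0.008.
ALPHA = 0.05
N_COMPARISONS = 6  # assumed, not stated in the source


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Return the per-comparison significance threshold."""
    return alpha / n_comparisons


def classify_p(p: float, alpha: float = ALPHA, n: int = N_COMPARISONS) -> str:
    """Label a p-value the way the footnote describes (labels are illustrative)."""
    if p < bonferroni_threshold(alpha, n):
        return "significant after correction"
    if p < alpha:
        return "shown, not significant after correction"
    return "not significant"


threshold = bonferroni_threshold(ALPHA, N_COMPARISONS)
print(f"corrected threshold: {threshold:.3f}")
```

With these assumptions, a p-value of 0.02 would still be listed in the table but would not count as significant.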
Summary of the genomic features of the three T. pallidum ssp. pertenue strains.
| Genome parameter | Samoa D | CDC-2 | Gauthier |
| Genome size | 1,139,330 bp | 1,139,744 bp | 1,139,417 bp |
| G+C content | 52.80% | 52.80% | 52.80% |
| No. of predicted genes | 1125 including 54 untranslated genes | 1125 including 54 untranslated genes | 1125 including 54 untranslated genes |
| No. of fused genes | 25 (52 corresponding genes in the Nichols genome) | 24 (50 corresponding genes in the Nichols genome) | 24 (50 corresponding genes in the Nichols genome) |
| Sum of the intergenic region lengths (% of the genome length) | 52,844 bp (4.64%) | 52,963 bp (4.65%) | 53,300 bp (4.68%) |
| Average/median gene length | 980.3/831.0 bp | 980.4/831.0 bp | 979.3/831.0 bp |
| Average/median gene length of genes with unknown function | 843.4/657 bp | 843.8/657 bp | 841.4/652.5 bp |
| No. of genes encoded on plus/minus DNA strand | 600/525 | 600/525 | 600/525 |
| No. of genes coding for proteins with predicted function | 639 | 639 | 639 |
| No. of genes coding for treponemal conserved hypothetical proteins | 140 | 140 | 140 |
| No. of genes coding for conserved hypothetical proteins | 141 | 141 | 141 |
| No. of genes coding for hypothetical proteins | 145 | 145 | 145 |
| No. of annotated pseudogenes (no. of all pseudogenes) | 6 (13) | 6 (13) | 6 (13) |
| No. of tRNA loci | 45 | 45 | 45 |
| No. of rRNA loci | 6 (2 operons) | 6 (2 operons) | 6 (2 operons) |
| No. of ncRNAs | 3 | 3 | 3 |
Nichols genome: GenBank accession no. AE000520.1.
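The derived figures in the table above can be cross-checked from its raw counts. The sketch below recomputes the intergenic percentages and shows that the untranslated-gene count equals the sum of the tRNA, rRNA and ncRNA loci. That the four protein-coding categories, the untranslated genes and the six annotated pseudogenes add up to the 1125 predicted genes is an observation about the numbers, not a breakdown stated in the source.

```python
# Cross-check of the genome summary table (values copied from the table).
GENOMES = {
    # strain: (genome size in bp, summed intergenic length in bp)
    "Samoa D": (1_139_330, 52_844),
    "CDC-2": (1_139_744, 52_963),
    "Gauthier": (1_139_417, 53_300),
}


def intergenic_percent(genome_bp: int, intergenic_bp: int) -> float:
    """Intergenic length as a percentage of genome length."""
    return 100.0 * intergenic_bp / genome_bp


percents = {s: round(intergenic_percent(*v), 2) for s, v in GENOMES.items()}

# Untranslated genes: tRNA + rRNA + ncRNA loci.
untranslated = 45 + 6 + 3

# ASSUMPTION: the gene categories partition the predicted genes like this;
# the source lists the counts but does not state the sum explicitly.
protein_coding = 639 + 140 + 141 + 145
annotated_pseudogenes = 6
total_genes = protein_coding + untranslated + annotated_pseudogenes

print(percents, untranslated, total_genes)
```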
Figure 1. A schematic representation of the chromosomal TP0126 and TP0127 region.
Newly annotated genes for each TPE strain are shown in comparison with resequenced and reannotated Nichols genes. Gene names were modified according to the GenBank instructions from the previously published ones [19]. The TP0126.1, TP0126.2, TP0126.4, and TP0126.5 genes were renamed as TP0126a, TP0126b, TP0126c and TP0126d, respectively.
Gene fusions observed between TPE Samoa D genome and TPA Nichols genome.
| Annotated Nichols genes (AE000520.1) | Resequenced Nichols genes | Annotated Samoa D genes | Number of fused Nichols (AE000520.1) genes in the Samoa D genome |
| TP0006, TP0007, TP0008 | TP0006_7_8 | TPESAMD_0006 | 3 |
| TP0013, TP0014 | TP0013_14 | TPESAMD_0013 | 2 |
| TP0018, TP0019 | TP0018_19 | TPESAMD_0018 | 2 |
| TP0126 | TP0126, TP0126a | TPESAMD_0126 | 2 |
| TP0127 | TP0127 | TPESAMD_0127a, TPESAMD_0127b | 0 |
| TP0172, TP0173 | TP0172_173 | TPESAMD_0172 | 2 |
| TP0174, TP0175, TP0176 | TP0174_175_176 | TPESAMD_0174 | 3 |
| TP0284, TP0285 | TP0284_285 | TPESAMD_0284 | 2 |
| TP0286, TP0287 | TP0286_287 | TPESAMD_0286 | 2 |
| TP0288, TP0289 | TP0288_289 | TPESAMD_0288 | 2 |
| TP0299, TP0300 | TP0299_300 | TPESAMD_0300 | 2 |
| TP0314, TP0315 | TP0314, TP0315 | TPESAMD_0314 | 2 |
| TP0324, TP0325 | TP0324_325 | TPESAMD_0324 | 2 |
| TP0377, TP0378 | TP0377_378 | TPESAMD_0377 | 2 |
| TP0419, TP0420 | TP0419_420 | TPESAMD_0419 | 2 |
| TP0433, TP0434 | TP0433_434 | TPESAMD_0433 | 2 |
| TP0462, TP0463 | TP0462_463 | TPESAMD_0462 | 2 |
| TP0468, TP0469 | TP0468_469 | TPESAMD_0468 | 2 |
| TP0481, TP0482 | TP0481_482 | TPESAMD_0481 | 2 |
| TP0587, TP0588 | TP0587_588 | TPESAMD_0587 | 2 |
| TP0597, TP0598 | TP0597_598 | TPESAMD_0597 | 2 |
| TP0702, TP0703 | TP0702_703 | TPESAMD_0702 | 2 |
| TP0781, TP0782 | TP0781_782 | TPESAMD_0781 | 2 |
| TP0859, TP0860 | TP0859, TP0860 | TPESAMD_0859 | 2 |
| TP0899, TP0900 | TP0899_900 | TPESAMD_0899 | 2 |
| TP0928, TP0929 | TP0928_929 | TPESAMD_0928 | 2 |
Resequencing of selected Nichols genes (unpublished results) revealed fusions similar to those in the Samoa D genome, with the exceptions of TP0126, TP0126a; TP0314, TP0315; and TP0859, TP0860. Gene names joined by an underscore indicate fusions.
Genes are fused only in the Samoa D and SS14 genomes, not in other pertenue or pallidum tested genomes.
Genes are fused only in the Nichols genome and are separate in all six other investigated genomes (Samoa D, CDC-2, Gauthier, SS14, Chicago, DAL-1).
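As a consistency check, the last column of the fusion table can be tallied and compared with the "No. of fused genes" entry for Samoa D in the genome summary table (25 fused genes, 52 corresponding Nichols genes). The dictionary below simply transcribes that column. The key `TPESAMD_0127a/b` is a shorthand introduced here for the row listing TPESAMD_0127a and TPESAMD_0127b.

```python
# Number of Nichols (AE000520.1) genes fused into each Samoa D locus,
# transcribed from the last column of the gene fusion table.
FUSED_NICHOLS_GENES = {
    "TPESAMD_0006": 3, "TPESAMD_0013": 2, "TPESAMD_0018": 2,
    "TPESAMD_0126": 2, "TPESAMD_0127a/b": 0, "TPESAMD_0172": 2,
    "TPESAMD_0174": 3, "TPESAMD_0284": 2, "TPESAMD_0286": 2,
    "TPESAMD_0288": 2, "TPESAMD_0300": 2, "TPESAMD_0314": 2,
    "TPESAMD_0324": 2, "TPESAMD_0377": 2, "TPESAMD_0419": 2,
    "TPESAMD_0433": 2, "TPESAMD_0462": 2, "TPESAMD_0468": 2,
    "TPESAMD_0481": 2, "TPESAMD_0587": 2, "TPESAMD_0597": 2,
    "TPESAMD_0702": 2, "TPESAMD_0781": 2, "TPESAMD_0859": 2,
    "TPESAMD_0899": 2, "TPESAMD_0928": 2,
}

# A locus counts as a fusion when it absorbs at least one Nichols gene.
fused_loci = sum(1 for n in FUSED_NICHOLS_GENES.values() if n > 0)
nichols_genes = sum(FUSED_NICHOLS_GENES.values())

print(f"{fused_loci} fused Samoa D genes covering {nichols_genes} Nichols genes")
```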
TPE genes containing major sequence changes encoding proteins with predicted cell function.
| Gene | Gene name | Gene/protein function | Functional gene group | Type of change in gene/protein | Gene expression rate | Remark | Z-test of Selection result (p) |
| 0009 | | Tpr protein A | Potential virulence factor | rev. fr. mut. | 0.315 | authentic frameshift mutation in the Nichols genome, MSC in Cuniculi A ortholog | - |
| 0103 | | ATP-dependent helicase RecQ | DNA replication, repair, recombination | fr. mut., gene elongation | 0.394 | MSC in Cuniculi A ortholog | - |
| 0117 | | Tpr protein C | Potential virulence factor | MSC | 0.737 | MSC in Cuniculi A ortholog – originally detected in | - |
| 0131 | | Tpr protein D | Potential virulence factor | 9 aa S | 0.703 | MSC in Cuniculi A ortholog – originally detected in | - |
| 0136 | | treponemal conserved hypothetical outer membrane protein | Virulence | MSC | 5.199 | antigen | - |
| 0316 | | Tpr protein F | Potential virulence factor | rev. fr. mut. | 0.679 | MSC in Cuniculi A ortholog – originally detected in | - |
| 0326 | | outer membrane protein | Virulence | 5 aa S, 5 aa D | 0.628 | antigen, MSC in Cuniculi A ortholog | Positive (0.048) |
| 0433 TP0433-4 | | treponemal conserved hypothetical protein | Potential virulence factor | MSC | 1.390 1.985 | MSC in Cuniculi A ortholog; antigen; recombination detected; positive selection according to | - |
| 0488 | | methyl-accepting chemotaxis protein | Cell processes | MSC | 3.341 | MSC in Cuniculi A ortholog | Positive (0.000) |
| 0620 | | Tpr protein I | Potential virulence factor | 10 aa S | 0.654 | MSC in Cuniculi A ortholog – originally detected in | - |
| 0621 | | Tpr protein J | Potential virulence factor | MSC | 0.751 | MSC in Cuniculi A ortholog – originally detected in | - |
| 0671 | | Ethanolamine phosphotransferase | General metabolism | start codon mut. | 0.197 | pseudogene | |
| 0897 | | Tpr protein K | Virulence | MSC | 1.121 | MSC in Cuniculi A ortholog – originally detected in | - |
| 1031 | | Tpr protein L | Potential virulence factor | MSC | 0.757 | MSC in Cuniculi A ortholog, recombination detected – originally detected in | - |
T. p. ssp. pertenue (TPE) genes encoding proteins with predicted cell function containing six or more amino acid changes and/or major sequence changes between all studied T. p. ssp. pertenue and all T. p. ssp. pallidum strains are shown.
D, deletion; fr. mut., frameshift mutation; I, insertion; MSC (major sequence changes) are defined in Table 1; rev. fr. mut., reverted frameshift mutation; S, substitution.
Gene expression rate in Nichols strain grown in rabbits. The gene expression rates were taken from [58].
The gene was shown to contain frameshift mutations or MSC in the genome of Treponema paraluiscuniculi Cuniculi A [33].
Detected recombination within the gene or previously identified recombination [56].
The corresponding protein was identified as an antigen [54].
The corresponding protein was identified as a lipoprotein [55].
The selection test was performed with the Kumar model [47] in MEGA4 [48].
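The "Type of change in gene/protein" column relies on the MSC definition given with the table of protein differences (15 or more dispersed substitutions, indels longer than 10 amino acids, or truncated/elongated proteins). A minimal sketch of that rule follows. The `ProteinChange` fields and the bin labels are hypothetical names introduced for illustration, and the "dispersed" qualifier is not modelled, so this is a simplification rather than the authors' procedure.

```python
from dataclasses import dataclass


@dataclass
class ProteinChange:
    """Hypothetical summary of differences between a TPE and a TPA protein."""
    substitutions: int = 0                  # amino acid substitutions
    inserted_aa: int = 0                    # total inserted amino acids
    deleted_aa: int = 0                     # total deleted amino acids
    longest_indel_aa: int = 0               # longest single indel, in amino acids
    length_altering_mutation: bool = False  # frameshift, reverted frameshift,
                                            # nonsense or start-codon change


def is_msc(change: ProteinChange) -> bool:
    """Apply the MSC thresholds quoted in the table footnote (simplified)."""
    return (
        change.substitutions >= 15
        or change.longest_indel_aa > 10
        or change.length_altering_mutation
    )


def table_bin(change: ProteinChange) -> str:
    """Assign a protein to one of the four difference classes (labels assumed)."""
    total = change.substitutions + change.inserted_aa + change.deleted_aa
    if is_msc(change) or total >= 6:
        return "6+ changes and/or MSC"
    if total == 0:
        return "identical"
    if total == 1:
        return "1 change"
    return "2-5 changes"


# Example modelled on the "5 aa S, 5 aa D" entry for gene 0326.
example = ProteinChange(substitutions=5, deleted_aa=5, longest_indel_aa=5)
print(is_msc(example), table_bin(example))
```

Under this reading, an entry such as "9 aa S" falls into the six-or-more class without meeting the MSC thresholds.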
TPE genes containing major sequence changes encoding hypothetical proteins.
| Gene | Protein prediction | Type of gene/protein change | Gene expression rate | Remark | Z-test Selection Type (p) |
| 0129 | HP | nonsense mutation | 0.763 | pseudogene, MSC in Cuniculi A ortholog | |
| 0132 | HP | deletions, fr. mut. | 0.832 | pseudogene, MSC in Cuniculi A ortholog | - |
| 0135 | HP | fr. mut. | 1.526 | pseudogene, MSC in Cuniculi A ortholog | - |
| 0180 | HP | fr. mut. | 1.741 | pseudogene, MSC in Cuniculi A ortholog | - |
| 0133 | TCHP | 13 aa S | 2.093 | antigen | |
| 0134 | TCHOMP | 6 aa S | 3.586 | MSC in Cuniculi A ortholog | |
| 0266 | HP | MSC | 3.213 | pseudogene | - |
| 0304 | TCHP | 6 aa S, 1 aa D | 1.235 | MSC in Cuniculi A ortholog | |
| 0314 TP0314-5 | TCHP | MSC, fr. mut. resulting in gene fusion | 0.906 0.919 | MSC in Cuniculi A ortholog, antigen | - |
| 0318 | HP | fr. mut. | 0.947 | pseudogene, MSC in Cuniculi A ortholog | - |
| 0370 | HP | nonsense mutation | 0.822 | pseudogene, MSC in Cuniculi A ortholog | |
| 0462 TP0462-3 | CHP | MSC | 4.894 4.184 | MSC in Cuniculi A ortholog, possible lipoprotein, antigen | Positive (0.000) |
| 0483 | TCHP | 10 aa S | 0.446 | - | |
| 0577 | TCHMP | 3 aa S, 4 aa I | 0.531 | MSC in Cuniculi A ortholog | |
| 0619 | TCHP | MSC | 1.068 | MSC in Cuniculi A ortholog, recombination detected | - |
| 0733 | TCHP | 6 aa S | 1.735 | - | |
| 0858 | TCHP | MSC | 11.770 | antigen, MSC in Cuniculi A ortholog | |
| 0865 | TCHOMP | 11 aa S, 1 aa I | 0.371 | MSC in Cuniculi A ortholog | Positive (0.013) |
| 0968 | TCHP | 11 aa S | 3.256 | MSC in Cuniculi A ortholog | Positive (0.035) |
| 1030 | HP | fr. mut. | 1.46 | pseudogene | - |
T. p. ssp. pertenue (TPE) genes encoding hypothetical proteins containing six or more amino acid changes and/or major sequence changes between all studied T. p. ssp. pertenue and all T. p. ssp. pallidum strains are shown. CHP, conserved hypothetical protein; HP, hypothetical protein; TCHMP, treponemal conserved hypothetical membrane protein; TCHOMP, treponemal conserved hypothetical outer membrane protein; TCHP, treponemal conserved hypothetical protein.
D, deletion; fr. mut., frameshift mutation; I, insertion; MSC (major sequence changes) are defined in Table 1; S, substitution.
Gene expression rate in Nichols strain grown in rabbits. The gene expression rates were taken from [58].
The gene was shown to contain frameshift mutations or MSC in the genome of Treponema paraluiscuniculi Cuniculi A [33].
The corresponding protein was identified as an antigen [54].
The corresponding protein was identified as a lipoprotein [55].
Detected recombination.
The selection test was performed with the Kumar model [47] in MEGA4 [48].
Genetic changes specific to each individual TPE strain compared to the other two sequenced TPE strains.
| Strain | Affected gene (predicted function) | Detected strain specific change | Coordinates in AE000520 | Coordinates in the Samoa D genome |
| Samoa D | TPESAMD_0067 (unknown; CHP) | 303 bp deletion | 73404–73720 | 73402–73415 |
| | TPESAMD_0326 (unknown; OMP) | 3 bp insertion + 2 nt changes | 346396–346403 | 347762–347772 |
| | TPESAMD_0433 ( | contains 12 repetitions of a 60 bp region | 461078–461508 | 462429–463158 |
| | TPESAMD_0470 (unknown; CHP) | contains 12 repetitions of a 24 bp region | 497264–497691 | 498894–499201 |
| | TPESAMD_0967 (unknown; TCHMP) | 18 bp deletion | 1050284–1050311 | 1051983–1052004 |
| | other changes throughout the genome | 4 single nt indels, 3 nt changes in intergenic regions, 60 SNPs in coding regions | | |
| CDC-2 | TPECDC2_0136 (virulence; TCHOMP) | 33 bp insertion | 157339–157459 | 158172–158229 |
| | TPECDC2_0322 (transport) | 1 bp insertion and 1 bp deletion resulting in a different reading frame in a 32 bp region | 338786–338817 | 340151–340182 |
| | TPECDC2_0433 ( | contains 4 repetitions of a 60 bp region | 461078–461508 | 462429–463158 |
| | TPECDC2_0470 (unknown; CHP) | contains 37 repetitions of a 24 bp region | 497264–497691 | 498894–499201 |
| | other changes throughout the genome | 1 single nt intergenic indel, 3 nt changes in intergenic regions, 42 nt changes in coding regions | | |
| Gauthier | TPEGAU_0131 ( | multiple indels and substitutions | 151101–152897 | 152044–153834 |
| | TPEGAU_0259 (unknown; CHP) | 9 bp deletion | 270349–270366 | 271116–271133 |
| | TPEGAU_0433 ( | contains 10 repetitions of a 60 bp region | 461078–461508 | 462429–463158 |
| | TPEGAU_0470 (unknown; CHP) | contains 25 repetitions of a 24 bp region | 497264–497691 | 498894–499201 |
| | TPEGAU_0629 (unknown; TCHP) | 302 bp deletion | 686989–687300 | 688575–688886 |
| | TPEGAU_0858 (unknown; TCHP) or 0856a (unknown; HP) | 79 bp deletion | 935475–935571 | 937067–937163 |
| | other changes throughout the genome | 9 single nt indels, 1 nt change in intergenic regions, 58 nt changes in coding regions | | |
CHP, conserved hypothetical protein; HP, hypothetical protein; OMP, outer membrane protein; TCHMP, treponemal conserved hypothetical membrane protein; TCHOMP, treponemal conserved hypothetical outer membrane protein; TCHP, treponemal conserved hypothetical protein.
tprD sequence was not included in the analysis.
Calculated nucleotide diversity and divergence among and between TPA and TPE strains and subspecies.
| Parameter | | |
| No. of nucleotide changes in the intergenic regions | 27 (5.1 nt per 10,000 bp) | 7 (1.3 nt per 10,000 bp) |
| No. of nucleotide changes in the coding regions | 379 (35 nt per 100,000 bp) | 156 (14 nt per 100,000 bp) |
| Whole genome nucleotide diversity (π) among subspecies strains ± standard deviation | 0.00033±0.00015 | 0.00032±0.00011 |
| Average number of nucleotide differences between subspecies strains | 2111.7 | |
| Whole genome nucleotide divergence between subspecies ( | 0.00154±0.00065 | |
highly divergent tprD and tprK were excluded from the analysis.
calculated from the complete genome sequences.
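The per-length rates quoted in parentheses in the table above can be approximately reproduced from the raw counts. The denominators are not given in the source. This sketch assumes the Samoa D intergenic length (52,844 bp) and the remainder of the Samoa D genome (1,139,330 bp minus 52,844 bp) as the coding length, both taken from the genome summary table. Because the column headings did not survive extraction, the two value columns are labelled only by position.

```python
# ASSUMED denominators, borrowed from the Samoa D column of the genome summary.
GENOME_BP = 1_139_330
INTERGENIC_BP = 52_844
CODING_BP = GENOME_BP - INTERGENIC_BP  # treats everything non-intergenic as coding


def rate(changes: int, region_bp: int, per_bp: int) -> float:
    """Number of nucleotide changes per `per_bp` bases of the region."""
    return changes / region_bp * per_bp


# (first value column, second value column) as printed in the table.
intergenic_changes = (27, 7)
coding_changes = (379, 156)

intergenic_rates = [round(rate(c, INTERGENIC_BP, 10_000), 1) for c in intergenic_changes]
coding_rates = [round(rate(c, CODING_BP, 100_000)) for c in coding_changes]

print(intergenic_rates)  # changes per 10,000 bp
print(coding_rates)      # changes per 100,000 bp
```

Under these assumptions the results match the table's rounded rates (5.1 and 1.3 per 10,000 bp; 35 and 14 per 100,000 bp), which suggests, but does not prove, that similar denominators were used.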
Figure 2. Number of nucleotide differences consistently found between all investigated pertenue and pallidum genomes.
Differences are shown in 5 kb-long intervals along the treponemal chromosome. Note that for 7 intervals the plotted number of nucleotide changes is artificially truncated at 100 nt; the real values are shown next to the corresponding vertical columns.
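The binning described in the figure legend can be sketched as follows. The function names and the example positions are hypothetical; the source does not describe how positions were assigned to intervals, so 1-based coordinates and half-open 5 kb bins are assumptions made for illustration.

```python
BIN_SIZE = 5_000    # interval length from the figure legend
DISPLAY_CAP = 100   # plotted bar height limit from the figure legend


def bin_differences(positions, genome_length, bin_size=BIN_SIZE):
    """Count differing positions (1-based, assumed) in consecutive intervals."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[(pos - 1) // bin_size] += 1
    return counts


def cap_for_display(counts, cap=DISPLAY_CAP):
    """Return (plotted height, real value to annotate or None) per interval."""
    return [(min(c, cap), c if c > cap else None) for c in counts]


# Hypothetical positions on a toy 15 kb sequence.
toy_counts = bin_differences([1, 4_999, 5_000, 5_001, 12_000], genome_length=15_000)
print(toy_counts)
print(cap_for_display([3, 150]))
```

Capping only the plotted height while keeping the real count for annotation mirrors how the legend describes the seven truncated intervals.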