Lenka Mikalová, Klára Janečková, Markéta Nováková, Michal Strouhal, Darina Čejková, Kristin N Harper, David Šmajs.
Abstract
Treponema pallidum subsp. endemicum (TEN) is the causative agent of endemic syphilis (bejel). Until now, only a single TEN strain, Bosnia A, has been completely sequenced. The only other laboratory TEN strain available, Iraq B, was isolated in Iraq in 1951 by researchers from the US Centers for Disease Control and Prevention. In this study, the complete genome of the Iraq B strain was amplified as overlapping PCR products and sequenced using the pooled segment genome sequencing method and Illumina sequencing. Total average genome sequencing coverage reached 3469×, with a total genome size of 1,137,653 bp. Compared to the genome sequence of Bosnia A, a set of 37 single nucleotide differences, 4 indels, 2 differences in the number of tandem repetitions, and 18 differences in the length of homopolymeric regions were found in the Iraq B genome. Moreover, the tprF and tprG genes that were previously found deleted in the genome of the TEN Bosnia A strain (spanning 2.3 kb in length) were present in a subpopulation of TEN Iraq B and Bosnia A microbes, and their sequence was highly similar to those found in T. p. subsp. pertenue (TPE) strains, which cause the disease yaws. The genome sequence of TEN Iraq B revealed close genetic relatedness between both available bejel-causing laboratory strains (i.e., Iraq B and Bosnia A) and also genetic variability within the bejel treponemes comparable to that found within yaws- or syphilis-causing strains. In addition, genetic relatedness to TPE strains was demonstrated by the sequence of the tprF and tprG genes found in subpopulations of both TEN Iraq B and Bosnia A. The loss of the tprF and tprG genes in most TEN microbes suggests that TEN genomes have been evolving via the loss of genomic regions, a phenomenon previously found among the treponemes causing both syphilis and rabbit syphilis.
Year: 2020 PMID: 32236138 PMCID: PMC7112178 DOI: 10.1371/journal.pone.0230926
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of the genomic features of the TEN strains Iraq B and Bosnia A.
| Genome parameter | TEN Iraq B | TEN Bosnia A |
|---|---|---|
| GenBank Accession No. | CP032303.1 | CP007548.1 |
| Genome size | 1,137,653 bp | 1,137,653 bp |
| G+C content | 52.77% | 52.77% |
| No. of predicted genes | 1125 including 54 untranslated genes | 1125 including 54 untranslated genes |
| Sum of the intergenic region length (% of the genome length) | 52,289 bp (4.60%) | 52,643 bp (4.63%) |
| Average/median gene length | 978.8/831.0 bp | 979.2/831.0 bp |
| No. of genes encoded on plus/minus DNA strand | 600/525 | 600/525 |
| No. of annotated pseudogenes | 19* | 15** |
| No. of tRNA loci | 45 | 45 |
| No. of rRNA loci | 6 (2 operons) | 6 (2 operons) |
| No. of ncRNAs | 3 | 3 |
*Pseudogenes comprised those identified during comparison of the sequence of TEN Iraq B with TEN Bosnia A sequence (TP0146, TP0279, TP0461, TP0479, TP0520, TP0532, TP0812, TP0865, TP1029 and TP1031) and those identified during comparison with TPA Nichols and TPE Samoa D sequences (TP0082a, TP0129, TP0132, TP0135, TP0266, TP0318, TP0370, TP0671 and TP1030).
**Pseudogenes annotated in the sequence of TEN Bosnia A resulted either from comparison with TPE Samoa D sequence (TP0082a, TP0146, TP0316, TP0370, TP0520, TP0532, TP0812, TP1029) or with TPA Nichols sequence (TP0129, TP0132, TP0135, TP0266, TP0318, TP0671 and TP1030).
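The intergenic percentages in the summary table follow directly from the intergenic length and the genome size. A minimal sketch of that arithmetic, using only values from the table (the function and variable names are illustrative, not from the study):

```python
# Values copied from the genome summary table above.
GENOME_SIZE_BP = 1_137_653
INTERGENIC_BP = {"TEN Iraq B": 52_289, "TEN Bosnia A": 52_643}


def intergenic_percent(intergenic_bp: int, genome_bp: int) -> float:
    """Return the summed intergenic length as a percentage of genome length."""
    return 100.0 * intergenic_bp / genome_bp


for strain, bp in INTERGENIC_BP.items():
    # Rounded to two decimals, as in the table.
    print(f"{strain}: {intergenic_percent(bp, GENOME_SIZE_BP):.2f}%")
```

Rounded to two decimals, this gives 4.60% for Iraq B and 4.63% for Bosnia A, matching the table.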
Genetic differences between TEN Bosnia A and Iraq B genomes.
| TEN Iraq B (CP032303.1) coordinate | Nucleotide in TEN Iraq B | Nucleotide in TEN Bosnia A | Gene | Gene coordinates | Protein | Amino acid replacement (Iraq B/Bosnia A) |
|---|---|---|---|---|---|---|
| 18409 | G | A | TP0017 | 1 | hypothetical protein | M/M |
| 18413 | T | C | TP0017 | 5 | hypothetical protein | V/A |
| 80362 | T | C | TP0073 | 260 | HDOD domain protein | K/R |
| 135228 | G | T | TP0117 | 1451 | TprC | P/E |
| 135229 | G | C | TP0117 | 1450 | TprC | P/E |
| 135246 | C | T | TP0117 | 1433 | TprC | R/K |
| 136528 | C | T | TP0117 | 151 | TprC | E/K |
| 136542 | C | T | TP0117 | 137 | TprC | R/H |
| 188037 | G | T | TP0165 | 477 | TroC | L/L |
| 203271 | G | A | TP0186 | 368 | HemN | G/E |
| 230525 | C | A | TP0225 | 241 | hypothetical protein | P/T |
| 320541 | A | G | TP0304 | 1580 | hypothetical protein | M/T |
| 330101 | G | A | TP0313 | 1042 | TprE | V/I |
| 333269 | C | T | TP0316 | 801 | TprG | G/G |
| 372199 | T | C | IGR TP0347-8 | - | - | - |
| 450049 | G | A | TP0422 | 769 | hypothetical protein | A/T |
| 468671 | C | T | TP0442 | 1298 | RecN | A/V |
| 497702 | G | A | TP0470 | 75 | hypothetical protein | L/L |
| 512065 | G | C | TP0483 | 1187 | hypothetical protein | S/W |
| 521209 | A | G | TP0488 | 819 | Mcp2-1 | R/R |
| 521388 | T | C | TP0488 | 998 | Mcp2-1 | F/S |
| 534764 | G | A | TP0500 | 952 | penicillin-binding protein | A/T |
| 566812 | T | G | TP0524 | 5 | S16 family endopeptidase La | E/A |
| 592250 | C | T | TP0548 | 1043 | FadL-like protein | P/L |
| 643057 | G | A | TP0592 | 744 | hypothetical protein | M/I |
| 690733 | G | A | TP0632 | 929 | TprS | R/H |
| 702522 | G | C | TP0641a | 94 | hypothetical protein | L/V |
| 725892 | A | G | TP0664 | 46 | FlaA | T/A |
| 728789 | A | C | TP0667 | 546 | bifunctional phosphoribulokinase/uridine kinase | H/Q |
| 882333 | A | G | TP0814 | 69 | TrxB | A/A |
| 942845 | G | A | TP0864 | 67 | hypothetical protein | L/L |
| 993147 | T | G | TP0915 | 1030 | hypothetical protein | F/V |
| 994934 | T | C | TP0917 | 1320 | MgtE | F/F |
| 1032779 | C | T | TP0952 | 620 | putative lipase/esterase | G/E |
| 1085945 | T | C | TP0999 | 261 | FtsK | A/A |
| 1089703 | C | T | TP1001 | 552 | hypothetical protein | A/A |
| 1130225 | A | G | TP1035 | 731 | ValS | L/P |
Listed are 37 single nucleotide variants. Differences in the number of repetitive sequences and differences in homopolymeric regions are not listed. The hypervariable tprK (TP0897) gene was excluded from the analysis.
*P/E replacement is a result of two adjacent nucleotide changes.
**IGR, intergenic region.
***Protein predictions by Radolf and Kumar [30].
****A region comprising 7 T nucleotides.
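The amino acid replacement column can be tallied to see how many of the 37 variants change the encoded protein. The sketch below is only an illustrative count over the values listed in the table, not part of the published analysis; the `classify` helper is a hypothetical name.

```python
from collections import Counter

# "Iraq B/Bosnia A" amino acid column, in table order.
# "-" marks the single intergenic variant (IGR TP0347-8).
REPLACEMENTS = [
    "M/M", "V/A", "K/R", "P/E", "P/E", "R/K", "E/K", "R/H", "L/L", "G/E",
    "P/T", "M/T", "V/I", "G/G", "-", "A/T", "A/V", "L/L", "S/W", "R/R",
    "F/S", "A/T", "E/A", "P/L", "M/I", "R/H", "L/V", "T/A", "H/Q", "A/A",
    "L/L", "F/V", "F/F", "G/E", "A/A", "A/A", "L/P",
]


def classify(replacement: str) -> str:
    """Label one table entry as intergenic, synonymous, or nonsynonymous."""
    if replacement == "-":
        return "intergenic"
    iraq_b, bosnia_a = replacement.split("/")
    return "synonymous" if iraq_b == bosnia_a else "nonsynonymous"


counts = Counter(classify(r) for r in REPLACEMENTS)
print(dict(counts))
```

The tally is per nucleotide variant, so the two adjacent changes that together produce the P/E replacement in TP0117 are counted as two nonsynonymous variants in a single codon.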
Fig 1. A phylogenetic tree based on the alignment of the Treponema pallidum subsp. endemicum (TEN) Iraq B genome with additional TEN and T. pallidum subsp. pertenue (TPE) genomes.
The tree was constructed from the complete genome sequences of the TEN strains (Bosnia A, Iraq B) and TPE strains (CDC-2, Gauthier, Samoa D, Ghana-051, CDC 2575, Kampung Dalan K363, Sei Geringging K403, LMNP-1, and Fribourg-Blanc). The tprK sequences, the tRNA-Ile and tRNA-Ala regions of both rRNA operons, and the tprD, arp, and TP0470 genes were omitted from the analysis due to their recombinant or repetitive character. The genome sequence of the T. pallidum subsp. pallidum (TPA) strain SS14 was used as an outgroup. There were a total of 1,129,405 positions in the final dataset. The Maximum Likelihood tree was constructed in RAxML-NG (v0.9.0) using the TN93 substitution model [23]. Gamma-distributed substitution rates among sites and a proportion of invariable sites were applied. The robustness of the tree branches was assessed with 500 bootstrap replicates. The predicted tree was visualized with iTOL (v5.5) [25]. The scale bar corresponds to a difference of 0.0001 nucleotides per nucleotide site.
Intrastrain heterogeneity found in the TEN Iraq B genome.
| TEN Iraq B (CP032303.1) coordinates | Reference allele | Alternative allele | Amino acid replacement | Average depth coverage (x) | Percentage of alternative allele (%) | Gene | Protein |
|---|---|---|---|---|---|---|---|
| 14661 | C | T | R/C | 1841 | 19.0 | | phenylalanine--tRNA ligase, beta subunit |
| 135229 | G | T | P/E | 4380 | 44.5 | (TP0117) | TprC |
| 135228 | G | C | P/E | 4360 | 44.4 | (TP0117) | TprC |
| 135246 | C | T | R/K | 3573 | 34.7 | (TP0117) | TprC |
| 136528 | C | T | E/K | 4264 | 26.0 | (TP0117) | TprC |
| 136542 | C | T | R/H | 3961 | 24.8 | (TP0117) | TprC |
| 334466 | G | A | A/T | 6111 | 8.1 | | sugar ABC superfamily ATP binding cassette transporter |
| 384160 | G | T | A/S | 3050 | 10.0 | | sensor histidine kinase |
| 388371 | G | A | D/N | 2222 | 12.2 | | response regulator |
| 824846 | G | A | A/A | 3614 | 12.7 | (TP0762) | hypothetical protein |
| 883419 | G | T | G/W, P/H | 2479 | 9.2 | (TP0814a, TP0814b) | hypothetical proteins |
| | | | G/V | | | (TP0815) | GNAT family acetyltransferase |
| 1091347 | G | C | A/P | 1005 | 9.3 | (TP1003) | hypothetical protein |
Minor alleles with a frequency over 8% and depth coverage over 100x are shown. The tprK (TP0897) gene was excluded from the analysis.
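The reporting rule in the footnote (minor-allele frequency over 8% and depth of coverage over 100×) amounts to a simple two-condition filter. The sketch below assumes per-site depth and alternative-allele percentage are already available. The `Site` type and `is_reported` function are hypothetical names, and the last two sites are invented solely to show a rejection; they are not data from the study.

```python
from typing import NamedTuple


class Site(NamedTuple):
    coordinate: int     # TEN Iraq B (CP032303.1) coordinate
    depth: int          # average depth of coverage (x)
    alt_percent: float  # percentage of the alternative allele


# Thresholds stated in the table footnote.
MIN_ALT_PERCENT = 8.0
MIN_DEPTH = 100


def is_reported(site: Site) -> bool:
    """Keep a site only if it exceeds both the frequency and depth thresholds."""
    return site.alt_percent > MIN_ALT_PERCENT and site.depth > MIN_DEPTH


SITES = [
    Site(14661, 1841, 19.0),    # from the table
    Site(334466, 6111, 8.1),    # from the table
    Site(1091347, 1005, 9.3),   # from the table
    Site(500000, 2500, 5.0),    # HYPOTHETICAL: allele frequency too low
    Site(600000, 80, 30.0),     # HYPOTHETICAL: coverage too shallow
]

reported = [s.coordinate for s in SITES if is_reported(s)]
print(reported)
```

Only the three sites taken from the table pass both thresholds; the two hypothetical sites are dropped, one for low frequency and one for low depth.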
Fig 2. A. A schematic representation of a chromosomal region of the Treponema pallidum subsp. endemicum (TEN) Iraq B containing the TP0314-TP0318 genes. The entire genome region shown was amplified as two overlapping regions (25BA, 25BB), and each of these two loci contained one specific (Sp) and one non-specific (NSp) primer. Specific primers recognized unique sequences in the TEN Iraq B genome, while non-specific primers recognized binding sites in the paralogous TP0619-TP0621 region. The 25BA region was amplified in a one-step PCR, while the 25BC region was amplified in a second step with the template 25BB DNA (the 25BB template DNA did not have sufficient concentration for sequencing). The use of specific primers verified the amplification from the correct genome part. A 2.3-kb-long deletion present in a portion of the TEN population is shown and covers considerable parts of the tprFG loci (TP0316, TP0317). The paralogous sequence covering TP0314-TP0316 is also shown (see Fig 2B), and this sequence is identical to the region containing the tprIJ loci (TP0619-TP0621). Direct 75-bp-long repeats were found in the DNA regions flanking the deletion.
B. A schematic representation of a paralogous sequence covering the TP0314-TP0316 and TP0619-TP0621 regions among available sequences from TEN and T. pallidum subsp. pertenue (TPE) genomes. Whereas the regions covering TP0314-TP0316 and TP0619-TP0621 are identical (or nearly identical) within a single TEN or TPE genome, intragenomic sequences of these regions are different among T. pallidum subsp. pallidum (TPA) strains [28]. Whereas individual TPE strains differed in 5–8 nucleotides from the reference sequence of TPE Samoa D within TP0314-TP0316 and TP0619-TP0621, TEN strains differed from TPE Samoa D in 15 nucleotide positions. Two TPE strains (TPE Kampung Dalan, TPE CDC-2) did not have identical TP0314-TP0316 and TP0619-TP0621 loci; in each of these strains, the two loci differed at 2 nucleotide positions. Some of the nucleotide differences were shared between TEN strains and some TPE strains.