Literature DB >> 31216284

Identification of positively selected genes in human pathogenic treponemes: Syphilis-, yaws-, and bejel-causing strains differ in sets of genes showing adaptive evolution.

Denisa Maděránková1, Lenka Mikalová2, Michal Strouhal2, Šimon Vadják1, Ivana Kuklová3, Petra Pospíšilová2, Lenka Krbková4, Pavlína Koščová1, Ivo Provazník1, David Šmajs2.   

Abstract

BACKGROUND: Pathogenic treponemes related to Treponema pallidum are both human (causing syphilis, yaws, bejel) and animal pathogens (infections of primates, venereal spirochetosis in rabbits). A set of 11 treponemal genome sequences including those of five Treponema pallidum ssp. pallidum (TPA) strains (Nichols, DAL-1, Mexico A, SS14, Chicago), four T. p. ssp. pertenue (TPE) strains (CDC-2, Gauthier, Samoa D, Fribourg-Blanc), one T. p. ssp. endemicum (TEN) strain (Bosnia A) and one strain (Cuniculi A) of Treponema paraluisleporidarum ecovar Cuniculus (TPeC) were tested for the presence of positively selected genes. METHODOLOGY/PRINCIPAL
FINDINGS: A total of 1068 orthologous genes annotated in all 11 genomes were tested for the presence of positively selected genes using both site and branch-site models with CODEML (PAML package). Subsequent analyses with sequences obtained from 62 treponemal draft genomes were used for the identification of positively selected amino acid positions. Synthetic biotinylated peptides were designed to cover positively selected protein regions and these peptides were tested for reactivity with the patient's syphilis sera. Altogether, 22 positively selected genes were identified in the TP genomes and TPA sets of positively selected genes differed from TPE genes. While genetic variability among TPA strains was predominantly present in a number of genetic loci, genetic variability within TPE and TEN strains was distributed more equally along the chromosome. Several syphilitic sera were shown to react with some peptides derived from the protein sequences evolving under positive selection.
CONCLUSIONS/SIGNIFICANCE: The syphilis-, yaws-, and bejel-causing strains differed relative to sets of positively selected genes. Most of the positively selected chromosomal loci were identified among the TPA treponemes. The local accumulation of genetic variability suggests that the diversification of TPA strains took place predominantly in a limited number of genomic regions compared to the more dispersed genetic diversity differentiating TPE and TEN strains. The identification of positively selected sites in tpr genes and genes encoding outer membrane proteins suggests their role during infection of human and animal hosts. The driving force for adaptive evolution at these loci thus appears to be the host immune response as supported by observed reactivity of syphilitic sera with some peptides derived from protein sequences showing adaptive evolution.

Entities:  

Mesh:

Year:  2019        PMID: 31216284      PMCID: PMC6602244          DOI: 10.1371/journal.pntd.0007463

Source DB:  PubMed          Journal:  PLoS Negl Trop Dis        ISSN: 1935-2727


Introduction

Adaptive evolution including positive selection plays crucial roles in the evolution of bacterial human pathogens and both have been well documented on a genome-wide scale in a number of bacterial genera including Escherichia, Helicobacter, Neisseria, Listeria, Salmonella, Streptococcus, Campylobacter, and Actinobacillus [1-8]. Pathogenic treponemes are both human and animal pathogens. Human pathogens include Treponema pallidum ssp. pallidum (TPA), the causative agent of syphilis, T. p. ssp. pertenue (TPE, the causative agent of yaws), and T. p. ssp. endemicum (TEN, the causative agent of bejel) while animal pathogens include TPE causing non-human primate infections [9-12], and T. paraluisleporidarum ecovar Cuniculus (TPeC; formerly denoted as Treponema paraluiscuniculi), the causative agent of venereal spirochetosis in rabbits, and T. paraluisleporidarum ecovar Lepus, which infects hares [13-15]. Although the recent work of Edmondson et al. [16] reported successful long-term cultivation of T. pallidum in a tissue culture system, most of the data on treponemal genetics comes from the whole genome sequencing studies [17,18]. The above-listed pathogens are monomorphic, i.e., highly similar at the genetic level and all the genomes of these pathogenic treponemes characterized to date share a genetic identity of 98% or higher [17,18]. The group of human pathogens (TPA, TPE, TEN) are even more related sharing a genetic identity over 99.7% [17,19,20]. Currently, 14 complete treponemal genomes have been published including six TPA genomes, six TPE genomes, one TEN genome, and one TPeC genome [12,19-29]. In addition, 23, 25, 8, and 6 draft whole genome sequences of treponemal strains or isolates were published recently by Arora et al. [30], Pinto et al. [31], Sun et al. [32], and Marks et al. [33], respectively. The genomes of pathogenic treponemes related to T. pallidum contain no prophages or insertion sequence-elements [21,34], or plasmids [35]. Therefore, recombination is expected to be quite infrequent among these treponemes due to a lack of mobile genetic elements [21]. However, traces of both intragenomic DNA recombinations via gene conversion [36-38] and intergenomic homologous recombination after DNA horizontal gene transfer have been described [20,25]. In addition, traces of positive selection have been detected in previously published papers including TPA and TPE comparisons [19], comparisons within TPA strains [39], and detailed intrastrain analysis [40]. However, no comprehensive analysis of positively selected loci in the genomes of pathogenic treponemes has been performed to date. Despite the fact that treponemes related to Treponema pallidum are monomorphic bacteria with extremely low level of genetic diversity [18], it has been shown that human immunity does not protect against different subspecies and not even against different syphilis strains [41]. Therefore, divergent genes encoding differences in proteomes of individual treponemal strains and subspecies are likely of importance for development of syphilis vaccine. In this communication, the whole genome sequences of 11 treponemal strains were systematically analyzed for the presence of positive selection. The identified genes were further reanalyzed relative to all sequences available in 62 draft genomes published to date. The causative agents of syphilis, yaws, and bejel differed in sets of positively selected genes. Moreover, several synthetic peptides covering positively selected protein regions were found to interact with syphilitic sera.

Materials and methods

Strains used in this study

A set of 11 treponemal genomic sequences was examined in this study and included genomes of five TPA strains (Nichols, DAL-1, Mexico A, SS14, Chicago B), four TPE strains (CDC-2, Gauthier, Samoa D, Fribourg-Blanc), one TEN strain (Bosnia A), and one strain of TPeC (Cuniculi A). An overview of the complete genome sequences used is shown in Table 1. Subsequent analysis of selected genes was performed on additional 62 draft genomes (Arora et al. [30]; Pinto et al. [31]; Sun et al. [32]; GenBank genome TPA sequences [UW074B, UW189B, UW228B, UW254B, UW391B] and TEN Iraq B [CP032303.1]). The overall algorithm is shown in Fig 1.
Table 1

Treponemal genomes analyzed in this study.

TP strain*Place and year of isolationReferenceGenBank Accession number, Genome sequence reference
TPA NicholsWashington, D.C., USA; 1912[42]CP004010.2, [21,26]
TPA DAL-1Dallas, USA; 1991[43]CP003115.1, [29]
TPA SS14Atlanta, USA; 1977[44]CP004011.1, [24,26]
TPA Mexico AMexico City, Mexico; 1953[41]CP003064.1, [25]
TPA ChicagoChicago; 1951[41]CP001752, [22]
TPE CDC-2Akorabo, Ghana; 1980[45]CP002375.1, [19]
TPE GauthierBrazzaville, Congo; 1960[46]CP002376.1, [19]
TPE Samoa DApia, Samoa; 1953[41]CP002374.1, [19]
TPE Fribourg-BlancGuinea; 1966[9,10]CP003902.1, [12]
TEN Bosnia ABosnia; 1950[47]CP007548, [20]
TPeC Cuniculi Aunknown; before 1957[41]CP002103.1, [27]

*Additional genome sequences of TPE Ghana-051 and CDC 2575 became available recently [28] and TPA Sea81-4 was published as a whole genome sequence [23].

Fig 1

The algorithm used for identification of positively selected genes.

The original search for positively selected genes started with identification of gene orthologs with 3 or more nucleotide differences leading to nonsynonymous amino acid replacements. The original search was performed on a set of 11 complete treponemal genomes listed in Table 1. Subsequently, orthologous gene sequences extracted from published treponemal draft genomes were used and the Cuniculi A orthologs were removed due to frequent sequential diversity and due to lack of pathogenicity of TPeC to humans. Orthologs from draft genomes were used when available and positively selected genes were analyzed within treponemal subspecies using branch-site PAML model analysis.

The algorithm used for identification of positively selected genes.

The original search for positively selected genes started with identification of gene orthologs with 3 or more nucleotide differences leading to nonsynonymous amino acid replacements. The original search was performed on a set of 11 complete treponemal genomes listed in Table 1. Subsequently, orthologous gene sequences extracted from published treponemal draft genomes were used and the Cuniculi A orthologs were removed due to frequent sequential diversity and due to lack of pathogenicity of TPeC to humans. Orthologs from draft genomes were used when available and positively selected genes were analyzed within treponemal subspecies using branch-site PAML model analysis. *Additional genome sequences of TPE Ghana-051 and CDC 2575 became available recently [28] and TPA Sea81-4 was published as a whole genome sequence [23].

Identification of genes under adaptive evolution

The TPE Samoa D was used as a reference genome, and all 1068 orthologous genes were extracted from the other 10 complete genomes using given annotation coordinates. The orthologous sequences from the complete genomes were aligned at the codon level using Matlab R2013a software and the Bioinformatics Toolbox. Only genes where at least three nonsynonymous mutations at different sites occurred were further analyzed with respect to the presence of positive selection. A BLAST search was used to determine orthologous sequences in the draft genomes. These sequences were aligned in Matlab at the nucleotide level because large number of ambiguous sites precluded proper automatic ORF localization. Then the sequences from the draft genomes were scanned for nucleotide differences. After filtering sites with unknown nucleotides, insertions, and deletions, only orthologs with more than five nucleotide differences at different sites were analyzed further. These orthologs were aligned with the corresponding genes from the complete genomes to determine the ORF and identify the number of nonsynonymous mutations. At the same time, the TPeC orthologs were excluded in this step due to frequent sequential diversity and due to the lack of pathogenicity of TPeC to humans. The removal of the TPeC orthologs did not change the number of detected positively selected genes. Compared to an analysis of whole genomes, no new locus with more than three nonsynonymous mutations at different sites was identified during the analysis of draft genomes. For each analyzed gene from the complete genomes, a maximum likelihood phylogenetic tree with 50 bootstrap replicates was constructed using MEGA 6 [48]. Different evolution models (Kimura 2-parameter [49], Tamura 3-parameter [50], and Tamura-Nei [51]) were applied to each gene. The trees of each gene were compared by calculating Robinson-Foulds distances [52] using R software (packages phytools and phangorn) [53]. The comparison showed that the choice of the evolution model did not significantly change the topology of the tree; the Tamura-Nei evolution model was chosen for the construction of all phylogenetic trees. A calculation of mutational rate ratio ω between two gene sequences was the basis for the positive selection analysis. The ω was calculated as a ratio of nonsynonymous to synonymous mutational rates. The ratio indicates negative purifying selection (0 < ω < 1), neutral evolution (ω = 1), and positive selection (ω > 1) [54]. A set of selected genes from complete genomes was tested relative to positive selection using the maximum likelihood method using the CODEML of the PAML software package [55]. PAML version 4 [56] and its user interface PAMLX [57] were used in our study. For each analyzed gene, its maximum likelihood phylogenetic tree was used as an input tree. The CODEML offers several different codon evolutionary models, and the statistical likelihood ratio test (LRT) was used to compare the codon evolutionary model to the null model. The Bayes empirical Bayes method (BEB) [58] was then used to evaluate the posterior probability of sites considered to have been positively selected. The CODEML models could produce different results (i.e., a list of sites under positive selection) since they calculate different parameter estimates. Site models allow ω to vary in each site (codon) within the gene. Statistical testing was required for sites with ω > 1. Two pairs of models were predominantly used since their LRTs have low false-positive rates. M1a (nearly neutral evolution) was compared to M2a (positive selection) [58,59] and M7 (beta) was compared to M8 (beta & ω) [60]. Our preliminary testing found that the two model pairs gave the same or very similar results. Therefore we chose to use the M7-M8 model pair. The M7 model is a null model that allows 10 classes of sites with a ω beta-distribution within the interval 0 ≤ ω ≤ 1. Sites with ω > 1 are not allowed. The alternative M8 model adds an eleventh class of sites with ω > 1. Each site was tested to determine the class to which it belongs. The LRT compares twice the log-likelihood difference 2Δl = 2(l1-l0) between the M7 model (log likelihood value l0) and the M8 model (log likelihood value l1) to the χ2 distribution [61]. If the twice log-likelihood difference is above a critical χ2 value, then the null model is rejected, and the positive selection is statistically significant. A considerable disadvantage of the site models is that ω was calculated as an average over all codons of the site. Therefore, the site models are not suitable for the data where ω also varies between lineages. In contrast, the branch-site models search for positive selection in sites and pre-specifies lineages where different rates of ω may occur [62]. Sequences of lineages are a priori divided into a group of foreground lineages where positive selection may occur and group of background lineages where only purifying selection or neutral evolution occurs. We used branch-site model A, which allows four classes of sites and different setups of foreground lineages to be tested depending on the gene phylogeny. In branch-site model A, all lineages under purifying selection with a low value of ω0 belong to site class 0. Weak purifying selection and neutral evolution with ω1 near to value 1 are allowed in site class 1. In site class 2a, a proportion of class 0 sites in foreground lineages is under positive selection with ω2 > 1. Similarly, site class 2b is a proportion of class 1 sites under positive selection with ω2 > 1. The null model for LRT has ω2 = 1. Critical values of LRT (2Δl) are 2.71 at 5% and 5.41 at 1% [63]. The posterior probabilities of suggested sites under positive selection were calculated using the BEB method. The average pairwise p-distances (APD) and average number of mutations (transitions and transversions), calculated using MEGA-X [64], were used to evaluate genetic diversity. A pairwise deletion of sites with gaps/missing data was used. The Fisher exact statistical test was used to assess the significance of the changes between average numbers of mutations.

Synthetic peptides and chemiluminescent detection of serological reactions

Synthetic biotinylated peptides, covering protein regions containing positively selected residues, were designed. Peptide synthesis was performed by JPT Peptide Technologies (Berlin, Germany) on a 50–200 nmol scale. The lyophilized peptides were resuspended in TBS buffer (25 mM Tris, 150 mM NaCl, pH = 7.2) at 1 mM concentrations and were stored at −20°C. Prior to further use, synthetic peptides were diluted 1000x in TBS buffer. Streptavidin-coated 96 well plates (Pierce Streptavidin Coated High Binding Capacity White 96-Well Plates; Thermo Scientific, Rockford, USA) were washed three times with 200 μl of washing buffer (TBS buffer containing Tween 20 (0.05%) and Bovine Serum Albumin, BSA (0.1%) (Sigma-Aldrich, Prague, Czech Republic); then 100 μl of diluted peptide was added to each well and incubated for 30 min at room temperature as recommended by the manufacturer (Thermo Scientific) with mild shaking. Subsequently, each well was washed three times with washing buffer (200 μl in each step); then 100 μl of blocking buffer SuperBlock Blocking Buffer in TBS (Thermo Scientific) were added and incubated for 30 min at room temperature with mild shaking. Each well was washed three times with washing buffer (200 μl in each step) and 100 μl of diluted sera (1:500 in washing buffer) were added and incubated for 30 min at room temperature with mild shaking. Each well was washed three times with washing buffer (200 μl in each step) and 100 μl of diluted secondary antibody conjugated with horseradish peroxidase were subsequently added (1:2000 in washing buffer; Goat Anti-Human IgG/IgA/IgM Horseradish Peroxidase Conjugate; Life Technologies, Carlsbad, USA) and incubated for 30 min at room temperature with mild shaking. Each well was then washed three times with washing buffer (200 μl in each step); then 100 μl of chemiluminescent detection solution (Super Signal ELISA Pico Chemiluminescent Substrate; Thermo Scientific) was added. Luminescence was measured on a TriStar2 LB 942 luminometer with a Modular Multimode Microplate Reader (Berthold Technologies, Bad Wildbad, Germany). Each experiment was performed at least three times. A signal was considered positive when it was higher than the average of the three lowest values for each serum plus five standard deviations of the average value.

Ethics statement

The human sera were collected from adult patients diagnosed with syphilis at the Department of Dermatology, 1st Faculty of Medicine, Charles University, Prague, Czech Republic. Sera from child patients diagnosed with Lyme disease were obtained from the Department of Children's Infectious Diseases, Faculty of Medicine and University Hospital, Masaryk University, Brno, Czech Republic. All clinical samples were obtained after patients or parents of involved children signed an informed consent. The design of the study was approved by the ethics committee of the Faculty of Medicine, Masaryk University. All human sera were collected under established guidelines.

Results

Identification of positively selected genes

A comparison of 47 orthologous gene sequences from the complete genomes where at least three nonsynonymous mutations at different sites occurred was used to identify positively selected genes using the site and branch-site models of the CODEML in PAML package [55]. The completely sequenced genomes are listed in Table 1 and include 11 genomes. In addition, 25 draft TPA genomes [31], 23 draft TPA and TPE genomes [30], 8 TPA genomes [32], 5 TPA genomes from GenBank (UW074B, UW189B, UW228B, UW254B, UW391B), and one TEN (Iraq B) were also analyzed. The overall algorithm is shown in Fig 1. In all cases of complete treponemal genomes, the genome structure was identical or very similar allowing straightforward identification of gene orthologs. However, in many cases, draft genome sequences were either incomplete or contained many ambiguous bases precluding their use in analyses. This resulted in a variable number of sequences used for identification of positively selected sites within individual loci. Altogether, 22 positively selected genes were identified in the TP genomes using site model analysis in PAML (Table 2). The number of positively selected amino acid sites varied from 1 to 65, with a median value of 8.5. A list of positively selected protein sites identified using PAML software (site and branch-site models) as well as PAML-identified positively selected protein sites within treponemal subspecies are shown in S1 Table.
Table 2

A set of 22 genes evolving under adaptive evolution that was identified using site and branch-site model analysis in the PAML program.

GeneGene nameProteinNo. of positively selected protein sites identified by PAML site or branch-site model (no. of analyzed sequences)Previously published evidence of recombinationReferencesGene average pairwise p-distances
TP0117tprCTpr protein C22 (41)+[38]0.009171
TP0126bhypothetical protein2 (69)-0.005899
TP0131tprDTpr protein D65 (41)+[38]0.033553
TP0133outer membrane proteina3 (66)+[20]0.004718
TP0136fibronectin binding protein5 (54)+[65]0.016599
TP0314subtilisin-like proteina7 (49)-0.026378
TP0316tprFTpr protein F5 (33)-0.003250
TP0317tprGTpr protein G8 (39)+[38]0.003241
TP0326bamABamA7 (64)+[20,25,66]0.002131
TP0462lipoprotein, subtilisin-like proteina44 (60)-0.009186
TP0488mcp2methyl-accepting chemotaxis protein50 (64)+[20,25,67]0.002691
TP0515outer membrane protein15 (66)-0.001381
TP0548FadL-like proteinb24 (52)+[67]0.010881
TP0619Fe, Mn superoxide dismutasea7 (40)-0.029166
TP0620tprITpr protein I14 (38)+[38]0.006651
TP0621tprJTpr protein J52 (40)+[38]0.023909
TP0733OprG/OmpW-likeion-channelb1 (62)-0.005125
TP0856FadL-like proteinb16 (62)+[68]0.003663
TP0858FadL-like proteinb2 (61)+[33,68]0.005894
TP0859FadL-like proteinb7 (33)-0.004191
TP0865FadL-like proteinb13 (64)+[30]0.004340
TP1031tprLTpr protein L9 (58)+[20]0.007283

aprotein predictions by Naqvi et al. [69]

bprotein predictions by Radolf and Kumar [70]

aprotein predictions by Naqvi et al. [69] bprotein predictions by Radolf and Kumar [70] Average pairwise p-distances (APD) were calculated for each of the 22 genes from 10 complete genomes (Cuniculi A genome was removed from analysis) and from the draft genomes. The APD value for each gene was compared with APDw10 = 0.000525 of the 10 complete genomes without 54 variable loci (listed in S2 Table) which can be considered as a background level of polymorphism. All 22 genes evolving under adaptive evolution had elevated nucleotide substitution density. Out of these 22 genes, 14 genes were previously reported as recombinant (Table 2). These genes are listed in Table 3. Functionally, these genes comprised the tpr genes (tprC, D, G, I, J, L), outer membrane proteins (TP0133, TP0136, TP0548, TP0856, TP0858, TP0865), and genes encoding the outer membrane biogenesis protein (TP0326 [BamA]) and methyl-accepting chemotaxis protein (TP0488 [Mcp-2]). Putative recombination loci were most frequently identified in TEN strain (n = 9) while in other treponemes there were fewer predicted recombinant loci (TPA, n = 7; TPeC, n = 1; TPE, n = 1). When recombinant sequences were removed from analyses, positive selection among 14 these loci (Table 3) was found mostly within TPA strains or isolates (n = 9), within TPE (n = 4), between TEN and TPA/TPE sequences (n = 2), between TPA and TPE strains (n = 1) and between TPA and TPE/TEN sequences (n = 1).
Table 3

A set of 14 positively selected genes with previously detected recombination events.

GeneGene nameProteinPutative recombination inEvidence of positive selection among non-recombinant sequences
TP0117tprCTpr protein CTPA, TENwithin TPAbetween TPA and TPE
TP0131tprDTpr protein Dtwo alternative tprD and tprD2 allelesano
TP0133outer membrane proteinbTENbetween TEN/TPE and TPA
TP0136fibronectin binding proteinTPAwithin TPA
TP0317tprGTpr protein GTPAwithin TPA
TP0326bamABamATPA, TENwithin TPA
TP0488mcp2methyl-accepting chemotaxis proteinTPA, TENwithin TPA
TP0548FadL-like proteincTENwithin TPAwithin TPE
TP0620tprITpr protein ITPEwithin TPE
TP0621tprJTpr protein JTPA, TPeCwithin TPA
TP0856FadL-like proteincTENbetween TEN and TPA/TPE
TP0858FadL-like proteincTENwithin TPE
TP0865FadL-like proteincTPA, TENwithin TPAwithin TPEbetween TEN and TPA/TPE
TP1031tprLTpr protein LTENwithin TPA

atprD and tprD2 alleles existed in both TPA and TPE strains [18,71]

bprotein predictions by Naqvi et al. [69]

cprotein predictions by Radolf and Kumar [70]

atprD and tprD2 alleles existed in both TPA and TPE strains [18,71] bprotein predictions by Naqvi et al. [69] cprotein predictions by Radolf and Kumar [70]

Positively selected genes with no recombination events described so far

A list of the 8 positively selected genes is shown in Table 4. These genes include tprF gene, genes encoding outer membrane proteins (TP0515, TP0733, TP0859), subtilisin-like proteins (TP0314, TP0462), enzymes (TP0619), and hypothetical protein (TP0126b). Evidence of positive selection from analyses performed within the strains belonging to different treponemal subspecies and analyses from the PAML branch-site model revealed that positive selection was found mostly between TPA and TPE strains (n = 6), and within TPA strains or isolates (n = 2). In one case (TP0859), positive selection was found both between TPA and TPE strains (n = 1), and between TEN and TPA/TPE (n = 1).
Table 4

Positively selected genes revealed by the PAML program with no recombination events described so far.

GeneGene nameProteinPositively selected branch
TP0126bhypothetical proteinbetween TPA and TPE
TP0314subtilisin-like proteinabetween TPA and TPE
TP0316tprFTpr protein Fbetween TPA and TPE
TP0462lipoprotein, subtilisin-like proteinawithin TPA
TP0515outer membrane proteinwithin TPA
TP0619Fe, Mn superoxide dismutaseabetween TPA and TPE
TP0733OprG/OmpW-like ion channelbbetween TPA and TPE
TP0859FadL-like proteinbbetween TPA and TPEbetween TEN and TPA/TPE

aprotein predictions by Naqvi et al. [69]

bprotein predictions by Radolf and Kumar [70]

aprotein predictions by Naqvi et al. [69] bprotein predictions by Radolf and Kumar [70] Identified positively selected genes in various treponemal species and subspecies are shown in Table 5 and Fig 2. While Table 5 lists all identified recombinant and positively selected genes in TPA, TPE, and TEN groups of strains or isolates, Fig 2 shows only recombinant or positively selected genes that were identified within a particular subspecies.
Table 5

Positively selected genes and the corresponding proteins in different treponemal species and subspecies.

Proteins previously reported as recombinant out of the positively selected proteins are also shown.

Treponema pallidum subspeciesRecombinant proteinsPositively selected proteins
TPATprC, G, J, BamA, Mcp-2, TP0136TprC, F, G, J, L, BamA, Mcp-2TP0126b, TP0133, TP0136, TP0314, TP0462, TP0515, TP0548, TP0619, TP0733, TP0859, TP0865
TPETprITprE, F, ITP0126b, TP0133, TP0314, TP0548, TP0619, TP0733, TP0858, TP0859, TP0865
TENBamA, Mcp-2, TprLTP0548, TP0856, TP0858TprLTP0856, TP0858, TP0865
Fig 2

Positively selected genes as well as positively selected genes with previously identified recombination event that were identified within particular subspecies.

Genes identified as recombinant in a particular treponemal subspecies are shown in bold. Positively selected genes with no evidence of recombination are shown in regular version. Positively selected genes identified between subspecies of treponemes, but not within any of them, are not shown. Note that positively selected genes occur mostly within the TPA and the recombinant genes are within the TEN genomes. The TP0548 and TP0865 genes were found to be positively selected within TPA and also within TPE subspecies.

Positively selected genes as well as positively selected genes with previously identified recombination event that were identified within particular subspecies.

Genes identified as recombinant in a particular treponemal subspecies are shown in bold. Positively selected genes with no evidence of recombination are shown in regular version. Positively selected genes identified between subspecies of treponemes, but not within any of them, are not shown. Note that positively selected genes occur mostly within the TPA and the recombinant genes are within the TEN genomes. The TP0548 and TP0865 genes were found to be positively selected within TPA and also within TPE subspecies.

Positively selected genes and the corresponding proteins in different treponemal species and subspecies.

Proteins previously reported as recombinant out of the positively selected proteins are also shown.

Distribution of genetic diversity among TPA, TPE, and TEN strains

To test whether genetic diversity in treponemal genomes was distributed evenly along the chromosome, an average pairwise p-distance (APD) and average number of mutations (ANM, transitions and transversions) were calculated (Table 6) for concatenation of 54 genes (S2 Table) representing a total length of 86.8 kbp (7.6% of the genome length). These 54 genes included the tpr genes, genes with traces of possible recombination events (according to Gray et al. [38], and Čejková et al. [19]), genes showing inter-strain variability between TPE and TPA strains and their paralogs (according to Čejková et al. [19]), genes showing intra-strain variability [40], and previously identified positively selected genes (according to Čejková et al. [19]) (S2 Table). The TPA strains contained greater genetic diversity within these genes compared to TPE strains (APD(TPA) = 420.0*10−5 versus APD(TPE) = 384.2*10−5). Note that the average pairwise p-distances within TPA strains (compared to TPE/TEN strains) are lower when complete genomes were analyzed but higher within selected 54 loci (Table 6). In addition, genetic distance within TPA strains (compared to TPE/TEN strains) is markedly lower in complete genomes without selected 54 genes compared to whole genome analyses (p = 0.0008).
Table 6

Average pairwise p-distances (APD) and average number of mutations (ANM, transitions + transversions) within TPA, within TPE/TEN, and between TPA and TPE/TEN, for whole complete genomes, for selected 54 genomic loci, and for complete genomes without selected 54 loci.

Whole genomesAPD (ANM)54 selected loci (S2 Table)APD (ANM)Genomes without 54 lociAPD (ANM)
TP subspeciesWithinGroupBetweengroupsWithingroupBetweengroupsWithingroupBetweenGroups
TPA0.000415a(472.7)TPA-TPE/TEN0.001908(2171.4)0.004200(361.8)TPA-TPE/TEN0.015227(1297.0)0.000105a(110.9)TPA-TPE/TEN0.000831(874.4)
TPE/TEN0.000455a(517.4) 0.003842(325.7) 0.000182a(191.7) 

aStatistically significant difference for genetic distance within TPA strains compared to TPE/TEN strains when complete genomes and genomes without selected 54 genes were compared (The Fisher exact test, p = 0.0008).

aStatistically significant difference for genetic distance within TPA strains compared to TPE/TEN strains when complete genomes and genomes without selected 54 genes were compared (The Fisher exact test, p = 0.0008).

Serological reactivity of human syphilis and Lyme disease sera with synthetic biotinylated peptides derived from positively selected proteins

Synthetic biotinylated peptides were designed to cover protein regions where positively selected amino acid residues were detected. These peptides were tested for reactivity with the patient's syphilis sera (Table 7). As a control, serum from a patient with Lyme disease was used. A positive result was obtained for one to seven peptides (out of 8 tested) depending on the serum used. At the same time, serum obtained from Lyme disease patient failed to recognize peptides derived from treponemal proteins but did react with peptides derived from borrelial protein ErpA.
Table 7

Serological reactivity of a patient's syphilis and Lyme disease sera recognizing synthetic peptides corresponding to protein regions containing positively selected amino acid residues.

Peptides were designed to cover protein regions containing several positively selected amino acid positions.

PeptideDerived from geneProtein functionSequenceSyphilis seraLyme disease serum
350a356b405cB0403201d
TPA51STP0117TprCYVFYRNNGGYELNRVVPSGI+ e
TPE83TP0136fibronectin binding proteinGNSANGGGGGGGCGS+
TPA04TP0314subtilisin-like proteinLQPSSSSYSAGNWHR+++
TPA58TP0316TprFHQSNADADCRLPATG++
TPA61-STP0462lipoprotein, subtilisin-like proteinTPSTVLDKTNGAIR++
TPA64-STP0515outer membrane proteinYRLHSEPPSSGSRQ+
TPA17TP0619Fe, Mn superoxide dismutaseLGQGLLQPSSSSYSA
TPA21TP0733OprG/OmpW-like ion channelGDIASSPDKCRAVGL++
BB-ErpA-BvalerpAErpAKIKNKDTNSSWIDL+

aHuman serum comes from a 33-year-old patient that was syphilis positive for VLDR (1:4), TPHA and western blot IgG test.

bHuman serum comes from a 32-year-old patient that was syphilis positive for VLDR (1:8), TPHA and western blot IgG test.

cHuman serum comes from a 21-year-old patient that was syphilis positive for VLDR (1:128), TPHA and western blot IgG and IgM test.

dHuman serum comes from a 12-year-old patient with Lyme disease positive for ELISA IgG and western blot IgM and IgG tests.

e+, signal above threshold; for each peptide, an average of the three lowest values (out of 9) plus 5 standard deviations was used as a threshold.

Serological reactivity of a patient's syphilis and Lyme disease sera recognizing synthetic peptides corresponding to protein regions containing positively selected amino acid residues.

Peptides were designed to cover protein regions containing several positively selected amino acid positions. aHuman serum comes from a 33-year-old patient that was syphilis positive for VLDR (1:4), TPHA and western blot IgG test. bHuman serum comes from a 32-year-old patient that was syphilis positive for VLDR (1:8), TPHA and western blot IgG test. cHuman serum comes from a 21-year-old patient that was syphilis positive for VLDR (1:128), TPHA and western blot IgG and IgM test. dHuman serum comes from a 12-year-old patient with Lyme disease positive for ELISA IgG and western blot IgM and IgG tests. e+, signal above threshold; for each peptide, an average of the three lowest values (out of 9) plus 5 standard deviations was used as a threshold.

Discussion

In this study, 22 genes showing traces of positive selection were identified among the TPA, TPE, and TEN genomes. Within this group of genes, recombination was previously reported in 14 genes. Genes with previously detected recombination events were often found to contain positively selected amino acid residues both among the recombinant and the non-recombinant sequences, which indicates that both recombination and positive selection are different mechanisms of treponemal adaptive molecular evolution. Adaptive evolution is common to many bacterial pathogens and can usually be found in genes important to the interaction between the host and the pathogen, i.e., where new protein variants are of selective advantage for the survival of the pathogenic strain. The immune pressure from the host favors, in microbial genes encoding proteins exposed on the surface of the pathogen, emerging genetic variants, which due to immune evasion, get positively selected. In Escherichia coli, positive selection is limited to a few dozen genes [3] while in several other genomes, including Streptococcus [1] and Campylobacter [2], traces of positive selection have been found in more than half of the core genome. The extent to which adaptive evolution of different bacterial pathogens differs depends on several bacterial features including bacterial mutation rate, frequency of genetic recombination and horizontal gene transfer, and genome size. Genetic recombination occurs more frequently in Neisseria [8] and Helicobacter [5] compared to several bacterial pathogens such as Escherichia [3], Salmonella [4], and Listeria [6]. Moreover, compared to E. coli, Helicobacter pylori has about a 100-times higher mutation rate due to the lack of a highly efficient DNA repair system [72]. Treponemes related to T. pallidum represent bacterial pathogens with small genomes, with an extreme paucity of outer membrane proteins [73], high genetic similarity, and a relatively low mutation rate [28]. Moreover, there are no known mechanisms of horizontal gene transfer in syphilis, bejel, and yaws treponemes [18]. These features of pathogenic treponemes are consistent with a relatively small number of positively selected genetic loci, which consists of just 22 genes (2.1% of all protein-encoding genes). Moreover, this situation also partly reflects the fact that the number of determined treponemal genomes is quite low due to the difficulties in long-term cultivation of treponemes [16]; the sequenced genomes currently available do not reveal the entire genetic variability present among human pathogenic treponemes. A recently developed MLST typing of TPA treponemes [74,75] revealed a number of genetic variants of the TP0705 gene and the vast majority of these variants resulted in amino acid replacements, which suggests positive evolution at this locus. This locus was not identified in this study because of the limited variability present in the currently available genome sequences. It is therefore expected that the list of positively selected/recombinant treponemal loci will grow larger as the number of additional genomes accumulate. Among the genes identified in this study, a substantial number of genes were shown to have evidence of recombination event. Several recent studies revealed that genetic recombination in pathogenic treponemes is not only limited to intra-genomic homologous recombination and gene conversion events involving rDNA loci, tpr genes and their paralogs, and the TP0856 and TP0858 genes [36,38,68]. In addition to intra-genome recombinations, two genes, TP0326 and TP0488, in the TPA Mexico A genome, were found as a result of recombination with exogenous DNA, likely as a result of DNA uptake and chromosome incorporation during coinfection with a different treponemal subspecies [25]. Similar recombinations were detected in a TPA isolate from South Africa [66]. In the work of Grange et al. [76], TEN strain 11qj within the TP0548 locus and a nucleotide sequence almost identical to TPE strains [67,77,78] indicating that the TP0548 locus in other TEN strains is also a result of an interstrain recombination event. In the genome of TPA Sea84-1, the TP0621 locus revealed sequences identical to TPE [23]. Moreover, whole-genome sequencing of TEN Bosnia A revealed several genomic loci similar to TPA strains [20]. All these findings suggest that genetic recombinations of exogenous DNA into the chromosomal loci of treponemes are rare but detectable events. In addition, these findings suggest that the corresponding recombination could be of selective advantage during host infection given the fact that the recombination of foreign DNA without available horizontal gene transfer mechanisms is quite infrequent. In the work of Arora et al. [30], the authors find the genes with predicted putative recombinant regions (e.g., TP0136, TP0462, TP0548, TP0733, TP0894-898) using a phylogenetic incongruence method, ClonalFrameML, and Gubbins, overlapping with genes identified in our study. However, several predicted recombinant genes comprising TP0179, TP0313, TP0315, TP0967, TP0968 [30], were not identified in this study reflecting the fact that not all recombination events result in detectable positive selection signal. Out of the 22 genes showing adaptive evolution identified in this study, nine genes (40.9%) were identified in the set of 71 genes showing intra-strain heterogeneity (reviewed in Šmajs et al. [18]). Since most of the observed intra-strain heterogeneity resulted in non-synonymous amino acid changes, identified intra-strain heterogeneity should be considered as ongoing adaptive evolution and, as such, it is not surprising that both sets of positively selected/recombinant genes and genes showing intra-strain heterogeneity overlapped considerably. Functionally, the genes showing adaptive evolution identified in this study were the tpr genes (tprC, D, F, G, I, J, L), outer membrane proteins (TP0133, TP0136, TP0515, TP0548, TP0733, TP0859, TP0865) and genes encoding outer membrane biogenesis protein (BamA), lipoproteins (TP0462, TP0856, TP0858), methyl-accepting chemotaxis protein (TP0488), and three other proteins (TP0126a, TP0314, TP0619). The driving force for adaptive evolution at these loci thus appears to be the host immune response. The observed reactivity of syphilitic sera with some peptides derived from protein sequences evolving under positive evolution supports this prediction. This study has several limitations. First, the number of analyzed genomes was quite limited, especially regarding the TEN genomes. In addition, in recently published draft genomes of TPA, the candidate genes are often only partially sequenced. With a much larger set of treponemal sequences, one can expect a higher number of positively selected genes. The second major limitation was the disproportionality in the number of analyzed genes per species/subspecies, which reflects genome availability. Another limitation is related to the fact that the analyzed complete genomes were obtained from treponemal strains propagated in rabbits and could therefore reflect adaptation of treponemes to this host. However, the analysis of draft genome sequences in this study obtained directly from clinical material suggests that the observed traces of positive selections are present also during infection of humans. Moreover, the identified positively selected positions may represent recent mutations that were not yet removed by negative selection. In this study, a detailed analysis of traces of positive selection in 3 T. pallidum subspecies including ssp. pallidum (TPA), ssp. pertenue (TPE), and ssp. endemicum (TEN) enabled us to classify most of the identified positively selected genes to a particular subspecies when analyses were performed separately within strains and isolates of the same subspecies or when a PAML branch-site model was used to identify lineages with positively selected loci. The majority of positively selected genes were identified within the TPA and TPE genomes, likely as a result of the highest number of available sequences for these subspecies. However, TPA sets of positively selected genes differed from TPE genes. Among TPA, members of the paralogous tpr family (tprCGJ) and the TP0136 paralogous gene family (TP0136, TP0462) prevailed, while among TPE, a paralogous gene family containing TP0856, TP0858, TP0859, TP0865, showed adaptive evolution. Interestingly, the genes belonging to the latter family (TP0548, TP0856, TP0858, TP0859) were found to be recombinant among TEN genomes. These findings suggest that genomic loci showing signs of adaptive evolution could differ between TPA and TPE/TEN strains/isolates. This finding supports the observed and consistent genetic differences between treponemal subspecies TPA, TPE, and TEN, and shows that the ways TPA and TPE strains interact with a host during infection is different. Although some authors suggest that the subspecies classification is a case of opportunity and not the consequence of genetic and biological differences [79,80], our findings support the latter explanation.

A list of positively selected protein positions identified by PAML software using site and branch-site models and PAML analysis within treponemal subspecies.

(XLSX) Click here for additional data file.

A set of 54 genes representing a total length of 86.8 kbp (7.6% of the genome length) that was removed from the whole genome sequences.

(XLSX) Click here for additional data file.
  74 in total

1.  Treponema pallidum subsp. pallidum has a single, circular chromosome with a size of approximately 900 kilobase pairs.

Authors:  E M Walker; J K Arnett; J D Heath; S J Norris
Journal:  Infect Immun       Date:  1991-07       Impact factor: 3.441

2.  MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms.

Authors:  Sudhir Kumar; Glen Stecher; Michael Li; Christina Knyaz; Koichiro Tamura
Journal:  Mol Biol Evol       Date:  2018-06-01       Impact factor: 16.240

3.  A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences.

Authors:  M Kimura
Journal:  J Mol Evol       Date:  1980-12       Impact factor: 2.395

4.  When is syphilis not syphilis? Or is it?

Authors:  Sheila A Lukehart; Lorenzo Giacani
Journal:  Sex Transm Dis       Date:  2014-09       Impact factor: 2.830

5.  Treponema infection associated with genital ulceration in wild baboons.

Authors:  S Knauf; E K Batamuzi; T Mlengeya; M Kilewo; I A V Lejora; M Nordhoff; B Ehlers; K N Harper; R Fyumagwa; R Hoare; K Failing; A Wehrend; F J Kaup; F H Leendertz; K Mätz-Rensing
Journal:  Vet Pathol       Date:  2011-03-16       Impact factor: 2.221

6.  Multilocus Sequence Typing of Treponema pallidum subsp. pallidum in Cuba From 2012 to 2017.

Authors:  Linda Grillová; Angel A Noda; Reto Lienhard; Orestes Blanco; Islay Rodríguez; David Šmajs
Journal:  J Infect Dis       Date:  2019-03-15       Impact factor: 5.226

7.  Identification of Treponema pallidum in amniotic fluid and fetal blood from pregnancies complicated by congenital syphilis.

Authors:  G D Wendel; P J Sánchez; M T Peters; T W Harstad; L L Potter; M V Norgard
Journal:  Obstet Gynecol       Date:  1991-11       Impact factor: 7.661

8.  Is there a difference between hare syphilis and rabbit syphilis? Cross infection experiments between rabbits and hares.

Authors:  Johannes T Lumeij; Lenka Mikalová; David Smajs
Journal:  Vet Microbiol       Date:  2013-02-11       Impact factor: 3.293

9.  Complete genome sequence of Treponema pallidum, the syphilis spirochete.

Authors:  C M Fraser; S J Norris; G M Weinstock; O White; G G Sutton; R Dodson; M Gwinn; E K Hickey; R Clayton; K A Ketchum; E Sodergren; J M Hardham; M P McLeod; S Salzberg; J Peterson; H Khalak; D Richardson; J K Howell; M Chidambaram; T Utterback; L McDonald; P Artiach; C Bowman; M D Cotton; C Fujii; S Garland; B Hatch; K Horst; K Roberts; M Sandusky; J Weidman; H O Smith; J C Venter
Journal:  Science       Date:  1998-07-17       Impact factor: 47.728

10.  Complete genome sequence of Treponema pallidum ssp. pallidum strain SS14 determined with oligonucleotide arrays.

Authors:  Petra Matejková; Michal Strouhal; David Smajs; Steven J Norris; Timothy Palzkill; Joseph F Petrosino; Erica Sodergren; Jason E Norton; Jaz Singh; Todd A Richmond; Michael N Molla; Thomas J Albert; George M Weinstock
Journal:  BMC Microbiol       Date:  2008-05-15       Impact factor: 3.605

View more
  3 in total

1.  Comparative genomics and full-length Tprk profiling of Treponema pallidum subsp. pallidum reinfection.

Authors:  Amin Addetia; Lauren C Tantalo; Michelle J Lin; Hong Xie; Meei-Li Huang; Christina M Marra; Alexander L Greninger
Journal:  PLoS Negl Trop Dis       Date:  2020-04-06

2.  The pan-genome of Treponema pallidum reveals differences in genome plasticity between subspecies related to venereal and non-venereal syphilis.

Authors:  Arun Kumar Jaiswal; Sandeep Tiwari; Syed Babar Jamal; Letícia de Castro Oliveira; Leandro Gomes Alves; Vasco Azevedo; Preetam Ghosh; Carlo Jose Freira Oliveira; Siomar C Soares
Journal:  BMC Genomics       Date:  2020-01-10       Impact factor: 3.969

3.  Evolutionary Processes in the Emergence and Recent Spread of the Syphilis Agent, Treponema pallidum.

Authors:  Marta Pla-Díaz; Leonor Sánchez-Busó; Lorenzo Giacani; David Šmajs; Philipp P Bosshard; Homayoun C Bagheri; Verena J Schuenemann; Kay Nieselt; Natasha Arora; Fernando González-Candelas
Journal:  Mol Biol Evol       Date:  2022-01-07       Impact factor: 16.240

  3 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.