Rita Ferreira, Minia Antelo, Alexandra Nunes, Vítor Borges, Vera Damião, Maria José Borrego, João Paulo Gomes.
Abstract
Microbes possess a multiplicity of virulence factors that confer on them the ability to specifically infect distinct biological niches. Contrary to what is known for other bacteria, for the obligate intracellular human pathogen Chlamydia trachomatis, the knowledge of the molecular basis underlying serovars' tissue specificity is scarce. We examined all ~900 genes to evaluate the association between individual phylogenies and cell-appetence or ecological success of C. trachomatis strains. Only ~1% of the genes presented a tree topology showing the segregation of all three disease groups (ocular, urogenital, and lymphatic) into three well-supported clades. Approximately 28% of the genes, which include the majority of the genes encoding putative type III secretion system effectors and Inc proteins, present a phylogenetic tree where only lymphogranuloma venereum strains form a clade. Similarly, an exclusive phylogenetic segregation of the most prevalent genital serovars was observed for 61 proteins. Curiously, these serovars are phylogenetically cosegregated with the lymphogranuloma venereum serovars for ~20% of the genes. Some clade-specific pseudogenes were identified (novel findings include the conserved hypothetical protein CT037 and the predicted α-hemolysin CT473), suggesting their putative expendability for the infection of particular niches. Approximately 3.5% of the genes revealed a significant overrepresentation of nonsynonymous mutations, and the majority encode proteins that directly interact with the host. Overall, this in silico scrutiny of genes whose phylogeny is congruent with clinical prevalence or tissue specificity of C. trachomatis strains may constitute an important database of putative targets for future functional studies to evaluate their biological role in chlamydial infections.
Keywords: Chlamydia trachomatis; clinical prevalence; genomics; loci; tropism
Year: 2014 PMID: 25378473 PMCID: PMC4291473 DOI: 10.1534/g3.114.015354
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Top five ranking of the most polymorphic C. trachomatis chromosomal genes
| Rank | Nucleotide: No. Differences | Nucleotide: p-distance | Amino Acid: No. Differences | Amino Acid: p-distance |
|---|---|---|---|---|
| 1 | CT870/ | CT681/ | CT870/ | CT681/ |
| 2 | CT681/ | CT051 (0.07) | CT619 (48.4) | CT051 (0.093) |
| 3 | CT619 (124.2) | CT870/ | CT051 (46.6) | CT049 (0.08) |
| 4 | CT872/ | CT049 (0.048) | CT681/ | CT870/ |
| 5 | CT050 (104.9) | CT619 (0.047) | CT049 (38.9) | CT050 (0.058) |
The numbers in parentheses are the respective number of differences or p-distance value for each gene.
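The two polymorphism measures in this ranking can be illustrated with a minimal Python sketch. It assumes that the number of differences and the p-distance are averaged over all pairs of aligned strain sequences and that gapped sites are skipped; neither detail is confirmed by the text above. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count differing sites between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped sites are ignored (pairwise deletion)
        compared += 1
        if a != b:
            diffs += 1
    return diffs, compared


def mean_differences_and_p_distance(alignment):
    """Average the difference count and the p-distance over all sequence pairs."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    total_diffs = 0.0
    total_p = 0.0
    for a, b in pairs:
        diffs, compared = pairwise_differences(a, b)
        total_diffs += diffs
        # p-distance: proportion of compared sites that differ
        total_p += diffs / compared if compared else 0.0
    return total_diffs / len(pairs), total_p / len(pairs)


# Hypothetical toy alignment of three strains (not data from the study)
toy_alignment = ["ATGCATGCAT", "ATGCATGAAT", "ATGTATGAAT"]
mean_diffs, mean_p = mean_differences_and_p_distance(toy_alignment)
print(f"mean differences = {mean_diffs:.3f}, mean p-distance = {mean_p:.3f}")
```

Because the p-distance divides by the number of compared sites, a long gene can rank high by absolute differences and lower by p-distance, which is consistent with the two rankings in the table not being identical.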
Figure 1. Evaluation of the association between polymorphism and dN, dS, and dN/dS. (A and B) Distribution of dN/dS and p-distance values, respectively, obtained from the analyses of all 874 genes from the 53 strains. The horizontal axis represents the C. trachomatis chromosomal positions, with genes placed in their chromosomal order from CT001 to CT875 (gene names and positions according to the D/UW-3/CX strain annotation). (C) The 25 genes (ordered by their relative chromosomal position) that display the greatest values in both analyses, which illustrate the lack of correlation between dN/dS and polymorphism. (D) Scatter plots of p-distance vs. dN, dS, and dN/dS, on a log scale for clarity. The Pearson's product-moment correlation coefficients for p-distance vs. dN, dS, and dN/dS are R = 0.92, R = 0.9, and R = 0.02, respectively.
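As a minimal sketch of the statistic reported for panel D, Pearson's product-moment correlation coefficient can be computed as follows. The function name and the toy values are hypothetical, and the sketch assumes the coefficient is computed on the untransformed values (the caption only states that the plots use a log scale).

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene values (not data from the study)
p_distance = [0.001, 0.002, 0.004, 0.008]
dn_values = [0.0005, 0.001, 0.002, 0.004]
r = pearson_r(p_distance, dn_values)
print(f"R = {r:.2f}")
```

A coefficient near 1, as reported for dN and dS, means the measure rises almost linearly with p-distance, whereas a value near 0, as reported for dN/dS, indicates no linear relationship.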
Figure 2. Phylogenetic reconstruction of the C. trachomatis species. The tree was constructed using the whole genome of 53 strains encompassing the majority of the CT681/ompA serovars. The asterisks indicate the 17 strains representative of the major tree branches (in red) that were used to evaluate the relation between species polymorphism and the number of taxa (see the Results section for details).
Figure 3. Differences between the analyses using 53 and 17 strains. The graphs show the differences between the results of the p-distance (A) and dN/dS (B) analyses for all 53 strains and for the set of 17 strains (representative of the majority of the tree branches). Each black dot represents one of the 100 polymorphic genes selected for these comparisons. P-values were calculated with a paired two-tailed t-test.
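The paired comparison described in this caption can be sketched as follows. This is a minimal illustration of the paired t statistic only; the function name and the toy values are hypothetical, and obtaining the two-tailed P-value additionally requires the Student t distribution (for example via SciPy's `scipy.stats.ttest_rel`, if that library is available).

```python
import math


def paired_t_statistic(values_a, values_b):
    """Return the paired t statistic and its degrees of freedom."""
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need two paired lists with at least two values each")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the per-gene differences (n - 1 in the denominator)
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("t statistic is undefined when all differences are equal")
    standard_error = math.sqrt(var_d / n)
    return mean_d / standard_error, n - 1


# Hypothetical per-gene p-distances from the two strain sets (not study data)
with_53_strains = [0.010, 0.020, 0.030, 0.040]
with_17_strains = [0.008, 0.019, 0.027, 0.036]
t_value, dof = paired_t_statistic(with_53_strains, with_17_strains)
print(f"t = {t_value:.3f}, degrees of freedom = {dof}")
```

The test is paired because each gene contributes one value from each strain set, so the statistic is built from the per-gene differences rather than from the two group means.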
Figure 4. Recombination analyses of the D(s)/2923 and D/SotonD1 strains. (A) Number of nucleotide differences (vertical axis) between the genomic sequence of D(s)/2923 or D/SotonD1 and that of F/SW5. This polymorphism assessment was performed with the DnaSP software, v5, using a window size and a step size of 1000 base pairs each. The smaller graph shows an enlarged view of the detected highly polymorphic region. (B) (first crossover) and (C) (second crossover) show the genes in each analyzed region (1) and the results of the SimPlot (2), BootScan (3), and phylogenetic (4) analyses. Recombination breakpoints were analyzed individually because they were better mapped when a different outgroup strain was used for each one, i.e., L3/404-LN for the first (B) and C/TW-3 for the second (C) breakpoint. SimPlot graphs (2) show the level of similarity between the recombinant sequences and the respective parental strains (the numbers of informative sites supporting this relatedness are colored according to the graph legend box), whereas the BootScan graphs (3) show the phylogenetic relatedness (% of permuted trees) between those same sequences. Both analyses were obtained with a sliding window size of 200 bp and a step size of 30 bp. The sequence of the recombinant D strains was used as the query. The vertical dashed black lines indicate the location of the estimated crossovers, shown in detail in Figure S1. Seventy-one informative sites support the similarity between the recombinant strain and F/SW5, whereas 76 support its similarity with D/UW-3/CX (P = 9.28 × 10^-44). Forty-four informative sites support the similarity between the recombinant strain and D/UW-3/CX, whereas 25 support its similarity with F/SW5 (P = 6.65 × 10^-19). In these defined regions there are no informative sites supporting the alternative hypotheses. The phylogenetic trees (4) were constructed with the nucleotide sequences adjacent to each estimated breakpoint region (NJ method; Kimura 2-parameter method; bootstrap = 1000) and support the recombination event.
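The windowed polymorphism scan in panel A can be approximated with a short Python sketch. It is not the DnaSP implementation: it simply counts mismatching positions between two aligned sequences in consecutive windows, and it assumes that gap characters count as ordinary mismatches. The function name and the toy sequences are hypothetical.

```python
def windowed_differences(seq_a, seq_b, window=1000, step=1000):
    """Count mismatches between two aligned sequences in sliding windows.

    Returns a list of (window_start, n_differences) tuples. With
    window == step, as in the caption, the windows do not overlap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a), step):
        end = min(start + window, len(seq_a))
        diffs = sum(1 for a, b in zip(seq_a[start:end], seq_b[start:end]) if a != b)
        results.append((start, diffs))
    return results


# Hypothetical 3-kb toy sequences with one divergent stretch (not study data)
reference = "A" * 3000
query = "A" * 1000 + "C" * 50 + "A" * 1950
profile = windowed_differences(query, reference)
print(profile)
```

A window whose count stands well above its neighbors marks a candidate recombinant region, which can then be examined with finer windows such as the 200 bp window and 30 bp step used for the SimPlot and BootScan analyses.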
Figure 5. Genes that segregate strains according to their biological characteristics. The outer circle in both panels represents the genome of the C. trachomatis D/UW-3/CX strain, where each bar represents a gene at its respective genomic position (light gray bars, forward strand; dark gray bars, reverse strand). (A) The tracks' color scheme represents genes whose phylogeny segregates at least one group of strains according to their biological characteristics, i.e., each color illustrates a particular segregation (that may not be exclusive): full-tropism (purple), LGV strains (orange), strains from prevalent genital serovars (green), cosegregation of LGV and prevalent genital serovar strains (blue), genital strains (prevalent and nonprevalent serovars) (black), and ocular strains (red). (B) The tracks' color scheme is maintained for the different groups of strains and represents genes that exclusively segregate a single group of strains. For both panels, the outer and inner tracks of each color correspond to nucleotide and amino acid results, respectively.
Number of genes/proteins that segregate C. trachomatis strains according to distinct phenotypes
| Sequence Type | Segregation: Full-tropism | Segregation: Ocular | Segregation: Genital | Segregation: Prevalent Genital | Segregation: Prevalent Genital + LGV | Segregation: LGV | Exclusive: Ocular | Exclusive: Genital | Exclusive: Prevalent Genital | Exclusive: LGV |
|---|---|---|---|---|---|---|---|---|---|---|
| Nucleotide | 11 (1.3%) | 136 (15.6%) | 14 (1.6%) | 431 (49.3%) | 173 (19.8%) | 695 (79.5%) | 7 (0.8%) | 0 (0%) | 47 (5.4%) | 245 (28%) |
| Amino acid | 12 (1.4%) | 105 (12%) | 15 (1.7%) | 302 (34.6%) | 146 (16.7%) | 531 (60.8%) | 21 (2.4%) | 1 (0.1%) | 61 (7%) | 240 (27.5%) |
The numbers in parentheses refer to the proportion of genes/proteins found in each category relative to the 874 analyzed genes/proteins. LGV, lymphogranuloma venereum.
Segregation by Phenotype: genes/proteins for which the phylogenetic tree differentiates at least one group of strains in a nonexclusive manner.
Exclusive Segregation by Phenotype: genes/proteins for which the phylogenetic tree differentiates only one particular group of strains whereas the remainder are mixed.
Full-tropism: genes/proteins for which the phylogenetic tree shows three clades (ocular, genital, and LGV serovars).
Genital: refers to all genital strains (prevalent plus nonprevalent serovars).
Prevalent Genital + LGV: genes/proteins for which the phylogenetic tree clusters the strains from prevalent genital and LGV serovars in the same clade.
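The proportions in the table follow directly from the counts and the 874 analyzed genes. The short Python check below recomputes a few of them; the dictionary keys are hypothetical labels, while the counts and expected percentages are copied from the nucleotide row of the table.

```python
TOTAL_GENES = 874  # number of analyzed genes/proteins stated in the table note


def percent_of_total(count, total=TOTAL_GENES):
    """Express a gene count as a percentage of all analyzed genes (one decimal)."""
    return round(100 * count / total, 1)


# Counts from the nucleotide row of the table (keys are illustrative labels)
nucleotide_counts = {
    "full_tropism": 11,
    "lgv": 695,
    "exclusive_prevalent_genital": 47,
    "exclusive_lgv": 245,
}

for label, count in nucleotide_counts.items():
    print(f"{label}: {count} genes = {percent_of_total(count)}%")
```

For instance, 11 of 874 genes is about 1.3%, which matches the "~1%" of genes with full-tropism segregation quoted in the abstract, and 245 of 874 is about 28%, matching the share of genes in which only LGV strains form a clade.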
C. trachomatis known and putative pseudogenes for a particular disease group and genes that present differences in gene length among strains from different disease groups
| Gene | Functional Category | Strain Group: Ocular | Strain Group: Nonprevalent Genital | Strain Group: Prevalent Genital | Strain Group: LGV | Observations | Reference |
|---|---|---|---|---|---|---|---|
| CT037 | HP | = | R | Ψ | Ψ | A/2497, A/363, A/5291, and A/7249 are smaller than the nonprevalent genital serovars. | This study |
| CT052 | Coproporphyrinogen III oxidase | > | > | = | R | | This study |
| CT058 | Putative inclusion membrane protein | Ψ | = | = | R | A/Har13 and C/TW-3 are not Ψs. | |
| CT101 | Inclusion membrane protein | = | Ψ | Ψ | R | E/Bour, E/11023, and D/UW3 are not Ψs. | |
| CT105 | T3S effector | Ψ | = | = | R | | |
| CT106 | Predicted pseudouridine synthetase family | = | > | > | R | | This study |
| CT160 | HP | > | > | > | R | B/Jali20 is a Ψ, and F(s)/70 is smaller. | This study |
| CT161 | HP | = | < | < | R | B/Jali20 and E/SotonE8 are Ψs. | This study |
| CT162 | HP | < | < | < | R | E/SotonE8, F(s)/70, J/6276, Ia/SotonIa1, and Ia/SotonIa3 are Ψs. | This study |
| CT171 | Tryptophan synthase (alpha chain) | Ψ | = | = | R | B/TZ is not a Ψ. | |
| CT172 | HP | < | << | << | R | | This study |
| CT234 | Membrane transport protein from the major facilitator superfamily | = | = | < | R | | This study |
| CT300 | Putative inclusion membrane protein | R | = | = | Ψ | | |
| CT358 | HP | > | > | > | R | B/Jali20 is smaller. | |
| CT373 | HP | R | = | = | Ψ | | |
| CT374 | Arginine/ornithine antiporter | Ψ | = | = | R | | |
| CT392 | HP | > | > | > | R | | This study |
| CT441 | Tail-specific protease | < | < | = | R | Ia/SotonIa1, Ia/SotonIa3, and J/6276 have the size of the LGV and prevalent genital serovars sequences. | This study |
| CT470 | HP | = | = | > | R | | This study |
| CT473 | HP | = | = | Ψ | R | | This study |
| CT480.1 | HP | > | > | > | R | G/9301, G/9768, G/11074, J/6276, Ia/SotonIa1, and Ia/SotonIa3 are smaller than the remainder strains’ sequences. | This study |
| CT522 | S3 ribosomal protein | = | = | < | R | | This study |
| CT605 | HP | > | > | = | R | | This study |
| CT793 | HP | > | > | > | R | G/9301, G/11074, and G/9768 are Ψs. | This study |
| CT807 | Glycerol-3-P acyltransferase | < | < | = | R | | This study |
| CT809 | HP | < | < | = | R | | This study |
| CT833 | Initiation factor 3 | < | < | < | R | | This study |
| CT852 | YhgN family | < | < | < | R | | This study |
| CT868 | Membrane thiol protease (predicted) | > | > | > | R | | This study |
The differences in sequence length shown refer only to differences in termination between strains. Genes with discordant 5′ annotation, for which the correct start codon lacks confirmation, were not included. The differences in length do not account for indel events. LGV, lymphogranuloma venereum; HP, hypothetical protein; Ψ, sequences considered as pseudogenes; R, the sequence whose size was used for reference purposes (LGV sequences were used by default except for LGV pseudogenes); =, gene of the same size as the reference; >, gene larger than the reference; <, gene smaller than the reference; <<, gene with the smallest size (three sequence sizes were observed for CT172, depending on the disease group).