Charlotte A Jones, James Hadfield, Nicholas R Thomson, David W Cleary, Peter Marsh, Ian N Clarke, Colette E O'Neill.
Abstract
Chlamydia trachomatis is an obligate intracellular pathogen of humans, causing both the sexually transmitted infection, chlamydia, and the most common cause of infectious blindness, trachoma. The majority of sequenced C. trachomatis clinical isolates carry a 7.5-kb plasmid, and it is becoming increasingly evident that this is a key determinant of pathogenicity. The discovery of the Swedish New Variant and the more recent Finnish variant highlights the importance of understanding the natural extent of variation in the plasmid. In this study we analysed 524 plasmid sequences from publicly available whole-genome sequence data. Single nucleotide polymorphisms (SNPs) in each of the eight coding sequences (CDS) were identified and analysed. There were 224 base positions out of a total 7550 bp that carried a SNP, which equates to a SNP rate of 2.97%, nearly three times what was previously calculated. After normalising for CDS size, CDS8 had the highest SNP rate at 3.97% (i.e., number of SNP loci per total number of nucleotides), whilst CDS6 had the lowest at 1.94%. CDS5 had the highest total number of SNPs across the 524 sequences analysed (2267 SNPs), whereas CDS6 had the fewest, with only 85 SNPs. Calculation of the genetic distances identified CDS6 as the least variable gene at the nucleotide level (d = 0.001), and CDS5 as the most variable (d = 0.007); however, at the amino acid level CDS2 was the least variable (d = 0.001), whilst CDS5 remained the most variable (d = 0.013). This study describes the largest in-depth analysis of the C. trachomatis plasmid to date, through the analysis of plasmid sequence data mined from whole-genome sequences spanning 50 years and from a worldwide distribution, providing insights into the nature and extent of existing variation within the plasmid as well as guidance for the design of future diagnostic assays.
This is crucial at a time when single-target diagnostic assays are failing to detect natural mutants, putting those infected at risk of a serious long-term and life-changing illness.
Keywords: Chlamydia trachomatis; diagnostics; evolution; genetic variation; plasmid; sequencing
Year: 2020 PMID: 32155798 PMCID: PMC7143637 DOI: 10.3390/microorganisms8030373
Source DB: PubMed Journal: Microorganisms ISSN: 2076-2607
Functions of each plasmid CDS and RNA sequence.
| CDS (Pgp) | Function of Encoded Protein | Summary of Current Knowledge and References |
|---|---|---|
| 1 | Plasmid Replication | Homologue of integrase, part of the family of phage proteins [ |
| 2 | Plasmid Replication | Homologue of recombinase, part of the family of phage proteins; role in regulation of plasmid replication [ |
| 3 | Plasmid Replication | Homology observed with DnaB helicase proteins of |
| 4 | Function unknown | Required for plasmid maintenance [ |
| 5 | Virulence protein | 28 kDa protein [ |
| 6 | Transcriptional regulation | Role in ability of |
| 7 | Regulation of partitioning and copy number | Partial homology to |
| 8 | Regulation of partitioning and copy number | Thought to function in conjunction with pCDS7 in a similar manner to that of the |
Number of SNP loci, rates, and characteristics of all SNPs in dataset. CDS, coding sequence. SNP loci rate is the number of intragenic SNP loci (base positions containing at least one SNP in any of the 524 sequences in the dataset) divided by the length of the CDS, expressed as a percentage. Total SNPs is the total number of SNPs across the dataset for all variable sites. Average number of SNPs per locus is the average number of SNPs per variable site across the CDS in question. Nonsynonymous (NS) SNPs per CDS are presented, along with the percentage of total SNPs. NS SNPs involving a change of amino acid characteristics are also shown as a percentage of the total number of NS SNPs.
| CDS | Length (bp) | Number of Intragenic SNP Loci | SNP Loci Rate (%) | Total SNPs | Average Number of SNPs per Locus | Nonsynonymous (NS) SNPs (%) | NS SNPs Involving a Change of Amino Acid Characteristics (%) |
|---|---|---|---|---|---|---|---|
| 1 | 918 | 30 | 3.27 | 2012 | 67.06 | 762 (37.9) | 436 (57.2) |
| 2 | 993 | 26 | 2.62 | 1734 | 66.69 | 364 (21) | 237 (65.1) |
| 3 | 1356 | 32 | 2.36 | 2116 | 66.13 | 575 (27.2) | 182 (31.7) |
| 4 | 1065 | 27 | 2.54 | 1435 | 53.14 | 518 (36.1) | 339 (65.4) |
| 5 | 795 | 28 | 3.52 | 2267 | 80.96 | 1335 (58.9) | 850 (63.4) |
| 6 | 309 | 6 | 1.94 | 85 | 14.16 | 55 (64.7) | 53 (96.4) |
| 7 | 825 | 21 | 2.55 | 1072 | 51.05 | 928 (86.6) | 722 (77.8) |
| 8 | 744 | 29 | 3.90 | 1126 | 38.83 | 450 (40) | 252 (56) |
| Total | 7005 | 199 | 2.84 | 11,847 | - | 4987 (42.1) | 3070 (61.6) |
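The two derived columns follow from the caption's definitions: the SNP loci rate is the number of variable positions divided by CDS length, and the average per locus is the total SNP count divided by the number of variable positions. The following is a minimal sketch that recomputes them from the counts in the table; the function and variable names are illustrative assumptions and do not come from the study.

```python
def snp_loci_rate(n_loci: int, cds_length: int) -> float:
    """Percentage of base positions in a CDS that carry at least one SNP."""
    return 100.0 * n_loci / cds_length


def mean_snps_per_locus(total_snps: int, n_loci: int) -> float:
    """Average number of SNPs observed per variable site."""
    return total_snps / n_loci


# (length in bp, SNP loci, total SNPs) per CDS, copied from the table above.
SNP_COUNTS = {
    1: (918, 30, 2012),
    2: (993, 26, 1734),
    3: (1356, 32, 2116),
    4: (1065, 27, 1435),
    5: (795, 28, 2267),
    6: (309, 6, 85),
    7: (825, 21, 1072),
    8: (744, 29, 1126),
}

if __name__ == "__main__":
    for cds, (length, loci, total) in SNP_COUNTS.items():
        rate = snp_loci_rate(loci, length)
        mean = mean_snps_per_locus(total, loci)
        print(f"CDS{cds}: rate={rate:.2f}%  mean SNPs per locus={mean:.2f}")
```

Small differences in the last decimal place relative to the table can arise from rounding versus truncation.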
The genetic distance d represents estimates of average evolutionary divergence over all sequence pairs for each coding sequence in isolation. The genetic distance of the entire plasmid is also shown for the nucleotide alignments, both coding and noncoding regions; amino acid comparisons cannot be made as not all nucleotides are in coding sequences. SE = standard error (1000 replicates). All numbers were rounded to 3 decimal places.
| CDS | Nucleotide Sequences, d | Nucleotide Sequences, SE | Amino Acid Sequences, d | Amino Acid Sequences, SE |
|---|---|---|---|---|
| 1 | 0.004 | 0.001 | 0.003 | 0.002 |
| 2 | 0.004 | 0.001 | 0.001 | 0.001 |
| 3 | 0.003 | 0.001 | 0.003 | 0.001 |
| 4 | 0.004 | 0.001 | 0.003 | 0.002 |
| 5 | 0.007 | 0.002 | 0.013 | 0.004 |
| 6 | 0.001 | 0.001 | 0.002 | 0.002 |
| 7 | 0.003 | 0.001 | 0.008 | 0.003 |
| 8 | 0.004 | 0.001 | 0.005 | 0.002 |
| Plasmid | 0.004 | 0.000 | N/A | N/A |
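To illustrate what an "average divergence over all sequence pairs" with a bootstrap standard error involves, here is a minimal sketch. It assumes an uncorrected p-distance (the proportion of differing sites) and resamples alignment columns with replacement; the record does not state which substitution model or software the authors used, so both choices, and all function names, are assumptions.

```python
import random
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_pairwise_distance(seqs: list[str]) -> float:
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def bootstrap_se(seqs: list[str], replicates: int = 1000, seed: int = 0) -> float:
    """Standard error of the mean distance, resampling alignment columns."""
    rng = random.Random(seed)
    n_sites = len(seqs[0])
    estimates = []
    for _ in range(replicates):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = ["".join(s[i] for i in cols) for s in seqs]
        estimates.append(mean_pairwise_distance(resampled))
    mean = sum(estimates) / replicates
    variance = sum((e - mean) ** 2 for e in estimates) / (replicates - 1)
    return variance ** 0.5


if __name__ == "__main__":
    # Toy alignment, not real plasmid data.
    toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
    print(mean_pairwise_distance(toy), bootstrap_se(toy, replicates=200))
```

With 524 sequences the all-pairs loop covers roughly 137,000 comparisons per replicate, so a production analysis would normally rely on a dedicated phylogenetics package rather than this pure-Python loop.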
SNP loci and total number of SNPs in relation to position within the codon. Percentage of bases at that position are shown in parentheses.
| CDS | SNP Loci, Base 1 (%) | SNP Loci, Base 2 (%) | SNP Loci, Base 3 (%) | SNP Loci, Total | Total SNPs, Base 1 (%) | Total SNPs, Base 2 (%) | Total SNPs, Base 3 (%) | Total SNPs, Total |
|---|---|---|---|---|---|---|---|---|
| 1 | 16 (53.3) | 4 (13.3) | 10 (33.3) | 30 | 470 (23.4) | 292 (14.5) | 1250 (62.1) | 2012 |
| 2 | 4 (15.4) | 4 (15.4) | 18 (69.2) | 26 | 129 (7.4) | 39 (2.2) | 1566 (90.3) | 1734 |
| 3 | 10 (31.25) | 6 (18.75) | 16 (50) | 32 | 164 (7.8) | 408 (19.3) | 1544 (73) | 2116 |
| 4 | 6 (22.2) | 1 (3.7) | 20 (74.1) | 27 | 426 (29.7) | 1 (0.1) | 1008 (70.2) | 1435 |
| 5 | 9 (32.1) | 8 (28.6) | 11 (39.3) | 28 | 594 (26.2) | 633 (27.9) | 1040 (45.9) | 2267 |
| 6 | 2 (33.3) | 1 (16.7) | 3 (50) | 6 | 2 (2.4) | 1 (1.2) | 82 (96.5) | 85 |
| 7 | 5 (23.8) | 7 (33.3) | 9 (42.8) | 21 | 300 (28) | 626 (58.4) | 146 (13.6) | 1072 |
| 8 | 4 (13.8) | 8 (27.6) | 17 (58.6) | 29 | 114 (10.1) | 196 (17.4) | 816 (72.5) | 1126 |
| Total | 56 (28.1) | 39 (19.6) | 104 (52.3) | 199 | 2199 (18.6) | 2196 (18.5) | 7452 (62.9) | 11,847 |
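Assigning a SNP to the first, second, or third base of a codon only requires its offset from the start of the reading frame. The sketch below assumes 1-based coordinates and a forward-strand CDS with no frameshifts; the coordinates in the example are hypothetical and are not taken from the plasmid annotation.

```python
from collections import Counter


def codon_position(site: int, cds_start: int) -> int:
    """Return 1, 2, or 3 for a 1-based site within a forward-strand CDS."""
    if site < cds_start:
        raise ValueError("site lies upstream of the CDS start")
    return (site - cds_start) % 3 + 1


def tally_by_codon_position(sites: list[int], cds_start: int) -> dict[int, tuple[int, float]]:
    """Count SNP loci at each codon position and give each as a percentage."""
    counts = Counter(codon_position(s, cds_start) for s in sites)
    total = sum(counts.values())
    return {
        pos: (counts[pos], 100.0 * counts[pos] / total if total else 0.0)
        for pos in (1, 2, 3)
    }


if __name__ == "__main__":
    # Hypothetical SNP coordinates in a CDS that starts at position 100.
    print(tally_by_codon_position([100, 101, 102, 105], cds_start=100))
```

For a CDS on the reverse strand the offset would be measured from the CDS end instead, which this sketch does not handle.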
Figure 1. Number of sequences carrying each of the 199 SNP loci identified. Blue dots indicate bi-allelic SNP locations, red dots indicate tri-allelic SNP locations, and green dots indicate intergenic SNPs.
Details of premature and delayed stop codons collated from whole dataset.
| Open Reading Frame | Base | Number of Sequences with SNP | Reference Code | SNP Change | Pre-AA | Post-AA | Type of Stop Codon (Premature or Delayed) | Change to Size of CDS |
|---|---|---|---|---|---|---|---|---|
| 1 | 1080 | 7 | TGA | GGA | STOP | G | Delayed | +3 codons |
| 4 | 4667 | 33 | GAA | TAA | E | STOP | Premature | −4 codons |
| 4 | 4679 | 2 | TAA | CAA | STOP | Q | Delayed | +1 codon |
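The three codon changes in the table can be classified mechanically by translating the reference and variant codons. This is a minimal sketch that assumes the standard stop codons TAA, TAG, and TGA (shared by the standard and bacterial genetic codes); the function name and the category labels are illustrative.

```python
from itertools import product

# Standard genetic code laid out in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref: str, alt: str) -> tuple[str, str, str]:
    """Translate both codons and label the effect of the substitution."""
    ref_aa = CODON_TABLE[ref.upper()]
    alt_aa = CODON_TABLE[alt.upper()]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "premature stop"
    elif ref_aa == "*":
        effect = "delayed stop"
    else:
        effect = "nonsynonymous"
    return ref_aa, alt_aa, effect


if __name__ == "__main__":
    for ref, alt in [("TGA", "GGA"), ("GAA", "TAA"), ("TAA", "CAA")]:
        print(ref, alt, classify_codon_change(ref, alt))
```

Working out how many codons a delayed stop adds additionally requires scanning downstream for the next in-frame stop codon, which needs the flanking sequence and is not shown here.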
Figure 2. Rarefaction of C. trachomatis plasmid nucleotide sequences. Aligned sequences were clustered using cd-hit-est at 100% nucleotide identity. The diversity of each lineage, as measured by cluster discovery versus sampling effort (rarefaction), is shown for each plasmid lineage (A) and all plasmid sequences (inset (B)). No asymptotes were observed, indicating that sampling effort had not captured all the diversity within the C. trachomatis plasmid populations. Lineages A and C were the least diverse (L3 being represented by only one sequence), with D and J the most diverse.
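A rarefaction curve of this kind can be computed once every sequence has a cluster label. The sketch below uses the classical analytical formula for the expected number of clusters in a random subsample of n sequences, E[S_n] = Σ_i (1 − C(N − N_i, n) / C(N, n)); the record does not say whether the authors used this formula or repeated random subsampling, so treat it as one plausible approach. The input list of labels is hypothetical and would in practice be parsed from the clustering output.

```python
from collections import Counter
from math import comb


def rarefaction_curve(cluster_labels: list[str]) -> list[float]:
    """Expected number of distinct clusters for subsample sizes 1..N."""
    sizes = list(Counter(cluster_labels).values())
    total = sum(sizes)
    curve = []
    for n in range(1, total + 1):
        denom = comb(total, n)
        # Probability that a cluster of a given size is absent from the subsample
        # is C(total - size, n) / C(total, n); comb() returns 0 when n > total - size.
        expected = sum(1 - comb(total - size, n) / denom for size in sizes)
        curve.append(expected)
    return curve


if __name__ == "__main__":
    # Hypothetical cluster assignments for four sequences.
    print(rarefaction_curve(["a", "a", "b", "c"]))
```

A curve that is still rising at n = N, as the caption reports, means new clusters are still being found at the full sample size.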
Number of repeats in the 22 bp repeat region per genotype according to the automated alignment process. Imperfect repeats (any variation of a perfect repeat) are abbreviated to “imp.” in the table.
| Genotype | Four Repeats | Three + imp. | Three Repeats | Two + imp. | Two Repeats | One + imp. | One Repeat | None + imp. | No Repeats | Total |
|---|---|---|---|---|---|---|---|---|---|---|
|  | 0 | 45 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 47 |
|  | 13 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 |
|  | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
|  | 5 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 |
|  | 32 | 11 | 5 | 1 | 1 | 6 | 0 | 0 | 0 | 56 |
|  | 66 | 58 | 0 | 1 | 0 | 16 | 0 | 0 | 2 | 143 |
|  | 27 | 20 | 3 | 1 | 0 | 9 | 0 | 0 | 0 | 60 |
|  | 33 | 10 | 5 | 2 | 1 | 4 | 0 | 0 | 0 | 55 |
|  | 16 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 |
|  | 7 | 3 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 12 |
|  | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 2 |
|  | 8 | 7 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 18 |
|  | 19 | 8 | 2 | 2 | 2 | 3 | 0 | 0 | 0 | 36 |
|  | 12 | 5 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 20 |
|  | 7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 |
|  | 12 | 2 | 0 | 1 | 0 | 9 | 0 | 1 | 0 | 25 |
|  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| Total | 257 | 177 | 16 | 8 | 5 | 57 | 0 | 2 | 2 | 524 |