Darina Čejková, Michal Strouhal, Steven J Norris, George M Weinstock, David Šmajs.
Abstract
BACKGROUND: Pathogenic uncultivable treponemes comprise human and animal pathogens including agents of syphilis, yaws, bejel, pinta, and venereal spirochetosis in rabbits and hares. A set of 10 treponemal genome sequences including those of 4 Treponema pallidum ssp. pallidum (TPA) strains (Nichols, DAL-1, Mexico A, SS14), 4 T. p. ssp. pertenue (TPE) strains (CDC-2, Gauthier, Samoa D, Fribourg-Blanc), 1 T. p. ssp. endemicum (TEN) strain (Bosnia A) and 1 strain (Cuniculi A) of Treponema paraluisleporidarum ecovar Cuniculus (TPLC) were examined with respect to the presence of nucleotide intrastrain heterogeneous sites. METHODOLOGY/PRINCIPAL
Year: 2015 PMID: 26436423 PMCID: PMC4593590 DOI: 10.1371/journal.pntd.0004110
Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN: 1935-2727
Table 1. Treponemal genomes used in this study.
| Genome | Place and year of isolation | Reference | GenBank accession number, SRA accession number (genome reference); average coverage (Illumina/454), average Illumina read length (bp), estimated Illumina error rate from BWA |
|---|---|---|---|
| TPA Nichols | Washington, D.C., USA; 1912 | [ | CP004010.2, SRX012305 [ |
| TPA DAL-1 | Dallas, USA; 1991 | [ | CP003115.1, SRX012302 [ |
| TPA SS14 | Atlanta, USA; 1977 | [ | CP004011.1, SRX012306 [ |
| TPA Mexico A | Mexico City, Mexico; 1953 | [ | CP003064.1, SRX012304 [ |
| TPE CDC-2 | Akorabo, Ghana; 1980 | [ | CP002375.1, SRX012301 [ |
| TPE Gauthier | Brazzaville, Congo; 1960 | [ | CP002376.1, SRX104412 [ |
| TPE Samoa D | Apia, Samoa; 1953 | [ | CP002374.1, SRX012307 [ |
| TPE Fribourg-Blanc | Guinea; 1966 | [ | CP003902.1, SRX104411 [ |
| TEN Bosnia A | Bosnia; 1950 | [ | CP007548, SRX144510, SRX144511, SRX144514, SRX144515 [ |
| TPLC Cuniculi A | unknown; before 1957 | [ | CP002103.1, SRX012308 [ |
(a) Error rate per nucleotide was estimated using the Burrows-Wheeler Aligner (BWA) [55],[56].
Fig 1. Data analysis workflow.
(A) An automated identification pipeline and optimization process. (B) An application of further restrictions and verification of identified putative candidates.
Table 2. Summary of the intrastrain variable sites identified within Illumina sequencing reads in investigated treponemal genomes.
| Genome sequence | Verified by 454 or Sanger sequencing | Major/minor allele | Gene/Genome position | Amino acid change | Protein function/Functional group | Cell localization |
|---|---|---|---|---|---|---|
| T | 454 | T/C | TPANIC_0006/7179 | *56S; read through stop codon | Hypothetical protein/Unknown | cytoplasm |
| T | 454 | T/C | TPANIC_0051/59894 | S104P | PrfA/Translation | cytoplasm |
| A | 454 | A/C | TPANIC_0222/228259 | E46D; conservative | Hypothetical protein/Unknown | unknown |
| G | Sanger | G/A | TPANIC_0471/500905 | D357N | Hypothetical protein/Unknown | cytoplasmic membrane |
| T | 454 | G/T | upstream of TPANIC_0584/635418 | n/a | n/a | n/a |
| C | 454 | C/T | TPADAL_0065/71972 | R70W | SAM dependent methyltransferase/General metabolism | cytoplasm |
| G | Sanger | G/A | TPADAL_0720/789942 | A155V; conservative | CheC-FliY/Motility, Chemotaxis | cytoplasm, flagellar |
| T | 454 | T/C | TPADAL_0720/790038 | N123S | CheC-FliY/Motility, Chemotaxis | cytoplasm, flagellar |
| T | 454 | T/G | TPADAL_0897/976768 | K338Q | TprK/Virulence | periplasm [ |
| G | 454 | G/C | TPASS_20117/135108 | N533K | TprC/Virulence | outer membrane [ |
| A | 454 | A/G | TPASS_20117/135261 | Y483H | TprC/Virulence | outer membrane [ |
| T | 454 | C/T | TPASS_20341/364888 | L64P | MurC/Cell structure | cytoplasm |
| A | Sanger | A/C | TPASS_20394/420117 | H107P | TopA/DNA metabolism | cytoplasm |
| T | 454 | T/C | TPASS_20402/428628 | L134P | FliI/Motility | cytoplasm |
| G | 454 | G/T | TPASS_20402/428930 | A235S | FliI/Motility | cytoplasm |
| G | 454 | G/A | TPASS_21029/1125352 | D12D; synonymous | Hypothetical protein/Unknown | cytoplasm |
| C | 454 | C/T | TPESAMD_0134/155544 | C284Y | Hypothetical protein/Unknown | unknown |
| C | 454 | C/G | TENDBA_0314/331578 | E215Q | Hypothetical protein/Unknown | unknown |
| A | 454 | A/T | TENDBA_0314/331618 | H201Q | Hypothetical protein/Unknown | unknown |
| A | 454 | A/G | TENDBA_0316/333355 | V240A; conservative | chimeric TprGI | unknown |
| C | 454 | C/T | TENDBA_0621/672156 | T104T; synonymous | TprI/Virulence | unknown |
| S | 454 | C/G | TENDBA_0897/974407 | E347Q | TprK/Virulence | periplasm [ |
| TCCTCCCCC | 454 | 9 bp indel | TENDBA_0967/1049918-1049951 | n/a | Hypothetical protein/Unknown | unknown |
Illumina-identified intrastrain variable sites were verified using 454 or Sanger sequencing.
(a) No intrastrain heterogeneous sites were identified in the TPA Mexico A, TPE CDC-2, TPE Gauthier, TPE Fribourg-Blanc and TPLC Cuniculi A genomes.
(b) Nonconservative amino acid replacements are not listed.
(c) If not indicated, localization was predicted by PSORTb.
(d) Not applicable.
(e) [20],[23]
(f) Variable number of direct repeats (TCCTCCCCC).
Fig 2. A schematic representation of the identified heterogeneous positions in all investigated genomes.
The proportion of alternative alleles is based on nucleotide frequency within individual Illumina reads. Red cells represent identified sites of intrastrain heterogeneity, whereas grey cells represent sites of intrastrain homogeneity. The numbers within cells indicate the number of alternative/standard reads at sites where the proportion of alternative reads exceeded 10% but was lower than 20% and therefore remained below the threshold used in this study. Blue cells show nucleotide positions omitted from analysis due to excluded paralogous sequences (S2 Table). For the Bosnia A strain, the intrastrain heterogeneous sites TENDBA_0314/331578, TENDBA_0314/331618, TENDBA_0317/333355 and TENDBA_0621/672156 are not shown because in all other genomes these positions were excluded from analysis due to paralogous sequences. Note that the TPADAL_0897/976678 and TENDBA_0897/974407 positions are the same.
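The bands described in this caption can be sketched as a small classifier. This is an illustrative sketch, not the authors' pipeline: it assumes the study threshold equals the 20% upper bound mentioned above, that the proportion is computed as alternative / (alternative + standard) reads, and that a site at exactly 20% counts as heterogeneous. The function name `classify_site` and the band labels are hypothetical.

```python
def classify_site(alt_reads: int, std_reads: int,
                  report_floor: float = 0.10,
                  threshold: float = 0.20) -> str:
    """Assign a site to one of three bands from its read counts.

    Assumptions (not confirmed by the source):
      - proportion = alt_reads / (alt_reads + std_reads)
      - proportion >= threshold      -> "heterogeneous"
      - report_floor < proportion < threshold -> "below_threshold"
        (the band where the caption says read counts are printed)
      - otherwise                    -> "homogeneous"
    """
    total = alt_reads + std_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    proportion = alt_reads / total
    if proportion >= threshold:
        return "heterogeneous"
    if proportion > report_floor:
        return "below_threshold"
    return "homogeneous"


if __name__ == "__main__":
    # Hypothetical read counts: 5 alternative and 24 standard reads (about 17%).
    print(classify_site(5, 24))
```

With 5 alternative and 24 standard reads the proportion is about 0.17, so the site lands in the 10–20% band and stays below the assumed threshold.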
Table 3. Selected intrastrain heterogeneous sites identified in TPA SS14, examined in four different DNA preparations.
| Bacterial stock no. | DNA preparation no. | TPASS_20117/135108 (G/C) | TPASS_20117/135261 (A/G) | TPASS_20341/364888 (T/C) | TPASS_20402/428628 (T/C) | TPASS_20402/428930 (G/T) | TPASS_20971/1056002 (T/C) |
|---|---|---|---|---|---|---|---|
| | 4933 | G/C (0.0–0.1) | A/G (0.0–0.2) | T/C (0.5–0.6) | T (0.0) | T (1.0) | T/C (0.5–0.6) |
| | 4950 | G (0.0) | A (0.0) | T/C (0.5–0.6) | T (0.0) | T (1.0) | T/C (0.7) |
| | 4934 | G/C (0.3–0.4) | A/G (0.4–0.6) | T/C (0.7) | T/C (0.2–0.3) | G/T (0.4–0.7) | T/C (0.3) |
| | 4951 | G/C (0.3–0.4) | A/G (0.4–0.5) | T/C (0.5) | T/C (0.3–0.4) | G/T (0.3–0.6) | T/C (0.1) |
DNA preparations originated from two different rabbit passages. Relative proportions of alleles not stated in the reference genome are shown in parentheses as derived from repeated Sanger sequencing.
(a) The first nucleotide corresponds to the sequence published in the SS14 genome sequence CP004011.1 [16].
(b) DNA preparations 4950 and 4951 were used for whole genome sequencing of the TPA SS14 strain by Matějková et al. [14]; preparation 4951 was used for re-sequencing of this strain [16].
(c) Heterogeneous positions identified in this study (Table 2).
(d) Heterogeneous positions identified by Matějková et al. [14].
Table 4. Comparison of heterogeneous positions identified in TPA SS14 strain by Matějková et al. [14] and by the automated pipeline used in this study.
| Gene | Genome position in the SS14 genome CP000805.1 (CP004011.1) | Heterogeneity identified by Matějková et al. [ | Nucleotide frequency identified in this study | Heterogeneity detected in Illumina reads |
|---|---|---|---|---|
| TPASS_20117 | 135098 (135108) | G or C (5/6) | G or C (32/12) | yes |
| TPASS_20117 | 135107 (135117) | T or C (3/4) | T or C (50/1) | yes |
| TPASS_20117 | 135235 (135245) | G or A (2/10) | A (46) | no |
| TPASS_20117 | 135239 (135249) | C or T (2/10) | T (49) | no |
| TPASS_20117 | 135251 (135261) | A or G (6/6) | A or G (41/11) | yes |
| TPASS_20402 | 427435 (428628) | C or T (NA) | C or T (15/21) | yes |
| TPASS_20402 | 427737 (428930) | G or T (NA) | G or T (25/14) | yes |
| TPASS_20620 | 671746 (673228) | T or C (9/3) | T (23) | no |
| TPASS_20620 | 671751 (673233) | T or G (19/10) | T (22) | no (but detected by 454) |
| TPASS_20620 | 671753 (673235) | T or C (19/10) | T (22) | no (but detected by 454) |
| TPASS_20620 | 671763 (673245) | C or T (8/4) | C or T (24/5) | yes |
| TPASS_20620 | 672286 (673768) | G or A (4/12) | A (29) | no |
| Upstream of TPASS_20620 | 672916–7 (674399–674400) | (-) or C (6/6) | (-) or C (7/5) | yes |
| Upstream of TPASS_20620 | 672944 (674427) | A or G (14/6) | A (14) | no |
| TPASS_20621 | 673425 (674908) | C or T (2/8) | T (44) | no |
| TPASS_20621 | 673428 (674911) | A or G (2/8) | G (44) | no |
| TPASS_20971 | 1054447 (1056002) | T or C (NA) | T or C (35/3) | yes |
| TPASS_21029 | 1123796 (1125352) | G or A (5/6) | G or A (24/18) | yes |
(a) Additional intrastrain heterogeneous genome positions identified by Matějková et al. [14], including 135141, 135144, 135149, 135220, 135227, 671982, 672004, 672016, 672025, 672026, 672027, 672028, 672036, 672039, 672040, 672041, 672042, 672043, 672044, 672154, 673088, 673119, 673511, 673545, 673550, and 673554 (according to CP000805.1), were located in paralogous regions and were therefore excluded from the automated pipeline (S2 Table).
(b) Numbers in parentheses show numbers of sequenced clones [14] or nucleotide frequency within individual Illumina sequence reads (this study); NA: not available.
(c) Not present in Table 2; heterogeneous positions were detected in raw Illumina sequencing reads but were excluded due to study criteria.
(d) These heterogeneous sites were not found among Illumina reads, but were identified among 454 reads (SRX000109).
(e) See also Table 3; independent DNA preparations showed clear differences in proportions of alternative alleles, ranging from 0.1 to 0.7.
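The counts in parentheses (footnote b) can be converted into proportions, which makes them easier to compare with allele proportions reported elsewhere in the study. The following is a minimal sketch under stated assumptions: the two counts are taken to be listed in the same order as the two alleles, and the minor allele is simply the one with fewer reads or clones. The helper name `minor_allele_fraction` is hypothetical and is not part of the study's pipeline.

```python
import re

# "X or Y (n1/n2)": two alleles with one count each (assumed to be in the same order).
PAIR = re.compile(r"^\s*\S+\s+or\s+\S+\s+\((\d+)/(\d+)\)\s*$")
# "X (n)": a single allele observed in all n reads.
SINGLE = re.compile(r"^\s*\S+\s+\((\d+)\)\s*$")


def minor_allele_fraction(entry: str) -> float:
    """Return the fraction of reads (or clones) carrying the rarer allele.

    Raises ValueError for entries without usable counts, such as "C or T (NA)".
    """
    pair = PAIR.match(entry)
    if pair:
        first, second = int(pair.group(1)), int(pair.group(2))
        if first + second == 0:
            raise ValueError(f"no reads in entry: {entry!r}")
        return min(first, second) / (first + second)
    if SINGLE.match(entry):
        return 0.0  # only one allele was observed
    raise ValueError(f"cannot parse entry: {entry!r}")


if __name__ == "__main__":
    for cell in ["G or C (32/12)", "(-) or C (7/5)", "A (46)"]:
        print(cell, "->", round(minor_allele_fraction(cell), 3))
```

Entries without counts, such as `C or T (NA)`, are rejected rather than guessed, so missing data cannot silently become a zero proportion.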