Christopher R E McEvoy, Ruben Cloete, Borna Müller, Anita C Schürch, Paul D van Helden, Sebastien Gagneux, Robin M Warren, Nicolaas C Gey van Pittius.
Abstract
Mycobacterium tuberculosis complex (MTBC) genomes contain two large gene families termed pe and ppe. The function of pe/ppe proteins remains enigmatic, but studies suggest that they are secreted or cell surface associated and are involved in bacterial virulence. Previous studies have also shown that some pe/ppe genes are polymorphic, a finding that suggests involvement in antigenic variation. Using comparative sequence analysis of 18 publicly available MTBC whole genome sequences, we aligned 33 pe (excluding pe_pgrs) and 66 ppe genes to detect the frequency and nature of genetic variation. We supplemented this work with whole gene sequencing of 14 pe/ppe genes (including 5 pe_pgrs genes) in a cohort of 40 diverse and well-defined clinical isolates covering all the main lineages of the M. tuberculosis phylogenetic tree. We show that nsSNP frequencies in pe (excluding pgrs) and ppe genes are 3.0 and 3.3 times higher, respectively, than in non-pe/ppe genes, and that numerous other mutation types are also present at high frequency. It has previously been shown that non-pe/ppe M. tuberculosis genes display a remarkably low level of purifying selection. Here, we show that, compared with these genes, pe/ppe genes display a further reduction in selection pressure that suggests neutral evolution. This is inconsistent with the positive selection pressure of "classical" antigenic variation. Finally, by analyzing such a large number of genes, we were able to detect large differences in mutation type and frequency both between individual genes and between gene subfamilies. The high variation rates and absence of selective constraints provide valuable insights into potential pe/ppe function. Since pe/ppe proteins are highly antigenic and have been studied as potential vaccine components, these results should also inform aspects of M. tuberculosis vaccine design.
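The nsSNP comparison above rests on two steps: classifying each SNP as synonymous or nonsynonymous, then normalising nonsynonymous counts so that gene sets of different total length can be compared. The Python sketch below illustrates both steps under stated assumptions: the helper names (`classify_snp`, `nssnps_per_kb`, `fold_difference`) are hypothetical, the coding sequence is assumed to be in frame from its first base, nonsense changes are counted as nonsynonymous, and normalising per kilobase is a simplification chosen here; the authors' exact normalisation is not given in this abstract.

```python
from itertools import product

# Codon table for the standard genetic code. NCBI translation table 11 (bacterial),
# relevant to M. tuberculosis, uses the same codon-to-amino-acid assignments.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a SNP at 0-based position `pos` of an in-frame coding sequence.

    Returns "synonymous" or "nonsynonymous"; nonsense changes (to a stop
    codon, shown as "*") count as nonsynonymous in this sketch.
    """
    cds = cds.upper()
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls outside a complete codon")
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


def nssnps_per_kb(cds: str, snps: list[tuple[int, str]]) -> float:
    """Nonsynonymous SNPs per kilobase of coding sequence (assumed normalisation)."""
    n_nonsyn = sum(classify_snp(cds, pos, alt) == "nonsynonymous" for pos, alt in snps)
    return n_nonsyn / (len(cds) / 1000)


def fold_difference(rate_gene_set: float, rate_background: float) -> float:
    """How many times higher one gene set's nsSNP rate is than the background rate."""
    return rate_gene_set / rate_background


# Toy example (illustrative data only): ATG GCT GAA encodes M-A-E.
toy_cds = "ATGGCTGAA"
print(classify_snp(toy_cds, 5, "C"))  # synonymous: GCT -> GCC, still alanine
print(classify_snp(toy_cds, 3, "A"))  # nonsynonymous: GCT -> ACT, alanine -> threonine
```

In a real comparison one would pool nonsynonymous counts and coding lengths across all genes in each set (for example all pe genes versus all non-pe/ppe genes) before calling `fold_difference`, rather than comparing single genes.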
Year: 2012 PMID: 22496726 PMCID: PMC3319526 DOI: 10.1371/journal.pone.0030593
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Phylogenetic reconstruction of the evolutionary relationships between the members of the pe and ppe protein families.
A. Phylogeny of the ppe protein family. The tree was constructed from a phylogenetic analysis of the 180-aa N-terminal domains of the ppe proteins and was rooted to the outgroup Rv3873 (ppe68), which has been shown to be the first ppe insertion into the ESAT-6 (esx) gene clusters [8]. B. Phylogeny of the pe protein family. The tree was constructed from a phylogenetic analysis of the 110-aa N-terminal domains of the pe proteins and was rooted to the outgroup Rv3872 (pe35), which has been shown to be the first pe insertion into the ESAT-6 (esx) gene clusters [8]. Both panels reproduced from reference 8 with permission of the authors.
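The caption does not say which tree-building method was applied to the N-terminal domains. As a hedged illustration of the first step of any distance-based approach (neighbour-joining is one common example), the sketch below computes pairwise p-distances between domain sequences that are assumed to be already aligned. The function names and toy sequences are hypothetical, and skipping gapped columns is one of several common conventions, not necessarily the one used for Figure 1.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences.

    Columns where either sequence has a gap ("-") are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of named, aligned domains."""
    return {
        (i, j): p_distance(aligned[i], aligned[j])
        for i, j in combinations(sorted(aligned), 2)
    }


# Toy aligned N-terminal fragments (hypothetical, not real pe/ppe sequences).
toy_domains = {"seq_a": "MKLV", "seq_b": "MKIV", "seq_c": "M-IV"}
for pair, dist in distance_matrix(toy_domains).items():
    print(pair, round(dist, 3))
```

Rooting with an outgroup such as Rv3873 (ppe68) or Rv3872 (pe35) is a separate step, applied to the tree once it has been built from distances like these.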
Figure 2. Sequence variation levels in ppe and pe genes.
A. Sequence variation in 64 ppe genes. Synonymous variations have been ignored. The Y axis shows the proportion of sequences with variation predicted to result in amino acid changes; a value of 1 indicates that all analysed sequences were unique. On average, 15.2 genomes were analysed per gene. Genes are grouped by subfamily [8], indicated by colour and separated by dotted lines. Each vertical bar is subdivided into micromutations (nsSNPs, frameshifts, small in-frame indels) in dark shading and macromutations (homologous recombination, IS6110 integration, partial and whole gene deletions) in light shading. The ppe38 and ppe50 genes were not included because of hypervariability at the macromutational level [26], [30] and the difficulty of establishing a consensus sequence. For details of all variations detected, see Tables S1 and S2. B. Sequence variation in 33 pe (excluding pgrs) genes. Synonymous variations have been ignored. On average, 16.5 isolates were analysed per gene. The genes from subfamily V (pgrs subfamily, yellow) are classified as members of this subfamily by their N-terminal amino acid sequences [8] but lack the long PGRS C-terminal region. For details of all variations detected, see Tables S1 and S3.
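The caption states only that a value of 1 means every analysed sequence was unique. One normalisation consistent with that, assumed here rather than taken from the paper, is (k - 1)/(n - 1) for n protein sequences containing k distinct variants; it is 0 when all sequences are identical. Working on protein sequences means synonymous changes drop out automatically, in line with the caption. The sketch also shows a column-wise majority-rule consensus, one simple way to build the kind of consensus sequence the caption refers to; both function names and the toy data are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned: list[str]) -> str:
    """Column-wise majority-rule consensus of equal-length aligned sequences.

    Ties go to the residue seen first in that column (Counter.most_common order).
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def variability_index(proteins: list[str]) -> float:
    """(k - 1) / (n - 1) for n sequences with k distinct variants.

    ASSUMED normalisation: 0 when all sequences are identical,
    1 when every sequence is unique.
    """
    n = len(proteins)
    if n < 2:
        raise ValueError("need at least two sequences")
    k = len(set(proteins))
    return (k - 1) / (n - 1)


# Toy protein fragments (hypothetical data).
toy = ["MKV", "MKI", "MRV"]
print(majority_consensus(toy))  # MKV
print(variability_index(toy))   # 1.0, all three sequences are unique
```

Splitting each bar into micromutations and macromutations, as in the figure, would additionally require per-sequence variant annotations, which this sketch does not model.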
Details of the 18 isolates whose whole genome sequences were used for in silico comparative gene analysis.
| Isolate | Lineage | Reference |
| T92 | Lineage 1, PGG1, EAI family | |
| T17 | Lineage 1, PGG1, EAI family | |
| T46 | Lineage 1, PGG1, EAI family | |
| EAS054 | Lineage 1, PGG1, EAI family | |
| 94_M4241A | Lineage 2, PGG1, Beijing family | |
| 02_1987 | Lineage 2, PGG1, Beijing family | |
| T85 | Lineage 2, PGG1, Beijing family | |
| C strain | Lineage 4, PGG2, low copy clade | |
| CDC1551 | Lineage 4, PGG2, low copy clade | |
| Haarlem | Lineage 4, PGG2, Haarlem family | |
| F11 | Lineage 4, PGG2, LAM family | |
| GM1503 | Lineage 4, PGG2, LAM family | |
| KZN1435 | Lineage 4, PGG2, LAM family | |
| 98-R604_INH-RIF-EM | Lineage 4, PGG2, LAM family | |
| H37Rv | Lineage 4, PGG3 | |
| CPHL_A | Lineage 5, PGG1, West Africa-1 ( | |
| K85 | Lineage 6, PGG1, West Africa-2 ( | |
| | Animal lineage | |
Each analysed genome sequence is listed along with its lineage number [78], Principal Genetic Group (PGG) [2] and family group.
Details of clinical isolates used in this study.
| Isolate | Lineage, PGG, spoligotype family | South African IS6110 lineage |
| SAWC 1659 | 1, PGG1, EAI | - |
| SAWC 2493 | 1, PGG1, EAI | - |
| SAWC 4981 | 1, PGG1, EAI | - |
| SAWC 2803 | 3, PGG1, CAS | F34 |
| SAWC 2240 | 3, PGG1, CAS | F20 |
| SAWC 2666 | 3, PGG1, CAS | F33 |
| SAWC 974 | 3, PGG1, CAS | F25 |
| SAWC 2088 | 2, PGG1, Atypical Beijing | F31 |
| SAWC 2701 | 2, PGG1, Atypical Beijing | F27 |
| SAWC 2076 | 2, PGG1, Typical Beijing | F29 |
| SAWC 1430 | 4, PGG2 | F3 |
| SAWC 3656 | 4, PGG2, LAM | F26 |
| SAWC 2576 | 4, PGG2, LAM | F15 |
| SAWC 2525 | 4, PGG2, LAM | F9 |
| SAWC 1815 | 4, PGG2, LAM | F11 |
| SAWC 1733 | 4, PGG2, LAM | F13 |
| SAWC 3100 | 4, PGG2, LAM | F14 |
| SAWC 1595 | 4, PGG2, Quebec/S | F28 |
| SAWC 198 | 4, PGG2, “1 bander” | F110 |
| SAWC 2073 | 4, PGG2, LCC – “2 bander” | F120 |
| SAWC 233 | 4, PGG2, LCC – “3 bander” | F130 |
| SAWC 861 | 4, PGG2, LCC – “4 bander” | F140 |
| SAWC 1162 | 4, PGG2, LCC – “5 bander” | F150 |
| SAWC 716 | 4, PGG2, Pre-Haarlem | F19 |
| SAWC 1748 | 4, PGG2, Pre-Haarlem | F24 |
| SAWC 1127 | 4, PGG2, Haarlem-like | F6 |
| SAWC 103 | 4, PGG2, Haarlem-like | F7 |
| SAWC 386 | 4, PGG2, Haarlem | F1 |
| SAWC 1645 | 4, PGG2, Haarlem | F10 |
| SAWC 1841 | 4, PGG2, Haarlem | F4 |
| SAWC 2185 | 4, PGG2, Haarlem | F2 |
| SAWC 239 | 4, PGG3, T | F22 |
| SAWC 2901 | 4, PGG3, T | F16 |
| SAWC 1608 | 4, PGG3, T | F5 |
| SAWC 1109 | 4, PGG3, T | F23 |
| SAWC 4302 | 4, PGG3, T | F18 |
| SAWC 1956 | 4, PGG3, T | F17 |
| SAWC 1290 | 4, PGG3, T | F21 |
| SAWC 300 | 4, PGG3, T | F12 |
| SAWC 1870 | 4, PGG3, T | F8 |
Each clinical isolate is listed with its lineage number [78], PGG [2], spoligotype family [88] and South African IS6110 lineage [84].
Details of the pe and ppe genes examined by whole gene sequencing.
| Gene | Rv number | Size in H37Rv (bp) | Sublineage | Variability in literature | Comments |
| | Rv | 300 | I | No data. | Ancestral pe protein. Present in RD1 region. Highly immunogenic, eg |
| | Rv | 303 | IV | Invariable | B cell responses in subgroups of patients |
| | Rv | 1407 | V (PGRS subfamily) | No data. | Atypical sublineage V protein. Not pgrs. |
| | Rv | 2772 | V (PGRS subfamily) | Highly variable | Upregulated in mouse model |
| | Rv | 1374 | V (PGRS subfamily) | Known to undergo homologous recombination with | Highly upregulated during the early stages of |
| | Rv | 1476 | V (PGRS subfamily) | Highly variable | Downregulated in mouse model |
| | Rv | 1497 | V (PGRS subfamily) | Highly variable | Localised in cell wall |
| | Rv | 1515 | V (PGRS subfamily) | No data. | Elicits strong antibody response |
| | Rv | 1107 | I | No data. | Ancestral ppe protein. |
| | Rv | 1671 | II (PPW subfamily) | No data. | PPW subfamily. |
| | Rv | 1149 | IV (SVP subfamily) | Limited diversity. Alteration in Beijing isolates | Variable expression in clinical isolates |
| | Rv | 1464 | V (MPTR subfamily) | No data. | Ancestral ppe MPTR protein. |
| | Rv | 1743 | V (MPTR subfamily) | Variable in clinical isolates | Elicits a high humoral and low T cell response |
| | Rv | 1749 | V (MPTR subfamily) | No data. | MPTR protein. |
As defined in reference [ .
Each gene sequenced in this study is listed with its phylogenetic position within its family and any information on its protein's function available in the literature.