Xinmiao Jia1, Li Yang1, Mengxing Dong1, Suting Chen2, Lingna Lv2, Dandan Cao3, Jing Fu1, Tingting Yang1, Ju Zhang3, Xiangli Zhang1, Yuanyuan Shang2, Guirong Wang2, Yongjie Sheng4, Hairong Huang2, Fei Chen5.
Abstract
Tuberculosis, caused by the Mycobacterium tuberculosis complex (MTBC), now exceeds HIV as the leading infectious cause of mortality. MTBC strains have highly conserved genome sequences (similarity >99%) but dramatically different phenotypes. To analyze the relationship between genotype and phenotype, we conducted a comparative genomic analysis of 12 MTBC strains representing different lineages (i.e., Mycobacterium bovis; M. bovis BCG; M. microti; M. africanum; M. tuberculosis H37Rv; M. tuberculosis H37Ra; and six M. tuberculosis clinical isolates). The analysis focused on three aspects of pathogenicity: host association, virulence, and epitope variations. Host association analysis indicated that eight mce3 genes, two enoyl-CoA hydratases, and five PE/PPE family genes were present only in human isolates; these may have roles in host-pathogen interactions. Fifteen SNPs in virulence factors (including five SNPs in three ESX secretion proteins) were found only in the Beijing strains, which might be related to their more virulent phenotype. A comparison between the virulent H37Rv and non-virulent H37Ra strains revealed three SNPs that were likely associated with the virulence attenuation of H37Ra: S219L (PhoP), A219E (MazG), and a newly identified I228M (EspK). Additionally, a comparison of animal-associated MTBC strains showed that the deletion of the first four genes of RD1 (i.e., pe35, ppe68, esxB, esxA), rather than all eight genes, might play a central role in the virulence attenuation of animal isolates. Finally, by comparing epitopes among MTBC strains, we found that four epitopes were lost only in the Beijing strains; this may make them better able to evade the human immune system, leading to enhanced virulence.
Overall, our comparative genomic analysis of MTBC strains reveals the relationship between the highly conserved genotypes and the diverse phenotypes of MTBC, provides insight into pathogenic mechanisms, and facilitates the development of potential molecular targets for the prevention and treatment of tuberculosis.
Keywords: Mycobacterium tuberculosis complex (MTBC); PacBio; comparative genomics; epitope; host association; pathogenicity; tuberculosis (TB); virulence
Year: 2017 PMID: 28377903 PMCID: PMC5360109 DOI: 10.3389/fcimb.2017.00088
Source DB: PubMed Journal: Front Cell Infect Microbiol ISSN: 2235-2988 Impact factor: 5.293
Conserved genomes (genotypes) and diverse phenotypes of 12 MTBC strains.
| Modern strains | 27294/L4 | F1 | Human | 4,429,062 | 99.9845 | 4,400 | 3,834 | 110 | 253 | 2,208 | |
| 25177/L4 | F28 | 4,421,992 | 99.9893 | 4,366 | 3,837 | 92 | 252 | 2,205 | |||
| L4 | 22115 | 4,402,103 | 99.9112 | 4,356 | 3,827 | 840 | 253 | 2,201 | |||
| L4 | 37004 | 4,417,474 | 99.8955 | 4,375 | 3,820 | 855 | 252 | 2,189 | |||
| L4 | 22103 | 4,399,638 | 99.8498 | 4,345 | 3,809 | 1,033 | 253 | 2,175 | |||
| L3 | 26105 | 4,426,728 | 99.8796 | 4,393 | 3,833 | 1,504 | 251 | 2,261 | |||
| L2 | 2242 | 4,420,756 | 99.8322 | 4,428 | 3,839 | 1,434 | 251 | 2,128 | |||
| L2 | 2279 | 4,406,429 | 99.8436 | 4,400 | 3,839 | 1,514 | 252 | 2,181 | |||
| Ancient strains | 35711/L6 | 25 | Human (tropical Africa) | 4,388,515 | 99.7731 | 4,385 | 3,819 | 2,336 | 246 | 2,135 | |
| 19422/L8 | 12 | Voles and rodents | 4,370,890 | 99.7712 | 4,360 | 3,815 | 2,158 | 232 | 1,756 | ||
| 19210/L8 | 30 | Wide range of mammals especially cattle | 4,336,684 | 99.7680 | 4,312 | 3,804 | 2,345 | 238 | 2,137 | ||
| 35735/L8 | 26 | 4,353,641 | 99.7488 | 4,348 | 3,832 | 2,356 | 236 | 1,706 |
ANI: Average Nucleotide Identity;
Reference genome: H37Rv (.
Figure 1. Pair-wise comparisons of SNPs and orthologous genes in 12 MTBC strains. The plum red region represents the SNP number, and the blue region indicates the orthologous gene number. The colors darken as the numbers increase.
Human-associated strain specific genes.
| – | Rv0221 | 469 | COG4908R | Diacylglycerol O-acyltransferase | RD10 |
| echA1 | Rv0222 | 262 | COG1024I | Enoyl-CoA hydratase EchA1 | RD10 |
| PE_PGRS5 | Rv0297 | 591 | – | PE-PGRS family protein PE_PGRS5 | / |
| PPE7 | Rv0354c | 141 | – | PPE family protein PPE7 | / |
| galTb | Rv0619 | 181 | COG1085G | Galactose-1-phosphate uridylyltransferase GalTb | / |
| – | Rv1503c | 182 | COG0399M | TDP-4-oxo-6-deoxy-D-glucose aminotransferase | / |
| PE_PGRS31 | Rv1768 | 618 | COG5164 | PE-PGRS family protein PE_PGRS31 | RD14 |
| yrbE3A | Rv1964 | 265 | COG0767Q | Integral membrane protein | RD7 |
| yrbE3B | Rv1965 | 271 | COG0767Q | Integral membrane protein | RD7 |
| mce3A | Rv1966 | 425 | COG1463Q | Mce family protein Mce3A | RD7 |
| mce3B | Rv1967 | 342 | COG1463Q | Mce family protein Mce3B | RD7 |
| mce3C | Rv1968 | 410 | COG1463Q | Mce family protein Mce3C | RD7 |
| mce3D | Rv1969 | 423 | COG1463Q | Mce family protein Mce3D | RD7 |
| lprM | Rv1970 | 377 | COG1463Q | Mce family lipoprotein LprM | RD7 |
| mce3F | Rv1971 | 437 | COG1463Q | Mce family protein Mce3F | RD7 |
| – | Rv1972 | 191 | – | Mce associated membrane protein | RD7 |
| – | Rv1973 | 160 | – | Mce associated membrane protein | RD7 |
| – | Rv1974 | 125 | – | Membrane protein | RD7 |
| – | Rv1975 | 221 | COG2340S | Hypothetical protein | RD7 |
| – | Rv1976c | 135 | – | Hypothetical protein | RD7 |
| – | Rv1977 | 348 | COG0501O | Hypothetical protein | RD7 |
| – | Rv2073c | 249 | COG0300R | Oxidoreductase | RD9 |
| – | Rv2074 | 137 | – | Pyridoxamine 5'-phosphate oxidase | RD9 |
| – | Rv2227 | 233 | COG3826S | Hypothetical protein | / |
| echA18 | Rv3374 | 82 | COG1024I | Enoyl-CoA hydratase | / |
| ephA | Rv3617 | 322 | COG0596R | Epoxide hydrolase EphA | RD8 |
| – | Rv3618 | 395 | COG2141C | Monooxygenase | RD8 |
| PPE65 | Rv3621c | 413 | COG5651N | PPE family protein PPE65 | RD8 |
| PE32 | Rv3622c | 99 | – | PE family protein PE32 | RD8 |
Eight mce3 family genes, two enoyl-CoA hydratases, and five PE/PPE family genes are highlighted in red, green, and blue letters, respectively.
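The host-association comparison summarized in this table amounts to a set operation over per-strain gene content. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `host_specific_genes`, the strain labels, and the toy presence data are all hypothetical, and it assumes that "present only in human isolates" means present in every human isolate and absent from every other strain.

```python
def host_specific_genes(presence, hosts, target_host):
    """Return genes found in every strain of target_host and in no other strain.

    presence: dict mapping strain name -> set of gene identifiers (hypothetical input)
    hosts:    dict mapping strain name -> host label
    """
    target = [set(g) for s, g in presence.items() if hosts[s] == target_host]
    others = [set(g) for s, g in presence.items() if hosts[s] != target_host]
    if not target:
        return set()
    shared = set.intersection(*target)    # genes common to all target-host strains
    return shared - set().union(*others)  # drop genes seen in any other strain


# Toy data for illustration only; locus tags are borrowed from the table above,
# but the presence/absence pattern is invented.
presence = {
    "human_A": {"core_gene", "Rv1966", "Rv3622c"},
    "human_B": {"core_gene", "Rv1966", "Rv3622c", "accessory_gene"},
    "animal_A": {"core_gene", "accessory_gene"},
}
hosts = {"human_A": "human", "human_B": "human", "animal_A": "animal"}

print(sorted(host_specific_genes(presence, hosts, "human")))
# prints ['Rv1966', 'Rv3622c']
```

In practice the presence sets would come from an orthology analysis rather than exact identifier matching, which this sketch does not attempt.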
The non-synonymous mutations between Beijing strains and other MTBC strains on virulence factors.
| mce1D | Rv0172 | 563 (188) | T->C (I->T) | Mce family protein Mce1D |
| eccD3 | Rv0290 | 227 (76) | G->A (S->N) | ESX-3 secretion system protein EccD |
| eccD3 | Rv0290 | 283 (95) | G->A (A->T) | ESX-3 secretion system protein EccD |
| mce2F | Rv0594 | 1,295 (432) | A->G (N->S) | Mce family protein Mce2F |
| mmpL10 | Rv1183 | 1,222 (408) | A->G (T->A) | Transmembrane transport protein MmpL10 |
| plcC | Rv2349c | 1,081 (361) | G->T (G->C) | Phospholipase C |
| plcA | Rv2351c | 1,336 (446) | A->G (T->A) | Membrane-associated phospholipase A |
| mbtB | Rv2383c | 2,020 (674) | G->C (V->L) | Phenyloxazoline synthase |
| ppsA | Rv2931 | 3,581 (1194) | T->G (L->R) | Phthiocerol synthesis polyketide synthase type I PpsA |
| Mas | Rv2940c | 6,013 (2005) | A->C (T->P) | Multifunctional mycocerosic acid synthase |
| – | Rv2952 | 526 (176) | G->A (G->R) | Phthiotriol/phenolphthiotriol dimycocerosates methyltransferase |
| kefB | Rv3236c | 304 (102) | A->G (T->A) | Integral membrane transport protein |
| lipF | Rv3487c | 697 (233) | C->T (R->C) | Carboxylesterase LipF |
| papA2 | Rv3820c | 1,397 (466) | C->T (P->L) | Trehalose-2-sulfate acyltransferase |
| fadD23 | Rv3826 | 1,264 (422) | G->C (E->Q) | Long-chain-fatty-acid–CoA ligase FadD23 |
| espK | Rv3879c | 130 (44) | G->A (D->N) | ESX-1 secretion-associated protein EspK |
| espK | Rv3879c | 1,979 (660) | A->C (E->A) | ESX-1 secretion-associated protein EspK |
| eccC2 | Rv3894c | 1,949 (650) | A->G (D->G) | ESX-2 type VII secretion system protein EccC |
SNPs in red were further validated in ten other Beijing strains.
The SNPs between non-virulent H37Ra and virulent H37Rv.
| Rv0966c | 524 (175) | A->G(V->A) | Hypothetical protein |
| Rv0757(phoP) | 656 (219) | C->T(S->L) | Two component system response transcriptional positive regulator PhoP |
| Rv0658c | 224 (75) | A->G(L->P) | Integral membrane protein |
| Rv1021(mazG) | 656 (219) | C->A(A->E) | Nucleoside triphosphate pyrophosphohydrolase |
| Rv3879c(espK) | 684 (228) | G->C(I->M) | ESX-1 secretion-associated protein EspK |
The H37Rv genome is used as the reference genome for comparison;
The number in parentheses indicates the position of the mutation in the protein;
The letters in parentheses indicate the amino acid substitution.
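The protein positions given in parentheses in the SNP tables follow from the nucleotide positions by simple codon arithmetic: with 1-based coordinates inside a coding sequence, nucleotide n falls in codon ceil(n / 3). A small sketch, using a hypothetical helper name `codon_position` and checking it against the H37Ra/H37Rv rows above:

```python
def codon_position(nt_pos: int) -> int:
    """Map a 1-based nucleotide position within a CDS to its 1-based codon number."""
    if nt_pos < 1:
        raise ValueError("nucleotide position must be 1-based and positive")
    return (nt_pos + 2) // 3  # integer form of ceil(nt_pos / 3)


# (nucleotide position, protein position) pairs copied from the table above
rows = {
    "Rv0966c": (524, 175),
    "Rv0757 (phoP)": (656, 219),
    "Rv0658c": (224, 75),
    "Rv1021 (mazG)": (656, 219),
    "Rv3879c (espK)": (684, 228),
}
for gene, (nt, aa) in rows.items():
    assert codon_position(nt) == aa, gene
print("all table rows consistent")
# prints all table rows consistent
```

The sketch assumes positions are counted from the first base of the start codon on the coding strand, which is how the table values behave.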
Figure 2. A schematic diagram showing the RD1 distribution in 12 MTBC strains. Different colors represent different genes on RD1 (PE35, pink; PPE68, purple; esxB, blue; esxA, slate blue; espI, green; eccD1, yellow; espJ, orange; espK, red). Genes in the red dashed box indicate the four lost genes in the attenuated M. bovis BCG 26 and M. microti 12 strains. Genes upstream (eccCb1) and downstream (espJ) of RD1 are shown in gray.
Figure 3. Comparison of (A) T cell and (B) B cell epitopes in 12 MTBC strains. Duplicate epitopes were removed, and only epitopes with 100% identical matches were considered present in the strain. The x-axis refers to the strain name, and the vertical axis indicates the number of epitopes in the corresponding strain.
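The counting rule in this legend (remove duplicate epitopes, then count an epitope as present only on a 100% identical match) can be illustrated with exact substring matching. This is a minimal sketch under that assumption; the function name `count_present_epitopes` and the peptide strings are invented for illustration and are not taken from the study.

```python
def count_present_epitopes(epitopes, proteins):
    """Count unique epitope peptides that occur verbatim in at least one protein."""
    unique = set(epitopes)  # duplicate epitopes are removed first
    return sum(
        1
        for epitope in unique
        if any(epitope in protein for protein in proteins)  # exact match only
    )


# Hypothetical peptides and protein sequences
epitopes = ["MTEQQ", "MTEQQ", "AAAAK", "GGGWW"]
proteins = ["GSMTEQQWNF", "LLAAAAKLL"]

print(count_present_epitopes(epitopes, proteins))
# prints 2
```

A single amino acid substitution inside an epitope makes it count as absent under this rule, which is why point mutations can change the per-strain totals.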
Figure 4. Distribution of differential T cell (A) and B cell (B) epitope-clusters in 12 MTBC strains. Epitope-clusters present in all 12 MTBC strains with the same copy number were excluded. Each row represents an epitope-cluster, and each column indicates a strain. The color intensity shows the copy number of each epitope-cluster. Some epitope-clusters are enlarged in the right section of the figure to highlight differences among MTBC lineages.
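The exclusion rule in this legend (drop epitope-clusters that are present in every strain with the same copy number) can be written as a short filter. The sketch below assumes a nested dictionary of copy numbers with 0 meaning absent; the name `differential_clusters`, the cluster labels, and the counts are hypothetical.

```python
def differential_clusters(copy_numbers):
    """Keep clusters whose copy number differs between strains or that are absent somewhere.

    copy_numbers: dict mapping cluster -> dict mapping strain -> copy number (0 = absent)
    """
    kept = {}
    for cluster, counts in copy_numbers.items():
        values = list(counts.values())
        # "uniform" means present in every strain with one shared copy number
        uniform = bool(values) and min(values) > 0 and len(set(values)) == 1
        if not uniform:
            kept[cluster] = counts
    return kept


# Toy copy-number table with three strains
copy_numbers = {
    "cluster_1": {"s1": 1, "s2": 1, "s3": 1},  # same everywhere: excluded
    "cluster_2": {"s1": 2, "s2": 1, "s3": 1},  # copy number differs: kept
    "cluster_3": {"s1": 1, "s2": 0, "s3": 1},  # absent from s2: kept
}

print(sorted(differential_clusters(copy_numbers)))
# prints ['cluster_2', 'cluster_3']
```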
Figure 5. Distribution of some differential T-cell (A) and B-cell (B) epitopes and the corresponding antigen genes for 12 MTBC strains. The distributions of epitopes and of the corresponding antigens are shown in the top and bottom sections of the figure, respectively. Epitope and antigen copy numbers are indicated by the intensity of the color.