Iñaki Comas, Jaidip Chakravartti, Peter M Small, James Galagan, Stefan Niemann, Kristin Kremer, Joel D Ernst, Sebastien Gagneux.
Abstract
Mycobacterium tuberculosis is an obligate human pathogen capable of persisting in individual hosts for decades. We sequenced the genomes of 21 strains representative of the global diversity and six major lineages of the M. tuberculosis complex (MTBC) at 40- to 90-fold coverage using Illumina next-generation DNA sequencing. We constructed a genome-wide phylogeny based on these genome sequences. Comparative analyses of the sequences showed, as expected, that essential genes in MTBC were more evolutionarily conserved than nonessential genes. Notably, however, most of the 491 experimentally confirmed human T cell epitopes showed little sequence variation and had a lower ratio of nonsynonymous to synonymous changes than seen in essential and nonessential genes. We confirmed these findings in an additional data set consisting of 16 antigens in 99 MTBC strains. These findings are consistent with strong purifying selection acting on these epitopes, implying that MTBC might benefit from recognition by human T cells.
Year: 2010 PMID: 20495566 PMCID: PMC2883744 DOI: 10.1038/ng.590
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Strains used in this study, sequencing coverage, and number of raw and filtered SNPs after comparison to the H37Rv reference genome
| Strain | Lineage | Origin | Average mapped coverage (fold) | Number of reads | Percent genome covered | Raw SNPs | Filtered SNPs |
|---|---|---|---|---|---|---|---|
| MTB_95_0545 | Lineage 1 | Laos | 77.37 | 7,621,946 | 99.75 | 3,478 | 2,017 |
| MTB_K21 | Lineage 1 | Zimbabwe | 77.99 | 7,112,888 | 99.29 | 2,853 | 2,151 |
| MTB_K67 | Lineage 1 | Comoro Islands | 78.29 | 7,097,284 | 98.95 | 2,943 | 2,070 |
| MTB_K93 | Lineage 1 | Tanzania | 65.52 | 6,017,391 | 99.22 | 2,949 | 2,041 |
| MTB_T17 | Lineage 1 | The Philippines | 72.59 | 7,130,412 | 99.36 | 3,788 | 1,988 |
| MTB_T92 | Lineage 1 | The Philippines | 46.01 | 5,068,053 | 98.85 | 4,080 | 1,994 |
| MTB_00_1695 | Lineage 2 | Japan | 77.92 | 7,394,236 | 99.02 | 2,875 | 1,351 |
| MTB_98_1833 | Lineage 2 | China | 64.49 | 6,395,114 | 99.1 | 2,962 | 1,361 |
| MTB_M4100A | Lineage 2 | South Korea | 40.47 | 4,022,290 | 98.94 | 3,316 | 1,354 |
| MTB_T67 | Lineage 2 | China | 78.77 | 7,616,603 | 98.73 | 2,820 | 1,343 |
| MTB_T85 | Lineage 2 | China | 61.65 | 6,159,284 | 99.04 | 3,046 | 1,377 |
| MTB_91_0079 | Lineage 3 | Ethiopia | 74.03 | 7,228,038 | 99.14 | 2,920 | 1,363 |
| MTB_K49 | Lineage 3 | Tanzania | 75.52 | 6,845,266 | 99.25 | 2,195 | 1,416 |
| H37Rv | Lineage 4 | USA | Reference | | | | |
| MTB_4783_04 | Lineage 4 | Sierra-Leone | 78.12 | 7,466,814 | 98.78 | 1,559 | 741 |
| MTB_GM_1503 | Lineage 4 | The Gambia | 82.26 | 7,891,933 | 99.08 | 2,283 | 782 |
| MTB_K37 | Lineage 4 | Uganda | 59.86 | 5,480,451 | 98.85 | 2,496 | 822 |
| MAF_11821_03 | Lineage 5 | Sierra-Leone | 78.22 | 7,491,737 | 99.02 | 3,741 | 2,102 |
| MAF_5444_04 | Lineage 5 | Ghana | 79.75 | 7,578,690 | 98.92 | 3,686 | 2,079 |
| MAF_4141_04 | Lineage 6 | Sierra-Leone | 72.62 | 7,027,143 | 98.61 | 3,886 | 2,180 |
| MAF_GM_0981 | Lineage 6 | The Gambia | 76.39 | 7,350,873 | 99 | 4,451 | 2,213 |
| MTB_K116 | | Somalia | 93.01 | 6,544,254 | 96.32 | 19,008 | 14,730 |
| Total MTBC | | | | | | 62,327 | 32,745 |
Notes:
Lineage: defined as in ref. 8.
SNPs: compared to the reference genome H37Rv.
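The "Total MTBC" row can be checked against the per-strain counts. The sketch below is only a consistency check on the numbers printed in the table, not part of the authors' pipeline; the dictionary name `snps` is illustrative. MTB_K116 is left out because the printed totals only match when that row is excluded.

```python
# (raw SNPs, filtered SNPs) per sequenced MTBC strain, copied from the table.
# H37Rv is the reference and has no counts; MTB_K116 is deliberately excluded.
snps = {
    "MTB_95_0545": (3478, 2017), "MTB_K21": (2853, 2151),
    "MTB_K67": (2943, 2070), "MTB_K93": (2949, 2041),
    "MTB_T17": (3788, 1988), "MTB_T92": (4080, 1994),
    "MTB_00_1695": (2875, 1351), "MTB_98_1833": (2962, 1361),
    "MTB_M4100A": (3316, 1354), "MTB_T67": (2820, 1343),
    "MTB_T85": (3046, 1377), "MTB_91_0079": (2920, 1363),
    "MTB_K49": (2195, 1416), "MTB_4783_04": (1559, 741),
    "MTB_GM_1503": (2283, 782), "MTB_K37": (2496, 822),
    "MAF_11821_03": (3741, 2102), "MAF_5444_04": (3686, 2079),
    "MAF_4141_04": (3886, 2180), "MAF_GM_0981": (4451, 2213),
}

raw_total = sum(raw for raw, _ in snps.values())
filtered_total = sum(filtered for _, filtered in snps.values())

print(len(snps), raw_total, filtered_total)  # 20 62327 32745
```

The 20 sequenced strains plus the H37Rv reference give the 21 MTBC genomes referred to in the abstract.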
Figure 1. Neighbour-joining phylogeny based on 9,037 variable common nucleotide positions across 21 human M. tuberculosis complex genome sequences. The tree is rooted with M. canettii, the closest known outgroup. Node support following 1,000 bootstrap replications is indicated. Branches are coloured according to the six main phylogeographic lineages of MTBC defined previously (refs. 3, 7-8). Highly congruent topologies were obtained by maximum likelihood and Bayesian inference (Supplementary Fig. 1).
Figure 2. Average gene-by-gene nucleotide diversity across three gene classes. Boxplot indicating median (horizontal line), interquartile range (box), and minimum and maximum values (whiskers).
Distribution of synonymous and non-synonymous SNPs in gene concatenates
| Gene category | Length of concatenate (bp) | Nonredundant nonsyn SNPs (Measure A) | Nonredundant syn SNPs (Measure A) | nonsyn/syn (Measure A) | Median dN/dS (Measure B) | Range (Measure B) |
|---|---|---|---|---|---|---|
| Essential | 907,584 | 1,124.83 | 755.17 | 1.49 | 0.53 | 0.45-0.67 |
| Non-essential | 2,674,329 | 4,392.51 | 2,338.49 | 1.88 | 0.65 | 0.56-0.78 |
| Antigens | 81,660 | 126.5 | 87.5 | 1.45 | 0.57 | 0.17-1.15 |
| Epitopes | 12,234 | 19 | 12 | 1.58 | na | na |
| Epitopes, outlier antigens excluded | 11,088 | 9 | 12 | 0.75 | na | na |
| Non-epitopes, outlier antigens excluded | 68,556 | 106.5 | 75.5 | 1.41 | na | na |
Notes:
The number of non-redundant synonymous and non-synonymous SNPs after mapping the changes onto the phylogeny shown in Figure 1. An overall dN/dS was calculated based on these SNPs and is shown in Figure 3 (Measure A; see Materials and Methods).
Calculated using Measure B. The median dN/dS was calculated from the 21 strain specific dN/dS values. This measure of dN/dS could only be calculated for the essential, non-essential and antigen categories because in the epitope and non-epitope concatenates some strains had zero values for synonymous or non-synonymous changes.
After exclusion of the three outlier antigens esxH, pstS1, and Rv1986 (see main text).
na: not applicable.
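The nonsyn/syn column can be reproduced directly from the two SNP-count columns. The sketch below does only that, identifying each row by its concatenate length; the names `rows` and `ratio` are illustrative. It does not compute dN/dS, which additionally normalises each count by the number of nonsynonymous or synonymous sites available in the sequence. The `None` branch illustrates why a per-strain ratio is undefined when a strain has no synonymous changes, the situation the notes describe for the epitope and non-epitope concatenates.

```python
def ratio(nonsyn, syn):
    """Return the nonsyn/syn ratio, or None when it is undefined (syn == 0)."""
    if syn == 0:
        return None
    return nonsyn / syn

# (concatenate length in bp, nonsyn SNPs, syn SNPs, reported nonsyn/syn),
# copied from the table above.
rows = [
    (907_584, 1124.83, 755.17, 1.49),
    (2_674_329, 4392.51, 2338.49, 1.88),
    (81_660, 126.5, 87.5, 1.45),
    (12_234, 19, 12, 1.58),
    (11_088, 9, 12, 0.75),
    (68_556, 106.5, 75.5, 1.41),
]

# Recompute each ratio and compare it with the value printed in the table.
computed = [round(ratio(nonsyn, syn), 2) for _, nonsyn, syn, _ in rows]
reported = [r for *_, r in rows]

for (length, *_), c, r in zip(rows, computed, reported):
    print(f"{length:>9} bp  computed={c:.2f}  reported={r:.2f}")
```

All six recomputed ratios agree with the reported column to two decimal places.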
Figure 3. Ratio of the rates of non-synonymous to synonymous substitutions (dN/dS) in various gene classes of MTBC. Overall dN/dS was calculated based on the number of non-redundant synonymous and non-synonymous changes after comparing each of the 21 MTBC genomes to the inferred most likely recent common ancestor of MTBC. This shows that essential genes are more conserved than non-essential genes, and that antigens are as conserved as essential genes. Figures for the epitope and non-epitope regions refer to the calculations after excluding the three outlier antigens esxH, pstS1, and Rv1986.
Figure 4. Number of variable amino acid positions in 491 human T cell epitopes of MTBC. This demonstrates the remarkable lack of genetic variability among the regions of the genome that interact with the human immune system.