Frédéric Veyrier, Daniel Pletzer, Christine Turenne, Marcel A Behr.
Abstract
BACKGROUND: In the past decade, the availability of complete genome sequence data has greatly facilitated comparative genomic research aimed at addressing genetic variability within species. More recently, analysis across species has become feasible, especially in genera where genome sequencing projects of multiple species have been initiated. To understand the genesis of the pathogen Mycobacterium tuberculosis within a genus where the majority of species are harmless environmental organisms, we have used genome sequence data from 16 mycobacteria to look for evidence of horizontal gene transfer (HGT) associated with the emergence of pathogenesis. First, using multi-locus sequence analysis (MLSA) of 20 housekeeping genes across these species, we derived a phylogeny that serves as the basis for HGT assignments. Next, we performed alignment searches for the 3989 proteins of M. tuberculosis H37Rv against 15 other mycobacterial genomes, generating a matrix of 59835 comparisons, to look for genetic elements that were uniquely found in M. tuberculosis and closely related pathogenic mycobacteria. To assign when foreign genes were likely acquired, we designed a bioinformatic program called mycoHIT (mycobacterial homologue investigation tool) to analyze these data in conjunction with the MLSA-based phylogeny.
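The matrix-plus-phylogeny idea described in the abstract can be illustrated with a minimal sketch. This is not the mycoHIT program itself, whose internals are not given here. The function names, the clade ordering, and the rule "a protein with hits only in the closest clades was acquired after the more distant clades diverged" are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Figures taken from the abstract.
N_QUERY_PROTEINS = 3989  # M. tuberculosis H37Rv proteins
N_OTHER_GENOMES = 15     # other mycobacterial genomes searched


def matrix_size(n_proteins: int, n_genomes: int) -> int:
    """Number of protein-versus-genome comparisons in the hit matrix."""
    return n_proteins * n_genomes


def assign_acquisition_step(
    hits: Dict[str, bool], clades: List[List[str]]
) -> Optional[int]:
    """Hypothetical assignment rule (an assumption, not the published method).

    `hits` maps a genome name to True when the query protein has a
    homologue in that genome. `clades` lists groups of genomes ordered
    from the closest relatives of M. tuberculosis to the most distant.

    Returns the index k of the first clade lacking a homologue when every
    clade from k onward also lacks one. Returns None when the protein is
    found in every clade (probably ancestral) or when the pattern is
    patchy and does not fit a single acquisition event.
    """
    present = [any(hits.get(g, False) for g in clade) for clade in clades]
    if all(present):
        return None
    k = present.index(False)
    if any(present[k:]):
        return None  # patchy distribution, left unassigned in this sketch
    return k


# Example with made-up genome names.
example_clades = [["close_relative"], ["mid_a", "mid_b"], ["distant"]]
step = assign_acquisition_step({"close_relative": True}, example_clades)
total = matrix_size(N_QUERY_PROTEINS, N_OTHER_GENOMES)
```

Here `total` reproduces the 59835 comparisons quoted in the abstract, and `step` is 1 because the example protein is found only in the closest clade.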
Year: 2009 PMID: 19664275 PMCID: PMC3087520 DOI: 10.1186/1471-2148-9-196
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Phylogeny of Mycobacterium. The radial tree was generated using MEGA 4.1. The different mycobacterial lineages, based on slow-growing versus rapid-growing organisms, are also indicated. The arrow schematically indicates the positions of the different lists of HGT detected in this study (A, B, C and D) during the step-wise evolution of slow-growing mycobacteria. The scale represents the number of amino acid differences.
Numbers of genes detected as potential HGTs using different parameters:
| Thresholds | List A | List B | List C | List D | Total |
|---|---|---|---|---|---|
| "any hits" (> 1, < 0) | 142 | 90 | 42 | 274 | 548 |
| 95% percentile (48,53,53) | 134 | 83 | 41 | 274 | 532 |
| 90% percentile (56,61,61) | 112 | 74 | 39 | 274 | 499 |
| 85% percentile (66,66,66) | 87 | 65 | 31 | 274 | 457 |
Figure 2. Sub-classification of potential HGT. A) Number of proteins from the different lists (A, B, C and D) presenting the specified characteristics. The lists were obtained using the "any hits" threshold in mycoHIT; each predicted protein was blasted against the NCBI database, and the results were used to classify it in the specified category as described in Materials and Methods. B) Proportion of proteins from lists A, B, C and D that present the specified characteristics.
List of genes detected, with high confidence, to be horizontally transferred:
| Name of Gene | Annotation (essential genes) | Gene GC% | Presence of HGT vehicle (Yes or No) |
|---|---|---|---|
| Rv0082 to Rv0087 | | 69.1 | N |
| Rv0113 to Rv0115 | sedoheptulose-7-phosphate isomerase | 61.3 | N |
| Rv1006 | Glycosyl transferase | 65 | N |
| Rv1513/14c, Rv1516c, Rv1518 and Rv1520 | LOS locus (14c: | NA | Y |
| Rv1722 | Carboxylase | 67 | Y |
| Rv2067c | Methyltransferase | 59.5 | Y |
| Rv2387 | Permease ( | 61.8 | N |
| Rv2531 | Arginine Decarboxylase | 63.4 | Y |
| Rv2561/62 | H.P. | 59.9 | N |
| Rv2633c | Hemerythrin HHE cation binding domain scavenger | 60.8 | N |
| Rv2955c to Rv2957 and Rv2963 | PGL locus | NA | Y |
| Rv3528c | Methyltransferase | 48.6 | N |
| Rv3768 | Permease | 61.8 | Y |
| Rv3788 | Transcription Elongation GreA | 66.3 | N |
| Rv0104 | cAMP-kinase regulatory subunit | 63.8 | N |
| Rv0193c | H.P. | 61.6 | N |
| Rv0213c/14c | Methyltransferase and NadR | 63.4 | N |
| Rv0347 | H.P. ( | 62.1 | N |
| Rv0379 | SecE2 | 63.4 | N |
| Rv0520/21 | Methyltransferase | 62.1 | N |
| Rv0793 | Antibiotic monooxygenase | 66.3 | Y |
| Rv0899 | OmpA | 60.8 | N |
| Rv1192 | PGAP1 family ( | 67.8 | N |
| Rv1289 | H.P. | 60.0 | N |
| Rv1371 to Rv1376 | Fatty acid desaturase, Pks18, glycolipid sulfotransferase, TfuA like protein (71: | 63.4 | Y |
| Rv1500 to Rv1508c and Rv1525 | LOS locus | NA | Y |
| Rv1541c | H.P. | 63.5 | N |
| Rv1732 | Thioredoxin | 69.6 | N |
| Rv1749 | H.P. | 62.2 | N |
| Rv1995 | Hemerythrin HHE cation binding domain scavenger | 63 | Y |
| Rv2075c | C-type lectin domain | 68.3 | N |
| Rv2277 | Glycerophosphodiesterase (GdpD) ( | 61.9 | Y |
| Rv2289 | CDP-diacylglycerol pyrophosphatase | 57.5 | N |
| Rv2636 | chloramphenicol 3-O-phosphotransferase | 63.4 | N |
| Rv2761c | type I restriction/modification system | 63.7 | Y |
| Rv2949c | PGL locus: 4-hydroxybenzoate synthetase | 52.5 | Y |
| Rv2958c and Rv2962c | PGL locus: glycosyltransferase | 64.8 and 65.8 | Y |
| Rv3091 | patatin-like phospholipase family protein | 69.4 | N |
| Rv3172c | H.P. | 59.6 | N |
| Rv0325/26 | Methyltransferase (26: | 63.9 | N |
| Rv0611c | H.P. | 63.9 | Y |
| Rv0628c and Rv0874c | H.P. (28c: | 72.2 and 70.5 | Y and N |
| Rv1498c | LOS locus: Methyltransferase | 58.7 | Y |
| Rv1671 | H.P. | 62.1 | N |
| Rv2292c/93c | Methylthioadenosine nucleosidase | 63.9 | N |
| Rv2959c | PGL locus: Methyltransferase | 53.7 | Y |
| Rv3081 | H.P. | 64.8 | N |
| Rv3138 | Pyruvate formate lyase activating enzyme | 61.9 | N |
| Rv3373/74/75 | Enoyl-CoA Hydratase 18, Amidase (75: | 59.7 | Y |
| Rv0032/33 | BioF/AcpA | 61.6 | Y |
| Rv0059/60 | Appr1-P (60: | 60.8 | Y |
| Rv0078A | H.P. | 61.1 | N |
| Rv0329c/30c | Methyltransferase, TetR | 67.5 | N |
| Rv0987/88 | Adhesion/permease and hydroxymethyl-CoA reductase | 53.9 | N |
| Rv1045 to Rv1049 | H.P. (45 and 49: | 64.6 | Y |
| Rv1509 and Rv1515c | LOS locus: Methyltransferase | NA | Y |
| Rv1552 to Rv1555 | Fumarate reductase | 63.4 | Y |
| Rv1673c/74c | transglutaminase-like/ArsR | 64 | N |
| Rv2003c, Rv2008c and Rv2011c | Methyltransferase/ATPase, AAA+/H.P. GI4 | NA | Y |
| Rv2295 | H.P. | 64.5 | N |
| Rv2336/37c/38c | Molybdopterin thiamine synthesis (38c: | 58.8 | N |
| Rv2432c/33c | H.P. | 63.8 | N |
| Rv2491/92 | H.P. | 54.8 | Y |
| Rv2735 | H.P. | 56.5 | Y |
| Rv2804c and Rv2816c to Rv2826c | H.P. GI6 (17c: | 62.1 | Y |
| Rv2954c | PGL locus: Methyltransferase | 62.8 | Y |
| Rv2990c | Methyltransferase | 62.5 | N |
| Rv3122/23 | H.P. | 67.8 | N |
| Rv3189/90c | filamentous haemagglutinin-adhesin/H.P. | 62.0 | Y |
| Rv3376/77c/78c | Production of the halimane skeleton (76c: | 54.7 | Y |
| Rv3402c/03c/04c | Aminotransferase/FAD dependent oxidoreductase/formyltransferase | 62.6 | Y |
| Rv3471 | Cupin family protein | 65.9 | Y |
Neighboring genes were abbreviated using the last two numbers and the letter c (if required) for the second or third genes (e.g. Rv1513 and Rv1514c become Rv1513/14c). NA = Not Applicable (GC% not presented for sets of separated genes), H.P. = Hypothetical protein, GI = Giant island (see Additional file 1 – Figure S4). Genes previously described as essential for M. tuberculosis by transposon screens [17,22,23] are indicated in parentheses after the annotation.
Figure 3. Example of lineage-specific biochemical properties mediated via the acquisition of foreign genes. A) The lipooligosaccharide (LOS) locus is represented from Rv1494 to Rv1520. A graphical representation of the GC content, across a 110 kb DNA sequence, illustrates the GC% drop of the DNA comprising the indicated clusters of HGT. B) The phenolic glycolipid (PGL) locus is represented from Rv2949c to Rv2964. Again, a graphical representation of the GC content, across a 100 kb DNA sequence, illustrates the GC% drop of the foreign DNA. For visual reasons, some small genes have been labeled with their last two numbers (e.g. Rv2949c has been named 49c). In all cases, the genes colored in black are those detected as HGT (or HGT vehicle) and, unless otherwise stated, genes are from list D.
Figure 4. Example of anaerobic adaptation via the acquisition of a locus transferred several times in . A) Representation of the formate hydrogenase (hyc) locus in Nocardioides sp. A part of the Noca_4472 gene (pckA) has been duplicated on each side of the hyc locus, suggesting integration of the hyc locus by homologous recombination. B) Representation of the hyc locus in M. avium subsp. hominissuis compared with the corresponding region in the M. avium subsp. avium genome. The presence of a transposase and the split of radC into two pieces suggest integration of the hyc locus by transposition in M. avium subsp. hominissuis. C) Representation of the hyc locus in M. tuberculosis and the corresponding GC% increase. The graphical representation of the GC content, across a 135 kb DNA sequence, illustrates other HGT elements with GC% decrease, as indicated.
Figure 5. Example of a virulence locus, not yet functionally characterized, acquired via horizontal gene transfer. The Rv3375 locus is represented from Rv3374 to Rv3381c; genes colored in black are those detected as HGT (or HGT vehicle). A graphical representation of the GC content, across a 100 kb DNA sequence, illustrates the GC% drop of the DNA comprising the indicated clusters of HGT and accompanying transposases.
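The GC-content plots referred to in the captions of Figures 3 to 5 can be approximated with a sliding-window calculation. The sketch below is an assumption about how such a profile might be computed. The window size, the step, and the function names are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence (0.0 if empty)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def sliding_gc(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """GC% for each full window, returned as (start position, GC%) pairs."""
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        profile.append((start, gc_percent(seq[start:start + window])))
    return profile


def low_gc_windows(
    profile: List[Tuple[int, float]], reference_gc: float, drop: float
) -> List[int]:
    """Start positions of windows whose GC% is at least `drop` points
    below `reference_gc`; both values are chosen by the user."""
    return [start for start, gc in profile if gc <= reference_gc - drop]


# Toy sequence: a GC-rich stretch, an AT-rich insert, then GC-rich again.
toy = "GGCCGGCC" + "AATTAATT" + "GCGCGCGC"
toy_profile = sliding_gc(toy, window=8, step=8)
toy_low = low_gc_windows(toy_profile, reference_gc=65.0, drop=10.0)
```

On the toy sequence the middle window is flagged as low-GC. On a real genome, a window of a few hundred to a few thousand bases would be more typical, and a GC% drop alone is only suggestive of foreign origin.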