
Detecting the molecular scars of evolution in the Mycobacterium tuberculosis complex by analyzing interrupted coding sequences.

Caroline Deshayes, Emmanuel Perrodou, Daniel Euphrasie, Eric Frapy, Olivier Poch, Pablo Bifani, Odile Lecompte, Jean-Marc Reyrat.

Abstract

BACKGROUND: Computer-assisted analyses have shown that all bacterial genomes contain a small percentage of open reading frames with a frameshift or in-frame stop codon. We report here a comparative analysis of these interrupted coding sequences (ICDSs) in six isolates of M. tuberculosis, two of M. bovis and one of M. africanum, and question their phenotypic impact and evolutionary significance.
RESULTS: ICDSs were classified as "common to all strains" or "strain-specific". Common ICDSs are believed to result from mutations acquired before the divergence of the species, whereas strain-specific ICDSs were acquired after this divergence. Comparative analyses of these ICDSs therefore define the molecular signature of a particular strain, phylogenetic lineage or species, which may be useful for inferring phenotypic traits such as virulence and molecular relationships. For instance, in silico analysis shows that the W-Beijing lineage of M. tuberculosis, an emergent family involved in several outbreaks, is readily distinguishable from other phyla by its smaller number of common ICDSs, including at least one known to be associated with virulence. Our observation was confirmed by sequencing ICDSs in a panel of 21 clinical M. tuberculosis strains. This analysis further illustrates the divergence of the W-Beijing lineage from other phyla in terms of the number of full-length ORFs not containing a frameshift. We further show that ICDS formation is not associated with the presence of a mutated promoter, and suggest that promoter extinction is not the main cause of pseudogene formation.
CONCLUSION: The correlation between ICDSs, function and phenotypes could have important evolutionary implications. This study provides population geneticists with a list of targets that could undergo selective pressure and thus alter relationships between the various lineages of M. tuberculosis strains and their host. This approach could be applied to any closely related bacterial strains or species for which several genome sequences are available.

Year:  2008        PMID: 18325090      PMCID: PMC2277376          DOI: 10.1186/1471-2148-8-78

Source DB:  PubMed          Journal:  BMC Evol Biol        ISSN: 1471-2148            Impact factor:   3.260


Background

Recent in silico surveys showed that most bacterial genomes contain interrupted coding sequences (ICDSs) [1-3]. These ICDSs generally result from the insertion or deletion of nucleotides, which shifts the reading frame and splits the original coding sequence into two or more smaller open reading frames. These mutations may also result in a shift in reading frame that alters only the carboxy-terminus of the protein. ICDSs may be present in genes with known or unknown functions, or in hypothetical open reading frames [4]. Reported prokaryotic genomes have a mean of 74 ICDSs per genome, corresponding to 1 to 5% of the genes present, irrespective of genome size or GC content [2,3]. One of the few exceptions is the genome of M. leprae, which contains about 30% ICDSs, frequently described as pseudogenes [2,5]. The accumulation of mutations in this species is thought to be due to the loss of the proofreading activity of the DnaQ subunit of DNA polymerase III [6]. A similar sort of reductive evolution is also observed in M. ulcerans [7] and in species of the order Rickettsiales [8]. ICDSs may correspond to authentic mutations, generally resulting in a loss of function, but may in some cases reflect sequencing errors. These sequencing errors are misleading in genomic analyses, but have been shown to account for only some of the detected ICDSs [4,9-12]. Most ICDSs correspond to authentic mutations and can therefore be compared between strains, making it possible to explore conserved and unique mutation events. The availability of complete genome sequences for genetically related organisms has facilitated comparative analyses of ICDSs. This simple approach, which has not been reported before, makes it possible to investigate the evolutionary relationships between isolates or species. In this study, we took the finished genomes of two mycobacterial species as a model: M. tuberculosis, which causes tuberculosis in humans, and M. bovis, which principally causes tuberculosis in ruminants. We also studied six phylogenetically distinct isolates of M. tuberculosis (H37Rv, CDC1551, Haarlem, F11, C [13], and 210, a representative of the W-Beijing family) and M. africanum, a species of the M. tuberculosis complex for which the genome sequence is still at the assembly step. These isolates differ from each other, as they belong to distinct evolutionary branches of M. tuberculosis sensu stricto (s.s), yet they are more closely related to each other than to the more distantly related members of the M. tuberculosis complex (M. africanum, M. bovis, M. microti and M. pinnipedii) [14]. The W-Beijing family is a clonal group of highly successful M. tuberculosis strains associated with multiple outbreaks [15]. This family is one of the oldest lineages to diverge, as determined by single nucleotide polymorphism (SNP) and region of deletion analysis [14]. In contrast, H37Rv, the first M. tuberculosis strain to be completely sequenced, is believed to belong to one of the most recent (youngest) lineages of M. tuberculosis [14,16]. Strain CDC1551 belongs to a lineage that branched between the W-Beijing and the H37Rv isolates. Overall, these three isolates represent three different genetic groups of the species [14-17]. These isolates have been studied in detail and display differences in genotype [14,18], phenotype and virulence properties [19,20]. By comparing the open reading frames containing frameshifts in these organisms, we showed that ICDSs could be classified as "common to all strains" or "lineage- or strain-specific". The common ICDSs probably correspond to mutations occurring before the divergence of the isolates, whereas lineage- or strain-specific ICDSs correspond to more recently acquired mutations.
Thus, ICDS investigation can be used to characterize the molecular scars of evolutionary relationships between organisms and may well provide a unique molecular signature for a particular strain or species, complementary to single nucleotide polymorphism (SNP) and other molecular marker analyses for the characterization of strain variation [18,21]. We also show that ICDS formation is not associated with mutation in the promoter region. The present data suggest that promoter extinction is not a major event in the "pseudogenization" process. To demonstrate experimentally that ICDS comparison is a powerful phylogenomic tool, we analyzed 21 clinical M. tuberculosis isolates for their ICDS content. We showed that the W-Beijing lineage differs from the other TB phyla by a lower number of common ICDSs, confirming its early divergence from the other M. tuberculosis s.s strains. ICDS characterization, in addition to phylogenetic investigations or typing, can be used to select strains for studies of particular phenotypic characters, such as virulence. Indeed, as frameshift acquisition may lead to a loss of function, researchers should consider the possible presence of ICDSs before choosing a strain or species for investigating a particular phenotype.
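The way a single-nucleotide deletion interrupts a coding sequence can be illustrated with a minimal sketch. The 21-nucleotide sequence is an invented toy example, not a mycobacterial gene, and `translate` is a hypothetical helper that applies the standard genetic code from the first base onwards.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first base and stop at the first in-frame stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented toy ORF (assumption, for illustration only).
wild_type = "ATGGCCTTGACGGGCAAATAA"
# Delete one C (index 5): every downstream codon is now read out of frame.
mutant = wild_type[:5] + wild_type[6:]

print(translate(wild_type))  # MALTGK
print(translate(mutant))     # MA
```

In this toy case the deletion brings a TGA stop codon into frame at the third codon, so the product is truncated after two residues. This is the kind of event that an ICDS detection procedure would flag.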

Results

Detecting the molecular scars of evolution in M. tuberculosis and in M. bovis

Comparative analyses of frameshift-containing genes require the complete genome sequences of closely related organisms. The TB complex, which includes two recently sequenced species and at least six accessible strains, is therefore a highly suitable model. We investigated ICDSs in M. tuberculosis and in M. bovis. The genome sequence of M. tuberculosis H37Rv has been available since 1998 and has recently been re-annotated [22,23]. The genome sequences of M. tuberculosis strain CDC1551 and M. bovis have been characterized independently [18,24]. The great advantage of this model system is that the evolution of these two species and the phylogenetic links between them are well documented [25]. The M. tuberculosis genomes (CDC1551 and H37Rv) have nucleotide sequences more than 99.95% identical to that of M. bovis [18,24]. The three genomes were screened for the presence of ICDSs. To this end, the genomic sequences of each predicted ICDS [3] were extracted for each strain or species and compared with each other. Each common or specific ICDS was then analyzed manually to characterize the molecular event leading to the detected frameshift. The genome of H37Rv contains 113 ICDSs, whereas CDC1551 has 137 ICDSs and M. bovis has 134 ICDSs, corresponding to about 2% of the total coding sequences [3]. These organisms have similar numbers of ICDSs, but the alterations do not always affect the same genes. We therefore investigated whether some of these ICDSs were common to all three organisms. We compared the nucleotide and deduced amino-acid sequences of each frameshift-containing open reading frame in the three organisms. We found that 81 of the frameshift-containing genes were common to all three strains (Figure 1A, Table 1), and were identical at the molecular level. The proteins affected by these frameshifts included proteins of unknown function as well as annotated and/or characterized proteins (Table 1).
The fact that these three mycobacterial genomes were sequenced and assembled independently suggests that these 81 common ICDSs correspond to authentic frameshift-containing genes rather than sequencing errors. These results indicate that these 81 ICDSs correspond to frameshifts acquired before the splitting of the M. tuberculosis and M. bovis species (Table 1). Alternatively, the same 81 genetic mutations may result from convergent evolution and hence have occurred independently in all three genomes, a highly unlikely scenario.
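The classification into common and strain-specific ICDSs amounts to set intersection and set difference over the per-genome ICDS lists. The sketch below is illustrative only: the locus labels (`geneA` and so on) and the function name `classify_icds` are hypothetical, and a real comparison would also need to check that the frameshift events are identical at the molecular level.

```python
# Toy data: locus labels are hypothetical placeholders, not real ICDS calls.
icds_by_strain = {
    "H37Rv": {"geneA", "geneB", "geneC", "geneD"},
    "CDC1551": {"geneA", "geneB", "geneC", "geneE"},
    "M. bovis": {"geneA", "geneB", "geneF"},
}


def classify_icds(icds_by_strain):
    """Return (ICDSs common to all strains, ICDSs specific to each strain)."""
    common = set.intersection(*icds_by_strain.values())
    specific = {}
    for strain, icds in icds_by_strain.items():
        # Union of the ICDSs found in every other strain.
        others = set().union(
            *(v for s, v in icds_by_strain.items() if s != strain)
        )
        specific[strain] = icds - others
    return common, specific


common, specific = classify_icds(icds_by_strain)
print(sorted(common))
print({strain: sorted(loci) for strain, loci in specific.items()})
```

In the toy data, `geneC` is neither common nor strain-specific: it is shared by the two M. tuberculosis strains only, which is the category that lineage-level comparisons rely on.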
Figure 1

A- Schematic representation of the ICDSs common to M. tuberculosis H37Rv, CDC1551 and M. bovis AF2122/97 or specific to one of these strains. The total number of ICDSs is indicated. B- Schematic representation of the ICDSs of M. bovis BCG 1173P2 compared to the other analyzed strains.

Table 1

List of the 81 ICDSs common to M. tuberculosis H37Rv, CDC1551, M. bovis AF2122/97 and M. africanum GM041182.

M. tub H37Rv | M. tub CDC1551 | M. bov AF2122/97 | M. bov BCG 1173P2 | M. africanum | Putative function | Functional classification
0002 (Rv0151c 588 aa) | 0004 | 0006 | 0109 | ICDS | PE family protein | PE/PPE
0003 (Rv0152c 525 aa) | 0005 | 0007 | 0004 | ICDS | PE family protein | PE/PPE
0007 (Rv0366c 197 aa – Rv0367c 129 aa) | 0011 | 0010 | 0110 | ICDS | Conserved hypothetical | Unknown
0009 (Rv0393 441 aa) | 0012 | 0011 | 0008 | ICDS | Conserved hypothetical | Unknown
0010 (Rv0520 116 aa – Rv0521 101 aa) | 0014 | 0013 | 0012 | ICDS | Dimethylglycine N-methyltransferase | Intermediary metabolism
0012 (Rv0601c 157 aa) | 0017 | 0018 | 0016 | ICDS° | Two-component sensor kinase | Regulation
0014 (Rv0635 158 aa – Rv0636 142 aa) | 0019 | 0020 | 0111 | ICDS | Conserved hypothetical | Unknown
0015 (Rv0636 142 aa – Rv0637 166 aa) | 0020 | 0021 | 0112 | ICDS | Conserved hypothetical | Unknown
0017 (Rv0724A 112 aa – Rv0725c 301 aa) | 0022 | 0023 | 0020 | ICDS | Conserved hypothetical | Unknown
0020 (Rv0865 160 aa) | 0026 | 0085 | 0113 | ICDS | Molybdopterin biosynthesis protein | Intermediary metabolism
0021 (Rv0890c 882 aa – Rv0891c 285 aa) | 0027 | 0025 | 0114 | ICDS | Transcriptional regulator | Regulation
0023 (Rv1034c 130 aa – Rv1035c 228 aa) | 0030 | 0030 | 0031 | ICDS | Transposase | IS/phage
0024 (Rv1035c 228 aa – Rv1036c 112 aa) | 0031 | 0031 | 0032 | ICDS | Transposase | IS/phage
0025 (Rv1041c 287 aa – Rv1042c 135 aa) | 0032 | 0086 | 0034 | ICDS | Transposase | IS/phage
0026 (Rv1104 229 aa) | 0034 | 0087 | 0115 | ICDS | Esterase | Intermediary metabolism
0027 (Rv1104 229 aa) | 0035 | 0088 | 0037 | ICDS | Esterase | Intermediary metabolism
0028 (Rv1105 171 aa) | 0036 | 0033 | 0036 | ICDS | Para-nitrobenzyl esterase | Intermediary metabolism
0029 (Rv1119c 49 aa – Rv1120c 164 aa) | 0037 | 0034 | 0039 | ICDS | Conserved hypothetical | Unknown
0030 (Rv1136 113 aa) | 0039 | 0089 | 0040 | ICDS | Enoyl-CoA | Lipid metabolism
0032 (Rv1149 135 aa – Rv1150 183 aa) | 0041 | 0090 | 0041 | ICDS | Transposase | IS/phage
0033 (Rv1163 201 aa – Rv1164 246 aa) | 0042 | 0035 | 0116 | ICDS | Nitrate reductase NarI-J | Intermediary metabolism
0035 (Rv1203c 194 aa – Rv1204c 562 aa) | 0044 | 0036 | 0043 | ICDS | Conserved hypothetical | Unknown
0036 (Rv1413 171 aa) | 0046 | 0041 | 0047 | ICDS | Conserved hypothetical | Unknown
0040 (Rv1662 1602 aa – Rv1663 502 aa) | 0053 | 0043 | 0117 | ICDS | Polyketide synthase Pks8/17 | Lipid metabolism
0041 (Rv1687c 255 aa) | 0054 | 0132 | 0128 | ICDS° | ATP binding protein, ABC transporter | Cell wall, process
0043 (Rv1735c 166 aa) | 0056 | 0112 | 0050 | ICDS | Malic acid transport protein | Cell wall, process
0046 (Rv1878 450 aa) | 0061 | 0052 | 0118 | ICDS° | Glutamine synthetase GlnA3 | Intermediary metabolism
0047 (Rv1888A 58 aa – Rv1889c 118 aa) | 0062 | 0053 | 0119 | ICDS | Conserved hypothetical | Unknown
0048 (Rv1931c 259 aa) | 0064 | 0054 | 0058 | ICDS | Conserved hypothetical | Unknown
0049 (Rv1949c 319 aa – Rv1950c 63 aa) | 0065 | 0055 | 0059 | ICDS | Conserved hypothetical | Unknown
0050 (Rv2013 159 aa – Rv2014 196 aa) | 0066 | 0056 | 0061 | ICDS° | Transposase | IS/phage
0051 (Rv2086 201 aa) | 0067 | 0093 | 0062 | ICDS | Conserved hypothetical | Unknown
0052 (Rv2086 201 aa – Rv2087 76 aa) | 0068 | 0058 | 0063 | ICDS | Conserved hypothetical | Unknown
0053 (Rv2087 76 aa) | 0069 | 0094 | 0064 | ICDS | Conserved hypothetical | Unknown
0054 (Rv2095c 316 aa – Rv2096 332 aa) | 0070 | 0133 | 0129 | ICDS | Conserved hypothetical | Unknown
0058 (Rv2321 182 aa – Rv2322c 221 aa) | 0074 | 0060 | 0067 | ICDS | Ornithine aminotransferase RocD1 | Intermediary metabolism
0059 (Rv2325 282 aa – Rv2326c 697 aa) | 0075 | 0096 | 0120 | ICDS | Conserved hypothetical | Unknown
0060 (Rv2331 129 aa) | 0076 | 0134 | 0130 | ICDS | Hypothetical | Unknown
0061 (Rv2337 372 aa – Rv2338c 318 aa) | 0077 | 0097 | 0068 | ICDS | Conserved hypothetical | Unknown
0062 | 0081 | 0065 | 0076 | ICDS | Hydrogenase nickel incorporation protein HypB | Intermediary metabolism
0063 (Rv2877c 287 aa – Rv2878c 173 aa) | 0082 | 0099 | 0077 | ICDS° | Conserved hypothetical | Unknown
0065 (Rv2943A 177 aa – Rv2944 239 aa) | 0083 | 0066 | 0079 | ICDS | Transposase | IS/phage
0068 (Rv3128c 338 aa) | 0088 | 0068 | 0082 | ICDS | Conserved hypothetical | Unknown
0069 (Rv3152 410 aa – Rv3153 211 aa) | 0089 | 0100 | 0121 | ICDS | NADH dehydrogenase I | Intermediary metabolism
0070 (Rv3172c 160 aa) | 0090 | 0069 | 0083 | ICDS | Conserved hypothetical | Unknown
0071 (Rv3200c 355 aa) | 0091 | 0101 | 0122 | ICDS | Hypothetical | Unknown
0075 (Rv3349c 246 aa) | 0100 | 0102 | 0085 | ICDS° | Transposase | IS/phage
0076 (Rv3351c 264 aa – Rv3352c 123 aa) | 0101 | 0071 | 0088 | ICDS | Oxidoreductase | Intermediary metabolism
0077 (Rv3352c 123 aa – Rv3353c 86 aa) | 0102 | 0072 | 0089 | ICDS | Oxidoreductase | Intermediary metabolism
0079 (Rv3419c 344 aa) | 0104 | 0135 | 0131 | ICDS | O-sialoglycoprotein endopeptidase | Intermediary metabolism
0080 (Rv3420c 158 aa – Rv3421c 211 aa) | 0105 | 0104 | 0092 | ICDS° | Conserved hypothetical | Unknown
0083 | 0108 | 0107 | 0095 | ICDS | Transposase | IS/phage
0084 (Rv3636 115 aa – Rv3637 166 aa) | 0111 | 0076 | 0097 | ICDS | Transposase | IS/phage
0087 (Rv3741c 224 aa – Rv3742c 131 aa) | 0114 | 0078 | 0099 | ICDS° | Aromatic-ring hydroxylase | Intermediary metabolism
0088 (Rv3770A 61 aa – Rv3770B 64 aa) | 0115 | 0079 | 0100 | ICDS° | Transposase | IS/phage
0089 (Rv3844 164 aa – Rv3845 120 aa) | 0116 | 0136 | 0132 | ICDS | Transposase | IS/phage
0090 (Rv3866 283 aa – Rv3867 183 aa) | 0117 | 0109 | 0123 | ICDS | Conserved hypothetical | Unknown
0091 (Rv3880c 115 aa – Rv3881 460 aa) | 0118 | 0137 | 0133 | ICDS | Conserved hypothetical | Unknown
0095 (Rv3900c 311 aa) | 0122 | 0083 | 0124 | ICDS | Conserved hypothetical | Unknown
0097 (Rv3913 335 aa – Rv3914 116 aa) | 0123 | 0111 | 0001 | ICDS | Thioredoxin reductase | Intermediary metabolism
0098 | 0124 | 0032 | 0035 | ICDS | Conserved hypothetical | Unknown
0099 (Rv3386 234 aa – Rv3387 225 aa) | 0125 | 0103 | 0090 | ICDS | Transposase | IS/phage
0100 (Rv0342 640 aa – Rv0343 493 aa) | 0126 | 0009 | 0125 | ICDS | Isoniazid inductible gene protein | Cell wall, process
0101 (Rv0763c 69 aa – Rv0764c 451 aa) | 0127 | 0084 | 0126 | ICDS | Cytochrome P450 | Intermediary metabolism
0102 (Rv1858 264 aa – Rv1859 369 aa) | 0128 | 0092 | 0127 | ICDS | Molybdenum transport ABC transporter | Cell wall, process
0103 (Rv0449c 439 aa) | 0129 | 0113 | 0010 | ICDS | Conserved hypothetical | Unknown
0104 (Rv0471c 162 aa) | 0130 | 0114 | 0011 | ICDS | Hypothetical | Unknown
0105 (Rv0859 403 aa – Rv0860 720 aa) | 0131 | 0115 | 0024 | ICDS° | Acyl-CoA thiolase FadA and dehydrogenase FadB | Lipid metabolism
0106 (Rv0880 143 aa – Rv0881 288 aa) | 0132 | 0116 | 0025 | ICDS° | Transcriptional regulator | Information pathway
0107 (Rv0997 143 aa) | 0133 | 0117 | 0029 | ICDS | Hypothetical | Unknown
0108 (Rv1041c 287 aa – Rv1042c 135 aa) | 0134 | 0118 | 0033 | ICDS° | Transposase | IS/phage
0109 (Rv1104 229 aa – Rv1105 171 aa) | 0135 | 0119 | 0038 | ICDS | Para-nitrobenzyl esterase | Intermediary metabolism
0110 (Rv1221 257 aa – Rv1222 154 aa) | 0136 | 0120 | 0044 | ICDS | Alternative sigma factor SigE | Information pathway
0111 (Rv1752c 149 aa) | 0137 | 0121 | 0051 | ICDS | Conserved hypothetical | Unknown
0112 (Rv1961 164 aa) | 0138 | 0122 | 0060 | ICDS | Hypothetical | Unknown
0113 (Rv2309c 151 aa) | 0139 | 0123 | 0066 | ICDS | Integrase | Information pathway
0114 (Rv2420c 127 aa – Rv2421c 211 aa) | 0140 | 0124 | 0070 | ICDS | Nicotinate-nucleotide adenylyltransferase NadD | Intermediary metabolism
0115 (Rv2732c 205 aa – Rv2733c 512 aa) | 0141 | 0125 | 0073 | ICDS | Conserved hypothetical | Unknown
0116 (Rv2922A 94 aa – Rv2923c 137 aa) | 0142 | 0126 | 0078 | ICDS | Acylphosphatase AcyP | Intermediary metabolism
0117 (Rv3774 274 aa – Rv3775 274 aa) | 0143 | 0127 | 0101 | ICDS° | Enoyl-CoA hydratase EchA21 and lipase LipE | Lipid metabolism
0119 (Rv2599 143 aa – Rv2600 133 aa) | 0144 | 0098 | 0134 | ICDS° | Conserved hypothetical | Unknown

ICDS number (variable, according to the strain), the size of the predicted protein and its putative function are indicated. The corresponding ORF numbers in M. tuberculosis H37Rv are indicated in brackets. "°" indicates ICDSs containing additional mutations with respect to M. tuberculosis H37Rv, CDC1551 and M. bovis AF2122/97.

The two M. tuberculosis s.s strains were found to have 19 additional common ICDSs, raising their total number to 100 (Figure 1A, Table 2). This suggests that the 19 additional mutations common to these two strains but not to M. bovis were acquired after the divergence of M. tuberculosis and M. bovis. One ICDS in M. bovis (ICDS0046, Mb1789c-Mb1790c) was present in M. tuberculosis CDC1551 (ICDS0057, MT1807) but not in M. tuberculosis H37Rv (Rv1759c). This mutation (deletion of one G) was identical in the M. bovis and M. tuberculosis CDC1551 strains, but an additional mutation was present close to this mutation in the M. bovis genome. One ICDS in M. bovis (ICDS0128, Mb3813-Mb3814) was also present in M. tuberculosis H37Rv (ICDS0118, Rv3784-Rv3785) but not in M. tuberculosis CDC1551 (MT3893) (Table 2).
Table 2
M. tub H37Rv | M. tub CDC1551 | M. bovis AF2122/97 | M. tub 210 | M. africanum | Putative function | Functional classification
0001 (Rv0095c 136 aa) | 0003 | Mb0098c 260 aa | ICDS | Not found | Conserved hypothetical | Unknown
0005 (Rv0325 74 aa – Rv0326 151 aa) | 0010 | Mb0333 229 aa | FL | FL | Hypothetical | Unknown
0011 (Rv0590 275 aa – Rv0590A 84 aa) | 0016 | Mb0605 343 aa | FL | FL | MCE-family protein | Virulence, detox, adapt
0013 (Rv0618 231 aa – Rv0619 181 aa) | 0018 | Mb0635 394 aa | ICDS | FL | Galactose-1-phosphate uridylyltransferase | Intermediary metabolism
0022 (Rv0924c 428 aa – Rv0925c 245 aa) | 0028 | Mb0948c 684 aa | ICDS | ICDS | Manganese transport protein MntH | Cell wall, process
0031 (Rv1145 303 aa – Rv1146 470 aa) | 0040 | Mb1177 781 aa | FL | FL | Transmembrane transport protein MmpL13 | Cell wall, process
0037 (Rv1503c 182 aa – Rv1504c 199 aa) | 0048 | Mb1542c 382 aa | ICDS | FL | Conserved hypothetical | Unknown
0038 (Rv1549 175 aa – Rv1550 571 aa) | 0051 | Mb1576 647 aa | ICDS | FL | Fatty-acid-coA ligase FadD11 | Lipid metabolism
0039 (Rv1553 247 aa – Rv1554 126 aa) | 0052 | Mb1579 374 aa | ICDS° | ICDS° | Fumarate reductase | Intermediary metabolism
0045 (Rv1792 59 aa) | 0058 | Mb1820 98 aa | ICDS | FL | ESAT-6-like protein EsxM | Cell wall, process
0055 (Rv2227 233 aa) | 0072 | Mb2252 124 aa | ICDS | FL | Conserved hypothetical | Unknown
0066 (Rv2946c 1616 aa – Rv2947 496 aa) | 0084 | Mb2971c 2112 aa | FL | FL | Polyketide synthase Pks15/1 | Lipid metabolism
0067 (Rv2974c 470 aa – Rv2975c 84 aa) | 0085 | Mb2999c 553 aa | FL | FL | Conserved hypothetical | Unknown
0072 (Rv3233c 196 aa – Rv3234c 271 aa) | 0092 | Mb3262c 469 aa | FL | FL | Conserved hypothetical | Unknown
0073 (Rv3337 128 aa – Rv3338 214 aa) | 0094 | Mb3370 297 aa | ICDS | FL | Conserved hypothetical | Unknown
0078 (Rv3373 213 aa – Rv3374 82 aa) | 0103 | Mb3408 296 aa | ICDS | FL | Enoyl-CoA hydratase EchA18 | Lipid metabolism
0085 (Rv3725 309 aa) | 0112 | Mb3752 333 aa | FL | FL | Oxidoreductase | Intermediary metabolism
0086 (Rv3738c 315 aa – Rv3739c 77 aa) | 0113 | Not determined | ICDS | ICDS | PPE family protein | PE/PPE
0094 (Rv3897c 210 aa – Rv3898c 110 aa) | 0121 | Mb3927c 329 aa | FL | FL | Conserved hypothetical | Unknown

M. tub H37Rv | M. bovis AF2122/97 | M. tub CDC1551 | M. tub 210 | M. africanum | Putative function | Functional classification
0118 (Rv3784 326 aa – Rv3785 357 aa) | 0128 | MT3893 712 aa | NT | ICDS | NAD-dependent epimerase/dehydratase | Information pathway

M. tub CDC1551 | M. bovis AF2122/97 | M. tub H37Rv | M. tub 210 | M. africanum | Putative function | Functional classification
0057 (MT1806 820 aa – MT1807 94 aa) | 0046 | Rv1759c 914 aa | NT | NT | PE_PGRS family protein | PE/PPE

List of the 19 ICDSs common to M. tuberculosis H37Rv and CDC1551, the ICDSs common to M. tuberculosis H37Rv and M. bovis AF2122/97 and the ICDSs common to M. tuberculosis CDC1551 and M. bovis AF2122/97. ICDS number (variable, according to strain), the size of the predicted protein and its putative function are indicated. Genes that do not contain a frameshift in M. tuberculosis strain 210 or in M. africanum, and that therefore correspond to a full-length ORF, are noted "FL". "NT", not tested.

The availability of genomic resources for M. tuberculosis is increasing rapidly. This enabled us to investigate the presence or absence of these shared ICDSs in the Haarlem, F11, and C strains, the genomic sequences of which are currently at the assembly stage at the Broad Institute [26]. As the sequencing of these genomes is still in progress, the total number of frameshift-containing genes in these genomes cannot yet be accurately determined; nonetheless, it is possible to check whether the 81 ICDSs present in M. bovis and in other M. tuberculosis strains are present in these strains. All 81 ICDSs common to the three strains previously tested were also present in the Haarlem and F11 strains, while 79 were present in the C strain (the ORFs corresponding to H37Rv ICDS0103 and ICDS0105 were full-length in this strain) (see Additional file 1). Of note was the identification of additional mutations in the vicinity (≤ 200 bp) of the original frameshift (see Additional file 1). We next investigated whether the 19 ICDSs common to the M. tuberculosis s.s strains were present in the other clinical isolates. In each case, the ICDSs were also present in the three strains (Haarlem, F11, and C), but accompanied, in some cases, by additional mutations in the flanking region (see Additional file 1). Thus, 98 frameshift-containing genes were found to be conserved in all five M. tuberculosis strains analyzed. The recently published M. bovis BCG genome sequence is of particular interest in this respect [27]. This strain, which is currently used for vaccination in humans, was derived from M. bovis after 13 years of repeated passages in vitro [28]. A number of genetic differences, such as deletions and duplications, had already been identified in the BCG strain [29,30], but large amounts of additional information have now been obtained from its genome sequence. According to our investigation, M. bovis BCG 1173P2 contains 127 ICDSs in total, 9 of which are strain-specific (Figure 1B). The 81 ICDSs common to the three other isolates are also present in this strain (Table 1) and 35 ICDSs are shared with the M. bovis strain. We detected frameshift-containing genes in M. bovis AF2122/97 that corresponded to full-length ORFs in M. bovis BCG 1173P2, suggesting that this M. bovis strain is not the direct progenitor of the BCG vaccine (see Additional file 2).

Strain-specific ICDSs reflect newly acquired mutations and are a useful phylogenetic tool

Eighty-one ICDSs were common to all three strains, but some were specific to one strain only: 12 for M. tuberculosis H37Rv (see Additional file 3), 36 for CDC1551 (see Additional file 4) and 51 for M. bovis (see Additional file 2, Figure 1A). The proportion of ICDSs that were strain-specific was highly variable. These ICDSs accounted for 10% of all ICDSs in H37Rv, 26% in CDC1551 and 38% in M. bovis. The much larger proportion of strain-specific ICDSs in CDC1551 than in H37Rv is surprising, and we currently have no definitive explanation for it. One plausible hypothesis is that, unlike the H37Rv genome, the genome of the CDC1551 strain has not been re-sequenced [22,28]. Strain-specific frameshift-containing genes most likely correspond to mutations acquired after the divergence of these strains. Like the common ICDSs, these events affected genes from several classes, including "unknown or hypothetical ORFs", "intermediary metabolism" and "cell wall, process" (Additional files 2, 3 and 4). As stated above, a few of these strain-specific ICDSs may correspond to errors introduced during the sequencing procedure [4,11], but such errors would nonetheless have only a slight effect on the overall outcome of the comparative analysis. This study shows that the genome sequence of M. tuberculosis contains ICDSs that have been acquired during the evolution of this species. The pool of ICDSs can be classified into ICDSs common to a set of strains or species and ICDSs specific to a particular lineage or strain, revealing genetic differences between strains or species.
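As a consistency check, the per-genome totals can be rebuilt from the categories reported in the text: 81 ICDSs common to all three genomes, 19 shared only by the two M. tuberculosis strains, one shared only by H37Rv and M. bovis, one shared only by CDC1551 and M. bovis, and the strain-specific counts. The sketch below is a hedged illustration; the variable and function names are ours and not part of the original analysis.

```python
# Counts taken from the text and Figure 1A.
COMMON_ALL_THREE = 81
SHARED_BY_PAIR = {
    frozenset({"H37Rv", "CDC1551"}): 19,
    frozenset({"H37Rv", "M. bovis"}): 1,    # ICDS0118 / ICDS0128
    frozenset({"CDC1551", "M. bovis"}): 1,  # ICDS0057 / ICDS0046
}
STRAIN_SPECIFIC = {"H37Rv": 12, "CDC1551": 36, "M. bovis": 51}


def total_icds(strain):
    """Sum the common, pairwise-shared and strain-specific ICDSs of one genome."""
    shared = sum(n for pair, n in SHARED_BY_PAIR.items() if strain in pair)
    return COMMON_ALL_THREE + shared + STRAIN_SPECIFIC[strain]


for strain, specific in STRAIN_SPECIFIC.items():
    total = total_icds(strain)
    print(f"{strain}: {total} ICDSs, {100 * specific / total:.1f}% strain-specific")
```

The totals obtained in this way match the 113, 137 and 134 ICDSs reported for H37Rv, CDC1551 and M. bovis, and the computed proportions are close to the reported 10%, 26% and 38%.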

Using ICDS comparisons to type W-Beijing strains and other M. tuberculosis lineages

W-Beijing is a lineage of M. tuberculosis that has attracted considerable attention. Indeed, strains of this lineage have been implicated in severe outbreaks and have been shown to have different genetic and phenotypic properties [20,21,31]. The genome of a strain of the W-Beijing family (strain 210) is currently being sequenced and is not yet fully assembled; nevertheless, it can be consulted in homology searches. Consequently, the total number of frameshift-containing genes in this strain and the full set of its specific ICDSs cannot yet be determined. It is, however, possible to screen for the presence of known ICDSs in this strain. We first investigated whether the 81 frameshift-containing genes common to all strains were also present in the genome of strain 210. All 81 of these genes also contained the same frameshift in strain 210, in agreement with the data described above. This suggests that these 81 frameshift mutations were acquired before the divergence of strain 210 from these other strains. We then investigated the 19 genes containing frameshifts common to the five strains of M. tuberculosis (H37Rv, CDC1551, Haarlem, F11, C) but not to M. bovis. We found that eight of these 19 genes contained no frameshift in strain 210, and hence corresponded to full-length ORFs (Table 2). Three genes contained frameshifts corresponding to those observed in strains CDC1551, H37Rv, Haarlem, F11 and C, but also contained additional mutations in the flanking regions (≤ 200 bp) of the original frameshift (Table 2). The remaining 11 ICDSs corresponded to frameshift-containing genes common to all six TB strains examined (CDC1551, H37Rv, Haarlem, F11, C, 210) and the events were identical at the molecular level. Thus, the 19 frameshift-containing genes shared by the two TB strains (CDC1551 and H37Rv) were polymorphic in strain 210, and 11 of them were common to all six TB strains examined. Some of these ICDSs display no further mutation (the gene contains the frameshift alone), whereas others have acquired additional mutations, contributing to the "pseudogenization" process (data not shown). We then examined the eight ICDSs that were polymorphic within M. tuberculosis in 21 strains of the W-Beijing lineage from several phylogenetic groups (Table 3). The eight loci were amplified by PCR and sequenced, and the nucleotide sequences were compared with those of strains 210 and H37Rv. In all W-Beijing strains tested, the eight genes were full-length, with sequences 100% identical to that in strain 210, except for ICDS0085, for which a non-disruptive SNP is present in the region. The W-Beijing lineage is therefore a genetically homogeneous group with fewer ICDSs in common with other TB strains.
Table 3

Analysis in 21 W-Beijing isolates of the 8 ICDSs of H37Rv strain corresponding to full-length ORFs in W-Beijing strain 210.

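Fingerprint / Tracking number | ICDS 0005 | ICDS 0011 | ICDS 0031 | ICDS 0066 | ICDS 0067 | ICDS 0072 | ICDS 0085 | ICDS 0094

W-Beijing
W10648 | FL | FL | FL | FL | FL | FL | NT | FL
W565 | FL | FL | FL | FL | FL | FL | FL* | FL
W410775 | FL | FL | FL | FL | FL | FL | FL* | FL
W143617 | FL | FL | FL | FL | FL | FL | FL* | FL
W2610270 | FL | FL | FL | FL | FL | FL | FL* | FL
W695418 | FL | FL | FL | FL | FL | FL | FL* | FL
W887052 | FL | FL | FL | FL | FL | FL | FL* | FL
W1306707 | FL | FL | FL | FL | FL | FL | FL* | FL
W1488561 | FL | FL | FL | FL | FL | FL* | FL* | FL
W1837657 | FL | FL | FL | FL | FL | FL | NT | FL
W2158963 | FL | FL | FL | FL | FL | FL | FL* | FL
W34210644 | FL | FL | FL | FL | FL | FL | FL* | FL

Ancestral W-Beijing
N173046 | FL | FL | FL | FL | FL | FL | FL* | FL
LB8128 | FL | FL | FL | FL | FL | FL | FL* | FL
AR12360 | FL | FL | FL | FL | FL | FL | FL* | FL
AM4948 | FL | FL | FL | FL | FL | FL | FL* | FL
CK6595 | FL | FL | FL | FL | FL | FL | FL* | FL
CN116116 | FL | FL | FL | FL | FL | FL | FL* | FL
HE713454 | FL | FL | FL | FL | FL | FL | FL* | FL
HI5116 | FL | FL | FL | FL | FL | FL | FL* | FL
KY10583 | FL | FL | NT | FL | FL | FL | FL* | FL

AF/H37 lineage
H37Rv ATCC25618 | ICDS | ICDS | ICDS | ICDS | ICDS | ICDS | ICDS | ICDS

M. bovis AF2122/97 | FL | FL | FL | FL | FL | FL | FL | FL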

M. tuberculosis isolates from various lineages for which chromosomal DNA was used as a template for PCR amplification of the selected locus. "FL" indicates the presence of a full-length ORF identical to that in the M. tuberculosis 210 strain, "*" indicates an additional mutation acquired in these isolates with respect to M. tuberculosis H37Rv, "NT", not tested.

To extend our analysis, we investigated the M. africanum strain, which is currently being sequenced at the Sanger Centre. Like that of M. tuberculosis strain 210, the M. africanum genome is still at the assembly step, but it can nevertheless be consulted online. We investigated whether the 81 frameshift-containing genes common to all strains tested were also present in the M. africanum strain (Table 1). All 81 of these genes also contained a frameshift in M. africanum, which suggests that these mutations were acquired before the divergence of the M. tuberculosis complex. We then investigated the 19 genes containing frameshifts common to the five M. tuberculosis strains (CDC1551, H37Rv, Haarlem, F11, C). We found that 15 of these 19 genes lacked the frameshift in M. africanum and corresponded to full-length ORFs in this strain (Table 2). Eight of these 15 genes match the wild-type ORFs identified in M. tuberculosis strain 210 and other strains of the W-Beijing lineage. In conclusion, the genome of M. africanum contains fewer ICDSs in common with the other TB isolates (CDC1551, H37Rv, Haarlem, F11, C) than with the W-Beijing strain and seems genetically closer to this lineage.
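The counts quoted above can be re-derived from the status columns of Table 2. The sketch below transcribes, for each of the 19 ICDSs shared by H37Rv and CDC1551, the status reported for strain 210 and for M. africanum; the dictionary layout and variable names are our own illustrative choices.

```python
# Keys are H37Rv ICDS numbers; values are (strain 210, M. africanum) from Table 2.
TABLE2_STATUS = {
    "0001": ("ICDS", "Not found"), "0005": ("FL", "FL"),
    "0011": ("FL", "FL"),          "0013": ("ICDS", "FL"),
    "0022": ("ICDS", "ICDS"),      "0031": ("FL", "FL"),
    "0037": ("ICDS", "FL"),        "0038": ("ICDS", "FL"),
    "0039": ("ICDS°", "ICDS°"),    "0045": ("ICDS", "FL"),
    "0055": ("ICDS", "FL"),        "0066": ("FL", "FL"),
    "0067": ("FL", "FL"),          "0072": ("FL", "FL"),
    "0073": ("ICDS", "FL"),        "0078": ("ICDS", "FL"),
    "0085": ("FL", "FL"),          "0086": ("ICDS", "ICDS"),
    "0094": ("FL", "FL"),
}

# Loci that are full-length (FL) in each genome.
fl_210 = {k for k, (s210, _) in TABLE2_STATUS.items() if s210 == "FL"}
fl_africanum = {k for k, (_, afr) in TABLE2_STATUS.items() if afr == "FL"}

print(len(fl_210), "full-length in strain 210")
print(len(fl_africanum), "full-length in M. africanum")
print(len(fl_210 & fl_africanum), "full-length in both")
```

Every locus that is full-length in strain 210 is also full-length in M. africanum, which is the observation behind the statement that eight of the 15 M. africanum full-length genes match those of the W-Beijing lineage.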

ICDS formation is not correlated with mutation in the promoter region

It has been suggested that pseudogene formation is associated with mutations in the upstream untranslated region, abolishing pseudogene expression to prevent a loss of metabolic function [32]. Once turned off, the gene continues to accumulate mutations, leading to complete pseudogene formation. ICDSs are not pseudogenes in the strict sense of the word. Indeed, the ORF is split into only two or three unframed fragments and can, in theory, revert to a wild-type allele. ICDSs are therefore considered to be ORFs undergoing "pseudogenization" rather than pseudogenes per se. Strain-specific ICDSs are, by definition, genes that are mutated in one strain, but not in another. We therefore investigated whether ICDS formation was correlated with mutation in the promoter region. All the intergenic regions (99) located upstream from strain-specific ICDSs of M. tuberculosis H37Rv, CDC1551 and M. bovis were compared with the corresponding region in the two strains having a wild-type gene. As a control, we used the promoter regions of randomly selected genes that are full-length in these three strains. We compared the level of differences observed in the promoter regions of full-length and frameshift-containing genes. Nucleotide differences were observed in 27% of the upstream regions of frameshift-containing genes (see Additional file 5A) and in 20% of the upstream regions of full-length genes (see Additional file 5B); this difference is not statistically significant by the chi-square test. In all but 6 cases for ICDSs and 2 cases for full-length genes, the difference in the upstream region was limited to one or two SNPs. We therefore conclude that ICDS formation is not correlated with mutation in the untranslated upstream region and suggest either that promoter mutations do not play a major role in pseudogene formation in the M. tuberculosis complex or that "pseudogenization" is recent.

Discussion

The presence of frameshift-containing genes in bacterial genomes is well documented [1-3,33]. A few species can bypass such frameshifts, but most cannot, and the frameshift generally results in a loss of function. We show here that ICDSs can be classified as "common to all strains" or "strain-specific". The ICDSs common to all strains probably correspond to mutations acquired before the divergence of the strains, whereas strain-specific ICDSs correspond to those acquired subsequently (Figure 2). Mutations acquired after the speciation of M. tuberculosis from M. bovis were also detected. We identified 19 ICDSs common to the five M. tuberculosis strains (H37Rv, CDC1551, Haarlem, F11 and C) but not to M. bovis, about one-fifth of ICDSs common to all strains. Comparative analyses of ICDSs help to characterize the phylogenetic relationships between highly related strains and species (Figure 2) and could be applied to any bacterial species for which several genome sequences are available. In a few cases, ICDSs may correspond to the fusion or fission of orthologous genes in other genomes. The detection of such events is inherent in the method used to identify ICDSs, but remains a minor inconvenience [3]. It is, however, possible that a small percentage of strain-specific ICDSs correspond to sequencing errors, thus generating artifactual phylogenetic relationships. Researchers should resequence these regions before assuming that the ICDS corresponds to a frameshift acquisition. Several studies have compared the genome sequences of M. tuberculosis CDC1551 and H37Rv, using high-resolution genomics techniques [18]. This has led to the definition of regions containing large-sequence polymorphisms (LSPs, greater than 10 bp) and single nucleotide polymorphisms (SNPs). The SNPs have been investigated in more detail in various clinical isolates, to draw up a global phylogeny of M. tuberculosis [17]. 
Other molecular methods, such as analyses of the deleted regions (deligotyping), variable numbers of tandem repeats (VNTR), mycobacterial interspersed repetitive units (MIRU) and spoligotyping, have helped to unravel global genomic sequence diversity in this species [34-36]. These techniques are highly useful for epidemiological studies, but have so far provided little information about genetic differences in terms of putative function. In contrast, studies of regions of deletion (RD) have proved useful both for global phylogeny and for studying the loss of a phenotype in M. tuberculosis and M. ulcerans [25,30,37].
Figure 2

Hypothetical phylogenetic links assessed by comparative analyses of ICDSs. In this schematic representation, the common ancestor gave rise to several branches of strains of the TB complex. Eighty-one frameshifts were acquired during the common evolution of M. bovis and M. tuberculosis. Since the separation of these species, M. bovis has acquired 51 frameshifts, while the branch leading to M. tuberculosis isolates has acquired 19 new frameshifts. Since separation of the isolates, M. tuberculosis H37Rv has acquired 12 new frameshifts and CDC1551 36 new frameshifts. Common and unique ICDSs are shown in dark and light gray, respectively. "*" these 8 ICDSs correspond to full-length ORF in M. tuberculosis 210 and in M. africanum GM041182. "**" 7 out of these 11 ICDSs correspond to full-length ORF in M. africanum GM041182 (Table 2).

Frameshift acquisition generally leads to a loss of function, as shown in a number of published studies. Loss of function associated with the presence of a frameshift has been reported in both M. tuberculosis and M. bovis. For instance, ICDS0066 in M. tuberculosis H37Rv corresponds to a frameshift-containing gene encoding a polyketide synthase (pks1). This pks1 gene also contains a frameshift in M. tuberculosis CDC1551, resulting in two different ORFs: pks1 and pks15. In contrast, M. bovis and M. leprae carry a full-length functional pks1 gene [38]. The pks15/1 gene is now frequently used as a marker in epidemiological studies [39,40] and, interestingly, the pks gene contains no frameshift in the W-Beijing strains of M. tuberculosis [40], resulting in phenolglycolipid (PGL) production in most cases [41]. Our analysis shows that the pks gene of M. africanum is also full-length, suggesting that this species produces PGL. This observation suggests that these early strains are more closely related to M. bovis or to the last common ancestor than other M. tuberculosis strains are. Similarly, ICDS0067 in M. bovis corresponds to a putative frameshift-containing glycosyltransferase gene. The ortholog of this gene has no frameshift in the two strains of M. tuberculosis (Rv2958c and MT3034). Functional complementation of M. bovis BCG with the Rv2958c gene from M. tuberculosis leads to the accumulation of a new metabolite, the diglycosylated phenolglycolipid [42]. Some frameshift-containing genes have been studied experimentally in M. tuberculosis without considering the possibility that these ORFs might contain a frameshift [43,44]. Mutation by homologous recombination has been achieved at the mntH and mmpL13 loci. In both cases, no detectable phenotype was associated with the mutation. Our data indicate that MmpL13 function should be investigated in a W-Beijing strain or in M. africanum. Another example that has not yet been studied is the pks3 and pks4 genes of M. tuberculosis H37Rv, which constitute a single ORF in CDC1551 and in M. bovis. This suggests that, like the pks1 and pks15 genes, which are pseudogenes in M. tuberculosis, the pks3 and pks4 genes are probably not functional in the H37Rv strain. It would therefore be pointless to investigate function in the H37Rv strain by creating mutants in the pks3 and pks4 genes or by expressing constructs encoding the corresponding polypeptides. These examples from previous publications illustrate the major biological impact of frameshift acquisition. They demonstrate the importance of choosing the right strain or species for investigations of the function of a particular gene. However, it is not always possible to infer from the position of the frameshift whether the protein's activity will be affected. For instance, GlnA3, a glutamine synthetase generated from a frameshift-containing gene (Table 1), has been purified and shown to retain some activity [45]. It would be interesting to reframe these ORFs to test the impact of the frameshift on protein function. 
On the other hand, it has been shown in silico that protein-coding sequences can be tolerant of frameshift translation events and thus that frameshift acquisition is an important reservoir for creating novel proteins [46]. Several of the truncated ORFs described here have also been detected in other studies, based on different analyses [17,18,40,47,48]. However, we present here a comprehensive comparative analysis of three related mycobacterial species and nine strains at the ICDS level. We found no association between ICDS formation and mutation in the promoter region of the corresponding ORF. This suggests that promoter mutation and inactivation of gene expression are not the principal source of ICDS formation and hence of pseudogene accumulation in the M. tuberculosis complex. It may also suggest that ICDS formation in these species is a recent process. We favor the hypothesis that ORFs are first split into two or three parts, inactivating their function, and are then subject to secondary mutation (in both the ICDS and the untranslated region), leading to irreversible pseudogene fixation. Consistent with this hypothesis, we have observed additional mutations in the vicinity of the original frameshift in some strains. We have shown that ICDS investigation can be used to infer the evolutionary relationships between strains and species. We provide here a list of more than 150 ICDSs that may be useful for characterizing TB strains and inferring phylogenetic relationships. The genome sequences of more than 10 TB strains will be released in the near future [26], and will no doubt identify some new common and strain-specific ICDSs. Strain typing should clearly combine various markers, such as SNPs, MIRU, LSPs, RD, PE polymorphism [49] and ICDSs, in a matrix-based comparison from which the global phylogeny of TB isolates may be deduced. 
The polymorphism associated with these mutations is complementary to that detected by other methods [17,34,36,37,50] and can therefore be used to explore genetic diversity within a given species. Interestingly, in strain 210, from the W-Beijing family, eight of the 19 ICDSs common to the five M. tuberculosis strains tested (H37Rv, CDC1551, Haarlem, F11, C) corresponded to full-length ORFs, illustrating its earlier divergence. Some of these genes may be involved in virulence, as they concern functions such as host cell invasion (ICDS0011 of H37Rv), lipid biosynthesis (ICDS0066 and ICDS0031 of H37Rv) and intermediary metabolism (ICDS0085 of H37Rv). To test whether this trait was a particularity of the 210 strain or applied more generally to the W-Beijing phylum, we sequenced the eight ORFs that were full-length in this strain in 21 other clinical isolates of the W-Beijing family (Table 3). In all cases, the locus corresponded to a full-length ORF and not to an ICDS, demonstrating that these strains are genetically homogeneous. The analysis performed using a strain of M. africanum showed that this species shares even fewer ICDSs with M. tuberculosis H37Rv and CDC1551 than the W-Beijing strains do. More genome sequences of various strains and species are required for characterization of the genetic differences between the W-Beijing strains and other species of the M. tuberculosis complex. The alkA gene has been shown to contain a frameshift in both M. bovis and some M. tuberculosis isolates from the Central African Republic [48]. The presence of SNPs in the region adjacent to the nonsense mutation led the authors to propose convergent evolution. Although this probably varies from gene to gene, we instead favor the hypothesis that the nonsense mutation was acquired by the ancestor and spread to the progeny, with subsequent mutations acquired in the adjacent region. 
Epidemiologists should bear in mind that a small percentage of ICDSs may correspond to sequencing errors [4,11], generating artifactual genetic differences. Our analysis could not detect mutations that conserve the frame of the coding sequence (synonymous mutations, in-frame deletions), which decreases the total level of diversity observed. However, comparative ICDS analysis presents the major advantage of making it possible to associate the frameshift with a putative function and, possibly, with a particular phenotype. In conclusion, more attention should be paid to ICDS detection and comparison, particularly at the genomic scale.

Conclusion

We report here a comparative analysis of ICDSs in six isolates of M. tuberculosis, two of M. bovis and one of M. africanum. We show that these ICDSs can be classified as "common to all strains" or "strain-specific". Common ICDSs result from mutations acquired before the divergence of the species, whereas strain-specific ICDSs were acquired after this divergence. Comparative analyses of these ICDSs allow the definition of the molecular signature of a particular strain, phylogenetic lineage or species. We further show that ICDS formation is not correlated with the presence of a mutated promoter, and suggest that promoter extinction is not the main cause of pseudogene formation. The correlation between ICDSs, function and phenotypes could have important evolutionary implications and provides population geneticists with a list of targets that could undergo selective pressure and thus alter the relationships between the various lineages of M. tuberculosis strains and their host.

Methods

Databases

The genome sequences of M. tuberculosis H37Rv and CDC1551 and M. bovis AF2122/97 were taken from the TIGR website [51]. The genome sequences of M. tuberculosis strains 210, F11, C and Haarlem were consulted on the TIGR or Broad Institute websites [52]. The genome sequence of M. bovis BCG 1173P2 was taken from the National Center for Biotechnology Information (NCBI) website (accession number AM408590). The genome sequence of M. africanum GM041182 was consulted online at the Sanger centre [53].

Detection of common ICDS

The genomic sequences of M. tuberculosis CDC1551, M. tuberculosis H37Rv, M. bovis AF2122/97 and M. bovis BCG 1173P2 were scanned for pairs of adjacent coding sequences that share common homologs after translation. Such a pair of coding sequences is considered to be an ICDS if no paralogy relationship exists between the two coding sequences. ICDS detection is described in detail in [3]. The ICDSs detected in each strain were then cross-compared by all-against-all blastn searches. For each ICDS, the best hits (E < 10^-65) detected in the different strains were manually analysed to discriminate between common and strain-specific ICDSs.
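The cross-comparison step can be illustrated with a short Python sketch. It is not the authors' pipeline: the classification in this study was done manually, and the function name, the input layout (one tuple per best blastn hit), the ICDS identifiers and the "partially shared" label are all assumptions made for illustration. The E-value cutoff quoted above is read here as 1e-65.

```python
def classify_icds(icds_strain, hits, strains, cutoff=1e-65):
    """First-pass labelling of ICDSs from best blastn hits.

    icds_strain: dict mapping an ICDS identifier to the strain it was detected in.
    hits: iterable of (query_icds, subject_strain, evalue) tuples, one per
          best hit of that ICDS against the ICDS set of another strain.
    strains: all strains included in the comparison.
    """
    # Every ICDS is present, by definition, in the strain it was detected in.
    found_in = {icds: {strain} for icds, strain in icds_strain.items()}
    for query, subject_strain, evalue in hits:
        if evalue < cutoff:  # keep only hits below the E-value cutoff
            found_in[query].add(subject_strain)

    all_strains = set(strains)
    labels = {}
    for icds, present in found_in.items():
        if present == all_strains:
            labels[icds] = "common"
        elif len(present) == 1:
            labels[icds] = "strain-specific"
        else:
            labels[icds] = "partially shared"  # needs manual inspection
    return labels


# Hypothetical identifiers and E-values, for illustration only.
strains = ["H37Rv", "CDC1551", "AF2122/97", "BCG 1173P2"]
icds_strain = {"ICDS_A": "H37Rv", "ICDS_B": "H37Rv", "ICDS_C": "H37Rv"}
hits = [
    ("ICDS_A", "CDC1551", 1e-120),
    ("ICDS_A", "AF2122/97", 1e-110),
    ("ICDS_A", "BCG 1173P2", 1e-100),
    ("ICDS_B", "CDC1551", 1e-20),   # too weak: ignored
    ("ICDS_C", "CDC1551", 1e-90),
]
labels = classify_icds(icds_strain, hits, strains)
```

An automated pass of this kind would only produce candidates; hits in the "partially shared" class are the ones for which manual inspection, as performed here, matters most.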

Sequencing analysis

Chromosomal DNA of M. tuberculosis isolates from various lineages (Table 3) was used as a template for PCR amplification of the selected locus. The primers used for amplification and sequencing were designed as previously described [3], using an optimized version of CADO4MI [54]. The nucleotide and deduced amino-acid sequences were analyzed with DNA Strider [55].

Promoter analysis

A region of 200 bp upstream of the initiation codon was extracted for each of the 99 ICDSs specific to M. tuberculosis H37Rv, CDC1551 and M. bovis AF2122/97 (Additional files 2, 3 and 4). As a control group, the 200 bp upstream of the initiation codon was extracted for 99 full-length genes randomly selected from M. tuberculosis H37Rv. These 99 genes are full-length in M. tuberculosis H37Rv, CDC1551 and M. bovis AF2122/97. In each case (promoter to be tested and control group), the promoter regions of the three strains were aligned using ClustalW [56] and the sequence variation was recorded. The number of differences observed in the upstream region was statistically compared using the chi-square test.
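As an illustration of the extraction step, the following Python sketch returns the region immediately upstream of an initiation codon, in the orientation of the gene. The coordinate convention (0-based, half-open intervals on the forward strand), the function names and the toy sequences are assumptions; the sketch does not wrap around the origin of a circular chromosome, and it is not the code used in this study.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome, start, end, strand, length=200):
    """Sequence immediately 5' of the initiation codon, read in gene orientation.

    genome: forward-strand sequence as a string.
    start, end: gene coordinates, 0-based and half-open [start, end),
                always given on the forward strand.
    strand: "+" or "-".
    """
    if strand == "+":
        # The initiation codon begins at `start`; take the bases before it.
        return genome[max(0, start - length):start]
    if strand == "-":
        # On the reverse strand the initiation codon sits at the `end` side,
        # so the upstream bases follow `end` on the forward strand.
        return reverse_complement(genome[end:end + length])
    raise ValueError("strand must be '+' or '-'")


# Toy sequences (hypothetical): a forward-strand gene ATGAAATAG at [6, 15)
# and a reverse-strand gene whose forward-strand footprint is TTACAT at [3, 9).
forward_example = upstream_region("TTGACAATGAAATAGCCC", 6, 15, "+", length=4)
reverse_example = upstream_region("CCCTTACATGGTACC", 3, 9, "-", length=4)
```

With `length=200`, the same call would yield the window described above; genes lying closer than 200 bp to the start or end of the sequence simply return a shorter region in this sketch.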

Statistical analysis

The statistical significance of the difference in the frequency of sequence polymorphism between the regions upstream of ICDSs and the regions upstream of full-length genes was tested using a chi-square test (χ2). The chi-square test is used to determine whether two distributions differ. The calculated values were: χ2 = 1.367, df = 1, P = 0.2423; hence the difference between the two groups is not statistically significant (α = 0.05).
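The reported statistic can be checked with a few lines of Python. The counts are an assumption: they are obtained by reading the reported 27% and 20% as 27 and 20 variable upstream regions out of 99 in each group. With those counts, a Pearson chi-square test without continuity correction gives values that agree with the χ2 and P values reported above.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on [[a, b], [c, d]].

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed counts: regions with / without nucleotide differences, out of 99.
icds_variable, icds_identical = 27, 72        # upstream of strain-specific ICDSs
control_variable, control_identical = 20, 79  # upstream of full-length genes

chi2, p_value = chi_square_2x2(
    icds_variable, icds_identical, control_variable, control_identical
)
print(f"chi2 = {chi2:.3f}, df = 1, P = {p_value:.4f}")
```

Because P is greater than 0.05, the null hypothesis that both groups of upstream regions vary at the same rate is not rejected, in line with the conclusion stated above.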

Abbreviations

ICDS, Interrupted CoDing Sequence. ORF, Open Reading Frame.

Authors' contributions

CD helped to carry out the bioinformatic studies, analysed the TB strains by sequencing and drafted the manuscript. EP carried out the bioinformatic studies and helped to draft the manuscript. DE analysed the TB strains by sequencing. EF helped to analyze the promoter regions. OP helped to draft the manuscript. PB participated in the analysis of the W-Beijing strains and helped to write the manuscript. OL participated in the design of the study, carried out the bioinformatic studies and drafted the manuscript. JMR conceived the study, participated in its design and coordination and in finalizing the manuscript. All authors read and approved the final manuscript.

Additional file 5

Nucleotide sequence differences in the upstream region (200 bp) of (A) the strain-specific ICDSs and (B) the full-length genes (control group).