
High resolution, on-line identification of strains from the Mycobacterium tuberculosis complex based on tandem repeat typing.

Philippe Le Flèche, Michel Fabre, France Denoeud, Jean-Louis Koeck, Gilles Vergnaud.

Abstract

BACKGROUND: Currently available reference methods for the molecular epidemiology of the Mycobacterium tuberculosis complex either lack sensitivity or are still too tedious and slow for routine application. Recently, tandem repeat typing has emerged as a potential alternative. This report contributes to the development of tandem repeat typing for M. tuberculosis by summarising the existing data, developing additional markers, and setting up a freely accessible, fast, and easy to use, internet-based service for strain identification.
RESULTS: A collection of 21 VNTRs incorporating 13 previously described loci and 8 newly evaluated markers was used to genotype 90 strains from the M. tuberculosis complex (M. tuberculosis (64 strains), M. bovis (9 strains including 4 BCG representatives), M. africanum (17 strains)). Eighty-four different genotypes are defined. Clustering analysis shows that the M. africanum strains fall into three main groups, one of which is closer to the M. tuberculosis strains, and another is closer to the M. bovis strains. The resulting data have been made freely accessible over the internet (http://bacterial-genotyping.igmors.u-psud.fr/bnserver) to allow direct strain identification queries.
CONCLUSIONS: Tandem-repeat typing is a PCR-based assay which may prove to be a powerful complement to the existing epidemiological tools for the M. tuberculosis complex. The number of markers to type depends on the identification precision which is required, so that identification can be achieved quickly at low cost in terms of consumables, technical expertise and equipment.

Year:  2002        PMID: 12456266      PMCID: PMC140014          DOI: 10.1186/1471-2180-2-37

Source DB:  PubMed          Journal:  BMC Microbiol        ISSN: 1471-2180            Impact factor:   3.605


Background

The precise identification of bacterial pathogens at the strain level is essential for epidemiological purposes. Consequently, constant efforts are undertaken to develop easy to use, low cost and standardized methods which can eventually be applied routinely in a clinical laboratory. Newer developments are usually genetic methods based on PCR (Polymerase Chain Reaction) to type variations directly at the DNA level. The development of polymorphic markers is now further facilitated by the availability of whole genome sequences for bacterial genomes. Recently, it has been shown that tandem repeat loci (usually called minisatellites or VNTRs, for Variable Number of Tandem Repeats) provide a source of very informative markers not only in humans, where some are still in use for identification purposes (paternity analyses, forensics), but also in bacteria. Tandem repeats are easily identified from genome sequence data, the typing of tandem repeat length is relatively straightforward, and the resulting data can be easily coded and exchanged between laboratories independently of the technology used to measure PCR fragment sizes. Furthermore, the resolution of tandem repeat typing is cumulative, i.e. the inclusion of more markers in the typing assay can, when necessary, increase the identification resolution. However, the density of tandem repeats in bacterial genomes varies from species to species, and not all tandem repeats are polymorphic [1]. In addition, some tandem repeats are so unstable that they have little or no long-term epidemiological value [2]. This indicates that for each species under consideration, tandem repeats must be evaluated using representative collections of strains before they can be used. Tandem repeats have already proved their utility for the typing of the highly monomorphic pathogens Bacillus anthracis and Yersinia pestis [1], and M. tuberculosis.
In this last case, the value of tandem repeat based identification was recognised very early [3]. The so-called DR (direct repeat) locus is a relatively large tandem repeat locus of unknown biological significance. The motif is 72 bp long; one half is highly conserved, whereas the other half (called the spacer element) is highly diverged. The spoligotyping method [4] takes advantage of these internal variations to distinguish the hundreds of different alleles at this locus which have been reported in the M. tuberculosis complex among the thousands of strains typed so far [5]. Although it is quite powerful, with many advantages, spoligotyping suffers from a lack of resolution compared to the current gold standard in M. tuberculosis genetic identification, IS6110 typing [6]. IS6110 typing is an RFLP (Restriction Fragment Length Polymorphism) method using the mobile element IS6110 as a probe. Strains with a low copy number of IS6110 elements (such as most M. bovis strains) are poorly resolved by this method. The so-called PGRS (polymorphic GC-rich sequence) method is another RFLP approach in which the probe used is a GC-rich tandem repeat. The polymorphisms which are scored at multiple loci simultaneously on the Southern blot are variations in tandem repeat length (and not internal variations at a single locus as assayed by spoligotyping). The profiles generated are very informative, but in comparison with IS6110 typing, PGRS results are more difficult to score, because the intensity of the bands is highly variable (alleles with a small tandem array yield a lower hybridisation signal) [6]. Both PGRS and IS6110 typing are hindered by the requirement for relatively large amounts of high quality DNA, which is an issue for slow-growing mycobacteria. More recently, and owing to the release of genome sequence data, the allele-length polymorphism of tandem repeat loci has been evaluated by PCR. Essentially three complementary sets of markers have been developed [7-9].
In the first report, exact tandem repeats (ETRs) were identified by searching the existing literature as well as early versions of the M. tuberculosis genome sequence data [7]. The resolution provided by this first set of five loci is lower than both IS6110 RFLP typing and spoligotyping according to a comparative study [6]. In the second report, a family of tandem repeats characterized by similar repeat units was identified by sequence similarity search in the genome sequence data. A set of 12 loci was selected (including two of the five ETR loci) and the resulting panel has a resolution close to IS6110 typing according to [10]. In the third report, tandem repeats with highly conserved (>95%) motifs longer than 50 bp identified in the M. tuberculosis genome sequence were investigated. Altogether, the currently available collection of polymorphic tandem repeats for the typing of M. tuberculosis comprises 27 loci (taking into account duplicates) (Table 1). Fifteen have a polymorphism index above 0.5.
Table 1

Polymorphic minisatellite markers for the M. tuberculosis complex

Locus name | Alias ("MIRU" [8], "ETR" [7], "QUB" [9,11], other) | Reference | TR location on H37Rv genome | Expected length in H37Rv, bp (copy number) | Expected length in CDC1551, bp (copy number) | Expected length in M. bovis AF2122, bp (copy number) | No. of strains | Size range observed (copy number) | No. of alleles observed | Polymorphism index
H37Rv_0024_18 bp | Mtub01 | This report | 24648 | 328 (10) | 310 (9) | 310 (9) | 92 | 274–328 bp (7–10) | 4 | 0.48
H37Rv_0079_9 bp | Mtub02 | This report | 79503 | 230 (6) | 239 (7) | 239 (7) | 92 | 221–275 bp (5–11) | 7 | 0.76
H37Rv_0154_53 bp | MIRU2 | [8] | 154111 | 508 (2) | 508 (2) | 508 (2) | 92 | 455–561 bp (1–3) | 3 | 0.09
H37Rv_0424_51 bp | Mtub04 | This report | 424010 | 269 (2.6) | 371 (4.6) | 269 (2.6) | 28 | 218–320 (1.6–3.6) | 3 | 0.52
H37Rv_0531_15 bp | MPTR-A | [7] | 531430 | 328 (16) | 328 (16) | 328 (16) | 48 | (15–17) | 3 | 0.23
H37Rv_0577_58 bp | ETR-C | [8] | 577172 | 346 (4) | 288 (3) | 404 (5) | 92 | 230–404 bp (2–5) | 4 | 0.63
H37Rv_0580_77 bp | MIRU4, ETR-D | [7] | 580546 | 353 (3.3) | 330 (3) | 483 (5) | 92 | 253–715 bp (2–8) | 7 | 0.35
H37Rv_0802_54 bp | MIRU40 | [8] | 802194 | 199 (1) | 415 (5) | 253 (2) | 92 | 199–469 bp (1–6) | 5 | 0.71
H37Rv_0959_53 bp | MIRU10 | [8] | 959868 | 643 (3) | 750 (5) | 590 (2) | 92 | 537–1014 bp (1–10) | 9 | 0.76
H37Rv_1121_15 bp | Mtub12 | This report | 1121658 | 215 (4) | 230 (5) | 215 (4) | 92 | 200–230 bp (3–5) | 3 | 0.19
H37Rv_1443_56 bp | Mtub16 | This report | 1443417 | 291 (1) | 347 (2) | 347 (2) | 11 | 291–515 (1–5) | 3 | 0.56
H37Rv_1451_57 bp | QUB-1451c | [9] | 1451778 | 305 (3.8) | 305 (3.8) | 305 (3.8) | 56 | (2–4) (bovis) | 2 | 0.12
H37Rv_1612_21 bp | QUB-23 | [11] | 1612529 | 141 (5) | 162 (6) | 162 (6) | 20 | 141–203 (5–8) | 3 | 0.18
H37Rv_1644_53 bp | MIRU16 | [8] | 1644026 | 671 (2) | 724 (3) | 671 (2) | 92 | 618–777 bp (1–4) | 4 | 0.59
H37Rv_1895_57 bp | QUB-1895 | [9] | 1895344 | 319 (4) | 205 (2) | 319 (4) | 56 | (2–4) (bovis) | 3 | 0.35
H37Rv_1955_57 bp | Mtub21 | This report | 1955580 | 206 (2) | 263 (3) | 263 (3) | 92 | 149–491 bp (1–7) | 7 | 0.76
H37Rv_1982_78 bp | QUB-18 | [11] | 1982873 | 621 (5) | 777 (7) | 465 (3) | 24 | 387–1167 (2–12) | 9 | 0.74
H37Rv_2059_77 bp | MIRU20 | [8] | 2059429 | 591 (2) | 591 (2) | 591 (2) | 53 | (1–2) | 2 | 0.29
H37Rv_2074_56 bp* | Mtub24 | This report | 2074431 | 805 (3.6) | 693 (1.6) | 693 (1.6) | 44 | 637–749 (0.6–2.6) | 3 | 0.52
H37Rv_2163_a_69 bp | QUB-11a, pUCD1 | [11] | 2163607 | 305 (3) | 581 (7) | 788 (10) | 92 | 305–1832 bp (3–26) | 15 | 0.88
H37Rv_2163_b_69 bp | QUB-11b, pUCD1 | [11] | 2163729 | 412 (5) | 274 (3) | 343 (4) | 52 | 136–826 (1–11) | 8 | 0.82
H37Rv_2165_75 bp | ETR-A | [7] | 2165223 | 397 (3) | 322 (2) | 847 (9) | 92 | 322–847 bp (2–9) | 8 | 0.73
H37Rv_2347_57 bp | Mtub29 | This report | 2347393 | 350 (4) | 292 (3) | 293 (3) | 92 | 236–350 bp (2–4) | 3 | 0.55
H37Rv_2401_58 bp | Mtub30 | This report | 2401815 | 319 (2) | 435 (4) | 435 (4) | 92 | 261–435 bp (1–4) | 3 | 0.55
H37Rv_2461_57 bp | ETR-B | [7] | 2461279 | 292 (3) | 235 (2) | 406 (5) | 92 | 178–406 bp (1–5) | 6 | 0.51
H37Rv_2531_53 bp | MIRU23 | [8] | 2531560 | 873 (6) | 820 (5) | 767 (4) | 92 | 608–979 bp (1–8) | 7 | 0.60
H37RV_2387_54 bp | MIRU24 | [8] | 2684427 | 447 (1) | 447 (1) | 447 (1) | 53 | (1–2) | 2 | 0.24
H37Rv_2990_55 bp | Mtub31 | This report | 2990582 | 257 (2) | 312 (3) | 312 (3) | 49 | 202–312 bp (1–3) | 3 | 0.15
H37Rv_2996_51 bp | MIRU26 | [8] | 2996002 | 614 (3) | 716 (5) | 716 (5) | 57 | 563–818 (2–7) | 5 | 0.61
H37Rv_3006_53 bp | MIRU27, QUB-5 | [8] | 3006875 | 657 (3) | 657 (3) | 657 (3) | 92 | 551–710 bp (1–4) | 4 | 0.25
H37Rv_3171_54 bp | Mtub34 | This report | 3171465 | 279 (3) | 225 (2) | 279 (3) | 11 | 171–225 (1–2) | 2 | 0.3
H37Rv_3192_53 bp | MIRU31, ETR-E | [7] | 3192168 | 651 (3) | 651 (3) | 651 (3) | 92 | 545–810 bp (1–6) | 6 | 0.67
H37Rv_3232_56 bp | QUB-3232 | [9] | 3232649 | 591 (3) | 760 (6) | 703 (5) | 56 | (4–22) (bovis) | 10 | 0.65
H37Rv_3239_79 bp | ETR-F | [7] | 3239469 | 476 (2.8) | 476 (2.8) | 421 (2.1) | 48 | (1–3) | 3 | 0.49
H37Rv_3336_59 bp | QUB-3336 | [9] | 3336499 | 407 (5) | 466 (6) | 289 (3) | 56 | (3–21) (bovis) | 8 | 0.55
H37Rv_3663_63 bp** | Mtub38 | This report | 3663751 | 373 (2.7) | 310 (1.7) | 310 (1.7) | 92 | 247–400 bp (0.7–3.1) | 5 | 0.35
H37Rv_3690_58 bp* | Mtub39 | This report | 3690947 | 341 (2.6)* | 397 (3.6) | 341 (2.6) | 92 | 247–1349 bp (1–20)* | 11 | 0.64
H37Rv_4052_111 bp | QUB-26 | [11] | 4052969 | 708 (5) | 819 (6) | 597 (4) | 100 | (4–14) (bovis) | 5 | 0.41
H37Rv_4156_59 bp | QUB-4156c | [9] | 4156797 | 224 (2) | 283 (3) | 165 (1) | 52 | 106–283 (0–3) | 4 | 0.69
H37Rv_4348_53 bp | MIRU39 | [8] | 4348401 | 646 (2) | 646 (2) | 646 (2) | 92 | 593–699 bp (1–3) | 3 | 0.31

The markers are listed according to their position in the H37Rv genome. The proposed reference name includes the size of the repeat unit. The twenty-one markers used in the present report are italicised and underlined. Alias names identified in the literature are indicated. QUB11a, QUB11b, and ETR-A (position 2163–2165) are located within the gene PPE34 [19]. The expected length assumes that the primers listed in Table 2 were used. *: the observed size (Table 3) is not the expected size. **: the repeat unit is not easily defined; size variations do not correspond to a multiple of 63 base-pairs. The polymorphism index is calculated as 1 - ∑ (allele frequency)² among the 86 distinct genotypes. The values are deduced from the original report in nine cases (indicated by the absence of a size range in the "size range" column). In some instances [9,11], the population of strains used is biased (M. bovis strains).
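As an illustration of the index defined in the footnote above, the following minimal Python sketch computes 1 - ∑ (allele frequency)² from a list of allele calls at one locus. The function name and the example calls are hypothetical and are not taken from Table 3.

```python
from collections import Counter


def polymorphism_index(alleles):
    """Return 1 - sum(f_i ** 2), where f_i is the frequency of allele i."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical copy numbers at one locus, one call per distinct genotype.
example_calls = [2, 2, 2, 3, 3, 4, 4, 4, 4, 5]
print(round(polymorphism_index(example_calls), 2))  # 0.7
```

A locus with a single allele gives an index of 0, and the index approaches 1 as alleles become more numerous and more evenly distributed.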

Table 2

Set of primers for MLVA analysis

Locus nameforward primerreverse primer
H37Rv_0024_18 bpGAGAAACAGGAGGGCGTTGTATTACGACGACCGCTATGC
H37Rv_0079_9 bpCGTGCACAGTTGGGTGTTTATTCGTTCAGGAACTCCAAGG
H37Rv_0154_53 bpTGGACTTGCAGCAATGGACCAACTTACTCGGACGCCGGCTCAAAAT
H37Rv_0424_51 bpGTCCAGGTTGCAAGAGATGGGGCATCCTCAACAACGGTAG
H37Rv_0531_15 bpGGTTACCACTTCGATGCGTCTGCGAGCCGCCGAAACCCATC
H37Rv_0577_58 bp*GACTTCAATGCGTTGTTGGA*GTCTTGACCTCCACGAGTGC*
H37Rv_0580_77 bpCAGGTCACAACGAGAGGAAGAGCGCGGATCGGCCAGCGACTCCTC
H37Rv_0802_54 bp*AAGCGCAAGAGCACCAAG*GTGGGCTTGTACTTGCGAAT*
H37Rv_0959_53 bpGTTCTTGACCAACTGCAGTCGTCCGCCACCTTGGTGATCAGCTACCT
H37Rv_1121_15 bpCTCCCACACCCAGGACACCGGCCTACCCAACATTCC
H37Rv_1443_56 bpGGTAATCCTGGTCGCTTGTCACCCAAATTGCCCTGGTC
H37Rv_1451_57 bpGGTAGCCGTCGTCGAGAAGCCGCCACCACCGCACTGGC
H37Rv_1612_21 bpGCTGCACCGGTGCCCATCCACCGGAGCCGGAACGGC
H37Rv_1644_53 bpTCGGTGATCGGGTCCAGTCCAAGTACCCGTCGTGCAGCCCTGGTAC
H37Rv_1895_57 bpGGTGCACGGCCTCGGCTCCAAGCCCCGCCGCCAATCAA
H37Rv_1955_57 bpAGATCCCAGTTGTCGTCGTCCAACATCGCCTGGTTCTGTA
H37Rv_1982_78 bp*ATCGTCAGCTGCGGAATAGT*AATACCGGGGATATCGGTTC*
H37Rv_2059_77 bpTCGGAGAGATGCCCTTCGAGTTAGGGAGACCGCGACCAGGTACTTGTA
H37Rv_2074_56 bpAAATTCAAAGAGTTTCTCGACAGTGGATCTTGAGAACCAAGATGTCCTT
H37Rv_2163_a_69 bpCCCATCCCGCTTAGCACATTCGTATTCAGGGGGGATCCGGGA
H37Rv_2163_b_69 bpCGTAAGGGGGATGCGGGAAATAGGCGAAGTGAATGGTGGCAT
H37Rv_2165_75 bp*ATTTCGATCGGGATGTTGAT*TCGGTCCCATCACCTTCTTA*
H37Rv_2347_57 bpAACCCATGTCAGCCAGGTTAATGATGGCACACCGAAGAAC
H37Rv_2401_58 bpAGTCACCTTTCCTACCACTCGTAACATTAGTAGGGCACTAGCACCTCAAG
H37Rv_2461_57 bpGCGAACACCAGGACAGCATCATGGGCATGCCGGTGATCGAGTGG
H37Rv_2531_53 bpCAGCGAAACGAACTGTGCTATCACCGTGTCCGAGCAGAAAAGGGTAT
H37RV_2387_54 bpCGACCAAGATGTGCAGGAATACATGGGCGAGTTGAGCTCACAGAA
H37Rv_2990_55 bpGTGACGTTTACCGTGCTCTATTTCGTCGTCGGACAGTTCTAGCTTT
H37Rv_2996_51 bpCCCGCCTTCGAAACGTCGCTTGGACATAGGCGACCAGGCGAATA
H37Rv_3006_53 bpTCGAAAGCCTCTGCGTGCCAGTAAGCGATGTGAGCGTGCCACTCAA
H37Rv_3171_54 bpGCAGATAACCCGCAGGAATAGGAGAGGATACGTGGATTTGAG
H37Rv_3192_53 bp*ACTGATTGGCTTCATACGGCTTTA*GTGCCGACGTGGTCTTGAT*
H37Rv_3232_56 bpCAGACCCGGCGTCATCAACCCAAGGGCGGCATTGTGTT
H37Rv_3239_79 bpCTCGGTGATGGTCCGGCCGGTCACGGAAGTGCTCGACAACGCCATGCC
H37Rv_3336_59 bpATCCCCGCGGTACCCATCGCCAGCGGTGTCGACTATCC
H37Rv_3663_63 bpGCCCAAAAAGCATGGGAACGTGCCCCTGGTTGTCCCCGCAGTATCTC
H37Rv_3690_58 bpAATCACGGTAACTTGGGTTGTTTGATGCATGTTCGACCCGTAG
H37Rv_4052_111 bpAACGCTCAGCTGTCGGATGGCCAGGTCCTTCCCGAT
H37Rv_4156_59 bp*TGGTCGCTACGCATCGTGTCGGCCCGT*TACCACCCGGGCAGTTTAC*
H37Rv_4348_53 bpCGCATCGACAAACTGGAGCCAAACCGGAAACGTCTACGCCCCACACAT

*: the primers indicated are not the primers used in the original publication, but were designed for the present study, usually in order to reduce the size of the PCR product and consequently to improve allele size identification.

Table 3

Genotype data for 21 loci and 92 strains (including CDC1551 and AF2122/97)

strain id. | species | 24 | 79 | 154 | 577 | 580 | 802 | 959 | 1121 | 1644 | 1955 | 2163 | 2165 | 2347 | 2401 | 2461 | 2531 | 3006 | 3192 | 3663 | 3690 | 4348
percy196bcg911253.322431952254331.722
percy197bcg911253.322431652254331.722
percy142bcg911253.3224311152254331.722
percy62bcg911253.3224311152254331.722
percy43abovis910254224311053454131.722
percy53bovis99254224311153424251.722
percy184bovis9823422423663444311.732
percy55bovis9825422433663444331.732
AF2122bovis97255224231083454331.72
percy54bovis98254124231093454331.722
percy61africanum97254254231043444351.742
percy119africanum98253174431023454361.712
percy57africanum962531104241073454351.152
cipt950052africanum97253244441062444351.732
percy18africanum98253254441063444351.762
cipt960340africanum9524314442923424421.742
percy59africanum9624314442933424421.732
percy16africanum9524314432943424421.732
percy17africanum9524314442843424421.732
percy99aafricanum9622314442943224421.712
percy13africanum9524314432943444321.722
percy58africanum9524314442943224321.742
percy56africanum9524314342943422421.732
percy60africanum9524314342943422421.732
percy27africanum99223343141044225351.733
percy7africanum99223343341044225351.733
percy91africanum99223343341044225341.733
percy122beijing tuberculosis96243134361144425130.732
cipt20001272beijing tuberculosis91124333445644425350.732
cipt991053beijing tuberculosis91124332435944425350.732
cipt990590beijing tuberculosis91124232435944423350.733
cipt971135beijing tuberculosis91124333434944425320.733
percy248tuberculosis911243333311444425350.733
percy164tuberculosis96243334351044425350.733
percy170tuberculosis98248263251073245351.743
percy211tuberculosis98243334251173245351.743
percy189atuberculosis9924434436533244351.773
percy128tuberculosis9824443436663118361.743
percy210tuberculosis98246444372473116341.743
percy172tuberculosis910233345331034425341.732
percy231tuberculosis910233244331134425331.732
CDC1551tuberculosis9723355533723425331.72
percy155tuberculosis98233354321034425331.732
percy44tuberculosis9823335432834425331.732
percy216tuberculosis8823335432734425331.732
percy6btuberculosis9823345434734425331.732
percy239tuberculosis98243255322434425331.732
percy208tuberculosis98233354331134425331.752
percy232tuberculosis98233354331134425331.732
percy222tuberculosis98233354231034425431.732
percy237tuberculosis9823335433622425331.732
percy42tuberculosis9823325433433425331.732
percy27btuberculosis9823363434734425331.733
percy28btuberculosis9823363434734425341.733
percy169tuberculosis9823335432532223331.732
percy228tuberculosis98233354332432423331.732
percy219tuberculosis9923335432932423331.762
percy84tuberculosis98233344221132423331.732
percy256tuberculosis96145374441234234331.712
percy259tuberculosis106243215312632235121.731
percy11tuberculosis10624333433644245331.722
percy43btuberculosis10624313433644225331.7202
percy31tuberculosis10624333423644225331.7172
percy18btuberculosis10624333433644225331.732
percy201tuberculosis10624333433644225331.732
percy35tuberculosis10624333433644225331.732
percy39tuberculosis10624333433644225331.732
percy250tuberculosis10624333433642225331.732
percy40tuberculosis10624333433644225331.72
percy33tuberculosis106243334322544225351.742
percy20tuberculosis10634423431444225321.732
percy16btuberculosis10623333433243421.75331.732
percy234tuberculosis10623333433253421.75331.722
percy37tuberculosis106253334432434225331.742
percy249tuberculosis10643334232544225331.722
percy7btuberculosis106243334232534225331.792
percy41tuberculosis106243434232534235331.7102
percy165tuberculosis106243334221034225331.732
percy230tuberculosis10624342422634225331.732
percy245tuberculosis106223224221134225331.732
percy220tuberculosis85243224222424225332.732
percy215tuberculosis10623323422524225332.732
percy33btuberculosis10623323423634225321.732
percy238tuberculosis10623323412634221343.132
percy217tuberculosis106243434322434226342.732
percy29btuberculosis106243434312534225312.732
percy240tuberculosis10624343412424225332.752
H37Rvtuberculosis106243.313422334236332.752
percy221tuberculosis10524314423824116141.122
percy241tuberculosis10524334423624226331.712
percy236tuberculosis75143634332324126321.712
percy244tuberculosis10514342433824126321.732

Allele sizes were converted to number of repeats according to the correspondence indicated in Table 1. In some instances, decimal values are used, reflecting the existence of alleles with intermediate size. The markers are named and listed according to their position on the genome (Table 1). The strains are listed according to their position in the clustering analysis (Figure 1). M. tuberculosis CDC1551 and M. bovis AF2122/97 are included based on the predicted allele sizes (Table 1) with the exception of locus H37Rv_3690 (disagreement between observed and expected size for H37Rv at this locus).
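The conversion described above can be sketched as follows, assuming that allele size scales linearly with copy number starting from the H37Rv reference allele of Table 1. The function name is hypothetical; the values used for H37Rv_0802_54 bp (199 bp for 1 copy, 54 bp repeat unit) are taken from Table 1.

```python
def copies_from_size(observed_bp, ref_bp, ref_copies, unit_bp):
    """Convert an observed PCR product size into a repeat copy number.

    Assumes the flanking sequence is constant, so that each additional
    repeat unit adds exactly unit_bp to the reference product size.
    """
    copies = ref_copies + (observed_bp - ref_bp) / unit_bp
    # Keep one decimal so that intermediate-size alleles remain visible.
    return round(copies, 1)


# Reference allele for H37Rv_0802_54 bp in Table 1: 199 bp = 1 copy.
cdc1551 = copies_from_size(415, ref_bp=199, ref_copies=1, unit_bp=54)
af2122 = copies_from_size(253, ref_bp=199, ref_copies=1, unit_bp=54)
print(cdc1551, af2122)  # 5.0 2.0
```

Both values agree with the copy numbers listed in Table 1 for CDC1551 (5) and AF2122/97 (2) at this locus.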

This collection of markers should already provide a typing resolution comparable to the current reference methods. Given that not all tandem repeats present in M. tuberculosis have been evaluated for polymorphism, it is likely that the typing resolution of minisatellites could be improved further. Eventually, standardisation work will have to be done in order to promote the use of tandem repeats. A number of the loci analysed are known under different names in different studies (for instance, ETR-D [7] is also known as MIRU4 in [10] and VNTR 0580 in [11]), and the coding of alleles (number of motifs in an allele) can also differ between studies, for reasons explained in [11]. This is due in part to the fact that the number of repeats is not necessarily an integer value (Table 1). Furthermore, because the repeats in an array are not necessarily exact repeats, there can be ambiguities in the definition of the first and last base pair of the array. Finally, in addition to length variations due to the addition or deletion of an exact number of units, microdeletions or insertions within some repeat units are sometimes observed (MIRU4 is one such instance [12]). One purpose of the present report is to contribute to the development of Multiple Loci VNTR Analysis (MLVA) through the evaluation of new markers and the setting up of an on-line identification tool for the M. tuberculosis complex which can be queried very easily with the user's personal data. In the present report, we first take advantage of the availability of genome sequence from two M. tuberculosis strains to complement the current collection of polymorphic tandem repeat markers. We identified in silico tandem repeats showing a different length in the two strains using the previously described tandem repeat database [1]. Thirteen loci with a different predicted length in the two genomes and which have not been previously investigated have been tested for polymorphism and ease of typing.
Eight of the 13 polymorphic loci were used together with 13 of the previously described markers to genotype a collection of different M. tuberculosis complex strains. The data produced cluster the strains as suggested by morphological observations and biochemical analyses. The resulting data can be queried from a dedicated web page.

Results

Tandem repeats predicted to be of a different size in H37Rv and CDC1551

The size of tandem repeats in the two M. tuberculosis strains sequenced to date, H37Rv and CDC1551, was compared using the tandem repeat database. Fifty-one of the tandem repeats identified in CDC1551 have repeat units longer than 9 base-pairs and a predicted overall size which differs from the H37Rv homolog estimate by at least 9 base-pairs. Seventeen have an expected product size above one kilobase. They include the DR locus and members of the family of PGRS sequences [13] and were not investigated further. Eighteen have been analyzed in previous investigations [7-9,11]. Three produced multiband patterns or inconsistent results. The results obtained for the remaining 13 loci, together with the description of the 18 previously described loci, are summarized in Table 1. In addition, Table 1 includes nine markers which are not polymorphic between H37Rv and CDC1551 but have already been quoted in the literature. Each locus is designated by its position (expressed in kilobases) on the H37Rv genome and by the repeat unit length as defined by the Tandem Repeats Finder software and indicated in the tandem repeat database. All thirteen newly evaluated loci are polymorphic as predicted. In two cases (Table 1) the expected product size is not the observed size. The expected size has not been observed in the collection of strains used here, which suggests that the incorrect prediction is due to an artifact of the sequencing process. Eight loci among the thirteen have polymorphism indexes above 0.50 (two are above 0.7). The vast majority of the repeat units are more than 50 bp long (Table 1), which makes them easy to assay by ordinary agarose gel electrophoresis when using the primer pairs indicated in Table 2. In one instance however (H37Rv_3663_63 bp), the PCR product sizes clearly do not differ by a whole number of (63 bp) repeat units (Table 1).
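The selection criteria stated above (repeat unit longer than 9 bp, predicted sizes differing by at least 9 bp between the two genomes) can be sketched as a simple filter. This is a minimal illustration and not the tandem repeat database's own query: the class, field and function names are hypothetical, the first two entries reuse the expected product lengths of Table 1 (whose difference between strains equals the difference in array size), and the third entry is invented.

```python
from dataclasses import dataclass


@dataclass
class TandemRepeat:
    name: str
    unit_bp: int          # repeat unit length
    size_h37rv_bp: int    # predicted size in H37Rv
    size_cdc1551_bp: int  # predicted size in CDC1551


def is_candidate(tr, min_unit_bp=10, min_diff_bp=9):
    """Unit longer than 9 bp and predicted sizes differing by at least 9 bp."""
    size_diff = abs(tr.size_h37rv_bp - tr.size_cdc1551_bp)
    return tr.unit_bp >= min_unit_bp and size_diff >= min_diff_bp


repeats = [
    TandemRepeat("H37Rv_0802_54 bp", 54, 199, 415),  # sizes from Table 1
    TandemRepeat("H37Rv_0154_53 bp", 53, 508, 508),  # sizes from Table 1
    TandemRepeat("hypothetical_6 bp", 6, 120, 150),  # invented entry
]
candidates = [tr.name for tr in repeats if is_candidate(tr)]
print(candidates)  # ['H37Rv_0802_54 bp']

# Bookkeeping of the 51 candidates reported in the text:
# 17 above 1 kb, 18 previously analysed, 3 inconsistent, 13 newly evaluated.
assert 17 + 18 + 3 + 13 == 51
```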

Typing of strains and clustering analysis

The forty loci listed in Table 1 were used to genotype a collection of 90 strains from the M. tuberculosis complex, using the primers listed in Table 2. In our hands, some of the markers did not prove to be sufficiently robust for easy and reproducible typing in the conditions used here. On this basis, we have selected a collection of 21 markers (comprising thirteen previously described markers and eight among the new loci evaluated). The 21 markers used are italicised and underlined in Tables 1 and 2. After analysis of the images using Bionumerics 3.0, and conversion of allele sizes into copy numbers of motifs in the tandem arrays, clustering analysis was done using the categorical and Ward parameters. The results of the clustering analysis are shown in Figure 1. The genotyping data for strains M. tuberculosis CDC1551 and M. bovis AF2122/97 were deduced from the sequence data (Table 1) and included in the analysis. Six major groups are defined (Figure 1). Group I contains the M. bovis strains and 5 of the M. africanum strains. Group II is composed of nine M. africanum strains. The third group includes three M. africanum strains and seven M. tuberculosis strains. Interestingly, five of these strains have been independently identified as representing the Beijing type [14] (the last two have not been tested). The last three groups comprise the vast majority of the M. tuberculosis strains. M. africanum strains which are negative for nitrate reduction (Africanum I type [15]) are among the first two groups, closer to the M. bovis strains as previously observed [16,17]. In contrast, the three M. africanum strains which are positive for nitrate reduction are in the third group, closer to M. tuberculosis strains. In order to facilitate the comparison with earlier investigations [16,17], Figure 1 displays the genotypes for the five ETR markers, extracted from the full data presented in Table 3. Group I in Figure 1 is reminiscent of group A in [17] and group A1 in [18]. Group II in Figure 1 is reminiscent of group B in [17] and group A2 in [18], which are both characterized by the 42432 ETR pattern.
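As a rough illustration of a categorical comparison of genotypes, the sketch below scores two profiles by the fraction of loci at which their alleles differ. This definition is an assumption about how a categorical coefficient can be computed; the exact Bionumerics implementation and the Ward linkage step are not reproduced here, and the profiles are hypothetical five-locus examples.

```python
def categorical_distance(profile_a, profile_b):
    """Fraction of loci typed in both profiles at which the alleles differ."""
    pairs = [(a, b) for a, b in zip(profile_a, profile_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no locus typed in both profiles")
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical five-locus profiles (copy numbers); None marks a missing call.
profiles = {
    "strain_1": (4, 2, 4, 3, 2),
    "strain_2": (4, 2, 4, 3, 3),
    "strain_3": (3, 2, None, 3, 3),
}
names = sorted(profiles)
matrix = {(x, y): categorical_distance(profiles[x], profiles[y])
          for x in names for y in names}
print(matrix[("strain_1", "strain_2")])  # 0.2
print(matrix[("strain_1", "strain_3")])  # 0.5
```

Because alleles are compared only for equality, a difference of one repeat unit and a difference of ten units contribute equally to the distance.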
Figure 1

Dendrogram deduced from the clustering analysis of the 92 strains (including CDC1551 and AF2122/97). The first column from the left identifies the strains. The second column indicates the species (red: M. bovis strains; green: M. africanum strains; yellow: M. tuberculosis strains known to be of the Beijing type and indicated "beijing tuberculosis"; blue: other M. tuberculosis strains). The third column indicates the geographic origin of the strain. The fourth column indicates the ETR pattern (ETR-A to ETR-E) extracted from the full data presented in Table 3. The last four columns indicate, from left to right, the results of the niacin production, nitrate reductase, TCH susceptibility and Lebek tests (0, negative; 1, positive) when available.

The ETR panel alone discriminates 44 genotypes (instead of 84 with the panel of 21 loci; 86 genotypes when including the CDC1551 and AF2122/97 data, Figure 1) and is not sufficient to clearly separate the M. africanum strains from the M. tuberculosis strains (analysis not shown) as can be achieved using the 21 loci.
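The effect of panel size on discrimination can be illustrated by counting the distinct allele combinations obtained with a subset of loci and with a larger panel. The sketch below uses hypothetical copy numbers, not the data of Table 3; the function name is likewise an assumption.

```python
def count_genotypes(profiles, loci):
    """Number of distinct allele combinations over the chosen loci."""
    return len({tuple(profile[locus] for locus in loci)
                for profile in profiles.values()})


# Hypothetical copy numbers; the locus names follow Table 1.
profiles = {
    "strain_1": {"ETR-A": 3, "ETR-B": 2, "Mtub21": 2},
    "strain_2": {"ETR-A": 3, "ETR-B": 2, "Mtub21": 3},
    "strain_3": {"ETR-A": 3, "ETR-B": 2, "Mtub21": 5},
    "strain_4": {"ETR-A": 2, "ETR-B": 2, "Mtub21": 2},
}
etr_only = count_genotypes(profiles, ["ETR-A", "ETR-B"])
full_panel = count_genotypes(profiles, ["ETR-A", "ETR-B", "Mtub21"])
print(etr_only, full_panel)  # 2 4
```

Adding a marker can only keep or increase the number of genotypes, which is the cumulative property of tandem repeat typing mentioned in the Background.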

Internet-based identifications

The genotyping data presented in Table 3 can be queried directly via an internet service. Figure 2 provides a brief description of the current M. tuberculosis query page (likely to evolve as updates are made). For each locus, allele sizes can be selected among a list of possibilities (observed sizes). Alternatively, more experienced users can go directly to a "copy-paste" page using the appropriate format. The results of the query indicate a similarity score and include links to the complete data for each strain listed. Help files are available, including a link to updated versions of Figure 1.
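The text does not specify how the similarity score is computed. As a sketch under that caveat, the example below assumes the score is the fraction of queried loci whose allele matches the reference strain, and ranks the strains of a small hypothetical database accordingly. The function names, strain names and allele values are all illustrative and do not describe the actual service.

```python
def similarity(query, reference):
    """Fraction of the queried loci whose allele matches the reference strain."""
    typed = [locus for locus, allele in query.items()
             if allele is not None and locus in reference]
    if not typed:
        return 0.0
    return sum(query[locus] == reference[locus] for locus in typed) / len(typed)


def identify(query, database, top=3):
    """Return up to `top` (score, strain) pairs, best match first."""
    scored = [(similarity(query, profile), strain)
              for strain, profile in database.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))[:top]


# Hypothetical database and query (copy numbers at three loci).
database = {
    "strain_A": {"ETR-A": 3, "ETR-B": 3, "MIRU40": 1},
    "strain_B": {"ETR-A": 2, "ETR-B": 2, "MIRU40": 5},
    "strain_C": {"ETR-A": 3, "ETR-B": 2, "MIRU40": 5},
}
query = {"ETR-A": 2, "ETR-B": 2, "MIRU40": None}  # MIRU40 not typed
results = identify(query, database)
print(results)
```

Scoring only the loci that were actually typed lets a user query with a partial panel, in line with the idea that the number of markers typed depends on the precision required.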
Figure 2

Internet database interrogation page. The query page can be accessed via . The home page (not shown) includes a link to help files (and data update information), and links to individual species query pages. Currently, identification pages are available for Y. pestis, B. anthracis (based on the data published in [1] and some additional unpublished data) and M. tuberculosis. Figure 2 shows the current M. tuberculosis query page. For each marker, allele sizes can be selected among the list of observed sizes. Allele sizes are indicated either as number of motifs, or as fragment sizes, assuming that the primers used are the primers listed in Table 2. The allele size listed in green corresponds to the H37Rv control strain allele. More experienced users can go directly to a page on which data (expressed in base-pairs or in repeat unit number) can be directly pasted using the appropriate format.


Testing the reproducibility of the approach

In order to test the reproducibility of the approach, ten blind-coded control samples were typed. Figure 3 shows the typing of two markers, H37Rv_0802_54 bp (left, 54 bp unit; H37Rv allele: 1 unit, 199 bp PCR product) and H37Rv_1955_57 bp (right, 57 bp unit; H37Rv allele: 2 units, 206 bp PCR product). The number of units in each allele can be unambiguously deduced by comparison with the H37Rv control lanes and the 100 base-pair ladder size marker. All ten unknown strains were correctly identified using the internet-based service described above.
Figure 3

Set-up of the genotyping on agarose gels. The figure illustrates the usual set-up for running PCR products on agarose gels. Twelve DNA samples (including two "H37Rv" control lanes) are typed at two loci. A 100 bp ladder size marker lane (L) flanks both sides of each group of 6 PCR products. The experiment shown is part of a reproducibility test. The ten blind-coded samples are numbered from one to ten (percy59, percy55, percy40, percy189a, percy122, percy33, percy28b, percy33b, percy31, percy53). The number of units is easily deduced from the pattern observed; the largest alleles contain six copies of the repeat unit.

Discussion

The list of 40 markers given in Table 1 is close to representing the complete collection of tandem repeats of interest for MLVA typing in M. tuberculosis. It includes all loci which have a different predicted size in H37Rv and CDC1551 and which are amenable to routine PCR typing. Nine additional loci which have been quoted in published reports are also included even though they do not fulfill this criterion. Clustering analysis (Figure 1) shows that the two strains CDC1551 and H37Rv are relatively distant within the M. tuberculosis species. This would predict that tandem repeats of identical size in the two strains are likely to be poorly informative across the complex. However, this appears not to be absolutely true: for instance, ETR-E (H37Rv_3192_53 bp) happens to have the same size in H37Rv, CDC1551 and even AF2122/97 (Table 1) in spite of its very high polymorphism index (0.69, Table 1). Consequently, the few additional loci, not explored here, which are of equal size in H37Rv and CDC1551 but differ from the predicted size for M. bovis strain AF2122/97 might also prove to be of interest.

As can be seen in Table 1, most repeat units are more than 50 bp long and allele sizes rarely exceed 1000 bp. As a result, the precision which can be achieved by ordinary agarose gel electrophoresis is sufficient to estimate the number of units in an allele. The selection of 21 markers proposed here was tested specifically in order to be easily assayed using this low-cost technological approach. Although a database system is necessary to efficiently manage a genotyping project with a high number of markers and strains, the identification of up to a few strains per day, in a clinical setting for instance, requires neither sophisticated equipment nor costly consumables. Genotypes can be scored by visual analysis of the gel images, and a subset of the collection of available markers can be chosen for routine identification purposes. The data can then be analysed using the site described in Figure 2.

The role of tandem repeats in the M. tuberculosis genome is largely unknown. Twenty-one of the loci listed in Table 1 have repeat units which are a multiple of three base pairs. The majority (fifteen) fall within putative genes, often of unknown function, such as the PPE family of genes [19]. The most remarkable instance is probably PPE34 at position 2163–2165 of the genome (Rv1917c in ), which contains three minisatellites [20] (Table 1, Qub11a, Qub11b, ETR-A).

The present study includes 17 M. africanum strains. All strains have been identified as such independently, based on morphological features of the colonies grown on Lowenstein-Jensen medium and on biochemical analyses. M. africanum has long been recognized as showing extensive phenotypic heterogeneity [21], suggesting that M. africanum could display a phenotypic continuum between M. tuberculosis and M. bovis. This was recently supported by the study of deletion events distinguishing the H37Rv M. tuberculosis strain and the BCG M. bovis strain [22], which also suggested that M. bovis is the most recent member of the M. tuberculosis complex. The analysis of deletion events in the M. africanum strains investigated showed that West African strains fall into two groups, clearly distinguished from the M. tuberculosis strains. In contrast, no deletion event distinguished East African M. africanum strains from M. tuberculosis strains. The present study includes three Africanum type II strains (positive nitrate reductase test). All three originate from East Africa (Djibouti). Although the MLVA analysis presented here does confirm that they are very close to M. tuberculosis strains, they are clearly distinct, at least within the collection of strains evaluated. Interestingly, they appear to be closest to the Beijing type of M. tuberculosis strains (Figure 1, Group III, strains percy7, percy27 and percy91).

Conclusions

In its present form, the database should be considered as preliminary. More strains must be typed in order to provide a continuous and robust coverage of the M. tuberculosis complex, and the clustering analysis presented in Figure 1 should be considered as provisional. If the MLVA approach is considered to be of use by the community, and given that the associated data is highly portable, then it should be relatively easy, through collaborative efforts, to significantly expand the available data. It is hoped that this data will constitute an easy-to-use high-resolution classification resource which will then help address medical and epidemiological issues regarding the M. tuberculosis complex.

Methods

Strains and DNA preparation

Mycobacteria were identified using conventional morphological and biochemical tests as previously described [23]. In particular, M. tuberculosis, M. africanum and M. bovis were distinguished according to their morphology on Lowenstein-Jensen plates. M. tuberculosis strains are eugonic. Colonies of the dysgonic M. africanum strains are rough and flat. Colonies of the dysgonic M. bovis strains are smooth, hemispheric and white. Biochemical analyses included niacin production, nitrate reduction, TCH (thiophene-2-carboxylic acid hydrazide) sensitivity tests and growth characteristics on Lebek medium. DNA for PCR analysis was prepared using a simple thermolysis procedure. Briefly, a few colonies were resuspended in 1 ml of water and incubated at 95°C for 30 minutes. The tube was then centrifuged and the supernatant was recovered.

Identification of tandem repeats

The tandem repeats database described in [1] and accessible at was used to identify tandem repeats with a predicted size which differs between the two strains H37Rv [24] and CDC1551 [19]. The database uses the Tandem Repeats Finder software [25] to identify tandem repeats in bacterial genomes. The predicted PCR product size in M. bovis AF2122/97 was deduced using the M. bovis blast server at .

Minisatellite PCR amplification and genotyping

PCR reactions were performed in 15 μl containing approximately 1 ng of DNA (2 μl of the thermolysate), 1× PCR buffer, 1 unit of Taq DNA polymerase, 200 μM of each dNTP and 0.3 μM of each flanking primer. The Taq DNA polymerase was obtained from Qbiogen and used as recommended by the manufacturer. PCR reactions were run on an MJ Research PTC200 thermocycler. An initial denaturation at 94°C for five minutes was followed by 40 cycles of denaturation at 94°C for one minute, annealing at 62°C for one minute (except for H37Rv_0079 and H37Rv_2387: annealing temperature 55°C) and elongation at 72°C for 90 seconds, followed by a final extension step of 10 minutes at 72°C. Five microliters of the PCR products were run on a standard 2% agarose gel (Qbiogen) in 0.5× TBE buffer at a voltage of 10 V/cm (10× TBE is 890 mM Tris base, 890 mM boric acid, 20 mM EDTA, pH 8.3). Samples were manipulated and dispensed (including gel loading) with multi-channel electronic pipettes (Biohit) in order to reduce the risk of errors. Gels 20 cm long were used. Gels were stained with ethidium bromide, visualized under UV light, and photographed. Allele sizes were estimated using a 100 bp ladder (MBI Fermentas or Biorad) as size marker. Each 50-well gel contained 8 regularly spaced size-marker lanes. In addition, strain H37Rv was included as a control for size assignments (one H37Rv control for each set of five DNA samples; see Figure 3). Gel images and resulting data were managed using the Bionumerics software package (version 3.0, Applied-Maths, Belgium).
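The gel layout described above is internally consistent. Eight ladder lanes delimit seven groups, and each group of one H37Rv control plus five samples occupies six wells, giving 8 + 7 × 6 = 50 wells. The sketch below generates such a loading plan. The function name, the sample labels and the position of the control at the start of each group are assumptions made for illustration, since the text does not state where the control sits within a group.

```python
def gel_layout(n_wells=50, n_ladders=8, samples_per_control=5):
    """Return a list of lane labels for one gel.

    "L" is a size-marker lane, "H37Rv" a control lane and "S<n>" a sample
    lane. Placing the control first in each group is an assumption.
    """
    group_size = samples_per_control + 1   # five samples plus one control
    n_groups = n_ladders - 1               # ladders flank every group
    if n_ladders + n_groups * group_size != n_wells:
        raise ValueError("ladders and groups do not fill the gel exactly")

    lanes = ["L"]
    sample_no = 0
    for _ in range(n_groups):
        lanes.append("H37Rv")
        for _ in range(samples_per_control):
            sample_no += 1
            lanes.append(f"S{sample_no}")
        lanes.append("L")                  # closing ladder of this group
    return lanes


if __name__ == "__main__":
    plan = gel_layout()
    print(len(plan), "lanes")
    print(" ".join(plan))
```

With the default values, one gel carries 35 sample lanes and 7 control lanes, so every sample band lies within a few lanes of both a ladder and an H37Rv allele of known size.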

Data analysis and on-line access

Band size estimates were exported from Bionumerics and converted to number of units. The resulting data was imported into Bionumerics as an opened character data set. Clustering analysis of genotyping data was performed using the Bionumerics package (categorical coefficient and Ward clustering). The use of the categorical coefficient implies that the character states are considered as unordered: the same weight is given to a large and to a small difference in the number of repeats at a locus. Among the many possibilities available for clustering analysis, the combination of the categorical coefficient and Ward clustering was empirically selected for its ability to cluster the strains in almost perfect agreement with the microbiological analysis (Figure 1). The website running the identifications was developed using the BNserver application (version 3.0, Applied-Maths, Belgium).
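The behaviour of a categorical coefficient can be shown with a minimal sketch. It is not the Bionumerics implementation: the function names, the handling of untyped loci and the allele values are assumptions made for illustration, and the Ward clustering step is not reproduced. Distance is taken here as the fraction of shared loci at which two genotypes carry different alleles.

```python
def categorical_distance(a, b):
    """Fraction of loci typed in both genotypes whose alleles differ.

    Genotypes are dicts mapping locus name to number of repeat units.
    Loci missing or set to None in either genotype are ignored (assumption).
    """
    shared = [
        locus for locus in a
        if a[locus] is not None and b.get(locus) is not None
    ]
    if not shared:
        raise ValueError("no locus typed in both genotypes")
    mismatches = sum(1 for locus in shared if a[locus] != b[locus])
    return mismatches / len(shared)


def rank_matches(query, database):
    """Sort database strains by increasing distance to the query genotype."""
    scored = [
        (categorical_distance(query, genotype), name)
        for name, genotype in database.items()
    ]
    return sorted(scored)


if __name__ == "__main__":
    # Made-up allele values, for illustration only.
    reference = {"ETR-A": 3, "ETR-E": 3, "Qub11a": 4}
    one_unit_apart = {"ETR-A": 4, "ETR-E": 3, "Qub11a": 4}
    six_units_apart = {"ETR-A": 9, "ETR-E": 3, "Qub11a": 4}
    print(categorical_distance(reference, one_unit_apart))
    print(categorical_distance(reference, six_units_apart))
```

Both comparisons give the same distance, because each differs from the reference at one locus out of three regardless of the size of the difference. Since untyped loci are simply skipped in this sketch, the same function can rank database entries against a query typed at only a subset of the markers.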

Authors' contributions

PLF compiled and evaluated previously described markers, evaluated new markers, and genotyped the strains. FD analyzed the H37Rv, CDC1551 and AF2122/97 sequence data to identify tandem repeats, and is the curator of the tandem repeat database in which known data on individual markers is available. FD and GV designed and set up the internet strain identification service. GV conceived the study and participated in its design and coordination. MF and JLK isolated and characterized the strains at the biochemical level, and also prepared PCR-quality DNA. All authors contributed to the writing of the paper and approved the final manuscript.
  24 in total

Review 1.  Proposed minimal standards for the genus Mycobacterium and for description of new slowly growing Mycobacterium species.

Authors:  V V Lévy-Frébault; F Portaels
Journal:  Int J Syst Bacteriol       Date:  1992-04

2.  Insertion element IS987 from Mycobacterium bovis BCG is located in a hot-spot integration region for insertion elements in Mycobacterium tuberculosis complex strains.

Authors:  P W Hermans; D van Soolingen; E M Bik; P E de Haas; J W Dale; J D van Embden
Journal:  Infect Immun       Date:  1991-08       Impact factor: 3.441

3.  Comparison of DNA fingerprint patterns of isolates of Mycobacterium africanum from east and west Africa.

Authors:  W H Haas; G Bretzel; B Amthor; K Schilke; G Krommes; S Rüsch-Gerdes; V Sticht-Groh; H J Bremer
Journal:  J Clin Microbiol       Date:  1997-03       Impact factor: 5.948

4.  Tandem repeats finder: a program to analyze DNA sequences.

Authors:  G Benson
Journal:  Nucleic Acids Res       Date:  1999-01-15       Impact factor: 16.971

5.  Comparison of methods based on different molecular epidemiological markers for typing of Mycobacterium tuberculosis complex strains: interlaboratory study of discriminatory power and reproducibility.

Authors:  K Kremer; D van Soolingen; R Frothingham; W H Haas; P W Hermans; C Martín; P Palittapongarnpim; B B Plikaytis; L W Riley; M A Yakrus; J M Musser; J D van Embden
Journal:  J Clin Microbiol       Date:  1999-08       Impact factor: 5.948

6.  Phenotypic and genotypic characterization of Mycobacterium africanum isolates from West Africa.

Authors:  R Frothingham; P L Strickland; G Bretzel; S Ramaswamy; J M Musser; D L Williams
Journal:  J Clin Microbiol       Date:  1999-06       Impact factor: 5.948

7.  Predominance of a single genotype of Mycobacterium tuberculosis in countries of east Asia.

Authors:  D van Soolingen; L Qian; P E de Haas; J T Douglas; H Traore; F Portaels; H Z Qing; D Enkhsaikan; P Nymadawa; J D van Embden
Journal:  J Clin Microbiol       Date:  1995-12       Impact factor: 5.948

8.  Genetic diversity in the Mycobacterium tuberculosis complex based on variable numbers of tandem DNA repeats.

Authors:  R Frothingham; W A Meeker-O'Connell
Journal:  Microbiology       Date:  1998-05       Impact factor: 2.777

9.  Subdivision of Mycobacterium tuberculosis into five variants for epidemiological purposes: methods and nomenclature.

Authors:  C H Collins; M D Yates; J M Grange
Journal:  J Hyg (Lond)       Date:  1982-10

10.  Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence.

Authors:  S T Cole; R Brosch; J Parkhill; T Garnier; C Churcher; D Harris; S V Gordon; K Eiglmeier; S Gas; C E Barry; F Tekaia; K Badcock; D Basham; D Brown; T Chillingworth; R Connor; R Davies; K Devlin; T Feltwell; S Gentles; N Hamlin; S Holroyd; T Hornsby; K Jagels; A Krogh; J McLean; S Moule; L Murphy; K Oliver; J Osborne; M A Quail; M A Rajandream; J Rogers; S Rutter; K Seeger; J Skelton; R Squares; S Squares; J E Sulston; K Taylor; S Whitehead; B G Barrell
Journal:  Nature       Date:  1998-06-11       Impact factor: 49.962
