Alejandro Reyes, Andrea Sandoval, Andrés Cubillos-Ruiz, Katherine E Varley, Ivan Hernández-Neuta, Sofía Samper, Carlos Martín, María Jesús García, Viviana Ritacco, Lucelly López, Jaime Robledo, María Mercedes Zambrano, Robi D Mitra, Patricia Del Portillo.
Abstract
BACKGROUND: The insertion element IS6110 is one of the main sources of genomic variability in Mycobacterium tuberculosis, the etiological agent of human tuberculosis. Although IS6110 has been used extensively as an epidemiological marker, the identification of the precise chromosomal insertion sites has been limited by technical challenges. Here, we present IS-seq, a novel method that combines high-throughput sequencing using Illumina technology with efficient combinatorial sample multiplexing to simultaneously probe 519 clinical isolates, identifying almost all the flanking regions of the element in a single experiment.
Year: 2012 PMID: 22703188 PMCID: PMC3443423 DOI: 10.1186/1471-2164-13-249
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Overview of IS-seq Method. Genomic DNA (gDNA) is randomly sheared and ligated to adapters containing 24 different barcodes (BC, purple circles). Amplification from the IS6110 ends is done in a first PCR (PCR1) with primers that anneal to IS6110 (blue) and the adapter (green), incorporating additional BC (yellow circles) and the PE-sequencing primer (red). A second PCR (PCR2) is done to incorporate amplification primers (orange and light blue). Sequencing is carried out using regular PE-primers in an Illumina-GA II.
Accuracy of the IS-seq method with reference strains
| Strain | RFLP bands | PCR confirmed | #IS sites | False Positives | False Negatives | True Positives | #IS sites | False Positives | False Negatives | True Positives |
|---|---|---|---|---|---|---|---|---|---|---|
| Col-H37Rv | 13 | 15 | 15 | 0 | 0 | 15 | 17 | 2 | 0 | 15 |
| Col-UT98 | 11 | 13 | 10 | 0 | 3 | 10 | 13 | 0 | 0 | 13 |
| Col-UT98C | 11 | 13 | 10 | 0 | 3 | 10 | 13 | 0 | 0 | 13 |
| Col-UT261 | 10 | 14 | 13 | 0 | 1 | 13 | 14 | 0 | 0 | 14 |
| Arg-410 | 9 | 12 | 9 | 0 | 3 | 9 | 12 | 0 | 0 | 12 |
| Arg-6548 | 8 | 11 | 8 | 0 | 3 | 8 | 11 | 0 | 0 | 11 |
| Spa-C1 | 10 | 15 | 11 | 0 | 4 | 11 | 15 | 0 | 0 | 15 |
| Total | 72 | 93 | | 0 | 17 | 76 | | 2 | 0 | 93 |
| Sensitivity | 77% | | | | | 82% | | | | 100% |
| PPV* | | | | | | 100% | | | | 98% |
*PPV: positive predictive value.
For each reference strain, the table reports the number of observed RFLP bands, the number of sites confirmed by PCR, and the number of IS sites identified by IS-seq from one or both termini of the insertion. Sensitivity and positive predictive value are calculated as indicated in Methods.
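The Methods are not reproduced here, so the following is only a minimal sketch that assumes the standard definitions, with PCR-confirmed sites taken as the truth set: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The function names are illustrative. Applied to the "Total" row of the table, these definitions give values that round to the percentages reported in the Sensitivity and PPV rows.

```python
def sensitivity(tp, fn):
    """Fraction of PCR-confirmed sites that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Fraction of reported sites that were confirmed: TP / (TP + FP)."""
    return tp / (tp + fp)


# Totals copied from the "Total" row of the table above
# (first and second IS-seq column groups, in table order).
totals = {
    "first IS-seq group": {"tp": 76, "fp": 0, "fn": 17},
    "second IS-seq group": {"tp": 93, "fp": 2, "fn": 0},
}

for name, t in totals.items():
    sens = sensitivity(t["tp"], t["fn"])
    pred = ppv(t["tp"], t["fp"])
    print(f"{name}: sensitivity={sens:.0%}, PPV={pred:.0%}")
```

The RFLP sensitivity in the table (77%) is consistent with the same idea applied to band counts, 72 observed bands out of 93 PCR-confirmed sites.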
Figure 2. Genomic Distribution of IS6110 Elements. The M. tuberculosis H37Rv circular genome is shown with the outermost circles representing genes located on the positive (red) and negative (green) strands. The inner circle shows insertions in non-coding (purple) and coding (yellow) regions, with peak sizes scaled to indicate the number of unique insertions per 200 bp window. Black lines indicate 2 kb regions with significantly more insertions than expected under a uniform distribution; brown boxes are 50 kb regions with significantly more insertions, and blue boxes are 50 kb regions with significantly fewer insertions.
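The windowed counts described in the caption can be illustrated with a short sketch. It bins unique insertion coordinates into fixed-size windows and, as one possible way to flag windows with more insertions than a uniform distribution would predict, computes a Poisson upper-tail probability. The statistical test actually used by the authors is not stated in this section, so the Poisson model, the function names, the genome length constant, and the example coordinates are all assumptions.

```python
from collections import Counter
from math import exp

GENOME_LEN = 4_411_532  # assumed H37Rv chromosome length in bp
WINDOW = 200            # window size used for the peaks in the figure


def window_counts(positions, window=WINDOW):
    """Count unique insertion positions per window, keyed by window index."""
    return Counter(pos // window for pos in set(positions))


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i   # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical insertion coordinates (not data from the study).
positions = [1_000, 1_050, 1_050, 1_120, 1_199, 250_000, 3_000_010]
counts = window_counts(positions)

# Expected insertions per window if sites were spread uniformly.
lam = len(set(positions)) * WINDOW / GENOME_LEN

for idx, k in sorted(counts.items()):
    print(f"window starting at {idx * WINDOW}: {k} insertions, "
          f"P(X >= {k}) = {poisson_upper_tail(k, lam):.3g}")
```

The same functions work for the 2 kb and 50 kb regions by changing `window`; testing many windows would also call for a multiple-testing correction, which this sketch leaves out.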
Functional categories of interrupted genes
| Code | Functional category | Cumulative length (bp)³ | Insertions⁴ | P over-representation⁵ | P under-representation⁵ |
|---|---|---|---|---|---|
| | PE/PPE¹ | 280,083 | 96 | **3.97E-16** | 1.00E+00 |
| - | Not in COG | 670,659 | 158 | **1.12E-11** | 1.00E+00 |
| M | Cell wall/membrane/envelope biogenesis | 144,558 | 36 | **4.76E-04** | 1.00E+00 |
| T | Signal transduction mechanisms | 119,979 | 24 | 3.72E-02 | 9.63E-01 |
| A | RNA processing and modification | 648 | 0 | 8.75E-02 | 9.13E-01 |
| N | Cell motility | 657 | 0 | 8.87E-02 | 9.11E-01 |
| L | Replication, recombination and repair² | 213,870 | 37 | 8.98E-02 | 9.10E-01 |
| K | Transcription | 175,641 | 24 | 5.13E-01 | 4.87E-01 |
| V | Defense mechanisms | 46,863 | 5 | 6.50E-01 | 3.50E-01 |
| R | General function prediction only | 449,865 | 57 | 7.88E-01 | 2.12E-01 |
| S | Function unknown | 199,185 | 23 | 8.14E-01 | 1.86E-01 |
| H | Coenzyme transport and metabolism | 171,450 | 19 | 8.37E-01 | 1.63E-01 |
| D | Cell cycle control, cell division, chromosome partitioning | 52,116 | 3 | 9.37E-01 | 6.35E-02 |
| F | Nucleotide transport and metabolism | 70,152 | 4 | 9.70E-01 | 3.00E-02 |
| U | Intracellular trafficking, secretion, and vesicular transport | 24,903 | 0 | 9.71E-01 | 2.93E-02 |
| O | Posttranslational modification, protein turnover, chaperones | 114,210 | 7 | 9.92E-01 | 8.51E-03 |
| Q | Secondary metabolites biosynthesis, transport and catabolism | 341,379 | 28 | 9.99E-01 | **7.55E-04** |
| I | Lipid transport and metabolism | 289,923 | 19 | 1.00E+00 | **6.52E-05** |
| J | Translation, ribosomal structure and biogenesis | 138,906 | 4 | 1.00E+00 | **1.83E-05** |
| G | Carbohydrate transport and metabolism | 168,417 | 6 | 1.00E+00 | **1.15E-05** |
| P | Inorganic ion transport and metabolism | 159,159 | 4 | 1.00E+00 | **1.61E-06** |
| C | Energy production and conversion | 258,336 | 10 | 1.00E+00 | **1.15E-07** |
| E | Amino acid transport and metabolism | 247,980 | 4 | 1.00E+00 | **1.76E-11** |
| | | 147,877 | 7 | 1.00E+00 | **3.55E-04** |
1. Proteins belonging to the PE/PPE families were extracted to a separate category, as they constitute an important family of proteins in M. tuberculosis.
2. The genes for the transposase IS6110 were removed from the category of Replication, recombination and repair.
3. Cumulative length (bp) of all the genes in a given category. The probability of under- or over-representation of a given functional category depends on both the number of genes and their length.
4. Represents the number of independent insertion events identified in genes of a given category.
5. Probability of over- or under-representation of insertion sequences interrupting genes of a given category. Categories with significant over- or under-representation after Bonferroni correction are shown in bold (Bonferroni-corrected threshold = 2.2E-3).
6. This category is not part of COG; it is defined in TubercuList (see Methods).
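Notes 3 and 5 describe a length-dependent enrichment test. The exact procedure is given in the Methods, which are not reproduced here, so the sketch below substitutes a simple binomial model as an assumption: each insertion lands in a category with probability equal to that category's share of the total cumulative length. The totals are the column sums of the table and the function name is illustrative, so the resulting probabilities are not expected to match the published values exactly.

```python
from math import comb

ALPHA = 2.2e-3            # Bonferroni-corrected threshold from note 5
TOTAL_LENGTH = 4_486_816  # sum of the cumulative length column (bp), assumed denominator
TOTAL_INSERTIONS = 575    # sum of the insertions column, assumed number of trials


def binomial_tails(k, n, p):
    """Return (P(X >= k), P(X <= k)) for X ~ Binomial(n, p)."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return min(1.0, sum(pmf[k:])), min(1.0, sum(pmf[: k + 1]))


# (cumulative length in bp, insertions) for three rows of the table.
rows = {
    "Not in COG": (670_659, 158),
    "Transcription": (175_641, 24),
    "Amino acid transport and metabolism": (247_980, 4),
}

results = {}
for name, (length, hits) in rows.items():
    p_over, p_under = binomial_tails(hits, TOTAL_INSERTIONS, length / TOTAL_LENGTH)
    results[name] = (p_over, p_under)
    if p_over < ALPHA:
        call = "over-represented"
    elif p_under < ALPHA:
        call = "under-represented"
    else:
        call = "not significant"
    print(f"{name}: P(over)={p_over:.2e}, P(under)={p_under:.2e} -> {call}")
```

Under this simplified model the three example rows fall on the same side of the threshold as in the table: "Not in COG" is over-represented, "Amino acid transport and metabolism" is under-represented, and "Transcription" is neither.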
Insertions in genes important for growth or virulence
| Gene | Name | Product | Insertion position (%)¹ | PCR² | Ref. |
|---|---|---|---|---|---|
| Rv0336 | - | Conserved 13E12 repeat family protein | 71.2 | + | [ |
| Rv0405 | pks6 | Probable membrane bound polyketide synthase | 26.7 | + | [ |
| Rv1371 | - | Probable conserved membrane protein | 17.8 | + | [ |
| Rv1469 | ctpD | Probable cation transporter P-type ATPase D | 2.5 | + | [ |
| Rv1477 | ripA | Peptidoglycan hydrolase | 72.9 | + | [ |
| Rv1753c | PPE24 | PPE Family protein | 2.4 | + | [ |
| Rv1978 | - | Conserved hypothetical protein | 41.8 | + | [ |
| Rv2388c | hemN | Coproporphyrinogen III oxidase | 28.7 | - | [ |
| Rv2708c | - | Conserved hypothetical protein | 61.8 | + | [ |
| Rv2808 | - | Hypothetical protein | 23.6 | + | [ |
| Rv2812 | - | Probable transposase | 8.8 | + | [ |
| Rv2817c | - | Conserved hypothetical protein | 71.8 | + | [ |
| Rv3018c | PPE46 | PPE family protein | 4.7 | + | [ |
| Rv3112 | moaD1 | Probable molybdenum cofactor biosynthesis protein D | 29.8 | + | [ |
| Rv3113 | - | Possible phosphatase | 34.1 | + | [ |
| Rv3114 | - | Conserved hypothetical protein | 40.7 | - | [ |
| Rv3201c | - | Probable ATP-dependent DNA helicase | 0.2 | + | [ |
| Rv3229c | desA3 | Possible linoleoyl-CoA desaturase | 43.4 | + | [ |
| Rv3343c | PPE54 | PPE family protein | 9.3 | + | [ |
| Rv3376 | - | Conserved hypothetical protein | 60.9 | + | [ |
| Rv0001 | dnaA | Chromosomal replication initiation protein | 95.1 | + | [ |
| Rv0755c | PPE12 | PPE Family protein | 96.4 | + | [ |
| Rv2282c | - | Probable transcription regulator (LysR family) | 84.9 | + | [ |
| Rv2833c | ugpB | Probable Sn-glycerol-3-phosphate-binding lipoprotein | 88.9 | + | [ |
| Rv2856 | nicT | Possible nickel-transport integral membrane protein | 89.8 | + | [ |
| Rv3111 | moaC1 | Molybdenum cofactor biosynthesis protein C | 92.8 | - | [ |
| Rv3177 | - | Possible peroxidase | 90.4 | + | [ |
| Rv0591 | mce2C | MCE-family protein | 56.1 | | |
| Rv1477 | ripA | Peptidoglycan hydrolase | 72.9 | | |
| Rv1720c | vapC12 | Possible toxin | 7.9 | | |
| Rv2494 | - | Conserved hypothetical protein | 97.2 | | |
| Rv3176c | mesT | Probable epoxide hydrolase | 11.8 | | |
| Rv3177 | - | Possible peroxidase | 90.4 | | |
| Rv3473c (MT3579) | bpoA | Possible peroxidase | 93.8 | | |
1) Indicates the position of the insertion within the ORF as the percentage of gene upstream of the insertion.
2) Insertions verified using PCR: +, PCR positive; -, PCR negative.
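Note 1 defines the insertion position as the percentage of the gene lying upstream of the insertion. A minimal sketch of that calculation follows. The function name, the coordinate convention (forward-strand coordinates with an exclusive end), and the example coordinates are assumptions made for illustration. For genes on the reverse strand (locus names ending in "c"), upstream is measured from the higher coordinate.

```python
def percent_upstream(start, end, strand, insertion):
    """Percentage of the ORF that lies upstream of the insertion point.

    start, end: gene boundaries in forward-strand coordinates (end exclusive).
    strand: "+" for forward-strand genes, "-" for reverse-strand genes.
    insertion: coordinate of the insertion, which must fall within the gene.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not start <= insertion <= end:
        raise ValueError("insertion lies outside the gene")
    length = end - start
    # On the reverse strand the gene is read from its higher coordinate.
    upstream = insertion - start if strand == "+" else end - insertion
    return round(100.0 * upstream / length, 1)


# Hypothetical 1,000 bp gene with an insertion 250 bp from its lower coordinate.
print(percent_upstream(1_000, 2_000, "+", 1_250))
print(percent_upstream(1_000, 2_000, "-", 1_250))
```

Read this way, a value near 0 means the insertion sits close to the start codon, while values above 90, as in several rows of the table, mean only the last part of the ORF is disrupted.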
Figure 3. Classification of Strains Based on IS6110 Insertions. A UPGMA tree was generated from the distance matrix obtained using the Variation of Information metric for 504 strains. Colors represent the main lineage 4 families present in the study, using Beijing strains (lineage 2) as an outgroup. Reference sequenced bacterial genomes are indicated by numbers on the side: 1- TB str Haarlem, 2- TB H37Rv, 3- TB H37Ra, 4- TB CDC-1551, 5- TB-C, 6- TB KZN_4207, 7- TB_KZN_R596, 8- TB_KZN_V2475, 9- TB_KZN_1435, 10- TB_KZN_605, 11- TB_F11, 12- TB GM_1503, 13- TB 98-R604, 14- TB T85, 15- TB 210, 16- TB 02_1987 (see Methods).
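The caption names the Variation of Information (VI) metric but does not say how it was applied to insertion profiles. As a sketch under that caveat, the code below assumes each strain is encoded as a presence/absence vector over the same ordered list of insertion sites and computes VI = H(X|Y) + H(Y|X) from the joint distribution of the paired values. The encoding, the function name, and the example profiles are assumptions, not the authors' stated procedure.

```python
from collections import Counter
from math import log2


def variation_of_information(x, y):
    """VI(X, Y) = H(X|Y) + H(Y|X), in bits, for two equal-length label vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    vi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # -p(a,b) * [log p(a,b)/p(a) + log p(a,b)/p(b)]
        vi -= p_ab * (log2(p_ab / (px[a] / n)) + log2(p_ab / (py[b] / n)))
    return vi


# Hypothetical presence (1) / absence (0) profiles over four insertion sites.
strain_a = [1, 1, 0, 0]
strain_b = [1, 0, 1, 0]

print(variation_of_information(strain_a, strain_a))  # identical profiles
print(variation_of_information(strain_a, strain_b))  # statistically independent profiles
```

In this per-site formulation a profile and its exact complement have a distance of zero, so the authors' formulation may well differ. Given a full pairwise distance matrix, UPGMA corresponds to average-linkage hierarchical clustering, which is available for example as `scipy.cluster.hierarchy.linkage` with `method="average"`.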