Andrés Cubillos-Ruiz, Juan Morales, María Mercedes Zambrano.
Abstract
BACKGROUND: The recent determination of the complete nucleotide sequences of several Mycobacterium tuberculosis (MTB) genomes makes it possible to use comparative genomics as a tool for dissecting the nature and consequences of genetic variability within this species. The multiple alignment of the genomes of clinical strains (CDC1551, F11, Haarlem and C) with those of laboratory strains (H37Rv and H37Ra) provides new insights into the mechanisms of adaptation of this bacterium to the human host.
Year: 2008 PMID: 18992142 PMCID: PMC2590607 DOI: 10.1186/1756-0500-1-110
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Functional category classification of the LSPs (large sequence polymorphisms) found in the multiple alignments. Dark green bars represent the percentage of LSPs in each category relative to the total number of LSPs. These percentages are calculated over a total of 166 matches because some genes are classified in more than one functional category. Light green bars represent the number of LSPs found in each category as a percentage of the total number of genes in that category. The intergenic region category (red bar) represents the LSPs that involve a non-coding region.
Figure 2. Distribution and frequency of LSPs in strains H37Rv, H37Ra, CDC1551, F11, Haarlem and C. The figure shows the LSP sites observed in each of the six strains (MTB H37Rv, H37Ra, CDC1551, F11, Haarlem and C) relative to the other strains. The frequency of each LSP is indicated by the color scale, and the background color represents invariant positions. The base of each square corresponds to 25,000 bp, and LSP sizes are scaled proportionally.
Figure 3. Extrapolation of strain-specific deletions. The number of specific deletions is plotted as a function of the number n of strains. For each n, markers are the 6!/[(n - 1)!·(6 - n)!] values obtained for the different strain combinations. The continuous curve is the least-squares fit of the function F(n) = κ·exp(-n/τ) + Ω, where κ = 186.7 ± 44.3, -1/τ = -0.83 ± 0.13 and Ω = 7.6 ± 1.7. The best fit had a correlation of 0.997.
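The marker count per n in the caption, 6!/[(n - 1)!·(6 - n)!], simplifies to n·C(6, n). One reading of it, which is an interpretation rather than something the caption states, is that each of the C(6, n) sets of n strains is counted once for each of its n members taken as the strain whose specific deletions are scored against the other n - 1. The minimal Python sketch below checks that identity for the six genomes; `markers_per_n` is a hypothetical helper name.

```python
from math import comb, factorial

# The six genomes in the multiple alignment.
STRAINS = ("H37Rv", "H37Ra", "CDC1551", "F11", "Haarlem", "C")
N_STRAINS = len(STRAINS)


def markers_per_n(n: int, total: int = N_STRAINS) -> int:
    """Number of markers plotted at n: total! / [(n - 1)! * (total - n)!]."""
    if not 1 <= n <= total:
        raise ValueError("n must be between 1 and the number of strains")
    return factorial(total) // (factorial(n - 1) * factorial(total - n))


for n in range(1, N_STRAINS + 1):
    count = markers_per_n(n)
    # Same value as choosing n strains, then choosing which one is scored.
    assert count == n * comb(N_STRAINS, n)
    print(f"n = {n}: {count} combinations")
```

Across n = 1 to 6 this gives 6, 30, 60, 60, 30 and 6 markers, 192 in total.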
Figure 4. Extrapolation of strain-specific insertions. The number of specific insertions is plotted as a function of the number n of strains. For each n, markers are the 6!/[(n - 1)!·(6 - n)!] values obtained for the different strain combinations. The continuous curve is the least-squares fit of the function F(n) = κ·exp(-n/τ) + Ω, where κ = 578.35 ± 198.07, -1/τ = -1.36 ± 0.18 and Ω = 5.1 ± 1.1. The best fit had a correlation of 0.998.
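To show how the reported parameters shape the two curves, the sketch below evaluates F(n) = κ·exp(-n/τ) + Ω in Python with the central estimates from the Figure 3 (deletions) and Figure 4 (insertions) captions. It does not refit the model, since the per-combination counts are not listed here; the names `fitted_curve`, `DELETIONS` and `INSERTIONS` are hypothetical.

```python
import math

# Central estimates from the Figure 3 and Figure 4 captions (uncertainties
# omitted). The captions report -1/tau, so the decay rate 1/tau is stored here.
DELETIONS = {"kappa": 186.7, "inv_tau": 0.83, "omega": 7.6}
INSERTIONS = {"kappa": 578.35, "inv_tau": 1.36, "omega": 5.1}


def fitted_curve(n: float, kappa: float, inv_tau: float, omega: float) -> float:
    """Evaluate F(n) = kappa * exp(-n / tau) + omega, with inv_tau = 1 / tau."""
    return kappa * math.exp(-inv_tau * n) + omega


for label, params in (("deletions", DELETIONS), ("insertions", INSERTIONS)):
    curve = [round(fitted_curve(n, **params), 1) for n in range(1, 7)]
    # As n grows, the exponential term vanishes and F(n) levels off at omega.
    print(f"{label}: F(1..6) = {curve}, asymptote = {params['omega']}")
```

With these central estimates, F(6) comes to roughly 8.9 for deletions and 5.3 for insertions, already close to the respective asymptotes Ω = 7.6 and Ω = 5.1.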

Genes involved in strain-specific deletions.

| Position | Size (bp) | Genes involved |
|---|---|---|
| 1987985 | 6830 | MT1800 (Glycosyl transferase) – MT1801 (Molybdopterin oxidoreductase) – |
| 206981 | 58 | Rv0175 (Probable conserved MCE-associated membrane protein) |
| 888687 | 875 | Rv0792c (GntR-family transcriptional regulatory protein) – Rv0793 (Hypothetical protein) – Rv0794c (Putative oxidoreductase) |
| 2161299 | 139 | |
| 2349725 | 58 | Intergenic region between |
| 2373345 | 58 | C-terminal end of the Rv2112c (Hypothetical protein) |
| 2382265 | 2275 | |
| 2701639 | 501 | Rv2406c (Conserved hypothetical protein) – Intergenic region – Rv2407 (Ribonuclease Z) |
| 3727960 | 5284 | |
| 3732032 | 192 | PPE54 |
| 3948252 | 640 | Rv3519 (Conserved hypothetical protein) |
| 374076 | 61 | PPE5 |
| 1509585 | 1101 | Rv1334 (Conserved hypothetical protein) – Rv1335 (CFP10A) – |
| 1890753 | 172 | Possible promoter region of Rv1668c (Probable macrolide ABC transporter ATP-binding protein) |
| 2381228 | 2645 | |
| 3183106 | 109 | Rv2859c (Hypothetical amidotransferase) |
| 3702663 | 104 | Intergenic region between |
| 912152 | 92 | Rv0823c (Possible transcriptional regulatory protein) |
| 965426 | 481 | |
| 1451173 | 115 | Intergenic region between |
| 1521604 | 4753 | Rv1353c (Probable transcriptional regulatory protein) – Rv1354c (Possible diguanylate cyclase protein) – |
| 2060412 | 643 | |
| 2365930 | 1774 | |
| 2546849 | 6480 | Rv2271 (Hypothetical protein) – Rv2272 (Probable conserved transmembrane protein) – Rv2273 (Probable conserved transmembrane protein) – Rv2274c (Hypothetical protein) – Rv2275 (Hypothetical protein) – |
| 2792261 | 170 | |
| 3117314 | 439 | Repeat region (DR), intergenic between Rv2813 and Rv2814 |
| 3192481 | 87 | IS |
| 4033587 | 616 | Rv3600c (Conserved hypothetical protein) – |
| 4365135 | 84 | |
| 334840 | 6611 | |
| 474110 | 125 | Rv0401 (Probable conserved transmembrane protein) |
| 565404 | 113 | Intergenic region between Rv0480c (Probable amidohydrolase) and Rv0481c (Hypothetical protein) |
| 956083 | 54 | Intergenic region between |
| 1984522 | 109 | C-terminal end |
| 2354142 | 214 | |
| 2456125 | 115 | Intergenic region between |
| 2825166 | 53 | Rv2517c (Hypothetical protein) |
| 2914707 | 115 | |
| 3113322 | 736 | Repeat region (DR), intergenic between Rv2816c and Rv2817c |
| 3182209 | 54 | Intergenic region between |
| 3369008 | 113 | |
| 3369190 | 81 | |
| 3369344 | 62 | |
| 3369422 | 171 | |
| 3426378 | 83 | Rv3074 (Hypothetical protein) |
| 3911404 | 11522 | |
| 4009833 | 3486 | |
| 4143780 | 560 | Rv3728 (Probable conserved two-domain membrane protein – drug transporter) |

Gene names follow TubercuList; genes absent from the MTB H37Rv genome are named according to the MTB CDC1551 genome annotation [12]. Each position indicates the site where the deletion occurred.

Genes involved in strain-specific insertions.

| Position (start–end) | Size (bp) | Genes involved |
|---|---|---|
| 3551227–3552586 | 1359 | Intergenic region between Rv3185 (Probable transposase) and Rv3188 (Hypothetical protein) |
| 13627–14986 | 1359 | Insertion of the IS |
| 1989343–1990797 | 1437 | Insertion of the IS |
| 804474–804691 | 217 | Intergenic region between |
| 2341782–2341840 | 58 | MT2144 (Hypothetical protein) |
| 934662–936021 | 1359 | IS |
| 1945992–1947351 | 1359 | IS |
| 1992354–1993713 | 1359 | IS |
| 1997360–1998719 | 1359 | IS |
| 2004182–2005541 | 1359 | IS |
| 2017862–2019221 | 1359 | IS |
| 2270442–2271801 | 1359 | IS |
| 2349289–2350648 | 1359 | Intergenic region between Rv2077c (Possible conserved transmembrane protein) and Rv2078 (Hypothetical protein) |
| 2697325–2698684 | 1359 | IS |
| 3492136–3493495 | 1359 | IS |
| 3564263–3565622 | 1359 | IS |
| 3831596–3831772 | 176 | Intergenic region between Rv3401 (Hypothetical protein) and Rv3402c (Hypothetical protein) |
| 3853062–3860408 | 2394 | TBFG_13461 (Hypothetical protein) – TBFG_13462 (Hypothetical protein) – TBFG_13463 (Hypothetical protein) – TBFG_134614 (Probable transposase) – Intergenic region |
| 1071617–1072976 | 1359 | IS |
| 1714841–1716200 | 1359 | IS |
| 1834511–1834569 | 58 | Rv1637c (Hypothetical protein) |
| 1973210–1973436 | 226 | PPE24 |
| 2606037–2607396 | 1359 | IS |
| 4144939–4144997 | 58 | |
| 4088473–4088705 | 232 | Rv3680 (Probable anion transporter ATPase) |

Gene names follow TubercuList for genes homologous to MTB H37Rv; otherwise, genes are named according to the annotation of the respective genome.