Lavanya Rishishwar, Bhasker Pant, Kumud Pant, Kamal R Pardasani.
Abstract
Mycobacterium tuberculosis (MTB), the causative agent of tuberculosis, is one of the most dreaded pathogens of the century. It has long been studied by researchers throughout the world using various wet-lab and dry-lab techniques. In this study, we focus on mining useful patterns at the genomic level that can be applied to in silico functional characterization of genes from the MTB complex. The model developed on the basis of the patterns found in this study correctly identifies 99.77% of the input genes from the genome of MTB strain H37Rv. The model was tested against four other MTB strains and the homologue M. bovis to further evaluate its generalization capability. The mean prediction accuracy was 85.76%. It was also observed that the GC content remained fairly constant throughout the genome, implying the absence of any pathogenicity island transferred from other organisms. This study reveals that dinucleotide composition is an efficient functional class discriminator for the MTB complex. To facilitate the application of this model, a web server, Tuber-Gene, has been developed, which can be freely accessed at http://www.bifmanit.org/tb2/.
Year: 2011 PMID: 22196360 PMCID: PMC5054438 DOI: 10.1016/S1672-0229(11)60020-X
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Top two levels of gene functional class hierarchy
| Hierarchy order | Class name | No. of genes | Percentage distribution (%) |
|---|---|---|---|
| I.A. | Degradation | 163 | 4.17 |
| I.B. | Energy metabolism | 292 | 7.48 |
| I.C. | Central intermediary metabolism | 45 | 1.15 |
| I.D. | Amino acid biosynthesis | 95 | 2.43 |
| I.E. | Polyamine synthesis | 1 | 0.03 |
| I.F. | Purines, pyrimidines, nucleosides and nucleotides | 60 | 1.54 |
| I.G. | Biosynthesis of cofactors, prosthetic groups and carriers | 117 | 3.00 |
| I.H. | Lipid biosynthesis | 65 | 1.66 |
| I.I. | Polyketide and non-ribosomal peptide synthesis | 41 | 1.05 |
| I.J. | Broad regulatory functions | 187 | 4.79 |
| II.A. | Synthesis and modification of macromolecules | 215 | 5.50 |
| II.B. | Degradation of macromolecules | 87 | 2.23 |
| II.C. | Cell envelope | 351 | 8.99 |
| III.A. | Transport/binding proteins | 123 | 3.15 |
| III.B. | Chaperones/heat shock | 16 | 0.41 |
| III.C. | Cell division | 19 | 0.49 |
| III.D. | Protein and peptide secretion | 14 | 0.36 |
| III.E. | Adaptations and atypical conditions | 12 | 0.31 |
| III.F. | Detoxification | 22 | 0.56 |
| IV.A. | Virulence | 38 | 0.97 |
| IV.B. | IS elements, repeated sequences and phage | 132 | 3.38 |
| IV.C. | PE and PPE families | 164 | 4.20 |
| IV.D. | Antibiotic production and resistance | 14 | 0.36 |
| IV.E. | Bacteriocin-like proteins | 3 | 0.08 |
| IV.F. | Cytochrome P450 enzymes | 22 | 0.56 |
| IV.G. | Coenzyme F420-dependent enzymes | 3 | 0.08 |
| IV.H. | Miscellaneous transferases | 61 | 1.56 |
| IV.I. | Miscellaneous phosphatases, lyases, and hydrolases | 18 | 0.46 |
| IV.J. | Cyclases | 6 | 0.15 |
| IV.K. | Chelatases | 2 | 0.05 |
The parameters used to form the vectors in the three systems in the current study
| System | Composition employed | Vector length | Feature attributes |
|---|---|---|---|
| A | Mononucleotide | 4 | Composition of A, T, G, C |
| B | A+T and G+C | 2 | Composition of A+T and G+C |
| C | Dinucleotide | 16 | Composition of AA, AT, AG, AC, TA, TT, TG, TC, GA, GT, GG, GC, CA, CT, CG, CC |
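The three feature vectors in the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and it assumes that compositions are expressed as fractions, that dinucleotides are counted in overlapping windows, and that characters other than A, T, G and C are ignored. The source does not state any of these choices.

```python
from itertools import product

NUCLEOTIDES = "ATGC"
# Same order as the table: AA, AT, AG, AC, TA, TT, ..., CG, CC
DINUCLEOTIDES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]


def mononucleotide_vector(seq):
    """System A (assumed form): fractions of A, T, G, C."""
    seq = seq.upper()
    counts = [seq.count(n) for n in NUCLEOTIDES]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def at_gc_vector(seq):
    """System B (assumed form): fractions of A+T and G+C."""
    a, t, g, c = mononucleotide_vector(seq)
    return [a + t, g + c]


def dinucleotide_vector(seq):
    """System C (assumed form): fractions of the 16 overlapping dinucleotides."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]
```

Each function returns a vector whose length matches the "Vector length" column (4, 2 and 16), so the output could be passed to any standard classifier.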
Figure 1. Flow chart describing the whole process. The steps involved in designing Systems A, B and C are depicted.
Prediction accuracies of gene functions
| Strain | System | No. of genes | Accuracy, Level 1 (%) | Accuracy, Level 2 (%) | Accuracy, Level 3 (%) | Not classified (%) | Misclassified (%) |
|---|---|---|---|---|---|---|---|
| H37Rv | System A | 3,906 | 97.49 | 97.19 | 75.49 | 1.84 | 0.67 |
| H37Rv | System B | 3,906 | 0.00 | 0.00 | 0.00 | 100.00 | 0.00 |
| H37Rv | System C | 3,906 | 99.77 | 99.96 | 78.77 | 0.23 | 0.00 |
| H37Ra | System C | 3,960 | 95.48 | 95.83 | 75.59 | 4.52 | 0.00 |
| F11 | System C | 3,898 | 87.04 | 87.49 | 68.27 | 12.96 | 0.00 |
| | System C | 3,910 | 73.81 | 67.52 | 50.86 | 26.19 | 0.00 |
| C1 | System C | 3,841 | 73.57 | 73.66 | 56.59 | 26.43 | 0.00 |
| CDC1551 | System C | 3,893 | 73.21 | 75.93 | 60.19 | 26.79 | 0.00 |
Note: Gene functions were predicted using various systems and the resulting accuracies at each level are shown. The model built by system C was further tested against other members of MTB complex and the homologue M. bovis to evaluate the capability of System C to generalize its prediction power.
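In every row of the table, the Level 1 accuracy, the not-classified percentage and the misclassified percentage add up to 100 (for example, 97.49 + 1.84 + 0.67 for System A). One way to read this is that each gene falls into exactly one of three outcomes. The sketch below computes the three percentages under that assumption. The function name is hypothetical, and using `None` to mark an unclassified gene is a convention chosen here, not something described in the source.

```python
def summarize_predictions(true_classes, predicted_classes):
    """Return percentages of correct, unclassified and misclassified genes.

    Assumption: a prediction of None means the model assigned no class.
    """
    n = len(true_classes)
    if n == 0 or n != len(predicted_classes):
        raise ValueError("inputs must be non-empty and of equal length")

    correct = sum(1 for t, p in zip(true_classes, predicted_classes) if p == t)
    unclassified = sum(1 for p in predicted_classes if p is None)
    misclassified = n - correct - unclassified

    return {
        "accuracy": round(100 * correct / n, 2),
        "not_classified": round(100 * unclassified / n, 2),
        "misclassified": round(100 * misclassified / n, 2),
    }
```

For instance, with four genes of which two are predicted correctly, one is left unclassified and one is assigned the wrong class, the function reports 50%, 25% and 25%.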
Figure 2. Compositional differences between the classes using System A. The clustered column graph indicates the mean composition of the four nucleotides in each of the six functional classes. Although the variation of each nucleotide among the classes is not significant, all the classes have a markedly higher percentage of cytosine and guanine (in the range of 30%-35% each) in their sequences. This CG-richness of the genome is the basis for System B.
Figure 3. Compositional differences between the classes using System B. The clustered column graph indicates the mean composition of AT and CG for each of the six functional classes. A more stable composition can be observed compared with that of System A, implying the absence of pathogenicity islands in the genome.
Figure 4. Markov chain model for DNA. Each edge in the graph represents the probability of a nucleotide occurring immediately after another nucleotide. All four possible transitions starting from A are shown in grey.
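The edge probabilities of such a first-order Markov chain can be estimated from a sequence by counting adjacent nucleotide pairs and normalizing each row. The following is a minimal sketch with a hypothetical function name; it assumes maximum-likelihood estimates without smoothing, which the source does not specify.

```python
from collections import Counter

NUCLEOTIDES = "ATGC"


def transition_matrix(seq):
    """Estimate P(next nucleotide | current nucleotide) from one sequence."""
    seq = seq.upper()
    # Count every overlapping pair (current, next) along the sequence.
    pair_counts = Counter(zip(seq, seq[1:]))

    matrix = {}
    for first in NUCLEOTIDES:
        row_total = sum(pair_counts[(first, second)] for second in NUCLEOTIDES)
        matrix[first] = {
            # A nucleotide with no observed successor gets an all-zero row.
            second: pair_counts[(first, second)] / row_total if row_total else 0.0
            for second in NUCLEOTIDES
        }
    return matrix
```

The row `transition_matrix(seq)["A"]` then holds the four transitions starting from A that the figure highlights. These probabilities are closely related to the dinucleotide composition used by System C, since both are derived from the same pair counts.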