Kiran Sree Pokkuluri, Ramesh Babu Inampudi, S S S N Usha Devi Nedunuri.
Abstract
Predicting protein coding regions and promoter regions is an important challenge in bioinformatics (Attwood and Teresa, 2000), and identifying these regions plays a crucial role in understanding genes. Many novel computational and mathematical methods have been introduced, and existing methods continue to be refined, for predicting each region separately; still, there is scope for improvement. We propose a classifier built with MACA (multiple attractor cellular automata) and MCC (modified clonal classifier) that predicts both regions with a single classifier. For protein coding region prediction, the classifier is trained and tested with the Fickett and Tung (1992) datasets for DNA sequences of lengths 54, 108, and 162, and with MMCRI datasets for DNA sequences of lengths 252 and 354. For promoter prediction, it is trained and tested with promoter sequences from the DBTSS (Yamashita et al., 2006) dataset and nonpromoters from the EID (Saxonov et al., 2000) and UTRdb (Pesole et al., 2002) datasets. The proposed model predicts both regions with an average accuracy of 90.5% for promoter and 89.6% for protein coding region predictions. The specificity and sensitivity values of promoter and protein coding region predictions are 0.89 and 0.92, respectively.
Year: 2014 PMID: 25132849 PMCID: PMC4123571 DOI: 10.1155/2014/261362
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1: IN-MACA-MCC architecture (front).
Figure 2: IN-MACA-MCC architecture (rear).
Example rules.
| SNO | Rule number | General representation |
|---|---|---|
| 1 | 254 | |
| 2 | 252 | |
| 3 | 238 | |
| 4 | 250 | |
| 5 | 204 | |
| 6 | 240 | |
| 7 | 170 | |
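As a hedged aid to reading the rule numbers above, the sketch below decodes each one under the standard Wolfram numbering for three-neighbourhood cellular automata and reports which neighbours are combined by an OR. The numbering convention is an assumption (it is not stated here), and the helper names `rule_table` and `or_inputs` are hypothetical.

```python
from itertools import product

def rule_table(rule):
    """Map each (left, self, right) neighbourhood to the rule's output bit.

    Assumes Wolfram numbering: bit k of the rule number is the output for
    the neighbourhood whose bits spell k in binary (left is the high bit).
    """
    return {(l, s, r): (rule >> (4 * l + 2 * s + r)) & 1
            for l, s, r in product((0, 1), repeat=3)}

def or_inputs(rule):
    """Return the neighbour names whose OR reproduces the rule, else None."""
    table = rule_table(rule)
    names = ("left", "self", "right")
    for mask in product((0, 1), repeat=3):
        if all(out == int(any(b and m for b, m in zip(nb, mask)))
               for nb, out in table.items()):
            return [n for n, m in zip(names, mask) if m]
    return None

for rule in (254, 252, 238, 250, 204, 240, 170):
    print(rule, "= OR of", or_inputs(rule))
```

Under this convention every rule in the table is a pure OR of a subset of the three neighbours, so each can be described by the list of inputs it keeps.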
Figure 3: Attractor state (1.0, 1.0, and 0.2) - B formed with rule 〈170, 252, 204〉.
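The caption names a three-cell rule vector and an attractor state. A minimal sketch of how such a state can be checked follows. It assumes a fuzzy cellular automaton in which OR is replaced by the bounded sum min(1, a + b), null (zero) boundary cells, and Wolfram-numbered rules. None of these conventions is stated here, and the names `NEIGHBOURS` and `step` are hypothetical.

```python
# Neighbours combined by each OR-type rule (Wolfram numbering assumed).
NEIGHBOURS = {
    254: ("left", "self", "right"),
    252: ("left", "self"),
    238: ("self", "right"),
    250: ("left", "right"),
    204: ("self",),
    240: ("left",),
    170: ("right",),
}

def step(state, rules):
    """Advance a hybrid fuzzy CA by one time step.

    Each cell i applies rules[i]; fuzzy OR is taken as the bounded sum
    min(1, a + b), and cells outside the lattice are fixed at 0 (assumed).
    """
    padded = (0.0,) + tuple(state) + (0.0,)
    nxt = []
    for i, rule in enumerate(rules):
        values = {"left": padded[i], "self": padded[i + 1], "right": padded[i + 2]}
        total = sum(values[name] for name in NEIGHBOURS[rule])
        nxt.append(min(1.0, total))
    return tuple(nxt)

rules = (170, 252, 204)
state = (1.0, 1.0, 0.2)
print(step(state, rules) == state)  # True when the state maps to itself
```

With these assumptions the state (1.0, 1.0, 0.2) is left unchanged by one update, which is consistent with it being a single-cycle attractor of the rule vector 〈170, 252, 204〉.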
Figure 4: Basin calculation.
Figure 5: Training interface.
Execution time for prediction of both protein and promoter regions.
| Size of dataset | Prediction time of integrated algorithm (ms) |
|---|---|
| 5000 | 1064 |
| 6000 | 1389 |
| 10000 | 2002 |
| 20000 | 2545 |
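As a quick reading aid, the sketch below derives the average prediction time per dataset item from the table above. It assumes that "size of dataset" counts individual sequences, which is not stated here; the function name `ms_per_item` is hypothetical.

```python
# (dataset size, prediction time in ms) copied from the table above
timings = [(5000, 1064), (6000, 1389), (10000, 2002), (20000, 2545)]

def ms_per_item(size, total_ms):
    """Average prediction time per dataset item, in milliseconds."""
    return total_ms / size

for size, total_ms in timings:
    print(f"{size:>6} items: {ms_per_item(size, total_ms):.4f} ms per item")
```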
IN-MACA-MCC protein coding comparison with existing approaches.
| Algorithm/coding measure | Sensitivity | Specificity |
|---|---|---|
| OC1 | 65.3 | 66.4 |
| Hexamer | 68.36 | 70.2 |
| Position asymmetry | 72.3 | 74.5 |
| Dicodon usage | 81.3 | 82.3 |
| CRITICA | 82.5 | 84.9 |
| IN-MACA-MCC | 89.6 | 89.3 |
IN-MACA-MCC promoter comparison with existing approaches.
| Method | Sensitivity | Specificity |
|---|---|---|
| Promoter inspector | 56.9 | 46.9 |
| Dragon promoter finder | 62.3 | 59.3 |
| Promo predictor | 65.3 | 66.9 |
| CNN-promoter | 76.3 | 82.3 |
| SPANN | 68.9 | 84 |
| IMC | 76 | 86 |
| IN-MACA-MCC | 88.5 | 92.7 |
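The comparison tables report sensitivity and specificity. For reference, a minimal sketch of how these two measures are conventionally computed from confusion-matrix counts is given below. The counts in the usage lines are hypothetical and are not taken from the paper. Note also that some gene-prediction literature defines specificity as TP / (TP + FP) instead; which definition applies here is not stated.

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of real positives predicted positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of real negatives predicted negative."""
    return tn / (tn + fp)

# Hypothetical counts for illustration only (not from the paper).
tp, fn, tn, fp = 90, 10, 80, 20
print(f"sensitivity = {sensitivity(tp, fn):.2%}")
print(f"specificity = {specificity(tn, fp):.2%}")
```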
Figure 6: Predictive accuracy for protein coding regions.
Figure 7: Predictive accuracy for promoter regions.