Firoz Anwar, Syed Murtuza Baker, Taskeed Jabid, Md Mehedi Hasan, Mohammad Shoyaib, Haseena Khan, Ray Walshe.
Abstract
BACKGROUND: Eukaryotic promoter prediction using computational analysis techniques is one of the most difficult tasks in computational genomics, and it is essential for constructing and understanding genetic regulatory networks. The increased availability of sequence data for various eukaryotic organisms in recent years has created a need for better tools and techniques for the prediction and analysis of promoters in eukaryotic sequences. Many promoter prediction methods and tools have been developed to date, but they have yet to provide acceptable predictive performance. One obvious way to improve on current methods is to devise a better system for selecting the promoter features that distinguish promoters from non-promoters. Second, performance can be improved by enhancing the predictive ability of the machine learning algorithms used.
Year: 2008 PMID: 18834544 PMCID: PMC2575220 DOI: 10.1186/1471-2105-9-414
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
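Predictive accuracy of the proposed model, evaluated with several statistical measures
| Class | TP | FP | FN | TN | Sensitivity | Specificity | CC | FPR | FNR | PPV | NPV | LR+ | LR- |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Plant Promoter | 43 | Nil | 7 | Nil | 0.86 | 0.90 | 0.761 | 0.10 | 0.14 | 0.89 | 0.86 | 8.60 | 0.15 |
| Plant CDS/Non-Promoter | Nil | 5 | Nil | 45 | | | | | | | | | |
| Drosophila Promoter | 48 | Nil | 2 | Nil | 0.96 | 0.92 | 0.881 | 0.08 | 0.04 | 0.92 | 0.95 | 12 | 0.04 |
| Drosophila CDS/Non-Promoter | Nil | 4 | Nil | 46 | | | | | | | | | |
| Human Promoter | 44 | Nil | 6 | Nil | 0.88 | 0.92 | 0.801 | 0.08 | 0.12 | 0.92 | 0.88 | 11 | 0.13 |
| Human CDS/Non-Promoter | Nil | 4 | Nil | 46 | | | | | | | | | |
| Mouse Promoter | 39 | Nil | 11 | Nil | 0.78 | 0.84 | 0.621 | 0.16 | 0.22 | 0.83 | 0.79 | 4.87 | 0.26 |
| Mouse CDS/Non-Promoter | Nil | 8 | Nil | 42 | | | | | | | | | |
| Rat Promoter | 41 | Nil | 9 | Nil | 0.82 | 0.80 | 0.620 | 0.20 | 0.18 | 0.80 | 0.81 | 4.10 | 0.22 |
| Rat CDS/Non-Promoter | Nil | 10 | Nil | 40 | | | | | | | | | |
TP, FP, FN, TN: true positives, false positives, false negatives, true negatives (e.g. plant sensitivity = TP/(TP+FN) = 43/50 = 0.86). CC: correlation coefficient; FPR, FNR: false positive and false negative rates; PPV, NPV: positive and negative predictive values; LR+, LR-: positive and negative likelihood ratios.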
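Every measure in the accuracy table can be recomputed from the four confusion counts of each species (promoters detected, promoters missed, non-promoters wrongly accepted, non-promoters correctly rejected). The sketch below assumes the standard textbook definitions, with the Matthews correlation coefficient as the correlation measure; the helper names `confusion_metrics` and `_div` are hypothetical, and undefined measures (zero denominators) are returned as NaN.

```python
import math


def _div(num: float, den: float) -> float:
    """Return num/den, or NaN when the denominator is zero (measure undefined)."""
    return num / den if den else float("nan")


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Binary-classification measures from confusion-matrix counts.

    Assumes standard definitions; CC is taken to be the Matthews correlation
    coefficient.
    """
    sn = _div(tp, tp + fn)  # sensitivity: fraction of promoters detected
    sp = _div(tn, tn + fp)  # specificity: fraction of non-promoters rejected
    # Matthews correlation coefficient (CC)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = _div(tp * tn - fp * fn, denom)
    return {
        "Sensitivity": sn,
        "Specificity": sp,
        "CC": cc,
        "FPR": _div(fp, fp + tn),  # false positive rate = 1 - specificity
        "FNR": _div(fn, fn + tp),  # false negative rate = 1 - sensitivity
        "PPV": _div(tp, tp + fp),  # positive predictive value (precision)
        "NPV": _div(tn, tn + fn),  # negative predictive value
        "LR+": _div(sn, 1 - sp),   # positive likelihood ratio
        "LR-": _div(1 - sn, sp),   # negative likelihood ratio
    }


# Plant row: 43 promoters found, 7 missed, 5 false alarms, 45 correct rejections.
plant = confusion_metrics(tp=43, fp=5, fn=7, tn=45)
for name, value in plant.items():
    print(f"{name:12s} {value:.3f}")
```

Rounded, the printed values agree with the plant row to within one unit in the last reported digit; the table appears to truncate some values (for example, PPV = 43/48 ≈ 0.896 is listed as 0.89).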
Cross-validation accuracy
| Species | Accuracy |
|---|---|
| Plant | 83.81% |
| Drosophila | 94.82% |
| Human | 91.25% |
| Mouse | 90.77% |
| Rat | 82.35% |
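The record does not say how many folds were used or how fold accuracies were combined. As a minimal sketch, assuming plain k-fold cross-validation with accuracy pooled over all held-out predictions, the computation looks like this; `k_fold_accuracy`, its `fit_predict` callback, and the toy `majority` classifier are hypothetical names for illustration only.

```python
import random
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # sample type (e.g. a feature vector)
L = TypeVar("L")  # label type (e.g. promoter / non-promoter)


def k_fold_accuracy(
    samples: Sequence[S],
    labels: Sequence[L],
    k: int,
    fit_predict: Callable[[list[S], list[L], list[S]], list[L]],
    seed: int = 0,
) -> float:
    """Pooled k-fold cross-validation accuracy.

    fit_predict(train_x, train_y, test_x) must train a model on the training
    folds and return one predicted label per test sample.
    """
    n = len(samples)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of samples")
    order = list(range(n))
    random.Random(seed).shuffle(order)       # reproducible random fold assignment
    folds = [order[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        preds = fit_predict(
            [samples[i] for i in train],
            [labels[i] for i in train],
            [samples[i] for i in fold],
        )
        correct += sum(p == labels[i] for p, i in zip(preds, fold))
    return correct / n  # each sample is held out exactly once


def majority(train_x, train_y, test_x):
    """Toy classifier: always predict the most common training label."""
    top = max(set(train_y), key=train_y.count)
    return [top] * len(test_x)
```

Averaging per-fold accuracies instead of pooling gives the same number when all folds are the same size, and slightly different numbers otherwise.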
Top five 4-mer motifs in each species, ranked from highest (top) to lowest
| 1 | TTTT | GCGG | AAAA | AAAA | GGAG |
| 2 | AAAA | GGCG | GGGG | TTTT | GGGG |
| 3 | ATTT | GGGG | CAGG | AAAT | AGAG |
| 4 | AAAT | GGGC | CCAG | TATA | CAGG |
| 5 | AATT | GCCC | GGGC | ATAT | CAGC |
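The ranking criterion behind the top-five motif lists is not given in this record. A minimal sketch, assuming motifs are ranked by raw counts of overlapping 4-mers across a species' promoter sequences, is shown below; `count_kmers` and `top_kmers` are hypothetical helpers, and the example sequences are toy strings, not data from the study.

```python
from collections import Counter
from typing import Iterable

_BASES = frozenset("ACGT")


def count_kmers(seqs: Iterable[str], k: int = 4) -> Counter:
    """Count overlapping k-mers over all sequences, skipping windows that
    contain characters other than A, C, G, T (e.g. the ambiguity code N)."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i : i + k]
            if set(kmer) <= _BASES:
                counts[kmer] += 1
    return counts


def top_kmers(seqs: Iterable[str], n: int = 5, k: int = 4) -> list[tuple[str, int]]:
    """The n most frequent k-mers, highest count first."""
    return count_kmers(seqs, k).most_common(n)


toy_promoters = ["TTTTAAAATTTT", "ATTTAAATTATA"]  # toy sequences only
print(top_kmers(toy_promoters))
```

Overlapping windows are used so that every position contributes one 4-mer; with `Counter.most_common`, ties keep the order in which the k-mers were first seen.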
Figure 1. Distribution of the top 18 motifs by position in Drosophila (A), human (B), mouse (C), plant (D) and rat (E).
The 10 most discriminating 4-mer sequences between promoter and non-promoter regions in plant
| 1 | 564 | TATA |
| 2 | 438 | ATAT |
| 3 | 435 | AAAA |
| 4 | 431 | ATAA |
| 5 | 407 | TAAA |
| 6 | 373 | GAAG |
| 7 | 324 | TTTT |
| 8 | 312 | ATAA |
| 9 | 302 | AGAA |
| 10 | 302 | GGAA |
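The record does not define the numeric score in the middle column of the plant table. Purely as an illustration, assuming a score equal to a 4-mer's count in promoter sequences minus its count in non-promoter sequences, the ranking could be sketched as follows; `discriminating_kmers` and its scoring rule are hypothetical, not the authors' method.

```python
from collections import Counter
from typing import Iterable

_BASES = frozenset("ACGT")


def count_kmers(seqs: Iterable[str], k: int = 4) -> Counter:
    """Count overlapping k-mers made only of A, C, G, T."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i : i + k]
            if set(kmer) <= _BASES:
                counts[kmer] += 1
    return counts


def discriminating_kmers(
    promoters: Iterable[str], non_promoters: Iterable[str], n: int = 10, k: int = 4
) -> list[tuple[str, int]]:
    """Rank k-mers by (count in promoters) - (count in non-promoters).

    Hypothetical scoring rule; the source does not define its score.
    """
    pos = count_kmers(promoters, k)
    neg = count_kmers(non_promoters, k)
    # Counter returns 0 for absent keys, so k-mers seen in only one set still score.
    scores = {kmer: pos[kmer] - neg[kmer] for kmer in pos.keys() | neg.keys()}
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

When the promoter and non-promoter sets differ in size, converting counts to frequencies before subtracting would be a natural variant of this rule.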
Comparison of accuracy with existing methods [** = infinity]
| Sensitivity (%) | 68 | 88 | 0 | 12 | 0 | 86 |
| Specificity (%) | 76 | 90 | 100 | 100 | 78 | 90 |
| Correlation coefficient | 0.44 | 0.78 | ** | 0.25 | ** | 0.77 |