Scott Mann, Jinyan Li, Yi-Ping Phoebe Chen.
Abstract
The computational approach for identifying promoters on increasingly large genomic sequences has led to many false positives. The biological significance of promoter identification lies in the ability to locate true promoters with and without prior sequence contextual knowledge. Prior approaches to promoter modelling have involved artificial neural networks (ANNs) or hidden Markov models (HMMs), each producing adequate results on small-scale identification tasks, i.e. narrow upstream regions. In this work, we present an architecture to support prokaryote promoter identification on large-scale genomic sequences, i.e. not limited to narrow upstream regions. The significant contribution is the hybrid formed by aggregating the profile HMM with the ANN, via Viterbi scoring optimizations. The benefit of this architecture is that it combines the modelling ability of the profile HMM with the ability of the ANN to associate the elements composing the promoter. We demonstrate the high effectiveness of the hybrid approach in comparison to profile HMMs and ANNs used separately. The contribution of the Viterbi optimizations in supporting the hybrid architecture is also highlighted: gains in sensitivity (+0.3), specificity (+0.65) and precision (+0.54) are achieved over existing approaches.
Year: 2006 PMID: 17170007 PMCID: PMC1802591 DOI: 10.1093/nar/gkl1024
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. pHMM scoring algorithm discrimination ability for sequence AB102735.
Figure 2. pHMMs as input to a three-layer ANN; the shaded ‘UP’ element indicates this profile's inclusion at a later stage in the investigation.
Figure 3. Effect of hidden neurons on dataset discrimination.
Figure 4. −35 box HMM scoring for sequence AB102735.
Figure 5. −10 box HMM scoring for sequence AB102735.
Figure 6. pHMM-ANN hybrid scoring for sequence AB102735.
Figure 7. ANN-HMM hybrid scoring ‘using UP elements’ for sequence AB102735.
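The arrangement named in the captions above, in which per-element profile scores feed a three-layer ANN, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption rather than the authors' implementation: the profiles are gapless (match states only, so there is a single state path and the Viterbi score reduces to a sum of log-odds), the emission probabilities are built from the commonly cited σ70 consensus hexamers TTGACA (−35) and TATAAT (−10), and the network weights are arbitrary placeholders, not trained values. The helper names (`make_profile`, `best_hit`, `ann_forward`) are hypothetical.

```python
import math

BACKGROUND = 0.25  # assumed uniform background base frequency


def make_profile(consensus, p_match=0.7):
    """Hypothetical gapless profile: one emission distribution per position."""
    p_other = (1.0 - p_match) / 3.0
    return [{b: (p_match if b == c else p_other) for b in "ACGT"}
            for c in consensus]


def log_odds(profile, window):
    """Log-odds of a window under a match-only profile.

    With no insert or delete states there is only one state path,
    so this sum is also the Viterbi score of the window.
    """
    return sum(math.log(col[b] / BACKGROUND) for col, b in zip(profile, window))


def best_hit(profile, seq):
    """Slide the profile along seq and return (best score, start index)."""
    n = len(profile)
    return max((log_odds(profile, seq[i:i + n]), i)
               for i in range(len(seq) - n + 1))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ann_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a three-layer network: input, one hidden layer, output."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


box35 = make_profile("TTGACA")  # assumed -35 consensus
box10 = make_profile("TATAAT")  # assumed -10 consensus

# Toy sequence: both consensus boxes separated by a 17-nt spacer.
seq = "GGC" + "TTGACA" + "GGCTAGCTAGCGGCTAG" + "TATAAT" + "GC"

score35, pos35 = best_hit(box35, seq)
score10, pos10 = best_hit(box10, seq)

# The two profile scores are the ANN inputs in this sketch.
features = [score35, score10]

# Placeholder weights for two hidden neurons (illustrative, untrained).
w_hidden = [[0.3, 0.3], [0.1, 0.2]]
b_hidden = [-1.0, -0.5]
w_out = [1.5, 1.0]
b_out = -1.0

promoter_score = ann_forward(features, w_hidden, b_hidden, w_out, b_out)
print(pos35, pos10, round(promoter_score, 3))
```

A full profile HMM would add insert and delete states and require the complete Viterbi recursion, and the network would be trained on labelled promoter and non-promoter windows. The sketch only shows how the two kinds of model are chained.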
Data summary
| Method | TP | TN | FP | FN |
|---|---|---|---|---|
| NNPP | 4 | 1 | 15 | 6 |
| SAK | 4 | 3 | 13 | 6 |
| Hybrid | 7 | 10 | 2 | 3 |
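TP, true positive: promoter predicted as promoter.
TN, true negative: non-promoter predicted as non-promoter.
FP, false positive: non-promoter predicted as promoter.
FN, false negative: promoter predicted as non-promoter.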
Sn, sensitivity: proportion of true promoters correctly identified as promoters as given in Equation 3.
Sp, specificity: proportion of non-promoters predicted as non-promoters as given in Equation 4.
P, precision: proportion of promoter predictions being true promoters as given in Equation 5.
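Equations 3 to 5 are not reproduced in this record. The standard definitions below match the verbal descriptions given here, and they are assumed (not confirmed from the source) to correspond to those equations:

$$
\mathrm{Sn} = \frac{TP}{TP + FN}, \qquad
\mathrm{Sp} = \frac{TN}{TN + FP}, \qquad
P = \frac{TP}{TP + FP}
$$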
Performance measure comparison
| Method | Sn | Sp | P |
|---|---|---|---|
| NNPP | 0.4 | 0.0625 | 0.210526316 |
| SAK | 0.4 | 0.1875 | 0.235294118 |
| pHMM-ANN hybrid | 0.700 | 0.833 | 0.777 |
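As a cross-check, the performance measures can be recomputed from the counts in the data summary table. The short Python sketch below assumes the usual definitions (Sn = TP/(TP+FN), Sp = TN/(TN+FP), P = TP/(TP+FP)). The dictionary layout and function name are illustrative only.

```python
# Counts copied from the data summary table.
counts = {
    "NNPP":   {"TP": 4, "TN": 1,  "FP": 15, "FN": 6},
    "SAK":    {"TP": 4, "TN": 3,  "FP": 13, "FN": 6},
    "Hybrid": {"TP": 7, "TN": 10, "FP": 2,  "FN": 3},
}


def measures(c):
    """Return (sensitivity, specificity, precision) for one row of counts."""
    sn = c["TP"] / (c["TP"] + c["FN"])
    sp = c["TN"] / (c["TN"] + c["FP"])
    p = c["TP"] / (c["TP"] + c["FP"])
    return sn, sp, p


results = {method: measures(c) for method, c in counts.items()}
for method, (sn, sp, p) in results.items():
    print(f"{method:7s} Sn={sn:.3f} Sp={sp:.3f} P={p:.3f}")

# Gain of the hybrid over SAK, the stronger of the two existing tools here.
gains = tuple(h - s for h, s in zip(results["Hybrid"], results["SAK"]))
print("gains (Sn, Sp, P):", [round(g, 2) for g in gains])
```

Under these definitions the recomputed values agree with the table, and the hybrid's gains over SAK round to the +0.3, +0.65 and +0.54 quoted in the abstract. The hybrid precision is 7/9 = 0.7777…, so the tabulated 0.777 appears to be truncated rather than rounded.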