A K M Azad, Saima Shahid, Nasimul Noman, Hyunju Lee.
Abstract
BACKGROUND: With an increasing number of plant genome sequences, it has become important to develop a robust computational method for detecting plant promoters. Although a wide variety of programs are currently available, prediction accuracy of these still requires further improvement. The limitations of these methods can be addressed by selecting appropriate features for distinguishing promoters and non-promoters.
Year: 2011 PMID: 21711543 PMCID: PMC3160368 DOI: 10.1186/1748-7188-6-19
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Figure 1. Average sensitivities and specificities of the FDAFSA method for the selection of a different fraction of features from 4,096 features. The x-axis shows the fraction of selected features from 4,096 features and the y-axis shows the average sensitivity and specificity corresponding to the selected features.
Top 10 common hexamers in a set of top 25% features of FDAFSA from 5 data sets of 5-fold cross validation.
| Rank | Common hexamers extracted from all 5 datasets (top 25%) |
|---|---|
| 1 | ATATAT |
| 2 | TATATA |
| 3 | ATATTT |
| 4 | TATAAA |
| 5 | AAAAAA |
| 6 | TTTTTT |
| 7 | AGAGAG |
| 8 | TCTCTC |
| 9 | CTCTCT |
| 10 | GAGAGA |
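The 4,096 candidate features mentioned in Figure 1 match the number of possible DNA hexamers (4^6). As a minimal sketch of how such a feature vector could be built, the snippet below counts overlapping hexamers in a sequence. The function name `hexamer_counts`, the toy sequence, and the choice to skip windows containing non-ACGT symbols are assumptions for illustration; the record does not state how FDAFSA computes or ranks its features.

```python
from itertools import product

ALPHABET = "ACGT"
K = 6

# All 4**6 = 4,096 possible hexamers, in lexicographic order.
HEXAMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
INDEX = {h: i for i, h in enumerate(HEXAMERS)}


def hexamer_counts(sequence):
    """Count overlapping hexamers in a DNA sequence.

    Assumption: windows containing any symbol outside ACGT are skipped.
    """
    counts = [0] * len(HEXAMERS)
    seq = sequence.upper()
    for start in range(len(seq) - K + 1):
        idx = INDEX.get(seq[start:start + K])
        if idx is not None:
            counts[idx] += 1
    return counts


# Hypothetical toy sequence, not taken from the study's data.
example = "TATATATAAA"
vec = hexamer_counts(example)
print(len(vec), vec[INDEX["TATATA"]], vec[INDEX["TATAAA"]])
```

A feature-selection step such as the top 25% cut described in the table caption would then operate on vectors like `vec`, one per promoter or non-promoter sequence.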
FDAFSA vs. PromMachine.
| Methods (n-mers used) | Average Sensitivity of 5-fold cross validation (%) | Average Specificity of 5-fold cross validation (%) |
|---|---|---|
| FDAFSA | 84* | 86* |
| PromMachine | 86+ | 81+ |
* Accuracies are measured in each pass using the top 25% of features from FDAFSA sequences. The measurements are then averaged over the 5 passes.
+ This result was generated with our own implementation of the PromMachine algorithm, run on our dataset.
Figure 2. Average sensitivities and specificities of the RTPFSGA method for different levels of significance (α-value). The x-axis shows p-values less than the different α-values, and the y-axis shows the average sensitivity and specificity corresponding to the selected features.
10 common RTPs in a set of RTPs having p-value < 0.000001 of all 5 data sets using 5-fold cross validation.
| Rank | Random Triplet Pair |
|---|---|
| 1 | AAA-AAA |
| 2 | AAA-AAT |
| 3 | AAA-AGA |
| 4 | AAA-ATC |
| 5 | AAA-ATT |
| 6 | AAA-CAT |
| 7 | AAA-TTT |
| 8 | AAC-ATA |
| 9 | AAC-CGA |
| 10 | AAC-CTG |
Results of prediction test with combined features from FDAFSA and RTPFSGA.
| Test Dataset | TP | FN | TN | FP | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|---|
| | 56 | 5 | 52 | 9 | 92 | 85 |
| | 54 | 7 | 52 | 9 | 89 | 85 |
| | 54 | 7 | 55 | 6 | 89 | 90 |
| | 52 | 9 | 51 | 10 | 85 | 84 |
| | 55 | 6 | 51 | 10 | 90 | 84 |
| Average | | | | | 89 | 86 |
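The sensitivity and specificity columns can be checked against the TP, FN, TN and FP counts in the table above using the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below assumes that per-fold values are rounded to whole percentages before averaging, which reproduces the reported figures; the record does not state the rounding convention.

```python
def sensitivity(tp, fn):
    """Percentage of true promoters that were predicted as promoters."""
    return 100.0 * tp / (tp + fn)


def specificity(tn, fp):
    """Percentage of non-promoters that were predicted as non-promoters."""
    return 100.0 * tn / (tn + fp)


# (TP, FN, TN, FP) for the five test datasets, copied from the table.
folds = [
    (56, 5, 52, 9),
    (54, 7, 52, 9),
    (54, 7, 55, 6),
    (52, 9, 51, 10),
    (55, 6, 51, 10),
]

# Assumption: round each fold to a whole percentage, then average.
per_fold = [
    (round(sensitivity(tp, fn)), round(specificity(tn, fp)))
    for tp, fn, tn, fp in folds
]
avg_sens = round(sum(s for s, _ in per_fold) / len(per_fold))
avg_spec = round(sum(p for _, p in per_fold) / len(per_fold))

for sens, spec in per_fold:
    print(sens, spec)
print("average", avg_sens, avg_spec)
```

Each test dataset here contains 61 positives and 61 negatives, so a change of one prediction moves either measure by roughly 1.6 percentage points.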
Comparative accuracy of PromoBot with FDAFSA and RTPFSGA.
| Algorithm for feature selection | Average sensitivity for 5-fold cross validation (%) | Average specificity for 5-fold cross validation (%) |
|---|---|---|
| FDAFSA | 84 | 86 |
| RTPFSGA | 94 | 59 |
| PromoBot | 89 | 86 |
Comparison with other methods.
| Statistical Measure (%) | NNPP 2.2 (threshold = 0.8) | TSSP-TCM | Promoter Scan Version | Promoter | PromMachine | PromoBot |
|---|---|---|---|---|---|---|
| Avg. Sensitivity | 74 | 88 | 8 | 24 | 86 | 89 |
| Avg. Specificity | 70 | 84 | 4 | 34 | 81 | 86 |
Performance evaluation using 271 experimentally validated promoters.
| Algorithm | No. of sequences | No. of accurate predictions | Percentage (%) |
|---|---|---|---|
| TSSP-TCM | 271 | 210 | 77.49 |
| PromoBot | 271 | 235 | 86.72 |
Comparative assessment of performance using different negative datasets
| Method | Statistical Measure (%) | miRNA only | mRNA only | rRNA only | PromoBot |
|---|---|---|---|---|---|
| PromoBot | Avg. Sensitivity | 82.95 | 87.87 | 93.12 | 89 |
| | Avg. Specificity | 59.67 | 84.26 | 95.08 | 86 |
| TSSP-TCM | Avg. Sensitivity | 88 | 88 | 88 | 88 |
| | Avg. Specificity | 75.41 | 80.98 | 96.06 | 84 |
Figure 3. The significance of RTPs compared to the hexamers produced by two triplets in RTPs. The observed average diff_RTP_hexamer value (464.49) was compared with 1,000 random cases; in each case, 220 random triplet pairs were generated and the average of their 220 diff_RTP_hexamer values was calculated.