Thomas Abeel, Yves Van de Peer, Yvan Saeys.
Abstract
MOTIVATION: Promoter prediction is an important task in genome annotation projects, and during the past years many new promoter prediction programs (PPPs) have emerged. However, many of these programs are compared inadequately to other programs. In most cases, only a small portion of the genome is used to evaluate the program, which is not a realistic setting for whole genome annotation projects. In addition, a common evaluation design to properly compare PPPs is still lacking.
Year: 2009 PMID: 19478005 PMCID: PMC2687945 DOI: 10.1093/bioinformatics/btp191
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Overview of all the programs analyzed
| Name | References |
|---|---|
| ARTS | Sonnenburg |
| CpGcluster | Hackenberg |
| CpGProD | Ponger and Mouchiroud |
| DragonGSF | Bajic and Brusic |
| DragonPF | Bajic |
| EP3 | Abeel |
| Eponine | Down and Hubbard |
| FirstEF | Davuluri |
| McPromoter | Ohler |
| NNPP2.2 | Reese |
| Nscan | Gross and Brent |
| Promoter 2.0 | Knudsen |
| PromoterExplorer | Xie |
| PromoterScan | Prestridge |
| ProSOM | Abeel |
| PSPA | Wang and Hannenhalli |
| Wu-method | Wu |
Fig. 1. Visual representation of how the different protocols work. The panel numbers refer to the protocol identifiers. Protocols starting with 1 are based on binning, the ones starting with 2 on distance. Protocols ending in A use the CAGE data as reference, and those ending in B the gene set. More details can be found in the main text.
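As a rough illustration of what a distance-based protocol could look like, the sketch below scores predictions against reference positions on a single sequence. It is not the authors' implementation: the matching rule, the function names and the `max_dist` parameter are all assumptions made for this example, strand and chromosome handling are ignored, and the exact protocol definitions are those given in the main text.

```python
from bisect import bisect_left


def nearest_distance(sorted_positions, x):
    """Distance from x to the closest value in a non-empty sorted list."""
    i = bisect_left(sorted_positions, x)
    # Only the neighbours just left and right of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0): i + 1]
    return min(abs(x - p) for p in candidates)


def distance_based_scores(predictions, references, max_dist):
    """Return (precision, recall, F-score) under an assumed matching rule.

    Assumption: a prediction is correct if a reference lies within max_dist,
    and a reference is recovered if a prediction lies within max_dist.
    """
    preds = sorted(predictions)
    refs = sorted(references)
    if not preds or not refs:
        return 0.0, 0.0, 0.0

    correct_preds = sum(nearest_distance(refs, p) <= max_dist for p in preds)
    found_refs = sum(nearest_distance(preds, r) <= max_dist for r in refs)

    precision = correct_preds / len(preds)
    recall = found_refs / len(refs)
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy positions, invented for illustration only.
toy_references = [1000, 5000, 9000]
toy_predictions = [1100, 3000, 5200, 5300]
precision, recall, f_score = distance_based_scores(
    toy_predictions, toy_references, max_dist=500  # hypothetical cutoff
)
print(precision, recall, f_score)
```

Swapping `toy_references` between a CAGE-derived set and a gene-start set would correspond to the A and B variants described in the caption.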
Overview of the results of all protocols on all PPPs
| # | Name | 1A | 1B | 2A | 2B | Number of predictions | Threshold | F-score | PPP score |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ARTS | 0.19 | 0.36 | 0.47 | 0.64 | 432117 | 0.56362 | 0.47 | |
| 2 | CpGcluster | 0.09 | 0.22 | 0.28 | 0.44 | 22777 | 42.24167 | 0.38 | 0.18 |
| 3 | CpGProD | 0.06 | 0.16 | 0.32 | 0.04 | 20810 | 0.25473 | 0.45 | 0.08 |
| 4 | DragonGSF | 0.06 | 0.16 | 0.25 | 0.42 | 100046 | 0.26 | 0.45 | 0.14 |
| 5 | DragonPF | 0.05 | 0.08 | 0.18 | 0.26 | 747571 | 0.34 | 0.32 | 0.09 |
| 6 | EP3 | 0.18 | 0.23 | 0.42 | 0.51 | 67807 | -0.048 | 0.44 | |
| 7 | Eponine | 0.14 | 0.29 | 0.41 | 0.57 | 1320964 | 0.986 | 0.45 | |
| 8 | FirstEF | 0.08 | 0.23 | 0.28 | 0.52 | 44818 | 0.92938 | 0.28 | 0.18 |
| 9 | McPromoter | 0.04 | 0.10 | 0.12 | 0.23 | 43818 | -0.01347 | 0.25 | 0.08 |
| 10 | NNPP2.2 | 0.01 | 0.01 | 0.01 | 0.01 | 1962552 | 0.99 | 0.08 | 0.01 |
| 11 | Nscan | 0.07 | 0.27 | 0.22 | 0.51 | 23360 | 200.558 | 0.34 | 0.17 |
| 12 | Promoter 2.0 | 0.01 | 0.01 | 0.02 | 0.01 | 1923610 | 0.5 | 0.10 | 0.01 |
| 13 | PromoterExplorer | 0.02 | 0.05 | 0.07 | 0.12 | 134282 | NA | 0.25 | 0.04 |
| 14 | PromoterScan | 0.02 | 0.05 | 0.06 | 0.13 | 248671 | 57.51 | 0.20 | 0.04 |
| 15 | ProSOM | 0.18 | 0.25 | 0.42 | 0.51 | 63228 | 0.65302 | 0.44 | |
| 16 | PSPA | 0.05 | 0.17 | 0.16 | 0.33 | 25602 | 85.20467 | 0.28 | 0.11 |
| 17 | Wu-method | 0.04 | 0.10 | 0.13 | 0.24 | 23934 | NA | 0.31 | 0.08 |
The first two columns provide the index and the name of the PPPs. The third through sixth columns show the area under the precision–recall curve (auPRC) for each of the protocols. The seventh column displays the number of predictions for the optimal threshold as determined by protocol 2A. The eighth column shows the optimal threshold determined with protocol 2A and the ninth column the corresponding F-score. The tenth column gives the final score for the promoter predictor as the harmonic mean of the auPRC scores for the four protocols. PPP scores over 25% are indicated in bold. These are the programs we used for in-depth analysis.
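The harmonic mean described above can be recomputed from the table as a quick sanity check. The sketch below uses the rounded auPRC values printed in the table, so its results may differ from the published PPP scores in the last digit; the dictionary and function names are illustrative only.

```python
from statistics import harmonic_mean

# auPRC values for protocols 1A, 1B, 2A, 2B, copied from the table (rounded to 2 decimals).
AUPRC = {
    "ARTS": (0.19, 0.36, 0.47, 0.64),
    "EP3": (0.18, 0.23, 0.42, 0.51),
    "Eponine": (0.14, 0.29, 0.41, 0.57),
    "ProSOM": (0.18, 0.25, 0.42, 0.51),
    "CpGcluster": (0.09, 0.22, 0.28, 0.44),
}


def ppp_score(auprcs):
    """Harmonic mean of the four per-protocol auPRC values."""
    return harmonic_mean(auprcs)


for name, values in AUPRC.items():
    print(f"{name}: {ppp_score(values):.3f}")
```

Because the harmonic mean is dominated by the smallest value, a program has to do reasonably well under all four protocols to obtain a high PPP score.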
Fig. 2. PRCs for all PPPs when evaluated with protocol 2A.
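For readers unfamiliar with how a precision-recall curve is reduced to a single auPRC number, the following sketch integrates a list of (recall, precision) points with the trapezoidal rule. The integration method is an assumption chosen for simplicity; the record above does not state how the areas were computed, and linear interpolation between points can overestimate the area when points are sparse.

```python
def area_under_prc(points):
    """Trapezoidal area under a precision-recall curve.

    points: iterable of (recall, precision) pairs, one per score threshold.
    """
    pts = sorted(points)  # order by increasing recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Toy curve, invented for illustration only.
toy_curve = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
print(area_under_prc(toy_curve))
```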
Fig. 3. Positional specificity for predictions around TSRs. The positional specificity is determined by using the optimal threshold as determined with protocol 2A.
Recall score for each of the top four PPPs on each of the four promoter classes and on the Rare and Common TSR set
| Name | SP | PB | MU | BR | Rare | Common |
|---|---|---|---|---|---|---|
| ARTS | 0.58 | 0.90 | 0.93 | 0.95 | 0.23 | 0.81 |
| EP3 | 0.52 | 0.82 | 0.85 | 0.84 | 0.23 | 0.74 |
| Eponine | 0.69 | 0.92 | 0.94 | 0.96 | 0.24 | 0.80 |
| ProSOM | 0.51 | 0.83 | 0.81 | 0.83 | 0.21 | 0.71 |
The recall is calculated with the optimal threshold as determined with protocol 2A.
Pair-wise prediction overlap for the top four programs based on PPP score
| | ARTS | EP3 | Eponine | ProSOM |
|---|---|---|---|---|
| ARTS | | 0.57 | 0.29 | 0.74 |
| EP3 | 0.36 | | 0.21 | 0.75 |
| Eponine | 0.76 | 0.85 | | 0.97 |
| ProSOM | 0.37 | 0.59 | 0.18 | |
Details on the interpretation of the values can be found in the main text.