Wen Zhang, Xiaopeng Zhu, Yu Fu, Junko Tsuji, Zhiping Weng.
Abstract
BACKGROUND: Alternative splicing is a critical process in single-gene coding: it removes introns and joins exons, and splicing branchpoints are indicators of alternative splicing. Wet-lab experiments have identified a great number of human splicing branchpoints, but many branchpoints are still unknown. To guide wet-lab experiments, we develop computational methods to predict human splicing branchpoints.
Keywords: Genetic algorithm; Human splicing branchpoint; Logistic regression; Multi-label learning
Year: 2017 PMID: 29219070 PMCID: PMC5773893 DOI: 10.1186/s12859-017-1875-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Distribution of branchpoints near the 3′ end of introns
Fig. 2 Multi-label learning for the branchpoint prediction
Fig. 3 Constructing base predictors by combining feature subsets and multi-label learning methods
Fig. 4 AUPR scores and AUC scores of individual feature-based models. Markov: Markov motif profile, PWM: position weight matrix profile, DN: dinucleotide profile, SP: sparse profile, PPT: polypyrimidine tract, combination: combining all features
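For readers unfamiliar with the two ranking metrics named in Fig. 4, the following is a minimal sketch of how AUC and AUPR can be computed from predicted scores. It assumes AUPR is taken as average precision (a step-wise area under the precision-recall curve); this record does not state how the authors computed it. The function names and the toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values at each true positive.

    Assumption: tied scores are ranked in arbitrary (sort) order.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy example (hypothetical): 8 candidate sites, 2 of them true branchpoints.
labels = [0, 0, 1, 0, 1, 0, 0, 0]
scores = [0.1, 0.2, 0.9, 0.3, 0.4, 0.6, 0.05, 0.15]

print(round(roc_auc(labels, scores), 3))
print(round(average_precision(labels, scores), 3))
```

Because true branchpoints are rare among candidate sites, AUPR is usually the more demanding of the two metrics: a model can rank most negatives below the positives (high AUC) while still placing a few negatives at the very top (lower AUPR).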
The performance of multi-label learning methods based on different features
| Method | Feature | Recall | Precision | ACC | F | AUC | AUPR |
|---|---|---|---|---|---|---|---|
| PLS | Markov | 0.508 | 0.478 | 0.961 | 0.473 | 0.879 | 0.476 |
| PLS | PWM | 0.521 | 0.454 | 0.958 | 0.465 | 0.868 | 0.455 |
| PLS | DN | 0.534 | 0.437 | 0.957 | 0.461 | 0.877 | 0.461 |
| PLS | SP | 0.545 | 0.455 | 0.958 | 0.477 | 0.874 | 0.483 |
| PLS | PPT | 0.574 | 0.098 | 0.787 | 0.170 | 0.698 | 0.103 |
| CCA | Markov | 0.529 | 0.461 | 0.959 | 0.472 | 0.880 | 0.476 |
| CCA | PWM | 0.521 | 0.453 | 0.958 | 0.465 | 0.868 | 0.455 |
| CCA | DN | 0.566 | 0.423 | 0.955 | 0.466 | 0.883 | 0.468 |
| CCA | SP | 0.533 | 0.466 | 0.960 | 0.477 | 0.878 | 0.485 |
| CCA | PPT | 0.488 | 0.118 | 0.844 | 0.182 | 0.703 | 0.114 |
| LSCCA | Markov | 0.502 | 0.501 | 0.963 | 0.482 | 0.882 | 0.486 |
| LSCCA | PWM | 0.516 | 0.471 | 0.960 | 0.472 | 0.871 | 0.467 |
| LSCCA | DN | 0.546 | 0.442 | 0.957 | 0.469 | 0.883 | 0.453 |
| LSCCA | SP | 0.513 | 0.513 | 0.963 | 0.494 | 0.882 | 0.487 |
| LSCCA | PPT | 0.472 | 0.085 | 0.790 | 0.129 | 0.690 | 0.086 |
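The threshold-based columns of the table (Recall, Precision, ACC, F) follow from a confusion matrix. The sketch below uses the standard definitions and assumes F denotes the F-measure (harmonic mean of precision and recall); the decision threshold and any averaging across introns or folds used by the authors are not given in this record, so the table values cannot be reproduced from it. The function name and toy labels are hypothetical.

```python
def confusion_metrics(labels, predictions):
    """Return recall, precision, accuracy and F-measure for binary labels (0/1)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / len(pairs)
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, accuracy, f_measure


# Toy example (hypothetical): 10 candidate sites, 2 true branchpoints.
labels      = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predictions = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

recall, precision, accuracy, f_measure = confusion_metrics(labels, predictions)
print(recall, precision, accuracy, f_measure)
```

The toy case shows why ACC can be high while Precision and Recall are modest: when negatives dominate, correctly rejecting them inflates accuracy regardless of how well the positives are found.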
Performances of different feature combination models
| Feature | Recall | Precision | ACC | F | AUC | AUPR |
|---|---|---|---|---|---|---|
| SP | 0.545 | 0.455 | 0.958 | 0.477 | 0.874 | 0.483 |
| SP + Markov | 0.528 | 0.479 | 0.961 | 0.482 | 0.887 | 0.492 |
| SP + Markov + DN | 0.530 | 0.484 | 0.961 | 0.486 | 0.889 | 0.498 |
| SP + Markov + DN + PWM | 0.505 | 0.507 | 0.963 | 0.487 | 0.889 | 0.500 |
| All | 0.532 | 0.478 | 0.961 | 0.484 | 0.884 | 0.494 |
Markov: Markov motif profile, PWM: position weight matrix profile, DN: dinucleotide profile, SP: sparse profile, PPT: polypyrimidine tract, combination: combining all features
Performances of ensemble methods and best individual feature-based models
| Model | Recall | Precision | ACC | F | AUC | AUPR |
|---|---|---|---|---|---|---|
| Markov | 0.502 | 0.501 | 0.963 | 0.482 | 0.882 | 0.486 |
| PWM | 0.516 | 0.471 | 0.960 | 0.472 | 0.871 | 0.467 |
| DN | 0.546 | 0.442 | 0.957 | 0.469 | 0.883 | 0.453 |
| SP | 0.513 | 0.513 | 0.963 | 0.494 | 0.882 | 0.487 |
| LREM | 0.529 | 0.537 | 0.965 | 0.515 | 0.904 | 0.532 |
| GAEM | 0.482 | 0.541 | 0.965 | 0.493 | 0.891 | 0.512 |
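To illustrate the general idea behind an ensemble such as LREM, the following is a minimal sketch of a logistic-regression combiner that learns one weight per base predictor. It assumes, based on the keywords, that LREM is a logistic-regression-based ensemble over base-predictor scores; the authors' actual formulation, features, and training procedure are not given in this record. All names, hyperparameters, and data below are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_ensemble(base_scores, labels, lr=0.5, epochs=2000):
    """Fit per-predictor weights and a bias by batch gradient descent on log loss.

    base_scores: one row per candidate site, one column per base predictor.
    """
    n_models = len(base_scores[0])
    weights, bias = [0.0] * n_models, 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_models, 0.0
        for x, y in zip(base_scores, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def predict(weights, bias, x):
    """Ensemble probability that a site is a branchpoint."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy data (hypothetical): scores from three base predictors for six sites.
base_scores = [
    [0.9, 0.8, 0.7], [0.2, 0.3, 0.1], [0.8, 0.6, 0.9],
    [0.1, 0.2, 0.3], [0.7, 0.9, 0.6], [0.3, 0.1, 0.2],
]
labels = [1, 0, 1, 0, 1, 0]

weights, bias = fit_logistic_ensemble(base_scores, labels)
print([round(predict(weights, bias, x), 2) for x in base_scores])
```

A learned combiner of this kind can weight stronger base predictors more heavily than a plain average would, which is one plausible reason an ensemble can outperform its best individual member.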
Fig. 5 Weights in the GAEM model
Fig. 6 Correctly predicted branchpoints for all BPs and different types of BPs
Fig. 7 Ratio of correctly identified BPs versus number of checked top predictions
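One plausible reading of the quantity plotted in Fig. 7 is a top-k hit ratio: for each intron, rank candidate positions by predicted score and count how many known branchpoints fall within the k highest-ranked positions. The sketch below implements that reading; the exact definition used by the authors is not stated in this record, and the function name and toy introns are hypothetical.

```python
def top_k_hit_ratio(introns, k):
    """Fraction of known branchpoints found among the k top-scoring positions.

    introns: list of (scores, true_positions) pairs, where scores holds one
    predicted score per candidate position and true_positions lists the
    indices of known branchpoints in that intron.
    """
    found = total = 0
    for scores, true_positions in introns:
        ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
        top = set(ranked[:k])
        found += len(top & set(true_positions))
        total += len(true_positions)
    return found / total


# Toy data (hypothetical): three introns with four candidate positions each.
introns = [
    ([0.1, 0.7, 0.2, 0.9], [3]),
    ([0.6, 0.3, 0.5, 0.2], [2]),
    ([0.2, 0.8, 0.4, 0.1], [1, 2]),
]

for k in (1, 2):
    print(k, top_k_hit_ratio(introns, k))
```

Plotting this ratio against k shows how many top predictions per intron would need to be checked experimentally to recover a given share of the branchpoints.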
Performances of our ensemble methods and the benchmark method
| Method | Local evaluation AUPR | Local evaluation AUC | Global evaluation AUPR | Global evaluation AUC |
|---|---|---|---|---|
| SVMBPfinder | 0.502 | 0.576 | 0.362 | 0.729 |
| LREM | 0.524 | 0.906 | 0.532 | 0.904 |
| GAEM | 0.500 | 0.892 | 0.512 | 0.891 |