Lina Zhang, Chengjin Zhang, Rui Gao, Runtao Yang, Qing Song.
Abstract
BACKGROUND: Aptamer-protein interacting pairs perform a variety of physiological functions and hold therapeutic potential in organisms. Rapid and effective prediction of aptamer-protein interacting pairs is important for designing aptamers that bind to specific proteins of interest, which in turn gives insight into the mechanisms of aptamer-protein interaction and supports the development of aptamer-based therapies.
Keywords: Aptamer-protein interacting pairs; Ensemble method; Hybrid features; Imbalanced data problem
Year: 2016; PMID: 27245069; PMCID: PMC4888498; DOI: 10.1186/s12859-016-1087-5
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
Fig. 1 The structures of aptamers binding to specific targets
Fig. 2 The whole procedure of the ensemble RF classifier. PseKNC: Pseudo K-tuple Nucleotide Composition; DCT: Discrete Cosine Transform; Bi-gram PSSM: Bi-gram Position Specific Scoring Matrix; IFS: Incremental Feature Selection; RF: Random Forest
Prediction performance of the ensemble RF models using various feature spaces by 10-fold cross validation
| Features | Sensitivity | Specificity | Accuracy | MCC | Youden’s index |
|---|---|---|---|---|---|
| PseKNC+DCT | 0.621 | 0.641 | 0.636 | 0.229 | 0.261 |
| PseKNC+Bi-gram PSSM | 0.660 | 0.673 | 0.670 | 0.293 | 0.333 |
| PseKNC+Disorder | 0.103 | 0.321 | 0.267 | -0.499 | -0.575 |
| PseKNC+DCT+Bi-gram PSSM | 0.693 | 0.666 | 0.673 | 0.315 | 0.359 |
| PseKNC+DCT+Disorder | 0.597 | 0.616 | 0.611 | 0.186 | 0.213 |
| PseKNC+Bi-gram PSSM+Disorder | 0.671 | 0.675 | 0.674 | 0.304 | 0.345 |
| PseKNC+DCT+Bi-gram PSSM+Disorder | 0.7 | 0.680 | 0.685 | 0.334 | 0.380 |
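The columns in these tables can be related through the standard confusion-matrix definitions: Youden's index is sensitivity plus specificity minus one, and the Matthews correlation coefficient (MCC) combines all four cell counts. The following is a minimal sketch of those definitions. The function name `binary_metrics` and the counts in the example are illustrative assumptions; the record does not report the actual confusion-matrix counts.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity (true positive rate)
    sp = tn / (tn + fp)                      # specificity (true negative rate)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sn": sn, "sp": sp, "acc": acc, "mcc": mcc, "youden": sn + sp - 1}

# Hypothetical counts (NOT taken from the paper): 100 positives, 300 negatives.
m = binary_metrics(tp=70, fn=30, tn=204, fp=96)
print({name: round(value, 3) for name, value in m.items()})
```

With these assumed counts (a 1:3 positive-to-negative ratio), the rounded values are 0.7, 0.68, 0.685, 0.334 and 0.38, which coincide with the PseKNC+DCT+Bi-gram PSSM+Disorder row. That agreement is a consistency check on the metric definitions, not a statement about the true dataset sizes.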
Prediction results with or without the ensemble method
| Method | Sensitivity | Specificity | Accuracy | MCC | Youden’s index |
|---|---|---|---|---|---|
| With ensemble | 0.7 | 0.680 | 0.685 | 0.334 | 0.380 |
| Without ensemble | 0.3 | 0.993 | 0.819 | 0.465 | 0.293 |
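The gap between the two rows (much higher sensitivity with the ensemble, much higher specificity without it) is the pattern typically seen when a classifier is trained on imbalanced data. One common way to build an ensemble for that setting is to split the majority class into chunks roughly the size of the minority class, train one model per balanced chunk, and combine the models by majority vote. The sketch below illustrates that general idea only. It is not confirmed to be the authors' exact procedure, and `balanced_subsets`, `train_midpoint`, `majority_vote` and `ensemble_predict` are hypothetical names. The base learner is a toy threshold rule standing in for a random forest.

```python
import random
from collections import Counter

def balanced_subsets(positives, negatives, seed=0):
    """Pair all positives with disjoint chunks of negatives of similar size."""
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    n_chunks = max(1, len(shuffled) // len(positives))
    chunks = [shuffled[i::n_chunks] for i in range(n_chunks)]
    return [(positives, chunk) for chunk in chunks]

def train_midpoint(pos, neg):
    """Toy stand-in for a base classifier (assumption; the paper uses RF):
    threshold a scalar feature halfway between the two class means."""
    pos_mean = sum(pos) / len(pos)
    neg_mean = sum(neg) / len(neg)
    cut = (pos_mean + neg_mean) / 2
    pos_above = pos_mean > cut
    return lambda x: int((x > cut) == pos_above)

def majority_vote(predictions):
    """Combine 0/1 votes; ties are resolved in favour of the positive class."""
    counts = Counter(predictions)
    return 1 if counts[1] >= counts[0] else 0

def ensemble_predict(models, x):
    """Predict with every base model and return the majority decision."""
    return majority_vote([model(x) for model in models])

# Hypothetical one-dimensional data: 3 positives, 9 negatives.
positives = [2.0, 2.5, 3.0]
negatives = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6]
models = [train_midpoint(p, n) for p, n in balanced_subsets(positives, negatives)]
print(len(models), ensemble_predict(models, 2.2), ensemble_predict(models, 0.5))
```

Because every base model sees as many positives as negatives, no single model is pushed toward always predicting the majority class, which is one plausible reading of why sensitivity rises from 0.3 to 0.7 in the table.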
Fig. 3 The IFS (Incremental Feature Selection) curve: Youden’s index plotted against the number of features
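An IFS curve of this kind is usually produced by ranking the features, adding them one at a time in rank order, scoring the classifier on each growing subset, and keeping the subset size at which the score peaks. The sketch below shows that loop in a minimal form. The ranking criterion and the scoring function are not given in this record, so `evaluate` is a hypothetical callback (for instance, a cross-validated Youden's index) and the toy scores are made up for illustration.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Add ranked features one by one and keep the best-scoring prefix.

    ranked_features: feature names ordered from most to least relevant.
    evaluate: hypothetical callback mapping a list of features to a score.
    Returns the selected features and the (k, score) points of the IFS curve.
    """
    curve = []
    best_score, best_k = float("-inf"), 0
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        curve.append((k, score))
        if score > best_score:   # strict '>' keeps the smallest best subset
            best_score, best_k = score, k
    return ranked_features[:best_k], curve

# Toy scoring (assumption): three useful features, the rest add noise.
useful = {"f1": 0.20, "f2": 0.15, "f3": 0.05}

def toy_score(subset):
    return sum(useful.get(name, -0.02) for name in subset)

ranked = ["f1", "f2", "f3", "f4", "f5"]
selected, curve = incremental_feature_selection(ranked, toy_score)
print(selected)
```

In this toy run the curve rises for the first three features and falls afterwards, so the first three are selected. In a real run, each curve point would come from retraining and cross-validating the classifier on the corresponding feature subset.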
Performance of the ensemble RF classifier with and without feature selection
| Method | No. of features | Sensitivity | Specificity | Accuracy | MCC | Youden’s index |
|---|---|---|---|---|---|---|
| Without feature selection | 654 | 0.7 | 0.680 | 0.685 | 0.334 | 0.380 |
| With feature selection | 304 | 0.753 | 0.725 | 0.732 | 0.424 | 0.479 |
Fig. 4 The feature type distribution of the original features and the optimal features
Performance comparison with the existing method on the training dataset by 10-fold cross validation
| Method | Sensitivity | Specificity | Accuracy | MCC | Youden’s index |
|---|---|---|---|---|---|
| Existing method | 0.488 | 0.922 | 0.813 | 0.461 | 0.410 |
| This study | 0.753 | 0.725 | 0.732 | 0.424 | 0.479 |
Performance comparison with the existing method on the independent testing dataset
| Method | Sensitivity | Specificity | Accuracy | MCC | Youden’s index |
|---|---|---|---|---|---|
| Existing method | 0.483 | 0.871 | 0.774 | 0.372 | 0.354 |
| This study | 0.738 | 0.713 | 0.719 | 0.398 | 0.451 |