Ming Fang, Xiujuan Lei, Shi Cheng, Yuhui Shi, Fang-Xiang Wu.
Abstract
Protein essentiality is fundamental to understanding the function and evolution of genes. Predicting protein essentiality is pivotal in identifying disease genes and potential drug targets. Because experimental methods are costly in time and funds, it is of great value to predict protein essentiality with high accuracy using computational methods. In this study, we present a novel feature selection method, the Elite Search mechanism-based Flower Pollination Algorithm (ESFPA), to determine protein essentiality. Unlike other protein essentiality prediction methods, ESFPA uses an improved swarm intelligence-based algorithm to select optimal features for protein essentiality prediction. The first step is to collect numerous features that are highly predictive of essentiality. The second step is to develop a feature selection strategy based on a swarm intelligence algorithm to obtain the optimal feature subset. An elite search mechanism is then adopted to further improve the quality of the feature subset. Subsequently, a hybrid classifier is applied to evaluate the essentiality of each protein. The experimental results show that our method is competitive with some well-known feature selection methods. The proposed method aims to provide a new perspective for protein essentiality determination.
Keywords: essential protein; feature selection; flower pollination algorithm; machine learning; protein-protein interaction (PPI) network
Year: 2018 PMID: 29958434 PMCID: PMC6100311 DOI: 10.3390/molecules23071569
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
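The feature selection step described in the abstract can be illustrated with a minimal sketch of a binary flower pollination search followed by an elite refinement pass. This is not the authors' ESFPA implementation: the record does not specify the elite search mechanism, the binarization rule, or the fitness function, so the bit-flip `elite_search`, the sign-threshold `to_mask`, and the `toy_fitness` function (a stand-in for a classifier-based score) are all assumptions made for illustration, as are the parameter defaults.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step using Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / (abs(v) or 1e-12) ** (1 / beta)


def to_mask(position):
    """Assumed binarization: a feature is selected when its coordinate is positive."""
    return [1 if x > 0 else 0 for x in position]


def toy_fitness(mask, informative):
    """Hypothetical fitness: reward informative features, penalize extras.
    A real pipeline would score the subset with a classifier instead."""
    hits = sum(1 for i in informative if mask[i])
    extras = sum(mask) - hits
    return hits - 0.5 * extras


def elite_search(mask, score, fitness):
    """Assumed elite refinement: flip each bit of the best subset once
    and keep only the flips that improve the fitness."""
    mask = mask[:]
    for i in range(len(mask)):
        mask[i] ^= 1
        s = fitness(mask)
        if s > score:
            score = s
        else:
            mask[i] ^= 1  # revert a flip that did not help
    return mask, score


def fpa_select(n_features, fitness, n_flowers=10, n_iter=30, p_switch=0.8, seed=0):
    rng = random.Random(seed)
    flowers = [[rng.uniform(-1, 1) for _ in range(n_features)]
               for _ in range(n_flowers)]
    scores = [fitness(to_mask(f)) for f in flowers]
    b = max(range(n_flowers), key=scores.__getitem__)
    best_pos, best_score = flowers[b][:], scores[b]
    best_mask = to_mask(best_pos)
    for _ in range(n_iter):
        for i in range(n_flowers):
            if rng.random() < p_switch:
                # Global pollination: Levy flight relative to the best flower.
                cand = [x + levy_step(rng) * (g - x)
                        for x, g in zip(flowers[i], best_pos)]
            else:
                # Local pollination: mix two randomly chosen flowers.
                j, k = rng.sample(range(n_flowers), 2)
                eps = rng.random()
                cand = [x + eps * (a - c)
                        for x, a, c in zip(flowers[i], flowers[j], flowers[k])]
            s = fitness(to_mask(cand))
            if s > scores[i]:
                flowers[i], scores[i] = cand, s
                if s > best_score:
                    best_pos, best_mask, best_score = cand[:], to_mask(cand), s
        # Refine the current best subset, then sync its continuous position.
        best_mask, best_score = elite_search(best_mask, best_score, fitness)
        best_pos = [1.0 if bit else -1.0 for bit in best_mask]
    return best_mask, best_score


informative = {0, 3, 5}  # hypothetical "truly predictive" feature indices
mask, score = fpa_select(8, lambda m: toy_fitness(m, informative))
print(mask, score)
```

Because the toy fitness is separable per feature, the bit-flip pass alone is enough to reach the optimum here. With a classifier-based fitness the features interact, and the pollination steps would carry most of the search.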
Figure 1. The system flowchart of the Elite Search mechanism-based Flower Pollination Algorithm (ESFPA) feature selection strategy to predict protein essentiality.
Figure 2. Area under the receiver operating characteristic (ROC) curve (AUC) scores for our classifier trained on 10 balanced datasets.
Figure 3. Comparison of the balanced dataset with the imbalanced dataset.
Figure 4. Comparison of our ESFPA method and some well-known feature selection methods. (a) shows the comparison results in terms of Precision, Recall, and F-Measure; (b) shows the comparison results in terms of AUC scores.
Comparing the performance of different feature selection methods on the balanced dataset.
| Feature Selection Method | Precision | Recall | F-Measure |
|---|---|---|---|
| SFS | 0.689 | 0.571 | 0.624 |
| SBS | 0.708 | 0.530 | 0.607 |
| GA | 0.733 | 0.657 | 0.693 |
| ACO | 0.713 | 0.696 | 0.704 |
| PSO | 0.740 | 0.697 | 0.718 |
| FPA | 0.718 | 0.620 | 0.665 |
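The F-Measure column in the table above is consistent with the harmonic mean of Precision and Recall (the F1 score). The record does not state the formula, so this reading is an assumption checked against the tabulated values. A minimal sketch that recomputes two rows:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Precision, Recall) pairs copied from the balanced-dataset table.
rows = {"SFS": (0.689, 0.571), "PSO": (0.740, 0.697)}
recomputed = {name: round(f_measure(p, r), 3) for name, (p, r) in rows.items()}
print(recomputed)
```

Recomputing from the rounded Precision and Recall values can differ from the published F-Measure in the last digit, since the original scores were presumably computed before rounding.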
Comparing the performance of different feature selection methods on the imbalanced dataset.
| Feature Selection Method | Precision | Recall | F-Measure |
|---|---|---|---|
| SFS | 0.554 | 0.158 | 0.245 |
| SBS | 0.544 | 0.134 | 0.215 |
| GA | 0.654 | 0.275 | 0.387 |
| ACO | 0.693 | 0.269 | 0.388 |
| PSO | 0.708 | 0.314 | 0.436 |
| FPA | 0.691 | 0.282 | 0.400 |
Figure 5. AUC scores of our selected features and those using individual features.
Comparing the performance evaluations of different feature subsets.
| Feature Subset | Precision | Recall | F-Measure |
|---|---|---|---|
| Degree | 0.667 | 0.598 | 0.631 |
| EC | 0.660 | 0.401 | 0.499 |
| SC | 0.665 | 0.376 | 0.481 |
| PeC | 0.747 | 0.341 | 0.468 |
| ION | 0.717 | 0.655 | 0.684 |
| Modules | 0.715 | 0.530 | 0.609 |
| UDoNC | 0.759 | 0.459 | 0.572 |
| Nucleus | 0.654 | 0.638 | 0.646 |
| Cytosol | 0.594 | 0.181 | 0.277 |
| Mitochondrion | 0.512 | 0.848 | 0.639 |
| Endoplasmic reticulum | 0.484 | 0.352 | 0.408 |
| Cytoskeleton | 0.736 | 0.081 | 0.147 |
| Golgi apparatus | 0.492 | 0.669 | 0.567 |
Comparing the performance of diverse classifiers.
| Feature | Classifier | Precision | Recall | F-Measure | Feature | Classifier | Precision | Recall | F-Measure |
|---|---|---|---|---|---|---|---|---|---|
| Degree | NB | 0.714 | 0.227 | 0.345 | EC | NB | 0.719 | 0.162 | 0.264 |
| | SMO | 0.686 | 0.150 | 0.246 | | SMO | 0.736 | 0.127 | 0.216 |
| | J48 | 0.667 | 0.598 | 0.631 | | J48 | 0.653 | 0.509 | 0.572 |
| | LMT | 0.667 | 0.598 | 0.631 | | LMT | 0.634 | 0.546 | 0.587 |
| | RF | 0.661 | 0.576 | 0.615 | | RF | 0.533 | 0.522 | 0.528 |
| | RT | 0.664 | 0.586 | 0.623 | | RT | 0.531 | 0.524 | 0.528 |
| | REPT | 0.658 | 0.599 | 0.627 | | REPT | 0.629 | 0.543 | 0.583 |
| | NC | 0.667 | 0.598 | 0.631 | | NC | 0.660 | 0.401 | 0.499 |
| SC | NB | 0.670 | 0.054 | 0.100 | PeC | NB | 0.810 | 0.244 | 0.375 |
| | SMO | 0.714 | 0.017 | 0.033 | | SMO | 0.846 | 0.175 | 0.290 |
| | J48 | 0.655 | 0.513 | 0.575 | | J48 | 0.759 | 0.295 | 0.425 |
| | LMT | 0.638 | 0.542 | 0.586 | | LMT | 0.682 | 0.402 | 0.506 |
| | RF | 0.532 | 0.530 | 0.531 | | RF | 0.542 | 0.538 | 0.540 |
| | RT | 0.529 | 0.532 | 0.531 | | RT | 0.541 | 0.538 | 0.540 |
| | REPT | 0.618 | 0.559 | 0.587 | | REPT | 0.660 | 0.402 | 0.499 |
| | NC | 0.665 | 0.376 | 0.481 | | NC | 0.747 | 0.341 | 0.468 |
| ION | NB | 0.722 | 0.638 | 0.678 | Modules | NB | 0.715 | 0.530 | 0.609 |
| | SMO | 0.715 | 0.663 | 0.688 | | SMO | 0.715 | 0.530 | 0.609 |
| | J48 | 0.711 | 0.626 | 0.665 | | J48 | 0.715 | 0.530 | 0.609 |
| | LMT | 0.707 | 0.679 | 0.692 | | LMT | 0.715 | 0.530 | 0.609 |
| | RF | 0.662 | 0.568 | 0.611 | | RF | 0.715 | 0.530 | 0.609 |
| | RT | 0.659 | 0.584 | 0.619 | | RT | 0.715 | 0.530 | 0.609 |
| | REPT | 0.708 | 0.638 | 0.671 | | REPT | 0.715 | 0.530 | 0.609 |
| | NC | 0.717 | 0.655 | 0.684 | | NC | 0.715 | 0.530 | 0.609 |
| Nucleus | NB | 0.654 | 0.638 | 0.646 | UDoNC | NB | 0.828 | 0.226 | 0.355 |
| | SMO | 0.654 | 0.638 | 0.646 | | SMO | 0.902 | 0.039 | 0.076 |
| | J48 | 0.654 | 0.638 | 0.646 | | J48 | 0.722 | 0.544 | 0.620 |
| | LMT | 0.654 | 0.638 | 0.646 | | LMT | 0.716 | 0.553 | 0.624 |
| | RF | 0.654 | 0.638 | 0.646 | | RF | 0.693 | 0.693 | 0.577 |
| | RT | 0.654 | 0.638 | 0.646 | | RT | 0.689 | 0.495 | 0.576 |
| | REPT | 0.654 | 0.638 | 0.646 | | REPT | 0.690 | 0.589 | 0.636 |
| | NC | 0.654 | 0.638 | 0.646 | | NC | 0.759 | 0.459 | 0.572 |
| Feature subset | NB | 0.807 | 0.380 | 0.516 | | | | | |
| | SMO | 0.740 | 0.710 | 0.725 | | | | | |
| | J48 | 0.728 | 0.717 | 0.722 | | | | | |
| | LMT | 0.736 | 0.723 | 0.729 | | | | | |
| | RF | 0.710 | 0.729 | 0.720 | | | | | |
| | RT | 0.644 | 0.661 | 0.652 | | | | | |
| | REPT | 0.705 | 0.689 | 0.697 | | | | | |
| | NC | 0.745 | 0.715 | 0.730 | | | | | |