Vinothini Boopathi, Sathiyamoorthy Subramaniyam, Adeel Malik, Gwang Lee, Balachandran Manavalan, Deok-Chun Yang.
Abstract
Anticancer peptides (ACPs) are promising therapeutic agents for targeting and killing cancer cells. The accurate prediction of ACPs from given peptide sequences remains an open problem in the field of immunoinformatics. Recently, machine learning algorithms have emerged as a promising tool for helping experimental scientists predict ACPs. However, the performance of existing methods still needs to be improved. In this study, we present a novel approach for the accurate prediction of ACPs, which involves the following two steps: (i) We applied a two-step feature selection protocol to seven feature encodings that cover various aspects of sequence information (composition-based, physicochemical properties, and profiles) and obtained their corresponding optimal feature-based models. The resulting predicted probabilities of ACPs were then used as feature vectors. (ii) These predicted-probability feature vectors were in turn used as input to a support vector machine to develop the final prediction model, called mACPpred. Cross-validation analysis showed that the proposed predictor performs significantly better than the individual feature encodings. Furthermore, mACPpred significantly outperformed the existing methods compared in this study when objectively evaluated on an independent dataset.
Keywords: anticancer peptides; feature selection; optimal features; sequential forward search; support vector machine
Year: 2019 PMID: 31013619 PMCID: PMC6514805 DOI: 10.3390/ijms20081964
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
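The two-step design described in the abstract can be pictured as a simple data flow: each of the seven encodings contributes one predicted ACP probability, and the resulting seven-dimensional vector is what the final classifier sees. The sketch below only illustrates that flow. The names `probability_vector`, `predict_acp`, `dummy_models`, and `mean_meta` are hypothetical, the first-stage models are constant placeholders, and the final stage is a plain average standing in for the support vector machine, so none of this reproduces the published model.

```python
from typing import Callable, Dict, List, Sequence

# The seven encodings named in this record's tables.
ENCODINGS = ("AAC", "DPC", "CTD", "AAIF", "QSO", "CTF", "NC5")

# Hypothetical type: a fitted first-stage model maps a peptide to P(ACP).
ProbModel = Callable[[str], float]


def probability_vector(peptide: str, models: Dict[str, ProbModel]) -> List[float]:
    """Step (i) output: one predicted ACP probability per encoding, in fixed order."""
    return [models[name](peptide) for name in ENCODINGS]


def predict_acp(peptide: str,
                models: Dict[str, ProbModel],
                meta_model: Callable[[Sequence[float]], float]) -> float:
    """Step (ii): pass the 7-dimensional probability vector to the final model."""
    return meta_model(probability_vector(peptide, models))


# Placeholder stand-ins so the sketch runs; real use would plug in trained models.
dummy_models = {name: (lambda peptide, p=0.1 * (i + 1): p)
                for i, name in enumerate(ENCODINGS)}


def mean_meta(probs: Sequence[float]) -> float:
    """Stand-in for the final classifier: a plain average, NOT an SVM."""
    return sum(probs) / len(probs)


vector = probability_vector("KKLLKK", dummy_models)  # arbitrary example string
score = predict_acp("KKLLKK", dummy_models, mean_meta)
```

In a real pipeline the first-stage probabilities used for training the final model would normally come from cross-validation, so that the second stage never sees predictions made on a model's own training data.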
Figure 1. Performance of various feature encodings in a 10-fold cross-validation.
Figure 2. Comparison of SVM with other classifiers on seven different feature encodings.
Figure 3. Sequential forward search for discriminating between anticancer peptides (ACPs) and non-ACPs. The maximum accuracy obtained from 10-fold cross-validation is shown for each feature encoding.
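This record does not spell out the search procedure itself. One common form of sequential forward search, assumed here, takes features that have already been ranked, adds them one at a time in rank order, scores each growing subset, and keeps the subset with the highest score. In the sketch below, `sequential_forward_search` and `cv_accuracy` are hypothetical names, and the scores in the usage example are made up purely to show the control flow.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_forward_search(
    ranked_features: Sequence[str],
    cv_accuracy: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Add ranked features one at a time and keep the best-scoring prefix."""
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked_features:
        current.append(feature)
        score = cv_accuracy(current)  # e.g. accuracy from 10-fold cross-validation
        if score > best_score:        # strict ">" keeps the smallest best subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Usage with made-up scores indexed by subset size (illustration only).
toy_scores = {1: 0.70, 2: 0.80, 3: 0.78, 4: 0.75}
subset, accuracy = sequential_forward_search(
    ["f1", "f2", "f3", "f4"],
    lambda features: toy_scores[len(features)],
)
```

With these toy scores the search settles on the first two features, since adding the third and fourth lowers the score.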
The best performance achieved by various feature encodings using optimal features.
| Feature Encoding | Dimension | MCC | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|
| AAC | 20 | 0.763 | 0.882 | 0.876 | 0.887 |
| DPC | 135 | 0.762 | 0.880 | 0.838 | 0.921 |
| CTD | 140 | 0.711 | 0.853 | 0.842 | 0.865 |
| AAIF | 143 | 0.775 | 0.887 | 0.872 | 0.902 |
| QSO | 99 | 0.734 | 0.867 | 0.846 | 0.887 |
| CTF | 133 | 0.698 | 0.844 | 0.929 | 0.759 |
| NC5 | 54 | 0.706 | 0.852 | 0.808 | 0.880 |
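For readers who want to relate the four metric columns to one another, the sketch below computes them from a binary confusion matrix using the standard textbook definitions. The function name `binary_metrics` is hypothetical, and the counts in the usage example are invented for illustration; they are not taken from the study.

```python
import math
from typing import Dict


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """MCC, accuracy, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "MCC": mcc,
        "Accuracy": accuracy,
        "Sensitivity": sensitivity,
        "Specificity": specificity,
    }


# Invented counts: 100 positives and 100 negatives.
example = binary_metrics(tp=90, tn=80, fp=20, fn=10)
```

Sensitivity and specificity each describe one class, whereas MCC summarizes all four cells of the confusion matrix in a single value between -1 and 1.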
Figure 4. Performance comparison between the optimal feature set-based models and their respective controls (using all features).
Figure 5. (A) Performance comparison of mACPpred with the single-feature models, based on optimal features. (B) Performance comparison between mACPpred and hybrid features-based models.
Performance of various methods on the independent dataset.
| Methods | MCC | Accuracy | Sensitivity | Specificity | AUC | p-value |
|---|---|---|---|---|---|---|
| mACPpred | 0.829 | 0.914 | 0.885 | 0.943 | 0.967 | – |
| SVMACP | 0.592 | 0.768 | 0.554 | 0.981 | 0.896 | 0.000382 |
| RFACP | 0.511 | 0.707 | 0.414 | 1.000 | 0.891 | 0.000401 |
| iACP | 0.338 | 0.667 | 0.580 | 0.753 | 0.747 | <0.00001 |
Figure 6. Comparison of binormal receiver operating characteristic (ROC) curves for ACP prediction using different methods on the independent dataset.
Classification of 20 amino acids according to the seven specific types of physicochemical properties.
| Properties | Class1 | Class2 | Class3 |
|---|---|---|---|
| Hydrophobicity | Polar | Neutral | Hydrophobic |
| Normalized Van der Waals volume | 0–2.78 | 2.95–4.0 | 4.03–8.08 |
| Polarity | 4.9–6.2 | 8.0–9.2 | 10.4–13.0 |
| Polarizability | 0–0.108 | 0.128–0.186 | 0.219–0.409 |
| Charge | Positive | Neutral | Negative |
| Secondary Structure | Helix | Strand | Coil |
| Solvent Accessibility | Buried | Exposed | Intermediate |
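To make the three-class scheme concrete, the sketch below encodes a peptide by the Charge property only. It maps each residue to class 1, 2, or 3 and then computes the composition (fraction of residues per class) and transition (fraction of adjacent pairs that switch between two given classes) parts of a CTD-style descriptor. The residue grouping used here (K, R positive; D, E negative; the rest neutral) is the one commonly used for CTD descriptors and is assumed rather than taken from this record. The function names are hypothetical, and the distribution part of CTD is omitted.

```python
from itertools import combinations
from typing import List, Sequence

# Assumed charge grouping: class 1 positive, class 2 neutral, class 3 negative.
CHARGE_CLASSES = ("KR", "ANCQGHILMFPSTWYV", "DE")


def class_labels(seq: str, classes: Sequence[str] = CHARGE_CLASSES) -> List[int]:
    """Map each residue to its class number (1, 2 or 3)."""
    lookup = {aa: i for i, group in enumerate(classes, start=1) for aa in group}
    return [lookup[aa] for aa in seq.upper()]  # KeyError on non-standard residues


def composition(labels: Sequence[int]) -> List[float]:
    """Fraction of residues falling in each of the three classes."""
    n = len(labels)
    return [list(labels).count(c) / n for c in (1, 2, 3)]


def transition(labels: Sequence[int]) -> List[float]:
    """Fraction of adjacent pairs switching between classes 1-2, 1-3 and 2-3."""
    pairs = list(zip(labels, labels[1:]))  # requires at least two residues
    result = []
    for a, b in combinations((1, 2, 3), 2):
        hits = sum(1 for p in pairs if p in ((a, b), (b, a)))
        result.append(hits / len(pairs))
    return result


labels = class_labels("KKDEA")  # arbitrary example string
comp = composition(labels)
trans = transition(labels)
```

Repeating the same encoding for each of the seven properties in the table and concatenating the results gives one fixed-length vector per peptide, regardless of peptide length.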