Majed Alsanea, Abdulsalam S Dukyil, Bushra Riaz, Farhan Alebeisat, Muhammad Islam, Shabana Habib.
Abstract
Anticancer peptides (ACPs) are considered a promising cancer treatment. Identifying new ACPs is critical for better understanding how they function and for vaccine development. Because the post-genomic era has produced an enormous number of peptide sequences, fast and efficient computational techniques for identifying ACPs are highly needed. Numerous adaptive statistical algorithms have recently been developed to separate ACPs from non-anticancer peptides (NACPs). Despite these advances, existing approaches still rely on insufficient feature descriptors and learning methods, which limits their predictive performance. To address this, a trustworthy framework is developed for the precise identification of ACPs. The presented approach applies four feature encoding mechanisms, namely amino acid composition, dipeptide composition, tripeptide composition, and an improved version of pseudo amino acid composition, to capture the motif of the target class. Principal component analysis (PCA) is then employed for feature pruning, selecting optimal, deep, and highly varied features. Because learning algorithms differ in nature, experiments are performed over numerous algorithms to select the best-performing method. The empirical outcomes show that the support vector machine with a hybrid feature space performs best. The proposed framework achieved an accuracy of 97.09% and 98.25% on the benchmark and independent datasets, respectively. The comparative analysis demonstrates that the proposed model outperforms existing methods and is beneficial for drug development and oncology.
Keywords: anticancer peptides; artificial intelligence; biomedicine; machine learning; statistical approach
Year: 2022 PMID: 35684624 PMCID: PMC9185351 DOI: 10.3390/s22114005
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Various possible cancer treatment options using peptide sequences.
Existing approaches for the prediction of ACPs and NACPs using ML techniques.
| References | Features | Evaluation | Classifier |
|---|---|---|---|
| [ | PseAAC, g-gap dipeptide | Accuracy, Sensitivity, Specificity, MCC | SVM |
| [ | AAC, DPC, ATC, and PCP | ------ | SVM, RFT |
| [ | g-gap dipeptide | Accuracy, Sensitivity, Specificity, MCC, F1-score | SVM |
| [ | Composition-based, physicochemical properties and profiles | Accuracy, Sensitivity, Specificity, MCC, AUC | SVM, LR, KNN, RF |
| [ | AAC, Conjoint triad, PAAC, GAAC | Accuracy, Sensitivity, Specificity, MCC, F1-score | SVM, RFT, LibD3C |
| [ | K-space amino acid pair | Accuracy, Sensitivity, Specificity, MCC | SVM, RFT, FKNN |
| [ | AAC, DPC, Terminus composition, binary profile | ------ | Tree based |
| [ | Binary profile, DPC | Accuracy, Sensitivity, Specificity, MCC, AUC | SVM |
| [ | ReduceAAC, AAC, average chemical shift | Sensitivity, Specificity, MCC, QA | SVM |
| [ | PAAC, RAAC, g-gap dipeptide | Accuracy, Sensitivity, Specificity, MCC | SVM, KNN, PNN, RF, GRNN |
| [ | Pseudo position specific scoring matrix, Composite protein sequence, Split-AAC | Accuracy, Sensitivity, Specificity, MCC, G-mean, F-measure, Precision, Recall | SVM, KNN, PNN |
| [ | Protein relatedness measure | Sensitivity, Specificity, MCC, AUC, Overall accuracy | SVM, AdaBoost |
| [ | PAAC, Local alignment kernel | Accuracy, Sensitivity, Specificity, MCC | SVM |
Figure 2. The proposed framework for the classification of ACPs and NACPs.
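The composition-based encodings named in the abstract (amino acid, dipeptide, and tripeptide composition) can all be viewed as normalized k-mer frequencies. The following is a minimal sketch under that assumption, not the authors' implementation; the function names `kmer_composition` and `hybrid_features` and the example peptide are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence: str, k: int) -> list[float]:
    """Return the normalized frequency of every possible k-mer.

    k=1 gives amino acid composition (20 features), k=2 dipeptide
    composition (400 features), k=3 tripeptide composition (8000 features).
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    vocabulary = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    # Simplification: windows with non-standard residues count toward
    # `total` but have no feature slot of their own.
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


def hybrid_features(sequence: str) -> list[float]:
    """Concatenate AAC and DPC into one hybrid vector (illustrative choice)."""
    return kmer_composition(sequence, 1) + kmer_composition(sequence, 2)


peptide = "GLFDIVKKVV"  # hypothetical sequence, for illustration only
aac = kmer_composition(peptide, 1)
dpc = kmer_composition(peptide, 2)
hybrid = hybrid_features(peptide)
```

Concatenating the vectors is one simple way to form a hybrid feature space; because the dimensionality grows quickly with k, a reduction step such as PCA is a natural follow-up.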
Detailed statistics of ACPs and NACPs in the two peptide sequence datasets.
| Dataset | ACPs | NACPs | Total Data | Training Data | Testing Data |
|---|---|---|---|---|---|
| PSD1 (Benchmark) [ | 138 | 206 | 344 | 241 | 103 |
| PSD2 (Main dataset) [ | 225 | 2250 | 2475 | 1732 | 743 |
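The counts in the table above imply a split of roughly 70/30 and a strong class imbalance in PSD2. The short sketch below checks this arithmetic; the counts are copied from the table, while the `summarize` helper and the majority-class baseline are illustrative additions. The baseline assumes the class proportions of the full dataset, since the table does not give per-class counts for each split.

```python
# Counts copied from the dataset table above.
datasets = {
    "PSD1": {"acp": 138, "nacp": 206, "train": 241, "test": 103},
    "PSD2": {"acp": 225, "nacp": 2250, "train": 1732, "test": 743},
}


def summarize(counts: dict[str, int]) -> dict[str, float]:
    total = counts["acp"] + counts["nacp"]
    # The train/test partition must cover exactly the same samples.
    assert total == counts["train"] + counts["test"]
    return {
        "total": total,
        "train_fraction": counts["train"] / total,
        "positive_fraction": counts["acp"] / total,
        # Accuracy of a trivial model that always predicts the larger class.
        "majority_baseline": max(counts["acp"], counts["nacp"]) / total,
    }


summary = {name: summarize(c) for name, c in datasets.items()}
for name, stats in summary.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

On PSD2, a model that labels every peptide as NACP would already reach about 90.9% accuracy, which is why sensitivity, MCC, and F1-score are more informative than accuracy alone on that dataset.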
Empirical results for various combinations of feature extraction techniques using PCA and an ensemble classifier; bold values represent the best performance.
| Method (PSD1) | Accuracy | Sensitivity | Specificity | MCC | F1-Score |
|---|---|---|---|---|---|
| AAC | 86.41 | 88.24 | 85.51 | 0.71 | 81.08 |
| DPC | 88.35 | 91.18 | 96.86 | 0.75 | 83.78 |
| TPC | 85.44 | 87.88 | 84.29 | 0.69 | 79.45 |
| IPseAAC | 88.35 | 83.33 | 91.80 | 0.75 | 85.37 |
| AAC+DPC | 92.23 | 92.11 | 92.31 | 0.83 | 89.74 |
| AAC+TPC | 90.29 | 84.09 | 94.92 | 0.80 | 88.10 |
| AAC+IPseAAC | 91.26 | 84.44 | 96.55 | 0.82 | 89.41 |
| DPC+TPC | 89.32 | 89.19 | 89.39 | 0.77 | 85.71 |
| DPC+IPseAAC | 93.20 | 88.37 | 96.67 | 0.86 | 91.57 |
| TPC+IPseAAC | 91.29 | 89.74 | 92.19 | 0.82 | 88.61 |
| Proposed Model ( | 96.12 | 92.86 | 98.36 | 0.82 | 95.12 |
| Proposed Model ( | 95.15 | 94.87 | 95.31 | 0.90 | 93.67 |
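The five metrics reported in the table above are all functions of the binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` is hypothetical, and the example counts are an assumption chosen to sum to the 103 PSD1 test samples, not values reported in the source.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Compute accuracy, sensitivity, specificity, MCC, and F1 as fractions."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "f1": f1,
    }


# Assumed counts (not from the source): 42 positives and 61 negatives.
m = binary_metrics(tp=39, fn=3, tn=60, fp=1)
print({name: round(value, 4) for name, value in m.items()})
```

With these assumed counts the function gives an accuracy of 96.12%, a sensitivity of 92.86%, a specificity of 98.36%, and an F1-score of 95.12%, matching the first Proposed Model row, and an MCC of about 0.92.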
Figure 3. Performance evaluation of the proposed model on the testing data of the benchmark dataset, with physicochemical properties added using the concept of (a) (λ = 1); (b) (λ = 1); and (c) (λ = 1).
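For context on the λ parameter in the caption above, the standard pseudo amino acid composition can be written as follows. This is the general textbook formulation and is given here as an assumption; the improved variant used in the paper may differ. The symbols $L$ (peptide length), $K$ (number of physicochemical properties), $H_k$ (normalized value of property $k$), $f_u$ (frequency of amino acid $u$), and $w$ (weight factor) are introduced for illustration.

```latex
\[
\theta_j = \frac{1}{L-j}\sum_{i=1}^{L-j}\Theta(R_i, R_{i+j}),
\qquad j = 1, \dots, \lambda, \quad \lambda < L
\]
\[
\Theta(R_i, R_{i+j}) = \frac{1}{K}\sum_{k=1}^{K}\bigl[H_k(R_{i+j}) - H_k(R_i)\bigr]^2
\]
\[
x_u =
\begin{cases}
\dfrac{f_u}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{\lambda}\theta_j}, & 1 \le u \le 20 \\[2ex]
\dfrac{w\,\theta_{u-20}}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{\lambda}\theta_j}, & 21 \le u \le 20+\lambda
\end{cases}
\]
```

Under this formulation, λ sets how many sequence-order correlation tiers are appended to the 20 composition features, so a larger λ captures longer-range residue interactions at the cost of a larger feature vector.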
Empirical results for various combinations of feature extraction techniques on the main dataset using PCA and an ensemble classifier; bold values represent the best performance.
| Method (PSD2) | Accuracy | Sensitivity | Specificity | MCC | F1-Score |
|---|---|---|---|---|---|
| AAC | 88.96 | 58.82 | 93.76 | 0.53 | 59.41 |
| DPC | 91.25 | 68.42 | 94.60 | 0.61 | 66.67 |
| TPC | 86.27 | 48.94 | 91.68 | 0.39 | 47.42 |
| IPseAAC | 90.98 | 66.67 | 94.72 | 0.61 | 66.33 |
| AAC+DPC | 93.00 | 75.53 | 95.53 | 0.69 | 73.20 |
| AAC+TPC | 94.05 | 79.79 | 96.12 | 0.73 | 77.32 |
| AAC+IPseAAC | 92.87 | 72.82 | 96.09 | 0.69 | 73.89 |
| DPC+TPC | 95.29 | 84.21 | 96.91 | 0.79 | 82.05 |
| DPC+IPseAAC | 92.73 | 71.30 | 96.38 | 0.69 | 74.04 |
| TPC+IPseAAC | 91.79 | 80.00 | 92.92 | 0.60 | 63.03 |
| Proposed Model ( | 96.90 | 91.40 | 97.69 | 0.86 | 88.08 |
| Proposed Model ( | 94.89 | 80.39 | 97.19 | 0.78 | 81.19 |
Figure 4. Performance evaluation of the proposed model on the testing data of the independent dataset, with physicochemical properties added using the concept of (a) (λ = 1); (b) (λ = 1); and (c) (λ = 1).
Performance comparison of the proposed model with state-of-the-art (SOTA) methods on the PSD1 dataset; the best result is highlighted in bold.
| Model/Year | Accuracy | Sensitivity | Specificity | MCC | F1-Score |
|---|---|---|---|---|---|
| SPAP [ | 87.00 | 92.00 | 86.00 | 0.74 | - |
| LAK [ | 92.68 | 89.70 | 85.18 | 0.78 | - |
| iACP [ | 95.06 | 89.86 | 98.54 | 0.89 | - |
| IAP [ | 93.61 | 89.86 | 96.12 | 0.86 | - |
| iACP-GAEnsC [ | 96.45 | 95.36 | 97.57 | 0.91 | - |
| SAP [ | 91.86 | 86.23 | 95.63 | 0.83 | 89.47 |
| LDFM [ | 92.73 | 87.70 | 96.10 | 0.84 | 92.70 |
| Proposed Model ( | 96.12 | 92.86 | 98.36 | 0.91 | 95.12 |
| Proposed Model ( | 95.15 | 94.87 | 95.31 | 0.89 | 93.67 |
Performance comparison of the proposed model with state-of-the-art (SOTA) methods on the PSD2 dataset; the best result is highlighted in bold.
| Model/Year | Accuracy | Sensitivity | Specificity | MCC | F1-Score |
|---|---|---|---|---|---|
| NT5CT5 [ | 92.65 | 74.67 | 94.44 | 0.61 | - |
| GCGR [ | 96.36 | 69.33 | 99.07 | 0.76 | - |
| cACP [ | 96.91 | 77.32 | 98.12 | 0.79 | - |
| ACP-MHCNN [ | 91.0 | 97.6 | 84.2 | 0.82 | - |
| Proposed ( | 96.90 | 91.40 | 97.69 | 0.86 | 88.08 |
| Proposed (λ = 3) | 94.89 | 80.39 | 97.19 | 0.78 | 81.19 |