| Literature DB >> 34880291 |
Sajid Ahmed, Rafsanjani Muhammod, Zahid Hossain Khan, Sheikh Adilina, Alok Sharma, Swakkhar Shatabda, Abdollah Dehzangi.
Abstract
Although advancing therapeutic alternatives for treating deadly cancers has gained much attention globally, primary treatments such as chemotherapy still have significant downsides and low specificity. Recently, anticancer peptides (ACPs) have emerged as a potential therapeutic alternative with far fewer negative side effects. However, identifying ACPs through wet-lab experiments is expensive and time-consuming, so computational methods have emerged as viable alternatives. Over the past few years, several computational ACP identification techniques using hand-engineered features have been proposed to solve this problem. In this study, we propose a new multi-headed deep convolutional neural network model, called ACP-MHCNN, for extracting and combining discriminative features from different information sources in an interactive way. Our model extracts sequence-, physicochemical-, and evolutionary-based features for ACP identification using different numerical peptide representations while restraining parameter overhead. Rigorous experiments using cross-validation and an independent dataset show that ACP-MHCNN outperforms other models for anticancer peptide identification by a substantial margin on our benchmarks: it surpasses the state-of-the-art model by 6.3%, 8.6%, 3.7%, and 4.0% in accuracy, sensitivity, specificity, and precision, respectively, and by 0.20 in MCC. ACP-MHCNN, its code, and the datasets are publicly available at: https://github.com/mrzResearchArena/Anticancer-Peptides-CNN . ACP-MHCNN is also publicly available as an online predictor at: https://anticancer.pythonanywhere.com/ .
Year: 2021 PMID: 34880291 PMCID: PMC8654959 DOI: 10.1038/s41598-021-02703-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1: The general architecture of ACP-MHCNN. We extract BPF, physicochemical, and evolutionary-based features, then feed the extracted features to a multi-headed deep convolutional neural network (MHCNN) to predict anticancer peptides.
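The multi-headed design described above can be illustrated with a small sketch: each peptide representation passes through its own 1-D convolutional head, and the pooled head outputs are concatenated before the shared dense layers. The feature dimensions and the random weights below are illustrative placeholders, not the trained model or the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_head(x, filters, kernel):
    """One 'head': a valid 1-D convolution followed by global max-pooling.

    x has shape (length, channels). The weights here are random
    placeholders; in the real model they are learned by training.
    """
    length, channels = x.shape
    w = rng.standard_normal((filters, kernel, channels))
    steps = length - kernel + 1
    out = np.empty((steps, filters))
    for t in range(steps):
        # Contract each filter against the (kernel, channels) patch.
        out[t] = np.tensordot(w, x[t:t + kernel, :], axes=([1, 2], [0, 1]))
    return out.max(axis=0)  # global max-pool -> (filters,)

# Three toy representations of one 15-residue peptide (dimensions are
# illustrative assumptions, not the paper's exact feature sizes):
bpf  = rng.random((15, 20))  # binary-profile-like features
phys = rng.random((15, 7))   # physicochemical properties
evo  = rng.random((15, 20))  # evolutionary (PSSM-like) profile

# Each representation gets its own head; pooled outputs are concatenated
# before the shared fully connected layers.
joint = np.concatenate([conv1d_head(m, 10, 4) for m in (bpf, phys, evo)])
```

The key point is that each head can learn filters suited to its own input representation, while the concatenation lets the dense layers combine evidence across all three.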
Summary of seven combinations of the three sequence representations explored in this study.
| Combination number | Feature encoding technique | Number of convolutional layer groups |
|---|---|---|
| C1 | BPF | 1 |
| C2 | Physicochemical Properties | 1 |
| C3 | Evolutionary Information | 1 |
| C4 | BPF & Physicochemical Properties | 2 |
| C5 | BPF & Evolutionary Information | 2 |
| C6 | Physicochemical Properties & Evolutionary Information | 2 |
| C7 | BPF & Physicochemical Properties & Evolutionary Information | 3 |
The first column gives the name of the combination, the second column lists the representations used to build it, and the third column gives the number of convolutional layer groups in that combination.
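Of the three representations, BPF is the simplest to illustrate: each residue is one-hot encoded over the 20 standard amino acids. A minimal sketch follows; the alphabet ordering, the zero-padding of short peptides, and the truncation of long ones are assumptions for illustration, not details taken from the paper.

```python
# Binary Profile Feature (BPF) sketch: one-hot encode the first 15
# N-terminal residues over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabetical ordering

def bpf_encode(peptide, length=15):
    """Return a `length` x 20 binary matrix (list of lists).

    Peptides shorter than `length` are zero-padded; longer ones are
    truncated to their first `length` residues (assumed preprocessing).
    """
    matrix = []
    for i in range(length):
        row = [0] * len(AMINO_ACIDS)
        if i < len(peptide):
            row[AMINO_ACIDS.index(peptide[i])] = 1
        matrix.append(row)
    return matrix

enc = bpf_encode("FAKKLAKKLKKLAKKLAK")  # an 18-residue toy peptide
```

The resulting 15 x 20 matrix is what a convolutional head can slide its filters over, position by position.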
Hyperparameter configurations employed for different ACP datasets.
| Layer | ACP-740 and ACP-240 | ACP-500 and ACP-164 |
|---|---|---|
| Conv-1 (one per head, ×3) | filter = 10, kernel = 4, drop = 0.8 | filter = 16, kernel = 3, drop = 0.7 |
| Conv-2 (one per head, ×3) | filter = 8, kernel = 3, drop = 0.7 | filter = 8, kernel = 3, drop = 0.5 |
| Dense-1 | units = 8, drop = 0.7 | units = 16, drop = 0.6 |
| Dense-2 | n/a | units = 8, drop = 0.5 |
In this table, ‘Conv’ = a convolutional layer, ‘Dense’ = a fully connected layer, ‘filter’ = number of filters in a convolutional layer, ‘kernel’ = size of filters in a convolutional layer, ‘drop’ = dropout rate, and ‘units’ = number of neurons in a fully connected layer.
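The same settings, transcribed as a plain Python dict for readability. This is our transcription of the table, not code from the authors; in particular, the side to which the second dense layer belongs is ambiguous in the source table, and its assignment to the ACP-500/ACP-164 configuration below is an assumption.

```python
# Hyperparameters from the table above (transcription, not authors' code).
# Each Conv entry applies to every one of the three heads.
HPARAMS = {
    "ACP-740/ACP-240": {
        "conv1": {"filters": 10, "kernel": 4, "dropout": 0.8},
        "conv2": {"filters": 8, "kernel": 3, "dropout": 0.7},
        "dense1": {"units": 8, "dropout": 0.7},
    },
    "ACP-500/ACP-164": {
        "conv1": {"filters": 16, "kernel": 3, "dropout": 0.7},
        "conv2": {"filters": 8, "kernel": 3, "dropout": 0.5},
        "dense1": {"units": 16, "dropout": 0.6},
        # Side assignment of this second dense layer is uncertain in the
        # source table; placed here as an assumption.
        "dense2": {"units": 8, "dropout": 0.5},
    },
}
```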
Results achieved using fivefold cross validation for ACP-740 dataset for different input feature groups.
| Combination | Accuracy (STD) | Sensitivity (STD) | Specificity (STD) | Precision (STD) | MCC (STD) |
|---|---|---|---|---|---|
| C1 | 76.0 (2.9) | 78.9 (7.8) | 73.0 (8.1) | 75.0 (6.2) | 0.52 (0.02) |
| C2 | 73.1 (4.8) | 74.7 (13.5) | 71.3 (11.6) | 72.8 (11.6) | 0.46 (0.11) |
| C3 | 81.1 (3.1) | 81.3 (3.7) | 80.7 (3.7) | 81.3 (4.1) | 0.62 (0.05) |
| C4 | 76.9 (2.9) | 75.7 (7.5) | 78.4 (2.9) | 78.2 (2.5) | 0.54 (0.05) |
| C5 | 84.0 (3.7) | 87.6 (8.3) | 80.3 (4.2) | 82.0 (3.7) | 0.68 (0.07) |
| C6 | 81.8 (3.2) | 82.9 (3.3) | 81.1 (5.2) | 81.8 (4.2) | 0.64 (0.07) |
| C7 | | | | | |
The STD is also presented in the brackets for each measurement.
Bold items indicate the best values found by the methods.
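The five measures reported throughout these tables follow the standard binary confusion-matrix definitions. A small reference implementation (our sketch, not the authors' evaluation code):

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, precision, and MCC from the
    counts of true/false positives and negatives."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sen = tp / (tp + fn)            # sensitivity = recall on positives
    spe = tn / (tn + fp)            # specificity = recall on negatives
    pre = tp / (tp + fp)            # precision
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sen, spe, pre, mcc
```

Unlike accuracy, MCC stays informative under class imbalance, which is why it is reported alongside the percentage metrics here.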
Results achieved using fivefold cross validation for ACP-240 dataset for different input feature groups.
| Combination | Accuracy (STD) | Sensitivity (STD) | Specificity (STD) | Precision (STD) | MCC (STD) |
|---|---|---|---|---|---|
| C1 | 73.5 (3.1) | 82.7 (9.9) | 63.6 (8.8) | 72.9 (9.4) | 0.47 (0.06) |
| C2 | 71.2 (4.5) | 82.3 (11.0) | 59.6 (14.9) | 70.6 (4.6) | 0.43 (0.07) |
| C3 | 79.1 (2.1) | 84.6 (6.0) | 72.7 (6.0) | 78.6 (5.9) | 0.58 (0.08) |
| C4 | 75.1 (4.4) | 84.6 (4.4) | 63.6 (7.1) | 73.3 (6.4) | 0.50 (0.08) |
| C5 | 79.9 (2.3) | 85.4 (5.9) | 73.6 (15.8) | 79.3 (1.1) | 0.60 (0.08) |
| C6 | 81.5 (1.9) | 83.2 (8.6) | | | 0.63 (0.08) |
| C7 | 75.6 (3.5) | 81.1 (3.9) | | | |
The STD is also presented in the brackets for each measurement.
Bold items indicate the best values found by the methods.
Results achieved using independent test for ACP-500/164 dataset.
| Combination | Accuracy | Sensitivity | Specificity | Precision | MCC |
|---|---|---|---|---|---|
| C1 | 83.8 | 85.4 | 81.6 | 82.3 | 0.67 |
| C2 | 74.2 | 77.9 | 70.6 | 72.6 | 0.49 |
| C3 | 89.0 | 91.4 | 86.6 | 87.2 | 0.78 |
| C4 | 85.6 | 88.7 | 82.6 | 83.6 | 0.71 |
| C5 | 90.0 | 93.7 | 86.3 | 87.3 | 0.80 |
| C6 | 88.4 | 89.4 | | | 0.76 |
| C7 | 84.2 | 86.0 | | | |
Model trained on ACP-500 and tested on ACP-164.
Bold items indicate the best values found by the methods.
Figure 2: ROC curves for the ACP-740 dataset under fivefold cross-validation. As shown, we consistently achieve very high Area Under the Curve (AUC) values.
Figure 3: ROC curves for the ACP-240 dataset under fivefold cross-validation. Similar to the results reported for the ACP-740 dataset, we consistently achieve very high AUC values.
Figure 4: ROC curve for ACP-500/164, with ACP-500 used as the training dataset and ACP-164 as the testing dataset in this experiment.
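The AUC values summarized in these figures can be computed from predicted scores and true labels via the rank-based identity AUC = P(score of a random positive > score of a random negative), counting ties as one half. A small illustrative implementation (not the paper's evaluation code):

```python
def auc_score(labels, scores):
    """ROC AUC via the Mann-Whitney pairwise-comparison identity.

    labels: 1 for positive (ACP), 0 for negative; scores: model outputs.
    O(P*N) pairwise version, fine for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))
```

A perfect ranking of positives above negatives gives 1.0; a random scorer gives about 0.5.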
Results achieved using fivefold cross validation for the ACP-740 dataset (complete sequences used instead of the first 15 N-terminal amino acids).
| Combination | Accuracy (STD) | Sensitivity (STD) | Specificity (STD) | Precision (STD) | MCC (STD) |
|---|---|---|---|---|---|
| C1 | 78.2 (1.5) | 82.5 (8.2) | 74.1 (8.8) | 77.2 (6.0) | 0.57 (0.03) |
| C2 | 71.1 (5.6) | 69.9 (16.9) | 72.5 (13.7) | 73.9 (13.7) | 0.44 (0.11) |
| C3 | 81.0 (3.3) | 81.4 (4.1) | 81.7 (3.7) | 82.0 (4.5) | 0.63 (0.07) |
| C4 | 77.1 (3.0) | 74.1 (8.0) | 80.8 (3.1) | 79.9 (2.6) | 0.55 (0.06) |
| C5 | 82.9 (4.1) | 86.7 (9.2) | 78.8 (4.7) | 80.9 (3.7) | 0.66 (0.09) |
| C6 | 81.3 (3.8) | 81.6 (3.8) | 81.2 (5.7) | 81.9 (4.3) | 0.63 (0.08) |
| C7 | 83.2 (1.7) | 80.4 (4.5) | 84.8 (5.4) | 84.9 (4.3) | 0.65 (0.03) |
The STD is also presented in the brackets for each measurement.
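The contrast drawn in this table is between the default fixed-length inputs (the first 15 N-terminal residues) and complete sequences. The fixed-length preprocessing can be sketched as follows; the pad character "X" is an assumption for illustration.

```python
def n_terminus(seq, k=15, pad_char="X"):
    """Keep the first k N-terminal residues of a peptide.

    Shorter sequences are padded with `pad_char` (assumed) so that every
    input to the network has the same length k.
    """
    return (seq[:k] + pad_char * k)[:k]
```

Using complete sequences instead requires padding (or truncating) to the longest peptide in the dataset, which changes the input length the convolutional heads see.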
Results achieved using fivefold cross validation for the ACP-240 dataset (complete sequences used instead of the first 15 N-terminal amino acids).
| Combination | Accuracy (STD) | Sensitivity (STD) | Specificity (STD) | Precision (STD) | MCC (STD) |
|---|---|---|---|---|---|
| C1 | 75.4 (4.3) | 81.6 (8.2) | 71.8 (9.2) | 76.5 (10.9) | 0.53 (0.07) |
| C2 | 62.6 (4.8) | 77.6 (16.9) | 44.5 (15.0) | 63.1 (5.6) | 0.25 (0.08) |
| C3 | 82.1 (4.0) | 86.4 (4.1) | 78.6 (6.7) | 82.3 (6.0) | 0.65 (0.09) |
| C4 | 79.0 (5.5) | 81.9 (8.0) | 75.3 (8.2) | 79.1 (7.1) | 0.57 (0.08) |
| C5 | 78.2 (2.8) | 84.5 (9.2) | 67.0 (13.2) | 77.1 (2.2) | 0.53 (0.08) |
| C6 | 77.0 (4.4) | 81.8 (3.8) | 70.8 (9.2) | 77.2 (5.1) | 0.54 (0.09) |
| C7 | 78.1 (2.8) | 85.6 (4.5) | 68.9 (4.5) | 76.2 (4.5) | 0.56 (0.05) |
The STD is also presented in the brackets for each measurement.
Results achieved using the independent test for the ACP-500/164 dataset (complete sequences used instead of the first 15 N-terminal amino acids).
| Combination | Accuracy | Sensitivity | Specificity | Precision | MCC |
|---|---|---|---|---|---|
| C1 | 82.3 | 86.6 | 78.1 | 79.8 | 0.65 |
| C2 | 84.0 | 84.2 | 82.9 | 83.1 | 0.67 |
| C3 | 84.1 | 87.8 | 80.5 | 81.8 | 0.68 |
| C4 | 88.1 | 90.2 | 86.6 | 87.1 | 0.77 |
| C5 | 85.0 | 87.8 | 81.7 | 82.7 | 0.70 |
| C6 | 86.3 | 81.7 | 90.2 | 89.3 | 0.72 |
| C7 | 87.2 | 87.8 | 85.4 | 85.7 | 0.73 |
Model trained on ACP-500 and tested on ACP-164.
The results achieved for ACP-MHCNN compared to traditional ML models on ACP-740 and ACP-240 (fivefold cross-validation) and on ACP-500/164 (independent test).
| Classifier | Acc (740) | Sen (740) | Spe (740) | MCC (740) | Acc (240) | Sen (240) | Spe (240) | MCC (240) | Acc (500/164) | Sen (500/164) | Spe (500/164) | MCC (500/164) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 80.4 | 77.6 | 83.2 | 0.61 | 68.7 | 65.1 | 72.9 | 0.38 | 78.0 | 74.3 | 81.7 | 0.56 |
| RF | 81.2 | 79.2 | 84.8 | 0.64 | 71.0 | 72.0 | 74.7 | 0.48 | 84.1 | 82.9 | 85.3 | 0.68 |
| ET | 81.5 | 78.4 | 85.9 | 0.65 | 72.7 | 72.8 | 80.1 | 0.53 | 81.0 | 79.2 | 82.9 | 0.62 |
| XGB | 81.6 | 82.4 | 81.8 | 0.64 | 74.2 | 82.1 | 74.7 | 0.57 | 85.3 | 86.5 | 84.1 | 0.71 |
| KNN | 79.3 | 64.3 | 75.5 | 0.40 | 70.6 | 91.4 | 15.3 | 0.11 | 68.9 | 51.2 | 86.5 | 0.40 |
| DT | 78.4 | 76.8 | 70.8 | 0.48 | 70.9 | 75.1 | 68.4 | 0.44 | 78.6 | 71.9 | 85.3 | 0.58 |
| NB | 78.2 | 80.0 | 73.6 | 0.54 | 70.6 | 75.1 | 62.1 | 0.38 | 71.9 | 74.3 | 69.5 | 0.44 |
| AB | 78.1 | 77.3 | 78.5 | 0.56 | 71.3 | 79.0 | 72.0 | 0.52 | 79.8 | 79.2 | 80.4 | 0.60 |
| ACP-MHCNN | | | | | | | | | | | | |
Bold items indicate the best values found by the methods.
Comparing the results achieved for ACP-MHCNN to ACP-DL as the state-of-the-art anticancer peptide predictor.
| Dataset | Model | Accuracy | Sensitivity | Specificity | Precision | MCC |
|---|---|---|---|---|---|---|
| ACP-740 | ACP-DL | 80.0 | 81.4 | 78.6 | 79.7 | 0.60 |
| ACP-740 | ACP-MHCNN | | | | | |
| ACP-240 | ACP-DL | 81.3 | 69.6 | 76.7 | | 0.64 |
| ACP-240 | ACP-MHCNN | 90.1 | | | | |
| ACP-500/ACP-164 | ACP-DL | 84.7 | 89.0 | 80.5 | 82.0 | 0.62 |
| ACP-500/ACP-164 | ACP-MHCNN | 84.2 | 86.0 | | | |
Bold items indicate the best values found by the methods.
Comparing the results achieved for ACP-MHCNN to ACP-DL on the AntiCP-2.0 benchmark datasets.
| Dataset | Model | Accuracy | Sensitivity | Specificity | Precision | MCC |
|---|---|---|---|---|---|---|
| AntiCP-2.0 (Main validation) | ACP-DL | 66.0 | 58.1 | 69.4 | | 0.33 |
| AntiCP-2.0 (Main validation) | ACP-MHCNN | | 78.5 | 67.4 | | |
| AntiCP-2.0 (Alternate Validation) | ACP-DL | 83.0 | 82.9 | 82.9 | 82.9 | 0.66 |
| AntiCP-2.0 (Alternate Validation) | ACP-MHCNN | | | | | |
Bold items indicate the best values found by the methods.
Comparing the results achieved for ACP-MHCNN to the state-of-the-art anticancer peptide predictors on the main and alternative datasets used in [16,17,60].
| Methods | Acc (Main) | Sen (Main) | Spe (Main) | MCC (Main) | Acc (Alt) | Sen (Alt) | Spe (Alt) | MCC (Alt) |
|---|---|---|---|---|---|---|---|---|
| ACP-MHCNN | 73.00 | 78.50 | 67.40 | 0.46 | 90.00 | 86.60 | 86.60 | 0.81 |
| AntiCP-2.0 | 75.43 | 77.46 | 73.41 | 0.51 | 92.01 | 92.27 | 92.27 | 0.84 |
| AntiCP | 50.58 | 1.16 | | 0.07 | 89.95 | 89.69 | 89.69 | 0.80 |
| ACPred | 53.47 | 85.55 | 21.39 | 0.09 | 85.31 | 87.11 | 87.11 | 0.71 |
| ACPred-FL | 44.80 | 67.05 | 22.54 | -0.12 | 43.80 | 60.21 | 60.21 | -0.15 |
| ACPpred-Fuse | 68.90 | 69.19 | 68.60 | 0.38 | 78.87 | 64.43 | 64.43 | 0.60 |
| PEPred-Suite | 53.49 | 33.14 | 73.84 | 0.08 | 57.47 | 40.21 | 40.21 | 0.16 |
| iACP | 55.10 | 77.91 | 32.16 | 0.11 | 77.58 | 78.35 | 78.35 | 0.55 |
| iACP-FSCM | 82.50 | 72.60 | | 0.65 | 88.90 | 87.60 | 90.20 | 0.78 |
| ACPred-LAF | 84.24 | 87.20 | | | | | | |
Bold items indicate the best values found by the methods.