Mingwei Sun, Sen Yang, Xuemei Hu, You Zhou.
Abstract
Cancer is one of the most dangerous threats to human health. One major issue is drug resistance, which leads to side effects after drug treatment. Numerous therapies have sought to relieve drug resistance. Recently, anticancer peptides (ACPs) have emerged as novel and promising anticancer candidates: they can inhibit tumor cell proliferation and migration and suppress the formation of tumor blood vessels, with fewer side effects. However, identifying anticancer peptides through high-throughput biological experiments is costly, laborious, and time-consuming. Accurately identifying anticancer peptides has therefore become a key and indispensable step for anticancer peptide therapy. Although some computational methods have been developed to predict anticancer peptides, their accuracy still needs to be improved. In this study, we propose a deep learning-based model, called ACPNet, to distinguish anticancer peptides from non-anticancer peptides (non-ACPs). ACPNet employs three types of information: peptide sequence information, peptide physicochemical properties, and auto-encoding features, which are linked during the training process. ACPNet is a hybrid deep learning network that fuses fully connected networks and recurrent neural networks. Comparison with existing methods on the ACPs82 dataset shows that ACPNet not only improves Accuracy by 1.2%, F1-score by 2.0%, and Recall by 7.2%, but also achieves balanced performance on the Matthews correlation coefficient. ACPNet is also verified on an independent dataset, with 20 proven anticancer peptides, and only one anticancer peptide is predicted as a non-ACP. The comparison and the independent validation experiment indicate that ACPNet can accurately distinguish anticancer peptides from non-ACPs.
Keywords: anticancer peptides; deep learning; multi-view information
Year: 2022 PMID: 35268644 PMCID: PMC8912097 DOI: 10.3390/molecules27051544
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Datasets for training, testing, and independent validation of ACPNet.
| Dataset | ACPs Number | Non-ACPs Number | Average Length | Max Length | Min Length |
|---|---|---|---|---|---|
| ACPs250 | 250 | 250 | 27 | 97 | 11 |
| ACPs82 | 82 | 82 | 27 | 207 | 11 |
| ACPs20 | 10 | 10 | 24 | 47 | 13 |
Three kinds of hybrid features to encode peptide sequences.
| Feature Types | Feature Name | Dimensions |
|---|---|---|
| Sequence features | PAAC | 30 |
| | Length | 1 |
| | Shannon entropy | 1 |
| Peptide physicochemical properties | Gravy | 1 |
| | Molecular_weight | 1 |
| | Charge_at_pH(10) | 1 |
| Embedding features | Position embedding | 50 |
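Three of the scalar features in the table (Length, Shannon entropy, and Gravy) can be illustrated with a short standard-library sketch. This is not the authors' code. It assumes that Shannon entropy is computed over the residue frequencies of the sequence with a base-2 logarithm, and that Gravy is the mean Kyte-Doolittle hydropathy. The function names are hypothetical.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale, the scale conventionally used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def shannon_entropy(seq):
    """Entropy (in bits) of the residue frequency distribution (assumed definition)."""
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def scalar_features(seq):
    """Collect three of the one-dimensional features listed in the table."""
    return {
        "length": len(seq),
        "shannon_entropy": shannon_entropy(seq),
        "gravy": gravy(seq),
    }


# Peptide 18 from the independent validation table, used only as sample input.
features = scalar_features("FLPAIAGILSQLF")
print(features)
```

Molecular weight and charge at a given pH need residue mass and pKa tables, so a sequence-analysis library is the more practical choice for those two features.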
Figure 1. The process to auto-embed a peptide sequence to a matrix.
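One common way to turn a peptide sequence into a matrix is to map each residue to an integer index, pad to a fixed length, and look up one row per index in an embedding table. The sketch below illustrates that general idea only. It is an assumption, not the procedure shown in the figure: the vocabulary order, the padding scheme, the random initialisation, and the reading of 50 as the per-residue embedding width are all hypothetical, and in a trained network the table values would be learned.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # assumed vocabulary order
PAD_INDEX = 0                          # index reserved for padding (assumption)
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
EMBED_DIM = 50                         # assumed to match the 50 dimensions in the feature table


def encode(seq, max_len):
    """Map residues to integer indices, truncating or right-padding to max_len."""
    indices = [AA_TO_INDEX[aa] for aa in seq[:max_len]]
    return indices + [PAD_INDEX] * (max_len - len(indices))


def make_embedding_table(vocab_size, dim, seed=0):
    """Randomly initialised lookup table; a real model would learn these values."""
    rng = random.Random(seed)
    table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    table[PAD_INDEX] = [0.0] * dim     # padding positions map to the zero vector
    return table


def embed(seq, table, max_len):
    """Return a max_len x dim matrix: one embedding row per sequence position."""
    return [table[i] for i in encode(seq, max_len)]


table = make_embedding_table(len(AMINO_ACIDS) + 1, EMBED_DIM)
matrix = embed("FLPAIAGILSQLF", table, max_len=20)
print(len(matrix), len(matrix[0]))
```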
Figure 2. The overall workflow of ACPNet.
Performance of three feature combinations on ACPs82.
| Features | TP | TN | FP | FN | Accuracy | F1-Score | Recall | Precision | MCC | AUC | PRAUC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MS | 65 | 68 | 14 | 17 | 81.0 | 80.7 | 79.2 | 82.2 | 0.622 | 0.841 | 0.832 |
| AE | 67 | 73 | 9 | 15 | 85.3 | 84.8 | 81.7 | 88.1 | 0.709 | 0.867 | 0.878 |
| MS + AE | 72 | 75 | 7 | 10 | 89.6 | 89.4 | 87.8 | 91.1 | 0.793 | 0.945 | 0.947 |
MS: manually selected features; AE: automatically learned features.
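The threshold-based columns of the table follow from the four confusion-matrix counts. As a quick check, the sketch below recomputes them for the MS + AE row (TP = 72, TN = 75, FP = 7, FN = 10) using the standard definitions; the function name is hypothetical. AUC and PRAUC cannot be recovered this way, because they depend on the full score ranking.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Counts taken from the MS + AE row of the table.
m = confusion_metrics(tp=72, tn=75, fp=7, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Because ACPs82 contains 82 ACPs and 82 non-ACPs, TP + FN and TN + FP should each equal 82 in every row, which gives a simple sanity check on the counts.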
Figure 3. The importance rank of manually selected features.
Figure 4. The importance rank of manually selected features.
Comparison with traditional machine learning methods.
| Method | TP | TN | FP | FN | Accuracy | F1-Score | Recall | Precision | MCC | AUC | PRAUC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 60 | 63 | 19 | 22 | 75.0 | 75.5 | 73.1 | 75.9 | 0.500 | 0.775 | 0.763 |
| RF | 81 | 28 | 54 | 1 | 66.4 | 74.6 | 98.7 | 60.0 | 0.431 | 0.704 | 0.697 |
| CatBoost | 64 | 77 | 5 | 18 | 85.9 | 84.7 | 78.0 | 92.7 | 0.728 | 0.883 | 0.891 |
| ACPNet | 72 | 75 | 7 | 10 | 89.6 | 89.4 | 87.8 | 91.1 | 0.793 | 0.945 | 0.947 |
Performance comparisons of ACPNet with the existing methods.
| Method | TP | TN | FP | FN | Accuracy | F1-Score | Recall | Precision | MCC | AUC | PRAUC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AntiCP_ACC | 56 | 71 | 11 | 26 | 77.4 | 75.2 | 68.3 | 83.6 | 0.558 | 0.805 | 0.793 |
| AntiCP_DPC | 61 | 69 | 13 | 21 | 79.3 | 78.2 | 74.4 | 82.4 | 0.588 | 0.816 | 0.883 |
| Hajisharifi | 55 | 71 | 11 | 27 | 76.8 | 74.3 | 67.1 | 83.3 | 0.547 | 0.779 | 0.752 |
| IACP | 56 | 66 | 16 | 26 | 74.4 | 72.7 | 68.3 | 78.8 | 0.491 | 0.752 | 0.723 |
| ACPred-FL | 66 | 79 | 3 | 16 | 88.4 | 87.4 | 80.5 | 95.7 | 0.778 | 0.906 | 0.886 |
| CNN-RNN | 59 | 67 | 15 | 23 | 76.8 | 75.6 | 72.0 | 79.7 | 0.539 | 0.785 | 0.758 |
| CNN | 64 | 65 | 17 | 18 | 78.6 | 78.5 | 78.0 | 79.0 | 0.573 | 0.823 | 0.845 |
| DeepACP | 64 | 72 | 10 | 18 | 82.9 | 82.0 | 78.0 | 86.5 | 0.662 | 0.859 | 0.800 |
| ACPNet | 72 | 75 | 7 | 10 | 89.6 | 89.4 | 87.8 | 91.1 | 0.793 | 0.945 | 0.947 |
Figure 5. ROC curve (a) and precision-recall curve (b) on the test dataset.
The independent validation results of ACPNet.
| Id | Sequence | Score | Label |
|---|---|---|---|
| 1 | KLWKKIEKLIKKLLTSIR | 0.9999 | ACP |
| 2 | YIWARAERVWLWWGKFLSL | 0.9994 | ACP |
| 3 | DLFKQLQRLFLGILYCLYKIW | 0.8732 | ACP |
| 4 | AIKKFGPLAKIVAKV | 0.7043 | ACP |
| 5 | RWNGRIIKGFYNLVKIWKDLKG | 0.9620 | ACP |
| 6 | KVWKIKKNIRRLLHGIKRGWKG | 0.9993 | ACP |
| 7 | GFWARIGKVFAAVKNL | 0.9988 | ACP |
| 8 | AFLYRLTRQIRPWWRWLYKW | 0.4979 | Non-ACP |
| 9 | RIWGKHSRYIKIVKRLIQ | 0.9993 | ACP |
| 10 | QIWHKIRKLWQIIKDGF | 0.9997 | ACP |
| 11 | CGESCVWIPCVTSIFNCKCKENKVCYHDKIP | 0.0001 | Non-ACP |
| 12 | SDEKASPDKHHRFSLSRYAKLANRLANPKLLETFLSKWIGDRGNRSV | 0.2383 | Non-ACP |
| 13 | DVKGMKKAIKGILDCVIEKGYDKLAAKLKKVIQQLWE | 0.4986 | Non-ACP |
| 14 | AGWGSIFKHIFKAGKFIHGAIQAHND | 0.011 | Non-ACP |
| 15 | ATCDLASGFGVGSSLCAAHCIARRYRGGYCNSKAVCVCRN | 0.0032 | Non-ACP |
| 16 | GWKIGKKLEHHGQNIRDGLISAGPAVFAVGQAATIYAAAK | 0.0015 | Non-ACP |
| 17 | FLGALIKGAIHGGRFIHGMIQNHH | 0.4750 | Non-ACP |
| 18 | FLPAIAGILSQLF | 0.1818 | Non-ACP |
| 19 | ALWMTLLKKVLKAAAKALNAVLVGANA | 0.0052 | Non-ACP |
| 20 | EGGGPQWAVGHFM | 0.1243 | Non-ACP |
Note: the independent validation peptide sequences 1–10 (ACPs) and 11–20 (non-ACPs) are sourced from [31] and [32], respectively.
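The Label column is consistent with thresholding the Score column at 0.5, although that cutoff is an assumption here and is not stated in the table. The sketch below applies it to the 20 scores and lists the peptides whose predicted label differs from their source class (1–10 ACP, 11–20 non-ACP). The variable and function names are hypothetical.

```python
THRESHOLD = 0.5  # assumed decision cutoff; not stated in the table

# Scores copied from the table, in Id order (1-20).
scores = [
    0.9999, 0.9994, 0.8732, 0.7043, 0.9620,
    0.9993, 0.9988, 0.4979, 0.9993, 0.9997,
    0.0001, 0.2383, 0.4986, 0.011, 0.0032,
    0.0015, 0.4750, 0.1818, 0.0052, 0.1243,
]
# Source classes per the note: peptides 1-10 are ACPs, 11-20 are non-ACPs.
true_labels = ["ACP"] * 10 + ["Non-ACP"] * 10


def predict_label(score, threshold=THRESHOLD):
    """Label a peptide as ACP when its score reaches the threshold."""
    return "ACP" if score >= threshold else "Non-ACP"


predicted = [predict_label(s) for s in scores]
misclassified = [
    peptide_id
    for peptide_id, (pred, true) in enumerate(zip(predicted, true_labels), start=1)
    if pred != true
]
print("Misclassified peptide ids:", misclassified)
```

Under this assumed cutoff, the only disagreement is peptide 8, whose score of 0.4979 falls just below 0.5.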