Yih-Yun Sun, Tzu-Tang Lin, Wen-Chih Cheng, I-Hsuan Lu, Chung-Yen Lin, Shu-Hwa Chen.
Abstract
Anticancer peptides (ACPs) are selectively toxic to cancer cells, making them candidates for new anticancer drugs. Identifying new ACPs is time-consuming and expensive because the anticancer ability of every candidate must be evaluated. To reduce the cost of ACP drug development, we collected the most up-to-date ACP data and trained a convolutional neural network (CNN) with a peptide sequence encoding method for initial in silico evaluation. We introduce PC6, a novel protein-encoding method that converts a peptide sequence into a numerical matrix representing six physicochemical properties of each amino acid. By integrating the data, the encoding method, and the deep learning model, we developed AI4ACP, a user-friendly web-based ACP distinguisher that predicts the anticancer property of query peptides and promotes the discovery of peptides with anticancer activity. The experimental results demonstrate that the AI4ACP CNN, trained on the new ACP collection, outperforms existing ACP predictors. Five-fold cross-validation of AI4ACP on the new collection also showed stable performance, with accuracy around 0.89 and no sign of overfitting. Using AI4ACP, users can quickly carry out an early-stage evaluation of unknown peptides and select potential candidates for testing anticancer activity.
Keywords: anticancer peptides (ACPs); deep learning; prediction; web service
Year: 2022 PMID: 35455418 PMCID: PMC9028292 DOI: 10.3390/ph15040422
Source DB: PubMed Journal: Pharmaceuticals (Basel) ISSN: 1424-8247
Figure 1. Venn diagram of the positive set of the data sets.
Comparison of the composition of three data sets.
| Dataset | Dataset Usage | Positive Set | Negative Set |
|---|---|---|---|
| Main set | Training set | 689 ACPs | 689 AMPs |
| Main set | Testing set | 172 ACPs | 172 AMPs |
| Alternative set | Training set | 766 ACPs | 776 peptides from Swiss-Prot |
| Alternative set | Testing set | 194 ACPs | 194 peptides from Swiss-Prot |
| New collection | Training set | 1912 ACPs | 956 peptides from UniProt + 956 randomly generated sequences |
| New collection | Testing set | 212 ACPs | 106 peptides from UniProt + 106 randomly generated sequences |
Comparison of ACP predictors trained and tested using the main data set. Results for all predictors except AI4ACP were taken from AntiCP2.0 [11] and ACPred [10].
| Predictors | Classifier | Accuracy | Sensitivity | Specificity | MCC * |
|---|---|---|---|---|---|
| AntiCP | SVM | 0.506 | | 0.012 | 0.070 |
| iACP | SVM | 0.551 | 0.779 | 0.322 | 0.110 |
| ACPred | SVM | 0.535 | | 0.214 | 0.090 |
| PEPred-Suite | ensemble approach | 0.535 | 0.331 | | 0.080 |
| ACPred-FL | ensemble approach | 0.448 | 0.671 | 0.225 | −0.120 |
| ACPred-Fuse | RF | 0.689 | 0.692 | 0.686 | 0.380 |
| AntiCP_2.0 | ETree | | 0.775 | 0.734 | |
| iACP-FSCM | SVM | | 0.726 | | |
| AI4ACP | CNN | 0.718 | 0.802 | 0.633 | 0.442 |
*: Matthews Correlation Coefficient. +: Top two ranked methods for each index are presented using text formats: first in boldface, second with underline.
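The four indices reported in these comparison tables all follow from a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts passed to it are illustrative assumptions, not values taken from the paper. On a balanced test set such as the main set (172 ACPs and 172 AMPs), accuracy equals the mean of sensitivity and specificity, which gives a quick consistency check on any row.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity, MCC) from confusion counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Hypothetical counts for a balanced test set of 172 positives and 172 negatives.
acc, sens, spec, mcc = binary_metrics(tp=138, fn=34, tn=109, fp=63)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} "
      f"specificity={spec:.3f} MCC={mcc:.3f}")

# Balanced classes: accuracy is the mean of sensitivity and specificity.
assert abs(acc - (sens + spec) / 2) < 1e-12
```

MCC uses all four cells of the confusion matrix, so it penalizes a predictor that labels nearly everything as positive even when its sensitivity looks high.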
Comparison of ACP predictors trained and tested using the alternative data set. Results for all predictors except AI4ACP were taken from AntiCP2.0 [11] and ACPred [10].
| Predictors | Classifier | Accuracy | Sensitivity | Specificity | MCC * |
|---|---|---|---|---|---|
| AntiCP | SVM | | | 0.902 | |
| iACP | SVM | 0.776 | 0.784 | 0.768 | 0.550 |
| ACPred | SVM | 0.853 | 0.871 | 0.835 | 0.710 |
| PEPred-Suite | ensemble approach | 0.575 | 0.402 | 0.747 | 0.160 |
| ACPred-FL | ensemble approach | 0.438 | 0.602 | 0.256 | −0.150 |
| ACPred-Fuse | RF | 0.789 | 0.644 | 0.933 | 0.600 |
| AntiCP2.0 | ETree | | | | |
| iACP-FSCM | SVM | 0.889 | 0.876 | 0.902 | 0.779 |
| AI4ACP | CNN | 0.894 | 0.871 | | 0.790 |
*: Matthews Correlation Coefficient. +: Top two ranked methods for each index are presented using text formats: first in boldface, second with underline.
Comparison of ACP predictors tested using the testing set of the new collection.
| Predictors | Classifier | Training Set | Accuracy | Specificity | Sensitivity | MCC * |
|---|---|---|---|---|---|---|
| AntiCP2.0 | ETree | Alternative set | 0.792 | 0.717 | 0.868 | 0.592 |
| AI4ACP | CNN | Alternative set | | | | |
| AI4ACP | CNN | New collection | | | | |
*: Matthews Correlation Coefficient. +: Top two ranked methods for each index are presented using text formats: first in boldface, second with underline.
Model performance of five-fold cross-validation.
| Fold | Accuracy | Specificity | Sensitivity | MCC * |
|---|---|---|---|---|
| 1 | 0.887 | 0.924 | 0.850 | 0.776 |
| 2 | 0.888 | 0.861 | 0.915 | 0.777 |
| 3 | 0.878 | 0.951 | 0.802 | 0.763 |
| 4 | 0.895 | 0.914 | 0.877 | 0.791 |
| 5 | 0.898 | 0.973 | 0.814 | 0.802 |
| | | | | |
*: Matthews Correlation Coefficient.
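The stability claim can be checked directly from the five per-fold rows. The sketch below computes the mean and sample standard deviation of each index; the dictionary layout and variable names are illustrative, and only the per-fold numbers come from the table. The mean accuracy works out to about 0.889, in line with the "around 0.89" stated in the abstract.

```python
from statistics import mean, stdev

# Per-fold values copied from the five-fold cross-validation table.
folds = {
    "accuracy":    [0.887, 0.888, 0.878, 0.895, 0.898],
    "specificity": [0.924, 0.861, 0.951, 0.914, 0.973],
    "sensitivity": [0.850, 0.915, 0.802, 0.877, 0.814],
    "mcc":         [0.776, 0.777, 0.763, 0.791, 0.802],
}

# Mean and sample standard deviation for each index across the folds.
summary = {name: (mean(values), stdev(values)) for name, values in folds.items()}

for name, (avg, sd) in summary.items():
    print(f"{name:12s} mean={avg:.3f} sd={sd:.3f}")
```

Accuracy and MCC vary little between folds, while specificity and sensitivity trade off against each other more visibly from fold to fold.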
Figure 2. Histogram of the length distribution of collected ACPs. (a) The length distribution of all the 2839 ACPs; (b) The length distribution of ACPs, after excluding ACPs longer than 50 amino acids.
Figure 3. Positive data collection and division process.
Figure 4. Negative data collection and division process.
Figure 5. PC6 protein-encoding method. A padded ACP will be transformed into a 50 × 6 matrix.
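The shape of the PC6 encoding can be illustrated with a minimal sketch: each residue is looked up in a table of six property values, and the sequence is padded to 50 positions, giving a 50 × 6 matrix. Several details here are assumptions and not taken from the paper: the property values in `PROPERTY_TABLE` are placeholders (the real physicochemical values are not listed in this record), the padding symbol `X` and its all-zero row are a guess, and `encode_peptide` is a hypothetical name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_PROPERTIES = 6
MAX_LEN = 50
PAD = "X"  # assumed padding symbol

# PLACEHOLDER values only: a real table would hold six measured
# physicochemical properties per amino acid.
PROPERTY_TABLE = {
    aa: [round((i + 1) * (j + 1) / 120, 3) for j in range(N_PROPERTIES)]
    for i, aa in enumerate(AMINO_ACIDS)
}
PROPERTY_TABLE[PAD] = [0.0] * N_PROPERTIES  # assumed: padding maps to zeros


def encode_peptide(seq, max_len=MAX_LEN):
    """Encode a peptide as a max_len x 6 matrix (list of rows)."""
    seq = seq.strip().upper()
    if not seq or len(seq) > max_len:
        raise ValueError(f"sequence length must be between 1 and {max_len}")
    unknown = sorted(set(seq) - set(AMINO_ACIDS))
    if unknown:
        raise ValueError(f"unsupported residues: {unknown}")
    padded = seq.ljust(max_len, PAD)  # right-pad to the fixed length
    return [list(PROPERTY_TABLE[aa]) for aa in padded]


# Arbitrary example sequence, used only to show the output shape.
matrix = encode_peptide("GLFDIVKKVV")
print(len(matrix), len(matrix[0]))
```

Rejecting sequences longer than 50 residues mirrors the length cutoff described for the collected ACPs; truncating instead would be an equally plausible design, so that choice is also an assumption.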
Figure 6. Model architecture in this study. After PC6 encoding, protein sequences pass through every layer of the model.
Figure 7. AI4ACP website. (A) The web portal of AI4ACP for sequence submission in FASTA format. (B) The predicted ACP activity of each submitted sequence, with a prediction score. (C) A pie chart summarizing the predictions for the whole submission, together with the files generated during the prediction.