Lei Xu, Guangmin Liang, Longjie Wang, Changrui Liao.
Abstract
Cancer is a serious health issue worldwide. Traditional treatment methods focus on killing cancer cells with anticancer drugs or radiation therapy, but these methods are costly and cause side effects. The discovery of anticancer peptides has brought great progress in cancer treatment. To promote the application of anticancer peptides in cancer treatment, computational methods are needed to identify anticancer peptides (ACPs). In this paper, we propose a sequence-based model for identifying ACPs (SAP). In SAP, each peptide is represented by 400D features, or by 400D features combined with g-gap dipeptide features, and irrelevant features are then pruned with the maximum relevance-maximum distance (MRMD) method. The experimental results demonstrate that our model performs better than several existing methods. Furthermore, when the model is extended to other classifiers, its performance remains stable compared with some state-of-the-art works.
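The abstract does not spell out how the 400D and g-gap features are computed. A common reading, assumed here and not confirmed by the source, is that 400D is the dipeptide composition over all 20 × 20 ordered residue pairs and that a g-gap dipeptide pairs two residues separated by g intervening positions (so g = 0 means adjacent residues). A minimal Python sketch under those assumptions; the function names are hypothetical:

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def gap_dipeptide_composition(seq: str, gap: int = 0) -> list[float]:
    """Relative frequency of residue pairs (seq[i], seq[i + gap + 1]).

    Assumed convention: gap = 0 gives the plain 400-D dipeptide composition;
    gap = g > 0 pairs residues with g residues between them.
    """
    seq = seq.upper()
    counts = [0.0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in INDEX:  # skip pairs containing non-standard residues
            counts[INDEX[pair]] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def feature_vector(seq: str, max_gap: int = 0) -> list[float]:
    """Concatenate compositions for gaps 0..max_gap (400 * (max_gap + 1) values)."""
    features: list[float] = []
    for g in range(max_gap + 1):
        features.extend(gap_dipeptide_composition(seq, g))
    return features


# Arbitrary example sequence; gaps 0, 1 and 2 give three 400-value blocks.
print(len(feature_vector("GLFDIVKKVVGALGSL", max_gap=2)))  # 1200
```

Each 400-value block sums to 1, so peptides of different lengths map to comparable vectors; how the paper actually combines the two feature sets is not stated here.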
Keywords: 400D; anticancer peptides; dimension reduction; g-gap dipeptide; sequence-based method
Year: 2018 PMID: 29534013 PMCID: PMC5867879 DOI: 10.3390/genes9030158
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Flow chart for identifying anticancer peptides. MRMD: maximum relevance-maximum distance; SVM: support vector machine.
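The flow chart places MRMD feature pruning before the SVM. The sketch below is a simplified illustration of the MRMD idea, not the MRMD tool itself: relevance is assumed to be the absolute Pearson correlation between a feature and the class label, distance is assumed to be the mean Euclidean distance from a feature to all other features, and the equal weighting and [0, 1] scaling of the distance term are further assumptions. Function names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mrmd_rank(X: list[list[float]], y: list[int], weight: float = 1.0) -> list[int]:
    """Rank feature indices by relevance + weight * distance, best first.

    X is samples x features; y holds class labels (1 = ACP, 0 = non-ACP).
    Relevance: |Pearson r| between a feature column and the labels.
    Distance: mean Euclidean distance from a column to every other column,
    divided by the largest such mean so both terms lie in [0, 1].
    """
    columns = [list(col) for col in zip(*X)]
    m = len(columns)
    relevance = [abs(pearson(col, y)) for col in columns]
    distance = []
    for i in range(m):
        d = [math.dist(columns[i], columns[j]) for j in range(m) if j != i]
        distance.append(sum(d) / len(d) if d else 0.0)
    max_d = max(distance) or 1.0
    scores = [r + weight * d / max_d for r, d in zip(relevance, distance)]
    return sorted(range(m), key=lambda i: scores[i], reverse=True)


# Toy data: column 1 matches the labels, column 2 is constant.
X = [[1, 0, 5], [2, 0, 5], [3, 1, 5], [4, 1, 5]]
y = [0, 0, 1, 1]
top_k = mrmd_rank(X, y)[:2]  # [1, 0]; the constant column is pruned
```

Keeping only the top-ranked columns before training the classifier corresponds to the pruning step in the flow chart; the cut-off k is a free parameter in this sketch.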
Performance comparison with state-of-the-art methods.
| Methods | Sn | Sp | Acc | MCC | F-score |
|---|---|---|---|---|---|
| iACP | 84.06% | 95.15% | 90.7% | 80.58% | 87.88% |
| SAP (400D) | 86.23% | 95.63% | 91.86% | 83.01% | 89.47% |
Sn: sensitivity; Sp: specificity; Acc: overall accuracy; MCC: Matthews correlation coefficient; SAP: sequence-based model for identifying ACPs; iACP: tool for identifying ACPs proposed in [19].
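The metrics in the legend, together with the F-score in the last table column (the harmonic mean of precision and sensitivity), follow from the binary confusion matrix. The sketch below applies the standard formulas. The counts are hypothetical, since the tables report only percentages; they were chosen because their rounded metrics coincide with the iACP row.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Sn, Sp, Acc, MCC and F-score from a binary confusion matrix."""
    sn = tp / (tp + fn)                    # sensitivity (recall on ACPs)
    sp = tn / (tn + fp)                    # specificity (recall on non-ACPs)
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
    precision = tp / (tp + fp)
    f_score = 2 * precision * sn / (precision + sn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc, "F": f_score}


# Hypothetical counts: 116 + 22 positives, 196 + 10 negatives.
m = binary_metrics(tp=116, tn=196, fp=10, fn=22)
print({k: round(100 * v, 2) for k, v in m.items()})
# {'Sn': 84.06, 'Sp': 95.15, 'Acc': 90.7, 'MCC': 80.58, 'F': 87.88}
```

MCC uses all four cells of the matrix, which is why it is often preferred over accuracy when positive and negative classes are unbalanced.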
Performance comparison with selected features.
| Methods | Sn | Sp | Acc | MCC | F-score |
|---|---|---|---|---|---|
| iACP (g-gap) | 84.06% | 95.15% | 90.7% | 80.58% | 87.88% |
| SAP (400D) | 86.23% | 95.63% | 91.86% | 83.01% | 89.47% |
| SAP (selected features) | 81.88% | 96.6% | 90.7% | 80.71% | 87.6% |
Figure 2. Overall accuracy (Acc) comparison of 400D features with g-gap features on three different classifiers. RF: random forest.
Figure 3. Matthews correlation coefficient (MCC) comparison of 400D features with g-gap features on three different classifiers.
Figure 4. F-score comparison of 400D features with g-gap features on three different classifiers.
Figure 5. Acc comparison of selected features with 400D features on three different classifiers.
Figure 6. MCC comparison of selected features with 400D features on three different classifiers.
Figure 7. F-score comparison of selected features with 400D features on three different classifiers.