Tianyi Zhao¹, Yang Hu¹, Tianyi Zang².
Abstract
BACKGROUND: Millions of people suffer from cancer, yet accurate early diagnosis and effective treatment remain difficult. Common treatments include surgery, radiotherapy and chemotherapy, but all of them can cause serious harm to patients. Recently, anticancer peptides (ACPs) have emerged as a potential cancer treatment. Because ACPs are natural biologics, they are safer than these conventional treatments. However, identifying ACPs experimentally is expensive, so we propose a new machine learning method to identify ACPs.
Keywords: Anticancer peptides; Cancer; Deep belief network; Random forest; Relevance vector machine
Year: 2020 PMID: 33323099 PMCID: PMC7739480 DOI: 10.1186/s12859-020-03812-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Accuracy of the five compared methods on two datasets
| Method | Dataset 1 | Dataset 2 |
|---|---|---|
| DRACPsᵃ | 0.96 | 0.95 |
| SVMNFᵇ | 0.92 | 0.91 |
| Tyagi et al.ᶜ | 0.88 | 0.86 |
| Naive Bayes | 0.84 | 0.81 |
| Random forest | 0.89 | 0.85 |
ᵃ The method we proposed
ᵇ SVM with our features
ᶜ Available at https://crdd.osdd.net/raghava/anticp/multi_pep.php
Fig. 1 The ROC curves of DRACP and RRVMs
Fig. 2 The PR curves of DRACP and RRVMs
Fig. 3 Comparison of the average overall amino acid composition of ACPs and non-ACPs. The x-axis indexes the 20 amino acids, and the y-axis gives the ratio of each amino acid's count to the total sequence length.
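The y-axis quantity in Fig. 3 is the standard amino acid composition: the count of each residue divided by the sequence length. A minimal Python sketch follows. The alphabetical ordering of the 20 one-letter codes in `AMINO_ACIDS` is an assumption, since the figure's exact axis order is not stated here.

```python
# Sketch of the 20-D amino acid composition (AAC).
# ASSUMPTION: residues are ordered alphabetically by one-letter code;
# the paper's own ordering may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> list[float]:
    """Return the fraction of each standard amino acid in `seq`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    # Non-standard letters (e.g. 'X') count toward n but toward no entry,
    # so the vector then sums to less than 1.
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


if __name__ == "__main__":
    aac = amino_acid_composition("ACDA")
    print(aac[:3])  # fractions of A, C, D → [0.5, 0.25, 0.25]
```

Averaging these vectors separately over ACPs and non-ACPs gives the two profiles that Fig. 3 compares.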
The six groups of the 20 amino acids
| Groups | Amino acids |
|---|---|
| Strongly hydrophilic | R, D, E, N, Q, K, H |
| Strongly hydrophobic | L, I, A, V, M, F |
| Weakly hydrophilic or weakly hydrophobic | S, T, Y, W |
| Proline | P |
| Glycine | G |
| Cysteine | C |
Fig. 4 Flow chart of feature extraction. We extracted 56-dimensional features to identify ACPs: a 20-dimensional amino acid composition and a 36-dimensional reduced amino acid composition.
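The caption does not say how the 36 reduced-composition dimensions are formed. One reading consistent with the six residue groups in the table is 6 × 6 = 36: the frequency of each ordered pair of groups at adjacent positions. The Python sketch below implements that interpretation. It is an assumption, not the paper's confirmed definition, and the names `GROUPS`, `GROUP_OF` and `reduced_dipeptide_composition` are hypothetical.

```python
# Sketch of a 36-D reduced composition over the six residue groups.
# ASSUMPTION: the 36 dimensions are the 6 x 6 ordered pairs of groups
# observed at adjacent positions; the paper may define them differently.
GROUPS = [
    "RDENQKH",  # strongly hydrophilic
    "LIAVMF",   # strongly hydrophobic
    "STYW",     # weakly hydrophilic or weakly hydrophobic
    "P",        # proline
    "G",        # glycine
    "C",        # cysteine
]
# Map each residue letter to its group index (0-5).
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def reduced_dipeptide_composition(seq: str) -> list[float]:
    """Return 36 frequencies of adjacent group pairs (a, b), stored at index a*6 + b."""
    # Raises KeyError on residues outside the 20 standard amino acids.
    reduced = [GROUP_OF[aa] for aa in seq.upper()]
    k = len(GROUPS)
    vec = [0.0] * (k * k)
    n_pairs = len(reduced) - 1
    if n_pairs < 1:
        return vec  # too short to contain any adjacent pair
    for a, b in zip(reduced, reduced[1:]):
        vec[a * k + b] += 1.0 / n_pairs
    return vec


if __name__ == "__main__":
    v = reduced_dipeptide_composition("RLR")
    print(len(v), v[1], v[6])  # → 36 0.5 0.5
```

Under this reading, concatenating the 36-D vector with the 20-D amino acid composition gives the 56 dimensions named in the caption.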
Fig. 5 Framework of DRACP. First, a deep belief network (DBN) reduces the dimension of the features; then RRVMs perform the classification.
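A DBN is usually built by greedily stacking restricted Boltzmann machines (RBMs), each trained on the hidden activations of the layer below. The pure-Python sketch below trains Bernoulli RBMs with one-step contrastive divergence (CD-1) and returns the top layer's hidden probabilities as the reduced features. The layer sizes, learning rate and epoch count are hypothetical, because the paper's DBN architecture and training settings are not given here. Inputs are assumed to lie in [0, 1], which holds for composition features.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class BernoulliRBM:
    """Restricted Boltzmann machine with binary units, trained by CD-1."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible  # visible biases
        self.b_h = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def fit(self, data, epochs: int = 10, lr: float = 0.1):
        for _ in range(epochs):
            for v0 in data:
                # Positive phase, then one Gibbs step (sample h, reconstruct v).
                h0 = self.hidden_probs(v0)
                h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
                v1 = self.visible_probs(h_sample)
                h1 = self.hidden_probs(v1)
                # CD-1 update: data statistics minus reconstruction statistics.
                for i in range(len(v0)):
                    for j in range(len(h0)):
                        self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                    self.b_v[i] += lr * (v0[i] - v1[i])
                for j in range(len(h0)):
                    self.b_h[j] += lr * (h0[j] - h1[j])
        return self


def dbn_reduce(data, layer_sizes, epochs: int = 10, lr: float = 0.1, seed: int = 0):
    """Greedy layer-wise training; returns the top layer's hidden probabilities."""
    reps = [list(v) for v in data]
    for k, n_hidden in enumerate(layer_sizes):
        rbm = BernoulliRBM(len(reps[0]), n_hidden, seed=seed + k)
        rbm.fit(reps, epochs, lr)
        reps = [rbm.hidden_probs(v) for v in reps]
    return reps


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.random() for _ in range(56)] for _ in range(8)]  # toy 56-D inputs
    Z = dbn_reduce(X, layer_sizes=[16, 4], epochs=5)
    print(len(Z), len(Z[0]))  # → 8 4
```

Training layer by layer keeps each RBM simple; a full DBN pipeline would typically also fine-tune the stack, which this sketch omits.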
Parameter settings of the RVM
| Setting | Value |
|---|---|
| Max iterations | 100 |
| Kernel function | Gaussian |
| Kernel function width | 6 |
| Sample number | 50 |
| Feature number | 10 |
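The "Sample number" and "Feature number" settings suggest that RRVMs train each RVM on a random subset of training samples and features, in the spirit of a random forest, and combine the results by voting. The table does not confirm this, so treat it as an assumption. The Python sketch below shows that ensembling scheme with a Gaussian kernel of width 6. To keep it short, each base learner is a simple kernel-mean classifier substituted for an RVM (it is not an RVM). The kernel form exp(-||x - y||^2 / width^2), bootstrap sampling of rows, and `n_learners` are also assumptions.

```python
import math
import random
from collections import Counter


def gaussian_kernel(x, y, width: float = 6.0) -> float:
    # ASSUMED form exp(-||x - y||^2 / width^2); some toolboxes use 2 * width^2.
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / width ** 2)


class KernelMeanClassifier:
    """Stand-in base learner, NOT an RVM: picks the class whose training
    points have the highest mean kernel similarity to the query."""

    def __init__(self, X, y, width: float):
        self.X, self.y, self.width = X, y, width

    def predict_one(self, x):
        scores = {}
        for label in set(self.y):
            sims = [gaussian_kernel(x, xi, self.width)
                    for xi, yi in zip(self.X, self.y) if yi == label]
            scores[label] = sum(sims) / len(sims)
        return max(scores, key=scores.get)


def train_random_ensemble(X, y, n_learners=15, n_samples=50, n_features=10,
                          width=6.0, seed=0):
    """Train each learner on a bootstrap sample of rows and a random subset of columns."""
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        rows = [rng.randrange(len(X)) for _ in range(n_samples)]  # with replacement
        cols = rng.sample(range(len(X[0])), n_features)  # without replacement
        Xs = [[X[r][c] for c in cols] for r in rows]
        ys = [y[r] for r in rows]
        learners.append((cols, KernelMeanClassifier(Xs, ys, width)))
    return learners


def predict(learners, x):
    """Majority vote over base learners, each seeing only its own feature subset."""
    votes = Counter(clf.predict_one([x[c] for c in cols]) for cols, clf in learners)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 12 features; class 1 centred at 5.0, class 0 centred at 0.0.
    X = [[5.0 * lab + rng.uniform(-0.5, 0.5) for _ in range(12)]
         for lab in (0, 1) for _ in range(30)]
    y = [lab for lab in (0, 1) for _ in range(30)]
    model = train_random_ensemble(X, y)
    print(predict(model, [5.0] * 12), predict(model, [0.0] * 12))  # → 1 0
```

Randomizing both rows and columns decorrelates the base learners, which is what makes the majority vote more robust than any single learner; replacing `KernelMeanClassifier` with a real RVM would not change the ensembling logic.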