Atul Tyagi, Pallavi Kapoor, Rahul Kumar, Kumardeep Chaudhary, Ankur Gautam, G P S Raghava.
Abstract
The use of therapeutic peptides in cancer therapy has received considerable attention in recent years. The present study describes the development of computational models for predicting and discovering novel anticancer peptides. Preliminary analysis revealed that Cys, Gly, Ile, Lys, and Trp dominate at various positions in anticancer peptides. Support vector machine models were developed using amino acid composition and binary profiles as input features on a main dataset containing experimentally validated anticancer peptides and random peptides derived from the SwissProt database. In addition, models were developed on an alternate dataset that contains antimicrobial peptides instead of random peptides. The binary profile-based model achieved a maximum accuracy of 91.44% with an MCC of 0.83. We have developed a webserver that should be helpful in: (i) predicting the minimum mutations required to improve anticancer potency; (ii) virtual screening of peptides to discover novel anticancer peptides; and (iii) scanning natural proteins to identify anticancer peptides (http://crdd.osdd.net/raghava/anticp/).
Year: 2013 PMID: 24136089 PMCID: PMC6505669 DOI: 10.1038/srep02984
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Comparison of average whole amino acid composition of anticancer, non-anticancer, and antimicrobial peptides.
Figure 2. Comparison of average amino acid composition of ten (A) N- and (B) C-terminal residues of anticancer, non-anticancer, and antimicrobial peptides.
Figure 3. Sequence logo of (A) the first ten residues of the N-terminus and (B) the last ten residues of the C-terminus of anticancer peptides, where the size of each residue is proportional to its propensity.
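
The composition features compared in Figures 1 and 2 can be illustrated with a minimal sketch. It assumes that amino acid composition means the percentage of each of the 20 standard residues, and that labels such as NT5, CT10, or NT5CT5 in the tables denote the first and/or last residues of a peptide joined together. The function names and the example sequence are hypothetical and are not taken from the AntiCP code.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def terminal_window(peptide, n_term=0, c_term=0):
    """Return the first n_term and/or last c_term residues, joined.

    With both arguments at 0 the whole peptide is returned unchanged.
    """
    if n_term == 0 and c_term == 0:
        return peptide
    head = peptide[:n_term] if n_term else ""
    tail = peptide[-c_term:] if c_term else ""
    return head + tail


def amino_acid_composition(peptide):
    """Percentage of each standard residue, as a 20-value feature vector."""
    if not peptide:
        raise ValueError("peptide must contain at least one residue")
    counts = Counter(peptide)
    return [100.0 * counts[aa] / len(peptide) for aa in AMINO_ACIDS]


# Hypothetical example sequence, used only to show the call pattern.
example = "KWCGILKKWGIAKLC"
nt5ct5_features = amino_acid_composition(terminal_window(example, 5, 5))
```

Each window yields a fixed-length vector regardless of peptide length, which is what makes it usable as input to a classifier such as a support vector machine.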
Performance of amino acid composition-based models on the main dataset
| Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|
| Balanced dataset-1 | | | | | |
| Whole peptide | 88.00 | 89.78 | 88.89 | 0.78 | 0.94 |
| NT5 | 81.33 | 80.44 | 80.89 | 0.62 | 0.86 |
| CT5 | 71.11 | 73.78 | 72.44 | 0.45 | 0.78 |
| NT5CT5 | 82.22 | 83.56 | 82.89 | 0.66 | 0.88 |
| NT10 | 89.37 | 84.34 | 86.91 | 0.74 | 0.92 |
| CT10 | 79.23 | 85.86 | 82.47 | 0.65 | 0.88 |
| NT10CT10 | 89.37 | 87.37 | 88.40 | 0.77 | 0.93 |
| Main dataset | |||||
| Whole peptide | 88.89 | 85.29 | 85.62 | 0.52 | 0.95 |
| NT5 | 73.78 | 88.22 | 86.91 | 0.47 | 0.86 |
| CT5 | 61.78 | 87.64 | 85.29 | 0.38 | 0.80 |
| NT5CT5 | 74.67 | 94.44 | 92.65 | 0.61 | 0.90 |
| NT10 | 82.61 | 92.76 | 91.82 | 0.63 | 0.91 |
| CT10 | 78.26 | 83.80 | 83.29 | 0.43 | 0.89 |
| NT10CT10 | 88.89 | 90.55 | 90.39 | 0.62 | 0.94 |
Performance of amino acid composition-based models on the alternate dataset
| Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|
| Balanced dataset-2 | | | | | |
| Whole peptide | 84.44 | 86.22 | 85.33 | 0.71 | 0.90 |
| NT5 | 84.00 | 84.89 | 84.44 | 0.69 | 0.89 |
| CT5 | 85.33 | 69.78 | 77.56 | 0.56 | 0.83 |
| NT5CT5 | 84.00 | 84.89 | 84.44 | 0.69 | 0.89 |
| NT10 | 81.64 | 86.67 | 84.26 | 0.68 | 0.90 |
| CT10 | 77.29 | 85.78 | 81.71 | 0.63 | 0.87 |
| NT10CT10 | 85.51 | 89.78 | 87.73 | 0.75 | 0.92 |
| Alternate dataset | |||||
| Whole peptide | 73.78 | 76.02 | 75.70 | 0.37 | 0.79 |
| NT5 | 68.00 | 62.03 | 62.87 | 0.21 | 0.70 |
| CT5 | 69.08 | 72.25 | 71.83 | 0.30 | 0.76 |
| NT5CT5 | 81.33 | 60.93 | 63.81 | 0.30 | 0.79 |
| NT10 | 69.08 | 72.25 | 71.83 | 0.30 | 0.76 |
| CT10 | 74.88 | 72.69 | 72.98 | 0.34 | 0.79 |
| NT10CT10 | 75.36 | 70.77 | 71.38 | 0.33 | 0.80 |
Figure 4. ROC plots showing the performance of models developed using (A) amino acid composition, (B) dipeptide composition, and (C) binary profiles of patterns (NT10 dataset).
Performance of dipeptide composition-based models on main dataset
| Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|
| Balanced dataset-1 | | | | | |
| Whole peptide | 88.44 | 87.11 | 87.78 | 0.76 | 0.93 |
| NT5 | 73.33 | 86.67 | 80.00 | 0.61 | 0.86 |
| CT5 | 60.44 | 79.56 | 70.00 | 0.41 | 0.73 |
| NT5CT5 | 78.22 | 88.44 | 83.33 | 0.67 | 0.89 |
| NT10 | 81.16 | 88.89 | 84.94 | 0.70 | 0.91 |
| CT10 | 71.50 | 86.87 | 79.01 | 0.59 | 0.85 |
| NT10CT10 | 83.09 | 88.38 | 85.68 | 0.72 | 0.91 |
| Main dataset | |||||
| Whole peptide | 90.22 | 84.80 | 85.29 | 0.52 | 0.94 |
| NT5 | 71.11 | 88.89 | 87.27 | 0.46 | 0.85 |
| CT5 | 66.22 | 82.04 | 80.61 | 0.33 | 0.81 |
| NT5CT5 | 80.00 | 85.69 | 85.17 | 0.47 | 0.89 |
| NT10 | 83.09 | 88.63 | 88.11 | 0.54 | 0.92 |
| CT10 | 75.85 | 86.12 | 85.17 | 0.45 | 0.87 |
| NT10CT10 | 84.54 | 85.43 | 85.34 | 0.50 | 0.91 |
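
For readers unfamiliar with the feature behind this table, the following is a minimal sketch of dipeptide composition. It assumes the common definition: the percentage of each of the 400 ordered residue pairs among all overlapping pairs in the sequence. The function name is hypothetical, and the original implementation may differ in detail.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(peptide):
    """Percentage of each ordered residue pair, as a 400-value feature vector."""
    total = len(peptide) - 1  # number of overlapping adjacent pairs
    if total < 1:
        raise ValueError("peptide must contain at least two residues")
    counts = Counter(peptide[i:i + 2] for i in range(total))
    return [100.0 * counts[dp] / total for dp in DIPEPTIDES]
```

Unlike plain amino acid composition, this vector keeps some local order information, at the cost of being sparse for short windows such as NT5, which contain only four pairs.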
Performance of dipeptide composition-based model on alternate dataset
| Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|
| Balanced dataset-2 | | | | | |
| Whole peptide | 88.89 | 84.89 | 86.89 | 0.74 | 0.91 |
| NT5 | 87.11 | 86.67 | 86.89 | 0.74 | 0.89 |
| CT5 | 76.00 | 75.56 | 75.78 | 0.52 | 0.83 |
| NT5CT5 | 87.56 | 86.22 | 86.89 | 0.74 | 0.91 |
| NT10 | 84.06 | 86.67 | 85.42 | 0.71 | 0.91 |
| CT10 | 84.06 | 75.56 | 79.63 | 0.60 | 0.86 |
| NT10CT10 | 85.51 | 83.56 | 84.49 | 0.69 | 0.89 |
| Alternate dataset | |||||
| Whole peptide | 77.78 | 74.78 | 75.20 | 0.39 | 0.79 |
| NT5 | 74.22 | 62.17 | 63.87 | 0.26 | 0.75 |
| CT5 | 71.50 | 70.70 | 70.81 | 0.30 | 0.78 |
| NT5CT5 | 73.78 | 63.41 | 64.87 | 0.26 | 0.77 |
| NT10 | 71.50 | 70.70 | 70.81 | 0.30 | 0.78 |
| CT10 | 69.57 | 64.58 | 65.24 | 0.24 | 0.74 |
| NT10CT10 | 78.26 | 64.94 | 66.71 | 0.30 | 0.79 |
Performance of binary profile-based model on main dataset
| Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|
| Balanced dataset-1 | | | | | |
| NT5 | 78.67 | 83.11 | 80.89 | 0.62 | 0.87 |
| CT5 | 63.11 | 86.22 | 74.67 | 0.51 | 0.79 |
| NT5CT5 | 81.78 | 86.22 | 84.00 | 0.68 | 0.89 |
| NT10 | 81.16 | 86.67 | 83.95 | 0.68 | 0.91 |
| CT10 | 75.36 | 84.34 | 79.75 | 0.60 | 0.84 |
| NT10CT10 | 81.64 | 88.38 | 84.94 | 0.70 | 0.91 |
| Main dataset | |||||
| NT5 | 80.00 | 87.33 | 86.67 | 0.50 | 0.89 |
| CT5 | 70.67 | 84.76 | 83.47 | 0.40 | 0.83 |
| NT5CT5 | 74.22 | 89.16 | 87.80 | 0.49 | 0.88 |
| NT10 | 80.19 | 92.52 | 91.38 | 0.60 | 0.90 |
| CT10 | 75.85 | 85.33 | 84.45 | 0.44 | 0.87 |
| NT10CT10 | 81.16 | 89.76 | 88.96 | 0.55 | 0.89 |
Performance of binary profile-based model on alternate dataset
| Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC |
|---|---|---|---|---|---|
| Balanced dataset-2 | | | | | |
| NT5 | 87.11 | 89.78 | 88.44 | 0.77 | 0.93 |
| CT5 | 82.67 | 73.78 | 78.22 | 0.57 | 0.83 |
| NT5CT5 | 88.89 | 89.78 | 89.33 | 0.79 | 0.93 |
| NT10 | 89.37 | 93.33 | 91.44 | 0.83 | 0.94 |
| CT10 | 85.51 | 72.44 | 78.70 | 0.58 | 0.86 |
| NT10CT10 | 85.02 | 96.00 | 90.74 | 0.82 | 0.94 |
| Alternate dataset | |||||
| NT5 | 67.56 | 73.69 | 72.82 | 0.31 | 0.75 |
| CT5 | 71.56 | 71.43 | 71.45 | 0.31 | 0.75 |
| NT5CT5 | 70.22 | 75.87 | 75.08 | 0.35 | 0.79 |
| NT10 | 71.01 | 71.96 | 71.83 | 0.31 | 0.77 |
| CT10 | 65.22 | 78.08 | 76.38 | 0.33 | 0.77 |
| NT10CT10 | 75.85 | 69.23 | 70.10 | 0.32 | 0.79 |
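
The binary profile feature used in the two tables above can be sketched as a one-hot encoding. This is an assumption about the encoding: each residue becomes a 20-bit vector with a single 1 at the position of that amino acid, so a fixed-length window such as NT10 gives 200 values. That fixed length would also explain why no whole-peptide row appears for this feature. The function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def binary_profile(segment):
    """Concatenate a 20-bit one-hot vector for every residue in the segment."""
    vector = []
    for residue in segment:
        one_hot = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for non-standard residue codes.
        one_hot[AMINO_ACIDS.index(residue)] = 1
        vector.extend(one_hot)
    return vector
```

Because the position of each residue is preserved, this representation can capture the position-specific preferences described in the abstract, which composition-based features average away.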
Figure 5. Schematic representation of the AntiCP webserver (developed with ScienceSlides software, http://www.visiscience.com/) and its various modules.