Azhagiya Singam Ettayapuram Ramaprasad, Sandeep Singh, Raghava Gajendra P S, Subramanian Venkatesan.
Abstract
Angiogenesis is a vital step in the formation of malignant tumors, so anti-angiogenic peptides are promising candidates for the treatment of cancer. In this study, we collected anti-angiogenic peptides from the literature and analyzed their residue preferences. Cys, Pro, Ser, Arg, Trp, Thr and Gly are preferred in these peptides, whereas Ala, Asp, Ile, Leu, Val and Phe are not. Ser, Pro, Trp and Cys show a positional preference for the N-terminal region, and Cys, Gly and Arg for the C-terminal region. Motif analysis shows that motifs such as "CG-G", "TC", "SC" and "SP-S" are highly prominent in anti-angiogenic peptides. Based on this primary analysis, we developed prediction models using several machine learning methods. The best amino acid composition based model achieved an accuracy of 80.9% and an MCC of 0.62. The performance of the models on an independent dataset is also reasonable. Based on this study, we developed a user-friendly web server named "AntiAngioPred" for the prediction of anti-angiogenic peptides. The AntiAngioPred web server is freely accessible at http://clri.res.in/subramanian/tools/antiangiopred/index.html (mirror site: http://crdd.osdd.net/raghava/antiangiopred/).
Year: 2015 PMID: 26335203 PMCID: PMC4559406 DOI: 10.1371/journal.pone.0136990
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Amino acid composition analysis of the residues in anti-angiogenic and non-anti-angiogenic peptides.
The compositions of the entire Swiss-Prot database and of all random datasets are included for reference.
Performance of various machine learning classifiers using amino acid composition as input feature on whole peptide dataset.
| Methods | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| SVM | 69.2 | 78.5 | 73.8 | 0.48 |
| IBk | 69.2 | 74.8 | 72.0 | 0.44 |
| Random Forest | 70.1 | 77.6 | 73.8 | 0.48 |
| Logistic | 70.1 | 74.8 | 72.4 | 0.45 |
| Multilayer Perceptron | 70.1 | 78.5 | 74.3 | 0.49 |
| Naïve Bayes | 65.4 | 72.0 | 68.7 | 0.37 |
| J48 | 62.6 | 73.8 | 68.2 | 0.37 |
Sn: Sensitivity; Sp: Specificity; Acc: Accuracy; MCC: Matthews Correlation Coefficient.
Performance of SVM based models using amino acid composition as input feature on nine terminus datasets.
| Approach | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| NT5 | 69.2 | 72.0 | 70.6 | 0.41 |
| CT5 | 62.9 | 68.2 | 65.6 | 0.31 |
| NTCT5 | 72.0 | 71.0 | 71.5 | 0.43 |
| NT10 | 71.0 | 76.6 | 73.8 | 0.48 |
| CT10 | 69.8 | 70.1 | 70.0 | 0.40 |
| NTCT10 | 68.2 | 76.6 | 72.4 | 0.45 |
| NT15 | 79.0 | 82.7 | 80.9 | 0.62 |
| CT15 | 67.9 | 72.8 | 70.4 | 0.41 |
| NTCT15 | 75.3 | 80.3 | 77.8 | 0.56 |
Sn: Sensitivity; Sp: Specificity; Acc: Accuracy; MCC: Matthews Correlation Coefficient.
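
The approach labels in the table above are not defined in this record. One plausible reading, treated here as an assumption, is that NT*n* and CT*n* denote the first and last *n* residues of a peptide and NTCT*n* denotes the two segments joined. Under that assumption, a minimal sketch of extracting a terminus segment and computing its amino acid composition (the percentage of each of the 20 standard residues) might look like this. The function names and the example peptide are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def terminus_segment(sequence: str, n: int, mode: str) -> str:
    """Return n residues from the N terminus ("NT"), C terminus ("CT"), or both joined ("NTCT")."""
    if n <= 0 or len(sequence) < n:
        raise ValueError("window size must be positive and no longer than the sequence")
    if mode == "NT":
        return sequence[:n]
    if mode == "CT":
        return sequence[-n:]
    if mode == "NTCT":
        return sequence[:n] + sequence[-n:]
    raise ValueError(f"unknown mode: {mode!r}")


def amino_acid_composition(sequence: str) -> list:
    """Return a 20-element vector: percentage of each standard residue in the sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    return [100.0 * sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


# Hypothetical peptide, invented for illustration only.
example_peptide = "SPWSCGRTCGGGVSTRC"
nt15 = terminus_segment(example_peptide, 15, "NT")
features = amino_acid_composition(nt15)
print(nt15, len(features))
```

Because the composition vector always has 20 entries regardless of peptide length, it can be fed to a classifier for whole peptides and terminus segments alike.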
Performance of SVM based models using dipeptide composition as input feature on whole peptide dataset and nine terminus datasets.
| Approach | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| Whole peptide | 75.7 | 73.8 | 74.8 | 0.50 |
| NT5 | 66.4 | 66.4 | 66.4 | 0.33 |
| CT5 | 57.1 | 57.9 | 57.6 | 0.15 |
| NTCT5 | 64.5 | 60.8 | 62.6 | 0.25 |
| NT10 | 70.1 | 73.8 | 72.0 | 0.44 |
| CT10 | 63.2 | 70.1 | 66.7 | 0.33 |
| NTCT10 | 65.4 | 74.8 | 70.1 | 0.40 |
| NT15 | 75.3 | 72.8 | 74.1 | 0.48 |
| CT15 | 74.1 | 77.8 | 75.9 | 0.52 |
| NTCT15 | 72.8 | 76.5 | 74.7 | 0.49 |
Sn: Sensitivity; Sp: Specificity; Acc: Accuracy; MCC: Matthews Correlation Coefficient.
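
Dipeptide composition is commonly computed as the percentage of each of the 400 possible ordered residue pairs among the overlapping pairs of a sequence. The sketch below follows that common definition, which is an assumption here because this record does not spell out the exact formula. The function name is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard residues, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> list:
    """Return a 400-element vector: percentage of each dipeptide among overlapping pairs."""
    total = len(sequence) - 1  # number of overlapping pairs
    if total < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = sequence[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [100.0 * counts[dp] / total for dp in DIPEPTIDES]


dpc = dipeptide_composition("ACAC")  # overlapping pairs: AC, CA, AC
print(len(dpc), round(dpc[DIPEPTIDES.index("AC")], 1))
```

Note that a 5-residue segment contains only four overlapping pairs, so its 400-element vector is almost entirely zeros.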
Performance of SVM based models using binary profile as input feature on nine terminus datasets.
| Approach | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| NT5 | 66.4 | 72.9 | 69.7 | 0.39 |
| CT5 | 68.6 | 69.2 | 68.9 | 0.38 |
| NTCT5 | 75.7 | 79.4 | 77.6 | 0.55 |
| NT10 | 65.4 | 72.0 | 68.7 | 0.37 |
| CT10 | 55.7 | 61.7 | 58.7 | 0.17 |
| NTCT10 | 65.4 | 71.0 | 68.2 | 0.37 |
| NT15 | 70.4 | 72.8 | 71.6 | 0.43 |
| CT15 | 69.1 | 71.6 | 70.4 | 0.41 |
| NTCT15 | 71.6 | 76.5 | 74.1 | 0.48 |
Sn: Sensitivity; Sp: Specificity; Acc: Accuracy; MCC: Matthews Correlation Coefficient.
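
A binary profile is usually a position-wise one-hot encoding in which each residue becomes a 20-element vector with a single 1. The sketch below assumes that encoding; the record itself does not describe the implementation, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_profile(segment: str) -> list:
    """One-hot encode each position: 20 binary values per residue, concatenated in order."""
    profile = []
    for residue in segment:
        one_hot = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:  # assumption: non-standard residues are left as all zeros
            one_hot[index] = 1
        profile.extend(one_hot)
    return profile


encoded = binary_profile("SPWSC")  # a 5-residue segment gives 5 * 20 = 100 features
print(len(encoded), sum(encoded))
```

This encoding needs inputs of equal length, which would be consistent with the table above reporting binary profile results only for fixed-length terminus datasets.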
Performance of SVM based models on independent dataset.
| Approach | Feature | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|---|
| Independent Dataset | | | | | |
| Whole peptide | AAC | 53.6 | 85.7 | 69.6 | 0.41 |
| Whole peptide | DPC | 64.3 | 75.0 | 69.6 | 0.41 |
| NT15 | AAC | 65.0 | 85.0 | 75.0 | 0.51 |
Sn: Sensitivity; Sp: Specificity; Acc: Accuracy; MCC: Matthews Correlation Coefficient.
Performance of SVM based models using amino acid composition as input feature, with different random datasets used as the negative dataset.
| Dataset | Sn (%) | Sp (%) | Acc (%) | MCC |
|---|---|---|---|---|
| Random Datasets | | | | |
| Random1 | 70.1 | 71.8 | 71.0 | 0.42 |
| Random2 | 70.1 | 79.4 | 74.8 | 0.50 |
| Random3 | 75.7 | 77.6 | 76.6 | 0.53 |
| Random4 | 75.7 | 81.3 | 78.5 | 0.57 |
| Random5 | 72.9 | 73.8 | 73.4 | 0.47 |
Sn: Sensitivity; Sp: Specificity; Acc: Accuracy; MCC: Matthews Correlation Coefficient.