Vishuda Laengsri, Chanin Nantasenamat, Nalini Schaduangrat, Pornlada Nuchnoi, Virapong Prachayasittikul, Watshara Shoombuatong.
Abstract
Cancer remains one of the major causes of death worldwide. Angiogenesis is crucial for the pathogenesis of various human diseases, especially solid tumors. The discovery of anti-angiogenic peptides is a promising therapeutic route for cancer treatment. Thus, reliably identifying anti-angiogenic peptides is extremely important for understanding their biophysical and biochemical properties that serve as the basis for the discovery of new anti-cancer drugs. This study aims to develop an efficient and interpretable computational model called TargetAntiAngio for predicting and characterizing anti-angiogenic peptides. TargetAntiAngio was developed using the random forest classifier in conjunction with various classes of peptide features. It was observed via an independent validation test that TargetAntiAngio can identify anti-angiogenic peptides with an average accuracy of 77.50% on an objective benchmark dataset. Comparisons demonstrated that TargetAntiAngio is superior to other existing methods. In addition, results revealed the following important characteristics of anti-angiogenic peptides: (i) disulfide bond forming Cys residues play an important role for inhibiting blood vessel proliferation; (ii) Cys located at the C-terminal domain can decrease endothelial formatting activity and suppress tumor growth; and (iii) Cyclic disulfide-rich peptides contribute to the inhibition of angiogenesis and cell migration, selectivity and stability. Finally, for the convenience of experimental scientists, the TargetAntiAngio web server was established and made freely available online.
Keywords: anti-angiogenic peptide; classification; interpretable model; machine learning; random forest; therapeutic peptides
Year: 2019 PMID: 31212918 PMCID: PMC6628072 DOI: 10.3390/ijms20122950
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Angiogenesis is regulated by a local equilibrium between pro-angiogenic molecules, such as vascular endothelial growth factor (VEGF), platelet-derived endothelial growth factor (PDGF), fibroblast growth factor (FGF), and angiopoietins, and anti-angiogenic molecules, such as endostatin, PF4 and TSP-1. It is switched on when tumor cells require oxygen and nutrients. Tumor cells produce VEGF and secrete it into surrounding tissues. When VEGF binds to its receptor on the outer surface of endothelial cells, it activates them, which drives the development of new blood vessels from pre-existing vasculature. Blood vessels gradually grow and extend to the tumor cells, which continuously proliferate and spread into the blood circulation. Cancer progression is induced by an overexpression of pro-angiogenic factors (a). Disrupting the vascular supply, either by blocking pro-angiogenic factors or by using anti-angiogenic factors as therapeutic drugs, is anticipated to increase the survival rate of cancer patients. An anti-angiogenic factor binds to VEGF, thereby inhibiting neovascularization and tumor growth and, in turn, decreasing metastasis. Eventually, tumor cells deprived of fuel (e.g., oxygen and nutrients) gradually regress and undergo necrosis (b).
Summary of existing methods for predicting anti-angiogenic peptides.
| Method | Classifier a | Sequence Feature (No. of Feature Used) b | Independent Test | Web Server |
|---|---|---|---|---|
| AntiAngioPred | SVM | AAC (20) | Yes | Yes |
| Blanco et al.’s method | glmnet | AAC, DPC, TC (200) | No | No |
| AntAngioCOOL | PART | PseAAC, | No | No |
| TargetAntiAngio (this study) | RF | AAC, PseAAC, Am-PseAAC (48) | Yes | Yes |
a glmnet: a generalized linear model, PART: recursive partitioning for classification, regression and survival trees, RF: random forest, SVM: support vector machine. b AAC: amino acid composition, AC: atomic profile, Am-PseAAC: amphiphilic pseudo amino acid composition, DPC: dipeptide composition, PCP: physicochemical properties, PseAAC: pseudo amino acid composition, RACC: reduced amino acid composition, TC: tripeptide composition. The method is assessed by an independent validation test with N rounds of random splits.
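To make two of the sequence features in the footnote concrete, the following is a minimal sketch of amino acid composition (AAC, 20 features) and dipeptide composition (DPC, 400 features), computed as plain frequencies. The function names `aac` and `dpc` and the example peptide are hypothetical, and the paper's own feature-extraction code may differ in detail (for example, in how non-standard residues are handled).

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (one-letter codes)


def aac(seq: str) -> dict:
    """Amino acid composition: fraction of each standard residue in the peptide."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Assumption: the denominator is the full peptide length.
    return {a: seq.count(a) / len(seq) for a in AMINO_ACIDS}


def dpc(seq: str) -> dict:
    """Dipeptide composition: fraction of each adjacent residue pair."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=2)}
    for pair in pairs:
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return {k: v / len(pairs) for k, v in counts.items()}


peptide = "CSSRAC"  # hypothetical peptide, for illustration only
aac_features = aac(peptide)
dpc_features = dpc(peptide)
```

Each peptide is thereby mapped to a fixed-length numeric vector regardless of its length, which is what allows a classifier such as a random forest to be trained on peptides of varying size.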
Figure 2. Schematic framework of TargetAntiAngio.
Performance comparison of RF models built with various types of sequence features. Models were evaluated by means of five-fold cross-validation and independent validation test using benchmark and NT15 datasets subjected to one round of random split.
| Feature | Dataset | 5-Fold CV Ac (%) | 5-Fold CV MCC | 5-Fold CV auROC | Independent Test Ac (%) | Independent Test Sn (%) | Independent Test Sp (%) | Independent Test MCC | Independent Test auROC |
|---|---|---|---|---|---|---|---|---|---|
| AAC | Benchmark | 71.03 | 0.42 | 0.81 | 72.12 | 67.86 | 76.92 | 0.45 | 0.77 |
| AAC | NT15 | 75.00 | 0.50 | 0.80 | 77.50 | 90.48 | 63.16 | 0.56 | 0.82 |
| PseAAC | Benchmark | 73.83 | 0.48 | 0.78 | 72.22 | 85.71 | 57.69 | 0.45 | 0.81 |
| PseAAC | NT15 | 73.75 | 0.48 | 0.80 | 72.50 | 85.71 | 57.90 | 0.46 | 0.83 |
| Am-PseAAC | Benchmark | 71.96 | 0.44 | 0.76 | 72.22 | 82.14 | 61.54 | 0.45 | 0.76 |
| Am-PseAAC | NT15 | 72.50 | 0.45 | 0.80 | 75.00 | 76.19 | 73.68 | 0.50 | 0.80 |
| DPC | Benchmark | 68.22 | 0.37 | 0.75 | 70.37 | 82.14 | 57.69 | 0.41 | 0.72 |
| DPC | NT15 | 72.50 | 0.45 | 0.79 | 72.50 | 95.24 | 47.37 | 0.49 | 0.69 |
| PCP | Benchmark | 60.75 | 0.22 | 0.67 | 61.11 | 67.86 | 53.85 | 0.22 | 0.65 |
| PCP | NT15 | 67.50 | 0.36 | 0.72 | 67.50 | 76.19 | 57.90 | 0.35 | 0.74 |
| AAC+PseAAC | Benchmark | 72.43 | 0.45 | 0.79 | 75.93 | 85.71 | 65.39 | 0.52 | 0.80 |
| AAC+PseAAC | NT15 | 74.38 | 0.50 | 0.77 | 77.00 | 85.71 | 68.42 | 0.55 | 0.83 |
| AAC+Am-PseAAC | Benchmark | 70.09 | 0.41 | 0.76 | 74.07 | 89.29 | 57.69 | 0.50 | 0.83 |
| AAC+Am-PseAAC | NT15 | 74.38 | 0.50 | 0.81 | 75.00 | 71.43 | 78.95 | 0.50 | 0.79 |
| PseAAC+Am-PseAAC | Benchmark | 72.90 | 0.46 | 0.77 | 77.78 | 82.14 | 73.08 | 0.56 | 0.83 |
| PseAAC+Am-PseAAC | NT15 | 75.00 | 0.50 | 0.82 | 75.00 | 85.71 | 63.16 | 0.50 | 0.85 |
| AAC+PseAAC+Am-PseAAC | Benchmark | 71.03 | 0.42 | 0.78 | 74.07 | 75.00 | 73.08 | 0.48 | 0.82 |
| AAC+PseAAC+Am-PseAAC | NT15 | 75.00 | 0.50 | 0.82 | 77.50 | 90.48 | 63.16 | 0.56 | 0.84 |
Parameters of PseAAC (weight1 and lamda1) and Am-PseAAC (weight2 and lamda2) were optimized by varying their values and assessed by a 5-fold CV procedure. Values of weight1, weight2, lamda1, and lamda2 as performed on the benchmark and NT15 datasets are (0.9, 0.9, 1, and 1) and (0.1, 0.2, 2, and 3), respectively.
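The Ac, Sn, Sp and MCC columns above can be reproduced from a confusion matrix. The sketch below assumes the standard definitions of accuracy, sensitivity, specificity and Matthews correlation coefficient, which this section does not spell out. The counts are hypothetical: they are not reported in the source and were chosen only because they round to one NT15 row of the table (77.50, 90.48, 63.16, 0.56). The function name `binary_metrics` is likewise an illustration.

```python
from math import sqrt


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return Ac, Sn, Sp (in %) and MCC from confusion-matrix counts."""
    total = tp + fn + tn + fp
    ac = 100.0 * (tp + tn) / total   # accuracy: all correct / all samples
    sn = 100.0 * tp / (tp + fn)      # sensitivity: correct positives / positives
    sp = 100.0 * tn / (tn + fp)      # specificity: correct negatives / negatives
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {"Ac": ac, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical counts for a 40-peptide test set (assumed, not from the source).
m = binary_metrics(tp=19, fn=2, tn=12, fp=7)
print({name: round(value, 2) for name, value in m.items()})
```

The auROC column is not covered here because it needs the per-peptide prediction scores, not just the four counts.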
Performance comparison of RF models built with various types of sequence features. Models were evaluated by means of five-fold cross-validation and independent validation test using benchmark and NT15 datasets subjected to ten rounds of random splits.
| Feature | Dataset | 5-Fold CV Ac (%) | 5-Fold CV MCC | 5-Fold CV auROC | Independent Test Ac (%) | Independent Test Sn (%) | Independent Test Sp (%) | Independent Test MCC | Independent Test auROC |
|---|---|---|---|---|---|---|---|---|---|
| AAC | Benchmark | 70.84 ± 1.54 | 0.42 ± 0.03 | 0.79 ± 0.01 | 73.33 ± 1.01 | 77.14 ± 8.60 | 69.23 ± 9.81 | 0.47 ± 0.02 | 0.79 ± 0.02 |
| AAC | NT15 | 74.12 ± 2.10 | 0.49 ± 0.04 | 0.80 ± 0.02 | 77.00 ± 2.09 | 84.76 ± 6.21 | 68.42 ± 8.32 | 0.55 ± 0.04 | 0.82 ± 0.02 |
| PseAAC | Benchmark | 71.78 ± 2.13 | 0.44 ± 0.04 | 0.77 ± 0.01 | 72.96 ± 1.66 | 80.00 ± 4.07 | 65.38 ± 6.08 | 0.46 ± 0.03 | 0.81 ± 0.02 |
| PseAAC | NT15 | 71.62 ± 1.44 | 0.43 ± 0.03 | 0.78 ± 0.02 | 73.50 ± 1.37 | 80.95 ± 8.25 | 65.26 ± 11.53 | 0.48 ± 0.03 | 0.81 ± 0.03 |
| Am-PseAAC | Benchmark | 70.47 ± 2.10 | 0.41 ± 0.04 | 0.75 ± 0.02 | 72.96 ± 2.11 | 75.71 ± 9.58 | 70.00 ± 8.34 | 0.46 ± 0.05 | 0.79 ± 0.04 |
| Am-PseAAC | NT15 | 72.38 ± 2.31 | 0.45 ± 0.05 | 0.81 ± 0.01 | 73.50 ± 1.37 | 76.19 ± 12.14 | 70.53 ± 12.68 | 0.48 ± 0.02 | 0.79 ± 0.04 |
| DPC | Benchmark | 68.32 ± 0.84 | 0.37 ± 0.02 | 0.74 ± 0.01 | 69.63 ± 2.11 | 78.57 ± 9.45 | 60.00 ± 5.83 | 0.40 ± 0.05 | 0.74 ± 0.02 |
| DPC | NT15 | 71.50 ± 2.01 | 0.43 ± 0.04 | 0.78 ± 0.02 | 69.50 ± 3.26 | 78.09 ± 10.43 | 60.00 ± 7.98 | 0.40 ± 0.08 | 0.75 ± 0.07 |
| PCP | Benchmark | 60.19 ± 2.44 | 0.20 ± 0.05 | 0.65 ± 0.02 | 61.85 ± 2.11 | 62.14 ± 10.59 | 61.54 ± 9.81 | 0.24 ± 0.04 | 0.66 ± 0.03 |
| PCP | NT15 | 68.00 ± 1.03 | 0.36 ± 0.02 | 0.74 ± 0.02 | 67.50 ± 2.50 | 72.38 ± 7.82 | 62.11 ± 11.41 | 0.35 ± 0.05 | 0.72 ± 0.07 |
| AAC+PseAAC | Benchmark | 72.24 ± 0.53 | 0.45 ± 0.01 | 0.79 ± 0.01 | 74.81 ± 1.01 | 81.43 ± 4.66 | 67.69 ± 5.83 | 0.50 ± 0.02 | 0.81 ± 0.04 |
| AAC+PseAAC | NT15 | 73.50 ± 1.44 | 0.47 ± 0.03 | 0.79 ± 0.02 | 76.50 ± 1.37 | 84.76 ± 7.82 | 67.37 ± 10.79 | 0.54 ± 0.02 | 0.82 ± 0.05 |
| AAC+Am-PseAAC | Benchmark | 70.37 ± 1.13 | 0.41 ± 0.02 | 0.77 ± 0.02 | 73.33 ± 1.01 | 85.00 ± 4.66 | 60.77 ± 5.70 | 0.48 ± 0.02 | 0.78 ± 0.04 |
| AAC+Am-PseAAC | NT15 | 73.00 ± 0.81 | 0.47 ± 0.02 | 0.80 ± 0.02 | 75.50 ± 1.12 | 83.81 ± 7.97 | 66.32 ± 9.56 | 0.52 ± 0.02 | 0.82 ± 0.06 |
| PseAAC+Am-PseAAC | Benchmark | 73.18 ± 1.57 | 0.47 ± 0.03 | 0.78 ± 0.01 | 73.33 ± 2.81 | 80.71 ± 1.96 | 65.38 ± 5.44 | 0.47 ± 0.05 | 0.78 ± 0.05 |
| PseAAC+Am-PseAAC | NT15 | 73.88 ± 2.14 | 0.48 ± 0.04 | 0.80 ± 0.02 | 75.00 ± 1.77 | 80.95 ± 5.83 | 68.42 ± 6.45 | 0.50 ± 0.04 | 0.80 ± 0.05 |
| AAC+PseAAC+Am-PseAAC | Benchmark | 70.37 ± 1.22 | 0.41 ± 0.02 | 0.77 ± 0.02 | 74.07 ± 1.31 | 82.14 ± 5.05 | 65.38 ± 7.20 | 0.49 ± 0.02 | 0.81 ± 0.01 |
| AAC+PseAAC+Am-PseAAC | NT15 | 74.62 ± 1.57 | 0.50 ± 0.03 | 0.81 ± 0.01 | 77.50 ± 1.77 | 84.76 ± 10.32 | 69.47 ± 8.65 | 0.56 ± 0.03 | 0.83 ± 0.03 |
Parameters of PseAAC (weight1 and lamda1) and Am-PseAAC (weight2 and lamda2) were optimized by varying their values and assessed by a 5-fold CV procedure. Values of weight1, weight2, lamda1, and lamda2 as performed on the benchmark and NT15 datasets are (0.9, 0.9, 1, and 1) and (0.1, 0.2, 2, and 3), respectively.
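The 5-fold CV procedure mentioned in the footnote can be sketched as follows. This is an assumption-laden illustration, not the authors' code: the helper `k_fold_indices` is hypothetical, the folds are not stratified by class, and the sample count of 160 is used only as an example size.

```python
import random


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        validation = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, validation


splits = list(k_fold_indices(160, k=5))
for train, validation in splits:
    # A model would be fitted on `train` and scored on `validation` here;
    # the k scores are then averaged for each candidate parameter setting.
    pass
```

Parameter tuning then amounts to repeating this loop for each candidate (weight, lamda) pair and keeping the pair with the best averaged score.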
Performance comparisons between TargetAntiAngio and AntiAngioPred assessed by 5-fold cross-validation and independent validation tests on NT15 dataset.
| Sampling Time | Method | Cross-Validation Ac (%) | Cross-Validation MCC | Independent Test Ac (%) | Independent Test Sn (%) | Independent Test Sp (%) | Independent Test MCC |
|---|---|---|---|---|---|---|---|
| 1 round | AntiAngioPred a | 80.90 | 0.62 | 75.00 | - | - | 0.51 |
| 1 round | TargetAntiAngio | 75.00 | 0.50 | 77.50 | 90.48 | 63.16 | 0.56 |
| N rounds b | AntiAngioPred a | - | - | 74.96 | 72.90 | 76.80 | 0.50 |
| N rounds b | TargetAntiAngio | 74.62 | 0.50 | 77.50 | 84.76 | 69.47 | 0.56 |
a Results are those reported in the AntiAngioPred study. b N is the number of rounds of random splits used to obtain the prediction results: 5 for AntiAngioPred and 10 for TargetAntiAngio.
Amino acid compositions (expressed as fractions) of anti-angiogenic (Anti-Angio) and non-anti-angiogenic (Non-Anti-Angio) peptides along with their mean decrease of Gini index (MDGI) values. Ranks are given in parentheses.
| Amino acid | Anti-Angio | Non-Anti-Angio | Difference (rank) | p-value | MDGI (rank) |
|---|---|---|---|---|---|
| A-Ala | 0.053 | 0.086 | −0.033 (20) | <0.05 | 9.21 (4) |
| C-Cys | 0.047 | 0.014 | 0.034 (2) | <0.05 | 15.90 (1) |
| D-Asp | 0.047 | 0.052 | −0.005 (13) | 0.568 | 5.02 (12) |
| E-Glu | 0.046 | 0.065 | −0.019 (17) | <0.05 | 6.68 (7) |
| F-Phe | 0.030 | 0.043 | −0.013 (15) | <0.05 | 4.89 (13) |
| G-Gly | 0.081 | 0.073 | 0.008 (7) | 0.420 | 5.18 (11) |
| H-His | 0.030 | 0.024 | 0.007 (8) | 0.373 | 4.58 (14) |
| I-Ile | 0.046 | 0.064 | −0.017 (16) | <0.05 | 5.26 (10) |
| K-Lys | 0.056 | 0.056 | 0.001 (9) | 0.933 | 6.59 (8) |
| L-Leu | 0.067 | 0.095 | −0.028 (19) | <0.05 | 8.41 (5) |
| M-Met | 0.019 | 0.023 | −0.004 (12) | 0.366 | 3.45 (20) |
| N-Asn | 0.037 | 0.040 | −0.003 (11) | 0.657 | 3.71 (18) |
| P-Pro | 0.060 | 0.045 | 0.016 (4) | <0.05 | 6.40 (9) |
| Q-Gln | 0.039 | 0.042 | −0.002 (10) | 0.757 | 4.47 (15) |
| R-Arg | 0.088 | 0.055 | 0.032 (3) | <0.05 | 8.31 (6) |
| S-Ser | 0.096 | 0.057 | 0.039 (1) | <0.05 | 14.43 (2) |
| T-Thr | 0.062 | 0.054 | 0.008 (6) | 0.232 | 3.77 (17) |
| V-Val | 0.048 | 0.073 | −0.025 (18) | <0.05 | 9.58 (3) |
| W-Trp | 0.023 | 0.012 | 0.012 (5) | <0.05 | 3.95 (16) |
| Y-Tyr | 0.023 | 0.029 | −0.007 (14) | 0.210 | 3.45 (19) |
Figure 3. Sequence logo representations of antiangiogenic and non-antiangiogenic peptides. Shown are the sequence logos of the first and last 15 residues at the N- and C-terminal regions from antiangiogenic peptides (a,b) and non-antiangiogenic peptides (c,d).
Ten top-ranked physicochemical properties from the AAindex having the highest MDGI values.
| Rank | AAindex | MDGI | Description |
|---|---|---|---|
| 1 | CHOP780216 | 0.73 | Normalized frequency of the 2nd and 3rd residues in turn (Chou-Fasman, 1978b) |
| 2 | CHOP780215 | 0.61 | Frequency of the 4th residue in turn (Chou-Fasman, 1978b) |
| 3 | MIYS990104 | 0.58 | Optimized relative partition energies—method C (Miyazawa-Jernigan, 1999) |
| 4 | CHOP780214 | 0.54 | Frequency of the 3rd residue in turn (Chou-Fasman, 1978b) |
| 5 | ENGD860101 | 0.54 | Hydrophobicity index (Engelman et al., 1986) |
| 6 | OLSK800101 | 0.53 | Average internal preferences (Olsen, 1980) |
| 7 | MIYS990105 | 0.53 | Optimized relative partition energies—method D (Miyazawa-Jernigan, 1999) |
| 8 | LEVM780104 | 0.52 | Normalized frequency of alpha-helix, unweighted (Levitt, 1978) |
| 9 | MIYS990101 | 0.52 | Relative partition energies derived by the Bethe approximation (Miyazawa-Jernigan, 1999) |
| 10 | KIDA850101 | 0.50 | Hydrophobicity-related index (Kidera et al., 1985) |
MDGI: Mean decrease of Gini index.
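For readers unfamiliar with MDGI, the sketch below shows the quantity it is built from: the decrease in Gini impurity produced by one binary split. It assumes the usual random-forest definition, in which a feature's MDGI sums these decreases over all splits that use the feature and averages over trees. The function names and the toy labels are hypothetical.

```python
def gini(labels: list) -> float:
    """Gini impurity of a node holding binary labels (1 = positive, 0 = negative)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)          # fraction of positives in the node
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def gini_decrease(parent: list, left: list, right: list) -> float:
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


parent = [1, 1, 1, 0, 0, 0]                # toy node: 3 positives, 3 negatives
perfect = gini_decrease(parent, [1, 1, 1], [0, 0, 0])   # split separates the classes
weak = gini_decrease(parent, [1, 1, 0], [1, 0, 0])      # split barely helps
```

A feature that repeatedly yields splits like `perfect` accumulates a high MDGI, which is why larger values in the table indicate more informative properties.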
Figure 4. Heat map of the mean decrease of Gini index (MDGI) of dipeptide compositions. It should be noted that features with the largest value of MDGI are the most important.
Figure 5. Three-dimensional structures of established anti-angiogenic inhibitors consisting of endostatin (PDB id 1KOE) (a), somatostatin (PDB id 2MI1) (b), and Platelet factor-4 (PDB id 1RHP) (c). α-helix, β-sheet, and loop are shown in blue, red and yellow colors, respectively.
Figure 6. Screenshot of the TargetAntiAngio web server before (a) and after (b) submission of the input query sequence.
Summary of two datasets for evaluating the predictors of anti-angiogenic peptides as obtained from Ramaprasad et al.
| Dataset | Benchmark: Anti-angio | Benchmark: Non-anti-angio | NT15: Anti-angio | NT15: Non-anti-angio |
|---|---|---|---|---|
| Original data | 137 | 137 | 99 | 101 |
| Training set | 101 | 101 | 80 | 80 |
| Testing set | 36 | 36 | 19 | 21 |
Anti-angio and non-anti-angio represent anti-angiogenic and non-anti-angiogenic peptides, respectively.
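The split sizes in the table can be illustrated with a short sketch. It assumes that each round draws a fixed number of peptides per class at random for training and keeps the remainder for testing; the exact procedure is not described in this section. The helper `split_per_class` and the placeholder peptide identifiers are hypothetical. The sizes follow the second dataset in the table (99 and 101 peptides, 80 per class for training).

```python
import random


def split_per_class(positives: list, negatives: list, n_train_per_class: int, seed: int):
    """Randomly pick n_train_per_class items of each class for training; the rest is the test set."""
    rng = random.Random(seed)
    pos, neg = positives[:], negatives[:]    # copy so the inputs stay untouched
    rng.shuffle(pos)
    rng.shuffle(neg)
    n = n_train_per_class
    train = [(s, 1) for s in pos[:n]] + [(s, 0) for s in neg[:n]]
    test = [(s, 1) for s in pos[n:]] + [(s, 0) for s in neg[n:]]
    return train, test


# Placeholder identifiers standing in for real peptide sequences.
positives = [f"pos_{i}" for i in range(99)]
negatives = [f"neg_{i}" for i in range(101)]

# Ten rounds of random splits, one seed per round.
rounds = [split_per_class(positives, negatives, 80, seed) for seed in range(10)]
train, test = rounds[0]
```

Each round yields 160 training and 40 testing peptides (19 positives and 21 negatives in the test set), and reporting the mean and standard deviation over rounds reduces the dependence of the results on any single split.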