Phasit Charoenkwan1, Wararat Chiangjong2, Chanin Nantasenamat3, Mohammad Ali Moni4, Pietro Lio'5, Balachandran Manavalan6, Watshara Shoombuatong3.
Abstract
Tumor-homing peptides (THPs) are small peptides that specifically recognize and bind cancer cells. Understanding their functional mechanisms requires the accurate identification and characterization of THPs. Although several computational methods for in silico THP identification have been proposed, a major drawback is their lack of model interpretability. In this study, we propose a new, simple and easily interpretable computational approach, called SCMTHP, for identifying and analyzing the tumor-homing activity of peptides using the scoring card method (SCM). To improve the predictability and interpretability of the predictor, we generated propensity scores of the 20 amino acids to be THPs. The SCMTHP-derived propensity scores were then used to identify informative physicochemical properties that give insight into the characteristics underlying the bioactivity of THPs. Benchmarking on independent tests indicated that SCMTHP achieves performance comparable to the state-of-the-art method, with accuracies of 0.827 and 0.798 on the Main and Small benchmark datasets, respectively. Furthermore, SCMTHP outperformed several well-known machine learning-based classifiers (e.g., decision tree, k-nearest neighbor, multi-layer perceptron, naive Bayes and partial least squares regression) in both 10-fold cross-validation and independent tests. Finally, the SCMTHP web server was established and made freely available online. SCMTHP is expected to be a useful tool for the rapid and accurate identification of THPs and for a better understanding of their biophysical and biochemical properties.
Keywords: bioinformatics; machine learning; propensity score; scoring card method; therapeutic peptide; tumor-homing peptide
Year: 2022 PMID: 35057016 PMCID: PMC8779003 DOI: 10.3390/pharmaceutics14010122
Source DB: PubMed Journal: Pharmaceutics ISSN: 1999-4923 Impact factor: 6.321
Figure 1. Schematic framework of the development of SCMTHP. The workflow can be broken down into four major steps: (i) preparation of training and independent datasets, (ii) generation and optimization of SCMTHP-based propensity scores, (iii) characterization of THPs and (iv) construction of the SCMTHP web server.
Figure 2. Propensity scores of 20 amino acids to be THPs by using the SCMTHP method with the Main (A) and Small (B) datasets.
Figure 3. Performance evaluations of SCMTHP and other ML-based classifiers in terms of ACC and MCC as evaluated by 10-fold cross-validation (A,B) and independent (C,D) tests on the Main-TRN and Main-IND datasets, respectively.
Performance comparison of SCMTHP with the existing method as evaluated by 10-fold cross-validation and independent tests.
| Dataset | Evaluation | Method | ACC | Sn | Sp | MCC | AUC |
|---|---|---|---|---|---|---|---|
| Main | 10-fold CV | THPred | 0.857 | 0.877 | 0.837 | 0.716 | 0.929 |
| Main | 10-fold CV | SCMTHP | 0.820 | 0.819 | 0.820 | 0.641 | 0.869 |
| Main | Independent test | THPred | 0.846 | 0.792 | 0.900 | 0.696 | 0.939 |
| Main | Independent test | SCMTHP | 0.827 | 0.869 | 0.785 | 0.656 | 0.869 |
| Small | 10-fold CV | THPred | 0.824 | 0.781 | 0.867 | 0.653 | 0.893 |
| Small | 10-fold CV | SCMTHP | 0.808 | 0.723 | 0.893 | 0.628 | 0.853 |
| Small | Independent test | THPred | 0.798 | 0.862 | 0.734 | 0.601 | 0.885 |
| Small | Independent test | SCMTHP | 0.798 | 0.766 | 0.830 | 0.597 | 0.853 |
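
The ACC and MCC columns can be related to Sn and Sp with a short calculation. The following is a minimal sketch, assuming each test set contains equal numbers of THPs and non-THPs; that balance is an assumption, although it is consistent with the reported ACC values equalling (Sn + Sp)/2. The function names are illustrative and are not taken from the SCMTHP software.

```python
import math


def acc_from_sn_sp(sn: float, sp: float) -> float:
    """Accuracy for a balanced test set (assumed: equal positives and negatives)."""
    return (sn + sp) / 2.0


def mcc_from_sn_sp(sn: float, sp: float) -> float:
    """Matthews correlation coefficient for a balanced test set.

    With one unit of positives and one unit of negatives:
    TP = sn, FN = 1 - sn, TN = sp, FP = 1 - sp.
    """
    tp, fn, tn, fp = sn, 1.0 - sn, sp, 1.0 - sp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# SCMTHP on the Main dataset, independent test: Sn = 0.869, Sp = 0.785
acc = acc_from_sn_sp(0.869, 0.785)
mcc = mcc_from_sn_sp(0.869, 0.785)
print(f"ACC = {acc:.3f}, MCC = {mcc:.3f}")
```

Under the balanced-set assumption, this reproduces the tabulated ACC (0.827) and MCC (0.656) for that row to three decimals.
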
Figure 4. Histogram plots of THPs’ scores and non-THPs’ scores from SCMTHP on the Main-TRN (A,B) and Small-TRN (C,D) datasets by using Initial-APS (A,C) and Optimized-APS (B,D). Note that the mean and standard deviation are indicated by the bars and closed circles.
Top 20 peptides having the highest S(P) values along with their important physicochemical properties.
| # | Sequence | THP Score | Length | Molecular Weight | Extinction Coefficient (M−1·cm−1) | pI | Net Charge | Hydrophobicity (kcal·mol−1) |
|---|---|---|---|---|---|---|---|---|
| 1 | CFWPNRC | 684 | 7 | 925.17 | 5625 | 8.30 | 1 | 6.86 |
| 2 | QWCSRRWCT | 657 | 9 | 1225.52 | 11125 | 8.93 | 2 | 8.78 |
| 3 | WTCRASWCS | 632 | 9 | 1099.35 | 11125 | 8.25 | 1 | 7.16 |
| 4 | SGWCYRC | 631 | 7 | 874.08 | 7115 | 8.21 | 1 | 8.48 |
| 5 | RWCREKSCW | 631 | 9 | 1253.57 | 11125 | 8.85 | 2 | 14.19 |
| 6 | CSDWQHPWC | 627 | 9 | 1161.39 | 11125 | 4.97 | −1 | 11.02 |
| 7 | CPRGSRC | 621 | 7 | 777.99 | 125 | 9.66 | 2 | 13.23 |
| 8 | CWRKFYC | 617 | 7 | 1005.3 | 7115 | 9.24 | 2 | 7.96 |
| 9 | CSDSWHYWC | 615 | 9 | 1186.39 | 12615 | 4.97 | −1 | 9.86 |
| 10 | WRPCES | 607 | 6 | 776.93 | 5500 | 6.16 | 0 | 11.83 |
| 11 | CWLCNGRCGR | 606 | 10 | 1167.52 | 5625 | 8.60 | 2 | 11.27 |
| 12 | RHCFSQWCS | 600 | 9 | 1153.41 | 5625 | 8.19 | 1 | 9.89 |
| 13 | CDCRGDCFC | 598 | 9 | 1021.26 | 250 | 3.91 | −1 | 16.35 |
| 14 | CPHSKPCLC | 598 | 9 | 987.33 | 125 | 8.01 | 1 | 12.46 |
| 15 | CWGCNGRCRM | 595 | 10 | 1185.55 | 5625 | 8.60 | 2 | 11.85 |
| 16 | CSRPRRSEC | 585 | 9 | 1093.34 | 125 | 9.65 | 2 | 17.98 |
| 17 | CSRPRRSVC | 583 | 9 | 1063.36 | 125 | 11.33 | 3 | 13.89 |
| 18 | CVLCNGRCWS | 576 | 10 | 1140.49 | 5625 | 8.00 | 1 | 8.31 |
| 19 | CRGDGWC | 571 | 7 | 795.97 | 5625 | 5.94 | 0 | 13.52 |
| 20 | WREWFL | 571 | 6 | 936.16 | 11000 | 6.70 | 0 | 6.20 |
| Mean | | 610.25 | 8 | 1041.50 | 6117.25 | 7.82 | 1.00 | 11.05 |
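
The THP scores in this table can be related to the per-residue propensity scores (PS-THP) reported for the Main-TRN dataset elsewhere in this record. The following is a minimal sketch, assuming S(P) is the arithmetic mean of the single-residue propensity scores of the peptide. That form is an assumption inferred from the tabulated values, which it reproduces for the peptides checked here; the published method may define the score differently. The names `PS_THP` and `thp_score` are illustrative.

```python
# Propensity scores (PS-THP) as listed for the Main-TRN dataset.
PS_THP = {
    "C": 1000, "W": 981, "R": 598, "P": 587, "F": 424,
    "S": 407, "H": 382, "L": 374, "Y": 273, "M": 266,
    "Q": 198, "N": 195, "A": 160, "G": 157, "T": 150,
    "D": 103, "E": 67, "V": 48, "K": 45, "I": 0,
}


def thp_score(peptide: str) -> float:
    """Mean single-residue propensity score (assumed form of S(P))."""
    residues = peptide.strip().upper()
    if not residues:
        raise ValueError("peptide sequence must not be empty")
    return sum(PS_THP[aa] for aa in residues) / len(residues)


# Two high-scoring peptides and one low-scoring peptide from the tables.
for seq in ("CFWPNRC", "WREWFL", "IKIQD"):
    print(seq, round(thp_score(seq)))
```

Rounded to integers, the sketch gives 684 for CFWPNRC, 571 for WREWFL and 69 for IKIQD, matching the tabulated scores. Cys and Trp dominate the high-scoring peptides because they carry the two largest propensity scores.
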
Top 20 peptides having the lowest S(P) values along with their important physicochemical properties.
| # | Sequence | THP Score | Length | Molecular Weight | Extinction Coefficient (M−1·cm−1) | pI | Net Charge | Hydrophobicity (kcal·mol−1) |
|---|---|---|---|---|---|---|---|---|
| 1 | IKIQD | 69 | 5 | 615.80 | 0 | 6.72 | 0 | 12.87 |
| 2 | KKEKDIMKKTI | 74 | 11 | 1361.87 | 0 | 10.39 | 3 | 26.51 |
| 3 | INGKVT | 99 | 6 | 630.83 | 0 | 10.15 | 1 | 11.37 |
| 4 | VKNNVEVN | 105 | 8 | 915.13 | 0 | 6.81 | 0 | 15.50 |
| 5 | IGIGAG | 105 | 6 | 486.66 | 0 | 5.60 | 0 | 9.61 |
| 6 | AVKKAYDIAIQ | 108 | 11 | 1219.60 | 1490 | 9.73 | 1 | 16.00 |
| 7 | DVGTTE | 113 | 6 | 620.69 | 0 | 2.87 | −2 | 16.36 |
| 8 | IGDAT | 114 | 5 | 475.56 | 0 | 3.00 | −1 | 12.32 |
| 9 | VAIDM | 115 | 5 | 547.73 | 0 | 3.02 | −1 | 9.79 |
| 10 | DVKGVFVNI | 119 | 9 | 990.30 | 0 | 6.77 | 0 | 12.13 |
| 11 | DLAVVEVDQVMVVD | 119 | 14 | 1530.96 | 0 | 2.63 | −4 | 19.04 |
| 12 | TDIDDKIINRAI | 121 | 12 | 1386.74 | 0 | 4.21 | −1 | 20.55 |
| 13 | GDVVANT | 123 | 7 | 674.80 | 0 | 3.00 | −1 | 13.37 |
| 14 | IDKQLE | 131 | 6 | 744.93 | 0 | 7.00 | −1 | 16.37 |
| 15 | FGKKKKYKD | 131 | 9 | 1141.50 | 1490 | 10.49 | 4 | 24.27 |
| 16 | KENILNE | 135 | 7 | 859.05 | 0 | 4.08 | −1 | 17.29 |
| 17 | HEAVGI | 136 | 6 | 624.78 | 0 | 5.06 | −1 | 13.93 |
| 18 | HKNKGKKN | 139 | 8 | 953.23 | 0 | 11.03 | 4 | 24.28 |
| 19 | ENAKAAVAEMKDGDVVLLE | 139 | 19 | 2002.54 | 0 | 3.84 | −3 | 31.12 |
| 20 | ITDMAA | 140 | 6 | 620.79 | 0 | 3.13 | −1 | 11.00 |
| Mean | | 116.75 | 8 | 920.17 | 149.00 | 5.98 | −0.20 | 16.68 |
Propensity scores of 20 amino acids to be THPs (PS-THP) along with amino acid compositions (%) of THPs and non-THPs based on the Main-TRN dataset.
| Amino Acid | PS-THP | THP (%) | non-THP (%) | Difference | p-value |
|---|---|---|---|---|---|
| C-Cys | 1000(1) | 9.635 | 1.082 | 8.552(1) | <0.01 * |
| W-Trp | 981(2) | 3.459 | 1.088 | 2.371(3) | <0.01 * |
| R-Arg | 598(3) | 8.947 | 5.062 | 3.885(2) | <0.01 * |
| P-Pro | 587(4) | 6.831 | 4.940 | 1.891(4) | <0.01 * |
| F-Phe | 424(5) | 3.018 | 3.846 | −0.828(13) | 0.017 |
| S-Ser | 407(6) | 8.525 | 6.860 | 1.666(5) | <0.01 * |
| H-His | 382(7) | 3.084 | 2.699 | 0.385(6) | 0.287 |
| L-Leu | 374(8) | 8.157 | 9.394 | −1.237(14) | 0.020 |
| Y-Tyr | 273(9) | 3.023 | 2.912 | 0.111(8) | 0.741 |
| M-Met | 266(10) | 2.629 | 2.604 | 0.025(9) | 0.940 |
| Q-Gln | 198(11) | 3.284 | 4.052 | −0.769(11) | 0.046 |
| N-Asn | 195(12) | 3.365 | 4.169 | −0.804(12) | 0.033 |
| A-Ala | 160(13) | 5.717 | 8.099 | −2.382(16) | <0.01 * |
| G-Gly | 157(14) | 7.552 | 7.203 | 0.349(7) | 0.516 |
| T-Thr | 150(15) | 4.744 | 5.364 | −0.620(10) | 0.186 |
| D-Asp | 103(16) | 3.798 | 5.664 | −1.866(15) | <0.01 * |
| E-Glu | 67(17) | 3.544 | 6.153 | −2.609(19) | <0.01 * |
| V-Val | 48(18) | 4.392 | 6.906 | −2.514(17) | <0.01 * |
| K-Lys | 45(19) | 3.469 | 6.008 | −2.540(18) | <0.01 * |
| I-Ile | 0(20) | 2.828 | 5.894 | −3.066(20) | <0.01 * |
| R | 1.000 | 0.462 | −0.589 | 0.876 | - |
* Statistically significant at the level of p-value < 0.01.
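
The R row of this table can be checked directly from the listed columns. The following is a minimal sketch, assuming R denotes the Pearson correlation coefficient between PS-THP and the column in question; the table itself does not name the statistic, so this is an assumption. It recomputes R for the Difference column, and `pearson_r` is an illustrative helper.

```python
import math

# PS-THP and Difference columns, in table order (Cys ... Ile).
ps_thp = [1000, 981, 598, 587, 424, 407, 382, 374, 273, 266,
          198, 195, 160, 157, 150, 103, 67, 48, 45, 0]
difference = [8.552, 2.371, 3.885, 1.891, -0.828, 1.666, 0.385, -1.237,
              0.111, 0.025, -0.769, -0.804, -2.382, 0.349, -0.620,
              -1.866, -2.609, -2.514, -2.540, -3.066]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(ps_thp, difference)
print(f"R(PS-THP, Difference) = {r:.3f}")
```

The result agrees with the tabulated value of 0.876 to within rounding, which supports reading R as a Pearson correlation with PS-THP.
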
Summary of two important physicochemical properties (PCPs) as derived from SCMTHP.
| Amino Acid | PS-THP | MCMT640101 a | Molar Extinction Coefficients |
|---|---|---|---|
| C-Cys | 1000(1) | 35.77(2) | 225(6) |
| W-Trp | 981(2) | 42.53(1) | 29,050(1) |
| R-Arg | 598(3) | 26.66(5) | 102(9) |
| P-Pro | 587(4) | 10.93(17) | 30(19) |
| F-Phe | 424(5) | 29.4(4) | 5200(3) |
| S-Ser | 407(6) | 6.35(18) | 34(17) |
| H-His | 382(7) | 21.81(6) | 5125(4) |
| L-Leu | 374(8) | 18.78(10) | 45(13) |
| Y-Tyr | 273(9) | 31.53(3) | 5375(2) |
| M-Met | 266(10) | 21.64(7) | 980(5) |
| Q-Gln | 198(11) | 17.56(11) | 142(7) |
| N-Asn | 195(12) | 13.28(14) | 136(8) |
| A-Ala | 160(13) | 4.34(19) | 32(18) |
| G-Gly | 157(14) | 0(20) | 21(20) |
| T-Thr | 150(15) | 11.01(16) | 41(16) |
| D-Asp | 103(16) | 12(15) | 58(11) |
| E-Glu | 67(17) | 17.26(12) | 78(10) |
| V-Val | 48(18) | 13.92(13) | 43(14) |
| K-Lys | 45(19) | 21.29(8) | 41(15) |
| I-Ile | 0(20) | 19.06(9) | 45(12) |
| R | 1.000 | 0.635 | 0.556 |
a MCMT640101 = Refractivity (McMeekin et al., 1964) [18], cited by Jones (1975) [18]. b (M−1 cm−1). c Molar extinction coefficients of free amino acids (M−1 cm−1) at 214 nm in 20% (v/v) acetonitrile and 0.1% (v/v) formic acid, derived from the work of [50].