Mehri Mahmoodi-Reihani, Fatemeh Abbasitabar, Vahid Zare-Shahabadi.
Abstract
Predicting the bioactivity of peptides is an important challenge in drug development and peptide research. In this study, numerical descriptive vectors (NDVs) for peptide sequences were calculated from the physicochemical properties of amino acids (AAs) using principal component analysis (PCA). The resulting NDV has the same length as the peptide sequence, so each entry of the NDV corresponds to one AA in the sequence. The NDVs were then applied to quantitative structure-activity relationship (QSAR) analysis of angiotensin-converting enzyme (ACE) inhibitor dipeptides, bitter-tasting dipeptides, and nonameric binding peptides of the human leukocyte antigens (HLA-A*0201). Multiple linear regression was used to construct the QSAR models. For each peptide set, a suitable subset of physicochemical properties was selected by the ant colony optimization algorithm. The leave-one-out cross-validation ($q^2_{\mathrm{LOO}}$) values were 0.855, 0.936, and 0.642, and the root-mean-square errors (RMSEs) were 0.450, 0.149, and 0.461. These results indicate that the new NDV characterizes peptide sequences thoroughly enough to be readily employed in peptide QSAR studies. Moreover, the proposed NDVs were able to identify hot spot residues in the peptides under study.
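The NDV construction described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the property table `PROPS` holds illustrative values rather than the AA indices selected in the paper, and the NDV entry of each residue is assumed to be its score on the first principal component. The helper names are hypothetical.

```python
import math

# Illustrative property table (an assumption, not the paper's selected indices):
# amino acid -> [hydropathy-like, volume-like, polarity-like] values.
PROPS = {
    "A": [1.8, 88.6, 8.1],
    "V": [4.2, 140.0, 5.9],
    "L": [3.8, 166.7, 4.9],
    "P": [-1.6, 112.7, 8.0],
    "E": [-3.5, 138.4, 12.3],
    "D": [-3.5, 111.1, 13.0],
}

def standardize(rows):
    """Autoscale each column to zero mean and unit variance."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]

def first_pc_loadings(z, iters=500):
    """Dominant eigenvector of the covariance matrix, by power iteration."""
    n, k = len(z), len(z[0])
    cov = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(k)]
           for a in range(k)]
    v = [1.0 / (j + 1) for j in range(k)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def aa_scores(props):
    """Map each amino acid to its first-principal-component score."""
    aas = sorted(props)
    z = standardize([props[a] for a in aas])
    v = first_pc_loadings(z)
    return {a: sum(zi * vi for zi, vi in zip(row, v)) for a, row in zip(aas, z)}

def ndv(peptide, scores):
    """One entry per residue, so the NDV has the same length as the peptide."""
    return [scores[aa] for aa in peptide]

SCORES = aa_scores(PROPS)
vec = ndv("VVPPEEEPV", SCORES)
print(len(vec), [round(x, 3) for x in vec])
```

The sign of a principal component is arbitrary, so only relative differences between residues are meaningful in this sketch.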
Year: 2020 PMID: 32226875 PMCID: PMC7097998 DOI: 10.1021/acsomega.9b04302
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
Figure 1. Graphical representation of the NDV calculation and QSAR modeling for the ACE data set.
Characteristics of the Data Sets Used in This Paper
| groups | name | no. of peptides | no. of sequences | training set size | test set size | refs |
|---|---|---|---|---|---|---|
| data set 1 | ACE | 55 | 3 | 45 | 10 | |
| data set 2 | bitter tasting | 48 | 2 | 40 | 8 | |
| data set 3 | HLA | 177 | 9 | 131 | 46 | |
QSAR Models of the Peptides as ACE Inhibitors Obtained Using Different Sets of the AA Indices
| number of used AA indices | | | | | RMSE (training) | RMSE (test) | max. $R^2_{\mathrm{training}}$ (Y-randomization) |
|---|---|---|---|---|---|---|---|
| 2 | 0.829 | 0.801 | 0.797 | 0.788 | 0.266 | 0.376 | 0.150 |
| 3 | 0.830 | 0.826 | 0.806 | 0.803 | 0.265 | 0.341 | 0.217 |
| 4 | 0.843 | 0.852 | 0.814 | 0.814 | 0.255 | 0.265 | 0.121 |
| 5 | 0.849 | 0.842 | 0.822 | 0.819 | 0.250 | 0.317 | 0.176 |
| 6 | 0.845 | 0.844 | 0.818 | 0.816 | 0.253 | 0.314 | 0.104 |
| 7 | 0.855 | 0.859 | 0.831 | 0.830 | 0.245 | 0.302 | 0.160 |
| 8 | 0.855 | 0.861 | 0.831 | 0.829 | 0.245 | 0.295 | 0.099 |
Maximum $R^2_{\mathrm{training}}$ for the Y-randomization test.
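The abstract reports leave-one-out $q^2$ and RMSE values for multiple linear regression (MLR) models. A minimal sketch of how these statistics are commonly computed is given below, on made-up toy data. The function names are hypothetical, the RMSE uses the plain mean of squared errors (the source does not state its denominator), and nothing here reproduces the table values.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    return solve(ata, aty)

def predict(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def rmse(y, yhat):
    """Root-mean-square error with denominator n (an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def r2(X, y):
    """Coefficient of determination of the model fitted on all samples."""
    beta = fit_mlr(X, y)
    ybar = sum(y) / len(y)
    rss = sum((yi - predict(beta, r)) ** 2 for r, yi in zip(X, y))
    return 1.0 - rss / sum((yi - ybar) ** 2 for yi in y)

def q2_loo(X, y):
    """q2 = 1 - PRESS / SS, refitting the model with each sample left out."""
    press = 0.0
    for i in range(len(y)):
        beta = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(beta, X[i])) ** 2
    ybar = sum(y) / len(y)
    return 1.0 - press / sum((yi - ybar) ** 2 for yi in y)

# Toy data: two descriptors per sample and a noisy linear response.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0], [4.0, 3.0], [5.0, 2.0]]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.05]
y = [1 + 2 * a - b + e for (a, b), e in zip(X, noise)]

beta = fit_mlr(X, y)
fitted = [predict(beta, r) for r in X]
print("R2 =", r2(X, y), "q2_LOO =", q2_loo(X, y), "RMSE =", rmse(y, fitted))
```

For ordinary least squares the leave-one-out residuals are never smaller in magnitude than the fitted residuals, so $q^2_{\mathrm{LOO}}$ cannot exceed the training $R^2$.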
Figure 2. Plot of predicted versus experimental pIC50 values of ACE tripeptides.
Comparison between QSAR Models for the ACE Data Set
| no. | descriptors | model | variables/LVs | | RMSE (training) | | | refs |
|---|---|---|---|---|---|---|---|---|
| 1 | | PLS | 2 | 0.770 | NR | NR | NR | Hellberg et al. |
| 2 | GRID PP | PLS | 1 | 0.744 | NR | NR | NR | Cocchi and Johansson |
| 3 | ISA-ECI | PLS | 2 | 0.700 | NR | NR | NR | Collantes and Dunn |
| 4 | MS-WHIM (extended) | PLS | 2 | 0.708 | NR | 0.637 | NR | Zaliani and Gancia |
| 5 | MS-WHIM (rotameric) | PLS | 6 | 0.657 | NR | 0.541 | NR | Zaliani and Gancia |
| 6 | VHSE | PLS | 1 | 0.770 | 0.48 | 0.745 | 0.688 | Mei et al. |
| 7 | T scale | PLS | 2 | 0.845 | 0.39 | 0.786 | 0.798 | Tian et al. |
| 8 | VSW | PLS | 2 | 0.868 | 0.37 | 0.784 | 0.871 | Tong et al. |
| 9 | ATS–QTMS | PLS | 3 | 0.868 | 0.36 | 0.812 | 0.702 | Yousefinejad et al. |
| 10 | NDV | MLR | 3 | 0.855 | 0.245 | 0.831 | 0.861 | this work |
NR, not reported.
Statistical Analysis of the Selected QSAR Model for ACE Data Set
| regression coefficient | SE | t | p |
|---|---|---|---|
| 1.88 | 0.038 | 49.25 | $4.17 \times 10^{-38}$ |
| –0.27 | 0.046 | –5.87 | $6.59 \times 10^{-7}$ |
| –0.02 | 0.046 | –0.35 | 0.73 |
| 0.66 | 0.044 | 15.01 | $2.97 \times 10^{-18}$ |
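As a quick consistency check on the table above, the third column can be compared with the ratio of each regression coefficient to its standard error. The sketch assumes that this column holds t statistics and the fourth holds p values (the extracted header does not label them); small differences are expected because the tabulated coefficients and SEs are rounded.

```python
# (regression coefficient, SE, value reported in the third column),
# copied from the ACE statistics table.
ROWS = [
    (1.88, 0.038, 49.25),
    (-0.27, 0.046, -5.87),
    (-0.02, 0.046, -0.35),
    (0.66, 0.044, 15.01),
]

# t statistic of a regression coefficient: estimate divided by its standard error.
t_values = [coef / se for coef, se, _ in ROWS]

for (coef, se, reported), t in zip(ROWS, t_values):
    print(f"coef={coef:6.2f}  SE={se:.3f}  coef/SE={t:7.2f}  reported={reported:7.2f}")
```

Under this reading, the third row (reported value –0.35, last column 0.73) is the only term that is not statistically significant.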
QSAR Models of BTT Obtained Using Different Sets of the AA Indices
| number of used AA indices | | | | | RMSE (training) | RMSE (test) | max. $R^2_{\mathrm{training}}$ (Y-randomization) |
|---|---|---|---|---|---|---|---|
| 2 | 0.850 | 0.855 | 0.828 | 0.813 | 0.228 | 0.343 | 0.166 |
| 3 | 0.905 | 0.903 | 0.890 | 0.890 | 0.182 | 0.302 | 0.126 |
| 4 | 0.878 | 0.898 | 0.858 | 0.856 | 0.206 | 0.280 | 0.084 |
| 5 | 0.900 | 0.899 | 0.877 | 0.882 | 0.186 | 0.339 | 0.073 |
| 6 | 0.901 | 0.902 | 0.877 | 0.883 | 0.186 | 0.337 | 0.179 |
| 7 | 0.917 | 0.899 | 0.903 | 0.899 | 0.170 | 0.309 | 0.136 |
| 8 | 0.936 | 0.907 | 0.926 | 0.924 | 0.149 | 0.283 | 0.133 |
| 9 | 0.909 | 0.909 | 0.893 | 0.892 | 0.178 | 0.303 | 0.146 |
Maximum $R^2_{\mathrm{training}}$ for the Y-randomization test.
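The Y-randomization test mentioned in the footnote can be sketched as follows: the responses are shuffled repeatedly, the model is refitted each time, and the largest training $R^2$ obtained is compared with that of the real model. This is a simplified illustration with a single descriptor and toy data; the number of trials, the seed, and the function names are assumptions, and the source does not specify how many permutations were used.

```python
import random

def r2_simple(x, y):
    """R2 of a one-descriptor least-squares line (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def y_randomization(x, y, n_trials=50, seed=0):
    """Largest training R2 obtained after shuffling the responses."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_trials):
        y_perm = y[:]          # shuffle a copy so the real responses are untouched
        rng.shuffle(y_perm)
        best = max(best, r2_simple(x, y_perm))
    return best

# Toy data: one descriptor and a nearly linear response.
x = [float(i) for i in range(12)]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.05, 0.2, -0.15, 0.1, -0.1, 0.05, 0.0]
y = [0.5 * xi + 2.0 + e for xi, e in zip(x, noise)]

r2_true = r2_simple(x, y)
r2_max_random = y_randomization(x, y)
print("R2 (real responses):", r2_true)
print("max R2 (shuffled responses):", r2_max_random)
```

A maximum $R^2$ for the shuffled responses that stays well below the real model's $R^2$ suggests that the fitted relationship is unlikely to be a chance correlation.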
Figure 3. Plot of predicted versus experimental activities for the BTT data set.
Comparison between QSAR Models for the BTT Data Set
| no. | descriptors | model | variables/LVs | | RMSE (training) | | | refs |
|---|---|---|---|---|---|---|---|---|
| 1 | VSW | PLS | 2 | 0.873 | 0.23 | 0.751 | 0.713 | Tong et al. |
| 2 | | PLS | 2 | 0.824 | 0.26 | NR | NR | Hellberg et al. |
| 3 | ISA-ECI | PLS | 2 | 0.8480 | 0.245 | NR | 0.245 | Collantes and Dunn |
| 4 | MS-WHIM (extended) | PLS | 3 | 0.754 | NR | 0.710 | NR | Zaliani and Gancia |
| 5 | MS-WHIM (rotameric) | PLS | 3 | 0.704 | NR | 0.633 | NR | Zaliani and Gancia |
| 6 | VHSE | PLS | 3 | 0.910 | 0.20 | 0.816 | 0.883 | Mei et al. |
| 7 | ATS/QTMS | GA–PLS | 2 | 0.872 | 0.22 | 0.826 | 0.770 | Yousefinejad et al. |
| 8 | NDV | MLR | 2 | 0.936 | 0.149 | 0.926 | 0.907 | this work |
Comparison between QSAR Models for the HLA Data Set
| no. | descriptors | model | variables/LVs | | RMSE (training) | | | refs |
|---|---|---|---|---|---|---|---|---|
| 1 | QTMS–ADFQ | GA–PLS | 3 | 0.648 | 0.59 | 0.561 | 0.50 | Hemmateenejad et al. |
| 2 | ATS/QTMS | GA–PLS | 6 | 0.782 | 0.47 | 0.682 | 0.50 | Yousefinejad et al. |
| 3 | additive | PLS | 3 | 0.85 | NR | 0.54 | 0.64 | Doytchinova et al. |
| 4 | global | GA–MLR | | 0.43 | 0.75 | NR | 0.42 | Doytchinova et al. |
| 5 | | GA–MLR | | 0.67 | 0.59 | NR | 0.50 | Doytchinova et al. |
| 6 | NDV | MLR | 9 | 0.540 | 0.664 | 0.407 | 0.535 | this work |
| 7 | NDV | MLR (after removing outliers) | 9 | 0.702 | 0.452 | 0.632 | 0.712 | this work |
| 8 | NDV | MLR (after removing outliers and omitting nonsignificant variables) | 5 | 0.690 | 0.461 | 0.642 | 0.713 | this work |
Standard error of estimate (SEE).
Figure 4. Williams plot (standardized residual versus leverage) for the HLA data set. Critical values of standardized residual and leverage are shown by horizontal and vertical dashed lines, respectively.
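The two quantities plotted in a Williams plot can be computed as in the sketch below, which uses toy data with one descriptor. The critical values are assumptions: the source does not state them, so the commonly used warning leverage $h^* = 3(p+1)/n$ and a standardized-residual limit of ±3 are taken here. Function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def williams_points(X, y):
    """Return (leverages, standardized residuals, warning leverage h*)."""
    A = [[1.0] + list(r) for r in X]          # design matrix with intercept
    n, k = len(A), len(A[0])                  # k = p + 1 parameters
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    beta = solve(ata, aty)
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(A, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - k))   # residual standard deviation
    # Leverage h_ii = a_i (A^T A)^-1 a_i^T, obtained by solving one linear system per row.
    lev = [sum(x * z for x, z in zip(row, solve(ata, row))) for row in A]
    std_resid = [e / s for e in resid]
    return lev, std_resid, 3.0 * k / n

# Toy data: nine ordinary samples and one with an extreme descriptor value.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0]
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.1, -0.2, 0.1, 0.0]
X = [[x] for x in xs]
y = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]

lev, std_resid, h_star = williams_points(X, y)
high_leverage = [i for i, h in enumerate(lev) if h > h_star]
outliers = [i for i, r in enumerate(std_resid) if abs(r) > 3.0]
print("h* =", h_star, "high leverage:", high_leverage, "outliers:", outliers)
```

Samples beyond the leverage limit lie far from the bulk of the descriptor space, while samples beyond the residual limit are poorly predicted; either kind may be examined as a potential outlier.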
Statistics for the Best QSAR Model for the HLA Data Set
| AA no. | β | SE | t | p | VIF |
|---|---|---|---|---|---|
| intercept | 5.381 | 0.043 | 124.810 | $<10^{-11}$ | |
| 1 | 0.002 | 0.049 | 0.049 | 0.96 | 1.26 |
| 2 | –0.381 | 0.048 | –7.878 | $<10^{-11}$ | 1.25 |
| 3 | 0.233 | 0.066 | 3.536 | $5.95 \times 10^{-4}$ | 2.31 |
| 4 | 0.372 | 0.063 | 5.935 | $3.48 \times 10^{-8}$ | 2.10 |
| 5 | 0.032 | 0.051 | 0.636 | 0.53 | 1.38 |
| 6 | 0.071 | 0.066 | 1.078 | 0.28 | 2.30 |
| 7 | 0.084 | 0.046 | 1.829 | 0.07 | 1.12 |
| 8 | 0.216 | 0.055 | 3.965 | $1.31 \times 10^{-4}$ | 1.58 |
| 9 | –0.281 | 0.048 | –5.894 | $4.20 \times 10^{-8}$ | 1.22 |
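The VIF and significance columns of the table above can be read with a short calculation. The sketch assumes that the fifth column holds p values (the extracted header leaves it unlabeled), uses a 0.05 significance level, and treats a VIF below 5 as unproblematic; both thresholds are common rules of thumb rather than values stated in the source. The VIF of variable $j$ is $1/(1 - R_j^2)$, where $R_j^2$ is obtained by regressing that variable on the others.

```python
# Values copied from the HLA model table; "<10^-11" is stored as its upper bound.
P_VALUE = {1: 0.96, 2: 1e-11, 3: 5.95e-4, 4: 3.48e-8, 5: 0.53,
           6: 0.28, 7: 0.07, 8: 1.31e-4, 9: 4.20e-8}
VIF = {1: 1.26, 2: 1.25, 3: 2.31, 4: 2.10, 5: 1.38,
       6: 2.30, 7: 1.12, 8: 1.58, 9: 1.22}

ALPHA = 0.05  # assumed significance level

# Invert VIF_j = 1 / (1 - R_j^2) to get the R^2 of each variable on the others.
implied_r2 = {pos: 1.0 - 1.0 / v for pos, v in VIF.items()}

# Positions whose coefficients are significant at the assumed level.
significant = [pos for pos, p in P_VALUE.items() if p < ALPHA]

for pos in VIF:
    print(f"position {pos}: VIF={VIF[pos]:.2f}  implied R_j^2={implied_r2[pos]:.3f}  "
          f"significant={pos in significant}")
print("significant positions:", significant)
```

Five positions pass the assumed threshold, which agrees with the five variables listed for the reduced NDV model in the HLA comparison table.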
Figure 5. Plot of predicted versus experimental activities for the HLA data set.
Most Active Peptides along with the Experimental and Predicted Activities
| peptide sequence | experimental pBL50 | predicted pBL50 |
|---|---|---|
| ILDPFPVTV | 8.65 | 6.4927 |
| ILDPFPPTV | 8.17 | 6.3781 |
| ILDPFPPEV | 7.68 | 6.4713 |
| ILDPFPITV | 8.14 | 6.4678 |
| ILDPFPPPV | 7.44 | 6.4911 |
| ILDPLPPTV | 7.15 | 6.4363 |
| VVPPEEEPV | | 7.4082 |
The peptide in the last row is that introduced by the QSAR model.
List of 50 Sequences with High Activities Suggested by the QSAR Model
| no. | sequence | predicted pBL50 | no. | sequence | predicted pBL50 |
|---|---|---|---|---|---|
| 1 | AVPPEEEPV | 7.38 | 26 | VVPPEVMPV | 7.31 |
| 2 | IVPPEEEPV | 7.34 | 27 | VVPPEEMPV | 7.35 |
| 3 | LVPPEEEPV | 7.38 | 28 | VVPPELMPV | 7.30 |
| 4 | MVPPEEEPV | 7.37 | 29 | VVPPELEPV | 7.36 |
| 5 | VVPPEEEPV | 7.41 | 30 | VVPPELEPA | 7.32 |
| 6 | VAPPEEEPV | 7.36 | 31 | VVPPEEEPA | 7.36 |
| 7 | VADPEEEPV | 7.31 | 32 | VVPPEMEPA | 7.31 |
| 8 | VVDPEEEPV | 7.36 | 33 | VVPPEVEPA | 7.33 |
| 9 | VVDPEIEPV | 7.31 | 34 | AVPPEVEPA | 7.30 |
| 10 | VVDPELEPV | 7.32 | 35 | LVPPEVEPA | 7.30 |
| 11 | VVDPEMEPV | 7.31 | 36 | LVPPEEEPA | 7.34 |
| 12 | VVDPEVEPV | 7.34 | 37 | LVPPEEVPA | 7.30 |
| 13 | VVDPEVVPV | 7.30 | 38 | LVPPEEVPV | 7.35 |
| 14 | VVPPEVVPV | 7.34 | 39 | AVPPEEVPV | 7.34 |
| 15 | VVPPEEVPV | 7.37 | 40 | IVPPEEVPV | 7.31 |
| 16 | VVPPEIVPV | 7.31 | 41 | MVPPEEVPV | 7.33 |
| 17 | VVPPELVPV | 7.33 | 42 | MVPPEVVPV | 7.30 |
| 18 | VVPPEMVPV | 7.32 | 43 | MVPPEVEPV | 7.34 |
| 19 | VVPPEMEPV | 7.35 | 44 | AVPPEVEPV | 7.35 |
| 20 | VVPPEMLPV | 7.30 | 45 | IVPPEVEPV | 7.31 |
| 21 | VVPPEELPV | 7.36 | 46 | LVPPEVEPV | 7.35 |
| 22 | VVPPELLPV | 7.31 | 47 | LAPPEVEPV | 7.30 |
| 23 | VVPPEVLPV | 7.32 | 48 | LAPPEEEPV | 7.33 |
| 24 | VVPPEVEPV | 7.38 | 49 | LAPPEEVPV | 7.30 |
| 25 | VVPPEVIPV | 7.30 | 50 | AAPPEEVPV | 7.30 |
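One simple way to read the list above is to tabulate which residues occur at each of the nine positions. The sketch below does this for the 50 suggested sequences; it is an illustrative post hoc summary, not an analysis reported in the source, and the variable names are hypothetical.

```python
from collections import Counter

# The 50 suggested nonamers, copied from the table above (rows 1 to 50).
SEQS = """
AVPPEEEPV IVPPEEEPV LVPPEEEPV MVPPEEEPV VVPPEEEPV VAPPEEEPV VADPEEEPV
VVDPEEEPV VVDPEIEPV VVDPELEPV VVDPEMEPV VVDPEVEPV VVDPEVVPV VVPPEVVPV
VVPPEEVPV VVPPEIVPV VVPPELVPV VVPPEMVPV VVPPEMEPV VVPPEMLPV VVPPEELPV
VVPPELLPV VVPPEVLPV VVPPEVEPV VVPPEVIPV VVPPEVMPV VVPPEEMPV VVPPELMPV
VVPPELEPV VVPPELEPA VVPPEEEPA VVPPEMEPA VVPPEVEPA AVPPEVEPA LVPPEVEPA
LVPPEEEPA LVPPEEVPA LVPPEEVPV AVPPEEVPV IVPPEEVPV MVPPEEVPV MVPPEVVPV
MVPPEVEPV AVPPEVEPV IVPPEVEPV LVPPEVEPV LAPPEVEPV LAPPEEEPV LAPPEEVPV
AAPPEEVPV
""".split()

# Residue counts at each position (1-based, as in the HLA model table).
position_counts = {pos + 1: Counter(seq[pos] for seq in SEQS) for pos in range(9)}

# Positions where every suggested sequence carries the same residue.
conserved = [pos for pos, counts in position_counts.items() if len(counts) == 1]

for pos, counts in position_counts.items():
    print(pos, dict(counts.most_common()))
print("conserved positions:", conserved)
```

In this list, positions 4, 5, and 8 carry a single residue (P, E, and P), while the remaining positions vary among a small set of residues.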