Zhenjiao Du, Donghai Wang, Yonghui Li.
Abstract
Antioxidant peptides have attracted increasing interest because of their multiple beneficial effects. Currently, bioactive peptides, including antioxidant peptides, are screened and identified by wet-chemistry methods that are time-consuming and rely heavily on advanced instruments and trained personnel. Quantitative structure-activity relationship (QSAR) analysis, an in silico method, can be more efficient and cost-effective. However, the performance of QSAR models for antioxidant peptides has remained poor because few model development approaches have been explored. The objective of this study was to compare popular machine learning methods for modeling and screening the antioxidant activity of tripeptides and to identify the critical amino acid features that determine that activity. A total of 533 numerical indices of amino acids were adopted to characterize 130 tripeptides with known antioxidant activity from the published literature, and 7 feature selection strategies plus pairwise correlation were then used to screen the most important indices for antioxidant activity and model building. Fourteen machine learning methods were used to build models for each feature selection strategy. Among the 98 models, non-linear regression methods tended to perform better, and the best model for tripeptide antioxidants, with a test-set R² of 0.847 and a test-set RMSE of 0.627, was obtained by combining random forest for feature selection with tree-based extreme gradient boosting regression for model development. Based on the predicted antioxidant values of 7870 unknown tripeptides, the tripeptides with potentially high antioxidant activity all have a tyrosine, tryptophan, or cysteine residue at the C-terminal position. Furthermore, the predicted antioxidant activity of six synthesized tripeptides was confirmed through experimental determination, and, for the first time, a cysteine or tyrosine residue at the C-terminus was found to be critical to antioxidant activity based on both QSAR models and experimental observations.
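The size of the screening space in the abstract follows from simple enumeration. The snippet below is a minimal sketch, assuming the 20 standard amino acids in one-letter code; the function names are illustrative and do not come from the paper. It lists all 20³ = 8000 possible tripeptides, so removing the 130 sequences with known activity leaves the 7870 unknown tripeptides mentioned above, and it shows how candidates could be filtered by C-terminal residue.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes (assumption: no modified residues).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_tripeptides():
    """Return every possible tripeptide sequence (20 ** 3 = 8000 strings)."""
    return ["".join(residues) for residues in product(AMINO_ACIDS, repeat=3)]


def unknown_tripeptides(known):
    """Return the tripeptides that are not in the collection of known sequences."""
    known = set(known)
    return [seq for seq in all_tripeptides() if seq not in known]


def with_c_terminal(peptides, residues="YWC"):
    """Keep peptides whose last (C-terminal) residue is one of `residues`."""
    return [seq for seq in peptides if seq[-1] in residues]


if __name__ == "__main__":
    # Only three of the 130 known sequences are listed here for brevity.
    known_subset = ["LHA", "PWY", "YHW"]
    candidates = unknown_tripeptides(known_subset)
    print(len(all_tripeptides()), len(candidates))
    print(len(with_c_terminal(candidates)))
```

Passing the full list of 130 literature sequences to `unknown_tripeptides` would give the 7870 candidates that the models were asked to score.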
Year: 2022 PMID: 35910147 PMCID: PMC9330208 DOI: 10.1021/acsomega.2c03062
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
Figure 1. Flowchart for QSAR modeling and validation of antioxidant tripeptides.
Sequence and TEAC (Trolox Equivalent Antioxidant Capacity, μmol TE/μmol Peptide) of the Tripeptide Data Set from the Literature.
| no. | sequence | activity | no. | sequence | activity | no. | sequence | activity |
|---|---|---|---|---|---|---|---|---|
| 1 | LHA | 0 | 47 | PHN | 0.24 | 93 | PWR | 0.822 |
| 2 | LHD | 0 | 48 | LWF | 0.25 | 94 | PWI | 0.832 |
| 3 | LHE | 0 | 49 | PWD | 0.262 | 95 | RWG | 0.842 |
| 4 | LHF | 0 | 50 | LVG | 0.266 | 96 | LWN | 0.866 |
| 5 | LHG | 0 | 51 | PHH | 0.266 | 97 | LWR | 0.869 |
| 6 | LHH | 0 | 52 | PWE | 0.339 | 98 | PWL | 0.88 |
| 7 | LHQ | 0 | 53 | PHI | 0.344 | 99 | PWT | 0.9 |
| 8 | PHA | 0 | 54 | PHQ | 0.348 | 100 | PWN | 0.943 |
| 9 | PHD | 0 | 55 | GHG | 0.365 | 101 | RWH | 0.995 |
| 10 | PHE | 0 | 56 | LWD | 0.402 | 102 | RWQ | 0.995 |
| 11 | PHF | 0 | 57 | LWG | 0.406 | 103 | KHP | 1.143 |
| 12 | PHM | 0 | 58 | RHS | 0.409 | 104 | GVR | 1.157 |
| 13 | RHA | 0 | 59 | PWA | 0.414 | 105 | ECG | 1.413 |
| 14 | RHD | 0 | 60 | GHP | 0.426 | 106 | PHW | 1.768 |
| 15 | RHE | 0 | 61 | PWS | 0.44 | 107 | PWW | 1.774 |
| 16 | RHH | 0 | 62 | PWV | 0.457 | 108 | RWW | 1.837 |
| 17 | RHK | 0 | 63 | RWD | 0.485 | 109 | LHW | 1.84 |
| 18 | RHQ | 0 | 64 | LWM | 0.49 | 110 | LWW | 1.931 |
| 19 | RHT | 0 | 65 | PHG | 0.496 | 111 | WPL | 1.972 |
| 20 | PHT | 0.028 | 66 | RWA | 0.497 | 112 | VPW | 1.972 |
| 21 | LHM | 0.031 | 67 | PWM | 0.498 | 113 | RHW | 2.203 |
| 22 | LHN | 0.046 | 68 | LWV | 0.499 | 114 | LWY | 2.332 |
| 23 | GVT | 0.047 | 69 | RWV | 0.51 | 115 | RWY | 2.334 |
| 24 | PHS | 0.058 | 70 | LWL | 0.515 | 116 | RHY | 2.464 |
| 25 | KHR | 0.067 | 71 | LWQ | 0.519 | 117 | PHY | 2.707 |
| 26 | GHT | 0.079 | 72 | LWS | 0.522 | 118 | LHY | 2.753 |
| 27 | LWH | 0.098 | 73 | LWA | 0.594 | 119 | PWY | 2.785 |
| 28 | LHK | 0.108 | 74 | RWS | 0.6 | 120 | GVW | 4.365 |
| 29 | LHR | 0.108 | 75 | RHF | 0.6 | 121 | GKW | 4.687 |
| 30 | LHT | 0.108 | 76 | LWT | 0.627 | 122 | GHW | 4.745 |
| 31 | LHV | 0.108 | 77 | LWI | 0.628 | 123 | QVW | 5.161 |
| 32 | RHR | 0.118 | 78 | LWK | 0.629 | 124 | KVW | 5.218 |
| 33 | PHK | 0.176 | 79 | PWH | 0.632 | 125 | NKW | 5.349 |
| 34 | LHL | 0.186 | 80 | PWK | 0.634 | 126 | NHW | 5.368 |
| 35 | RHI | 0.189 | 81 | PWQ | 0.637 | 127 | QHW | 5.524 |
| 36 | PHV | 0.198 | 82 | RWR | 0.651 | 128 | KHW | 5.566 |
| 37 | PWF | 0.202 | 83 | RWT | 0.651 | 129 | PYW | 5.683 |
| 38 | PWG | 0.203 | 84 | RWE | 0.663 | 130 | YHW | 6.169 |
| 39 | RHG | 0.203 | 85 | LHS | 0.68 | |||
| 40 | RHL | 0.206 | 86 | RWF | 0.689 | |||
| 41 | RHM | 0.207 | 87 | RWL | 0.689 | |||
| 42 | RHN | 0.208 | 88 | RWI | 0.702 | |||
| 43 | PHR | 0.211 | 89 | RWM | 0.702 | |||
| 44 | RHV | 0.212 | 90 | RWN | 0.702 | |||
| 45 | LHI | 0.217 | 91 | RWK | 0.753 | |||
| 46 | PHL | 0.238 | 92 | LWE | 0.777 | | | |
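Each sequence in the table above is turned into numerical descriptors by looking up amino acid indices at each of the three positions. The sketch below is an illustration under stated assumptions, not the authors' code: it uses a single index, the Kyte-Doolittle hydropathy scale (AAindex accession KYTJ820101), purely as an example, and the source does not say whether this index is among the 533 used. With 533 indices and three positions, the same scheme would presumably yield 3 × 533 = 1599 features per tripeptide. The `@position` naming is hypothetical.

```python
# Kyte-Doolittle hydropathy (AAindex KYTJ820101), used only as an example index;
# whether it belongs to the 533 indices of the study is not stated in the source.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Mapping from accession number to a residue -> value table.
INDICES = {"KYTJ820101": HYDROPATHY}

POSITIONS = ("N-terminal", "central", "C-terminal")


def encode_tripeptide(sequence, indices=INDICES):
    """Return one feature per (index, position) pair for a three-residue sequence."""
    if len(sequence) != 3:
        raise ValueError("expected a tripeptide (three residues)")
    features = {}
    for accession, table in indices.items():
        for position, residue in zip(POSITIONS, sequence):
            # Hypothetical feature name: accession plus residue position.
            features[f"{accession}@{position}"] = table[residue]
    return features


if __name__ == "__main__":
    print(encode_tripeptide("LWY"))
```

Keeping the position in the feature name is what allows a selected variable to be reported as N-terminal, central, or C-terminal.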
Amino Acid Positions, Variable Importance, and Description of the Selected Variables from Different Feature Selection Strategies.
| AAindex accession number | amino acid position | variable importance | description | note |
|---|---|---|---|---|
| selected variables by FI-XGB | ||||
| BURA740101 | N-terminal | 0.0199 | normalized frequency of the alpha-helix | |
| CHOP780215 | N-terminal | 0.161 | frequency of the 4th residue in turn | A |
| BEGF750102 | central | 0.036 | conformational parameter of the beta-structure | |
| KANM800103 | C-terminal | 0.0138 | average relative probability of the inner helix | |
| LIFS790103 | C-terminal | 0.7049 | conformational preference for antiparallel beta-strands | B |
| selected variables by FI-RFR | ||||
| PALJ810113 | N-terminal | 0.025 | normalized frequency of turn in the all-alpha class | |
| ONEK900102 | N-terminal | 0.0108 | helix formation parameters (delta delta G) | |
| FUKS010101 | N-terminal | 0.015 | surface composition of amino acids in intracellular proteins of thermophiles (percent) | |
| JOND750102 | C-terminal | 0.0171 | pK (-COOH) | |
| LIFS790103 | C-terminal | 0.0518 | conformational preference for antiparallel beta-strands | B |
| MCMT640101 | C-terminal | 0.0286 | refractivity | |
| NAKH920102 | C-terminal | 0.0688 | AA composition of CYT2 of single-spanning proteins | |
| OOBM850102 | C-terminal | 0.037 | optimized propensity to form reverse turn | C |
| WEBA780101 | C-terminal | 0.0371 | RF value in high-salt chromatography | D |
| VINM940102 | C-terminal | 0.051 | normalized flexibility parameters (B-values) for each residue surrounded by none rigid neighbors | |
| PARS000101 | C-terminal | 0.0367 | | N |
| PARS000102 | C-terminal | 0.0768 | | K |
| FODM020101 | C-terminal | 0.0416 | free energy change of epsilon(i) to alpha(Rh) | E |
| MITS020101 | C-terminal | 0.1532 | amphiphilicity index | F |
| DIGM050101 | C-terminal | 0.0563 | hydrostatic pressure asymmetry index, PAI | G |
| selected variables by FC-LR | ||||
| MAXF760103 | N-terminal | 0.025 | normalized frequency of zeta R | |
| NAKH900102 | N-terminal | 0.0371 | SD of AA composition of total proteins | |
| QIAN880114 | N-terminal | 0.051 | weights for beta-sheet at the window position of -6 | |
| KHAG800101 | central | 0.0367 | the Kerr-constant increments | |
| CHOP780215 | C-terminal | 0.0768 | frequency of the 4th residue in turn | A |
| OOBM850102 | C-terminal | 0.0416 | optimized propensity to form reverse turn | C |
| WEBA780101 | C-terminal | 0.0153 | RF value in high salt chromatography | D |
| MITS020101 | C-terminal | 0.0563 | amphiphilicity index | F |
| selected variables by RFE-LR | ||||
| WERD780102 | N-terminal | 0.3136 | free energy change of epsilon(i) to epsilon(ex) | |
| AURR980107 | N-terminal | 0.9357 | normalized positional residue frequency at helix termini N2 | |
| AURR980111 | N-terminal | 2.0325 | normalized positional residue frequency at helix termini C5 | H |
| AURR980116 | N-terminal | 1.5792 | normalized positional residue frequency at helix termini Cc | |
| CEDJ970105 | N-terminal | 0.2991 | composition of amino acids in nuclear proteins (percent) | I |
| KARS160120 | N-terminal | 0.5644 | weighted minimum eigenvalue based on the atomic numbers | |
| CHOC760104 | central | 0.8074 | proportion of residues 100% buried | |
| GEIM800110 | central | 0.6058 | aperiodic indices for beta-proteins | J |
| QIAN880136 | central | 0.9265 | weights for coil at the window position of 3 | |
| KARS160113 | central | 0.7146 | weighted domination number using the atomic number | |
| CHOP780215 | C-terminal | 0.8292 | frequency of the 4th residue in turn | A |
| GEIM800110 | C-terminal | 0.0872 | aperiodic indices for beta-proteins | J |
| HUTJ700101 | C-terminal | 0.271 | heat capacity | |
| HUTJ700103 | C-terminal | 0.3975 | entropy of formation | |
| KARP850102 | C-terminal | 0.6556 | flexibility parameter for one rigid neighbor | |
| NAKH900110 | C-terminal | 1.1114 | normalized composition of membrane proteins | |
| WILM950102 | C-terminal | 0.66043 | hydrophobicity coefficient in RP-HPLC, C8 with 0.1%TFA/MeCN/H2O | |
| selected variables by RFE-SVR | ||||
| CHAM820102 | N-terminal | 0.303 | free energy of solution in water, kcal/mole | |
| NAKH920101 | N-terminal | 0.4642 | AA composition of CYT of single-spanning proteins | |
| RICJ880114 | N-terminal | 0.1628 | relative preference value at C1 | |
| PARS000102 | N-terminal | 0.2303 | | K |
| CEDJ970105 | N-terminal | 0.1578 | composition of amino acids in nuclear proteins (percent) | I |
| GEOR030105 | N-terminal | 0.0656 | linker propensity from small data set (linker length is less than six residues) | L |
| GEIM800106 | central | 0.2531 | beta-strand indices for beta-proteins | |
| NAKH900108 | central | 0.1859 | normalized composition from fungi and plant | |
| PALJ810116 | central | 0.1345 | normalized frequency of turn in alpha/beta class | M |
| GEOR030105 | central | 0.1183 | linker propensity from small data set (linker length is less than six residues) | L |
| CHOP780215 | C-terminal | 0.1984 | frequency of the 4th residue in turn | A |
| OOBM850102 | C-terminal | 0.3586 | optimized propensity to form reverse turn | C |
| PALJ810116 | C-terminal | 0.1983 | normalized frequency of turn in alpha/beta class | M |
| WERD780104 | C-terminal | 0.2361 | free energy change of epsilon(i) to alpha (Rh) | |
| PARS000101 | C-terminal | 0.3056 | | N |
| MITS020101 | C-terminal | 0.0605 | amphiphilicity index | F |
| DIGM050101 | C-terminal | 0.1109 | hydrostatic pressure asymmetry index, PAI | G |
| selected variables by RFE-RFR | ||||
| CHOP780215 | N-terminal | 0.0336 | frequency of the 4th residue in turn | A |
| ISOY800108 | N-terminal | 0.0294 | normalized relative frequency of coil | |
| MAXF760104 | N-terminal | 0.0341 | normalized frequency of left-handed alpha-helix | |
| GEOR030105 | N-terminal | 0.0486 | linker propensity from small data set (linker length is less than six residues) | L |
| KARS160122 | N-terminal | 0.0362 | weighted second smallest eigenvalue of the weighted Laplacian matrix | |
| QIAN880127 | central | 0.0362 | weights for coil at the window position of -6 | |
| AURR980111 | central | 0.0291 | normalized positional residue frequency at helix termini C5 | H |
| LIFS790103 | C-terminal | 0.1127 | conformational preference for antiparallel beta-strands | B |
| MCMT640101 | C-terminal | 0.0969 | refractivity | |
| OOBM850102 | C-terminal | 0.0462 | optimized propensity to form reverse turn | C |
| WEBA780101 | C-terminal | 0.0245 | RF value in high-salt chromatography | D |
| PARS000102 | C-terminal | 0.0745 | | K |
| FODM020101 | C-terminal | 0.1246 | free energy change of epsilon(i) to alpha(Rh) | E |
| MITS020101 | C-terminal | 0.2131 | amphiphilicity index | F |
| DIGM050101 | C-terminal | 0.0603 | hydrostatic pressure asymmetry index, PAI | G |
Note: Detailed information on these selected variables is available at https://www.genome.jp/aaindex/. The same capital letter in the last column indicates the same amino acid feature.
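The abstract states that pairwise correlation was used alongside the feature selection strategies to screen the indices. One common way to do this is a greedy filter that drops any feature highly correlated with one already kept. The sketch below illustrates that idea in plain Python; the 0.9 threshold, the greedy order, and the function names are assumptions, since the source does not give these details.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is below threshold.

    `columns` maps a feature name to its list of values across peptides.
    The threshold value is an assumption for illustration.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    toy = {
        "f1": [1, 2, 3, 4],
        "f2": [2, 4, 6, 8],   # perfectly correlated with f1, so it is dropped
        "f3": [1, 0, 1, 0],
    }
    print(drop_correlated(toy))
```

Because the filter is greedy, the result depends on the order of the columns; ranking features by importance first is one way to make that order meaningful.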
Performance of 14 QSAR Models Based on the Different Feature Selection Strategies.
| model | R² Train | RMSE Train | R² CV | RMSE CV | R² Test | RMSE Test | note |
|---|---|---|---|---|---|---|---|
| QSAR models based on FI-XGB | |||||||
| tree-XGB | 0.955 | 0.295 | 0.911 | 0.416 | 0.814 | 0.692 | *** |
| linear-XGB | 0.566 | 0.921 | 0.478 | 1.01 | 0.558 | 1.067 | |
| RFR | 0.956 | 0.295 | 0.924 | 0.386 | 0.807 | 0.698 | ** |
| GBDT | 0.976 | 0.219 | 0.904 | 0.434 | 0.78 | 0.752 | * |
| Bagging | 0.974 | 0.226 | 0.904 | 0.434 | 0.769 | 0.77 | |
| MLP | 0.961 | 0.276 | 0.847 | 0.548 | 0.77 | 0.769 | |
| KNN | 0.84 | 0.559 | 0.598 | 0.887 | 0.555 | 1.069 | |
| rbf-SVR | 0.965 | 0.263 | 0.831 | 0.574 | 0.726 | 0.84 | |
| linear-SVR | 0.387 | 1.095 | 0.355 | 1.123 | 0.345 | 1.298 | |
| Lasso | 0.575 | 0.912 | 0.473 | 1.015 | 0.59 | 1.027 | |
| Ridge | 0.566 | 0.921 | 0.478 | 1.01 | 0.557 | 1.068 | |
| SGD | 0.516 | 0.973 | 0.424 | 1.062 | 0.49 | 1.146 | |
| KernelRidge | 0.074 | 1.346 | –0.073 | 1.448 | 0.206 | 1.429 | |
| Huber | 0.567 | 0.92 | 0.478 | 1.01 | 0.559 | 1.064 | |
| QSAR models based on FI-RFR | |||||||
| tree-XGB | 0.954 | 0.3 | 0.872 | 0.5 | 0.847 | 0.627 | *** |
| linear-XGB | 0.789 | 0.643 | 0.722 | 0.738 | 0.681 | 0.906 | |
| RFR | 0.928 | 0.375 | 0.842 | 0.556 | 0.854 | 0.613 | ** |
| GBDT | 0.978 | 0.207 | 0.866 | 0.512 | 0.781 | 0.75 | |
| Bagging | 0.962 | 0.274 | 0.833 | 0.571 | 0.822 | 0.677 | |
| MLP | 0.976 | 0.219 | 0.82 | 0.592 | 0.773 | 0.764 | |
| KNN | 0.933 | 0.362 | 0.832 | 0.573 | 0.814 | 0.691 | |
| rbf-SVR | 0.954 | 0.3 | 0.832 | 0.574 | 0.844 | 0.632 | * |
| linear-SVR | 0.78 | 0.655 | 0.709 | 0.755 | 0.623 | 0.984 | |
| Lasso | 0.796 | 0.632 | 0.714 | 0.748 | 0.685 | 0.901 | |
| Ridge | 0.779 | 0.657 | 0.721 | 0.739 | 0.679 | 0.909 | |
| SGD | 0.789 | 0.642 | 0.724 | 0.735 | 0.674 | 0.916 | |
| KernelRidge | 0.279 | 1.187 | –0.118 | 1.479 | 0.295 | 1.346 | |
| Huber | 0.792 | 0.637 | 0.719 | 0.742 | 0.682 | 0.904 | |
| QSAR models based on FC-LR | |||||||
| tree-XGB | 0.977 | 0.214 | 0.883 | 0.477 | 0.707 | 0.868 | |
| linear-XGB | 0.827 | 0.582 | 0.775 | 0.663 | 0.783 | 0.748 | ** |
| RFR | 0.983 | 0.182 | 0.923 | 0.389 | 0.652 | 0.946 | |
| GBDT | 0.991 | 0.134 | 0.928 | 0.377 | 0.626 | 0.981 | |
| Bagging | 0.989 | 0.145 | 0.92 | 0.396 | 0.681 | 0.905 | |
| MLP | 0.975 | 0.221 | 0.771 | 0.669 | 0.763 | 0.781 | |
| KNN | 0.94 | 0.342 | 0.835 | 0.568 | 0.813 | 0.693 | *** |
| rbf-SVR | 0.988 | 0.155 | 0.741 | 0.711 | 0.716 | 0.855 | |
| linear-SVR | 0.815 | 0.602 | 0.756 | 0.691 | 0.782 | 0.75 | |
| Lasso | 0.817 | 0.598 | 0.759 | 0.686 | 0.739 | 0.819 | |
| Ridge | 0.821 | 0.592 | 0.779 | 0.658 | 0.777 | 0.757 | |
| SGD | 0.826 | 0.584 | 0.771 | 0.669 | 0.785 | 0.743 | |
| KernelRidge | 0.319 | 1.155 | 0.034 | 1.375 | 0.37 | 1.272 | |
| Huber | 0.829 | 0.579 | 0.759 | 0.687 | 0.786 | 0.741 | * |
| QSAR models based on RFE-LR | |||||||
| tree-XGB | 0.951 | 0.31 | 0.801 | 0.624 | 0.773 | 0.764 | |
| linear-XGB | 0.849 | 0.542 | 0.752 | 0.697 | 0.78 | 0.752 | |
| RFR | 0.939 | 0.345 | 0.793 | 0.636 | 0.737 | 0.823 | |
| GBDT | 0.986 | 0.164 | 0.821 | 0.592 | 0.8 | 0.718 | * |
| Bagging | 0.976 | 0.217 | 0.815 | 0.601 | 0.766 | 0.775 | |
| MLP | 0.979 | 0.202 | 0.868 | 0.509 | 0.824 | 0.672 | *** |
| KNN | 0.859 | 0.526 | 0.749 | 0.701 | 0.627 | 0.98 | |
| rbf-SVR | 0.993 | 0.118 | 0.774 | 0.666 | 0.569 | 1.053 | |
| linear-SVR | 0.887 | 0.47 | 0.781 | 0.654 | 0.634 | 0.97 | |
| Lasso | 0.89 | 0.464 | 0.774 | 0.664 | 0.653 | 0.945 | |
| Ridge | 0.908 | 0.425 | 0.814 | 0.602 | 0.77 | 0.769 | |
| SGD | 0.787 | 0.645 | 0.684 | 0.786 | 0.768 | 0.773 | |
| KernelRidge | 0.24 | 1.219 | 0.004 | 1.395 | 0.328 | 1.315 | |
| Huber | 0.915 | 0.407 | 0.831 | 0.575 | 0.819 | 0.681 | ** |
| QSAR models based on RFE-SVR | |||||||
| tree-XGB | 0.945 | 0.329 | 0.893 | 0.457 | 0.772 | 0.766 | |
| linear-XGB | 0.844 | 0.553 | 0.756 | 0.691 | 0.759 | 0.787 | |
| RFR | 0.955 | 0.295 | 0.939 | 0.346 | 0.758 | 0.788 | |
| GBDT | 0.982 | 0.187 | 0.891 | 0.462 | 0.811 | 0.696 | |
| Bagging | 0.992 | 0.126 | 0.947 | 0.321 | 0.778 | 0.756 | |
| MLP | 0.979 | 0.202 | 0.882 | 0.48 | 0.846 | 0.628 | *** |
| KNN | 0.924 | 0.386 | 0.897 | 0.449 | 0.839 | 0.643 | * |
| rbf-SVR | 0.996 | 0.095 | 0.835 | 0.568 | 0.666 | 0.927 | |
| linear-SVR | 0.922 | 0.39 | 0.829 | 0.579 | 0.809 | 0.701 | |
| Lasso | 0.844 | 0.552 | 0.756 | 0.691 | 0.759 | 0.787 | |
| Ridge | 0.859 | 0.525 | 0.806 | 0.616 | 0.83 | 0.662 | |
| SGD | 0.916 | 0.405 | 0.834 | 0.569 | 0.886 | 0.541 | |
| KernelRidge | 0.329 | 1.145 | 0.156 | 1.285 | 0.448 | 1.191 | |
| Huber | 0.926 | 0.381 | 0.84 | 0.559 | 0.84 | 0.642 | ** |
| QSAR models based on RFE-RFR | |||||||
| tree-XGB | 0.978 | 0.205 | 0.931 | 0.367 | 0.828 | 0.665 | *** |
| linear-XGB | 0.852 | 0.539 | 0.786 | 0.647 | 0.704 | 0.872 | |
| RFR | 0.976 | 0.219 | 0.937 | 0.349 | 0.808 | 0.703 | |
| GBDT | 0.989 | 0.145 | 0.935 | 0.358 | 0.815 | 0.689 | * |
| Bagging | 0.992 | 0.122 | 0.939 | 0.345 | 0.799 | 0.719 | |
| MLP | 0.98 | 0.197 | 0.89 | 0.465 | 0.817 | 0.686 | ** |
| KNN | 0.966 | 0.259 | 0.915 | 0.409 | 0.791 | 0.734 | |
| rbf-SVR | 0.996 | 0.089 | 0.924 | 0.385 | 0.801 | 0.716 | |
| linear-SVR | 0.852 | 0.539 | 0.761 | 0.684 | 0.699 | 0.88 | |
| Lasso | 0.861 | 0.521 | 0.778 | 0.66 | 0.706 | 0.869 | |
| Ridge | 0.867 | 0.51 | 0.783 | 0.651 | 0.702 | 0.875 | |
| SGD | 0.863 | 0.518 | 0.787 | 0.647 | 0.703 | 0.874 | |
| KernelRidge | 0.382 | 1.099 | 0.105 | 1.324 | 0.144 | 1.484 | |
| Huber | 0.86 | 0.523 | 0.788 | 0.643 | 0.706 | 0.869 | |
| QSAR models without feature selection | |||||||
| tree-XGB | 0.987 | 0.161 | 0.860 | 0.523 | 0.705 | 0.87 | |
| linear-XGB | 0.927 | 0.378 | 0.786 | 0.647 | 0.746 | 0.807 | * |
| RFR | 0.946 | 0.324 | 0.892 | 0.459 | 0.749 | 0.803 | *** |
| GBDT | 0.992 | 0.126 | 0.898 | 0.447 | 0.744 | 0.811 | ** |
| Bagging | 0.929 | 0.374 | 0.419 | 1.066 | 0.404 | 1.238 | |
| MLP | 0.773 | 0.666 | 0.621 | 0.861 | 0.619 | 0.99 | |
| KNN | 0.996 | 0.089 | 0.752 | 0.697 | 0.628 | 0.978 | |
| rbf-SVR | 0.926 | 0.381 | 0.709 | 0.754 | 0.734 | 0.827 | |
| linear-SVR | 0.893 | 0.457 | 0.765 | 0.678 | 0.731 | 0.831 | |
| Lasso | 0.936 | 0.355 | 0.758 | 0.688 | 0.744 | 0.811 | |
| Ridge | 0.463 | 1.025 | 0.160 | 1.281 | 0.263 | 1.377 | |
| SGD | 0.411 | 1.073 | –0.081 | 1.454 | 0.073 | 1.544 | |
| KernelRidge | 0.936 | 0.355 | 0.757 | 0.69 | 0.743 | 0.812 | |
| Huber | 0.981 | 0.195 | 0.858 | 0.526 | 0.743 | 0.812 | |
Note: Detailed descriptions of these models are available at https://scikit-learn.org/stable/ and https://xgboost.readthedocs.io/en/stable/. (*) More stars in the last column indicate better performance among the models built with the same feature selection method.
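The two metrics reported in the table can be computed as follows. This is a minimal sketch assuming the conventional definitions, R² = 1 − SS_res/SS_tot and RMSE as the square root of the mean squared error; the source does not spell out its formulas. Under this definition R² becomes negative when a model predicts worse than the mean of the observed values, which would be consistent with the negative cross-validation values listed for some models.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Can be negative when predictions are worse than always predicting the mean.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    y = [0, 1, 2, 3]
    print(r_squared(y, y), rmse(y, y))        # perfect predictions
    print(r_squared(y, [1.5] * 4))            # always predicting the mean
    print(r_squared(y, [3, 2, 1, 0]))         # worse than the mean: negative
```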
Antioxidant Activity of Synthesized Tripeptides.
| synthetic tripeptide | observed activity (μmol TE/μmol peptide) | predicted activity (μmol TE/μmol peptide) |
|---|---|---|
| QAY | 4.270 ± 0.124 | 6.167 |
| PHC | 5.013 ± 0.184 | 6.023 |
| YSQ | 3.736 ± 0.024 | 5.696 |
| YPQ | 3.028 ± 0.173 | 5.696 |
| VYV | 3.601 ± 0.039 | 4.837 |
| GPE | 0.598 ± 0.099 | 2.741 |
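The agreement between observed and predicted values in the table above can be summarized with signed residuals. The sketch below uses the mean observed values and the predictions copied from the table; the helper names are illustrative, and the reported standard deviations are ignored for simplicity.

```python
# (observed mean, predicted) in μmol TE/μmol peptide, copied from the table above.
SYNTHESIZED = {
    "QAY": (4.270, 6.167),
    "PHC": (5.013, 6.023),
    "YSQ": (3.736, 5.696),
    "YPQ": (3.028, 5.696),
    "VYV": (3.601, 4.837),
    "GPE": (0.598, 2.741),
}


def residuals(data):
    """Return predicted minus observed activity for each peptide."""
    return {seq: pred - obs for seq, (obs, pred) in data.items()}


def mean_signed_error(data):
    """Average of (predicted - observed); positive means over-prediction."""
    res = residuals(data)
    return sum(res.values()) / len(res)


if __name__ == "__main__":
    for seq, diff in residuals(SYNTHESIZED).items():
        print(f"{seq}: {diff:+.3f}")
    print(f"mean signed error: {mean_signed_error(SYNTHESIZED):+.3f}")
```

For these six peptides every prediction exceeds the observed mean, by about 1.8 μmol TE/μmol peptide on average, while GPE has the lowest value in both columns.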