Peteris Zikmanis, Inara Kampenusa.
Abstract
Kinetic models of metabolic pathways represent a system of biochemical reactions in terms of metabolic fluxes and enzyme kinetics. Apparent differences in metabolic fluxes might therefore reflect distinctive kinetic characteristics as well as sequence-dependent properties of the enzymes involved. This study examines possible links between kinetic constants and the amino acid (AA) composition (AAC) of enzymes from the glycolytic pathway of the yeast Saccharomyces cerevisiae. Values of the Michaelis-Menten constant (KM), turnover number (kcat), and specificity constant (ksp = kcat/KM) were taken from BRENDA (15, 17, and 16 values, respectively), and the protein sequences of nine enzymes (HXK, GADH, PGK, PGM, ENO, PK, PDC, TIM, and PYC) were taken from UniProtKB. The AAC and sequence properties were computed with the ExPASy/ProtParam tool, and the data were processed by conventional methods of multivariate statistics. Multiple linear regressions were found between sets of AA frequencies (four to six per model), selected from the enzyme sequences while assessing potential multicollinearity between variables, and the log-values of kcat (3 models, 85.74% ≤ R²adj ≤ 94.11%, p < 0.00001), KM (1 model, R²adj = 96.70%, p < 0.00001), and ksp (3 models, 96.15% ≤ R²adj ≤ 96.50%, p < 0.00001). The selection of independent variables in the multiple regression models may also reflect certain advantages of particular AA physicochemical and structural propensities, which could affect the properties of the sequences. The results support the view that catalytic, binding, and structural residues are interdependent in ensuring the efficiency of biocatalysts, since the kinetic constants of the yeast enzymes appear closely related to the overall AAC of the sequences.
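The two derived quantities named in the abstract, the AAC and ksp = kcat/KM, can be illustrated with a minimal Python sketch. The study itself used the ExPASy/ProtParam tool; the functions below are a hypothetical stand-in, and expressing frequencies as percentages of the 20 standard residues is an assumption made here, not something the abstract states.

```python
from collections import Counter

# The 20 standard amino acids in single-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the percent frequency of each standard amino acid.

    Assumption: non-standard symbols are ignored and frequencies are
    expressed as percentages of the remaining residues.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def specificity_constant(kcat, km):
    """ksp = kcat / KM, as defined in the abstract."""
    return kcat / km
```

For a real enzyme, the sequence string would come from a UniProtKB entry and the kinetic constants from BRENDA; neither is reproduced here.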
Year: 2012 PMID: 22867018 PMCID: PMC3494524 DOI: 10.1186/1687-4153-2012-11
Source DB: PubMed Journal: EURASIP J Bioinform Syst Biol ISSN: 1687-4145
Figure 1. The relationships between kinetic constants and the frequencies of individual AA. Bivariate correlations between the log-values of kinetic constants and the frequencies of occurrence of individual AA in the yeast S. cerevisiae enzyme sequences, where KM is the Michaelis-Menten constant (A) and kcat is the catalytic constant (B–D). All the linear correlations are significant by non-parametric assessment (Kendall’s τ, Spearman’s ρ correlation coefficients).
Figure 2. The relationships between kinetic constants and frequencies of two AA. Multiple linear regressions showing changes in the log-values of kinetic constants, as dependent variables, with the frequencies of occurrence of two AA in the yeast S. cerevisiae sequences, where kcat is the catalytic constant (A), KM is the Michaelis-Menten constant (C), and ksp = kcat/KM is the specificity constant (E). The observed versus predicted plots (B,D,F) show the values of the dependent variables (kcat, KM, and ksp, respectively). The predicted values were calculated from the regression equations: log(kcat) = 5.556 − 1.620*M − 0.984*W (R²adj = 82.88%, p = 0.0000); log(KM) = 8.593 − 0.596*N − 0.998*D (R²adj = 53.72%, p = 0.0039); log(kcat/KM) = 0.818 + 0.501*A − 1.736*H (R²adj = 46.50%, p = 0.0068). All the multiple and pair correlations (A–F) are significant by non-parametric assessment (Kendall's τ, Spearman's ρ correlation coefficients).
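The two-variable equations in the Figure 2 caption can be evaluated directly. The sketch below copies the coefficients from the caption; the function name, the dictionary layout, and the example frequencies are hypothetical. It also assumes that the AA frequencies are supplied in the same units the authors used (presumably percentages, as ProtParam reports them), which the caption does not state.

```python
# Coefficients copied from the Figure 2 caption: (intercept, {amino acid: slope}).
TWO_AA_MODELS = {
    "log_kcat": (5.556, {"M": -1.620, "W": -0.984}),
    "log_KM": (8.593, {"N": -0.596, "D": -0.998}),
    "log_ksp": (0.818, {"A": 0.501, "H": -1.736}),
}


def predict_log_constant(model, freqs):
    """Evaluate one two-variable regression for given AA frequencies.

    `freqs` maps single-letter AA codes to frequencies of occurrence;
    the units are assumed to match those of the original fit.
    """
    intercept, slopes = TWO_AA_MODELS[model]
    return intercept + sum(slope * freqs[aa] for aa, slope in slopes.items())


# Hypothetical frequencies, for illustration only.
example = predict_log_constant("log_kcat", {"M": 2.0, "W": 1.0})
```

Because the slopes for M and W are both negative, any increase in either frequency lowers the predicted log(kcat) under this model.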
Figure 3. The changes of explained variance with the growing number of variables in the models. Relationships between the increase in the percentage of explained variance and the number of independent variables (AA frequencies of occurrence) included in the multiple regressions, where panels A and B show the cases for log(kcat) and log(KM), respectively. Variables in the models: model I: 1 – M, 2 – M, W, 3 – M, W, R, and 4 – M, W, R, L; model II: 1 – T, 2 – T, V, 3 – T, V, H, 4 – T, V, H, A, and 5 – T, V, H, A, K; model III: 1 – H, 2 – H, A, 3 – H, A, E, and 4 – H, A, E, V; model IV: 1 – D, 2 – D, N, 3 – D, N, W, 4 – D, N, W, L, and 5 – D, N, W, L, A.
Table 1. The characteristics of the obtained models
| Model | Dependent variable | Elements (a) | Coefficient | Standard error | t-statistic | p-value | R² (%) | R²adj (%) | VIF (b) | R² (%) (c) | R²adj (%) (c) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I | log(kcat) | Constant | 5.2073 | 0.5003 | 10.408 | 0.0000 | 95.58 | 94.11 | | 90.72 | 90.10 |
| | | M | −1.6219 | 0.1169 | −13.879 | 0.0000 | | | 1.853 | | |
| | | W | −0.5258 | 0.2147 | −2.449 | 0.0307 | | | 3.329 | | |
| | | R | 0.3558 | 0.07329 | 4.855 | 0.0004 | | | 1.103 | | |
| | | L | −0.1697 | 0.06309 | −2.691 | 0.0196 | | | 2.180 | | |
| II | log(kcat) | Constant | 3.9385 | 1.3200 | 2.984 | 0.0124 | 95.22 | 93.05 | | 80.32 | 79.01 |
| | | T | −0.4482 | 0.07274 | −6.161 | 0.0001 | | | 2.851 | | |
| | | V | 0.2756 | 0.05350 | 5.151 | 0.0003 | | | 1.530 | | |
| | | H | −1.3861 | 0.2088 | −6.639 | 0.0000 | | | 2.003 | | |
| | | A | 0.2840 | 0.06859 | 4.141 | 0.0016 | | | 1.868 | | |
| | | K | −0.2333 | 0.09633 | −2.422 | 0.0339 | | | 2.857 | | |
| III | log(kcat) | Constant | −6.3103 | 1.7275 | −3.653 | 0.0033 | 89.30 | 85.74 | | 71.62 | 69.73 |
| | | A | 0.4367 | 0.07955 | 5.489 | 0.0001 | | | 1.224 | | |
| | | H | −0.9759 | 0.3015 | −3.237 | 0.0071 | | | 2.034 | | |
| | | V | 0.2728 | 0.07752 | 3.519 | 0.0042 | | | 1.564 | | |
| | | E | 0.5900 | 0.1564 | 3.773 | 0.0027 | | | 1.498 | | |
| IV | log(KM) | Constant | 13.2588 | 0.8236 | 16.098 | 0.0000 | 97.88 | 96.70 | | 93.18 | 92.66 |
| | | D | −1.1379 | 0.06612 | −17.209 | 0.0000 | | | 1.365 | | |
| | | N | −0.9961 | 0.07256 | −13.729 | 0.0000 | | | 1.932 | | |
| | | W | 1.0535 | 0.08387 | 12.561 | 0.0000 | | | 1.948 | | |
| | | L | −0.2347 | 0.03077 | −7.628 | 0.0002 | | | 2.140 | | |
| | | A | −0.09888 | 0.02288 | −4.321 | 0.0019 | | | 1.093 | | |
| V | log(kcat/KM) | Constant | −11.0119 | 1.5657 | −7.052 | 0.0001 | 97.77 | 96.29 | | 88.86 | 88.06 |
| | | A | 0.5525 | 0.05736 | 9.632 | 0.0000 | | | 1.705 | | |
| | | H | −1.2042 | 0.1817 | −6.626 | 0.0001 | | | 2.082 | | |
| | | R | 1.1894 | 0.1006 | 11.829 | 0.0000 | | | 2.373 | | |
| | | G | 0.6911 | 0.09445 | 7.317 | 0.0000 | | | 2.520 | | |
| | | Q | −0.5142 | 0.1009 | −5.098 | 0.0006 | | | 1.672 | | |
| | | N | 0.4252 | 0.1246 | 3.412 | 0.0077 | | | 2.176 | | |
| VI | log(kcat/KM) | Constant | 9.4887 | 0.8188 | 11.589 | 0.0000 | 97.69 | 96.15 | | 88.86 | 88.07 |
| | | L | −0.4399 | 0.05548 | −7.929 | 0.0000 | | | 1.902 | | |
| | | T | −0.9367 | 0.07023 | −13.338 | 0.0000 | | | 3.267 | | |
| | | N | 1.1552 | 0.1032 | 11.194 | 0.0000 | | | 1.437 | | |
| | | W | −1.0394 | 0.2182 | −5.012 | 0.0007 | | | 3.420 | | |
| | | Q | −0.3207 | 0.1191 | −2.692 | 0.0247 | | | 2.244 | | |
| | | F | −0.2690 | 0.09349 | −2.877 | 0.0183 | | | 1.349 | | |
| VII | log(kcat/KM) | Constant | 2.5597 | 0.8288 | 3.088 | 0.0115 | 97.00 | 96.50 | | 90.44 | 89.77 |
| | | T | −0.8156 | 0.06297 | −12.953 | 0.0000 | | | 2.249 | | |
| | | Q | −0.7700 | 0.1050 | −7.331 | 0.0000 | | | 1.495 | | |
| | | C | 2.4452 | 0.2845 | 8.593 | 0.0000 | | | 3.581 | | |
| | | N | 0.5745 | 0.1162 | 4.943 | 0.0006 | | | 1.561 | | |
| | | A | 0.2605 | 0.06600 | 3.946 | 0.0027 | | | 2.027 | | |
Elements and the statistical indices for multiple linear regression models which link the log-values of kinetic constants and the AAC of the yeast S. cerevisiae enzyme sequences.
a Elements of multiple linear regression which represent the frequencies of AA (a single letter code) occurrence in the yeast S. cerevisiae enzyme sequences and the constant (intercept) of equation.
b The variance inflation factor which indicates the impact of multicollinearity between the independent variables [22].
c Obtained by leave-one-out cross-validation (LOOCV) [21] of the models.
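The table reports adjusted R², the variance inflation factor, and LOOCV results without spelling out how they are computed. The sketch below, in Python with NumPy, implements the textbook definitions of all three. It is not the authors' code, the helper names are hypothetical, and it is assumed (not confirmed by the source) that the authors followed these standard definitions.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    """Apply fitted coefficients to one row or to a matrix of predictors."""
    return beta[0] + np.asarray(X, dtype=float) @ beta[1:]


def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjust R^2 for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the rest."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        beta = fit_ols(others, X[:, j])
        factors.append(1.0 / (1.0 - r_squared(X[:, j], predict(beta, others))))
    return np.array(factors)


def loocv_predictions(X, y):
    """Predict each observation from a model fitted to all the others."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        preds[i] = predict(fit_ols(X[keep], y[keep]), X[i])
    return preds
```

As a consistency check, `adjusted_r_squared(0.9558, n=17, p=4)` evaluates to about 0.9411, which agrees with the R²adj listed for model I when n is taken as the 17 kcat values mentioned in the abstract.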
Figure 4. Linear plots of the actual kinetic constants against those predicted by linear regression models. The observed versus predicted plots (A,C,E) for the values of the dependent variables log(kcat), log(KM), and log(kcat/KM), respectively. The predicted values were calculated from the statistically robust model equations specified in Table 1, including those obtained by the LOOCV of the models (B,D,F).
Figure 5. Different contributions of the selected and non-selected AA to the properties of enzyme sequences. Box-and-whisker plot of the average AA property estimates for the selected and non-selected groups of independent variables in the kcat/KM regression models (models V and VI, Table 1). The upper and lower ends of the whiskers represent the maximum and minimum estimates, the upper and lower bounds of each box represent the upper and lower quartiles, and the line in the middle of each box represents the median. The effects of group selection and all pairwise differences between the groups are significant (Friedman ANOVA and Wilcoxon signed rank tests, respectively).
Figure 6. Adjusted coefficients of determination for the multiple regression models that represent the full AA sequences (filled bars) and for those built from AA frequencies recalculated without the catalytic and binding residues in the active sites of the enzymes (open bars). Both model types contain the same independent variables.