Siren R Veflingstad, Jonas Almeida, Eberhard O Voit.
Abstract
BACKGROUND: Dense time series of metabolite concentrations or of the expression patterns of proteins may be available in the near future as a result of the rapid development of novel, high-throughput experimental techniques. Such time series implicitly contain valuable information about the connectivity and regulatory structure of the underlying metabolic or proteomic networks. The extraction of this information is a challenging task because it usually requires nonlinear estimation methods that involve iterative search algorithms. Priming these algorithms with high-quality initial guesses can greatly accelerate the search process. In this article, we propose to obtain such guesses by preprocessing the temporal profile data and fitting them preliminarily by multivariate linear regression.
Year: 2004 PMID: 15367330 PMCID: PMC522751 DOI: 10.1186/1742-4682-1-8
Source DB: PubMed Journal: Theor Biol Med Model ISSN: 1742-4682 Impact factor: 2.432
Transformation of data for regression analysis
A. Absolute deviation from a reference state
B. Relative deviation from a reference state
C. Lotka-Volterra system
We assume the general linear model S_i = a_i0 + Σ_j a_ij X_j, where the X_j denote the experimental time series data for metabolite j and the slopes S_i are estimated from the smooth output functions of the artificial neural network that had been trained on the experimental data. The subscript r denotes the value of a metabolite at a reference state. Linearization options I and II are included in transformations A and B, respectively, assuming that the reference state is a steady state. For a piecewise linearization (option III), the data may be transformed following either A or B.
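The priming step can be sketched in a few lines. The following is a minimal toy example, not the article's gene network: the two-variable linear system, its parameters, and the use of finite differences in place of the article's neural-network smoother are all illustrative assumptions. It shows that regressing estimated slopes S_i against the concentrations X_j recovers the coefficients a_ij of the linear model by ordinary least squares.

```python
# Sketch of priming by multivariate linear regression (toy system, not the
# article's network): estimate slopes S_i(t_k), then fit S_i = a_i0 + sum_j a_ij X_j.
import numpy as np

# Toy linear dynamics dX/dt = A @ X + b; the regression should recover A and b.
A = np.array([[-1.0, 0.5],
              [0.3, -0.8]])
b = np.array([0.2, 0.1])

# Dense "time series" by simple Euler integration (small steps keep the error tiny).
dt, n = 0.001, 4001
X = np.empty((n, 2))
X[0] = [1.0, 0.5]
for k in range(n - 1):
    X[k + 1] = X[k] + dt * (A @ X[k] + b)

# Slopes by central differences (a stand-in for the neural-network smoother).
S = np.gradient(X, dt, axis=0)

# Multivariate linear regression with design matrix [1, X1, X2].
D = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(D, S, rcond=None)  # coef[0] ~ b, coef[1:] ~ A.T

print(np.round(coef[1:].T, 2))  # recovered coefficient matrix
```

Because the model is linear in the parameters, no iterative search is needed; this is what makes the fit a cheap source of initial guesses for the subsequent nonlinear estimation.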
Figure 1. Test system. a) Gene network [42] used as a test system for illustrating the proposed methods. Solid arrows represent material flow, while dashed arrows indicate regulatory signals that either activate (+) or inhibit (-) a process. The network contains two genes, Gene 1 and Gene 2. X1 is the mRNA produced from Gene 1, X2 is the enzyme for which the gene codes, and X3 is an inducer protein whose production is catalyzed by X2. X4 is the mRNA produced from Gene 2 and X5 is a regulator protein for which that gene codes. Positive feedback from X3 and negative feedback from X5 are assumed in the production of the mRNAs from the two genes. b) S-system model of the gene network, according to Hlavacek and Savageau [42] and Kikuchi et al. [21].
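The S-system form referenced in panel b can be illustrated generically: each variable has a power-law production term and a power-law degradation term. The two-variable system and all parameter values below are invented for illustration and are not the published model of Hlavacek and Savageau or Kikuchi et al.

```python
# Generic S-system: dXi/dt = alpha_i * prod_j Xj^G[i,j] - beta_i * prod_j Xj^H[i,j].
# All parameters below are illustrative placeholders, not published values.
import numpy as np

def s_system_rhs(x, alpha, G, beta, H):
    """Right-hand side of an S-system in vector form."""
    prod_g = np.prod(x ** G, axis=1)  # production power-law products
    prod_h = np.prod(x ** H, axis=1)  # degradation power-law products
    return alpha * prod_g - beta * prod_h

# Two-variable toy: X2 inhibits production of X1, X1 activates production of X2.
alpha = np.array([2.0, 1.0])
beta = np.array([1.0, 1.0])
G = np.array([[0.0, -0.5],
              [0.5, 0.0]])
H = np.array([[1.0, 0.0],   # first-order-like degradation of X1
              [0.0, 1.0]])  # first-order-like degradation of X2

# Simple Euler integration from an off-steady-state start until t = 20.
dt, n = 0.001, 20000
x = np.array([0.5, 0.5])
for _ in range(n):
    x = x + dt * s_system_rhs(x, alpha, G, beta, H)
```

Negative kinetic orders encode inhibition and positive ones activation, which is how the dashed regulatory arrows of panel a enter the model.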
Figure 2. Dynamic response of the network after a perturbation in X3. The response is shown as the relative deviation from the steady state. The guidelines proposed by Vance et al. [8] indicate that X1 and X4 precede X2 and X5, because they reach their maximum deviations earlier and their maximal values are larger than those of X2 and X5. All variables respond in a positive manner, which implies either mass transfer or positive modulation (activation). The system determined from this analysis is essentially the same as in Figure 1a; the only relationship missed is the effect of X2 on the production and degradation of X3.
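The screening idea attributed to Vance et al. is easy to mechanize: order the variables by the time at which each trajectory reaches its maximal deviation. The two profiles below are synthetic illustrations, not the trajectories of the figure.

```python
# Ordering variables by peak time of the response (sketch of the qualitative
# guideline of Vance et al.). The deviation profiles below are made up.
import numpy as np

t = np.linspace(0.0, 4.0, 401)
profiles = {
    "X_early": 1.0 * t * np.exp(-3.0 * t),     # peaks early, larger maximum
    "X_late": 0.4 * t**2 * np.exp(-2.0 * t),   # peaks later, smaller maximum
}

# Variables whose |deviation| peaks earlier are ranked closer to the perturbation.
ranking = sorted(
    profiles,
    key=lambda name: t[np.argmax(np.abs(profiles[name]))],
)
print(ranking)
```

The sign of each response (positive here, since both profiles rise above zero) would additionally suggest mass transfer or activation, as stated in the caption.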
Comparison of computed and estimated coefficients
| Coefficient | Computed | Estimated |
| a10 | 0 | 0.0000 |
| a11 | -14.6780 | -14.3647 |
| a12 | 0 | -0.1466 |
| a13 | 7.3390 | 7.3414 |
| a14 | 0 | -0.2165 |
| a15 | -7.3390 | -7.1723 |
| a20 | 0 | 0.0000 |
| a21 | 14.6780 | 14.6119 |
| a22 | -14.6780 | -14.6540 |
| a23 | 0 | -0.0009 |
| a24 | 0 | 0.0494 |
| a25 | 0 | -0.0309 |
| a30 | 0 | 0.0000 |
| a31 | 0 | -2.3527 |
| a32 | 0 | 1.3989 |
| a33 | -27.2517 | -27.9204 |
| a34 | 0 | 1.7491 |
| a35 | 0 | -0.9955 |
| a40 | 0 | 0.0000 |
| a41 | 0 | 2.0843 |
| a42 | 0 | -1.0925 |
| a43 | 18.5664 | 19.0295 |
| a44 | -18.5664 | -20.2112 |
| a45 | -9.2832 | -8.3594 |
| a50 | 0 | 0.0000 |
| a51 | 0 | -0.4026 |
| a52 | 0 | 0.1384 |
| a53 | 0 | -0.0059 |
| a54 | 18.5664 | 18.8987 |
| a55 | -18.5664 | -18.7852 |
Regression coefficients for the small gene network (Figure 1), linearized about the steady state and based on relative deviations (option II). Computed and estimated regression coefficients are listed side by side. The regression coefficients a_ij refer to the influence of variable j on variable i, while a_i0 is the constant term in each regression model. As the table indicates, the correspondence is good, except for the coefficients relating to X3 and X4 (see text for explanation). The dataset consisted of 401 data points in the interval [0,4] and resulted from a simulation in which X3 was perturbed at t = 0 to a value 5% above its steady-state value.
Comparison of the different linearization options (I, II and IV)
| Coefficient | Option I (absolute) | Option II (relative) | Option IV (Lotka-Volterra) |
| a10 | 0.0000 | 0.0000 | 14.4748 |
| a11 | -14.3647 | -14.3647 | -18.9581 |
| a12 | -0.1466 | -0.1466 | -0.6836 |
| a13 | 5.3878 | 7.3414 | 7.3367 |
| a14 | -0.1712 | -0.2165 | -0.4694 |
| a15 | -5.6702 | -7.1723 | -7.4981 |
| a20 | 0.0000 | 0.0000 | 0.0144 |
| a21 | 14.6119 | 14.6119 | 19.8910 |
| a22 | -14.6540 | -14.6540 | -19.9277 |
| a23 | -0.0006 | -0.0009 | -0.0001 |
| a24 | 0.0390 | 0.0494 | 0.0472 |
| a25 | -0.0245 | -0.0309 | -0.0335 |
| a30 | 0.0000 | 0.0000 | 26.4020 |
| a31 | -3.2058 | -2.3527 | 2.8725 |
| a32 | 1.9062 | 1.3989 | -1.7989 |
| a33 | -27.9204 | -27.9204 | -26.6164 |
| a34 | 1.8842 | 1.7491 | -1.5871 |
| a35 | -1.0724 | -0.9955 | 0.9692 |
| a40 | 0.0000 | 0.0000 | 8.0270 |
| a41 | 2.6365 | 2.0843 | 6.3364 |
| a42 | -1.3820 | -1.0925 | -4.1579 |
| a43 | 17.6654 | 19.0295 | 19.0005 |
| a44 | -20.2112 | -20.2112 | -23.1319 |
| a45 | -8.3594 | -8.3594 | -7.7047 |
| a50 | 0.0000 | 0.0000 | 0.0869 |
| a51 | -0.5092 | -0.4026 | -0.6617 |
| a52 | 0.1751 | 0.1384 | 0.4441 |
| a53 | -0.0055 | -0.0059 | -0.0003 |
| a54 | 18.8987 | 18.8987 | 20.2939 |
| a55 | -18.7852 | -18.7852 | -20.2152 |
Estimated coefficients for three of the linearization approaches: absolute deviation from the steady state (option I), relative deviation from the steady state (option II) and Lotka-Volterra linearization (option IV). The dataset consisted of 401 data points in the interval [0,4] and resulted from a simulation in which X3 was perturbed at t = 0 to a value 5% above its steady-state value.
The effect of the size of the perturbation
| Coefficient | Computed | Estimated, 5% perturbation | Estimated, larger perturbations (increasing →) | | |
| a10 | 0 | 0.0000 | 0.0000 | 0.0001 | 0.0008 |
| a11 | -14.6780 | -14.3647 | -14.1817 | -13.1496 | -11.3439 |
| a12 | 0 | -0.1466 | -0.1429 | -0.0671 | 0.5735 |
| a13 | 7.3390 | 7.3414 | 7.3438 | 7.3598 | 7.3735 |
| a14 | 0 | -0.2165 | -0.3673 | -1.2462 | -2.7619 |
| a15 | -7.3390 | -7.1723 | -7.0780 | -6.4846 | -5.2501 |
| a20 | 0 | 0.0000 | 0.0000 | 0.0000 | -0.0003 |
| a21 | 14.6780 | 14.6119 | 14.5748 | 14.4207 | 14.5029 |
| a22 | -14.6780 | -14.6540 | -14.6623 | -14.7503 | -15.1862 |
| a23 | 0 | -0.0009 | -0.0016 | -0.0054 | -0.0070 |
| a24 | 0 | 0.0494 | 0.0839 | 0.2494 | 0.3462 |
| a25 | 0 | -0.0309 | -0.0464 | -0.1119 | -0.0951 |
| a30 | 0 | 0.0000 | 0.0000 | 0.0004 | 0.0038 |
| a31 | 0 | -2.3527 | -4.5412 | -18.2307 | -46.8953 |
| a32 | 0 | 1.3989 | 2.6336 | 9.8422 | 24.4004 |
| a33 | -27.2517 | -27.9204 | -28.5955 | -34.0204 | -54.4047 |
| a34 | 0 | 1.7491 | 3.4009 | 14.0961 | 39.3252 |
| a35 | 0 | -0.9955 | -1.8949 | -7.0627 | -15.4759 |
| a40 | 0 | 0.0000 | 0.0000 | -0.0001 | 0.0001 |
| a41 | 0 | 2.0843 | 3.7814 | 14.7316 | 41.5863 |
| a42 | 0 | -1.0925 | -1.7693 | -5.5766 | -13.2688 |
| a43 | 18.5664 | 19.0295 | 19.4964 | 23.2397 | 37.1866 |
| a44 | -18.5664 | -20.2112 | -21.6608 | -31.4631 | -58.1065 |
| a45 | -9.2832 | -8.3594 | -7.6404 | -3.2226 | 6.5808 |
| a50 | 0 | 0.0000 | 0.0000 | -0.0001 | -0.0015 |
| a51 | 0 | -0.4026 | -0.6581 | -2.5848 | -10.1097 |
| a52 | 0 | 0.1384 | 0.0830 | -0.1317 | 0.1582 |
| a53 | 0 | -0.0059 | -0.0110 | -0.0435 | -0.0879 |
| a54 | 18.5664 | 18.8987 | 19.1602 | 21.0620 | 27.2722 |
| a55 | -18.5664 | -18.7852 | -18.9201 | -20.0013 | -24.0836 |
Overall, the estimated coefficients deviate more strongly from the corresponding computed values as the perturbation increases. However, there are substantial differences between variables: the coefficients associated with X2, for example, are hardly influenced, while those associated with X3 are strongly affected. The method seems to produce the best results for perturbations of up to 10%. The datasets for the regression consisted of 401 data points in the interval [0,4], and the method of linearization was option II.
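The trend in the table can be reproduced with a toy experiment: for a nonlinear system, the linear-regression estimate of a linearized coefficient degrades as the initial perturbation grows. The 1-D logistic system below is an invented illustration, not the article's network.

```python
# Toy illustration of perturbation-size effects: estimate the linearized
# coefficient of dx/dt = r*x*(1-x) about its steady state x* = 1 (true value -r).
import numpy as np

r, xs = 2.0, 1.0
dt, n = 0.001, 2001

def estimate(perturbation):
    # Simulate the logistic dynamics from a perturbed start.
    x = np.empty(n)
    x[0] = xs * (1.0 + perturbation)
    for k in range(n - 1):
        x[k + 1] = x[k] + dt * r * x[k] * (1.0 - x[k])
    slopes = np.gradient(x, dt)
    dev = x - xs                                     # absolute deviation (option I style)
    return np.sum(slopes * dev) / np.sum(dev * dev)  # 1-D least squares, no intercept

# Estimation error versus perturbation size (5%, 25%, 50% above steady state).
err = [abs(estimate(p) + r) for p in (0.05, 0.25, 0.50)]
print(err)
```

The quadratic term of the logistic equation, negligible for small deviations, increasingly biases the fitted coefficient as the trajectory starts farther from the steady state, mirroring the behavior of the X3 coefficients in the table.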
Results for piecewise linear regression
| Coefficient | Subset 1 | Subset 2 | Subset 3 |
| a10 | 0.1315 | -0.0419 | 0.0000 |
| a11 | -42.3980 | -14.1738 | -14.5490 |
| a12 | 0.0000 | -0.8010 | -0.0464 |
| a13 | 8.9105 | 7.3653 | 7.6299 |
| a14 | 12.7757 | -0.3340 | -0.1386 |
| a15 | -3.3476 | -6.9121 | -7.2940 |
| a20 | 0.0567 | -0.0197 | 0.0000 |
| a21 | -1.1939 | 14.4913 | 14.6792 |
| a22 | -32.3300 | -14.5116 | -14.6784 |
| a23 | 0.6133 | 0.0057 | -0.0205 |
| a24 | 7.0917 | 0.1016 | -0.0018 |
| a25 | 7.9313 | -0.1047 | 0.0067 |
| a30 | -0.7858 | -0.0181 | 0.0000 |
| a31 | -130.3724 | -0.2358 | 0.0021 |
| a32 | 0.0000 | 0.3616 | -0.0007 |
| a33 | -20.7724 | -27.6129 | -27.2551 |
| a34 | 62.1525 | 0.3496 | -0.0027 |
| a35 | 19.1470 | -0.1984 | 0.0006 |
| a40 | 0.3164 | -0.0709 | 0.0000 |
| a41 | -13.6819 | 1.1412 | -0.0115 |
| a42 | 0.0000 | -2.1478 | 0.0015 |
| a43 | 19.8295 | 18.8534 | 18.6927 |
| a44 | -13.3654 | -19.5811 | -18.5494 |
| a45 | -7.2135 | -8.0985 | -9.2792 |
| a50 | 0.1617 | -0.0393 | 0.0000 |
| a51 | -149.5199 | -0.8195 | 0.0250 |
| a52 | -160.3341 | 0.8175 | -0.0074 |
| a53 | 5.7537 | 0.0580 | -0.0304 |
| a54 | 85.3050 | 19.0394 | 18.5356 |
| a55 | 53.9745 | -19.1183 | -18.5623 |
For each variable, the complete dataset is divided into three subsets, with the first and second extrema of the trajectory serving as breakpoints. The datasets for the regression consisted of 401 data points in the interval [0,4] and resulted from a simulation in which X3 was perturbed at t = 0 to a value 5% above its steady-state value.
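The subset construction can be sketched directly: locate the first two interior extrema of a trajectory as sign changes of its finite-difference slope and fit a separate linear model on each segment. The damped oscillation below is a synthetic 1-D stand-in for a network trajectory.

```python
# Piecewise linear regression with breakpoints at the first two extrema.
# The trajectory below is synthetic, not data from the article's network.
import numpy as np

t = np.linspace(0.0, 4.0, 401)
x = np.exp(-t) * np.sin(3.0 * t)  # damped oscillation with interior extrema

# Interior extrema show up as sign changes of the finite-difference slope.
slope = np.gradient(x, t)
sign_change = np.nonzero(np.diff(np.sign(slope)))[0]
b1, b2 = sign_change[:2]          # first and second extrema serve as breakpoints

# Three subsets: before the first extremum, between the two, and after the second.
segments = [slice(0, b1 + 1), slice(b1, b2 + 1), slice(b2, len(t))]
fits = [np.polyfit(t[s], x[s], 1) for s in segments]  # one linear fit per subset
```

Fitting each monotone segment separately avoids forcing a single linear model across a turning point, which is the motivation for the three-subset design in the table above.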
Collective inference of the gene network based on results from all linearizations
| X1 | |||||
| X2 | |||||
| X3 | ? | ? | ? | ? | |
| X4 | + (67 %) | - (67 %) | |||
| X5 | - (83 %) |
A minus sign implies a negative influence, a plus sign a positive influence, and zero no influence. Bold symbols denote correctly identified interactions, and the numbers in parentheses give the fraction of models that supported the identification. Question marks indicate that no type of interaction was identified in more than 50% of the models.
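The collective-inference step amounts to a sign vote over the coefficient matrices produced by the different linearizations: an interaction is kept only if more than half of the models agree on its sign. The three 2×2 coefficient matrices below are invented illustrations, not results from the article.

```python
# Majority-vote sign inference over coefficient matrices from several fitted
# models (made-up 2x2 matrices for illustration).
import numpy as np

models = np.array([
    [[-1.2, 0.8], [0.5, -0.9]],
    [[-1.1, 0.0], [0.6, -1.0]],
    [[-0.9, -0.1], [0.4, -1.1]],
])

def vote(models, threshold=0.5):
    # Round to one decimal so near-zero estimates count as "no influence".
    signs = np.sign(np.round(models, 1))
    consensus = np.empty(signs.shape[1:], dtype=object)
    for idx in np.ndindex(consensus.shape):
        votes = [int(s[idx]) for s in signs]
        best = max(set(votes), key=votes.count)
        frac = votes.count(best) / len(votes)
        if frac > threshold:
            consensus[idx] = "+" if best > 0 else "-" if best < 0 else "0"
        else:
            consensus[idx] = "?"  # no sign agreed upon by a majority of models
    return consensus

C = vote(models)
print(C)
```

Entries where the models disagree (here the influence of variable 2 on variable 1) come out as question marks, exactly as in the table above.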