Peter Gennemark, Dag Wedelin.
Abstract
MOTIVATION: In recent years, the biological literature has seen a significant increase in reported methods for identifying both the structure and the parameters of ordinary differential equations (ODEs) from time series data. A natural way to evaluate the performance of such methods is to try them on a sufficient number of realistic test cases. However, weak practices in specifying identification problems and the lack of commonly accepted benchmark problems make it difficult to evaluate and compare different methods.
Year: 2009 PMID: 19176548 PMCID: PMC2654804 DOI: 10.1093/bioinformatics/btp050
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Benchmark problems for ODE system identification
| Problem name | Original problem reference | Source system/Model | #var | #exp | #pts | Noise (%) |
|---|---|---|---|---|---|---|
| simpleLin1 | | Simple linear system | 3 | 3 | 13 | 0 |
| simpleLin2 | | | | 8 | 13 | 10 |
| simpleFb1 | | Simple feedback loop (McKinney | 3 | 4 | 7 | 0 |
| simpleFb2 | | | | 4 | 7 | 5 |
| simpleFb3 | | | | 1 | 7 | 0 |
| simpleFb4 | (McKinney | | | 1 | 7 | ≈5 |
| osc1 | | An oscillator (Karnaukhov | 3 | 1 | 41 | 0 |
| osc2 | (Karnaukhov | | | 1 | 41 | 3 |
| metabol1 | (Gennemark and Wedelin, | A metabolic pathway (Arkin and Ross, | 5 | 12 | 7 | 0 |
| metabol2 | (Gennemark and Wedelin, | | | 12 | 21 | 10 |
| metabol3 | (Gennemark and Wedelin, | | | 12 | 21 | 20 |
| 3genes1 | | A 3-step gene network (Moles | 8 | 16 | 21 | 0 |
| ss_cascade1 | (Tsai and Wang, | A cascaded pathway (Voit, | 3 | 8 | 41 | 0 |
| ss_cascade2 | | | | 4 | 41 | 0 |
| ss_cascade3 | | | | 8 | 41 | 5 |
| ss_branch1 | (Voit and Almeida, | A branched pathway (Voit, | 4 | 3 | 21 | 0 |
| ss_branch2 | (Marino and Voit, | | | 6 | 51 | 0 |
| ss_branch3 | (Tucker and Moulton, | | | 5 | 20 | 0 |
| ss_branch4 | (Kutalik | | | 4 | 20 | 0 |
| ss_branch5 | (Kutalik | | | 4 | 20 | 2.5 |
| ss_branch6 | (Gonzalez | | | 5 | 31 | 0 |
| ss_5genes1 | (Kikuchi | A genetic network (Hlavacek and Savageau, | 5 | 10 | 11 | 0 |
| ss_5genes2 | (Gennemark and Wedelin, | | | 10 | 9 | 20 |
| ss_5genes3 | (Gennemark and Wedelin, | | | 10 | 3 | 0 |
| ss_5genes4 | (Kimura | | | 15 | 11 | 0 |
| ss_5genes5 | (Daisuke and Horton, | | | 10 | 11 | 0 |
| ss_5genes6 | (Cho | | | 1 | 16 | 0 |
| ss_5genes7 | (Tucker and Moulton, | | | 10 | 20 | 0 |
| ss_5genes8 | (Tsai and Wang, | | | 8 | 41 | 0 |
| | (Liu and Wang, | | | | | |
| ss_15genes1 | | A genetic network (Maki | 15 | 10 | 11 | 0 |
| ss_15genes2 | | | | 20 | 11 | 10 |
| ss_30genes1 | | A genetic network (Maki | 30 | 15 | 11 | 0 |
| ss_30genes2 | (Kimura | | | 20 | 11 | 10 |
| ss_30genes3 | (Liu and Wang, | | | 8 | 41 | 0 |
| cytokine1 | (McKinney | Immunologic data (Rock | 4 | 1 | 7 | 10 |
| cytokine2 | | | | 1 | 7 | 10 |
| ss_ethanolferm1 | (Liu and Wang, | Ethanol fermentation (Wang | 4 | 2 | 11–15 | ≈30 |
| ss_ethanolferm2 | | | | 3 | 11–19 | ≈30 |
| ss_sosrepair1 | (Cho | SOS repair system Escherichia coli (Ronen | 6 | 1 | 50 | 10 |
| ss_sosrepair2 | | | | 1 | 50 | 10 |
| ss_cadBA1 | (Gonzalez | cadBA network in | 5 | 1 | 25 | <20 |
| ss_cadBA2 | | | | 1 | 25 | <20 |
| ss_clock1 | (Daisuke and Horton, | Mice cell cycle (Barrett | 7 | 1 | 12 | ≈10 |
| ss_clock2 | | | | 1 | 12 | ≈10 |
a Estimate from or assumption about data. See web site for further information.
#var, number of dependent variables; #exp, number of experimental conditions with different initial conditions and/or input functions; #pts, number of data-points per time series.
Noise is added from a Gaussian distribution with SD given as a certain percentage (denoted Noise) of each experimental value. Problem names starting with ‘ss_’ correspond to S-systems. The last section lists problems with real data from biological systems.
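The noise model described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' data generator: the function name `add_relative_gaussian_noise`, the `seed` argument and the example time series are assumptions made for this sketch.

```python
import random


def add_relative_gaussian_noise(values, noise_percent, seed=None):
    """Return a noisy copy of `values`.

    Each point receives zero-mean Gaussian noise whose SD is
    `noise_percent` percent of that point's own magnitude.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    return [v + rng.gauss(0.0, abs(v) * noise_percent / 100.0) for v in values]


# Hypothetical noise-free time series with 7 data points.
clean = [1.0, 0.8, 0.5, 0.3, 0.2, 0.15, 0.1]
noisy = add_relative_gaussian_noise(clean, noise_percent=10, seed=0)
print(noisy)
```

Because the SD scales with each value, small measurements receive small absolute perturbations, and a noise level of 0 leaves the series unchanged.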
Fig. 1. Identification of ODE systems. (A) An identification problem can be specified with real or simulated data from one or several experiments, a model space of allowed reactions occurring on the right-hand side of the ODEs, an initial model representing prior knowledge and an error function. (B) An example of an identification problem with a model space of three traditional chemical reaction types, an error function where L = likelihood function and λK = structural complexity term (λ = constant, K = number of model parameters), and an initial model with no prior information. (C) An example of a solution model.
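To make the role of the two terms in the error function concrete, the sketch below assumes the error takes the penalized-likelihood form −ln L + λK, with independent Gaussian measurement errors of known SD and additive constants dropped. These are assumptions of the sketch, and the function name `penalized_error` and all numbers are hypothetical.

```python
def penalized_error(observed, predicted, sigmas, n_params, lam):
    """Negative log-likelihood plus a structural complexity penalty.

    Assumes independent Gaussian errors with known SDs `sigmas`;
    constant terms of the log-likelihood are dropped.
    """
    neg_log_likelihood = 0.5 * sum(
        ((y - f) / s) ** 2 for y, f, s in zip(observed, predicted, sigmas)
    )
    return neg_log_likelihood + lam * n_params


# Hypothetical data: a perfect fit leaves only the complexity term lam * K.
data = [1.0, 0.6, 0.4]
print(penalized_error(data, data, [0.1, 0.06, 0.04], n_params=8, lam=1.0))
```

Under these assumptions, a model that reproduces noise-free data exactly has an error equal to λK alone, so adding a superfluous parameter always increases the error.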
Best known solutions to the benchmark problems
| Problem | Error of source model (−ln L + λK) | Best known: Error (−ln L + λK) | Best known: Residual (−ln L) | Best known: Structure FP/FN/LD | Best known: Stability between runs | Best known: Average time (s) (1 GHz) | Original solution (if available): Structure FP/FN/LD | Original solution (if available): Time (s) (1 GHz) |
|---|---|---|---|---|---|---|---|---|
| simpleLin1 | 8 | 8 | 0 | 0/0/0 | 5/5 | 17 | | |
| simpleLin2 | 187.4 | 140.2 | 132.2 | 0/0/– | 5/5 | 86 | | |
| simpleFb1 | 7 | 7 | 0 | 0/0/0 | 5/5 | 14 | | |
| simpleFb2 | 45.46 | 42.14 | 34.14 | 1/1/– | 3/5 | 36 | | |
| simpleFb3 | 7 | 7 | 0 | 0/0/0 | 3/5 | 41 | | |
| simpleFb4 | 18.39 | 7.071 | 0.0713 | 0/0/– | 2/5 | 47 | 0/0/– | – |
| osc1 | 6 | 6.149 | 0.1491 | 0/0/0 | 5/5 | 18 | | |
| osc2 | 122.1 | 87.34 | 57.34 | 0/0/0 | 4/5 | 24 | – | – |
| metabol1 | 30 | 30 | 0 | 0/0/0 | 3/3 | 5400 | | |
| metabol2 | 1214 | 751.6 | 601.6 | 0/0/– | 2/3 | 11 000 | | |
| metabol3 | 1442 | 861.1 | 696.1 | 2/1/– | 1/5 | 13 000 | | |
| 3genes1 | 39 | 39 | 0 | 0/0/0 | 3/3 | 35 000 | | |
| ss_cascade1 | 14 | 14 | 0 | 0/0/0 | 5/5 | 180 | 0/0/300 | 8900 |
| ss_cascade2 | 14 | 14 | 0 | 0/0/0 | 4/5 | 130 | | |
| ss_cascade3 | 498.9 | 476.7 | 462.7 | 1/1/– | 1/5 | 860 | | |
| ss_branch1 | 17 | 17 | 0 | 0/0/0 | 5/5 | 68 | 0/0/15 | 2200 |
| ss_branch2 | 17 | 17 | 0 | 0/0/0 | 5/5 | 110 | 0/0/5 | 35 000 |
| ss_branch3 | 17 | 17 | 0 | 0/0/0 | 5/5 | 150 | 0/0/2 | 15 000 |
| ss_branch4 | 17 | 17 | 0 | 0/0/0 | 5/5 | 71 | 0/0/0 | 25 000 |
| ss_branch5 | 211.3 | 142.1 | 122.1 | 4/1/– | 3/5 | 300 | – | – |
| ss_branch6 | 18 | 18 | 0 | 0/0/0 | 5/5 | 90 | 0/0/5 | 1100 |
| ss_5genes1 | 23 | 23 | 0 | 0/0/0 | 5/5 | 600 | 1/0/– | 2.4E+8 |
| ss_5genes2 | 300.3 | 211.0 | 183.0 | 5/0/– | 2/5 | 2100 | | |
| ss_5genes3 | 23 | 23 | 0 | 0/0/0 | 5/5 | 380 | | |
| ss_5genes4 | 23 | 23 | 0 | 0/0/0 | 5/5 | 640 | 37/0/– | 40 000 |
| ss_5genes5 | 23 | 23.00 | 1.14E−3 | 0/0/0.02 | 5/5 | 400 | 4/2/– | 7200 |
| ss_5genes6 | 23 | 23 | 0 | 0/0/0 | 5/5 | 130 | 0/0/50 | 56 000 |
| ss_5genes7 | 23 | 23 | 0 | 0/0/0 | 5/5 | 190 | 0/0/0.2 | 15 000 |
| ss_5genes8 | 28 | 28 | 0 | 0/0/0 | 5/5 | 440 | 0/0/5 | 26 000 |
| | | | | | | | 0/0/2.5 | 7000 |
| ss_15genes1 | 62 | 62 | 0 | 0/0/0 | 5/5 | 1500 | | |
| ss_15genes2 | 2010 | 1783 | 1478 | 3/4/– | 2/5 | 18 000 | | |
| ss_30genes1 | 128 | 128 | 0 | 0/0/0 | 5/5 | 7900 | | |
| ss_30genes2 | 4098 | 3628 | 2993 | 6/7/– | 1/3 | 94 000 | 242/10/– | 6.2E+6 |
| ss_30genes3 | 128 | 128 | 0 | 0/0/0 | 3/3 | 18 000 | 0/0/25 | 8.6E+6 |
| cytokine1 | | 25.92 | 17.92 | | 1/5 | 11 | – | – |
| cytokine2 | | 42.17 | 32.17 | | 1/5 | 23 | | |
| ss_ethanolferm1 | | 127.4 | 110.4 | | 1/5 | 190 | err>127.4 | – |
| ss_ethanolferm2 | | 1308 | 1292 | | 1/5 | 360 | | |
| ss_sosrepair1 | | 2642 | 2611 | | 1/5 | 510 | err>2642 | 2.7E+5 |
| ss_sosrepair2 | | 2823 | 2789 | | 1/5 | 470 | | |
| ss_cadBA1 | | 750.6 | 726.6 | | 1/5 | 250 | err>750.6 | >540 |
| ss_cadBA2 | | 709.1 | 687.1 | | 1/5 | 260 | | |
| ss_clock1 | | 928.4 | 803.4 | | 1/5 | 360 | – | 15 000 |
| ss_clock2 | | 814.5 | 649.5 | | 1/5 | 440 | | |
a Best known solutions, based on comparison to the solutions of the original problems. For ss_30genes2, the similarity with the source model strongly indicates that our solution is the best, but since we do not have access to the solution of the original problem, this is not confirmed.
b The majority of runs have an error below the error of the source model, but differ slightly in structure.
c Real datasets with relatively few experiments and/or data points, which makes many similar models reasonable and increases the variability in the solutions found.
d Since no source model is available, we have evaluated the original solution with our error function, and it has a higher error than our solution.
The error of the source model refers to the error of the model from which the data were simulated. The best result is taken from several runs with various random seeds. The error, the negative log-likelihood and the number of false positive and false negative (FP/FN) reactions are reported. Stability measures the frequency of runs in which the best structure is obtained. Computation time in seconds is given as the average of several runs.
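The FP/FN counts and the stability measure can be illustrated with a short Python sketch. It assumes that a model structure can be represented as a set of reactions, here written as hypothetical (reactant, product) pairs; the function names and example models are invented for illustration and are not taken from the benchmark.

```python
def structure_errors(found, source):
    """Count reactions in `found` that are absent from `source` (FP)
    and reactions in `source` that `found` misses (FN)."""
    found, source = set(found), set(source)
    return len(found - source), len(source - found)


def stability(run_structures, best_structure):
    """Return (hits, runs): how many runs reproduced the best structure."""
    best = frozenset(best_structure)
    hits = sum(1 for s in run_structures if frozenset(s) == best)
    return hits, len(run_structures)


# Hypothetical source model and the structures found in three runs.
source_model = {("x1", "x2"), ("x2", "x3"), ("x3", "x1")}
runs = [
    {("x1", "x2"), ("x2", "x3"), ("x3", "x1")},
    {("x1", "x2"), ("x2", "x3"), ("x1", "x3")},
    {("x1", "x2"), ("x2", "x3"), ("x3", "x1")},
]
fp, fn = structure_errors(runs[1], source_model)
hits, total = stability(runs, runs[0])
print(f"FP/FN = {fp}/{fn}, stability = {hits}/{total}")
```

In this toy example the second run replaces one correct reaction with a wrong one, which counts as one false positive and one false negative, and two of the three runs agree on the best structure.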