Literature DB >> 30624608

Benchmark problems for dynamic modeling of intracellular processes.

Helge Hass^1,2, Carolin Loos^3,4, Elba Raimúndez-Álvarez^3,4, Jens Timmer^1,2,5,6, Jan Hasenauer^3,4, Clemens Kreutz^1,2,5.

Abstract

MOTIVATION: Dynamic models are used in systems biology to study and understand cellular processes like gene regulation or signal transduction. Frequently, ordinary differential equation (ODE) models are used to model the time and dose dependency of the abundances of molecular compounds as well as interactions and translocations. A multitude of computational approaches, e.g. for parameter estimation or uncertainty analysis have been developed within recent years. However, many of these approaches lack proper testing in application settings because a comprehensive set of benchmark problems is yet missing.
RESULTS: We present a collection of 20 benchmark problems in order to evaluate new and existing methodologies, where an ODE model with corresponding experimental data is referred to as problem. In addition to the equations of the dynamical system, the benchmark collection provides observation functions as well as assumptions about measurement noise distributions and parameters. The presented benchmark models comprise problems of different size, complexity and numerical demands. Important characteristics of the models and methodological requirements are summarized, estimated parameters are provided, and some example studies were performed for illustrating the capabilities of the presented benchmark collection.
AVAILABILITY AND IMPLEMENTATION: The models are provided in several standardized formats, including an easy-to-use human readable form and machine-readable SBML files. The data is provided as Excel sheets. All files are available at https://github.com/Benchmarking-Initiative/Benchmark-Models, including step-by-step explanations and MATLAB code to process and simulate the models. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Entities: CellLine Chemical Disease Gene Species

Mesh：

Year: 2019 PMID： 30624608 PMCID： PMC6735869 DOI： 10.1093/bioinformatics/btz020

Source DB: PubMed Journal: Bioinformatics ISSN： 1367-4803 Impact factor: 6.937

1 Introduction

Dynamic models based on ordinary differential equations (ODEs) have become a widely used tool in systems biology to quantitatively describe regulatory processes in living cells. Within this approach, known biochemical interactions of important compounds can be translated into rate equations describing the temporal evolution of the state of biological processes. Experimental data is then used to estimate parameters like rate constants or initial concentrations and to validate or improve the model structure. The dimensionality and nonlinearity of these models constitute a challenge for numerical and statistical methods regarding parameter estimation and identification of the most plausible model structure. For that reason, a multitude of new modeling techniques have been developed within recent years. However, they are often not well-tested in realistic application settings and, therefore, their performance benefits or limitations are unknown (Degasperi ; Hug ; Lillacci and Khammash, 2010; Maier ; Raue ; Stapor ; Vyshemirsky and Girolami, 2008). Since the performance of computational approaches depends on model characteristics such as nonlinearity, number of parameters or amount of experimental data, it is essential to have a reasonably large set of benchmark problems that covers these characteristics. Only if the collection of benchmark problems is representative, the results of performance studies will generalize to new modeling projects. One frequent limitation is that realistic measurements are typically not available for evaluations. Simulated data, as an example, is often much more informative in terms of number of data points (Tönsing ) and does not have a complex noise structure (Villaverde ) like measurements from living cells. Moreover, in most cases experimental measurements require augmenting the equations of the dynamic model with so-called observation functions containing scalings- and/or offset parameters, together with transformations of the data such as a log-transformation. Benchmark collections are used in many scientific fields, but there are currently only a limited number of benchmark problems for modeling intracellular processes and they cover only a small set of application setups: (i) Six benchmark models have been published by Villaverde , however for most of them, only simulated data are provided. For the models with experimental data, one has less data points than parameters, and the other provides its equations only in a compiled version, which limits their use for model evaluation. (ii) Additional benchmark problems were defined within the DREAM6 (Dialogue on Reverse-Engineering Assessment and Methods) and DREAM7 challenges. However, both challenges only had simulated data available because the models do not represent real biological networks occurring in specific living cells. In addition, abundances of the molecular compounds were assumed as known initial values and the dynamic variables were assumed as directly measured without observation functions which renders these problems as rather unrealistic. (iii) Public repositories, e.g. the Biomodels database (Le Novere ) provide a large number of realistic/published models. Unfortunately, for most models the measured data used for calibration is not or only partly provided. Moreover, if the data is published, the description of the link between model and data is often not sufficient, i.e. the noise model and observation functions are not comprehensively defined as required for a non-ambiguous benchmark problem. One major reason for this might be that current standards for defining models like the Systems Biology Markup Language (SBML) (Hucka ) only comprise the biological part of the model but do not contain equations for observations and noise models used to estimate parameters. Standards for the encoding of experimental descriptions, such as the Simulation Experiment Description Markup Language (SED-ML) (Waltemath ), are unfortunately not yet used widely and only supported by a fraction of the available tools. In this manuscript, 20 models of biochemical reaction networks which should serve as a comprehensive set of benchmark problems enabling testing of a multitude of data-based modeling approaches are presented. The models have different complexity ranging from 9 to 269 parameters. All models comprise measured data (21 to 27 132 data points per model). We also provide measurement errors either determined experimentally or from an underlying error model.

2 Methodology

2.1 Pathway models

Biochemical reaction networks can be modeled using reaction rate equations, which describe the dynamics of compound concentrations as a function of parameters θ (Section 2.3) and inputs (Section 2.4). The initial values x(0) of Eq. (1) might be known. However, in most applications some elements of x(0) are unknown and defined as parameters, i.e. , or functions of parameters, i.e. . Mathematically, we distinguish between three classes: The initial conditions might be known or given, e.g. zero before treatment. The initial conditions might be analytical functions of the parameters, e.g. analytical solutions to a steady-state constraint (Rosenblatt ). The initial conditions might be non-analytical expressions of the parameters, e.g. the result of a pre-simulation of an experimental condition (Fiedler ; Rosenblatt ). For a detailed discussion we refer to Rosenblatt and Fiedler .

2.2 Measurement errors

The state variables of reaction rate equations are linked to measurements via observation functions , which describe the properties of the experimental device/technique used to acquire measurement data. The observation functions might be nonlinear functions of the state variables, e.g. if the readout saturates, for considering detection limits, and comprise scalings (Loos ). For all presented benchmark models, independent normally distributed, additive errors are assumed for the measurements Note that in the chosen notation, index i enumerates each observation/data point y at a specific time point and each corresponding standard deviation σ of the measurement error individually. We consider two broad classes of error models: The standard deviation σ of measurement errors might be determined as part of the experiment and processing of raw data, e.g. by computing standard errors across replicates. In this case, each data point y has a given, fixed value σ specifying the accuracy of the measurement. Standard deviations might be unknown and therefore described as error models with error parameters which might be jointly estimated with other model parameters. The function can depend on parameters, state variables or both. While class 1 yields a parameters estimation problem with fewer parameters, class 2 does not require the calculation of σ from a potentially small number of replicates and the statistical model accounts for imperfect knowledge of σ (Raue ). An error model E describes the dependence of the standard deviation of an observation on the error parameters and the state variables x, . The most basic parameter-dependent error models are unknown standard deviations for the individual observations, , or sets of observations I, , i.e. Parameter- and state-dependent error models are for instance and with two parameters for absolute or relative noise levels. is obtained if relative and absolute errors are assumed as two independent sources of variability. is a phenomenological model which often realistically describes absolute and relative components of observed measurement errors.

2.3 Parameters

Dynamic models in systems biology comprise up to three classes of parameters: Dynamic parameters that determine the initial states x(0) and the dynamics of the process, see Eq. (). These parameters are rate constants such as association/dissociation rates or -constants, translocation rates between intra- or extracellular compartments, or parameters like Michaelis-Menten- and Hill-coefficients, efficiencies of genetic perturbations or parameters of input functions. We note that the dynamic parameters do not change over time, although the name might suggest otherwise. Observation parameters that describe the relationship between concentrations of intracellular compounds with outputs, e.g. intensities in an assay. These parameters are for example scaling factors or offsets (Weber ). Error parameters that describe the unknown noise levels (see Section 2.2). Since dynamic parameters depend on the biological context and observation- and error parameters are determined by the experimental setup, there is often only a limited amount of prior knowledge about parameters available. For the benchmark models, upper- and lower bounds are defined for all parameters. In most cases, these bounds cover eight orders of magnitude or even more. In some cases, additional prior knowledge in terms of prior distributions or penalties is available for specific parameters. The parameters of the biological process are often transformed to improve the convergence of optimization (Raue ) and to eliminate structural non-identifiabilities (Maiwald ). A common practice is the transformation of the parameters from linear to logarithmic scale. However, there are also problem-specific transformations as described in the supplements of Bachmann and Becker .

2.4 Inputs

Inputs u describe the dependence of the biochemical reaction network on external factors as well as perturbations. Examples are externally controlled concentration of ligands or nutrients, or genetic perturbations like knockouts or overexpression. Time-dependent inputs are often parameterized functions such as polynomials, splines (Schelker ) or control vectors (Banga ). Parameter dependence of inputs is in the following indicated by writing .

3 Model and data formats

For a thorough evaluation of computational methods, we provide a set of 20 published models and their corresponding datasets. The models have been extracted from the literature and have been developed by more than 10 different research groups. The information is stored in an easily accessible and standardized format, including an Excel file with general specifications of the model and its fit results. Measurements and model equations are stored as separate Excel files and for each experiment individually. In the data files, measurements with uncertainties and results from the corresponding model simulations are stored. The model files contain finalized ODEs including experiment-specific parameter assignments and observation functions, and are provided as user-readable Excel file and in the standardized, machine-readable SBML standard (Hucka ). For a detailed description of the provided files, we refer to Supplementary Section S1.

4 Results

4.1 Benchmark collection

The main focus of this paper is to introduce a comprehensive collection of benchmark problems and their formulation in a standardized format. A comprehensive overview of the benchmark problems is provided in Table 1.

Table 1.

Table summarizing the 20 benchmark models and their properties

Name	Description	Biochemical species	Observables	Data points	Experimental conditions	Parameters	Features
Bachmann	The model by Bachmann et al. (2011) describes JAK2/STAT5 regulation via two transcriptional negative feedbacks, CIS and SOCS3 in murin blood forming cells	25	11	542	23	113	C, E(1), NI, x0θ
Becker	The model by Becker et al. (2010) shows that rapid EpoR turnover and large intracellular receptor pools enables linear ligand response.	6	4	85	13	16	E(1), x0(θ)
Beer	The model by Beer et al. (2014) uses Escherichia coli as chassis to demonstrate heterologous T domain exchange in non-ribosomal peptide synthetases (NRPSs).	4	2	27 132	19	72	E(1), ev, NI, u(t,θ), x0θ
Boehm	The model by Boehm et al. (2014) evaluates possible homo- and heterodimerization of the transcription factor isoforms STAT5A and STAT5B.	8	3	48	1	9	C, E(1), u(t,θ), x0(θ)
Brannmark	The model by Brännmark et al. (2010) describes insulin signaling in adipocytes.	9	2	43	8	22	E(1), ev, NI, u(t), x0SSpre(θ)
Bruno	The model by Bruno et al. (2016) investigates the activity of Arabidopsis carotenoid cleavage dioxygenase 4 (AtCCD4) as regulator of carotenoid of seeds.	7	6	77	6	13	Ex, x0θ
Chen	The model by Chen et al. (2009) describes signaling in ErbB-activated MAPK and PI3k/Akt pathways, including seven receptor dimers and two ErbB ligands.	500	3	105	4	154	E(1), ev, NI, x0fix
Crauste	The model by Crauste et al. (2017) describes CD8 T cell differentiation after virus infection.	5	4	21	1	12	Ex, NI, x0fix
Fiedler	The model by Fiedler et al. (2016) describes Raf/MEK/ERK signaling in synchronized HeLa cells upon stimulation with MEK and ERK inhibitors.	6	2	72	3	19	E(1), NI, u(t,θ), x0(θ)
Fujita	The model by Fujita et al. (2010) describes the epidermal growth factor (EGF)-dependent Akt pathway in PC12 cells.	9	3	144	6	19	Ex, ev, NI, u(t,θ), x0θ
Hass	The model by Hass et al. (2017) establishes early Reelin-induced signaling and identifies Src family kinases (SFKs) as crucial part for Dab1 signaling.	9	6	221	17	49	Ex, ev, x0(θ)
Isensee	The model by Isensee et al. (2018) describes the Protein Kinase A (PKA)-II cycle in primary sensory neurons and its response to multiple stimuli, e.g. forskolin and cAMP analogues and is based on quantitative microscopy and Western blotting.	25	3	713	109	46	C, E(1), ev, NI, u(t,θ), x0SSpre(θ)
Lucarelli	The model by Lucarelli et al. (2018) describes activation of Smad proteins upon TGFβ stimulation, identifies the relevant complexes and linked them to target genes.	33	43	1755	12	84	E(1), Ex, NI, x0θ
Merkle	The model by Merkle et al. (2016) describes Epo-induced signaling simultaneously for CFU-E and H838 cells, with a parsimonious set of differing parameters.	23	22	1141	62	197	C, E(1), Ex, ev, NI, u(t), x0(θ)
Raia	The model by Raia et al. (2011) describes interleukin-13 (IL13)-induced activation of the JAK/STAT signaling pathway for B-cells and two lymphoma cell lines.	14	8	205	4	39	C, E(3), NI, x0θ
Schwen	The model by Schwen et al. (2015) describes binding of insulin to primary mouse hepatocytes based on flow cytometry and ELISA data.	11	4	292	7	30	E(1), NI, x0(θ)
Sobotta	The model by Sobotta et al. (2017) presents IL-6-induced JAK1-STAT3 signal transduction and expression of target genes in hepatocytes.	13	11	2220	110	260	C, E(1), ev, u(t,θ), x0SSpre(θ)
Swameye	The model by Swameye et al. (2003) demonstrates that rapid shuttling of STAT5 from the nucleus back to the cytoplasm following Epo stimulus is recognized as a remote sensor.	9	3	46	1	13	C, Ex, NI, u(t,θ), x0(θ)
Weber	The model by Weber et al. (2015) describes the interactions of PKD, PI4KIIIβ and CERT at the trans-Golgi network of mammalian cells.	7	8	135	3	36	E(1), ev, NI, u(t), x0SSpre(θ)
Zheng	The model is adapted from Zheng et al. (2012) and describes methylation at histone H3 lysines 27 and 36.	15	15	60	1	46	E(1), ev, NI, u(t,θ), x0SSpre(θ)

Note: The models are abbreviated with the last name of the first author. Many models are based on Western blot data. Number of parameters denotes unknown parameters that are estimated in the model. The number of experimental conditions is specified as the number of different simulation conditions. The feature abbreviations denote the following: C = several compartments, = constant error parameters, Eq. (3), = error model of Eq. (4), = error model of Eq. (5), Ex = known measurement errors, ev = events, NI = non-identifiable parameters, u(t) = time dependent input function, = input function with unknown parameter(s). Initial values are specified according to the following order: = known initial values, = initial condition given by unknown parameters, = parameter dependent functions and = pre-equilibration for initial steady state conditions. The models are described in more detail in Supplementary Section S4.

Table summarizing the 20 benchmark models and their properties Note: The models are abbreviated with the last name of the first author. Many models are based on Western blot data. Number of parameters denotes unknown parameters that are estimated in the model. The number of experimental conditions is specified as the number of different simulation conditions. The feature abbreviations denote the following: C = several compartments, = constant error parameters, Eq. (3), = error model of Eq. (4), = error model of Eq. (5), Ex = known measurement errors, ev = events, NI = non-identifiable parameters, u(t) = time dependent input function, = input function with unknown parameter(s). Initial values are specified according to the following order: = known initial values, = initial condition given by unknown parameters, = parameter dependent functions and = pre-equilibration for initial steady state conditions. The models are described in more detail in Supplementary Section S4. The benchmark problems cover a wide range of model and dataset sizes (Fig. 1A). Some of these properties are correlated, e.g. log-transformed numbers of data points and parameters (, P-value ). A local identifiability analysis (see Supplementary Section S12) using the identifiability test by radial penalization (ITRP) (Kreutz, 2018) revealed that most benchmark models possess non-identifiable parameters. Furthermore, we found that initial conditions are specified in multiple ways, e.g. as equilibrium points of an unperturbed condition, and that different types of noise models and input functions are used (Fig. 1B). This results in a large number of combinations which have to be covered by computational modeling tools.

Fig. 1.

Property distribution in the presented benchmark collection. (A) Histograms for numerical model properties: number of observables, conditions, data points and parameters. Properties of individual models are indicated with an overlayed parallel coordinate plot. The parallel coordinates facilitate highlight correlations: most lines are parallel positive correlations; most lines cross negative correlation. (B) Mosaic plot for the categoric model properties: initial conditions (indicated by columns), error models (indicated by colors) and input functions (indicated by saturation levels). The areas encode the percentage of models with a particular combination of properties. Combinations of model properties which are not observed are crossed out in the legend. Non-analytical parameter-dependent initial conditions cannot be solved analytically and are obtained by simulating the system to steady state (Color version of this figure is available at Bioinformatics online.)

4.2 Log-transformation of model parameters

A variety of studies in the systems biology field advocate the use of log-transformed parameters, , for optimization: ‘For parameters that are by definition non-negative a log-scale should be used in the parameter estimation’. (Raue ) Recent evaluations verified that this can improve computational efficiency (Kreutz, 2016; Villaverde ). A comprehensive evaluation on application problems is however missing and the precise reason for the improvement is still unclear. Here, we used the compiled benchmark collection to confirm the finding for multi-start local optimization (Fig. 2A) and to assess whether changes in the objective function landscape might be a potential reason. The performance metric is the average number of converged starts per minute (see Villaverde ). Starts are considered to be converged if the objective function value differs at most by from the best objective function value found across all runs for the given benchmark problem, whereby we only included the models for which the best value was found more than once. An assessment of the influence of the convergence threshold is provided in the Supplementary Figure S30.

Fig. 2.

Linear versus logarithmic scale. (A) Performance of the multi-start local optimization scheme using the MATLAB optimizer lsqnonlin for: (x-axis) sampling of initial values in log scale and optimization in linear scale; and (y-axis) sampling and optimization in log scale. Performance is measured as average number of converged starts per minute. (B) Level-sets of the objective function for a synthesis-degradation process described in the Supplementary Section S6. (top) The level-sets in linear parameter are non-convex, implying that the objective function is non-convex. (bottom) The level sets in log-transformed parameters are convex. (C) Convexity properties of the benchmark problems in linear parameters and log-transformed parameters. It is indicated whether the two parameters are sampled in linear or log space and whether the connection between the two parameters is checked in linear or log space. Statistically significant differences are shown (P-value for rank sum test, * < 0.05, ** <0.01) Log-transformation leaves the optima unchanged but changes the shape of the level-sets of the objective function. We found several examples for which the level-sets are non-convex in the parameter θ, but convex in log-transformed parameters ξ (see, e.g. Fig. 2B). As local optimizers are well suited for convex problems, the change in the level set structure could be a reason for the improvement. To assess whether log-transformation improved the convexity of the objective function, we drew a random parameter vector and a second random vector with and a random location on the connecting line, . For convex problems, the objective function J satisfies and α: Accordingly, the fraction of triples for which (6) holds provides a measure of convexity. We evaluated this measure for different combination of sampling strategies for and (lin or log scale, indicated in the x-axis of Fig. 2C), and checking the connecting line between the two parameters in lin or log scale (see Supplementary Section S7). For each combination, we sampled 1000 triples. Our comparison revealed that for most application problems, log-transformation increases the considered measure of convexity (Fig. 2C). Indeed, some problems appear to be completely convex when using log-transformed parameters. This provides a mechanistic explanation for the observed improvement in optimizer convergence.

4.3 Performance of local optimization methods

The no free lunch theorem for discrete optimization states that ‘[…] what an algorithm gains in performance on one class of problems is necessarily offset by its performance on the remaining problems’. (Wolpert and Macready, 1997) This implies that effective optimization relies on a fortuitous matching between an optimization method and an optimization problem. Similarly, it was shown for continuous optimization problems, for which the no free lunch theorem does not apply, that there are optimal algorithms for certain problem classes (Auger and Teytaud, 2010). Here, we used the benchmark collection to assess the performance of the trust-region-reflective and the interior-point algorithm in the MATLAB function fmincon (The MathWorks, 2016) to parameter optimization problems encountered in systems biology to assess which one is generally more suited. These local optimizers are widely used. Indeed, there are studies using both optimizers to exploit their individual benefits and performance differences (Stapor ). The choice of the optimizer has direct implication for multi-start local optimization methods (Raue ) and meta-heuristics (Villaverde ), but also for uncertainty analysis using profile likelihoods (Raue ). For fmincon, mainly the default settings provided by MATLAB were chosen, which can be obtained by . Therein, the algorithm was chosen as trust-region-reflective or interior-point, respectively. Additional changes to the default settings comprise: A user-defined gradient and Hessian for Gauss-Newton optimization. The tolerance on first-order optimality was set to 0. Termination tolerance on the parameters was set to . As subproblem-algorithm, cg (conjugate gradient) was always chosen. The maximum number of iterations was set to 10 000. The trust-region-reflective algorithm is tailored to optimization problems with linear constraints. The trail step of the optimizer is obtained by minimizing a quadratic approximation of the objective function within the trust region (which is chosen adaptively). Parameter bounds are handled in the step construction by scaling and reflection. The interior-point algorithm is a general purpose method (and the MATLAB default) for optimization problems with linear and nonlinear constraints. It solves a sequence of approximate optimization problems with barrier functions. In each iteration, a direct step obtained by solving the so-called Karush-Kuhn-Tucker condition or conjugate gradient step using a trust region is performed. For details we refer to the MATLAB documentation (The MathWorks, 2016). We performed multi-start local optimization with 1000 fits for all benchmark models. Our results revealed that for the considered benchmark problems the trust-region-reflective algorithm tends to outperform the interior-point algorithm (Fig. 3 and Supplementary Figs S7–S26). Indeed, the trust-region-reflective algorithm achieved a higher number of converged starts per computation time for 18 of the 20 benchmark problems and is for 9 benchmark problems the only algorithm finding the optimal solution. However, the optimal solutions for 2 benchmark problems were only obtained using the interior-point algorithm. Accordingly, although the trust-region-reflective algorithm (which is not the MATLAB default) achieves the higher reliability and performance, it can be beneficial to test alternative local optimizers. Additional information of the multi-start fits and the computation time for each model, as well as a comparison of the trust-region-reflective and the interior point method with the least-squares solver implemented in the MATLAB function lsqnonlin can be found in Supplementary Sections S3, S8 and S9.

Fig. 3.

Comparison of optimizer performance. Scatter plot of the average number of converged starts per minute for the interior-point algorithm versus trust-region-reflective algorithm

4.4 Number of steps for local optimizers

Common questions in practical applications are (i) for how many steps (or iterations) a local optimizer should be run, and (ii) how the number of steps depends on the number of the parameters. For many local optimization algorithms, such bounds and results for scaling properties are available. For interior-point algorithms it has been shown that for convex problems ‘[…] the number of Newton steps hardly grows […] with m [the number of constraints - author’s note] (or any other parameter, in fact)’. (Boyd and Vandenberghe, , Section 11.5.6) Similar findings are reported for other methods (see, e.g. Nesterov, 2013). As the independence of the number of optimization steps from the number of parameters might be surprising, we set out to assess the properties on the benchmark collection. For each problem, the trust-region-reflective algorithm implemented in the MATLAB function lsqnonlin was run, without constraints on the maximum number of function evaluation. Our assessment of the average number of optimizer steps (Fig. 4) revealed that on average 391 ± 19 iterations were taken. There is—as predicted by theory for convex problems—no significant dependence on the number of parameters (, P-value ). Accordingly, our analysis on the benchmark collection validated for the first time that the theoretical results also hold for application problems in systems biology (which are in general not convex).

Fig. 4.

Influence of problem size. (A) Average number of optimizer iterations and (B) average computation time versus the number of parameters. For optimization the trust-region-reflective algorithm implemented in the MATLAB function lsqnonlin was used and the averages across 1000 runs with different starting points were computed. The influence of the number of parameters was analyzed using correlation analysis and linear regression In contrast to the number of iterations, the computation time of local optimization depended on the number of parameters (, P-value ). For the trust-region-reflective algorithm using forward sensitivities for gradient calculation, we observed a roughly quadratic dependence (). As the objective and its gradient are evaluated simultaneously using forward sensitivity analysis, the increased computation time is not caused by an increase number of function evaluations (Supplementary Fig. S32). Instead, the computation time per function evaluation tends to increase as the number of parameters increases.

4.5 Incidence of sloppiness

The reliability of parameter estimates is limited by the quality and amount of available experimental data. This dependency translates to parameter sensitivities and to the Fisher information matrix (see Supplementary Material). For ODE models, Gutenkunst suggested that ‘sloppy sensitivity spectra are universal in systems biology models’. While the concept of sloppiness was controversially discussed in the literature (Apgar ; Chis ; Tönsing ; Transtrum ), an assessment of sloppiness for a large collection of models with experimental data appears to be missing. Here, we used the benchmark collection, calculated the eigenvalue spectra of the Fisher information matrix at the maximum likelihood estimate and thereby assessed incidence of sloppiness in our benchmark models. Details are provided as Supplementary Material. Figure 5 shows that 19 out of our 20 models exhibit a sloppy spectrum, i.e. the eigenvalues spread over more than 6 orders of magnitude. For most models, the spread was even >15 orders which partially occurs because of non-identifiability. One model (Bruno ) has a non-sloppy spectrum covering only 2.13 orders of magnitude which illustrates that the spread of the eigenvalues is a matter of experimental design.

Fig. 5.

Eigenvalue spectra of the Hessians of the log-likelihood. Each spectrum was normalized by dividing through the maximal eigenvalue. According to the literature, a model is termed sloppy, if the eigenvalues spread over more than six orders of magnitude. This range is indicated by gray shading. The spectra of non-identifiable models are plotted in red. For our depiction at the log-scale, eigenvalues which are smaller than after normalization with respect to the maximal eigenvalue were set to and occur as line at the bottom of each panel (Color version of this figure is available at Bioinformatics online.)

5 Discussion

Mechanistic dynamical models are used to describe and analyze biochemical reaction networks, to determine unknown parameters, gain biological insights and perform in-silico experiments. Novel methods to address these challenging tasks are proposed on a regular basis, however, a thorough assessment is often problematic. To address this problem, we compiled a collection of 20 benchmark problems. Reusability was ensured by providing the models in the machine-readable SBML format and the experimental data in structured Excel files. In addition, all aforementioned models are included in the open-source MATLAB toolbox Data2Dynamics (Raue ) and the analysis scripts are provided as Supplementary Material. To ensure that the benchmark problems are realistic and practically relevant, we exclusively included published models and measured experimental data. This is a key difference to existing benchmark collections which mostly considered models with simulated data (Ballnus ; Villaverde ). The benchmark models possess a broad spectrum of properties (e.g. different types of initial conditions, noise models and inputs), as well as challenges (e.g. structural and practical non-identifiabilities, and objective functions with multiple minima and valleys). The size of the benchmark problems ranges from roughly 20 data points, 10 parameters to be optimized and a single experimental condition to large models with more than 1000 data points, over 200 parameters and up to 110 distinct experimental conditions. This facilitates the assessment of the scaling behavior of novel algorithms. We illustrated the value of the benchmark collection by performing three different analyses: (i) Our study of parameter transformations confirmed that optimization benefits from log-transformed parameter space. Furthermore, it suggested that the reason could be a significant increase of convexity of most problems, which provides a more benign setting for local optimizers. The observed change in the convexity appears to be the first mechanistic explanation for the observed improvement in optimizer performance. (ii) Our comparison of trust-region-reflective and interior-point algorithms revealed that the former is better suited for most parameter estimation problems encountered in systems biology. (iii) Our analysis of the scaling behavior confirmed theoretical results showing that the number of optimizer steps does not depend on the number of model parameters. The results of analyses (i)–(iii) could not have been obtained without the benchmark collection, which provided the means for a fair comparison. Indeed, the reliability of the findings depends directly on the size and the representativeness of the benchmark collection. Amongst others, previous studies were not able to provide an assessment of the scaling properties (Ballnus ; Raue ; Villaverde ). Beyond the analysis carried out in this manuscript, the benchmark collection can be used to address questions such as: How do other local, global and hybrid optimization methods perform in practice? How does the number of iterations of global and hybrid optimization methods depend on the problem dimensions? How should the number of starting points depend on the dimension of the parameter space or properties of model and dataset (e.g. identifiability or oscillatory dynamics)? How do profile likelihood calculation and Markov chain Monte Carlo sampling methods perform? We expect that the assessment of these and other questions will pinpoint practically relevant limitations of existing methods. This will facilitate a targeted improvement of existing and development of new methods. Apparently, for more detailed questions and a more fine-grained analysis, more benchmark models will be required. In conclusion, we think that the compiled benchmark collection will be an important resource for the systems biology community. It will facilitate the thorough evaluation of novel computational methods and support an unbiased assessment. In the future, the benchmark collection should be integrated with public resources such as the BioModels database (Le Novere ). Furthermore, the collection should be extended to enable a more fine-grained analysis and to fill apparent gaps, such as the lack of models and datasets with sustained oscillations. Therefore, we encourage researchers to provide further models and datasets, e.g. by uploading them to our GitHub repository to obtain an even more powerful collection of benchmark models. Information about ways to contribute are provided on the GitHub page.

Funding

This work was supported by the German Ministry of Education and Research by the grant EA: Sys (FKZ 031L0080), the grant SYS-Stomach (01ZX1310B) and the grant INCOME (FKZ 01ZX1705A), as well as by the German Research Foundation (DFG) via grant TRR179. The authors also acknowledge support for high-performance computing by the state of Baden-Württemberg through the bwHPC initiative, which is supported by DFG grant INST 35/1134-1 FUGG. Conflict of Interest: none declared. Click here for additional data file.

46 in total

1. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models.

Authors: M Hucka; A Finney; H M Sauro; H Bolouri; J C Doyle; H Kitano; A P Arkin; B J Bornstein; D Bray; A Cornish-Bowden; A A Cuellar; S Dronov; E D Gilles; M Ginkel; V Gor; I I Goryanin; W J Hedley; T C Hodgman; J-H Hofmeyr; P J Hunter; N S Juty; J L Kasberger; A Kremling; U Kummer; N Le Novère; L M Loew; D Lucio; P Mendes; E Minch; E D Mjolsness; Y Nakayama; M R Nelson; P F Nielsen; T Sakurada; J C Schaff; B E Shapiro; T S Shimizu; H D Spence; J Stelling; K Takahashi; M Tomita; J Wagner; J Wang
Journal: Bioinformatics Date: 2003-03-01 Impact factor: 6.937

2. Identification of nucleocytoplasmic cycling as a remote sensor in cellular signaling by databased modeling.

Authors: I Swameye; T G Muller; J Timmer; O Sandra; U Klingmuller
Journal: Proc Natl Acad Sci U S A Date: 2003-01-27 Impact factor: 11.205

3. Dynamic optimization of bioprocesses: efficient and robust numerical strategies.

Authors: Julio R Banga; Eva Balsa-Canto; Carmen G Moles; Antonio A Alonso
Journal: J Biotechnol Date: 2005-06-29 Impact factor: 3.307

4. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood.

Authors: A Raue; C Kreutz; T Maiwald; J Bachmann; M Schilling; U Klingmüller; J Timmer
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

5. BioBayes: a software package for Bayesian inference in systems biology.

Authors: Vladislav Vyshemirsky; Mark Girolami
Journal: Bioinformatics Date: 2008-07-16 Impact factor: 6.937

6. Mass and information feedbacks through receptor endocytosis govern insulin signaling as revealed using a parameter-free modeling framework.

Authors: Cecilia Brännmark; Robert Palmér; S Torkel Glad; Gunnar Cedersund; Peter Strålfors
Journal: J Biol Chem Date: 2010-04-26 Impact factor: 5.157

7. Parameter estimation and model selection in computational biology.

Authors: Gabriele Lillacci; Mustafa Khammash
Journal: PLoS Comput Biol Date: 2010-03-05 Impact factor: 4.475

8. BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems.

Authors: Nicolas Le Novère; Benjamin Bornstein; Alexander Broicher; Mélanie Courtot; Marco Donizelli; Harish Dharuri; Lu Li; Herbert Sauro; Maria Schilstra; Bruce Shapiro; Jacky L Snoep; Michael Hucka
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. Input-output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data.

Authors: William W Chen; Birgit Schoeberl; Paul J Jasper; Mario Niepel; Ulrik B Nielsen; Douglas A Lauffenburger; Peter K Sorger
Journal: Mol Syst Biol Date: 2009-01-20 Impact factor: 11.429

10. Universally sloppy parameter sensitivities in systems biology models.

Authors: Ryan N Gutenkunst; Joshua J Waterfall; Fergal P Casey; Kevin S Brown; Christopher R Myers; James P Sethna
Journal: PLoS Comput Biol Date: 2007-08-15 Impact factor: 4.475

13 in total

1. Parameter Estimation and Uncertainty Quantification for Systems Biology Models.

Authors: Eshan D Mitra; William S Hlavacek
Journal: Curr Opin Syst Biol Date: 2019-11-06

2. A protocol for dynamic model calibration.

Authors: Alejandro F Villaverde; Dilan Pathirana; Fabian Fröhlich; Jan Hasenauer; Julio R Banga
Journal: Brief Bioinform Date: 2022-01-17 Impact factor: 11.622

Review 3. Calibration of ionic and cellular cardiac electrophysiology models.

Authors: Dominic G Whittaker; Michael Clerx; Chon Lok Lei; David J Christini; Gary R Mirams
Journal: Wiley Interdiscip Rev Syst Biol Med Date: 2020-02-21

Review 4. Guidelines for benchmarking of optimization-based approaches for fitting mathematical models.

Authors: Clemens Kreutz
Journal: Genome Biol Date: 2019-12-16 Impact factor: 13.583

5. Compromised Humoral Functional Evolution Tracks with SARS-CoV-2 Mortality.

Authors: Tomer Zohar; Carolin Loos; Stephanie Fischinger; Caroline Atyeo; Chuangqi Wang; Matthew D Slein; John Burke; Jingyou Yu; Jared Feldman; Blake Marie Hauser; Tim Caradonna; Aaron G Schmidt; Yongfei Cai; Hendrik Streeck; Edward T Ryan; Dan H Barouch; Richelle C Charles; Douglas A Lauffenburger; Galit Alter
Journal: Cell Date: 2020-11-03 Impact factor: 41.582

6. Parameterization of mechanistic models from qualitative data using an efficient optimal scaling approach.

Authors: Leonard Schmiester; Daniel Weindl; Jan Hasenauer
Journal: J Math Biol Date: 2020-07-21 Impact factor: 2.259

7. PEtab-Interoperable specification of parameter estimation problems in systems biology.

Authors: Leonard Schmiester; Yannik Schälte; Frank T Bergmann; Tacio Camba; Erika Dudkin; Janine Egert; Fabian Fröhlich; Lara Fuhrmann; Adrian L Hauber; Svenja Kemmer; Polina Lakrisenko; Carolin Loos; Simon Merkt; Wolfgang Müller; Dilan Pathirana; Elba Raimúndez; Lukas Refisch; Marcus Rosenblatt; Paul L Stapor; Philipp Städter; Dantong Wang; Franz-Georg Wieland; Julio R Banga; Jens Timmer; Alejandro F Villaverde; Sven Sahle; Clemens Kreutz; Jan Hasenauer; Daniel Weindl
Journal: PLoS Comput Biol Date: 2021-01-26 Impact factor: 4.475