
Benchmarking optimization methods for parameter estimation in large kinetic models.

Alejandro F Villaverde1, Fabian Fröhlich2,3, Daniel Weindl2, Jan Hasenauer2,3, Julio R Banga1.   

Abstract

MOTIVATION: Kinetic models contain unknown parameters that are estimated by optimizing the fit to experimental data. This task can be computationally challenging due to the presence of local optima and ill-conditioning. While a variety of optimization methods have been suggested to surmount these issues, it is difficult to choose the best one for a given problem a priori. A systematic comparison of parameter estimation methods for problems with tens to hundreds of optimization variables is currently missing, and smaller studies provided contradictory findings.
RESULTS: We use a collection of benchmarks to evaluate the performance of two families of optimization methods: (i) multi-starts of deterministic local searches and (ii) stochastic global optimization metaheuristics; the latter may be combined with deterministic local searches, leading to hybrid methods. A fair comparison is ensured through a collaborative evaluation and a consideration of multiple performance metrics. We discuss possible evaluation criteria to assess the trade-off between computational efficiency and robustness. Our results show that, thanks to recent advances in the calculation of parametric sensitivities, a multi-start of gradient-based local methods is often a successful strategy, but a better performance can be obtained with a hybrid metaheuristic. The best performer combines a global scatter search metaheuristic with an interior point local method, provided with gradients estimated with adjoint-based sensitivities. We provide an implementation of this method to render it available to the scientific community.
AVAILABILITY AND IMPLEMENTATION: The code to reproduce the results is provided as Supplementary Material and is available at Zenodo https://doi.org/10.5281/zenodo.1304034.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press.


Year:  2019        PMID: 30816929      PMCID: PMC6394396          DOI: 10.1093/bioinformatics/bty736

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Mechanistic kinetic models provide a basis for answering biological questions via mathematical analysis. Dynamical systems theory can be used to interrogate these kinetic models, enabling a more systematic analysis, explanation and understanding of complex biochemical pathways. Ultimately, the goal is the model-based prediction of cellular functions under new experimental conditions (Almquist ; Kyriakopoulos ; Link ; van Riel, 2006). During the last decade, many efforts have been devoted to developing increasingly detailed and, therefore, larger systems biology models (Karr ; Smallbone and Mendes, 2013; Srinivasan ). Such models are often formulated as nonlinear ordinary differential equations (ODEs) with unknown parameters. As it is impossible to measure all parameters directly, parameter estimation (i.e. model calibration) is crucial for the development of quantitative models. The unknown parameters are typically estimated by solving a mathematical optimization problem which minimizes the mismatch between model predictions and measured data (Ashyraliyev ; Banga and Balsa-Canto, 2008; Jaqaman and Danuser, 2006; Raue ). Parameter estimation for dynamical systems is an inverse problem (Villaverde and Banga, 2013) that exhibits many possible challenges and pitfalls, mostly associated with ill-conditioning and non-convexity (Schittkowski, 2013). These properties, which are in general only known a posteriori, influence the performance of optimization methods. Even if we restrict our attention to a specific class of problems within the same field (e.g. parameter estimation in systems biology), there are often large differences in performance between different applications (Kreutz, 2016). Hence, methods need to be benchmarked on a representative collection of problems of interest in order to reach meaningful conclusions. In this study, we consider the class of medium- to large-scale kinetic models.
These models pose several challenges, such as computational complexity, and an assessment of the performance of optimization methods is particularly important (Babtie and Stumpf, 2017; Degasperi ; Villaverde ). The calibration of large-scale kinetic models usually requires the optimization of a multi-modal objective function (Chen ; Ljung and Chen, 2013; Moles ), i.e. there will be several local optima. Local optimization methods, such as Levenberg-Marquardt or Gauss-Newton (Schittkowski, 2013), which converge to local optima, will only find a global optimum for appropriate starting points. Convergence to a suboptimal solution is an estimation artifact that can lead to wrong conclusions: we might think that the mechanism considered is not suitable to explain the data, while the real reason might be that the method failed to locate the global optimum (Chachuat ). In order to avoid suboptimal solutions, many studies have recommended the use of global optimization techniques (Ashyraliyev ; Banga and Balsa-Canto, 2008; Chen ; Mendes and Kell, 1998). One of the earliest and simplest global optimization methods is the multi-start, which consists of launching many local searches from different initial points in parameter space, assuming that one of them will be inside the basin of attraction of the global solution. It has been shown that multi-starts of local optimization methods can be sufficient for successful parameter estimation in kinetic models (Fröhlich ; Raue ), although the use of other approaches, such as metaheuristics, has also been advocated (Gábor and Banga, 2015; Villaverde ). The comparison of global optimization methods has been the topic of several research papers. Interestingly, the evaluation results led to apparently contradictory conclusions, advocating the use of either multi-start local optimization (Hross and Hasenauer, 2016; Raue ) or global optimization metaheuristics (Egea, 2010a; Gábor and Banga, 2015; Moles ).
These contradictions cannot be explained by the no-free-lunch theorems for optimization (Wolpert and Macready, 1997), since (i) the problems analyzed possessed relatively similar characteristics (i.e. the comparisons did not consider every possible class of problems); and (ii) they were formulated in continuous search domains, where these theorems do not hold (Auger and Teytaud, 2010). Hence, we suggest two alternative explanations: (i) the comparisons were carried out by researchers who had substantially more experience with (the tuning of) one type of the considered methods, and (ii) the performance metrics differed and might even have been biased towards particular approaches. To circumvent these issues, we established an intensive collaboration between experienced users of multi-start local optimization (the HZM group) and metaheuristics (the CSIC group). Through a joint development of performance metrics and evaluation procedures we attempted to ensure a fair comparison of different approaches. In this study, we present the results of this collaboration: the development of a performance metric suited for the comparison of different methods, and the evaluation of the state of the art in parameter estimation methodologies. Based on these results we provide guidelines for their application to large kinetic models in systems biology. To this end, we use seven previously published estimation problems to benchmark a number of optimization methods. The selected problems are representative of the medium- and large-scale kinetic models used in systems biology, with sizes ranging from dozens to hundreds of state variables and parameters (see Table 2 for details). To the best of our knowledge, this is the first time that a systematic evaluation of parameter estimation methods has been conducted on a set of problems of this size and characteristics.
We compare several variants of state-of-the-art optimization methods which have been recently reported as competitive options for large problems, including multi-start (Raue ) and hybrid metaheuristics (Villaverde ). We perform systematic comparisons between these different approaches using metrics capturing the performance/robustness trade-off. Finally, we discuss the implications of our results and provide guidelines for the successful application of optimization methods in computational systems biology.
Table 2.

Main features of the benchmarks. The model IDs follow the nomenclature in Villaverde and Fröhlich.

| ID | B2 | B3 | B4 | B5 | BM1 | BM3 | TSP |
|---|---|---|---|---|---|---|---|
| Original ref. | Chassagnole et al. (2002) | Kotte et al. (2010) | Villaverde et al. (2014) | MacNamara et al. (2012) | Smith and Shanley (2013) | Chen et al. (2009) | Moles et al. (2003) |
| Organism | Escherichia coli | Escherichia coli | Chinese hamster | Generic | Mouse | Human | Generic |
| Description | Metabolic | Metabolic & transcription | Metabolic | Signaling | Signaling | Signaling | Metabolic |
| Parameters | 116 | 178 | 117 | 86 | 383 | 219 | 36 |
| Upper bounds | 10·p_ref | 10·p_ref | 10·p_ref | varying | 10·p_ref | 10^3·p_ref | 10^5·p_ref |
| Lower bounds | 0.1·p_ref | 0.1·p_ref | 0.1·p_ref | varying | 0.1·p_ref | 10^-3·p_ref | 10^-5·p_ref |
| Dynamic states | 18 | 47 | 34 | 26 | 104 | 500 | 8 |
| Observed states | 9 | 47 | 13 | 6 | 12 | 5 | 8 |
| Experiments | 1 | 1 | 1 | 10 | 1 | 4 | 16 |
| Data points | 110 | 7567 | 169 | 960 | 120 | 105 | 2688 |
| Data type | Measured | Simulated | Simulated | Simulated | Measured | Measured | Simulated |
| Noise level | Real^a | No noise | Variable^b | Uniform^c | Real^a | Real^a | σ = 10%^d |

a. Noise levels are unknown as real measurement data are used.
b. Noise levels differ for readouts.
c. Noise level is constant (); the data values generated by this model are between 0 and 1 by construction.
d. Noise levels are proportional to the signal intensity.

2 Methods and benchmark problems

2.1 Problem definition: parameter optimization for ODE models describing biological processes

We consider deterministic dynamic systems described by nonlinear ODEs of the following form:

$$\dot{x}(t) = f(x(t), p), \quad x(t_0) = x_0(p), \quad y(t) = g(x(t), p), \tag{1}$$

in which $x(t)$ is the vector of state variables at time $t$, $x_0$ is the vector of initial conditions, $f$ is the vector field of the ODE, $g$ is the observation function and $p$ is the vector of unknown constant parameters with lower and upper bounds $p_{\min} \le p \le p_{\max}$. Parameter optimization for dynamical systems is a nonlinear dynamic optimization problem that aims to find the vector of parameter values $p$ that minimizes the distance between model simulation and measured data subject to the dynamics of the system and (potentially) other possible constraints. The distance is measured by a scalar objective function (or cost function), which can be of several forms. One common choice is the weighted least squares objective function given by:

$$J_{\mathrm{LS}}(p) = \sum_{e=1}^{n_e} \sum_{o=1}^{n_o} \sum_{s=1}^{n_s} w_{e,o,s} \left( \bar{y}_{e,o,s} - y_{e,o,s}(p) \right)^2, \tag{2}$$

in which $n_e$ is the number of experiments, $n_o$ is the number of observables per experiment, $n_s$ is the number of samples per observable per experiment, $\bar{y}_{e,o,s}$ is the measured data, $y_{e,o,s}(p)$ is the corresponding simulated output, and the $w_{e,o,s}$ are constants that weight the observables in the objective function according to their magnitudes and/or the confidence in the measurements. Another common choice for the objective function is the log-likelihood. Assuming independent, normally distributed additive measurement noise with standard deviation $\sigma_{e,o,s}$, the likelihood of observing the data $D$ given the parameters $p$ is:

$$L(p) = \prod_{e=1}^{n_e} \prod_{o=1}^{n_o} \prod_{s=1}^{n_s} \frac{1}{\sqrt{2 \pi \sigma_{e,o,s}^2}} \exp\left( -\frac{\left( \bar{y}_{e,o,s} - y_{e,o,s}(p) \right)^2}{2 \sigma_{e,o,s}^2} \right). \tag{3}$$

Maximizing (3) is equivalent to minimizing the negative log-likelihood function:

$$J_{\mathrm{NLL}}(p) = -\log L(p) = \frac{1}{2} \sum_{e=1}^{n_e} \sum_{o=1}^{n_o} \sum_{s=1}^{n_s} \left[ \log\left( 2 \pi \sigma_{e,o,s}^2 \right) + \frac{\left( \bar{y}_{e,o,s} - y_{e,o,s}(p) \right)^2}{\sigma_{e,o,s}^2} \right]. \tag{4}$$

If the standard deviations $\sigma_{e,o,s}$ are known, $J_{\mathrm{LS}}$ and $J_{\mathrm{NLL}}$ possess the same optimal parameters. Furthermore, for $w_{e,o,s} = 1/\sigma_{e,o,s}^2$, the log-likelihood and least squares functions are related by

$$J_{\mathrm{NLL}}(p) = \tfrac{1}{2} J_{\mathrm{LS}}(p) + \mathrm{const}.$$

We remark that a good agreement of model output and data does not imply that the parameter estimates are correct or reliable. Practical and structural non-identifiabilities can prevent a parameter from being precisely determined (DiStefano III, 2015). Still, an accurate fit, and hence optimization, is the starting point for many uncertainty analysis methods.
State-of-the-art identifiability analysis methods have been recently evaluated elsewhere (Chiş ; Ligon ; Raue ; Miao ; Villaverde and Barreiro, 2016).
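To make the objective concrete, the following is a minimal sketch in Python of the negative log-likelihood (4) for a hypothetical one-state kinetic model (the study's actual implementation uses MATLAB with AMICI; the model, data and noise level here are invented for illustration):

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate(p, t_obs):
    """Toy one-state kinetic model dx/dt = k_syn - k_deg * x, x(0) = 0."""
    k_syn, k_deg = p
    sol = solve_ivp(lambda t, x: k_syn - k_deg * x, (0.0, t_obs[-1]),
                    y0=[0.0], t_eval=t_obs)
    return sol.y[0]  # observation function g is the identity here

def negative_log_likelihood(p, t_obs, y_meas, sigma):
    """J_NLL(p) = 0.5 * sum[ log(2*pi*sigma^2) + (y_meas - y_sim)^2 / sigma^2 ]."""
    y_sim = simulate(p, t_obs)
    res = (y_meas - y_sim) / sigma
    return 0.5 * np.sum(np.log(2.0 * np.pi * sigma**2) + res**2)

# Generate synthetic "measured" data at the true parameters plus Gaussian noise.
t_obs = np.linspace(0.0, 10.0, 11)
p_true = np.array([1.0, 0.5])
sigma = 0.1
rng = np.random.default_rng(0)
y_meas = simulate(p_true, t_obs) + sigma * rng.normal(size=t_obs.size)
```

A local or global optimizer would then minimize `negative_log_likelihood` over `p` within the parameter bounds.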

2.2 Overview of optimization methods

The ideal optimization method for the above class of problems would be able to find the global optimum with guarantees and in a short computation time. Furthermore, it should scale well with problem size and be able to handle arbitrary non-linearities. Currently, no such method exists. Local gradient-based methods (Schittkowski, 2013) can be efficient but will converge to the local optimum in the basin of attraction where they are initialized. Local gradient-free (also called zero-order) methods, such as pattern search (Wright, 1996), are less efficient than gradient-based methods but more robust with respect to situations where the gradient is unavailable or unreliable (Conn ). Global methods aim to locate the global solution by means of either deterministic (Esposito and Floudas, 2000) or stochastic (Zhigljavsky and Zilinskas, 2007) strategies. Deterministic methods include so-called complete and rigorous approaches, both of which can ensure convergence to the global solution under certain circumstances. In contrast, stochastic (also known as probabilistic) methods can only guarantee global optimality asymptotically in the best case (Neumaier, 2004), but can solve many problems that cannot be handled using available deterministic methods. Both deterministic and stochastic global optimization methods have been used to solve parameter estimation problems in systems biology. The results show that deterministic methods suffer from lack of scalability (Miró ). The computational cost of purely stochastic methods (such as simulated annealing, particle swarm optimization, or genetic algorithms) usually scales up better, but the computation times can still be excessive for problems of realistic size (Mendes and Kell, 1998; Moles ). Hybrid global-local methods attempt to exploit the benefits of global and local methods. 
By combining diversification phases (global search) and intensification phases (local search), hybrid methods facilitate reliable global exploration and fast local convergence. As a result, hybrid methods can potentially outperform the efficiency (convergence rate) of purely stochastic methods while keeping their success rate. One such hybrid method is the so-called multi-start (MS) strategy (Zhigljavsky and Zilinskas, 2007), which solves the problem repeatedly with local methods initialized from different (e.g. random) initial points. Thus, MS can be regarded as one of the earliest hybrid strategies, and different extensions are available (Hendrix and Tóth, 2010; Zhigljavsky and Zilinskas, 2007). An alternative family of hybrid methods are metaheuristics (i.e. guided heuristics). An example is the enhanced scatter search (eSS) method (Egea ), an improvement of the scatter search method designed by Glover. The eSS method combines a global stochastic search phase with local searches launched at selected times during the optimization, in order to accelerate convergence to local optima. Further accelerations can be achieved by parallelization (Penas ; Villaverde ). In all hybrid methods the efficiency of the local methods plays a major role. The most efficient local methods are gradient-based, so their performance depends crucially on the accuracy of the gradient calculations (Nocedal and Wright, 1999). The simplest way of approximating the gradient is by finite differences. However, more accurate gradients are provided by forward sensitivity analysis (Raue ) and adjoint sensitivity analysis (Fröhlich ). While the former provides information on individual residuals which can be used in least squares algorithms, the latter is more scalable.
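The multi-start idea can be sketched in a few lines. This Python example (the study itself uses MATLAB tooling) uses the Rosenbrock function as an illustrative stand-in for a model-misfit objective; the bounds and number of starts are arbitrary choices:

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(p):
    """Surrogate objective standing in for the model-data misfit."""
    return float(np.sum(100.0 * (p[1:] - p[:-1]**2)**2 + (1.0 - p[:-1])**2))

def multi_start(objective, lb, ub, n_starts=20, seed=1):
    """Launch gradient-based local searches from random points; keep the best."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        p0 = rng.uniform(lb, ub)                     # random initial point
        res = minimize(objective, p0, method="L-BFGS-B",
                       bounds=list(zip(lb, ub)))     # local search
        if best is None or res.fun < best.fun:
            best = res
    return best

lb, ub = np.full(4, -2.0), np.full(4, 2.0)
best = multi_start(rosenbrock, lb, ub)
```

The hope, as stated above, is that at least one start lies in the basin of attraction of the global optimum (here, p = (1, 1, 1, 1) with objective value 0).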

2.3 Choice of optimization methods for benchmarking

In this study, we consider several competitive hybrid methods based on the recent results reported by Fröhlich and Villaverde. These methods are summarized in Table 1 and combine two global strategies:
Table 1.

Classification of the hybrid optimization methods considered in the benchmarking

Local method & gradient calculation (columns 2 to 5):

| Global strategy | FMINCON-ADJ | NL2SOL-FWD | DHC | None | Parameter scaling |
|---|---|---|---|---|---|
| MS | MS-FMINCON-ADJ-LOG | MS-NL2SOL-FWD-LOG | MS-DHC-LOG | | LOG |
| MS | MS-FMINCON-ADJ-LIN | MS-NL2SOL-FWD-LIN | MS-DHC-LIN | | LIN |
| eSS | eSS-FMINCON-ADJ-LOG | eSS-NL2SOL-FWD-LOG | eSS-DHC-LOG | eSS-NOLOC-LOG | LOG |
| eSS | eSS-FMINCON-ADJ-LIN | eSS-NL2SOL-FWD-LIN | eSS-DHC-LIN | eSS-NOLOC-LIN | LIN |

Notes: These methods result from the combination of two global strategies with three local methods and two types of scaling for the search space. Additionally, we tested a global metaheuristic optimization method, particle swarm optimization (PSO), both in logarithmic and linear scale (PSO-LOG, PSO-LIN). The abbreviations are defined in Sections 2.3 and 2.4.

The two global strategies are:

- MS: multi-start local optimization.
- eSS: enhanced scatter search metaheuristic.

These are combined with three different local methods:

- NL2SOL-FWD: the nonlinear least-squares algorithm NL2SOL, using forward sensitivity analysis for evaluating the gradients of the residuals. The use of NL2SOL (Dennis Jr ) has recently been advocated for parameter estimation by Gábor and Banga (2015). Additionally, Raue showed that least-squares algorithms with residual sensitivities computed using forward sensitivity analysis outperform many alternative approaches.
- FMINCON-ADJ: the interior point algorithm included in FMINCON (MATLAB and Optimization Toolbox Release 2015a, The MathWorks, Inc., Natick, Massachusetts, United States), using adjoint sensitivities for evaluating the gradient of the objective function. This method has been shown to outperform the least-squares method using forward sensitivities for large-scale models (Fröhlich ), due to the accelerated gradient evaluation.
- DHC: a gradient-free dynamic hill climbing algorithm. This algorithm was proposed by De La Maza and Yuret (1994) and outperformed several alternative approaches in a recent study (Villaverde ). In our experience, this method is competitive when the gradient is numerically difficult to evaluate, e.g. if objective function values are corrupted by numerical integration errors.

Furthermore, we consider eSS without any local method (eSS-NOLOC), and particle swarm optimization (PSO) (Kennedy and Eberhart, 1995).
The considered global strategies and local methods are a representative subset covering distinct approaches, which have been shown in the past to exhibit competitive performance on a number of problems (Egea ; Fröhlich ; Gábor and Banga, 2015; Penas ; Raue ; Rodriguez-Fernandez ; Villaverde , 2015). The local methods are applicable to time-resolved and steady-state data [see Raue , Rosenblatt , Fröhlich and references therein].
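The global-local interplay of a hybrid metaheuristic can be sketched as follows. This toy Python implementation is only loosely inspired by scatter search; it is not the eSS algorithm (which adds reference-set management, diversification and adaptive local-search triggering), and the sphere objective and all settings are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def hybrid_scatter_search(objective, lb, ub, n_ref=10, n_iter=30,
                          local_every=10, seed=2):
    """Toy global-local hybrid: keep a small reference set of good points,
    recombine pairs (diversification), and periodically refine the current
    best member with a gradient-based local search (intensification)."""
    rng = np.random.default_rng(seed)
    ref = rng.uniform(lb, ub, size=(n_ref, lb.size))   # reference set
    f_ref = np.array([objective(p) for p in ref])
    for it in range(n_iter):
        i, j = rng.choice(n_ref, size=2, replace=False)
        # Combine two reference points along the line joining them.
        child = np.clip(ref[i] + rng.uniform(-0.5, 1.5) * (ref[j] - ref[i]),
                        lb, ub)
        f_child = objective(child)
        worst = int(np.argmax(f_ref))
        if f_child < f_ref[worst]:                     # greedy replacement
            ref[worst], f_ref[worst] = child, f_child
        if (it + 1) % local_every == 0:                # local phase
            b = int(np.argmin(f_ref))
            res = minimize(objective, ref[b], method="L-BFGS-B",
                           bounds=list(zip(lb, ub)))
            if res.fun < f_ref[b]:
                ref[b], f_ref[b] = res.x, res.fun
    b = int(np.argmin(f_ref))
    return ref[b], f_ref[b]

sphere = lambda p: float(np.sum(p**2))                 # toy objective
lb, ub = np.full(3, -5.0), np.full(3, 5.0)
p_best, f_best = hybrid_scatter_search(sphere, lb, ub)
```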

2.4 Choice of scaling for the optimization variables

In addition to the optimization methods, we consider two different choices for the scaling of the optimization variables:

- LIN: linear scale
- LOG: logarithmic scale

While it is possible to consider the model parameters, p, directly as optimization variables, several studies suggest that using the logarithms of the model parameters improves the performance of local optimization methods (Kreutz, 2016; Raue ). This is implemented as the LOG option, which performs all parameter searches (both in the global and local phases) in logarithmic space, q = log10(p). Before every evaluation of the objective function the variables are back-transformed to p = 10^q, thus simulating the original equations.
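A minimal Python sketch of the LOG option (the study implements this in MATLAB; the toy misfit below is invented): the optimizer works on q = log10(p), and the objective wrapper back-transforms before evaluation.

```python
import numpy as np
from scipy.optimize import minimize

def to_log10(p):
    return np.log10(p)

def from_log10(q):
    return 10.0 ** q

def optimize_in_log_scale(objective, p_lb, p_ub, p0):
    """Search over q = log10(p); back-transform before each objective call."""
    obj_q = lambda q: objective(from_log10(q))
    bounds = list(zip(to_log10(p_lb), to_log10(p_ub)))
    res = minimize(obj_q, to_log10(p0), method="L-BFGS-B", bounds=bounds)
    return from_log10(res.x), res.fun

# Toy misfit whose optimum p* = (1e-2, 1e2) spans four orders of magnitude.
misfit = lambda p: float(np.sum((np.log10(p) - np.array([-2.0, 2.0]))**2))
p_opt, f_opt = optimize_in_log_scale(misfit, np.array([1e-4, 1e-4]),
                                     np.array([1e4, 1e4]),
                                     np.array([1.0, 1.0]))
```

The log transform makes parameters spanning several orders of magnitude comparable in scale, which is the usual motivation for the LOG option.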

2.5 Comparison of optimization methods

The performance of optimization methods can be compared using several evaluation criteria. Ideally, a criterion should:

- be a single, interpretable quantity;
- be comparable across models and methods (to enable an integrated analysis);
- account for computation time and objective function value.

A number of evaluation criteria have been used in the literature to compare the performance of optimization methods, e.g. dispersion plots of objective function value versus computation time and waterfall plots showing the ordered objective function values found by the different searches. These and other plots are reported in the Supplementary Figures S1–S14. Alternative criteria are performance profiles (Dolan and Moré, 2002), which report for a given set of optimization problems how often one algorithm was faster than all others. The required assumption that all algorithms converge is relaxed in data profiles (Moré and Wild, 2009) by considering the decrease in objective function value and reporting the fraction of solved problems as a function of the budget per variable. While all these plots are useful tools, they do not provide a single, interpretable quantity and fail in other ways. Upon consideration of a variety of different evaluation criteria, we decided to adopt a workflow consisting of several steps, which lead to a newly proposed metric that is a distillation of the information obtained in previous steps. The workflow considers the following criteria:

1. Convergence curves
2. Fixed-budget scenario and fixed-target scenario
3. Dispersion plots of the success rate versus average computation time
4. Overall efficiency (OE)

The first step is to evaluate convergence curves, which show the objective function value as a function of computation time (Fig. 1A). For eSS and PSO, the convergence curves are constructed from single searches as they reach the predefined maximum CPU time. For MS optimization, each convergence curve corresponds to a sequence of local searches and continues until the predefined maximum CPU time is reached.
Fig. 1.

Illustration of performance criteria. (A) Convergence curves for three different methods. Shaded areas show the range of all runs, while solid lines represent their median. The dashed horizontal line is the value to reach (VTR), that is the maximum objective function value that can be considered a successful result. The dashed vertical line is the maximum time allowed (MAXT). (B) Dispersion plot of objective value after the maximum time allowed and the derived success rates (SR). The SR is the area under the curve where objective ≤ VTR. (C) Success rate and computation time. Points indicate individual methods. The Pareto front is the set of non-dominated methods. Methods to the right or above the Pareto front are dominated by other methods with either shorter computation time or higher success rate. Filled areas show the average computation time required to obtain a successful run for the respective method. For algorithms with a success rate of zero, meaning that no optimization run reached the VTR, 1/success rate is set to infinity

In the second step, the information encoded in the convergence curves is summarized by considering a fixed-budget scenario and a fixed-target scenario, as proposed by Hansen . In the fixed-budget scenario, the distribution of the objective function for a given computation time is considered, meaning that a vertical line is drawn. In the fixed-target scenario, the distribution of time points is considered at which a desired objective function value, or value to reach (VTR), is attained, meaning that a horizontal line is drawn. Once an optimization has reached the desired VTR (horizontal view), it is considered successful. The success rate (SR) of an algorithm is the fraction of searches that reached the VTR within the maximum computation time, MAXT (Fig. 1B). As a complement, we evaluate the average computation time required by an algorithm, computed per run as the minimum of the time required to reach the VTR and MAXT.
In the third step, we consider dispersion plots of the success rate versus average computation time to study the relation of the two quantities (Fig. 1C). Note that this dispersion plot may reveal in some cases a Pareto set structure, consisting of algorithms which provide an optimal trade-off between conflicting goals (in this case, high success rate and low computation time): it is not possible to improve one of its objectives without worsening the other. We are interested in methods that are located towards the bottom (i.e. high success rate) and left (i.e. low computation time) of this plot. Therefore, in the fourth step, we quantify the trade-off between success rate and average computation time using a novel metric called overall efficiency (OE). The OE for method i on a given problem is defined as:

$$\mathrm{OE}_i = \frac{\min_j T_j}{T_i},$$

where $T_i$ is the average computation time we need to run method i to obtain one successful run. It is calculated as $T_i = \bar{t}_i / \mathrm{SR}_i$, where $\bar{t}_i$ and $\mathrm{SR}_i$ are the average computation time and the success rate of method i for that problem. The computation time $T_i$ is directly related to the area in the dispersion plot (Fig. 1C); accordingly, the OE is the ratio of the minimal area and the area for a given algorithm. The inverse of the overall efficiency, $1/\mathrm{OE}_i$, quantifies how much longer one has to run method i, compared to the best method, in order to find a good solution. The OE ranges between 0 and 1; for each particular problem the best performing method achieves the maximum score, $\mathrm{OE} = 1$. To evaluate methods on a set of optimization problems, we compute a method's cumulative overall efficiency as the sum of its OEs for the individual problems. The method with highest cumulative OE will be the one exhibiting the best trade-off between success rate and computation time for the set of problems. In summary, our workflow considers multiple metrics and summarizes the trade-off between computational complexity and success with the novel performance metric OE.
As the OE is interpretable, comparable between models and methods, and accounts for computation time and final objective function values, it fulfils all the criteria defined above.
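The OE computation can be sketched as follows (Python; the per-run times, success flags and method labels below are invented for illustration):

```python
import numpy as np

def overall_efficiency(times, success):
    """OE_i = min_j T_j / T_i with T_i = mean time / success rate.
    Methods with zero success rate get T_i = inf and hence OE_i = 0.
    times:   dict method -> array of per-run computation times (capped at MAXT)
    success: dict method -> boolean array, True if the run reached the VTR."""
    T = {}
    for m in times:
        sr = np.mean(success[m])                 # success rate SR_m
        t = np.mean(times[m])                    # average computation time
        T[m] = t / sr if sr > 0 else np.inf      # expected time per success
    t_min = min(T.values())
    return {m: (t_min / T[m] if np.isfinite(T[m]) else 0.0) for m in T}

times = {"eSS-FMINCON-ADJ-LOG": np.array([50.0, 60.0, 55.0]),
         "MS-DHC-LIN": np.array([100.0, 100.0, 100.0])}
success = {"eSS-FMINCON-ADJ-LOG": np.array([True, True, False]),
           "MS-DHC-LIN": np.array([True, False, False])}
oe = overall_efficiency(times, success)
```

With these invented numbers, the eSS variant needs 55/(2/3) = 82.5 time units per success and the MS variant 100/(1/3) = 300, so the former scores OE = 1 and the latter OE = 82.5/300 = 0.275.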

2.6 Benchmark problems

In this study, we consider seven benchmark problems based on previously published kinetic models (Chassagnole ; Chen ; Kotte ; MacNamara ; Moles ; Smith and Shanley, 2013; Villaverde ) which describe metabolic and signalling pathways of different organisms (from bacteria to human). These problems possess 36 to 383 parameters and 8 to 500 state variables. The data points are collected under up to 16 experimental conditions, corresponding to the number of required numerical simulations. The features of all problems are summarized in Table 2. The benchmarks B2–B5 had previously been included in the BioPreDyn-bench collection (Villaverde ), and BM1 & BM3 were used in Fröhlich.

2.7 Implementation

The benchmark problems have been implemented in MATLAB (MathWorks, Natick, MA, USA) using the AMICI toolbox (Fröhlich ), a free MATLAB interface for SUNDIALS solvers (Hindmarsh ). The optimization methods have been implemented as MATLAB scripts calling solvers from the MATLAB Optimization Toolbox, the MATLAB Global Optimization Toolbox and the MEIGO toolbox (Egea ), and making use of the efficient gradient computation provided by the AMICI toolbox. The code necessary for reproducing the results reported here is provided as Supplementary Material and is also available at Zenodo https://doi.org/10.5281/zenodo.1304034.

3 Results and discussion

3.1 Comprehensive evaluation of the considered optimization methods on the benchmark problems

To assess the performance of the different optimization methods, we solved the 7 benchmark problems using the 16 optimization methods listed in Table 1. For each problem, multi-start local optimization (MS) used 100 starting points, while enhanced scatter search (eSS) and particle swarm optimization (PSO) were run 10 times, each time until the predefined, maximum problem-specific CPU time (Supplementary Table S1) was reached. The overall computational effort was CPU days (Intel Xeon E5-2600 2.50GHz processor). The convergence curves for all optimization methods on all problems were evaluated (see Fig. 2A for a representative example and Supplementary Figs S15–S28 for the complete set). Convergence curves for MS were plotted by mimicking the scenario in which eSS and PSO were used: we emulate 10 virtual processors, each of which performs local searches sequentially for the same time allowed to the eSS/PSO runs. The local searches in these emulated sequential optimizations are sampled without replacement from the pool of 100 searches launched in each MS. Thus, the success rate (SR) reported for MS methods is the fraction of the 10 virtual processors that find a solution whose objective function value falls below the VTR. Note that this SR is not the same as the fraction of successful searches out of the 100 launched in the MS. Numerical values of the horizontal and vertical views of said curves are provided in the Supplementary Tables S3–S20, and graphically in Supplementary Figures S65–S82.
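The virtual-processor emulation for MS can be sketched as follows (Python; splitting the pool of 100 searches into disjoint per-processor sequences is our reading of "sampled without replacement", and all per-search times and objective values are invented):

```python
import numpy as np

def emulate_virtual_processors(search_times, search_values, vtr, maxt,
                               n_proc=10, seed=3):
    """Emulate sequential MS runs: each virtual processor draws local searches
    without replacement from the pool until the time budget MAXT is spent,
    and succeeds if any drawn search reaches the VTR in time."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(search_times))
    chunks = np.array_split(order, n_proc)   # disjoint pools, no replacement
    successes = 0
    for chunk in chunks:
        elapsed, ok = 0.0, False
        for k in chunk:
            elapsed += search_times[k]
            if elapsed > maxt:               # budget exhausted mid-search
                break
            if search_values[k] <= vtr:      # this local search succeeded
                ok = True
                break
        successes += ok
    return successes / n_proc                # reported SR for the MS method

rng = np.random.default_rng(0)
times = rng.uniform(1.0, 5.0, size=100)      # toy per-search CPU times
values = rng.uniform(0.0, 10.0, size=100)    # toy final objective values
sr = emulate_virtual_processors(times, values, vtr=1.0, maxt=30.0)
```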
Fig. 2.

Results of performance evaluation. (A) Convergence curves of the different methods for benchmark TSP. Results for the remaining benchmarks are reported in the Supplementary Material. (B) Average computation time of each method versus the inverse of its success rate for benchmark TSP. Methods with zero success rate are not shown. Results for the remaining benchmarks are reported in the Supplementary Material. (C) Cumulative overall efficiency: Each method is represented by a stack of the OEs observed for the individual benchmark problems. The maximum possible score is the same as the number of benchmarks, i.e. seven. (D) Successful methods for each benchmark are shown in color; methods which never succeeded for a given problem are shown in white. A, B and D use the thresholds VTR C and MAXT A as defined in the Supplementary Table S1. Panel C shows the average OE across all considered VTRs and MAXTs

As expected, the optimization results indicate that the performance of the optimization methods varies substantially among the benchmark problems. This is in agreement with previous studies (Kreutz, 2016; Villaverde ). For the quantitative evaluation we performed the analyses for two MAXTs and nine VTRs per benchmark (see Supplementary Table S1). We found that the ranking of the methods with respect to the OE depends only weakly on the MAXTs and the VTRs (Supplementary Figs S47–S64). For visualization in Figure 2A, B and D, we use a high MAXT (MAXT A) and a moderate VTR (VTR C) which ensures a good agreement of model simulation and data. Results for other choices of VTR, including larger and smaller values, are shown in the Supplementary Figures S29–S82 and Tables S3–S21. In the following, we present the key findings of our analysis and address, amongst others, the question of which is the most efficient method for performing parameter optimization. The detailed evaluation results are presented in the Supplementary Material.
In particular, quantitative values for the performance improvements mentioned in Sections 3.2–3.4 can be found in Supplementary Table S21.

3.2 Gradient-based local searches outperform gradient-free local searches

Our comprehensive evaluation clearly shows that high-quality sensitivity calculation methods provide a competitive advantage to the local methods that exploit them. Optimization using adjoint and forward sensitivity analysis (FMINCON-ADJ and NL2SOL-FWD) usually outperforms the gradient-free alternative (DHC). This is reflected in the dispersion plots (see, e.g. Fig. 2B) and in a higher cumulative OE (Fig. 2C), and holds for both MS and eSS settings. The combinations of eSS with gradient-based methods, eSS-FMINCON-ADJ and eSS-NL2SOL-FWD, clearly outperform the gradient-free alternatives, eSS-DHC and eSS-NOLOC, as well as the likewise gradient-free PSO. Notably, successful optimization of BM3 within the given computational budget required adjoint sensitivity analysis in combination with optimization in log-scale (Fig. 2D).
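To illustrate why computed sensitivities help, the sketch below fits the single rate constant of a toy decay model dx/dt = -k*x by integrating the forward sensitivity s = dx/dk alongside the state and feeding the resulting gradient to a simple descent loop. This is a minimal, plain-Python sketch (Euler integration, backtracking line search) written for this text; the solvers benchmarked in the paper (FMINCON, NL2SOL) and their CVODES-based sensitivity machinery are far more sophisticated.

```python
import math

def simulate_with_sensitivity(k, x0=1.0, t_end=2.0, n=200):
    """Euler-integrate dx/dt = -k*x together with its forward
    sensitivity s = dx/dk, which obeys ds/dt = -x - k*s, s(0) = 0."""
    dt = t_end / n
    x, s = x0, 0.0
    xs, ss = [x], [s]
    for _ in range(n):
        x, s = x + dt * (-k * x), s + dt * (-x - k * s)  # simultaneous update
        xs.append(x)
        ss.append(s)
    return xs, ss

def sse_and_grad(k, data):
    """Sum-of-squares error and its exact gradient d(SSE)/dk.
    data holds (grid index, measured value) pairs on the simulation grid."""
    xs, ss = simulate_with_sensitivity(k)
    sse = sum((xs[i] - y) ** 2 for i, y in data)
    grad = sum(2.0 * (xs[i] - y) * ss[i] for i, y in data)
    return sse, grad

def fit_k(data, k0, iters=50, max_dk=0.5):
    """Gradient descent with a capped step and backtracking line search."""
    k = k0
    f, g = sse_and_grad(k, data)
    for _ in range(iters):
        if g == 0.0:
            break
        step = min(1.0, max_dk / abs(g))  # cap the step to avoid overshoot
        while step > 1e-12:
            k_try = k - step * g
            f_try, g_try = sse_and_grad(k_try, data)
            if f_try < f:
                k, f, g = k_try, f_try, g_try
                break
            step *= 0.5
    return k

# Synthetic noise-free data from the exact solution with k_true = 1.5
k_true, n = 1.5, 200
data = [(i, math.exp(-k_true * i * 2.0 / n)) for i in range(0, n + 1, 20)]
k_hat = fit_k(data, k0=0.3)  # k_hat ≈ 1.49 (small bias from the Euler grid)
```

A gradient-free method would have to probe the objective repeatedly to approximate the same derivative information that one extra ODE per parameter delivers here; adjoint sensitivities extend the same idea to many parameters at the cost of a single backward solve.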

3.3 Enhanced scatter search outperforms multi-start local optimization

Our results show that MS is usually sufficient to find a good solution, given the same computation time as eSS (Fig. 2D). For strict VTRs (i.e. VTR E and VTR F), MS and eSS perform equally well. However, eSS was generally more efficient than MS (Fig. 2C): on average, a 2-fold improvement in OE is observed, almost independently of the local method. The reason for the efficiency improvement is probably that eSS starts the local searches from promising points found through advanced exploration and recombination strategies. In this regard, it can be considered an ‘advanced multi-start’ (Ugray ).
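The ‘advanced multi-start’ idea can be caricatured in a few lines: keep a small reference set of good points, generate new candidates by recombining pairs, and polish promising candidates with a local solver. The sketch below is a deliberately simplified toy written for this text, not the eSS algorithm itself; the local solver is plain gradient descent and the test function is the 1-D Rastrigin function, a standard multimodal benchmark.

```python
import math
import random

def rastrigin(x):
    """1-D Rastrigin function: many local minima, global minimum f(0) = 0."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))

def local_search(x, steps=300, lr=0.002):
    """Toy local solver: gradient descent on the analytic gradient."""
    for _ in range(steps):
        x -= lr * (2.0 * x + 20.0 * math.pi * math.sin(2.0 * math.pi * x))
    return x

def toy_scatter_search(iters=20, refset_size=5, seed=0):
    rng = random.Random(seed)
    # Diverse initial population over the usual Rastrigin domain
    pop = [rng.uniform(-5.12, 5.12) for _ in range(40)]
    refset = sorted(pop, key=rastrigin)[:refset_size]
    for _ in range(iters):
        children = []
        for i in range(refset_size):
            for j in range(i + 1, refset_size):
                # Recombine two reference points, perturb, then polish locally
                child = 0.5 * (refset[i] + refset[j]) + rng.gauss(0.0, 0.2)
                children.append(local_search(child))
        # Keep only the best points as the next reference set
        refset = sorted(refset + children, key=rastrigin)[:refset_size]
    return refset[0]

best = toy_scatter_search()
```

In contrast to a plain multi-start, which would launch the local solver from uniformly random points, the recombination step concentrates the (expensive) local searches on regions that already look promising.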

3.4 Optimization in logarithmic scale outperforms optimization in linear scale

Previous studies reported that transforming the optimization variables to log-scale improves the reliability and computational efficiency of local methods (Kreutz, 2016; Raue ). Our findings corroborate these results and show for the first time that global optimization methods, too, are more efficient when using log-scale (LOG) rather than linear-scale (LIN). Overall, we observe an average improvement of the cumulative OE by at least a factor of 2 (Fig. 2C). Indeed, for some problems (BM3, TSP), reasonable fits could only be obtained using the log-transformed parameters (Fig. 2D).
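Adopting the log-parameterization requires only a thin wrapper around the objective and its gradient. The following is a minimal plain-Python sketch (the function names and the choice of base 10 are illustrative, not taken from the paper's code); `f` and `grad` stand for any objective/gradient pair over positive parameters.

```python
import math

def to_log10_scale(f, grad):
    """Wrap an objective f(p) and its gradient grad(p) so that a solver
    can work in theta = log10(p). By the chain rule,
    df/dtheta_i = ln(10) * p_i * df/dp_i."""
    def f_log(theta):
        return f([10.0 ** t for t in theta])

    def grad_log(theta):
        p = [10.0 ** t for t in theta]
        return [math.log(10.0) * pi * gi for pi, gi in zip(p, grad(p))]

    return f_log, grad_log
```

Besides equalizing the scales of parameters that span several orders of magnitude, this reparameterization automatically enforces positivity of rate constants, so no explicit bound constraints on the sign are needed.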

3.5 Best performing method

The comparison of all methods reveals that eSS-FMINCON-ADJ-LOG possesses the best overall efficiency on the considered benchmark problems and settings, followed by eSS-NL2SOL-FWD-LOG (Fig. 2C). The difference in performance between the two methods is small; indeed, for certain VTRs (e.g. VTR C), eSS-NL2SOL-FWD-LOG is the best performer (see Supplementary Figs S51 and S52). However, eSS-FMINCON-ADJ-LOG is the only method that successfully solves all problems (Fig. 2D), while eSS-NL2SOL-FWD-LOG fails for BM3, possibly due to the very large number of states and parameters of this problem. In summary, our performance evaluation therefore suggests the use of eSS-FMINCON-ADJ-LOG. Interestingly, our study of the dispersion plots revealed that eSS-FMINCON-ADJ-LOG often maximizes the success rate and, at the same time, minimizes the mean computation time. Accordingly, in these cases there is, contrary to our expectation, no trade-off: there is a clear winner.

4 Conclusion

In this paper we have presented a comparative evaluation of state-of-the-art optimization methods for parameter estimation in systems biology. We have applied these methods to benchmark problems of different sizes (medium to large) and complexities. To compare the different methodologies in detail, we have used a multi-criteria workflow, exploring several possible ways of assessing the performance of optimization methods for this task, and have reported results using a number of selected indicators and evaluation tools. Furthermore, we have introduced the concept of overall efficiency (OE), which quantifies the trade-off between success rate and computation time, providing a numerical indication of the most efficient method. We have found this metric to be a convenient summary of the comparative performance of a method on a set of problems.

A central goal of our work was to re-examine past results regarding the performance of multi-start methods and metaheuristics (i.e. enhanced scatter search). Firstly, we have confirmed that multi-start local optimization is a powerful approach (Hross and Hasenauer, 2016; Raue ), as it solved most of the considered benchmark problems in a reasonable time. The only exception is B3, a problem for which numerical simulation fails for many parameter points. Secondly, we verified that the enhanced scatter search metaheuristic often possesses higher success rates and efficiency than plain multi-start optimization (Gábor and Banga, 2015). However, the difference, a factor of two, was smaller than suggested by several previous studies and will likely depend on the set of benchmark problems. Furthermore, this average improvement by a factor of two is smaller than the variability across benchmarks, implying that for many problems the use of multi-start methods is still beneficial (e.g. BM3). Thirdly, our results confirm that purely global optimization strategies (i.e. those not combined with a local method), such as PSO and eSS-NOLOC, are less efficient than hybrid ones. Finally, we have assessed the influence of parameter transformations, concluding that optimization in logarithmic scale clearly outperforms optimization in linear scale. This parameterization can always be easily adopted, irrespective of the optimization method used.

We considered two sophisticated gradient-based local methods, FMINCON with adjoint sensitivities and NL2SOL with forward sensitivities, whose use was mostly beneficial. A gradient-free local method, DHC, was found to be less precise than its gradient-based counterparts, although its use may still be advantageous for problems with numerical issues that limit the efficacy of gradient-based techniques.

In this study, we assessed the average performance of optimization methods on the benchmark problems. As a complement, it would be interesting to see how the performance of each method depends on problem characteristics, e.g. problem size. Such an assessment would, however, require a large number of problems with different characteristics; since such a collection is currently not available, we refrain from attempting a systematic evaluation of this aspect.

Overall, the best performing method in our tests was eSS-FMINCON-ADJ-LOG, that is, a hybrid approach combining the global metaheuristic eSS with the local method FMINCON, provided with gradients estimated with adjoint-based sensitivities. This was the only method that succeeded in calibrating all the benchmarks, and it also achieved the highest overall efficiency for the set of thresholds adopted in this study. To facilitate the application of this and other methods, we provide their implementations in the Supplementary Material. In the case of the best performing method, our solver is, to the best of our knowledge, the first publicly available implementation.
Accordingly, our study provides access to a novel optimizer applicable to a broad range of application problems in systems biology.
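To make the success-rate/computation-time trade-off concrete, a score with the qualitative behavior of the OE metric can be computed as sketched below. Note that both the function name and the formula are illustrative assumptions made for this text, not the exact OE definition used in the paper (which is given in the Methods and Supplementary Material).

```python
def overall_efficiency_like(success_rate, mean_time, best_mean_time):
    """Illustrative trade-off score (NOT the paper's exact OE formula):
    rewards a high success rate and penalizes computation times that are
    slow relative to the fastest method; the result lies in [0, 1]."""
    if success_rate <= 0.0:
        return 0.0  # a method that never reaches the VTR scores zero
    return success_rate * (best_mean_time / mean_time)

# A method that always succeeds but is 4x slower than the fastest method...
slow_robust = overall_efficiency_like(1.0, 400.0, 100.0)  # -> 0.25
# ...versus one that succeeds half the time at the fastest speed.
fast_flaky = overall_efficiency_like(0.5, 100.0, 100.0)   # -> 0.5
```

Any score of this shape lets robust-but-slow and fast-but-fragile methods be ranked on a single axis, which is the role the OE plays in Figure 2C.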

Funding

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 686282 (‘CANPATHPRO’), from the Spanish MINECO/FEDER projects SYNBIOFACTORY (DPI2014-55276-C5-2-R) and SYNBIOCONTROL (DPI2017-82896-C2-2-R), and from the German Research Foundation (DFG) through the Graduate School of Quantitative Biosciences Munich (QBM; F.F.).

Conflict of Interest: none declared.
References (36 in total; the first 10 are listed below)

1. Banga JR, Balsa-Canto E. Parameter estimation and optimal experimental design. Essays Biochem, 2008.

2. Almquist J, Cvijovic M, Hatzimanikatis V, Nielsen J, Jirstrand M. Kinetic models in industrial biotechnology - improving cell factory performance. Metab Eng, 2014.

3. Babtie AC, Stumpf MPH. How to deal with parameters for whole-cell modelling. J R Soc Interface, 2017.

4. Chis OT, Banga JR, Balsa-Canto E. Structural identifiability of systems biology models: a critical comparison of methods. PLoS One, 2011.

5. Rodriguez-Fernandez M, Egea JA, Banga JR. Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems. BMC Bioinformatics, 2006.

6. Penas DR, González P, Egea JA, Doallo R, Banga JR. Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy. BMC Bioinformatics, 2017.

7. Degasperi A, Fey D, Kholodenko BN. Performance of objective functions and optimisation procedures for parameter estimation in system biology models. NPJ Syst Biol Appl, 2017.

8. Chen WW, Schoeberl B, Jasper PJ, Niepel M, Nielsen UB, Lauffenburger DA, Sorger PK. Input-output behavior of ErbB signaling pathways as revealed by a mass action model trained against dynamic data. Mol Syst Biol, 2009.

9. Raue A, Schilling M, Bachmann J, Matteson A, Schelker M, Kaschek D, Hug S, Kreutz C, Harms BD, Theis FJ, Klingmüller U, Timmer J. Lessons learned from quantitative dynamical modeling in systems biology. PLoS One, 2013.

10. Villaverde AF, Banga JR. Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J R Soc Interface, 2013.

