Literature DB >> 32406375

Application of conditional robust calibration to ordinary differential equations models in computational systems biology: a comparison of two sampling strategies.

Fortunato Bianconi¹, Chiara Antonini², Lorenzo Tomassoni², Paolo Valigi².

Abstract

Mathematical modelling is a widely used technique for describing the temporal behaviour of biological systems. One of the most challenging topics in computational systems biology is the calibration of non-linear models; i.e. the estimation of their unknown parameters. The state-of-the-art methods in this field are the frequentist and Bayesian approaches. For both of them, the performance and accuracy of results greatly depend on the sampling technique employed. Here, the authors test a novel Bayesian procedure for parameter estimation, called conditional robust calibration (CRC), comparing two different sampling techniques: uniform and logarithmic Latin hypercube sampling. CRC is an iterative algorithm based on parameter space sampling and on the estimation of parameter density functions. They apply CRC with both sampling strategies to the three ordinary differential equations (ODEs) models of increasing complexity. They obtain a more precise and reliable solution through logarithmically spaced samples.

Entities: Chemical

Mesh：

Year: 2020 PMID： 32406375 PMCID： PMC8687221 DOI： 10.1049/iet-syb.2018.5091

Source DB: PubMed Journal: IET Syst Biol ISSN： 1751-8849 Impact factor: 1.615

1 Introduction

In systems biology, a key issue is to understand the dynamic interactions occurring within and between cells, which determine their structure and basic functions [1, 2]. Since most aspects of a biological system are not directly accessible to experimental observation, mathematical modelling is widely employed to formalise the dynamic and temporal evolution of system variables and to make predictions about biological mechanisms [3, 4]. In this study, we consider biological systems modelled with deterministic ordinary differential equations (ODEs) [5]. These models have a set of unknown kinetic parameters which cannot be measured experimentally. Thus, one of the challenging tasks of model building is model calibration; i.e. the inverse problem of estimating those parameters from available experimental data [6]. The state‐of‐the‐art methods for parameter estimation are classified in frequentist and Bayesian methodologies. Both approaches are based on the concept of the probability density of observing the data given certain parameter values [7]. However, the frequentist methods aim at maximising the likelihood function through an optimisation algorithm while the Bayesian compute the posterior distribution using sampling based techniques [8, 9]. In both classes of algorithms, sampling is a fundamental element for dealing with parameter uncertainty. In order to avoid local optima, the frequentist approach employs bootstrapping from the available experimental replicates and Latin hypercube sampling (LHS) in order to start the estimation from different and independent parameter samples [10]. On the other hand, Bayesian methods sample directly from a prior distribution and accept only those samples that are in a region of high probability. Thus, in this class of methods, the sampler is of primary importance to guarantee the convergence to the desired posterior distribution. Samples should be dense and properly distributed in order to ensure a wide and appropriate exploration of the parameter space [11]. Currently, different sampling techniques are available, including multivariate uniform or logarithmic distribution, LHS and its variations, Markov Chain Monte Carlo and sequential Monte Carlo methods [12-15]. In this paper, we aim to understand the performances of two sampling strategies, namely uniform and logarithmic LHS, in a new Bayesian procedure called conditional robust calibration (CRC), presented in [16, 17]. CRC is an iterative algorithm that samples the parameter space and returns in output the probability density functions (pdfs) of parameters. Each pdf is estimated on the region of the parameter space that best reproduces the experimental dataset. CRC is applied to calibrate three ODE models with different characteristics; i.e. the well‐known Lotka Volterra model [12], the erythroprotein receptor (EpoR) model [18] and the multiple myeloma (MM) model [19]. The comparison of CRC results between logarithmic and uniform sampling show that, while the logarithmic LHS successfully estimated unknown parameter values, the uniform LHS returned less reliable and precise results in all studied models.

2 Methods

2.1 Mathematical model

Mathematically, ODE models are represented as follows: where is the vector of state variables with initial conditions , is the vector of control variables and is the vector of output variables; i.e. the concentrations of measured states or a function of them. Note that , where is a subset of the positive orthant, is the vector of dynamical model parameters. Vector contains scaling and offset parameters. The function specifies the model while is the observation function for mapping the state variables to the observed quantities. By setting , , all model parameters are known. Each parameter , is usually characterised by an interval of validity, due to its physical or biological meaning. Let denote the nominal value of . Then, the lower and upper boundaries of parameter can be written, respectively, as and , where and are multiplicative coefficients. Thus, the parameter space results from the Cartesian product .

2.2 Experimental dataset

Since mathematical models represent an in silico explanation of experimental data, the observables predicted by a model have to be in agreement with the experimental dataset. Let denote with the matrix of available data. Supposing that experimental data are collected at different time points , each is a time series vector of the form . The jth observable measured at the kth time point can be divided as , where is the nominal unknown value and is the corresponding measurement error, usually assumed to be normally distributed.

2.3 Calibration method

To compare the performances of uniform and logarithmic sampling when using CRC, we estimate the parameters of three different ODE models. Here, we briefly describe the main steps of CRC, whose details are fully explained in [16]. Fig. 1 shows the CRC workflow.

Fig. 1

Main steps of CRC

2.3.1 Initialisation of input parameters

First of all, it is necessary to initialise the input parameters of the procedure. They are and which are, respectively, the upper and lower boundaries of the parameter vector range and which is the number of parameter samples to generate at each iteration.

2.3.2 Parameter space sampling

The parameter vector is considered a random variable, denoted as . The purpose of CRC is to identify a sufficient number of parameter realisations that approximate the posterior distribution . Therefore, the parameter space is sampled in order to generate realisations of the parameter vector. Thus, the choice of depends on the cardinality of . The higher the number of parameters to estimate, the higher should be set in order to guarantee a wide exploration and coverage of the parameter space. LHS is chosen as the sampling technique in order to spread the sample points more evenly across all possible values. To generate Latin hypercube samples in a q‐dimensional space, a unit hypercube is first divided into intervals with an equal length of along each axis. This creates equally probable intervals, , for each dimension. Thus, LHS can be represented as a sample matrix, which contains one sample in each row and in each column. Each column corresponds to a variable and it is a random permutation of the intervals while each row is a sample point [11]. Afterwards, the LHS matrix needs to be centred around the nominal parameter values and between the defined lower and upper boundaries. In this study, we assume that, in one case, parameters are uniformly distributed among their intervals while, in another case, they are logarithmically distributed. Let denote with the matrix that contains the samples generated. Under the hypothesis of uniform distribution, the matrix of the generated parameter samples is computed as where is the nominal parameter vector, and are the lower and upper boundaries of the sampling interval at the zth iteration. On the contrary, when using logarithmic sampling, is calculated as where and are the lower and upper boundaries transformed to the logarithmic space. At each iteration, the sampling interval is shrinked according to the following formula: where , , , are input parameters chosen by the user. Since and are multiplied with the nominal parameter vector , they determine the percentage variation of at each iteration. This means that the interval where the parameter samples are selected for each iteration can be defined as follows: . Thus, given that each iteration gets closer to a solution of the parameter values, the tuning parameters , , , should be chosen in order to shrink the percentage variation of with respect to the previous iteration.

2.3.3 Model simulation and computation of distance functions (DFs)

For each sample , the ODE model is integrated in order to compute the in silico dataset . Then is compared with by computing a DF for each output variable. The DF is defined in two ways As is a random variable, each vector can be seen as a transformation from the random variable and, consequently, each is a transformation from the random variable . Let denote with a realisation of the random variable and with the pdf of . Each is estimated using a kernel density approach [20]. The pdfs of the DFs are good indicators to infer the ability of the sampling technique to find parameter values that correctly calibrate the mathematical model under investigation. The higher is the area under each in a region close to zero and the higher is the percentage of parameter vectors identified by the LHS that are able to make the observables of the mathematical model close to the experimental data.

2.3.4 Thresholds calculation

In this step, a set of thresholds is defined. They correspond to the minimum accepted level of agreement between simulated and experimental data, for each output variable. Thus, CRC selects only those DF values that belong to the low tail of the corresponding pdf. The rationale that the user has to follow to fix the threshold values is explained in Section 2.3.5. Then, all DF tails are intersected among each other obtaining the following set: . In the parameter space, the accepted DF values correspond to a set of parameter samples having as joint conditional distribution .

2.3.5 Parameter conditional densities estimation

For each parameter, the corresponding conditional density is estimated using a kernel density approach. Concretely, the input data for the estimation of , are the subset of the parameter vectors generated through the LHS for which , . As a consequence, the lower are the values of the thresholds and the fewer are the available parameter vectors to estimate the conditional densities of each parameter. In order to have a reliable and stable estimation of these pdfs, at least 1000 parameter samples are required [20]. For this reason, each threshold is chosen in order to guarantee at least 1000 samples in under ; i.e. . Finally, for each parameter the corresponding mode value is selected. Multiple sets of thresholds can meet the requirements, but the purpose is to find the lowest possible values because it means that the distance between simulated and experimental data is minimum. If it is necessary to perform another iteration of CRC, the nominal value of every parameter is set equal to the mode value of ; i.e. , and the sampling interval is updated by changing and according to (5). It is important to underline that at the next iteration the interval where is selected can be wider or tighter than the previous one because the lower and upper boundaries are multiplied by the mode value of each parameter. The procedure stops when the desired level of agreement between simulated and experimental data is met. For instance, in case of in silico data whose source of error is known, CRC is performed until each threshold is slightly over the difference between nominal and noisy data, in order to avoid noise fitting.

3 Results

In this section, we compare the performances of CRC with uniform and logarithmic sampling approach when calibrating three different ODE models. All computational analyses were run using Matlab (R2014), on a Intel Core i7‐4700HQ CPU, 2.4 GHz, 16 GB Memory, Ubuntu 16.04 LTS (64bit).

3.1 Lotka‐Volterra model (M1)

We apply CRC to the Lotka‐Volterra ODE model. The model describes the interaction between the prey species x, and predator species y, through parameters a and b Observables of the model are both x and y while the parameter space is . Both nominal values of parameters are set to 1. In silico data are generated as in [12]; i.e. sampling eight values at the specified time points and adding Gaussian noise (see Table 1).

Table 1

Model M1: in silico noisy dataset used for calibration

Time points, min	x	y
1.1	1.8823	0.5214
2.4	1.5510	1.2471
3.9	0.4007	0.9858
5.6	0.0454	0.2924
7.5	1.8881	0.6133
9.6	0.1839	1.3163
11.9	−0.1940	0.8171
14.4	1.6015	1.7212

Model M1: in silico noisy dataset used for calibration We calculate the distance between the nominal and noisy dataset according to (6) and we obtain the following values: 3.09 for species x and 2.6 for species y. Accordingly, they represent the minimum accepted level of agreement between simulated and experimental data. Since the purpose of this paper is the comprehension of the role of the sampling approach in the performances and efficiency of CRC, the tuning parameters are set in the same way when both the techniques are applied. Therefore, we perform six iterations of the entire CRC procedure setting , and shrinking and according to (5), setting and . In this way, we progressively shrink the percentage variation of the parameter vector, approaching 0 at the limit. Table 2 shows the mode vector of the parameters obtained at the end of each iteration of CRC and also the interval where the parameter samples are selected. Each interval depends on the mode vector in output from the previous iteration and on and . We generate parameter samples for each iteration. Table 3 summarises the resulting thresholds for both uniform and logarithmic sampling approach and it also shows the cardinality of the set . As explained in Section 2.3.5, each threshold is chosen in order to guarantee at least 1000 samples in the set of intersection between DF tails, for a reliable estimation of the conditional parameter pdfs.

Table 2

Model M1: mode vector obtained at the end of each CRC iteration, specified by z (first column of the table), and interval where the samples of are selected

	Uniform		Logarithmic
z	pmode	[pmode×Lz,pmode×Uz]	pmode	[pmode×Lz,pmode×Uz]
1	a = 0.52	[0.1,10]	a = 0.37	[0.1,10]
	b = 6.03	[0.1,10]	b = 0.96	[0.1,10]
2	a = 1.14	[0.63,6.3]	a = 0.81	[0.2,2.03]
	b = 6.13	[3.37,33.71]	b = 1.26	[0.53,5.28]
3	a = 1.16	[0.9,3.77]	a = 0.84	[0.63,2.63]
	b = 6.27	[4.86,20.38]	b = 1.11	[0.98,4.1]
4	a = 1.85	[1.64,3.93]	a = 0.98	[0.75,1.79]
	b = 5.99	[5.32,12.73]	b = 1.07	[0.95,2.27]
5	a = 1.90	[1.79,2.97]	a = 1.01	[0.95,1.58]
	b = 6.49	[6.12,10.14]	b = 1.07	[1,1.67]
6	a = 2.07	[2.01,2.65]	a = 1.06	[1.03,1.36]
	b = 6.51	[6.33,8.34]	b = 1.06	[1.03,1.36]

Table 3

Model M1: threshold schedule used at each iteration of CRC and the resulting cardinality of

	Uniform			Logarithmic
Iteration(z)	ϵxz	ϵyz	\|Δ\|	ϵxz	ϵyz	\|Δ\|
1	7.7	15	1010	7.3	6	1071
2	7.7	15	1008	5.8	4	1012
3	7.7	15	1038	5	3.4	1122
4	7.4	13.3	1002	4	3	1038
5	6.8	13.3	1022	3.4	2.7	1035
6	6.3	13.3	1170	3.1	2.7	1535

Model M1: mode vector obtained at the end of each CRC iteration, specified by z (first column of the table), and interval where the samples of are selected Model M1: threshold schedule used at each iteration of CRC and the resulting cardinality of According to the workflow depicted in Fig. 1, at the end of each iteration we estimate the conditional densities for both parameters a and b. The evolution of the domains of the two conditional densities and are shown in Fig. 2.

Fig. 2

Model M1: scatter plots of both model parameters

Uniform sampling,

Logarithmic sampling

Model M1: scatter plots of both model parameters Uniform sampling, Logarithmic sampling Thus, the scatter plots are the accepted parameter samples from the first until the final iteration of CRC. Fig. 3 shows the time behaviour of output variables generated with the parameter samples accepted in the last iteration. When using logarithmic LHS, the final mode values are 1.06 for parameter a and 1.06 for parameter b while CRC with uniform LHS estimates a equal to 2.07 and b equal to 6.51.

Fig. 3

Model M1: blue lines are the temporal behaviour of the output variables when the parameters a and b are set equal to the mode values of the last iteration of CRC; red dots are the in silico data; grey regions are the confidence bandwidths when parameters vary from the 2.5th and the 97.5th percentile of their corresponding final conditional densities

Uniform sampling,

Logarithmic sampling

3.2 EpoR model (M2)

We apply CRC to the EpoR model. In this model, an external ligand L activates, via two steps, an enzyme E that catalyses a substrate S. These reactions generate a product P that generally cannot be measured directly. For this reason the dynamical behaviour of P is the purpose of model prediction. Since the concentration over time of P is unknown, it is supposed to have available experimental data for the substrate S and the inactive form of enzyme E. The following equations, (9), (10) and (11), model the kinetics, the initial conditions and the observables of the EpoR model, respectively. The star symbol upon a variable distinguishes its active form from the inactive one The EpoR model has three kinetic parameters that appear in (9), named , and . Moreover, two other parameters of the model are the initial conditions of the inactive enzyme E and of the substrate S, denoted as and , respectively. Finally, the last two model parameters are scale factors of the output variables and called and , respectively. Differently from the previous example (M1 model), in the EpoR model the purpose is to estimate not only the kinetic parameters but also the initial conditions and the scale factors of the output variables. Indeed for this model, the initial conditions of the two state variables are unknown as well as the scale factors of the output variables. Let us denote the overall parameter vector to estimate with , . The nominal parameter vector is and the noisy and noiseless in silico dataset used for model calibration are reported in Table 4.

Table 4

Model M2: noisy and noiseless dataset used for parameter estimation through CRC

Time, min	Noisy dataset		Noiseless dataset
Time, min	y1	y2	y1	y2
0	39.24	10.24	40	10
3.33	—	10.61	—	9.4909
6.66	—	7.23	—	6.9968
10	21.89	5.47	14.7152	3.5448
13.33	—	1.43	—	1.2052
16.66	—	0.53	—	0.2768
20	3.08	1.08	5.4134	0.0446
23.33	—	0.38	—	0.0054
26.66	—	0.11	—	5.0042×10−4
30	0.12	0.08	1.9915	3.8284×10−5
40	1.37	—	0.732	—

Model M2: noisy and noiseless dataset used for parameter estimation through CRC We apply CRC using two different sampling approaches: uniformly and logarithmically spaced samples. As explained in Section 2, the application of CRC implies the setting of different tuning parameters. First, we fix the number of generated samples at each iteration equal to . Then, the boundaries of the sampling interval, and , evolve according to the values reported in Table 5 as the iteration number passes from 1 to 9. These values are obtained by setting , and for the first seven iterations and , for the eighth and ninth iterations. As already explained in Section 2.3.2, the values of and in Table 5 are the percentage variation of the mode vector and they are used to compute the sampling interval at each iteration. Using (6), we calculate the error between noisy and noiseless data points for both the output variables. These errors, 12.78 for and 5.6 for , represent the objective values for the thresholds of CRC in order to assert its success. From Table 5 it is possible to see that after nine iterations of CRC, and are very similar to the target values reported above, only when logarithmically spaced sampling is applied. On the other hand, and are too far from the target ones when uniform sampling is used. This demonstrates that the application of CRC combined with logarithmic sampling estimates a parameter vector that guarantees the desired level of agreement between simulated and experimental data.

Table 5

Model M2: CRC tuning parameters

			Uniform		Logarithmic
Iteration(z)	Lz	Uz	ϵ1z	ϵ2z	ϵ1z	ϵ2z
1	0.01	100	200	300	51.0	31.0
2	0.02	50	400	500	50.0	31.0
3	0.04	25	130	180	46.0	30.9
4	0.08	12.5	45.0	53.0	40.0	30.3
5	0.16	6.25	33.2	28.7	33.5	29.1
6	0.32	3.125	31.5	28.1	25.0	27.5
7	0.64	1.5625	30.1	28.0	15.5	22.0
8	0.82	1.2813	27.7	27.3	13.4	10.0
9	0.91	1.1406	27.7	27.3	13.0	5.75

The first column lists the iteration number z, the second and the third ones the boundaries of the sampling intervals and the rest of the columns report the resulting thresholds for both the output variables when uniformly and logarithmically spaced sampling is applied, respectively.

Model M2: CRC tuning parameters The first column lists the iteration number z, the second and the third ones the boundaries of the sampling intervals and the rest of the columns report the resulting thresholds for both the output variables when uniformly and logarithmically spaced sampling is applied, respectively. Table 6 reports the mean value, the variance and the mode of the conditional densities of each parameter estimated at the end of each iteration of CRC when uniform and logarithmic sampling are applied, respectively. Thus, the mode parameter vectors resulting in output from the last iteration of CRC are and for the logarithmic and uniform sampling, respectively. Figs. 4–6 show the scatter plots of the accepted parameter values at the end of each iteration of CRC, once the threshold values are set.

Table 6

Model M2: mean values, variances and modes for each one of the conditional estimated densities of the seven parameters

		Uniform			Logarithmic
Parameter	Iteration	Mean value	Variance	Mode	Mean value	Variance	Mode
k1	1	49.7290	28.9609	47.2136	11.6207	19.9535	0.1676
	2	1175.8	684.1150	1195.1	1.4223	1.9153	0.1057
	3	1489.7	8648	13,687	0.5564	0.6338	0.0955
	4	861,500	49,062	787,600	0.3016	0.2861	0.0928
	5	252,280	139,190	226,530	0.1828	0.1280	0.0925
	6	391,760	183,120	476,180	0.1254	0.0490	0.0976
	7	524,440	125,540	515,750	0.1058	0.0183	0.1024
	8	540,940	688,040	521,970	0.1096	0.0122	0.1042
	9	535,160	345,400	534,420	0.1092	0.0062	0.1143
k2	1	50.1379	28.8361	50.0661	11.3579	18.8793	0.1031
	2	1250	725.1931	1344.1	0.7662	1.1162	0.0248
	3	16,804	9721	18,070	0.1360	0.1512	0.0128
	4	113,370	64,590	103,060	0.0480	0.0425	0.0095
	5	329,270	181,910	252,480	0.0237	0.0161	0.0086
	6	436,220	204,810	480,790	0.0145	0.0065	0.0106
	7	529,100	127,830	520,070	0.0124	0.0025	0.0143
	8	546,420	688,560	546,070	0.0156	0.0017	0.0171
	9	560,090	36,606	563,730	0.0183	0.0009	0.0191
k3	1	50.2296	28.8879	52.4291	8.7008	17.8986	0.1031
	2	1324.9	758.2055	1423.3	0.8074	1.1006	0.0422
	3	18,017	10,287	19,339	0.2615	0.2676	0.0327
	4	121,480	69,621	105,810	0.1316	0.1108	0.0292
	5	337,600	186,520	303,990	0.0758	0.0497	0.0296
	6	523,450	245,120	537,010	0.0514	0.0223	0.0415
	7	592,050	143,260	598,680	0.0491	0.0098	0.0569
	8	628,680	79,859	642,390	0.0623	0.0068	0.0678
	9	659,140	42,881	678,010	0.0728	0.0033	0.0758
initE	1	22.6627	24.8233	2.7915	16.6944	22.0296	1.3032
	2	321	34.6861	0.0039	15.6962	16.4395	1.9840
	3	22	24	3	15.8885	13.4175	3.8127
	4	10.3727	9.1662	2.6501	18.3333	12.9594	6.6465
	5	7.3006	4.0041	3.9394	19.8313	10.6221	10.9966
	6	6.7829	2.5373	4.6579	21.4556	7.0448	20.2614
	7	6.3909	0.5098	6.5185	25.1709	3.8984	28.8037
	8	6.5604	0.7176	5.8516	30.1755	2.9611	32.2857
	9	6.5416	0.0890	6.6177	33.9041	1.5291	34.9329

Fig. 4

Model M2: scatter plots of the three kinetic model parameters , and . Circles in the graphs represent clouds of accepted parameter values at the end of each iteration of CRC

Application of CRC with logarithmically spaced sampling,

Application of CRC with uniformly spaced sampling

Fig. 6

Model M2: scatter plots of the two scale factors parameters and . Circles in the graphs represent clouds of accepted parameter values at the end of each iteration of CRC

Application of CRC with logarithmically spaced sampling,

Application of CRC with uniformly spaced sampling

Model M2: mean values, variances and modes for each one of the conditional estimated densities of the seven parameters Model M2: scatter plots of the three kinetic model parameters , and . Circles in the graphs represent clouds of accepted parameter values at the end of each iteration of CRC Application of CRC with logarithmically spaced sampling, Application of CRC with uniformly spaced sampling Model M2: scatter plots of the two initial condition parameters and . Circles in the graphs represent clouds of accepted parameter values at the end of each iteration of CRC Application of CRC with logarithmically spaced sampling, Application of CRC with uniformly spaced sampling Model M2: scatter plots of the two scale factors parameters and . Circles in the graphs represent clouds of accepted parameter values at the end of each iteration of CRC Application of CRC with logarithmically spaced sampling, Application of CRC with uniformly spaced sampling This is an alternative and effective way to visualise the distributions of parameters, whose main features are summarised in Table 6. Fig. 7 shows the time behaviour of the two output variables when both sampling strategies are applied and the parameter vector of the model is set equal to the corresponding mode vector reported above. Moreover, we also plot the confidence bands (grey regions) of the observables and when parameter values vary between the 2.5th and 97.5th percentile of their corresponding conditional pdfs. In order to verify if this calibration process is robust against independent realisations, we perform ten realisations of the entire procedure explained above.

Fig. 7

Time behaviour of when CRC is applied with logarithmically spaced sampling,

Time behaviour of when CRC is applied with uniformly spaced sampling,

Time behaviour of when CRC is applied with logarithmically spaced sampling,

Time behaviour of when CRC is applied with uniformly spaced sampling

Model M2: blue lines are the time behaviour of output variables and when the parameters are set equal to the mode in output from the last iteration of CRC; red dots are the noisy in silico data used to calibrate the model; grey regions are the temporal behaviour bandwidths when the model parameters vary between the 2.5th and 97.5th percentile of their corresponding conditional pdfs Time behaviour of when CRC is applied with logarithmically spaced sampling, Time behaviour of when CRC is applied with uniformly spaced sampling, Time behaviour of when CRC is applied with logarithmically spaced sampling, Time behaviour of when CRC is applied with uniformly spaced sampling

3.3 Multiple myeloma model (M3)

In this section, we describe the calibration of the ODE model presented in [19], which investigates the mechanisms of the p38 MAPK isoforms in Multiple Myeloma (MM). The model has 40 ODEs and 53 kinetic parameters while the experimental data used for parameter estimation are reverse phase protein array (RPPA) data [21]. RPPA was used to analyse MM cell lines, measuring in total 153 proteins in 80 samples at six different time points (0, 5, 10, 30, 60, 90 min). The 16 proteins both present in the model and in the dataset represent model observables. Since RPPA data are normalised with respect to the initial concentration value, initial conditions of proteins in the ODEs are all set to 1. For this reason, parameters to estimate are only the kinetic ones; i.e. . Equations of the model and the corresponding RPPA dataset are reported in Supplementary Materials. Tuning parameters of CRC are set in the same way in both cases of uniform and logarithmic LHS. Sixteen normalised DFs are computed according to (7) and six iterations of CRC are run. We define and and then we shrink the sampling interval as in model M1. We initialise the other parameters as follows: and . We perform ten independent realisations of the procedure in order to test the stability and reliability of results. Table 7 summarises the thresholds used in both sampling techniques at the different iterations. Fig. 8 compares the evolution of the conditional density of parameter , chosen as an example, in the two cases.

Table 7

Model M3: threshold schedule when both uniform and logarithmic LHS sampling are applied

	Iteration
	1		2		3		4		5		6
	Uniform	Logarithmic	Uniform	Logarithmic	Uniform	Logarithmic	Uniform	Logarithmic	Uniform	Logarithmic	Uniform	Logarithmic
x4	0.45	0.6	0.3	0.3	0.3	0.2	0.2	0.1	0.14	0.05	0.08	0.05
x6	0.5	0.7	0.4	0.4	0.4	0.3	0.4	0.2	0.2	0.1	0.15	0.08
x10	0.5	0.7	0.4	0.4	0.4	0.3	0.4	0.2	0.2	0.1	0.18	0.06
x14	0.4	0.7	0.4	0.4	0.35	0.3	0.15	0.15	0.12	0.13	0.1	0.09
x16	0.5	0.8	0.4	0.5	0.4	0.3	0.4	0.2	0.2	0.1	0.2	0.06
x18	0.25	0.3	0.2	0.2	0.2	0.15	0.2	0.12	0.1	0.095	0.1	0.088
x20	0.35	0.5	0.25	0.3	0.2	0.2	0.2	0.13	0.1	0.1	0.1	0.088
x22	0.45	0.7	0.3	0.4	0.25	0.2	0.15	0.15	0.1	0.1	0.08	0.08
x24	0.4	0.7	0.4	0.4	0.35	0.3	0.3	0.2	0.15	0.1	0.15	0.08
x26	0.35	0.6	0.35	0.4	0.35	0.2	0.25	0.1	0.17	0.05	0.08	0.04
x28	0.35	0.5	0.25	0.3	0.2	0.2	0.15	0.15	0.12	0.1	0.07	0.08
x32	0.45	0.6	0.3	0.3	0.2	0.2	0.25	0.15	0.2	0.12	0.1	0.12
x33	0.8	1	0.7	0.7	0.45	0.5	0.45	0.3	0.4	0.2	0.22	0.15
x34	0.9	0.8	0.7	0.6	0.55	0.5	0.55	0.35	0.2	0.2	0.18	0.15
x38	0.4	0.6	0.3	0.4	0.2	0.25	0.2	0.2	0.14	0.18	0.14	0.16
x40	0.5	0.7	0.35	0.4	0.3	0.3	0.35	0.2	0.2	0.15	0.1	0.12

Fig. 8

Model M3: evolution of the conditional pdfs of parameter in all iterations, when using, respectively, logarithmic and uniform LHS

Logarithmic sampling,

Uniform sampling

Model M3: threshold schedule when both uniform and logarithmic LHS sampling are applied Model M3: evolution of the conditional pdfs of parameter in all iterations, when using, respectively, logarithmic and uniform LHS Logarithmic sampling, Uniform sampling In Supplementary Materials means, modes and variances of all parameters during the iterations are reported. Fig. 9 shows the time behaviour of output variables generated with the parameter samples accepted in the last iteration, respectively, with logarithmic and uniform sampling.

Fig. 9

Model M3: results of CRC with logarithmic and uniform LHS; red dots are the RPPA data; blue lines are the time behaviour of output variables when parameter values are equal to mode values of the last iteration (see Supplementary Materials); grey area covers the possible temporal behaviours of observables when parameters vary between the 2.5th and 97.5th percentile of their corresponding conditional pdfs. MSE is the mean squared error between the simulation of the model and the data

Logarithmic sampling,

Uniform sampling

4 Discussion and conclusion

In [16], the authors presented CRC, a novel Bayesian procedure for parameter estimation of ODE models. In Section 2, the main steps of CRC are briefly summarised with specific attention to the introduced innovations. One of the key points of CRC is the sampling of the parameter space which is completely overturned compared to other Bayesian algorithms. Thus, in CRC the number of samples is an input parameter which remains unchanged in all the iterations. Other substantial innovations are the definition of as many DFs as the number of observables of the model and the corresponding intersection of all the DFs tails. The purpose of this study is to apply CRC with two different sampling approaches in order to test which performs better and to understand the influence of the sampling process in this algorithm. We apply CRC using LHS with uniformly and logarithmically spaced samples. Even if the practice of applying different sampling strategies is common in the field of the Bayesian algorithms, the comparison we provide in this paper has never been performed before since CRC is a novel algorithm. We choose three ODE models with a parameter space and the corresponding space of state variables of increasing dimension. They are the Lotka‐Volterra model, the EpoR model and the MM model. Tables 3, 5 and 7 prove that the logarithmic sampling is able to reach lower thresholds than the uniform ones, almost for all the output variables in all the tested models. Indeed, for the Lotka‐Volterra model, the thresholds obtained at the sixth iteration with logarithmic sampling are as close as possible to the target ones, meaning that through logarithmic sampling it is possible to calibrate the ODE model also preventing overfitting. Logarithmic sampling generates more dense samples in the interval (0; 1]. Similarly, in the EpoR model, after nine iterations of CRC with logarithmic sampling, the resulting thresholds get very close to the target ones. As regards the MM model, since the nominal parameter vector is not known from [19], we cannot calculate the target values of the error. However, after six iterations of CRC with logarithmic sampling the maximum error between simulated and experimental data is only of the 16% (see Table 7). On the other hand, the application of CRC combined with uniform sampling leads to threshold values that are on average higher than those obtained through the logarithmic ones. In the Lotka‐Volterra model, thresholds after six iterations are too distant from 3.09 and 2.6. Moreover, scatter plots in Fig. 2 confirm results of Table 3: it is clear how with the uniform sampling the accepted samples at each iteration evolve towards a region of the parameter space that does not correspond to the nominal values of kinetic parameters a and b. Conversely, the scatter plots of logarithmic sampling in Fig. 2 show how the red dots lie in a region that is exactly in the neighbourhood of the nominal point (1;1). Additionally, as shown in Fig. 3 only the logarithmic ones adequately fit the noisy data, following the typical oscillating behaviour of the prey‐predator model. On the other hand, at many time points, the behaviour depicted in Fig. 3 and b is too far from in silico data. Similar observations are valid also for the EpoR model. Fig. 7 clearly shows that, since the threshold values obtained with uniform sampling are too far from the target ones, the time behaviour of output variables does not fit the experimental data. On the contrary, logarithmic sampling is able to adequately reproduce the correct time behaviour of and , even if the resulting estimated parameter vector is an alternative parametrisation of the model. As regards the MM model instead, both sampling strategies produce acceptable time behaviour of output variables, as is shown in Fig. 9, since the maximum percentage error with the uniform sampling is of the 22%. However, the confidence bands in Fig. 9 are rather higher than those in Fig. 9. This is due to the fact that, for most of the kinetic parameters, the corresponding conditional pdf, estimated in the last iteration of CRC, is tighter when logarithmic sampling is performed, as shown in Fig. 8. Moreover, in order to make a comparison of the computational cost of the two sampling techniques, we measure the time to perform each iteration of CRC, as reported in Table 8. In all examples, the logarithmic sampling has a minor computational burden compared to the uniform sampling, whose simulation time also increases with the model dimension. In conclusion, it is possible to state that logarithmic sampling performs better and has a reduced computational cost when calibrating a model using CRC. It is the best sampling technique when it is necessary to perturb the parameter space over different orders of magnitude. This is due to the intrinsic property of the logarithmic distribution that generates the same number of samples in different orders of magnitude [22]. The logarithmic sampling is able to generate more dense samples around the mode vector returned by the previous iteration of CRC, which is the most likely point in the parameter space at a given iteration.

Table 8

Time to perform each iteration of CRC in the three models

	M1		M2		M3
Iteration(z)	Uniform	Logarithmic	Uniform	Logarithmic	Uniform	Logarithmic
1	431 s	288 s	798 s	449 s	360 min	110 min
2	323 s	213 s	835 s	305 s	720 min	80 min
3	300 s	251 s	856 s	245 s	1 d	70 min
4	268 s	225 s	750 s	223 s	2 d	60 min
5	260 s	216 s	674 s	210 s	3 d	60 min
6	245 s	202 s	693 s	195 s	3 d	60 min
7	—	—	685 s	217 s	—	—
8	—	—	650 s	310 s	—	—
9	—	—	645 s	360 s	—	—

Time is expressed in seconds (s), minutes (min) or days (d).

Time to perform each iteration of CRC in the three models Time is expressed in seconds (s), minutes (min) or days (d).

11 in total

1. A Parameter Estimation Method for Biological Systems modelled by ODE/DDE Models Using Spline Approximation and Differential Evolution Algorithm.

Authors: Choujun Zhan; Wuchao Situ; Lam Fat Yeung; Peter Wai-Ming Tsang; Genke Yang
Journal: IEEE/ACM Trans Comput Biol Bioinform Date: 2014 Nov-Dec Impact factor: 3.710

2. A hybrid approach for efficient and robust parameter estimation in biochemical pathways.

Authors: Maria Rodriguez-Fernandez; Pedro Mendes; Julio R Banga
Journal: Biosystems Date: 2005-10-19 Impact factor: 1.973

3. Addressing parameter identifiability by model-based experimentation.

Authors: A Raue; C Kreutz; T Maiwald; U Klingmuller; J Timmer
Journal: IET Syst Biol Date: 2011-03 Impact factor: 1.615

Review 4. Parameter uncertainty in biochemical models described by ordinary differential equations.

Authors: J Vanlier; C A Tiemann; P A J Hilbers; N A W van Riel
Journal: Math Biosci Date: 2013-03-25 Impact factor: 2.144

5. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems.

Authors: Tina Toni; David Welch; Natalja Strelkowa; Andreas Ipsen; Michael P H Stumpf
Journal: J R Soc Interface Date: 2009-02-06 Impact factor: 4.118

6. A framework for parameter estimation and model selection from experimental data in systems biology using approximate Bayesian computation.

Authors: Juliane Liepe; Paul Kirk; Sarah Filippi; Tina Toni; Chris P Barnes; Michael P H Stumpf
Journal: Nat Protoc Date: 2014-01-23 Impact factor: 13.491

7. Conditional robustness analysis for fragility discovery and target identification in biochemical networks and in cancer systems biology.

Authors: Fortunato Bianconi; Elisa Baldelli; Vienna Ludovini; Vienna Luovini; Emanuel F Petricoin; Lucio Crinò; Paolo Valigi
Journal: BMC Syst Biol Date: 2015-10-19

3. Application of Response Surface Methodology for Optimizing the Therapeutic Activity of ZnO Nanoparticles Biosynthesized from Aspergillus niger.

Authors: Ali Es-Haghi; Mohammad Ehsan Taghavizadeh Yazdi; Mohammad Sharifalhoseini; Mohsen Baghani; Ehsan Yousefi; Abbas Rahdar; Francesco Baino
Journal: Biomimetics (Basel) Date: 2021-05-27

3 in total