Fortunato Bianconi1, Chiara Antonini2, Lorenzo Tomassoni2, Paolo Valigi2. 1. Independent Researcher, Belvedere 44, 06036 Montefalco, Perugia, Italy. fortunato.bianconi@gmail.com. 2. Department of Engineering, University of Perugia, G. Duranti 95, 06132 Perugia, Italy.
Abstract
Mathematical modelling is a widely used technique for describing the temporal behaviour of biological systems. One of the most challenging topics in computational systems biology is the calibration of non-linear models; i.e. the estimation of their unknown parameters. The state-of-the-art methods in this field are the frequentist and Bayesian approaches. For both of them, the performance and accuracy of results greatly depend on the sampling technique employed. Here, the authors test a novel Bayesian procedure for parameter estimation, called conditional robust calibration (CRC), comparing two different sampling techniques: uniform and logarithmic Latin hypercube sampling. CRC is an iterative algorithm based on parameter space sampling and on the estimation of parameter density functions. They apply CRC with both sampling strategies to the three ordinary differential equations (ODEs) models of increasing complexity. They obtain a more precise and reliable solution through logarithmically spaced samples.
Mathematical modelling is a widely used technique for describing the temporal behaviour of biological systems. One of the most challenging topics in computational systems biology is the calibration of non-linear models; i.e. the estimation of their unknown parameters. The state-of-the-art methods in this field are the frequentist and Bayesian approaches. For both of them, the performance and accuracy of results greatly depend on the sampling technique employed. Here, the authors test a novel Bayesian procedure for parameter estimation, called conditional robust calibration (CRC), comparing two different sampling techniques: uniform and logarithmic Latin hypercube sampling. CRC is an iterative algorithm based on parameter space sampling and on the estimation of parameter density functions. They apply CRC with both sampling strategies to the three ordinary differential equations (ODEs) models of increasing complexity. They obtain a more precise and reliable solution through logarithmically spaced samples.
In systems biology, a key issue is to understand the dynamic interactions occurring within and between cells, which determine their structure and basic functions [1, 2]. Since most aspects of a biological system are not directly accessible to experimental observation, mathematical modelling is widely employed to formalise the dynamic and temporal evolution of system variables and to make predictions about biological mechanisms [3, 4].In this study, we consider biological systems modelled with deterministic ordinary differential equations (ODEs) [5]. These models have a set of unknown kinetic parameters which cannot be measured experimentally. Thus, one of the challenging tasks of model building is model calibration; i.e. the inverse problem of estimating those parameters from available experimental data [6].The state‐of‐the‐art methods for parameter estimation are classified in frequentist and Bayesian methodologies. Both approaches are based on the concept of the probability density of observing the data given certain parameter values [7]. However, the frequentist methods aim at maximising the likelihood function through an optimisation algorithm while the Bayesian compute the posterior distribution using sampling based techniques [8, 9]. In both classes of algorithms, sampling is a fundamental element for dealing with parameter uncertainty. In order to avoid local optima, the frequentist approach employs bootstrapping from the available experimental replicates and Latin hypercube sampling (LHS) in order to start the estimation from different and independent parameter samples [10]. On the other hand, Bayesian methods sample directly from a prior distribution and accept only those samples that are in a region of high probability. Thus, in this class of methods, the sampler is of primary importance to guarantee the convergence to the desired posterior distribution. Samples should be dense and properly distributed in order to ensure a wide and appropriate exploration of the parameter space [11]. Currently, different sampling techniques are available, including multivariate uniform or logarithmic distribution, LHS and its variations, Markov Chain Monte Carlo and sequential Monte Carlo methods [12-15].In this paper, we aim to understand the performances of two sampling strategies, namely uniform and logarithmic LHS, in a new Bayesian procedure called conditional robust calibration (CRC), presented in [16, 17]. CRC is an iterative algorithm that samples the parameter space and returns in output the probability density functions (pdfs) of parameters. Each pdf is estimated on the region of the parameter space that best reproduces the experimental dataset. CRC is applied to calibrate three ODE models with different characteristics; i.e. the well‐known Lotka Volterra model [12], the erythroprotein receptor (EpoR) model [18] and the multiple myeloma (MM) model [19]. The comparison of CRC results between logarithmic and uniform sampling show that, while the logarithmic LHS successfully estimated unknown parameter values, the uniform LHS returned less reliable and precise results in all studied models.
2 Methods
2.1 Mathematical model
Mathematically, ODE models are represented as follows:
where is the vector of state variables with initial conditions , is the vector of control variables and is the vector of output variables; i.e. the concentrations of measured states or a function of them. Note that , where is a subset of the positive orthant, is the vector of dynamical model parameters. Vector contains scaling and offset parameters. The function specifies the model while is the observation function for mapping the state variables to the observed quantities. By setting , , all model parameters are known. Each parameter , is usually characterised by an interval of validity, due to its physical or biological meaning. Let denote the nominal value of . Then, the lower and upper boundaries of parameter can be written, respectively, as and , where and are multiplicative coefficients. Thus, the parameter space results from the Cartesian product .
2.2 Experimental dataset
Since mathematical models represent an in silico explanation of experimental data, the observables predicted by a model have to be in agreement with the experimental dataset. Let denote with the matrix of available data. Supposing that experimental data are collected at different time points , each is a time series vector of the form . The jth observable measured at the kth time point can be divided as , where is the nominal unknown value and is the corresponding measurement error, usually assumed to be normally distributed.
2.3 Calibration method
To compare the performances of uniform and logarithmic sampling when using CRC, we estimate the parameters of three different ODE models. Here, we briefly describe the main steps of CRC, whose details are fully explained in [16]. Fig. 1 shows the CRC workflow.
Fig. 1
Main steps of CRC
Main steps of CRC
2.3.1 Initialisation of input parameters
First of all, it is necessary to initialise the input parameters of the procedure. They are and which are, respectively, the upper and lower boundaries of the parameter vector range and which is the number of parameter samples to generate at each iteration.
2.3.2 Parameter space sampling
The parameter vector is considered a random variable, denoted as
. The purpose of CRC is to identify a sufficient number of parameter realisations that approximate the posterior distribution . Therefore, the parameter space is sampled in order to generate realisations of the parameter vector. Thus, the choice of depends on the cardinality of . The higher the number of parameters to estimate, the higher should be set in order to guarantee a wide exploration and coverage of the parameter space.LHS is chosen as the sampling technique in order to spread the sample points more evenly across all possible values. To generate Latin hypercube samples in a q‐dimensional space, a unit hypercube is first divided into intervals with an equal length of along each axis. This creates equally probable intervals, , for each dimension. Thus, LHS can be represented as a sample matrix, which contains one sample in each row and in each column. Each column corresponds to a variable and it is a random permutation of the intervals while each row is a sample point [11]. Afterwards, the LHS matrix needs to be centred around the nominal parameter values and between the defined lower and upper boundaries.In this study, we assume that, in one case, parameters are uniformly distributed among their intervals while, in another case, they are logarithmically distributed. Let denote with the matrix that contains the samples generated. Under the hypothesis of uniform distribution, the matrix of the generated parameter samples is computed as
where is the nominal parameter vector, and are the lower and upper boundaries of the sampling interval at the zth iteration. On the contrary, when using logarithmic sampling, is calculated as
where and are the lower and upper boundaries transformed to the logarithmic space. At each iteration, the sampling interval is shrinked according to the following formula:
where , , , are input parameters chosen by the user.Since and are multiplied with the nominal parameter vector , they determine the percentage variation of at each iteration. This means that the interval where the parameter samples are selected for each iteration can be defined as follows: . Thus, given that each iteration gets closer to a solution of the parameter values, the tuning parameters , , , should be chosen in order to shrink the percentage variation of with respect to the previous iteration.
2.3.3 Model simulation and computation of distance functions (DFs)
For each sample , the ODE model is integrated in order to compute the in silico dataset
. Then
is compared with by computing a DF for each output variable. The DF is defined in two ways
As
is a random variable, each vector can be seen as a transformation from the random variable
and, consequently, each is a transformation from the random variable . Let denote with a realisation of the random variable and with the pdf of . Each is estimated using a kernel density approach [20]. The pdfs of the DFs are good indicators to infer the ability of the sampling technique to find parameter values that correctly calibrate the mathematical model under investigation. The higher is the area under each in a region close to zero and the higher is the percentage of parameter vectors identified by the LHS that are able to make the observables of the mathematical model close to the experimental data.
2.3.4 Thresholds calculation
In this step, a set of thresholds is defined. They correspond to the minimum accepted level of agreement between simulated and experimental data, for each output variable. Thus, CRC selects only those DF values that belong to the low tail of the corresponding pdf. The rationale that the user has to follow to fix the threshold values is explained in Section 2.3.5. Then, all DF tails are intersected among each other obtaining the following set: . In the parameter space, the accepted DF values correspond to a set of parameter samples having as joint conditional distribution .
2.3.5 Parameter conditional densities estimation
For each parameter, the corresponding conditional density is estimated using a kernel density approach. Concretely, the input data for the estimation of , are the subset of the parameter vectors generated through the LHS for which , . As a consequence, the lower are the values of the thresholds and the fewer are the available parameter vectors to estimate the conditional densities of each parameter. In order to have a reliable and stable estimation of these pdfs, at least 1000 parameter samples are required [20]. For this reason, each threshold is chosen in order to guarantee at least 1000 samples in under ; i.e. . Finally, for each parameter the corresponding mode value is selected. Multiple sets of thresholds can meet the requirements, but the purpose is to find the lowest possible values because it means that the distance between simulated and experimental data is minimum. If it is necessary to perform another iteration of CRC, the nominal value of every parameter is set equal to the mode value of ; i.e. , and the sampling interval is updated by changing and according to (5). It is important to underline that at the next iteration the interval where is selected can be wider or tighter than the previous one because the lower and upper boundaries are multiplied by the mode value of each parameter.The procedure stops when the desired level of agreement between simulated and experimental data is met. For instance, in case of in silico data whose source of error is known, CRC is performed until each threshold is slightly over the difference between nominal and noisy data, in order to avoid noise fitting.
3 Results
In this section, we compare the performances of CRC with uniform and logarithmic sampling approach when calibrating three different ODE models. All computational analyses were run using Matlab (R2014), on a Intel Core i7‐4700HQ CPU, 2.4 GHz, 16 GB Memory, Ubuntu 16.04 LTS (64bit).
3.1 Lotka‐Volterra model (M1)
We apply CRC to the Lotka‐Volterra ODE model. The model describes the interaction between the prey species x, and predator species y, through parameters a and b
Observables of the model are both x and y while the parameter space is . Both nominal values of parameters are set to 1. In silico data are generated as in [12]; i.e. sampling eight values at the specified time points and adding Gaussian noise (see Table 1).
Table 1
Model M1: in silico noisy dataset used for calibration
Time points, min
x
y
1.1
1.8823
0.5214
2.4
1.5510
1.2471
3.9
0.4007
0.9858
5.6
0.0454
0.2924
7.5
1.8881
0.6133
9.6
0.1839
1.3163
11.9
−0.1940
0.8171
14.4
1.6015
1.7212
Model M1: in silico noisy dataset used for calibrationWe calculate the distance between the nominal and noisy dataset according to (6) and we obtain the following values: 3.09 for species x and 2.6 for species y. Accordingly, they represent the minimum accepted level of agreement between simulated and experimental data. Since the purpose of this paper is the comprehension of the role of the sampling approach in the performances and efficiency of CRC, the tuning parameters are set in the same way when both the techniques are applied. Therefore, we perform six iterations of the entire CRC procedure setting , and shrinking and according to (5), setting and . In this way, we progressively shrink the percentage variation of the parameter vector, approaching 0 at the limit. Table 2 shows the mode vector of the parameters obtained at the end of each iteration of CRC and also the interval where the parameter samples are selected. Each interval depends on the mode vector in output from the previous iteration and on and . We generate parameter samples for each iteration. Table 3 summarises the resulting thresholds for both uniform and logarithmic sampling approach and it also shows the cardinality of the set . As explained in Section 2.3.5, each threshold is chosen in order to guarantee at least 1000 samples in the set of intersection between DF tails, for a reliable estimation of the conditional parameter pdfs.
Table 2
Model M1: mode vector obtained at the end of each CRC iteration, specified by z (first column of the table), and interval where the samples of are selected
Uniform
Logarithmic
z
pmode
[pmode×Lz,pmode×Uz]
pmode
[pmode×Lz,pmode×Uz]
1
a = 0.52
[0.1,10]
a = 0.37
[0.1,10]
b = 6.03
[0.1,10]
b = 0.96
[0.1,10]
2
a = 1.14
[0.63,6.3]
a = 0.81
[0.2,2.03]
b = 6.13
[3.37,33.71]
b = 1.26
[0.53,5.28]
3
a = 1.16
[0.9,3.77]
a = 0.84
[0.63,2.63]
b = 6.27
[4.86,20.38]
b = 1.11
[0.98,4.1]
4
a = 1.85
[1.64,3.93]
a = 0.98
[0.75,1.79]
b = 5.99
[5.32,12.73]
b = 1.07
[0.95,2.27]
5
a = 1.90
[1.79,2.97]
a = 1.01
[0.95,1.58]
b = 6.49
[6.12,10.14]
b = 1.07
[1,1.67]
6
a = 2.07
[2.01,2.65]
a = 1.06
[1.03,1.36]
b = 6.51
[6.33,8.34]
b = 1.06
[1.03,1.36]
Table 3
Model M1: threshold schedule used at each iteration of CRC and the resulting cardinality of
Uniform
Logarithmic
Iteration(z)
ϵxz
ϵyz
|Δ|
ϵxz
ϵyz
|Δ|
1
7.7
15
1010
7.3
6
1071
2
7.7
15
1008
5.8
4
1012
3
7.7
15
1038
5
3.4
1122
4
7.4
13.3
1002
4
3
1038
5
6.8
13.3
1022
3.4
2.7
1035
6
6.3
13.3
1170
3.1
2.7
1535
Model M1: mode vector obtained at the end of each CRC iteration, specified by z (first column of the table), and interval where the samples of are selectedModel M1: threshold schedule used at each iteration of CRC and the resulting cardinality ofAccording to the workflow depicted in Fig. 1, at the end of each iteration we estimate the conditional densities for both parameters a and b. The evolution of the domains of the two conditional densities and are shown in Fig. 2.
Fig. 2
Model M1: scatter plots of both model parameters
Uniform sampling,
Logarithmic sampling
Model M1: scatter plots of both model parametersUniform sampling,Logarithmic samplingThus, the scatter plots are the accepted parameter samples from the first until the final iteration of CRC. Fig. 3 shows the time behaviour of output variables generated with the parameter samples accepted in the last iteration. When using logarithmic LHS, the final mode values are 1.06 for parameter a and 1.06 for parameter b while CRC with uniform LHS estimates a equal to 2.07 and b equal to 6.51.
Fig. 3
Model M1: blue lines are the temporal behaviour of the output variables when the parameters a and b are set equal to the mode values of the last iteration of CRC; red dots are the in silico data; grey regions are the confidence bandwidths when parameters vary from the 2.5th and the 97.5th percentile of their corresponding final conditional densities
Uniform sampling,
Logarithmic sampling
Model M1: blue lines are the temporal behaviour of the output variables when the parameters a and b are set equal to the mode values of the last iteration of CRC; red dots are the in silico data; grey regions are the confidence bandwidths when parameters vary from the 2.5th and the 97.5th percentile of their corresponding final conditional densitiesUniform sampling,Logarithmic sampling
3.2 EpoR model (M2)
We apply CRC to the EpoR model. In this model, an external ligand L activates, via two steps, an enzyme E that catalyses a substrate S. These reactions generate a product P that generally cannot be measured directly. For this reason the dynamical behaviour of P is the purpose of model prediction. Since the concentration over time of P is unknown, it is supposed to have available experimental data for the substrate S and the inactive form of enzyme E. The following equations, (9), (10) and (11), model the kinetics, the initial conditions and the observables of the EpoR model, respectively. The star symbol upon a variable distinguishes its active form from the inactive one
The EpoR model has three kinetic parameters that appear in (9), named , and . Moreover, two other parameters of the model are the initial conditions of the inactive enzyme E and of the substrate S, denoted as and , respectively. Finally, the last two model parameters are scale factors of the output variables and called and , respectively. Differently from the previous example (M1 model), in the EpoR model the purpose is to estimate not only the kinetic parameters but also the initial conditions and the scale factors of the output variables. Indeed for this model, the initial conditions of the two state variables are unknown as well as the scale factors of the output variables. Let us denote the overall parameter vector to estimate with , . The nominal parameter vector is and the noisy and noiseless in silico dataset used for model calibration are reported in Table 4.
Table 4
Model M2: noisy and noiseless dataset used for parameter estimation through CRC
Time, min
Noisy dataset
Noiseless dataset
y1
y2
y1
y2
0
39.24
10.24
40
10
3.33
—
10.61
—
9.4909
6.66
—
7.23
—
6.9968
10
21.89
5.47
14.7152
3.5448
13.33
—
1.43
—
1.2052
16.66
—
0.53
—
0.2768
20
3.08
1.08
5.4134
0.0446
23.33
—
0.38
—
0.0054
26.66
—
0.11
—
5.0042×10−4
30
0.12
0.08
1.9915
3.8284×10−5
40
1.37
—
0.732
—
Model M2: noisy and noiseless dataset used for parameter estimation through CRCWe apply CRC using two different sampling approaches: uniformly and logarithmically spaced samples. As explained in Section 2, the application of CRC implies the setting of different tuning parameters. First, we fix the number of generated samples at each iteration equal to . Then, the boundaries of the sampling interval, and , evolve according to the values reported in Table 5 as the iteration number passes from 1 to 9. These values are obtained by setting , and for the first seven iterations and , for the eighth and ninth iterations. As already explained in Section 2.3.2, the values of and in Table 5 are the percentage variation of the mode vector and they are used to compute the sampling interval at each iteration. Using (6), we calculate the error between noisy and noiseless data points for both the output variables. These errors, 12.78 for and 5.6 for , represent the objective values for the thresholds of CRC in order to assert its success. From Table 5 it is possible to see that after nine iterations of CRC, and are very similar to the target values reported above, only when logarithmically spaced sampling is applied. On the other hand, and are too far from the target ones when uniform sampling is used. This demonstrates that the application of CRC combined with logarithmic sampling estimates a parameter vector that guarantees the desired level of agreement between simulated and experimental data.
Table 5
Model M2: CRC tuning parameters
Uniform
Logarithmic
Iteration(z)
Lz
Uz
ϵ1z
ϵ2z
ϵ1z
ϵ2z
1
0.01
100
200
300
51.0
31.0
2
0.02
50
400
500
50.0
31.0
3
0.04
25
130
180
46.0
30.9
4
0.08
12.5
45.0
53.0
40.0
30.3
5
0.16
6.25
33.2
28.7
33.5
29.1
6
0.32
3.125
31.5
28.1
25.0
27.5
7
0.64
1.5625
30.1
28.0
15.5
22.0
8
0.82
1.2813
27.7
27.3
13.4
10.0
9
0.91
1.1406
27.7
27.3
13.0
5.75
The first column lists the iteration number z, the second and the third ones the boundaries of the sampling intervals and the rest of the columns report the resulting thresholds for both the output variables when uniformly and logarithmically spaced sampling is applied, respectively.
Model M2: CRC tuning parametersThe first column lists the iteration number z, the second and the third ones the boundaries of the sampling intervals and the rest of the columns report the resulting thresholds for both the output variables when uniformly and logarithmically spaced sampling is applied, respectively.Table 6 reports the mean value, the variance and the mode of the conditional densities of each parameter estimated at the end of each iteration of CRC when uniform and logarithmic sampling are applied, respectively. Thus, the mode parameter vectors resulting in output from the last iteration of CRC are and for the logarithmic and uniform sampling, respectively. Figs. 4–6 show the scatter plots of the accepted parameter values at the end of each iteration of CRC, once the threshold values are set.
Table 6
Model M2: mean values, variances and modes for each one of the conditional estimated densities of the seven parameters
Uniform
Logarithmic
Parameter
Iteration
Mean value
Variance
Mode
Mean value
Variance
Mode
k1
1
49.7290
28.9609
47.2136
11.6207
19.9535
0.1676
2
1175.8
684.1150
1195.1
1.4223
1.9153
0.1057
3
1489.7
8648
13,687
0.5564
0.6338
0.0955
4
861,500
49,062
787,600
0.3016
0.2861
0.0928
5
252,280
139,190
226,530
0.1828
0.1280
0.0925
6
391,760
183,120
476,180
0.1254
0.0490
0.0976
7
524,440
125,540
515,750
0.1058
0.0183
0.1024
8
540,940
688,040
521,970
0.1096
0.0122
0.1042
9
535,160
345,400
534,420
0.1092
0.0062
0.1143
k2
1
50.1379
28.8361
50.0661
11.3579
18.8793
0.1031
2
1250
725.1931
1344.1
0.7662
1.1162
0.0248
3
16,804
9721
18,070
0.1360
0.1512
0.0128
4
113,370
64,590
103,060
0.0480
0.0425
0.0095
5
329,270
181,910
252,480
0.0237
0.0161
0.0086
6
436,220
204,810
480,790
0.0145
0.0065
0.0106
7
529,100
127,830
520,070
0.0124
0.0025
0.0143
8
546,420
688,560
546,070
0.0156
0.0017
0.0171
9
560,090
36,606
563,730
0.0183
0.0009
0.0191
k3
1
50.2296
28.8879
52.4291
8.7008
17.8986
0.1031
2
1324.9
758.2055
1423.3
0.8074
1.1006
0.0422
3
18,017
10,287
19,339
0.2615
0.2676
0.0327
4
121,480
69,621
105,810
0.1316
0.1108
0.0292
5
337,600
186,520
303,990
0.0758
0.0497
0.0296
6
523,450
245,120
537,010
0.0514
0.0223
0.0415
7
592,050
143,260
598,680
0.0491
0.0098
0.0569
8
628,680
79,859
642,390
0.0623
0.0068
0.0678
9
659,140
42,881
678,010
0.0728
0.0033
0.0758
initE
1
22.6627
24.8233
2.7915
16.6944
22.0296
1.3032
2
321
34.6861
0.0039
15.6962
16.4395
1.9840
3
22
24
3
15.8885
13.4175
3.8127
4
10.3727
9.1662
2.6501
18.3333
12.9594
6.6465
5
7.3006
4.0041
3.9394
19.8313
10.6221
10.9966
6
6.7829
2.5373
4.6579
21.4556
7.0448
20.2614
7
6.3909
0.5098
6.5185
25.1709
3.8984
28.8037
8
6.5604
0.7176
5.8516
30.1755
2.9611
32.2857
9
6.5416
0.0890
6.6177
33.9041
1.5291
34.9329
Fig. 4
Model M2: scatter plots of the three kinetic model parameters
,
and
. Circles in the graphs represent clouds of accepted parameter values at the end of each iteration of CRC
Application of CRC with logarithmically spaced sampling,
Application of CRC with uniformly spaced sampling
Fig. 6
Model M2: scatter plots of the two scale factors parameters
and
. Circles in the graphs represent clouds of accepted parameter values at the end of each iteration of CRC
Application of CRC with logarithmically spaced sampling,
Application of CRC with uniformly spaced sampling
Model M2: mean values, variances and modes for each one of the conditional estimated densities of the seven parametersModel M2: scatter plots of the three kinetic model parameters
,
and
. Circles in the graphs represent clouds of accepted parameter values at the end of each iteration of CRCApplication of CRC with logarithmically spaced sampling,Application of CRC with uniformly spaced samplingModel M2: scatter plots of the two initial condition parameters
and
. Circles in the graphs represent clouds of accepted parameter values at the end of each iteration of CRCApplication of CRC with logarithmically spaced sampling,Application of CRC with uniformly spaced samplingModel M2: scatter plots of the two scale factors parameters
and
. Circles in the graphs represent clouds of accepted parameter values at the end of each iteration of CRCApplication of CRC with logarithmically spaced sampling,Application of CRC with uniformly spaced samplingThis is an alternative and effective way to visualise the distributions of parameters, whose main features are summarised in Table 6. Fig. 7 shows the time behaviour of the two output variables when both sampling strategies are applied and the parameter vector of the model is set equal to the corresponding mode vector reported above. Moreover, we also plot the confidence bands (grey regions) of the observables and when parameter values vary between the 2.5th and 97.5th percentile of their corresponding conditional pdfs. In order to verify if this calibration process is robust against independent realisations, we perform ten realisations of the entire procedure explained above.
Fig. 7
Model M2: blue lines are the time behaviour of output variables
and
when the parameters are set equal to the mode in output from the last iteration of CRC; red dots are the noisy in silico data used to calibrate the model; grey regions are the temporal behaviour bandwidths when the model parameters vary between the 2.5th and 97.5th percentile of their corresponding conditional pdfs
Time behaviour of when CRC is applied with logarithmically spaced sampling,
Time behaviour of when CRC is applied with uniformly spaced sampling,
Time behaviour of when CRC is applied with logarithmically spaced sampling,
Time behaviour of when CRC is applied with uniformly spaced sampling
Model M2: blue lines are the time behaviour of output variables
and
when the parameters are set equal to the mode in output from the last iteration of CRC; red dots are the noisy in silico data used to calibrate the model; grey regions are the temporal behaviour bandwidths when the model parameters vary between the 2.5th and 97.5th percentile of their corresponding conditional pdfsTime behaviour of when CRC is applied with logarithmically spaced sampling,Time behaviour of when CRC is applied with uniformly spaced sampling,Time behaviour of when CRC is applied with logarithmically spaced sampling,Time behaviour of when CRC is applied with uniformly spaced sampling
3.3 Multiple myeloma model (M3)
In this section, we describe the calibration of the ODE model presented in [19], which investigates the mechanisms of the p38 MAPK isoforms in Multiple Myeloma (MM). The model has 40 ODEs and 53 kinetic parameters while the experimental data used for parameter estimation are reverse phase protein array (RPPA) data [21]. RPPA was used to analyse MM cell lines, measuring in total 153 proteins in 80 samples at six different time points (0, 5, 10, 30, 60, 90 min). The 16 proteins both present in the model and in the dataset represent model observables. Since RPPA data are normalised with respect to the initial concentration value, initial conditions of proteins in the ODEs are all set to 1. For this reason, parameters to estimate are only the kinetic ones; i.e. . Equations of the model and the corresponding RPPA dataset are reported in Supplementary Materials.Tuning parameters of CRC are set in the same way in both cases of uniform and logarithmic LHS. Sixteen normalised DFs are computed according to (7) and six iterations of CRC are run. We define and and then we shrink the sampling interval as in model M1. We initialise the other parameters as follows: and . We perform ten independent realisations of the procedure in order to test the stability and reliability of results.Table 7 summarises the thresholds used in both sampling techniques at the different iterations. Fig. 8 compares the evolution of the conditional density of parameter , chosen as an example, in the two cases.
Table 7
Model M3: threshold schedule when both uniform and logarithmic LHS sampling are applied
Iteration
1
2
3
4
5
6
Uniform
Logarithmic
Uniform
Logarithmic
Uniform
Logarithmic
Uniform
Logarithmic
Uniform
Logarithmic
Uniform
Logarithmic
x4
0.45
0.6
0.3
0.3
0.3
0.2
0.2
0.1
0.14
0.05
0.08
0.05
x6
0.5
0.7
0.4
0.4
0.4
0.3
0.4
0.2
0.2
0.1
0.15
0.08
x10
0.5
0.7
0.4
0.4
0.4
0.3
0.4
0.2
0.2
0.1
0.18
0.06
x14
0.4
0.7
0.4
0.4
0.35
0.3
0.15
0.15
0.12
0.13
0.1
0.09
x16
0.5
0.8
0.4
0.5
0.4
0.3
0.4
0.2
0.2
0.1
0.2
0.06
x18
0.25
0.3
0.2
0.2
0.2
0.15
0.2
0.12
0.1
0.095
0.1
0.088
x20
0.35
0.5
0.25
0.3
0.2
0.2
0.2
0.13
0.1
0.1
0.1
0.088
x22
0.45
0.7
0.3
0.4
0.25
0.2
0.15
0.15
0.1
0.1
0.08
0.08
x24
0.4
0.7
0.4
0.4
0.35
0.3
0.3
0.2
0.15
0.1
0.15
0.08
x26
0.35
0.6
0.35
0.4
0.35
0.2
0.25
0.1
0.17
0.05
0.08
0.04
x28
0.35
0.5
0.25
0.3
0.2
0.2
0.15
0.15
0.12
0.1
0.07
0.08
x32
0.45
0.6
0.3
0.3
0.2
0.2
0.25
0.15
0.2
0.12
0.1
0.12
x33
0.8
1
0.7
0.7
0.45
0.5
0.45
0.3
0.4
0.2
0.22
0.15
x34
0.9
0.8
0.7
0.6
0.55
0.5
0.55
0.35
0.2
0.2
0.18
0.15
x38
0.4
0.6
0.3
0.4
0.2
0.25
0.2
0.2
0.14
0.18
0.14
0.16
x40
0.5
0.7
0.35
0.4
0.3
0.3
0.35
0.2
0.2
0.15
0.1
0.12
Fig. 8
Model M3: evolution of the conditional pdfs of parameter
in all iterations, when using, respectively, logarithmic and uniform LHS
Logarithmic sampling,
Uniform sampling
Model M3: threshold schedule when both uniform and logarithmic LHS sampling are appliedModel M3: evolution of the conditional pdfs of parameter
in all iterations, when using, respectively, logarithmic and uniform LHSLogarithmic sampling,Uniform samplingIn Supplementary Materials means, modes and variances of all parameters during the iterations are reported. Fig. 9 shows the time behaviour of output variables generated with the parameter samples accepted in the last iteration, respectively, with logarithmic and uniform sampling.
Fig. 9
Model M3: results of CRC with logarithmic and uniform LHS; red dots are the RPPA data; blue lines are the time behaviour of output variables when parameter values are equal to mode values of the last iteration (see Supplementary Materials); grey area covers the possible temporal behaviours of observables when parameters vary between the 2.5th and 97.5th percentile of their corresponding conditional pdfs. MSE is the mean squared error between the simulation of the model and the data
Logarithmic sampling,
Uniform sampling
Model M3: results of CRC with logarithmic and uniform LHS; red dots are the RPPA data; blue lines are the time behaviour of output variables when parameter values are equal to mode values of the last iteration (see Supplementary Materials); grey area covers the possible temporal behaviours of observables when parameters vary between the 2.5th and 97.5th percentile of their corresponding conditional pdfs. MSE is the mean squared error between the simulation of the model and the dataLogarithmic sampling,Uniform sampling
4 Discussion and conclusion
In [16], the authors presented CRC, a novel Bayesian procedure for parameter estimation of ODE models. In Section 2, the main steps of CRC are briefly summarised with specific attention to the introduced innovations. One of the key points of CRC is the sampling of the parameter space which is completely overturned compared to other Bayesian algorithms. Thus, in CRC the number of samples is an input parameter which remains unchanged in all the iterations. Other substantial innovations are the definition of as many DFs as the number of observables of the model and the corresponding intersection of all the DFs tails.The purpose of this study is to apply CRC with two different sampling approaches in order to test which performs better and to understand the influence of the sampling process in this algorithm. We apply CRC using LHS with uniformly and logarithmically spaced samples. Even if the practice of applying different sampling strategies is common in the field of the Bayesian algorithms, the comparison we provide in this paper has never been performed before since CRC is a novel algorithm. We choose three ODE models with a parameter space and the corresponding space of state variables of increasing dimension. They are the Lotka‐Volterra model, the EpoR model and the MM model.Tables 3, 5 and 7 prove that the logarithmic sampling is able to reach lower thresholds than the uniform ones, almost for all the output variables in all the tested models. Indeed, for the Lotka‐Volterra model, the thresholds obtained at the sixth iteration with logarithmic sampling are as close as possible to the target ones, meaning that through logarithmic sampling it is possible to calibrate the ODE model also preventing overfitting. Logarithmic sampling generates more dense samples in the interval (0; 1]. Similarly, in the EpoR model, after nine iterations of CRC with logarithmic sampling, the resulting thresholds get very close to the target ones. As regards the MM model, since the nominal parameter vector is not known from [19], we cannot calculate the target values of the error. However, after six iterations of CRC with logarithmic sampling the maximum error between simulated and experimental data is only of the 16% (see Table 7). On the other hand, the application of CRC combined with uniform sampling leads to threshold values that are on average higher than those obtained through the logarithmic ones. In the Lotka‐Volterra model, thresholds after six iterations are too distant from 3.09 and 2.6. Moreover, scatter plots in Fig. 2 confirm results of Table 3: it is clear how with the uniform sampling the accepted samples at each iteration evolve towards a region of the parameter space that does not correspond to the nominal values of kinetic parameters a and b. Conversely, the scatter plots of logarithmic sampling in Fig. 2 show how the red dots lie in a region that is exactly in the neighbourhood of the nominal point (1;1). Additionally, as shown in Fig. 3 only the logarithmic ones adequately fit the noisy data, following the typical oscillating behaviour of the prey‐predator model. On the other hand, at many time points, the behaviour depicted in Fig. 3 and b is too far from in silico data. Similar observations are valid also for the EpoR model. Fig. 7 clearly shows that, since the threshold values obtained with uniform sampling are too far from the target ones, the time behaviour of output variables does not fit the experimental data. On the contrary, logarithmic sampling is able to adequately reproduce the correct time behaviour of and , even if the resulting estimated parameter vector is an alternative parametrisation of the model. As regards the MM model instead, both sampling strategies produce acceptable time behaviour of output variables, as is shown in Fig. 9, since the maximum percentage error with the uniform sampling is of the 22%. However, the confidence bands in Fig. 9 are rather higher than those in Fig. 9. This is due to the fact that, for most of the kinetic parameters, the corresponding conditional pdf, estimated in the last iteration of CRC, is tighter when logarithmic sampling is performed, as shown in Fig. 8. Moreover, in order to make a comparison of the computational cost of the two sampling techniques, we measure the time to perform each iteration of CRC, as reported in Table 8. In all examples, the logarithmic sampling has a minor computational burden compared to the uniform sampling, whose simulation time also increases with the model dimension. In conclusion, it is possible to state that logarithmic sampling performs better and has a reduced computational cost when calibrating a model using CRC. It is the best sampling technique when it is necessary to perturb the parameter space over different orders of magnitude. This is due to the intrinsic property of the logarithmic distribution that generates the same number of samples in different orders of magnitude [22]. The logarithmic sampling is able to generate more dense samples around the mode vector returned by the previous iteration of CRC, which is the most likely point in the parameter space at a given iteration.
Table 8
Time to perform each iteration of CRC in the three models
M1
M2
M3
Iteration(z)
Uniform
Logarithmic
Uniform
Logarithmic
Uniform
Logarithmic
1
431 s
288 s
798 s
449 s
360 min
110 min
2
323 s
213 s
835 s
305 s
720 min
80 min
3
300 s
251 s
856 s
245 s
1 d
70 min
4
268 s
225 s
750 s
223 s
2 d
60 min
5
260 s
216 s
674 s
210 s
3 d
60 min
6
245 s
202 s
693 s
195 s
3 d
60 min
7
—
—
685 s
217 s
—
—
8
—
—
650 s
310 s
—
—
9
—
—
645 s
360 s
—
—
Time is expressed in seconds (s), minutes (min) or days (d).
Time to perform each iteration of CRC in the three modelsTime is expressed in seconds (s), minutes (min) or days (d).
Authors: Juliane Liepe; Paul Kirk; Sarah Filippi; Tina Toni; Chris P Barnes; Michael P H Stumpf Journal: Nat Protoc Date: 2014-01-23 Impact factor: 13.491
Authors: Ali Es-Haghi; Mohammad Ehsan Taghavizadeh Yazdi; Mohammad Sharifalhoseini; Mohsen Baghani; Ehsan Yousefi; Abbas Rahdar; Francesco Baino Journal: Biomimetics (Basel) Date: 2021-05-27