Literature DB >> 34350015

Numerical method for parameter inference of systems of nonlinear ordinary differential equations with partial observations.

Yu Chen^1,2, Jin Cheng^3,1, Arvind Gupta⁴, Huaxiong Huang^5,6,2,4, Shixin Xu⁷.

Abstract

Parameter inference of dynamical systems is a challenging task faced by many researchers and practitioners across various fields. In many applications, it is common that only limited variables are observable. In this paper, we propose a method for parameter inference of a system of nonlinear coupled ordinary differential equations with partial observations. Our method combines fast Gaussian process-based gradient matching and deterministic optimization algorithms. By using initial values obtained by Bayesian steps with low sampling numbers, our deterministic optimization algorithm is both accurate, robust and efficient with partial observations and large noise.

Entities: Chemical Disease Mutation Species

Keywords: Gaussian process; nonlinear ordinary differential equations; parameter inference; partial observations

Year: 2021 PMID： 34350015 PMCID： PMC8316824 DOI： 10.1098/rsos.210171

Source DB: PubMed Journal: R Soc Open Sci ISSN： 2054-5703 Impact factor: 2.963

Introduction

Many problems in science and engineering can be modelled by systems of ordinary differential equations (ODEs). It is often difficult or impossible to measure some parameters of the systems directly. Therefore, various methods have been developed to estimate parameters based on available data. Mathematically, such problems are classified as inverse problems, which have been widely studied [1-4]. They can be also treated as parameter inference in statistics. Statistical inference enables the construction of data-efficient learning machines and shows advantages in dealing with noisy data, which is increasingly being studied and discussed (e.g. [5-7]). For nonlinear ODEs, standard statistical inference is time consuming as numerical integration is needed after each update of the parameters [8,9]. Recently, gradient matching techniques have been proposed to circumvent the high computational cost of numerical integration [7,10-12]. These techniques are based on minimizing the difference between the time derivatives (gradients) and the right-hand side of the equations. This usually involves a process consisting of two steps: data interpolation and parameter adaptation. Among them, non-parametric Bayesian modelling with Gaussian processes is one of the promising approaches, which includes data interpolation by Gaussian process and parameter adaptation by matching the solution or model. An adaptive gradient matching method based on a product-of-experts approach and a marginalization over the derivatives of the state variables was proposed by Calderhead et al. [8] and extended by Dondelinger et al. [10]. Barber & Wang [13] proposed a Gaussian process-based method in which the state variables are marginalized. Macdonald et al. [9] provided an interpretation of the above paradigms. Wenk et al. [14] proposed a fast Gaussian process-based gradient matching (FGPGM) algorithm with theoretical framework in systems of nonlinear ODEs, which was indicated more accurate, robust and efficient. For many practical problems, the variables are only partially observable, or not at all times [15]. As a consequence, parameter inference is more challenging, even for a coupled system where the parameters are uniquely determined by data of partially observed data under certain initial conditions. It is not clear whether the gradient matching techniques can be applied to the case when there are latent variables. The Markov chain Monte Carlo algorithm has the ability to sidestep the issue of parameter identifiability in many cases, but convergence remains a serious issue [7]. Therefore, the graphical model, density formula and algorithm for inference with partial observations, especially how to deal with the latent variables, are worth investigation. Moreover, we need to pay attention to the feasibility, accuracy, robustness and computational cost of numerical computations for such problems. In this work, we focus on the case of parameter inference with partially observable data. The main idea is to treat the observable and non-observable variables differently. For observable variables, we use the same approach as proposed by Wenk et al. [14]. We will provide three approaches to deal with the unobserved variables, including numerical integrations and Gaussian process sampling, and compare their performances. To circumvent the high computational cost of sampling in Bayesian approaches, we also combine the extended FGPGM with least square optimization method. The remaining part of the paper is organized as follows. In §2, we give the numerical method to deal with parameter identification problems with partial observation. Numerical examples are presented in §3. Some concluding remarks are given in §4.

Algorithm

The main strategy of Gaussian process-based gradient matching is to minimize the mismatch between the data and the ODE solutions in a maximum-likelihood sense, making use of the property that Gaussian process is closed under differentiation. In this section, we will extend the FGPGM method proposed in [14] to the situation that contains unobserved variables. In this work, we would like to estimate the time-independent parameters of the following dynamical system described bywhere is the vector of time derivative of the variable and can be nonlinear vector valued functions. We assume only part of the variables are measurable and denote them as . Throughout the paper, we use subscript M to specify measurable components. They are observed on discrete time points as (t)(i = 1, …N2) with noise ε such that = + ε. We assume that the noise is Gaussian , thenwhere and are the realizations of and , respectively. The latent/unmeasurable variables are denoted as , with . The idea of Gaussian process-based gradient matching is as follows. Firstly, we put a Gaussian process prior on ,Here is denoted as the covariance matrix. Its components are given by with respect to a kernel function parametrized by the hyperparameters ϕ. Then according to lemma A.8, the conditional distribution of the kth state derivatives iswhereandHere . For more details, we refer to appendix A. We set up the following chain graph to show the relationship between the variables (figure 1). In figure 1, is the deterministic output of the measurable parts of the ODEs, whose realization is determined by and . is introduced to incorporate the model uncertainty, which is assumed to be a Gaussian noise with standard deviation γ so that for its realization we haveThen the joint density can be represented by the following theorem.

Figure 1

Chain graph with partial observable variables.

Theorem 2.1.

Given the modelling assumptions summarized in the graphical probabilistic model in ,where involved in is the solution determined by and . The proof is given in appendix B. In our computation, can be obtained by integrating the ODE system numerically with proposed and initial values of and . Then, the target is to maximize the likelihood function ρ(, |, ϕ, σ, γ). The present algorithm is a combination of a Gaussian process-based gradient matching and a least square optimization. In the GP gradient matching step, the Gaussian process model is first fitted by inferring the hyperparameter ϕ. Secondly, the states and parameters are inferred using a one-chain MCMC scheme on the density as in [14]. Finally, the parameters estimated above are set as initial guess in the least square optimization. The algorithm can be summarized as follows. Algorithm In Step 1, the Gaussian process model is fitted to data by maximizing the log of marginal likelihood of the observations at times with respect to hyperparameters ϕ and σ. σ is the estimated standard deviation of the observation noise and n is the amount of observations. For the inference of the unobserved variables, there are the following possible ways. The first method is to integrate the whole ODE system, which only depends on initial conditions and the proposed value of parameters. An alternative way is to integrate partially the system, that is, integrate the equations for the unobservable variables only. The coupled observable variables are regarded as known coefficients whose values are taken from the observations. In order to ensure the stability and convergence of the integration, smoothing and interpolation may be necessary if the data of the observable variables are sparse and noisy. The numerical integration of in these two methods are implemented only after each update of . The third approach to deal with the latent variables is applying the same sampling process as the observable states under the assumption of Gaussian process and doing the gradient matching, in which the posterior probability is adapted asNote that although there is no observation to match for the latent variables, the gradient matching is still valid. Thus, in the integration approaches only the observable variables are sampled while for the third one all the variables are sampled together with the updates of the likelihood function in each MCMC sampling cycle. The performances of these three methods will be discussed in §3.4. The last step is to solve the following minimization problem:In the optimization process, gradient descent method is adopted where numerical gradient is used in each searching step. One advantage of least-square optimization is its ability to obtain accurate results at low computational cost, especially for observations with Gaussian noise. But it requires proper initial guess of the parameters so as to avoid falling in local minima, whereas gradient matching is relatively less sensitive to the initial guess. However, for FGPGM a large amount of MCMC samplings are necessary to ensure the expectations of random variables make sense, which increases computational cost. Therefore, if we combine these two methods, it is possible to use just a few MCMC samplings to obtain a good initial guess for the least-square optimization.

Experiments

For the Gaussian regression step for observable variables, the code published alongside Wenk et al. [14] was used. The gradient matching part should then be adapted to the partial observation case according to figure 1. In the following, we refer to FGPGM as the modified fast Gaussian process gradient matching algorithm that is adapted for partial observation case.

Lotka Volterra

The Lotka Volterra system was originally proposed in Lotka [16]. It was introduced to model the prey–predator interaction system whose dynamics is given byandwhere θ1, θ2, θ3, θ4 > 0. In the present work, the system was observed with one variable and the initial value of the other variable. The other set-up is the same as Gorbach et al. [11] and Wenk et al. [14]. The observed series are located in the time interval [0, 2] at 20 uniformly distributed observation times. The initial values of the variables are (5, 3). The history of the observable variable is generated with numerical integration of the system with true parameters * = (θ1, θ2, θ3, θ4) = (2, 1, 4, 1) added with Gaussian noise with standard deviation 0.1. The radial basis function kernel (RBF) was used for the Gaussian process. For the model uncertainty, we set γ = 0.3. For the FGPGM step, the burn-in number and valid number are Nburnin = 7500, NMCMC = 10 000, respectively. The results with x1 being observed are shown in figure 2. Those with observation of x2 are given in figure 3. The errors of the reconstructed solution and parameters are summarized in tables 1–4. At this noise level, the inferences with and without optimization step both perform well with the overall relative error being of order 10−2. In the case with measurement of x2, we can see that the optimization process can improve the results from FGPGM, with the discernable errors of θ1 and θ3 in figure 3c being reduced. Although θ2 and θ4 appear quite accurate in the case of using FGPGM only (Nburnin = 7500, NMCMC = 10 000) in table 4, the relatively large error in θ3 may affect the reconstructed solution due to the coupling effect of the parameters in the system. By contrast, the parameter errors after the combination of FGPGM and optimization are all of order 10−2.

Figure 2

Figure 3

The state evolution over time for Lotka–Volterra system and parameter inference results. x2 is observable and x1 is latent variable. The ground truth, FGPGM result, and result from combination of FGPGM and optimization are compared. (a) x1, (b) x2, (c) .

Table 1

Error of reconstructed solution with x1 being observable.

	∥x1−x1∗∥L∞∥x1∗∥L∞	∥x1−x1∗∥L2∥x1∗∥L2	∥x2−x2∗∥L∞∥x2∗∥L∞	∥x2−x2∗∥L2∥x2∗∥L2
FGPGM	1.69 × 10⁻²	0.83 × 10⁻²	3.47 × 10⁻²	2.44 × 10⁻²
FGPGM + Opt.	1.40 × 10⁻²	0.49 × 10⁻²	4.26 × 10⁻²	3.63 × 10⁻²

Table 4

Error of reconstructed parameters with x2 being observable.

	\|θ1−θ1∗\|	\|θ2−θ2∗\|	\|θ3−θ3∗\|	\|θ4−θ4∗\|
FGPGM	7.16 × 10⁻²	0.06 × 10⁻²	1.34 × 10⁻¹	0.07 × 10⁻²
FGPGM + Opt.	1.79 × 10⁻²	1.76 × 10⁻²	7.77 × 10⁻²	1.88 × 10⁻²

Reconstruction and inference results for the Lotka–Volterra system, showing the state evolution over time and parameter distributions. x1 is observable and x2 is latent variable. The ground truth, FGPGM result, and result from combination of FGPGM and optimization are compared. The state evolution over time for Lotka–Volterra system and parameter inference results. x2 is observable and x1 is latent variable. The ground truth, FGPGM result, and result from combination of FGPGM and optimization are compared. (a) x1, (b) x2, (c) . Error of reconstructed solution with x1 being observable. Error of reconstructed parameters with x1 being observable. Error of reconstructed solution with x2 being observable. Error of reconstructed parameters with x2 being observable. The sensitivities of the variables to the parameters are listed in table 5.

Table 5

Sensitivity of each variable to parameters for Lotka–Volterra system at * = (2, 1, 4, 1). The sensitivity index is defined in equation (3.3).

S_ij	x₁	x₂
θ₁	0.20	0.61
θ₂	0.52	1.13
θ₃	0.40	0.33
θ₄	1.27	0.98

Sensitivity of each variable to parameters for Lotka–Volterra system at * = (2, 1, 4, 1). The sensitivity index is defined in equation (3.3). The sensitivity indexes at the true parameter set * are defined aswhich are normalized. It is approximated by numerical difference. It is shown that near the true parameter set, θ1 and θ3 are relatively less sensitive to the variables than other parameters. This explains that θ1 and θ3 are less accurate in the numerical test (figures 2c and 3c). The cases with larger measurement noise level (with standard deviation 0.5) are shown in figures 4 and 5, corresponding to x1 and x2 observations, respectively. The errors of reconstructed solution and identified parameters are summarized in tables 6–9. It can be seen that the prediction of the unknown variable can deviate far from the ground truth if we use FGPGM method only (without Step 3 in the algorithm above). The inferring of states and parameters can be improved after further applying the deterministic optimization. This improvement is more obvious than the low noise case.

Figure 4

Figure 5

The state evolution over time for Lotka–Volterra system. x2 is observable and x1 is latent variable. The ground truth, FGPGM result, and result from combination of FGPGM and optimization are compared. The observation noise has a standard deviation 0.5 (large noise case). (a) x1, (b) x2, (c) .

Table 6

Error of reconstructed solution with x1 being observable (large noise).

	∥x1−x1∗∥L∞∥x1∗∥L∞	∥x1−x1∗∥L2∥x1∗∥L2	∥x2−x2∗∥L∞∥x2∗∥L∞	∥x2−x2∗∥L2∥x2∗∥L2
FGPGM	1.56 × 10⁻¹	7.30 × 10⁻²	4.03 × 10⁻¹	3.81 × 10⁻¹
FGPGM+Opt.	3.02 × 10⁻²	3.08 × 10⁻²	2.70 × 10⁻²	2.68 × 10⁻²

Table 9

Error of reconstructed parameters with x2 being observable (large noise).

	\|θ1−θ1∗\|	\|θ2−θ2∗\|	\|θ3−θ3∗\|	\|θ4−θ4∗\|
FGPGM	6.40 × 10⁻¹	3.44 × 10⁻¹	1.10	3.63 × 10⁻¹
FGPGM + Opt.	4.35 × 10⁻²	4.35 × 10⁻²	4.72 × 10⁻¹	2.16 × 10⁻¹

Reconstruction and inference results for the Lotka–Volterra system with x1 being observable and x2 latent. The ground truth, FGPGM result, and result from combination of FGPGM and optimization are compared. The observation noise has a standard deviation 0.5 (large noise case). (a) x1, (b) x2, (c) . The state evolution over time for Lotka–Volterra system. x2 is observable and x1 is latent variable. The ground truth, FGPGM result, and result from combination of FGPGM and optimization are compared. The observation noise has a standard deviation 0.5 (large noise case). (a) x1, (b) x2, (c) . Error of reconstructed solution with x1 being observable (large noise). Error of reconstructed parameters with x1 being observable (large noise). Error of reconstructed solution with x2 being observable (large noise). Error of reconstructed parameters with x2 being observable (large noise).

Spiky dynamics

This example is a system proposed by FitzHugh [17] and Nagumo et al. [18] for modelling the spike potentials in the giant squid neurons, which is abbreviated as FHN system. This system involves two ODEs (3.4) and (3.5) with three parameters.and The FHN system has notoriously fast changing dynamics due to its highly nonlinear terms. In the following numerical tests, the Matern 5/2 [19] kernel was used and γ was set to 0.3, the same as that in Wenk et al. [14]. We assume one of the two variables is observable, which was generated with * = (θ1, θ2, θ3) = (0.2, 0.2, 3) and added by Gaussian noise with average signal-to-noise ratio SNR = 100. There were 100 data points uniformly spaced in [0, 10]. The burn-in number and valid number of samplings in the FGPGM step are Nburnin = 7500, NMCMC = 10 000, respectively. In this case, if we merely use FGPGM step, the reconstructed solution corresponding to the identified parameters may deviate significantly from the true time series (see figure 6, where data of x1 are observable). It was pointed out [14] that all GP-based gradient matching algorithms lead to smoother trajectories than the ground truth. This becomes more severe with sparse observation. The least-square optimization after FGPGM may well reduce this effect and give a better reconstruction of the solution (figure 7).

Figure 6

Figure 7

The state evolution over time and identified parameters for FHN system. x1 is observable and x2 is latent variable. The ground truth, FGPGM result, and result from combination of FGPGM and optimization are compared. (a) x1, (b) x2, (c) .

Results for FHN system obtained from FGPGM method, without further optimization. x1 is observable and x2 is latent variable. The ground truth, FGPGM result and reconstructed solution (integration of ODEs with inferred parameters) are compared. (a) x1, (b) X2. The state evolution over time and identified parameters for FHN system. x1 is observable and x2 is latent variable. The ground truth, FGPGM result, and result from combination of FGPGM and optimization are compared. (a) x1, (b) x2, (c) . Figures 7 and 8 present the results with single x1 and x2 observations, respectively. In both cases, the identified parameters are more accurate than using FGPGM only. From the sensitivity check in table 10, it is expected that θ1 is most accurate because it is most sensitive among these three parameters, whereas θ2 is most insensitive and would be harder to be identified. The numerical results agree with that. It is worth mentioning that in the FGPGM step, only 3500 samplings were taken and the time for optimization step was much less than FGPGM step. This means the time needed for the whole process can be greatly saved compared with that in Wenk et al. [14], where 100 000 MCMC samplings were implemented.

Figure 8

Table 10

Sensitivity of each variable to parameters for FHN system at * = (0.2, 0.2, 3.0). The sensitivity index is defined as equation (3.3).

S_ij	x₁	x₂
θ₁	2.33	1.24
θ₂	0.44	0.31
θ₃	1.01	0.55

The state evolution over time and identified parameters for FHN system. x2 is observable and x1 is latent variable. The ground truth, FGPGM result, and result from combination of FGPGM and optimization are compared. (a) x1, (b) x2, (c) . Sensitivity of each variable to parameters for FHN system at * = (0.2, 0.2, 3.0). The sensitivity index is defined as equation (3.3). In this example, we also notice that if we merely use least-square optimization method, the local minimum effect would lead to reconstruction being far from the ground truth, which is even less robust than FGPGM method. For example, if we choose initial guess of the parameters near (θ1, θ2, θ3) = (1.51, 2.2, 1.78) then the cost functional will fall into the local minimum during gradient-based search (figure 9). The existence of many local minima in the full observation case has been pointed out in e.g. [7,20]. These results clearly illustrate the performance of the combination of FGPGM and least-square optimization to avoid local minimum solution.

Figure 9

Results of FHN system with x1 being latent, obtained by merely using least-square optimization with initial guess of parameters near a local minimum point. (a) x1, (b) x2.

Protein transduction

Finally, the protein transduction system proposed in Vyshemisky & Girolami [21] was adopted to illustrate the performance of the method in ODEs with more equations. The system is described byWe adopted the same experimental set-up of Dondelinger et al. [10] and Wenk et al. [14] as follows. γ = 10−4 in FGPGM step. The observation were made at discrete times [0, 1, 2, 4, 5, 7, 10, 15, 20, 30, 40, 50, 60, 80, 100]. The initial condition was [1, 0, 1, 0, 0] and the data were generated by numerically integrating the system under = [0.07, 0.6, 0.05, 0.3, 0.017, 0.3], added by Gaussian noise with standard deviation 0.01. A sigmoid kernel was used to deal with the logarithmically spaced observation times and the typically spiky form of the dynamics as in the previous papers. Figure 10 gives the result with x3 being unobserved. In fact, the situations with one of other variables being unknown have better results than the case illustrated here, which will not be presented here. We can see that x3 was not well fitted by merely using FGPGM step (Nburnin = 7500, NMCMC = 10 000), whereas further applying the least-square optimization generates satisfactory results, with the parameters θ2 and θ4 being significantly improved. The sensitivity check is summarized in table 11, from which we can see that θ2 is less sensitive and thereby harder to infer accurately.

Figure 10

Table 11

Sensitivity of each variable to parameters for protein transduction system at * = [0.07, 0.6, 0.05, 0.3, 0.017, 0.3]. The sensitivity index is defined as equation (3.3).

S_ij	x₁	x₂	x₃	x₄	x₅
θ₁	2.86	9.78	1.73	1.77	3.33
θ₂	0.70	0.98	0.22	0.59	0.41
θ₃	1.35	2.11	0.47	0.92	0.90
θ₄	0.26	0.43	0.03	2.64	0.62
θ₅	1.53	2.58	24.48	0.90	49.38
θ₆	0.04	0.07	0.60	0.02	1.21

The state evolution over time and inferred parameters for protein transduction system. x3 is unknown and other variables are observable. The ground truth, FGPGM result, and result from combination of FGPGM and optimization are compared. (a) x1, (b) x2, (c) x3, (d) x4, (e) x5, (f) . Sensitivity of each variable to parameters for protein transduction system at * = [0.07, 0.6, 0.05, 0.3, 0.017, 0.3]. The sensitivity index is defined as equation (3.3). It was mentioned in Dondelinger et al. [10] that θ5 and θ6 in this system are difficult to fit. θ5/θ6 is relatively sensitive but is still likely to be overestimated. It was pointed out in [14] that the parameters of the protein transduction system are unidentifiable. In the present case with partial observation, θ6 is far beyond the truth with the inferred value being around 3.0, which is therefore not presented in the figure. However, the observation can still be acceptably reconstructed. It would also be of interest to see the performance of our method for the cases with more latent variables. In this model, although x2 is not involved in equations for other variables, the data of x2 helps infer θ1. We also notice that , which means with known initial conditions the histories of the three variables can be known from any two of them. If x1 and x2 are both missing, it is impossible to identify θ1. Therefore, in the following test we choose data of x2, x4 and x5 as observations. The data have a Gaussian noise with standard deviation 0.01, the same as the previous case with one latent variable. It can be seen from figure 11 that the result from FGPGM step is worse than the case with only one latent variable, but the final reconstruction of latent variables and parameter identification after Step 3 is not significantly different from the case with one latent variable.

Figure 11

The state evolution over time and inferred parameters for protein transduction system. x1 and x3 are unknown and other variables are observable. The ground truth, FGPGM result, and result from combination of FGPGM and optimization are compared. (a) x1, (b) x2, (c) x3, (d) x4, (e) x5, (f) . In order to imitate real situations, we considered the case with abnormal measurements, in which some observations have significant deviation from the ground truth. Moreover, only x1 and x5 are observable. Note that further reducing the observable variable will make the system unidentifiable. The results are shown in figure 12. At t = 40 and t = 60, the measurement errors are abnormally large (figure 12a,e). The identified parameters and reconstructed state series seem not severely impacted by the large measurement errors, which shows some robustness. This also illustrates that the sigmoid kernel is a proper choice for this example. The performance in more real situations is worth further investigation, especially for complex dynamical systems.

Figure 12

The state evolution over time and inferred parameters for protein transduction system. x2, x3, x4 are unknown and other variables are observable. The ground truth, FGPGM result, and result from combination of FGPGM and optimization are compared. Observations at t = 40, 60 are abnormal with significant deviation from ground truth. (a) x1, (b) x2, (c) x3, (d) x4, (e) x5, (f) . Besides, the dependence of errors of inferred parameter on the noise levels have also been carried out. Table 12 gives the comparison among observation noise levels. The measurement errors are Gaussian and independent and identically distributed. Here x1 and x5 are observable. On the whole the identified parameters become less accurate when the observation noise level increases. When the standard deviation increases to 0.02, some parameters show quite large errors such that the identification should be regarded as failed. We need to mention that small errors in part of the parameters does not necessarily imply a better reconstruction of solution or prediction. It relates to the sensitivities of parameters and variables, which need further investigation.

Table 12

Error of inferred parameters under various noise levels.

noise level (s.d.)	method	\|θ1−θ1∗\|	\|θ2−θ2∗\|	\|θ3−θ3∗\|	\|θ4−θ4∗\|	\|θ5−θ5∗\|
0.005	FGPGM	0.0127	0.1009	0.0007	0.0312	0.0372
0.01	FGPGM	0.0055	0.1312	0.0590	0.1316	0.0308
0.02	FGPGM	0.0193	0.1730	1.1257	1.4630	0.0272
0.005	FGPGM + Opt.	0.0253	0.0489	0.0327	0.0638	0.0602
0.01	FGPGM + Opt.	0.0267	0.0702	0.0270	0.0736	0.0704
0.02	FGPGM + Opt.	0.0121	0.1063	1.0824	1.4883	0.0477

Error of inferred parameters under various noise levels.

Comparisons among treatments for unobservable variables

Now we illustrate the performances of the three ways to infer the unobservable variables proposed in §2. We take the spiky dynamics in §3.2 as example and choose x2 as unobservable variable. Here the observation noise for x1 is increased with a standard deviation of 0.1 and the time resolution is reduced to 50 points. The results of these three methods are compared in figures 13–17. In all the cases, 2000 burning cycles and 3000 sampling cycles are implemented. In the full integration case, the ode45 solver is applied, while in the partial integration case Eular formula is adopted. Other computational parameters are unified as γ = 1.0 × 10−3, standard deviations that control the sampling steps are 0.001 and 0.12 for x1 and , respectively. The initial guess for the unobservable variable are plotted in dashed line in figure 15b.

Figure 13

The state evolution over time and identified parameters for FHN system. x2 is unobservable and inferred by the integration of the whole ODE system. (a) x1, (b) x2, (c) .

Figure 17

The state evolution over time and identified parameters for FHN system. x2 is unobservable and inferred by sampling under the assumption of Gaussian process. The initial guess of x2 at t = 0 is also far from the ground truth. (a) x1, (b) x2, (c) .

Figure 15

The state evolution over time and identified parameters for FHN system. x2 is unobservable and inferred by sampling under the assumption of Gaussian process. (a) x1, (b) x2, (c) .

The state evolution over time and identified parameters for FHN system. x2 is unobservable and inferred by the integration of the whole ODE system. (a) x1, (b) x2, (c) . The state evolution over time and identified parameters for FHN system. x2 is unobservable and inferred by integrating the equation for x2 with inferred x1 as known coefficient. (a) x1, (b) x2, (c) . The state evolution over time and identified parameters for FHN system. x2 is unobservable and inferred by sampling under the assumption of Gaussian process. (a) x1, (b) x2, (c) . The state evolution over time and identified parameters for FHN system. x2 is unobservable and inferred by sampling under the assumption of Gaussian process. The initial guess of x2 is different from that in figure 15b. (a) x1, (b) x2, (c) . The state evolution over time and identified parameters for FHN system. x2 is unobservable and inferred by sampling under the assumption of Gaussian process. The initial guess of x2 at t = 0 is also far from the ground truth. (a) x1, (b) x2, (c) . The total CPU times corresponding to the full integration, partial integration and sampling approaches are 207.6 s, 180.7 s and 529.4 s correspondingly. In the partial integration case, if denoise and interpolation for observed variables or high-order integration scheme are needed to ensure the stability and convergence, then the computational cost will increase. If most of the variables in a high-dimensional system are unobservable, then the partial integration may not have a significant time saving compared with the full integration case. At present, the system is so simple that the computational expense for integration is very low compared with sampling and probability calculation. Therefore, the CPU time is much higher in the third approach where the sampled states are doubled. However, when the system is complex such that the computational cost for solving the system becomes an issue, the sampling approach with proper initial guesses may show advantages since it depends on the time points, and sparse approximation works well in Gaussian process methods [8]. It can be seen from the results that the sampling approach gets a better parameter approximation than the integration methods (figure 15c), though the inference of x2 is worse than them in terms of the curve features. It should be noted that the results for the full sampling method depend on the initial guess. An inappropriate initial distribution may lead to large deviation from the truth at the same computational parameters (figure 16b,c). If the system is large and most of the variables are unobservable, then the initial guesses of them may become an issue. To investigate the case of unknown initial conditions, we propose an initial guess such that x2 at t = 0 deviates a lot from the ground truth (figure 17b). It can be seen that the sampling is still able to find a solution close to the truth for this system. We should point out that lack of initial conditions may lead to non-uniqueness of identification for some systems [22].

Figure 16

The state evolution over time and identified parameters for FHN system. x2 is unobservable and inferred by sampling under the assumption of Gaussian process. The initial guess of x2 is different from that in figure 15b. (a) x1, (b) x2, (c) .

According to the above discussions, we suggest that if the system is very small such that solving ODEs is cheap in time, one can adopt the integration approaches since they are independent of the prior information of the unobserved variables. If the integration requires fine time step and most of the variables in the system are unobservable, then the full integration is preferred since denoise and interpolation of the observable variables which appear in the equations for unobservable ones can be avoided. For large-scale problems with reliable prior information, the full sampling method may have advantages in both accuracy and efficiency.

Indirect observation

In this section, we consider the case that the observable output (t) is a function of the original system variable (t). Then, the observation with noise becomes (t) = ((t)) + ɛ. Take the following system as example:andand the measurement equation is For such problem, we take time derivative of the output functionand then have a new three-equation system for {x1, x2, y}, in which the first two variables are unobservable and the last one is observable. The above method can then be applied to the new system. Assume that the observation is taken at 25 uniformly distributed time points within [0, 5], with a Gaussian noise of standard deviation 0.2. The true values of the parameters are * = (0.5, 1), and initial values (0) = [0.5, 0]. In the Gaussian process gradient matching step, the Matern 5/2 kernel was used and γ = 10−3. The numerical results are provided in figure 18. The burning number and total sampling number are 2000 and 5000, respectively. In this example, the FGPGM step has already reached a satisfactory inference for the parameters (figure 18d), and further application of least optimization loses accuracy at the end of the time interval due to sparsity of the data (figure 18c) which leads to deviations in the parameters.

Figure 18

The state evolution over time and identified parameters for example 4. Only is observable. (a) x1, (b) x2, (c) y, (d) .

Discussion

In the work, we proposed an effective method for parameter inference of coupled ODE systems with partially observable data. Our method is based on previous work known as FGPGM [14], which avoids product of experts heuristics. We provide the treatment of latent variable and graphical frame for such problems. The likelihood formula is then derived. In order to improve the accuracy and efficiency of the method, we also use least-square optimization in our computation. In our numerical tests, we use only 10% of the sampling number that is suggested in the literature for the FGPGM step, which can already provide a good initial guess for the least optimization. Owing to the existence of latent variables, three approaches for treatment of unobservable variables are provided with the performances discussed. Suggestions for choices of the approaches are given according to the system scales and reliability of prior information. The present algorithm is robust and accurate with large data noise and partial observations. At the same time, thanks to the step of Gaussian process-based inference, it is easier to get rid of local minima for the least-square optimization step and achieve the global minimum, which reduces the dependence on the initial guess of parameters. Click here for additional data file.

Algorithm

Input: y, f(x, θ), γ, N_MCMC, N_burnin, t, σ_s, σ_p

Step 1. Fit Gaussian process model to data

Step 2. Infer x_M, x_L and θ using MCMC

S←∅

for i = 1 → N_MCMC + N_burnin do

T←∅

for each state x_Mj(t_k) do

Propose a new state value using a Gaussian distribution with standard deviation σ_s

Accept proposed value based on the density (equation (2.8))

Add current value to T

end for

for each parameter do

Propose a new parameter value using a Gaussian distribution with standard deviation σ_p

Infer x_L by integration or sampling (detailed in §2).

Accept proposed value based on the density (equation (2.8))

Add current value to T

end for

Add the mean of T to S

end for

Discard the first N_burnin samples of S

Return x_M, x_L, θ

Step 3. Optimization of equation (2.11) using θ from Step 2 as initial guess.

Table 2

Error of reconstructed parameters with x1 being observable.

	\|θ1−θ1∗\|	\|θ2−θ2∗\|	\|θ3−θ3∗\|	\|θ4−θ4∗\|
FGPGM	8.00 × 10⁻²	4.79 × 10⁻²	8.32 × 10⁻²	2.35 × 10⁻²
FGPGM + Opt.	9.82 × 10⁻²	1.88 × 10⁻²	8.39 × 10⁻²	1.17 × 10⁻²

Table 3

Error of reconstructed solution with x2 being observable.

	∥x1−x1∗∥L∞∥x1∗∥L∞	∥x1−x1∗∥L2∥x1∗∥L2	∥x2−x2∗∥L∞∥x2∗∥L∞	∥x2−x2∗∥L2∥x2∗∥L2
FGPGM	2.99 × 10⁻²	3.08 × 10⁻²	1.98 × 10⁻²	1.92 × 10⁻²
FGPGM + Opt.	0.80 × 10⁻²	0.65 × 10⁻²	1.37 × 10⁻²	1.07 × 10⁻²

Table 7

Error of reconstructed parameters with x1 being observable (large noise).

	\|θ1−θ1∗\|	\|θ2−θ2∗\|	\|θ3−θ3∗\|	\|θ4−θ4∗\|
FGPGM	6.37 × 10⁻²	2.76 × 10⁻¹	2.01	5.75 × 10⁻¹
FGPGM+Opt.	1.78 × 10⁻¹	1.06 × 10⁻¹	2.75 × 10⁻¹	5.51 × 10⁻²

Table 8

Error of reconstructed solution with x2 being observable (large noise).

	∥x1−x1∗∥L∞∥x1∗∥L∞	∥x1−x1∗∥L2∥x1∗∥L2	∥x2−x2∗∥L∞∥x2∗∥L∞	∥x2−x2∗∥L2∥x2∗∥L2
FGPGM	3.22 × 10⁻¹	1.89 × 10⁻¹	8.80 × 10⁻²	5.34 × 10⁻²
FGPGM + Opt.	6.88 × 10⁻²	6.68 × 10⁻²	8.40 × 10⁻²	7.71 × 10⁻²

5 in total

1. Bayesian ranking of biochemical system models.

Authors: Vladislav Vyshemirsky; Mark A Girolami
Journal: Bioinformatics Date: 2007-12-05 Impact factor: 6.937

2. Impulses and Physiological States in Theoretical Models of Nerve Membrane.

Authors: R Fitzhugh
Journal: Biophys J Date: 1961-07 Impact factor: 4.033

3. ON IDENTIFIABILITY OF NONLINEAR ODE MODELS AND APPLICATIONS IN VIRAL DYNAMICS.

Authors: Hongyu Miao; Xiaohua Xia; Alan S Perelson; Hulin Wu
Journal: SIAM Rev Soc Ind Appl Math Date: 2011-01-01 Impact factor: 10.780

4. A Bayesian approach to estimating hidden variables as well as missing and wrong molecular interactions in ordinary differential equation-based mathematical models.

Authors: Benjamin Engelhardt; Maik Kschischo; Holger Fröhlich
Journal: J R Soc Interface Date: 2017-06 Impact factor: 4.118

5. A comparison of approximate versus exact techniques for Bayesian parameter inference in nonlinear ordinary differential equation models.

Authors: Amani A Alahmadi; Jennifer A Flegg; Davis G Cochrane; Christopher C Drovandi; Jonathan M Keith
Journal: R Soc Open Sci Date: 2020-03-11 Impact factor: 2.963

5 in total