
Information sensitivity functions to assess parameter information gain and identifiability of dynamical systems.

Sanjay Pant

Abstract

A new class of functions, called the 'information sensitivity functions' (ISFs), which quantify the information gain about the parameters through the measurements/observables of a dynamical system, is presented. These functions can be easily computed through classical sensitivity functions alone and are based on Bayesian and information-theoretic approaches. While marginal information gain is quantified by the decrease in differential entropy, correlations between arbitrary sets of parameters are assessed through mutual information. For individual parameters, these information gains are also presented as marginal posterior variances and, to assess the effect of correlations, as conditional variances when other parameters are given. The easy-to-interpret ISFs can be used to (a) identify time intervals or regions in dynamical system behaviour where information about the parameters is concentrated; (b) assess the effect of measurement noise on the information gain for the parameters; (c) assess whether sufficient information in an experimental protocol (input, measurements and their frequency) is available to identify the parameters; (d) assess correlation in the posterior distribution of the parameters to identify the sets of parameters that are likely to be indistinguishable; and (e) assess identifiability problems for particular sets of parameters.
© 2018 The Authors.


Keywords:  Bayesian estimation; dynamical systems; identifiability; information theory; sensitivity functions


Year:  2018        PMID: 29769407      PMCID: PMC6000172          DOI: 10.1098/rsif.2017.0871

Source DB:  PubMed          Journal:  J R Soc Interface        ISSN: 1742-5662            Impact factor:   4.118


Introduction

Sensitivity analysis [1] has been widely used to determine how the parameters of a dynamical system influence its outputs. When one or more outputs are measured (observed), it quantifies the variation of the observations with respect to the parameters to determine which parameters are most and least influential towards the measurements. Therefore, when performing an inverse problem of estimating the parameters from the measurements, sensitivity analysis is widely used to fix the least influential parameters (as their effect on the measurements is insignificant and removing them reduces the dimensionality of the inverse problem) while focussing on estimation of the most influential parameters. Sensitivity analysis is also used to assess the question of parameter identifiability, i.e. how easy or difficult it is to identify the parameters from the measurements. This is primarily based on the idea that if the observables are highly sensitive to perturbations in certain parameters then these parameters are likely to be identifiable, and if the observables are insensitive then the parameters are likely to be unidentifiable. However, the magnitude of the sensitivities is hard to interpret, except in the trivial case when the sensitivities are identically zero. Lastly, parameter identifiability based on sensitivity analysis also assesses correlation/dependence between the parameters (through principal component analysis [2], the correlation method [3], the orthogonal method [4] and the eigenvalue method [5]) to identify which pairs of parameters, owing to high correlation, are likely to be indistinguishable from each other (see also [6] and the references therein). Another method to assess correlations is based on the Fisher information matrix [6-8], which can be derived from asymptotic analysis of nonlinear least-squares estimators [9,10].
Ashyraliyev & Blom [11] suggested that a singular value decomposition of the Fisher information matrix can be used to identify linear combinations of parameters that can be well identified given the observables and measurement noise. Another class of methods to assess identifiability, proposed by Raue et al. [12-14], is based on exploiting the curvature of the likelihood function or the flatness of the profile likelihood, i.e. the minimization of the likelihood with respect to all parameters but one. Li & Vu [15,16] proposed that pairwise and higher-order correlations between the parameters may be identified by assessing linear dependencies between the columns of the sensitivity matrix [16] or the matrix of first-order partial derivatives of the state equations [15]. Thomaseth and Cobelli extended the classical sensitivity functions to ‘generalized sensitivity functions’ (GSFs), which assess information gain about the parameters from the measurements. This method has been widely used to assess identifiability of dynamical systems [10,17-20], where regions of high information gain show a sharp increase in the GSFs while oscillations imply correlation with other parameters. There are two drawbacks of GSFs: first, they are designed to start at 0 and end at 1, which leads to the so-called ‘force-to-one’ phenomenon, where even in the absence of information about the parameters the GSFs are forced to end at a value of 1; and second, oscillations in GSFs can be hard to interpret in terms of identifying which sets of parameters are correlated. Based on a pure information-theoretic approach, Pant & Lombardi [21] proposed to compute information gain through a decrease in Shannon entropy, which alleviated the shortcomings of GSFs. However, since their method relies on a Monte Carlo type procedure, the computational effort associated with the computation of information gains can be quite large.
In this article, a novel method which combines the method of Pant & Lombardi [21] with the classical sensitivity functions to compute information gain about the parameters is presented. The new functions are collectively called ‘information sensitivity functions’ (ISFs), which assess parameter information gain through sensitivity functions alone, thereby eliminating the need for Monte Carlo runs. These functions (i) are based on Bayesian/information-theoretic methods and do not rely on asymptotic analysis; (ii) are monotonically non-decreasing and therefore do not oscillate; (iii) can assess regions of high information content for individual parameters; (iv) can assess parameter correlations between an arbitrary set of parameters; (v) can reveal potential problems in identifiability of system parameters; (vi) can assess the effect of experimental protocol on the inverse problem, for example, which outputs are measured, associated measurement noise, and measurement frequency; and (vii) are easily interpretable. In what follows, first the theoretic developments are presented in §§2–8, followed by their application to three different dynamical systems in §9. The three examples are chosen from different areas in mathematical biosciences: (i) a Windkessel model, which is a widely used boundary condition in computational fluid dynamics simulations of haemodynamics; (ii) the Hodgkin–Huxley model for a biological neuron, which has formed the basis for a variety of ionic models describing excitable tissues; and (iii) a kinetics model for the influenza A virus.

The dynamical system and sensitivity equations

Consider the following dynamical system governed by a set of parametrized ordinary differential equations (ODEs):

$$\dot{x} = f(x, \theta, t), \qquad x(t_0) = x_0, \tag{2.1}$$

where $t$ represents time, $x \in \mathbb{R}^q$ is the state vector, $\theta \in \mathbb{R}^r$ is the parameter vector, the function $f$ represents the dynamics and $x_0$ represents the initial condition at time $t_0$. The initial conditions may depend on the parameters, and therefore

$$x_0 = x_0(\theta). \tag{2.2}$$

The above representation subsumes the case where the initial condition may itself be seen as a parameter. The right-hand side of the dynamical system, equation (2.1), can be linearized at a reference point $(\tilde{x}, \tilde{\theta}, t)$ to obtain

$$f(x, \theta, t) \approx f(\tilde{x}, \tilde{\theta}, t) + \nabla_x f(\tilde{x}, \tilde{\theta}, t)\,(x - \tilde{x}) + \nabla_\theta f(\tilde{x}, \tilde{\theta}, t)\,(\theta - \tilde{\theta}), \tag{2.3}$$

where $\widetilde{(\,\cdot\,)}$ represents $(\,\cdot\,)$ evaluated at the reference point. Henceforth, in order to be concise, the explicit dependence of $f(x, \theta, t)$ on its arguments is omitted and $f$, without any arguments, is used to denote $f(x, \theta, t)$. Following this notation, equation (2.3) is concisely written as

$$f \approx \tilde{f} + \widetilde{\nabla_x f}\,(x - \tilde{x}) + \widetilde{\nabla_\theta f}\,(\theta - \tilde{\theta}). \tag{2.4}$$

The above linearization will be used in the next section to study the evolution of the state covariance matrix with time. Let $S \in \mathbb{R}^{q \times r}$ denote the matrix of sensitivity functions for the system in equation (2.1), i.e. $S = \nabla_\theta x$, or

$$S_{ij} = \frac{\partial x_i}{\partial \theta_j}. \tag{2.5}$$

It is well known that $S$ satisfies the following ODE system, which can be obtained by applying the chain rule of differentiation to equation (2.1):

$$\dot{S} = (\nabla_x f)\,S + \nabla_\theta f. \tag{2.6}$$

The goal is to relate the evolution of the sensitivity matrix to the evolution of the covariance of the joint vector of the state and the parameters. Let the subscript $n$ denote all quantities at time $t_n$; for example, $x_n$ denotes the state vector at time $t_n$, $S_n$ the corresponding sensitivity matrix, and so on. To relate the sensitivity matrix $S_{n+1}$ at time $t_{n+1}$ with $S_n$, a first-order discretization of equation (2.6) is considered,

$$S_{n+1} = S_n + \Delta t_n \left[(\nabla_x f)\,S_n + \nabla_\theta f\right], \tag{2.7}$$

and, therefore, $S_{n+1}$ can be written as

$$S_{n+1} = (I + \Delta t_n \nabla_x f)\,S_n + \Delta t_n \nabla_\theta f. \tag{2.8}$$

Next, it is hypothesized that under certain conditions $S_n \Sigma_{\theta\theta} S_n^{\mathsf{T}}$ can be seen as the covariance matrix of the state vector at time $t_n$. These developments are presented in the next two sections.
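As a concrete illustration, the sensitivity ODE of equation (2.6) can be integrated alongside the state. The sketch below uses a hypothetical scalar logistic model, dx/dt = θ₁x(1 − x/θ₂), with a simple forward Euler scheme; the model, initial condition and step size are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch: integrate the state together with its sensitivities for a
# hypothetical logistic model dx/dt = th1*x*(1 - x/th2).

def f(x, th):
    # dynamics f(x, theta)
    return th[0] * x * (1.0 - x / th[1])

def dfdx(x, th):
    # gradient of f with respect to the state x
    return th[0] * (1.0 - 2.0 * x / th[1])

def dfdth(x, th):
    # gradient of f with respect to the parameters (th1, th2)
    return np.array([x * (1.0 - x / th[1]), th[0] * x**2 / th[1]**2])

def integrate_with_sensitivities(x0, th, dt, n_steps):
    """Forward-Euler integration of the state and of dS/dt = (df/dx) S + df/dth."""
    x, S = x0, np.zeros(2)          # S = [dx/dth1, dx/dth2], zero at t0
    for _ in range(n_steps):
        S = S + dt * (dfdx(x, th) * S + dfdth(x, th))   # sensitivity step
        x = x + dt * f(x, th)                           # state step
    return x, S
```

Because the discrete sensitivity recursion is the exact derivative of the Euler state map, S agrees with a finite-difference derivative of the integrated state to high accuracy.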

Forward propagation of uncertainty

As the objective is to study the relationship between the parameters and the state vector, a joint vector of all the state vectors until the current time $t_n$ and the parameter vector is considered. Assume that at time $t_n$ this joint vector $[x_n, x_{n-1}, \ldots, x_0, \theta]$ is distributed according to a multivariate Normal distribution as follows:

$$[x_n, x_{n-1}, \ldots, x_0, \theta] \sim \mathcal{N}(\mu_n, \Sigma_n). \tag{3.1}$$

To obtain the joint distribution of $[x_{n+1}, x_n, \ldots, x_0, \theta]$ (all the state vectors until time $t_{n+1}$ and the parameter vector), the linearized dynamical system, equation (2.4), is used. Considering the reference point $(\tilde{x}, \tilde{\theta}, t)$ in equation (2.4) to be $(\bar{x}_n, \bar{\theta}, t_n)$, i.e. considering the linearization around the mean values of the parameter vector and the state at time $t_n$, one obtains

$$\dot{x} \approx \bar{f} + \overline{\nabla_x f}\,(x - \bar{x}_n) + \overline{\nabla_\theta f}\,(\theta - \bar{\theta}). \tag{3.2}$$

Ignoring the higher-order terms, and employing a forward Euler discretization, one obtains

$$x_{n+1} = x_n + \Delta t_n \left[\bar{f} + \overline{\nabla_x f}\,(x_n - \bar{x}_n) + \overline{\nabla_\theta f}\,(\theta - \bar{\theta})\right]. \tag{3.3}$$

Remark 3.1.

$x_{n+1}$ is completely determined by $x_n$ and $\theta$, i.e. given $x_n$ and $\theta$ nothing more can be learned about $x_{n+1}$. Hence, the forward propagation forms a Markov chain.

Remark 3.2.

$\bar{f}$, $\overline{\nabla_x f}$ and $\overline{\nabla_\theta f}$ are evaluated at $(\bar{x}_n, \bar{\theta}, t_n)$.

Remark 3.3.

In equation (3.1), $\mu_n = [\bar{x}_n, \bar{x}_{n-1}, \ldots, \bar{x}_0, \bar{\theta}]$ and $\Sigma_n$ is the corresponding joint covariance matrix, with blocks $\Sigma_{x_i x_j}$, $\Sigma_{x_i \theta}$ and $\Sigma_{\theta\theta}$. The joint vector $[x_{n+1}, x_n, \ldots, x_0, \theta]$ can be written from equations (3.1) and (3.3) as

$$[x_{n+1}, x_n, \ldots, x_0, \theta] = M_n\,[x_n, x_{n-1}, \ldots, x_0, \theta] + c_n, \tag{3.4}$$

where the first block row of $M_n$ maps $x_n$ and $\theta$ to $x_{n+1}$ through $A_n = I + \Delta t_n \overline{\nabla_x f}$ and $B_n = \Delta t_n \overline{\nabla_\theta f}$, and the remaining rows are identity; $I$ represents an identity matrix of size $q$, $O$ represents a zero matrix of size $q \times r$, and $c_n$ is a term that does not depend on $x_n$ and $\theta$. The distribution of $[x_{n+1}, x_n, \ldots, x_0, \theta]$ can be written from equation (3.4) as

$$[x_{n+1}, x_n, \ldots, x_0, \theta] \sim \mathcal{N}\!\left(M_n \mu_n + c_n,\; M_n \Sigma_n M_n^{\mathsf{T}}\right), \tag{3.5}$$

and the covariance can be expanded as

$$\Sigma_{n+1} = M_n \Sigma_n M_n^{\mathsf{T}}, \tag{3.6}$$

where the blocks of $\Sigma_{n+1}$ involving $x_{n+1}$ are

$$\Sigma_{x_{n+1} x_{n+1}} = \operatorname{cov}\!\left(A_n x_n + B_n \theta\right), \tag{3.8}$$

$$\Sigma_{x_{n+1} \theta} = A_n \Sigma_{x_n \theta} + B_n \Sigma_{\theta\theta} \tag{3.9}$$

and

$$\Sigma_{x_{n+1} x_k} = A_n \Sigma_{x_n x_k} + B_n \Sigma_{\theta x_k}, \qquad k = 0, \ldots, n. \tag{3.10}$$

If the above evolution of the covariance matrix can be related to the evolution of the sensitivity matrix, as presented in §2 and equation (2.8), then the dependencies between the state vector and the parameters can be studied. This concept is developed in the next section.

Relationship between sensitivity and forward propagation of uncertainty

In this section, the relationship between the evolution of the sensitivity matrix and the evolution of the covariance matrix of the joint distribution between all the state vectors until time $t_n$ and the parameters is developed. Equation (3.8) can be expanded, with $A_n = I + \Delta t_n \overline{\nabla_x f}$ and $B_n = \Delta t_n \overline{\nabla_\theta f}$, as follows:

$$\Sigma_{x_{n+1} x_{n+1}} = A_n \Sigma_{x_n x_n} A_n^{\mathsf{T}} + A_n \Sigma_{x_n \theta} B_n^{\mathsf{T}} + B_n \Sigma_{\theta x_n} A_n^{\mathsf{T}} + B_n \Sigma_{\theta\theta} B_n^{\mathsf{T}}. \tag{4.1}$$

Assume the following:

$$\Sigma_{x_n x_n} = S_n \Sigma_{\theta\theta} S_n^{\mathsf{T}}, \tag{4.2}$$

$$\Sigma_{x_n \theta} = S_n \Sigma_{\theta\theta} \tag{4.3}$$

and

$$\Sigma_{x_n x_k} = S_n \Sigma_{\theta\theta} S_k^{\mathsf{T}}, \qquad k = 0, \ldots, n-1. \tag{4.4}$$

Under the above assumptions, it can be deduced from equations (4.1) and (2.8) that

$$\Sigma_{x_{n+1} x_{n+1}} = (A_n S_n + B_n)\,\Sigma_{\theta\theta}\,(A_n S_n + B_n)^{\mathsf{T}} = S_{n+1} \Sigma_{\theta\theta} S_{n+1}^{\mathsf{T}}. \tag{4.5}$$

Furthermore, equation (3.9) reads $\Sigma_{x_{n+1}\theta} = (A_n S_n + B_n)\,\Sigma_{\theta\theta}$, which, as evident from equation (2.7), contains the standard forward propagation of the sensitivity matrix. Hence

$$\Sigma_{x_{n+1} \theta} = S_{n+1} \Sigma_{\theta\theta}. \tag{4.6}$$

Finally, the term $\Sigma_{x_{n+1} x_k}$ from equation (3.10) can be written as

$$\Sigma_{x_{n+1} x_k} = (A_n S_n + B_n)\,\Sigma_{\theta\theta} S_k^{\mathsf{T}} = S_{n+1} \Sigma_{\theta\theta} S_k^{\mathsf{T}}. \tag{4.7}$$

From equations (4.5), (4.6) and (4.7), it can be concluded that if the initial prior uncertainty in $[x_0, \theta]$ is assumed to be Gaussian with covariance

$$\Sigma_0 = \begin{bmatrix} S_0 \Sigma_{\theta\theta} S_0^{\mathsf{T}} & S_0 \Sigma_{\theta\theta} \\ \Sigma_{\theta\theta} S_0^{\mathsf{T}} & \Sigma_{\theta\theta} \end{bmatrix}, \tag{4.8}$$

then the joint vector of $\theta$, the parameters, and $[x_n, x_{n-1}, \ldots, x_0]$, the state vectors corresponding to time instants $[t_n, t_{n-1}, \ldots, t_0]$, can be approximated, by considering only the first-order terms after linearization, to be a Gaussian distribution with the following covariance:

$$\Sigma = \begin{bmatrix} S_n \Sigma_{\theta\theta} S_n^{\mathsf{T}} & \cdots & S_n \Sigma_{\theta\theta} S_0^{\mathsf{T}} & S_n \Sigma_{\theta\theta} \\ \vdots & \ddots & \vdots & \vdots \\ S_0 \Sigma_{\theta\theta} S_n^{\mathsf{T}} & \cdots & S_0 \Sigma_{\theta\theta} S_0^{\mathsf{T}} & S_0 \Sigma_{\theta\theta} \\ \Sigma_{\theta\theta} S_n^{\mathsf{T}} & \cdots & \Sigma_{\theta\theta} S_0^{\mathsf{T}} & \Sigma_{\theta\theta} \end{bmatrix}. \tag{4.9}$$
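The correspondence between covariance propagation and sensitivities can be checked numerically on a toy example. The sketch below uses a hypothetical scalar model dx/dt = −θx (an illustrative assumption, not from the paper): propagating the covariance of [x, θ] through the linearized Euler map reproduces the sensitivity-based expressions cov(xₙ, θ) = Sₙ·var(θ) and var(xₙ) = Sₙ²·var(θ) for a deterministic initial state.

```python
import numpy as np

# Toy check of the covariance/sensitivity correspondence for dx/dt = -theta*x.
dt, n_steps = 0.01, 200
xbar, thbar, sig_th2 = 1.0, 0.5, 0.04        # illustrative values

P = np.array([[0.0, 0.0], [0.0, sig_th2]])   # covariance of [x0, theta]
S = 0.0                                      # sensitivity dx/dtheta at t0
for _ in range(n_steps):
    fx, fth = -thbar, -xbar                  # grad_x f and grad_theta f at the mean
    A = np.array([[1.0 + dt * fx, dt * fth],
                  [0.0, 1.0]])               # linearized Euler map on [x, theta]
    P = A @ P @ A.T                          # covariance propagation
    S = (1.0 + dt * fx) * S + dt * fth       # Euler step of the sensitivity ODE
    xbar = xbar + dt * (-thbar * xbar)       # propagate the mean state

assert np.isclose(P[0, 1], S * sig_th2)      # cross-covariance = S * var(theta)
assert np.isclose(P[0, 0], S**2 * sig_th2)   # state variance = S^2 * var(theta)
```

The two asserts hold to machine precision because both recursions are driven by the same linearization at the propagated mean.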

Remark 4.1.

Note that a prior mean for the vector $[x_0, \theta]$ is assumed, based on which the mean of the state vector will propagate according to equation (3.3), essentially according to the forward Euler method. While this propagated mean does not directly influence the posterior uncertainty of the parameters, which depends only on the covariance matrix, it is important to note that the sensitivity terms in the covariance matrix of equation (4.9) are evaluated at the propagated means. The propagated mean of the joint vector $[x_n, x_{n-1}, \ldots, x_0, \theta]$ is referred to throughout this manuscript as $\mu_n = [\bar{x}_n, \bar{x}_{n-1}, \ldots, \bar{x}_0, \bar{\theta}]$, cf. equation (3.1).

Remark 4.2.

The required conditions, presented in equations (4.2), (4.3) and (4.4), can also be derived without temporal discretization of the sensitivity and linearized forward models. This is presented in appendix A, which derives a differential equation describing the evolution of the joint covariance matrix and arrives at the same conditions without temporal discretization. Even though the method presented in appendix A may be considered more general, the author first conceived the idea using the arguments shown above, and hence these ideas are presented in the main text.

Measurements (observations)

Having established how the covariance of the state and the parameters evolves in relation to the sensitivity matrix, the next task is to extend this framework to include the measurements. Eventually, one wants to obtain an expression for the joint distribution of the measurements and the parameters, so that conditioning this joint distribution on the measurements (implying that measurements are known) will yield information about how much can be learned about the parameters. Consider a linear observation operator whereby $y_n \in \mathbb{R}^m$ is measured at time $t_n$ according to

$$y_n = H_n x_n + \varepsilon_n, \tag{5.1}$$

where $H_n$ is the observation operator at time $t_n$ and $\varepsilon_n$ is the measurement noise. Let $\varepsilon_n$ be independently (across all measurement times) distributed as

$$\varepsilon_n \sim \mathcal{N}(O, \Gamma_n), \tag{5.2}$$

where $O$ is a zero vector and $\Gamma_n$ is the covariance structure of the noise. From equations (4.9) and (5.1), it is easy to see that $[y_n, y_{n-1}, \ldots, y_0, \theta]$ follows a Gaussian distribution with the following mean and covariance:

$$[y_n, y_{n-1}, \ldots, y_0, \theta] \sim \mathcal{N}\!\left([H_n \bar{x}_n, \ldots, H_0 \bar{x}_0, \bar{\theta}],\; \Sigma^{(y)}\right), \tag{5.3}$$

where

$$\Sigma^{(y)}_{y_i y_j} = H_i S_i \Sigma_{\theta\theta} S_j^{\mathsf{T}} H_j^{\mathsf{T}} + \delta_{ij} \Gamma_i \tag{5.4}$$

and

$$\Sigma^{(y)}_{y_i \theta} = H_i S_i \Sigma_{\theta\theta}. \tag{5.5}$$

Remark 5.1.

A nonlinear observation operator $h(x_n)$ in equation (5.1), as opposed to the linear operator $H_n$, does not present any technical challenges to the formulation, as it can be linearized at the current mean values. Following this, in equations (5.4) and (5.5), $H_n$ would need to be replaced by the tangent operator $\nabla_x h$.
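The covariance blocks of equations (5.4) and (5.5) can be assembled directly from the sensitivity matrices. The sketch below builds the joint covariance of [y₁, …, y_N, θ] under the simplifying assumption of a deterministic initial state (so xₙ − x̄ₙ ≈ Sₙ(θ − θ̄)); all names, shapes and values are illustrative.

```python
import numpy as np

def joint_measurement_param_cov(S_list, H, Gamma, Sigma_theta):
    """Assemble the covariance of [y_1, ..., y_N, theta]:
    cov(y_i, y_j) = H S_i Sigma_theta S_j^T H^T + delta_ij Gamma,
    cov(y_i, theta) = H S_i Sigma_theta."""
    N, m, r = len(S_list), H.shape[0], Sigma_theta.shape[0]
    C = np.zeros((N * m + r, N * m + r))
    G = [H @ S for S in S_list]                       # observed sensitivities H S_n
    for i in range(N):
        for j in range(N):
            C[i*m:(i+1)*m, j*m:(j+1)*m] = G[i] @ Sigma_theta @ G[j].T
        C[i*m:(i+1)*m, i*m:(i+1)*m] += Gamma          # independent noise per time
        C[i*m:(i+1)*m, N*m:] = G[i] @ Sigma_theta     # cov(y_i, theta)
        C[N*m:, i*m:(i+1)*m] = (G[i] @ Sigma_theta).T
    C[N*m:, N*m:] = Sigma_theta                       # prior parameter block
    return C
```

The result is symmetric and positive semi-definite by construction, since it equals J Σθθ Jᵀ plus a block-diagonal noise term, with J stacking the observed sensitivities over the identity.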

Conditional distribution of the parameters

The quantity of interest is the conditional distribution of the parameters, i.e. how the beliefs about the parameters change from the prior beliefs to the posterior beliefs (the conditional distribution) upon the measurements. The covariance of the conditional distribution is of more interest than its mean, for two reasons: (i) owing to the Gaussian approximations, the covariance entirely reflects the amount of uncertainty in the parameters; and (ii) while the mean of the conditional distribution depends on the measurements, the covariance does not. The latter is significant because, a priori, the measurement values are not known. Consequently, the average (over all possible measurements) uncertainty in the parameters too is independent of the measurements, and hence can be studied a priori. Let $z = [y_n, y_{n-1}, \ldots, y_0]$ denote the collection of all measurements up to time $t_n$. From equation (5.3), since the joint distribution of the parameter vector and the observables is Gaussian, the conditional distribution of the parameter vector given the measurements is also Gaussian and can be written as

$$\theta \mid z \sim \mathcal{N}\!\left(\mu_{\theta|z}, \Sigma_{\theta|z}\right), \tag{6.1}$$

with

$$\mu_{\theta|z} = \bar{\theta} + \Sigma_{\theta z} \Sigma_{zz}^{-1} (z^{o} - \bar{z}) \tag{6.2}$$

and

$$\Sigma_{\theta|z} = \Sigma_{\theta\theta} - \Sigma_{\theta z} \Sigma_{zz}^{-1} \Sigma_{z\theta}, \tag{6.3}$$

where $z^{o}$ denotes the measurement values (the realization of the random variable $z$ observed). Note that the conditional covariance is independent of these measurement values $z^{o}$. Furthermore, since the uncertainty in a Gaussian random variable, quantified by the differential entropy, depends only on the covariance matrix, the posterior uncertainty does not depend on the measurements.
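Gaussian conditioning of this kind reduces to a Schur complement, which a short sketch makes concrete; the function and index names are illustrative.

```python
import numpy as np

def condition_gaussian(cov, obs_idx, par_idx):
    """Posterior covariance of the parameter block given the observed block,
    via the Schur complement; it depends on which entries are observed but
    not on their observed values."""
    A = cov[np.ix_(par_idx, par_idx)]        # prior parameter block
    B = cov[np.ix_(par_idx, obs_idx)]        # parameter-observation cross block
    D = cov[np.ix_(obs_idx, obs_idx)]        # observation block
    return A - B @ np.linalg.solve(D, B.T)   # Schur complement
```

For example, with cov = [[2, 1], [1, 1]] and the second entry observed, the posterior variance of the first entry is 2 − 1·1⁻¹·1 = 1.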

Conditional covariance when n → ∞ and when n is finite

For the asymptotic case when $n \to \infty$, it can be shown that (for proof see appendix B)

$$\Sigma_{\theta|z} \to \mathcal{F}^{-1}, \tag{7.1}$$

where $\mathcal{F}$ is the Fisher information matrix defined as

$$\mathcal{F} = \sum_{k=0}^{n} S_k^{\mathsf{T}} H_k^{\mathsf{T}} \Gamma_k^{-1} H_k S_k. \tag{7.2}$$

Furthermore, for finite $n$, the conditional covariance can be written as (for proof see appendix C)

$$\Sigma_{\theta|z} = \left(\Sigma_{\theta\theta}^{-1} + \mathcal{F}\right)^{-1}. \tag{7.3}$$

Remark 7.1.

Equations (7.1) and (7.3) relate to the classical and Bayesian Cramér-Rao bounds [22,23], respectively, in estimation theory.
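The conditioning (Schur-complement) form of the posterior covariance and the information form of equation (7.3) are algebraically identical, by the Woodbury matrix identity. A sketch under illustrative assumptions (random stand-ins for the stacked observed sensitivities HₙSₙ and a unit-variance prior):

```python
import numpy as np

# Check that the Schur-complement form and the information (Fisher) form of
# the posterior parameter covariance agree; all values are random placeholders.
rng = np.random.default_rng(0)
r, m = 3, 5
S = rng.normal(size=(m, r))                    # stacked observed sensitivities
Gam = np.diag(rng.uniform(0.5, 2.0, size=m))   # measurement-noise covariance
Sig = np.eye(r)                                # unit-variance prior on theta

# conditioning form: Sig - Sig S^T (S Sig S^T + Gam)^-1 S Sig
Syy = S @ Sig @ S.T + Gam
schur = Sig - Sig @ S.T @ np.linalg.solve(Syy, S @ Sig)

# information form: (Sig^-1 + S^T Gam^-1 S)^-1
info = np.linalg.inv(np.linalg.inv(Sig) + S.T @ np.linalg.inv(Gam) @ S)

assert np.allclose(schur, info)
```

The information form is usually preferable in practice: it inverts an r × r matrix rather than one whose size grows with the number of measurements.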

Information gain

In this section, the gain in information about the parameters by the measurements is considered. For details of such an information-theoretic approach the reader is referred to [21]. The gain in information about the parameter vector $\theta$ by the measurements $z = [y_n, y_{n-1}, \ldots, y_0]$ is given by the mutual information between $\theta$ and $z$, which is equal to the difference between the differential entropies of the prior distribution $p(\theta)$ and the conditional distribution $p(\theta|z)$. From equations (4.8), (6.1) and (6.3), this gain in information can be written as

$$\mathcal{I}(\theta; z) = \frac{1}{2} \log \frac{\det \Sigma_{\theta\theta}}{\det \Sigma_{\theta|z}}, \tag{8.1}$$

where $\det$ denotes the determinant. The above can be expanded through equation (7.3) as

$$\mathcal{I}(\theta; z) = \frac{1}{2} \log \det\!\left(I + \Sigma_{\theta\theta} \mathcal{F}\right). \tag{8.2}$$

Note that the above represents the information gain for the joint vector of all the parameters. Commonly, one is interested in individual parameters, for which the information gain is now presented. Let $\theta_{\mathcal{A}}$ denote the vector of a subset of parameters indexed by the elements of set $\mathcal{A}$ and $\theta_{\mathcal{A}^c}$ denote the vector of the remaining parameters, the complement of set $\mathcal{A}$. Hence, $\theta_{\{i\}}$ denotes the $i$th parameter, $\theta_{\{i,j\}}$ denotes the vector formed by taking the $i$th and $j$th parameters, and so on. The inverse of the conditional covariance matrix, $\Lambda = \Sigma_{\theta|z}^{-1} = \Sigma_{\theta\theta}^{-1} + \mathcal{F}$, can be decomposed into the components of $\theta_{\mathcal{A}}$ and $\theta_{\mathcal{A}^c}$ as

$$\Lambda = \begin{bmatrix} \Lambda_{\mathcal{A}\mathcal{A}} & \Lambda_{\mathcal{A}\mathcal{A}^c} \\ \Lambda_{\mathcal{A}^c\mathcal{A}} & \Lambda_{\mathcal{A}^c\mathcal{A}^c} \end{bmatrix} \tag{8.3}$$

and

$$\Lambda_{\mathcal{A}\mathcal{A}} = \left(\Sigma_{\theta\theta}^{-1}\right)_{\mathcal{A}\mathcal{A}} + \sum_{k=0}^{n} S_{\mathcal{A},k}^{\mathsf{T}} H_k^{\mathsf{T}} \Gamma_k^{-1} H_k S_{\mathcal{A},k}, \tag{8.4}$$

and similarly for the other blocks, where $\Lambda_{\mathcal{A}\mathcal{A}}$ is of size $|\mathcal{A}| \times |\mathcal{A}|$, $|\mathcal{A}|$ being the cardinality of $\mathcal{A}$, and $S_{\mathcal{A}}$ and $S_{\mathcal{A}^c}$ are the sensitivity matrices for $\theta_{\mathcal{A}}$ and $\theta_{\mathcal{A}^c}$, respectively, i.e. $S_{\mathcal{A}} = \nabla_{\theta_{\mathcal{A}}} x$ and $S_{\mathcal{A}^c} = \nabla_{\theta_{\mathcal{A}^c}} x$. Given the above decomposition, the marginal covariance of $\theta_{\mathcal{A}}$ given the measurements can be written through the Schur complement of $\Lambda_{\mathcal{A}^c\mathcal{A}^c}$ in $\Lambda$ as follows:

$$\Sigma_{\mathcal{A}|z} = \left(\Lambda_{\mathcal{A}\mathcal{A}} - \Lambda_{\mathcal{A}\mathcal{A}^c} \Lambda_{\mathcal{A}^c\mathcal{A}^c}^{-1} \Lambda_{\mathcal{A}^c\mathcal{A}}\right)^{-1}, \tag{8.5}$$

and the information gain as

$$\mathcal{I}(\theta_{\mathcal{A}}; z) = \frac{1}{2} \log \frac{\det \Sigma_{\mathcal{A}\mathcal{A}}}{\det \Sigma_{\mathcal{A}|z}}, \tag{8.6}$$

where $\Sigma_{\mathcal{A}\mathcal{A}}$ is the prior covariance of $\theta_{\mathcal{A}}$. Another quantity of interest is the correlation between two subsets of parameters $\theta_{\mathcal{A}}$ and $\theta_{\mathcal{B}}$. In an information-theoretic context this can be assessed by how much more information would be gained about the parameters $\theta_{\mathcal{A}}$, in addition to $z$, if $\theta_{\mathcal{B}}$ were also known, i.e. the mutual information between $\theta_{\mathcal{A}}$ and $\theta_{\mathcal{B}}$ given the measurements.
Similar to the procedure employed in equation (8.4), by splitting the inverse conditional covariance $\Lambda = \Sigma_{\theta|z}^{-1}$ into three components for $\theta_{\mathcal{A}}$, $\theta_{\mathcal{B}}$ and $\theta_{\mathcal{C}}$, with $\mathcal{C} = (\mathcal{A} \cup \mathcal{B})^c$, one can write the conditional covariance of the parameters $\theta_{\mathcal{A}}$ given the measurements and, additionally, the parameters $\theta_{\mathcal{B}}$ as follows:

$$\Sigma_{\mathcal{A}|z,\mathcal{B}} = \left(\Lambda_{\mathcal{A}\mathcal{A}} - \Lambda_{\mathcal{A}\mathcal{C}} \Lambda_{\mathcal{C}\mathcal{C}}^{-1} \Lambda_{\mathcal{C}\mathcal{A}}\right)^{-1}, \tag{8.7}$$

where $\Lambda_{\mathcal{B}\mathcal{B}}$ is of size $|\mathcal{B}| \times |\mathcal{B}|$, $|\mathcal{B}|$ being the cardinality of $\mathcal{B}$, and

$$\Lambda = \begin{bmatrix} \Lambda_{\mathcal{A}\mathcal{A}} & \Lambda_{\mathcal{A}\mathcal{B}} & \Lambda_{\mathcal{A}\mathcal{C}} \\ \Lambda_{\mathcal{B}\mathcal{A}} & \Lambda_{\mathcal{B}\mathcal{B}} & \Lambda_{\mathcal{B}\mathcal{C}} \\ \Lambda_{\mathcal{C}\mathcal{A}} & \Lambda_{\mathcal{C}\mathcal{B}} & \Lambda_{\mathcal{C}\mathcal{C}} \end{bmatrix}. \tag{8.8}$$

The information gain about the parameters $\theta_{\mathcal{A}}$ given both the measurements and the parameters $\theta_{\mathcal{B}}$ is

$$\mathcal{I}(\theta_{\mathcal{A}}; z, \theta_{\mathcal{B}}) = \frac{1}{2} \log \frac{\det \Sigma_{\mathcal{A}\mathcal{A}}}{\det \Sigma_{\mathcal{A}|z,\mathcal{B}}}. \tag{8.10}$$

Lastly, the conditional mutual information (CMI), i.e. the additional (after the measurements are known) information gained about the parameters $\theta_{\mathcal{A}}$ due to the knowledge of $\theta_{\mathcal{B}}$, is

$$\mathcal{I}(\theta_{\mathcal{A}}; \theta_{\mathcal{B}} \mid z) = \mathcal{I}(\theta_{\mathcal{A}}; z, \theta_{\mathcal{B}}) - \mathcal{I}(\theta_{\mathcal{A}}; z). \tag{8.11}$$

Remark 8.1.

$\mathcal{I}(\theta_{\mathcal{A}}; z)$ is the gain in information about the parameters $\theta_{\mathcal{A}}$ given the measurements $z$ and when nothing is known about the parameters $\theta_{\mathcal{A}^c}$.

Remark 8.2.

$\mathcal{I}(\theta_{\mathcal{A}}; z, \theta_{\mathcal{B}})$ is the gain in information about the parameters $\theta_{\mathcal{A}}$ given the measurements $z$ and the parameters $\theta_{\mathcal{B}}$, when nothing is known about the remaining parameters $\theta_{(\mathcal{A}\cup\mathcal{B})^c}$.

Remark 8.3.

In [21], the authors suggested a method to interpret the information gains when the set contained a single parameter by proposing a hypothetical measurement device. This is not necessary in the current formulation, as all the distributions are approximated to be Gaussian. Therefore, when $\mathcal{A}$ contains only a single parameter, the conditional covariances $\Sigma_{\mathcal{A}|z}$ and $\Sigma_{\mathcal{A}|z,\mathcal{B}}$ are scalar quantities representing the posterior variances of that parameter. When $\mathcal{A}$ contains more than one parameter, the quantities $\mathcal{I}(\theta_{\mathcal{A}}; z)$ and $\mathcal{I}(\theta_{\mathcal{A}}; z, \theta_{\mathcal{B}})$ are scalars that quantify the gains in information. The functions for information gains (and associated variances) developed above are collectively referred to as ‘ISFs’. From this point onwards, the term marginal posterior variance (or just marginal variance) for a parameter subset refers to the variance conditioned only on the measurements, equation (8.5), and the corresponding information gain, equation (8.6), is referred to as the marginal information gain. Similarly, the term conditional variance refers to the variance when the measurements and, additionally, a parameter subset are given, equation (8.7), and the corresponding information gain is referred to as the conditional information gain, equation (8.10). Lastly, the information shared between two subsets of parameters given the measurements, equation (8.11), is referred to as the conditional mutual information or just the mutual information. Finally, the vector $z = [y_n, y_{n-1}, \ldots, y_0]$ is used to denote a collection of all measurement vectors up to time $t_n$.
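The entropy-based quantities above are easily computed from covariance determinants. A minimal sketch (function names and indices are illustrative):

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance cov."""
    cov = np.atleast_2d(cov)
    k = cov.shape[0]
    return 0.5 * (k * np.log(2.0 * np.pi * np.e) + np.log(np.linalg.det(cov)))

def info_gain(prior_cov, post_cov):
    # I(theta; z) = H(prior) - H(posterior); the 2*pi*e constants cancel
    return gaussian_entropy(prior_cov) - gaussian_entropy(post_cov)

def conditional_mutual_info(post_cov, A, B):
    # I(theta_A; theta_B | z) = H(theta_A|z) + H(theta_B|z) - H(theta_A, theta_B|z)
    AB = list(A) + list(B)
    return (gaussian_entropy(post_cov[np.ix_(A, A)])
            + gaussian_entropy(post_cov[np.ix_(B, B)])
            - gaussian_entropy(post_cov[np.ix_(AB, AB)]))
```

For two scalar parameters with posterior correlation ρ, the CMI reduces to −½ log(1 − ρ²), so strongly correlated pairs share a large amount of information.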

Results and discussion

In this section, the theory developed above is applied to study three dynamical systems.

Three-element Windkessel model

Windkessel models are widely used to describe arterial haemodynamics [24]. Increasingly, they are also being used as boundary conditions in three-dimensional computational fluid dynamics simulations to assess patient-specific behaviour [20,25]. To perform patient-specific analysis, it is imperative that the parameters of the Windkessel model are estimated from measurements taken in each patient individually. A three-element Windkessel model is shown in figure 1a and consists of three parameters: Rp (proximal resistance), which represents the hydraulic resistance of large vessels; C (capacitance), which represents the compliance of large vessels; and Rd (distal resistance), which represents the resistance of small vessels in the microcirculation. Note that these models use the electric analogy to fluid flow, where pressure P is seen as voltage and flow-rate q is seen as electric current. Typically, the inlet flow-rate qi is measured (via magnetic resonance imaging or Doppler ultrasound) and the inlet pressure Pi is measured by pressure catheters. The goal then is to estimate the parameters (Rp, C and Rd) by assuming qi is deterministically known and minimizing the difference between the Pi reproduced by the model and the Pi that was measured. The model dynamics is described by the following differential algebraic equations, which may also be rewritten as a single ODE:

$$C\,\frac{\mathrm{d}P}{\mathrm{d}t} = q_i - q_o, \qquad q_o = \frac{P - P_{\mathrm{ven}}}{R_d}, \qquad P_i - P = R_p\, q_i, \tag{9.1}$$

where Pi and P are the inlet and mid-Windkessel pressures, respectively (figure 1a); Pext and Pven are the reference external and venous pressures, respectively, which are both set to zero; and qi and qo are the inlet and outlet flow-rates, respectively. The measurement model is written as follows:

$$y_n = P_i(t_n) + \varepsilon_n, \tag{9.2}$$

where εn is the noise (normally distributed with zero mean and variance σ²noise) in measuring Pi to give the measurement yn at time tn. The measurement vector, therefore, has only one component, yn = [yn]. The nominal values of Rp, C and Rd are 0.838 mmHg · s cm−3, 0.0424 cm3 mmHg−1 and 9.109 mmHg · s cm−3, respectively.
Note that these units are chosen so that the results are comprehensible in typical clinical units: millilitres for volume and millimetres of mercury for pressure. Figure 1b shows the inlet flow-rate qi (taken from [20,26], where it was measured in the carotid artery of a healthy 27-year-old subject), and the resulting pressure curves obtained by the solution of equation (9.1) with Pi0 = 85 mmHg and nominal parameter values. To put a zero-mean and unit-variance prior on the parameters, see equation (4.8), the following parameter transformation is considered:

$$\xi = \xi_0 + \varsigma\,\theta, \tag{9.3}$$

where ξ represents the real parameter, ξ0 and ς are the transformation parameters, and θ represents the transformed parameter, on which a prior of zero mean and unit variance is considered. Therefore, the prior considered on the real parameter ξ has mean ξ0 and variance ς². The posterior variances of the transformed parameter θ and the real parameter ξ are represented by σ²θ and σ²ξ, respectively. A total of 150 time-points, evenly distributed between t = 0 s and t = Tc (where Tc = 0.75 s is the time period of the cardiac cycle), are used for the computation of ISFs and conditional variances.
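A minimal forward-Euler sketch of the three-element Windkessel of equation (9.1), with Pven = 0, integrating the mid-Windkessel pressure P and returning Pi = P + Rp·qi. The initial condition P(0) = P0 − Rp·qi[0] (so that Pi(0) = P0) is an illustrative choice, not taken from the paper.

```python
import numpy as np

def windkessel_pi(qi, dt, Rp=0.838, C=0.0424, Rd=9.109, P0=85.0):
    """Inlet pressure Pi for a sampled inlet flow-rate qi (nominal parameters)."""
    P = P0 - Rp * qi[0]                  # illustrative initial condition
    Pi = np.empty(len(qi))
    for n, q in enumerate(qi):
        Pi[n] = P + Rp * q               # inlet pressure: Pi - P = Rp*qi
        P = P + dt * (q - P / Rd) / C    # C dP/dt = qi - P/Rd (Pven = 0)
    return Pi
```

As a sanity check, with a constant inflow the solution relaxes (with time constant Rd·C) to the steady state Pi = qi (Rp + Rd).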
Figure 1.

The three-element Windkessel model, flow-rate curve used, and pressure solutions. (a) Schematic of a three-element Windkessel model and (b) inlet flow-rate curve and Windkessel pressure solution, see equation (9.1), with nominal parameter values.

Figure 2 shows the marginal posterior variances (conditional only on the measurements, (a–c)) and the corresponding information gains (d–f) for individual parameters at four different levels of measurement noise. The conditional variances when all measurements are taken into account, i.e. at t = Tc, are also summarized in table 1. An immediate utility of figure 2 is in identifying intervals of time where information about a parameter is concentrated. For example, from the first column it is clear that most of the information about the parameter corresponding to Rp is concentrated in the interval t ∈ [0.3, 0.4], as this is the interval that shows the maximum reduction in the marginal variance and the highest information gain. This interval corresponds to the rising peak of the inlet flow-rate curve, see figure 1b, and from equation (9.1) it is clear that Rp should have most effect on the pressure Pi in this interval. For the parameter corresponding to C, it appears from figure 2 that while information is available in the entire cardiac cycle, a larger amount of information is concentrated in the later half of the cycle, t ∈ [0.4, 0.75]. For Rd, information is available throughout the cardiac cycle. These observations have also been presented in [20] through the computation of GSFs [27] and in [21] through a Monte Carlo type computation of information gain. However, as opposed to GSFs, which can be non-monotonic and therefore hard to interpret, the ISFs are always monotonic. Furthermore, since the GSFs are normalized by design, they are forced to start at 0 and end at 1, thereby making the assessment of measurement noise difficult. On the other hand, the effect of measurement noise is inherently built into the ISFs.
Figure 2 quantifies how increasing measurement noise results in a decreasing amount of information gained about the parameters. While this behaviour is intuitively expected, its quantification with respect to each individual parameter is made possible with the proposed method. For example, while at σ²noise = 100.0 mmHg² the conditional variance of the (transformed) parameter corresponding to Rp after considering all the measurements is 0.158 square units, at σ²noise = 4900.0 mmHg² this conditional variance is 0.887 square units. Comparing this to the prior variance of 1.0 square units, one may conclude that at a measurement noise of 4900.0 mmHg² (standard deviation of 70.0 mmHg) the parameter Rp is extremely difficult to identify relative to when the measurement noise is 100.0 mmHg² (standard deviation of 10.0 mmHg). A similar argument can be made for the parameter C, even though its identifiability is better than that of Rp (C has a posterior variance of 0.672 square units at a measurement noise of σ²noise = 4900.0 mmHg²). However, the parameter Rd appears to be well identifiable even at σ²noise = 4900.0 mmHg², with a final posterior variance of 0.07 square units. This behaviour can be explained by the fact that the measurement noise is assumed to be independent and identically distributed with zero mean at all measurement times. Therefore, the mean pressure is measured much more precisely than the individual pressure measurements, irrespective of the noise level, as when the mean/expectation of equation (9.2) is taken, the expectation of the noise component is zero:

$$\mathbb{E}[y_n] = \mathbb{E}[P_i(t_n)], \tag{9.4}$$

where $\mathbb{E}$ denotes the expectation operator. From equation (9.1) and figure 1a, the mean inlet pressure is equal to the mean inlet flow-rate times the sum of both resistances, i.e. $\bar{P}_i = \bar{q}_i\,(R_p + R_d)$. Approximating $\bar{P}_i$ by the sample mean of the measurements, one obtains

$$\frac{1}{N}\sum_{n=1}^{N} y_n \approx \bar{q}_i\,(R_p + R_d). \tag{9.5}$$
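The mean-pressure relation can be verified numerically: at periodic steady state the cycle-averaged inlet pressure equals the cycle-averaged inlet flow-rate times (Rp + Rd). The sinusoidal flow below is an illustrative stand-in for the measured carotid waveform.

```python
import numpy as np

# Cycle-averaged check of the mean-pressure relation for the three-element
# Windkessel (nominal parameters, illustrative sinusoidal inflow).
Rp, C, Rd = 0.838, 0.0424, 9.109
Tc, dt, n_cycles = 0.75, 0.0005, 20
n_per = round(Tc / dt)                   # samples per cardiac cycle
t = np.arange(n_cycles * n_per) * dt
qi = 5.0 + 4.0 * np.sin(2.0 * np.pi * t / Tc)

P, Pi = 85.0, np.empty(len(t))
for n, q in enumerate(qi):
    Pi[n] = P + Rp * q                   # inlet pressure
    P = P + dt * (q - P / Rd) / C        # C dP/dt = qi - P/Rd

# average over the last (periodic) cycle: mean(Pi) = mean(qi) * (Rp + Rd)
assert abs(Pi[-n_per:].mean() - qi[-n_per:].mean() * (Rp + Rd)) < 0.5
```

The transient decays with time constant Rd·C ≈ 0.39 s, so after 20 cycles the solution is effectively periodic and the relation holds to well within 1%.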
Figure 2.

Marginal posterior variances (a–c) and marginal information gains (d–f) for the three Windkessel model parameters at four different levels of measurement noise.

Table 1.

Prior and posterior variances (marginal and conditional) for the Windkessel model.

parameter | prior mean μθ | prior var. σ²θ | prior mean μξ = ξ0 | prior var. σ²ξ = ς² | post. var. σ²θ | post. var. σ²ξ | post. std./prior mean (σξ/ξ0)

observation noise, σ²noise = 100.0 mmHg²
Rp | 0 | 1.0 | 8.40 × 10^−1 | 1.60 × 10^−1 | 1.58 × 10^−1 | 2.53 × 10^−2 | 18.9%
Rp|C | | | | | 1.41 × 10^−1 | 2.25 × 10^−2 | 17.9%
Rp|Rd | | | | | 8.13 × 10^−2 | 1.30 × 10^−2 | 13.6%
C | 0 | 1.0 | 4.00 × 10^−2 | 4.00 × 10^−4 | 4.45 × 10^−2 | 1.78 × 10^−5 | 10.5%
C|Rp | | | | | 3.95 × 10^−2 | 1.58 × 10^−5 | 9.9%
C|Rd | | | | | 4.36 × 10^−2 | 1.74 × 10^−5 | 10.4%
Rd | 0 | 1.0 | 9.11 | 2.03 × 10^1 | 2.73 × 10^−3 | 5.52 × 10^−2 | 2.6%
Rd|Rp | | | | | 1.40 × 10^−3 | 2.84 × 10^−2 | 1.8%
Rd|C | | | | | 2.67 × 10^−3 | 5.41 × 10^−2 | 2.6%

observation noise, σ²noise = 625.0 mmHg²
Rp | 0 | 1.0 | 8.40 × 10^−1 | 1.60 × 10^−1 | 5.32 × 10^−1 | 8.51 × 10^−2 | 34.7%
Rp|C | | | | | 5.04 × 10^−1 | 8.06 × 10^−2 | 33.8%
Rp|Rd | | | | | 3.51 × 10^−1 | 5.61 × 10^−2 | 28.2%
C | 0 | 1.0 | 4.00 × 10^−2 | 4.00 × 10^−4 | 2.16 × 10^−1 | 8.64 × 10^−5 | 23.2%
C|Rp | | | | | 2.04 × 10^−1 | 8.18 × 10^−5 | 22.6%
C|Rd | | | | | 2.16 × 10^−1 | 8.63 × 10^−5 | 23.2%
Rd | 0 | 1.0 | 9.11 | 2.03 × 10^1 | 1.31 × 10^−2 | 2.66 × 10^−1 | 5.7%
Rd|Rp | | | | | 8.66 × 10^−3 | 1.75 × 10^−1 | 4.6%
Rd|C | | | | | 1.31 × 10^−2 | 2.65 × 10^−1 | 5.7%

observation noise, σ²noise = 2500.0 mmHg²
Rp | 0 | 1.0 | 8.40 × 10^−1 | 1.60 × 10^−1 | 8.09 × 10^−1 | 1.29 × 10^−1 | 42.8%
Rp|C | | | | | 7.98 × 10^−1 | 1.28 × 10^−1 | 42.5%
Rp|Rd | | | | | 6.75 × 10^−1 | 1.08 × 10^−1 | 39.1%
C | 0 | 1.0 | 4.00 × 10^−2 | 4.00 × 10^−4 | 5.14 × 10^−1 | 2.05 × 10^−4 | 35.8%
C|Rp | | | | | 5.07 × 10^−1 | 2.03 × 10^−4 | 35.6%
C|Rd | | | | | 5.13 × 10^−1 | 2.05 × 10^−4 | 35.8%
Rd | 0 | 1.0 | 9.11 | 2.03 × 10^1 | 4.02 × 10^−2 | 8.15 × 10^−1 | 9.9%
Rd|Rp | | | | | 3.36 × 10^−2 | 6.80 × 10^−1 | 9.0%
Rd|C | | | | | 4.02 × 10^−2 | 8.13 × 10^−1 | 9.9%

observation noise, σ²noise = 4900.0 mmHg²
Rp | 0 | 1.0 | 8.40 × 10^−1 | 1.60 × 10^−1 | 8.87 × 10^−1 | 1.42 × 10^−1 | 44.8%
Rp|C | | | | | 8.82 × 10^−1 | 1.41 × 10^−1 | 44.7%
Rp|Rd | | | | | 7.99 × 10^−1 | 1.28 × 10^−1 | 42.6%
C | 0 | 1.0 | 4.00 × 10^−2 | 4.00 × 10^−4 | 6.72 × 10^−1 | 2.69 × 10^−4 | 41.0%
C|Rp | | | | | 6.68 × 10^−1 | 2.67 × 10^−4 | 40.9%
C|Rd | | | | | 6.70 × 10^−1 | 2.68 × 10^−4 | 40.9%
Rd | 0 | 1.0 | 9.11 | 2.03 × 10^1 | 7.05 × 10^−2 | 1.43 | 13.1%
Rd|Rp | | | | | 6.35 × 10^−2 | 1.29 | 12.5%
Rd|C | | | | | 7.03 × 10^−2 | 1.42 | 13.1%
As qi is assumed deterministic, it can be seen from equation (9.5) that the sum Rp + Rd is indirectly measured with high precision. As Rd is approximately an order of magnitude larger than Rp, it is natural that Rd dominates the sum (Rp + Rd) and hence, irrespective of the noise levels, a large amount of information is obtained about Rd (figure 2c,f). The orders of magnitude of the resistances are dictated by the physics of the circulation, where the resistance of the small vessels and the microcirculation is significantly higher than that of the large vessels [20,26], and this is reflected in the chosen priors for the problem. Equation (9.5) and the arguments presented above imply that a significant amount of correlation must have built up between the parameters Rp and Rd in the posterior distribution, as the sum (Rp + Rd) is measured with high precision. This correlation determines how much additional information could be gained about one of the parameters Rp or Rd if the other were known. The CMI presented in equation (8.11) precisely measures this additional information. CMIs for all three pairs of parameters are shown in figure 3. It is clear that, at the end of the cardiac cycle, the largest CMI is for the parameter pair corresponding to Rp and Rd. It is sensible to compare the magnitude of the CMIs with the marginal information gains (figure 2). For example, for the case of σ²noise = 100.0, the marginal gain in information about the parameter Rp is approximately 0.9 nats and the mutual information between Rp and Rd is 0.35 nats; therefore, one may conclude that approximately 40% extra information about the parameter Rp is locked up in the correlation with Rd.
For the pair Rd and C, it appears that correlation is built up in the interval t ∈ [0.0, 0.4], the diastole, and destroyed in the remaining part, the systole, of the cardiac cycle. This can be explained by the fact that the exponential decay e^{−t/τ}, with the time-constant τ = Rd C, is the dominant factor governing the diastolic phase [21], leading to a build-up of correlation; as independent information about C and Rd is acquired in systole (figure 2), this correlation is destroyed. It should be noted that these aspects can be naturally inferred from figures 2 and 3, even without knowing the physics (or solution) of the problem.
Figure 3.

Mutual information between all the pairs of Windkessel model parameters at four different levels of measurement noise.

The effect of correlations can be further assessed by looking at the conditional variances (a–c) and conditional information gains (d–f) depicted in figure 4. For σ²noise = 625.0, this figure shows the conditional posterior variances and the conditional information gains for individual parameters when other parameters are given. For the parameter θRp, it can be seen that the conditional variance given θR is lower than the marginal variance in the interval t ∈ [0.4, 0.75], as this is the region where mutual information (correlation) is built up between these parameters (figure 3). Similarly, in diastole, t ∈ [0.0, 0.4], the conditional variance of the parameter θC given θR is significantly lower as correlation is built up, but this gain quickly diminishes to zero in systole, t ∈ [0.4, 0.75]. For the parameter R, since a large amount of information is obtained marginally, the conditional variances are not too different from the marginal variances. Note that the variances show behaviour opposite to the information gains, as a decrease in variance implies a gain in information. Therefore, even though the two measures appear similar, information gain is the better measure as it readily applies to cases where the behaviour of a set of parameters is to be studied. For example, if one were interested in the joint information gain for a set of two parameters given a third, the information gain would be a scalar whereas the joint covariance would be a matrix. Furthermore, the relation between conditional information gain, marginal information gain and mutual information is additive, see equation (8.11), whereas the relation between conditional variance and marginal variance is, in general, not additive.
As a demonstration, it can be observed that the conditional information gain curves in figure 4 can be obtained by the addition of the corresponding curves from figures 2 and 3.
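The additive relation of equation (8.11) can be verified on a toy bivariate Gaussian posterior with independent unit-variance priors (the standard deviations and correlation below are illustrative, not from the Windkessel example): the information gained about one parameter when the other is also given equals the marginal gain plus the mutual information.

```python
import math

# Posterior of (theta1, theta2); priors are independent with unit variance.
s1, s2, rho = 0.5, 0.7, 0.6          # hypothetical posterior std devs and correlation
var1_cond = s1**2 * (1 - rho**2)     # Var(theta1 | theta2, measurements)

marginal_gain = 0.5 * math.log(1.0 / s1**2)          # prior var 1 -> posterior var s1^2
mutual_info = -0.5 * math.log(1 - rho**2)            # MI of a bivariate Gaussian
conditional_gain = 0.5 * math.log(1.0 / var1_cond)   # gain when theta2 is also given

# Additivity: conditional gain = marginal gain + mutual information
assert abs(conditional_gain - (marginal_gain + mutual_info)) < 1e-12
```

For Gaussians this identity is exact, which is why the conditional information gain curves decompose into the marginal-gain and mutual-information curves.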
Figure 4.

Conditional variances (a–c) and conditional information gains (d–f) for all pairs of the Windkessel model parameters. The measurement noise is σ²noise = 625.0.


The Hodgkin–Huxley model of a neuron

The Hodgkin–Huxley model [28] describes ionic exchanges and their relationship to the membrane voltage in a biological neuron. This model has also been used as the basis for several other ionic models describing a variety of excitable tissues, such as cardiac cells [29]. The model is described by the following ordinary differential equations:

Cm dVm/dt = Iext − INa − IK − IL, dm/dt = αm(Vm)(1 − m) − βm(Vm)m, dh/dt = αh(Vm)(1 − h) − βh(Vm)h and dn/dt = αn(Vm)(1 − n) − βn(Vm)n, (9.6)

with

INa = gNa m³h (Vm − VNa), IK = gK n⁴ (Vm − VK) and IL = gL (Vm − VL),

where Vm is the membrane voltage; Cm is the membrane capacitance; Iext is the external current applied; INa, IK and IL are the sodium, potassium and leakage currents, respectively; VNa, VK and VL are the equilibrium potentials for sodium, potassium and leakage ions, respectively; gNa, gK and gL are the maximum conductances for the channels of sodium, potassium and leakage ions, respectively; m, h and n are the dimensionless gating variables, m, h, n ∈ [0, 1], that characterize the activation and inactivation of the sodium and potassium channels; and αx and βx, x ∈ {m, h, n}, are the standard voltage-dependent rate functions [28]. Cm is set to 1 μF cm−2, and the equilibrium potentials are defined in millivolts (mV) relative to the membrane resting potential ER as follows [30,31]: VNa = ER + 115, VK = ER − 12 and VL = ER + 10.613.

The inverse problem is that of estimating the three parameters gNa, gK and gL by measuring the membrane voltage Vm when a constant external current Iext = 20 μA cm−2 is applied to the neuron. It is well known that when a relatively high constant external current is applied, the neuron exhibits a tonic spiking pattern in the membrane voltage Vm [32-34]. With nominal parameter values of gNa = 120.0 mS cm−2, gK = 36.0 mS cm−2 and gL = 0.3 mS cm−2, and initial conditions of Vm(0) = −75 mV, m(0) = 0.05, h(0) = 0.6 and n(0) = 0.325, this tonic spiking behaviour, generated by solving equation (9.6), is shown in figure 5. The observation model reads

y(ti) = Vm(ti) + ε,

where Vm(ti) is the membrane voltage at time ti and ε is the zero-mean measurement noise with variance σ²noise. As only Vm is measured, the observation vector is y = [y]. As opposed to the Windkessel case, where the effect of noise was evaluated, in this case the effect of the number of observations, i.e. the observation frequency, is evaluated. Nobs measurement time points evenly distributed in the interval t ∈ [0.0, 40.0] ms are studied. Four observation frequencies, giving Nobs ∈ {100, 200, 400, 800}, are used, while σ²noise is set to 100.0 mV² (standard deviation of 10.0 mV). As in the Windkessel example, the parametrization

ξ = ξ0 + ς θ

is used to impose zero-mean and unit-variance priors on the parameters: with a zero-mean, unit-variance normal prior on the transformed parameter θ, the prior on the real parameter ξ is a normal distribution with mean ξ0 (the nominal value) and variance ς². The parameters ς are set to 10.0, 6.0 and 0.1 mS cm−2 for gNa, gK and gL, respectively.
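The tonic spiking of figure 5 can be reproduced with a minimal pure-Python integration of the model, using the standard Hodgkin–Huxley rate functions from [28]. The sketch below assumes the resting potential ER equals the stated initial voltage of −75 mV and works with the depolarisation v = Vm − ER, so v(0) = 0; the RK4 step size is chosen for illustration.

```python
import math

# Standard Hodgkin-Huxley rate functions; v is the depolarisation (mV).
def a_m(v): return 1.0 if abs(v - 25) < 1e-7 else 0.1 * (25 - v) / (math.exp((25 - v) / 10) - 1)
def b_m(v): return 4.0 * math.exp(-v / 18)
def a_h(v): return 0.07 * math.exp(-v / 20)
def b_h(v): return 1.0 / (math.exp((30 - v) / 10) + 1)
def a_n(v): return 0.1 if abs(v - 10) < 1e-7 else 0.01 * (10 - v) / (math.exp((10 - v) / 10) - 1)
def b_n(v): return 0.125 * math.exp(-v / 80)

def hh_rhs(y, g_na=120.0, g_k=36.0, g_l=0.3, i_ext=20.0, c_m=1.0):
    v, m, h, n = y
    i_na = g_na * m**3 * h * (v - 115.0)   # V_Na = E_R + 115 mV
    i_k = g_k * n**4 * (v + 12.0)          # V_K  = E_R - 12 mV
    i_l = g_l * (v - 10.613)               # V_L  = E_R + 10.613 mV
    return [(i_ext - i_na - i_k - i_l) / c_m,
            a_m(v) * (1 - m) - b_m(v) * m,
            a_h(v) * (1 - h) - b_h(v) * h,
            a_n(v) * (1 - n) - b_n(v) * n]

def rk4_step(y, dt):
    k1 = hh_rhs(y)
    k2 = hh_rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = hh_rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = hh_rhs([yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

# Vm(0) = -75 mV is taken as the resting potential, so v starts at 0.
y, dt, trace = [0.0, 0.05, 0.6, 0.325], 0.01, []
for _ in range(4000):                      # 40 ms, matching t in [0, 40] ms
    y = rk4_step(y, dt)
    trace.append(y[0])

# Count tonic spikes as upward crossings of a 50 mV depolarisation threshold.
spikes = sum(1 for a, b in zip(trace, trace[1:]) if a < 50.0 <= b)
```

Repeated spikes over the 40 ms window correspond to the tonic spiking pattern used as the measurement signal in this example.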
Figure 5.

Solution of the Hodgkin–Huxley model, equation (9.6), for nominal parameter values.

Figure 6 shows the posterior marginal variances (a–c) and the marginal information gains (d–f) for the three parameters for all four observation frequencies. In all these plots, an arbitrarily scaled Vm(t) curve is shown in light grey for ease of interpretation relative to the Vm(t) variations. As expected, increasing the measurement frequency results in larger amounts of information (and consequently larger reductions in the posterior variances). However, it is observed that the parameters θgNa and θgK benefit most from an increase in measurement frequency, as opposed to the parameter θgL, which benefits only marginally. This implies that at low observation frequencies the identifiability of θgL is good, while very little information is available for the parameters θgNa and θgK. The behaviour of the parameter θgK (figure 6b,e) shows that the information about this parameter is concentrated mostly in the sharp rising phases of the action potential Vm. A similar behaviour, although less salient, is observed for the parameters θgNa and θgL (figure 6a,d,c,f). While the Hodgkin–Huxley model is quite complex, with gating variables of different time constants and ionic currents depending on powers (up to the fourth) of the gating variables, it is widely understood that the rising phases of the action potential Vm are related to the sodium and potassium currents. This may explain why information about the parameters θgNa and θgK is mostly concentrated in this region. Furthermore, if we accept that the sodium and potassium currents, in combination, are responsible for the rising action potential, then we should also expect a substantial amount of correlation between the parameters θgNa and θgK, as it should be hard to distinguish between these two parameters. This is precisely what is observed in the CMI analysis, figure 7, where a large amount of mutual information develops between these two parameters.
For the case of Nobs = 100, the marginal information gain for the parameter θgNa (figure 6) is approximately 0.3 nats, while figure 7 shows that approximately 0.7 nats of mutual information exist between θgNa and θgK. This implies that the amount of information that can be gained about θgNa by knowing θgK, in addition to the measurements, is larger than the amount gained from the measurements alone. Indeed, as the observation frequency is increased, more information becomes available about all the parameters individually. Figure 7 also shows that a significant amount of correlation is built up between the parameters θgNa and θgL during the sharp rising part of Vm. For example, for Nobs = 200, the CMI between θgNa and θgL is approximately 0.14 nats (figure 7), approximately the same magnitude as the marginal information gain of 0.15 nats (figure 6) for the parameter θgL. At the same time, since the marginal information gain for θgNa is approximately 1.25 nats (figure 6), the effect of this correlation, amounting to an information gain of 0.14 nats (figure 7), is not too significant for estimating θgNa.
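Marginal gains and mutual informations of the kind quoted above can be computed from classical sensitivities alone under a linear-Gaussian approximation, in the spirit of the ISF construction. The sketch below uses a hypothetical two-parameter observable y(t) = θ1 t + θ2 t², unit-variance priors and an illustrative noise level; none of these numbers come from the paper.

```python
import math

times = [0.1 * k for k in range(1, 11)]      # hypothetical observation times
S = [[t, t * t] for t in times]              # classical sensitivities dy/dtheta_i
s2 = 0.05                                    # illustrative measurement-noise variance

# Posterior precision P = I + S^T S / s2 (unit-variance priors, Gaussian noise)
p11 = 1 + sum(r[0] * r[0] for r in S) / s2
p22 = 1 + sum(r[1] * r[1] for r in S) / s2
p12 = sum(r[0] * r[1] for r in S) / s2

det = p11 * p22 - p12 * p12                  # invert the 2x2 precision matrix
c11, c22, c12 = p22 / det, p11 / det, -p12 / det

gain1 = 0.5 * math.log(1.0 / c11)            # marginal information gain, nats
gain2 = 0.5 * math.log(1.0 / c22)
rho2 = c12 * c12 / (c11 * c22)               # squared posterior correlation
mi = -0.5 * math.log(1.0 - rho2)             # mutual information between theta1, theta2
```

Evaluating these expressions cumulatively over the observation times yields curves of exactly the type plotted in figures 6 and 7, and adding more observation times shows directly how measurement frequency increases the gains.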
Figure 6.

Marginal posterior variances (a–c) and marginal information gains (d–f) for all the Hodgkin–Huxley model parameters. Four different measurement frequencies are considered. In all the plots, an arbitrarily scaled Vm curve is shown in grey.

Figure 7.

(a–c) Mutual information between all the pairs of the parameters of the Hodgkin–Huxley model. Four different measurement frequencies are considered. In all the plots, an arbitrarily scaled Vm curve is shown in grey.

Finally, the effect of the CMI, i.e. the correlation, can also be seen in terms of the conditional variances and conditional information gains, as shown in figure 8 for Nobs = 200. As discussed above, for the parameter pairs with high mutual information the conditional variance of one parameter is significantly lower (and the conditional information gain larger) when the other parameter is additionally known. It should be noted that the correlations and information gains presented are specific to the protocol, i.e. a constant external current resulting in tonic spiking of the neuron and only Vm being measured. The information gains will behave differently if the protocol is changed, for example to intermittent step currents or continuously varying external currents. Therefore, one application of the methods proposed in this article is in optimal design of experiments, where one may design the protocol such that maximal information gain occurs for individual parameters while CMIs (correlations in the posterior distribution) are minimized.
Figure 8.

Conditional variances (a–c) and conditional information gains (d–f) for all pairs of the Hodgkin–Huxley model parameters. The case with Nobs = 200 is shown.


Influenza A virus kinetics

The final example presented is for the kinetics of the influenza A virus. The following model was proposed by Baccam et al. [35] to describe viral infection:

dT/dt = −βTV, dI/dt = βTV − δI and dV/dt = pI − cV, (9.11)

where V is the infectious virus titre (measured in TCID50 ml−1 of nasal wash), T is the number of uninfected target cells, I is the number of productively infected cells and {β, δ, p, c} are the model parameters. The parameter p represents the average rate at which the productively infected cells I increase the viral titre, and the parameter δ represents the rate at which the infected cells die. The parameter β characterises the rate at which the susceptible cells become infected, and c represents the clearing rate of the virus. As opposed to the previous examples, where the initial conditions were assumed to be known, in this example the initial conditions for the virus titre, V0, and the number of uninfected target cells, T0, are considered unknown and hence form parameters of the dynamical system. Time is measured in days (d), and the initial condition for the number of infected cells is assumed known: I0 = 0.0. Hence there are six parameters [β, δ, p, c, V0, T0] in total. The nominal values of the parameters are chosen to be β = 2.7 × 10−5 (TCID50 ml−1)−1 d−1, δ = 4.0 d−1, p = 0.012 TCID50 ml−1 d−1, c = 3.0 d−1, V0 = 0.1 TCID50 ml−1 and T0 = 4 × 108, based on the average patient parameters identified by Baccam et al. [35]. As in the previous examples, the parametrization ξ = ξ0 + ςθ is used to impose zero-mean and unit-variance priors on the transformed parameters, where θ is the transformed version of the real parameter ξ and ξ0 its nominal value; a zero-mean, unit-variance prior on θ then imposes a prior of mean ξ0 and variance ς² on the real parameter. The scaling parameters ς are set to 9 × 10−6, 1.3, 0.004, 1.0, 0.03 and 2.0 × 108 for β, δ, p, c, V0 and T0, respectively, in their respective units.
The solution to equation (9.11) for the nominal parameter values is shown in figure 9. It is observed that both the virus titre V and the number of infected cells I increase sharply until they peak at the 2–3 day mark. After this a decrease in both values is observed. The number of uninfected target cells T remains approximately constant until the 2 day mark after which a sharp decrease (approx. 4 orders of magnitude) is observed over the next 2 days leading to a plateau.
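The dynamics of figure 9 can be reproduced with a plain RK4 integration of equation (9.11) at the nominal parameter values (the step size below is chosen for illustration):

```python
# Target-cell-limited influenza model of Baccam et al.:
#   dT/dt = -beta*T*V,  dI/dt = beta*T*V - delta*I,  dV/dt = p*I - c*V
beta, delta, p, c = 2.7e-5, 4.0, 0.012, 3.0        # nominal values from the text
T, I, V = 4e8, 0.0, 0.1                            # T0, I0, V0

def rhs(T, I, V):
    inf = beta * T * V                             # rate of new cell infections
    return (-inf, inf - delta * I, p * I - c * V)

dt, steps = 1e-3, 10000                            # 10 days in steps of ~1.4 min
peak_v, peak_day = V, 0.0
for k in range(steps):
    # classical RK4 step on the three state variables
    k1 = rhs(T, I, V)
    k2 = rhs(T + 0.5*dt*k1[0], I + 0.5*dt*k1[1], V + 0.5*dt*k1[2])
    k3 = rhs(T + 0.5*dt*k2[0], I + 0.5*dt*k2[1], V + 0.5*dt*k2[2])
    k4 = rhs(T + dt*k3[0], I + dt*k3[1], V + dt*k3[2])
    T += dt/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
    I += dt/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
    V += dt/6 * (k1[2] + 2*k2[2] + 2*k3[2] + k4[2])
    if V > peak_v:
        peak_v, peak_day = V, (k + 1) * dt
```

The run shows the viral-titre peak at the 2–3 day mark and a drop of several orders of magnitude in the uninfected target cells T, consistent with the description above.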
Figure 9.

Solution of the influenza A kinetics model, equation (9.11), for nominal parameter values.

To study the sensitivity and information gain, two cases are considered: first, when only V is measured; and second, when both V and I are measured. In the first case, the observation model reads

y(ti) = V(ti) + ε, (9.13)

where V(ti) is the virus titre at time ti and ε is the zero-mean measurement noise with variance σ²noise = 2.5 × 107 (TCID50 ml−1)², i.e. a standard deviation of 5 × 103 TCID50 ml−1. A total of 200 measurements are evenly distributed between 0 and 10 days for the computation of marginal variances and information gains. Figure 10 shows the marginal variances for all the parameters in solid lines and the conditional variances for selected pairs of parameters in dashed lines. Given the dynamics of the problem shown in figure 9, it is not surprising that most of the information gain about all the parameters occurs in t ∈ [0, 4] days. The parameters θβ, θδ and θc appear to be well identifiable, given the large decreases in their marginal variances. However, the initial conditions θV0 and θT0 show smaller decreases in variance, indicating problems in their identifiability. Finally, the parameter p appears to be unidentifiable, given that its marginal variance decreases from 1.0 (standard deviation 1.0) to only 0.7 square units (standard deviation 0.84 units). Figure 11 shows the mutual information between all pairs of parameters, with the pairs showing high mutual information plotted in dashed lines. For the parameters in these high-mutual-information pairs, (θδ, θc) and (θp, θT0), the conditional variances are plotted in figure 10. The pair (θp, θT0) is particularly interesting, as the parameter θp, although unidentifiable individually, becomes very well identifiable if the initial condition T0 is known, owing to the large mutual information it shares with θT0. This observation was proved through classical methods by Miao et al.
[6], where it was shown that, by taking higher-order derivatives of equation (9.11) and eliminating the unmeasured variables T and I, one obtains a differential equation in the measured variable V alone.
Figure 10.

Marginal posterior variances (solid lines) and conditional variances (dashed lines) for the parameters of the influenza A kinetics model. Only V is measured, with measurement noise σ²noise = 2.5 × 107 (TCID50 ml−1)².

Figure 11.

Mutual information between all pairs of parameters of the influenza A kinetics model. Pairs with significant (large) mutual information are plotted in dashed lines.

As the resulting equation does not contain the parameter p, in the absence of any other measured quantity, i.e. T and I, and the corresponding initial conditions, the parameter p is not identifiable. Miao et al. [6] also reported that when T0 is known the parameter p becomes identifiable, which is consistent with the large mutual information observed here. In the Bayesian approach adopted in this manuscript, a non-zero amount of knowledge (non-infinite prior variance) is inherently assumed in the prior for θT0, which results in a small amount of information gain for θp (and hence the small reduction in its marginal variance from 1.0 to 0.7 square units). This small information gain is a consequence of the knowledge assumed in the prior; however, it is not large enough to hide the identifiability problem for θp. One can impose a prior of greater ignorance by increasing the prior variance of the real parameter T0, i.e. by increasing the scaling factor ς. The effect of four different values of ς on the marginal variance of the parameter θp is shown in figure 12b. It is clear that a higher value of ς, which implies greater ignorance in the prior for T0, results in a smaller amount of information gained about the parameter θp. This example shows how, without the use of classical analytical methods, see for example those presented in [6], which may not be easily applicable to all dynamical systems, the information-theoretic approach can provide similar conclusions about parameter identifiability.
Lastly, the classical sensitivity of the measurable V with respect to the parameter p is shown in figure 12a; its large magnitude does not indicate any identifiability problems. Finally, Miao et al. [6] reported that all the parameters of the influenza dynamical system are identifiable if both V and I, or both V and T, are measured. For the case when both V and I are measured, the marginal variances are shown in figure 13, which likewise shows no identifiability problems. Note that the error structure for the measurement of I is assumed identical to that for the measurement of V, equation (9.13).
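The effect of prior ignorance seen in figure 12b can be mimicked in a small linear-Gaussian sketch (all numbers illustrative): when an observable constrains only the sum θ1 + θ2, an analogue of p and T0 entering the measurements together, the information gained about θ1 comes entirely from the prior on θ2, and widening that prior reduces the gain.

```python
import math

# (S^T S / sigma^2) entry for a toy observable y(t) = (theta1 + theta2) * t
# observed at ten illustrative times with noise variance 0.05.
A = sum((0.1 * k) ** 2 for k in range(1, 11)) / 0.05

def gain_theta1(v2):
    """Marginal information gain for theta1 when theta2 has prior variance v2."""
    p11, p22, p12 = 1.0 + A, 1.0 / v2 + A, A    # posterior precision matrix
    c11 = p22 / (p11 * p22 - p12 * p12)         # posterior variance of theta1
    return 0.5 * math.log(1.0 / c11)            # gain in nats

# Greater prior ignorance about theta2 -> less information gained about theta1
gains = [gain_theta1(v2) for v2 in (0.25, 1.0, 4.0, 16.0)]
```

As the prior variance of θ2 grows, the gain for θ1 decays towards zero, reproducing the qualitative behaviour of the θp variance curves at increasing ς.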
Figure 12.

Sensitivity of the measurable V with respect to θp (a) and the marginal posterior variance of the parameter θp at different levels of ς (b). The prior imposed on T0 is of variance ς².

Figure 13.

Marginal posterior variances for the parameters of the influenza A kinetics model when both V and I are measured.


Conclusion

A new class of functions, called information sensitivity functions (ISFs), has been proposed to study parametric information gain in a dynamical system. Based on a Bayesian and information-theoretic approach, these functions are easy to compute through classical sensitivity analysis alone. Compared with the previously proposed generalized sensitivity functions (GSFs) [27], which also measure such information gain, the ISFs do not suffer from the forced-to-one behaviour and are easier to interpret, as correlations are measured through separate measures of mutual information rather than through oscillations in the GSFs. Furthermore, as opposed to the GSFs, which are normalized, the ISFs can be used to compare information gain between different parameters and hence to rank the parameters by ease of identifiability. They can be used to identify regions of high information content and to indicate identifiability problems for parameters that show little to no information gain, or high mutual information (correlation) with other parameters. The application of ISFs is demonstrated on three models. For the Windkessel model, the effect of measurement noise is illustrated, and it is shown that the insights provided by ISFs are consistent with those of a significantly more expensive Monte Carlo type approach [21]. For the Hodgkin–Huxley model, the effect of measurement frequency is illustrated. Finally, for the influenza A virus, it is shown how, even when classical sensitivity analysis fails to reveal identifiability issues, the ISFs correctly detect identifiability problems that have been proven analytically through classical methods.
References (22 in total)

1.  Generalized sensitivity functions in physiological system identification.

Authors:  K Thomaseth; C Cobelli
Journal:  Ann Biomed Eng       Date:  1999 Sep-Oct       Impact factor: 3.934

2.  Responses of a Hodgkin-Huxley neuron to various types of spike-train inputs.

Authors:  H Hasegawa
Journal:  Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics       Date:  2000-01

3.  Stochastic and deterministic models for agricultural production networks.

Authors:  P Bai; H T Banks; S Dediu; A Y Govan; M Last; A L Lloyd; H K Nguyen; M S Olufsen; G Rempala; B D Slenning
Journal:  Math Biosci Eng       Date:  2007-07       Impact factor: 2.080

4.  Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood.

Authors:  A Raue; C Kreutz; T Maiwald; J Bachmann; M Schilling; U Klingmüller; J Timmer
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

5.  The CellML Model Repository.

Authors:  Catherine M Lloyd; James R Lawson; Peter J Hunter; Poul F Nielsen
Journal:  Bioinformatics       Date:  2008-07-25       Impact factor: 6.937

6.  Modelling and disentangling physiological mechanisms: linear and nonlinear identification techniques for analysis of cardiovascular regulation.

Authors:  Jerry Batzel; Giuseppe Baselli; Ramakrishna Mukkamala; Ki H Chon
Journal:  Philos Trans A Math Phys Eng Sci       Date:  2009-04-13       Impact factor: 4.226

7.  Outflow boundary conditions for 3D simulations of non-periodic blood flow and pressure fields in deformable arteries.

Authors:  I E Vignon-Clementel; C A Figueroa; K E Jansen; C A Taylor
Journal:  Comput Methods Biomech Biomed Engin       Date:  2010-10       Impact factor: 1.763

8.  Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems.

Authors:  Maria Rodriguez-Fernandez; Jose A Egea; Julio R Banga
Journal:  BMC Bioinformatics       Date:  2006-11-02       Impact factor: 3.169

9.  Systematic identifiability testing for unambiguous mechanistic modeling--application to JAK-STAT, MAP kinase, and NF-kappaB signaling pathway models.

Authors:  Tom Quaiser; Martin Mönnigmann
Journal:  BMC Syst Biol       Date:  2009-05-09

10.  A simple method for identifying parameter correlations in partially observed linear dynamic models.

Authors:  Pu Li; Quoc Dong Vu
Journal:  BMC Syst Biol       Date:  2015-12-14
