Literature DB >> 30419074

Determining minimal output sets that ensure structural identifiability.

Abstract

The process of inferring parameter values from experimental data can be a cumbersome task. In addition, the collection of experimental data can be time consuming and costly. This paper covers both these issues by addressing the following question: "Which experimental outputs should be measured to ensure that unique model parameters can be calculated?". Stated formally, we examine the topic of minimal output sets that guarantee a model's structural identifiability. To that end, we introduce an algorithm that guides a researcher as to which model outputs to measure. Our algorithm consists of an iterative structural identifiability analysis and can determine multiple minimal output sets of a model. This choice in different output sets offers researchers flexibility during experimental design. Our method can determine minimal output sets of large differential equation models within short computational times.

Entities: Chemical Disease Gene Species

Mesh：

Year: 2018 PMID： 30419074 PMCID： PMC6231658 DOI： 10.1371/journal.pone.0207334

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

Mathematical models are powerful tools that enable the scientific community to understand processes otherwise immeasurable by predicting outcomes of numerous physical properties. The field of systems biology often utilises ordinary differential equations to model dynamic systems. These models can comprise large systems of differential equations that contain vast numbers of unknown parameters [1]. Despite improvements in the quality of experimental sensors and therefore both the quality and quantity of experimental data, the process of parameter estimation remains cumbersome. This may be due to noisy data or due to the inherent structure of the model (structural unidentifiability) [2]. A structurally unidentifiable model implies that certain parameters are totally correlated, also referred to as ‘aliased’, and have confidence intervals that span the interval (−∞, ∞). Uncertainty in inferred parameter values calls into question the validity of the entire model and therefore it is imperative to address these uncertainties upfront by conducting identifiability analyses. We will focus on ensuring structural identifiability and since this property can be analysed before conducting experiments, our analysis can be utilised in preliminary experimental design. An experimental researcher may wish to know: “Which of the pre-defined model outputs do I at least need to measure to ensure that I can infer unique parameter values?”. The answer is addressed by the topic of minimal output sets, where a minimal output set is defined as: Measuring a minimal set of model outputs ensures that a model is structurally identifiable. Due to its complexity, the topic of minimal output sets has received little attention [3]. Scientists often rely on intuitive experimental design, which may easily result in redundant or insufficient experimental measurements. In this paper we present an algorithm to determine minimal output sets by identifying sets of totally correlated parameters using an iterative structural identifiability analysis. This algorithm offers insight into which states should be measured, thereby aiding intuitive experimental design. A particular model may have multiple minimal output sets. This offers great flexibility to the experimental researcher as he/she can decide which output set to measure taking factors such as time, cost and physical constraints into account. This structural identifiability issue has been considered in a few previous papers [3-6], which we will briefly describe. The first paper, published in 2009, introduces a minimisation algorithm to determine which parameters are identifiable [4]. Three simple examples are included and due to its computational complexity the author states that defining minimal output sets for medium sized models is still too hard using this algorithm. In a paper published in 2012, the authors present an algorithm tasked with identifying symmetries, i.e. sets of totally correlated parameters, in a system of differential equations [3]. Once these symmetries have been identified, the states and parameters that destroy these symmetries are included into minimal output sets. Minimal output sets of the well-known NF-κB and JAK/STAT models are determined assuming that all model parameters and states can potentially be measured. The final step in their algorithm is doing a symbolic computation to test for structural identifiability and identify any remaining symmetries. Other papers address observability [5, 6]. Identifiability can be regarded as a special case of observability [7]. In [5], the authors introduce a graphical method and illustrate its key concepts using non-linear models. They construct a directed graph from the so-called adjacency matrix and inspect it to identify strongly connected components and more specifically root strongly connected components. Two nodes are classified as strongly connected if they are reachable from each other [8]. Root strongly connected components are strongly connected components with no outgoing edges. Minimum output sets are identified from the different elements in these root strongly connected components. A different approach is followed by Letellier and co-authors [6]. They use a symbolically computed Jacobi matrix to compute the output sets that ensure observability. An interesting extension of minimal output sets in the preliminary experimental design phase, could be to determine these sets taking measurement noise into account, thereby establishing practical identifiability. To this end, Docherty and co-authors present a graphical method to identify such sets [9]. Our minimal output set algorithm is different from the existing techniques as it numerically identifies sets of unidentifiable parameters. Through a number of computational experiments, we provide evidence (but not a complete mathematical proof) that our proposed algorithm has the following attributes: It can calculate the minimal output sets of large models. It can easily be adjusted to allow for cases in which only a limited subset of predefined outputs are measurable. This is illustrated in example 7 in the results and discussion section. Non-rational models can also be analysed as shown in example 8 in the results and discussion section. The numerical findings are validated in a second step using symbolic computations as explained in [10]. This paper is divided into the following sections: Section 2 covers the underlying theory and concepts of our algorithm. Section 3 showcases the algorithm using 8 examples and the final section contains concluding remarks.

Materials and methods

Background theory

Many dynamic systems biology phenomena are described in terms of differential equation models. These models can often be written in the standard state-space form [11]: State variables are contained in a vector (t) (dim = n), model parameters are contained in vector , (dim = p) and the output signals or measured variables are contained in vector (t) (dim = m). Function denotes a dynamic model structure and is the output or observation function. Our approach allows for functions and to be either rational or non-rational. Unknown initial conditions of model states in vector 0, can be regarded as additional unknown parameters and can accordingly be included into . If all initial conditions are unknown, contains n + p elements. The identifiability analysis method used in this paper was first proposed in [10]. In essence, this method relies on the singular value decomposition (SVD) of an output sensitivity matrix. Reid introduced the concept of sensitivity based identifiability analysis for linear models [12]. In his paper, he defines a sensitivity matrix as = ∂/∂, with its elements describing the sensitivities of the model output with respect to model parameters. These partial derivatives are evaluated for nominal parameter values . Let Δ denote a small perturbation of the nominal vector , so = + Δ. This perturbation will result in a corresponding perturbation in the model output, () = () + Δ . A first order Taylor series approximation can be used to relate these perturbations [13, 14]: To solve Δ from the measured Δ uniquely, should be non-singular [15-17] and therefore should be of full rank [18, 19]. For non-linear models, the individual sensitivities are obtained by deriving the model (1)–(3) with respect to , thereby obtaining the system: To obtain the output sensitivity matrix , the matrix function ∂/∂ is evaluated over a discretised finite time grid, [t0, …, t] and the obtained matrices at each time point are concatenated [10]. It is advantageous to normalise to adjust for sensitivities measured in different units [20]. We emphasise that working with normalised matrix elements is numerically attractive but not essential. The normalised matrix is given as: If all the initial conditions of model states are unknown, matrix (and also ) has dimensions M × (n + p), with M = m × (N + 1). To determine the rank of (and also ), the numerical rank test using a SVD reads as [15]: The 2 matrices of importance are, the diagonal matrix, Σ (dim M × (n + p)), and (dim (n + p) × (n + p)). The singular values in Σ, σ, i = 1, …, n + p, are used to determine whether or not (or ) is of full rank. The rank of (or ) is the number of non-zero singular values and this can be expressed as follows [15]: In practice, singular values are never exactly vanishing due to numerical rounding errors. That is why one uses as practical definition: zero-valued singular values are values that fall beyond a significant gap in the spectrum of singular values [21]. In this paper we consider a gap larger than 3 decades on the log scale as significant. Once structural unidentifiability has been established, the non-zero entries of the singular vectors of matrix , related to vanishing singular values beyond this gap, allude to which model parameters and initial conditions may be unidentifiable. The singular values and the unidentifiable parameters are graphically illustrated in a so-called identifiability signature [22]. To illustrate our approach, we use the NF-κB model, also analysed in Section 3. It has 15 states and 28 model parameters and if all the initial conditions of the individual model states are considered to be unknown, it has a total of 43 parameters [3]. Measuring = {x1, …, x15} as model output, we observe no gap in the singular values (See Fig 1). This confirms that there are no vanishing singular values and therefore the sensitivity matrix, , is of full rank and the model is structurally identifiable for this particular choice of output sensors.

Fig 1

NF- κB model: Singular values of the output sensitivity matrix, S, if we measure all states, {x1, …, x15}, as model output.

Singular values, arranged in descending order, reveal no gap. This suggests that the sensitivity matrix is of full rank and therefore the model is structurally identifiable.

NF- κB model: Singular values of the output sensitivity matrix, S, if we measure all states, {x1, …, x15}, as model output.

Singular values, arranged in descending order, reveal no gap. This suggests that the sensitivity matrix is of full rank and therefore the model is structurally identifiable. However, if we omit state x4 from the output, , we observe from Fig 2 that matrix is now rank deficient. This is apparent from the clear gap in the singular values and the vanishing singular value of σ43 = 7.8 × 10−16.

Fig 2

NF- κB model: Singular values of the output sensitivity matrix, S, if we measure all states apart from x4.

NF- κB model: Singular values of the output sensitivity matrix, S, if we measure all states apart from x4.

Singular values, arranged in descending order, reveal a clear gap with σ43 = 7.8 × 10−16. This indicates that the sensitivity matrix is rank deficient and so the model is structurally unidentifiable. We can now examine the columns of , corresponding to vanishing singular values, for suggestions as to which model parameters may be unidentifiable. Fig 2 reveals only 1 vanishing singular value and therefore it suffices to consider only the last column vector, 43, corresponding to σ43. The non-zero entries in Fig 3 reveal that parameters θ2, θ3, θ27 and the initial condition x4(0), are both totally correlated and unidentifiable. To ensure the model’s structural identifiability, the omitted state, x4, has to be measured and so is included into any minimal output set. In contrast, omitting state x3 from the output set does not change this model’s identifiability and therefore can be omitted from a minimal output set.

Fig 3

NF- κB model: Entries in the last right singular vector corresponding to the vanishing singular value, σ43, in Fig 2.

The corresponding non-trivial null-space indicates that parameters θ2, θ3, θ27 and initial condition x4(0) are totally correlated.

NF- κB model: Entries in the last right singular vector corresponding to the vanishing singular value, σ43, in Fig 2.

The corresponding non-trivial null-space indicates that parameters θ2, θ3, θ27 and initial condition x4(0) are totally correlated.

Minimal output set algorithm

Here, we present our algorithm to detect minimal output sets. We first outline the ideas underlying the algorithm and then discuss the subsequent steps. It is important to realise that the parameters to be identified may comprise both system parameters θ, j = 1, .., p, and initial values of the states, x(0), j = 1, …, n. We assume that the numerical values assigned to the elements in both and (0) are regular points, where it is known that the rank of the sensitivity matrix does not change in the neighbourhood of a regular point. To ensure that this assumption holds, it may be useful to repeat the algorithm for a different values in the vicinity of a chosen regular point. System parameters are to be inferred from measurements of model states that may or may not be measured directly and so are usually not regarded as measurable outputs. For the time being, we assume that the pre-defined measurable outputs y, j = 1, …, m, also referred to as sensors, are identical to the states x, j = 1, …, n, and therefore m = n. Later on we show that this assumption can easily be relaxed. We may also take for granted that the system is identifiable when all sensors are measured. If this would not be the case, searching for minimal output sets would clearly not be possible. The main idea of the algorithm is to systematically omit elements from the set of all available sensors, thereby searching for essential sensors that absolutely can not be omitted to keep the system identifiable. As explained above, unidentifiability is detected by inspecting the calculated singular vales of the sensitivity matrix in (7). If these singular values show a gap of 3 decades or larger, we conclude unidentifiability and subsequently proceed to identify the essential sensors that need to be included into a model’s minimal output sets. Let be the set of all available sensors with set-cardinality || = m. The algorithm involves an iterative identifiability analyses in which sensors are omitted step-wise from the maximum starting set . Systematically more and more sensors are left out as to find all essential sensors that are needed for a minimal output set (MOS). Let k be the number of sensors to be omitted from a set of available sensors, . Starting with k = 1, we leave out one-sensor-at-a-time from the initial set of all available sensors, 1 ≔ . Each time measuring with a different set of sensors from 1, we conduct identifiability analyses for k = 1. If a lack of identifiability is detected, the unidentifiable parameters are stored in a set ϕ1 and the corresponding omitted sensors that cause unidentifiability are stored in a set ψ1. Continuing this way, we get unidentifiable parameter sets {ϕ, i = 1, …, l1}, and the corresponding omitted sensor sets {ψ, i = 1, …, l1} that cause a lack of identifiability. Here, l1 is the total number of unidentifiable parameter sets identified for the case of omitting one-sensor-at-a-time (k = 1). The unidentifiable parameter sets ϕ can be found by inspecting the non-zero entries in the singular vectors of the matrix corresponding with the zero-valued singular values (as can be seen from the identifiability signature). To ensure structural identifiability, the essential sensors form each {ψ, i = 1, …, l1} must be included into any minimal output set. Having checked all possibilities of leaving out one-sensor-at-a-time, we can now define a new set of available sensors, say 2, that is created by excluding the previously found sensors in the sets {ψ, i = 1, …, l1} from the set 1. Since we know for sure that these excluded sensors are needed for a model’s structural identifiability, they are permanently included into all sensor sets that are measured from now on. Hence, the case k = 1 reduces the number of candidate sensors to choose from in the next iteration from m to m′ = m − l1. Next, we leave out 2-sensors-at-a-time (the case k = 2) from 2 and check for identifiability. Since set cardinality |2| now equals m′ ≤ m, we have choices for omitting 2 sensors from this set. If unidentifiability is detected, a new set of unidentifiable parameters is compiled from the identifiability signature and stored in ϕ, and the 2 corresponding left-out sensors are stored in ψ. Proceeding this way, the total number of unidentifiable sets that can be found for k = 2 are collected in the sets {ϕ, i = l1 + 1, …, l1 + l2} and the corresponding omitted sensor sets, {ψ, i = l1 + 1, …, l1 + l2}. Assume now that for the case of leaving out 2-sensors-at-a-time (k = 2), we have found an unidentifiable parameter set ϕ. Apparently, this new set, ϕ, only occurs when these 2 particular sensors are missing and therefore, either 1 of these 2 essential sensors must be included in a MOS. Hence, the available sensor sets for the case k = 3 branch out into two sets, namely 3,1 and 3,2. When leaving out three-sensors-at-a-time in the next iteration of our algorithm (case k = 3), we have to iterate both of these available sensor sets to find more unidentifiable parameter sets {ϕ, i = l1 + l2 + 1, …, l1 + l2 + l3}. Continuing in this way for k = 3, 4, …, we complete our search for essential sensors when leaving out k sensors at a time. At the same time guaranteeing that the sensors that are needed for the identifiability of our model, identified in earlier iterations k − 1, k − 2, …, 1, are included in each new measured output. Clearly, for larger models the output, , will contain a large number of sensors and in these cases an exhaustive search will be computationally demanding. The computational burden may however be substantially reduced by randomly selecting outputs from an intermediate set of available sensors (for a certain iteration step k) using a series of Bernoulli trial experiments. The number of sensors to include into each sensor set can then be chosen in such a way that the chance of successfully detecting an unidentifiable set of parameters is more than 99.5% (refer to supplementary S9 File). We further note that in practice our experience shows that the values k = 1, 2, 3 already summarise the majority of possible unidentifiable parameter sets ϕ. More importantly, once we have established a few required sensors on basis of lower k values, one can perform an additional check for a lack of identifiability when using only the required sensors that have already been determined for the lower k values. Such a test will immediately reveal additional correlations that still need to be found for larger k values, but these correlations are not yet neatly separated in a systematic way. This check does, however, demonstrate decisively whether we need to continue our search for larger k values (e.g. k = 4, 5, …), yes or no or whether one can already define minimal outputs sets from the already identified essential sensors. Finally, in reality the output, , is not always identical to the states . For example, one could have = {x1 + x3 + x4, θ16(x3 + x4 + x5 + x12), θ17(x4 + x5)}. Our algorithm allows for the user to define these more complex outputs in a straightforward manner: Instead of omitting states {x, i = 1, …, n}, we now systematically omit outputs {y, j = 1, …, m} to find the essential sensors needed in a MOS.

Results and discussion

Example 1: A chemical reaction system

This model was used by Liu and co-authors to illustrate their method ensuring observability based on the graphical analysis of a model’s structure [5]. It contains 11 states and 6 model parameters and potentially has 17 unknown parameters. Examining the structure of the model by evaluating its adjacency/Jacobi matrix, the authors detected 3 root strongly connected components and identified 6 minimal output sets. These observability results were confirmed using our algorithm. Additionally, we expanded the scope of the problem to define minimal output sets that guarantee this model’s structural identifiability. We found that the minimal output sets that ensure observability also ensure identifiability and these are: {x4, x6, x7}, {x4, x6, x8}, {x4, x6, x9}, {x5, x6, x7}, {x5, x6, x8} and {x5, x6, x9}. These results were obtained in 6 minutes and 35 seconds using a Intel Core i7 processor with 8GB RAM (see S1 File for details). Using our algorithm, we detected 3 different sets of unidentifiable parameters, {ϕ1, ϕ2, ϕ3}. Each of these sets can be verified symbolically, which also allows for the identification of different totally correlated sets of parameters within each set, ϕ (see supplementary S8 File for the symbolic verification of all 3 unidentifiable sets). The results obtained for the different values of k are summarised in Table 1.

Table 1

Results obtained analysing the chemical reaction model.

k value	Number of sets	Unidentifiable parameters sets	Omitted sensors	Computational time (seconds)
1	1	ϕ₁ = {x₆(0)}	ψ₁ = {x₆}	4.5
2	1	ϕ₂ = {θ₂, θ₃, x₄(0), x₅(0)}	ψ₂ = {x₄, x₅}	14.6
3	1	ϕ₃ = {θ₄, θ₅, x₇(0), x₈(0), x₉(0)}	ψ₃ = {x₇, x₈, x₉}	53.6
4	0			53.5
5	0			112.3
6	0			90.4
7	0			48.7
8	0			17.1

Figs 4 and 5 indicate the identifiability signature obtained when measuring the output, {x1, x2, x3, x6, x7, x8, x9, x10, x11}, here k = 2. The 4 zero-valued singular values indicate that the model is unidentifiable when measuring this output. The unidentifiable parameters can be identified by looking at the non-zero entries in the last 4 columns of matrix , each corresponding to a singular value beyond the gap. Fig 5 reveals the unidentifiable parameter set, ϕ2 = {θ2, θ3, x4(0), x5(0)} and accordingly, the essential sensors are ψ2 = {x4, x5}. The symbolic verification of this set yields a non-trivial null-space with 4 base vectors: , where .

Fig 4

Example 1: Structural identifiability results of a chemical reaction system: Singular values of the output sensitivity matrix, S, when measuring the output {x1 …, x11} omitting sensors x4 and x5.

Singular values, arranged in descending order, reveal a clear gap. This gap, in conjunction with the smallest singular value, σ17 = 2.4 × 10−17, indicate that the model is structurally unidentifiable when measuring this output.

Fig 5

Example 1: Structural identifiability results of a chemical reaction system: Non-zero entries in the last 4 columns of matrix V.

These indicate that initial conditions x4(0) and x5(0) and model parameters θ2 and θ3 are unidentifiable. Since x4 and x5 are defined in , both of these sensors are essential.

Example 1: Structural identifiability results of a chemical reaction system: Singular values of the output sensitivity matrix, S, when measuring the output {x1 …, x11} omitting sensors x4 and x5.

Example 1: Structural identifiability results of a chemical reaction system: Non-zero entries in the last 4 columns of matrix V.

These indicate that initial conditions x4(0) and x5(0) and model parameters θ2 and θ3 are unidentifiable. Since x4 and x5 are defined in , both of these sensors are essential.

Example 2: NF-κB model

This model describes the two-feedback-loop regulatory module of nuclear factor NF-κB signalling pathway. It involves two-compartment kinetics of the activators IκB (IKK) and NF-κB, the inhibitors, A20 and IκBα, and their complexes. In response to extra-cellular signals such as tumour necrosis factor, the activation of IKK ultimately stimulates the release of the main activator NF-κB, which enters the nucleus and triggers transcription of the inhibitors and numerous other genes [23] (See supplementary S2 File for a model description). The model contains 15 states and 28 model parameters and assuming the initial state conditions to be unknown, it has 43 unknown parameters in total. Minimal output sets for this model were first identified by Anguelova and co-authors [3]. We found the model structural identifiable when measuring all states, = {x1, …, x15}. Our algorithm identified 5 different sets of unidentifiable parameters: ϕ1 = {θ2, θ3, θ27, x4(0)}, ϕ2 = {θ5, θ6, θ18, x5(0)}, ϕ3 = {θ8, θ9, θ10, x6(0)}, ϕ4 = {θ19, θ27, x10(0)} and ϕ5 = {x12(0)}. The corresponding sets of essential sensors are: ψ1 = {x4}, ψ2 = {x5}, ψ3 = {x6}, ψ4 = {x10} and ψ5 = {x12} and these results were obtained in 29.5 seconds. Analysing the model for all the different values of k took 8 minutes and 20 seconds. The resulting minimal output set, {x4, x5, x6, x10, x12}, is identical to the minimal output set defined by Anguelova and co-authors [3]. Figs 2 and 3 show the identifiability signature obtained when sensor x4 is omitted from . The symbolic verification of the unidentifiable set shown in Fig 3 yields the non-trivial null-space: , were . Refer to the supplementary S8 File for symbolic verification of the remaining 4 sets of unidentifiable parameters.

Example 3: JAK/STAT model

This model aims to describe the interaction of the suppressor cytokine signaling-1 (SOCS1), Janus kinase (JAK) and the transcription (STAT) signal transduction pathway [24] (S3 File). It contains 31 model states and 51 model parameters and therefore the total number of unknown parameters is 82. This model was structurally identifiable when measuring all states, = {x1, …, x31}. Applying our method, we identified 2 sets of unidentifiable parameters: ϕ1 = {x31(0)} with corresponding omitted sensor set ψ1 = {x31}, and ϕ2 = {θ14, θ51, x10(0), x11(0)} with corresponding omitted sensor set ψ2 = {x10, x11}. These results were obtained in 3 minutes and 2 seconds with sets ψ1 and ψ2 identified using iterative Bernoulli trails. Due to the model size and the potential computational demand associated with larger values of k, we first measured outputs containing the already detected essential sensors to ascertain whether measuring these sensors resulted in the model’s structural identifiability. The model was found to be identifiable when measuring either of the outputs, {x10, x31} or {x11, x31}. Accordingly, these are the minimal output sets of the JAK/STAT model and our results correspond to the findings of Anguelova and co-authors [3]. The identifiability signature obtained when states x10 and x11 are simultaneously omitted from the model’s output is illustrated in Figs 6 and 7. The unidentifiable set illustrated in Fig 7, was confirmed by the symbolically computed non-trivial null-space: . Here is set ϕ2.

Fig 6

Example 3: JAK-STAT model: Singular values of the output sensitivity matrix, S, when measuring the model output {x1, …, x31}, omitting sensors x10 and x11.

Singular values reveal a clear gap and this, in conjunction with the smallest singular value of σ82 = 7.3 × 10−16, indicates that is not of full rank and therefore the model is structurally unidentifiable.

Fig 7

Example 3: JAK-STAT model: Entries in the last right singular vector corresponding to the vanishing singular value, σ82, in Fig 6.

The corresponding non-trivial null-space indicates that model parameters θ14, θ51 and initial conditions x10(0) and x11(0) are totally correlated and so the model is not identifiable when model states x10 and x11 are simultaneously omitted from the model’s output.

Example 3: JAK-STAT model: Singular values of the output sensitivity matrix, S, when measuring the model output {x1, …, x31}, omitting sensors x10 and x11.

Example 3: JAK-STAT model: Entries in the last right singular vector corresponding to the vanishing singular value, σ82, in Fig 6.

Example 4: Ligand binding model

Next, we consider a Ligand binding model, previously analysed for structural identifiability [25]. This model describes the dynamic behaviour of the ligand (Epo) and its receptor (EpoR) in erythroid progenitor cells. In these cells, the dynamic characteristics of the Epo receptor (EpoR) determine how signals are encoded, in the presence of Epo, and processed at receptor level. These processed signals activate downstream signalling cascades such as the JAK2-STAT5 pathway which in turn leads to responses such as differentiation and proliferation of erythrocytes [25]. The model consists of 6 states and assuming their initial states are unknown, it contains 14 unknown parameters (see supplementary S4 File). The minimal output set ensuring the observability of this model, {x5, x6}, was determined by Liu and co-authors using their graphical approach [5]. This set also ensures the structural identifiability of the model and this result was obtained in 12 seconds. Two sets of unidentifiable parameters were detected: ϕ1 = {x5(0)} and ϕ2 = {x6(0)}. Set ϕ2, shown in Fig 8, is indicated by the non-zero entry in the last right singular vector corresponding to the smallest singular value calculated to be precisely zero.

Fig 8

Example 4: Ligand binding model: Entries in the last right singular vector, corresponding to the smallest singular value of precisely zero, calculated for the measured output {x1, x2, x3, x4, x6}.

The non-trivial null-space indicates that the initial condition of state x5 is unidentifiable when this state is not measured. Accordingly, x5 should be included into the model’s minimal output set.

Example 4: Ligand binding model: Entries in the last right singular vector, corresponding to the smallest singular value of precisely zero, calculated for the measured output {x1, x2, x3, x4, x6}.

Example 5: Simplified glycolytic reaction model

The simplified glycolytic reaction map consists of 10 chemical species: glucose, ADP, glucose 6-phosphate, ATP, glucose 1-phosphate, AMP, fructose 6-phosphate, fructose 2, 6-biphosphate, triose phosphate and pyruvate. The interaction between these chemicals are described by 9 reactions [26] (see supplementary S5 File). This model’s minimal output set for observability was defined by Liu and co-authors as {x10} [5]. Our algorithm confirmed that this minimal set also ensures the model’s structural identifiability. This result was obtained after 2 minutes and 43 seconds. The set of unidentifiable parameters, ϕ1 = {θ13, x10(0)}, corresponding with the omitted sensor set, ψ1 = {x10}, is indicated in Fig 9.

Fig 9

Example 5: Simplified glycolytic reaction model: Entries in the right singular vectors corresponding to 2 vanishing singular values.

The non-zero values indicate that the initial condition x10(0) and parameter θ13 are unidentifiable when state x10 is not measured.

Example 5: Simplified glycolytic reaction model: Entries in the right singular vectors corresponding to 2 vanishing singular values.

The non-zero values indicate that the initial condition x10(0) and parameter θ13 are unidentifiable when state x10 is not measured.

Example 6: Goldbeter model

Consider a model describing the circadian oscillations in the Drosophila period protein (PER) [27]. It is based on both multiple phosphorylation of PER and on the negative feedback exerted by PER on the transcription of the period (per) gene. It provides a molecular basis for circadian oscillations of the limit cycle type in which the peak in per mRNA precedes the peak in total PER protein. This model was analysed by Sedoglavic in 1995, in which he identified only 1 set of totally correlated parameters [28]. It contains 5 states and 17 model parameters and assuming that initial conditions are unknown, the total number of model parameters is 22. Measuring the output, {x2, x3, x4, x5}, our algorithm also found only the 1 totally correlated set, ϕ1 = {θ1, θ3, θ4, θ5, x1(0)}, with its elements indicated by the non-zero values in Fig 10. The minimal output set of this model, {x1}, was calculated in 12 seconds.

Fig 10

Example 6: Goldbeter model: Entries in the last right singular vector corresponding to single vanishing singular value calculated.

The non-zero values indicate that parameters θ1, θ3, θ4, θ5 and initial condition x1(0) are unidentifiable when state x1 is not measured.

Example 6: Goldbeter model: Entries in the last right singular vector corresponding to single vanishing singular value calculated.

The non-zero values indicate that parameters θ1, θ3, θ4, θ5 and initial condition x1(0) are unidentifiable when state x1 is not measured.

Example 7: JAK/STAT model with specific model output

In this example, we illustrate how our method can be used to identify minimal output sets from a set of more complex model outputs. These outputs do not simply consist of single model states and in this example, also include additional model parameters. We consider a reparameterised JAK/STAT model, with the original unidentifiable model described by Raia and co-authors [29]. The constitutive activation of the JAK (Janus kinase)/STAT signalling pathway forms part of both the primary mediastinal B-cell lymphoma (PMBL) and the classical Hodgkin lymphoma (cHL). Raue and co-authors investigated the identifiability of this benchmark model using three different approaches [1]. The model definition also contains a specific set of initial conditions for model states, (0) = {1.3, θ21, 0, 1, 0, 2.8, 0, 165, 0, 0, 0.34, 0, 0, 0}. These initial conditions, in conjunction with the predetermined set of model outputs, result in the model’s structural unidentifiability. Structural identifiability can be reinstated by reparameterising the model (See supplementary S7 File for the structurally identifiable version of this JAK/STAT model). The reparameterised model contains 14 states and 21 parameters, with only the initial condition of state x2 assumed to be unknown. Considering the reparameterised model’s output, = {x1 + x3 + x4, θ16(x3 + x4 + x5 + x12), θ17(x4 + x5), θ18 x7, θ19 x10, θ20 x14, x13, x9}, our algorithm can now be implemented to determine the model’s minimal output sets. Setting k = 1, already revealed 6 essential sensors. The unidentifiable parameters obtained were: ϕ1 = {θ12, θ16}, when sensor ψ1 = {θ16(x3 + x4 + x5 + x12)} was not measured, ϕ2 = {θ17}, when ψ2 = {θ17(x4 + x5)} was not measured, ϕ3 = {θ18}, when ψ3 = {θ18 x7} was not measured, ϕ4 = {θ19}, when ψ4 = {θ19 x10} was not measured, ϕ5 = {θ20}, when ψ5 = {θ20 x14} was not measured, and ϕ6 = {θ8, θ13} when state ψ6 = {x13} was not measured. All these sensors are essential and the resulting minimal output set, obtained after 18 seconds, is: {θ16(x3 + x4 + x5 + x12), θ17(x4 + x5), θ18 x7, θ19 x10, θ20 x14, x13}. Figs 11 and 12 reveal the identifiability signature obtained when sensor θ17(x4 + x5) was not measured. From this, one can see that parameter θ17 is unidentifiable.

Fig 11

Example 7: JAK/STAT model with specific model output: Singular values of the output sensitivity matrix, S, when omitting sensor θ17(x4 + x5) from y.

Singular values, arranged in descending order, reveal a clear gap. This gap in conjunction with the smallest singular value of 4 × 10−18, indicate that is rank deficient.

Fig 12

Example 7: JAK/STAT model with specific model output: Entries in the last right singular vector corresponding to the vanishing singular value in Fig 11.

The non-trivial null-space indicates that model parameter θ17 is not identifiable when sensor θ17(x4 + x5) is not measured.

Example 7: JAK/STAT model with specific model output: Singular values of the output sensitivity matrix, S, when omitting sensor θ17(x4 + x5) from y.

Singular values, arranged in descending order, reveal a clear gap. This gap in conjunction with the smallest singular value of 4 × 10−18, indicate that is rank deficient.

Example 7: JAK/STAT model with specific model output: Entries in the last right singular vector corresponding to the vanishing singular value in Fig 11.

The non-trivial null-space indicates that model parameter θ17 is not identifiable when sensor θ17(x4 + x5) is not measured.

Example 8: Non-rational JAK/STAT model with specific model output

In this final example, we show that our method can be used to analyse non-rational models. Consider a non-rational version of the JAK/STAT model in example 7: Analysing this model, we find the results identical to those obtained in example 7 and therefore conclude that the predefined outputs, x1 + x3 + x4 and x9, do not have to be measured to ensure this model’s identifiability. The different identifiability signatures, calculated for each example, can be found in the supplementary material (see supplementary files S1 to S7 Files). The symbolic verification of the individual unidentifiable sets in ϕ can be found in the supplementary S8 File. The MATLAB code of our algorithm can be found at: https://sourceforge.net/u/djoubert-wur/profile.

Conclusions

In this paper we introduced an algorithm that can find minimal output sets for a wide range of models in a short time. It is not limited by any specific model structure. Proposing multiple plausible minimal output sets to experimental researchers, enables them to select model outputs based on factors such as measurement cost and complexity. Offering measurement flexibility whilst ensuring structural identifiability is a useful tool to scientists and our algorithm could propose these minimal sets within a couple of minutes. In the future we intent to increase the numerical accuracy of our method by making use of the increased integration accuracy obtained by using complex derivatives to compute the Jacobi matrices ∂/∂ and ∂/∂. This step will increase the tolerance of the elements of the output sensitivity matrix to 10−20 [30]. In addition, we are investigating the added advantages of concatenating the sensitivity matrix for different values of the model parameters. Preliminary results indicate that this can have a dramatic effect on the accuracy in our computations [22].

A chemical reaction system description.