Rachel Mester1, Alfonso Landeros1, Chris Rackauckas2,3,4, Kenneth Lange1,5,6. 1. Department of Computational Medicine, University of California Los Angeles, Los Angeles, California, United States of America. 2. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America. 3. Pumas-AI, Annapolis, Maryland, United States of America. 4. Julia Computing, Cambridge, Massachusetts, United States of America. 5. Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America. 6. Department of Statistics, University of California Los Angeles, Los Angeles, California, United States of America.
Abstract
Differential sensitivity analysis is indispensable in fitting parameters, understanding uncertainty, and forecasting the results of both thought and lab experiments. Although there are many methods currently available for performing differential sensitivity analysis of biological models, it can be difficult to determine which method is best suited for a particular model. In this paper, we explain a variety of differential sensitivity methods and assess their value in some typical biological models. First, we explain the mathematical basis for three numerical methods: adjoint sensitivity analysis, complex perturbation sensitivity analysis, and forward mode sensitivity analysis. We then carry out four instructive case studies. (a) The CARRGO model for tumor-immune interaction highlights the additional information that differential sensitivity analysis provides beyond traditional naive sensitivity methods, (b) the deterministic SIR model demonstrates the value of using second-order sensitivity in refining model predictions, (c) the stochastic SIR model shows how differential sensitivity can be attacked in stochastic modeling, and (d) a discrete birth-death-migration model illustrates how the complex perturbation method of differential sensitivity can be generalized to a broader range of biological models. Finally, we compare the speed, accuracy, and ease of use of these methods. We find that forward mode automatic differentiation has the quickest computational time, while the complex perturbation method is the simplest to implement and the most generalizable.
Differential sensitivity analysis is indispensable in fitting parameters, understanding uncertainty, and forecasting the results of both thought and lab experiments. Although there are many methods currently available for performing differential sensitivity analysis of biological models, it can be difficult to determine which method is best suited for a particular model. In this paper, we explain a variety of differential sensitivity methods and assess their value in some typical biological models. First, we explain the mathematical basis for three numerical methods: adjoint sensitivity analysis, complex perturbation sensitivity analysis, and forward mode sensitivity analysis. We then carry out four instructive case studies. (a) The CARRGO model for tumor-immune interaction highlights the additional information that differential sensitivity analysis provides beyond traditional naive sensitivity methods, (b) the deterministic SIR model demonstrates the value of using second-order sensitivity in refining model predictions, (c) the stochastic SIR model shows how differential sensitivity can be attacked in stochastic modeling, and (d) a discrete birth-death-migration model illustrates how the complex perturbation method of differential sensitivity can be generalized to a broader range of biological models. Finally, we compare the speed, accuracy, and ease of use of these methods. We find that forward mode automatic differentiation has the quickest computational time, while the complex perturbation method is the simplest to implement and the most generalizable.
This is a PLOS Computational Biology Methods paper.
1 Introduction
In many mathematical models underlying parameters are poorly specified. This problem is particularly acute in biological and biomedical models. Model predictions can have profound implications for scientific understanding, further experimentation, and even public-policy decisions. For instance, in an epidemic some model parameters can be tweaked by societal or scientific interventions to drive infection levels down. Differential sensitivity can inform medical judgement about the steps to take with the greatest impact at the least cost. Similar considerations apply in economic modeling. Additionally, parameter estimation for model fitting usually involves differential sensitivity through maximum likelihood or least squares criteria. These optimization techniques depend heavily on gradients and Hessians with respect to parameters. While some parameter estimation methods rely on Bayesian computational techniques [1] rather than gradients, these techniques tend to scale poorly as the number of model parameters increases. A common way to alleviate the poor scaling of Bayesian inference is Hamiltonian Monte Carlo [2], which itself requires gradient calculations. Techniques for assessing sensitivity of stochastic models often rely on the gradient-dependent Fisher information matrix of the model, which is the basis for a variety of multi-step local sensitivity analysis techniques for discrete stochastic models [3].Calculation of gradients and Hessians of a model can also be important in other steps of the scientific process. For example, iterative model development [4] involves using the Fisher information matrix to inform experimental design. Extended Kalman filtering [5] incorporates differential sensitivity into model construction. Regardless of the method, parameter estimation is an important step in fitting a biological model, and the success of this step strongly impacts the ultimate utility of the model. Understanding the uses and limitations of differential sensitivity can aid in determining the identifiability of model parameters, how sensitive they are to experimental error or measurement noise, and the overall importance of their existence in the model. Finally, it is worth noting that while local sensitivity analysis is the focus of this manuscript, global sensitivity analysis often relies on local differential sensitivity estimates to inform optimal stepsizes in regional searching [6] or to resolve inconsistencies that arise when local sensitivity is non-monotonic [7].In any case it is imperative to know how sensitive model predictions are to changes in parameter values. Unfortunately, assessment of model sensitivity can be time consuming, computationally intensive, inaccurate, and simply confusing. Most models are nonlinear and resistant to exact mathematical analysis. Understanding their behavior is only approachable by solving differential equations or intensive and noisy simulations. Sensitivity analysis is often conducted over an entire bundle of neighboring parameters to capture interactions. If the parameter space is large or high-dimensional, it is often unclear how to choose representative points from this bundle. Faced with this dilemma, it is common for modelers to fall back on varying just one or two parameters at a time. Model predictions also often take the form of time trajectories. In this setting, sensitivity analysis is based on lower and upper trajectories bounding the behavior of the dynamical system.The differential sensitivity of a model quantity is measured by its gradient with respect to the underlying parameters at their estimated values. The existing literature on differential sensitivity is summarized in the modern references [8,9]. There are a variety of software packages that evaluate parameter sensitivity. For example, the Julia software DifferentialEquations.jl [10] makes sensitivity analysis routine for many problems. Additionally, PESTO [11] is a current Matlab toolbox for parameter estimation that uses adjoint sensitivities implemented as part of the CVODES method from SUNDIALS [12]. Although the physical sciences have widely adopted the method of differential sensitivity [13,14], the papers and software generally focus on a single sensitivity analysis method rather than a comparison of the various approaches. This singular focus leaves open many questions when biologists conduct sensitivity analyses. Should the continuous sensitivity equations be used, or would automatic differentiation of solvers be more efficient on biological models? On the types of models biologists generally explore, would implicit parallelism within the sensitivity equations be beneficial, or would the overhead cost of thread spawning overrule any benefits? How close do simpler methods based on complex perturbation get to these techniques? The purpose of the current paper is to explore these questions on a variety of models of interest to computational biologists.In the current paper we also suggest an accurate method of approximating gradients that compares favorably against forward automatic differentiation techniques, provided a model involves analytic functions without discontinuities, maxima, minima, absolute values, or any other excursion outside the universe of analytic functions. In the sections immediately following, we summarize known theory, including the important adjoint method for computing the sensitivity of functions of solutions [13, 14]. Then we illustrate sensitivity analysis for a few deterministic models and a few stochastic models. Our exposition includes some straightforward Julia code that readers can adapt to their own sensitivity needs. These examples are followed by an evaluation of the accuracy and speed of the suggested numerical methods. The concluding discussion summarizes our experience, indicates limitations of the methods, and suggests new potential applications.For the record, here are some notational conventions used throughout the paper. All functions that we differentiate have real or real-vector arguments and real or real-vector values. All vectors and matrices appear in boldface. The superscript indicates a vector or matrix transpose. For a smooth real-valued function f(x), we write its gradient (column vector of partial derivatives) as ∇f(x) and its differential (row vector of partial derivatives) as . If g(x) is vector-valued with ith component g(x), then the differential (Jacobi matrix) dg(x) has ith row dg(x). The chain rule is expressed as the equality of differentials. The transpose (adjoint) form of the chain rule is . For a twice-differentiable function, the second differential (Hessian matrix) d2f(x) = d∇f(x) is the differential of the gradient. Finally, i will denote .
2 Methods for computing sensitivity
2.1 Forward method
S3 Appendix briefly discusses sensitivity analysis for the linear constant coefficient system of ordinary differential equations (ODEs). Sensitivity of the nonlinear system can be evaluated by differentiating the original ODE with respect to β, interchanging the order of differentiation, and numerically integrating the systemThis formulation of the problem depends on knowing x(t, β). In practice, one solves the system
jointly, where dx[t, β] is the Jacobi matrix of x(t, β) with respect to β. This is commonly referred to as forward sensitivity analysis and is carried out by software suites such as DifferentialEquations.jl and SUNDIALS CVODES [12]. We note that a common implementation of sensitivity analysis is to base calculations on directional derivatives. Thus, the directional derivative
version of the forward method allows one to evolve dynamical systems without ever computing full Jacobians. The forward method can also be applied when quantities of interest are defined recursively.
2.2 Adjoint methods
The adjoint method is incorporated in the biological parameter estimation software PESTO through CVODES [12]. This method [8,9] is defined directly on a function g[x(β), β] of the solution of the ODE. The adjoint method introduces a Lagrange multiplier λ(β), numerically solves the ODE system forward in time over [t0, t], then solves the system
for λ(β) in reverse time, and finally uses the introduced parameter to compute derivatives viaThe second and third stages are commonly combined by appending the last equation to the set of ODEs being solved in reverse. This tactic achieves a lower computational complexity than other techniques, which require solving an n-dimensional ODE system p times for p parameters. In contrast, the adjoint method solves an n-dimensional ODE forwards and then solves an n-dimensional and a p-dimensional system in reverse, changing the computational complexity from to . Whether such asymptotic cost advantages lead to more efficiency on practical models is precisely one of the points studied in this paper.Alternatively, one can find the partial derivatives using finite differences. The simplest method here is to compute a slightly perturbed trajectory x(t, β+Δv) and form the forward differences
at all specified time points as approximations to the forward directional derivatives of x(t, β) in the direction v. Choosing v to be unit vectors along each coordinate axis gives ordinary partial derivatives. The accuracy of this crude method suffers from round-off error in subtracting two nearly equal function values. These round-off errors are in addition to the usual errors committed in integrating the differential equation numerically. Round-off errors can be ameliorated by using central differences
rather than forward differences. However, the central difference method requires twice the number of computations as the forward difference method. Thus, the choice of a difference method depends on prioritization of accuracy versus computational efficiency. In small models, computational efficiency may be less of a priority, in which case central difference methods are preferred.
2.3 Complex perturbation methods
There is a far more accurate way of computing model sensitivity when the function f[x, β] defining the ODE is analytic in the parameter vector β [15]. An analytic function can be expanded in a locally convergent power series around every point of its domain. This implies that the trajectory x(t, β) is also analytic in β. For a real analytic function g(β) of a single variable β, the derivative approximation
in the complex plane avoids roundoff and is highly accurate for Δ>0 very small [16,17]. Thus, in calculating a directional derivative of x(t, β), it suffices to (a) solve the governing ODE with β+Δiv replacing β, (b) take the imaginary part of the result, and (c) divide by Δ. To make these calculations feasible, the computer language implementing the calculations should support complex arithmetic and ideally have an automatic dispatching mechanism so that only one implementation of each function is required. In contrast to numerical integration of the joint system (Eq 1), the complex perturbation method is much more simply parallelizable across parameters.The following straightforward Julia routine for computing sensitivitiesfunction differential(f::F, p, Δ) where Ffvalue = real(f(p)) # function valuedf = zeros(length(fvalue), length(p)) # states x parameterspworker = [map(complex, p) for _ in 1:Threads.nthreads()]Threads.@threads for j = 1:length(p)_p = pworker[Threads.threadid()] # thread worker array_p[j] = _p[j] + Δ * im # perturb parameterfj = f(_p) # compute perturbed function value_p[j] = complex(real(_p[j]), 0.0) # reset parameterdf[:,j]. = imag(fj)./ Δ # fill in jth partialendreturn (fvalue, df)endtakes advantage of the simplicity of multithreading the complex perturbation method by parameter. This function requires a function f(p): ℝ↦ℝ of a real vector p declared as complex. The perturbation scalar Δ should be small and real, say 10−10 to 10−12 in double precision. If the parameters p vary widely in magnitude, then a good heuristic is to perturb p by pdi instead of di. The returned value df is an m×n real matrix. The Julia commands real and complex effect conversions between real and complex numbers, and Julia substitutes im for . We will employ these functions later in some case studies.A recent extension [18] of the complex perturbation method facilitates accurate approximation of second derivatives. The relevant formula is
where . Roundoff errors can now occur but are usually manageable. Here we present a novel result for how to extend the complex perturbation method to approximate mixed partials. Our derivation is condensed into the following equationsThis approximation is accurate to order O(Δ6) and allows us to infer thatSince we can approximate and , we can now approximate to order O(Δ4). These approximations are derived in S1 Appendix.The Julia code for computing second derivativesfunction hessian(f::F, p, Δ) where Fd2f = zeros(length(p), length(p)) # hessiandp = Δ * (1.0 + 1.0 * im) / sqrt(2)for j = 1:length(p) # compute diagonal entries of d2fp[j] = p[j] + dpfplus = f(p)p[j] = p[j] - 2 * dpfminus = f(p)p[j] = complex(real(p[j]), 0.0) # reset parameterd2f[j, j] = imag(fplus + fminus) / Δ^2endfor j = 2:length(p) # compute off diagonal entriesfor k = 1:(j—1)(p[j], p[k]) = (p[j] + dp, p[k] + dp)fplus = f(p)(p[j], p[k]) = (p[j] - 2 * dp, p[k] - 2 * dp)fminus = f(p)(p[j], p[k]) = (complex(real(p[j]), 0.0), complex(real(p[k]), 0.0))d2f[j, k] = imag(fplus + fminus) / Δ^2d2f[j, k] = (d2f[j, k]—d2f[j, j]—d2f[k, k]) / 2d2f[k, j] = d2f[j, k]endendreturn d2fendoperates on a scalar-valued function f(u) of a real vector p declared as complex. The second-order complex perturbation method can also be multithreaded by parameter, provided the unmixed second partials are computed prior to the mixed ones. Because roundoff error is now a concern, the perturbation scalar Δ should be in the range 10−3 to 10−6 in double precision. The returned value d2f is a symmetric matrix.
2.4 Automatic differentiation
Another technique one can use to calculate the derivatives of model solutions is to differentiate the numerical algorithm that calculates the solution. This can be done with computational tools collectively known as automatic differentiation [19]. Forward mode automatic differentiation is performed by carrying forward Jacobian-vector products at each successive calculation. This is accomplished by defining higher-dimensional numbers, known as dual numbers [20], coupled to primitive functions f(x) through the actionHere ϵ is a dimensional marker, similar to the complex i, which is a two-dimensional number. For a composite function f = f2 ∘ f1, the chain rule is . The ith column of the Jacobian appears in the expression . Since computational algorithms can be interpreted as the composition of simpler functions, one need only define automatic differentiation on a small set of base cases (such as +, *, sin, and so forth, known as the primitives) and then apply the accepted rules in sequence to differentiate more elaborate functions. The ForwardDiff.jl package [20] in Julia accomplishes this by defining dispatches for such primitives on a dual number type and provides convenience functions for easily extracting common objects like gradients, Jacobians, and Hessians. Hessians are calculated by layering automatic differentiation twice on the same algorithm to effectively take the derivative of a derivative.In this form, forward mode automatic differentiation shares many similarities to the complex perturbation methods described above without the requirement that the extension of f(x) be complex analytic. At every stage of the calculation f(x) must be differentiable, a weaker yet still restrictive assumption. Conveniently, automatic differentiation allows for arbitrarily many derivatives to be calculated simultaneously. By defining higher-dimensional dual numbers that act independently via
one can calculate entire Jacobians in a single function call . This use of higher-dimensional dual numbers is a practice known as chunking. Chunking reduces the number of primal (non-derivative) calculations required for computing the Jacobian. Because the ForwardDiff.jl package uses chunking by default, we will investigate the extent to which this detail is applicable in biological models.
3 Case studies
We now explore applications of differential sensitivity to a few core models in oncology and epidemiology.
3.1 CARRGO model
The CARRGO model [21] was designed to capture the tumor-immune dynamics of CAR T-cell therapy in glioma. The CARRGO model generalizes to other tumor cell-immune cell interactions. Its governing system of ODEs
follows cancer cells x as prey and CAR T-Cells y as predators. This model captures Lotka-Volterra dynamics with logistic growth of the cancer cells. Our numerical experiments assume the parameter values and initial conditions
suggested by Sahoo et al. [21].A traditional sensitivity analysis hinges on solving the system of ODEs and displaying the solutions at a chosen future time across an interval or rectangle of parameter values. Fig 1 shows how x(t) and y(t) vary at t = 1000 days under joint changes of κ1 and κ2, where κ1 is the rate at which cancer cells are destroyed in an interaction with an immune cell, and κ2 is the rate at which immune cells are recruited after such an interaction. This type of analysis directly portrays how a change in one or two parameters impacts the outcome of the system. Surprisingly, the number of cancer cells x(t) depends strongly on κ2 but only weakly on κ1. In contrast, the number of immune cells y(t) depends comparably on both parameters, perhaps because the initial population of immune cells is much smaller than the initial population of cancer cells.
Fig 1
Sensitivity of Cancer and Immune Cells in the CARRGO Model.
A heatmap representing the number of cancer cells, or x(t) (left) and the number of immune cells, or y(t) (right) as the parameters κ1 (horizontal axis) and κ2 (vertical axis) are varied. Results displayed summarize simulations of the CARRGO model with parameter values and initial conditions indicated in this section at time t = 1000 days.
Sensitivity of Cancer and Immune Cells in the CARRGO Model.
A heatmap representing the number of cancer cells, or x(t) (left) and the number of immune cells, or y(t) (right) as the parameters κ1 (horizontal axis) and κ2 (vertical axis) are varied. Results displayed summarize simulations of the CARRGO model with parameter values and initial conditions indicated in this section at time t = 1000 days.There are limitations to this type of sensitivity analysis. How many solution curves should be examined? What time is most informative in displaying system changes? Is it necessary to compute sensitivity over such a large range of parameters when the trends are so clear? These ambiguities cloud our understanding and require far more computing than is necessary. Differential sensitivity successfully addresses these concerns. Gradients of solutions immediately yield approximate solutions in a neighborhood of postulated parameter values. The relative importance of different parameters in determining species levels can be determined from inspection of the gradient. Furthermore, modern software easily delivers the gradient along entire solution trajectories. There is no need to solve for an entire bundle of neighboring solutions.Differential assessment is far more efficient. The required calculations involve solving an expanded system of ordinary differential equations just once under the automatic differentiation method or solving the system once for each parameter under the complex perturbation method. Either way, the differential method is much less computationally intensive than the traditional method of solving the ODE system over an interval for each parameter or over a rectangle for each pair of parameters. Here is our brief Julia code for computing sensitivity via the complex perturbation method.using DifferentialEquations, Plotsfunction sensitivity(x0, p, d, tspan)problem = ODEProblem{true}(ODE, x0, tspan, p)sol = solve(problem, saveat = 1.0) # solve ODE(lp, ls, lx) = (length(p), length(sol), length(x0))solution = Dict{Int, Any}(i = > zeros(ls, lp + 1) for i in 1:lx)for j = 1:lx # record solution for each species@views solution[j][:, 1] = sol[j,:]endfor j = 1:lpp[j] = p[j] + d * im # perturb parameterproblem = ODEProblem{true}(ODE, x0, tspan, p)sol = solve(problem, saveat = 1.0) # resolve ODEp[j] = complex(real(p[j]), 0.0) # reset parameter@views sol. = imag(sol) / d # compute partialfor k = 1:lx # record partial for each species@views solution[k][:,j + 1] = sol[k,:]endendreturn solutionendfunction ODE(dx, x, p, t) # CARRGO modeldx[1] = p[4] * x[1] * (1—x[1] / p[5])—p[1] * x[1] * x[2]dx[2] = p[2]* x[1] * x[2]—p[3] * x[2]endp = complex([6.0e-9, 3.0e-11, 1.0e-6, 6.0e-2, 1.0e9]); # parametersx0 = complex([1.25e4, 6.25e2]); # initial values(d, tspan) = (1.0e-13, (0.0,1000.0)); # step size and time interval in dayssolution = sensitivity(x0, p, d, tspan); # find solution and partialsCARRGO1 = plot(solution[1][:, 1], label = "x1", xlabel = "days",ylabel = "cancer cells x1", xlims = (tspan[1],tspan[2]))CARRGO2 = plot(solution[1][:, 2], label = "d1x1", xlabel = "days",ylabel = "p1 sensitivity", xlims = (tspan[1],tspan[2]))In the Julia code the parameters κ1, κ2, θ, ρ, and γ and the variables x and y exist as components of the vector p and x, respectively. The two plot commands construct solution curves for cancer and its sensitivity to perturbations of κ1.Fig 2 reinforces the conclusions drawn from the heatmaps, but more clearly and quantitatively. Additionally, differential sensitivity allows for the assessment of the sensitivity over the course of time, rather than just at a single time or small set of times. For example, the sensitivity of x with respect to γ in this model exhibits both large positive and large negative values over the course of time. Measured as the difference in x caused by a difference in γ at our end-time t = 1000, these effects tend to cancel each other out and fail to communicate the impact of the parameter γ on x at intermediate times. In brief, the scaled sensitivity of cancer cells x is much more dependent on carrying capacity γ later in the simulation, while the model sensitivity to birth rate ρ is most pronounced around the earlier time t = 200.
Fig 2
Sensitivity of Cancer Cells in the CARRGO Model.
Time series plots of cancer cells (x(t)) and the derivatives of x(t) with respect to the CARRGO parameters κ1, κ2, θ, ρ, γ. Results shown are for the initial conditions and parameter values defined in Fig 1 and simulated over the course of t = 1000 days. The complex perturbation method of sensitivity analysis is used to compute derivatives.
Sensitivity of Cancer Cells in the CARRGO Model.
Time series plots of cancer cells (x(t)) and the derivatives of x(t) with respect to the CARRGO parameters κ1, κ2, θ, ρ, γ. Results shown are for the initial conditions and parameter values defined in Fig 1 and simulated over the course of t = 1000 days. The complex perturbation method of sensitivity analysis is used to compute derivatives.
3.2 Deterministic SIR model
The deterministic SIR model follows the number of infectives I(t), the number of susceptibles S(t), and the number of recovereds R(t) during an epidemic. These three subpopulations satisfy the ODE system
where η is the daily infection rate per encounter and δ is the daily rate of progression to immunity per person. For SARS-CoV-2, current estimates [22] of η range from 0.0012 to 0.48, and estimates of δ range from 0.0417 to 0.0588 [23]. As an alternative to solving the extended set of differential equations, we again use the complex perturbation method to evaluate parameter sensitivities.The following Julia code for the complex perturbation method reuses the generic sensitivity function from the CARRGO model example.function ODE(dx, x, p, t) # Covid modelN = 3.4e8 # US population sizedx[1] = —p[1] * x[2] * x[1] / Ndx[2] = p[1] * x[2] * x[1] / N—p[2] * x[2]dx[3] = p[2] * x[2]endp = complex([0.2, (0.0417 + 0.0588) / 2]); # parametersx0 = complex([3.4e8, 100.0, 0.0]); # initial values(d, tspan) = (1.0e-10, (0.0, 365.0)) # 365 dayssolution = sensitivity(x0, p, d, tspan);Covid = plot(solution[1][:,:], label = ["x1" "d1x1" "d2x1"],xlabel = "days", xlims = (tspan[1],tspan[2]))Our parameter choices roughly capture measurements for the COVID-19 virus from early in the pandemic [22,23]. Fig 3 plots the susceptible curve and its sensitivities. In this case all three curves conveniently occur on comparable scales. Fig 3 captures not only the pronounced parameter sensitivity early in the pandemic but also the symmetry between the δ and η parameters.
Fig 3
Sensitivities of Susceptibles in the Covid Model.
Time series of the susceptible population (S(t)) and its sensitivities to the two parameters (η and δ) of the classic SIR model. Results shown are for the SIR model simulated for one year with initial conditions S0 = 3.4×108, I0 = 100, R0 = 0, and the parameter values η = 0.7194, δ = 0.5025. Derivatives are calculated using the complex perturbation method.
Sensitivities of Susceptibles in the Covid Model.
Time series of the susceptible population (S(t)) and its sensitivities to the two parameters (η and δ) of the classic SIR model. Results shown are for the SIR model simulated for one year with initial conditions S0 = 3.4×108, I0 = 100, R0 = 0, and the parameter values η = 0.7194, δ = 0.5025. Derivatives are calculated using the complex perturbation method.
3.3 Second-order expansions of solution trajectories
In predicting nearby solution trajectories, the second-order Taylor expansion
improves accuracy over the first-order expansionThe improved accuracy achieved by including second-order terms often justifies their computation. The complex perturbation method permits straightforward computation of second derivatives via approximations (Eq 2) and (Eq 3). The DiffEqSensitivity.jl and ForwardDiff.jl packages implement both adjoint and forward difference methods for computing the second derivatives of differential equation systems. Fig 4 displays predicted trajectories for the SIR model using the complex perturbation method when all parameters p are replaced by p(1+U), where each U is chosen uniformly from (−0.25,0.25). Fig 4 vividly confirms the improvement in accuracy in passing from a first-order to a second-order approximation. More improvement becomes evident as the non-linearity of the solution trajectory increases.
Fig 4
Model Trajectories for SIR Model Calculated Using First and Second Differentials.
Time series plot of the SIR model simulated over t = 100 days with initial conditions S0 = 1000 and I0 = 10. Results depend on the SIR model with the original parameters from Fig 3 (original trajectory), re-simulating the SIR trajectory after perturbing the parameters by a random amount around 25% (trajectory with perturbed parameters), approximating the trajectory based on the linear expansion (Eq 5) and the first derivative calculated with the complex perturbation method (first-order prediction), and approximating the trajectory based on the quadratic expansion (Eq 4) and the first and second derivatives calculated with the complex perturbation method (second-order prediction).
Model Trajectories for SIR Model Calculated Using First and Second Differentials.
Time series plot of the SIR model simulated over t = 100 days with initial conditions S0 = 1000 and I0 = 10. Results depend on the SIR model with the original parameters from Fig 3 (original trajectory), re-simulating the SIR trajectory after perturbing the parameters by a random amount around 25% (trajectory with perturbed parameters), approximating the trajectory based on the linear expansion (Eq 5) and the first derivative calculated with the complex perturbation method (first-order prediction), and approximating the trajectory based on the quadratic expansion (Eq 4) and the first and second derivatives calculated with the complex perturbation method (second-order prediction).For example, the top right panel of Fig 4 shows that the solution trajectory of infected individuals bends dramatically with a change in parameters. This behavior is much better reflected in the second-order prediction compared to the first-order prediction, which over-corrects at the peak. The Euclidean distance between the actual and predicted trajectories at the sampled time points is about 25.4 in the first-order case and only about 9.06 in the second-order case, a reduction of over 60% in prediction error. By contrast, the trajectory of the recovered individuals steadily increases in a much more linear fashion. The bottom left panel of Fig 4 shows that the first-order prediction now remains reasonably accurate over a substantial period. Even so, the discrepancy between the predicted solutions grows so that by day 100 the Euclidean distance between the first-order prediction and the actual trajectory exceeds 154, compared to about 34.0 for the second-order prediction. Thus, calculating second-order sensitivity is helpful in both highly non-linear systems and systems with long time scales.
3.4 Stochastic SIR model
We now illustrate sensitivity calculations in the stochastic SIR model. This model postulates an original population of size n with i infectives and s susceptibles. The parameters δ and η again capture the rate of progression to immunity and the infection rate per encounter. Since extinction of the infectives is certain, we focus on the time to elimination of the infectives. It is also convenient to follow the vector (i, n), where n = i+s is the sum of the number of infectives i plus the number of susceptibles s. The mean time t to elimination of all infectives satisfies the recurrence
for 0The expression for t stems from adding the expected time for the i→i−1 transition, plus the expected time i−1→i−2, and so forth. This system of equations can be solved recursively for i = n, n−1,…0 starting with n = 1. Once the values for a given n are available, n can be incremented, and a new round is initiated. Ultimately the target size n = N is reached. Taking partial derivatives of the recurrence (Eq 6) yields a new system of recurrences that can also be solved recursively in tandem with the original recurrence. The complex perturbation method is easier to implement and comparable in accuracy to the partial derivative method.Another important index of the SIR process is the mean number of infectives m ever generated starting with i initial infectives and n total people. These expectations can be calculated via the recurrences
for 0One can compute the sensitivities of the m to parameter perturbations in the same way as the t. Here is the Julia code for the two means and their sensitivities via the complex perturbation method. Note how our earlier differential function plays a key role.function SIRMeans(p)(delta, eta) = (p[1], p[2])M = zeros(typeof(p[1]),(N+1, N+1)) # mean matrixT = similar(M) # time to extinction matrixfor n = 1:N # recurrence relations loopfor j = 0:(n-1)i = n—ja = i * delta # immunity rateif i = = n # initial conditionsM[i+1, n+1] = iT[i+1, n+1] = T[i, i] + 1 / aelseb = i * (n—i) * eta / N # infection ratec = 1 / (a + b)M[i+1, n+1] = a * c * (M[i, n] + 1) + b * c * M[i+2, n+1]T[i+1, n+1] = c * (1 + a * T[i, n] + b * T[i+2, n+1])endendendreturn [M[:, N+1]; T[:, N+1]]endp = complex([0.2, (0.0417 + 0.0588) / 2]); # delta and beta(N, d) = (100, 1.0e-10);@time (f, df) = differential(SIRMeans, p, d);The left column of Fig 5 displays a heatmap of the expected total number of individuals infected and the right column displays a heatmap of the expected days to extinction of the infection process. Rows 2 and 3 show the sensitivites of these quantities to the η and δ parameters in the stochastic SIR model.
Fig 5
Sensitivity of Stochastic SIR Model.
Heatmaps showing the mean number of infected individuals (M) at extinction, the mean time to extinction (T), and their sensitivities to the parameters η and δ for the stochastic SIR process. Sensitivities rely on the complex perturbation method to calculate derivatives and assume initial conditions S0 = 100 and I0 = 1.
Sensitivity of Stochastic SIR Model.
Heatmaps showing the mean number of infected individuals (M) at extinction, the mean time to extinction (T), and their sensitivities to the parameters η and δ for the stochastic SIR process. Sensitivities rely on the complex perturbation method to calculate derivatives and assume initial conditions S0 = 100 and I0 = 1.It is interesting to compare results from differential sensitivity to estimates from stochastic simulations. To see the difference in accuracy, we calculated the average number of individuals infected and the average time to extinction by stochastic simulation using the software package BioSimulator.jl [24]. Table 1 records the analytic and simulated means of these outcomes in the SIR model. As Table 1 indicates, the simulated means over r = 100 runs are roughly comparable to the analytic means, but the standard errors of the simulated means are large. Because the standard errors decrease as , it is difficult to achieve much accuracy by simulation alone. In more complicated models, simulation is so computationally intensive and time consuming that it is nearly impossible to achieve accurate results. Of course, the analytic method is predicated on the existence of an exact solution or an algorithm for computing the same.
Table 1
Comparison between the calculated and simulated means of SIR model outcomes in the stochastic SIR model simulated under the initial conditions S0 = 3.4×104, I0 = 1 and parameter values η = 0.7194, δ = .5025. Results for the simulated means were obtained using the BioSimulator package in Julia and averaging results over r = 100 runs.
Calculated Mean
Simulated Mean
Simulated Standard Error
Time to Extinction
2.792×10 days
3.074×10 days
4.153 days
Number Infected
5.484×103 people
5.838×103 people
8.551×102 people
Parameter sensitivities inform our judgment in interesting and helpful ways. For example, derivatives of both the total number of infecteds and the time to extinction with respect to η are very small except in a narrow window of the η parameter. This suggests that we focus further simulations, sensitivity analysis, and possible interventions on the region of parameter space where η falls in these windows. Derivatives with respect to δ also depend mostly on η except at very small values of δ. These conclusions are harder to draw from noisy simulations alone.
3.5 Branching processes
Branching process models offer another opportunity for checking the accuracy of sensitivity calculations. For simplicity we focus on birth-death-migration processes [25]. These are multi-type continuous-time processes [26,17] that can be used to model the early stages of an epidemic over a finite graph with n nodes, where nodes represent cities or countries. On node i we initiate a branching process with birth rate β>0 and death rate δ>0. The migration rate from node i to node j is λ≥0. All rates are per person, and each person is labeled by a node. Let λ = ∑λ be the sum of the migration rates emanating from node i. Given this notation, the mean infinitesimal generator of the process is the matrixThe entries of the matrix represent the expected number of people at node j at time t starting from a single person of type i at time 0. The process is irreducible when the pure migration process corresponding to the choice β = δ = 0 for all i is irreducible. Equivalently, the process is irreducible when the graph representing transition probabilities is strongly connected. Henceforth, we assume the process is irreducible and let Γ denote the mean infinitesimal generator of the pure migration process. The process is subcritical, critical, or supercritical depending on whether the dominant eigenvalue ρ of Ω is negative, zero, or positive.To determine the local sensitivity of ρ to a parameter θ [26, 27], suppose its left and right eigenvectors v and w are normalized so that vw = 1. Differentiating the identity Ωw = ρw with respect to θ yieldsIf we multiply this by v on the left and invoke the identities vΩ = ρv and vw = 1 we find thatBecause , it follows that an increase in δ has the same impact on ρ as the same decrease in β. The sensitivity of v and w can be determined by an extension of this reasoning [28]. The extinction probabilities e of the birth-death-migration satisfy the system of algebraic equations
for all i. This is a special case of the vector extinction equation
for a general branching process with offspring generating function P(x) for a type i person [29]. For a subcritical or critical process, e = 1. For a supercritical process all e∈(0,1). Iteration is the simplest way to find e. Starting from e0 = 0, the vector sequence e = P(e) satisfies
and converges to a solution of the extinction equations. Here all inequalities apply component-wise.To find the differential [28] of the extinction vector e with respect to a vector θ of parameters, we assume that the branching process is supercritical and resort to implicit differentiation of the equation e(θ) = P[e(θ), θ]. The chain rule givesThis equation has the solutionThe indicated inverse does, in fact, exist. Alternatively, one can compute an entire extinction curve e(t) whose component e(t) supplies the probability of extinction before time t starting from a single person of type. This task reduces to solving the ODE for by the methods previously discussed.The following Julia code computes the sensitivities of the extinction probability for a two-node process by the complex perturbation method.using LinearAlgebrafunction extinction(p)types = Int(sqrt(1 + length(p)) - 1) # length(p) = 2 * types + types^2(x, y) = (zeros(Complex, types), zeros(Complex, types))for i = 1:500 # functional iterationy = P(x, p)if norm(x—y) < 1.0e-16 break endx = copy(y)endreturn yendfunction P(x, p) # progeny generating functiontypes = Int(sqrt(1 + length(p)) - 1) # length(p) = 2 * types + types^2delta = p[1: types]beta = p[types + 1: 2 * types]lambda = reshape(p[2 * types + 1:end], (types, types))y = similar(x)t = delta[1] + beta[1] + lambda[1, 2]y[1] = (delta[1] + beta[1] * x[1]^2 + lambda[1, 2] * x[2]) / tt = delta[2] + beta[2] + lambda[2, 1]y[2] = (delta[2] + beta[2] * x[2]^2 + lambda[2, 1] * x[1]) / treturn yenddelta = complex([1.0, 1.75]); # death ratesbeta = complex([1.5, 1.5]); # birth rateslambda = complex([0.0 0.5; 1.0 0.0]); # migration ratesp = [delta; beta; vec(lambda)]; # package parameter vector(types, d) = (2, 1.0e-10)@time (e, de) = differential(extinction, p, d)To adapt the code to a different branching process model, one simply supplies the appropriate progeny generating function and necessary parameters.The average number a of infected individuals of type j ultimately generated by a single initial infected individual of type i is also of interest. The matrix A = (a) of these expectations can be calculated via the matrix equation
where F is the offspring matrixOne can determine the local sensitivity of the expected numbers of total descendants by differentiating the equation A = (I−F)−1. The result
depends on the sensitivity of the expected offspring matrix F. Julia code for the complex perturbation method with two nodes follows.function particles(p) # mean infected individuals generatedtypes = Int(sqrt(1 + length(p)) - 1) # length(p) = 2 * types + types^2delta = p[1: types]beta = p[types + 1: 2 * types]lambda = reshape(p[2 * types + 1:end], (types, types))F = complex(zeros(types, types))t = delta[1] + beta[1] + lambda[1, 2](F[1, 1], F[1, 2]) = (2 * beta[1] / t, lambda[1, 2] / t)t = delta[2] + beta[2] + lambda[2, 1](F[2, 1], F[2, 2]) = (lambda[2, 1] / t, 2 * beta[2] / t)A = vec(inv(I—F)) # return as vectorenddelta = complex([1.0, 1.75]); # death ratesbeta = complex([1.5, 1.5]); # birth rateslambda = complex([0.0 0.5; 1.0 0.0]); # migration ratesp = [delta; beta; vec(lambda)]; # package parameter vector(types, d) = (2, 1.0e-10)@time (A, dA) = differential(particles, p, d)
4 Results
We now measure the accuracy, computational speed, and prediction error for adjoint, forward mode, and complex perturbation methods. To account for the variety of settings encountered by biologists, we include two additional ODE models in our comparisons. The ROBER model describes chemical reactions typical of enzymatic behavior [30] and furnishes an example of a stiff ODE system. More information on the ROBER model can be found in S2 Appendix. To compare the three methods in a high-dimensional ODE model, we turn to the mammalian cell cycle (MCC) model. Our MCC model is a simplified version of the original MCC model constructed by Gerard and Goldbetor [31], as explained in more detail in S2 Appendix. The model comprises 11 equations and 15 parameters and captures aspects of cell reproduction and cycling mediated by chemical signaling via cell-state dependent proteins such as tumor repressors, transcription factors, and other DNA replication checkpoints. The model relies on cell state as opposed to cell mass and nicely replicates sequential progression along the cell cycle.
4.1 Accuracy
It is important to understand how close computed differential sensitivities are to true differential sensitivities. Unfortunately, the latter are almost always unavailable for ODE models. For the stochastic SIR and branching process models, true sensitivities are well matched by the approximate sensitivities delivered by the complex perturbation methods, provided the complex perturbation is small enough [32]. As a proxy for comparison to true values in ODE models, one can compute the Euclidean distance between sensitivities delivered by the complex perturbation method and the methods relying on the chain rule. In general, we find that these distances are very small.For the forward and adjoint sensitivities of non-stiff ODEs such as the SIR and CARRGO models, it is known that as one decreases the tolerance of the underlying ODE solver, the solution and its sensitivities converge to their true values [33]. To demonstrate that the same behavior occurs in our cases, we compute the sensitivities of the SIR model and of the ROBER model at t = 1000 using the adjoint, forward, and complex perturbation methods at a variety of tolerances ranging from 1×10−2 to 1×10−8.Fig 6 shows that all three method types (adjoint, forward, and complex perturbation) ultimately converge. In the non-stiff case (the SIR model), the adjoint method requires a step size of 1.0 to converge, while the stiff case (the ROBER model) requires a much smaller step size of 0.1 to converge. Each method converges at a different rate and potentially from a different direction. In the case of a relatively small, non-stiff model, the complex perturbation method converges more quickly (and at a higher tolerance) than the other methods. Notably, when the tolerance for the adjoint method is too weak the error rate increases more dramatically than for the forward method. This behavior becomes even more pronounced if we consider a stiff ODE model such as ROBER. In this case it is worth noting that the forward and complex perturbation methods converge, albeit under a more stringent tolerance. The adjoint method however struggles to converge for the ROBER model unless the step size is decreased to 0.1 (as shown in the Fig 6). While the smaller step size does allow the adjoint method to converge even in the stiff case, this smaller step size is much more computationally intensive and, in many cases, may be infeasible.
Fig 6
Convergence of Adjoint, Forward, and Complex Perturbation Methods for Numerical Sensitivities.
Convergence plot of the SIR model (left) and ROBER model (right) simulated over t = 1000 days. For SIR the initial conditions are S0 = 3.4×108, and I0 = 100, and the parameters are η = 0.7194 and δ = 0.5025. For ROBER the initial conditions are x1 = 1.0, x2 = 0.0, and x3 = 0.0, and the parameters are p1 = 4×10−2, p2 = 3×107, and p3 = 1×104. First-order sensitivities are computed via code from this manuscript (complex perturbation method), the ForwardDiff.jl package (forward method), and the Rodas4(autodiff = false) solver under the QuadratureAdjoint(autojacvec = EnzymeVJP()) sensealg protocol in the DiffEqSensitivities.jl package (adjoint method). The adjoint method requires a step size of 1.0 for the SIR model and a step size of 0.1 in the ROBER model to converge. All results are normalized by the number of time steps included in the simulation.
Convergence of Adjoint, Forward, and Complex Perturbation Methods for Numerical Sensitivities.
Convergence plot of the SIR model (left) and ROBER model (right) simulated over t = 1000 days. For SIR the initial conditions are S0 = 3.4×108, and I0 = 100, and the parameters are η = 0.7194 and δ = 0.5025. For ROBER the initial conditions are x1 = 1.0, x2 = 0.0, and x3 = 0.0, and the parameters are p1 = 4×10−2, p2 = 3×107, and p3 = 1×104. First-order sensitivities are computed via code from this manuscript (complex perturbation method), the ForwardDiff.jl package (forward method), and the Rodas4(autodiff = false) solver under the QuadratureAdjoint(autojacvec = EnzymeVJP()) sensealg protocol in the DiffEqSensitivities.jl package (adjoint method). The adjoint method requires a step size of 1.0 for the SIR model and a step size of 0.1 in the ROBER model to converge. All results are normalized by the number of time steps included in the simulation.
4.2 The speed versus accuracy trade-off
The trade-off between computational speed and accuracy is relevant to solving ODE systems whether they are stiff or not. Fig 7 displays the time versus error trade-off for both the SIR (non-stiff) and ROBER (stiff) models. In this case, error is calculated as the Euclidean distance between the derivatives calculated at various error tolerances and the derivatives calculated at a strict tolerance of 1×10−8 (for the SIR model) and 1×10−5 (for the ROBER model). We chose these tolerances as the strictest possible that are numerically realistic for each model. Fig 6 demonstrates that our choices are strict enough for the methods to reach convergence. We display errors versus time in a log-log plot averaged over compartments and parameters and normalized by length of time. We do not include the adjoint method in this comparison due to its difficulties in convergence and large computational cost.
Fig 7
Time vs Error of Forward and Complex Perturbation Methods for Numerical Sensitivities.
Time versus error log-log plot of the SIR model (left) and ROBER model (right) simulated over t = 1000 days. For SIR the initial conditions are S0 = 3.4×108, and I0 = 100, and the parameters are η = 0.7194 and δ = 0.5025. For ROBER the initial conditions are x1 = 1.0, x2 = 0.0, and x3 = 0.0, and the parameters are p1 = 4×10−2, p2 = 3×107, and p3 = 1×104. First-order sensitivities are computed via code from this manuscript (complex perturbation method) and the ForwardDiff.jl package (forward method). Times reported are the median times computed using the Benchmark.jl package, and errors are the Euclidean distance between the solution at the strictest tolerance (10−8 for SIR and 10−5 for ROBER) and the solution at a variety of tolerances with a maximum of 10−2. All errors are normalized by the number of time steps.
Time vs Error of Forward and Complex Perturbation Methods for Numerical Sensitivities.
Time versus error log-log plot of the SIR model (left) and ROBER model (right) simulated over t = 1000 days. For SIR the initial conditions are S0 = 3.4×108, and I0 = 100, and the parameters are η = 0.7194 and δ = 0.5025. For ROBER the initial conditions are x1 = 1.0, x2 = 0.0, and x3 = 0.0, and the parameters are p1 = 4×10−2, p2 = 3×107, and p3 = 1×104. First-order sensitivities are computed via code from this manuscript (complex perturbation method) and the ForwardDiff.jl package (forward method). Times reported are the median times computed using the Benchmark.jl package, and errors are the Euclidean distance between the solution at the strictest tolerance (10−8 for SIR and 10−5 for ROBER) and the solution at a variety of tolerances with a maximum of 10−2. All errors are normalized by the number of time steps.Fig 7 demonstrates the clear trade-off between speed and accuracy in both the stiff (ROBER) and non-stiff (SIR) cases. In both cases, the forward method can be computed more quickly for equal errors than the complex perturbation method. As expected, the ROBER model has a less steep slope compared to the SIR model, indicating that the returns in accuracy grow more slowly per time invested for a stiff ODE system.
4.3 Computational speed
Speed is an important attribute of any computational method, especially when it is performed without the benefit of computational clusters or distributed computing resources. Our speed comparisons offer a first look at the efficiency gains possible with multithreading. In implementing multithreading for both the complex perturbation and forward mode methods, we call the Polyester.jl package to compute each partial derivative in a separate thread. All computations were done in Julia version 1.7.1 on a Windows operating system with an Intel Core i7-8565U CPU.In addition to multithreading, the forward method as implemented in ForwardDiff.jl package provides the capability of multichunking. This involves splitting the equations in each system into different chunks to be solved separately. While forward methods do benefit from chunking, this tactic is unavailable in many packages outside of ForwardDiff.jl or outside of the Julia language. For biologists who depend on other packages and computer languages, it may be more pertinent to focus on the non-chunked results for the forward method.Tables 2, 3, 4 and 5 record the computational speed of the complex perturbation, forward, and adjoint methods (and their multithreaded and multi-chunked versions, as applicable) for four ODE systems models (SIR, CARRGO, ROBER, and MCC). Our comparisons of the first-order methods show that the forward and complex perturbation methods perform comparably, while the adjoint method performs orders of magnitude slower than the other two. The fastest method is the multichunked forward method, with the complex perturbation method a close second for the simpler ODE systems such as SIR and CARRGO. For the stiff (ROBER) and large (MCC) models however, the complex perturbation method falls further behind the multichunk forward mode method. This could be expected from the larger gap between the time versus accuracy curves in the ROBER model as compared with the SIR model and illustrated in Fig 7. However, it is noteworthy that naive implementations of forward mode differentiation lack the advantage of chunking and are consequently slower than the complex perturbation method.
Table 2
Computational time (μs) for the SIR ODE model. Parameters match those previously introduced in this manuscript. Multithread refers to parallelism across parameters. Multichunk refers to parallelism across compartments. We invoke the Julia solver AutoVern9(Rodas5(autodiff = false)) with a tolerance of 1×10−5, reflecting the convergence tolerance. For the second-order adjoint method, the ForwardDiffOverAdjoint(QuadratureAdjoint(autodiff = false) solver option was used.
First-order Methods
tend = 10
tend = 100
tend = 1000
Complex Perturbation
2.252×102
1.688×103
1.377×104
Complex Perturbation Multithread
1.913×102
1.401×103
1.062×104
Forward
3.272×102
2.036×103
1.460×104
Forward Multithread
2.218×102
1.480×103
1.117×104
Forward Multichunk
1.567×102
9.564×102
7.247×103
Forward Multichunk Multithread
1.499×102
9.526×102
7.236×103
Adjoint
8.901×104
7.707×106
6.950×108
Second-order Methods
tend=10
tend=100
tend=1000
Complex Perturbation
7.885×102
5.712×103
5.806×104
Complex Perturbation Multithread
6.732×102
4.528×103
3.724×104
Forward
9.325×102
5.280×103
4.530×104
Forward Multithread
7.546×102
3.504×103
2.640×104
Forward Multichunk
1.742×102
7.601×102
4.541×103
Forward Multichunk Multithread
1.714×102
7.270×102
4.631×103
Adjoint
2.976×104
6.240×105
1.626×107
Table 3
Computational time (μs) for the CARRGO ODE model. Parameters match those previously introduced in this manuscript. Multithread refers to parallelism across parameters. Multichunk refers to parallelism across compartments. We invoke the Julia solver AutoVern9(Rodas5(autodiff = false)) with a tolerance of 1×10−5, reflecting the convergence tolerance. For the second-order adjoint method, the ForwardDiffOverAdjoint(QuadratureAdjoint(autodiff = false) solver option was used.
First-order Methods
tend = 10
tend = 100
tend = 1000
Complex Perturbation
3.977×102
2.195×103
2.332×104
Complex Perturbation Multithread
3.661×102
2.480×103
2.330×104
Forward
5.404×102
2.597×103
2.505×104
Forward Multithread
4.527×102
2.601×103
2.336×104
Forward Multichunk
3.759×102
1.661×103
1.417×104
Forward Multichunk Multithread
2.699×102
1.352×103
1.215×104
Adjoint
6.118×104
5.097×106
7.825×108
Second-order Methods
tend=10
tend=100
tend=1000
Complex Perturbation
2.039×103
1.245×104
1.469×105
Complex Perturbation Multithread
2.123×103
1.206×104
1.573×105
Forward
2.749×103
1.239×104
1.376×105
Forward Multithread
1.737×103
1.011×104
1.735×105
Forward Multichunk
1.097×103
4.475×103
5.382×104
Forward Multichunk Multithread
7.135×102
3.181×103
3.967×104
Adjoint
2.048×104
2.795×105
7.536×106
Table 4
Computational time (μs) for the ROBER ODE model. Parameters match those previously introduced in this manuscript. Multithread refers to parallelism across parameters. Multichunk refers to parallelism across compartments. We invoke the Julia solver Rodas4(autodiff = false) with a tolerance of 1×10−7, reflecting the convergence tolerance. Second-order adjoint method not included at t = 1000 due to time constraints. For the second-order adjoint method, the ForwardDiffOverAdjoint(QuadratureAdjoint(autodiff = false) solver option was used.
First-order Methods
tend = 10
tend = 100
tend = 1000
Complex Perturbation
2.475×103
4.111×103
4.117×103
Complex Perturbation Multithread
1.549×103
2.600×103
5.016×103
Forward
3.029×103
4.544×103
8.271×103
Forward Multithread
1.726×103
2.905×103
4.766×103
Forward Multichunk
1.471×103
2.422×103
4.113×103
Forward Multichunk Multithread
1.343×103
2.442×103
3.902×103
Adjoint
1.456×108
2.656×109
2.069×1010
Second-order Methods
tend=10
tend=100
tend=1000
Complex Perturbation
7.985×103
1.250×104
2.306×104
Complex Perturbation Multithread
5.157×103
8.868×103
1.763×104
Forward
7.422×103
1.101×104
2.291×104
Forward Multithread
4.062×103
6.131×103
1.403×104
Forward Multichunk
1.420×103
2.157×103
3.655×103
Forward Multichunk Multithread
1.439×103
2.159×103
3.552×103
Adjoint
3.669×107
7.388×108
–
Table 5
Computational time (μs) for the MCC ODE model. Parameters match those previously introduced in this manuscript. Multithread refers to parallelism across parameters. Multichunk refers to parallelism across compartments. We invoke the Julia solver AutoVern9(Rodas5(autodiff = false)) with a tolerance of 1×10−5, reflecting the convergence tolerance. For the second-order adjoint method, the ForwardDiffOverAdjoint(QuadratureAdjoint(autodiff = false) solver option was used.
First-order Methods
tend = 10
tend = 100
tend = 1000
Complex Perturbation
2.952×103
2.588×104
8.50×104
Complex Perturbation Multithread
1.806×103
1.521×104
4.612×104
Forward
2.758×103
1.527×104
7.741×104
Forward Multithread
2.147×103
1.524×104
4.646×104
Forward Multichunk
1.071×103
6.806×104
1.709×104
Forward Multichunk Multithread
8.038×102
5.494×103
1.325×104
Adjoint
3.601×105
3.029×107
3.332×109
Second-order Methods
tend=10
tend=100
tend=1000
Complex Perturbation
3.336×104
4.457×105
1.262×106
Complex Perturbation Multithread
3.969×104
2.969×104
1.198×106
Forward
6.331×104
5.213×105
1.383×106
Forward Multithread
3.465×104
3.445×105
1.116×106
Forward Multichunk
2.257×104
1.392×105
2.886×105
Forward Multichunk Multithread
1.544×104
8.824×104
2.007×105
Adjoint
6.589×105
2.041×107
7.388×108
The adjoint method also has the worst time performance of the second-order methods by orders of magnitude. Both the forward and complex perturbation methods performed well in all four ODE systems models, with the complex perturbation method performing particularly well in models where the number of parameters is not large compared to the number of equations.While multi-threading usually decreases computational time for both first-order and second-order methods, it does not decrease computational time by as wide of a margin as expected. Many of the solver methods for stiff ODEs rely on BLAS operations that are already internally optimized by running on multiple threads. Explicitly multi-threading sensitivity methods therefore restricts the number of threads available for BLAS operations, adversely affecting their performance. In addition to the reduced efficiency of BLAS operations, multi-threading incurs a start-up cost for each thread. These start-up costs may overshadow the benefits of multi-threading if the amount of computation per thread is not high enough. Multi-threaded methods require more allocations than other methods, and thus require more garbage collection. While time spent on garbage collection varies, we find that garbage collection can take over twice as much computational time in multi-threaded methods than in their single-threaded counterparts. Thus, multi-threading can only really start to improve computational efficiency when these additional costs are small compared to the cost of each computation. Multi-threading may even be less efficient in some cases.Tables 6 and 7 compare the computational speeds of the different methods for the stochastic SIR and branching process models. As expected for the stochastic SIR model, computational speed varies roughly quadratically with the number N of individuals in the system. In the stochastic SIR model, the complex perturbation method proves to be twice as fast as the manual differentiation of (Eq 10) and (Eq 8) because the latter requires a larger number of individual computations. For the branching process model however, this trend reverses since manual differentiation relies on fast linear algebra rather than iteration and avoids the overhead of complex arithmetic. The derivatives of A are matrix equations, and in this case forward mode differentiation even without chunking performs as well as the complex perturbation method, although it does not scale as well to larger systems (N = 1000). However, in the case of the derivatives of e, which are calculated using recursion, neither implementation of forward mode differentiation can be computed as quickly as the complex perturbation method, and this difference increases with the size of the system. Other evidence not shown suggests that the complex perturbation method can reliably evaluate sensitivities where solutions depend on linear algebra and/or recurrence relations. In summary, unless derivatives are quite complicated, manual differentiation is generally more computationally efficient than either the complex perturbation method or the forward method. In computing second derivatives, we expect the tables will be turned. To their credit, the forward and complex perturbation methods do not require formulating derivatives analytically in advance and are consequentially easier to implement.
Table 6
Computational time (μs) in the stochastic SIR model. Model parameters match those previously described in this manuscript. Manual differentiation relies on differentiating Eq 7 and Eq 6 for the stochastic SIR model.
∂M/∂δ
N = 10
N = 100
N = 1000
Complex Perturbation
1.90×101
1.634×103
1.975×105
Manual Differentiation
3.80×101
3.879×103
4.925×105
∂T/∂δ
N = 10
N = 100
N = 1000
Complex Perturbation
1.86×101
1.620×103
2.006×105
Manual Differentiation
3.45×101
4.213×103
4.875×105
Table 7
Computational time (μs) for the branching process model. Model parameters are generated randomly on the range β∈[0.05,0.16], δ∈[0.05,0.19], and λ∈[0.0003,0.00046]. Manual differentiation relies on differentiating Eq 11 and Eq 9 for the branching process model.
∂A/∂δ1
N = 10
N = 100
N = 1000
Complex Perturbation
2.43×103
2.33×105
1.36×108
Manual Differentiation
1.08×101
3.75×104
4.97×105
Forward
1.79×102
2.96×105
–
Forward Multichunk
2.85×101
1.32×105
1.39×109
∂e/∂δ1
N = 10
N = 100
N = 1000
Complex Perturbation
1.04×103
3.44×104
4.25×106
Manual Differentiation
4.26×102
4.90×104
3.19×106
Forward
1.03×104
1.33×106
–
Forward Multichunk
1.23×103
1.12×106
2.27×109
4.4 Prediction error
In general, prediction error measures how well the first and second-order sensitivities capture the change in behavior of a model. Since we have previously shown that the various methods for computing differential sensitivity yield nearly the same results, prediction error is a good metric for determining the value of differential sensitivity in a particular model. We measure prediction error by the Euclidean normsOther norms, such as the ℓ1 and ℓ∞ norms, yield similar results. In the ODE models, f(x) denotes a matrix trajectory so the Frobenius norm applies. To capture proportional prediction errors, we normalize all vector outputs by their length and all matrix outputs by the square of their length.Prediction accuracy varies widely between models and even between parameters. As we expect however, second-order approximations are more accurate in prediction. Tables 8, 9 and 10 record prediction errors for each model. For the ODE systems, we see that stiffness highlights the added value of the second-order approximations. In the ROBER and CARRGO models, the second-order approximations have an order of magnitude less prediction error than the first-order approximations. However, stiffness does not appear to impact how the prediction errors grow over time. The ROBER and MCC models do not suffer from increased errors per time point after longer prediction intervals.
Table 8
Prediction error results for ODE models.
Derivatives are calculated with the forward method. All predictions are for a 10% change in parameter. Parameters match those previously introduced in this manuscript.
ODE Models
tend = 10
tend = 100
tend = 1000
SIR First Order
3.370×101
2.444×106
2.524×105
SIR Second Order
8.208×100
2.299×106
2.303×105
CARRGO First Order
6.195×10−1
2.801×103
1.465×105
CARRGO Second Order
1.116×10−2
4.956×102
4.217×104
ROBER First Order
3.205×10−5
3.588×10−5
1.837×10−5
ROBER Second Order
1.753×10−6
2.201×10−6
1.039×10−6
MCC First Order
3.467×10−4
7.556×10−5
1.542×10−4
MCC Second Order
1.268×10−4
1.922×10−5
3.918×10−5
Table 9
Prediction error results for the stochastic SIR model.
Derivatives are calculated with the complex perturbation. All predictions are for a 10% change in parameter. Parameters match those previously introduced in this manuscript.
Stochastic SIR Model
N = 10
N = 100
N = 1000
Total Number Infected (M) from η
2.322×10−3
2.009×10−2
1.241×101
Total Number Infected (M) from δ
4.456×10−3
2.586×10−2
3.670×10−2
Time to Extinction (T) from η
1.601×10−3
6.715×10−3
4.046×10−3
Time to Extinction (T) from δ
2.074×10−1
1.599×10−1
4.811×10−2
Table 10
Prediction error results for the branching process model.
Derivatives are calculated with the complex perturbation method. Parameters are generated randomly on the range β in [0.05,0.16], λ in [0.0003,0.00046], and δ = β+.03 for calculation of a sub-critical system (A) and δ = β−.03 for calculation of a super-critical system (e).
Branching Process Model
N = 10
N = 100
N = 1000
Total Number Infected (A) from β1
3.025×10−2
7.020×10−6
7.234×10−9
Total Number Infected (A) from δ1
5.036×10−2
8.734×10−5
1.348×10−7
Total Number Infected (A) from λ1,1
2.402×10−4
4.229×10−6
4.152×10−8
Extinction Probability (e) from β1
1.119×10−4
3.476×10−7
7.257×10−10
Extinction Probability (e) from δ1
7.682×10−4
3.776×10−6
9.424×10−9
Extinction Probability (e) from λ1,1
5.123×10−5
3.044×10−6
5.800×10−8
Prediction error results for ODE models.
Derivatives are calculated with the forward method. All predictions are for a 10% change in parameter. Parameters match those previously introduced in this manuscript.
Prediction error results for the stochastic SIR model.
Derivatives are calculated with the complex perturbation. All predictions are for a 10% change in parameter. Parameters match those previously introduced in this manuscript.
Prediction error results for the branching process model.
Derivatives are calculated with the complex perturbation method. Parameters are generated randomly on the range β in [0.05,0.16], λ in [0.0003,0.00046], and δ = β+.03 for calculation of a sub-critical system (A) and δ = β−.03 for calculation of a super-critical system (e).In the stochastic SIR model, prediction error does not seem to be compounded at all; in fact, the error per value calculated decreases in the case of . In the case of branching processes with many types N and large parameter sets, it is inadvisable to compare prediction accuracy across system sizes. However, we can conclude from these results that at least the prediction error does not compound as N increases. Furthermore, prediction accuracy for the branching process models appears to vary dramatically depending on the parameter in question.
5 Discussion
Our purpose throughout has been to demonstrate the ease and utility of incorporating differential sensitivity analysis in dynamical modeling. Because models are always approximate, and parameters are measured imprecisely, uncertainty plagues virtually all dynamical models. While improving models is incremental and domain specific, sensitivity analysis provides a handle on local parameter uncertainty across models.Of the methods mentioned in this text, the adjoint method, forward method, and complex perturbation methods all require that the functions defining a model be differentiable in the underlying parameters. While the complex perturbation method has the additional requirement that these functions be complex analytic, it is the only method explored in this manuscript that can be extended to discrete stochastic models in addition to ODE systems. For the modeler who prefers a one size fits all approach, or who prefers to prioritize ease of implementation, we argue that the complex perturbation method should be the method of choice. In addition to its wide range of applicability, the complex perturbation method can be easily multi-threaded and requires only implementation of the component functions of the model. In contrast to the second-order complex perturbation method, forward differentiation slows dramatically in calculating a Hessian directly. It becomes competitive if one calculates the gradient of the gradient. The gradient of the gradient method is not always available natively and usually must be implemented separately as we have done in the current manuscript. Crucially, implementing a specialized forward mode method was possible due to the underlying automatic differentiation software’s flexibility and support for composition.In situations demanding computational speed, our results suggest that choosing a method tailored to a model may be pertinent. In the case of stochastic models, manually differentiating and applying the chain rule must be balanced against the complex perturbation method, which requires less effort up front but longer processing after the derivatives have been determined. For ODE systems models, forward mode is the most computationally efficient when multichunking is available. If multichunking is not available, then the complex perturbation method has comparable speed to the forward method when run with the same tolerance. In maximizing computational efficiency, it is important to note that the use of automatic differentiation tools may require more user input for algorithm selection or multi-threading implementation. Choice of software is critical as well; not all software packages with automatic forward differentiation support chunking as implemented in the ForwardDiff.jl package and that so dramatically improves the computational efficiency of this method.There are additional challenges to computing model sensitivity that we do not address. For example, not all models use functions that are differentiable in their parameters. Additionally, models may be differentiable yet extremely stiff, in which case the computational time for each sensitivity method discussed here will suffer disproportionally as the number of parameters grows. Furthermore, assessing global parameter sensitivity is more challenging. It can be attacked by techniques such as Latin square hypercube sampling or Sobel quasi-random sampling, but these become infeasible in high dimensions [34]. Given the availability of appropriate software, differential sensitivity is computationally feasible, even for high-dimensional systems.In the case of stochastic models, traditional methods require costly and inaccurate simulation over a bundle of parameter values. Differential sensitivity is often out of the question. Current automatic differentiation systems, such as PyTorch, Zygote and ForwardDiff, treat generated random numbers as constants, and thus are not reliable methods for use in calculating differential sensitivity of model outcomes that depend on these random variables. This limits the ability of researchers to understand a biological system and how it responds to parameter changes. If a system index such as a mean, variance, extinction probability, or extinction time can be computed by a reasonable algorithm, then differential parameter sensitivity analysis can be undertaken. We have indicated in a handful of examples how this can be accomplished.In summary, across many models representative of computational biology, we have reached the following conclusions:Forward mode, adjoint, and complex perturbation sensitivity methods all converge to the same differential sensitivity values in non-stiff models, thus offering the same level of accuracy for all methods. For stiff models, forward mode and complex perturbation methods converge but adjoint sensitivity struggles and does not achieve convergence for realistic tolerance parameters.Chunked forward mode automatic differentiation and forward mode sensitivity analysis tend to be the most computationally efficient on the tested models.The complex perturbation methods described in this manuscript are competitive and often outperform the unchunked version of forward mode automatic differentiation, while being less sensitive to stiffness than the adjoint methods.Shared memory multi-threading of the complex perturbation and forward mode automatic differentiation methods provides a performance gain but only in high-dimensional systems.Forward mode automatic differentiation method requires that each step of a calculation is differentiable. This renders it unusable for calculating the derivative of ensemble means of discrete state models, such as birth-death processes. For these cases, the complex perturbation method outperforms manual differentiation.The complex perturbation method is competitive with automatic differentiation methods in accuracy, is more straightforward to implement, and can be applied to a wider variety of methods.These conclusions are tentative but supported by our limited number of biological case studies.We note that the performance differences may change depending on the efficiency of the implementations. The Julia DifferentialEquations.jl library and its DiffEqSensitivity.jl package have been shown to be highly efficient, outperforming other libraries in both equation solving and derivative calculations in Python, MATLAB, C, and Fortran [19,33]. Details on the current state of performance can be found at https://github.com/SciML/SciMLBenchmarks.jl.The automatic differentiation implementations in machine learning libraries optimize array operations much more than scalar operations. This can work to the detriment of forward mode AD. MATLAB or Python style vectorization improves the performance of forward mode AD sensitivity analysis by reducing interpreter overhead. Therefore, our conclusions serve as guidelines for the case where all implementations are well-optimized. For programming languages with high overheads or without compile-time optimization of the automatic differentiation passes, the balance in efficiency shifts more favorably towards the complex perturbation method.One last point worth making is on the coding effort required by the various methods. Both automatic differentiation and the complex perturbation method have comparable accuracy when applied to systems of ODEs, with automatic differentiation having the advantage in speed when it is implemented with the additional level of parallelization provided by chunking. However, the complex perturbation method can easily be generalized to other kinds of objective functions and may be more straightforward to implement for those less sophisticated in computer science. While automatic differentiation is the basis of many large scientific packages, the code required for the complex perturbation methods is fully contained within this manuscript and is easily transferable to other programming languages with similar dispatching on complex numbers. This hard to measure benefit should not be ignored by practicing biologists who simply wish to quickly arrive at reasonably fast code.
Derivation of Second Derivative Complex Perturbation Method.
(DOCX)Click here for additional data file.
Additional Models.
(DOCX)Click here for additional data file.
Sensitivity of Linear Systems.
(DOCX)Click here for additional data file.10 Dec 2021Dear Dr. Lange,Thank you very much for submitting your manuscript "Differential Methods for Assessing Sensitivity in Biological Models" for consideration at PLOS Computational Biology.As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.Especially ensure to highlight why the readers of PLoS Computational biology are the right audience for this manuscript.We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.When you are ready to resubmit, please upload the following:[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).Important additional instructions are given below your reviewer comments.Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.Sincerely,Attila Csikász-NagyAssociate EditorPLOS Computational BiologyMark AlberDeputy EditorPLOS Computational Biology***********************Reviewer's Responses to QuestionsComments to the Authors:Please note here if the review is uploaded as an attachment.Reviewer #1: This paper is a useful overview and comparison of sensitivity analysis methods for deterministic and stochastic biological models. I fully agree that (time dependent) sensitivity is fundamental in parameter estimation, experiment design, control input selection or measurement setup. Moreover, in many cases, researchers tend to apply less efficient (e.g., simulation based) methods, although there are more powerful and also more informative alternatives. In this respect this paper has a really high tutorial value. The paper is generally well-written and it successfully balances between the necessary formal notations and textual explanation. The presented Julia codes are useful and easy to understand.Specific comments:- In the Introduction part, it would be even more motivating to write about how sensitivity may determine the quality of parameter estimation results, e.g. in the context of structural and/or practical identifiability.- I'd suggest structuring Section 2 into subsections corresponding to the different methods described.- There seems to be a typo in the inline formula of the first order Taylor approximation in line 75, page 3.- I'd suggest changing superscript t for the notation of transposed to someting else, e.g. to superscript T. Otherwise it can be confused with the time argument.- Please characterize the parameter dependence of matrix $A$ on $\\beta$ in the beginning of Section 2.- What is 'original' in Fig. 4?- In Table 3, the label of the 3rd sub-column of 'Time' should be $t_{end}=1000$- Similarly, in Table 4, the 3rd sub-column of 'Time' should be $N=1000$- Does irreducibility of the branching process mean that the transition graph is strongly connected?Reviewer #2: This work showcases the use of differential sensitivity analysis for a small set of biologically relevant models including ODEs and markov chains. As for as I can tell the only novel result in this paper is equation (4) where the authors make use of the complex plane to approximate hessians. The main results focus of performance and complexity benchmarks between this analytic approach, forward-mode autodiff and finite differences. While such studies are important in machine learning literature, they are of little interest to a biological audience and hence not appropriate for publication in this journal.The overall quality of figures is poor, using the default styling of the Julia library Plots.jl and paying little attention to matching notation between code, figure legends and math in the main text. The narrative reads more like a tutorial and my recommendation would be to restructure this work as such, perhaps submitting to next year's JuliaCon workshops. I've attached the main text with comments on how to improve the text.Reviewer #3: This work compares different numerical methods for computing the local sensitivities of parameter-dependent ODEs. The authors implement three classical methods (forward, adjoint, automatic differentiation) and a newer complex analytic method that is described in this manuscript (although it is not clear how much of this fourth method is a unique contribution of the current manuscript). Helpful Julia code is provided to illustrate each approach (although this code sometimes disrupts the flow of the manuscript and could be replaced with pseudocode and moved to an appendix or supplemental material). The numerical examples are based on ODEs and stochastic models taken from oncology and epidemiology. By comparing the accuracy and computational cost across methods, the authors hoped to provide practical guidance for computational biologists to decide which numerical method to implement for their sensitivity analysis needs. The goal of the manuscript seemed to be to provide insights into the complex tradeoff between accessibility, speed, and accuracy of these four methods as they are applied to dynamic models in biology. This goal is extremely worthwhile and exciting. If achieved, a careful exposition on the advantages and disadvantages of different methods for sensitivity analysis for different types of deterministic and stochastic biological models would be very helpful toa wide audience of computational scientists. However, as it is currently written, the manuscript falls very short of these goals, and substantial improvements both in clarity of presentation and rigor of the numerical analyses would be needed for the manuscript to be acceptable for publication. More detailed comments are given below.Major comments1) The metrics used to quantify accuracy do not address the accuracy in the calculation of the local sensitivity (section 4, lines 613-614). Rather, it appears that these metrics quantify how well the local behavior of the model can be approximated by a linear or quadratic Taylor approximation. Still, it leaves open whether the differentials themselves are well-approximated by the numerical methods. The latter seems to be more interesting to the users of these numerical methods, who may have chosen to use a 1st or 2nd order sensitivity analysis and wish to know which is more accurate/fast/accessible.2) In practice, users tend to face the complex tradeoff between different algorithmic parameters, such as the tolerance of the ODE solver, the step size of the finite-difference method versus computational budget. The provided numerical benchmarks do not seem to give any insights into these important concerns. Please make more explicit what ODE solver was used and what were the values of the error tolerances. Could the users afford to relax some of those error tolerances and still achieve acceptable accuracy? Perhaps a graph showing the dependence between (complex) finite-difference step size and/or ODE solver tolerance versus accuracy will be helpful here.3) Parameter sensitivity analysis is a well-established field that is heavily used in computational biology for the interpretation and design of experiments. Unfortunately, as this paper is written, the introduction and discussion sections do not provide a thorough enough discussion of previous works to set the stage or motivate new efforts this important topic. Instead, the introduction reads more like a collection of anecdotes and generalizations, and the reference list is incredibly short for such a well-studied topic, lacking references from key players in the field such as M. Stumpf, DE Kirschner, F. Doyle, M. Khammash. A more thorough and scientifically documented introduction that better explains the uses for, and limitations of, existing sensitivity analyses would help place the current work in a far more compelling context that would motivate readers to appreciate the importance of the manuscript’s results.4) The writing of this document does not provide enough discussion of the advantages and limitations of the obtained results, and the manuscript does not sufficiently describe the biological contexts in which these results will, or will not, be important.5) There are problems with, or missing information in, many figures: Figure 2 lacks a y-axis label. Figure 1 is cropped to obscure the colorbar and is missing a label for the colorbar. Several figure captions are not provided or are not sufficiently descriptive. Fonts on several figures are too small to read for both labels and axes values, and some of the plot lines are too thin to see very well. Please consider carefully to improve the quality and format of the figures -- this could increase the overall quality of the final document.6) The discussion of the advantage of multithreading on the MCC model (Lines 664-665 main text) is not convincing and seems to contradict the CPU times reported in the last sub-table of Table 2. In particular, the multi-threaded versions of both first and second-order methods seem to improve only slightly for the case t_end=10 and have mixed outcomes for t_end=100 and t_end=1000. Moreover, the specifics of how multithreading is accomplished are not sufficiently clear in the manuscript.Minor Comments7) The insertion of the Julia codes in the main text is somewhat distracting, especially for readers who may be more experienced in other coding languages. We would recommend moving the specific Julia codes to appendices and GitHub links, and then replace the algorithms in the main text with language-agnostic pseudo code.8) Multithreading is mentioned in the manuscript, but the example Julia routines did not seem to have any thread directives. How much complexity would be added to the code if we wish to implement the multi-threaded versions of the methods?9) Please provide more details on how parallelism is done for the implementations. For example, with the complex-analytic method, is each function evaluation distributed among threads, but is the computation of the ODE at each parameter perturbation done sequentially?10) In Table 2, it is not clear what the difference between "Parallel FD.jl" and "Multi-Threaded FD.jl" is.11) In most cases shown in Table 2, it appears that the "Multi-Threaded Analytic" version does not improve, and even slightly worsen, computational time compared to "Analytic." If the computations of the gradient entries are done in parallel, I would expect something close to linear scalability. Is there any explanation for this?12) In Table 5 of Appendix C, what is the difference between "Parallel FD.jl Gradient" and "Parallel FD.jl Multi-Threaded Gradient"? Are there other parallelisms other than multithreading?13) Appendix B. Mammalian Cell Cycle Model. The authors cite reference 23 as the source of this model. But a quick inspection of the reference reflects a model with 39 ODEs versus the 11 equations in this text. Also, the initial condition values reported in this manuscript are different to the initial conditions reported on the original document. Is the model presented here a simplification of the one presented by Gerard and Goldbetor? If this is the case, why are the authors not explaining these changes to the original model? How are these modifications affecting the original model?14) Some equations are numbered but others are not? The manuscript would be easier to read with all equations given a number.15) Line 254, page 12: please provide the citation.16) There seem to be typos in Table 2 where, for example, "t_end=100" is repeated twice. Perhaps the third column should be "t_end=1000"? Same with "N=100" in Table 3.17) To improve readability, please provide more information in the captions of the tables, explaining how the methods are mapped to the numerical schemes in the main text. For example, what does "Parallel FD.jl" in Table 2 correspond to?18) Page 3, line 56 – change "These optimization techniques depend heavily on gradients and and Hessians with respect to parameters." to "These optimization techniques depend heavily on gradients and Hessians with respect to parameters."19) Please hyphenate two-word adjectives before a noun.20) In the paragraph starting on line 149, the adjoint variable term \\lambda is not to defined or described to sufficient detail.21) At or near line 72, please state explicitly that \\Beta refers to the model parameters – this will make it clearer to readers later on.22) In section 3.3, how are the second order terms calculated? This does not appear to be included in the preceding Julia codes.23) Please carefully check the whole document and make sure that no typos and errors exist in the document's final version.Reviewer #4: There isn't really much to say about this paper, it's a survey of different approaches to computing time-dependent sensitivities. I think it's a useful contribution, not groundbreaking in any way but a useful survey of methods and how well they perform. As far as I could tell, there weren't any really new algorithms developed but an assessment of existing ones. I think the provision of Julia code is very useful. My only tiny issue is with the forward difference method mentioned on page 8, no one in their right mind would use that, but the authors do mention the use of the two-point method in the following paragraph. One question that authors might consider in the discussion is how well these methods perform on really stiff problems and the problem with fast changing variables where the derivatives change rapidly. Differentiation is notoriously bad in such situations, and we, for example, have to resort to Richardson's extrapolation for tough problems, which is particularly slow.**********Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.Reviewer #1: YesReviewer #2: No: The referenced code repository [https://github.com/rachelmester/SensitivityAnalysis] appears to be privateReviewer #3: YesReviewer #4: Yes**********PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.Reviewer #1: NoReviewer #2: Yes: Grisha SzepReviewer #3: NoReviewer #4: NoFigure Files:While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at .Data Requirements:Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.Reproducibility:To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocolsSubmitted filename: PCOMPBIOL-D-21-01956_reviewer.pdfClick here for additional data file.4 Mar 2022Submitted filename: reviewresponse_DifferentialSensitivity.pdfClick here for additional data file.8 Apr 2022Dear Ms. Mester,Thank you very much for submitting your manuscript "Differential Methods for Assessing Sensitivity in Biological Models" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.Ensure that all figures are publication quality and consider suggestions by referee 2.Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.When you are ready to resubmit, please upload the following:[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).Important additional instructions are given below your reviewer comments.Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.Sincerely,Attila Csikász-NagyAssociate EditorPLOS Computational BiologyMark AlberDeputy EditorPLOS Computational Biology***********************A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:[LINK]Reviewer's Responses to QuestionsComments to the Authors:Please note here if the review is uploaded as an attachment.Reviewer #1: The manuscript has been significantly improved compared to the initial version. The authors have addressed all of the comments and clearly explained and justified the modifications. I maintain that this paper is a really useful overview of parametric sensitivity and the related computational tools for the analysis of biological systems described by dynamical models.Reviewer #2: Although the authors have made positive changes to the manuscript with sections 1-4. The quality of figures and math is still poor. In my opinion the paper is now somewhat below acceptance threshold and therefore I request a major revision. I recommend:- revision of all figures to meet a publication quality- focus section 4 on fowarddiff.jl vs complex perturbation, in particular advantages of complex perturb in branching processes- remove any excess math that does not contribute to the main narrative, use clear and consistent definitions and notations for derivatives and definitions of different types of errors in the introduction sectionsI must re-iterate here that we may simply disagree, or I may have mis-interpreted the target audience for this paper. As a computational biologist I can merely give my own perspective on what aspects of this work I find valuable to the community I am acquainted with and do not wish to cause offence but am trying to help. There is genuinely interesting work here that I think can be presented more succinctly.More specific comments can be found in the attached PDF.Best WishesReviewer #3: The manuscript is very much improved and the authors have thoroughly addressed our primary concerns in the previous version. We have a small number of minor comments, but otherwise we are satisfied with the revised manuscript:sWhile we agree that it is very helpful to make the Julia code easily available to the reader, the Julia code as currently formatted is still somewhat distracting from the flow of the text. Perhaps this can be improved by placing it in a labeled box - or perhaps it could be presented in a different font — we recommend working with the journal editors to choose the most appropriate way to present this.Some of the figure fonts (especially axes labels and tick numbers) are still too small to read - we really had to zoom in to see these. Please increase to a minimum font size of 9 or 10pt.In Fig. 7 (left), it is not clear why the errors are so large (what do values of 10^10 mean in this context?). We wonder if the Euclidean distance definition for the error is not so informative and perhaps a Euclidean distance for the relative error may be better. I expect that the the figure would look the same (except with the x-axis numbers would be scaled), so this should not change any of the conclusions.**********Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.Reviewer #1: NoneReviewer #2: YesReviewer #3: Yes**********PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.If you choose “no”, your identity will remain anonymous but your review may still be made public.Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.Reviewer #1: NoReviewer #2: Yes: Grisha SzepReviewer #3: NoFigure Files:While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.Data Requirements:Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.Reproducibility:To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocolsReferences:Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.Submitted filename: PCOMPBIOL-D-21-01956-Revision.pdfClick here for additional data file.10 May 2022Submitted filename: Letter.pdfClick here for additional data file.12 May 2022Dear Ms. Mester,We are pleased to inform you that your manuscript 'Differential Methods for Assessing Sensitivity in Biological Models' has been provisionally accepted for publication in PLOS Computational Biology.Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.Best regards,Attila Csikász-NagyAssociate EditorPLOS Computational BiologyMark AlberDeputy EditorPLOS Computational Biology***********************************************************30 May 2022PCOMPBIOL-D-21-01956R2Differential Methods for Assessing Sensitivity in Biological ModelsDear Dr Mester,I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!With kind regards,Agnes PapPLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
Authors: Juliane Liepe; Paul Kirk; Sarah Filippi; Tina Toni; Chris P Barnes; Michael P H Stumpf Journal: Nat Protoc Date: 2014-01-23 Impact factor: 13.491