Literature DB >> 29410846

Using sensitivity analysis to identify key factors for the propagation of a plant epidemic.

Loup Rimbaud¹, Claude Bruchou², Sylvie Dallot¹, David R J Pleydell¹, Emmanuel Jacquot¹, Samuel Soubeyrand², Gaël Thébaud¹.

Abstract

Identifying the key factors underlying the spread of a disease is an essential but challenging prerequisite to design management strategies. To tackle this issue, we propose an approach based on sensitivity analyses of a spatiotemporal stochastic model simulating the spread of a plant epidemic. This work is motivated by the spread of sharka, caused by plum pox virus, in a real landscape. We first carried out a broad-range sensitivity analysis, ignoring any prior information on six epidemiological parameters, to assess their intrinsic influence on model behaviour. A second analysis benefited from the available knowledge on sharka epidemiology and was thus restricted to more realistic values. The broad-range analysis revealed that the mean duration of the latent period is the most influential parameter of the model, whereas the sharka-specific analysis uncovered the strong impact of the connectivity of the first infected orchard. In addition to demonstrating the interest of sensitivity analyses for a stochastic model, this study highlights the impact of variation ranges of target parameters on the outcome of a sensitivity analysis. With regard to sharka management, our results suggest that sharka surveillance may benefit from paying closer attention to highly connected patches whose infection could trigger serious epidemics.

Entities: Chemical Disease Gene Species

Keywords: Sobol's method; heterogeneous landscape; polynomial regression; sensitivity index; simulation model; spatially explicit model

Year: 2018 PMID： 29410846 PMCID： PMC5792923 DOI： 10.1098/rsos.171435

Source DB: PubMed Journal: R Soc Open Sci ISSN： 2054-5703 Impact factor: 2.963

Introduction

Many factors impact the spread of a plant disease. Such factors include the biology of the different species involved in the pathosystem, the processes governing primary and secondary infections and, generally, landscape structure and composition [1]. Precisely identifying the key factors for disease spread is of prime interest for disease management, because they constitute a target for research efforts and control strategies [2,3]. However, the identification of the key factors of an epidemic through experiments rapidly becomes intractable, because there can be a huge number of candidate parameters, not counting their interactions. In addition, epidemics must be considered at a large spatiotemporal scale, which is rarely possible in experiments (but see e.g. [4]). Expert opinions are often simple, efficient and quick alternatives, but are by their nature prone to error. Finally, epidemiological models are an interesting approach because of their ability to test several scenarios, while circumventing the difficulties associated with experiments. Global sensitivity analysis of a simulation model aims to assess the fraction of the variability of the output that can be attributed to each input parameter, and thus to rank these parameters by influence on the model output [2,5,6]. This method is based on a numerical experiment designed to explore the parameter space using numerous parameter combinations. In ecological modelling, possible values of each input parameter traditionally come from the literature or via expert knowledge, and are often summarized by bounded intervals [6]. Global sensitivity analysis has been very useful in identifying key factors behind the spread of different animal pathogens [7,8] and their vectors [9], as well as of invasive plants [10]. In plant epidemiology, some studies dealing with plant viruses [11-13] and fungi [14,15] used formal sensitivity analyses; however, the developed approaches did not allow the inclusion of all candidate epidemiological processes and interactions. A panel of statistical methods has been proposed for global sensitivity analysis of deterministic models during the last 30 years. Available approaches mainly include graphical methods to explore marginal effects, the elementary effects method [16], variance-based methods or metamodelling, i.e. fitting of a mathematical function which approximates the model [5,17,18]. By contrast, few methods have been proposed for stochastic models [7,8,15], despite their wide use in ecology and epidemiology. Model stochasticity accounts for variability and uncertainty in the different biological processes [1]. Stochastic models are thus essential when performing epidemiological risk assessment. The first aim of this study was to present a rigorous framework for the sensitivity analysis of a stochastic spatially explicit model simulating the spread of a plant disease, and to better understand the influence of different epidemiological parameters on model outputs. The second objective was to uncover the key parameters of epidemic spread. As a case study, this work focused on sharka, caused by plum pox virus (PPV, genus Potyvirus, family Potyviridae), one of the most damaging diseases affecting trees of the genus Prunus (e.g. peach, apricot and plum) [19,20]. Depending on virus–host (or virus–cultivar) interactions, fruits of symptomatic hosts can be unmarketable due to considerable alterations or premature drop [21]. The disease has dispersed worldwide owing to commercial exchanges of contaminated plant material [20]. In addition, PPV is naturally transmitted from infectious sources to healthy trees by many aphid species in a non-persistent manner [20,22,23]. According to this transmission mechanism, an aphid vector can acquire viral particles from an infected host and may inoculate the virus to a susceptible host for a short period of time [24]. Once inoculated—and after a latent period—new infected hosts constitute in turn viral sources for aphid-mediated transmission. For PPV, it is commonly assumed, and was recently shown with young peach plants, that the beginning of the infectious period occurs at the same time as symptom expression (i.e. the end of the incubation period) on the infected host [25,26]. Prevalence at introduction in orchards can be extremely variable depending on the type of contamination in nurseries. If only some plants intended for planting are contaminated by infectious aphids, few infected trees will be planted in orchards. On the contrary, if a tree used as propagation stock is infected, the whole batch of progeny plants intended for planting will be infected, resulting in a massive introduction at orchard planting. An explicit spatiotemporal simulation model was previously developed and used to estimate biological parameters of sharka spread in a real landscape using data from PPV (M strain) epidemics in a French peach-growing area [27]. It is based on an SEIR architecture, i.e. each host can be classified in one of the following states: S (susceptible: healthy), E (exposed: infected but not yet infectious nor symptomatic), I (infectious and symptomatic), R (removed). Transitions between states are defined by stochastic equations. Through Markov chain Monte Carlo simulations within a Bayesian framework, this model enabled estimating the transmission coefficient (i.e. the transmission intensity per infectious tree), the duration of the latent period of PPV infection, and the dispersal kernel of infectious aphids [27]. In a landscape (i.e. in two dimensions), a dispersal kernel can be defined as the probability distribution of the position of a particle (e.g. an insect) after dispersal from the origin [28]. In the present study, this simulation-based estimation model was modified in order to perform two different sensitivity analyses, destined to highlight the impact of the variation ranges of the target parameters. Global sensitivity analyses generally only focus on a specific system and use narrow variation ranges for well-known parameters and wider variation ranges to encompass all possible values of less-known parameters (e.g. [29,30]). In contrast, here we intended to treat equally all target parameters by using wide and narrow variation ranges in two separate analyses. Thus, the first analysis (called ‘broad-range sensitivity analysis’ hereafter) was designed to assess the intrinsic influence of different epidemiological parameters on the overall behaviour of the model, ignoring any prior information on parameter values. The range of variation for each target parameter was very wide and covered, when possible, the whole definition domain. The second, sharka-specific, analysis benefited from available knowledge on the epidemiological parameters of PPV in southeastern France (published data and expert opinions were jointly used to obtain biologically reasonable bounds for the variation ranges of each parameter). On the one hand, the first analysis demonstrates the interest of sensitivity analyses based on Sobol's indices and metamodelling to explore the properties of a stochastic model. The second, sharka-specific, sensitivity analysis uncovers the most influential epidemiological parameters, which may require further knowledge from research work, or further action for sharka management in peach orchards.

Model description

The model is stochastic, orchard-based, with a discrete time step of 1 week denoted t (t = 1, … , T). Simulations run for a period of 35 years (from a0 = 1 to af = 35), which is a reasonable duration to assess the long-term impact of an epidemic in cultivated perennial plants.

Landscape and orchard dynamics

Orchard data were collected in a peach-growing area in southeastern France. The resulting database includes 34 years of information on 553 patches (i.e. pieces of land); the distance between the centroids of neighbouring patches ranges from 16 to 496 m (median: 55 m). On these patches, 597 peach orchards (i.e. homogeneous crop of a single cultivar with the same planting date) have been continuously cultivated (i.e. removed orchards are replanted during the next winter). In our model, we use the real landscape of 553 patches (electronic supplementary material, figure S1) which has been used previously to estimate most parameters of the model [27]. However, we simulate new orchard dynamics on the patches. To stabilize the mean age of the orchards at year a0 = 1, an initial orchard planting year is uniformly drawn between −39 and −20 for each patch. Then the following algorithm is iterated: (i) the orchard duration (after which the orchard is removed) is drawn from a Poisson distribution whose mean is ψ = 15 years; (ii) the year after orchard removal gives the planting year of the subsequent orchard on the same patch. For each orchard, a planting density is randomly drawn from the empirical distribution of the planting densities of the 597 orchards in the database.

SEI architecture

Each host is in one of the following three different health states (figure 1): S (susceptible), E (exposed) or I (infectious). In orchard i at time step t, let the variables S, E and I describe the respective number of hosts in each state (S + E + I is the total number of living hosts). Based on the characteristics of non-persistent transmission, a viruliferous vector is considered non-infectious once it has probed (and may have transmitted the pathogen) on a first healthy host. Thus, the number of hosts being infected (i.e. moving from S to E) between t and t + 1 in orchard i is defined as follows: with λ the infectious potential incurred by each tree of the orchard due to the presence of infectious hosts inside the orchard as well as in other source orchards. This infectious potential is calculated as: where β is the transmission coefficient (i.e. the number of vectors leaving a given infectious host over one year and able to initiate an infection on a healthy host); α is the normalized flight density (i.e. the proportion of annual flights that occur at time step t); and m is the connectivity between orchards i′ and i. This connectivity gives the probability that a vector disperses from orchard i′ to orchard i, and depends on the dispersal kernel of the vector. The division by S + E + I allows the expression of λ as a mean infectious potential per host in the recipient orchard.

Figure 1.

Representation of the spatiotemporal stochastic simulation model as a flow diagram. S, susceptible (i.e. healthy); E, exposed (i.e. infected but neither infectious nor diseased); I, infectious and symptomatic. Productive and non-productive (diseased) hosts are in green and red, respectively. Infection of healthy hosts depends on the infectious potential (λ) applied by infectious hosts and their spatial location. Infected hosts become infectious after a latent period (with a mean rate 1/θexp). Regardless of their sanitary status, hosts can be removed due to orchard turnover (with a mean rate 1/ψ). Each orchard planting has a mean risk of introduction (ϕ) of infectious trees (whose mean proportion in the orchard is E(τ)). Infected hosts become infectious (i.e. move from state E to state I) after a latent period whose duration is given by with and A2 the duration between the inoculation date and the end of the year. The minimal delay imposed by the term A2 was introduced because, in natural conditions, the latent period of sharka does not end within the growing season when the tree is inoculated [21,31,32]. It is assumed that trees in state I are not productive any more, due either to the absence of yield induced by the disease, or to the ban on fruit sales from symptomatic trees. However, the disease is not supposed to affect host lifespan (no available data reported any increase in peach mortality due to PPV), and this work is focused on disease spread in the absence of any control action (e.g. removal of diseased trees). Consequently, this model does not include an R (removed) state. Nevertheless, as defined by the algorithm used to simulate orchard turnover (see ‘Landscape and orchard dynamics’), orchards are regularly renewed (due to agricultural reasons, regardless of the disease) with trees in state S (or possibly I, see ‘Pathogen introduction’) (figure 1).

Pathogen dispersal

The calculation of the connectivity matrix, m, requires the integration of the dispersal kernel, f, over all the points p1 in orchard i (whose patch is represented by polygon A) and the points p2 in orchard i′ (whose patch is represented by polygon A): with the Euclidean distance between points p1 and p2. For a given dispersal kernel, this calculation is performed using the CaliFloPP algorithm [33]. However, its computational cost is huge, and consequently not affordable if the dispersal kernel varies among the different simulations of a sensitivity analysis. Therefore, the dispersal kernel, which is isotropic in this work (i.e. we assume that aphid dispersal is homogeneous in all directions), is defined as a weighted mixture of J exponential distributions using the same approach as in [27]: with the distance parameter and the weight of each exponential distribution. Using J = 20, the equation governing h enables the use of exponential distributions whose means increase exponentially from 3 m to 4.5 km, i.e. from approximately the mean distance between two trees in an orchard to the dimension of the simulated landscape. To manipulate the weights of these J exponential distributions via just two parameters (s1 and s2), and to guarantee that they sum to 1, they are calculated by integrating the probability density g(x; s1, s2) of the variable , hereafter called ‘dispersal weighting variable’, on J sub-intervals consisting in a partition of the interval [0; 1] (noting that the probability density of a beta distribution integrates to 1 on the interval [0; 1], see also electronic supplementary material, figure S2). The diversity of possible shapes of the beta distribution allows a lot of flexibility to define the different weights (see electronic supplementary material, figure S3C). Thus, the connectivity matrix used in the simulations is a weighted mixture of J = 20 matrices calculated beforehand with their respective exponential functions. The mean dispersal distance is: .

Pathogen introduction

At t = 1, the pathogen is introduced for the first time at orchard planting. In addition, starting from t = 1, every orchard planting (whatever the patch) is subjected to a risk Φ of new introduction. For each introduction, the probability that a given host is infected at orchard planting, τ, is described by a weighted mixture of two beta distributions: with and . These two distributions represent two different patterns of pathogen introduction from contaminated nurseries, depending on whether the contamination occurred on plants for planting (leading to a weak prevalence at introduction) or on propagation stock (leading to a high prevalence). Thus pMI indicates the relative probability of massive introduction. Infected hosts at planting are placed directly in state I. This assumption is more parsimonious than those needed to simulate the latent period duration of these trees, whose inoculation date (in nurseries) and process (from viruliferous aphids or from infected propagation stock) are unknown. The mean number of infectious trees at planting (in case of introduction) is: . Parametrization of τ1 and τ2 is such that . To control the connectivity of the patch where the first introduction occurs, this patch is selected based on its theoretical ‘outgoing connectivity’. For each patch i′, this parameter, denoted κ, describes the mean number of infectious aphids that would leave patch i′ if all trees in this patch were infectious, and land in any other patch of the landscape: with the number of trees in patch i′ of area A, calculated using the reference planting density estimated from expert opinion: μ = 666 trees.ha−1. Thus, the patch of first introduction is selected by choosing a quantile q of the outgoing connectivity calculated beforehand for all patches of the landscape. One can note that the ranking of the patches according to κ depends on their respective area and connectivity with neighbouring patches, and that κ reflects the potential of patch i′ to initiate an epidemic.

Output variable

The age-dependent fruit yield of the hosts in S and E states in orchard i at year a, denoted y, is relative to the maximum yield (i.e. y is a proportion). The impact of disease spread in the landscape between the beginning of the epidemic (at year a0) and the end of the simulation (at year af) is summarized by the mean equivalent number of fully productive trees per hectare and per year (figure 1): with Atot = 523 ha, the total area covered by the patches.

Sensitivity analyses

A sensitivity analysis generally consists of four steps: (i) definition of the target parameters and their respective variation ranges and probability distributions; (ii) generation of a numerical experimental design to explore the parameter space; (iii) simulation and (iv) computation of sensitivity indices [5]. These four steps can be complemented with metamodelling, which provides further insights into the relation between the model input and output.

Target parameters and variation ranges

The sensitivity analyses targeted six parameters: the quantile of the outgoing connectivity of the patch of first introduction (q), the probability of introduction (Φ), the relative probability of massive introduction (pMI), the transmission coefficient (β), the expected value of the dispersal weighting variable (Wexp) and the expected duration of the latent period (θexp). Only one parameter per epidemiological process was targeted in the sensitivity analyses, in order to easily rank these processes by influence. Moreover, this approach helps target parameters which vary independently from each other. Thus, the beta distribution of the dispersal weighting variable (W) was re-parametrized with its expected value, Wexp, and its variance, Wvar, in order to vary only Wexp in the sensitivity analyses (Wvar was kept fixed at its reference value; table 1). The shape parameters of W were then calculated by: and . Similarly, the gamma distribution of the latent period duration was re-parametrized by its expected value, θexp, and its variance, θvar, and only θexp was targeted in the sensitivity analyses. The shape and scale parameters were, respectively, calculated by: ; . Nevertheless, θvar was not held constant, because data on incubation periods of many infections show a strong correlation between the median or mean and the standard deviation (electronic supplementary material, Methods S1, table S1 and figure S10). With regard to PPV infection, we obtained a coefficient of 0.35 from the mean and standard deviation estimates of the latent period duration [27]. This is within the range of values for various diseases (electronic supplementary material, table S1) so the variance of the latent period duration was calculated by θvar = (0.35 × θexp)2.

Table 1.

Parameters of the epidemiological model: description, reference value and variation ranges of the parameters targeted in each sensitivity analysis.

			broad-range sensitivity analysis		sharka-specific sensitivity analysis
parameter	description	reference value for sharka	min	max	min	max
a₀	first year of epidemic	1	—	—	—	—
a_f	last year of simulation	35	—	—	—	—
ψ	expected orchard duration (years)	15^a	—	—	—	—
y	relative age-dependent yield of hosts in S or E states	0.00 until 2 years^a0.50 at 3 years^a0.65 at 4 years^a0.85 at 5 years^a1.00 from 6 to 15 years^a0.80 from 16 years^a	—	—	—	—
q_κ	quantile of the outgoing connectivity of the patch of first introduction	0.50	0.00^b	1.00^b	0.00^b	1.00^b
ϕ	probability of introduction at planting	0.007^a	0.00^b	1.00^b	0.0046^a	0.0107^a
ξ₁₁	first shape parameter of the prevalence for small introductions	0.62^a	—	—	—	—
ξ₁₂	second shape parameter of the prevalence for small introductions	23.73^a	—	—	—	—
ξ₂₁	first shape parameter of the prevalence for massive introductions	1^a	—	—	—	—
ξ₂₂	second shape parameter of the prevalence for massive introductions	1^a	—	—	—	—
p_MI	relative probability of massive introduction	0.00^a	0.00^b	1.00^b	0.00^a	0.10^a
W_exp	expected value of the dispersal weighting variable	0.486^c	0.14^e	0.86^e	0.469^c	0.504^c
W_var	variance of the dispersal weighting variable	0.0434^c	—	—	—	—
β	transmission coefficient	1.32^d	0.08^f	12.8^f	1.25^d	1.39^d
θ_exp	expected duration of the latent period (years)	1.92^d	0.0027^f,g	33.00^f	1.71^d	2.14^d
θ_var	variance of the latent period duration (years)	0.44^d	—	—	—	—

aEstimated through expert opinion.

bDefinition domain.

cEstimated in [27] using with J = 100.

dEstimated in [27].

eDefinition domain for a unimodal distribution of W with Wvar = 0.0434.

fSee details in electronic supplementary material, Methods S2.

gThat is, 1 day.

Parameters of the epidemiological model: description, reference value and variation ranges of the parameters targeted in each sensitivity analysis. aEstimated through expert opinion. bDefinition domain. cEstimated in [27] using with J = 100. dEstimated in [27]. eDefinition domain for a unimodal distribution of W with Wvar = 0.0434. fSee details in electronic supplementary material, Methods S2. gThat is, 1 day. In the broad-range sensitivity analysis, target parameters were varied within their whole definition domain, or, if impossible, were bounded by values below and above which the epidemic process does not fundamentally change any more. Therefore, the variation ranges of q, Φ and pMI were easily defined by the bounds of their definition domain (i.e. [0; 1]) (table 1). For Wexp this interval had to be restricted to generate only unimodal beta distributions for W (see electronic supplementary material, figure S3C), and thus smooth dispersal kernels (see electronic supplementary material, figure S4). Parameter θexp was bounded by 1 day (i.e. a value close to 0 time step in our model) and 33 years (see electronic supplementary material, Methods S2). At this upper bound, less than 5% (on average) of the hosts infected at planting would become infectious before orchard removal. Finally, bounds for β were calculated using a deterministic susceptible-infectious model in a single orchard where only one host was infected at planting (see electronic supplementary material, Methods S2). When β was set at its lower bound, only one host was newly infected after 30 years; at its upper bound, 95% of the orchard was infected after 3 years (i.e. when the orchard starts to be productive). In the sharka-specific sensitivity analysis, the variation ranges of the target parameters were informed by available knowledge on sharka epidemics in French peach orchards (table 1). Previous work enabled the estimation of β, θexp and Wexp [27]; thus, the bounds of these parameters were defined by their 99% credibility intervals. Since the first introduction could occur in any patch of the landscape, q was varied inside its whole definition domain, as in the previous analysis. Bounds of Φ and pMI were assessed by expert opinion based on the available field data on introductions. Because κ is calculated using m, q also depends on Wexp. In order to independently vary q and Wexp, κ was calculated using a different connectivity matrix, denoted m′, as follows. In the broad-range sensitivity analysis, m′ was generated using a simple uniform function: if m; 0 otherwise. Using this function, describes the proportion of a circular 1 km2 zone centred on patch i′ and occupied by other orchards. In the sharka-specific sensitivity analysis, because Wexp varied slightly around its reference value, m′ was generated using the dispersal kernel f (see ‘Pathogen dispersal’) with Wexp = 0.486 (i.e. the reference value). Thanks to these procedures, the six target parameters were sampled independently. One can note that the variation of some of the target parameters (pMI, Wexp, θexp) led to variation of whole probability distributions (electronic supplementary material, figures S3 and S4) used, in turn, as input in the model.

Calculation of sensitivity indices

Simulations were performed with 15 000 different parameter combinations generated with Sobol's sequences [34,35]. To take stochasticity into account, each combination was replicated 50 times. Let Y denote the output variable (i.e. the mean or standard deviation of these replicates), and X (i = 1, … ,p) the input parameters of the model. Sobol's indices are calculated for each X as follows: and where X denotes the whole set of parameters except X. is the first-order index, which measures the main effect of X alone; is the total index, which measures the influence of X including all its interactions with other parameters (for details on the computation of Sobol's sensitivity indices, see electronic supplementary material, Methods S3 and [17,18,36]). First-order indices were estimated with Sobol–Saltelli's method [37,38] whereas total indices were estimated with Sobol–Jansen's method [37,39], because these two methods were shown to be good estimators of small first-order indices and (large and small) total indices, respectively. The 95% confidence intervals (CI95) of the sensitivity indices were estimated using 10 000 bootstrap replicates [40]. In each sensitivity analysis, key interactions were identified using polynomial regression. A third-degree polynomial including interactions restricted to polynomial terms of degree two was fitted to the means and standard deviations of the model output. Then, the most parsimonious model was obtained using a stepwise selection algorithm based on the Bayesian information criterion (BIC) [41]. Before model fitting, the target parameters were standardized to the mean and standard deviation of their sampling distribution, in order to easily interpret the estimated coefficients as the effect of one standard deviation change in each parameter on the output variable [2]. The first-order sensitivity indices of the metamodel are deduced from Sobol's decomposition (see details in electronic supplementary material, Methods S3): where Z is the standardized version of X; b0 denotes the intercept; denote the estimated coefficients of Z, , and , respectively; and b and b denote the estimated coefficients of the products Z and Z, respectively.

Simulations with fixed parameters

In order to better visualize the impact of the introduction parameters on epidemic spread, 100 simulations were performed for each of the following scenarios: A. ‘Disease-free’: the pathogen is not introduced in the landscape; B. ‘Single introduction of one tree’: the pathogen is introduced once at t = 1 with only one infected host; C. ‘Single introduction of several trees’: the pathogen is introduced once at t = 1 with possibly several infected hosts; D. ‘Weakly connected first introduction’: the pathogen is first introduced at t = 1 in a weakly connected patch (q = 0.01), and with a risk Φ at every subsequent orchard planting; E. ‘Moderately connected first introduction (reference)’: same as scenario D, but the patch of first introduction is moderately connected (q = 0.50); F. ‘Highly connected first introduction’: same as scenario D, but the patch of first introduction is highly connected (q = 0.99). Except the introduction parameters, all the model parameters were fixed at reference values associated with sharka epidemics in French peach orchards, estimated from epidemiological data or expert opinion (table 1). For scenarios C–F, the probability that a tree is infected at introduction is given by τ, whose mean is 2.6% (using reference values for ξ11, ξ12, ξ21, ξ22 and pMI).

Software

The model was written in R and C languages. Within the R software v. 3.0.3 [42], Sobol's sequences and indices were obtained using the packages fOptions (v. 3010.83) and sensitivity (v. 1.11), respectively. The stepwise model-selection algorithm used the function regsubsets of the package leaps (v. 2.9). A simulation takes approximately 4 s with a regular computer (Intel® Core™ i7-4600M). The overall exploration of the model was carried out on a computer cluster, which enabled the parallelization of the 50 stochastic replicates for each sensitivity analysis, each replicate requiring roughly one day to explore the 15 000 parameter combinations.

Results

Strong influence of latent period duration on the simulation model

In the broad-range analysis, the mean output over the stochastic replicates varied from 47 to 567 equivalent fully productive plants per hectare and per year. The sensitivity analysis showed that the expected duration of the latent period (θexp) had the strongest impact (figure 2a), with a total Sobol's index (SItot) of 0.81 (CI95: 0.75–0.87). Next came the probability of introduction (Φ), the transmission coefficient (β) and the relative probability of massive introduction (pMI), with SItot of 0.12 (0.11–0.14), 0.09 (0.07–0.10) and 0.08 (0.07–0.08), respectively. The last two parameters, i.e. the expected value of the dispersal weighting variable (Wexp), and the quantile of the outgoing connectivity of the patch of first introduction (q), had a negligible influence on the output as shown by their very small SItot. The first-order indices (SI1) were close to (and mostly not significantly different from) SItot. In addition, the total variability associated with the main effects (assessed by the sum of the SI1) explained 94% of the variability in the output. These elements indicate that the simulated process involves few interactions. In order to characterize the relationship between parameters and the mean response, a polynomial metamodel was adjusted to the model output. According to its coefficient estimates, the most important interactions (figure 3a) involved firstly β and θexp (SI2 = 0.03) and, secondly, Φ and pMI (SI2 = 0.01). In addition to the good coefficient of determination (R2 = 0.96), the goodness-of-fit of this metamodel was confirmed by the estimates of the SI1, which did not differ from the indices estimated with Sobol's method by more than 0.01 (figures 2a and 3a).

Figure 2.

Figure 3.

Sensitivity indices of the target parameters and their interactions up to the order 2 on the mean output (μ, mean number of fully productive trees per hectare and per year) of the stochastic replicates, assessed through a degree-3 polynomial optimized using BIC in the broad-range analysis (a, R2 = 0.96), or informed by knowledge acquired on sharka in peach orchards (b, R2 = 0.83). θexp, expected duration of the latent period; Φ, probability of introduction at orchard planting; β, transmission coefficient; pMI, relative probability of massive introduction; Wexp, expected value of the dispersal weighting variable; q, quantile of the outgoing connectivity of the patch of first introduction.

First-order and total Sobol's sensitivity indices of the six target parameters on the mean output (μ, mean number of fully productive trees per hectare and per year) of the stochastic replicates, assessed in the broad-range analysis (a, ), or informed by knowledge acquired on sharka in peach orchards (b, ). θexp, expected duration of the latent period; Φ, probability of introduction at orchard planting; β, transmission coefficient; pMI, relative probability of massive introduction; Wexp, expected value of the dispersal weighting variable; q, quantile of the outgoing connectivity of the patch of first introduction. Horizontal bars: CI95 (assessed using 10 000 bootstrap replicates). * This estimate of the first-order Sobol's index was slightly negative, which may occur when the indices do not significantly differ from 0 [43]. Sensitivity indices of the target parameters and their interactions up to the order 2 on the mean output (μ, mean number of fully productive trees per hectare and per year) of the stochastic replicates, assessed through a degree-3 polynomial optimized using BIC in the broad-range analysis (a, R2 = 0.96), or informed by knowledge acquired on sharka in peach orchards (b, R2 = 0.83). θexp, expected duration of the latent period; Φ, probability of introduction at orchard planting; β, transmission coefficient; pMI, relative probability of massive introduction; Wexp, expected value of the dispersal weighting variable; q, quantile of the outgoing connectivity of the patch of first introduction. The estimated metamodel was useful to predict the mean number of productive plants for each value of the target parameters (all other parameters being fixed at their mean value, figure 4a). The number of productive plants greatly increased with the duration of the latent period, until approaching the maximum (i.e. 565 fully productive plants per hectare and per year, in the absence of disease) for expected durations of the latent period above 15 years. Because this saturation phenomenon occurred for half of the simulations, graphics associated with the other parameters showed a lot of noise. Despite this noise, it is possible to see that increasing values of β, Φ and pMI had a negative impact on the output, whereas Wexp and q had a negligible effect, as already indicated by the sensitivity indices.

Figure 4.

Effect of each target parameter on the mean output (μ, mean number of fully productive trees per hectare and per year) in the broad-range analysis (a), or informed by knowledge acquired on sharka in peach orchards (b). Each point represents the mean of 50 stochastic replicates of model simulations performed with the same combination of parameters. Blue line, prediction using the degree-3 polynomial optimized using BIC (all other parameters fixed at their mean value); this line is extremely close to the moving average of the number of productive trees, which has not been represented for legibility. Green dashed line: number of productive trees in the disease-free scenario (i.e. 565). In (a) the thick line on the x-axis corresponds to the variation range used in the sharka-specific analysis; in (b) the full circle on the x-axis indicates the reference value of each parameter. Wexp, expected value of the dispersal weighting variable (higher values correspond to longer dispersal distances); β, transmission coefficient; θexp, expected duration of the latent period; q, quantile of the outgoing connectivity of the patch of first introduction; Φ, probability of introduction at orchard planting; pMI, relative probability of massive introduction.

Importance of the patch of first introduction for sharka epidemics

In the analysis specific to sharka epidemics in peach orchards, the number of productive trees varied from 469 to 559 per hectare and per year. Parameter q was the most influential, with an SItot of 0.56 (0.52–0.60). Parameters θexp, Φ, β, pMI and Wexp followed with SItot of 0.30 (0.28–0.33), 0.17 (0.16–0.19), 0.13 (0.12–0.14), 0.10 (0.09–0.11) and 0.06 (0.06–0.07), respectively. Interactions had a small effect, the main effects of the target parameters explaining 87% of the variability in the mean number of productive trees (figure 2b). These results were similar to the indices calculated through the metamodel (SI1 did not differ from Sobol's indices by more than 0.09). The main (although weak) interaction was between θexp and q, with an SI2 of 0.005 (figure 3b, R2 = 0.83). Similarly to the broad-range analysis, the graphical representation of the influence of each target parameter on the output showed that the epidemic impact increased when q, Φ, pMI and β increased, and when θexp decreased (figure 4b). Stochasticity in the sharka-specific analysis was due to the main effects of θexp, q, pMI and β, and to many interactions, as revealed by the large differences between SItot and the SI1 for the six target parameters. Although these interactions were numerous, the metamodel showed that each one had a small sensitivity index (see electronic supplementary material, figure S5b and S6b). The number of productive trees was more variable when q, pMI and β increased, and when θexp decreased, i.e. when the epidemic was stronger (see electronic supplementary material, figure S7b).

Simulation of different epidemic scenarios

Because the sharka-specific sensitivity analysis highlighted the role of introductions in epidemic spread, several epidemic scenarios were simulated to illustrate the impact of introduction parameters. In the absence of disease, a median number of 565 (558–572) fully productive trees were grown per hectare and per year (figure 5, scenario A). The number of productive trees was almost the same, 564 (554–572), when PPV was introduced only once through the planting of one infectious tree (scenario B). Epidemics slightly impacted the number of productive trees (556; CI95 = 538–568) when several infected trees were allowed to be planted (scenario C), showing the importance of prevalence at introduction. In more realistic scenarios, corresponding to repeated introductions of PPV [27], the mean number of fully productive trees dropped to 547 (499–568), 541 (498–564) and 509 (462–551) when the first introduction occurred in a weakly (scenario D), moderately (scenario E) or highly (scenario F) connected patch, respectively. These results illustrate not only the impact of multiple introductions on sharka epidemic spread, but also and most importantly of the location of the first introduction. Videos of epidemic spread in the landscape in scenarios B to F are available as electronic supplementary material, Videos S1–S5.

Figure 5.

Distribution of the number of fully productive trees per hectare and per year after 35 years of simulation (Y) under six scenarios. (A) The pathogen is not introduced in the landscape. (B) The pathogen is introduced once at t = 1 in a moderately connected patch, with only one infected host. (C) The pathogen is introduced once at t = 1 in a moderately connected patch, with possibly several infectious trees (mean prevalence of 2.55%). (D) The pathogen is first introduced at t = 1 in a weakly connected patch, and at every subsequent orchard planting in the landscape with a risk of 0.7%, and with possibly several infectious trees (mean prevalence of 2.55%) for each introduction. (E) Same as (D) but the patch of first introduction is moderately connected. (F) Same as (D) but the patch of first introduction is highly connected. Except the introduction parameters, all the model parameters were fixed at their reference values (table 1).

Discussion

Sobol's method and metamodelling complement each other

The first aim of this study was to investigate the combined use of Sobol's method [44,45] and polynomial metamodelling to identify key factors and interactions in a stochastic spatiotemporal model simulating epidemics among host plants. The parameter space was explored through a quasi Monte Carlo sampling, generated using Sobol's sequences because of their good space-filling and fast convergence properties. Sobol's method to calculate sensitivity indices benefited from numerous improvements of its precision and computational cost [37-39,44,46,47]. In our work, Sobol's method proved to be an efficient tool to rank parameters by influence on the output variable, and thus to identify potential targets for further research or disease management. Additionally, in the case of large differences between first-order and total sensitivity indices, this method reveals the importance of interactions, which must be further investigated for a careful management. In our approach, Sobol's method and polynomial metamodelling complemented each other. Both methods gave extremely close estimates of the first-order sensitivity indices. Sobol's method is more straightforward in the estimation of the total indices, as well as the CIs (figure 2). Polynomial metamodelling helped identify specifically which interactions are involved in the epidemic process, and visualize model behaviour in response to the variation of input parameters (figure 4). The approach can be applied to many simulation models, and is especially useful for stochastic models. To account for stochasticity, some authors replicated simulations for a few selected parameter combinations to characterize the variance or the distribution of the output variable [48,49]. In another approach, every parameter combination of the simulation design was replicated in such way that an extra sensitivity index could be calculated by ANOVA to describe the proportion of the total variability attributable to the variability among parameter combinations [8,15]. Here, we adopted an approach developed originally with non-spatial epidemic models [7,29]. Stochasticity was accounted for by replicating simulations for each parameter combination and by performing sensitivity analyses on the mean and standard deviation of the replicates. However, the total number of simulations (including the different parameter combinations and the stochastic replicates per combination) grows as a power of the number of parameters targeted in the sensitivity analysis. This curse of dimensionality necessitates a compromise between the resolution of parameter space exploration, the precision of the model output and the total computational cost [37,50]. In this study, the parameter space was thoroughly explored using 15 000 different parameter combinations, and each combination was replicated 50 times because this number was largely sufficient to get robust estimates of the mean and standard deviation associated with each combination (not shown).

Impact of the variation range of the target parameters

Our second aim was to uncover the key drivers of plant epidemics, using sensitivity analysis. The choice of variation ranges for the target parameters has a great impact on the results of such analysis. In the first sensitivity analysis, we used the broadest possible ranges of values for the target parameters, ignoring any prior information. This analysis is thus indicative of the relative intrinsic influence of the epidemiological processes on model behaviour in the absence of knowledge on parameter values. Our results showed that the latent period duration is the most influential parameter of the model. Depending on the latent period duration, the impact of the simulated epidemic may be completely different, from insignificant and stable (when the latent period lasts more than 15 years) to important and highly variable (if it lasts less than 5 years). It is very likely that the threshold distinguishing insignificant epidemics from others (i.e. approximately 15 years) is linked to the mean orchard duration. In order to test whether the high influence of the latent period in the broad-range sensitivity analysis was just due to its very wide variation range, we replicated this analysis with smaller upper bounds for the mean duration of the latent period (from 5 to 33 years, see electronic supplementary material, figure S8). This additional information demonstrates that the results are qualitatively robust to changes in this upper bound. One may also note that the most significant interaction involves the mean latent period and the transmission coefficient (figure 3a), which may explain why the latter becomes the most influential parameter for epidemics whose latent periods are below 7 years (see electronic supplementary material, figure S8). In the second sensitivity analysis, the variation range of all target parameters focused on the best available values regarding sharka epidemiology in peach orchards. These variation ranges defined a parameter space which had been poorly explored in the previous sensitivity analysis (figure 4a, thick line on the x-axis), in spite of the large number of tested parameter combinations. Within this sharka-specific analysis, whatever the parameter combination the mean equivalent number of productive trees was significantly lower than what could be obtained in the absence of disease. This encourages the application of control measures. Here, in contrast with the broad-range analysis, we show that the connectivity of the patch of introduction is the most influential parameter on sharka epidemics. Consequently, the outgoing connectivity of a given patch of the landscape can be considered as an index of the risk to initiate an epidemic, and sharka control may include for example preventive and regular surveys of these patches. Because this index of epidemic risk depends on patch connectivity, it was surprising to observe a low influence of the dispersal kernel (represented by the parameter Wexp) in both the broad-range and sharka-specific analyses. On the contrary, mean dispersal distance was shown to be a key driver of spread of fungal pathogens [14] and invasive plants [10]. In our broad-range analysis, one may think that the low impact of the dispersal kernel might be due to the fact that the variance of the weighting variable (Wvar) was held constant, which may result in dispersal kernels with similar shapes. Nevertheless, variation of the dispersal kernel covered a wide range of distances (see electronic supplementary material, figure S4). Thus, another explanation may be the patchiness of the real agricultural landscape that we used for the simulations (see electronic supplementary material, figure S1). Indeed, short-distance dispersal resulted in very slow disease spread, and long-distance dispersal resulted in a higher proportion of dispersal events ending outside the orchards (where hosts are absent). This is an interesting property of fragmented agricultural landscapes. In the sharka-specific analysis, the low influence of the dispersal kernel can be explained in part by the narrow range of variation in Wexp. To investigate the impact of the parameter variation ranges, we performed a sensitivity analysis similar to the sharka-specific analysis except that Wexp varied as in the broad-range analysis. This resulted in a very high sensitivity index for Wexp (see electronic supplementary material, figure S9). This exemplifies how the sensitivity indices depend on the extent of the variation ranges of the target parameters. As another illustration of the influence that the variation range of a parameter can have, a model simulating tomato yellow leaf curl virus (TYLCV) epidemics was almost insensitive to the latent period duration [11], whereas it was the first and the second most influential parameter in our broad-range and sharka-specific analyses, respectively. This difference can be explained by the fact that our output variable (the number of productive hosts) included latently infected hosts whereas the output in [11] did not, but also by the fact that the variation range of the latent period duration was narrower in the TYLCV study. These elements, and the strong differences between the two analyses performed in the present study, highlight the importance of identifying appropriate ranges of variation for the target parameters in a sensitivity analysis. In particular, our results show that the parameters with the greatest intrinsic influence on overall behaviour of the model are not necessarily the key drivers of epidemic spread in a specific situation. Therefore, in different pathosystems and epidemiological contexts, the variation ranges, and especially their width, must be re-examined in order to identify relevant targets for research efforts and management strategies.

Genericity of the model

In this study, an existing spatiotemporal stochastic model of sharka spread, based on an SEIR architecture, was modified in order to perform sensitivity analyses. This model corresponds to vector-borne diseases of perennial plants (small turnover of crops, long latent period, pathogen introductions at crop planting), but its parametrization and flexibility enable the study of various horizontally transmitted plant infections, including wind-dispersed epidemics of annual crops. Nevertheless, some assumptions of the model are specific to sharka and should be re-examined for other diseases. In particular, for vector-borne diseases transmitted in a persistent manner, infectious vectors are able to transmit the pathogen to several healthy hosts. Thus, modelling such pathosystem needs to account for vector movement throughout their infectious period, which generally does not start immediately after acquisition. The resulting dispersal function may thus have a completely different shape. We also considered that latently infected hosts were still fully productive, whereas infectious individuals had a null productivity (no yield, or sales ban), and that the disease did not trigger host death (which influences the duration of the infectious period); such assumptions do not apply to all diseases. Our study uncovered the key role played by highly connected introduction patches on sharka spread. This result highlights that paying close attention to these patches might prevent serious epidemics. Our next aims are to enrich this model with various control strategies and simulate various realistic landscapes in order to more directly help improve disease management.

12 in total

1. Factors that make an infectious disease outbreak controllable.

Authors: Christophe Fraser; Steven Riley; Roy M Anderson; Neil M Ferguson
Journal: Proc Natl Acad Sci U S A Date: 2004-04-07 Impact factor: 11.205

2. Durable strategies to deploy plant resistance in agricultural landscapes.

Authors: Frédéric Fabre; Elsa Rousseau; Ludovic Mailleret; Benoit Moury
Journal: New Phytol Date: 2012-01-19 Impact factor: 10.151

3. Sensitivity analysis to identify key parameters influencing Salmonella infection dynamics in a pig batch.

Authors: Amandine Lurette; Suzanne Touzeau; Matieyendou Lamboni; Hervé Monod
Journal: J Theor Biol Date: 2009-02-06 Impact factor: 2.691

4. Effects of initial epidemic conditions, sporulation rate, and spore dispersal gradient on the spatio-temporal dynamics of plant disease epidemics.

Authors: X M Xu; M S Ridout
Journal: Phytopathology Date: 1998-10 Impact factor: 4.025

Review 5. Sharka epidemiology and worldwide management strategies: learning lessons to optimize disease control in perennial plants.

Authors: Loup Rimbaud; Sylvie Dallot; Tim Gottwald; Véronique Decroocq; Emmanuel Jacquot; Samuel Soubeyrand; Gaël Thébaud
Journal: Annu Rev Phytopathol Date: 2015-05-29 Impact factor: 13.078

6. Modelling the effect of heterogeneity of shedding on the within herd Coxiella burnetii spread and identification of key parameters by sensitivity analysis.

Authors: Aurélie Courcoul; Hervé Monod; Mirjam Nielen; Don Klinkenberg; Lenny Hogerwerf; François Beaudeau; Elisabeta Vergu
Journal: J Theor Biol Date: 2011-06-24 Impact factor: 2.691

7. Parameter sensitivity analysis of stochastic models: application to catalytic reaction networks.

Authors: Chiara Damiani; Alessandro Filisetti; Alex Graudenzi; Paola Lecca
Journal: Comput Biol Chem Date: 2012-11-07 Impact factor: 2.877

8. Pathogen population dynamics in agricultural landscapes: the Ddal modelling framework.

Authors: Julien Papaïx; Katarzyna Adamczyk-Chauvat; Annie Bouvier; Kiên Kiêu; Suzanne Touzeau; Christian Lannou; Hervé Monod
Journal: Infect Genet Evol Date: 2014-01-27 Impact factor: 3.342

9. Plum pox in north america: identification of aphid vectors and a potential role for fruit in virus spread.

Authors: Frederick Gildow; Vern Damsteegt; Andrew Stone; William Schneider; Douglas Luster; Laurene Levy
Journal: Phytopathology Date: 2004-08 Impact factor: 4.025

Review 10. Sustainable agriculture and plant diseases: an epidemiological perspective.

Authors: Christopher A Gilligan
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2008-02-27 Impact factor: 6.237

3 in total

1. Composite Monte Carlo decision making under high uncertainty of novel coronavirus epidemic using hybridized deep learning and fuzzy rule induction.

Authors: Simon James Fong; Gloria Li; Nilanjan Dey; Rubén González Crespo; Enrique Herrera-Viedma
Journal: Appl Soft Comput Date: 2020-04-09 Impact factor: 6.725

2. Emerging strains of watermelon mosaic virus in Southeastern France: model-based estimation of the dates and places of introduction.

Authors: L Roques; C Desbiez; K Berthier; S Soubeyrand; E Walker; E K Klein; J Garnier; B Moury; J Papaïx
Journal: Sci Rep Date: 2021-03-29 Impact factor: 4.379

3. Global sensitivity analysis in epidemiological modeling.

Authors: Xuefei Lu; Emanuele Borgonovo
Journal: Eur J Oper Res Date: 2021-11-16 Impact factor: 6.363

3 in total