
Inferring pathogen dynamics from temporal count data: the emergence of Xylella fastidiosa in France is probably not recent.

Samuel Soubeyrand¹, Pauline de Jerphanion², Olivier Martin¹, Mathilde Saussac², Charles Manceau³, Pascal Hendrikx², Christian Lannou⁴.

Abstract

Unravelling the ecological structure of emerging plant pathogens persisting in multi-host systems is challenging. In such systems, observations are often heterogeneous with respect to time, space and host species, and may lead to biases of perception. The biased perception of pathogen ecology may be exacerbated by hidden fractions of the whole host population, which may act as infection reservoirs. We designed a mechanistic-statistical approach to help understand the ecology of emerging pathogens by filtering out some biases of perception. This approach, based on SIR (Susceptible-Infected-Removed) models and a Bayesian framework, disentangles epidemiological and observational processes underlying temporal counting data. We applied our approach to French surveillance data on Xylella fastidiosa, a multi-host pathogenic bacterium recently discovered in Corsica, France. A model selection led to two diverging scenarios: one scenario without a hidden compartment and an introduction around 2001, and the other with a hidden compartment and an introduction around 1985. Thus, Xylella fastidiosa was probably introduced into Corsica much earlier than its discovery, and its control could be arduous under the hidden compartment scenario. From a methodological perspective, our approach provides insights into the dynamics of emerging plant pathogens and, in particular, the potential existence of infection reservoirs.


Keywords: Bayesian inference; emerging plant pathogen; infection reservoir; introduction date; mechanistic-statistical model; multi-host pathogen; plant-pathogen interaction; surveillance data


Year: 2018; PMID: 29689134; PMCID: PMC6032966; DOI: 10.1111/nph.15177

Source DB: PubMed; Journal: New Phytol; ISSN: 0028-646X; Impact factor: 10.151

Introduction

Invasions of new territories by pathogens are facilitated by the high level of connectivity of most of the world areas (Tatem et al., 2006; Hulme, 2009; Olsen et al., 2011; Fisher et al., 2012), despite containment and regulation strategies at the level of countries and unions of countries. In addition, global climate change allows pathogens to settle in new environments (Anderson et al., 2004; Jeger et al., 2011), which were accessible in the past only with the combined levers of migration and adaptation. For some specific threats, that is, when the pathogen effects are clearly visible or the awareness of the society is high at all levels (governmental agencies, health systems, stakeholders in forestry and agriculture, scientific communities, citizens), invasions may be detected rapidly. However, it is also common that an emerging pathogen is detected with a potentially long delay after its settlement in a new area (Jones & Baker, 2007; Waage et al., 2008; Faria et al., 2014) and the first detection may occur too late to be able to rapidly eradicate the pathogen at a reasonable socioeconomic cost. Let us consider the case in which an invading pathogen, which presents a significant threat to protected, patrimonial or cultivated plants, has been detected. Then, more or less consistent surveillance strategies can be followed to assess the sanitary situation in space and its temporal evolution, to inform decision makers, to evaluate the efficiency of eventual control measures and, more marginally but importantly, to acquire scientific knowledge. The diversity of the objectives and their time‐varying relative levels of priority lead to surveillance data that can generate biases of perception. Indeed, disease prevalence might be over‐estimated by focusing on the surveillance of areas with previously detected infected hosts. Disease incidence might be under‐estimated if a host species or a geographic region is not sampled. 
Disease prevalence and incidence may be under‐estimated because of the lack of power of diagnostic tests. Such biases of perception are quite common in invasion studies. For example, in a related context, the discovery rate of introduced species does not systematically reflect the actual introduction rate (Costello & Solow, 2003). In addition, for multi‐host pathogens settled in complex environments mixing cultivated, urban and wild areas, unravelling the pathogen dynamics that underlies observations may be complicated by the existence of a hidden compartment in the host population (i.e. hosts that are not observable; see Fig. 1), which may play the role of an infection reservoir (Haydon et al., 2002; Viana et al., 2014) and have an influence on the observations limited to the observable compartment.

Figure 1

Schematic representation of the host population with an observable compartment and, complementarily, a hidden compartment (hosts outside the square). In addition, hosts are classified with respect to two other factors: healthy/infected and symptomatic/asymptomatic.

As an illustration, let us consider the European situation of Xylella fastidiosa, which was detected and identified in situ in 2013 in Italy, 2015 in France and 2016 in Spain. Xylella fastidiosa is a bacterium with a large range of wild and cultivated host plants; it lives in the xylem of the plant and may cause a rapid decline of its host (Purcell, 2013). Xylella fastidiosa is spread by insect vectors that feed on plant xylem (e.g. Philaenus spumarius; Saponari et al., 2014) and by transport of infected plants. The capacity for X. fastidiosa to invade new environments is facilitated by the existence of numerous strains varying in their host range and environmental preference, its multi‐host nature and the difficulty in observing the infection as a result of either a lack of symptoms or symptoms similar to those caused by other disorders (e.g. water stress). Since X. fastidiosa was detected in situ in South Corsica, France, during the summer of 2015, a surveillance and control protocol focused on X. fastidiosa has been implemented and applied by local governmental agencies and other stakeholders. In this protocol, detected positive cases (as well as surrounding potential host and symptomatic plants) were destroyed to control the propagation of X. fastidiosa. Fig. 2 (upper panel) displays the observed proportion of positive cases, for symptomatic and asymptomatic plants, across time since the first detection. This proportion tends to decrease with time.
Meanwhile, although the surveillance was initially mainly focused on host species already detected as infected in Corsica and areas surrounding positive cases, the cumulative numbers of sampled host genera and sampled municipalities were later significantly increased (see Fig. 2, lower panel) with the aim to better assess the presence of X. fastidiosa in terms of host range and geographic space. Thus, the decrease in the proportion of positive cases might be the consequence of (1) the destruction of hosts in foci of X. fastidiosa and (2) a decrease in the preference of sampling at‐risk hosts. Point (2) is a possible source of bias of perception that should be filtered out to determine which epidemic underlies the observations shown in Fig. 2. Moreover, because of the multi‐host nature of X. fastidiosa and the highly diverse plant population in Corsica, including large wild areas, the hidden compartment/infection reservoir hypothesis is plausible a priori and should be tested.

Figure 2

Raw data from the surveillance of Xylella fastidiosa in South Corsica, France, in 2015–2016. The first observation of X. fastidiosa in South Corsica was made in July 2015. Upper panel: observed proportion of plants positive for X. fastidiosa among all sampled plants (continuous line), symptomatic plants (dashed line) and asymptomatic plants (dotted line). Lower panel: cumulative counts of sampled plant genera (continuous line) and municipalities (dashed line); there are 124 municipalities in South Corsica. The list of sampled plant genera is provided in Supporting Information Notes S1 and includes a large number of wild and ornamental species.

Understanding the complex ecological structures of pathogens, such as X. fastidiosa, is a long‐term and multidisciplinary task. Model‐based analyses of large‐scale data can contribute to such an understanding. In particular, the mechanistic‐statistical approach can help to elucidate the contributions of diverse epidemiological and observational components in data. This approach couples a mechanistic model of the temporal dynamics of the disease, a probabilistic model of the observation process and a statistical inference procedure (Soubeyrand et al., 2009). It allows the inference of epidemiological processes by taking into account specificities related to the observation process, including the sources of biases mentioned above. In this article, we propose a mechanistic‐statistical framework to infer epidemics underlying temporal observations consisting of count data collected from symptomatic and asymptomatic hosts. This framework is based on a discrete‐time Susceptible–Infected–Removed (SIR) model (Allen, 1994; Brauer et al., 2008) including a hidden compartment and a surveillance/control process.
It allows the inference of pathogen dynamics in both the observable and hidden compartments of the host population, the estimation of the introduction date when data are collected over a post‐introduction observation window, and the prediction of the pathogen dynamics under various surveillance scenarios. The mechanistic‐statistical framework was applied to X. fastidiosa data collected in South Corsica. Several specifications of the model were tested and a model selection was carried out to assess whether a hidden compartment and a time‐varying preference in surveillance have to be accounted for. Results are discussed with respect to two main perspectives: the control of a multi‐host pathogen in a complex environment after its discovery and the role of infection reservoirs in sustaining epidemics.

Materials and Methods

Pathosystem

Xylella fastidiosa is a plant pathogenic bacterium dispersed by xylem‐sap‐feeding insects (Redak et al., 2004; Purcell, 2013; Baker et al., 2015), and by humans who may transport and plant infected hosts (e.g. Nunney et al., 2010; Nunes et al., 2003). Xylella fastidiosa is divided into several subspecies, including X. fastidiosa ssp. fastidiosa especially causing Pierce's disease in grapevine; X. fastidiosa ssp. sandyi especially causing oleander leaf scorch; X. fastidiosa ssp. pauca especially found on citrus, coffee and olive trees; and X. fastidiosa ssp. multiplex causing scorch diseases in a large range of hosts (Denancé et al., 2017b). Together, the different subspecies of X. fastidiosa cause diseases on more than 350 plant species from more than 200 genera and 70 botanical families (Gardi et al., 2016). The subspecies multiplex, which has been identified in a large majority of positive samples collected in Corsica, France (the subspecies not being identified in the other samples; Denancé et al., 2017b), is mostly found in temperate climates of the Americas and has been detected in Europe, not only in France but also in Spain in 2016 (European Commission, Ref. Ares(2017)3773669 – 27/07/2017; https://ec.europa.eu/food/sites/food/files/plant/docs/ph_biosec_legis_list-demarcated-union-territory_en.pdf). Xylella fastidiosa has been studied especially for its pathogenicity on numerous host species, including plants with economic importance, but the interactions between X. fastidiosa and its host species are diverse and it does not appear to cause disease in most host species (Almeida & Nunney, 2015). Hence, asymptomatic infections not necessarily leading to disease development might be frequent, in particular in environments with high plant diversity, and might complicate the observation of X. fastidiosa in all its dimensions. This complication is increased by the capacity of X. fastidiosa to be transmitted by insect vectors (sharpshooter leafhoppers and spittlebugs), which are distributed worldwide in tropical and temperate climates and seem to be nonspecific, that is, able to transmit diverse X. fastidiosa subspecies, but whose transmission efficiency is the outcome of complex vector–plant–pathogen–environment interactions (Almeida & Nunney, 2015). Thus, the presence of X. fastidiosa in an environment can translate into very diverse situations, including situations in which the bacteria can remain unseen for a long time.

The Corsican environment

Corsica is an island in the north‐west of the Mediterranean Sea, characterized by warm summers and mild winters. It is covered by a large proportion of natural and semi‐natural habitats: wild heathlands and forests cover 44% and 30%, respectively, whereas agricultural areas and urban areas cover 12% and 2%, respectively (Corine Land Cover Inventory, 2012, http://land.copernicus.eu/faq/about-data-access). Despite anthropic stress and an insular nature, Corsica has a high level of plant biodiversity and is one of the refugial areas in the Mediterranean region (Médail & Diadema, 2006; Jeanmonod et al., 2011). Numerous potential X. fastidiosa host species listed by Gardi et al. (2016) are present in Corsica, in the wild, urban and agricultural areas. In addition, at least 12 potential vector species have been reported in Corsica (Germain, 2016).

Data

The French administration decided that an enhanced surveillance of X. fastidiosa was necessary after its detection in July 2015 from a Polygala myrtifolia population growing in Propriano, in the south‐west of the Island (the strategy was described in official plans DGAL/SDQSPV/2017‐653 and DGAL/SDQSPV/2017‐39; see https://info.agriculture.gouv.fr/gedei/site/bo-agri/instruction-2017-653 and https://info.agriculture.gouv.fr/gedei/site/bo-agri/instruction-2017-39). Samples from both symptomatic and asymptomatic plants were collected throughout the country and analysed in the plant health laboratory of the French Agency for Food, Environmental and Occupational Health and Safety (ANSES) and, from November 2015, in certified laboratories. Detection of X. fastidiosa in collected samples was performed with a real‐time PCR (Denancé et al., 2017b; technical reference: ANSES/LSV/MA039 version 1, October 2015; https://www.anses.fr/fr/system/files/ANSES_MA039_Xylellafastidiosa_final.pdf). Samples analysed as positives in certified laboratories were confirmed by the plant health laboratory of ANSES. Data on samples, their locations and the results of the PCR have been centralized in a database managed by the ANSES unit for coordination and support to surveillance, after a verification of data quality. We extracted from the database those data which were collected from the French department Corse‐du‐Sud (i.e. South Corsica) between July 2015 and December 2016. We restricted the dataset to Corse‐du‐Sud because X. fastidiosa has been mostly found in this part of Corsica (the pathogen having a sparse distribution in Haute‐Corse, that is, the other department of Corsica, see Supporting Information Fig. S1, as well as in the south‐east of mainland France). Table S1 provides the counts, on a monthly basis, of sampled plants and infected plants by differentiating symptomatic and asymptomatic plants. These data were used to fit the competing models presented below.

Models

We built a mechanistic‐statistical model based on an SIR architecture including a submodel of the controlled epidemic process and a submodel of the observation process. The control in the epidemic process results from the observation of positive cases, which are destroyed and therefore subtracted from the overall disease incidence. Below, we present the model outlines. Notes S2 and Table S2 provide details on the model construction. In the model, time (denoted by t) is discrete and takes values in the set of integers (in the application, the time unit is 1 month). By convention, the time of the first observation is t = 0, and the date of introduction is t = t_0. Before t_0, the total number of susceptible hosts is N_0 and the proportion of the host population that is observable is ϕ (there is no hidden compartment if ϕ = 1). At t_0, I_0 infected hosts are introduced in the observable and hidden compartments in proportions ϕ and 1 − ϕ, respectively. The submodel of the controlled epidemic process describes the discrete‐time dynamics followed by the counts of susceptible and infected hosts, and makes the distinction between these counts in the observable compartment (say S_O(t) and I_O(t)) on the one hand, and these counts in the hidden compartment (say S_H(t) and I_H(t)) on the other. This distinction does not imply independence: we assume that all infected hosts contribute to new infections in both compartments, irrespective of the compartments to which they belong. Thus, the disease dynamics in the two compartments are dependent, and the hidden compartment can play the role of infection reservoir. In the model, new infections are governed by a form of discrete‐time renewal equation, parameterized by the infection strength parameter w > 0. Infected hosts die at a given mortality rate and, if they have not been detected by the surveillance system, are replaced by susceptible hosts.
Infected hosts detected by the surveillance system are removed and replaced by resistant hosts immediately after their detection. These assumptions are formalized as update equations for the counts (S_O(t), S_H(t), I_O(t), I_H(t)), in which a rounding operator is applied to obtain integer values; I_obs(t−1) is the number of (symptomatic and asymptomatic) infected hosts detected at time t−1; and the counts of newly infected hosts in the observable and hidden compartments depend on the infection strength w and on the overall disease prevalence k time steps in the past. In the application, we set k = 12 months, such that w measures the contribution of the overall disease prevalence 1 yr in the past to new infections at time t. Setting k = 12 allows the inference of a possible annual periodicity. More flexible forms for the counts of newly infected hosts are presented in Notes S2, but the additional model flexibility leads to convergence issues in the estimation algorithm given the information contained in the data at our disposal, and we therefore rely on the simple forms outlined above. By definition, the observation process only applies to the observable compartment. Thus, the model for the numbers of symptomatic and asymptomatic observed infected hosts takes as input variables S_O(t), I_O(t) and the numbers of sampled symptomatic and asymptomatic hosts, but not S_H(t) and I_H(t). In our approach, the numbers of observed infected hosts are drawn from hypergeometric distributions taking into account the rate of false negatives in the diagnostic test, and a time‐varying preference in sampling at‐risk hosts introduced in the model through the function g. The submodel of the observation process also includes two parameters lying in [0, 1], which are the proportions of symptomatic hosts among infected and susceptible hosts, respectively, belonging to the observable compartment.
Thus, when the counts of symptomatic and asymptomatic observed hosts at time t are positive, the numbers of observed infected hosts in each symptom class follow hypergeometric distributions, each parameterized by the numbers of successes and failures in the population and the number of draws. These successes and failures are built from the numbers of symptomatic hosts at time t in the observable compartment that are infected and susceptible, respectively, and from the numbers of asymptomatic hosts at time t in the observable compartment that are infected and susceptible, respectively, the susceptible hosts being counted only if they are considered as at‐risk. These numbers are defined through g(t), the time‐varying proportion of susceptible hosts (both symptomatic and asymptomatic) in the observable compartment that are considered as at‐risk, that is, that are likely to be sampled (note that all infected hosts in the observable compartment are considered as at‐risk and are consequently likely to be sampled). It should be noted that a fraction of the infected hosts, given by the false‐negative rate, is removed from the number of successes in each hypergeometric distribution and added to the number of failures to take into account the risk of false negatives. In the hypergeometric distributions, a given number of hosts are sampled in a finite population of infected and susceptible hosts, up to the false‐negative rate, and the sampling is assumed to be uniformly random among the infected and susceptible hosts. However, the sampling may be orientated towards at‐risk hosts, and this orientation may change with time. In particular, susceptible hosts might have a reduced propensity to be sampled because of the current knowledge about the epidemic and noticeable host factors (e.g. altitude, distance to infected areas and species). We did not explicitly take into account these factors, but we handled their effects by introducing into the model the function g that takes values in [0, 1] and reduces the number of susceptible hosts appearing in each hypergeometric distribution.
More precisely, the function g gives the time‐varying proportion of the susceptible hosts in the observable compartment which can be sampled. These hosts, together with infected hosts in the observable compartment, are called at‐risk hosts. The function g is parameterized by β1 and β2 in [0, 1], which give, respectively, the values of g at the first and last times of observation. In the Results section, we use the preference in sampling at‐risk hosts, which is defined as the ratio Pref(t) = 1/(1 + g(t)) and gives the probability of sampling the infected host within a set of two hosts, one being infected and the other being healthy. In the application, we consider eight competing models, denoted M1–M8, which are different instances of the modelling framework described above. They correspond to different specifications concerning the existence of a hidden compartment and the preference in sampling at‐risk hosts. Table 1 provides the model specificities.
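The observation submodel described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the argument names and the rounding choices are assumptions made for illustration, and the linear form of g is only the simplest reading of "g at the first and last times of observation". The sketch draws the number of positive tests for one symptom class, moving a false‐negative fraction of infected hosts from the successes to the failures and shrinking the susceptible pool by g(t).

```python
import random


def linear_g(t, t_first, t_last, beta1, beta2):
    """Assumed linear at-risk proportion: beta1 at t_first, beta2 at t_last."""
    frac = (t - t_first) / (t_last - t_first)
    return beta1 + (beta2 - beta1) * frac


def draw_hypergeometric(successes, failures, draws, rng):
    """Exact hypergeometric draw: sample hosts one by one without replacement."""
    hits = 0
    for _ in range(draws):
        total = successes + failures
        if rng.random() < successes / total:
            successes -= 1
            hits += 1
        else:
            failures -= 1
    return hits


def observe_positives(infected, susceptible, g_t, false_neg, n_sampled, rng):
    """Positive tests among n_sampled hosts of one symptom class (hypothetical helper)."""
    at_risk_susceptible = round(g_t * susceptible)  # only at-risk susceptibles can be sampled
    missed = round(false_neg * infected)            # infected hosts that would test negative
    successes = infected - missed
    failures = at_risk_susceptible + missed
    draws = min(n_sampled, successes + failures)    # cannot sample more hosts than exist
    return draw_hypergeometric(successes, failures, draws, rng)


rng = random.Random(2015)
for g_t in (0.1, 0.6):
    positives = observe_positives(500, 100_000, g_t, 0.05, 200, rng)
    print(f"g = {g_t}: {positives} positives among 200 sampled hosts")
```

With the same number of infected hosts, a larger g(t) enlarges the pool of susceptible hosts that can be sampled, so the expected proportion of positive tests falls even though the epidemic itself is unchanged. This is the bias of perception that the function g is meant to absorb.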

Table 1

Specifications of the hidden compartment and the preference in sampling for models M1–M8; it should be noted that models M6–M8 have different prior distributions for the parameter ϕ

Preference in sampling	No hidden compartment (ϕ = 1)	Hidden compartment: fraction of the whole population (ϕ ∈ [0, 1])
None (g ≡ 1)	M1	M4
At‐risk, constant (g ≡ cst ∈ [0, 1])	M2	M5
At‐risk, linearly varying (g: linear function with values in [0, 1])	M3	M6 (uniform prior in [0, 1] for ϕ); M7 (a priori large value for ϕ); M8 (a priori small value for ϕ)


Bayesian estimation and model selection

Models M1–M8 are parameterized by the introduction date t_0, the host population size N_0, the number I_0 of introduced infected hosts, the mortality rate, the infection strength w, the proportion ϕ of the host population that is observable, the proportions of symptomatic hosts among infected and susceptible hosts, the sampling parameters β1 and β2, and the false‐negative rate. In models M1–M3, the proportion of the host population that is observable is fixed at ϕ = 1. In models M1 and M4, g ≡ 1 (β1 = β2 = 1). In models M2 and M5, g is constant (β1 = β2), at a value that has to be estimated. More or less informative priors were chosen depending on the available knowledge about the parameters. Prior distributions are specified and motivated in Notes S2 and Table S3, and are briefly described in what follows. The prior for the introduction date t_0 was relatively vague (uniform prior over the 50 yr preceding the first detection of X. fastidiosa in Corsica). The total number of susceptible host units N_0 at t_0 had a prior mean of 5.5 million and a range between 1.9 and 13.3 million (prior quantiles of order 0.025 and 0.975). The number I_0 of introduced infected hosts at t_0 was set at a fixed value in all models because of identifiability issues. This is the only parameter that we did not infer. We set I_0 = 10, which amounts to the assumption that the epidemic began with the introduction of a small batch of infected plants and that subsequent introductions did not significantly impact the overall course of the epidemic. Notes S3 and Figs S2 and S3 provide an analysis of the impact of the value of I_0 on the inference output. The prior distribution for the mortality rate was chosen to encompass significantly different mortality dynamics (roughly, from 50% of death in the first year of infection to 50% of death in the first 7.7 yr of infection). A vague uniform prior over [0, 10] was used for w. For the proportions (ϕ, the proportions of symptomatic hosts, β1 and β2), we chose vague uniform priors over [0, 1], except in the following cases: for models M1, M2 and M3, without hidden compartment, ϕ was equal to 1; for models M7 (with an a priori small hidden compartment) and M8 (with an a priori large hidden compartment), the prior for ϕ was a beta distribution with parameter vectors equal to (4, 1) and (1, 4), respectively; for models M1 and M4, β1 = β2 = 1; for models M2 and M5, β1 was a priori uniform over [0, 1] and β2 = β1.
Finally, the false‐negative rate was a priori uniformly distributed over an interval of low values (see Table S3), that is, it was a priori rather low, but could take non‐negligible values. Parameters were estimated with a Markov chain Monte Carlo (MCMC) algorithm with Metropolis–Hastings updates. Three chains were run for each model to check the convergence of the algorithm, and were merged to obtain large posterior samples of parameters. Parameters were updated by blocks with a Gaussian proposal distribution centred around the current parameter values (the variances in the proposal distribution were tuned to obtain rapid algorithm convergence). For each MCMC run, we discarded an initial burn‐in period and subsampled the rest of the chain every 2000 iterations. Thus, posterior samples were formed by 24 000 vectors of parameter values. Model selection was performed with respect to several criteria: Akaike's information criterion (AIC), the Bayesian information criterion (BIC), the deviance information criterion (DIC) proposed by Spiegelhalter et al. (2002), the DIC modification proposed by Gelman et al. (2014, Chapter 7), the DIC modification proposed by Ando (2011) and the Bayes factor computed from the harmonic mean of the likelihood values (Kass & Raftery, 1995).
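The estimation scheme described above (Gaussian random‐walk proposals, a discarded burn‐in, regular subsampling, and several chains merged into one posterior sample) can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the authors' code: the target is a toy two‐dimensional standard normal log‐density, all parameters are updated in a single block, and the function name, step size, chain length, burn‐in and thinning interval are hypothetical values chosen to keep the example fast.

```python
import math
import random


def metropolis_hastings(log_post, init, step_sd, n_iter, burn_in, thin, rng):
    """Random-walk Metropolis-Hastings with burn-in and thinning (one block update)."""
    current = list(init)
    current_lp = log_post(current)
    kept = []
    for it in range(n_iter):
        # Gaussian proposal centred on the current parameter values
        proposal = [x + rng.gauss(0.0, step_sd) for x in current]
        proposal_lp = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, posterior ratio)
        log_u = math.log(1.0 - rng.random())  # 1 - U lies in (0, 1], so the log is defined
        if log_u < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        # Keep every `thin`-th state once the burn-in is over
        if it >= burn_in and (it - burn_in) % thin == 0:
            kept.append(list(current))
    return kept


def toy_log_post(theta):
    """Toy target (assumption): independent standard normal components."""
    return -0.5 * sum(x * x for x in theta)


# Three chains from dispersed starting points, merged after burn-in and thinning
merged = []
for seed, start in enumerate([(-3.0, -3.0), (0.0, 0.0), (3.0, 3.0)]):
    chain = metropolis_hastings(toy_log_post, start, step_sd=1.0, n_iter=5000,
                                burn_in=1000, thin=10, rng=random.Random(seed))
    merged.extend(chain)

print(len(merged), "retained parameter vectors")
```

Running several chains from dispersed starting points gives a simple convergence check: if the chains have converged, their retained samples should describe the same distribution before they are merged.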

Results

Dualism in model selection

Among the competing models M1–M8, the best models are those with a preference in sampling at‐risk hosts that varies across time (Table 2). In addition, based on the diverse selection criteria, the incorporation of a hidden compartment in the model appears unnecessary. It should be noted that not selecting a model with a hidden compartment does not mean that the hidden compartment does not exist, but tends to indicate that the hidden compartment, if any, has a negligible influence on the observations (see the Discussion section).

Table 2

Selection criteria computed for models with different specifications for the hidden compartment and the preference in sampling

Hidden compartment	Preference in sampling	Model	LogL	AIC	BIC	DIC‐S	DIC‐G	DIC‐A	Bayes factor
None	None	M1	−224	463	512	436	477	411	<10⁻⁴
None	At‐risk, constant	M2	−229	476	539	417	475	368	<10⁻⁴
None	At‐risk, varying	M3	−197	412	475	356	412	308	1.00
Fraction of the whole population	None	M4	−224	465	520	461	481	460	<10⁻⁴
Fraction of the whole population	At‐risk, constant	M5	−229	478	547	388	477	309	<10⁻⁴
Fraction of the whole population	At‐risk, varying	M6	−197	415	484	NA	416	NA	0.80
A priori small fraction	At‐risk, varying	M7	−197	414	484	363	414	321	0.08
A priori large fraction	At‐risk, varying	M8	−199	418	488	399	415	392	1.41

LogL is the log‐likelihood, AIC is Akaike's information criterion, BIC is the Bayesian information criterion, and DIC‐S, DIC‐G and DIC‐A are the deviance information criteria of Spiegelhalter et al. (2002), Gelman et al. (2014) and Ando (2011), respectively. DIC‐S and DIC‐A cannot be calculated for model M6, for which the posterior mean of the parameter vector is unlikely because of the multimodality of the posterior (this is indicated in the table by NA, which stands for not available). M8 is selected as the best model by the Bayes factor, whereas M3 is selected by the other criteria.
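The Bayes factor in Table 2 is computed from the harmonic mean of likelihood values. A minimal Python sketch of that estimator is given below, assuming that the log‐likelihood of each retained posterior draw is available; the function names are hypothetical and the computation is carried out on the log scale to avoid numerical underflow. The harmonic mean estimator is known to be numerically unstable, so the sketch should be read as an illustration of the formula rather than as a recommended procedure.

```python
import math


def log_marginal_harmonic(log_likelihoods):
    """Log of the harmonic mean of likelihoods evaluated at posterior draws."""
    neg = [-ll for ll in log_likelihoods]
    m = max(neg)
    # log-sum-exp of the negated log-likelihoods, shifted by m for stability
    lse = m + math.log(sum(math.exp(v - m) for v in neg))
    # harmonic mean = n / sum(1 / L_i), hence log = log(n) - logsumexp(-ll)
    return math.log(len(neg)) - lse


def bayes_factor(log_lik_a, log_lik_b):
    """Bayes factor of model A against model B from posterior log-likelihoods."""
    return math.exp(log_marginal_harmonic(log_lik_a) - log_marginal_harmonic(log_lik_b))


# Toy inputs (assumption): constant log-likelihoods, one unit apart
print(bayes_factor([-197.0] * 100, [-198.0] * 100))
```

In Table 2 the Bayes factors are expressed relative to M3, whose entry is 1.00, so a value above 1 favours the model in that row over M3.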

A closer look at the hidden compartment hypothesis leads to an unexpected result: under model M6 (which contains a hidden compartment, a vague prior for ϕ and a varying preference in sampling at‐risk hosts), the proportion ϕ of the observable compartment has a clearly bimodal posterior distribution (Fig. 3, left), with large probabilities for values close to either zero (i.e. most of the hosts are hidden) or one (i.e. most of the hosts are observable); the latter case is well approximated by model M3, in which ϕ = 1. We investigated this characteristic by generating two additional competing models differing from model M6 with respect to the prior distribution of ϕ: we changed the uniform prior into a beta prior with shape parameters (4, 1) for model M7 and (1, 4) for model M8. Thus, under M7 (M8), the prior mean of ϕ is 0.8 (0.2) and the hidden compartment is a priori a small (large) fraction of the whole host population. Based on the Bayes factor, model M8 with a large hidden compartment is the best model and, a posteriori, the hidden compartment represents c. 99% of the whole host population (Fig. 3, right; 95%‐posterior interval: [95%; 100%]).
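The prior means quoted above follow from the mean a/(a + b) of a beta distribution with shape parameters (a, b). The short Python sketch below checks these values and, as an additional illustration that is not part of the original analysis, computes how much prior mass each beta prior places on an observable proportion below 0.05 (that is, on a hidden compartment above 95%). It uses the closed‐form distribution functions that hold when one shape parameter equals 1; the function names are hypothetical.

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def cdf_beta_a_1(x, a):
    """Distribution function of Beta(a, 1) at x in [0, 1]: x ** a."""
    return x ** a


def cdf_beta_1_b(x, b):
    """Distribution function of Beta(1, b) at x in [0, 1]: 1 - (1 - x) ** b."""
    return 1.0 - (1.0 - x) ** b


print("prior mean, Beta(4, 1):", beta_mean(4, 1))
print("prior mean, Beta(1, 4):", beta_mean(1, 4))
print("P(proportion < 0.05), Beta(4, 1):", cdf_beta_a_1(0.05, 4))
print("P(proportion < 0.05), Beta(1, 4):", cdf_beta_1_b(0.05, 4))
```

Under the Beta(1, 4) prior, less than one‐fifth of the prior mass lies below 0.05, so a posterior concentrated in that region cannot be attributed to the prior alone.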

Figure 3

Posterior distribution of the proportion ϕ of the observable compartment under models M6 (left) and M8 (right). The dotted and dashed lines indicate the posterior mean and median of ϕ, respectively.

This dualism in the model selection led us to present in what follows the inferences obtained under both models M3 (without a hidden compartment) and M8 (with a hidden compartment), which fit the raw data obtained from the surveillance of X. fastidiosa in South Corsica similarly well (see Fig. 4).

Figure 4

Proportion of infected hosts across time under models M3 (upper) and M8 (lower). The proportion is computed for all hosts (left), symptomatic hosts (centre) and asymptomatic hosts (right). Red curve, observed proportion; black continuous curve, posterior median; black dashed curves, pointwise posterior quantiles of order 0.025 and 0.975.

Two scenarios in the past

The inferences obtained under models M3 and M8 correspond to two different scenarios mostly diverging in terms of the introduction date (t_0) and total number of infected hosts. In scenario 1 (model M3), the introduction occurred around 2001, and the number of infected host units ranged from 400 to 1700 at the end of 2016. In scenario 2 (model M8), the introduction occurred around 1985, and the number of infected host units ranged from 30 000 to 660 000 at the end of 2016 (see Fig. 5; Tables S4, S5).

Figure 5

Posterior medians, 0.025 quantiles and 0.975 quantiles of the past numbers of infected hosts in the whole host population (left), the observable compartment (centre) and the hidden compartment (right) under models M3 (upper) and M8 (lower); the median is given by the continuous curve, the quantiles by the dashed curves. The number of infected hosts in the hidden compartment is zero under model M3 as this compartment is empty. In the left panels, the grey histograms and the continuous vertical line give the posterior distributions of the introduction date and its posterior median under each model. The dotted vertical line gives the date of the first observation.

Interestingly, the posterior of the number N_0 of susceptible host units at the introduction date is approximately the same under models M3 and M8. Thus, the two scenarios are based on a similar description of the host population except for the fact that a large fraction of the population is hidden in scenario 2. Consequently, the difference in the number of infected hosts provided above translates into a difference in proportions: a very small proportion (≈ 3‱) of the host population is infected in scenario 1, much smaller than the corresponding proportion (≈ 5%) in scenario 2 (Table S5; Fig. S4). The size N_0 is not the only parameter similarly estimated with models M3 and M8. Indeed, we obtained consistent estimations of the mortality rate, the infection strength (w), the proportions of symptomatic hosts in the observable compartment and the false‐negative rate (see Figs S5, S6). Hence, the two scenarios share several epidemiological and observational features. There is however an observational feature that varies: the preference in sampling at‐risk hosts. This preference decreases in both scenarios, but the magnitude of decrease is different. In scenario 1, where the observable compartment is huge as it coincides with the whole population, Pref(t) remains very high (it decreases from nearly 0.999 to 0.995).
In scenario 2, Pref(t), which applies only to the observable compartment, decreases from nearly 0.9 to 0.6 (see Fig. 6). We will see below that this preference in sampling at‐risk hosts may be a crucial lever for controlling the disease dynamics.
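As a quick check of the orders of magnitude quoted in this paragraph, the two infected proportions can be put on a common scale. The symbols p₁ and p₂ (infected proportion in scenario 1 and scenario 2, respectively) are introduced here only for illustration and are not the paper's notation.

```latex
\[
p_1 \approx 3 \times 10^{-4}, \qquad
p_2 \approx 5 \times 10^{-2}, \qquad
\frac{p_2}{p_1} \approx \frac{5 \times 10^{-2}}{3 \times 10^{-4}} \approx 167 .
\]
```

With a similar host population size in both scenarios, the estimated infected proportion is therefore roughly two orders of magnitude larger in scenario 2 than in scenario 1.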

Figure 6

Posterior medians (continuous curve), 0.025 quantiles and 0.975 quantiles (dotted curves) of the preference in sampling at‐risk hosts across time under the scenario 1 model (left) and the scenario 2 model (right).

Implications for the future

We have previously highlighted differences and similarities in the two past scenarios for the X. fastidiosa dynamics in South Corsica. When one looks at the future, the two models provide significantly different outputs. As demonstrated below, the hidden compartment in the scenario 2 model plays the role of infection reservoir, which would make the control of the disease difficult. Fig. 7 shows, for the next 10 yr, the predictions of the proportion of infected hosts in the whole population, the observable compartment and the hidden compartment under the scenario 1 model (top panels) and the scenario 2 model (bottom panels). These predictions were made with a constant (but reinforced) surveillance effort and a constant preference in sampling at‐risk hosts: 800 symptomatic plants and 200 asymptomatic plants were sampled per month (these values are among the highest encountered in the past surveillance; see Table S1), with Pref(t) = 0.995 for the scenario 1 model and Pref(t) = 0.6 for the scenario 2 model (the values estimated at the end of the sampling period; see Fig. 6).
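The qualitative role of an uncontrolled reservoir can be illustrated with a deliberately simple sketch. This is not the authors' model (which is a Markov SIR-type model described in Notes S2); it is a deterministic toy with monthly steps, a mean-field infection pressure, and hypothetical parameter values: `beta` (infection strength), `mu` (mortality), `delta` (removal of detected hosts by surveillance, applied to the observable compartment only) and `x0` (initial prevalence). The function name `simulate` is likewise an assumption made for this example.

```python
def simulate(hidden_fraction, months=120, beta=0.05, mu=0.01, delta=0.06, x0=0.01):
    """Toy deterministic two-compartment model with monthly steps.

    Returns a list of (prevalence_observable, prevalence_hidden) tuples.
    All parameter values are hypothetical, not estimates from the paper.
    """
    h = hidden_fraction
    x_obs = x0
    x_hid = x0 if h > 0 else 0.0  # the hidden compartment is empty when h == 0
    path = [(x_obs, x_hid)]
    for _ in range(months):
        # Mean-field infection pressure: infected proportion in the whole population.
        p = (1 - h) * x_obs + h * x_hid
        # Observable hosts: new infections, mortality, and removal by surveillance.
        new_obs = x_obs + beta * p * (1 - x_obs) - (mu + delta) * x_obs
        # Hidden hosts: new infections and mortality only (they are never sampled).
        if h > 0:
            new_hid = x_hid + beta * p * (1 - x_hid) - mu * x_hid
        else:
            new_hid = 0.0
        x_obs, x_hid = new_obs, new_hid
        path.append((x_obs, x_hid))
    return path


no_reservoir = simulate(hidden_fraction=0.0)
with_reservoir = simulate(hidden_fraction=0.5)

print("observable prevalence after 10 yr, no hidden compartment:", no_reservoir[-1][0])
print("observable prevalence after 10 yr, half of hosts hidden: ", with_reservoir[-1][0])
```

Under these assumed values, surveillance alone drives the prevalence down when every host is observable, whereas the same surveillance effort fails to prevent growth in the observable compartment once half of the hosts are hidden and uncontrolled.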

Figure 7

Posterior medians, 0.025 quantiles and 0.975 quantiles of the past and future proportions of infected hosts in the whole host population (left), the observable compartment (centre) and the hidden compartment (right) under the scenario 1 model (upper) and the scenario 2 model (lower). In the prediction part of the curves, 800 symptomatic plants and 200 asymptomatic plants were sampled per month, with Pref(t) = 0.995 for the scenario 1 model and Pref(t) = 0.6 for the scenario 2 model (see Fig. 5 for additional details on plot construction).

With such a characterization of the surveillance, X. fastidiosa could be brought to low levels under the scenario 1 model (the oscillating course of the epidemic during the actual surveillance period, from month 0 to 17, vanishes thanks to the reinforced surveillance), but should continue to increase under the scenario 2 model, even in the observable compartment. This noticeable difference occurs although we estimated approximately the same number of infected hosts in the observable compartment with both models (Fig. 5). Thus, under the scenario 2 model, we can see the positive effect of the hidden compartment on the development of X. fastidiosa and, consequently, its role as infection reservoir. The effect of the hidden compartment is initially weak (we observe in the bottom centre panel of Fig. 7 a nearly constant prevalence in the observable compartment from month 0 to month 40, the hidden compartment and the reinforced surveillance generating opposite but comparable forces). After month 40, the continuous growth of the prevalence in the hidden compartment, which is not controlled, has a larger impact on the infection dynamics than does the reinforced surveillance and, consequently, the prevalence in the observable compartment significantly increases.

Figs S7 and S8 provide 10‐yr predictions for diverse characterizations of the surveillance. They especially show that increasing the preference in sampling at‐risk hosts (as defined in our work) is a lever to be considered for reducing disease prevalence (and not only a source of bias of perception). Indeed, a large preference in sampling at‐risk hosts, Pref(t) = 1/(1 + g(t)), amounts to focusing the surveillance on actually infected hosts, which are destroyed after their detection, and therefore to reducing the disease prevalence more efficiently. However, the correct way to increase Pref(t) is not obvious in practice: it can be increased by preferentially sampling species and areas that are known to be infected, but one must avoid simultaneously enlarging the hidden compartment. For instance, if one samples only the most infected species, then all the other infected species enter into the hidden compartment.
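The relationship Pref(t) = 1/(1 + g(t)) stated above can be inverted to see what the reported preference values imply for g(t). The sketch below only manipulates that stated relationship; the actual form of g is specified in Table S2 and is not reproduced here. The function names and the restriction g ≥ 0 (so that Pref lies in (0, 1]) are assumptions made for illustration.

```python
def g_from_pref(pref):
    """Invert Pref = 1 / (1 + g) to recover g; requires 0 < pref <= 1."""
    if not 0.0 < pref <= 1.0:
        raise ValueError("pref must lie in (0, 1]")
    return (1.0 - pref) / pref


def pref_from_g(g):
    """Pref = 1 / (1 + g), as stated in the text; g >= 0 is assumed here."""
    if g < 0.0:
        raise ValueError("g must be non-negative")
    return 1.0 / (1.0 + g)


# Preference values reported in the text for the start and end of the sampling period.
reported = {"scenario 1": (0.999, 0.995), "scenario 2": (0.9, 0.6)}
for name, (start, end) in reported.items():
    print(f"{name}: g goes from {g_from_pref(start):.4f} to {g_from_pref(end):.4f}")
```

Because Pref is a decreasing function of g, the decline in preference reported for both scenarios corresponds to an increase in g over the sampling period, and that increase is much larger in scenario 2.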

Discussion

Based on temporal observations and an adapted original model, our analyses tend to show that the emergence of X. fastidiosa in Corsica, France, is probably not a recent story. The model selection led to two scenarios: the first with an introduction around 2001 (1998–2005) and without a hidden compartment, and the second with an introduction around 1985 (1978–1993) and a hidden compartment. The two scenarios also diverge in terms of prediction, the scenario with a hidden compartment leading to significantly more severe future epidemics irrespective of the applied control measures. To determine which scenario is more realistic requires further data collection and analyses. In particular, evaluation of what could be the hidden compartment (e.g. wild and semi‐natural landscape components, or host species for which diagnostic tests are not done or not efficient) and sampling in this compartment are crucial to test the veracity of our second scenario. Although new data should be produced to investigate specific epidemiological questions and to better unravel the ecological structure of X. fastidiosa in Corsica, existing data still contain unexploited information. Indeed, our approach is only based on time series providing the symptomatic nature of sampled plants and their observed health status with respect to X. fastidiosa. Further analyses should be carried out to more finely exploit the spatiotemporal surveillance dataset available (e.g. spatial coordinates and species information of sampled hosts, and genetic information on bacterial strains). Such analyses should lead to more accurate results on the date of introduction and other epidemiological parameters, such as the mortality rate and the infection strength of infectious hosts. 
They should also provide information on processes not accounted for in our work, for instance, the dynamics of vectors (as in Bosso et al., 2016; White et al., 2017), the evolution of bacterial strains and the spatial spread of the disease. In particular, including in the analyses genetic and demographic data from North Corsica and the south‐east of mainland France, where X. fastidiosa has been more sporadically detected, could provide crucial information on possible multiple introductions and human‐mediated long‐distance dispersal (as in Mollentze et al., 2014, in the case of rabies).

Inferences made about X. fastidiosa are obviously constrained by the features of our model. In particular, this model explicitly incorporates a hidden compartment, but ignores spatial and species information. The explicit incorporation of the counts of susceptible and infected hosts in the hidden compartment is a way to objectively account for the time‐varying risk of infection caused by infected hidden hosts. This approach is adopted in many temporal SIR‐like models that make the distinction between different types of hosts, for example target hosts and alternate hosts, including vectors (Dobson, 2004; Allen et al., 2012). Such multi‐host epidemic models are often based on a system of ordinary differential equations, but can also be based on Markov processes (McCormack & Allen, 2006; Allen, 2017), as in our case. A classical alternative modelling approach is to decompose the risk of infection into two components, the first that is dependent on the number of infected hosts in the compartment of interest (often modelled as an auto‐regressive term) and the second that is independent from this number (Held et al., 2006; Unkel et al., 2012).
The second component is a way to implicitly handle alternate/hidden hosts but also environmental factors; it is generally time‐varying, can incorporate explanatory variables and can be estimated, for example, in the framework of hidden Markov models (HMMs).

Although our model takes into account various epidemiological components (observable/hidden host compartments, symptomatic/asymptomatic status of hosts, delay of infection, preference in sampling), it nevertheless ignores spatial and species information, as mentioned above. Indeed, our model is built on a mean‐field assumption (or homogeneous mixing assumption) concerning the interaction between hosts, as are many deterministic or stochastic epidemiological models (Kleczkowski & Grenfell, 1999; Keeling & Grenfell, 2000; Aparicio & Pascual, 2007; Britton et al., 2015): the effect of the other hosts on any host is approximated by a single average effect, irrespective of their locations and species. Obviously, this assumption is not perfectly realistic for a pathogen that can be spread by insects (mostly at short distances and certainly with heterogeneous cross‐species transmissions) and by humans (both at short and long distances and with between‐host‐species heterogeneities). Hence, it would be worthwhile assessing the inference accuracy achieved with our model for data simulated under a spatially and species‐explicit model, just as predictions under mean‐field models are compared with those obtained by their individual‐based counterparts.

Dating pathogen emergences is a complex issue, but the integration of different sources of information can help to reduce the uncertainty.
Dates of introduction of pathogens have been inferred from various types of data – for example demographic data (this article; Heiler et al., 2013; Soubeyrand & Roques, 2014), genomic data (Dudas & Rambaut, 2014; Nunes et al., 2014), archaeological data, archives and historical records (Le Floc'h, 1991; Preston et al., 2004; Potter et al., 2011) – and various analysis techniques – for example epidemiological investigations, forward simulations of population dynamic models, statistical estimation techniques, phylogenetic and phylogeographic analyses. Despite these data and techniques, origins of outbreaks generally remain uncertain (Woolhouse & Gaunt, 2007; with the exception of situations in which epidemiological investigations allowed the identification of the primary case(s)). This statement typically holds for plant pathogens arriving in regions in which the awareness is not focused on these pathogens at their introduction times. The combination of different analyses performed with different data should help to reduce the uncertainty about the origin. Concerning our case study, namely the emergence of X. fastidiosa in Corsica, a complementary approach based on molecular dating of a phylogenetic tree exploiting genome data provided the following mean dates of divergence between pairs of French isolates and their American relatives: c. 1980 for strain ST6 and 1965 for strain ST7 (Denancé et al., 2017a). These dates can be considered as proxies or lower bounds of the introduction dates. They are relatively consistent with our second scenario (1985 (1978–1993)), and a joint analysis of demographic and genomic data could help to refine our conclusions.

Identifying and characterizing reservoirs of infection, if any, is crucial for the understanding of infectious disease dynamics, the design of surveillance and control strategies, and the anticipation and prevention of future emergences (Haydon et al., 2002; Karesh et al., 2012; Bartoli et al., 2015).
For humans, numerous pathogens have long been recognized to have environmental or animal reservoirs (the corresponding diseases being called sapronoses and zoonoses, respectively; Woolhouse & Gaunt, 2007). For agricultural plants, early examples of identification and control of infection reservoirs do exist (see, for example, the eradication of barberry, an alternate host of the wheat stem rust; Stakman, 1919), but Morris et al. (2007) pointed out a decade ago that pathogenic bacteria had been almost exclusively studied in agricultural contexts, neglecting environmental niches, and Burdon & Thrall (2008) designated the study of the agro‐ecological interface and its evolutionary implications as a major issue for future research. With time, plant pathogen reservoirs of various kinds have been studied (e.g. wild or weedy host plants, volunteer plants, alternate hosts, leaf litter, freshwater and snowpack; Holt et al., 2003; Li et al., 2014; Gérard et al., 2006; Beckstead et al., 2010; Fabre et al., 2012; Monteil et al., 2013; Soubeyrand et al., 2017), and reservoirs are today considered as important drivers in plant epidemiology. The approach developed in this article can be viewed as a data‐driven way of testing the existence (or the influence) of a reservoir during an outbreak, when data are collected only from the target population. Obviously, the influence of the reservoir has to be non‐negligible to be detected with our method, which simply exploits demographic counting data. For X. fastidiosa in South Corsica, we were not able to firmly determine whether or not there is a hidden compartment (viewed as an infection reservoir in our study), but our results show that this hypothesis is plausible and should be investigated in further studies. 
Since July 2015, more than 20 new host species have been found in Corsica (Gardi et al., 2016; see also https://ec.europa.eu/food/plant/plant_health_biosecurity/legislation/emergency_measures/xylella-fastidiosa/susceptible_en for updated information). This progressive discovery of host species supports the hidden compartment hypothesis. Moreover, an analysis of the demography and disease prevalence for a host such as Cistus monspeliensis suggests that it could be, among others, an important component of the hidden compartment. Indeed, C. monspeliensis is very abundant in Corsica (http://www.tela-botanica.org), in particular in wild areas; its observed infection rate is quite high (c. 11%), and insect vectors tend to be frequent around this host (recent molecular analyses have shown that X. fastidiosa is present in c. 20% of insect vectors Philaenus spumarius collected from several C. monspeliensis populations across Corsica; Cruaud et al., 2018). However, this host species has been weakly surveyed (3% of samples) in comparison with much less abundant host species, such as Polygala myrtifolia (12% of samples), which is an ornamental plant with an observed infection rate of 26%. Thus, the C. monspeliensis population is under‐represented in surveillance data and a fraction of this population, in particular in wild areas, could contribute to the hidden compartment. Evaluating the spatial distribution of this host and comparing it with the spatial pattern of sampled C. monspeliensis in the surveillance of X. fastidiosa would be a first step towards the identification of a potential reservoir.
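The under‐representation of C. monspeliensis can be made concrete with a small calculation on the figures quoted above. The sketch assumes that the "observed infection rate" is the fraction of positive samples among the sampled hosts of a species, treats the approximate value c. 11% as exactly 0.11, and uses a hypothetical batch of 1000 samples; the names `species`, `TOTAL_SAMPLES` and `expected_positives` are illustrative only.

```python
# Figures quoted in the text: share of all samples and observed infection rate.
species = {
    "Cistus monspeliensis": {"sample_share": 0.03, "infection_rate": 0.11},
    "Polygala myrtifolia": {"sample_share": 0.12, "infection_rate": 0.26},
}

TOTAL_SAMPLES = 1000  # hypothetical batch size, for illustration only


def expected_positives(sample_share, infection_rate, total=TOTAL_SAMPLES):
    """Expected number of positive samples contributed by one species."""
    return total * sample_share * infection_rate


positives = {name: expected_positives(**values) for name, values in species.items()}
for name, count in positives.items():
    print(f"{name}: about {count:.1f} positive samples per {TOTAL_SAMPLES}")
```

Under these assumptions, the abundant wild host contributes roughly a tenth as many detections as the much rarer ornamental species, so most of its infected individuals would go undetected and could act as part of the hidden compartment.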

Author contributions

S.S. conceived the ideas and designed the methodology; S.S., P.d.J., O.M. and M.S. prepared and analysed data; S.S., P.d.J., O.M., M.S., C.M., P.H. and C.L. discussed the objectives of the study at an early stage and commented on the results; S.S. led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.

Supporting Information

Fig. S1 Locations of Xylella fastidiosa positive and negative samples in Corsica, France, between July 2015 and December 2016.
Fig. S2 Prior and posterior distributions of I₀ under model when I₀ is not fixed.
Fig. S3 Posterior means and quantiles of parameters of model M8 obtained for various values of I₀.
Fig. S4 Posterior medians, 0.025 quantiles and 0.975 quantiles of the past proportions of infected hosts under the two selected models.
Fig. S5 Marginal posterior distributions of the parameters of model .
Fig. S6 Marginal posterior distributions of the parameters of model .
Fig. S7 Posterior medians, 0.025 quantiles and 0.975 quantiles of the future proportions of infected hosts under model and different surveillance scenarios.
Fig. S8 Posterior medians, 0.025 quantiles and 0.975 quantiles of the future proportions of infected hosts under model and different surveillance scenarios.
Table S1 Monthly surveillance data in South Corsica.
Table S2 Specifications of the hidden compartment and the preference in sampling for the models considered, with mathematical expressions of the function g.
Table S3 Prior distributions of parameters.
Table S4 Posterior means, medians, 0.025 quantiles and 0.975 quantiles of parameters of the two selected models.
Table S5 Posterior means, medians, 0.025 quantiles and 0.975 quantiles of the introduction year and the number/proportion of infected hosts in December 2016 under the two selected models.
Notes S1 List of plant genera sampled in South Corsica from July 2015 to December 2016.
Notes S2 Detailed model description.
Notes S3 Impact of the choice of the number I₀ of introduced infected hosts on the estimation of the other model parameters.

34 in total

1. Individual-based perspectives on R(0).

Authors: M J Keeling; B T Grenfell
Journal: J Theor Biol Date: 2000-03-07 Impact factor: 2.691

2. Durable strategies to deploy plant resistance in agricultural landscapes.

Authors: Frédéric Fabre; Elsa Rousseau; Ludovic Mailleret; Benoit Moury
Journal: New Phytol Date: 2012-01-19 Impact factor: 10.151

3. Building epidemiological models from R0: an implicit treatment of transmission in networks.

Authors: Juan Pablo Aparicio; Mercedes Pascual
Journal: Proc Biol Sci Date: 2007-02-22 Impact factor: 5.349

4. A framework to gauge the epidemic potential of plant pathogens in environmental reservoirs: the example of kiwifruit canker.

Authors: Claudia Bartoli; Jay Ram Lamichhane; Odile Berge; Caroline Guilbaud; Leonardo Varvaro; Giorgio M Balestra; Boris A Vinatzer; Cindy E Morris
Journal: Mol Plant Pathol Date: 2014-08-24 Impact factor: 5.663

Review 5. Paradigms: examples from the bacterium Xylella fastidiosa.

Authors: Alexander Purcell
Journal: Annu Rev Phytopathol Date: 2013-05-17 Impact factor: 13.078

6. Surprising niche for the plant pathogen Pseudomonas syringae.

Authors: Cindy E Morris; Linda L Kinkel; Kun Xiao; Philippe Prior; David C Sands
Journal: Infect Genet Evol Date: 2006-06-27 Impact factor: 3.342

7. Infectivity and transmission of Xylella fastidiosa by Philaenus spumarius (Hemiptera: Aphrophoridae) in Apulia, Italy.

Authors: Maria Saponari; Giuliana Loconsole; Daniele Cornara; Raymond K Yokomi; Angelo De Stradis; Donato Boscia; Domenico Bosco; Giovanni P Martelli; Rodrigo Krugner; Francesco Porcelli
Journal: J Econ Entomol Date: 2014-08 Impact factor: 2.381

8. A Bayesian approach for inferring the dynamics of partially observed endemic infectious diseases from space-time-genetic data.

Authors: Nardus Mollentze; Louis H Nel; Sunny Townsend; Kevin le Roux; Katie Hampson; Daniel T Haydon; Samuel Soubeyrand
Journal: Proc Biol Sci Date: 2014-03-11 Impact factor: 5.349

Review 9. Global transport networks and infectious disease spread.

Authors: A J Tatem; D J Rogers; S I Hay
Journal: Adv Parasitol Date: 2006 Impact factor: 3.870

10. Air travel is associated with intracontinental spread of dengue virus serotypes 1-3 in Brazil.

Authors: Marcio R T Nunes; Gustavo Palacios; Nuno Rodrigues Faria; Edivaldo Costa Sousa; Jamilla A Pantoja; Sueli G Rodrigues; Valéria L Carvalho; Daniele B A Medeiros; Nazir Savji; Guy Baele; Marc A Suchard; Philippe Lemey; Pedro F C Vasconcelos; W Ian Lipkin
Journal: PLoS Negl Trop Dis Date: 2014-04-17
