Literature DB >> 28112914

Modeling Biphasic Environmental Decay of Pathogens and Implications for Risk Analysis.

Andrew F Brouwer¹, Marisa C Eisenberg¹, Justin V Remais², Philip A Collender², Rafael Meza¹, Joseph N S Eisenberg¹.

Abstract

As the appreciation for the importance of the environment in infectious disease transmission has grown, so too has interest in pathogen fate and transport. Fate has been traditionally described by simple exponential decay, but there is increasing recognition that some pathogens demonstrate a biphasic pattern of decay-fast followed by slow. While many have attributed this behavior to population heterogeneity, we demonstrate that biphasic dynamics can arise through a number of plausible mechanisms. We examine the identifiability of a general model encompassing three such mechanisms: population heterogeneity, hardening off, and the existence of viable-but-not-culturable states. Although the models are not fully identifiable from longitudinal sampling studies of pathogen concentrations, we use a differential algebra approach to determine identifiable parameter combinations. Through case studies using Cryptosporidium and Escherichia coli, we show that failure to consider biphasic pathogen dynamics can lead to substantial under- or overestimation of disease risks and pathogen concentrations, depending on the context. More reliable models for environmental hazards and human health risks are possible with an improved understanding of the conditions in which biphasic die-off is expected. Understanding the mechanisms of pathogen decay will ultimately enhance our control efforts to mitigate exposure to environmental contamination.

Entities: Chemical Disease Species

Mesh：

Year: 2017 PMID： 28112914 PMCID： PMC5789392 DOI： 10.1021/acs.est.6b04030

Source DB: PubMed Journal: Environ Sci Technol ISSN： 0013-936X Impact factor: 9.028

Introduction

Many infectious pathogens are transmitted between hosts via environmental media, including water, air, food, and fomites. Pathogens occupying these media are subject to environmental stresses, transport, and senescence, and the persistence of pathogens in these settings is of great interest when considering interventions that target environmental media (including water treatment, surface decontamination, and hand hygiene) as well as strategies for meeting regulatory thresholds for environmental protection.[1] When pathogens persist in the environment, their movement through various media, water in particular, can facilitate transmission over long distances.[2] Various modeling frameworks that describe pathogen persistence and transport via surface water—so-called fate and transport models—have been used to gain insight into the environmental dynamics of infectious disease risk (e.g., refs (3−7)). A critical feature of fate and transport models is the representation of pathogen attenuation over time or distance, which is generally expressed as an exponential, monophasic decay of pathogen concentration over time. While long-tailed deviations from monophasic decay have been known since the early 20th century, assumptions of first-order kinetics remain overwhelmingly the norm.[8−12] Some pathogens, Escherichia coli in particular, exhibit a biphasic pattern—a period of faster decay (labile regime) followed by a period of slower decay (resistant regime); see Hellweger et al.[13] for a review. Studies that have characterized biphasic decay (e.g., refs (13−15)) generally consider one of two empirical models: the piecewise log–linear function[12]or the biexponential model[11,16−18]Other models have also been used, such as the Weibullwhich can be used to represent a heterogeneous distribution of stress tolerance and has been used predominantly in the context of the inactivation of foodborne pathogens.[19,20] Another kind of biphasic behavior is also seen in some contexts: a phase of relative population stability, or even growth, followed by faster decay. This phenomenon is also important from a risk perspective, but is beyond the scope of our study, as it is typically modeled using very different strategies from those used to address slow-decaying tails. See, for example, Phaiboun et al.[21] for a starvation kinetics model capable of reproducing this behavior. The additional parameter or parameters that must be estimated in biphasic models—and the additional longitudinal data necessary to estimate them—explain in part why biphasic decay is rarely incorporated into fate and transport models. Modelers have tended to superimpose monophasic models on biphasic data. For example, monophasic decay parameters are sometimes fit only to the labile regime (e.g., ref (22)), which can happen if sampling studies end prematurely (Figure a). In this case, the resulting model will consistently underestimate underlying biphasic pathogen concentrations. Alternatively, a monophasic model can be fit to an entire biphasic data set (Figure b) (e.g., ref (23−25)). In this case, pathogen concentrations will be substantially overestimated initially but underestimated after a certain point. Yet another approach is to fit the model only to the first and last data points (e.g., ref (26)).

Figure 1

Two monophasic models for a population of Shigella sonnei undergoing biphasic decay in seawater at room temperature[27] and a biexponential fit (eq ). (a) A monophasic model fit only to labile regime, providing relatively accurate estimates for the initial phase but underestimating pathogen survival at later time points. (b) A monophasic model fit to all data, overestimating initial pathogen survival and underestimating survival after about 20 days. All models fit by log-transformed least-squares and with a set intercept. To date, all data on biphasic decay have come from observational studies in which media (either in a laboratory or in the environment) are sampled over time to estimate pathogen population and die-off. Further, consideration of biphasic decay in exposure and risk assessment has been confined to pathogens on agricultural products, particularly in regard to delay of harvesting time after wastewater application.[28−30] We know of no studies incorporating biphasic decay in an analysis of disease risk at the population scale or in a hydrological fate and transport context. Various hypotheses have been proposed to explain the observation of biphasic dynamics.[13,14,31] We categorize these hypotheses into four—not necessarily mutually exclusive—categories: Population heterogeneity. The pathogen population is composed of distinct subpopulations, some more susceptible to environmental decay. These populations might be different strains, the result of a new mutation, or populations in different phases of growth, etc. This hypothesis is frequently used, for example, to explain the observed biphasic decay of E. coli populations. Hardening off. Once exposed to the environment, the pathogen converts to a hardier phenotype through altered gene expression or other mechanisms. This hypothesis is particularly relevant for pathogens that exhibit microbial dormancy or quiescence, a kind of bet-hedging strategy in which cells limit growth in exchange for environmental resilience.[32,33] Such states are thought to explain, for example, antibiotic-resistant persistor cells and certain chronic infections. VBNC. The pathogen enters a viable-but-not-culturable (VBNC) state that cannot be detected by usual culturing methods. This hypothesis is closely related to the dormancy discussed above, but, here, the resistant cells are not measured, typically because their metabolic profiles have changed in a way that renders them incapable of growth on the cell culture media used in quantification. Under this hypothesis, biphasic decay can be observed even if the two pathogen types decay at the same rate and thus could be an artifact of data collection methods. VBNC states have been observed for E. coli, H. pylori, V. cholerae, Salmonella, and Shigella, among others.[34] Density effects. Pathogen decay rates are tied to pathogen density, possibly because of substrate concentration or quorum sensing. Previous results suggest that the decay of E. coli in surface water is not a density effect,[13] but this mechanism could be relevant for other pathogens or for other kinds of biphasic phenomena.[21] Our mathematical framework does not consider density effects. Our analysis has two broad aims. First, we show that the biexponential model can arise from mechanistic assumptions and thus should be preferred to purely empirical models such as the piecewise log–linear model. Longitudinal pathogen concentration data, however, cannot distinguish between several variants of the mechanistic model presented. Through identifiability analysis, we identify the parametric information available in these sampling studies and show how this information can be interpreted in the context of different underlying mechanisms. Our analysis is not a validation of any specific mechanistic model; indeed, we show that such validation is impossible—beyond demonstrating that a biexponential curve may fit biphasic data—when only cell concentration data are available. The second aim is to emphasize that biased estimates of risk can result when a monophasic model is used to analyze a biphasic decay process. We provide two case studies in support of this aim. First, we show that assuming monophasic decay when the pathogen exhibits biphasic decay can substantially underestimate disease risk. Second, we explore the effect of biphasic decay in hydrological fate and transport modeling, showing that monophasic approximations to biphasic data may over- or underestimate concentrations depending on how the approximations are formulated.

Materials and Methods

Model

We consider a family of linear, two-compartmental models that encompasses several possible mechanisms of biphasic decay. Let E1(t) and E2(t) be populations of pathogens of type 1 (labile) and type 2 (resistant) in the environment, and let E(t) = E1(t) + E2(t) be the total pathogen population. Let u(t) be the total addition of pathogens into the environment (e.g., shedding). We consider models of the formwith initial conditions E1(0) = ρω and E2(0) = (1−ρ) ω, where the seven model parameters are listed in Table .

Table 1

Parameters for the General Pathogen-Decay Model Given in Eq

symbol	parameter
θ₁, θ₂	decay rate coefficients for labile and resistant pathogens, respectively
δ₁	rate coefficient for transition to resistant state (e.g., entering dormancy)
δ₂	rate coefficient for transition to labile state (e.g., resuscitation)
ω	initial population of pathogens
ρ	initial fraction of pathogens that are labile
η	fraction of introduced pathogens that are labile
u(t)	time-varying input of pathogens to the system

Although biphasic decay can mathematically occur from a number of different configurations of a two-state linear compartmental model (Figure a), we will ground our analysis in three plausible models of pathogen decay. The models shown in Figure b–d correspond to the first three hypotheses listed in the Introduction. In the population heterogeneity model (Figure b), pathogens are inherently of two different types that decay at different rates, but we observe only the total population (δ1 = δ2 = 0; observed data is E(t)). In the hardening-off model (Figure c), all pathogens are initially of one type but begin to convert to a hardier, possibly dormant, form once in the environment with negligible resuscitation (δ2 = 0, η = 1; observed data is E(t)). In the viable-but-not-culturable model (Figure d), pathogens all decay at the same rate and interconvert between measurable and unmeasurable forms (η = 1, θ2 = θ1; observed data is E1(t)). Although it is likely that the nonculturable pathogens do not decay at the same rate as the cultivable ones, this submodel demonstrates that observed biphasic behavior does not necessarily imply different pathogen decay rates. Although we are grounding our analysis in these three models, the truth may very well be a blend of mechanisms dependent on the pathogen and environmental conditions.

Figure 2

General model of biphasic decay and three mechanistic submodels. Solid lines are transfer of pathogens, and the dashed lines indicate measurement.

Identifiability

A necessary consideration for estimation of model parameters from data is identifiability. A model parameter is said to be identifiable if it can be uniquely determined from observed data. The model is said to be identifiable only if all model parameters are identifiable (more formal definitions are presented elsewhere[35−37]). As an example, consider the linear model y = (m1 + m2) x + b. This model is unidentifiable when the data are (x, y) pairs because, although b is identifiable, m1 and m2 cannot be uniquely determined. Identifiability of a model is dependent on both the parametrization and the data measured. There may problems estimating parameters uniquely because of the inherent specification of the model (e.g., trying to separately estimate m1 and m2 in the linear model above) or because of issues relating to data quality, frequency of sampling, etc. (e.g., trying to model a sinusoid when the resolution of the data is only once per period at the same point in its phase). The study of inherent identifiability problems (as in the linear model example), which would exist even with continuous, perfectly measured, and error-free data, is called structural identifiability; the study of identifiability problems related to the data (as in the sinusoid example) is called practical identifiability.[38] If a model is structurally unidentifiable, one can identify the parametric information available in the data in the form of identifiable parameter combinations,[37] e.g. m1 + m2 is an identifiable parameter combination in the linear model above. It is sometimes possible to make unidentifiable models identifiable by reparameterizing the model in terms of identifiable combinations (e.g., setting m = m1 + m2 in the linear example above) or by measuring different data (e.g., measuring a sinusoid at different points in its phase). In this paper, structural identifiability analysis is done by a differential algebra approach.[39−41] We note that many of the identifiability results in this paper are recontextualizations of results or build from prior work in the field of pharmacokinetics,[35,42,43] but pathogen decay presents a novel context and interpretation.

Results

We present results—both theoretical analysis and numerical case studies—relating to the model presented in eqs , which considers both sampling studies with no input (u = 0) and experiments with measured pathogen inputs (u ≢ 0). Five technical results are first presented to provide insight into the properties of the model. Proposition 1 proves that the biexponential model describes the behavior of the family of considered mechanisms. Proposition 2 states that the apparent parameters of the biexponential model can be uniquely determined from pathogen concentration data (i.e., are identifiable). In contrast, Proposition 3 proves that the general mechanistic model is unidentifiable, and it considers what combinations of the mechanistic model parameters can be identified under various assumptions about which data are measured. Corollary 1, which follows directly from Proposition 3, gives these identifiable combinations in the context of the population heterogeneity, hardening-off, and viable-but-not-culturable models. Proposition 4 states that it is impossible to distinguish from the shape of the data alone whether we are measuring all pathogens or only a culturable fraction. Finally, we illustrate these results through examples demonstrating that data obtained from sampling studies alone cannot elucidate the mechanism of decay and through case studies exploring the implications of biphasic decay in risk assessment and hydrological fate and transport modeling.

Theoretical Results

Proposition 1. Except for degenerate parameter combinations, the model given in eq displays biphasic behavior in the labile fraction E1(t) and in the total pathogen population E(t) = E1(t) + E2(t). That is, for some parameter combinations a, b, c, d, and h, E(t) and E1(t) have the formwhereis a forcing function where υ(t) = u(t) for E(t) and υ(t) = ηu(t) for E1(t). Take-away: Any of the mechanisms modeled in Figure can produce biphasic dynamics.The proof of Proposition 1 is left to the supplement. When the total population E(t) is observed, a degeneracy (that is, decay is not biphasic) occurs when there is no difference in the labile and resistant decay rates (θ1 = θ2); when only the labile fraction E1(t) is observed, there is a degeneracy when resistant pathogens cannot be resuscitated and cultured (δ2 = 0). Parameters a, b, c, d, and h have meaning in the shape of the observed decay and are most easily interpreted as if the underlying mechanism was population heterogeneity (and thus we call them “apparent” parameters): a is the apparent labile decay rate, b is the apparent resistant decay rate, c is the apparent initial fraction of labile pathogens, d is the initial total concentration of pathogens, and h is the apparent fraction of newly introduced pathogens that are labile. In terms of the mechanistic parameters,For the total pathogen population E(t),For the labile pathogen population E1(t),Correct association of the values of a and b with mechanistic model parameter combinations does not depend on whether the environmental compartment measured is in truth E(t) or E1(t) because a and b are associated with the same parameter combinations regardless. However, c, d, and h are associated with different parameter combinations depending on whether E(t) or E1(t) is measured and thus will be misspecified if the environmental compartment is not correctly identified (e.g., only observing one type of pathogen when there are really two). For example, if one assumes that one is measuring E(t) but is in truth measuring E1(t), then one is implicitly assuming that ca + (1 – c) b is the weighted decay rates ρθ1 + (1 – ρ) θ2 when that quantity is actually θ1 + δ1 + δ2(1–1/ρ). Proposition 2. The parameter combinations a, b, c, d, and h in Proposition 1 are identifiable. Take-away: When observing biexponential decay, one can uniquely determine the apparent decay rates, the proportion of the population attributed to each type, the initial population, and, if there is shedding, the proportion of shed pathogens apparently of each type. That the coefficients and exponents of an observed sum of exponentials are identifiable is known from identifiability theory[42,44] and can be used to generate this result. It is also a straightforward application of the differential algebra approach to identifiability, which we include in the supplement. We still want to know the extent to which we can uniquely determine the parameters of the underlying mechanistic models in Figure . However, the apparent parameters in Proposition 2 are complicated functions of the mechanistic parameters. In Proposition 3, we manipulate the apparent parameters into a different set of identifiable combinations that express equivalent information but are much simpler expressions of the mechanistic parameters. Note that the number of identifiable combinations of mechanistic parameters is the same as the number of identifiable apparent parameters. Proposition 3. The full model given in eq with data E(t) = E1(t) + E2(t) is unidentifiable. If θ1 ≠ θ2, then when there is no pathogen input (u(t) ≡ 0), the identifiable combinations are θ1 + θ2 + δ1 + δ2 (the sum a + b), (θ1 + δ1) (θ2 + δ2) – δ1δ2 (the product ab), ρθ1 + (1 – ρ) θ2 (decay rates weighted by initial fractions, ca+(1– c) b), and ω (the total initial pathogen population, d); when pathogen input over time is measured (υ(t) = u(t) ≢ 0), one can additionally identify ηθ1 + (1−η) θ2 (decay rates weighted by input fractions, ha + (1 – h) b). The model given in eq with data E1(t) is unidentifiable. If δ2 ≠ 0, then when there is no pathogen input (u(t) ≡ 0), the identifiable combinations are θ1+θ2+δ1+δ2 and (θ1+δ1) (θ2+δ2) – δ1δ2, θ1+δ1+δ2(1–1/ρ), and ρω (again found by a + b, ab, ac+(1– c) b, and d, respectively). when pathogen addition over time is measured (υ(t) = ηu(t) ≢ 0), one can additionally identify θ1+δ1+δ2(1–1/η) (again ha+(1– h) b). Take-away: The full linear, compartmental model (Figure a) has six parameters (seven, if υ ≢ 0 and is known), including initial conditions, but the pathogen survival data contains only four (five, if υ ≢ 0 and is known) pieces of parametric information. In other words, the model given the data could be made identifiable if information about two parameters is independently known, although the choice of which two parameters is not entirely arbitrary; only certain pairs of parameters, dependent on the identifiable parameter combinations given above, will resolve the unidentifiability problem, e.g. knowing, a priori, both decay rates θ1 and θ2 resolves the unidentifiability but knowing the dormancy rate δ1 and initial fraction ρ does not. The proof is the same as that of Proposition 2 and may be found in the supplement. Note that we expressed the mechanistic identifiable combinations as the same functions of the apparent parameters in Proposition 3 whether the data is E(t) or E1(t) but that these combinations represent different underlying process. Now that we have the identifiable parameter combinations for the full mechanistic model, it is straightforward to simplify the identifiable combinations under the assumptions of the submodels.

Corollary 1

In the sampling framework (u(t) ≡ 0), the population heterogeneity model is identifiable with parameters θ1, θ2, ρ, and ω (respectively a, b, c, and d); the hardening-off model is unidentifiable with identifiable combinations θ1+δ1, θ2, ρ(θ1–θ2), and ω (respectively a, b, c(a– b), and d); the viable-but-not-culturable model is unidentifiable with identifiable combinations θ, δ1 + δ2, δ2/ρ, and ρω (respectively b, a – b, (1 – c) (a – b), and d). In the measured-input framework (u(t) ≢ 0), when u(t) is known, the population heterogeneity model is identifiable with parameters θ1, θ2, ρ, ω, and η (respectively a, b, c, d, and h); when u(t) is known, the hardening-off model is identifiable with parameters θ1, δ1, θ2, ρ, and ω (respectively ha + (1– h) b, (1 – h) (a – b), b, c/h, and d); when ηu(t) is known, the viable-but-not-culturable model is identifiable with parameters θ, δ1, δ2, ρ, and ω (respectively b, h(a– b), (1– h) (a – b), (1 – h) /(1 – c), and (1 – c) d/(1 – h)) Take-away: Making assumptions about mechanism simplifies the parameter combinations identified in Proposition 3. We illustrate this corollary in Examples 1 and 2. In particular, this corollary demonstrates that the mechanistic submodels are identifiable if we measure the input to the system. We use a thought experiment to explore this possibility in Example 2. When measuring pathogen concentrations, one does not necessarily know if technique is capturing a representation of the whole population, or only of a culturable subsample. This next proposition tells us that the shape of the data alone cannot resolve this issue. Proposition 4. When measuring a pathogen concentration (where it is unknown a priori whether it is E(t) = E1(t) + E2(t) or E1(t)) and, possibly, an input υ(t) (including υ(t) ≡ 0), it cannot be determined a posteriori whether (i)E(t) is measured and υ(t) = u(t) or (ii) E1(t) is measured and υ(t) = ηu(t). Take-away: One cannot tell from the pathogen concentration alone whether one is measuring all of the pathogens or only a subpopulation of them. A single example of indistinguishability suffices as proof, and we illustrate this in Example 2.

Examples

In this section, we provide four examples. In Example 1, we fit models to observations of pathogen population over time when there is no additional input to the system. In Example 2, we use simulated data to fit models to observations of pathogen population over time when there is a known, continuous addition of pathogen. In Example 3, we examine the implications of biphasic decay in a risk assessment. Finally, in Example 4, we explore the biases introduced by the misapplication of monophasic models in a hydrological fate and transport model. Example 1: Modeling pathogen concentration samples over time In this example, we use data from a study by Rogers et al.,[14] in which known concentrations of transformed green fluorescent protein E. coli O157:H7/pZs were added to manure-amended soil with 80% field capacity moisture content and incubated at 25 °C for 120 days. The piecewise log–linear model (eq ) used by the authors, the Weibull model (eq ), and biexponential model (eq ) all fit the data well (Figure ); the Weibull and biexponential models are fit by minimizing the log-transformed least-squares difference in R (v. 3.0.1) using optim.

Figure 3

Fitting (log-transformed least-squares) piecewise log–linear, Weibull, and biexponential models of biphasic decay to population of E. coli over time in Rogers et al.[14] The best fit estimates are a = 0.37, b = 0.072, t* = 9 for the piecewise log–linear model (eq ), α = 2.1 and β = 0.59 for the Weibull model (eq ), and a = 0.36, b = 0.069, and c = 0.93 for the biexponential model (eq ). Although both the piecewise log–linear and biexponential models can estimate the apparent labile a and resistant b pathogen decay rates, the biexponential model identifies c, the initial fraction attributable to the labile group. From a control perspective, c can be a useful parameter; the expected fraction of resistant pathogens may be a crucial consideration in water treatment strategies. The best-fit break point t* obtained from the piecewise log–linear model, on the other hand, is likely determined by several experiment-specific factors, and is not directly interpretable as a simple biological quantity with control relevance. Similarly, although the parameters α (hazard rate) and β (shape parameter) of the Weibull model define a distribution of stress tolerances in the population, such a distribution is quite abstract and difficult to interpret in mechanistic biological terms. We purposefully do not include a goodness-of-fit comparison. We argue that such a statistical comparison of these models does not provide information about mechanism and would thus, as all of the models fit reasonably well, be of limited use and distract from the main point. The piecewise log–linear model is purely empirical, while the biexponential model arises from mechanistic assumptions, and is thus, we argue, preferable. The Weibull model, while rooted in assumptions about distributions of stress tolerance in the pathogen population, is also, ultimately, empirical, as its parameters are not measurable quantities. Of our mechanistic submodels, only the population heterogeneity model is identifiable (θ1 = 0.36, θ2 = 0.07, and ρ = 0.93). The hardening off model only partially resolves the parameter values (θ1 + δ1 = 0.36, θ2 = 0.07, ρ(θ1 – θ2) = 0.27). This is also true of the VBNC model (θ = 0.069, δ1 + δ2 = 0.29, and δ2/ρ = 0.02). The observed dynamics can arise from any of the mechanisms—with identical fits—demonstrating that the data do not inform the specific mechanism. Example 2: Modeling populations over time with a measured input In Corollary 1, we saw that all three of the mechanistic submodels are identifiable if the pathogen input to the system is measured. In this example, we consider a thought experiment of a pathogen decay study with a measured, continuous input; since we know of no comparable pathogen decay study, we use simulated data. Although we imagine it here as a possible lab experiment, it could also apply to a situation where a number of sick individuals are shedding at a known rate into some environmental medium that is subsequently sampled. We begin with an initial population of 0, a constant input u(t), and parameter set of (θ1, θ2, δ1, δ2, η)=(0.4, 0.1, 0.2, 0.3, 0.9). As indicated by Proposition 4, the same data can be generated by setting u(t) = 1000 and observing E(t) as by setting η = 0.566 and u(t) = 1000/0.566 and observing E1(t). Hence, from the data, it is impossible to tell, a posteriori, whether one is measuring E(t) and υ(t) = u(t) or E1(t) and υ(t) = ηu(t). We fit each of the submodels to this data after a normally distributed error was applied. We then plot the best-fit lines (numerical least-squares) for the population heterogeneity, hardening off, and viable-but-not-culturable models, which are seen to be identical in Figure . (Note that the error is added for simulated realism only; each model could fit the exact data).

Figure 4

Simulated biphasic dynamics of E. coli with continuous input. See Example 2 in text.

Simulated biphasic dynamics of E. coli with continuous input. See Example 2 in text. All three models fit the data equally well. However, comparing the true parameter values used to generate the data with each parameter set estimated (Table ), we see that each model is misspecified. This is because we choose to generate data from the full model. All parametric information in the data can be summarized by the estimated apparent parameter combinations a = 2.15, b = 0.23, and h = 0.19 (Proposition 2). Comparing these to the true values of a = 0.76, b = 0.24, and h = 0.25, we see that this study actually provides insufficient data to accurately measure the apparent labile decay rate.

Table 2

Parameters Estimated from the Simulated Data in Figure by the Different Mechanistic Submodels

model	θ₁	θ₂	δ₁	δ₂	η
true values underlying the simulated data	0.4	0.1	0.2	0.3	0.9
population heterogeneity estimates	2.15	0.23	—	—	0.19
hardening off estimates	0.60	0.23	1.56	—	—
viable-but-not-culturable estimates	0.23	—	0.37	1.56	—

Each of the submodels make assumptions about how these estimated parameter combinations a, b, and h relate to the underlying parameters. These assumptions allow all three models to be identifiable. However, the models are also indistinguishable and do not accurately reflect the generating parameters. This example thus illustrates the important distinction between identifiability and distinguishability. As the analytical results in the previous section show, even if identifiable parameters are estimated, the model may be misspecified, making these estimates incorrect or even meaningless (e.g., if the estimated parameter does not appear in the true underlying mechanism). Example 3: Consequences of neglecting biphasic decay in a risk assessment We extend a prior risk assessment of drinking water exposure to Cryptosporidium(45) to investigate the potential impact on risk when accounting for biphasic pathogen decay. Decay rates for Cryptosporidium vary in the range of 0.005–0.04 log10 units per day and appear relatively insensitive to temperature between 0 and 20 °C.[23,46−49] Although Cryptosporidium is not known for exaggerated biphasic decay behavior, there have been suggestions that certain forms are hardier than others. Permeability of oocysts to the fluorescent vital dye 4′,6-diamidino-2-phenylindole (DAPI) correlates with viability, and it has been hypothesized, given that oocysts can interconvert between DAPI± forms, that impermeability may confer environmental protection.[47,50,51] Indeed, Campbell, Robertson, and Smith observed this in one of their studies[50] but not a subsequent one.[47] Hence, there may be environmental conditions in which biphasic decay occurs for some types of Cryptosporidium, even if it is not typical more generally. That such long-tail survival might not be expected only highlights the possibility of risk misestimation. For the purposes of this example, assume that a population of Cryptosporidium can be written asa reparameterization of eq that emphasizes the decay half-lives. Here, half-lives of 10 and 60 days correspond to decays of 0.03 and 0.005 log10 units per day, a range consistent with the literature but more conservative than that considered by Eisenberg et al.[52] When c = 1, the population consists solely of labile pathogens, and, when c = 0, the population is solely resistant pathogens. Suppose that a wastewater treatment plant measures daily concentrations of Cryptosporidium in its inflow that are lognormally distributed with mean 2.12 and standard deviation 4.48 organisms/L and that water is then held for 10 days before treatment (at which point the distribution would be identical to that estimated in Eisenberg et al.[52] under fully fast decay). Then, the concentration of organisms immediately before treatment is consistent with the distribution of organisms in the source water considered in the prior risk assessment when c = 1.[45] Consistent with the prior risk assessment, (1) treatment efficacy of sedimentation and filtration was modeled as a Weibull distribution with mean 3.84 and standard deviation 0.59 log10 unit removal, (2) the volume of tap water consumption was assumed to be log-normal with mean 1.2 and standard deviation 1.2 L/day, (3) the dose–response relationship was assumed to be an exponential cumulative probability function f with parameter k = 0.004078, and (4) the moribidity ratio M, the fraction of infected who become ill, was assumed to be 0.39.[45,53] We treat the c parameter in two ways. First, we characterize risk as a function of c, and, second, we sample c daily from a uniform (0,1) distribution. For each day i, we (1) sample from the distribution of Cryptosporidium concentrations 10 days prior, and let P be the concentration of Cryptosporidium on day i, that is, after 10 days of holding, and (2) sample the treatment efficiency T on day i. For each person on each day, we (1) draw a volume of water consumed V, (2) draw a number of pathogens N consumed by person j on day i from a Poisson distribution with mean PTV, (3) calculate the probability of illness as π = M · f(N), (4) draw a number u from a uniform distribution on (0,1), and 5) let the indicator I of illness for person j on day i be 1 if u < π and 0 otherwise. Then, the sum of I is the total number of cases experienced by the population in one year. In Figure , we present a heat map of yearly cases of illness attributable to Cryptosporidium over 1,000 simulations as a function of c, the fraction of fast decaying pathogens, and when c is randomly sampled on each day. The sample means are also plotted. When all pathogens decay at the faster rate, the average is 3.3 case per year per 10 000 with a standard deviation of 2.5 and a maximum of 23; when all pathogens decay slowly (c = 0), the average number of cases is 6.0, nearly double that of the first case, with a standard deviation of 4.7 and a maximum of 107. When c is sampled each day, the average number of cases is 4.8 with a standard deviation of 3.1 and a maximum of 31. Biphasic decay increases both the mean and the variance of the number of cases.

Figure 5

Heatmap displaying the fraction of simulations that have each number of cases of illness in a year in a population of 10 000, both as a function of c, the proportion of pathogens that are fast decaying, and with c chosen randomly each day. The mean number of cases is shown with a dot. Example 4: Consequences of neglecting biphasic decay in a hydrological fate and transport model To explore the potential biases introduced through the incorrect assumptions of monophasic decay within a hydrological fate and transport model, we simulate the extent of bacteriological impairment downstream of a wastewater treatment outfall on the Sonora river in Mexico, using a simplification of a previously published E. coli transport model.[54] Specifically, using Markov chain Monte Carlo methods, we compared predictions of downstream bacterial concentrations in three scenarios: (1) when using a biexponential model, (2) when using a monophasic model fit only to the labile regime, and (3) when using a monophasic model fit to the full biphasic experimental data (Figure of Hellweger et al.[13]). We assume that E. coli originating from the Baviacora wastewater treatment plant decay and grow in a biphasic manner (eq ) with parameters a and b normally distributed (μ = 1.94, σ = 0.23, μ = −0.15, σ = 0.04; note that a negative μ indicates growth of the resistant fraction) and c beta-distributed (α = 784.29, β = 10.87) to be consistent with the experimental results of Hellweger et al.[13] A sedimentation rate coefficient was estimated from our biphasic decay rate and the total removal rate estimated by Robles-Morua et al.[54] Back calculating from Figure 7 of Robles-Morua et al.,[54] we estimate the average streamflow velocity below the Baviacora outfall to be 16.75 km/day; discharge and flow velocity downstream of the Baviacora outfall is observed to be relatively constant over 60 km. Assuming, as in Robles-Morua et al.,[54] an initial E. coli concentration of 24.63 times the EPA standard for bathing water of 126 colony forming units (cfu) per 100 mL,[55] we estimate the downstream extent of water impairment up to 60 km under the three scenarios. Additional details on the methods (determination of monophasic decay rate and sedimentation rate) are available in the Supporting Information. In the baseline scenario—Scenario 1—compliance was achieved within 60 km in 61% of simulations, with the median simulation achieving compliance in 37.7 km (minimum 16.7 km), though once compliant, simulations did not necessarily stay so because of the growth of the resistant fraction of E. coli. Over 99.8% of simulations in Scenario 2 achieved compliance within 60 km, with the median achieving compliance at 21.0 km (minimum 15.4 km). For Scenario 3, only 23% of simulations achieved compliance within 60 km, with the median simulation achieving compliance at 78.3 km (assuming river conditions remain constant past 60 km; minimum 25.4 km). In order to characterize the biases of Scenarios 1 and 3 over the spatiotemporal scale, we plot the simulations up to a distance of 60 km in Figure . The median estimated E. coli concentration is always an underestimate for Scenario 2 relative to Scenario 1, which highlights the danger of fitting monophasic curves only to the labile regime and is especially problematic in a regulatory context. For Scenario 3, the median concentration is an overestimate over the considered range (up to 76 km if river conditions remain constant, after which it is an underestimate; additional details in appendix). Hence, for considering compliance within 60 km, Scenario 3 can be thought of as a conservative simplification of the biphasic model. The greatest overestimation occurs at around 36 km, where the median of Scenario 3 is 5.3 times that of Scenario 1. Neither Scenario 2 nor Scenario 3 can capture the regrowth dynamic in the biphasic scenario.

Figure 6

Comparative simulation results for (a) Scenarios 1 (biphasic, in gray) and 2 (monophasic fit to the labile regime, in red) and (b) Scenarios 1 (biphasic, in gray) and 3 (monophasic fit to entire data, in red). Each scenario shows the median and 95% CI simulated E. coli concentrations over 60 km of hydrologic transport. The black line gives the EPA regulatory compliance threshold of 126 cfu/100 mL.

Discussion

The predominant assumption in the risk assessment and transmission modeling literature is that pathogens in the environment decay through a monophasic exponential process. Increasingly, however, we are recognizing that a biphasic framework for understanding pathogen decay provides a more accurate description of observed die-off for certain pathogens, environmental/experimental conditions, and time-scales. Our identifiability analysis highlights the limitations of current sampling data in informing the specific mechanisms associated with observed biphasic decay and provides guidance on future empirical studies that can inform intervention strategies. Our case studies highlight the fact that biases associated with the misspecificiation of the pathogen decay process can lead to either over- or underestimates of risk depending on how the data are used to estimate a first-order die-off process. Although we considered only fast–slow biphasic decay in this study, future work may be able to further elucidate the slow–fast decay observed in certain scenarios and considered by Phaiboun et al.[21] The following sections discuss modelidentifiability and risk misestimation in more detail. The biexponential model arises from a mechanistic description of pathogen decay and should be preferred to purely empirical models; nevertheless, one should be careful to not overinterpret the biexponential model’s apparent parameters. Although previous literature has often attributed a population heterogeneity mechanism to biphasic decay, identical dynamics can arise from a number of other mechanisms, including hardening off and the existence of viable-but-not-culturable states. Full identifiability of the general model (Figure a) parameters—and thus determination of mechanism—requires that the labile and resistant populations be measured separately. This separation might be accomplished through genomic, expressomic, or proteomic analysis of the pathogen population. Sampling studies that observe overall pathogen concentrations, however, do allow for the identification of combinations of model parameters. Propositions 2 and 3 demonstrate that four pieces of information (degrees of freedom) are embedded in the observation of biphasic decay, most easily understood in terms of the parameters of Proposition 1: the apparent decay rates a and b, the total population d, and the apparent initial fraction of labile pathogens c. An additional piece of information, represented by h, can be obtained when pathogens are added to the system at a measured rate; however, as an additional parameter must also be estimated, this additional information does not resolve the identifiability issues illustrated here. The full model in eq contains six (with no input) or seven (with measured input) parameters, and hence two parameteric constraints or pieces of information, as might be available from independent experimental studies, are need to make the model identifiable. Our results also provide guidance on what independent information will resolve the unidentifiability of the model, as the independent determination of two parameters arbitrarily may not be sufficient. Each of the models in shown in Figures b–d have two constraints. But, as we see in Example 1 and the first part of Corollary 1, despite having four pieces of information and two constraints, the six parameters are not individually identifiable in the hardening-off and VBNC models. However, in the experimental context (Example 2), with five pieces of information and two constraints, all seven parameters may be uniquely estimated for each of the three models (second part of Corollary 1). Moreover, that each of the models is fully identifiable in the latter context highlights the difference between structural identifiability of a model (i.e., whether the parameters may be determined from data) and whether structure of the model (and thus mechanism) can be identified. Our work highlights the usefulness of identifiability in the hydrologic and microbial modeling contexts. Structural identifiability and distinguishability analysis developed out of problems in engineering and pharmacokinetics[35,42,44] and has also found an increasing range of applications in hydrology-related fields such as water quality[56] and water treatment,[57] among others.[58−60] There, as here, it provides a tool to assist in experiment design, evaluate uncertainty, and assess modeling assumptions. Indeed, the comparison of alternative models of biphasic decay (both mechanistic and nonmechanistic) has strong parallels to the problems encountered when using multiexponential, noncompartmental, and compartmental models to assess drug distribution and elimination in pharmacokinetics.[61−63]

Risk Estimation

If a monophasic decay model is estimated from the labile regime of biphasic decay data, risks will always be underestimated. The degree of underestimation, as shown in Example 3, will depend on the distribution of labile and resistant pathogens in the environment. In this risk assessment case study, the mean risk doubled as the pathogen population moved from wholly labile to wholly resistant, and the maximum simulated burden nearly quadrupled because of the increased variance. We see this again in Example 4, where the mispecified monophasic model fit to the labile regime (Scenario 2) predicted compliance within 60 km in 98% of simulations while the biphasic model (Scenario 1) predicted compliance in only 61% of simulations. Hence, not accounting for the possibility of biphasic decay can potentially result in appreciable underestimation. On the other hand, when using a monophasic model fit to the entirety of the biphasic data (Scenario 3), we found that the model modestly overestimated bacterial concentrations on the scale of our hydrological transport case study, so that only 25% of simulations achieved compliance within 60 km. Bacterial concentrations further downstream were underestimated. In Example 4, neither monophasic parametrization could capture the regrowth dynamic of the resistant fraction of the bacteria. While our results suggest that certain monophasic parametrizations may be conservative for some environmentally relevant scenarios, it is important to remember that we only considered decay in one environmental medium. If one considers die-off in manure prior to washout, transport, and decay in a river, the relevant time scales would likely extend beyond the point at which any monophasic approximation might yield conservative results. The potential for biphasic pathogen die-off should be considered in all risk assessments, and the biexponential model should be used when appropriate. This model arises from mechanistic models but is not limited to any one mechanism. To effectively incorporate biphasic decay and understand the potential for intervention, targeted studies are needed to inform the mechanisms underlying biphasic dynamics. Further, because biphasic dynamics are likely a function of the interaction of mechanism and environmental factors—such as temperature, pH, or salinity—understanding mechanism will be important to characterizing the environmental conditions conducive to biphasic decay. Ultimately, these improved models will drive the design of more effective intervention strategies.

36 in total

1. Global identifiability of nonlinear models of biological systems.

Authors: S Audoly; G Bellu; L D'Angiò; M P Saccomani; C Cobelli
Journal: IEEE Trans Biomed Eng Date: 2001-01 Impact factor: 4.538

Review 2. Evaluation of the effect of temperature on the die-off rate for Cryptosporidium parvum oocysts in water, soils, and feces.

Authors: X Peng; T Murphy; N M Holden
Journal: Appl Environ Microbiol Date: 2008-10-10 Impact factor: 4.792

3. On the predictive ability of mechanistic models for the Haitian cholera epidemic.

Authors: Lorenzo Mari; Enrico Bertuzzo; Flavio Finger; Renato Casagrandi; Marino Gatto; Andrea Rinaldo
Journal: J R Soc Interface Date: 2015-03-06 Impact factor: 4.118

Review 4. Tailing of survival curves of bacterial spores.

Authors: O Cerf
Journal: J Appl Bacteriol Date: 1977-02

5. Compartmental vs. noncompartmental modeling for two accessible pools.

Authors: C Cobelli; G Toffolo
Journal: Am J Physiol Date: 1984-09

6. Identifiable pharmacokinetic models: the role of extra inputs and measurements.

Authors: K R Godfrey; R P Jones; R F Brown
Journal: J Pharmacokinet Biopharm Date: 1980-12

7. Parameter and structural identifiability concepts and ambiguities: a critical review and analysis.

Authors: C Cobelli; J J DiStefano
Journal: Am J Physiol Date: 1980-07

8. Differential Decay of Wastewater Bacteria and Change of Microbial Communities in Beach Sand and Seawater Microcosms.

Authors: Qian Zhang; Xia He; Tao Yan
Journal: Environ Sci Technol Date: 2015-07-10 Impact factor: 9.028

9. In vitro excystation of Cryptosporidium parvum.

Authors: L J Robertson; A T Campbell; H V Smith
Journal: Parasitology Date: 1993-01 Impact factor: 3.234

Review 10. Current Perspectives on Viable but Non-Culturable (VBNC) Pathogenic Bacteria.

Authors: Thandavarayan Ramamurthy; Amit Ghosh; Gururaja P Pazhani; Sumio Shinoda
Journal: Front Public Health Date: 2014-07-31

12 in total

1. Formation of Escherichia coli O157:H7 Persister Cells in the Lettuce Phyllosphere and Application of Differential Equation Models To Predict Their Prevalence on Lettuce Plants in the Field.

Authors: Daniel S Munther; Michelle Q Carter; Claude V Aldric; Renata Ivanek; Maria T Brandl
Journal: Appl Environ Microbiol Date: 2020-01-07 Impact factor: 4.792

2. Evidence for a Growth Zone for Deep-Subsurface Microbial Clades in Near-Surface Anoxic Sediments.

Authors: Karen G Lloyd; Jordan T Bird; Joy Buongiorno; Emily Deas; Richard Kevorkian; Talor Noordhoek; Jacob Rosalsky; Taylor Roy
Journal: Appl Environ Microbiol Date: 2020-09-17 Impact factor: 4.792

3. Modeling environmentally mediated rotavirus transmission: The role of temperature and hydrologic factors.

Authors: Alicia N M Kraay; Andrew F Brouwer; Nan Lin; Philip A Collender; Justin V Remais; Joseph N S Eisenberg
Journal: Proc Natl Acad Sci U S A Date: 2018-03-01 Impact factor: 11.205

4. Die-off of plant pathogenic bacteria in tile drainage and anoxic water from a managed aquifer recharge site.

Authors: Carina Eisfeld; Jan M van der Wolf; Boris M van Breukelen; Gertjan Medema; Jouke Velstra; Jack F Schijven
Journal: PLoS One Date: 2021-05-05 Impact factor: 3.240

5. Modeling the Effects of Future Hydroclimatic Conditions on Microbial Water Quality and Management Practices in Two Agricultural Watersheds.

Authors: R Coffey; J Butcher; B Benham; T Johnson
Journal: Trans ASABE Date: 2020-01-01 Impact factor: 1.188

6. Dose-response relationships for environmentally mediated infectious disease transmission models.

Authors: Andrew F Brouwer; Mark H Weir; Marisa C Eisenberg; Rafael Meza; Joseph N S Eisenberg
Journal: PLoS Comput Biol Date: 2017-04-07 Impact factor: 4.475

7. Transcriptional Response of Staphylococcus aureus to Sunlight in Oxic and Anoxic Conditions.

Authors: Jill S McClary; Alexandria B Boehm
Journal: Front Microbiol Date: 2018-02-23 Impact factor: 5.640

8. Disentangling bacterial invasiveness from lethality in an experimental host-pathogen system.

Authors: Tommaso Biancalani; Jeff Gore
Journal: Mol Syst Biol Date: 2019-06-11 Impact factor: 11.429

9. Hepatitis C virus modelled as an indirectly transmitted infection highlights the centrality of injection drug equipment in disease dynamics.

Authors: Miles D Miller-Dickson; Victor A Meszaros; Salvador Almagro-Moreno; C Brandon Ogbunugafor
Journal: J R Soc Interface Date: 2019-09-04 Impact factor: 4.118

10. Impact of Freeze-Thaw Cycles on Die-Off of E. coli and Intestinal Enterococci in Deer and Dairy Faeces: Implications for Landscape Contamination of Watercourses.

Authors: Emmanuel O Afolabi; Richard S Quilliam; David M Oliver
Journal: Int J Environ Res Public Health Date: 2020-09-24 Impact factor: 3.390