
Inferring phage-bacteria infection networks from time-series data.

Luis F Jover¹, Justin Romberg², Joshua S Weitz³.

Abstract

In communities with bacterial viruses (phage) and bacteria, the phage-bacteria infection network establishes which virus types infect which host types. The structure of the infection network is a key element in understanding community dynamics. Yet, this infection network is often difficult to ascertain. Introduced over 60 years ago, the plaque assay remains the gold standard for establishing who infects whom in a community. This culture-based approach does not scale to environmental samples with increased levels of phage and bacterial diversity, much of which is currently unculturable. Here, we propose an alternative method of inferring phage-bacteria infection networks. This method uses time-series data of fluctuating population densities to estimate the complete interaction network without having to test each phage-bacteria pair individually. We use in silico experiments to analyse the factors affecting the quality of network reconstruction and find robust regimes where accurate reconstructions are possible. In addition, we present a multi-experiment approach where time series from different experiments are combined to improve estimates of the infection network. This approach also mitigates against the possibility of evolutionary changes to relevant phenotypes during the time course of measurement.


Keywords: complex communities; inference; microbial ecology; nonlinear dynamics; viral ecology

Year: 2016 PMID: 28018655 PMCID: PMC5180153 DOI: 10.1098/rsos.160654

Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 2.963

Introduction

Bacterial viruses are ubiquitous and play an important ecological role at the global scale. In the oceans, viruses are responsible for a significant fraction of bacterial mortality and as a result have an effect on global biogeochemical cycles [1-4]. By killing bacteria, they redirect resources from higher trophic levels. Yet, not all bacteria types are susceptible to each and every virus type. Phage potentially infect a subset of hosts; these relationships constitute complex networks of infection [5]. Quantifying who infects whom remains essential to understanding how individual-based traits affect ecosystem-wide properties in complex environments [6]. For more than 60 years, the host range of phage, i.e. the types of host that a phage type infects, has been measured using plaque assays [7]. A plaque assay is an experimental method in which a growing culture of bacteria on an agar surface is exposed to phage. Clear ‘plaques’ are formed whenever the phage can infect and lyse the target host. Plaque assays are considered the gold standard for determining infection but are hard to scale-up to community levels. The principal reason is that the majority of phage and bacteria in a community sample are not yet available in culture. In response, a number of (partially) culture-independent methods have been proposed, including digital PCR [8], viral tagging [9,10] and PhageFISH [11]. Each of these methods leverages a ‘targeted’ approach, i.e. requiring some degree of culturing or co-visualization of labelled particles. Targeted approaches present challenges for scaling-up to communities. By contrast, an approach that considers community-scale interactions may be feasible, particularly if leveraging information in the temporal dynamics of complex virus–bacteria systems. 
The inference of interaction networks from system dynamics is a field of study with widespread applications, from inference of gene regulatory networks [12,13] and chemical reactions [14] to neural networks [15]. The key insight from one class of inference methods is that statistical patterns in dynamics, including cross-correlation and mutual information, can be leveraged to infer interactions [16]. However, such correlation-based approaches can be of limited value when applied to high-dimensional systems with nonlinear interactions. As an alternative, Shandilya et al. [17] presented a method for reconstructing interaction networks from discrete measurements of the time series in systems where the underlying functional form of the interactions is known. Similarly, Stein et al. [18], following the work of Monier et al. [19], used discretized Lotka–Volterra equations to estimate interaction networks, model parameters and time-dependent perturbations in competitive microbial communities. Here, we adapt the approach of Stein et al. [18] to phage–bacteria systems with antagonistic interactions. The central advance is the use of nonlinear dynamic models and inference methods to estimate quantitative infection and lysis rates given information embedded in community time series. We test the inference capabilities of the method using in silico experiments, as a proof of principle. As we show, inferring realistic phage–bacteria infection networks in complex communities may be possible given appropriate deployment of existing culture-independent technologies already available to estimate changing genotype densities over time.

Method

Model

We model the interaction between Nh host types and Nv virus types using a generalization of the Lotka–Volterra predator–prey equations [20,21]. The densities of multiple host and virus types are described by a system of differential equations that include the effect of competition between host types and the infection of host by multiple virus types [22,23]:

$$\frac{dh_i}{dt} = r_i h_i \left(1 - \frac{\sum_{i'=1}^{N_h} a_{ii'} h_{i'}}{K}\right) - h_i \sum_{j=1}^{N_v} M_{ij}\,\phi_{ij}\, v_j \qquad (2.1)$$

and

$$\frac{dv_j}{dt} = v_j \sum_{i=1}^{N_h} M_{ij}\,\phi_{ij}\,\beta_{ij}\, h_i - m_j v_j. \qquad (2.2)$$

The model consists of Nh equations of the form (2.1) for the density of each host type, h_i, and Nv equations of the form (2.2) for the virus densities, v_j. In this system: r_i is the growth rate of host i in the absence of viruses and other hosts, a_ii′ is the competitive effect of host i′ on host i, K is the system-wide carrying capacity, ϕ_ij is the adsorption rate of virus j when attaching to host i, β_ij is the burst size of virus j when infecting host i, and m_j is the decay rate of virus j. Finally, M is the infection matrix, i.e. a matrix representation of the infection network, whose element M_ij takes a value of 1 if host i is infected by virus j and zero otherwise. The nonlinearities arise owing to the cumulative effects of pairwise interactions among bacteria (hh) and between viruses and bacteria (hv).
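The model described above can be sketched in a few lines of code. The following is a minimal illustration in plain Python, assuming the standard multi-type Lotka–Volterra form implied by the parameter definitions (host growth with competition minus lysis; virus production minus decay). The function names `derivatives` and `rk4_step` and the fixed-step Runge-Kutta integrator are choices made for this sketch; the study itself used Matlab's ODE45.

```python
def derivatives(h, v, r, a, K, phi, beta, m, M):
    """Right-hand sides of equations (2.1) and (2.2) for densities h and v."""
    n_h, n_v = len(h), len(v)
    dh = []
    for i in range(n_h):
        # Competition among hosts, scaled by the carrying capacity K.
        crowding = sum(a[i][k] * h[k] for k in range(n_h)) / K
        # Loss of host i to every virus j that infects it (M[i][j] == 1).
        lysis = sum(M[i][j] * phi[i][j] * v[j] for j in range(n_v))
        dh.append(r[i] * h[i] * (1.0 - crowding) - h[i] * lysis)
    dv = []
    for j in range(n_v):
        # Virions released from infections of all hosts that virus j infects.
        production = sum(M[i][j] * phi[i][j] * beta[i][j] * h[i] for i in range(n_h))
        dv.append(v[j] * production - m[j] * v[j])
    return dh, dv


def rk4_step(h, v, dt, params):
    """Advance (h, v) by one classical fourth-order Runge-Kutta step."""
    def shift(x, dx, scale):
        return [xi + scale * dxi for xi, dxi in zip(x, dx)]

    k1h, k1v = derivatives(h, v, *params)
    k2h, k2v = derivatives(shift(h, k1h, dt / 2), shift(v, k1v, dt / 2), *params)
    k3h, k3v = derivatives(shift(h, k2h, dt / 2), shift(v, k2v, dt / 2), *params)
    k4h, k4v = derivatives(shift(h, k3h, dt), shift(v, k3v, dt), *params)

    def combine(x, k1, k2, k3, k4):
        return [xi + dt / 6.0 * (p + 2 * q + 2 * s + u)
                for xi, p, q, s, u in zip(x, k1, k2, k3, k4)]

    return combine(h, k1h, k2h, k3h, k4h), combine(v, k1v, k2v, k3v, k4v)


# Illustrative check with made-up values: one virus-free host grows
# logistically towards the carrying capacity K.
params = ([1.0], [[1.0]], 1000.0, [[0.0]], [[0.0]], [0.0], [[0]])  # r, a, K, phi, beta, m, M
h, v = [100.0], [0.0]
for _ in range(300):  # 300 steps of 0.1 time units
    h, v = rk4_step(h, v, 0.1, params)
print(h[0])  # approaches K
```

The nested-list layout (`phi[i][j]` for host i and virus j) mirrors the index convention of the infection matrix, so the same data structures can be reused when assembling the matrices needed for reconstruction.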

Numerical simulations of the dynamics: infection network ensembles and model parameters

To study the performance of our reconstruction method, we simulated time series of systems where several hosts and virus types interact. We used Matlab’s ODE45 to numerically integrate systems of equations of the form described in §2.1. In doing so, we use both random infection networks and nested infection networks. Nested interaction networks are commonly observed in culture-based analyses, such that the host range of phage and the phage range of hosts form ordered subsets [24]. Following Jover et al. [25], we generated an ensemble of 100 infection matrices, each one with 10 host types and 10 virus types, spanning a spectrum of nestedness values. The infection matrices were generated by starting with a modular matrix and shifting interactions, through a random process, to regions that increase nestedness [25]. We also found feasible parameter sets (i.e. parameters with positive steady-state densities) for each one of the infection matrices. We followed the procedure described in [25] to find feasible parameter sets. Namely, we select a subset of the model parameters and target densities (table 1) and use the steady-state equations to solve for the rest of the parameters obtaining a feasible parameter set.
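The feasibility step can be illustrated with a short sketch. Setting the time derivatives of the host and virus equations to zero at positive target densities gives one linear condition per type, which can be solved for the host growth rates r and the virus decay rates m. The code below assumes the model form with competition coefficients `a` supplied by the user (they are not listed in table 1); the function name `feasible_rates` and all numerical values are hypothetical and chosen only for illustration.

```python
def feasible_rates(M, phi, beta, a, K, h_star, v_star):
    """Solve the steady-state equations for growth rates r and decay rates m."""
    n_h, n_v = len(h_star), len(v_star)
    # Virus equation at steady state: m_j = sum_i M_ij * phi_ij * beta_ij * h*_i.
    m = [sum(M[i][j] * phi[i][j] * beta[i][j] * h_star[i] for i in range(n_h))
         for j in range(n_v)]
    r = []
    for i in range(n_h):
        # Host equation at steady state:
        # r_i * (1 - sum_i' a_ii' h*_i' / K) = sum_j M_ij * phi_ij * v*_j.
        free_capacity = 1.0 - sum(a[i][k] * h_star[k] for k in range(n_h)) / K
        if free_capacity <= 0.0:
            raise ValueError("target host densities exceed the carrying capacity")
        r.append(sum(M[i][j] * phi[i][j] * v_star[j] for j in range(n_v)) / free_capacity)
    return r, m


# Two hosts and two viruses with a nested infection matrix (illustrative values
# drawn from within the ranges of table 1).
M = [[1, 1], [0, 1]]
phi = [[5e-8, 5e-8], [5e-8, 5e-8]]
beta = [[20.0, 20.0], [20.0, 20.0]]
a = [[1.0, 1.0], [1.0, 1.0]]
K = 1e6
h_star = [2e3, 8e3]
v_star = [3e6, 6e6]
r, m = feasible_rates(M, phi, beta, a, K, h_star, v_star)
print(r, m)
```

Note that a host infected by no virus would receive r = 0 under this construction, so each host in a feasible set needs at least one infecting virus.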

Table 1.

Parameter and target steady-state density ranges used to find feasible parameter sets. (Bacteria growth rates, r, and virus decay rates, m, were derived using the steady-state equations and the parameters presented here using the feasibility-based framework (see Methods). The range denotes the limits of the uniform distributions used to generate parameters.)

parameter (unit)	range/value
ϕ_j (ml/(virus ⋅ d))	10⁻⁸ to 10⁻⁷
β_j (viruses cell⁻¹)	10 to 50
H*_i (cell ml⁻¹)	10³ to 10⁴
V*_j (virus ml⁻¹)	10⁶ to 10⁷
K (cell ml⁻¹)	max(H*_i) × 100 = 10⁶

Infection network reconstruction

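Our method for reconstructing infection networks requires discrete measurements of the dynamics resulting from the interaction of different host and virus types. This method extends the approach described in [18] to host–phage systems. We use only the equations describing the dynamics of the viruses (equations of the form (2.2)). We start by rewriting equation (2.2) in the form

$$\frac{1}{v_j}\frac{dv_j}{dt} = \frac{d \ln v_j}{dt} = \sum_{i=1}^{N_h} M_{ij}\,\phi_{ij}\,\beta_{ij}\, h_i - m_j. \qquad (2.3)$$

We assume that we have N+1 measurements of the densities of all virus and host types in the system at times [t_1, t_2, …, t_{N+1}]. For time step Δt_k = t_{k+1} − t_k, we can write a discretized form of equation (2.3):

$$\frac{\ln v_j(t_{k+1}) - \ln v_j(t_k)}{\Delta t_k} = \sum_{i=1}^{N_h} \tilde{M}_{ij}\, h_i(t_k) - m_j, \qquad (2.4)$$

where we define the quantitative infection network $\tilde{M}_{ij} = M_{ij}\,\phi_{ij}\,\beta_{ij}$. We can write an analogous equation to equation (2.4) for all time steps and all virus types in the system. All of these equations can be written in a compact form using a single matrix equation:

$$\left[\frac{\ln v_j(t_{k+1}) - \ln v_j(t_k)}{\Delta t_k}\right]_{jk} = \tilde{M}^{\mathsf{T}} \big[h_i(t_k)\big]_{ik} - \big[m_j\big]_{j}\,\mathbf{1}, \qquad (2.5)$$

or, defining the matrices W and H with elements $W_{jk} = [\ln v_j(t_{k+1}) - \ln v_j(t_k)]/\Delta t_k$ and $H_{ik} = h_i(t_k)$, and the column vector $\vec{m}$ with elements $m_j$, we can write

$$W = \tilde{M}^{\mathsf{T}} H - \vec{m}\,\mathbf{1}, \qquad (2.6)$$

where $\mathbf{1}$ is a vector of ones with dimensions 1×N. Given density measurements of the hosts and viruses, we can reconstruct the quantitative infection network using equation (2.6). We solve the following minimization problem to obtain approximations $\hat{M}$ and $\hat{\vec{m}}$ of the quantitative infection matrix, $\tilde{M}$, and the decay rate vector, $\vec{m}$:

$$\min_{\hat{M},\,\hat{\vec{m}}} \left\lVert W - \hat{M}^{\mathsf{T}} H + \hat{\vec{m}}\,\mathbf{1} \right\rVert. \qquad (2.7)$$

To solve this problem, we used CVX, a package for specifying and solving convex problems [26,27]. In this study, we focus on the reconstruction of the quantitative infection network, but the method also infers decay rates for all virus types in the system. We use a normalized Frobenius distance between the original and reconstructed infection matrices as a metric of the quality of reconstruction, namely

$$\mathrm{Error} = \frac{\lVert \tilde{M} - \hat{M} \rVert_F}{\lVert \tilde{M} \rVert_F}.$$

The inference was implemented in Matlab, and scripts are available at https://github.com/WeitzGroup/infection_network_reconstruction.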
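The reconstruction steps can be sketched as follows. This is not the authors' Matlab/CVX implementation: it is a plain-Python illustration that swaps in ordinary (unconstrained) least squares, solved through the normal equations, for the convex program. The helper names (`log_growth_rates`, `solve_linear_system`, `reconstruct`, `reconstruction_error`) and the small synthetic data set in arbitrary units are assumptions made for the example.

```python
import math


def log_growth_rates(v, times):
    """W[j][k]: forward difference of ln v_j between consecutive samples."""
    return [[(math.log(vj[k + 1]) - math.log(vj[k])) / (times[k + 1] - times[k])
             for k in range(len(times) - 1)] for vj in v]


def solve_linear_system(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))) / aug[i][i]
    return x


def reconstruct(W, H):
    """Least-squares estimates (M_hat, m_hat) for W = M^T H - m 1."""
    n_h, n_v, n_t = len(H), len(W), len(H[0])
    # One regression row per time point: host densities plus a -1 column for m_j.
    X = [[H[i][k] for i in range(n_h)] + [-1.0] for k in range(n_t)]
    XtX = [[sum(X[k][p] * X[k][q] for k in range(n_t)) for q in range(n_h + 1)]
           for p in range(n_h + 1)]
    M_hat = [[0.0] * n_v for _ in range(n_h)]
    m_hat = []
    for j in range(n_v):
        Xtw = [sum(X[k][p] * W[j][k] for k in range(n_t)) for p in range(n_h + 1)]
        theta = solve_linear_system(XtX, Xtw)
        for i in range(n_h):
            M_hat[i][j] = theta[i]
        m_hat.append(theta[n_h])
    return M_hat, m_hat


def reconstruction_error(M_true, M_hat):
    """Normalized Frobenius distance between true and estimated matrices."""
    num = math.sqrt(sum((t - e) ** 2 for rt, re in zip(M_true, M_hat)
                        for t, e in zip(rt, re)))
    den = math.sqrt(sum(t ** 2 for rt in M_true for t in rt))
    return num / den


# Synthetic example in arbitrary units: 2 hosts, 2 viruses, N + 1 = 7 samples.
M_true = [[0.2, 0.0], [0.1, 0.3]]  # rows: hosts, columns: viruses
m_true = [0.5, 0.4]
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
H = [[1.0, 2.0, 4.0, 3.0, 1.5, 2.5], [3.0, 1.0, 2.0, 5.0, 4.0, 1.0]]
v = []
for j in range(2):
    series = [1.0]
    for k in range(6):
        # Virus densities that satisfy the discretized equation exactly.
        rate = sum(M_true[i][j] * H[i][k] for i in range(2)) - m_true[j]
        series.append(series[-1] * math.exp(rate * (times[k + 1] - times[k])))
    v.append(series)
W = log_growth_rates(v, times)
M_hat, m_hat = reconstruct(W, H)
print(reconstruction_error(M_true, M_hat))
```

Because the synthetic virus series is built to satisfy the discretized equation exactly, the error here reflects only floating-point round-off. With simulated or measured dynamics, discretization error and noise enter, and a constrained convex solver may be preferable to plain least squares.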

Results

Reconstruction quality depends on the variability of the dynamics

We begin with an example in which there are 10 host types, 10 virus types and 20 virus–bacteria interactions. The effective infection rates (ϕ × β) vary from 10⁻⁷ to 5×10⁻⁶. Figure 1 shows an example of a successful infection network reconstruction, using the method described in §2.3. The matrices W and H were calculated using measurements of the dynamics every 6 min for a total of 96 h. This results in a reconstruction error of Error = 0.01. The method is able to correctly identify all of the interactions. The small error arises from differences in the inferred quantitative values.

Figure 1.

Example of infection network reconstruction. (a) Virus and host dynamics for 96 h. (b) Matrices W and H constructed by taking measurements of virus and host densities every 6 min as described in §2.3. (c) Original and reconstructed infection matrices (Error = 0.01). A feasible parameter set was used in the simulation as described in §2.2.

In general, there are multiple factors affecting reconstruction quality. One important factor is the variability of the dynamics. For example, if the dynamics start at a fixed point, then there would be no variability in the dynamics, the columns of the matrix H would all be identical, and it would not be possible to infer the infection network. We test the effect of variability systematically by performing matrix reconstruction for an ensemble of matrices and different levels of variability. To control variability in the dynamics, we change how far the initial densities are from the equilibrium densities. We initialize the density of each host and virus type in the system at x_0 = x_eq(1 ± δ), where x_eq is the equilibrium density of a given type and δ is a free parameter that controls the distance from its equilibrium density. We calculated the mean reconstruction error for an ensemble of 100 matrices (figure 2). The reconstruction error has a maximum at δ = 0 (not shown for visualization purposes), which corresponds to starting the system at the equilibrium densities. The quality of the reconstruction increases as the initial conditions move away from the equilibrium densities.

Figure 2.

Effect of deviation from equilibrium on reconstruction. Mean reconstruction error as a function of the fraction away from the equilibrium densities, δ, for an ensemble of 100 matrices. Feasible parameter sets were used in the simulation as described in §2.2.

Reconstruction from multiple experiments: an alternative approach

We propose an improvement to the single-experiment approach for reconstruction. In this alternative approach, we combine measurements from different experiments to increase reconstruction quality. One key advantage of this approach is that by increasing the number of experiments used for reconstruction, we can reduce the total time and number of measurements per experiment. This is a crucial advantage in virus–bacteria systems, which are known to evolve rapidly [28-30]. In the multiple-experiment approach, we generate a host matrix H and a virus matrix W by combining matrices from multiple experiments that differ only in their initial conditions (figure 3). This extends equation (2.6) to include information from multiple experiments. Specifically, assuming that we perform p different experiments and calculate matrices {H_1, H_2, …, H_p} and {W_1, W_2, …, W_p} for each experiment, we can write the system

$$[\,W_1 \;\; W_2 \;\; \cdots \;\; W_p\,] = \tilde{M}^{\mathsf{T}}\,[\,H_1 \;\; H_2 \;\; \cdots \;\; H_p\,] - \vec{m}\,\mathbf{1}, \qquad (3.1)$$

where $\mathbf{1}$ is a vector of ones with dimensions 1×(N_1 + N_2 + ⋯ + N_p), assuming that we take N_i measurements from experiment i. Using the same minimization process presented in §2.3, we can obtain an approximation, $\hat{M}$, of $\tilde{M}$.
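The aggregation step amounts to joining the per-experiment matrices column-wise before solving the same linear system. A minimal sketch in plain Python, assuming each experiment's matrix is stored as a list of rows (one row per host or virus type, one column per measurement); the function name `concatenate_experiments` and the numbers are hypothetical.

```python
def concatenate_experiments(blocks):
    """Join per-experiment matrices column-wise: [B_1 B_2 ... B_p]."""
    n_rows = len(blocks[0])
    if any(len(block) != n_rows for block in blocks):
        raise ValueError("all experiments must track the same set of types")
    # Row r of the pooled matrix is row r of every experiment, in order.
    return [[x for block in blocks for x in block[row]] for row in range(n_rows)]


# Two experiments on the same two host types, with 3 and 2 measurements.
H_1 = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0]]
H_2 = [[7.0, 8.0],
       [9.0, 10.0]]
H_pooled = concatenate_experiments([H_1, H_2])
print(H_pooled)  # 2 rows, 3 + 2 = 5 columns
```

The same function applies to the virus matrices W_1, …, W_p. Because the pooled matrices have the same row structure as in the single-experiment case, any solver written for one experiment can be reused unchanged.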

Figure 3.

Schematic of data aggregation in the multiple-experiment approach. Multiple experiments are performed with the same matrix and different initial conditions. The host dynamics are concatenated to assemble a single H matrix.

Figure 4 compares the single- and multiple-experiment approaches for three matrices with different nestedness values. The multiple-experiment approach results in lower reconstruction error in all three cases. Figure 5 extends the comparison with an ensemble of 100 different matrices. We compare the multiple-experiment approach with the average result of the single-experiment approach. For a given matrix, we performed 20 different experiments. Each experiment has the same infection matrix and the same model parameters but different initial conditions. We compare the performance of the reconstruction using each experiment individually versus combining the measurements of the 20 experiments as described in equation (3.1). In this comparison, we fix the total number of measurements; we compare the reconstruction error when using 960 measurements from a single experiment (measuring the dynamics every 6 min for 96 h) against the performance when combining the first 48 measurements of all 20 experiments (every 6 min for 4.8 h).

Figure 4.

Examples of reconstruction for three different matrices and two different methods. Each row shows the original matrix and the resulting reconstruction for each method. The first column shows the original matrices with values of nestedness (NODF): 0.34, 0.55 and 1, respectively. The middle column shows the reconstructed matrices and corresponding reconstruction errors for the single-experiment approach, using 960 measurements. The right-hand column shows the reconstructed matrices and corresponding errors for the multiple-experiment approach, using 20 experiments and 48 measurements per experiment. The total number of measurements is the same for both methods. The time between measurements is 6 min.

Figure 5.

Reconstruction error versus nestedness for two different methods. The black line denotes the reconstruction error, Error, using the multiple-experiment approach. The blue line describes the mean reconstruction error for the same 20 experiments used in the multiple-experiment approach but using each experiment separately. The total number of measurements is the same in both approaches.

We performed the comparison for 100 different matrices (figure 5). Multiple-experiment reconstruction results in lower error than the average single-experiment reconstruction across a wide range of nestedness values. The multiple-experiment approach is also more robust; it results in smaller variance in the reconstruction error. Performing more than a few experiments not only decreases the mean reconstruction error, but also decreases the standard deviation significantly (figure 6). For the specific configuration studied here, the reconstruction error is minimized at around 18 experiments.

Figure 6.

Effect of experiment number on reconstruction error. Mean (blue line) and standard deviation (dotted line) of the reconstruction error for 100 infection matrices as a function of the number of experiments used in the multiple-experiment approach. The total number of measurements is fixed (960).

Identification of optimal sampling intervals

The inference of cross-infection in a complex community also depends on the sampling interval. Here we test the effects of variation in both the sampling interval and the total number of measurements on the quality of reconstruction. First, we vary the sampling interval, Δt, and total hours, T, for an ensemble of 100 matrices. Feasible parameter sets were used in the simulation as described in Methods. Figure 7a shows the variation in reconstruction error as a function of variation in Δt and T. As is apparent, reconstruction error decreases when increasing the length of sampling given fixed sampling intervals. However, when the total experimental effort is fixed, we find an ‘optimal’ intermediate sampling interval (figure 7b). The inference procedure is not effective given very short sampling intervals. This problem will become particularly acute given noise. Similarly, if the interval is too long, then there is additional error introduced given the linear approximation of a nonlinear model. Refinement of this time should be considered in the design and implementation of experimental protocols, e.g. preliminary estimation of host growth rates and viral latent periods.
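The trade-off behind an intermediate sampling interval can be made concrete with a back-of-the-envelope argument. The following is a heuristic sketch and not a result from the study: it assumes a forward-difference estimate of the log growth rate, writes y(t) = ln v_j(t), and introduces a hypothetical measurement error of standard deviation σ on each value of y, independent between samples.

```latex
% Fixed effort: N measurements spaced by \Delta t cover a total time
T = N\,\Delta t, \qquad \text{e.g. } N = 100,\ \Delta t = 0.1\ \mathrm{h}
\;\Rightarrow\; T = 10\ \mathrm{h}.

% Forward difference of y(t) = \ln v_j(t) (Taylor expansion, \xi \in [t, t+\Delta t])
\frac{y(t+\Delta t) - y(t)}{\Delta t} = y'(t) + \frac{\Delta t}{2}\, y''(\xi).

% Independent measurement errors \varepsilon of standard deviation \sigma on y
\operatorname{sd}\!\left[\frac{\varepsilon(t+\Delta t) - \varepsilon(t)}{\Delta t}\right]
= \frac{\sqrt{2}\,\sigma}{\Delta t}.

% Heuristic total error and the interval that minimizes it
E(\Delta t) \approx \frac{\Delta t}{2}\,\lvert y'' \rvert + \frac{\sqrt{2}\,\sigma}{\Delta t},
\qquad
\Delta t^{*} = \sqrt{\frac{2\sqrt{2}\,\sigma}{\lvert y'' \rvert}}.
```

The first term grows with Δt (discretization of a nonlinear model) and the second shrinks with Δt (noise amplified by the difference quotient), which is consistent with an intermediate optimum. In the noiseless, fixed-effort case, a short Δt also shortens the total time T = NΔt, so less of the variability in the dynamics is observed.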

Figure 7.

Effect of variation in sampling interval and measurement number on reconstruction. (a) Reconstruction error, measured as the normalized Frobenius distance (colour), as a function of the sampling interval, Δt, and total hours, T. The three black lines denote combinations with the same total number of measurements: 100, 200 and 400. (b) Reconstruction error, measured as the normalized Frobenius distance, for scenarios in which there are 100, 200 and 400 measurements with alternative sampling intervals, Δt.

Robustness of inference given noise in measurement

Here we evaluate the effect of white Gaussian measurement noise on the quality of the inference. We follow the same procedure as in the noiseless case to reconstruct infection networks using multiple experiments. Figure 8 shows mean reconstruction error for an ensemble of 100 matrices as a function of the signal-to-noise ratio (SNR). We see that using 20 experiments and 48 measurements per experiment, network inference is possible for large SNR, but reconstruction error increases significantly when the noise approaches 10% of the signal (SNR = 10 dB).
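For readers who want to reproduce this kind of test, the sketch below shows one common way to add white Gaussian noise at a prescribed SNR. It assumes the SNR is defined as the ratio of mean signal power to noise power for each time series, expressed in decibels; the exact definition used in the study is not stated in this section. The function names and the synthetic signal are hypothetical.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian noise so that signal power / noise power = 10**(snr_db / 10)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(signal, noisy):
    """Empirical SNR in dB, computed from the clean and noisy series."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum((y - x) ** 2 for x, y in zip(signal, noisy)) / len(signal)
    return 10.0 * math.log10(signal_power / noise_power)


# Illustrative density-like series oscillating around 1000 (arbitrary units).
rng = random.Random(0)  # fixed seed for reproducibility
signal = [1000.0 + 500.0 * math.sin(0.01 * k) for k in range(20000)]
noisy = add_white_noise(signal, 10.0, rng)
print(measured_snr_db(signal, noisy))  # close to the requested 10 dB
```

Under this definition, 10 dB corresponds to a noise power equal to 10% of the signal power. Additive noise can push small densities to zero or below, so noisy values need to stay positive before logarithms are taken in the reconstruction.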

Figure 8.

Mean (blue line) and standard deviation (dotted line) of the reconstruction error for 100 different matrices as a function of the signal-to-noise ratio. The multiple-experiment approach was used to reconstruct the matrix $\tilde{M}$. For each reconstruction, the matrices H and W were constructed using 20 runs with different initial conditions and 48 measurements per run.

Discussion

We presented a theory-driven method to estimate host–phage infection networks in a community with multiple virus and host types. Current experimental techniques that leverage targeted approaches to measure interactions are difficult to scale to large systems. Our approach addresses this limitation by using time-series measurements involving the whole virus–bacteria community. We presented a series of alternative experimental designs, robustly inferring interactions given a single or multiple time series. The multiple-experiment approach has the additional advantage of requiring shorter measurement times per experiment. As a consequence, there is a lower probability of a host evolving resistance to a virus type or a virus evolving the ability to infect a new host, increasing the chances of reconstructing the infection network of the focal community.

The current method takes as input the measured densities of bacteria and phage in an environmental sample. Next-generation high-throughput sequencing techniques provide a means to characterize bacterial and viral communities in a variety of environmental samples [31-35]. In the past, such characterization has focused on phylogenetic groups, by using RNA and other marker genes. Such markers are insufficiently resolved with respect to differences in relevant phenotypes, e.g. phage–bacteria infectivity. However, new computational approaches are increasingly able to resolve strain-level dynamics from metagenomic datasets [36,37]. The use of quantitative pipelines from sample to strain densities for both bacteria and viruses represents the most promising candidate to enable the inference proposed here [38-40].

Our present approach uses the nonlinear dynamics of virus populations to infer virus–bacteria infection networks. This method can be enhanced by including nonlinear bacterial population dynamics to infer competitive interactions between bacteria types and bacterial growth rates, i.e. by learning from equations (2.1) and (2.2). In addition, it is important to keep in mind that the present approach is adapted to a specific functional form of the virus–bacteria interactions [23]. In the future, it will be important to address the theoretical limits to inference given errors arising from model misspecification, i.e. structural stability and the ecological effects of lysogeny and other persistent infections. In addition, experimental verification [18] is necessary to test whether or not the dynamical model is a sufficiently robust representation of naturally occurring systems, particularly those with high diversity.

In summary, this study presents key steps towards determining quantitative infection and lysis rates in a complex virus–bacteria community. The method has the potential to significantly reduce the experimental burden, by inferring Nh×Nv quantitative interactions by measuring the dynamics of Nh+Nv populations. Crucially, such inference does not require a culture-based or a targeted approach. Moving forward, incorporating advances in environmental sequencing into a time-series framework may help to realize a long-term goal of inferring community-wide interactions.
