
Optimal Design of Non-equilibrium Experiments for Genetic Network Interrogation.

Kaska Adoteye¹, H T Banks¹, Kevin B Flores¹.

Abstract

Many experimental systems in biology, especially synthetic gene networks, are amenable to perturbations that are controlled by the experimenter. We developed an optimal design algorithm that calculates optimal observation times in conjunction with optimal experimental perturbations in order to maximize the amount of information gained from longitudinal data derived from such experiments. We applied the algorithm to a validated model of a synthetic Brome Mosaic Virus (BMV) gene network and found that optimizing experimental perturbations may substantially decrease uncertainty in estimating BMV model parameters.


Keywords: Optimal experimental design; brome mosaic virus; genetic networks; inverse problem; synthetic biology; uncertainty analysis

Year: 2015 PMID: 25558126 PMCID: PMC4281269 DOI: 10.1016/j.aml.2014.09.013

Source DB: PubMed Journal: Appl Math Lett ISSN: 0893-9659 Impact factor: 4.055

Introduction

Recent efforts in modeling the host immune response to HIV infection have illuminated the relationship between perturbations that drive biological systems away from equilibrium and information content in data measured from such systems [1], [2]. For example, the HIV model developed by Banks et al. [3], [4] describes how anti-retroviral therapy (ART) drives viral load in patients toward an equilibrium level that is undetectable, even by ultra-sensitive assays. When ART is interrupted, e.g., due to patient non-adherence, the HIV model converges toward an equilibrium with high viral load. Indeed, these are the dynamics observed in clinical patient data [4]. Banks et al. fit their HIV model to clinical patient data and showed that the number of HIV model parameters that could be estimated with high statistical confidence increased with the number of treatment interruptions [2]. Thus, non-equilibrium dynamics, induced by ART perturbations, increased the data information content as calculated through asymptotic standard errors for estimated model parameters. We hypothesized that this positive relationship between information content and system perturbations may exist for more general mathematical models and in particular for models describing biological networks. To investigate this relationship, we employed an optimal experimental design theory framework [5], [6], [7] to develop an algorithm that minimizes parameter standard errors by choosing optimal perturbations to experimental inputs. Specifically, we describe how the algorithm for optimizing selection of observation times can be extended to include optimization of experimentally controlled perturbations in order to produce data sets with maximal information content.
Although we do not propose intentional perturbations in a clinical setting with patients, such a framework could be useful for gaining information from in vitro experiments where there may exist limitations on the number of observable states and observation times. A particularly useful application of our algorithm involves interrogation of genetic networks. Data from genetic networks can be collected by measuring longitudinal gene expression, either pre- or post-translational, from in vitro cell lines. Importantly, there are also several methods for experimentally perturbing in vitro gene expression at the pre- and post-transcriptional levels [8], [9]. We recently estimated kinetic parameters for a model of a synthetically constructed gene network for the recruitment module of the Brome Mosaic Virus (BMV) replication cycle [10], [11]. In the BMV synthetic system, gene expression is tuned by the concentration of experimentally controlled chemicals. Here, we report how optimization of the experimentally controlled inputs (chemicals) for the BMV system can lead to more informative experiments and thereby dramatically decrease the standard errors, i.e., the uncertainty, of estimated model parameters.

Data and methods

Mathematical models, statistical models, and parameter uncertainty quantification

In this note, we formulate an optimal design framework for experimental systems with a scalar time-dependent input $u(t)$. In practice, $u(t)$ is assumed to be known since it is controlled by the experimenter. The mathematical model we consider is a dynamical system whose vector of state variables is generated using a parameter vector $\theta$ that contains initial values and system parameters. The model has a corresponding observation process with an observation operator that connects the model solution to observed data. Here, the observation operator is a matrix, so the observed quantities need not coincide with the individual states. The model is considered between an initial and a final experiment time. To illustrate the inverse problem methodology, we use a constant i.i.d. statistical error model, although more general error formulations can be readily derived and treated. Further statistical details, including a description of the associated covariance matrix, can be found in [12]. In this work we consider the simple case where $u(t)$ can be described as a binary vector of length $M$, with values in $\{0, 1\}$ that represent whether the experimental input is on or off in each of $M$ time intervals. For a given member $\theta_k$ of the estimated parameter vector, the standard error $SE(\theta_k)$ is computed by standard methods from asymptotic theory. For Tables 1 and 2, the normalized standard error (NSE) is defined as $SE(\theta_k)/\theta_k$; the 95% confidence interval (CI) is given by $\theta_k \pm 1.96\, SE(\theta_k)$ (see [12] for details).
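As a worked illustration of the NSE and confidence interval definitions, the short Python sketch below recomputes one tabulated interval from an estimate and its NSE. It assumes that the NSE is the standard error divided by the estimate and that the 95% CI is the estimate plus or minus 1.96 standard errors; both assumptions agree with the tabulated values to within rounding. The function names are hypothetical and not taken from the source.

```python
def normalized_standard_error(estimate, standard_error):
    """NSE, assumed here to be the standard error divided by the estimate."""
    return standard_error / estimate


def confidence_interval_95(estimate, standard_error, z=1.96):
    """Asymptotic 95% CI, assumed here to be estimate +/- 1.96 * SE."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Values for parameter ry under design (A), copied from Table 1.
estimate_ry = 31.641
nse_ry = 0.2223

# Recover the standard error from the NSE, then rebuild the interval.
se_ry = nse_ry * estimate_ry
low, high = confidence_interval_95(estimate_ry, se_ry)
print(f"SE = {se_ry:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

The recomputed interval can be compared with the Table 1 entry (17.8575, 45.4245); small differences are expected because the tabulated NSE is rounded to four decimals.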

Table 1

Parameter	ry	dy	dz	m	s
Estimate	31.641	0.7562	0.3139	0.5557	1.2374
NSE (A)	0.2223	0.6651	0.1947	2.9583	0.4318
95% CI (A)	(17.8575,45.4245)	(−0.22964,1.742)	(0.19414,0.43366)	(−2.6663,3.7777)	(0.19025,2.2846)
NSE (B-D)	0.1632	0.5402	0.1444	2.0333	0.3385
95% CI (B-D)	(21.52,41.762)	(−0.0445,1.5569)	(0.22508,0.40272)	(−1.6589,2.7703)	(0.41635,2.0584)
NSE (B-E)	0.1526	0.5152	0.1356	1.9022	0.3218
95% CI (B-E)	(22.1797,41.1023)	(−0.0074757,1.5199)	(0.2305,0.3973)	(−1.5161,2.6275)	(0.45694,2.0179)
NSE (B-SE)	0.1482	0.5032	0.1329	1.8226	0.3256
95% CI (B-SE)	(22.4505,40.8315)	(0.010449,1.502)	(0.23214,0.39566)	(−1.4294,2.5408)	(0.44772,2.0271)
NSE (C-D)	0.0744	0.0820	0.0940	0.3082	0.0454
95% CI (C-D)	(27.0296,36.2524)	(0.63472,0.87768)	(0.25607,0.37173)	(0.22,0.8914)	(1.1273,1.3475)
NSE (C-E)	0.0519	0.1669	0.0587	0.3471	0.0770
95% CI (C-E)	(28.4206,34.8614)	(0.50884,1.0036)	(0.2778,0.35)	(0.17765,0.93375)	(1.0507,1.4241)
NSE (C-SE)	0.0607	0.0813	0.0643	0.2981	0.0530
95% CI (C-SE)	(27.8745,35.4075)	(0.63564,0.87676)	(0.27431,0.35349)	(0.23107,0.88033)	(1.1089,1.3659)

Table 2

BMV model results for optimized time points and inputs using D-optimal design criteria (D), E-optimal design criteria (E), or SE-optimal design criteria (SE).

Parameter	ry	dy	dz	m	s
Estimate	31.641	0.7562	0.3139	0.5557	1.2374
NSE (D)	0.0852	0.1052	0.1049	0.3210	0.0583
95% CI (D)	(26.3558,36.9262)	(0.60023,0.91217)	(0.24935,0.37845)	(0.20603,0.90537)	(1.0958,1.379)
NSE (E)	0.0541	1.5602	0.0901	1.0197	0.9503
95% CI (E)	(28.2845,34.9975)	(−1.5563,3.0687)	(0.25845,0.36935)	(−0.55494,1.6663)	(−1.0676,3.5424)
NSE (SE)	0.0599	0.0840	0.0701	0.3173	0.0665
95% CI (SE)	(27.9255,35.3565)	(0.63163,0.88077)	(0.27075,0.35705)	(0.21005,0.90135)	(1.0761,1.3987)

Table 1 caption: BMV model results for naive time points and naive inputs (A), optimized time points and naive inputs for D-, E-, and SE-optimal designs (B-D through B-SE), or naive time points and optimized inputs for D-, E-, and SE-optimal designs (C-D through C-SE). NSE: normalized standard error.

Optimal design measures

We follow the optimal design formulation using the Generalized Fisher Information Matrix (GFIM) [5], [6], [7]. The formulation uses three sets: the set of all bounded distributions on the experiment time interval, the set of binary vectors that represent the input perturbation, and a set of bounded distributions over which the GFIM is defined. We consider the case of observations collected at discrete times, where we choose a set of time points between the initial and final experiment times. The corresponding discrete Fisher information matrix (FIM), for a discrete input measured at discrete times, is assembled from the sensitivities of the model output with respect to the parameters. Methods for calculating the sensitivities for delay differential equations, such as the model we consider below, are described in [13]. The choice of optimal design criterion is given by the minimization of a functional of the FIM; a description of SE-, D-, and E-optimal design criteria can be found in [6].
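To make the role of the FIM and the design criteria concrete, the following Python sketch builds a discrete FIM from a sensitivity matrix and evaluates three scalar design costs. It is a minimal sketch under stated assumptions: a scalar observation with constant error variance, and the commonly stated forms of the criteria (D: determinant of the inverse FIM; E: largest eigenvalue of the inverse FIM; SE: sum of squared normalized standard errors). The source defers to [6] for the definitions, so these forms should be checked against that reference. The toy output function and all names are hypothetical.

```python
import numpy as np


def fisher_information(sensitivities, sigma2):
    """Discrete FIM for a scalar output with constant error variance (assumed).

    sensitivities has shape (n_times, n_params): row j holds the derivative
    of the model output with respect to each parameter at observation time j.
    """
    S = np.asarray(sensitivities, dtype=float)
    return S.T @ S / sigma2


def d_optimal_cost(F):
    """Determinant of the inverse FIM."""
    return float(np.linalg.det(np.linalg.inv(F)))


def e_optimal_cost(F):
    """Largest eigenvalue of the inverse FIM (1 / smallest eigenvalue of F)."""
    return float(1.0 / np.linalg.eigvalsh(F)[0])


def se_optimal_cost(F, theta):
    """Sum of squared normalized standard errors."""
    covariance = np.linalg.inv(F)
    return float(np.sum(np.diag(covariance) / np.asarray(theta, dtype=float) ** 2))


# Hypothetical toy output f(t) = a * exp(-b * t), not the BMV model.
a, b = 2.0, 0.5
theta = np.array([a, b])
times = np.linspace(0.5, 5.0, 10)

# Analytic sensitivities df/da and df/db at each observation time.
S = np.column_stack([np.exp(-b * times), -a * times * np.exp(-b * times)])
F = fisher_information(S, sigma2=0.01)

standard_errors = np.sqrt(np.diag(np.linalg.inv(F)))
nse = standard_errors / theta
print("NSE:", nse)
print("D:", d_optimal_cost(F), "E:", e_optimal_cost(F), "SE:", se_optimal_cost(F, theta))
```

All three costs shrink as the FIM grows, but they weight the parameters differently, which is one reason different criteria can select different designs.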

Non-equilibrium experimental design algorithm

The algorithm is initialized with an initial experimental design consisting of an ordered set of sampling times and a vector of ones for the experimental input. This initial design represents the unoptimized, or naive, experimental design in which the input is always on. Calculating the optimal pair of sampling times and input vector jointly requires a computationally demanding nonlinear optimization over all time points and all possible input vectors. We instead iteratively solve a set of two coupled equations, Eqs. (4) and (5): one optimizes the sampling times with the input held fixed, and the other optimizes the input with the sampling times held fixed. In both, the quantity minimized is the SE-, D-, or E-optimal design criterion.
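The alternating structure described above can be sketched in Python as follows. This is an illustrative sketch only and not the authors' implementation: it uses a hypothetical one-state toy model, dx/dt = r u(t) - d x with x(0) = 0, in place of the BMV model; it restricts time points to a candidate grid; it uses simple greedy searches (single time-point exchanges and single bit flips) in place of whatever optimizer the source used; and it applies an SE-type cost. All names and numerical values are assumptions.

```python
import numpy as np


def model_output(theta, u, times, t_final):
    """Exact solution of dx/dt = r*u(t) - d*x, x(0) = 0, piecewise-constant u."""
    r, d = theta
    edges = np.linspace(0.0, t_final, len(u) + 1)
    out = np.zeros(len(times))
    for j, t in enumerate(times):
        x = 0.0
        for k in range(len(u)):
            if t <= edges[k]:
                break
            dt = min(t, edges[k + 1]) - edges[k]
            decay = np.exp(-d * dt)
            x = x * decay + (r * u[k] / d) * (1.0 - decay)
        out[j] = x
    return out


def se_cost(theta, u, times, t_final, sigma2=0.01, h=1e-6):
    """SE-type cost from a FIM built with central finite-difference sensitivities."""
    S = np.empty((len(times), len(theta)))
    for i in range(len(theta)):
        step = np.zeros(len(theta))
        step[i] = h * theta[i]
        up = model_output(theta + step, u, times, t_final)
        down = model_output(theta - step, u, times, t_final)
        S[:, i] = (up - down) / (2.0 * step[i])
    F = S.T @ S / sigma2
    eig = np.linalg.eigvalsh(F)
    if eig[0] <= 1e-12 * eig[-1]:
        return np.inf  # singular FIM: the design carries no usable information
    return float(np.sum(np.diag(np.linalg.inv(F)) / theta ** 2))


def improve_times(idx, u, grid, cost):
    """Greedy exchange: swap one selected grid time for an unselected one."""
    idx = sorted(idx)
    best = cost(grid[idx], u)
    for pos in range(len(idx)):
        for cand in range(len(grid)):
            if cand in idx:
                continue
            trial = idx.copy()
            trial[pos] = cand
            c = cost(grid[trial], u)
            if c < best:
                best, idx = c, trial
    return sorted(idx)


def improve_input(idx, u, grid, cost):
    """Greedy bit flips: toggle one input interval at a time."""
    u = u.copy()
    best = cost(grid[idx], u)
    for k in range(len(u)):
        trial = u.copy()
        trial[k] = 1 - trial[k]
        c = cost(grid[idx], trial)
        if c < best:
            best, u = c, trial
    return u


theta = np.array([2.0, 0.5])          # hypothetical nominal parameters (r, d)
t_final = 12.0
grid = np.linspace(0.0, t_final, 49)  # candidate observation times


def cost(times, u):
    return se_cost(theta, u, times, t_final)


idx = [0, 8, 16, 24, 32, 40, 48]      # naive design: evenly spaced times
u = np.ones(6, dtype=int)             # naive design: input always on
naive_cost = cost(grid[idx], u)

for iteration in range(20):
    new_idx = improve_times(idx, u, grid, cost)    # times step, input fixed
    new_u = improve_input(new_idx, u, grid, cost)  # input step, times fixed
    converged = new_idx == idx and np.array_equal(new_u, u)
    idx, u = new_idx, new_u
    if converged:
        break

final_cost = cost(grid[idx], u)
print("naive cost:", naive_cost, "final cost:", final_cost)
print("times:", grid[idx], "input:", u)
```

Because each sub-step accepts a change only when the cost decreases, the cost is non-increasing from one sweep to the next, and the loop stops when neither the time points nor the input change. A greedy search of this kind can stall in a local minimum, so it is a stand-in for, not a replacement of, a proper optimizer.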

Results

A gene network model for RNA3 recruitment in Brome Mosaic Virus

We applied our optimal design framework to a previously validated model [10] of RNA3 recruitment in the Brome Mosaic Virus (BMV) replication cycle. This model was developed to investigate the recruitment process in the replication cycle of BMV, a positive-strand RNA virus. This replication cycle is highly conserved across positive-strand RNA viruses, such as the viruses that cause Severe Acute Respiratory Syndrome (SARS) and Hepatitis C, and the BMV system has been used to gain insights into interactions of the virus with host factors [11], [14], [15]. Briefly, the mathematical model describes the interaction between Protein 1a and RNA3 in the unstabilized and stabilized forms; for an in-depth description see [10]. The levels of Protein 1a and total RNA3 are measured at time points designed by the experimenter. Parameters describing Protein 1a were estimated prior to estimating parameters for RNA3. Thus, below we treat Protein 1a parameters as known and only estimate the RNA3 parameters and the time delay. The values we used for estimated model parameters and the variance of the statistical error model can be found in [10]. The model was developed to describe an experiment performed in synthetic yeast that contained plasmids for RNA3 and Protein 1a whose expression is controlled by the concentration of copper and galactose, respectively. Data were collected under equilibrium experimental conditions, i.e., both copper and galactose were given at constant concentrations and the biological system described by Eq. (6) converged toward a constant equilibrium. Importantly, previous data did not support a high confidence in the estimation of several RNA3 parameters [10]. We subsequently hypothesized that creating a non-equilibrium experiment in which the galactose input is switched on or off over time, while copper is held constant, would lead to increased statistical confidence in RNA3 parameter estimates.
The function $u(t)$ represents the input and, below, we discretize $u(t)$ into an $M$-dimensional binary vector.

Naive experimental design and non-iterative algorithm results

We first compared results from the unchanged naive experimental design (such as used in [10]) to a non-iterative version of the optimal design algorithm described above, i.e., optimizing only the observation times or only the input. For each case, we considered a scenario with 27 experimental observation times of total RNA3 over 26 h, where the initial and final times were fixed (Fig. 1 and Fig. 2). Only results from SE-optimal design criteria were plotted in Fig. 1, since this criterion, unsurprisingly, results in the lowest standard errors for each parameter. We also consider the simple case in which the time intervals over which the input is discretized are of equal length.
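The equal-length discretization of the input can be illustrated with a short Python sketch that turns a binary vector into a piecewise-constant on/off function of time. The 26 h duration comes from the text; the start time of 0 h, the choice of four intervals, and the function name are assumptions made only for this illustration.

```python
import numpy as np


def input_from_binary(u, t0, tf):
    """Return u(t): piecewise constant on len(u) equal intervals of [t0, tf]."""
    u = np.asarray(u)
    edges = np.linspace(t0, tf, len(u) + 1)

    def u_of_t(t):
        # Index of the interval containing t; the final time falls in the last one.
        k = np.searchsorted(edges, t, side="right") - 1
        return int(u[np.clip(k, 0, len(u) - 1)])

    return u_of_t


# Hypothetical example: input on, off, on, on over a 26 h experiment.
u_of_t = input_from_binary([1, 0, 1, 1], t0=0.0, tf=26.0)
for t in (3.0, 7.0, 15.0, 26.0):
    print(t, u_of_t(t))
```

With this representation, optimizing the input reduces to a search over binary vectors, which is what makes the input step of the design problem discrete.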

Fig. 1

Left: results for naive time points and naive inputs (SE-optimal design criteria). Middle: results for optimized time points and naive inputs. Right: results for naive time points and optimized inputs. Curves show the Protein 1a level and the RNA3 level. Observation time points are labeled as ‘x’. Experiment times when the input is ‘on’ are labeled on the horizontal axis with ‘*’.

Fig. 2

Results of iterative algorithm for SE (left), D (middle), and E (right) optimal designs. Curves show the Protein 1a level and the RNA3 level. Observation time points are labeled as ‘x’. Experiment times when the input is ‘on’ are labeled on the horizontal axis with ‘*’.

We found that optimizing the input with the SE-optimal design criteria resulted in lower normalized standard errors (NSE) for each parameter as compared to optimizing the time points or the naive experimental design (Table 1). Among optimizations of the input, SE-optimal design criteria outperformed the D- and E-optimal criteria when considering the overall sum of the NSEs (see Fig. 3).

Fig. 3

Convergence of the iterative algorithm for the sum of normalized standard errors (NSE), the change in time points (Euclidean norm), and the change in inputs (Euclidean norm). The axis for NSE is on a scale; the and are on a scale. Optimal design criteria: SE ‘’, D ‘circles’, E ‘squares’.

Iterative algorithm results

We next compared results from SE-, D-, and E-optimal design criteria when iterating between Eqs. (4) and (5) (see Fig. 2). We found that the algorithm was effective enough to allow the use of fewer observation time points; hence in the results below we used 14 observation time points instead of 27. Overall, the iterative algorithm outputs an experimental design which may result in significantly lower standard errors for all parameter estimates as compared to the naive experimental design, regardless of the choice of optimal design criterion (Fig. 3 (left); zero iterations corresponds to the naive experimental design). Among the optimal design criteria, the SE-optimal design resulted in the lowest sum of NSEs, followed by D-optimal and E-optimal designs, although we note that there was variability in this comparison for each individual parameter (Table 2).

Discussion

Overall, our results suggest that experimental input manipulation can produce non-equilibrium system dynamics, leading to a greater information content in collected data. Taking the non-iterative algorithm results together with the iterative algorithm results, our findings suggest that input manipulation is a more powerful tool for reducing standard errors in parameter estimates than optimizing observation times for the BMV system. For example, optimizing only the observation times still resulted in unreasonably large confidence intervals for the parameter m, whereas optimizing only the experimental input resulted in acceptably narrow confidence intervals for m, as well as extremely narrow confidence intervals for all other parameters regardless of the choice of optimal design criteria (Table 1). In future investigations, we will extend the BMV model to consider multiple time-dependent inputs for both Protein 1a and RNA3, since they are controlled separately by the concentration of galactose and copper, respectively. We postulate that, in general, lower standard errors can be achieved when a greater number of system variables are manipulated with experimentally controlled inputs. In addition, we are currently exploring the use of the iterative algorithm (Eqs. (4) and (5)) in other genetic network systems that approach a periodic equilibrium to test whether the structure of the ω-limit set affects algorithm convergence.

10 in total

1. Theoretical foundations for traditional and generalized sensitivity functions for nonlinear delay differential equations.

Authors: H Thomas Banks; Danielle Robbins; Karyn L Sutton
Journal: Math Biosci Eng Date: 2013 Oct-Dec Impact factor: 2.080

2. Experimental Design for Distributed Parameter Vector Systems.

Authors: H T Banks; K L Rehm
Journal: Appl Math Lett Date: 2013-01-01 Impact factor: 4.055

3. Estimation and prediction with HIV-treatment interruption data.

Authors: B M Adams; H T Banks; M Davidian; E S Rosenberg
Journal: Bull Math Biol Date: 2007-01-09 Impact factor: 1.758

4. Comparison of Optimal Design Methods in Inverse Problems.

Authors: H T Banks; Kathleen Holm; Franz Kappel
Journal: Inverse Probl Date: 2011-07-01 Impact factor: 2.407

5. A positive-strand RNA virus replication complex parallels form and function of retrovirus capsids.

Authors: Michael Schwartz; Jianbo Chen; Michael Janda; Michael Sullivan; Johan den Boon; Paul Ahlquist
Journal: Mol Cell Date: 2002-03 Impact factor: 17.970

6. Tuning and controlling gene expression noise in synthetic gene networks.

Authors: Kevin F Murphy; Rhys M Adams; Xiao Wang; Gábor Balázsi; James J Collins
Journal: Nucleic Acids Res Date: 2010-03-08 Impact factor: 16.971

7. Modelling HIV immune response and validation with clinical data.

Authors: H T Banks; M Davidian; Shuhua Hu; Grace M Kepler; E S Rosenberg
Journal: J Biol Dyn Date: 2008-10 Impact factor: 2.179

Review 8. Brome mosaic virus RNA replication: revealing the role of the host in RNA virus replication.

Authors: Amine O Noueiry; Paul Ahlquist
Journal: Annu Rev Phytopathol Date: 2003-03-10 Impact factor: 13.078

Review 9. Saccharomyces cerevisiae: a useful model host to study fundamental biology of viral replication.

Authors: Isabel Alves-Rodrigues; Rui Pedro Galão; Andreas Meyerhans; Juana Díez
Journal: Virus Res Date: 2006-05-15 Impact factor: 3.303

Review 10. Interplay between gene expression noise and regulatory network architecture.

Authors: Guilhem Chalancon; Charles N J Ravarani; S Balaji; Alfonso Martinez-Arias; L Aravind; Raja Jothi; M Madan Babu
Journal: Trends Genet Date: 2012-02-25 Impact factor: 11.639

