STREAM: Static Thermodynamic REgulAtory Model of transcription.

Denis C Bauer, Timothy L Bailey.

Abstract

MOTIVATION: Understanding the transcriptional regulation of a gene in detail is a crucial step towards uncovering and ultimately utilizing the regulatory grammar of the genome. Modeling transcriptional regulation with thermodynamic equations has become an increasingly important approach towards this goal. Here, we present stream, the first publicly available framework for modeling, visualizing and predicting the regulation of the transcription rate of a target gene. Given the concentrations of a set of transcription factors (TFs), the TF binding sites (TFBSs) in a regulatory DNA region and the transcription rate of the target gene, stream optimizes its parameters to generate the model that best fits the input data. The trained model can then be used (a) to validate that the given set of TFs is able to regulate the target gene and (b) to predict the transcription rate under different conditions (e.g. different tissues, knockout or additional TFs, or mutated or missing TFBSs). AVAILABILITY: The platform-independent executable of stream, as well as a tutorial and the full documentation, is available at http://bioinformatics.org.au/stream/. stream requires Java version 5 or higher.

Year:  2008        PMID: 18776194      PMCID: PMC2732279          DOI: 10.1093/bioinformatics/btn467

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Transcription of a gene can be induced by the binding of specific transcription factors (TFs) to so-called cis-regulatory modules (CRMs). The frequency and duration of these binding events are influenced by the concentrations of the TFs, the binding affinities and locations of the TF binding sites (TFBSs) in the CRM, and the properties of the TFs themselves (e.g. their effectiveness and competitive interactions with other TFs). With an increasing number of detailed measurements of gene expression levels in different situations (e.g. tissues, developmental time points), as well as of TF-DNA binding affinities, it has become possible to build mathematical models of transcriptional regulation. Models that associate a particular occupancy of a given CRM with an observed transcriptional response promote a better understanding of transcriptional regulation and enable in silico testing of hypotheses about postulated regulatory TFs or mechanisms. An increasingly successful approach is to simulate transcriptional regulation with thermodynamic models, which describe the interaction of TFs and DNA using kinetic equations. Several such thermodynamic models have been proposed in recent years (Janssens et al., 2006; Segal et al., 2008; Zinzen et al., 2006). These models take as input the CRM sequence and a set of TFs with their concentrations, and predict the transcriptional response of the target gene as mediated by the CRM and the TFs. A training algorithm optimizes the model's internal parameters to minimize the difference between the observed and predicted transcriptional responses.

2 APPROACH AND USAGE

Here we present stream, a Java framework for calculating and visualizing transcriptional regulation using thermodynamic modeling approaches. stream currently uses the thermodynamic model introduced by Reinitz et al. (2003), but the framework is flexible and can be used in conjunction with other models implemented in Java. stream offers several optimization methods, including gradient descent and simulated annealing, for adjusting the internal parameters of the model to best fit the user's input data. To the best of our knowledge, stream is to date the only publicly available framework for modeling the regulation of the rate of transcription. stream has been tested extensively on the even-skipped (eve) gene of Drosophila melanogaster (Bauer and Bailey, 2008). stream can be run either through a graphical user interface (GUI), illustrated in Fig. 1, or from the command line. The GUI offers the same functionality as the command-line tool (e.g. multistart optimization and automatic cross-validation), but in an intuitive, dialogue-based fashion. Both versions can save the current results and program settings to a file, which makes it simple to store and modify previous experiments.
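Multistart optimization and cross-validation are standard techniques. As a rough illustration only (this is not stream's Java implementation), the Python sketch below deals the data points into k folds, trains on the remaining k - 1 folds with several restarts, and scores each held-out fold. The functions `k_fold_splits`, `multistart` and `cross_validate`, and the toy `fit_mean`/`rms` pair, are hypothetical names introduced here for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[Tuple[float, ...], float]  # (V, v): TF concentrations, observed rate


def k_fold_splits(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1, deal them into k folds, return (train, test) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]


def multistart(fit: Callable[[Sequence[Point], int], object],
               loss: Callable[[object, Sequence[Point]], float],
               data: Sequence[Point], starts: int) -> object:
    """Fit from several seeds; keep the model with the lowest training loss."""
    return min((fit(data, seed) for seed in range(starts)),
               key=lambda model: loss(model, data))


def cross_validate(fit, loss, data: Sequence[Point], k: int = 5, starts: int = 3) -> List[float]:
    """Held-out loss of a multistart-trained model, one value per fold."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(data), k):
        model = multistart(fit, loss, [data[i] for i in train_idx], starts)
        scores.append(loss(model, [data[i] for i in test_idx]))
    return scores


# Toy stand-ins (hypothetical): the "model" is just the mean training rate.
def fit_mean(points: Sequence[Point], seed: int) -> float:
    return sum(v for _, v in points) / len(points)


def rms(model: float, points: Sequence[Point]) -> float:
    return math.sqrt(sum((model - v) ** 2 for _, v in points) / len(points))


if __name__ == "__main__":
    data = [((0.1 * i,), 2.0) for i in range(10)]
    print(cross_validate(fit_mean, rms, data, k=5))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Keeping the best of several restarts guards against an optimizer such as gradient descent settling in a poor local minimum, while the held-out scores estimate how well the trained model generalizes to conditions it was not fitted on.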
Fig. 1.

Screenshot of the stream GUI, shown for a model trained on the even-skipped (eve) gene.

3 METHODS

To generate a model for a particular target gene, the user must identify a set of putative TFs and a putative regulatory region, and must specify the suspected role of each TF, $x$, as either an activator ($x \in A$) or a repressor ($x \in B$). The program also requires measurements of the concentrations and binding preferences of these TFs, as well as of the transcriptional output of the target gene. In the following, we describe the required input data, $D$, which consists of the concentration and rate data, $C$, and a TFBS map, $S$. The concentration and rate data, $C$, comprises a set of independent data points. Each data point is a pair $(V, v)$, where $V = ([a_1], \dots, [a_n])$ is a vector listing the protein concentrations of the $n$ putative regulatory TFs $a_1, \dots, a_n$, and $v$ is the corresponding observed transcription rate of the target gene. $C$ can be measured by various methods. The TF concentrations, $V$, can be measured by in situ antibody staining. The corresponding rate, $v$, can be obtained indirectly by measuring the mRNA level of a reporter gene, i.e. by staining against the reporter mRNA (Jaeger et al., 2004). For visualization purposes, stream allows the user to label each data point with up to two ‘features’ (e.g. the ‘condition’ under which the measurements were made). Features may be ‘continuous’ (real numbers, e.g. a developmental time point) or ‘categorical’ (e.g. cell types, tissue types or experimental treatments). stream uses the feature values for visualization only, and the user may interactively select either feature as the X-axis for plots of the data. Besides the concentration data, the input data also contains the TFBS map, $S$, of the regulatory region of interest.
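To make the structure of $C$ concrete, the following Python sketch models one data point $(V, v)$ with its optional features and checks the data points against the user-specified activator set $A$ and repressor set $B$. The class `DataPoint`, the function `check_input`, the TF names and all numeric values are hypothetical; stream's actual input file format is described in its documentation and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union

Feature = Union[float, str]  # 'continuous' (real number) or 'categorical' label


@dataclass
class DataPoint:
    """One independent data point (V, v) of the concentration and rate data C."""
    V: Dict[str, float]                 # TF name -> measured protein concentration
    v: float                            # observed transcription rate of the target gene
    features: Tuple[Feature, ...] = ()  # at most two labels, used for plotting only


def check_input(C: List[DataPoint], activators: Set[str], repressors: Set[str]) -> List[str]:
    """Check C against the user-specified TF roles; return the TF names in a fixed order."""
    both = activators & repressors
    if both:
        raise ValueError(f"TFs marked as both activator and repressor: {sorted(both)}")
    tfs = sorted(activators | repressors)
    for i, point in enumerate(C):
        if sorted(point.V) != tfs:
            raise ValueError(f"data point {i} must list a concentration for every putative TF")
        if len(point.features) > 2:
            raise ValueError(f"data point {i} has more than two features")
    return tfs


if __name__ == "__main__":
    # Made-up values for two hypothetical TFs; the feature is a continuous label.
    C = [DataPoint({"tf_a": 0.8, "tf_b": 0.1}, 0.6, (35.0,)),
         DataPoint({"tf_a": 0.2, "tf_b": 0.9}, 0.05, (60.0,))]
    print(check_input(C, activators={"tf_a"}, repressors={"tf_b"}))  # ['tf_a', 'tf_b']
```

Fixing the TF order once means every vector $V$ can later be passed to a model as a plain array with a consistent meaning for each position.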
The TFBS map, $S = (s_1, s_2, \dots, s_m)$, is a list in which each $s_i$ represents a TFBS as a triple $(a_i, l_i, \sigma_i)$ giving the name, $a_i$, of the binding TF, the position, $l_i$, of the binding site and the log-odds score (natural logarithm), $\sigma_i$, of the site. The log-odds score is proportional to the binding strength of the TFBS (Stormo, 2000). $S$ can be constructed from experimentally verified binding sites, from sites predicted in silico with an algorithm such as fimo (http://meme.sdsc.edu), or from a combination of both. For more detailed information, see Bauer and Bailey (2008).

The objective of the optimization is to determine the set of model parameters, $\Theta$, that best explains the input data, $D$. The form of $\Theta$ depends on the thermodynamic model. Currently, stream implements the model introduced by Reinitz et al. (2003). This model has the free parameters $\Theta = (\theta_0, R_0, W)$, where $\theta_0$ is an energy barrier, $R_0$ is the maximal transcription rate, and $W$ contains a tuple, $(K_x, E_x)$, for each TF, $x$, where $K_x$ is the association constant of the TF to the DNA and $E_x$ is the effectiveness of the TF in activating transcription if $x \in A$, or in repressing it if $x \in B$. For more details, see Reinitz et al. (2003) and Bauer and Bailey (2008). Four optimization methods are implemented: simulated annealing (SA), gradient descent (GD), a genetic algorithm (GA) and limited-memory quasi-Newton unconstrained optimization (LBFGS) (for a comparison of the optimization methods, see D.C.Bauer and T.L.Bailey, manuscript in preparation). All of them optimize the free parameters of the Reinitz model by minimizing the root mean-squared (RMS) error between the known transcription rate and the rate predicted by the Reinitz model over all data points in $C$, i.e. $\sqrt{\frac{1}{|C|} \sum_{(V, v) \in C} \left(v - \hat{v}_\Theta(V)\right)^2}$, where $\hat{v}_\Theta(V)$ denotes the rate predicted with parameters $\Theta$ (Bauer and Bailey, 2008).
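The optimization objective can be sketched as follows. This is a minimal Python illustration, assuming a generic model function that maps a parameter vector and a concentration vector $V$ to a predicted rate; it computes the RMS error over all data points and minimizes it with a textbook simulated annealing loop (Gaussian moves, Metropolis acceptance, linear cooling), one of the four methods listed above. `toy_model` is a hypothetical logistic stand-in with an offset `theta0` and a maximal rate `R0`; it ignores the TFBS map and is not the Reinitz et al. (2003) model, and the annealing schedule is an assumption rather than stream's own.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[Sequence[float], float]                      # (V, v)
Model = Callable[[Sequence[float], Sequence[float]], float]  # (params, V) -> predicted rate


def rms_error(model: Model, params: Sequence[float], data: Sequence[Point]) -> float:
    """Root mean-squared difference between observed and predicted rates over all data points."""
    sq = sum((model(params, V) - v) ** 2 for V, v in data)
    return math.sqrt(sq / len(data))


def simulated_annealing(model: Model, data: Sequence[Point], init: Sequence[float],
                        steps: int = 20000, step_size: float = 0.1,
                        t0: float = 1.0, seed: int = 0) -> Tuple[List[float], float]:
    """Minimize rms_error with Gaussian moves, Metropolis acceptance and linear cooling."""
    rng = random.Random(seed)
    cur = list(init)
    cur_err = rms_error(model, cur, data)
    best, best_err = list(cur), cur_err
    for k in range(steps):
        t = t0 * (1.0 - k / steps) + 1e-9  # temperature decreases linearly towards 0
        cand = [p + rng.gauss(0.0, step_size) for p in cur]
        cand_err = rms_error(model, cand, data)
        # Accept improvements always; accept worse moves with probability exp(-delta / t).
        if cand_err <= cur_err or rng.random() < math.exp((cur_err - cand_err) / t):
            cur, cur_err = cand, cand_err
            if cur_err < best_err:
                best, best_err = list(cur), cur_err
    return best, best_err


def toy_model(params: Sequence[float], V: Sequence[float]) -> float:
    """Hypothetical stand-in, NOT the Reinitz et al. (2003) model: params = (theta0, R0, w_1..w_n)."""
    theta0, r0, *w = params
    z = theta0 - sum(wi * xi for wi, xi in zip(w, V))
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return r0 / (1.0 + math.exp(z))


if __name__ == "__main__":
    true_params = [1.0, 2.0, 3.0, -2.0]
    data = [((x, y), toy_model(true_params, (x, y)))
            for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)]
    init = [0.0, 1.0, 0.0, 0.0]
    best, err = simulated_annealing(toy_model, data, init)
    print("initial RMS:", rms_error(toy_model, init, data), "fitted RMS:", err)
```

Swapping `simulated_annealing` for another optimizer changes only the search strategy; the objective `rms_error` stays the same, which is what makes the different methods comparable on the same data.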

4 FUTURE DEVELOPMENTS

Besides the Reinitz model, we plan to provide additional models with enhanced functionality for simulating interacting TFs in more detail, e.g. by incorporating TF-TF cooperation. Furthermore, simplifying models by using discrete TFBSs might introduce artifacts; hence, it may be beneficial to change the modeling approach to use continuous binding gradients. Future research on other CRMs will guide the development of new models, and our framework can provide the environment for comparing them directly. In addition, we plan to extend the approach to fit one model to more than one CRM. Being able to fit one model to the data of several CRMs that are suspected to share the same regulatory TFs increases the confidence in the resulting model. Finally, we plan to improve the GUI's functionality for manipulating input data, e.g. with an interactive interface for varying the properties of the TFBS map and directly observing the changes in the predicted transcription rate.
REFERENCES

1. G D Stormo (2000) DNA binding sites: representation and discovery. Bioinformatics.

2. Johannes Jaeger, Maxim Blagov, David Kosman, Konstantin N Kozlov, Ekaterina Myasnikova, Svetlana Surkova, Carlos E Vanario-Alonso, Maria Samsonova, David H Sharp, John Reinitz (2004) Dynamical analysis of regulatory interactions in the gap gene system of Drosophila melanogaster. Genetics.

3. Robert P Zinzen, Kate Senger, Mike Levine, Dmitri Papatsenko (2006) Computational models for neurogenic gene expression in the Drosophila embryo. Curr Biol.

4. Hilde Janssens, Shuling Hou, Johannes Jaeger, Ah-Ram Kim, Ekaterina Myasnikova, David Sharp, John Reinitz (2006) Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat Genet.

5. Eran Segal, Tali Raveh-Sadka, Mark Schroeder, Ulrich Unnerstall, Ulrike Gaul (2008) Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature.

6. Denis C Bauer, Timothy L Bailey (2008) Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics.
