
In silico promoters: modelling of cis-regulatory context facilitates target prediction.

Mauritz Venter, Louise Warnich.

Abstract

Elucidation of gene regulatory complexity holds much promise towards aiding therapeutic interventions in medical research. It has become progressively more evident that the characterization of highly conserved regulatory modules within promoters may assist in the elucidation of distinct cis-motif and trans-element regulatory interactions, shared in response to stimulus-evoked pathological changes. With special emphasis on the promoter, accurate analyses of cis-motif architecture combined with integrative in silico modelling might serve as a more refined approach for prediction and study of regulatory targets and major regulators governing transcriptional control. In this review, we have highlighted key examples and recent advances implementing in silico promoter models that could serve as essential contributions for future research in molecular medicine.

Year:  2008        PMID: 18505473      PMCID: PMC3823354          DOI: 10.1111/j.1582-4934.2008.00371.x

Source DB:  PubMed          Journal:  J Cell Mol Med        ISSN: 1582-1838            Impact factor:   5.310


Introduction

Transcriptional regulation is the first and vital step in the unified flow of biological information and is governed by (i) the context of cis-regulatory regions (cis-motifs residing within promoters, distant enhancers and silencers) and (ii) functional interactions between the products of specific regulatory genes (transcription factors, TFs) and cis-motifs [1]. Gaining insight into the orchestrated assembly and synergistic interplay of transcriptional regulatory mechanisms has been a challenging and burgeoning effort, and much progress has been made since the preliminary deciphering of the human genome [2, 3]. Advances in high-throughput microarray and chromatin immunoprecipitation (ChIP) technologies have gained much momentum [4], but these systems are unable on their own to reveal new insights into the combinatorial and conserved nature of transcription. Whole-genome sequence data integrated with high-throughput technologies and complemented by systematic computational (in silico) strategies have set the stage for functional genomics (Fig. 1). Genome-wide functional analysis has allowed researchers to gain and predict a holistic view of the regulatory networks controlling gene expression and, although at a slower pace than anticipated, holds much promise in advancing post-genomic biomedical research [5–9]. Endeavours to decipher the principles of transcriptional regulation involve comprehensive interactions from different data systems on different levels that are managed, processed and modelled by integrative in silico tools combining database assistance and motif detection algorithms. Numerous systematic integration and modelling strategies have been developed to elucidate the factors that contribute to the complexity of gene regulation within a network of genomic circuits. However, most of these approaches comprise a relatively general integration design combining high-throughput gene expression analysis, promoter data and bioinformatics.
The scope of high-throughput technologies and transcriptional regulatory network analysis is too large to be covered here, and has been reviewed elsewhere [4, 10–12]. With special emphasis on cis-motif logic and promoter architecture, it is apparent that the complexity in identifying and predicting the presence, abundance, orientation and particular order of true (or over-represented) cis-motifs poses a major challenge for understanding the functional relevance within a specific environment (e.g. tissue-specificity), condition (e.g. health- or disease-state) and/or in response to a specific compound (e.g. drug) or stimulus (e.g. stress) (Fig. 2). In this article, we focus on an integrative in silico modelling approach with special emphasis on promoter models in the context of regulatory target prediction in medical research. We have not attempted to summarize all the related literature; instead, a limited number of the most relevant references have been used and we have highlighted a few concepts and results that more recent key studies have generated. On the basis of these observations and information gained, we present simplified integrative promoter modelling strategies.
Fig. 1. Simplified ‘building-block’ representation of integrated platforms constituting functional genomics.

Fig. 2. Cis-motif logic. Accurate dissection of promoter architecture upstream of the TATA-box-containing core promoter allows modelling of cis-motif context particular to a specific stimulus, micro-environment or condition.

Combined in silico strategies contribute to accurate deciphering

Promoters hold the key to understanding and functional interpretation of the regulatory factors in cis and trans that control the site and level of gene activity [13–18]. It has become progressively more evident that accurate analysis and in silico modelling of promoter architecture and regulatory networks could assist in the study and prediction of disease-state regulatory processes and novel therapeutic targets, and consequently facilitate pharmaceutical drug design [9, 12, 19–25]. Numerous other elements, i.e. co-regulators, chromatin modulators and the presence of bi-directional and alternative promoters, contribute to the complexity (and diversity) of transcriptional regulation [18] and pose a significant challenge for accurate promoter detection and analysis. Therefore, the reliability and accuracy of promoter modelling strategies rely heavily on computational methods to detect over-representation of ‘true’ cis-motifs in highly conserved modules that in turn could be used to study or accurately predict possible TFs modulating a group of genes expressed during a defined condition [11, 12, 21]. The in silico promoter model can be defined as the representation of a specific framework of DNA sequences (motifs detected by computational tools), within cis-context, that could provide essential information on a mechanism regulating transcriptional activity within a unique biological process, pathway or environment [21]. Comprehensive advances in the development of in silico strategies have shed light on the limitations in our understanding of complex regulatory processes by providing means to visualize gene regulation as a holistic event rather than a linear series of events. There are currently numerous motif and/or module detection tools, TF-binding-site databases and modelling platforms available. These computational tools are evaluated on a regular basis and comprehensive overviews and assessments are provided elsewhere [26–31].
Here we highlight specific examples of three computational components, namely (i) a motif detection algorithm, (ii) conserved non-coding sequence identification and (iii) TF-binding site database assistance, that are needed to integrate high-throughput molecular data in a systematic modelling strategy. We illustrate this strategy combining in silico modelling and experimental extraction/validation in a simplified representation (Fig. 3). High-throughput gene expression analyses (e.g. microarrays) of a particular disease-state tissue reveal a cluster of co-expressed genes of which the promoter sequences contain conserved cis-motifs. Probabilistic alignment-based methods such as MEME (Multiple Expectation Maximization for Motif Elicitation [32]) and Gibbs sampling [33] are some of the most powerful and widely used algorithms (Fig. 3A) to detect motifs within a so-called ‘noisy’ background. These methods perform maximum likelihood estimates to identify statistically over-represented motifs in the highest scoring sequence alignments. In parallel, multi-species conservation analysis combined with predicted cis-motif clustering can be performed using the regulatory visualization tool for alignment (rVISTA) [34] computational platform (Fig. 3B). This is a hypothesis-driven strategy, which states that cis-regulatory motifs within evolutionarily conserved sequences are more likely to be functional than cis-motifs in non-conserved regions [35]. Currently the two major databases comprising comprehensive sets of TF-binding profiles are TRANSFAC [36] and JASPAR [37]. TF-binding site database assistance allows for large-scale cis-motif comparisons to consensus sequences or energy binding scores of experimentally validated TF-binding sites (Fig. 3C). Binding scores are calculated from positional weight matrices (PWMs) that are derived from log-scale converted positional frequency matrices (PFMs), and these scores are directly related to DNA-protein binding energy interactions [27].
The extent to which the identified regulatory modules/motifs contribute to a specific interaction can be evaluated by modified array technologies (reviewed by Hoheisel [4]) such as ChIP assays [29]. Putative identification and comparison of newly discovered TF-targets is restricted to the experimentally verified database entries; it is therefore imperative that TF-binding site databases are continuously updated. Nevertheless, the combined employment of computational tools integrated with biological data (extrapolated from high-throughput variations of validation techniques) is powerful and allows for a refined elucidation, comparison and prediction of cis-regulatory context [29].
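The PFM-to-PWM conversion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scoring scheme of TRANSFAC or JASPAR: the four-column count matrix is invented, the background is assumed uniform, a pseudocount of 1 is added to avoid taking the logarithm of zero, and the function names (pfm_to_pwm, score_site, best_hit) are hypothetical.

```python
import math

BASES = "ACGT"

def pfm_to_pwm(pfm, background=None, pseudocount=1.0):
    """Convert a positional frequency matrix (raw counts per column)
    into log2-odds weights relative to a background distribution."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform
    pwm = []
    for column in pfm:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm

def score_site(pwm, site):
    """Sum the column weights of the bases observed in one candidate site."""
    return sum(column[base] for column, base in zip(pwm, site))

def best_hit(pwm, sequence):
    """Slide the matrix along one strand and return (score, position)
    of the highest-scoring window, or None if the sequence is too short."""
    width = len(pwm)
    hits = [(score_site(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits) if hits else None

# Invented counts for a TATA-like motif (illustrative only, not a database entry).
toy_pfm = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 1, "T": 0},
]
toy_pwm = pfm_to_pwm(toy_pfm)
print(best_hit(toy_pwm, "GGCTATAAGG"))
```

A practical scan would also score the reverse-complement strand and apply a score threshold; both are omitted here for brevity.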
Fig. 3. Accurate deciphering of regulatory context. Extrapolation of promoter data from genes expressed during a particular disease (transcriptomic profile represented by microarray) using (A) probabilistic motif detection algorithms, i.e. MEME and Gibbs sampling, or (B) comparative phylogenetic promoter analysis across different species. (C) Combining the aforementioned strategies with TF-database assistance (using TRANSFAC and/or JASPAR) for putative motif identification and representation.

Systematic module analysis leads to target discovery

A well-established view on gene regulation is that the promoter regions of co-expressed genes usually contain conserved areas that comprise single and/or a compact arrangement (a.k.a. module) of specific cis-motifs that are likely to be regulated by similar TFs. Therefore, if the deciphering of regulatory context is defined and well characterized, it is possible to predict novel genes or functional interactions of regulatory networks within a specific biological environment (e.g. tissue-specificity), process (e.g. biological pathway) or condition (e.g. disease). This traditional view allows for systematic modelling of promoter architecture and could expedite the discovery of biomarker or pharmaceutical cis- and/or trans-acting targets [7, 23, 24, 38]. Studies in the model organisms S. cerevisiae, E. coli and D. melanogaster demonstrated how the analysis of promoter sequence information could serve as a platform for integration and successful prediction of transcriptional synergistic and regulatory events [39–42]. Recently, integrative strategies combining phylogenetic footprinting, content-driven bioinformatics and gene expression profiles (and/or knowledge of gene function) have been applied in higher eukaryotes to predict transcriptional targets in cholesterol biosynthesis [43], regulatory single nucleotide polymorphisms (SNPs) influencing antioxidant response elements [44] and tissue specificity [17, 45–47]. Several studies have successfully accentuated the strategy of systematic modelling in clinical applications to predict transcriptional targets by integration of in silico analysis and high-throughput technologies.
These investigations identified the conserved organization of regulatory modules or targets that were implicated in distinct conditions and/or processes such as reovirus infection of human embryonic kidney cells [48], androgen receptor binding [49], antibacterial response [50], enterocyte differentiation [51], human erythropoiesis [52], alcohol-related apoptosis and cell proliferation [53] and methylation in prostate cancer [54]. Within this context, we specifically highlight a study by Freebern et al. [23] that implemented an integrated ‘profiling of transcriptional targets’ (PTT) strategy by extrapolating information from (i) multiple high-throughput data sets, (ii) computational cis-regulatory evaluation, (iii) mapping of signalling pathways and (iv) functional promoter validation during mitogen- and drug-induced activation of T cells. Conserved cis-regulatory module data combined with computational interpretation from different data sets, and functional promoter analyses of the candidate genes involved in immune cell function, led to the discovery of co-modulation by insulin-like growth factor 1 (IGF-1) [23]. Subsequently, focussed screening assays on T cells, co-stimulated with different IGF-1 concentrations, over short time intervals and in the presence of different mitogen combinations, confirmed the regulation of immune cell function genes in response to IGF-1 induction on both a transcriptional and proteomic level. Although complex signalling of IGF-1 and multiple dataset analysis is not discussed here, results from this study underscore the importance of using evolutionarily conserved promoter data, integrated with PTT, as a robust route to screen potential drug targets [23].
Clinically, although information on the underlying regulatory mechanisms in several pathological and cellular processes remains obscure, it is evident that highly conserved promoter regions could serve as ‘building-blocks’ for implementation of a systems approach, combining several elements (multi-dimensional datasets) on different levels, to assist in identifying the conserved nature of transcriptional targeting during a particular disease or pathological process.
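The premise of this section, that promoters of co-expressed genes share cis-motifs more often than expected by chance, can be illustrated with a simple over-representation test. The sketch below is an assumption-laden toy and not the procedure of any study cited here: it uses exact string matching of a consensus on one strand, a one-sided hypergeometric test, and hypothetical function names and promoter sequences.

```python
from math import comb

def promoters_with_motif(promoters, motif):
    """Return the names of promoters containing an exact copy of the motif."""
    return {name for name, seq in promoters.items() if motif in seq}

def enrichment_p_value(n_total, n_with_motif, n_cluster, n_cluster_with_motif):
    """One-sided hypergeometric P(X >= observed): the probability of seeing at
    least this many motif-bearing promoters in a cluster drawn at random."""
    upper = min(n_with_motif, n_cluster)
    favourable = sum(
        comb(n_with_motif, k) * comb(n_total - n_with_motif, n_cluster - k)
        for k in range(n_cluster_with_motif, upper + 1)
    )
    return favourable / comb(n_total, n_cluster)

# Invented promoter fragments; gene names and sequences are placeholders.
all_promoters = {
    "geneA": "TTGGGCGGAAT", "geneB": "ACGGGCGGTTA", "geneC": "GGGCGGTATAA",
    "geneD": "ATATATATATA", "geneE": "CCCTTTAAAGG", "geneF": "TTTTGGGCGGA",
    "geneG": "ACACACACACA", "geneH": "GTGTGTGTGTG",
}
co_expressed = {"geneA", "geneB", "geneC"}  # hypothetical expression cluster

with_motif = promoters_with_motif(all_promoters, "GGGCGG")
p = enrichment_p_value(
    n_total=len(all_promoters),
    n_with_motif=len(with_motif),
    n_cluster=len(co_expressed),
    n_cluster_with_motif=len(co_expressed & with_motif),
)
print(f"motif-bearing promoters: {sorted(with_motif)}, p = {p:.4f}")
```

With real data the background would be a genome-wide promoter set, matches would come from weight-matrix scans rather than exact strings, and p-values would need correction for the number of motifs tested.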

Applications of in silico promoters based on conserved regulatory context

Accurate interpretation of promoter architecture is dependent on underlying regulatory commonalities that exist for genes that are similar on the basis of expression, regulation and/or function. Moreover, (i) combining expression data with functional annotation and (ii) the use of cis-regulatory module information, rather than individual motif information, can lead to a more defined predictive model design [16, 55]. An early model-based study by Gailus-Durner et al. [19] illustrated how in silico promoter models can be generated from literature data-mining in the absence of sequence similarity. A promoter model was constructed from a previously identified specific Sp1 cis-motif arrangement in the proximal promoter region of the muscle-specific cardiac/slow twitch sarcoplasmic reticulum Ca2+-ATPase (SERCA2) rabbit gene. This model was compared to the sequences in the rodent section of the European Molecular Biology Laboratory (EMBL) nucleotide sequence database [56]. Out of 28 possible matches, 14 were associated with muscle expression and 6 of the 14 showed high muscle-expression specificity [19]. Overall results of this study showed relatively accurate prediction of co-regulated muscle-specific genes based on a single experimentally verified model that was used as reference for comparison and prediction [19]. A similar study, combining database assistance, generated in silico cell type-specific submodels based on the functional context of experimentally verified (stimulated or unstimulated) cis-regulatory information of the human RANTES/CCL5 promoter as reference [20]. The RANTES/CCL5 gene encodes a chemokine involved in the pathology of inflammatory disease, and its transcriptional regulation is governed by a module comprising six functionally characterized cis-motifs within <300 nucleotides of the core promoter [20].
Elucidation and subsequent comparison of regulatory context (or framework) allowed for the characterization of 53 additional target genes that either shared co-regulation with RANTES/CCL5 or were associated with inflammation [20]. This work was highlighted in a review by Werner et al. [21] emphasizing the use of in silico promoter modelling that could serve as a valuable tool to identify and predict co-regulated target genes sharing conserved organizational features within their promoters [21]. These investigations provided an exciting and relatively new strategy for elucidation of transcriptional regulatory complexity, furthermore demonstrating that in silico promoter analyses could facilitate systematic modelling and understanding of the synergistic interplay between expression arrays, regulatory networks and gene function [16, 21, 38]. Two studies performed with similar in silico promoter comparative strategies revealed how the conserved organization of promoter motifs that are linked to a disease [22] or a tissue-specific micro-environment [57] can be successfully used to detect novel tissue-specific or disease-associated genes, based on reconstructive promoter modelling. The strategy used by Döhr et al. [22] combined promoter frameworks of (i) orthologous genes and (ii) co-regulated genes associated with a similar biological function, disease or pathway. Models generated from these cross-referenced frameworks were used to identify signalling pathways co-ordinating the interaction of co-regulated genes associated with maturity onset diabetes of the young (MODY) [22].
Cohen et al. [57] expanded the comparative promoter strategy by showing that accurate analysis of the promoter framework (module organization) derived from genes distinctively expressed in the functional unit that contributes to the complex phenotype of the podocyte/slit-diaphragm can provide direct links to identify a novel regulatory network by interaction of co-regulation and a related event (podocyte-directed expression). Based on the hypothesis that podocyte-specific genes may share conserved promoter features, Cohen et al. [57] used a comparative computational promoter strategy to search for shared TF-binding sites in the promoters of 47 podocyte-associated genes in human, mouse and rat. The initial analysis revealed that two genes, nephrin (NPHS1) and zonula occludens (ZO-1), shared a similar promoter context or so-called ‘framework’, phylogenetically conserved across all three species. Experimental gene expression analysis of NPHS1 and ZO-1 confirmed significant co-regulatory activity in micro-dissected human glomeruli taken from renal biopsies representing various disease conditions that have an effect on the glomerular filtration barrier, i.e. benign nephrosclerosis, membranous glomerulosclerosis and diabetic nephropathy [57]. A subsequent second-round promoter model screening revealed the presence of the shared NPHS1/ZO-1 promoter framework in 79 of 50,145 human promoter sequences screened [57]. However, only one novel candidate gene, cadherin-5 (CDH5), was identified to share the NPHS1/ZO-1 promoter model in all three species (human, mouse and rat) and, more interestingly, it had not previously been associated with podocyte-specific gene expression [57]. Experimental gene expression analysis performed with biopsy glomeruli samples from 76 patients representing human glomerular disease confirmed predicted co-regulation of all three genes (NPHS1, ZO-1 and CDH5) [57], thus validating the accuracy of the promoter model.
Findings from the investigations described above expanded hypothesis-driven research by combining phylogenetically conserved regulatory context with shared biological function. Predicting co-regulated genes of functional significance derived from the conserved organization of promoter modules is not restricted to phylogenetic analyses. Additionally, integrative promoter modelling strategies can be used to predict novel transcriptional targets or genes based solely on the functional interaction and/or association within a specific biological environment, process or condition without inter-species comparison. In a more recent investigation, Moss et al. [58] elegantly demonstrated the use of in silico promoter models by successfully predicting novel colon cancer-associated genes based on the compact arrangement of highly conserved cis-motifs associated with cell proliferation. This study utilized a range of different bioinformatic tools on all levels ranging from extrapolation of transcriptomic data and characterization of promoter architecture to the prediction of novel co-regulated genes sharing a common promoter module associated with cell proliferation in colon cancer [58]. This example showed how the conserved nature of cis-motif promoter architecture can be implemented to identify additional molecular targets in a systematic top-down approach (Fig. 4A). By further accentuation of this strategy, we suggest that a bottom-up approach based on cis-motif logic and promoter sequence availability can be used to predict so-called master regulators that operate as activators, repressors, modifiers and/or co-activators by modulating the regulation of several genes in a biological pathway, disease or cellular environment (Fig. 4B). Despite the studies described here, reports on the use of in silico promoter models remain relatively limited.
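The notion of a promoter ‘framework’ screened across promoters and species, as described above, can be made concrete with a small sketch. It is a simplified stand-in rather than the method of the cited studies: the framework is assumed to be an ordered list of exact motif strings separated by at most max_gap nucleotides, only one strand is searched, and the function names, motif strings and sequences are hypothetical.

```python
def find_framework(sequence, motifs, max_gap, start=0, first=True):
    """Return start positions of the motifs if they occur in the given order,
    with at most max_gap nucleotides between consecutive motifs; else None."""
    if not motifs:
        return []
    motif, rest = motifs[0], motifs[1:]
    # The first motif may lie anywhere; later ones must begin within max_gap
    # nucleotides of the end of the previous motif.
    end = len(sequence) if first else min(len(sequence),
                                          start + max_gap + len(motif))
    pos = sequence.find(motif, start, end)
    while pos != -1:
        tail = find_framework(sequence, rest, max_gap,
                              pos + len(motif), first=False)
        if tail is not None:
            return [pos] + tail
        pos = sequence.find(motif, pos + 1, end)  # try the next occurrence
    return None

def conserved_in_all_species(orthologous_promoters, motifs, max_gap):
    """True if every orthologous promoter contains the framework."""
    return all(find_framework(seq, motifs, max_gap) is not None
               for seq in orthologous_promoters.values())

# Toy framework: a GC-box-like string followed by an E-box-like string.
framework = ["GGGCGG", "CACGTG"]
orthologs = {  # invented sequences standing in for three species
    "human": "AAGGGCGGTTTTCACGTGAA",
    "mouse": "TGGGCGGATCACGTGCCCC",
    "rat":   "GGGCGGAACACGTGTTTTT",
}
print(find_framework(orthologs["human"], framework, max_gap=5))
print(conserved_in_all_species(orthologs, framework, max_gap=5))
```

Genome-wide screening would apply find_framework to every promoter in a collection and keep only candidates whose orthologs also match, which mirrors the filtering from many single-species hits to a few conserved candidates.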
Although different modelling strategies exist, it is evident that refined analysis of promoter organization serves as the major objective within all methods. In silico and comparative analysis (i.e. shared promoter framework within specific tissue, condition and/or species) of cis-regulatory architecture could (i) provide further insight into defining the relationship of genes that are co-expressed and/or co-regulated, (ii) assist in the identification of functionally related promoter elements in the absence of gene sequence similarity, (iii) predict novel disease-associated genes sharing a unique regulatory mechanism or biological pathway and (iv) subsequently allow for identification and representation of transcriptional targets for the development of therapeutic agents [19–23, 51, 57, 58]. Contrary to the advantages offered by promoter modelling, several drawbacks such as (i) limited experimental validation of promoter function and protein-DNA interactions, (ii) the lack of high-throughput biological data, (iii) variation in the accuracy of computational tools and (iv) overall complexity of regulatory mechanisms (in addition to transcriptional control) pose significant challenges for future modelling strategies. Nevertheless, examples highlighted underscore the importance of (i) integration, (ii) the conserved nature of a regulatory framework and (iii) the use of in silico promoter models as valuable tools to study the complex mechanisms governing transcriptional regulation in the context of disease and potential target discovery.
Fig. 4. In silico modelling strategy and implementation. (A) Combination of highly conserved promoter cis-motifs identified from genes expressed during a particular disease (transcriptomic profile represented by microarray) and used to construct a promoter model. A specific promoter model can be implemented to identify novel genes (represented by red dots on underlying grey microarray) that cannot otherwise be identified by a conventional gene expression profile. (B) In addition, the promoter model can be used to predict putative (i) regulatory pathways and (ii) TF-targets. This in silico bottom-up strategy is based on the concept of using a conserved regulatory promoter area (represented as a model) to predict (combining literature database mining) a central TF-target and/or so-called ‘master regulator’ (regulating a specific cascade of genes during a particular biological process). This information could be valuable for the development of a therapeutic agent affecting a central molecular regulator.

Conclusion

The principles of several regulatory mechanisms have been well characterized individually; however, gaining a holistic insight into the complexity of orchestrated regulatory events remains a challenge. While gaps in our appreciation of transcriptional regulation still remain, advances in bioinformatics and high-throughput technologies such as ChIP-on-chip have greatly extended our reach into the discovery of novel promoters as well as enhancer elements, allowing a more accurate modelling of the regulatory code in cis-context. Consequently such models can be used to identify and/or predict transcriptional activation and signalling in fundamental research endeavours and biopharmaceutical applications. The studies described in this review have laid the groundwork for future investigations integrating the concept of promoter modelling as a tool in molecular medicine.
  57 in total

Review 1.  The RNA polymerase II core promoter: a key component in the regulation of gene expression.

Authors:  Jennifer E F Butler; James T Kadonaga
Journal:  Genes Dev       Date:  2002-10-15       Impact factor: 11.361

Review 2.  Diversified transcription initiation complexes expand promoter selectivity and tissue-specific gene expression.

Authors:  Andreas Hochheimer; Robert Tjian
Journal:  Genes Dev       Date:  2003-06-01       Impact factor: 11.361

3.  JASPAR: an open-access database for eukaryotic transcription factor binding profiles.

Authors:  Albin Sandelin; Wynand Alkema; Pär Engström; Wyeth W Wasserman; Boris Lenhard
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

4.  Transcriptional regulatory network analysis of developing human erythroid progenitors reveals patterns of coregulation and potential transcriptional regulators.

Authors:  M A Keller; S Addya; R Vadigepalli; B Banini; K Delgrosso; H Huang; S Surrey
Journal:  Physiol Genomics       Date:  2006-08-29       Impact factor: 3.107

Review 5.  Computational and experimental approaches for modeling gene regulatory networks.

Authors:  J Goutsias; N H Lee
Journal:  Curr Pharm Des       Date:  2007       Impact factor: 3.116

6.  Regulatory context is a crucial part of gene function.

Authors:  Sabine Fessele; Holger Maier; Christian Zischek; Peter J Nelson; Thomas Werner
Journal:  Trends Genet       Date:  2002-02       Impact factor: 11.639

7.  Identification of polymorphic antioxidant response elements in the human genome.

Authors:  Xuting Wang; Daniel J Tomso; Brian N Chorley; Hye-Youn Cho; Vivian G Cheung; Steven R Kleeberger; Douglas A Bell
Journal:  Hum Mol Genet       Date:  2007-04-04       Impact factor: 6.150

Review 8.  Functional genomics and gene microarrays--the use in research and clinical medicine.

Authors:  Ladina Joos; Emel Eryüksel; Martin H Brutsche
Journal:  Swiss Med Wkly       Date:  2003-01-25       Impact factor: 2.193

Review 9.  In silico representation and discovery of transcription factor binding sites.

Authors:  Giulio Pavesi; Giancarlo Mauri; Graziano Pesole
Journal:  Brief Bioinform       Date:  2004-09       Impact factor: 11.622

10.  Linking disease-associated genes to regulatory networks via promoter organization.

Authors:  S Döhr; A Klingenhoff; H Maier; M Hrabé de Angelis; T Werner; R Schneider
Journal:  Nucleic Acids Res       Date:  2005-02-08       Impact factor: 16.971
