
ConTra: a promoter alignment analysis tool for identification of transcription factor binding sites across species.

Bart Hooghe, Paco Hulpiau, Frans van Roy, Pieter De Bleser.

Abstract

Transcription factors (TFs) are key components in signaling pathways, and the presence of their binding sites in promoter regions is essential for their regulation of the expression of the corresponding genes. Orthologous promoter sequences are commonly used to increase the specificity with which potentially functional transcription factor binding sites (TFBSs) are recognized, and to detect possibly important similarities or differences between species. The ConTra (conserved TFBSs) web server provides the biologist at the bench with a user-friendly tool to interactively visualize, on a promoter alignment of choice, TFBSs predicted with either the TransFac (1) or the JASPAR (2) position weight matrix library. The visualization can be preceded by a simple scoring analysis to explore which TFs are most likely to bind to the promoter of interest. The ConTra web server is available at http://bioit.dmbr.ugent.be/ConTra/index.php.

Year:  2008        PMID: 18453628      PMCID: PMC2447729          DOI: 10.1093/nar/gkn195

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Nowadays, context-specific changes in gene expression can easily be monitored genome-wide by microarray analysis and serial analysis of gene expression. In most cases, however, the molecular mechanisms and the specific transcription factors (TFs) that drive those changes remain unknown. Identifying the components and mechanisms of signaling pathways is a slow process that inevitably involves trial and error, so in silico prediction of the components before and during the identification process is highly desirable. In silico approaches estimate that there are about 2000 human TFs (3), of which about 800 have been characterized to varying degrees. DNA-binding site information is available for many of them, which allows their binding characteristics to be modeled reasonably well. The most commonly used model of TF binding specificity is the position weight matrix (PWM), although it does not account for potential dependencies between positions within a transcription factor binding site (TFBS) (4). When a PWM, or even a more advanced model such as a hidden Markov model (HMM), is used to predict binding sites for a specific TF, a very large proportion of the predictions are false positives. This is because TFBSs are very short, often between 6 and 15 nt, and tolerate a relatively high degree of sequence degeneracy. The use of orthologous sequences to find conserved, and therefore potentially functional, TFBSs is called phylogenetic footprinting. This in silico technique is commonly and successfully combined with the PWM model to reduce its false positive rate. The main difficulty of this approach lies in correctly aligning regulatory elements in promoter sequences that may have diverged considerably during evolution (5).
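The false-positive problem is easy to see in code. The following is a minimal Python sketch of PWM scanning. It assumes a log2-odds matrix built with a pseudocount against a uniform background, and a relative-score threshold between 0 and 1, a convention used by several PWM scanning tools. These choices, the toy E-box matrix and the names `log_odds_matrix` and `scan` are illustrative assumptions, not the scoring used by ConTra or TransFac. Only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background=None, pseudocount=0.25):
    """Turn a count matrix (one {base: count} dict per position) into log2-odds scores."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def scan(sequence, pwm, threshold=0.85):
    """Return (start, relative_score) for every window scoring at or above threshold.

    The relative score rescales the raw log-odds sum to [0, 1] between the
    lowest and the highest score the matrix can produce.
    """
    lowest = sum(min(col.values()) for col in pwm)
    highest = sum(max(col.values()) for col in pwm)
    if highest == lowest:
        return []  # an uninformative matrix cannot discriminate sites
    width, seq, hits = len(pwm), sequence.upper(), []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or alignment gaps
        raw = sum(col[b] for col, b in zip(pwm, window))
        relative = (raw - lowest) / (highest - lowest)
        if relative >= threshold:
            hits.append((start, round(relative, 3)))
    return hits


# Hypothetical toy matrix: 10 aligned sites that all read CACGTG (an E-box).
ebox = log_odds_matrix([{b: (10 if b == c else 0) for b in BASES} for c in "CACGTG"])
print(scan("TTCACGTGAA", ebox))                 # [(2, 1.0)]
print(scan("TTCACGTAAA", ebox, threshold=0.8))  # [(2, 0.833)], one mismatch
```

Lowering the threshold from 0.85 to 0.8 admits the single-mismatch window. For short, degenerate motifs, each such relaxation sharply increases the number of predicted sites in a long promoter.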
Comparing predicted TFBSs in one species with those of other species is not only a way to reduce the number of false positive predictions; it can also be a goal in its own right. It is now widely accepted that many differences in animal morphology are due to specific changes in the sequences that control gene expression, especially during development (6). Consequently, one expects to find important differences between species in the presence and position of TFBSs. Conservation of a TFBS among several species in a multiple alignment does not prove that it is functional. Nor is conservation required for functionality, because differences between species are at least as biologically important as similarities. Furthermore, an apparent lack of conservation might have no biological cause at all and could instead result from an ‘incorrect’ alignment. Firm, systematic conclusions are therefore extremely difficult to draw, but a proper display of predicted sites in several possible alignments would certainly help the biologist seeking to generate or support a hypothesis. A number of web tools offer phylogenetic footprinting together with some visualization interface. Even so, the biologist at the bench still lacks a compact and user-friendly tool that suggests answers to the recurring question of which TFs might bind a given promoter, and where. ConTra, the web tool presented in this article, offers interactive visualization of all predicted sites for selected TFs on aligned sequences of orthologous promoters. ConTra works per alternative promoter, which makes differences or similarities between alternative promoters easier to detect. Furthermore, a simple scoring analysis can be applied before visualization to identify the TFs that are most likely to bind the promoter(s) of interest.

APPROACH AND FEATURES

ConTra enables easy and fast look-up of all known transcripts of the human gene(s) or transcript(s) of interest, specified by gene name, gene symbol, Ensembl gene ID, Entrez gene ID, RefSeq transcript ID or Ensembl transcript ID. The results are fully linked to NCBI (http://www.ncbi.nlm.nih.gov/), UCSC (http://genome.ucsc.edu/) and Ensembl (http://www.ensembl.org/). Transcripts are grouped by transcription start site (TSS), and each group can be analyzed separately. This important feature distinguishes ConTra from most other web tools, which provide only one promoter per gene for analysis. The DICER1 gene illustrates why alternative promoter regulation can matter. The TSS of the DICER1 transcript NM_030621, which is predominantly expressed in breast tissue (7), lies more than 16 kb upstream of the TSS of transcript NM_177438, which has been reported to be predominantly expressed in several other tissues (8). Differences in the spectrum of TFBSs between the two promoters very likely cause the observed differences in transcript proportions between tissues, and ConTra can help to start exploring these differences. The transcriptional regulation of the DICER1 transcripts is discussed further in supplemental data document 1. For every selected group of transcripts, the user can choose among the available high-quality pairwise and multiple alignments from Ensembl or UCSC that have human as reference, and retrieve them by simple selection. The offered alignments include the multiz 17-way and 28-way multiple alignments from UCSC (9,10). The 28-way alignment was produced recently and has proven powerful for exploring vertebrate and mammalian evolution (11). ConTra also offers the Pecan (http://www.ebi.ac.uk/~bjp/pecan/) 7-mammal and 10-amniota-vertebrate multiple alignments from Ensembl.
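The exact grouping rule ConTra uses is not described here. The following minimal sketch assumes that transcripts share a group only when they have the same chromosome, strand and exact TSS coordinate. It also shows the strand logic needed to find the TSS from transcript coordinates. The `Transcript` record, its field names and all IDs and coordinates are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    chrom: str
    strand: str    # "+" or "-"
    tx_start: int  # lowest genomic coordinate of the transcript
    tx_end: int    # highest genomic coordinate of the transcript

    @property
    def tss(self) -> int:
        # The TSS is the 5' end: tx_start on the plus strand, tx_end on the minus strand.
        return self.tx_start if self.strand == "+" else self.tx_end


def group_by_tss(transcripts):
    """Group transcript IDs that share chromosome, strand and TSS coordinate."""
    groups = defaultdict(list)
    for t in transcripts:
        groups[(t.chrom, t.strand, t.tss)].append(t.transcript_id)
    return dict(groups)


# Hypothetical minus-strand transcripts: tx1 and tx2 share a TSS,
# tx3 starts 16 kb further upstream (higher coordinate on the minus strand).
transcripts = [
    Transcript("tx1", "chrN", "-", 1_000, 60_000),
    Transcript("tx2", "chrN", "-", 5_000, 60_000),
    Transcript("tx3", "chrN", "-", 1_000, 76_000),
]
for key, ids in group_by_tss(transcripts).items():
    print(key, ids)
```

Each resulting group corresponds to one promoter, so upstream sequence extraction and TFBS prediction would run once per group rather than once per gene.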
The Pecan algorithm has been shown to be one of the best algorithms in terms of specificity and sensitivity (12). ConTra also offers most of the available UCSC pairwise blastz-net alignments that have human as reference (13,14). The premade alignments offered by ConTra always use the human promoter sequence as the reference, because in our experience these are the most frequently requested. However, users can upload their own alignment files in FASTA format with any other species as reference. This upload feature also makes it possible to use alignment types other than those provided. We also plan to enable the upload of user-defined PWMs, to expand the set of TFs for which predicted binding sites can be visualized. All potential TFBSs are determined independently for each orthologous promoter using the ‘vertebrate’ PWMs from the most recent versions of TransFac (1) or JASPAR (2). We have chosen to visualize TFBSs predicted by the simple, widely used PWM approach as is. Restricting the predicted TFBSs to those that are phylogenetically conserved, or taking into account extra features such as clustering tendency (15) or distance from the TSS (16), would produce fewer false positive predictions. However, these filters would also bias the true positive predictions, either towards conserved TFBSs or towards TFBSs that satisfy the theoretical assumptions of models developed with too little experimental data. TFBS prediction can and must be improved considerably, but this requires much more experimental data to be readily accessible rather than dispersed throughout the scientific literature. Recently, a few databases designed to hold complex regulatory data have become available, namely ORegAnno and PAZAR (17,18), and biologists are strongly encouraged to deposit their regulatory findings in them. The user-adjustable parameters are the length of the upstream promoter sequences and the PWM thresholds, which set the stringency of TFBS prediction.
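TFBSs are predicted independently on each orthologous promoter, so each hit must be projected onto alignment columns before sites from different species can be compared or displayed together. The minimal sketch below assumes a gapped FASTA alignment with `-` as the gap character. How ConTra's upload feature parses files is not described here, and `parse_aligned_fasta` and `ungapped_to_column` are hypothetical names.

```python
def parse_aligned_fasta(text):
    """Parse a gapped multiple alignment in FASTA format into {name: aligned_sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    if len({len(seq) for seq in records.values()}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    return records


def ungapped_to_column(aligned, gap="-"):
    """List whose i-th entry is the alignment column of the i-th ungapped residue."""
    return [column for column, residue in enumerate(aligned) if residue != gap]


alignment = parse_aligned_fasta(">human\nCACG-TG\n>mouse\nCAC-GTG\n")
for species, aligned in alignment.items():
    ungapped = aligned.replace("-", "")  # the sequence a PWM would be scanned on
    print(species, ungapped, ungapped_to_column(aligned))
```

Here a residue at ungapped position 3 maps to column 3 in human but to column 4 in mouse. Sites predicted at equivalent positions can therefore land a few columns apart, which is one reason to tolerate small offsets when comparing sites across species.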
The HTML visualization of predicted TFBSs supports JavaScript user interaction similar to that provided by Jalview, a freely downloadable Java alignment editor (19). This interaction is crucial to keep the visualization compact and interpretable. It also makes it easier to spot potentially coincident binding of several TFs, and hence possible coregulation. Files for customized Jalview visualization, suitable for publication, are provided as well. The results also include an overview picture of every promoter alignment. ConTra links to experimentally defined binding sites in the selected promoter region when these are available in ORegAnno (17). Typical ConTra visualization output is shown in Figure 1. More ConTra visualizations of experimentally proven TFBSs are linked from the ConTra documentation page at http://bioit.dmbr.ugent.be/ConTra/contradoc.php#examples. This collection of examples will be expanded continuously.
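ConTra's actual markup is not described. One plausible approach, assumed here, wraps each alignment character in a `span` that carries one CSS class per PWM whose predicted site covers that column. A script can then highlight or hide every site of one PWM by changing a single CSS rule, such as `.tfbs-EBOX { background: yellow; }`. The class naming scheme, the `render_row` function and the `EBOX` identifier are hypothetical.

```python
import html


def render_row(species, aligned, hits):
    """Render one alignment row as HTML, tagging every TFBS column with a per-PWM CSS class.

    hits: list of (pwm_id, first_column, last_column) in alignment coordinates.
    """
    classes = [[] for _ in aligned]
    for pwm_id, first, last in hits:
        for column in range(first, last + 1):
            classes[column].append(f"tfbs-{pwm_id}")
    cells = []
    for residue, names in zip(aligned, classes):
        class_attr = ' class="' + " ".join(names) + '"' if names else ""
        cells.append(f"<span{class_attr}>{html.escape(residue)}</span>")
    row = "".join(cells)
    return f'<div class="row"><b>{html.escape(species)}</b> {row}</div>'


print(render_row("human", "CACG", [("EBOX", 0, 1)]))
```

When sites of two PWMs overlap, the shared columns carry both classes. Coincident binding therefore stays visible while either TF is highlighted.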
Figure 1.

Visualization of the predicted TFBSs for TFs AP-2, CCAAT box, E-BOX and GC box in the multiz 28-way alignment of the promoter of the E-cadherin transcript NM_004360. The results are exactly as described by Comijn et al. (20).

The other part of ConTra, the exploration part, predicts which TFs are most likely to bind to the given promoter sequence(s). The prediction uses a simple and intuitive but effective score. This score takes into account the number of predicted binding sites, the extent of phylogenetic conservation, the distance from the TSS, the proportion of conserved predicted TFBSs and the information content (IC) of the predicting PWM. This likelihood score for promoter regulation is calculated for every PWM from both TransFac and JASPAR (CORE and phyloFACTS). For every promoter sequence, the 100 best-ranked PWMs are listed, and a selection of them can be forwarded directly to the visualization part. Predicting which TFs regulate a gene of interest is an extremely difficult task. The exploration part is therefore mainly intended to indicate which TFs are more likely to bind to the promoter, and thus for which PWMs a visualization of the predicted TFBSs could be worthwhile. In supplemental data document 1 we show that the exploration results appear to be biologically meaningful. The first example is the extensively described promoter of the IL2 gene, encoding interleukin-2. Most experimentally defined TFBSs described in the literature are ranked at the top of the full list returned by the ConTra exploration. The second example uses the promoter of MX1 (myxovirus resistance 1), which has two interferon-stimulated response element (ISRE) sites known to be crucial for its expression. The PWMs corresponding to TFs that bind ISRE sites appear at the top of the resulting list.
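The IC of a PWM measures how sharply it distinguishes its preferred sites from random sequence. The text does not give ConTra's exact IC definition. The minimal sketch below assumes the standard one against a uniform DNA background: each position contributes 2 + Σ p log2 p bits, and the matrix IC is the sum over positions. The function name and the optional pseudocount are illustrative assumptions.

```python
import math


def information_content(counts, pseudocount=0.0):
    """Total information content (bits) of a DNA count matrix, uniform background.

    Per position: IC = 2 + sum_b p_b * log2(p_b); the matrix IC is the sum over positions.
    """
    total = 0.0
    for column in counts:
        n = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        ic = 2.0
        for b in "ACGT":
            p = (column.get(b, 0) + pseudocount) / n
            if p > 0:
                ic += p * math.log2(p)
        total += ic
    return total


strict = [{b: (10 if b == c else 0) for b in "ACGT"} for c in "CACGTG"]
loose = [{"A": 1, "C": 1, "G": 1, "T": 1}] * 6
print(information_content(strict), information_content(loose))  # 12.0 0.0
```

A fully specified 6-mer carries 12 bits, while a matrix of uniform columns carries none. Sites predicted by such a low-IC matrix are close to random matches, which is why they should count for little.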
The third example explores the promoter of the DICER1 gene, for which, as far as we know, no transcriptional regulation experiments have been described in the literature. These results are intriguing because they might be related to recent findings that miRNAs involved in cancer are regulated by TFs already known to play a role in cancerous processes (21). Several other web tools provide information on TFBSs predicted by PWMs (or HMMs) in the context of (multiple) promoter alignments; supplemental data document 2 lists these tools and their features. We believe ConTra compares well with them. It is a compact, user-friendly web tool that provides the biologist at the bench with useful visualization of predicted TFBSs in a cross-species alignment context. The alignments are fetched automatically and contain up to 28 species. ConTra works per alternative promoter and is flexible with respect to promoter length, alignment type and PWM prediction stringency. The up-to-date TransFac and JASPAR PWM libraries are a further advantage.

IMPLEMENTATION

User-friendly input was achieved by integrating resources from HGNC (22), UCSC and Ensembl. Alignment retrieval was implemented with Perl scripts in one of two ways. One uses data from the UCSC ‘golden path’ (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/) together with the program axtAndBed from the UCSC Genome Browser source code. The other uses the Ensembl Compara Perl API. The PWM libraries used by ConTra contain 101 ‘vertebrate’ matrices from the latest JASPAR CORE database, 174 matrices from the JASPAR phyloFACTS database and a nonredundant selection of 214 matrices from a recent TransFac release (11.4). Jalview (19) is used to create an overview picture of each promoter alignment. The dynamic view of predicted TFBSs in the HTML-embedded promoter alignments is achieved by JavaScript changing CSS properties. In the exploration part, the likelihood score for promoter regulation of each PWM is obtained by accumulating the weights of its predicted TFBSs on the reference sequence. The weight of a predicted TFBS depends mainly on the extent of phylogenetic conservation. This is determined by the number of species with a predicted TFBS for the same PWM at about the same position, and by the degree of conservation of that position. This reflects the basic concept behind phylogenetic footprinting: TFBSs conserved across species are more likely to be functional than nonconserved ones. We do not require a TFBS to be conserved at exactly the same position. The score also increases when TFBSs predicted by the same PWM lie near each other, because functional sites are frequently found in homotypic clusters surrounded by weak ‘shadow’ sites (23). The distance of a predicted TFBS from the TSS also influences its weight. This is supported by the finding (16) that functional TFBSs are mainly located within the first 200 nt upstream of the TSS.
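The actual weighting formula is given only in supplemental data document 3. The following hypothetical sketch shows just the two ideas stated in the paragraph: approximate positional conservation and a TSS-distance factor. Every numeric choice is an assumption made for illustration: the ±5-column window, the 1 + (conserved fraction) form and the halving beyond 200 nt. The per-column conservation term and the homotypic-cluster bonus are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    species: str
    column: int       # alignment column of the site's first base
    dist_to_tss: int  # nucleotides upstream of the TSS in that species


def conservation_fraction(ref_hit, hits, species, window=5):
    """Fraction of non-reference species with a site for the same PWM within +/- window columns."""
    others = [s for s in species if s != ref_hit.species]
    supporting = {h.species for h in hits
                  if h.species != ref_hit.species and abs(h.column - ref_hit.column) <= window}
    return len(supporting) / len(others) if others else 0.0


def hit_weight(ref_hit, hits, species, window=5, proximal=200):
    """Hypothetical weight: grows with conservation and is halved outside the proximal promoter."""
    conservation = conservation_fraction(ref_hit, hits, species, window)
    distance_factor = 1.0 if ref_hit.dist_to_tss <= proximal else 0.5
    return (1.0 + conservation) * distance_factor


# Hypothetical sites of one PWM in a three-species alignment.
species = ["human", "mouse", "dog"]
hits = [Hit("human", 10, 50), Hit("mouse", 12, 48), Hit("dog", 40, 80)]
print(hit_weight(hits[0], hits, species))  # mouse supports the site, dog does not: 1.5
```

The window reflects the statement that a TFBS need not be conserved at exactly the same position. Its size is the main knob trading tolerance to alignment errors against spurious support from unrelated nearby sites.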
Poor-quality PWMs, i.e. PWMs that predict many false positives, would otherwise consistently rank high. This is avoided by letting the IC of the predicting PWM influence the weight of each predicted TFBS. For the same reason, the accumulated weight is divided by a factor proportional to the number of nonconserved predicted TFBSs. The scoring formula is given as pseudocode in supplemental data document 3.
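The following is a hypothetical sketch of how the final per-PWM score and the top-100 ranking could fit together. It is not the formula from supplemental data document 3. It assumes that each site weight is scaled by the PWM's IC relative to a chosen maximum. It also assumes that the nonconserved-site penalty is the affine factor 1 + k·n rather than strictly proportional, so that a PWM with no nonconserved sites does not cause a division by zero. The penalty constant `k = 0.1` and the PWM names are placeholders.

```python
def pwm_score(hit_weights, conserved, ic, ic_max, penalty=0.1):
    """Hypothetical likelihood score for one PWM on the reference promoter.

    hit_weights: weights of the PWM's predicted sites on the reference sequence.
    conserved:   parallel list of booleans, True if the site is phylogenetically conserved.
    """
    ic_factor = ic / ic_max  # low-information matrices contribute less per site
    accumulated = sum(w * ic_factor for w in hit_weights)
    n_nonconserved = sum(1 for c in conserved if not c)
    # Affine rather than strictly proportional, so zero nonconserved sites is safe.
    return accumulated / (1.0 + penalty * n_nonconserved)


def rank_pwms(scores, top=100):
    """Return the IDs of the top-scoring PWMs, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:top]


scores = {
    "PWM_A": pwm_score([1.5, 1.0], [True, False], ic=12.0, ic_max=12.0),
    "PWM_B": pwm_score([1.0] * 5, [False] * 5, ic=6.0, ic_max=12.0),
}
print(rank_pwms(scores))  # ['PWM_A', 'PWM_B']
```

Both PWMs accumulate the same raw weight of 2.5. PWM_B ranks lower because its five sites are all nonconserved, which illustrates how the penalty keeps promiscuous matrices from dominating the list.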
REFERENCES (partial list; 23 in total)

W James Kent; Robert Baertsch; Angie Hinrichs; Webb Miller; David Haussler. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A, 2003-09-19.

Mathieu Blanchette; W James Kent; Cathy Riemer; Laura Elnitski; Arian F A Smit; Krishna M Roskin; Robert Baertsch; Kate Rosenbloom; Hiram Clawson; Eric D Green; David Haussler; Webb Miller. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res, 2004-04.

Andrija Tomovic; Edward J Oakeley. Position dependencies in transcription factor binding sites. Bioinformatics, 2007-02-18.

David N Messina; Jarret Glasscock; Warren Gish; Michael Lovett. An ORFeome-based analysis of human transcription factor genes and the construction of a microarray to interrogate their expression. Genome Res, 2004-10.

Elliott H Margulies; Gregory M Cooper; George Asimenos; Daryl J Thomas; Colin N Dewey; Adam Siepel; Ewan Birney; Damian Keefe; Ariel S Schwartz; Minmei Hou; James Taylor; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; James B Brown; Peter Bickel; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Eric A Stone; Kate R Rosenbloom; W James Kent; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Angie Hinrichs; Heather Trumbower; Hiram Clawson; Ann Zweig; Robert M Kuhn; Galt Barber; Rachel Harte; Donna Karolchik; Matthew A Field; Richard A Moore; Carrie A Matthewson; Jacqueline E Schein; Marco A Marra; Stylianos E Antonarakis; Serafim Batzoglou; Nick Goldman; Ross Hardison; David Haussler; Webb Miller; Lior Pachter; Eric D Green; Arend Sidow. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res, 2007-06.

Charletha V Irvin-Wilson; Gautam Chaudhuri. Alternative initiation and splicing in dicer gene expression in human breast cells. Breast Cancer Res, 2005-05-16.

Tina A Eyre; Fabrice Ducluzeau; Tam P Sneddon; Sue Povey; Elspeth A Bruford; Michael J Lush. The HUGO Gene Nomenclature Database, 2006 updates. Nucleic Acids Res, 2006-01-01.

Fei Fang; Mathieu Blanchette. FootPrinter3: phylogenetic footprinting in partially alignable sequences. Nucleic Acids Res, 2006-07-01.

Elodie Portales-Casamar; Stefan Kirov; Jonathan Lim; Stuart Lithwick; Magdalena I Swanson; Amy Ticoll; Jay Snoddy; Wyeth W Wasserman. PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation. Genome Biol, 2007.

Yuval Tabach; Ran Brosh; Yossi Buganim; Anat Reiner; Or Zuk; Assif Yitzhaky; Mark Koudritsky; Varda Rotter; Eytan Domany. Wide-scale analysis of human functional transcription factor binding reveals a strong bias towards the transcription start site. PLoS One, 2007-08-29.