
PromAn: an integrated knowledge-based web server dedicated to promoter analysis.

Aurélie Lardenois, Frédéric Chalmel, Laurent Bianchetti, José-Alain Sahel, Thierry Léveillard, Olivier Poch.

Abstract

PromAn is a modular web-based tool dedicated to promoter analysis that integrates distinct complementary databases, methods and programs. PromAn provides automatic analysis of a genomic region with minimal prior knowledge of the genomic sequence. Prediction programs and experimental databases are combined to locate the transcription start site (TSS) and the promoter region within a large genomic input sequence. Transcription factor binding sites (TFBSs) can be predicted using several public databases and user-defined motifs. In addition, a phylogenetic footprinting strategy, combining multiple alignment of large genomic sequences and assignment of various scores reflecting the evolutionary selection pressure, allows for evaluation and ranking of TFBS predictions. PromAn results can be displayed in an interactive graphical user interface, PromAnGUI. It integrates all of this information to highlight active promoter regions, to identify, among the huge number of TFBS predictions, those most likely to be functional, and to facilitate user-refined analysis. Such an integrative approach is essential in the face of a growing number of tools dedicated to promoter analysis in order to propose hypotheses to direct further experimental validations. PromAn is publicly available at http://bips.u-strasbg.fr/PromAn.

Year:  2006        PMID: 16845074      PMCID: PMC1538850          DOI: 10.1093/nar/gkl193

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The functional genomics revolution has given rise to a huge number of transcriptomics studies. This, combined with the availability of numerous eukaryotic genome sequences, has led to the current challenge of decoding the regulatory networks underlying gene expression. As a consequence, the number of available databases, methods and programs dedicated to promoter analysis has increased rapidly during the past few years. Transcription factor binding sites (TFBSs) are small, degenerate sequences, so a major problem in promoter analysis is the selection of the correct TFBS predictions in the resulting low signal-to-noise ratio environment. Cross-species comparison has been commonly used to filter TFBS predictions and to identify potentially active regulatory elements, as in ConSite (1) and rVISTA 2.0 (2). This approach is based on the assumption that gene regulatory regions and elements are often preferentially conserved during evolution, but it also implies that selective pressure on orthologous genes must be similar in each respective organism. This phylogenetic footprinting (3,4) method is implemented in almost all web servers dedicated to gene regulation studies, e.g. CONREAL (5) and Footer (6). However, the tools differ in their choice of TFBS databases, prediction programs and methods, as well as the genomic sequence alignment and statistical scoring methods implemented to estimate the TFBS conservation pressure. In this context, integrative approaches become essential for in-depth promoter analysis. We thus developed PromAn, a web server aimed at integrating different publicly available databases, programs and methods dedicated to promoter analysis. PromAn integrates transcription start site (TSS) and TFBS databases, prediction programs, a phylogenetic footprinting approach, several statistical scoring methods as well as an interactive graphical user interface, PromAnGUI.
PromAnGUI integrates all of this information and helps the biologist to refine results and further guide gene regulation hypotheses in parallel with experimental data and validations.

METHODS

Input genomic sequences

The PromAn web server requires as input a single genomic sequence, which is taken as the reference in subsequent analysis, and a set of orthologous sequences in FASTA format (Figure 1). PromAn accepts large genomic sequences, which allows the user to begin analysis with minimal prior knowledge of proximal and distal active promoter regions. To limit the processing required while allowing for variation in the TSS position, the user can input genomic sequences of up to 20 kb. According to the NCBI statistics of the human genome, the average sizes of exons and introns are 231 and 5407 bp, respectively. Thus, sequences of up to 20 kb can accommodate 5′ non-coding exons, TSS mis-location and coding exons anchoring the multiple alignment.
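The input requirements above can be checked before submission with a few lines of code. The following is a minimal sketch, not part of PromAn: the function names `parse_fasta` and `check_inputs` are hypothetical, and only the FASTA format, the single reference plus orthologs layout and the 20 kb limit come from the text.

```python
from typing import Dict, Iterable

MAX_LENGTH_BP = 20_000  # upper bound on input length stated in the text


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record name: upper-case sequence}."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first word of the header as the record name.
            name = header[0] if header else f"record_{len(records) + 1}"
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            records[name] += line.upper()
    return records


def check_inputs(records: Dict[str, str], reference: str) -> None:
    """Raise ValueError if the record set does not match the stated input rules."""
    if reference not in records:
        raise ValueError(f"reference sequence {reference!r} not found")
    if len(records) < 2:
        raise ValueError("at least one orthologous sequence is required")
    too_long = [n for n, s in records.items() if len(s) > MAX_LENGTH_BP]
    if too_long:
        raise ValueError(f"sequences longer than {MAX_LENGTH_BP} bp: {too_long}")
```

A typical use would be `check_inputs(parse_fasta(open("input.fa")), "mouse")`, where the file name and the record name are placeholders.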
Figure 1

Flowchart of the PromAn integrated strategy. A single reference sequence and a set of orthologous genomic sequences are required as input. First, dinucleotide distribution, TSS and promoter location as well as TFBS predictions characterize the reference sequence. Next, a phylogenetic footprinting approach is used to determine the evolutionary conservation profile and TFBS evaluation with statistical scores. This integrated strategy is used to validate the TSS location, highlighting potentially active promoter regions and potentially functional TFBSs through the PromAn graphical user interface (GUI).

Reference sequence characterization

In order to locate the promoter region on the reference genomic sequence and validate the TSS location, PromAn integrates experimentally-based databases such as DBTSS (Database of Transcriptional Start Sites) (7) and EPD (Eukaryotic Promoter Database) (8) as well as promoter, first exon or exonic map prediction programs, such as FirstEF (9), Eponine (10) and GenScan (11). TSS location validation is an essential preliminary step in promoter analysis. PromAn determines the nucleotide distribution of the reference sequence to detect the presence of potential GC-rich regions within the given promoter of interest. On the reference sequence, PromAn also looks for TFBS predictions based on the following nucleotide matrices: the public TRANSFAC (12) and JASPAR (13) databases, user-defined motifs and databases dedicated to specific biological problems, such as retina or nuclear receptor databases. The Position Weight Matrix (PWM) scoring method (14,15) has been implemented to locate and score TFBS predictions on the reference promoter region of interest. The profiles are first converted to log-scale PWMs to evaluate candidate TFBS predictions. A normalized matrix score, S, is assigned to each prediction. A threshold is then applied to S to define the predictions that are retained as candidate TFBS predictions.
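The PWM scanning step can be illustrated as follows. This is a sketch under stated assumptions rather than PromAn's implementation: the definition of S is not given in this text, so the code assumes the common min-max normalization S = (score - min) / (max - min); the pseudocount, the uniform background of 0.25, the default threshold and the function names are likewise illustrative. Only the forward strand is scanned.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def counts_to_pwm(counts: List[Dict[str, float]],
                  pseudocount: float = 1.0,
                  background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into a log2-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0.0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0.0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]],
         threshold: float = 0.8) -> List[Tuple[int, str, float]]:
    """Return (start, site, S) for every forward-strand window with S >= threshold.

    S is the raw PWM score rescaled to [0, 1] between the lowest and highest
    scores the matrix can produce (assumed normalization, see lead-in).
    """
    width = len(pwm)
    min_score = sum(min(col.values()) for col in pwm)
    max_score = sum(max(col.values()) for col in pwm)
    span = max_score - min_score
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        raw = sum(col[b] for col, b in zip(pwm, window))
        s = (raw - min_score) / span if span > 0 else 0.0
        if s >= threshold:
            hits.append((start, window, s))
    return hits
```

With this normalization a perfect match to the matrix consensus scores 1 and the worst possible site scores 0, which makes a single threshold comparable across matrices of different widths.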

Phylogenetic footprinting

TFBSs are small (6–20 bp), degenerate sequences. These inherent properties imply that there can be at least one TFBS prediction per base pair of genomic sequence. Given the huge number of predictions, >90% are usually false positives. PromAn uses the multiz (16) multiple sequence alignment program to implement a phylogenetic footprinting method that takes evolutionary selection pressure into account. Orthologous sequence alignment highlights regions conserved during evolution. Several studies have demonstrated that regulatory modules are under positive selection pressure; therefore, regions of high conservation should correspond to potentially active promoter regions. Multiple sequence alignment also provides the basis for statistical scores estimating the significance of predicted TFBSs. As the number of orthologous input sequences is not limited in PromAn, use of at least three sequences provides a more precise multiple alignment that allows for more accurate statistical scores reflecting the conservation of TFBSs during evolution. The evolutionary distances between organisms should be taken into account, as sequence divergence between closely related organisms, such as human and chimpanzee or rat and mouse, is usually insufficient to provide relevant evolutionary conservation information (1). PromAn implements three complementary statistical scores. The conservation score measures the identity of the orthologous sequences with respect to the reference sequence for a given region. It corresponds to the average percentage of nucleotides identical to the reference sequence at a given position in the multiple sequence alignment. The entropy score is based on the ScoreCons program (17). It gives the degree of nucleotide variability to quantify residue conservation in a multiple alignment. The mean distance score is based on the ClustalX conservation profile (18), that is, on the mean pairwise distance between sequences in a continuous sequence space.
The combined scores allow for an evaluation and ranking of TFBS predictions in order to estimate their biological relevance.
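Two of the three scores described above can be sketched on a toy alignment. The conservation score follows the definition given in the text (average identity to the reference per alignment column), expressed here as a fraction between 0 and 1. The entropy function is a simplified Shannon-entropy stand-in and is not the ScoreCons algorithm; treating the gap as a fifth symbol, skipping columns where the reference has a gap, and the function names are all assumptions made for illustration. The mean distance score is not covered.

```python
import math
from collections import Counter
from typing import List


def conservation_score(reference: str, orthologs: List[str],
                       start: int, end: int) -> float:
    """Average, over alignment columns [start, end), of the fraction of
    orthologous sequences identical to the reference at that column."""
    if not orthologs or any(len(s) != len(reference) for s in orthologs):
        raise ValueError("need at least one ortholog, all aligned to equal length")
    per_column = []
    for i in range(start, end):
        ref_base = reference[i]
        if ref_base == "-":
            continue  # assumption: columns with a reference gap are ignored
        identical = sum(1 for seq in orthologs if seq[i] == ref_base)
        per_column.append(identical / len(orthologs))
    return sum(per_column) / len(per_column) if per_column else 0.0


def entropy_score(column: str) -> float:
    """1 minus the normalized Shannon entropy of one alignment column.

    Returns 1.0 for an invariant column; lower values mean more variability.
    Normalized by log2(5) because A, C, G, T and '-' are counted as symbols.
    """
    n = len(column)
    if n == 0:
        raise ValueError("empty column")
    counts = Counter(column.upper())
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / math.log2(5)
```

To score a predicted TFBS, `conservation_score` would be called with the alignment coordinates of the site, and `entropy_score` averaged over the same columns.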

PromAn results display and analysis

PromAn results are sent to the user by e-mail. PromAnGUI gives the user the possibility to visualize and refine results as often as needed in parallel with expert biological knowledge and experimental validations, which are indispensable to complete and further guide gene regulation hypotheses. A help section for the PromAnGUI graphical user interface is available on the PromAn home page. As an example, we consider the analysis of the mouse rhodopsin promoter region. cis-regulatory elements of the bovine rhodopsin proximal promoter region (RPPR) have been characterized in several studies (19,20). PromAn allows us to determine whether these elements are retrieved in the mouse. In other words, are these biologically active regions conserved during evolution? Input genomic sequences were extracted with the HomoloGene Downloader tool and the UCSC Genome Browser web server. The mouse sequence and a set of five orthologous sequences (human, cow, dog, chicken and Xenopus tropicalis), each extracted from −10 kb to +10 kb with respect to the start codon, were used as input in this PromAn analysis. Figure 2 illustrates the display of this analysis in the PromAnGUI. Results are always located with respect to the reference sequence (mouse), which is depicted as two red boxes surrounding different profiles in the upper frame. Both AT (green) and GC (red) profiles can be depicted to easily and immediately identify regions enriched in these dinucleotides. The DBTSS database allows us to locate an experimentally validated TSS at position 9903 of the mouse reference sequence. The dinucleotide profiles in Figure 2 show that the RPPR contains a GC-rich region around the TSS. The blue curve represents the conservation profile of the reference sequence based on a multiple alignment of the orthologous genomic sequences. The higher peaks correspond to coding exons present in the mouse sequence and conserved in the five other orthologs.
The DBTSS, Eponine, GenScan and TFBS (user-defined TATA and CCAAT boxes, TRANSFAC, JASPAR and retina dedicated databases) predictions are depicted in the lower frames by boxes colored according to the mean distance score [gradient from low (grey) to high (red)]. The conservation profile shows a proximal region about 1 kb long that is conserved in all four mammals. Within this region, a small region (from 9677 to 9916, depicted in orange) has been described as being responsible for the photoreceptor specificity and is named the RPPR.
Figure 2

Example of a PromAn output. The upper frame displays the mouse rhodopsin genomic region (extracted from ±10 kb with respect to the start codon—red boxes), surrounding dinucleotide (AT in green and GC in red) and conservation (blue) profiles. The orange rectangle highlights the RPPR that is described in Figure 3. The lower frames depict the DBTSS, Eponine, GenScan, user-defined motifs (Match_Pattern), TRANSFAC (Match_TRANSFAC_MinSumGood), JASPAR (Match_JASPAR_CORE) and retina dedicated (Match_BIBLIO_Retina) predictions. Each prediction is displayed as a colored [gradient from low (grey) to high (red) Mean Distance score] box where the outline indicates the strand (blue for minus and red for plus).
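The AT and GC profiles shown in the upper frame can be approximated with a sliding window. The sketch below assumes that each profile is the fraction of bases in a window that belong to the given pair (G or C, A or T); the text does not specify how PromAn computes these profiles, and the window size, step and function name are hypothetical.

```python
from typing import List, Tuple


def composition_profile(sequence: str, bases: str = "GC",
                        window: int = 100, step: int = 10) -> List[Tuple[int, float]]:
    """Return (window start, fraction of window made of `bases`) along the sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = sequence.upper()
    wanted = set(bases.upper())
    profile = []
    # The last window starts where a full window still fits; a sequence
    # shorter than one window yields a single, shorter window.
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        if not chunk:
            break
        fraction = sum(1 for b in chunk if b in wanted) / len(chunk)
        profile.append((start, fraction))
    return profile
```

Calling the function once with `bases="GC"` and once with `bases="AT"` gives the two curves; a GC-rich region around a TSS would appear as a local rise of the first curve.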

Figure 3 presents a zoom-in on this conserved RPPR region. The full genomic reference sequence is depicted above the profiles and the zoomed-in region is displayed in red below the profiles. PWM and conservation score cut-offs of 0.6 were used to select TFBS predictions. These results highlight the conservation of the Ret-1/PCE-1, BAT-1, NRE and Ret-4 elements among mammals. Many Eopsin-1 binding sites are predicted because the motifs are only 6 bp long.
Figure 3

TFBS analysis of the RPPR responsible for photoreceptor specificity. User-defined motifs, TRANSFAC, JASPAR and retina specific predictions are depicted. They highlight the conservation of the Ret-1/PCE-1, BAT-1, NRE and Ret-4 elements among mammals.
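The selection step used for Figure 3 (PWM and conservation score cut-offs of 0.6 within the RPPR, positions 9677 to 9916) amounts to a filter followed by a ranking. The sketch below is illustrative only: the `Prediction` record, the ranking key and the function name are hypothetical and do not describe PromAnGUI internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    motif: str           # name of the matrix or user-defined motif
    start: int           # first position on the reference sequence
    end: int             # last position on the reference sequence
    pwm_score: float     # normalized matrix score, 0..1
    conservation: float  # conservation score as a fraction, 0..1


def select_predictions(predictions: List[Prediction],
                       region_start: int, region_end: int,
                       pwm_cutoff: float = 0.6,
                       conservation_cutoff: float = 0.6) -> List[Prediction]:
    """Keep predictions lying inside the region that pass both cut-offs,
    ranked by conservation first and matrix score second (assumed ordering)."""
    kept = [
        p for p in predictions
        if p.start >= region_start and p.end <= region_end
        and p.pwm_score >= pwm_cutoff
        and p.conservation >= conservation_cutoff
    ]
    return sorted(kept, key=lambda p: (p.conservation, p.pwm_score), reverse=True)


if __name__ == "__main__":
    # Placeholder values for illustration; these are not PromAn results.
    example = [
        Prediction("motif_A", 9700, 9710, 0.90, 0.75),
        Prediction("motif_B", 9800, 9805, 0.65, 0.40),
        Prediction("motif_C", 9850, 9862, 0.70, 0.95),
    ]
    for p in select_predictions(example, 9677, 9916):
        print(p.motif, p.pwm_score, p.conservation)
```

Short motifs, such as the 6 bp sites mentioned above, tend to pass the matrix cut-off often, which is why the conservation filter carries most of the weight in this kind of selection.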

PromAn program implementation and future improvements

The PromAn web server and PromAnGUI visualization tool are written in Tcl/Tk 8.4, HTML and JavaScript. Both tools are modular and organized to allow for easy upgrades through the addition of supplementary genomic sequence alignment tools, promoter or TFBS databases and prediction programs. We are currently integrating additional multiple alignment programs dedicated to large genomic sequences, such as TBA (16) and Multi-LAGAN (21). A future version of PromAn will include an option allowing the user to predict TFBSs on orthologous sequences. PromAn will thus integrate statistics similar to the ones available in Footer (6). Identification of cis-regulatory modules will be implemented in a future version of the graphical user interface. PromAnGUI will also integrate tissue-specific, transcriptomic and interactomic data relative to transcription factors to add new filtering dimensions aimed at improving TFBS predictions. Thus, the user will be able to select regulatory motifs or modules according to co-expressed transcription factors interacting together.

CONCLUSION

The PromAn web server provides a number of advantages over many existing systems for promoter analysis. First, minimal prior knowledge of the genomic region of interest is necessary. Second, PromAn provides the possibility of performing multiple alignment using more than two orthologous sequences, allowing refinement of the evolutionarily conserved regions. Third, the PromAnGUI graphical user interface is a powerful tool for integrating and visualizing results and for filtering out false positive TFBS predictions using matrix scores, conservation scores and the biological knowledge of the user. Therefore, PromAn facilitates the construction of hypotheses about potential regulatory regions and elements in order to direct further experimental validations.
  21 in total

Review 1.  DNA binding sites: representation and discovery.

Authors:  G D Stormo
Journal:  Bioinformatics       Date:  2000-01       Impact factor: 6.937

Review 2.  Scoring residue conservation.

Authors:  William S J Valdar
Journal:  Proteins       Date:  2002-08-01

3.  LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA.

Authors:  Michael Brudno; Chuong B Do; Gregory M Cooper; Michael F Kim; Eugene Davydov; Eric D Green; Arend Sidow; Serafim Batzoglou
Journal:  Genome Res       Date:  2003-03-12       Impact factor: 9.043

4.  ConSite: web-based prediction of regulatory elements using cross-species comparison.

Authors:  Albin Sandelin; Wyeth W Wasserman; Boris Lenhard
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

5.  rVISTA 2.0: evolutionary analysis of transcription factor binding sites.

Authors:  Gabriela G Loots; Ivan Ovcharenko
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

6.  Aligning multiple genomic sequences with the threaded blockset aligner.

Authors:  Mathieu Blanchette; W James Kent; Cathy Riemer; Laura Elnitski; Arian F A Smit; Krishna M Roskin; Robert Baertsch; Kate Rosenbloom; Hiram Clawson; Eric D Green; David Haussler; Webb Miller
Journal:  Genome Res       Date:  2004-04       Impact factor: 9.043

7.  Computational identification of promoters and first exons in the human genome.

Authors:  R V Davuluri; I Grosse; M Q Zhang
Journal:  Nat Genet       Date:  2001-12       Impact factor: 38.330

8.  Human-mouse genome comparisons to locate regulatory sites.

Authors:  W W Wasserman; M Palumbo; W Thompson; J W Fickett; C E Lawrence
Journal:  Nat Genet       Date:  2000-10       Impact factor: 38.330

9.  Computational detection and location of transcription start sites in mammalian genomic DNA.

Authors:  Thomas A Down; Tim J P Hubbard
Journal:  Genome Res       Date:  2002-03       Impact factor: 9.043

10.  A new generation of JASPAR, the open-access repository for transcription factor binding site profiles.

Authors:  Dominique Vlieghe; Albin Sandelin; Pieter J De Bleser; Kris Vleminckx; Wyeth W Wasserman; Frans van Roy; Boris Lenhard
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

