
ConTra v3: a tool to identify transcription factor binding sites across species, update 2017.

Lukasz Kreft, Arne Soete, Paco Hulpiau, Alexander Botzki, Yvan Saeys, Pieter De Bleser.

Abstract

Transcription factors are important gene regulators with distinctive roles in development, cell signaling and cell cycling, and they have been associated with many diseases. The ConTra v3 web server allows easy visualization and exploration of predicted transcription factor binding sites (TFBSs) in any genomic region surrounding coding or non-coding genes. In this updated version, with a user interface completely re-implemented using the latest web technologies, users can choose from nine reference organisms ranging from human to yeast. ConTra v3 can analyze promoter regions, 5΄-UTRs, 3΄-UTRs, introns or any other genomic region of interest. Thousands of position weight matrices are available to choose from for detecting specific binding sites. In addition to this visualization option, new exploration functionality automatically detects TFBSs that simultaneously have the highest regulatory potential, the highest conservation scores of the genomic regions covered by the predicted TFBSs and the strongest co-localization with genomic regions exhibiting regulatory activity. The ConTra v3 web server is freely available at http://bioit2.irc.ugent.be/contra/v3.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2017        PMID: 28472390      PMCID: PMC5570180          DOI: 10.1093/nar/gkx376

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Eukaryotic gene expression is transcriptionally regulated by the coordinated interaction of transcription factors (TFs) with arrays of transcription factor binding sites (TFBSs) (1,2), also known as cis-regulatory modules, and with each other (3). Knowing which TFs regulate a gene is essential for reconstructing and modeling the transcriptional regulatory networks that govern biological processes such as the cell cycle or differentiation. Traditionally, regulation of genes by TFs is predicted by scanning promoter regions with position weight matrices (PWMs) of known TFs, retaining putative binding sites that score higher than an arbitrarily chosen cut-off for a given PWM. The results, however, include a large number of false positives due to the short (6–15 nucleotides) and degenerate nature of TFBSs. Phylogenetic footprinting is commonly and successfully used in combination with the PWM model to reduce its rate of false positive predictions. The main difficulty in this approach is obtaining correct alignments of regulatory elements in promoter regions that might have diverged during evolution (4). Taking into consideration that conservation of a TFBS among several species in a multiple alignment is neither proof of nor a requirement for functionality, the ConTra series of tools (5,6) has been designed to properly display predicted TFBSs in several possible alignments, aiming to help the biologist seeking to generate or support a hypothesis.

In this update, we describe the new features and expansions of the ConTra v3 web server. The ConTra v3 frontend has been completely re-implemented using the latest web technologies to meet the required level of interactivity and user involvement. New features include a new layout, a simpler submission form, an on-screen guide and a dynamic TFBS viewer. The simplified design of the website layout facilitates user interaction and puts the main focus on the information provided. Its responsive design allows users of devices with different screen sizes to use the service without trouble. The form itself was simplified both visually and practically, giving the user a better understanding of the required data and a clearer overview of the provided input. With the help of the on-screen interactive guide, the user is navigated step by step through the form submission process and is provided with sample data. Furthermore, the results page now contains not only static TFBS visualization images but also a dynamic TFBS viewer, where the user can select TFs and zoom in on the identified binding sites.

With respect to the backend, we updated the PWM libraries to more recent versions, including the TRANSFAC database (update 2011.3) (7), the JASPAR core database (update 2016) (8), the cisBP Homo sapiens database (9) and the Taipale motifs collection for visualization (10). PWM libraries that were seldom used according to our web logs, such as the phyloFACTS database (11) and a collection of homeodomain PWMs derived from a protein binding microarray (12), have been removed.

The other part of ConTra v3, the exploration part, predicts which TFs are most likely to bind to a given genomic region. In the previous versions of ConTra (5,6), the likelihood score for regulation of a gene by a TF, represented by its PWM, was obtained by accumulating the weights of the predicted TFBSs on the reference sequence. The weight of a predicted TFBS was determined by the number of species with a predicted TFBS for the same PWM at about the same position and by the extent of conservation of that position. The major drawback of the original implementations of the exploration part was the duration of the calculations involved: it could take from hours to days before results were obtained. As a consequence, this feature was not often used. Therefore, the exploration part was completely revised.

In ConTra v3, PWM-predicted TFBSs are ranked based on regulatory potential (13), conservation score (14) and the degree of overlap with genomic regions of experimentally validated TF binding obtained from the comprehensive list of TF Chromatin Immunoprecipitation Sequencing (ChIP-Seq) data released by the ReMap project (15). An overall rank for each PWM is calculated using rank product statistical analysis (16). The rank is based on aggregation of the ranked lists that score the PWM-based TFBS predictions on regulatory potential, degree of conservation and degree of overlap with genomic regions with demonstrated regulatory activity, respectively. A selection of up to 20 of these ranked PWMs can then be used as input for visualization analysis. The duration of the calculations is reduced to minutes, making this feature far more practical and useful.
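
To make the PWM scanning approach mentioned at the start of this introduction concrete, the following Python sketch scores every window of a sequence against a log-odds matrix and keeps the windows that reach a cut-off. The count matrix, pseudocount, uniform background, sequence and cut-off values are all hypothetical choices made for illustration; this is not the scanning code used by ConTra v3.

```python
import math

# Hypothetical count matrix for a 4-nucleotide motif (one dict per position).
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 9, "T": 1},
    {"A": 1, "C": 8, "G": 0, "T": 1},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def to_log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert counts to log2 odds against a uniform background (assumed)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, cutoff):
    """Return (start, window, score) for every window scoring >= cutoff."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= cutoff:
            hits.append((start, window, score))
    return hits


PWM = to_log_odds(COUNTS)
SEQUENCE = "TTAGCTAAGCAGCT"  # hypothetical promoter fragment
STRICT_HITS = scan(SEQUENCE, PWM, cutoff=4.0)
LOOSE_HITS = scan(SEQUENCE, PWM, cutoff=2.0)
```

With the stricter cut-off only the two exact matches to the consensus are retained, whereas the lower cut-off also admits a weaker third window. This illustrates how an arbitrarily chosen cut-off drives the number of putative hits.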

INPUT AND OUTPUT

Input

A typical ConTra v3 analysis consists of four steps. First, users have to choose whether they want to visualize or explore a gene of interest. For visualization, it is also necessary to indicate the reference species and the gene of interest. The second step lists a group of available transcripts for genes matching the search terms, from which one can be selected. For every gene, all possible RefSeq and Ensembl transcript variants are listed with a link to the genomic location in the respective genome browser. This way, genes with alternative promoters, UTRs or alternative intronic regions can be analyzed for regulatory differences. In step 3, different genomic regions of the selected transcript can be chosen (upstream, introns, 5΄-UTR and 3΄-UTR). The final step offers users an extensive choice of PWM motifs: up to 20 PWM motifs can be simultaneously taken into account for analysis. For exploration, one chooses the gene, the transcript, the region of interest and launches the exploration analysis.

Output

For the visualization part, results are split into alignment blocks allowing evaluation of the degree of binding site conservation. In the exploration part, a list of PWMs is given, ranked on the highest regulatory potential, highest conservation scores of the genomic regions covered by their predicted TFBSs and overlap with genomic regions with demonstrated regulatory activity. A selection of these high-scoring PWMs can then be used as input for visualization analysis.

Example

The cytokine interleukin-2 (IL2) is an important signaling protein in the human immune system. Regulation of the IL2 gene has been widely studied (17–19). We validated the ConTra v3 exploration mode by analyzing the IL2 promoter and comparing the results with known regulators from the literature. In the first step, we selected exploration of the IL2 gene with Homo sapiens as the reference species. In the next steps, we chose to analyze the promoter region (500 bp upstream) of the RefSeq transcript NM_000586. Filtering the predicted TFBSs with a q-value of 0.1 and a PWM information content of at least 5 bits retrieves a list of 10 putative conserved binding sites, at least half of which have experimental support. We selected NF-AT (V$NFAT_Q4_01), ELF1 (V$ELF1_Q6) and OCT (V$OCT_Q6) for visualization with a core and similarity stringency of 0.90 and 0.75, respectively. The results are shown in Figure 1. ConTra v3 successfully predicts the two known regulatory elements consisting of an Octamer (OCT), NF-AT and E26 transformation-specific (ETS) binding site and suggests the presence of a similar, third conserved module further upstream (Figure 1A and B). FASTA and feature color files, available for each alignment block, can be used to produce high-quality figures with Jalview, as shown in Figure 1C. The UCSC link on the result page in ConTra v3 maps the detected TFBSs in the UCSC genome browser (Figure 1D). Also shown are additional ENCODE regulation tracks illustrating the co-localization of the third module with the presence of H3K4Me1 marks and open chromatin.
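
The information content filter used in this example can be illustrated with a short Python sketch. It assumes the common definition of per-column information content against a uniform background (2 bits minus the Shannon entropy of the column), summed over all columns; the text does not state which definition ConTra v3 applies, and the two matrices below are hypothetical.

```python
import math


def information_content(columns):
    """Total information content in bits, assuming a uniform background.

    Each column is a list of four base probabilities (A, C, G, T).
    """
    total = 0.0
    for column in columns:
        entropy = -sum(p * math.log2(p) for p in column if p > 0)
        total += 2.0 - entropy
    return total


# Hypothetical probability matrices, for illustration only.
MOTIFS = {
    "specific": [[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]],
    "degenerate": [[0.25, 0.25, 0.25, 0.25],
                   [0.25, 0.25, 0.25, 0.25],
                   [0.25, 0.25, 0.25, 0.25],
                   [0.5, 0.5, 0.0, 0.0]],
}

MIN_BITS = 5.0  # threshold taken from the example above
KEPT = [name for name, matrix in MOTIFS.items()
        if information_content(matrix) >= MIN_BITS]
```

Under this definition, a matrix with many uninformative positions falls below the 5-bit threshold even if it is longer than a short but highly specific matrix.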
Figure 1.

Analysis of the human IL2 promoter with ConTra v3 in exploration mode followed by visualization of a selection of the top-scoring results. (A) Overview of conserved binding sites for OCT, NF-AT and ETS/ELF transcription factors (TFs) in the promoter region 500 bases upstream of the human interleukin-2 (IL2) gene. Gray boxes show repeats of regulatory regions. Regions 1 and 2 are experimentally supported (18,19). (B) Visualization of conserved TFBSs across species. A user can choose to show or hide each species and TFBS individually. The alignment region can be zoomed in and out (top) and sites can be inspected at base level (bottom). (C) Alignment blocks can be downloaded as a FASTA file with a corresponding feature color file to produce figures in several output formats using Jalview. (D) The detected binding sites can also be inspected in their genomic context in the UCSC genome browser. Also shown are additional ENCODE regulation tracks illustrating the co-localization of the third module with the presence of H3K4Me1 marks and open chromatin.

The online Supplementary Data S2 and S3 contain two case studies that explain step by step how to run an exploration and/or visualization analysis, including information on how users may change parameters and criteria according to their needs.

TECHNICAL DETAILS

Web tool

The web tool is hosted on a Linux CentOS 6.6 server with 16 GB of RAM, an Apache/2.2.15 web server and PHP 5.4.16. ConTra v3 was implemented using the AngularJS engine, the Bootstrap framework and the Bootstrap Material stylesheets. MySQL was chosen as the database storage engine. The on-screen guide uses Intro.js, whereas the dynamic TFBS viewer is rendered as SVG. To track user activity, Google Analytics was connected to all of the web pages. Each submitted job is queued on a beanstalkd queue (version 1.9.2). Workers, written in Perl (v5.10.1, x86_64-linux-thread-multi), take jobs from this queue and process them.
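
The queue-and-worker arrangement described above can be sketched in a few lines of Python. This is only an illustration of the pattern: it uses the standard library's in-process queue and threads as a stand-in for beanstalkd and the Perl workers, and the job fields and the describe handler are hypothetical.

```python
import queue
import threading


def run_jobs(jobs, handler, n_workers=2):
    """Queue the jobs, let worker threads process them, return the results."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is None:  # sentinel: no more work for this worker
                pending.task_done()
                return
            outcome = handler(job)
            with lock:
                results[job["id"]] = outcome
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for job in jobs:
        pending.put(job)
    for _ in threads:
        pending.put(None)  # one sentinel per worker
    pending.join()
    for thread in threads:
        thread.join()
    return results


def describe(job):
    """Hypothetical handler standing in for the real analysis."""
    return f"{job['mode']}:{job['gene']}"


JOBS = [
    {"id": 1, "gene": "IL2", "mode": "explore"},
    {"id": 2, "gene": "IL2", "mode": "visualize"},
]
RESULTS = run_jobs(JOBS, describe)
```

Decoupling submission from processing in this way lets the web frontend return immediately while long-running analyses are handled in the background.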

Backend

The backend of ConTra v3 is programmed in a combination of Perl and R (http://www.r-project.org). The visualization part of ConTra v3 relies on the same algorithms implemented in ConTra v2 but uses several updated PWM libraries and multi-species multiple sequence alignments. Furthermore, the framework has been adapted to make inclusion of new PWM collections easier. Users are encouraged to suggest new PWM collections useful for their research.

In previous versions, the exploration part was slow. Therefore, it has been revised completely to make this feature much faster and hence more useful. One caveat remains: it is extremely difficult to predict which TFs regulate a gene of interest. The exploration part is primarily intended to give an idea of which TFs are more likely to bind to the genomic region of interest and to point to PWMs for which visualization of the predicted TFBSs could be interesting. If TFBS prediction is the primary concern, we direct the user to our PhysBinder web application (http://bioit.dmbr.ugent.be/physbinder) (20), which is likely to produce more reliable predictions.

For exploration, one chooses the gene, the transcript and the region of interest and launches the exploration analysis. First, using the FIMO (21) application (default P-value cut-off: 0.0001) and the combined PWM libraries, TFBS predictions are made for every PWM. Next, the PWMs are ranked independently based on the cumulative regulatory potential scores of their TFBS predictions (13), the cumulative mean conservation scores of the genomic regions covered by the TFBS predictions (14) and the cumulative regulatory activity scores as a measure of the degree of overlap of the TFBS predictions with genomic regions of experimentally validated TF binding contained in the ReMap TF ChIP-Seq dataset (15).

The concept of the regulatory potential of a TF for a target gene was introduced by Tang et al. (13) to model the influence of each binding site on gene regulation as a function that decreases monotonically with increasing distance from the transcription start site (TSS) of the gene. Regulatory potential considers both the number of binding sites and their distances to the reported TSS of the putative target gene.

Mean conservation scores of the genomic regions covered by the TFBS predictions are obtained using the bigWigSummary tool from the UCSC genome browser with the phastConsElements100way table of the UCSC Genome Browser (http://genome.ucsc.edu) database. This table contains information about conserved elements identified by phastCons (14), a hidden Markov model-based method that estimates the probability that each nucleotide belongs to a conserved element, based on the multiple alignment. As it considers not only each individual alignment column but also its flanking columns, phastCons is effective for identifying conserved elements.

Finally, the regulatory activity scores of the predicted TFBSs are calculated by counting the number of times they intersect with genomic regions of experimentally validated TF binding contained in the ReMap TF ChIP-Seq dataset (15). Rank product analysis (16) is used to select PWMs whose TFBS predictions simultaneously exhibit (i) the highest regulatory potential, (ii) the strongest conservation and (iii) the best overlap with genomic regions with demonstrated regulatory activity. Using the exploration analysis of the human IL2 promoter region as an example, we provide an extensive description of how to use and interpret regulatory potential, conservation score and regulatory activity score rankings in Supplementary Data S1.
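
The ranking scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ConTra v3 implementation: the exponential decay and its 1 kb scale stand in for the monotonically decreasing function of Tang et al., the rank product is computed as a plain geometric mean of ranks without the associated significance estimates, and all coordinates, peaks and conservation values are hypothetical.

```python
import math


def regulatory_potential(sites, tss, scale=1000.0):
    """Sum of per-site contributions that decay with distance to the TSS.

    The exponential form and the 1 kb scale are assumptions for illustration.
    """
    return sum(math.exp(-abs((start + end) / 2 - tss) / scale)
               for start, end in sites)


def overlap_count(sites, peaks):
    """Count (site, peak) pairs whose half-open intervals intersect."""
    return sum(1 for s_start, s_end in sites
               for p_start, p_end in peaks
               if s_start < p_end and p_start < s_end)


def rank_descending(scores):
    """Map each name to its rank (1 = highest score); ties keep input order."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def rank_product(rank_tables):
    """Geometric mean of each name's ranks across the rank tables."""
    k = len(rank_tables)
    return {name: math.prod(table[name] for table in rank_tables) ** (1.0 / k)
            for name in rank_tables[0]}


# Hypothetical inputs: predicted sites per PWM, ChIP-Seq peaks, and
# cumulative mean conservation scores (assumed to be precomputed).
TSS = 1000
PREDICTED_SITES = {
    "PWM_A": [(940, 950), (870, 880)],
    "PWM_B": [(600, 612)],
    "PWM_C": [(100, 108), (50, 58), (20, 28)],
}
CHIP_PEAKS = [(590, 700), (595, 650), (598, 640), (860, 960)]
CONSERVATION = {"PWM_A": 1.6, "PWM_B": 0.9, "PWM_C": 0.3}

POTENTIAL = {name: regulatory_potential(sites, TSS)
             for name, sites in PREDICTED_SITES.items()}
ACTIVITY = {name: overlap_count(sites, CHIP_PEAKS)
            for name, sites in PREDICTED_SITES.items()}

COMBINED = rank_product([
    rank_descending(POTENTIAL),
    rank_descending(CONSERVATION),
    rank_descending(ACTIVITY),
])
RANKING = sorted(COMBINED, key=COMBINED.get)  # best (smallest product) first
```

In this toy example, PWM_A ranks first overall because it is at or near the top of all three lists, even though PWM_B has the larger number of ChIP-Seq overlaps.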
  21 in total

Review 1.  The architecture of the interleukin-2 promoter: a reflection of T lymphocyte activation.

Authors:  E Serfling; A Avots; M Neumann
Journal:  Biochim Biophys Acta       Date:  1995-09-19

Review 2.  Identification of altered cis-regulatory elements in human disease.

Authors:  Anthony Mathelier; Wenqiang Shi; Wyeth W Wasserman
Journal:  Trends Genet       Date:  2015-01-27       Impact factor: 11.639

3.  DNA-binding specificities of human transcription factors.

Authors:  Arttu Jolma; Jian Yan; Thomas Whitington; Jarkko Toivonen; Kazuhiro R Nitta; Pasi Rastas; Ekaterina Morgunova; Martin Enge; Mikko Taipale; Gonghong Wei; Kimmo Palin; Juan M Vaquerizas; Renaud Vincentelli; Nicholas M Luscombe; Timothy R Hughes; Patrick Lemaire; Esko Ukkonen; Teemu Kivioja; Jussi Taipale
Journal:  Cell       Date:  2013-01-17       Impact factor: 41.582

4.  Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals.

Authors:  Xiaohui Xie; Jun Lu; E J Kulbokas; Todd R Golub; Vamsi Mootha; Kerstin Lindblad-Toh; Eric S Lander; Manolis Kellis
Journal:  Nature       Date:  2005-02-27       Impact factor: 49.962

5.  Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins.

Authors:  Matthew Slattery; Todd Riley; Peng Liu; Namiko Abe; Pilar Gomez-Alcala; Iris Dror; Tianyin Zhou; Remo Rohs; Barry Honig; Harmen J Bussemaker; Richard S Mann
Journal:  Cell       Date:  2011-12-09       Impact factor: 41.582

6.  Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes.

Authors:  Adam Siepel; Gill Bejerano; Jakob S Pedersen; Angie S Hinrichs; Minmei Hou; Kate Rosenbloom; Hiram Clawson; John Spieth; Ladeana W Hillier; Stephen Richards; George M Weinstock; Richard K Wilson; Richard A Gibbs; W James Kent; Webb Miller; David Haussler
Journal:  Genome Res       Date:  2005-07-15       Impact factor: 9.043

7.  Identification of a putative regulator of early T cell activation genes.

Authors:  J P Shaw; P J Utz; D B Durand; J J Toole; E A Emmel; G R Crabtree
Journal:  Science       Date:  1988-07-08       Impact factor: 47.728

8.  TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes.

Authors:  V Matys; O V Kel-Margoulis; E Fricke; I Liebich; S Land; A Barre-Dirrie; I Reuter; D Chekmenev; M Krull; K Hornischer; N Voss; P Stegmaier; B Lewicki-Potapov; H Saxel; A E Kel; E Wingender
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

9.  FootPrinter3: phylogenetic footprinting in partially alignable sequences.

Authors:  Fei Fang; Mathieu Blanchette
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

10.  A fast algorithm for determining bounds and accurate approximate p-values of the rank product statistic for replicate experiments.

Authors:  Tom Heskes; Rob Eisinga; Rainer Breitling
Journal:  BMC Bioinformatics       Date:  2014-11-21       Impact factor: 3.169
