ConTra v2: a tool to identify transcription factor binding sites across species, update 2011.

Stefan Broos, Paco Hulpiau, Jeroen Galle, Bart Hooghe, Frans Van Roy, Pieter De Bleser.

Abstract

Transcription factors are important gene regulators with distinctive roles in development, cell signaling and cell cycling, and they have been associated with many diseases. The ConTra v2 web server allows easy visualization and exploration of predicted transcription factor binding sites in any genomic region surrounding coding or non-coding genes. In this new version, users can choose from nine reference organisms ranging from human to yeast. ConTra v2 can analyze promoter regions, 5'-UTRs, 3'-UTRs and introns or any other genomic region of interest. Hundreds of position weight matrices are available to choose from, but the user can also upload any other matrices for detecting specific binding sites. A typical analysis is run in four simple steps of choosing the gene, the transcript, the region of interest and then selecting one or more transcription factor binding sites. The ConTra v2 web server is freely available at http://bioit.dmbr.ugent.be/contrav2/index.php.

Year:  2011        PMID: 21576231      PMCID: PMC3125763          DOI: 10.1093/nar/gkr355

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Both transcription factors (TFs) and microRNAs (miRNAs) are key players in gene regulation in multicellular organisms (1). Based on pairing between miRNAs and mRNAs, miRNA targets are predicted by searching for matches with the miRNA seed regions (2). On the other hand, the use of a position weight matrix (PWM) is the leading model for detection of TF binding sites (TFBSs). A PWM represents the sequence motif and depicts the DNA binding preferences of the TF. It is constructed using a set of known binding sequences. Traditionally, regulation of genes by TFs is predicted by analyzing promoter regions and determined experimentally by DNase footprinting assays or electrophoretic mobility shift assays (EMSA). Nowadays, functional protein–DNA binding sites are increasingly studied on a genomic scale by using ChIP-seq. These studies indicate that only some of the functional TFBSs are located in promoter regions; introns and untranslated regions (UTRs) also contain a substantial number of functional sites (3–5). For example, regulatory sites in the first intron might interact with sites in the promoter region due to DNA looping (6,7). Of the estimated 2000 human TFs, ∼300 are thought to bind to the core promoter and to play a role in the general transcription machinery, whereas the rest bind more specifically and regulate a fraction of genes (8). The latter TFs are expressed in almost all tissues or only in a few tissues, depending on whether their function is broad or more specific. Over half of the human genes are believed to have alternative promoters (9) and consequently one should investigate the promoters, UTRs and intronic regions of each individual transcript. In this update, we describe the new features and expansions of the ConTra web server. With this tool, TF binding sites can be detected and visualized in any genomic region of the known transcripts of a gene of interest.
Starting from one of nine reference organisms, a scientist can easily investigate regulation at the transcription level using the latest UCSC multiz alignments, which are accessible through the ConTra interface. Alternatively, sequence files and PWMs can be uploaded for analysis of the user's own data. Similar web tools with their pros and cons compared to ConTra v2 are listed in Supplementary Table S1.
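
As an illustration of the PWM model described above, the following Python sketch builds a log-odds matrix from a small set of binding sequences and scans a sequence for the best-scoring window. The example sites, the pseudocount, the uniform background and the function names are assumptions made for illustration only; this is generic log-odds scoring and not necessarily the scoring scheme used by ConTra v2.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all binding sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [s[pos].upper() for s in sites]
        # Log2 ratio of the smoothed base frequency to the background frequency.
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window.upper()))

def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width].upper()
        if set(window) <= set(BASES):  # skip windows with N or gap characters
            hits.append((score_window(pwm, window), i))
    if not hits:
        raise ValueError("no scorable window found")
    score, start = max(hits)
    return start, score

# Made-up binding sites, for illustration only.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
pwm = build_pwm(sites)
start, score = best_hit(pwm, "CCGGTGACTCAGGCC")
```

In practice a score threshold would be applied to every window rather than reporting only the best one, so that multiple candidate sites per region can be returned.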

NEW FEATURES

The first version of ConTra provided users with a flexible way to analyze promoter alignments (10). Users were able to visualize or explore TFBSs in the promoter region of a gene of interest. PWM libraries from the JASPAR CORE database and TRANSFAC database were used to identify TFBSs in a multi-species alignment with human as reference species. Even though the human genome is one of the most widely used reference genomes, the lack of other reference species and alignments was regarded as one of the most important shortcomings in the first version of ConTra. Furthermore, only the promoter region could be analyzed for TFBSs. The 2011 update of ConTra adds the following features.

In addition to the promoter region, users can now look for TFBSs in 5′-UTR, 3′-UTR and introns. Evidence is growing that these regions are at least as important in transcriptional regulation as the promoter region itself (3–5,11). Mokry et al. (3) demonstrated that many (35–40%) of the TCF4 binding sites are intronic. Furthermore, considerable fractions of ZNF263, CTCF, NRSF and STAT1 binding sites are located in 5′-UTR, 3′-UTR and intronic regions. A detailed overview of the relative importance of the aforementioned genomic regions is given in Supplementary Table S2.

In the first edition of ConTra, searching for TFBSs was only possible in multiple alignments in relation to the human genome, which left many users empty-handed. In ConTra v2, multiple alignments with mouse, chicken, cow, frog, zebrafish, fruit fly, worm and yeast as reference species have been added. A detailed overview of the different genome assemblies, genes and multiz alignments available in ConTra v2 is presented in Table 1. Although the human genome is the most widely studied genome, other model organisms should not be ignored. The importance of the different model organisms is illustrated in Supplementary Figure S1, in which the popularity of the different organisms is compared in terms of PubMed hits.
Table 1.

Summary of the number of genes, non-coding genes and transcripts for each reference organism that can be analyzed in ConTra v2

| Reference species | Common name | Assembly | Genes | RefSeq transcripts | Coding (NM_) (%) | Non-coding (NR_) (%) | Ensembl transcripts | Multiple sequence alignment |
|---|---|---|---|---|---|---|---|---|
| Homo sapiens | human | hg19 | 22 167 | 37 474 | 86.3 | 13.7 | 151 222 | multiz46way of 46 vertebrate genomes (hg19) |
| Mus musculus | mouse | mm9 | 21 786 | 27 621 | 93.3 | 6.7 | 88 186 | multiz30way of 30 vertebrate genomes |
| Bos taurus | cow | bosTau4 | 11 559 | 12 427 | 97.7 | 2.3 | 31 598 | multiz5way: cow, dog, human, mouse, platypus |
| Gallus gallus | chicken | galGal3 | 4905 | 5176 | 90.1 | 9.9 | 23 392 | multiz7way: chicken, human, mouse, rat, opossum, frog, zebrafish |
| Xenopus tropicalis | frog | xenTro2 | 8358 | 9695 | 99.8 | 0.2 | 28 937 | multiz7way: frog, chicken, opossum, human, mouse, rat, zebrafish |
| Danio rerio | zebrafish | danRer6 | 13 812 | 15 776 | 95.6 | 4.4 | 32 992 | multiz6way: zebrafish, tetraodon, stickleback, frog, mouse, human |
| Drosophila melanogaster | fruit fly | dm3 | 14 230 | 23 550 | 94.1 | 5.9 | 23 017 | multiz15way of 15 insects |
| Caenorhabditis elegans | worm | ce6 | 19 903 | 24 892 | 97.1 | 2.9 | 35 019 | multiz6way of 6 worms |
| Saccharomyces cerevisiae | yeast | sacCer2 | 7130 | na | na | na | 7130 | multiz7way of 7 yeast species |

For each species, a specific UCSC multiz alignment is used.

In ConTra v2, transcripts can be searched for using the official HGNC gene name, HGNC symbol, alias, Ensembl gene ID (ENSG), the Entrez Gene ID, the RefSeq mRNA ID (NM_/NR_) or the Ensembl transcript ID (ENST). For every species, the most recent alignments are then automatically fetched from UCSC and processed.

Users can select binding motifs from different sources, including the latest versions of the TRANSFAC database (update 2010.4) (12), the JASPAR CORE database update 2010 (13), the phyloFACTS database (14) and a collection of homeodomain TF PWMs derived from a protein binding microarray (PBM) (15). Furthermore, PWMs can be constructed by the user using the web interface. Creating a custom PWM is as easy as uploading a FASTA file containing aligned sequences. The ConTra v2 web interface automatically converts the data into the right format.

In ConTra v2, non-coding genes are no longer excluded from the analysis. TFs and miRNAs often work together in what is termed a feed-forward loop (FFL). These FFLs regulate many important biological processes, such as those in development and tumor formation (16). Non-coding transcripts are treated as regular transcripts in ConTra, and they can be analyzed in the same way. To verify whether the results on non-coding genes are meaningful, we looked for binding sites in the promoter region of miRNA-223 (hsa-miR-223 or MIR223) with RefSeq accession number NR_029637. Fukao et al. (17) have shown that MIR223 is regulated by a wide range of TFs, such as NFAT, C/EBP, GATA1 and PU.1. Analysis in ConTra v2 not only supports the presence of the binding sites for these TFs but also shows that they have been strongly conserved during evolution (Figure 1).
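
The identifier types accepted by the transcript search (Ensembl gene and transcript IDs, Entrez Gene IDs, RefSeq NM_/NR_ accessions, and gene names or symbols) can largely be told apart by simple patterns. The Python sketch below shows one way a query string might be routed to the right lookup. The regular expressions, labels and the `classify_query` function are hypothetical simplifications of common public ID formats and do not describe how ConTra v2 itself resolves search terms; species whose Ensembl-style IDs follow other conventions would fall through to the symbol lookup.

```python
import re

# Simplified, assumed patterns for common public identifier formats.
PATTERNS = [
    ("refseq_coding", re.compile(r"^NM_\d+(\.\d+)?$")),
    ("refseq_noncoding", re.compile(r"^NR_\d+(\.\d+)?$")),
    ("ensembl_gene", re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")),
    ("ensembl_transcript", re.compile(r"^ENS[A-Z]*T\d{11}(\.\d+)?$")),
    ("entrez_gene", re.compile(r"^\d+$")),
]

def classify_query(query):
    """Guess the identifier type of a search term.

    Anything that matches no pattern is treated as a gene name, symbol
    or alias and would need a text lookup instead.
    """
    term = query.strip().upper()
    for label, pattern in PATTERNS:
        if pattern.match(term):
            return label
    return "symbol_or_name"

# Example queries; the numeric and Ensembl IDs are arbitrary placeholders.
examples = ["NR_029637", "NM_000546.5", "ENSG00000123456",
            "ENSMUST00000012345", "12345", "MIR223"]
labels = {q: classify_query(q) for q in examples}
```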
Figure 1.

Visualization of the evolutionarily conserved mechanism for miRNA-223 regulation in the promoter region, as described by Fukao et al. (17). (A) Multiz alignment showing the conserved binding sites. In orange, the C/EBP TF, predicted using the JASPAR position weight matrix MA0102.2; in blue, the NFAT TF (TRANSFAC M00935); in green, the GATA1 TF (JASPAR MA0035.2); and in pink, the PU.1 TF (JASPAR MA0080.2). The figure was created with the free multiple alignment editor Jalview using the ConTra FASTA and feature color (fc) files on the results page. (B) Region of (A) was mapped using BLAT on the mir-223 promoter in the UCSC genome browser (black box). Blue box represents the miRNA location.

A wide variety of examples of the use of ConTra v2 can be found in the online Supplementary Data. Supplementary Figures S2–S6 show results of ConTra v2 analyses on different genomic regions, using the UCSC multiz46way alignment based on the human hg19 reference sequence and illustrating experimentally validated binding sites from literature. Supplementary Figure S7 depicts an evolutionarily conserved binding site in the second intron of the Mus musculus nestin gene, as described by Jin et al. (18). In Supplementary Figure S8, two sine oculis (SO) binding sites are conserved in the second intron of the Drosophila Lz gene, which confirms the study of Yan et al. (19). Finally, the promoter of the S. cerevisiae PHD1 (FLO11) gene in Supplementary Figure S9 shows two conserved TEA TFBSs, which supports the regulatory mechanism proposed by Heise et al. (20). If the genomic region of interest, for example, from another reference organism or for a new transcript, is not available in ConTra, alignment files in the UCSC multiple alignment format (MAF), in multi-FASTA format or in Clustal format can be uploaded.
The help page of the web site contains demos showing how to obtain such a MAF file in the UCSC genome browser, how to upload and analyze this file, and how to use the feature color (fc) file and FASTA file on the result page to produce publication-quality figures similar to those in the online Supplementary Data of this article. If a PWM model for a particular TF is not present in the available collections, uploading one's own PWM is also possible. The matrix can be supplied directly in PWM format, or less experienced users can simply upload an alignment file in multi-FASTA format. ConTra automatically detects the input format and subsequently builds the PWM.
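
For readers preparing their own uploads, the Python sketch below shows how the three alignment formats mentioned above might be distinguished by their first non-empty line, and how the sequence rows of a MAF file can be read into blocks. The function names and the toy alignment (species names, coordinates, sequences) are assumptions for illustration; this is not ConTra's own parser, and only the `a` and `s` line types of MAF are handled.

```python
def sniff_format(text):
    """Guess the alignment format from the first non-empty line."""
    first = next((ln.strip() for ln in text.splitlines() if ln.strip()), "")
    if first.startswith("##maf") or first.startswith("a ") or first == "a":
        return "maf"
    if first.startswith(">"):
        return "fasta"
    if first.upper().startswith("CLUSTAL"):
        return "clustal"
    return "unknown"

def parse_maf(text):
    """Parse MAF text into a list of blocks; each block is a list of rows."""
    blocks, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines, header and comments
        if fields[0] == "a":
            current = []  # an "a" line opens a new alignment block
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            # "s" line: source, start, size, strand, source length, aligned text
            _, src, start, size, strand, src_size, seq = fields
            current.append({
                "src": src, "start": int(start), "size": int(size),
                "strand": strand, "src_size": int(src_size), "text": seq,
            })
    return blocks

# Toy alignment with made-up names and coordinates.
MAF_EXAMPLE = """##maf version=1
a score=100.0
s ref.chr1   100 8 + 1000 TGACTCAG
s other.chr2 200 7 + 1000 TGA-TCAG
"""

blocks = parse_maf(MAF_EXAMPLE)
```

Each row keeps the gapped alignment text, so a hit found in the ungapped reference sequence can later be mapped back to alignment columns.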

TECHNICAL DETAILS AND FOUR-STEP ANALYSIS PROCESS

ConTra v2 runs on a CentOS 5 server configured with an Apache web server (version 2.2.3), MySQL server (5.0.77), PHP 5.1.6 and Perl 5.8.8. The interface is programmed in PHP, and alignments are fetched from UCSC using Perl scripts. TFBS hits for a user-defined motif are calculated using the Match algorithm. An overview picture of these hits, created with Jalview, is embedded in the overview page with the help of the Highslide thumbnail viewer (http://www.highslide.com). Different TFs on the result page are visualized dynamically using JavaScript. For each alignment block, both a file with PWM scores and a file containing a phylogenetic conservation score for each TF are provided (see File 1 in Supplementary Data for more details). Scores in the ConTra v2 exploration part are calculated in the same way as in the previous version of ConTra, except that, because other genomic regions are now included, we no longer take into account the distance to the transcription start site.

The ConTra v2 analysis consists of four steps. First, users choose whether they want to visualize or explore a gene of interest. In this step, it is also necessary to indicate the reference species and the gene of interest. The second step lists the available transcripts for genes matching the search terms, from which one can be selected. For every gene, all possible RefSeq and Ensembl transcript variants are listed with a link to the genomic location in the respective genome browser. This way, genes with alternative promoters, UTRs or alternative intronic regions can be analyzed for regulatory differences. In step three, different genomic regions of the selected transcript can be chosen (upstream, introns, 5′-UTR and 3′-UTR). The final step offers users an extensive choice of PWM motifs: up to 20 PWM motifs can be simultaneously taken into account for analysis. For the visualization part, results are split into alignment blocks (Supplementary Figure S10). These blocks consist of local alignments produced by the TBA program (threaded blockset aligner) (21). In the exploration part, a list of PWMs is given, ranked according to the prediction score.
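
The region choice in step three can be made concrete with a small Python sketch that derives upstream, 5′-UTR, 3′-UTR and intron intervals from a transcript's exon and coding-sequence coordinates. The function name, the 0-based half-open coordinate convention, the default upstream length and the example coordinates are assumptions for illustration and are not taken from the ConTra v2 implementation; the sketch also assumes a coding transcript with a known CDS.

```python
def transcript_regions(exons, cds_start, cds_end, strand="+", upstream=1000):
    """Return upstream, UTR and intron intervals (0-based, half-open).

    exons: list of (start, end) genomic intervals of one transcript.
    cds_start, cds_end: genomic bounds of the coding sequence.
    """
    exons = sorted(exons)
    tx_start, tx_end = exons[0][0], exons[-1][1]

    # Introns are the gaps between consecutive exons.
    introns = [(a_end, b_start)
               for (_, a_end), (b_start, _) in zip(exons, exons[1:])]

    # Exonic pieces lying left and right of the CDS on the genomic axis.
    left_utr = [(s, min(e, cds_start)) for s, e in exons if s < cds_start]
    right_utr = [(max(s, cds_end), e) for s, e in exons if e > cds_end]

    if strand == "+":
        up = (max(0, tx_start - upstream), tx_start)
        utr5, utr3 = left_utr, right_utr
    else:
        # On the minus strand the transcript starts at the right-hand end.
        up = (tx_end, tx_end + upstream)
        utr5, utr3 = right_utr, left_utr
        introns = introns[::-1]  # list introns in transcript order
    return {"upstream": up, "5utr": utr5, "3utr": utr3, "introns": introns}

# Made-up transcript structure, for illustration only.
exons = [(1000, 1200), (1500, 1700), (2000, 2400)]
plus = transcript_regions(exons, 1100, 2100, strand="+", upstream=500)
minus = transcript_regions(exons, 1100, 2100, strand="-", upstream=500)
```

Because each transcript variant has its own exon structure, running such a computation per transcript is what allows alternative promoters, UTRs and introns of the same gene to be compared.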

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Agency for Innovation through Science and Technology in Flanders (grant number 091213). Funding for open access charge: Department for Molecular Biomedical Research, VIB, Ghent, Belgium.

Conflict of interest statement. None declared.

REFERENCES (10 of 21 listed)

1.  A transcriptional chain linking eye specification to terminal determination of cone cells in the Drosophila eye.

Authors:  Huajun Yan; Jude Canon; Utpal Banerjee
Journal:  Dev Biol       Date:  2003-11-15       Impact factor: 3.582

2.  Aligning multiple genomic sequences with the threaded blockset aligner.

Authors:  Mathieu Blanchette; W James Kent; Cathy Riemer; Laura Elnitski; Arian F A Smit; Krishna M Roskin; Robert Baertsch; Kate Rosenbloom; Hiram Clawson; Eric D Green; David Haussler; Webb Miller
Journal:  Genome Res       Date:  2004-04       Impact factor: 9.043

3.  Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences.

Authors:  Michael F Berger; Gwenael Badis; Andrew R Gehrke; Shaheynoor Talukder; Anthony A Philippakis; Lourdes Peña-Castillo; Trevis M Alleyne; Sanie Mnaimneh; Olga B Botvinnik; Esther T Chan; Faiqua Khalid; Wen Zhang; Daniel Newburger; Savina A Jaeger; Quaid D Morris; Martha L Bulyk; Timothy R Hughes
Journal:  Cell       Date:  2008-06-27       Impact factor: 41.582

4.  Gene regulation by transcription factors and microRNAs.

Authors:  Oliver Hobert
Journal:  Science       Date:  2008-03-28       Impact factor: 47.728

5.  Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes.

Authors:  Kouichi Kimura; Ai Wakamatsu; Yutaka Suzuki; Toshio Ota; Tetsuo Nishikawa; Riu Yamashita; Jun-ichi Yamamoto; Mitsuo Sekine; Katsuki Tsuritani; Hiroyuki Wakaguri; Shizuko Ishii; Tomoyasu Sugiyama; Kaoru Saito; Yuko Isono; Ryotaro Irie; Norihiro Kushida; Takahiro Yoneyama; Rie Otsuka; Katsuhiro Kanda; Takahide Yokoi; Hiroshi Kondo; Masako Wagatsuma; Katsuji Murakawa; Shinichi Ishida; Tadashi Ishibashi; Asako Takahashi-Fujii; Tomoo Tanase; Keiichi Nagai; Hisashi Kikuchi; Kenta Nakai; Takao Isogai; Sumio Sugano
Journal:  Genome Res       Date:  2005-12-12       Impact factor: 9.043

6.  Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals.

Authors:  Xiaohui Xie; Jun Lu; E J Kulbokas; Todd R Golub; Vamsi Mootha; Kerstin Lindblad-Toh; Eric S Lander; Manolis Kellis
Journal:  Nature       Date:  2005-02-27       Impact factor: 49.962

7.  Efficient double fragmentation ChIP-seq provides nucleotide resolution protein-DNA binding profiles.

Authors:  Michal Mokry; Pantelis Hatzis; Ewart de Bruijn; Jan Koster; Rogier Versteeg; Jurian Schuijers; Marc van de Wetering; Victor Guryev; Hans Clevers; Edwin Cuppen
Journal:  PLoS One       Date:  2010-11-30       Impact factor: 3.240

8.  An evolutionarily conserved mechanism for microRNA-223 expression revealed by microRNA gene profiling.

Authors:  Taro Fukao; Yoko Fukuda; Kotaro Kiga; Jafar Sharif; Kimihiro Hino; Yutaka Enomoto; Aya Kawamura; Kaito Nakamura; Tsutomu Takeuchi; Masanobu Tanabe
Journal:  Cell       Date:  2007-05-04       Impact factor: 41.582

9.  Second intron of mouse nestin gene directs its expression in pluripotent embryonic carcinoma cells through POU factor binding site.

Authors:  Zhi-Gang Jin; Li Liu; Hua Zhong; Ke-Jing Zhang; Yong-Feng Chen; Wei Bian; Le-Ping Cheng; Nai-He Jing
Journal:  Acta Biochim Biophys Sin (Shanghai)       Date:  2006-03       Impact factor: 3.848

10.  TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes.

Authors:  V Matys; O V Kel-Margoulis; E Fricke; I Liebich; S Land; A Barre-Dirrie; I Reuter; D Chekmenev; M Krull; K Hornischer; N Voss; P Stegmaier; B Lewicki-Potapov; H Saxel; A E Kel; E Wingender
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971
