
YEASTRACT-DISCOVERER: new tools to improve the analysis of transcriptional regulatory associations in Saccharomyces cerevisiae.

Pedro T Monteiro, Nuno D Mendes, Miguel C Teixeira, Sofia d'Orey, Sandra Tenreiro, Nuno P Mira, Hélio Pais, Alexandre P Francisco, Alexandra M Carvalho, Artur B Lourenço, Isabel Sá-Correia, Arlindo L Oliveira, Ana T Freitas.

Abstract

The Yeast search for transcriptional regulators and consensus tracking (YEASTRACT) information system (www.yeastract.com) was developed to support the analysis of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in September 2007, this database contains over 30 990 regulatory associations between Transcription Factors (TFs) and target genes and includes 284 specific DNA binding sites for 108 characterized TFs. Computational tools are also provided to facilitate the exploitation of the gathered data when addressing a number of biological questions, in particular those that involve the analysis of global gene expression results. In this new release, YEASTRACT includes DISCOVERER, a set of computational tools that can be used to identify complex motifs over-represented in the promoter regions of co-regulated genes. The motifs identified are then clustered in families, represented by a position weight matrix, and automatically compared with the known transcription factor binding sites described in YEASTRACT. Additionally, in this new release, it is possible to generate graphic depictions of transcriptional regulatory networks for documented or potential regulatory associations between TFs and target genes. The visual display of these networks of interactions is instrumental in functional studies. Tutorials are available on the system to exemplify the use of all the available tools.

Substances: Transcription Factors

Year: 2007 PMID: 18032429 PMCID: PMC2238916 DOI: 10.1093/nar/gkm976

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

OVERVIEW

YEASTRACT (Yeast Search for Transcriptional Regulators And Consensus Tracking; www.yeastract.com) was originally proposed (1) to make publicly available up-to-date information on documented regulatory associations between TFs and target genes, as well as between TFs and DNA binding sites, in Saccharomyces cerevisiae. It also provides a set of bioinformatics tools that facilitate the full exploitation of these data. Although part of the data was obtained from existing yeast data repositories such as the S. cerevisiae genome database (SGD) (2), the gene ontology (GO) consortium (3) and the regulatory sequence analysis tools (RSAT) (4), all the data on gene regulation were gathered through exhaustive literature analysis. The value of YEASTRACT comes from integrating complete and up-to-date regulatory information with a number of analysis methods and computational tools. Its usefulness for the analysis of gene lists, in particular those produced by microarray gene expression analysis, also distinguishes this information system from others. Although other databases also provide information about regulatory mechanisms in yeast and other organisms [e.g. MYBS (5) and TRANSFAC (6)] or computational tools for the analysis of promoter regions [RSAT (4)], YEASTRACT is the system that most seamlessly integrates extensive regulation data with computational tools for analyzing this information. The database presently contains more than 30 990 regulatory associations between genes and TFs, based on more than 1000 bibliographic references. These include five papers describing global ChIP analyses (7–11), which document 75% of the gathered regulatory associations, and one microarray analysis of the effect of deleting 55 TFs (12), which documents 15% of these associations.
The results of hundreds of other articles describing more detailed molecular analyses, and revealing many regulatory associations that were not detected by global experiments, are also included in this version of the system. The rapid growth of scientific knowledge in the field of transcriptional regulation led to a 300% increase in the number of regulatory associations in the system with respect to the first release. Each regulation has been annotated manually, after expert examination of the relevant references. The database presently contains 284 specific DNA binding sites for 108 characterized TFs. The total number of TFs in the database is 170, corresponding to all genes identified as TFs in SGD. A comprehensive description of the content and structure of YEASTRACT was presented in the first publication of this system (1). At a high level, the internal structure of the database is organized around the concepts of gene, protein and binding site (consensus), and these three concepts are linked by regulation relations. These relations document the associations between TFs and target genes and can be of two types: documented and potential. In the first release, the system made available a set of queries to facilitate the exploitation of the gathered data when addressing a number of biological questions, in particular those that involve the analysis of global gene expression results. In the first 6 months of 2007, researchers from more than 300 institutions in 70 countries performed over 90 000 queries using YEASTRACT, already matching the total number of queries performed during 2006. In this new release, the available queries and additional utilities were reorganized to simplify their use, while maintaining the user-friendly interface and functionality already present in the original release.
YEASTRACT has already demonstrated its usefulness as a tool to support research on transcription regulation processes in yeast (13). This release extends the capabilities of the system considerably by connecting it with a number of data processing tools. YEASTRACT now includes DISCOVERER, a system that enables the user to search for common motifs in the promoter regions of genes using efficient algorithms for structured motif discovery, and to automatically compare the results with the transcription factor binding sites (TFBS) described in YEASTRACT. Pattern matching algorithms were also included, enabling the user to search the promoter regions of one or more genes for one or more DNA motifs, specified in a number of different representations. Another important new feature is the ability to identify and display transcription regulatory networks (TRNs) for documented and potential regulatory associations between TFs and target genes. This feature supports the analysis of regulatory mechanisms based on permanently up-to-date, manually checked information. In the future, such analyses will support mechanisms for inferring TRNs in S. cerevisiae, one of the main strategic objectives of this project.

DISCOVERER

The precise, coordinated control of gene expression is accomplished by the interplay of multiple regulatory mechanisms. Transcription regulatory proteins bind to short nucleotide sequences in gene promoter regions, recruiting the transcriptional machinery to the promoter and leading to transcription of the downstream gene. To support the analysis of promoter sequences in the yeast genome, DISCOVERER provides a set of software tools. These include tools for motif extraction, which consists of the de novo identification of binding site consensus sequences from a given set of non-coding DNA sequences (such as gene promoter regions). DISCOVERER contains two distinct structured motif discovery algorithms: MUSA (14) and RISO (15). When the algorithms finish, the user receives by e-mail a link to a web page (Figure 1) from which the complete list of motifs found can be downloaded, ordered by P-value and showing the proportion of sequences containing each motif (the quorum). The motifs identified are also clustered in families, each represented by a position weight matrix (PWM). Assembling individual motifs into families greatly reduces the number of motifs, yielding a more tractable output and a more intuitive motif representation. Because motif assembly is a very difficult problem in its own right (17,18), a new algorithm for it was developed (16). The list of motifs and the PWM of each family can also be downloaded from the output page.
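The quorum and family PWM described above can be illustrated with a minimal Python sketch. It assumes a hypothetical two-box structured motif (a box, a spacer of bounded length, a second box) with exact matching only, which is far simpler than what MUSA and RISO support, and it builds a family PWM by tallying equal-length family members with a pseudocount; it does not reproduce the motif assembly algorithm of (16). The names `StructuredMotif`, `quorum` and `pwm_from_family` are illustrative, not part of YEASTRACT.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

BASES = "ACGT"


@dataclass(frozen=True)
class StructuredMotif:
    """Hypothetical two-box structured motif: box1, a spacer of
    min_gap..max_gap arbitrary bases, then box2 (exact matches only)."""
    box1: str
    box2: str
    min_gap: int
    max_gap: int


def occurs_in(motif: StructuredMotif, seq: str) -> bool:
    """True if the structured motif occurs at least once in seq."""
    n1, n2 = len(motif.box1), len(motif.box2)
    for i in range(len(seq) - n1 + 1):
        if seq[i:i + n1] != motif.box1:
            continue
        # Try every allowed spacer length after this occurrence of box1.
        for gap in range(motif.min_gap, motif.max_gap + 1):
            j = i + n1 + gap
            if seq[j:j + n2] == motif.box2:
                return True
    return False


def quorum(motif: StructuredMotif, seqs: Sequence[str]) -> float:
    """Fraction of input sequences that contain the motif."""
    return sum(occurs_in(motif, s) for s in seqs) / len(seqs)


def pwm_from_family(motifs: Sequence[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Column-wise base frequencies of equal-length family members (A/C/G/T only)."""
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("family members must have equal length")
    pwm = []
    for col in range(length):
        counts = {b: pseudocount for b in BASES}
        for m in motifs:
            counts[m[col]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm


motif = StructuredMotif("TGA", "TCA", min_gap=1, max_gap=3)
promoters = ["TTGACTCAA", "TGAGGGTCA", "TGATCA", "AAAAAA"]
print(quorum(motif, promoters))  # 0.5
print(pwm_from_family(["ACGT", "ACGA", "ACGT"])[0])
```

The pseudocount keeps unseen bases from receiving probability zero, which matters later when PWM columns are compared with each other.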

Figure 1.

Sample pages showing the motif finder output, which presents a PWM for each motif family found, and the match output obtained by querying the database for TF binding sites that match the selected family. These results were obtained for the sample data available on the algorithm's input page. A detailed analysis of these results is available in the DISCOVERER tutorial.

Each PWM can be selected for comparison with the TFBS described in the YEASTRACT database. The input PWM is locally aligned [using the Smith–Waterman local alignment algorithm (19)] with the PWM of each TFBS, using a column distance metric chosen from a set of available options. The top 20 scoring alignments are listed for user inspection.
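The PWM comparison step can be sketched as a Smith–Waterman local alignment in which the substitution score of two columns is derived from a column metric. The sketch below assumes one hypothetical metric (twice the shared probability mass minus one, so identical columns score +1 and columns with no shared mass score -1) and a linear gap penalty; the metrics and penalties actually offered by YEASTRACT may differ. The TFBS names and consensus strings are made up for illustration.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"
Column = Dict[str, float]
PWM = List[Column]


def one_hot(consensus: str) -> PWM:
    """Turn an exact consensus string into a PWM (probability 1 per column)."""
    return [{b: 1.0 if b == c else 0.0 for b in BASES} for c in consensus]


def column_score(a: Column, b: Column) -> float:
    """Hypothetical column similarity in [-1, 1]: 2 * shared mass - 1."""
    shared = sum(min(a[base], b[base]) for base in BASES)
    return 2.0 * shared - 1.0


def smith_waterman(query: PWM, target: PWM, gap: float = 1.0) -> float:
    """Best local alignment score between two PWMs (linear gap penalty)."""
    rows, cols = len(query) + 1, len(target) + 1
    h = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0.0,  # local alignment: restart instead of going negative
                h[i - 1][j - 1] + column_score(query[i - 1], target[j - 1]),
                h[i - 1][j] - gap,
                h[i][j - 1] - gap,
            )
            best = max(best, h[i][j])
    return best


def rank_tfbs(query: PWM, tfbs: Dict[str, PWM], top: int = 20) -> List[Tuple[str, float]]:
    """Score the query against every TFBS PWM and keep the top hits."""
    scored = [(name, smith_waterman(query, pwm)) for name, pwm in tfbs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


database = {"site_a": one_hot("GTCACGTGAC"), "site_b": one_hot("AAAAAA")}
print(rank_tfbs(one_hot("CACGTG"), database))  # [('site_a', 6.0), ('site_b', 1.0)]
```

Because the alignment is local, a short query PWM can match any sub-window of a longer TFBS PWM, which is the behavior needed when the discovered motif and the known site have different lengths.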

NEW REFINEMENTS

Pattern matching

YEASTRACT now provides pattern matching methods that support searching for one or more nucleotide sequences (e.g. TFBS) within the promoter regions of chosen genes, thus identifying putative target genes for specific TFs. The TFBS used for pattern matching must, however, be provided by the user. The query string may be a simple nucleotide sequence, a sequence containing IUPAC nucleotide codes or even a sequence containing regular expression elements. Both the forward and reverse strands of the promoters are searched for the input motifs. The search returns a list of the genes in whose promoters the patterns were found, including the number of occurrences in each promoter. The patterns that matched the promoter sequences and their locations in the promoters are also displayed. The queries that are new, or that were refined using the extended pattern matching capability, are ‘Search by DNA motif’, ‘Find TFBs’ and ‘Search Motifs on Motifs’.
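A minimal Python sketch of this kind of search follows. It assumes that IUPAC letters in an upper-case query are expanded into regular-expression character classes and that any other characters are passed through as regular-expression syntax (so IUPAC codes inside a user-written bracket expression are not handled). Matches on the reverse strand are reported in forward-strand coordinates, and a palindromic motif is reported once per strand; both are assumptions of the sketch. The function and gene names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

Hit = Tuple[str, int, str]  # (strand, 0-based start on forward strand, matched text)


def iupac_to_regex(query: str) -> str:
    """Expand IUPAC letters; pass any other character through as regex syntax."""
    return "".join(IUPAC.get(ch, ch) for ch in query)


def search_promoter(query: str, promoter: str) -> List[Hit]:
    # The lookahead makes finditer report overlapping occurrences.
    pattern = re.compile("(?=(" + iupac_to_regex(query) + "))")
    hits: List[Hit] = [("+", m.start(), m.group(1)) for m in pattern.finditer(promoter)]
    reverse = promoter.translate(COMPLEMENT)[::-1]
    for m in pattern.finditer(reverse):
        # Convert the reverse-strand match end back to a forward-strand start.
        end = m.start() + len(m.group(1))
        hits.append(("-", len(promoter) - end, m.group(1)))
    return hits


def search_genes(query: str, promoters: Dict[str, str]) -> Dict[str, List[Hit]]:
    """Genes whose promoter (either strand) contains the query at least once."""
    results = {}
    for gene, seq in promoters.items():
        hits = search_promoter(query, seq.upper())
        if hits:
            results[gene] = hits
    return results


promoters = {"GENE1": "CCGATAAGCC", "GENE2": "AACTTATCAA", "GENE3": "TTTTTT"}
for gene, hits in search_genes("GATAAR", promoters).items():
    print(gene, len(hits), hits)
```

Searching the reverse complement with the same compiled pattern avoids having to reverse-complement a regular expression, which is awkward once arbitrary regex elements are allowed.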

Transcriptional regulatory networks

Recent studies in data collection and analysis (7,20,21) have shown that the information needed to understand regulatory networks must come from the integration of different sources, such as genomic sequence data, genome-wide transcription data, structural information and the biological literature. The comprehensive data on regulatory associations available in YEASTRACT make it possible to identify and visualize TRNs for documented (i.e. described in the literature) and potential (i.e. a known binding site is present in the promoter region) regulatory associations between TFs and target genes (Figure 2).
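The distinction between documented and potential associations can be sketched as follows. This is a simplified illustration, assuming that a potential association is recorded whenever an exact consensus of the TF occurs on either promoter strand of the gene (real binding-site matching may allow degenerate codes). The input dictionaries and the TF and gene names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_trn(
    documented: Iterable[Tuple[str, str]],
    consensus: Dict[str, List[str]],
    promoters: Dict[str, str],
) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (TF, target gene) edge to its evidence: 'documented' and/or 'potential'."""
    edges: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    # Documented edges come straight from curated literature associations.
    for tf, gene in documented:
        edges[(tf, gene)].add("documented")
    # Potential edges: a known consensus of the TF occurs on either strand.
    for tf, sites in consensus.items():
        for gene, promoter in promoters.items():
            strands = (promoter, reverse_complement(promoter))
            if any(site in strand for site in sites for strand in strands):
                edges[(tf, gene)].add("potential")
    return dict(edges)


documented = [("TF1", "GENE1")]
consensus = {"TF1": ["GATAAG"], "TF2": ["CACGTG"]}
promoters = {"GENE1": "CCGATAAGCC", "GENE2": "AACACGTGAA", "GENE3": "TTTTTT"}
for (tf, gene), evidence in sorted(build_trn(documented, consensus, promoters).items()):
    print(tf, "->", gene, sorted(evidence))
```

Keeping the evidence as a set per edge lets one graph carry both kinds of association, so a display can draw documented-only, potential-only or combined networks from the same structure.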

Figure 2.

Sample pages showing the regulation graph obtained from the ‘Group by TF’ query. Shaded squares represent TFs and white squares represent genes. Arrows represent interactions between each TF and its target genes.

These transcriptional regulatory networks, in particular the documented ones, correspond to static regulatory networks in S. cerevisiae, since the evidence for the individual regulatory associations was obtained under different processes and experimental conditions. Generating TRNs through the ‘Group by TF’ and ‘Generate Regulation Matrix’ queries enables the analysis of regulatory mechanisms supported by up-to-date, manually checked information. In the future, such analyses will support the inference of the mechanisms underlying gene regulatory networks in S. cerevisiae.
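The two queries mentioned above can be approximated on a list of (TF, target gene) associations. The sketch assumes that ‘Group by TF’ ranks TFs by the fraction of the input genes they regulate and that ‘Generate Regulation Matrix’ returns a binary TF-by-gene matrix; YEASTRACT's exact output formats are not reproduced, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Association = Tuple[str, str]  # (TF, target gene)


def group_by_tf(
    associations: Iterable[Association], genes: Sequence[str]
) -> List[Tuple[str, List[str], float]]:
    """For each TF regulating any input gene: its targets in the list and their fraction."""
    wanted = set(genes)
    targets: Dict[str, Set[str]] = defaultdict(set)
    for tf, gene in associations:
        if gene in wanted:
            targets[tf].add(gene)
    # Most widely shared regulators first; ties broken alphabetically.
    ranked = sorted(targets.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(tf, sorted(hit), len(hit) / len(wanted)) for tf, hit in ranked]


def regulation_matrix(
    associations: Iterable[Association], tfs: Sequence[str], genes: Sequence[str]
) -> List[List[int]]:
    """Binary matrix: entry [i][j] is 1 if tfs[i] regulates genes[j]."""
    pairs = set(associations)
    return [[1 if (tf, gene) in pairs else 0 for gene in genes] for tf in tfs]


associations = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G2"), ("TF3", "G9")]
print(group_by_tf(associations, ["G1", "G2", "G3"]))
print(regulation_matrix(associations, ["TF1", "TF2"], ["G1", "G2", "G3"]))
```

Each row of the matrix is the target profile of one TF and each column the regulator profile of one gene, so the same data can be read either as a network or as a table.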

References (19 in total)

1. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal: Nat Genet Date: 2000-05 Impact factor: 38.330

2. Complex transcriptional circuitry at the G1/S transition in Saccharomyces cerevisiae.

Authors: Christine E Horak; Nicholas M Luscombe; Jiang Qian; Paul Bertone; Stacy Piccirrillo; Mark Gerstein; Michael Snyder
Journal: Genes Dev Date: 2002-12-01 Impact factor: 11.361

3. Network motifs: simple building blocks of complex networks.

Authors: R Milo; S Shen-Orr; S Itzkovitz; N Kashtan; D Chklovskii; U Alon
Journal: Science Date: 2002-10-25 Impact factor: 47.728

4. Regulatory sequence analysis tools.

Authors: Jacques van Helden
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

5. Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction.

Authors: Esti Yeger-Lotem; Shmuel Sattath; Nadav Kashtan; Shalev Itzkovitz; Ron Milo; Ron Y Pinter; Uri Alon; Hanah Margalit
Journal: Proc Natl Acad Sci U S A Date: 2004-04-12 Impact factor: 11.205

6. Identification of common molecular subsequences.

Authors: T F Smith; M S Waterman
Journal: J Mol Biol Date: 1981-03-25 Impact factor: 5.469

7. Transcriptional regulatory code of a eukaryotic genome.

Authors: Christopher T Harbison; D Benjamin Gordon; Tong Ihn Lee; Nicola J Rinaldi; Kenzie D Macisaac; Timothy W Danford; Nancy M Hannett; Jean-Bosco Tagne; David B Reynolds; Jane Yoo; Ezra G Jennings; Julia Zeitlinger; Dmitry K Pokholok; Manolis Kellis; P Alex Rolfe; Ken T Takusagawa; Eric S Lander; David K Gifford; Ernest Fraenkel; Richard A Young
Journal: Nature Date: 2004-09-02 Impact factor: 49.962

8. Transcriptional regulatory networks in Saccharomyces cerevisiae.

Authors: Tong Ihn Lee; Nicola J Rinaldi; François Robert; Duncan T Odom; Ziv Bar-Joseph; Georg K Gerber; Nancy M Hannett; Christopher T Harbison; Craig M Thompson; Itamar Simon; Julia Zeitlinger; Ezra G Jennings; Heather L Murray; D Benjamin Gordon; Bing Ren; John J Wyrick; Jean-Bosco Tagne; Thomas L Volkert; Ernest Fraenkel; David K Gifford; Richard A Young
Journal: Science Date: 2002-10-25 Impact factor: 47.728

9. Transcription factor binding site identification in yeast: a comparison of high-density oligonucleotide and PCR-based microarray platforms.

Authors: Anthony R Borneman; Zhengdong D Zhang; Joel Rozowsky; Michael R Seringhaus; Mark Gerstein; Michael Snyder
Journal: Funct Integr Genomics Date: 2007-07-19 Impact factor: 3.674

10. The YEASTRACT database: a tool for the analysis of transcription regulatory associations in Saccharomyces cerevisiae.

Authors: Miguel C Teixeira; Pedro Monteiro; Pooja Jain; Sandra Tenreiro; Alexandra R Fernandes; Nuno P Mira; Marta Alenquer; Ana T Freitas; Arlindo L Oliveira; Isabel Sá-Correia
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971
