Maria C Costanzo, Stacia R Engel, Edith D Wong, Paul Lloyd, Kalpana Karra, Esther T Chan, Shuai Weng, Kelley M Paskov, Greg R Roe, Gail Binkley, Benjamin C Hitz, J Michael Cherry.
Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the community resource for genomic, gene and protein information about the budding yeast Saccharomyces cerevisiae, containing a variety of functional information about each yeast gene and gene product. We have recently added regulatory information to SGD and present it on a new tabbed section of the Locus Summary entitled 'Regulation'. We are compiling transcriptional regulator-target gene relationships, which are curated from the literature at SGD or imported, with permission, from the YEASTRACT database. For nearly every S. cerevisiae gene, the Regulation page displays a table of annotations showing the regulators of that gene, and a graphical visualization of its regulatory network. For genes whose products act as transcription factors, the Regulation page also shows a table of their target genes, accompanied by a Gene Ontology enrichment analysis of the biological processes in which those genes participate. We additionally synthesize information from the literature for each transcription factor in a free-text Regulation Summary, and provide other information relevant to its regulatory function, such as DNA binding site motifs and protein domains. All of the regulation data are available for querying, analysis and download via YeastMine, the InterMine-based data warehouse system in use at SGD.
Year: 2013 PMID: 24265222 PMCID: PMC3965049 DOI: 10.1093/nar/gkt1158
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Table 1. Types of data accompanying regulatory annotations in SGD
| Data type | Description | Value | Additional notes |
|---|---|---|---|
| Regulator gene | Gene identifier | Genetic name and systematic name | |
| Target gene | Gene identifier | Genetic name and systematic name | |
| Evidence | ECO term | ECO term name and identifier | YEASTRACT-specific evidence codes are mapped to ECO by SGD |
| Condition | Experimental condition such as temperature, growth phase, growth medium and chemical stress | Free text | Noted for some but not all annotations |
| Direction | Activation versus repression | ‘activated’ or ‘repressed’ | Noted where available; YEASTRACT ‘positive’ or ‘negative’ are mapped to ‘activated’ or ‘repressed’ by SGD |
| Strain | Genetic background of the | Strain background name | Noted for SGD-curated annotations |
| Confidence score | Measure of the probability that a binding site is not due to chance | | Noted for SGD-curated annotations where available |
| Source | Database project by which the curation was performed | SGD or YEASTRACT | |
| Reference | Published article supporting the regulatory relationship | PubMed ID | |
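
The data types in the table above can be pictured as a single record per annotation. The following is a minimal sketch, not SGD's actual schema: the class name, field names and the `normalize_direction` helper are hypothetical, and only the 'positive'/'negative' to 'activated'/'repressed' mapping is taken from the table.

```python
from dataclasses import dataclass
from typing import Optional

# Mapping stated in the table: YEASTRACT 'positive'/'negative'
# become SGD 'activated'/'repressed'.
YEASTRACT_TO_SGD_DIRECTION = {"positive": "activated", "negative": "repressed"}


@dataclass(frozen=True)
class RegulationAnnotation:
    """One regulator-target annotation (illustrative field names, not SGD's schema)."""
    regulator: str                   # genetic or systematic name
    target: str                      # genetic or systematic name
    eco_term: str                    # ECO term name and identifier
    source: str                      # "SGD" or "YEASTRACT"
    pubmed_id: str
    condition: Optional[str] = None  # free text, noted for some annotations
    direction: Optional[str] = None  # "activated" or "repressed", where available
    strain: Optional[str] = None     # noted for SGD-curated annotations


def normalize_direction(raw: Optional[str]) -> Optional[str]:
    """Return the SGD direction vocabulary, or None when no direction was recorded."""
    if raw is None or not raw.strip():
        return None
    value = raw.strip().lower()
    if value in ("activated", "repressed"):
        return value
    # An unrecognized value raises KeyError instead of being guessed.
    return YEASTRACT_TO_SGD_DIRECTION[value]
```

Keeping the optional fields as `None` by default mirrors the table's notes that condition, direction and strain are recorded for only some annotations.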
Figure 1. Summary features of the Regulation page. Top, the Regulation Summary paragraph, written by SGD biocurators, provides an overview of the context in which a TF acts. The Regulation Summary paragraph for the TF Pdr1p is shown. Gene and protein names in the Regulation Summary paragraph are linked to their SGD Locus Summary pages, and reference citations are linked to their SGD Paper pages. The Regulation Summary appears only on the Regulation pages for TFs. Bottom, a graphic indicates the number of targets and regulators for the gene of the page. This summary graphic is present on Regulation pages for all genes. The ‘Targets’ and ‘Regulators’ buttons generate lists of the targets or the regulators and their Descriptions and present links to download the gene list or to analyze it further by sending it to the GO Term Finder [http://www.yeastgenome.org/cgi-bin/GO/goTermFinder.pl; (21)], GO Slim Mapper (http://www.yeastgenome.org/cgi-bin/GO/goSlimMapper.pl), SPELL [http://spell.yeastgenome.org/; (22)] or YeastMine [http://yeastmine.yeastgenome.org/yeastmine/begin.do; (23)] tools. The graphic may also be captured as an image file.
Figure 2. The Targets table for an example regulator gene, PDR1. The table header shows the number of annotations in the table as well as the number of unique target genes. Data types displayed in the table are described in Table 1. Columns may be sorted by clicking the up or down arrows in the column headers. The ‘Search’ box filters the table so that only those rows containing the search criteria are displayed. The ‘Download’ button downloads table data as a tab-delimited text file. The ‘Analyze’ button generates a list of the genes in the table, with their Descriptions, that may be downloaded or analyzed further using the GO Term Finder [http://www.yeastgenome.org/cgi-bin/GO/goTermFinder.pl; (21)], GO Slim Mapper (http://www.yeastgenome.org/cgi-bin/GO/goSlimMapper.pl), SPELL [http://spell.yeastgenome.org/; (22)] or YeastMine [http://yeastmine.yeastgenome.org/yeastmine/begin.do; (23)] tools. The Regulators table, showing regulators whose target is the gene of the page, shares the same interactive features.
Figure 3. Network visualization for an example gene, PDR3, showing regulatory relationships supported by five or more annotations. The network is drawn using Cytoscape.js [http://cytoscape.github.io/cytoscape.js/; (24)]. Users can zoom in and out of the network view using a trackpad or mouse wheel and reposition genes by dragging. Genes depicted in the graphic are color coded by their relationships with the central gene: regulators are green, whereas targets are purple. Radio buttons in the top left corner allow display of only targets or only regulators, and the adjustable slider at the bottom sets the minimum number of annotations supporting the relationships shown. To present meaningful networks, we limit each to no more than 100 genes and no more than 250 regulatory relationships. A network visualization does not appear for genes whose regulatory relationship data do not meet these criteria: for example, a gene that has more than 250 regulatory relationships, each supported by just one annotation.
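
The size limits in the Figure 3 caption can be illustrated with a short sketch. The limits of 100 genes and 250 relationships come from the caption; the rule of choosing the smallest annotation threshold at which the network fits is an assumption made for illustration, and the function name `smallest_fitting_threshold` is hypothetical rather than part of SGD's code.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

MAX_GENES = 100          # limit stated in the Figure 3 caption
MAX_RELATIONSHIPS = 250  # limit stated in the Figure 3 caption

Edge = Tuple[str, str]   # (regulator, target)


def smallest_fitting_threshold(annotations: Iterable[Edge]) -> Optional[int]:
    """Return the lowest minimum-annotation count at which the network fits
    within both limits, or None if no threshold yields a drawable network.

    Assumption: each item in `annotations` is one annotation, so a
    regulator-target pair appears once per supporting annotation.
    """
    support = Counter(annotations)  # annotations per regulator-target pair
    for threshold in sorted(set(support.values())):
        edges = [edge for edge, n in support.items() if n >= threshold]
        genes = {gene for edge in edges for gene in edge}
        if len(genes) <= MAX_GENES and len(edges) <= MAX_RELATIONSHIPS:
            return threshold
    return None
```

Under this assumed rule, a gene with more than 250 relationships that are each supported by a single annotation has no fitting threshold, which matches the caption's example of a gene for which no network is shown.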
Table 2. YeastMine template queries for regulation data
| Template name | Description |
|---|---|
| Retrieve all Regulators and their Summary Paragraphs | Retrieve the list of all regulator genes that have Regulation Summary paragraphs, along with the paragraph text and references |
| Gene (regulator) → Gene (targets) | Start with the gene name or systematic name of a regulator (or a list) and retrieve its target genes, along with evidence (ECO term name and ID), reference and source. For some targets, data on experimental condition, direction of regulation (activated or repressed), |
| Gene → Regulation Summary + References | Start with the gene name or systematic name of a regulator (or a list) and retrieve its Regulation Summary paragraph and references for the paragraph. |
| Gene → PWM-predicted binding sites | Start with the gene name or systematic name of a regulator (or a list) and retrieve the number of its binding sites that are predicted in the genome, either between genes or within genes. Numbers predicted at three different |
| Gene (target) → Gene (regulators) | Start with the gene name or systematic name of a target (or a list) and retrieve its regulators, along with evidence (ECO term name and ID), reference and source. For some targets, data on experimental condition, direction of regulation (activated or repressed), |
| Gene → Protein → Domains | Start with a gene name or systematic name (or a list) and retrieve its domains as determined by InterProScan analysis. |
| JASPAR Class → Genes | Select the name of a transcription factor class, as defined by the JASPAR database, from the pull-down menu and retrieve all of the |
| JASPAR Family → Genes | Select the name of a transcription factor family, as defined by the JASPAR database, from the pull-down menu and retrieve all of the |
All the template queries are categorized as ‘Regulation’ templates in YeastMine (http://yeastmine.yeastgenome.org/yeastmine/begin.do) except for the Gene → Protein → Domains template, which is in the ‘Protein’ category.
(a) To generate the predicted binding site data, genomic matches to the high- and medium-confidence position-weighted matrices (PWMs) from the YeTFaSCo database were located using the FIMO algorithm (20). The fjoin algorithm (27) was used to determine the overlap of matches at different FIMO P-value thresholds (P < 1e-03, 1e-04 and 1e-05) with either ORF features from the saccharomyces_cerevisiae.gff file (including Dubious ORFs; available at http://www.yeastgenome.org/download-data/sequence) or with intergenic regions (as downloaded from YeastMine using the ‘Gene → Upstream intergenic region’ template for all ORFs). Overlap was defined to be at least 1 overlapping base pair.
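
The overlap criterion and P-value thresholds in this footnote can be sketched as follows. This is not the fjoin algorithm: a naive pairwise scan is swapped in for clarity. The function names are hypothetical, and the code assumes all intervals lie on the same chromosome and use 1-based inclusive coordinates, as in GFF.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive, as in GFF

P_THRESHOLDS = (1e-3, 1e-4, 1e-5)  # thresholds given in the footnote


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two inclusive intervals share at least one base pair."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_overlapping_matches(
    matches: Iterable[Tuple[Interval, float]],
    features: List[Interval],
) -> Dict[float, int]:
    """Count motif matches below each P-value threshold that overlap any feature.

    `matches` pairs each match interval with its P-value; `features` holds
    the ORF or intergenic intervals to test against.
    """
    matches = list(matches)  # allow several passes over a generator
    counts = {}
    for threshold in P_THRESHOLDS:
        counts[threshold] = sum(
            1
            for interval, p_value in matches
            if p_value < threshold
            and any(overlaps(interval, feature) for feature in features)
        )
    return counts
```

The pairwise scan is quadratic in the worst case, so for genome-scale inputs a sorted-interval join such as the one fjoin performs would be the more practical choice.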