Esther T Chan, J Michael Cherry.
Abstract
The Saccharomyces Genome Database (SGD) is compiling and annotating a comprehensive catalogue of functional sequence elements identified in the budding yeast genome. Recent advances in deep sequencing technologies have enabled, for example, global analyses of transcription profiling and the assembly of maps of transcription factor occupancy and higher-order chromatin organization at nucleotide-level resolution. This growing influx of published genome-scale data brings new challenges for storage, display, analysis and integration. Here, we describe SGD's progress in creating a consolidated resource for genome sequence elements in the budding yeast, the considerations taken in its design and the lessons learned thus far. The data within this collection can be accessed at http://browse.yeastgenome.org and downloaded from http://downloads.yeastgenome.org. DATABASE URL: http://www.yeastgenome.org.
Year: 2012 PMID: 22434826 PMCID: PMC3308148 DOI: 10.1093/database/bar057
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Summary of collected Yeast Genome Map data sets, as of September 2011
| Data type | Description | Number of publications |
|---|---|---|
| Chromatin conformation capture | Capture of chromatin interactions using 3C, 4C, 5C and other related technologies | 1 |
| ChIP-chip | DNA fragments from ChIP purifications, measured by tiling microarrays | 12 |
| ChIP-seq | DNA fragments from ChIP purifications, measured by sequencing | 1 |
| DNase-chip | Measurement of DNase-digested DNA by tiling microarrays | 0 |
| DNase-seq | Sequencing of DNase-digested DNA | 0 |
| FAIRE | Formaldehyde-assisted isolation of regulatory elements | 0 |
| Curated features | Genome feature annotations manually curated by SGD | a |
| Nucleosome profiles | Genome-wide organization of nucleosomes | 6 |
| Other | Other techniques, including DNA-chip and DNA-seq | 3 |
| RNA-chip | RNA expression measured by tiling microarrays | 4 |
| RNA-seq | RNA expression measured by sequencing | 4 |
| SAGE | Serial analysis of gene expression | 2 |
| Total | | 33 |
a: Eleven SGD curated feature tracks available in GBrowse were collected from multiple sequencing projects and publications by SGD biocurators over the course of the SGD project. Data types with zero publications represent identified gaps in our collection that will be filled.
Figure 1. Flowchart showing the basic data identification, processing, review and release procedure performed by SGD biocurators (blue) and bioanalysts (some blue and all other colours). SGD biocurators perform the first three steps in blue as part of their regular literature triage, whereas SGD bioanalysts perform steps 2 and 3 in blue following the biocurators, with an eye for collectible data to integrate (all other colours).
Data types currently collected from each study, where applicable
| Data class | Data type | Description |
|---|---|---|
| General info | Free text | The general goal and outcome of the study |
| General info | Free text | The goal and outcome of each experiment performed |
| Metadata | Protocol | Experimental technique(s) used (e.g. ChIP-chip, RNA-seq, SAGE) |
| Metadata | Protocol | Experimental platform(s) used (e.g. microarray manufacturer and type, sequencing method) |
| Metadata | Protocol | Experimental conditions (e.g. growth media, temperature, chemical treatments) |
| Metadata | Protocol | Experimental control(s) (e.g. controls used to normalize data in two-colour arrays, ChIP-chip/ChIP-seq binding ratios) |
| Metadata | Reagents | Cell type population (e.g. asynchronous, cell cycle phase-arrested, cell cycle phase enriched) |
| Metadata | Reagents | Antibody information where applicable (e.g. the molecule the antibody was raised against, the catalogue number or identifier of the source) |
| Metadata | Free text | Accession numbers for database repositories (e.g. GEO, ArrayExpress, SRA, GenBank) |
| Metadata | Free text | URLs to supplementary websites |
| Metadata | Free text | Genome sequence version number, date and source (e.g. UCSC sacCer2 June 2008) |
| File | Link | Supplementary files (e.g. supplementary methods, figures and tables provided by the publisher, if applicable) |
| File | Link | Additional files (e.g. additional data, methods, figures and tables provided by the authors, if applicable) |
Figure 2. An example file header from a bedGraph file, containing the associated metadata collected from Guillemette et al. (28). The header is consistent across the different standardized file types and generally contains the following sections: (a) track header (bed, wiggle and bedGraph) or GFF3 directives; (b) abbreviated publication reference and genome version information; (c) file version and modification dates; (d) citation of the publication from which the enclosed data were collected; (e) brief summary of the publication's goal and/or findings; (f) brief summary of the origin of the enclosed data; (g) reserved ‘tag=value’ pairs containing experimental metadata details; (h) column descriptors for the enclosed standardized formatted data (bedGraph, in this example); and (i) bedGraph-formatted data values.
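The header layout described in Figure 2 lends itself to simple machine parsing. The following Python sketch illustrates one way a reader might separate ‘tag=value’ metadata from bedGraph data rows. It rests on assumptions not confirmed by the source: that header lines are `#`-prefixed comments, that each metadata pair sits on its own line, and that the tag names and values shown (`strain`, `assay`, `example_strain`, `example_assay`) are hypothetical placeholders rather than SGD's actual reserved tags. The function name `parse_bedgraph` is likewise illustrative.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int, int, float]


def parse_bedgraph(text: str) -> Tuple[Dict[str, str], List[Row]]:
    """Split bedGraph text into header metadata and data rows.

    Assumption: metadata lines look like '# tag=value'. Other comment
    lines (citations, summaries, column descriptors) are skipped.
    """
    metadata: Dict[str, str] = {}
    rows: List[Row] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if "=" in body:
                # Split on the first '=' only, so values may contain '='.
                key, value = body.split("=", 1)
                metadata[key.strip()] = value.strip()
            continue
        if line.startswith("track"):
            # Keep the track definition line verbatim.
            metadata["track_line"] = line
            continue
        # Standard bedGraph data row: chrom, start, end, value.
        chrom, start, end, value = line.split()[:4]
        rows.append((chrom, int(start), int(end), float(value)))
    return metadata, rows


# Hypothetical file content; tags and values are placeholders.
EXAMPLE = (
    'track type=bedGraph name="example"\n'
    "# Free-text summary line without a tag\n"
    "# strain=example_strain\n"
    "# assay=example_assay\n"
    "# chrom start end value\n"
    "chrI\t0\t50\t0.25\n"
    "chrI\t50\t100\t1.5\n"
)

metadata, rows = parse_bedgraph(EXAMPLE)
print(metadata)
print(rows)
```

A free-text comment line that happens to contain `=` would be misread as a metadata pair under this scheme, so a real parser would need to know which header section it is in.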
Figure 3. A Yeast Genome Map screenshot. Box (a) magnifies the tool bar present on each displayed data track. This tool bar can be used to customize one's browsing experience. From left to right, the buttons are ‘favourite’, ‘minimize’, ‘close’, ‘share track’, ‘edit track display’, ‘save track’ and ‘about this track’. The ‘favourite’ button selects the track as a favourite for easy future access. The ‘minimize’ and ‘close’ buttons perform those respective actions on the selected data track. The ‘share track’ button provides URL links that can be copied and pasted into the address bar of another web browser or into other GBrowse instances. The ‘edit track display’ button allows one to change the track properties, including glyph shapes, colours and scale. The ‘save track’ button allows the data track to be saved for the displayed region, the entire chromosome or the entire dataset. Lastly, the ‘about this track’ button provides a pop-up box with information on the originating data, including the publication citation, the strain(s) used and links to supplementary data files and documentation on the SGD download page. Boxes (b) and (c) show examples of different glyph types that can be used to display different data types. In this instance, box (b) shows ORC and MCM2 ChIP-chip data from Xu et al. (47) using the ‘vista_plot’ glyph, which allows superimposition of segment data such as peak calls over continuous data values. Box (c) shows normalized nucleosome occupancy as determined by Kaplan et al. (48) using the ‘wiggle_whiskers’ glyph, which standardizes display of continuous data as z-scores about the mean (x-axis).
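The z-score standardization mentioned for box (c) can be illustrated with a short Python sketch. This is not GBrowse's implementation: the function name `z_scores`, the use of the population standard deviation and the input values are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def z_scores(values: List[float]) -> List[float]:
    """Express each value as standard deviations from the mean.

    Assumption: the population standard deviation is used; a sample
    standard deviation would give slightly different scores.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: every point sits exactly on the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up occupancy values, not taken from any published dataset.
occupancy = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scores = z_scores(occupancy)
print(scores)
```

After this transformation, a score of 0 corresponds to the mean, so values plotted above or below the axis read directly as above- or below-average signal.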