Tanya Barrett, Dennis B Troup, Stephen E Wilhite, Pierre Ledoux, Dmitry Rudnev, Carlos Evangelista, Irene F Kim, Alexandra Soboleva, Maxim Tomashevsky, Ron Edgar.
Abstract
The Gene Expression Omnibus (GEO) repository at the National Center for Biotechnology Information (NCBI) archives and freely disseminates microarray and other forms of high-throughput data generated by the scientific community. The database has a minimum information about a microarray experiment (MIAME)-compliant infrastructure that captures fully annotated raw and processed data. Several data deposit options and formats are supported, including web forms, spreadsheets, XML and Simple Omnibus Format in Text (SOFT). In addition to data storage, a collection of user-friendly web-based interfaces and applications is available to help users effectively explore, visualize and download the thousands of experiments and tens of millions of gene expression patterns stored in GEO. This paper provides a summary of the GEO database structure and user facilities, and describes recent enhancements to database design, performance, submission format options, data query and retrieval utilities. GEO is accessible at http://www.ncbi.nlm.nih.gov/geo/
Year: 2006 PMID: 17099226 PMCID: PMC1669752 DOI: 10.1093/nar/gkl887
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A selection of GEO screenshots from a typical experiment (GEO DataSet GDS877; 16). (A) DataSet record includes experiment summary information, DataSet subset classifications, and access to data mining features such as hierarchical cluster heat map and ‘Query subset A versus B’ tool. (B) DataSet hierarchical cluster heat map calculated by un-centered correlation coefficient/average linkage option. Regions of interest can be selected using the red image cropper box, then either expanded to view sample and gene annotation, downloaded, charted as line plots, or linked directly to corresponding Entrez GEO profiles records. (C) GEO profiles retrieval results; each entity includes sequence identifier and DataSet information, and a thumbnail profile image. (D) Expanded profile chart depicts expression value information for the crystallin gene across each sample in DataSet GDS877. Experimental subset groupings are reflected in labels at the foot of the chart.
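The "un-centered correlation coefficient/average linkage" option named in panel (B) can be illustrated with a minimal sketch. This is not GEO's implementation, which the record does not describe. It assumes that the distance between two expression profiles is one minus their un-centered correlation, and that clusters are merged by the smallest average pairwise distance. The function names and the toy profiles are hypothetical.

```python
import math


def uncentered_correlation(x, y):
    """Correlation computed without subtracting the means (cosine similarity)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def distance(x, y):
    """Assumed distance: 1 minus the un-centered correlation."""
    return 1.0 - uncentered_correlation(x, y)


def average_linkage(profiles):
    """Naive agglomerative clustering with average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of row indices into `profiles`.
    """
    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance over every cross-cluster pair of profiles.
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical expression profiles (rows = genes, columns = samples).
profiles = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],   # proportional to row 0
    [3.0, 2.0, 1.0],
    [6.0, 4.0, 2.5],   # close to a multiple of row 2
]
merges = average_linkage(profiles)
for cluster_a, cluster_b, d in merges:
    print(cluster_a, cluster_b, round(d, 4))
```

Because the means are not subtracted, two profiles that differ only by a positive scale factor have a distance of zero, so rows 0 and 1 merge first. The naive search is cubic or worse in the number of profiles and is meant only to make the option concrete, not to handle full-size DataSets.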
Figure 2. A schematic overview of query workflow, and how various features and tools are interlinked. A description of the location and purpose of these features is provided in Table 1.
Table 1. Summary of location and purpose of various GEO data mining tools and features