Tanya Barrett, Dennis B Troup, Stephen E Wilhite, Pierre Ledoux, Dmitry Rudnev, Carlos Evangelista, Irene F Kim, Alexandra Soboleva, Maxim Tomashevsky, Kimberly A Marshall, Katherine H Phillippy, Patti M Sherman, Rolf N Muertter, Ron Edgar.
Abstract
The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data. Additionally, GEO hosts other categories of high-throughput functional genomic data, including those that examine genome copy number variations, chromatin structure, methylation status and transcription factor binding. These data are generated by the research community using high-throughput technologies like microarrays and, more recently, next-generation sequencing. The database has a flexible infrastructure that can capture fully annotated raw and processed data, enabling compliance with major community-derived scientific reporting standards such as 'Minimum Information About a Microarray Experiment' (MIAME). In addition to serving as a centralized data storage hub, GEO offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives. This article summarizes the GEO repository structure, content and operating procedures, as well as recently introduced data mining features. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.
Year: 2008 PMID: 18940857 PMCID: PMC2686538 DOI: 10.1093/nar/gkn764
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A selection of GEO screenshots. The DataSet Browser (A) enables simple keyword searches for DataSets. When a DataSet is selected, a window appears (B) which contains detailed information about that DataSet, download options, and links to analysis features including gene expression profiles (C). Each expression profile can be viewed in more detail to see the activity of that gene across all Samples in the DataSet (D).
Table 1. GEO deposit options and formats
| Option | Format | Key Points |
|---|---|---|
| Web deposit | Web forms | Deposit of individual records. Simple step-by-step interactive web forms. |
| GEOarchive | Spreadsheets (e.g. Excel) | Batch deposit. Good choice for most users who have many Samples to submit. |
| SOFT (Simple Omnibus Format in Text) | Plain text | Batch deposit. A simple, line-based, tab-delimited format that can be readily generated, particularly if the data are already in a database. |
| MINiML (MIAME notation in Markup Language) | XML | Batch deposit. Essentially an XML rendering of the SOFT format, and similarly suitable if data are already in a database. The XML schema definition is available at the GEO website. |
Detailed documentation and example submission templates are available online at http://www.ncbi.nlm.nih.gov/projects/geo/info/submission.html.
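To illustrate why a line-based, tab-delimited format like SOFT is easy to produce and consume programmatically, the sketch below parses a single Sample record. It assumes the line prefixes described in GEO's SOFT documentation: `^` for the entity line, `!` for entity attributes, `#` for data-table column descriptions, and a tab-delimited data table bracketed by `!sample_table_begin` and `!sample_table_end`. The function name `parse_soft_sample` and the example record are hypothetical, and the sketch does not handle Platform or Series records or validate field names against GEO's requirements.

```python
from typing import Dict, List, Tuple


def parse_soft_sample(
    text: str,
) -> Tuple[str, Dict[str, List[str]], Dict[str, str], List[Dict[str, str]]]:
    """Parse one SOFT Sample record (simplified, hypothetical sketch).

    Returns (entity_id, attributes, column_descriptions, table_rows).
    Attributes map to lists because SOFT allows repeated keys.
    """
    entity_id = ""
    attributes: Dict[str, List[str]] = {}
    columns: Dict[str, str] = {}
    rows: List[Dict[str, str]] = []
    header: List[str] = []
    in_table = False

    for raw in text.splitlines():
        marker = raw.strip().lower()
        if not marker:
            continue  # skip blank lines
        if marker == "!sample_table_begin":
            in_table, header = True, []
            continue
        if marker == "!sample_table_end":
            in_table = False
            continue
        if in_table:
            fields = raw.split("\t")
            if not header:
                header = fields  # first table line holds the column names
            else:
                rows.append(dict(zip(header, fields)))
            continue
        # Outside the table: "<prefix>key = value"
        key, _, value = raw[1:].partition("=")
        key, value = key.strip(), value.strip()
        if raw.startswith("^"):
            entity_id = value  # e.g. "^SAMPLE = GSM..."
        elif raw.startswith("!"):
            attributes.setdefault(key, []).append(value)
        elif raw.startswith("#"):
            columns[key] = value  # description of a data-table column
    return entity_id, attributes, columns, rows


# Hypothetical example record (not real GEO data).
example = "\n".join([
    "^SAMPLE = GSM_EXAMPLE",
    "!Sample_title = Hypothetical control replicate 1",
    "!Sample_characteristics_ch1 = tissue: liver",
    "!Sample_characteristics_ch1 = age: 8 weeks",
    "#ID_REF = Probe identifier",
    "#VALUE = Normalized signal",
    "!sample_table_begin",
    "ID_REF\tVALUE",
    "probe_1\t7.25",
    "probe_2\t3.10",
    "!sample_table_end",
])

sample_id, attrs, cols, table = parse_soft_sample(example)
print(sample_id, len(table))
```

Keeping attributes as lists reflects the fact that SOFT permits the same key to appear on several lines (as with the two characteristics lines above); a production reader would also need to handle the other entity types and their own table markers.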
Table 2. A summary of the location and purpose of various GEO data mining tools and features
Features introduced within the last 2 years are labeled NEW.
Figure 2. A schematic overview of the query workflow and of how the various features and tools are interlinked. A description of the location and purpose of many of these features is provided in Table 2.