Tanya Barrett, Dennis B Troup, Stephen E Wilhite, Pierre Ledoux, Carlos Evangelista, Irene F Kim, Maxim Tomashevsky, Kimberly A Marshall, Katherine H Phillippy, Patti M Sherman, Rolf N Muertter, Michelle Holko, Oluwabukunmi Ayanbule, Andrey Yefanov, Alexandra Soboleva.
Abstract
A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20,000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.
Year: 2010 PMID: 21097893 PMCID: PMC3013736 DOI: 10.1093/nar/gkq1184
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Table 1. List of study types and the number of Series records with those types
| Study type | Number of Series records |
|---|---|
| Expression profiling by array | 17 812 |
| Expression profiling by genome tiling array | 303 |
| Expression profiling by high throughput sequencing | 131 |
| Expression profiling by SAGE | 206 |
| Expression profiling by MPSS | 21 |
| Expression profiling by RT–PCR | 25 |
| Non-coding RNA profiling by array | 341 |
| Non-coding RNA profiling by genome tiling array | 81 |
| Non-coding RNA profiling by high throughput sequencing | 233 |
| Genome binding/occupancy profiling by array | 70 |
| Genome binding/occupancy profiling by genome tiling array | 835 |
| Genome binding/occupancy profiling by high throughput sequencing | 238 |
| Genome variation profiling by array | 309 |
| Genome variation profiling by genome tiling array | 406 |
| Genome variation profiling by SNP array | 269 |
| Methylation profiling by array | 46 |
| Methylation profiling by genome tiling array | 115 |
| Methylation profiling by high throughput sequencing | 30 |
| Protein profiling by protein array | 31 |
| SNP genotyping by SNP array | 149 |
Users can retrieve studies of a particular type using the ‘DataSet Type’ field in the GEO DataSets query interface.
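As an illustration of the field-qualified search described above, the sketch below builds a query term for one of the study types in the table and wraps it in a search URL. It is a minimal sketch, not GEO's documented method: it assumes the GEO DataSets database is queried through NCBI's E-utilities `esearch` endpoint under the database name `gds`, which the text above does not mention. The helper names `dataset_type_term` and `esearch_url` are hypothetical, and the code only constructs strings without contacting any server.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities esearch endpoint; not named in the source text.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def dataset_type_term(study_type: str) -> str:
    """Return a query term restricted to the 'DataSet Type' field."""
    # Quote the phrase so the multi-word study type is matched as a unit.
    return f'"{study_type}"[DataSet Type]'


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build (but do not send) a search URL for the GEO DataSets database.

    Assumption: 'gds' is the database name used for GEO DataSets.
    """
    params = {"db": "gds", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    term = dataset_type_term("Methylation profiling by high throughput sequencing")
    print(term)
    print(esearch_url(term))
```

The same term can be pasted directly into the GEO DataSets search box; building the URL separately keeps the query text easy to inspect before any request is made.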
Figure 1. A timeline of GEO database growth, development and events. The chart shows cumulative growth of publicly available Sample records from 2000 to September 2010. A further 80 000 Samples are currently held private until published, making a total of about 550 000 Samples. The current rate of submission and processing is over 10 000 Samples per month. (A) First data uploaded to database. (B) MIAME proposal is published, outlining the minimum information that should be included when describing a microarray experiment (2). (C) Nature journals announce requirement for microarray data deposit to public databases (4). (D) Reviewer access mechanism enabled, allowing anonymous confidential review of pre-published data. (E and inset 1) GEO Profiles database released, enabling search and visualization of individual gene expression charts. (F and inset 2) Interactive pre-computed cluster heatmaps released, allowing users to view and select regions of interesting gene expression patterns. (G) Major database modifications released, aimed at better support of MIAME elements. (H) GEO increases enforcement of provision of raw data. (I) Bioconductor GEOquery package published, allowing GEO data to be imported into the R environment (22). (J) GEOarchive spreadsheet submission format released, enabling rapid batch deposit of data. (K) All GEO Series records re-classified according to technology and experiment type, making it simple to locate studies of a specific type; types are listed in Table 1. (L) Improvements to DataSet Browser and accompanying analysis tools panel implemented. (M and inset 3) First release of next-generation sequence tracks on NCBI’s Sequence Viewer. These tracks were generated in support of the NIH Roadmap Epigenomics project, http://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/. (N) Links generated to NCBI’s new Epigenomics resource (7), which applies advanced curation and genome browser tracks for hundreds of next-generation sequence Samples derived from GEO. (O) Advanced Search tool released, helping users construct complex queries.