Aedín C Culhane, Markus S Schröder, Razvan Sultana, Shaita C Picard, Enzo N Martinelli, Caroline Kelly, Benjamin Haibe-Kains, Misha Kapushesky, Anne-Alyssa St Pierre, William Flahive, Kermshlise C Picard, Daniel Gusenleitner, Gerald Papenhausen, Niall O'Connor, Mick Correll, John Quackenbush.
Abstract
GeneSigDB (http://www.genesigdb.org or http://compbio.dfci.harvard.edu/genesigdb/) is a database of gene signatures that have been extracted and manually curated from the published literature. It provides the community with a standardized resource of published prognostic, diagnostic and other gene signatures of cancer and related disease, so that users can compare the predictive power of gene signatures or use them in gene set enrichment analysis. Since GeneSigDB release 1.0, we have expanded from 575 to 3515 gene signatures, which were collected and transcribed from 1604 published articles largely focused on gene expression in cancer, stem cells, immune cells, development and lung disease. We have made substantial upgrades to the GeneSigDB website to improve accessibility and usability, including a tag cloud browse function, faceted navigation and a 'basket' feature to store genes or gene signatures of interest. Users can analyze GeneSigDB gene signatures, or upload their own gene list, to identify gene signatures with significant gene overlap, and results can be viewed on a dynamic, editable heatmap that can be downloaded as a publication-quality image. All data in GeneSigDB can be downloaded in numerous formats, including the .gmt file format for gene set enrichment analysis and an R/Bioconductor data file. GeneSigDB is available from http://www.genesigdb.org.
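As an illustration of the .gmt download mentioned in the abstract, the following is a minimal sketch of reading such a file in Python. It assumes the usual GMT layout (one gene set per line, tab-separated: set name, description, then gene identifiers). The function name `parse_gmt` and the sample rows are hypothetical and are not taken from GeneSigDB.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into {set_name: {"description", "genes"}}.

    Assumes each non-empty line is tab-separated:
    name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, description, *genes = line.split("\t")
        gene_sets[name] = {
            "description": description,
            # drop empty fields left by trailing tabs
            "genes": [g for g in genes if g],
        }
    return gene_sets


# Hypothetical two-signature example (names and genes are placeholders).
sample = (
    "sig_A\texample signature A\tGENE1\tGENE2\tGENE3\n"
    "sig_B\texample signature B\tGENE2\tGENE4\n"
)
sets = parse_gmt(sample)
print(sorted(sets))            # names of the parsed gene sets
print(sets["sig_A"]["genes"])  # genes in the first set
```

A dictionary keyed by signature name keeps later set operations (overlap, union) straightforward; a line with fewer than two fields would raise an error in this sketch rather than being silently skipped.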
Year: 2011 PMID: 22110038 PMCID: PMC3245038 DOI: 10.1093/nar/gkr901
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Growth of GeneSigDB. GeneSigDB has grown considerably over 4 database releases (August 2009, March 2010, December 2010, September 2011). The most recent release (Release 4.0, September 2011) contains 3515 human, mouse and rat gene sets curated from 1604 published articles.
Number of processed articles and extracted gene signatures (by species) in GeneSigDB
| | Human | Mouse | Rat | Total |
|---|---|---|---|---|
| Gene Signatures | 2951 | 493 | 71 | 3515 |
| Publications (PMIDs) | 1368 | 208 | 39 | 1604* |
| Genes (EnsEMBL gene IDs) | 20 478 | 16 009 | 5110 | |
*There were 10 articles with human and mouse gene signatures, and 1 article with human and rat gene signatures.
Most common disease MeSH terms associated with articles in GeneSigDB
| MeSH Terms | Publications |
|---|---|
| Breast neoplasms | 248 |
| Lung neoplasms | 97 |
| Prostatic neoplasms | 73 |
| Disease progression | 69 |
| Neoplasm metastasis | 66 |
| Ovarian neoplasms | 66 |
| Adenocarcinoma | 65 |
| Cell transformation, neoplastic | 62 |
| Neoplasm invasiveness | 62 |
| Carcinoma, squamous cell | 58 |
| Liver neoplasms | 56 |
| Carcinoma, hepatocellular | 51 |
| Lymphatic metastasis | 42 |
| Colonic neoplasms | 38 |
| Neoplasms | 37 |
| Precursor cell lymphoblastic leukemia–lymphoma | 37 |
| Stomach neoplasms | 35 |
| Neovascularization, pathologic | 34 |
| Disease models, animal | 33 |
| Genetic predisposition to disease | 32 |
| Pancreatic neoplasms | 30 |
| Chromosome aberrations | 29 |
| Carcinoma | 28 |
| Leukemia, myeloid, acute | 28 |
| Brain neoplasms | 27 |
| Carcinoma, non-small-cell lung | 27 |
| Leukemia, myeloid | 25 |
| Neoplasm recurrence, local | 25 |
| Leukemia, lymphocytic, chronic, B-cell | 24 |
Ranking of disease MeSH terms (MeSH prefix code category C) associated with 1552 publications in GeneSigDB. A total of 63 publications were not annotated with MeSH terms. More details are provided in the documentation on the GeneSigDB website.
Figure 2. Overview of the GeneSigDB data curation pipeline. Gene signatures in tables or figures are transcribed from published articles indexed in PubMed. A pipeline based on BioMart (20) then maps all published gene identifiers to EnsEMBL IDs to create standardized gene sets.
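The identifier-mapping step in Figure 2 can be sketched roughly as follows. This is not the GeneSigDB pipeline itself: a small hand-written lookup table stands in for a BioMart query, and the function name `standardize_signature` and the normalization rule (trim whitespace, upper-case) are assumptions made for illustration only.

```python
def standardize_signature(published_ids, id_to_ensembl):
    """Map published gene identifiers to EnsEMBL gene IDs.

    published_ids: identifiers as transcribed from an article.
    id_to_ensembl: dict from normalized identifier to a list of EnsEMBL IDs
                   (a stand-in for the result of a BioMart lookup).
    Returns (mapped, unmapped): de-duplicated EnsEMBL IDs in first-seen
    order, and the original identifiers that could not be mapped.
    """
    mapped, unmapped, seen = [], [], set()
    for raw in published_ids:
        key = raw.strip().upper()  # assumed normalization rule
        ensembl_ids = id_to_ensembl.get(key, [])
        if not ensembl_ids:
            unmapped.append(raw)
            continue
        for ensembl_id in ensembl_ids:
            if ensembl_id not in seen:
                seen.add(ensembl_id)
                mapped.append(ensembl_id)
    return mapped, unmapped


# Illustrative lookup table; a real pipeline would query BioMart instead.
lookup = {
    "TP53": ["ENSG00000141510"],
    "BRCA1": ["ENSG00000012048"],
}
mapped, unmapped = standardize_signature(
    ["tp53", "BRCA1 ", "TP53", "NOT_A_GENE"], lookup
)
print(mapped)    # unique EnsEMBL IDs
print(unmapped)  # identifiers needing manual review
```

Keeping the unmapped identifiers, rather than discarding them, leaves room for the kind of manual curation the caption describes.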
Figure 3. Screenshot showing the (A) faceting and (B) ‘Shopping Basket’ search features for a publication search for ‘serous ovarian cancer’, which returned a list of 11 publications. Further detail about (C) publications or (D) signatures can be viewed by clicking on their respective links. By default, 10 results are shown, but up to 100 search results can be viewed. Selecting ‘Add All’ adds the 15 gene signatures associated with the 11 displayed publications to the basket so that they can be compared or downloaded.
Figure 4. Screenshots of the ‘Analyze My Gene Lists’ and ‘Comparison View’ pages. The gene overlap between the 15 serous ovarian cancer gene signatures selected in Figure 3 was analyzed. (A) shows the gene overlap between signatures, where the presence or absence of a gene is indicated by a red or gray pixel in the heatmap. The image can be edited or reordered, and genes and gene signatures can be added or removed, before it is exported as a publication-quality image. (B) shows the results of a Fisher's exact test of enrichment between the 15 gene signatures. These results can also be viewed as a list.
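To make the overlap test in Figure 4(B) concrete, here is a minimal sketch of a one-sided Fisher's exact test (the hypergeometric upper tail) for the overlap between two gene signatures. The source does not specify the exact variant or gene universe used by GeneSigDB, so the one-sided form, the function name `overlap_pvalue` and the `universe_size` parameter are assumptions; the gene names are placeholders.

```python
from math import comb


def overlap_pvalue(sig_a, sig_b, universe_size):
    """One-sided Fisher's exact test for overlap between two gene sets.

    Returns (k, p), where k is the number of shared genes and p is the
    probability of seeing at least k shared genes if sig_b were drawn at
    random from a universe of `universe_size` genes containing sig_a.
    Assumes both sets are subsets of that universe.
    """
    a, b = set(sig_a), set(sig_b)
    k = len(a & b)
    n_a, n_b = len(a), len(b)
    total = comb(universe_size, n_b)
    # Sum hypergeometric probabilities for overlaps of size k or larger.
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / total


# Toy example: two 3-gene signatures sharing 2 genes in a 10-gene universe.
k, p = overlap_pvalue({"G1", "G2", "G3"}, {"G2", "G3", "G4"}, universe_size=10)
print(k, round(p, 4))  # 2 0.1833
```

In practice the choice of universe (for example, all genes measurable on the platform) strongly affects the p-value, and comparing many signature pairs, as in the 15-signature example, would normally call for a multiple-testing correction.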