Hiromitsu Araki, Christoph Knapp, Peter Tsai, Cristin Print.
Abstract
Most "omics" experiments require comprehensive interpretation of the biological meaning of gene lists. To address this requirement, a number of gene set analysis (GSA) tools have been developed. Although the biological value of GSA is strictly limited by the breadth of the gene sets used, very few methods exist for simultaneously analysing multiple publicly available gene set databases. Therefore, we constructed GeneSetDB (http://genesetdb.auckland.ac.nz/haeremai.html), a comprehensive meta-database that integrates 26 public databases containing diverse biological information, with a particular focus on human disease and pharmacology. GeneSetDB enables users to search for gene sets containing a gene identifier or keyword, generate their own gene sets, statistically test for enrichment of an uploaded gene list across all gene sets, and visualise gene set enrichment and overlap using a clustered heat map.
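The enrichment step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistical test, so the one-sided hypergeometric test and the Benjamini-Hochberg adjustment used here are assumptions, as are the function names (`hypergeom_sf`, `benjamini_hochberg`, `enrich`) and the toy inputs. It is not GeneSetDB's implementation.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from a universe of N genes,
    K of which belong to the gene set (one-sided hypergeometric test)."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(gene_list, gene_sets, universe_size):
    """Test one gene list against every gene set.

    gene_sets maps a set name to an iterable of gene IDs (assumed to be
    Entrez Gene IDs). Returns (name, p_value, fdr) tuples sorted by p-value.
    """
    query = set(gene_list)
    names = sorted(gene_sets)
    pvals = []
    for name in names:
        members = set(gene_sets[name])
        k = len(query & members)
        pvals.append(hypergeom_sf(k, universe_size, len(members), len(query)))
    fdrs = benjamini_hochberg(pvals)
    return sorted(zip(names, pvals, fdrs), key=lambda row: row[1])
```

A caller would then keep only the rows whose FDR falls below a chosen cut-off, which mirrors the FDR filter described for the web interface. The choice of gene universe (`universe_size`) strongly affects the p-values and is left to the caller in this sketch.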
Keywords: Database; Enrichment analysis; Functional genomics; GSA, gene set analysis; Gene set
Year: 2012 PMID: 23650583 PMCID: PMC3642118 DOI: 10.1016/j.fob.2012.04.003
Source DB: PubMed Journal: FEBS Open Bio ISSN: 2211-5463 Impact factor: 2.693
Source databases included in GeneSetDB.
| Subclass name | Source database | Reference/URL |
|---|---|---|
| Pathway | Biocarta | |
| | EHMN | |
| | HumanCyc | |
| | INOH | |
| | NetPath | |
| | PID | |
| | Reactome | |
| | SMPDB | |
| | Wikipathways | |
| Disease/Phenotype | CancerGenes | |
| | HPO | |
| | KEGG Disease | |
| | MethCancerDB | |
| | MethyCancer | |
| | MPO | |
| | SIDER | |
| Drug/Chemical | CTD | |
| | DrugBank | |
| | MATADOR | |
| | STITCH | |
| | T3DB | |
| Gene Regulation | MicroCosm Targets | |
| | miRTarBase | |
| | Rel/NF- | |
| | TFactS | |
| GO | Gene Ontology | |
Fig. 1. Database structure and analysis scheme. The gene sets are downloaded from the source databases and deposited into a MySQL database. All gene identifiers, from both the source databases and the input gene list, are converted into Entrez Gene IDs using Bioconductor or biomaRt.
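The scheme in Fig. 1 (convert identifiers to Entrez Gene IDs, store gene sets in a relational database, then query them) can be sketched as follows. Everything here is an assumption made for illustration: the paper uses MySQL and Bioconductor/biomaRt, whereas this sketch substitutes Python's built-in `sqlite3` module and a small hand-written lookup table (`SYMBOL_TO_ENTREZ`), and the table name `gene_set_member` and the function names are hypothetical.

```python
import sqlite3

# Illustrative symbol -> Entrez Gene ID lookup. In the paper this mapping
# comes from Bioconductor or biomaRt, not from a hard-coded dictionary.
SYMBOL_TO_ENTREZ = {"TP53": 7157, "BRCA1": 672, "EGFR": 1956}


def to_entrez(identifiers, mapping):
    """Map identifiers to Entrez Gene IDs, dropping unmapped ones and duplicates."""
    converted = []
    for ident in identifiers:
        entrez = mapping.get(ident.upper())
        if entrez is not None and entrez not in converted:
            converted.append(entrez)
    return converted


def load_gene_sets(conn, source_db, gene_sets, mapping):
    """Store each gene set as (source_db, set_name, entrez_id) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gene_set_member "
        "(source_db TEXT, set_name TEXT, entrez_id INTEGER)"
    )
    for set_name, identifiers in gene_sets.items():
        rows = [(source_db, set_name, e) for e in to_entrez(identifiers, mapping)]
        conn.executemany("INSERT INTO gene_set_member VALUES (?, ?, ?)", rows)
    conn.commit()


def sets_containing(conn, entrez_id):
    """Return (source_db, set_name) pairs for gene sets containing the gene."""
    cur = conn.execute(
        "SELECT DISTINCT source_db, set_name FROM gene_set_member "
        "WHERE entrez_id = ? ORDER BY source_db, set_name",
        (entrez_id,),
    )
    return cur.fetchall()
```

Normalising every source to a single identifier type before loading is what makes a cross-database query like `sets_containing` a simple equality lookup.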
Feature comparison between GeneSetDB and existing databases.
| Feature | GeneSetDB | ConceptGen | DAVID | MSigDB | WhichGenes |
|---|---|---|---|---|---|
| Pathway database # | 9 | 3 | 6 | 7 | 3 |
| Disease/Phenotype database # | 7 | 1 | 2 | 2 | 4 |
| Drug/Chemical database # | 5 | 1 | 0 | 1 | 1 |
| Whole data downloadable | Yes | No | Yes | Yes | No |
| Making of original gene set | Yes | No | No | Yes | Yes |
| Gene/gene set intersection map | No | Yes | Yes | Yes | No |
| Gene set/gene set intersection map | Yes | No | No | No | No |
| Organisms | Hs, Mm, Rn | Hs, Mm, Rn | Over 65 000 species | Dr, Hs, Mm, Mmu, Rn | Hs, Mm |
Including original curated datasets.
Dr: Danio rerio, Hs: Homo sapiens, Mm: Mus musculus, Mmu: Macaca mulatta, Rn: Rattus norvegicus.
Fig. 2. Top screen of GeneSetDB. Users can query gene names or biological terms in the "gene/gene set search" and "gene list in enrichment analysis" modes. Users can also select a subclass of gene sets and database names if they wish to conduct a focused analysis. In the enrichment analysis mode, GeneSetDB accepts several gene identifier types (e.g. commercial probe IDs). An FDR threshold can be used to filter the gene sets shown in the results.
Fig. 3. Gene/gene set search result. GeneSetDB shows the subclass of each gene set, the gene set name, the source database, the number of genes in the gene set and the gene names (first 10 genes). Users can view all gene names in a downloadable text file. A gene set name is hyperlinked to the original database if the original database's identifier is available.
Fig. 4. Enrichment analysis result for SLE signature genes. Enrichment analysis identifies the gene sets meeting a user-assigned FDR and generates a clustered heatmap based on the proportion of genes that overlap between gene sets. The heatmap shows the overlaps among the top 30 gene sets ranked by FDR, and between these gene sets and the submitted gene list.
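The overlap matrix behind a heatmap like the one in Fig. 4 can be sketched in a few lines. The caption does not define "overlap proportion", so the definition used here (shared genes divided by the size of the smaller set) is an assumption, and the function names are hypothetical. The clustering step itself is omitted; a hierarchical clustering routine would be applied to the rows and columns of the returned matrix.

```python
def overlap_proportion(a, b):
    """Shared genes divided by the size of the smaller set.

    This definition is an assumption; Jaccard similarity would be
    another reasonable choice.
    """
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def overlap_matrix(gene_sets):
    """Return (names, matrix) where matrix[i][j] is the overlap proportion
    between gene set i and gene set j, in the insertion order of gene_sets."""
    names = list(gene_sets)
    matrix = [
        [overlap_proportion(gene_sets[row], gene_sets[col]) for col in names]
        for row in names
    ]
    return names, matrix
```

Passing the submitted gene list as one more entry in `gene_sets` (for example under the key `"query"`) yields the gene list versus gene set overlaps alongside the gene set versus gene set overlaps in a single matrix.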