Bing Zhang, Stefan Kirov, Jay Snoddy.
Abstract
High-throughput technologies have led to the rapid generation of large-scale datasets about genes and gene products. These technologies have also shifted our research focus from 'single genes' to 'gene sets'. We have developed a web-based integrated data mining system, WebGestalt (http://genereg.ornl.gov/webgestalt/), to help biologists explore large sets of genes. WebGestalt is composed of four modules: gene set management, information retrieval, organization/visualization, and statistics. The management module uploads, saves, retrieves and deletes gene sets, and performs Boolean operations to generate the unions, intersections or differences of gene sets. The information retrieval module currently retrieves information on up to 20 attributes for all genes in a gene set. The organization/visualization module organizes and visualizes gene sets in various biological contexts, including Gene Ontology, tissue expression pattern, chromosome distribution, metabolic and signaling pathways, protein domain information and publications. The statistics module recommends and performs statistical tests to suggest biological areas that are important to a gene set and warrant further investigation. To demonstrate the use of WebGestalt, we have generated 48 gene sets containing genes over-represented in various human tissue types. Exploration of all 48 gene sets using WebGestalt is publicly available at http://genereg.ornl.gov/webgestalt/wg_enrich.php.
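The Boolean operations of the gene set management module correspond to ordinary set algebra. The following is a minimal sketch, not WebGestalt's implementation: it assumes a gene set can be modelled as a Python `set` of gene symbols, and the function name `combine_gene_sets` and the example symbols are hypothetical illustrations rather than data from the paper.

```python
# Minimal sketch (hypothetical, not WebGestalt code): Boolean operations on gene sets.
# A gene set is modelled as a Python set of gene symbols.

def combine_gene_sets(a: set[str], b: set[str], op: str) -> set[str]:
    """Return the union, intersection or difference (a minus b) of two gene sets."""
    operations = {
        "union": lambda x, y: x | y,         # genes in either set
        "intersection": lambda x, y: x & y,  # genes in both sets
        "difference": lambda x, y: x - y,    # genes in a but not in b
    }
    if op not in operations:
        raise ValueError(f"unsupported operation: {op!r}")
    return operations[op](a, b)


# Illustrative gene sets (hypothetical example symbols).
set_a = {"TP53", "BRCA1", "EGFR", "MYC"}
set_b = {"EGFR", "MYC", "KRAS"}

print(sorted(combine_gene_sets(set_a, set_b, "union")))
print(sorted(combine_gene_sets(set_a, set_b, "intersection")))
print(sorted(combine_gene_sets(set_a, set_b, "difference")))
```

Note that the difference is not symmetric: swapping the arguments returns the genes found only in the second set.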
Year: 2005 PMID: 15980575 PMCID: PMC1160236 DOI: 10.1093/nar/gki475
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic overview of WebGestalt. WebGestalt is composed of four main modules: gene set management, information retrieval, organization/visualization and statistics. The gene set management module uploads, saves, retrieves and deletes gene sets, and performs Boolean operations to generate the unions, intersections and differences of gene sets. The uploading tool accepts gene sets defined by experimental data, GO categories or chromosome location ranges. WebGestalt accepts several types of input identifier (Entrez Gene ID, Swiss-Prot ID, Ensembl ID, UniGene ID, gene symbol and Affymetrix Probe Set ID). The saving tool saves subsets of genes generated by the organization/visualization module. The information retrieval module currently retrieves information on up to 20 attributes for all genes in a gene set, including nomenclature, various gene identifiers, and map and functional information. Retrieved information can be exported to Microsoft Excel files. The organization/visualization module organizes and visualizes a gene set in figures or tables using eight sub-modules: GO Tree, Tissue Expression Bar Chart, Chromosome Distribution Chart, KEGG Table and Maps, BioCarta Table and Maps, Protein Domain Table, PubMed Table and GRIF Table. The statistics module provides two statistical tests, the hypergeometric test and Fisher's exact test, and suggests biological areas that are important to a gene set.
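The hypergeometric test named above asks how surprising it is to find k or more genes from a category in a gene set of size n, given a reference of N genes of which K belong to that category. The sketch below is a minimal illustration, not WebGestalt's code: it assumes the upper-tail (over-representation) form, P(X ≥ k) = Σ C(K, i)·C(N−K, n−i) / C(N, n) for i from k to min(n, K), and the function and parameter names are hypothetical. The one-sided Fisher's exact test on the corresponding 2×2 table yields the same upper-tail probability.

```python
from math import comb


def hypergeometric_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric P-value, P(X >= k) (hypothetical helper).

    N: genes in the reference set (e.g. the whole genome)
    K: reference genes annotated to the category
    n: genes in the gene set under study
    k: genes in the gene set annotated to the category
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)  # number of ways to draw n genes from the reference
    upper = min(n, K)   # at most min(n, K) annotated genes can be drawn
    # Sum the probabilities of drawing exactly i annotated genes, for i >= k.
    # math.comb returns 0 when the second argument exceeds the first.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


p = hypergeometric_enrichment_p(k=3, n=4, K=5, N=20)
print(f"{p:.4f}")
```

With these toy counts the tail is 155/4845, so the printed value is 0.0320; a small P-value suggests the category is over-represented in the gene set relative to the reference.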
Gene attributes that can be retrieved by WebGestalt
| Attribute | Source |
|---|---|
| **Nomenclature information** | |
| Gene symbol | LocusLink |
| Symbol alias | LocusLink |
| Gene name | LocusLink |
| Name alias | LocusLink |
| **Identifiers in other databases** | |
| Entrez Gene ID | Entrez Gene |
| RefSeq_NM | RefSeq |
| RefSeq_NP | RefSeq |
| UniGene ID | UniGene |
| Ensembl ID | Ensembl |
| Swiss-Prot ID | Swiss-Prot |
| **Map information** | |
| Cytogenetic | LocusLink |
| Physical | UCSC |
| **Functional information** | |
| Domain name | CDD |
| OMIM ID | OMIM |
| PubMed ID | PubMed |
| GRIF record | LocusLink |
| GO term | GO |
| KEGG | KEGG |
| BioCarta | BioCarta |
| Phenotype | LocusLink |
Ensembl ID, Ensembl gene stable ID; Cytogenetic, Cytogenetic map location; Physical, Physical map location; KEGG, KEGG pathway name; BioCarta, BioCarta pathway name.
Figure 2. Enriched directed acyclic graph (DAG) under ‘biological process’ for a set of 23 genes that are significantly over-represented in adrenal cortex, using all genes in the human genome as the reference. The enriched GO categories are brought together and visualized as a DAG. Categories in red are enriched; those in black are their non-enriched parents. Each box lists the name of the GO category, the number of genes in the category and the P-value indicating the significance of enrichment.
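The figure combines enriched categories with their non-enriched parents so that the enriched terms stay connected within the GO DAG. A minimal sketch of that step, assuming the ontology is available as a mapping from each category to its set of parent categories; the function name `enriched_dag` and the toy terms (`root`, `A`, `A1`, and so on) are hypothetical and do not correspond to real GO categories or to WebGestalt's implementation.

```python
def enriched_dag(
    parents: dict[str, set[str]], enriched: set[str]
) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Keep enriched categories plus all of their ancestors (hypothetical helper).

    Returns (colours, edges): colours maps each kept category to "red"
    (enriched) or "black" (non-enriched ancestor); edges are sorted
    (parent, child) pairs among the kept categories.
    """
    colours: dict[str, str] = {}
    stack = list(enriched)
    # Depth-first walk upward from each enriched category towards the root.
    while stack:
        term = stack.pop()
        if term in colours:
            continue  # already reached via another path in the DAG
        colours[term] = "red" if term in enriched else "black"
        stack.extend(parents.get(term, set()))
    # Every parent of a kept term is also kept, so all edges stay inside the subgraph.
    edges = sorted((p, c) for c in colours for p in parents.get(c, set()))
    return colours, edges


# Toy ontology: child -> set of parents (hypothetical terms; B1 has two parents).
toy_parents = {
    "root": set(),
    "A": {"root"},
    "A1": {"A"},
    "A1a": {"A1"},
    "B": {"root"},
    "B1": {"B", "A"},
}
colours, edges = enriched_dag(toy_parents, {"A1a", "B1"})
print(colours)
print(edges)
```

Because a GO term may have several parents, the walk tracks visited terms so that shared ancestors are added only once.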