Jing Chen, Eric E Bardes, Bruce J Aronow, Anil G Jegga.
Abstract
ToppGene Suite (http://toppgene.cchmc.org; this web site is free and open to all users and does not require a login to access) is a one-stop portal for (i) gene list functional enrichment, (ii) candidate gene prioritization using either functional annotations or network analysis and (iii) identification and prioritization of novel disease candidate genes in the interactome. Functional annotation-based disease candidate gene prioritization uses a fuzzy-based similarity measure to compute the similarity between any two genes based on semantic annotations. The similarity scores from individual features are combined into an overall score using statistical meta-analysis. A P-value for each annotation of a test gene is derived by random sampling of the whole genome. The protein-protein interaction network (PPIN)-based disease candidate gene prioritization uses social and Web network analysis algorithms (extended versions of the PageRank and HITS algorithms, and the K-Step Markov method). We demonstrate the utility of ToppGene Suite using 20 recently reported GWAS-based gene-disease associations (including novel disease genes) representing five diseases. ToppGene ranked 19 of 20 (95%) candidate genes within the top 20%, while ToppNet ranked 12 of the 16 candidate genes with interaction data (75%) within the top 20%.
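The annotation-based scoring steps in the abstract (per-feature similarity, a P-value from random genome sampling, then meta-analytic combination) can be sketched as below. This is a minimal illustration under stated assumptions, not ToppGene's implementation: a plain Jaccard overlap stands in for the fuzzy-based similarity measure, Fisher's method is assumed as the meta-analysis step, and all function names and the toy data are hypothetical.

```python
import math
import random


def overlap_similarity(a, b):
    """Jaccard overlap of two annotation-term sets (stand-in for the fuzzy measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def empirical_p_value(score, background):
    """Share of randomly sampled genes scoring at least as high (add-one smoothed)."""
    hits = sum(1 for s in background if s >= score)
    return (hits + 1) / (len(background) + 1)


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df under the null."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # X / 2
    # Chi-square survival function for even df: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def score_candidate(candidate, training, annotations, genome, n_samples=1000, seed=0):
    """Combined P-value for one candidate; smaller means more similar to the training set."""
    rng = random.Random(seed)
    p_values = []
    for gene_terms in annotations.values():
        # Training profile for this feature: union of the training genes' terms.
        profile = set().union(*(gene_terms.get(g, set()) for g in training))
        score = overlap_similarity(gene_terms.get(candidate, set()), profile)
        # Null distribution: similarity of randomly drawn genome genes to the profile.
        background = [
            overlap_similarity(gene_terms.get(rng.choice(genome), set()), profile)
            for _ in range(n_samples)
        ]
        p_values.append(empirical_p_value(score, background))
    return fisher_combined_p(p_values)


# Hypothetical toy data: feature -> gene -> annotation terms.
ANNOTATIONS = {
    "GO": {"T1": {"x", "y"}, "T2": {"y", "z"}, "C1": {"x", "y", "z"}, "C2": {"q"}},
    "Pathway": {"T1": {"p1"}, "T2": {"p1"}, "C1": {"p1"}},
}
GENOME = ["T1", "T2", "C1", "C2"] + [f"G{i}" for i in range(196)]

for gene in ("C1", "C2"):
    print(gene, score_candidate(gene, ["T1", "T2"], ANNOTATIONS, GENOME))
```

Candidates are then ranked by ascending combined P-value; in the toy data C1 shares the training genes' terms and scores far lower than C2, which shares none.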
Year: 2009 PMID: 19465376 PMCID: PMC2703978 DOI: 10.1093/nar/gkp427
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary of ToppGene Suite applications
| Application | Description | Input | Output |
|---|---|---|---|
| ToppFun | Detects functional enrichment of an input gene list based on Transcriptome (gene expression), Proteome (protein domains and interactions), Regulome (TFBS and miRNA), Ontologies (GO, Pathway), Phenotype (human disease and mouse phenotype), Pharmacome (Drug–Gene associations) and Bibliome (literature co-citation). | Supported identifiers include NCBI Entrez gene IDs, approved human gene symbols and NCBI Reference Sequence accession numbers; single gene list. | HTML output; Tab-delimited downloadable text file; graphical charts |
| ToppGene | Prioritizes (ranks) candidate genes based on their functional similarity to a training gene list. | Same as above, but with two gene lists (training and test) | HTML output |
| ToppNet | Prioritizes (ranks) candidate genes based on their topological features in the protein–protein interaction network. | Same as above | HTML output; Cytoscape-compatible input file; graphical networks |
| ToppGeNet | Identifies and prioritizes the neighboring genes of the ‘seeds’ in the protein–protein interaction network, based either on functional similarity to the ‘seed’ list (ToppGene) or on topological features in the protein–protein interaction network (ToppNet). | Single gene list | Same as above |
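ToppFun's functional enrichment of a gene list against an annotation term can be illustrated with a one-sided hypergeometric test, a standard choice for this kind of over-representation analysis. The sketch below assumes that test and a Bonferroni correction; the exact statistics and corrections ToppFun applies are not stated here, and `hypergeom_sf`, `enrich` and the toy terms are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    numerator = sum(
        # math.comb returns 0 when the lower index exceeds the upper one.
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)


def enrich(gene_list, term_to_genes, genome_size):
    """Test each annotation term for over-representation in gene_list.

    Returns (term, overlap, p, bonferroni_p) tuples sorted by p.
    """
    query = set(gene_list)
    n_terms = len(term_to_genes)  # Bonferroni over every term offered for testing
    results = []
    for term, members in term_to_genes.items():
        overlap = len(query & members)
        if overlap == 0:
            continue
        p = hypergeom_sf(overlap, genome_size, len(members), len(query))
        results.append((term, overlap, p, min(1.0, p * n_terms)))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy terms: term_A contains three of the four query genes.
TERMS = {"term_A": {"g1", "g2", "g3"}, "term_B": {f"g{i}" for i in range(10, 30)}}
print(enrich(["g1", "g2", "g3", "g4"], TERMS, genome_size=20000))
```

Terms with no overlap are skipped rather than tested, which keeps the output short; the Bonferroni factor still counts every term supplied.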
Figure 1. Schematic representation of workflow and methodology in ToppGene Suite applications. (A) Genes in the training set are selected based on their attributes or current gene annotations (genes associated with a disease, phenotype, pathway or a GO term). (B) The test gene source can be candidate genes from linkage analysis studies, genes differentially expressed in a particular disease or phenotype, or genes from the interactome. (C) ToppFun: enriched terms of the gene annotations and sequence features (GO: Molecular Function, GO: Biological Process, Mouse Phenotype, Pathways, Protein Interactions, Protein Domains, transcription factor-binding sites, miRNA-target genes, disease-gene associations, drug-gene interactions and Gene Expression), compiled from various data sources, are also used to build the training set gene profile. (C and D) ToppGene: a similarity score is generated for each annotation of each test gene by comparing it to the enriched terms in the training set of genes. The final prioritized gene list is then computed from the aggregated values of the 14 similarity scores. (E and F) ToppNet: training and test set genes are mapped to a protein–protein interaction network. Test set genes are scored and ranked by their location relative to all of the training set genes, using global network-distance measures in the PPIN.
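The network step in panels E and F can be illustrated with personalized PageRank (a random walk that restarts at the training genes), one of the algorithm families the abstract names. This sketch uses the plain textbook form, not ToppNet's extended versions; the damping factor of 0.85, the function names and the toy network are assumptions. As in the results table, candidates absent from the network are reported separately rather than ranked.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected protein-protein interaction network as an adjacency dict."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def personalized_pagerank(adjacency, seeds, alpha=0.85, tol=1e-12, max_iter=1000):
    """Random walk with restart to the seed genes, by power iteration.

    Every node must have at least one neighbor (true for build_adjacency output).
    """
    if not seeds:
        raise ValueError("no training genes are present in the network")
    nodes = list(adjacency)
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    score = dict(restart)
    for _ in range(max_iter):
        # With probability 1 - alpha the walker jumps back to a training gene ...
        new = {v: (1.0 - alpha) * restart[v] for v in nodes}
        # ... otherwise it moves to a uniformly chosen interaction partner.
        for v in nodes:
            share = alpha * score[v] / len(adjacency[v])
            for u in adjacency[v]:
                new[u] += share
        delta = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


def rank_candidates(edges, training, candidates, alpha=0.85):
    """Rank candidates by score; genes absent from the network are returned separately."""
    adjacency = build_adjacency(edges)
    seeds = {g for g in training if g in adjacency}
    scores = personalized_pagerank(adjacency, seeds, alpha)
    in_network = [g for g in candidates if g in adjacency]
    missing = [g for g in candidates if g not in adjacency]
    return sorted(in_network, key=scores.get, reverse=True), missing


EDGES = [("T1", "T2"), ("T1", "C1"), ("C1", "X"), ("X", "Y"), ("Y", "C2")]
ranked, missing = rank_candidates(EDGES, ["T1", "T2"], ["C2", "C1", "Z"])
print(ranked, missing)  # prints ['C1', 'C2'] ['Z']
```

C1 interacts directly with a training gene and ranks above C2, three steps further away; Z has no interactions and corresponds to a "No interaction data" entry.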
Results of the 20 genetic disease prioritizations using ToppGene and ToppNet
| Disease | Reference | Gene | ToppGene rank | ToppNet rank |
|---|---|---|---|---|
| Bipolar disorder | Le-Niculescu | | 2 | 15 |
| Bipolar disorder | Le-Niculescu | | 4 | 18 |
| Bipolar disorder | Le-Niculescu | | 7 | 13 |
| Bipolar disorder | Le-Niculescu | | 10 | No interaction data |
| Bipolar disorder | Le-Niculescu | | 11 | No interaction data |
| Cardiomyopathy | Dhandapany | | 1 | 2 |
| Celiac disease | Hunt | | 1 | 8 |
| Celiac disease | Hunt | | 2 | 3 |
| Celiac disease | Hunt | | 3 | 29 |
| Celiac disease | Hunt | | 9 | 26 |
| Celiac disease | Hunt | | 14 | No interaction data |
| Celiac disease | Hunt | | 14 | 10 |
| Crohn's disease | Fisher | | 1 | 27 |
| Crohn's disease | Fisher | | 1 | 27 |
| Crohn's disease | Fisher | | 2 | No interaction data |
| Crohn's disease | Villani | | 5 | 1 |
| Crohn's disease | Fisher | | 7 | 1 |
| Crohn's disease | Barrett | | 11 | 1 |
| Crohn's disease | Franke | | 30 | 6 |
| Obesity | Renstrom | | 1 | 1 |
| Mean | | | 6.8 | 11.75 |
The gene-disease associations were taken from recently reported GWAS and include novel disease gene associations. The training sets were compiled using ‘phenotype/disease’ annotations in NCBI's Entrez Gene records and OMIM. To build each test set, we defined an artificial linkage interval consisting of the novel disease gene and its 99 nearest neighboring genes, by genomic distance, on the same chromosome.
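The evaluation protocol above can be sketched end to end: build a 100-gene artificial linkage interval, then summarize where the true disease gene lands. Two points are assumptions: genomic distance is taken between gene midpoints (the footnote does not say how it is measured), and "top 20%" is read as rank 20 or better in a 100-gene interval. The `Gene` record and function names are hypothetical; the ranks are copied from the table, with `None` for "No interaction data".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int
    end: int

    @property
    def midpoint(self) -> float:
        return (self.start + self.end) / 2


def artificial_linkage_interval(disease_gene, genes, n_neighbors=99):
    """The disease gene plus its n nearest genes on the same chromosome."""
    others = [g for g in genes
              if g.chrom == disease_gene.chrom and g.symbol != disease_gene.symbol]
    # Distance between gene midpoints; ties broken by symbol so the result is deterministic.
    others.sort(key=lambda g: (abs(g.midpoint - disease_gene.midpoint), g.symbol))
    return [disease_gene] + others[:n_neighbors]


def summarize(ranks, test_set_size=100, top_percent=20):
    """Mean rank and count within the top percent of the interval; None means no data."""
    cutoff = test_set_size * top_percent // 100  # 100 genes, top 20% -> rank <= 20
    observed = [r for r in ranks if r is not None]
    mean = sum(observed) / len(observed)
    in_top = sum(1 for r in observed if r <= cutoff)
    return mean, in_top, len(observed)


# (ToppGene rank, ToppNet rank) for the 20 rows of the table; None = no interaction data.
RANKS = [
    (2, 15), (4, 18), (7, 13), (10, None), (11, None),             # Bipolar disorder
    (1, 2),                                                         # Cardiomyopathy
    (1, 8), (2, 3), (3, 29), (9, 26), (14, None), (14, 10),         # Celiac disease
    (1, 27), (1, 27), (2, None), (5, 1), (7, 1), (11, 1), (30, 6),  # Crohn's disease
    (1, 1),                                                         # Obesity
]

for name, column in (("ToppGene", 0), ("ToppNet", 1)):
    mean, in_top, n = summarize([row[column] for row in RANKS])
    print(f"{name}: mean rank {mean:.2f}; {in_top} of {n} in the top 20%")
```

Applied to the table's ranks, this reproduces the reported means (6.8 and 11.75) and the abstract's 19 of 20 for ToppGene and 12 of 16 for ToppNet.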