Erich J Baker, Jeremy J Jay, Jason A Bubier, Michael A Langston, Elissa J Chesler.
Abstract
High-throughput genome technologies have produced a wealth of data on the association of genes and gene products to biological functions. Investigators have discovered value in combining their experimental results with published genome-wide association studies, quantitative trait locus, microarray, RNA-sequencing and mutant phenotyping studies to identify gene-function associations across diverse experiments, species, conditions, behaviors or biological processes. These experimental results are typically derived from disparate data repositories, publication supplements or reconstructions from primary data stores. This leaves bench biologists with the complex and unscalable task of integrating data by identifying and gathering relevant studies, reanalyzing primary data, unifying gene identifiers and applying ad hoc computational analysis to the integrated set. The freely available GeneWeaver (http://www.GeneWeaver.org) powered by the Ontological Discovery Environment is a curated repository of genomic experimental results with an accompanying tool set for dynamic integration of these data sets, enabling users to interactively address questions about sets of biological functions and their relations to sets of genes. Thus, large numbers of independently published genomic results can be organized into new conceptual frameworks driven by the underlying, inferred biological relationships rather than a pre-existing semantic framework. An empirical 'ontology' is discovered from the aggregate of experimental knowledge around user-defined areas of biological inquiry.
Year: 2011 PMID: 22080549 PMCID: PMC3245070 DOI: 10.1093/nar/gkr968
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Curation and integrative analysis of secondary data in the Ontological Discovery Environment (ODE). The overall system architecture consists of a centralized database that collects a variety of curated data and metadata and serves a suite of analysis tools. The system uses data from community resources to create clusters of gene homology across supported species, enabling ODE to rapidly translate gene sets between species.
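The homology-cluster idea in the Figure 1 caption can be illustrated with a minimal Python sketch. It assumes a toy lookup table from (species, gene) to a cluster identifier. The table contents, the placeholder gene names and the `translate` helper are hypothetical and are not taken from GeneWeaver; only the DDX5/Ddx5 pair corresponds to a real human/mouse ortholog pair.

```python
from collections import defaultdict

# Toy homology table (illustrative only): (species, gene) -> homology cluster id.
HOMOLOGY = {
    ("human", "DDX5"): 1,
    ("mouse", "Ddx5"): 1,
    ("human", "GENE_B"): 2,   # placeholder name
    ("mouse", "Gene_b"): 2,   # placeholder name
    ("mouse", "Gene_c"): 3,   # placeholder with no human member in this table
}


def build_cluster_index(homology):
    """Invert the table: cluster id -> species -> set of member genes."""
    index = defaultdict(lambda: defaultdict(set))
    for (species, gene), cluster in homology.items():
        index[cluster][species].add(gene)
    return index


def translate(gene_set, source, target, homology=HOMOLOGY):
    """Map genes from `source` species to `target` species via shared clusters.

    Returns (translated genes, genes that could not be mapped).
    """
    index = build_cluster_index(homology)
    translated, unmapped = set(), set()
    for gene in gene_set:
        cluster = homology.get((source, gene))
        targets = index[cluster][target] if cluster is not None else set()
        if targets:
            translated |= targets
        else:
            unmapped.add(gene)
    return translated, unmapped


mapped, missing = translate({"Ddx5", "Gene_c", "Gene_x"}, "mouse", "human")
print(sorted(mapped), sorted(missing))
```

Keying every gene by cluster rather than by species-specific identifier means a translation is two dictionary lookups, which is one plausible reading of why cluster-based translation can be fast.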
Figure 2. The analyze gene sets page. GeneWeaver's analysis functions are accessed from this page. Gene sets must first be collected and stored in one or more projects by the user. In this case, a project called ‘Alcohol’ contains 121 gene sets, nine of which are selected for analysis using the tools on the right. Options can be selected from this tool bar before a tool is executed.
Analysis tools and basic functions available in GeneWeaver
| Analysis tool | Gene sets: Explore | Gene sets: Prioritize | Genes: Explore | Genes: Prioritize | Input | Output | Description | Settings |
|---|---|---|---|---|---|---|---|---|
| Anchored bicliques of biomolecular associates | | | | | Genes | Highly similar genes, highly connected phenotypes | Find genes that are connected to similar gene sets as your input genes. | Degree threshold |
| Similar gene sets | | | | | 1 Gene set | Ranked list of similar gene sets (top 500) | Rank all gene sets by similarity to a given gene set. | |
| Phenome map | | | | | Gene sets | Graph of hierarchical intersections of gene sets | Generate directed acyclic graph of intersecting gene sets. | Bootstraps; Permutations; Stopping rules |
| Gene set graph | | | | | Gene sets | High degree connectivity of genes to your gene sets | Plot the bipartite graph of gene sets. | Degree threshold |
| Jaccard similarity | | | | | Gene sets | Matrix of pair-wise Venn diagrams and Jaccard similarity scores | Pair-wise similarity of gene sets. | |
| Hypergeometric tests | | | | | Gene sets | Matrix of pair-wise Venn diagrams and Jaccard similarity scores | Pair-wise similarity of gene sets. | |
| Jaccard clustering | | | | | Gene sets | Dendrogram and cluster heat map revealing gene set similarity. | Similarity clustering of gene sets. | Clustering method |
| Combine | | | | | Gene sets | Adjacency matrix | Create a matrix of genes and gene lists. | |
| Emphasis genes | | | | | Genes | | Highlight genes of interest on GeneSet graph and Phenome map output. | |
| Boolean gene set logic | | | | | Gene sets | Gene set consisting of union, intersection or highly connected genes from a group of gene sets. | Condense a large number of gene sets into a single gene set based on connectivity, e.g. to find the intersection of multiple QTL positional candidates, or the union of all genes annotated to a set of related biological functions. | Connectivity threshold |
All functions are available from the ‘Analyze GeneSets’ page, except for anchored bicliques of biomolecular associates (ABBA) and ‘Find similar gene sets’, which are accessed from the ‘Search for Genes’ menu and ‘GeneSet Pages’, respectively. Emphasis genes may be accessed from either ‘Analyze GeneSets’ or any ‘Gene Set Page’.
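For readers unfamiliar with the two pair-wise measures named in the table, the following Python sketch shows the textbook definitions of the Jaccard coefficient and a one-sided hypergeometric overlap test. It is not GeneWeaver's implementation: the function names, the `background_size` parameter (the number of genes that could have appeared in either set) and the integer gene identifiers are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def jaccard(a, b):
    """Jaccard coefficient: |A intersect B| / |A union B| (0.0 for two empty sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hypergeom_pvalue(a, b, background_size):
    """Probability of an overlap at least as large as observed, assuming
    set B is drawn at random without replacement from the background."""
    a, b = set(a), set(b)
    if background_size < len(a | b):
        raise ValueError("background must contain every gene in both sets")
    overlap = len(a & b)
    total = comb(background_size, len(b))
    tail = sum(
        comb(len(a), i) * comb(background_size - len(a), len(b) - i)
        for i in range(overlap, min(len(a), len(b)) + 1)
    )
    return tail / total


def pairwise(gene_sets, score):
    """Apply `score` to every unordered pair of named gene sets."""
    return {
        (x, y): score(gene_sets[x], gene_sets[y])
        for x, y in combinations(sorted(gene_sets), 2)
    }


# Toy gene sets with integer gene identifiers (illustrative only).
sets = {"GS1": {1, 2, 3, 4}, "GS2": {3, 4, 5}, "GS3": {6, 7}}
print(pairwise(sets, jaccard))
print(hypergeom_pvalue(sets["GS1"], sets["GS2"], background_size=10))
```

The Jaccard score depends only on the two sets, whereas the hypergeometric p-value also depends on the chosen background, so the same overlap can be more or less surprising depending on how many genes were assayed.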
Figure 3. The gene set graph. The gene set graph reveals the highly connected genes among the nine gene sets selected in Figure 2. This analysis reveals DDX5 as the most highly connected gene, connected to both human and mouse alcohol-related measures. Inset: clicking on a gene node executes a search for gene sets containing the featured gene or its homologs. Clicking on a gene set node reveals the contents and metadata for that gene set.
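The notion of a "highly connected" gene in the bipartite gene set graph can be sketched as a degree count: a gene's degree is the number of selected gene sets that contain it. The Python below is a minimal illustration under the assumption that all genes have already been translated to a common identifier. The gene set names, the placeholder genes and the helper functions are hypothetical, and the same threshold is used here to express the union and intersection cases of Boolean gene set logic.

```python
from collections import Counter


def gene_degrees(gene_sets):
    """Degree of each gene in the bipartite graph = number of sets containing it."""
    degrees = Counter()
    for genes in gene_sets.values():
        degrees.update(set(genes))
    return degrees


def highly_connected(gene_sets, threshold):
    """Genes that appear in at least `threshold` of the gene sets."""
    return {g for g, d in gene_degrees(gene_sets).items() if d >= threshold}


# Toy data (illustrative only); GENE_A..GENE_C are placeholder names.
sets = {
    "GS1": {"DDX5", "GENE_A", "GENE_B"},
    "GS2": {"DDX5", "GENE_B"},
    "GS3": {"DDX5", "GENE_C"},
}

print(gene_degrees(sets).most_common(1))          # most connected gene
print(sorted(highly_connected(sets, 1)))          # threshold 1 -> union
print(sorted(highly_connected(sets, len(sets))))  # threshold n -> intersection
print(sorted(highly_connected(sets, 2)))          # in-between connectivity
```

Treating union and intersection as the two ends of one threshold scale is a design convenience of this sketch; intermediate thresholds give the "highly connected genes" case.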
Figure 4. The phenome graph. The phenome graph is drawn from the nine inputs selected in Figure 2. It is a directed acyclic graph of the intersections of gene sets. Each node represents a group of gene sets and the genes they share. Higher order intersections are represented in the root nodes at the top, and individual gene sets in the leaves at the bottom. Inset: clicking a node opens a page showing the intersections among gene sets in list form. Results from this page can be sent to other tools for annotation, including GAGGLE.
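The structure described in the Figure 4 caption can be approximated with a brute-force Python sketch. This is a stand-in for illustration, not the algorithm GeneWeaver uses: it enumerates every combination of gene sets, so it is exponential in the number of sets and only practical for a handful of inputs, and it omits the bootstrap, permutation and stopping-rule settings listed in the table. The function names and toy data are hypothetical.

```python
from itertools import combinations


def phenome_nodes(gene_sets):
    """Map each group of gene sets to the genes that group shares.

    Leaves are the individual gene sets. For larger combinations with a
    non-empty intersection, the node is keyed by every gene set that
    contains the shared genes, so duplicate intersections collapse.
    """
    names = sorted(gene_sets)
    nodes = {
        frozenset([n]): frozenset(gene_sets[n]) for n in names if gene_sets[n]
    }
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            shared = set.intersection(*(set(gene_sets[n]) for n in combo))
            if not shared:
                continue
            members = frozenset(n for n in names if shared <= set(gene_sets[n]))
            nodes[members] = frozenset(shared)
    return nodes


def phenome_edges(nodes):
    """Edges from a higher-order intersection to its direct sub-groups."""
    edges = set()
    for parent in nodes:
        for child in nodes:
            if child < parent and not any(child < mid < parent for mid in nodes):
                edges.add((parent, child))
    return edges


# Toy data (illustrative only); GENE_A..GENE_C are placeholder names.
sets = {
    "GS1": {"DDX5", "GENE_A", "GENE_B"},
    "GS2": {"DDX5", "GENE_B"},
    "GS3": {"DDX5", "GENE_C"},
}
nodes = phenome_nodes(sets)
for members, shared in sorted(nodes.items(), key=lambda kv: -len(kv[0])):
    print(sorted(members), "share", sorted(shared))
print(len(phenome_edges(nodes)), "edges")
```

Because edges always run from a larger group of gene sets to a strictly smaller one, the result cannot contain a cycle, which matches the directed acyclic layout with roots at the top and leaves at the bottom.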