Gil Alterovitz, Michael Xiang, Mamta Mohan, Marco F Ramoni.
Abstract
Gene Ontology (GO) has been widely used to infer functional significance associated with sets of genes in order to automate discoveries within large-scale genetic studies. A term's level in GO's directed acyclic graph structure is often assumed to indicate its specificity, although other work has suggested that this assumption does not hold. Unfortunately, quantitative analysis of biological functions based on nodes at the same level (as is common in gene enrichment analysis tools) can lead to incorrect conclusions as well as missed discoveries due to inefficient use of available information. This paper addresses these problems using an information theoretic approach, encoded in the GO Partition Database, that is guaranteed to maximize information for gene enrichment analysis. The GO Partition Database was designed to feature ontology partitions with GO terms of similar specificity. The GO partitions comprise varying numbers of nodes and present relevant information theoretic statistics, so researchers can choose to analyze datasets at arbitrary levels of specificity. The GO Partition Database, featuring GO partition sets for functional analysis of genes from human and 10 other commonly studied organisms with a total of 131,972 genes, is available on the internet at: bcl.med.harvard.edu/proj/gopart. The site also includes an online tutorial.
Year: 2006 PMID: 17098937 PMCID: PMC1669720 DOI: 10.1093/nar/gkl799
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The specificity of GO terms can be captured in terms of bits of information. Heterocycle metabolism image courtesy of: Dr. Brent P. Krueger (14).
Figure 2. The GO Partition Database has an array of features from customized queries to several export options.
Figure 3. (a) GO term partition with six GO terms selected including: regulation of metabolism (19222), response to stimulus (50896), transcription (6350), transport (6810), biopolymer metabolism (43283) and organismal physiological process (50874). (b) Visual gene enrichment for transport is evident in these GenMAPP proteins involved in oxidative phosphorylation. Green circles represent proteins (displaying UniProtKB accessions) and rectangles contain the GO terms of the 6-node partition. An arrow going from a protein to a GO term indicates that the protein is annotated by that GO term. (c) Visual enrichment is shown based on GO graphical structure, leading to potentially misleading interpretations.
Figure 4. Histogram of GO level 3 versus GO partitions level 3 term information. This shows a tighter distribution for the GO partition-based information compared to that of graphical structure-derived GO level node information.
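The idea behind Figures 1 and 4, that a GO term's specificity can be expressed in bits and that a good partition groups terms of similar information, can be illustrated with a minimal sketch. It assumes the standard self-information measure, -log2(p), where p is the fraction of genes annotated by a term; the text above does not state the exact statistic the database uses, so this formula is an assumption. The function names, term labels, and annotation counts are hypothetical and invented purely for illustration.

```python
import math
from statistics import pstdev


def term_information(genes_annotated: int, total_genes: int) -> float:
    """Self-information of a GO term in bits (assumed measure: -log2 of the
    fraction of genes annotated by the term)."""
    if not 0 < genes_annotated <= total_genes:
        raise ValueError("genes_annotated must be in the range 1..total_genes")
    # log2(total / annotated) equals -log2(annotated / total)
    return math.log2(total_genes / genes_annotated)


def information_spread(counts: dict, total_genes: int) -> float:
    """Population standard deviation of term information across a set of terms."""
    bits = [term_information(n, total_genes) for n in counts.values()]
    return pstdev(bits)


# Hypothetical annotation counts, NOT taken from the GO Partition Database.
TOTAL_GENES = 1024
level_terms = {"term_a": 512, "term_b": 64, "term_c": 4}        # same graph level
partition_terms = {"term_x": 128, "term_y": 64, "term_z": 32}   # same partition

level_spread = information_spread(level_terms, TOTAL_GENES)
partition_spread = information_spread(partition_terms, TOTAL_GENES)

print(f"spread across level terms:     {level_spread:.2f} bits")
print(f"spread across partition terms: {partition_spread:.2f} bits")
```

With these invented counts, the terms sharing a graph level range from 1 to 8 bits, while the terms in the partition range from 3 to 5 bits. A smaller spread corresponds to the tighter distribution described for Figure 4.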