Andreas Schlicker, Mario Albrecht.
Abstract
Quantifying the functional similarity of genes and their products based on Gene Ontology annotation is an important tool for diverse applications like the analysis of gene expression data, the prediction and validation of protein functions and interactions, and the prioritization of disease genes. The Functional Similarity Matrix (FunSimMat, http://www.funsimmat.de) is a comprehensive database providing various precomputed functional similarity values for proteins in UniProtKB and for protein families in Pfam and SMART. With this update, we significantly increase the coverage of FunSimMat by adding data from the Gene Ontology Annotation project as well as new functional similarity measures. The applicability of the database is greatly extended by the implementation of a new Gene Ontology-based method for disease gene prioritization. Two new visualization tools allow an interactive analysis of the functional relationships between proteins or protein families. This is enhanced further by the introduction of an automatically derived hierarchy of annotation classes. Additional changes include a revised user front-end and a new RESTlike interface for improving the user-friendliness and online accessibility of FunSimMat.
Year: 2009 PMID: 19923227 PMCID: PMC2808991 DOI: 10.1093/nar/gkp979
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Different visualization options for a result set provided by FunSimMat. The figure shows some of the results obtained by the functional comparison of GTP-binding protein YPT11 (UniProtKB P48559) with GO annotation superclasses of human proteins. (A) The results table lists all functional similarity scores of the query protein with different GOclasses. Each table cell is colored by a gradient; white represents no similarity and blue represents high similarity. The popup box gives all GO terms for the GOclass 397703. (B) Medusa visualization of some CCclasses contained in the results. The classes were clustered using the k-means algorithm with k set to 20 and placed by applying a hierarchical layout. The nodes are colored according to cluster membership. (C) Mondrian scatter plots that compare biological process similarities obtained by different semantic similarity measures. The three plots in the first row show, for example, that the results obtained with simRel (5) are strongly correlated with Lin's similarity (8) (left), less correlated with Resnik's similarity (10) (center), and only weakly correlated with scores computed using Jiang & Conrath's similarity (9) (right). The straight lines in the scatter plots are least-squares regression lines calculated by Mondrian.
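The four semantic similarity measures compared in panel (C) can be illustrated with a minimal Python sketch. It uses the commonly cited information-content definitions of these measures, not code or formulas taken from FunSimMat itself. The function names, the toy probabilities, the handling of the root term in `lin`, and the `1 / (1 + distance)` conversion for Jiang & Conrath are all assumptions made for illustration; the database may use different conventions.

```python
import math


def ic(p):
    """Information content of a term with annotation probability p (0 < p <= 1)."""
    return -math.log(p)


def resnik(p_mica):
    """Resnik: information content of the most informative common ancestor (MICA)."""
    return ic(p_mica)


def lin(p1, p2, p_mica):
    """Lin: shared information content normalized by that of both terms."""
    denom = ic(p1) + ic(p2)
    # Assumption: return 0.0 when both terms are the root (denominator is zero).
    return 2 * ic(p_mica) / denom if denom > 0 else 0.0


def sim_rel(p1, p2, p_mica):
    """simRel: Lin's similarity weighted by (1 - p(MICA)), penalizing generic ancestors."""
    return lin(p1, p2, p_mica) * (1 - p_mica)


def jiang_conrath(p1, p2, p_mica):
    """Jiang & Conrath distance converted to a similarity.

    Assumption: the 1 / (1 + distance) conversion is one common choice, not
    necessarily the one used by FunSimMat.
    """
    distance = ic(p1) + ic(p2) - 2 * ic(p_mica)
    return 1 / (1 + distance)


if __name__ == "__main__":
    # Hypothetical probabilities for two GO terms and their MICA.
    p1, p2, p_mica = 0.01, 0.02, 0.1
    for name, value in [
        ("Resnik", resnik(p_mica)),
        ("Lin", lin(p1, p2, p_mica)),
        ("simRel", sim_rel(p1, p2, p_mica)),
        ("Jiang & Conrath", jiang_conrath(p1, p2, p_mica)),
    ]:
        print(f"{name}: {value:.3f}")
```

Under these definitions simRel is Lin's similarity multiplied by a factor that depends only on the common ancestor, while Resnik's measure is unbounded and ignores the information content of the compared terms themselves.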