Abstract
Understanding the relationship between protein structure and function is one of the foremost challenges in post-genomic biology. Higher conservation of structure could, in principle, allow researchers to extend current limitations of annotation. However, despite significant research in the area, a precise and quantitative relationship between biochemical function and protein structure has been elusive. Attempts to draw an unambiguous link have often been complicated by pleiotropy, variable transcriptional control, and adaptations to genomic context, all of which adversely affect simple definitions of function. In this paper, I report that integrating genomic information can be used to clarify the link between protein structure and function. First, I present a novel measure of functional proximity between protein structures (F-score). Then, using F-score and other entirely automatic methods measuring structure and phylogenetic similarity, I present a three-dimensional landscape describing their inter-relationship. The result is a "well-shaped" landscape that demonstrates the added value of considering genomic context in inferring function from structural homology. A generalization of methodology presented in this paper can be used to improve the precision of annotation of genes in current and newly sequenced genomes.
Year: 2005 PMID: 16103910 PMCID: PMC1183515 DOI: 10.1371/journal.pcbi.0010009
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. The Correlations between Z, F, and P Scores
(A) The correlation between structural comparison Z-score and functional distance F-score (Pearson's r = 0.96, slope = 0.007). Each bin contains at least 200 observations. It is worth noting that the average functional distance (F-score) falls only from 0.48 to 0.30, i.e., by only about a third, over two decades of structural similarity [14].
(B) The correspondence between phylogenetic profile distances calculated using mutual information and F-score. Slope of the linear fit is 0.36, with Pearson's r = 0.96. The correlation is averaged, i.e., each data point represents a bin containing 150–200 domains, and the functional distances are averaged inside the bin [14].
(C) The landscape of functional distance with respect to Z and P scores. An average F-score is calculated for each of the 36 bins; each bin contains 100–200 observations. Since F-score is a distance metric, hotter colors represent domains that are farther away and cooler colors represent those that are closer.
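The bin-averaging behind panels (A) to (C) can be illustrated with a minimal sketch. It is not the author's code: the function names (`binned_means`, `fit_binned`, `landscape`), the equal-count and quantile binning schemes, and the synthetic data are all assumptions made for illustration, since the caption does not state how bin boundaries were chosen.

```python
import numpy as np

def binned_means(x, y, n_bins):
    """Sort by x, split into n_bins nearly equal-sized bins,
    and return the per-bin means of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    x_bins = np.array_split(x[order], n_bins)
    y_bins = np.array_split(y[order], n_bins)
    return (np.array([b.mean() for b in x_bins]),
            np.array([b.mean() for b in y_bins]))

def fit_binned(x, y, n_bins):
    """Pearson's r and least-squares slope computed on the bin means."""
    bx, by = binned_means(x, y, n_bins)
    r = np.corrcoef(bx, by)[0, 1]
    slope, _intercept = np.polyfit(bx, by, 1)
    return r, slope

def landscape(z, p, f, n_side=6):
    """Mean f on an n_side x n_side grid (36 cells for n_side=6).
    Assumption: cell edges are per-axis quantiles of z and p."""
    z = np.asarray(z, dtype=float)
    p = np.asarray(p, dtype=float)
    f = np.asarray(f, dtype=float)
    qs = np.linspace(0.0, 1.0, n_side + 1)
    z_edges = np.quantile(z, qs)
    p_edges = np.quantile(p, qs)
    # Map each value to a cell index in [0, n_side - 1].
    zi = np.clip(np.searchsorted(z_edges, z, side="right") - 1, 0, n_side - 1)
    pi = np.clip(np.searchsorted(p_edges, p, side="right") - 1, 0, n_side - 1)
    sums = np.zeros((n_side, n_side))
    counts = np.zeros((n_side, n_side))
    np.add.at(sums, (zi, pi), f)
    np.add.at(counts, (zi, pi), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # NaN marks an empty cell

# Synthetic stand-in data (NOT the paper's data): a toy functional
# distance that shrinks with structural similarity z and grows with
# phylogenetic profile distance p, plus noise.
rng = np.random.default_rng(0)
n = 7200
z = rng.uniform(2.0, 40.0, n)
p = rng.uniform(0.0, 1.0, n)
f = 0.5 - 0.004 * z + 0.2 * p + rng.normal(0.0, 0.1, n)

r, slope = fit_binned(z, f, n_bins=36)  # 36 bins of 200 points each
grid = landscape(z, p, f, n_side=6)     # 6 x 6 grid of mean f
```

Averaging within bins suppresses per-pair noise, so a Pearson's r computed on bin means describes the trend of the averages and will generally be much higher than the r of the unbinned pairs.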