
The 2006 automated function prediction meeting.

Ana P C Rodrigues, Barry J Grant, Adam Godzik, Iddo Friedberg.


Year:  2007        PMID: 17570143      PMCID: PMC1892079          DOI: 10.1186/1471-2105-8-S4-S1

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Genomic-scale projects have compounded the need for rapid and reliable functional annotation methods. Traditional experimental approaches have been outpaced, resulting in an ever-increasing proportion of missing annotations. Computational approaches, including those based on sequence, expression, interaction and tertiary structure, have the potential to reduce this growing annotation deficit. Despite a recent increase in the number and variety of prediction methods, the computational annotation of protein function remains difficult. The difficulty stems from a combination of issues: the inherent limitations of current tools and databases, the difficulty of assessing the predictive power of different methods, and more fundamental problems related to the ambiguity of the definition of function itself.

These and related themes were addressed at the second Automated Function Prediction (AFP) conference, held at the University of California, San Diego campus in late August 2006. AFP2006 attracted more than 100 participants and extended over three days, with 8 keynote presentations, 19 contributed talks, 20 posters and a panel discussion. A broad range of function prediction methods was presented, focusing both on the development of new techniques and on fuller use of the spectrum of data produced by genomics and post-genomics research. The importance of standardized functional dictionaries capable of incorporating a range of function definitions, such as those implemented by the popular Gene Ontology (GO) project [1], the Structural Classification of Proteins (SCOP) [2] and the Enzyme Commission (EC) classification system [3,4], was evident throughout the meeting from their pervasive use in the implementation and assessment of protein function prediction methods.
Discussions of the need for reliability indicators and blind validation of the various methods led to a consensus on the value of conducting a community-wide experiment for assessing protein function prediction (à la CASP [5] for protein structure prediction and CAPRI [6] for protein interactions). For this supplement we have chosen nine studies presented at AFP2006 to be published as full-length articles.

Most computational methods for functional annotation rely on transferring knowledge accumulated in sequence and structure databases to related proteins. These methods can be distinguished by how they define the 'relatedness' of proteins: some use sequence and structure similarity measures, others use gene order conservation, co-occurrence across genomes or even shared interaction partners. The first seven papers in this supplement describe new developments in each of these approaches.

Homology-based methods are underpinned by the conservation of functionally important residues, and use sequence and/or structure similarity to identify functionally related proteins. Melvin and colleagues [7] describe SVM-Fold, a new method for remote homology detection and fold recognition. The method uses a support vector machine with kernels based on PSI-BLAST [8] profiles, as described in [9]. A novel multi-class classification algorithm, termed adaptive codes, exploits the hierarchical information contained in the SCOP [2] database. The authors show that, compared with PSI-BLAST and their previously described algorithm, SVM-Fold improves remote homology detection and significantly improves fold recognition. In related work, Audit and colleagues [10] apply their previously described probabilistic framework for homology-based annotation [11] to the ENZYME database [4].
This framework combines the pairwise similarity scores between a query sequence and all members of a functional class into a measure of the relationship between protein and class, then uses a Bayesian procedure to compute the likelihood that the new sequence belongs to that class. Each ENZYME re-annotation is thus assigned a probability value that measures the reliability of the prediction. Error rates range from 0 to 13.6% across the different classes, mostly reflecting the inability of sequence similarity searches to detect substrate specificity.

Martí-Renom and colleagues [12] introduce two new programs, AnnoLite and AnnoLyze, which add functional content to the previously established DBAli database of protein structure alignments [13]. AnnoLite uses structural alignments to transfer functional annotations drawn from the established vocabularies of SCOP [2], CATH [14], EC [3,4], GO [1], InterPro [15] and Pfam [16]. AnnoLyze uses structural alignments to transfer ligand binding site and domain interaction patch annotations from LigBase [17] and PIBASE [18], respectively. Importantly, for both tools the authors define annotation-specific cutoffs of sequence and structure similarity for confident transfer of annotations between proteins.

Henschel and colleagues [19] present a collection of hidden Markov models (HMMs) of protein-binding and ligand-binding interfaces. The models are generated with a multiple-motif approach that represents each binding site as a collection of small HMMs, each derived from sequence segments that form structural features of the interaction site. Using cross-validation and comparison with literature-curated interactions, the authors show that a significant number of their protein-protein interaction models can recognize protein-protein interaction sites.
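The multiple-motif idea used by Henschel and colleagues, in which one binding site is described by several small models rather than a single long one, can be sketched in a few lines. This is only an illustration under simplifying assumptions: it swaps their small HMMs for ungapped position-specific scoring matrices (PSSMs), and the motifs, the missing-residue penalty, the score threshold and the `min_hits` rule are all hypothetical choices, not taken from [19].

```python
import math

# Each motif is an ungapped PSSM: one dict of log-odds scores per position.
# PSSMs are a simplified stand-in for the small HMMs described in the text.
MISSING_PENALTY = -4.0  # hypothetical score for residues absent from a column


def best_window_score(motif, seq):
    """Return the highest motif score over all ungapped windows of seq."""
    width = len(motif)
    best = -math.inf
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col.get(res, MISSING_PENALTY) for col, res in zip(motif, window))
        best = max(best, score)
    return best


def predict_site(motifs, seq, threshold=2.0, min_hits=2):
    """Call a site if at least min_hits motifs score at or above threshold.

    Returns (is_site, number_of_motif_hits).
    """
    hits = sum(best_window_score(m, seq) >= threshold for m in motifs)
    return hits >= min_hits, hits


# Hypothetical motifs for one interface, each covering a short segment.
motifs = [
    [{"H": 2.0, "N": 0.5}, {"E": 1.5}, {"G": 1.0}],
    [{"C": 2.0}, {"C": 2.0}],
    [{"W": 3.0}, {"Y": 3.0}],
]

print(predict_site(motifs, "AAHEGKLCCA"))  # (True, 2)
```

Requiring several independent motif hits, rather than one strong match, is one simple way to reflect that an interface is built from several structural features; the published models may combine motif scores quite differently.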
In addition, they validate the protein-ligand interaction models through comparison with PROSITE motifs [20,21] associated with ligand binding sites.

Genome context methods are based on short-range genome co-linearity and conservation of gene regulation, and use gene order and localization to identify functionally related proteins. Li and colleagues [22] describe SynFPS, a new method that uses genomic context to predict function. SynFPS differs from other methods in its ability to detect gene correspondence among genomes of weakly related organisms, removing the need for prior knowledge of the relationships among them. This is achieved through genome clustering based on gene distribution, followed by support vector machine training for function prediction. The system is shown to be particularly effective in the analysis of bacteriophage genomes, where the phylogenetic relationships among organisms are far from established.

Proteins with similar functions are often observed to co-occur across genomes and thus have similar phylogenetic profiles. Cokus and colleagues [23] define a new heuristic for phylogenetic profile analysis that accounts for the relationships among organisms in a computationally efficient manner. This is achieved through an all-versus-all comparison of phylogenetic profiles and the subsequent re-ordering of those profiles according to the established relationships. The authors show that accounting for the number of runs, or consecutive matches, between ordered profiles improves the identification of functionally related proteins, by distinguishing conservation within closely related organisms from conservation across more divergent species.

Protein-protein interaction-based methods rely on the emerging protein-protein interaction datasets and exploit interaction partners to identify functionally related proteins.
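To make the idea of exploiting interaction partners concrete, the following is a minimal sketch of neighbor-based annotation over a protein interaction graph. It scores each candidate function of a query protein by summing over its direct (level-1) and indirect (level-2) partners, weighting each partner by the Jaccard overlap of the two proteins' closed neighborhoods. This weighting is a simplified stand-in chosen for illustration; it is not the FS-Weight formula of [25], and the graph, annotations and function names are hypothetical.

```python
from collections import defaultdict


def closed_neighborhood(graph, node):
    """The node itself plus its direct interaction partners."""
    return {node} | graph.get(node, set())


def jaccard_weight(graph, u, v):
    """Overlap of closed neighborhoods; a simplified stand-in for FS-Weight."""
    nu, nv = closed_neighborhood(graph, u), closed_neighborhood(graph, v)
    return len(nu & nv) / len(nu | nv)


def predict_functions(graph, annotations, query):
    """Rank candidate functions for query from level-1 and level-2 partners."""
    level1 = graph.get(query, set())
    level2 = set().union(*(graph.get(p, set()) for p in level1)) - level1 - {query}
    scores = defaultdict(float)
    for partner in level1 | level2:
        weight = jaccard_weight(graph, query, partner)
        for term in annotations.get(partner, ()):
            scores[term] += weight
    total = sum(scores.values())
    if total == 0:
        return []
    # Normalize so the scores of all candidate terms sum to 1.
    ranked = [(term, score / total) for term, score in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical undirected interaction graph (each edge stored in both directions).
graph = {
    "Q": {"A", "B"},
    "A": {"Q", "B", "C"},
    "B": {"Q", "A"},
    "C": {"A"},
}
annotations = {"A": {"kinase"}, "B": {"kinase", "transport"}, "C": {"transport"}}

for term, score in predict_functions(graph, annotations, "Q"):
    print(f"{term}: {score:.3f}")
# kinase: 0.583
# transport: 0.417
```

Because the weight depends on shared partners rather than on a direct edge, an indirect partner such as C still contributes, but less than partners that sit in the same dense neighborhood as the query.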
Chua and colleagues [24] use the previously described FS-Weighted Averaging method [25], which makes functional inferences based on indirect interaction partners and topological weighting, to annotate seven genomes from a diverse range of organisms with GO terms [1] from all three ontologies. Despite the different numbers and types of protein-protein interaction datasets available for each genome, the annotations are shown to be robust to noisy data and complementary to homology-based methods.

The remaining two papers in this supplement do not rely on annotation transfer from related proteins, but instead tackle the difficult question of de novo identification of ligand binding sites in protein structures. This form of annotation has gained importance because the majority of structures solved by structural genomics projects are of unknown function and bear no sequence or structural similarity to any protein of known function. Xie and Bourne [26] describe a new algorithm for predicting ligand binding sites based on a simplified shape description of protein structure. The method partitions protein space with two boundaries: an environment boundary, enclosing both the protein and potential ligand binding pockets, and a protein boundary. Clusters of atoms at the intersection of these boundaries are identified as potential binding sites, and their distance and orientation relative to the environment boundary are used to compute a novel measure, termed geometric potential. The authors show that the geometric potential can distinguish ligand binding from non-ligand binding sites, is minimally affected by conformational changes, and is fast enough to be applied in large-scale calculations. In related work, Yoon and colleagues [27] report the extension of the FEATURE method [28-32] to enable the discovery of unknown functional sites in protein structures.
FEATURE vectors describe the microenvironments around active sites and binding sites of proteins as normalized counts of physical and chemical properties within a set of concentric shells. The authors show that k-means clustering of these environments, using a weighted version of the Hamming distance between vectors, yields microenvironment clusters highly enriched in known functional sites. Analysis of such sites will allow calibration of cluster size and inter-cluster distances, thus enabling the discovery of novel functional sites.

In the introduction to his keynote presentation, Christos Ouzounis (CERTH) commented: "I have been trying to escape this field because it is so hard... but it keeps haunting us, and [thus we] keep revisiting the problems [it presents]". Function prediction is indeed a challenging endeavor, further hampered by the lack of a standard assessment framework [5,33]. It was inspiring to see that a vibrant community of researchers from varied backgrounds, reflected in the variety and scope of the papers in this supplement, is focusing on this important problem. We hope to see this trend continue at the third annual AFP meeting, which will be held in Vienna, Austria, on July 19-20, 2007. For further information and updates on AFP meetings see:
Selected references

Bairoch A. The ENZYME database in 2000. Nucleic Acids Res. 2000 Jan 1.

Martí-Renom MA, Ilyin VA, Sali A. DBAli: a database of protein structure alignments. Bioinformatics. 2001 Aug.

Stuart AC, Ilyin VA, Sali A. LigBase: a database of families of aligned ligand binding sites in known protein sequences and structures. Bioinformatics. 2002 Jan.

Wei L, Altman RB. Recognizing complex, asymmetric functional sites in protein structures using a Bayesian scoring function. J Bioinform Comput Biol. 2003 Apr.

Bairoch A. PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res. 1991 Apr 25.

Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997 Sep 1.

Bagley SC, Wei L, Cheng C, Altman RB. Characterizing oriented protein structural sites using biochemical properties. Proc Int Conf Intell Syst Mol Biol. 1995.

Bagley SC, Altman RB. Characterizing the microenvironment surrounding protein sites. Protein Sci. 1995 Apr.

Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995 Apr 7.

Xie L, Bourne PE. A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites. BMC Bioinformatics. 2007 May 22.