Modeling sequence and function similarity between proteins for protein functional annotation.

Roger Higdon, Brenton Louie, Eugene Kolker.

Abstract

A common task in biological research is to predict the function of proteins by comparing the sequences of proteins of known and unknown function. This is often done using pair-wise sequence alignment algorithms (e.g. BLAST). A problem with this approach is that it assumes a simple equivalence between a minimum sequence similarity threshold and function similarity between proteins. This assumption rests on the binary concept of homology: two proteins either are or are not homologous. The relationship between sequence and function, however, is more complex, and it matters for predicting protein function, e.g. when evaluating BLAST alignments or when developing training sets for profile models based on functional rather than homologous groupings. Our motivation for this study was to model sequence and function similarity between proteins to gain insight into the "sequence-function similarity relationship" for predicting function. Using our model, we found that function similarity generally increases with sequence similarity, but with a high degree of variability. This result has implications for pair-wise approaches: it appears that sequence similarity must be very high to ensure high function similarity. Profile models, which enable higher sensitivity, are a potential solution. However, they require multiple sequence alignments, and current alignment algorithms either have difficulty aligning sequences with very low sequence similarity (which is common in our data set) or are intractable for large numbers of sequences. Given the importance of predicting protein function and the need for multiple sequence alignments, algorithms for this task should be further refined and developed.
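The abstract contrasts a single sequence-similarity cutoff with a graded, highly variable relationship between sequence and function similarity. The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' model or data. It computes percent identity from a precomputed pairwise alignment, uses the Jaccard index of annotation-term sets as a stand-in function-similarity measure (not necessarily the measure used in the paper), applies a hypothetical 40% identity cutoff as the binary rule, and summarizes function similarity within identity bins. All function names, the cutoff, and the toy pairs are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Fraction of identical residues over columns where neither row has a gap.

    Assumes both strings are rows of a precomputed pairwise alignment, with
    '-' marking gaps. This is one of several identity conventions in use.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def function_similarity(terms_a: set[str], terms_b: set[str]) -> float:
    """Jaccard index of two annotation-term sets (an assumed stand-in measure)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def threshold_transfer(identity: float, threshold: float = 0.4) -> bool:
    """Binary rule: call the pair functionally equivalent at or above a cutoff.

    The 0.4 default is a hypothetical value chosen only for illustration.
    """
    return identity >= threshold


def summarize_by_bin(pairs: list[tuple[float, float]], bin_width: float = 0.25):
    """Mean, population SD and count of function similarity per identity bin."""
    n_bins = round(1 / bin_width)
    bins: dict[int, list[float]] = defaultdict(list)
    for identity, fsim in pairs:
        index = min(int(identity / bin_width), n_bins - 1)  # identity 1.0 joins the top bin
        bins[index].append(fsim)
    return {
        (i * bin_width, (i + 1) * bin_width): (mean(v), pstdev(v), len(v))
        for i, v in sorted(bins.items())
    }


# Hypothetical toy pairs (identity, function similarity); not data from the study.
toy_pairs = [(0.20, 0.10), (0.22, 0.60), (0.45, 0.30),
             (0.48, 0.90), (0.95, 1.00), (0.97, 0.80)]

# Both mid-identity pairs pass the binary cutoff, yet their function
# similarity differs widely: the rule hides that spread.
for identity, fsim in toy_pairs:
    print(identity, threshold_transfer(identity), fsim)

for (lo, hi), (m, sd, n) in summarize_by_bin(toy_pairs).items():
    print(f"identity {lo:.2f}-{hi:.2f}: mean={m:.2f} sd={sd:.2f} n={n}")
```

Binning is used here only because it makes the spread visible without extra dependencies; it is not a reconstruction of the statistical model described in the paper.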

Keywords:  Bioinformatics; Biostatistics; Experimentation; Multiple Sequence Alignment

Year: 2010; PMID: 25101328; PMCID: PMC4120521; DOI: 10.1145/1851476.1851548

Source DB: PubMed; Journal: Proc Int Symp High Perform Distrib Comput


