Validating annotations for uncharacterized proteins in Shewanella oneidensis.

Brenton Louie, Peter Tarczy-Hornoch, Roger Higdon, Eugene Kolker.

Abstract

Proteins of unknown function are a barrier to our understanding of molecular biology. Assigning function to these "uncharacterized" proteins is imperative, but challenging. The usual approach is to run similarity searches against annotation databases, which are useful for predicting function. However, since the performance of these databases on uncharacterized proteins is largely unknown, the accuracy of their predictions is suspect, making annotation difficult. To address this challenge, we developed a benchmark annotation dataset of 30 proteins in Shewanella oneidensis. The proteins in the dataset were originally uncharacterized after the initial annotation of the S. oneidensis proteome in 2002. In the intervening five years, the accumulation of new experimental evidence has enabled specific functions to be predicted. We used this benchmark dataset to evaluate several commonly used annotation databases. According to our criteria, six annotation databases accurately predicted functions for at least 60% of proteins in our dataset. Two of these six even had a "conditional accuracy" of 90%. Conditional accuracy is another evaluation metric we developed; it excludes results for which a database predicted no function. In addition, the functions of 27 of the 30 proteins were correctly predicted by at least one database. This represents one of the first performance evaluations of annotation databases on uncharacterized proteins. Our evaluation indicates that these databases readily incorporate new information and are accurate in predicting functions for uncharacterized proteins, provided that experimental function evidence exists.
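
The two metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring criteria, so the code below assumes one plausible reading: accuracy counts correct predictions over all benchmark proteins, while conditional accuracy counts correct predictions only over proteins for which the database returned a prediction. The function names, the `True`/`False`/`None` outcome encoding, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical encoding of one database's result for one protein:
# True = correct prediction, False = incorrect prediction, None = no prediction.
Outcome = Optional[bool]


def accuracy(outcomes: List[Outcome]) -> float:
    """Correct predictions divided by all benchmark proteins."""
    return sum(1 for o in outcomes if o is True) / len(outcomes)


def conditional_accuracy(outcomes: List[Outcome]) -> float:
    """Correct predictions divided by proteins that received any prediction."""
    predicted = [o for o in outcomes if o is not None]
    if not predicted:
        return float("nan")  # undefined when the database predicted nothing
    return sum(1 for o in predicted if o) / len(predicted)


def covered_by_any(results: Dict[str, List[Outcome]]) -> int:
    """Number of proteins predicted correctly by at least one database."""
    per_protein = zip(*results.values())
    return sum(1 for row in per_protein if any(o is True for o in row))


# Toy data for illustration only (NOT the paper's results): 30 proteins.
toy_results = {
    "db_a": [True] * 18 + [None] * 10 + [False] * 2,
    "db_b": [None] * 15 + [True] * 9 + [False] * 6,
}

for name, outcomes in toy_results.items():
    print(name, accuracy(outcomes), conditional_accuracy(outcomes))
print("covered by at least one:", covered_by_any(toy_results))
```

Under this reading, a database that abstains often can have a modest accuracy but a high conditional accuracy, which is why the two numbers are worth reporting side by side.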

Year:  2008        PMID: 18687039      PMCID: PMC3189009          DOI: 10.1089/omi.2008.0051

Source DB:  PubMed          Journal:  OMICS        ISSN: 1536-2310


  3 in total

1.  Modeling sequence and function similarity between proteins for protein functional annotation.

Authors:  Roger Higdon; Brenton Louie; Eugene Kolker
Journal:  Proc Int Symp High Perform Distrib Comput       Date:  2010

2.  Exploring general-purpose protein features for distinguishing enzymes and non-enzymes within the twilight zone.

Authors:  Yasser B Ruiz-Blanco; Guillermin Agüero-Chapin; Enrique García-Hernández; Orlando Álvarez; Agostinho Antunes; James Green
Journal:  BMC Bioinformatics       Date:  2017-07-21       Impact factor: 3.169

3.  A statistical model of protein sequence similarity and function similarity reveals overly-specific function predictions.

Authors:  Brenton Louie; Roger Higdon; Eugene Kolker
Journal:  PLoS One       Date:  2009-10-21       Impact factor: 3.240
