Validating annotations for uncharacterized proteins in Shewanella oneidensis.

Brenton Louie, Peter Tarczy-Hornoch, Roger Higdon, Eugene Kolker.

Abstract

Proteins of unknown function are a barrier to our understanding of molecular biology. Assigning function to these "uncharacterized" proteins is imperative but challenging. The usual approach is to run similarity searches against annotation databases, which are useful for predicting function. However, because the performance of these databases on uncharacterized proteins is largely unknown, the accuracy of their predictions is suspect, which makes annotation difficult. To address this challenge, we developed a benchmark annotation dataset of 30 proteins in Shewanella oneidensis. The proteins in the dataset were uncharacterized after the initial annotation of the S. oneidensis proteome in 2002. In the intervening 5 years, the accumulation of new experimental evidence has enabled specific functions to be predicted. We used this benchmark dataset to evaluate several commonly used annotation databases. According to our criteria, six annotation databases accurately predicted functions for at least 60% of the proteins in our dataset. Two of these six reached a "conditional accuracy" of 90%. Conditional accuracy is a second evaluation metric we developed; it excludes proteins for which a database predicted no function. In addition, the functions of 27 of the 30 proteins were correctly predicted by at least one database. These results represent one of the first performance evaluations of annotation databases on uncharacterized proteins. Our evaluation indicates that these databases readily incorporate new information and accurately predict functions for uncharacterized proteins, provided that experimental evidence of function exists.
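The difference between plain accuracy, conditional accuracy, and the "correct by at least one database" count can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' scoring procedure: each database's result for a protein is assumed to be recorded as "correct", "incorrect", or None (no function predicted), and the database names and outcomes below are hypothetical toy data, not the paper's results.

```python
from typing import Dict, List, Optional

# Assumption: one outcome per benchmark protein, per database.
# "correct" / "incorrect" = a function was predicted; None = no prediction.
Outcome = Optional[str]


def accuracy(outcomes: List[Outcome]) -> float:
    """Fraction of ALL benchmark proteins whose function was correctly predicted."""
    return sum(o == "correct" for o in outcomes) / len(outcomes)


def conditional_accuracy(outcomes: List[Outcome]) -> float:
    """Accuracy over only the proteins for which the database predicted a function."""
    predicted = [o for o in outcomes if o is not None]
    if not predicted:
        return float("nan")  # undefined when the database predicted nothing
    return sum(o == "correct" for o in predicted) / len(predicted)


def covered_by_any(results: Dict[str, List[Outcome]]) -> int:
    """Number of proteins correctly predicted by at least one database."""
    per_protein = zip(*results.values())  # regroup outcomes protein by protein
    return sum(any(o == "correct" for o in row) for row in per_protein)


if __name__ == "__main__":
    # Hypothetical toy data for 5 proteins and 2 made-up databases.
    results = {
        "db_a": ["correct", "correct", None, "incorrect", "correct"],
        "db_b": [None, "correct", None, None, "correct"],
    }
    for name, outcomes in results.items():
        print(name, accuracy(outcomes), conditional_accuracy(outcomes))
    print("covered by any:", covered_by_any(results))
```

In this toy example `db_b` scores 0.4 accuracy but 1.0 conditional accuracy, which shows how a database that rarely answers can still be highly reliable when it does; `db_a` scores 0.6 and 0.75, and three of the five proteins are covered by at least one database.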

Substances: Bacterial Proteins

Year: 2008 PMID: 18687039 PMCID: PMC3189009 DOI: 10.1089/omi.2008.0051

Source DB: PubMed Journal: OMICS ISSN: 1536-2310
