Filtering erroneous protein annotation.

D Wieser, E Kretschmann, R Apweiler.

Abstract

MOTIVATION: Automatically generated annotation on protein data of UniProt (Universal Protein Resource) is planned to be publicly available on the UniProt web pages in April 2004. It is expected that the data content of over 500,000 protein entries in the TrEMBL section will be enhanced by the output of an automated annotation pipeline. However, a part of the automatically added data will be erroneous, as are parts of the information coming from other sources. We present a post-processing system called Xanthippe that is based on a simple exclusion mechanism and a decision tree approach using the C4.5 data-mining algorithm.
RESULTS: It is shown that Xanthippe detects and flags a large part of the annotation errors and considerably increases the reliability of both automatically generated data and annotation from other sources. As a cross-validation to Swiss-Prot shows, errors in protein descriptions, comments and keywords are successfully filtered out. Xanthippe is a contradictive application that can be combined seamlessly with predictive systems. It can be used either to improve the precision of automated annotation at a constant level of recall or increase the recall at a constant level of precision.
AVAILABILITY: The application of the Xanthippe rules can be browsed at http://www.ebi.uniprot.org/
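
ILLUSTRATION: The abstract does not give implementation details, so the following is only a minimal sketch of how a contradictive exclusion filter could raise precision without lowering recall. It assumes a hypothetical rule form, "this keyword never co-occurs with this taxon in the curated data", and uses invented toy data; the names Annotation, learn_exclusions, flag and precision_recall are illustrative and are not taken from Xanthippe. The C4.5 decision-tree component is not modelled here.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    protein: str
    taxon: str    # hypothetical condition attribute
    keyword: str  # annotation attached to the protein


def learn_exclusions(curated, taxa, keywords):
    """Return (taxon, keyword) pairs never seen together in the curated set."""
    seen = {(a.taxon, a.keyword) for a in curated}
    return {(t, k) for t in taxa for k in keywords if (t, k) not in seen}


def flag(predictions, exclusions):
    """Split predictions into those kept and those flagged as contradictions."""
    kept, flagged = [], []
    for p in predictions:
        (flagged if (p.taxon, p.keyword) in exclusions else kept).append(p)
    return kept, flagged


def precision_recall(predictions, truth):
    """Precision and recall of predictions against a set of true (protein, keyword) pairs."""
    tp = sum(1 for p in predictions if (p.protein, p.keyword) in truth)
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


# Toy curated reference set (invented for illustration only).
curated = [
    Annotation("P1", "Bacteria", "Cell wall"),
    Annotation("P2", "Eukaryota", "Nucleus"),
    Annotation("P3", "Eukaryota", "Cell wall"),
]
exclusions = learn_exclusions(
    curated, taxa={"Bacteria", "Eukaryota"}, keywords={"Cell wall", "Nucleus"}
)

# Toy output of some predictive annotation system, plus the true annotations.
predictions = [
    Annotation("Q1", "Bacteria", "Cell wall"),
    Annotation("Q2", "Bacteria", "Nucleus"),  # contradicts the curated data
    Annotation("Q3", "Eukaryota", "Nucleus"),
    Annotation("Q4", "Eukaryota", "Cell wall"),
]
truth = {("Q1", "Cell wall"), ("Q3", "Nucleus"), ("Q4", "Cell wall")}

kept, flagged = flag(predictions, exclusions)
print("before filtering:", precision_recall(predictions, truth))
print("after filtering: ", precision_recall(kept, truth))
print("flagged:", flagged)
```

In this toy case the only flagged prediction is a false positive, so precision rises while recall is unchanged. A real rule set would also flag some correct annotations, which is why the abstract describes the effect as a trade-off between precision and recall.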

Year: 2004    PMID: 15262818    DOI: 10.1093/bioinformatics/bth938

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937


