
Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT.

E Kretschmann, W Fleischmann, R Apweiler.

Abstract

MOTIVATION: The gap between the amount of newly submitted protein data and reliable functional annotation in public databases is growing. Traditional manual annotation by literature curation and sequence analysis tools, without the use of automated annotation systems, cannot keep up with the ever-increasing quantity of submitted data. Automated supplements to manually curated databases, such as TrEMBL or GenPept, cover raw data but provide only limited annotation. To improve this situation, automatic tools are needed that support manual annotation, automatically increase the amount of reliable information, and help detect inconsistencies in manually generated annotations.
RESULTS: A standard data mining algorithm (C4.5) was successfully applied to gain knowledge about the Keyword annotation in SWISS-PROT. A total of 11 306 rules were generated; they are provided in a database, can be applied to not-yet-annotated protein sequences and can be viewed using a web browser. The rules rely on the taxonomy of the organism in which the protein was found and on signature matches of its sequence. Statistical evaluation of the generated rules by cross-validation suggests that applying them to arbitrary proteins would generate 33% of their keyword annotation with an error rate of 1.5%. Coverage of the keyword annotation can be increased to 60% by tolerating a higher error rate of 5%.
AVAILABILITY: The results of the automatic data mining process can be browsed at http://golgi.ebi.ac.uk:8080/Spearmint/. Source code is available upon request.
CONTACT: kretsch@ebi.ac.uk.
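The abstract describes rules that combine the organism's taxonomy with signature matches of the sequence to assign keywords, and it reports a trade-off between coverage and error rate. Below is a minimal Python sketch of how such rules might be applied and scored. It assumes that a rule is a conjunction of required taxa and required signatures with a per-rule confidence, and that the trade-off is controlled by a confidence threshold; both are assumptions, not the authors' documented method. The `Rule` and `Protein` classes, the `predict` and `evaluate` functions, the signature IDs (`SIG_KIN`, `SIG_TM`, `SIG_ZN`) and all toy data are hypothetical. The sketch does not reproduce C4.5 rule induction itself or the Spearmint database format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: IF organism in all `taxa` AND sequence matches all `signatures` THEN `keyword`."""
    keyword: str
    taxa: frozenset[str]        # taxonomic groups the organism must belong to
    signatures: frozenset[str]  # signature IDs the sequence must match
    confidence: float           # assumed: precision estimated on held-out data


@dataclass(frozen=True)
class Protein:
    lineage: frozenset[str]     # taxonomy of the source organism
    signatures: frozenset[str]  # signature matches found in the sequence


def predict(rules: list[Rule], protein: Protein, threshold: float) -> set[str]:
    """Keywords of all rules whose conditions hold and whose confidence reaches the threshold."""
    return {
        r.keyword
        for r in rules
        if r.confidence >= threshold
        and r.taxa <= protein.lineage            # subset test: all required taxa present
        and r.signatures <= protein.signatures   # subset test: all required signatures present
    }


def evaluate(rules: list[Rule], proteins: list[Protein],
             truth: list[set[str]], threshold: float) -> tuple[float, float]:
    """Assumed definitions: coverage = correct predictions / known keywords,
    error rate = wrong predictions / all predictions."""
    correct = wrong = total_true = 0
    for protein, known in zip(proteins, truth):
        predicted = predict(rules, protein, threshold)
        correct += len(predicted & known)
        wrong += len(predicted - known)
        total_true += len(known)
    n_pred = correct + wrong
    coverage = correct / total_true if total_true else 0.0
    error_rate = wrong / n_pred if n_pred else 0.0
    return coverage, error_rate


# Invented toy rules and proteins, for illustration only.
rules = [
    Rule("Kinase", frozenset({"Eukaryota"}), frozenset({"SIG_KIN"}), 0.98),
    Rule("Transmembrane", frozenset(), frozenset({"SIG_TM"}), 0.90),
    Rule("Zinc", frozenset({"Bacteria"}), frozenset({"SIG_ZN"}), 0.70),
]
proteins = [
    Protein(frozenset({"Eukaryota", "Metazoa"}), frozenset({"SIG_KIN", "SIG_TM"})),
    Protein(frozenset({"Bacteria"}), frozenset({"SIG_ZN"})),
    Protein(frozenset({"Bacteria"}), frozenset({"SIG_TM"})),
]
truth = [{"Kinase", "Transmembrane"}, {"Metal-binding"}, {"Transmembrane"}]

for threshold in (0.95, 0.6):
    coverage, error_rate = evaluate(rules, proteins, truth, threshold)
    print(f"threshold={threshold}: coverage={coverage:.2f}, error rate={error_rate:.2f}")
```

On this toy data the strict threshold (0.95) keeps only the kinase rule and yields coverage 0.25 with no errors, while the lenient threshold (0.6) raises coverage to 0.75 at an error rate of 0.25. This mirrors the direction of the trade-off reported above, though the numbers themselves are unrelated to the paper's results.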

Year:  2001        PMID: 11673236     DOI: 10.1093/bioinformatics/17.10.920

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

