Identifying relevant data for a biological database: handcrafted rules versus machine learning.

Aditya Kumar Sehgal, Sanmay Das, Keith Noto, Milton H Saier, Charles Elkan.

Abstract

With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records for incorporation into TCDB, a specialized biological database of transport proteins. We show that both learning approaches outperform rules handcrafted by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.

Year:  2011        PMID: 21393656      PMCID: PMC3980937          DOI: 10.1109/TCBB.2009.83

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


