
PreBIND and Textomy--mining the biomedical literature for protein-protein interactions using a support vector machine.

Ian Donaldson, Joel Martin, Berry de Bruijn, Cheryl Wolting, Vicki Lay, Brigitte Tuekam, Shudong Zhang, Berivan Baskin, Gary D Bader, Katerina Michalickova, Tony Pawson, Christopher W V Hogue.

Abstract

BACKGROUND: The majority of experimentally verified molecular interaction and biological pathway data are present in the unstructured text of biomedical journal articles, where they are inaccessible to computational methods. The Biomolecular Interaction Network Database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task of back-filling the database could be made smaller by first using Support Vector Machine technology to locate interaction information in the literature. We present an information extraction system that was designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND.
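The general idea of using a support vector machine to flag abstracts that describe interactions can be illustrated with a minimal sketch. This is not the PreBIND implementation: the binary bag-of-words features, the toy "abstracts", the parameters `lam` and `epochs`, and every function name below are hypothetical. The linear SVM is trained with Pegasos (stochastic sub-gradient descent on the regularized hinge loss), chosen here only because it needs nothing beyond the Python standard library; the intercept term is omitted for simplicity.

```python
import math
import random
import re


def featurize(text):
    """Hypothetical features: binary bag-of-words, L2-normalized."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(len(words)) or 1.0
    return {word: 1.0 / norm for word in words}


def score(w, x):
    """Linear decision value <w, x> for sparse feature dicts."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_pegasos(examples, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM (no intercept) with Pegasos: stochastic
    sub-gradient descent on lam/2 * ||w||^2 + mean hinge loss.
    `examples` is a list of (features, label) pairs, label +1 or -1."""
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)
        for i in order:
            t += 1
            x, y = examples[i]
            eta = 1.0 / (lam * t)             # Pegasos step size
            violated = y * score(w, x) < 1.0  # is the hinge loss active?
            for f in w:                       # shrink by (1 - eta * lam) = 1 - 1/t
                w[f] *= 1.0 - 1.0 / t
            if violated:                      # hinge sub-gradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """+1 if the text is scored as describing an interaction, else -1."""
    return 1 if score(w, featurize(text)) > 0 else -1


# Hypothetical toy "abstracts" (not from the paper); +1 = describes an interaction.
POSITIVE = [
    "Ste5 binds Fus3 and forms a stable signalling complex",
    "two-hybrid screen shows Cdc28 interacts with Cks1",
    "co-immunoprecipitation confirms direct interaction between Grb2 and Sos1",
    "SH2 domain in Grb2 binds phosphorylated EGFR in vitro",
]
NEGATIVE = [
    "HIS3 expression measured by northern blot",
    "crystal structure for the enzyme solved at high resolution",
    "mutant strains show reduced growth at elevated temperature",
    "microarray analysis reveals transcript abundance after heat stress",
]

training = [(featurize(s), 1) for s in POSITIVE] + [(featurize(s), -1) for s in NEGATIVE]
model = train_pegasos(training)

print(predict(model, "Rad51 binds Rad52 in a two-hybrid screen confirming direct interaction"))  # 1
print(predict(model, "northern blot measured enzyme expression after heat stress during growth"))  # -1
```

The two toy classes share no vocabulary, so they are trivially separable; real abstracts share most of their words, which is where the margin maximization of an SVM (and careful feature choice) actually matters.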
RESULTS: Cross-validation estimated that the support vector machine's test-set precision, accuracy and recall for classifying abstracts describing interaction information were 92%, 90% and 92%, respectively. We estimated that the system would be able to recall up to 60% of all non-high-throughput interactions present in another yeast protein-interaction database. Finally, when the system was applied to a real-world curation problem, its use reduced the task duration by 70%, saving 176 days.
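The reported figures rest on standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN) and accuracy = (TP+TN)/N, where TP, FP, FN and TN are true/false positives/negatives. The sketch below computes them and shows a generic k-fold split; the labels are hypothetical, and since the abstract states neither the number of folds nor whether metrics were pooled across folds or averaged per fold, `k=10` and pooling are assumptions. The last helper only makes the time-saving arithmetic explicit: if 176 days is 70% of the original duration, the original task would have taken roughly 176 / 0.70 ≈ 251 days and the assisted task roughly 75 days (approximate, because both reported figures are rounded).

```python
import random


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), accuracy = (TP+TN)/N."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


def k_fold_splits(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and yield (train, test) index lists for k folds;
    every index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def implied_durations(days_saved, fraction_saved):
    """Original and remaining durations implied by a fractional reduction
    that saved `days_saved` days (rounded inputs, so only approximate)."""
    original = days_saved / fraction_saved
    return original, original - days_saved


# Hypothetical pooled cross-validation labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
print(precision_recall_accuracy(y_true, y_pred))  # (0.8, 0.8, 0.8)

original, remaining = implied_durations(176, 0.70)
print(round(original, 1), round(remaining, 1))    # 251.4 75.4
```

Precision and recall can diverge sharply when interaction abstracts are rare in the corpus, which is why all three metrics are reported rather than accuracy alone.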
CONCLUSIONS: Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.

Year: 2003; PMID: 12689350; PMCID: PMC153503; DOI: 10.1186/1471-2105-4-11

Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169


