Automating curation using a natural language processing pipeline.

Beatrice Alex, Claire Grover, Barry Haddow, Mijail Kabadjov, Ewan Klein, Michael Matthews, Richard Tobin, Xinglong Wang.

Abstract

BACKGROUND: The tasks in BioCreative II were designed to approximate some of the laborious work involved in curating biomedical research papers. The approach to these tasks taken by the University of Edinburgh team was to adapt and extend the existing natural language processing (NLP) system that we have developed as part of a commercial curation assistant. Although this paper concentrates on using NLP to assist with curation, the system can be equally employed to extract types of information from the literature that is immediately relevant to biologists in general.
RESULTS: Our system was among the highest performing on the interaction subtasks, and competitive performance on the gene mention task was achieved with minimal development effort. For the gene normalization task, a string matching technique that can be quickly applied to new domains was shown to perform close to average.
CONCLUSION: The technologies being developed were shown to be readily adapted to the BioCreative II tasks. Although high performance may be obtained on individual tasks such as gene mention recognition and normalization, and document classification, tasks in which a number of components must be combined, such as detection and normalization of interacting protein pairs, are still challenging for NLP systems.

Year:  2008        PMID: 18834488      PMCID: PMC2559981          DOI: 10.1186/gb-2008-9-s2-s10

Source DB:  PubMed          Journal:  Genome Biol        ISSN: 1474-7596            Impact factor:   13.583


  14 in total

1.  A simple algorithm for identifying abbreviation definitions in biomedical text.

Authors:  Ariel S Schwartz; Marti A Hearst
Journal:  Pac Symp Biocomput       Date:  2003

2.  Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup.

Authors:  Alexander S Yeh; Lynette Hirschman; Alexander A Morgan
Journal:  Bioinformatics       Date:  2003       Impact factor: 6.937

3.  MedPost: a part-of-speech tagger for bioMedical text.

Authors:  L Smith; T Rindflesch; W J Wilbur
Journal:  Bioinformatics       Date:  2004-04-08       Impact factor: 6.937

4.  The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000.

Authors:  A Bairoch; R Apweiler
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

5.  Investigation into biomedical literature classification using support vector machines.

Authors:  Nalini Polavarapu; Shamkant B Navathe; Ramprasad Ramnarayanan; Abrar ul Haque; Saurav Sahay; Ying Liu
Journal:  Proc IEEE Comput Syst Bioinform Conf       Date:  2005

6.  PreBIND and Textomy--mining the biomedical literature for protein-protein interactions using a support vector machine.

Authors:  Ian Donaldson; Joel Martin; Berry de Bruijn; Cheryl Wolting; Vicki Lay; Brigitte Tuekam; Shudong Zhang; Berivan Baskin; Gary D Bader; Katerina Michalickova; Tony Pawson; Christopher W V Hogue
Journal:  BMC Bioinformatics       Date:  2003-03-27       Impact factor: 3.169

7.  Probabilistic linkage of large public health data files.

Authors:  M A Jaro
Journal:  Stat Med       Date:  1995 Mar 15-Apr 15       Impact factor: 2.373

8.  Identifying gene and protein mentions in text using conditional random fields.

Authors:  Ryan McDonald; Fernando Pereira
Journal:  BMC Bioinformatics       Date:  2005-05-24       Impact factor: 3.169

9.  Facts from text--is text mining ready to deliver?

Authors:  Dietrich Rebholz-Schuhmann; Harald Kirsch; Francisco Couto
Journal:  PLoS Biol       Date:  2005-02       Impact factor: 8.029

10.  Overview of the protein-protein interaction annotation extraction task of BioCreative II.

Authors:  Martin Krallinger; Florian Leitner; Carlos Rodriguez-Penagos; Alfonso Valencia
Journal:  Genome Biol       Date:  2008-09-01       Impact factor: 13.583

  5 in total

1.  A literature search tool for intelligent extraction of disease-associated genes.

Authors:  Jae-Yoon Jung; Todd F DeLuca; Tristan H Nelson; Dennis P Wall
Journal:  J Am Med Inform Assoc       Date:  2013-09-02       Impact factor: 4.497

2.  Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature.

Authors:  Xinglong Wang; Rafal Rak; Angelo Restificar; Chikashi Nobata; C J Rupp; Riza Theresa B Batista-Navarro; Raheel Nawaz; Sophia Ananiadou
Journal:  BMC Bioinformatics       Date:  2011-10-03       Impact factor: 3.169

3.  Detection of interaction articles and experimental methods in biomedical literature.

Authors:  Gerold Schneider; Simon Clematide; Fabio Rinaldi
Journal:  BMC Bioinformatics       Date:  2011-10-03       Impact factor: 3.169

4.  Introducing meta-services for biomedical information extraction.

Authors:  Florian Leitner; Martin Krallinger; Carlos Rodriguez-Penagos; Jörg Hakenberg; Conrad Plake; Cheng-Ju Kuo; Chun-Nan Hsu; Richard Tzong-Han Tsai; Hsi-Chuan Hung; William W Lau; Calvin A Johnson; Rune Saetre; Kazuhiro Yoshida; Yan Hua Chen; Sun Kim; Soo-Yong Shin; Byoung-Tak Zhang; William A Baumgartner; Lawrence Hunter; Barry Haddow; Michael Matthews; Xinglong Wang; Patrick Ruch; Frédéric Ehrler; Arzucan Ozgür; Güneş Erkan; Dragomir R Radev; Michael Krauthammer; ThaiBinh Luong; Robert Hoffmann; Chris Sander; Alfonso Valencia
Journal:  Genome Biol       Date:  2008-09-01       Impact factor: 13.583

5.  Overview of the protein-protein interaction annotation extraction task of BioCreative II.

Authors:  Martin Krallinger; Florian Leitner; Carlos Rodriguez-Penagos; Alfonso Valencia
Journal:  Genome Biol       Date:  2008-09-01       Impact factor: 13.583
