Empirical data on corpus design and usage in biomedical natural language processing.

K Bretonnel Cohen, Lynne Fox, Philip V Ogren, Lawrence Hunter.

Abstract

This paper describes the design of six publicly available biomedical corpora. We then present usage data for the six corpora. We show that corpora that are carefully annotated with respect to structural and linguistic characteristics and that are distributed in standard formats are more widely used than corpora that are not. These findings have implications for the design of the next generation of biomedical corpora.

Year:  2005        PMID: 16779021      PMCID: PMC1560643     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


  4 in total

1.  Constructing biological knowledge bases by extracting information from text sources.

Authors:  M Craven; J Kumlien
Journal:  Proc Int Conf Intell Syst Mol Biol       Date:  1999

2.  Automatic extraction of biological information from scientific text: protein-protein interactions.

Authors:  C Blaschke; M A Andrade; C Ouzounis; A Valencia
Journal:  Proc Int Conf Intell Syst Mol Biol       Date:  1999

3.  Protein names and how to find them.

Authors:  Kristofer Franzén; Gunnar Eriksson; Fredrik Olsson; Lars Asker; Per Lidén; Joakim Cöster
Journal:  Int J Med Inform       Date:  2002-12-04       Impact factor: 4.046

4.  GENETAG: a tagged corpus for gene/protein named entity recognition.

Authors:  Lorraine Tanabe; Natalie Xie; Lynne H Thom; Wayne Matten; W John Wilbur
Journal:  BMC Bioinformatics       Date:  2005-05-24       Impact factor: 3.169

  5 in total

1.  Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications.

Authors:  Guergana K Savova; James J Masanz; Philip V Ogren; Jiaping Zheng; Sunghwan Sohn; Karin C Kipper-Schuler; Christopher G Chute
Journal:  J Am Med Inform Assoc       Date:  2010 Sep-Oct       Impact factor: 4.497

2.  Detection of gene interactions based on syntactic relations.

Authors:  Mi-Young Kim
Journal:  J Biomed Biotechnol       Date:  2008

3.  Chapter 16: text mining for translational bioinformatics.

Authors:  K Bretonnel Cohen; Lawrence E Hunter
Journal:  PLoS Comput Biol       Date:  2013-04-25       Impact factor: 4.475

4.  A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools.

Authors:  Karin Verspoor; Kevin Bretonnel Cohen; Arrick Lanfranchi; Colin Warner; Helen L Johnson; Christophe Roeder; Jinho D Choi; Christopher Funk; Yuriy Malenkiy; Miriam Eckert; Nianwen Xue; William A Baumgartner; Michael Bada; Martha Palmer; Lawrence E Hunter
Journal:  BMC Bioinformatics       Date:  2012-08-17       Impact factor: 3.169

5.  Corpus refactoring: a feasibility study.

Authors:  Helen L Johnson; William A Baumgartner; Martin Krallinger; K Bretonnel Cohen; Lawrence Hunter
Journal:  J Biomed Discov Collab       Date:  2007-09-13