Domain-specific language models and lexicons for tagging.

Anni R Coden, Serguei V Pakhomov, Rie K Ando, Patrick H Duffy, Christopher G Chute.

Abstract

Accurate and reliable part-of-speech tagging is useful for many Natural Language Processing (NLP) tasks that form the foundation of NLP-based approaches to information retrieval and data mining. In general, large annotated corpora are necessary to achieve the desired part-of-speech tagger accuracy. We show that a large annotated general-English corpus is not sufficient for building a part-of-speech tagger model adequate for tagging documents from the medical domain. However, adding a small domain-specific corpus to a large general-English one raises accuracy from 87% to over 92% in our studies. We also suggest a number of characteristics for quantifying the similarity between a training corpus and the test data. These results give guidance for creating, at relatively small cost, a corpus suitable for building a part-of-speech tagger model that achieves satisfactory accuracy on a new domain.

Year: 2005    PMID: 16337567    DOI: 10.1016/j.jbi.2005.02.009

Source DB: PubMed    Journal: J Biomed Inform    ISSN: 1532-0464    Impact factor: 6.317



1.  Part-of-speech tagging for clinical text: wall or bridge between institutions?

Authors:  Jung-wei Fan; Rashmi Prasad; Rommel M Yabut; Richard M Loomis; Daniel S Zisook; John E Mattison; Yang Huang
Journal:  AMIA Annu Symp Proc       Date:  2011-10-22

2.  Coreference analysis in clinical notes: a multi-pass sieve with alternate anaphora resolution modules.

Authors:  Siddhartha Reddy Jonnalagadda; Dingcheng Li; Sunghwan Sohn; Stephen Tze-Inn Wu; Kavishwar Wagholikar; Manabu Torii; Hongfang Liu
Journal:  J Am Med Inform Assoc       Date:  2012-06-16       Impact factor: 4.497

3.  Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications.

Authors:  Guergana K Savova; James J Masanz; Philip V Ogren; Jiaping Zheng; Sunghwan Sohn; Karin C Kipper-Schuler; Christopher G Chute
Journal:  J Am Med Inform Assoc       Date:  2010 Sep-Oct       Impact factor: 4.497

4.  Natural Language Processing methods and systems for biomedical ontology learning. (Review)

Authors:  Kaihong Liu; William R Hogan; Rebecca S Crowley
Journal:  J Biomed Inform       Date:  2010-07-18       Impact factor: 6.317

5.  Heuristic sample selection to minimize reference standard training set for a part-of-speech tagger.

Authors:  Kaihong Liu; Wendy Chapman; Rebecca Hwa; Rebecca S Crowley
Journal:  J Am Med Inform Assoc       Date:  2007-06-28       Impact factor: 4.497

6.  Coreference resolution: a review of general methodologies and applications in the clinical domain. (Review)

Authors:  Jiaping Zheng; Wendy W Chapman; Rebecca S Crowley; Guergana K Savova
Journal:  J Biomed Inform       Date:  2011-08-12       Impact factor: 6.317

7.  Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation.

Authors:  Jeffrey P Ferraro; Hal Daumé; Scott L Duvall; Wendy W Chapman; Henk Harkema; Peter J Haug
Journal:  J Am Med Inform Assoc       Date:  2013-03-13       Impact factor: 4.497

8.  Concept annotation in the CRAFT corpus.

Authors:  Michael Bada; Miriam Eckert; Donald Evans; Kristin Garcia; Krista Shipley; Dmitry Sitnikov; William A Baumgartner; K Bretonnel Cohen; Karin Verspoor; Judith A Blake; Lawrence E Hunter
Journal:  BMC Bioinformatics       Date:  2012-07-09       Impact factor: 3.169

9.  Automated ancillary cancer history classification for mesothelioma patients from free-text clinical reports.

Authors:  Richard A Wilson; Wendy W Chapman; Shawn J Defries; Michael J Becich; Brian E Chapman
Journal:  J Pathol Inform       Date:  2010-10-11

10.  Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources.

Authors:  Dietrich Rebholz-Schuhmann; Senay Kafkas; Jee-Hyub Kim; Chen Li; Antonio Jimeno Yepes; Robert Hoehndorf; Rolf Backofen; Ian Lewin
Journal:  J Biomed Semantics       Date:  2013-10-11
