Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation.

Jeffrey P Ferraro, Hal Daumé, Scott L Duvall, Wendy W Chapman, Henk Harkema, Peter J Haug.

Abstract

OBJECTIVE: Natural language processing (NLP) tasks are commonly decomposed into subtasks, chained together to form processing pipelines. The residual error produced in these subtasks propagates, adversely affecting the end objectives. Limited availability of annotated clinical data remains a barrier to reaching state-of-the-art operating characteristics using statistically based NLP tools in the clinical domain. Here we explore the unique linguistic constructions of clinical texts and demonstrate the loss in operating characteristics when out-of-the-box part-of-speech (POS) tagging tools are applied to the clinical domain. We test a domain adaptation approach integrating a novel lexical-generation probability rule used in a transformation-based learner to boost POS performance on clinical narratives.
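The abstract does not spell out ClinAdapt's lexical-generation probability rule, so the sketch below shows only the generic Brill-style transformation-based learning loop that such a rule would plug into. It assumes a simple "change tag A to B when the previous tag is C" template. The names `ContextRule`, `initial_tags`, `apply_rules`, and `learn_rules` are hypothetical, and the code is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextRule:
    """Hypothetical Brill-style template: retag from_tag -> to_tag when the previous tag is prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def initial_tags(tokens, lexicon, default="NN"):
    """Baseline tagging: each token gets its lexicon tag, or a default for unknown words."""
    return [lexicon.get(tok.lower(), default) for tok in tokens]


def apply_rules(tags, rules):
    """Apply rules in order, scanning left to right; a change can affect the next position's context."""
    tags = list(tags)
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


def errors(pred, gold):
    """Count token-level tagging errors."""
    return sum(p != g for p, g in zip(pred, gold))


def learn_rules(pred, gold, tagset, max_rules=10):
    """Greedy TBL: repeatedly keep the single rule that most reduces training errors."""
    learned, current = [], list(pred)
    for _ in range(max_rules):
        best, best_err = None, errors(current, gold)
        for a in tagset:
            for b in tagset:
                if a == b:
                    continue
                for c in tagset:
                    rule = ContextRule(a, b, c)
                    err = errors(apply_rules(current, [rule]), gold)
                    if err < best_err:
                        best, best_err = rule, err
        if best is None:  # no candidate rule improves the training data
            break
        learned.append(best)
        current = apply_rules(current, [best])
    return learned


# Toy clinical-style sentence; lexicon and gold tags are invented for illustration.
tokens = ["pt", "to", "increase", "dose"]
lexicon = {"pt": "NN", "to": "TO", "increase": "NN", "dose": "NN"}
gold = ["NN", "TO", "VB", "NN"]
start = initial_tags(tokens, lexicon)
rules = learn_rules(start, gold, ["NN", "TO", "VB"])
print(rules, apply_rules(start, rules))
```

In a domain adaptation setting, the initial tags would come from an out-of-the-box tagger, and the learned rules would correct its systematic errors on clinical text. The brute-force search over all tag triples is only practical for tiny tagsets.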
METHODS: Two target corpora from independent healthcare institutions were constructed from high-frequency clinical narratives. Four leading POS taggers, using their out-of-the-box models trained on general English and biomedical abstracts, were evaluated against these clinical corpora. A high-performing domain adaptation method, Easy Adapt, was compared to our newly proposed method, ClinAdapt.
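Easy Adapt is the feature-augmentation method of Daumé III (2007). Each input vector x is mapped to ⟨x, x, 0⟩ for source-domain examples and to ⟨x, 0, x⟩ for target-domain examples. A single classifier trained on the pooled data can then learn which feature weights to share and which to specialize per domain. The minimal sketch below applies that mapping to a sparse feature dictionary. The `general:`/`source:`/`target:` prefixes and the example feature names are assumptions for illustration; the abstract does not give the paper's actual POS feature set.

```python
def easy_adapt(features, domain):
    """Augment a sparse feature dict with a shared copy plus a domain-specific copy.

    Source examples become <x, x, 0> and target examples <x, 0, x>; the zero
    block is implicit because absent keys in a sparse dict count as 0.
    """
    if domain not in ("source", "target"):
        raise ValueError(f"unknown domain: {domain!r}")
    augmented = {}
    for name, value in features.items():
        augmented[f"general:{name}"] = value   # shared across domains
        augmented[f"{domain}:{name}"] = value  # domain-specific copy
    return augmented


# Hypothetical token features for a clinical (target-domain) token.
print(easy_adapt({"word=pt": 1.0, "suffix=ed": 1.0}, "target"))
```

Because the augmentation is a pure preprocessing step, it works with any feature-based tagger without changing the learning algorithm. That simplicity is why it is a common baseline for comparison.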
RESULTS: The evaluated POS taggers drop in accuracy by 8.5-15% when tested on clinical narratives. The highest-performing tagger achieves an accuracy of 88.6%. Domain adaptation with Easy Adapt achieves accuracies of 88.3-91.0% on clinical texts, while ClinAdapt achieves 93.2-93.9%.
CONCLUSIONS: ClinAdapt successfully boosts POS tagging performance through domain adaptation while requiring a modest amount of annotated clinical data. Improving the performance of critical NLP subtasks is expected to reduce pipeline error propagation, leading to better overall results on complex processing tasks.
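A rough way to see why subtask accuracy matters in a pipeline is to assume that stage errors are independent and that an error at any stage corrupts the final output. Under those assumptions, end-to-end correctness is the product of the stage accuracies. The sketch below is that toy model only. The upstream stage accuracy of 0.95 is hypothetical, and the POS figures reuse the abstract's best out-of-the-box (88.6%) and best ClinAdapt (93.9%) accuracies purely for illustration. Real pipelines violate the independence assumption, and token-level accuracy is not the same as per-decision correctness downstream.

```python
from math import prod


def pipeline_accuracy(stage_accuracies):
    """Probability that every stage is correct, assuming independent stage errors."""
    for acc in stage_accuracies:
        if not 0.0 <= acc <= 1.0:
            raise ValueError(f"accuracy out of range: {acc}")
    return prod(stage_accuracies)  # empty pipeline -> 1.0


# Hypothetical two-stage pipeline: an upstream stage fixed at 0.95 (assumed),
# followed by a POS tagger before and after domain adaptation.
baseline = pipeline_accuracy([0.95, 0.886])
adapted = pipeline_accuracy([0.95, 0.939])
print(f"{baseline:.3f} -> {adapted:.3f}")
```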

Keywords:  Clinical Narratives; Domain Adaptation; NLP; Natural Language Processing; POS Tagging

Year:  2013        PMID: 23486109      PMCID: PMC3756264          DOI: 10.1136/amiajnl-2012-001453

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497
