Domain adaption of parsing for operative notes.

Yan Wang, Serguei Pakhomov, James O Ryan, Genevieve B Melton.

Abstract

BACKGROUND: Full syntactic parsing of clinical text as a part of clinical natural language processing (NLP) is critical for a wide range of applications. Several robust syntactic parsers are publicly available to produce linguistic representations for sentences. However, these existing parsers are mostly trained on general English text and may require adaptation for optimal performance on clinical text. Our objective was to adapt an existing general English parser to the clinical text of operative reports via lexicon augmentation, statistics adjustment, and grammar rule modification based on operative reports.
METHOD: The lexicon of the Stanford unlexicalized probabilistic context-free grammar (PCFG) parser was expanded with the SPECIALIST lexicon, along with statistics collected from a limited set of operative notes tagged by two part-of-speech (POS) taggers (the GENIA tagger and MedPost). The most frequently occurring verb entries of the SPECIALIST lexicon were adjusted based on manual review of verb usage in operative notes. Stanford parser grammar production rules were also modified based on linguistic features of operative reports. An analogous approach was then applied to the GENIA corpus to test its generalizability to biological text.
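The lexicon-augmentation step can be pictured as merging word-to-tag entries from an extra lexicon into the parser's lexicon, then re-estimating P(word | tag) from a POS-tagged corpus. The Python sketch below is a hypothetical illustration under simplifying assumptions, not the authors' code or the Stanford parser's API. Every function name is invented. The extra entries stand in for SPECIALIST records, which use their own syntactic categories and are assumed here to be already mapped to Penn Treebank tags. A flat pseudo-count keeps lexicon-only entries possible. The real parser's lexicon model, including unknown-word handling, is more involved.

```python
from collections import Counter, defaultdict

# Hypothetical sketch of lexicon augmentation for an unlexicalized PCFG.
# None of these function names come from the Stanford parser or the paper.


def collect_tag_counts(tagged_sentences):
    """Count (word, tag) pairs in POS-tagged sentences (e.g. tagger output)."""
    counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[(word.lower(), tag)] += 1
    return counts


def augment_lexicon(base_lexicon, extra_lexicon):
    """Merge two word -> set-of-tags lexicons by taking the union of tags."""
    merged = defaultdict(set)
    for lexicon in (base_lexicon, extra_lexicon):
        for word, tags in lexicon.items():
            merged[word.lower()] |= set(tags)
    return dict(merged)


def emission_probs(counts, lexicon, pseudo_count=1.0):
    """Relative-frequency estimate of P(word | tag).

    Each (word, tag) pair licensed by the lexicon receives `pseudo_count`
    extra mass, so lexicon entries unseen in the corpus stay possible.
    """
    pair_counts = Counter(counts)
    for word, tags in lexicon.items():
        for tag in tags:
            pair_counts[(word, tag)] += pseudo_count
    tag_totals = Counter()
    for (_, tag), count in pair_counts.items():
        tag_totals[tag] += count
    return {pair: count / tag_totals[pair[1]] for pair, count in pair_counts.items()}


# Toy data: a tiny base lexicon, hypothetical extra entries standing in for
# SPECIALIST (assumed already mapped to Penn Treebank tags), one tagged sentence.
base = {"the": {"DT"}, "was": {"VBD"}}
extra = {"incision": {"NN"}, "wound": {"NN"}, "closed": {"VBN", "JJ"}}
tagged = [[("The", "DT"), ("incision", "NN"), ("was", "VBD"), ("closed", "VBN")]]

lexicon = augment_lexicon(base, extra)
probs = emission_probs(collect_tag_counts(tagged), lexicon)
for (word, tag), p in sorted(probs.items()):
    print(f"P({word} | {tag}) = {p:.3f}")
```

Here "wound" appears only in the extra lexicon, so it receives a smaller share of the NN mass than the corpus-attested "incision". The flat pseudo-count is a stand-in; the abstract does not state how lexicon entries were weighted against corpus statistics.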
RESULTS: The new unlexicalized PCFG parser, extended with the extra lexicon from SPECIALIST along with accurate statistics collected from an operative note corpus tagged with the GENIA POS tagger, improved the F-score by 2.26 percentage points, from 87.64% to 89.90%. Performance improved progressively as multiple approaches were combined. Lexicon augmentation combined with statistics from the operative notes corpus provided the greatest improvement in parser performance. Application of this approach to the GENIA corpus increased the F-score by 3.81% with a simple new grammar and the addition of the GENIA corpus lexicon.
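The operative-note gain equals the difference between the two reported F-scores (89.90 - 87.64), so it is an absolute gain in percentage points; relative to the baseline it is roughly 2.6%. Parser F-scores are typically the harmonic mean of bracket precision P and recall R, F = 2PR / (P + R), but the abstract reports only the F-scores, so the short check below uses just those two figures. The abstract does not give the GENIA baseline, so the 3.81% figure is not recomputed here.

```python
# Reported F-scores (%) from the abstract: baseline vs. adapted parser.
baseline_f, adapted_f = 87.64, 89.90

absolute_gain = adapted_f - baseline_f            # percentage points
relative_gain = 100 * absolute_gain / baseline_f  # percent of the baseline

print(f"absolute gain: {absolute_gain:.2f} points")  # absolute gain: 2.26 points
print(f"relative gain: {relative_gain:.2f}%")        # relative gain: 2.58%
```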
CONCLUSION: Using statistics collected from clinical text tagged with POS taggers along with proper modification of grammars and lexicons of an unlexicalized PCFG parser may improve parsing performance of existing parsers on specialized clinical text.
Copyright © 2015. Published by Elsevier Inc.

Keywords:  Natural language processing; Operative reports; Parser adaption; Probabilistic context-free grammar (PCFG); SPECIALIST; Unlexicalized parser

Year:  2015        PMID: 25661593      PMCID: PMC4764060          DOI: 10.1016/j.jbi.2015.01.016

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  4 in total

1.  Semantic Role Labeling of Clinical Text: Comparing Syntactic Parsers and Features.

Authors:  Yaoyun Zhang; Min Jiang; Jingqi Wang; Hua Xu
Journal:  AMIA Annu Symp Proc       Date:  2017-02-10

2.  Clinical Natural Language Processing in 2015: Leveraging the Variety of Texts of Clinical Interest.

Authors:  A Névéol; P Zweigenbaum
Journal:  Yearb Med Inform       Date:  2016-11-10

3.  Parsing clinical text using the state-of-the-art deep learning based parsers: a systematic comparison.

Authors:  Yaoyun Zhang; Firat Tiryaki; Min Jiang; Hua Xu
Journal:  BMC Med Inform Decis Mak       Date:  2019-04-04       Impact factor: 2.796

4.  Text Mining the History of Medicine.

Authors:  Paul Thompson; Riza Theresa Batista-Navarro; Georgios Kontonatsios; Jacob Carter; Elizabeth Toon; John McNaught; Carsten Timmermann; Michael Worboys; Sophia Ananiadou
Journal:  PLoS One       Date:  2016-01-06       Impact factor: 3.240
