Trie-based rule processing for clinical NLP: A use-case study of n-trie, making the ConText algorithm more efficient and scalable.

Jianlin Shi, John F Hurdle.

Abstract

OBJECTIVE: To develop and evaluate an efficient Trie structure for large-scale, rule-based clinical natural language processing (NLP), which we call n-trie.
BACKGROUND: Despite the popularity of machine learning techniques in natural language processing, rule-based systems offer important advantages: distinctive transparency, ease of incorporating external knowledge, and less demanding annotation requirements. However, processing efficiency remains a major obstacle to adopting standard rule-based NLP solutions in big data analyses.
METHODS: We developed n-trie to specifically address the token-based nature of context detection, an important facet of clinical NLP that is known to slow down NLP pipelines. N-trie, a new rule processing engine using a revised Trie structure, allows fast execution of lexicon-based NLP rules. To determine its applicability and evaluate its performance, we applied the n-trie engine in an implementation (called FastContext) of the ConText algorithm and compared its processing speed and accuracy with JavaConText and GeneralConText, two widely used Java ConText implementations, as well as with a standalone machine learning NegEx implementation, NegScope.
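The METHODS paragraph describes n-trie only at a high level. As a rough illustration of why a trie keyed on whole tokens suits lexicon-based rules, here is a minimal Python sketch. It is not the authors' n-trie or FastContext code: the class names (`TrieNode`, `TokenTrie`), the rule labels (`NEGATED`, `HISTORICAL`), and the longest-match-per-start-position policy are hypothetical choices made for illustration, and only literal token sequences are supported.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical illustration of a token-level trie; NOT the n-trie/FastContext implementation.


@dataclass
class TrieNode:
    # Children are keyed by whole tokens (not characters), so each step
    # of the walk consumes one word of the input sentence.
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    # Label of a rule that ends at this node, e.g. "NEGATED" (hypothetical label).
    label: Optional[str] = None


class TokenTrie:
    """Minimal token-level trie for lexicon rules (illustrative sketch)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add_rule(self, phrase: str, label: str) -> None:
        # Insert the phrase token by token, sharing prefixes with existing rules.
        node = self.root
        for token in phrase.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.label = label

    def find_matches(self, tokens: List[str]) -> List[Tuple[int, int, str]]:
        """Return (start, end, label) spans, keeping the longest rule that
        matches at each start position. `end` is exclusive."""
        matches: List[Tuple[int, int, str]] = []
        lowered = [t.lower() for t in tokens]
        for start in range(len(lowered)):
            node = self.root
            best = None
            # Walk forward from this token; stop as soon as no rule continues.
            for pos in range(start, len(lowered)):
                node = node.children.get(lowered[pos])
                if node is None:
                    break
                if node.label is not None:
                    best = (start, pos + 1, node.label)
            if best is not None:
                matches.append(best)
        return matches


trie = TokenTrie()
trie.add_rule("no", "NEGATED")
trie.add_rule("no evidence of", "NEGATED")
trie.add_rule("history of", "HISTORICAL")

sentence = "There is no evidence of pneumonia".split()
print(trie.find_matches(sentence))  # → [(2, 5, 'NEGATED')]
```

Because each start position costs at most one dictionary lookup per token of the longest matching rule, adding more phrases to the lexicon changes matching time very little. This is one plausible reason a trie-based engine can be far less sensitive to rule set size, as the RESULTS report, though the actual n-trie design may differ.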
RESULTS: The n-trie engine ran two orders of magnitude faster than the comparison implementations and was far less sensitive to rule set size, and it also proved faster than the best machine learning negation detector. Additionally, the engine's accuracy consistently improved as the rule set grew (the desired outcome of adding new rules), while the accuracy of the other implementations did not.
CONCLUSIONS: The n-trie engine is an efficient, scalable engine to support NLP rule processing and shows the potential for application in other NLP tasks beyond context detection.
Copyright © 2018 Elsevier Inc. All rights reserved.

Keywords:  Algorithms; Data accuracy; Medical informatics applications; Natural language processing

Year:  2018        PMID: 30092358      PMCID: PMC6171746          DOI: 10.1016/j.jbi.2018.08.002

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  8 in total

1.  Facilitating information extraction without annotated data using unsupervised and positive-unlabeled learning.

Authors:  Zfania Tom Korach; Sharmitha Yerneni; Jonathan Einbinder; Carl Kallenberg; Li Zhou
Journal:  AMIA Annu Symp Proc       Date:  2021-01-25

2.  Using Natural Language Processing to improve EHR Structured Data-based Surgical Site Infection Surveillance.

Authors:  Jianlin Shi; Siru Liu; Liese C C Pruitt; Carolyn L Luppens; Jeffrey P Ferraro; Adi V Gundlapalli; Wendy W Chapman; Brian T Bucher
Journal:  AMIA Annu Symp Proc       Date:  2020-03-04

3.  Determination of Marital Status of Patients from Structured and Unstructured Electronic Healthcare Data.

Authors:  Brian T Bucher; Jianlin Shi; Robert John Pettit; Jeffrey Ferraro; Wendy W Chapman; Adi Gundlapalli
Journal:  AMIA Annu Symp Proc       Date:  2020-03-04

4.  Deep Learning from Incomplete Data: Detecting Imminent Risk of Hospital-acquired Pneumonia in ICU Patients.

Authors:  Travis R Goodwin; Dina Demner-Fushman
Journal:  AMIA Annu Symp Proc       Date:  2020-03-04

5.  Extraction of Treatment Information From Electronic Health Records and Evaluation of Testosterone Recovery in Patients With Prostate Cancer.

Authors:  Sunny Guin; Tomi Jun; Vaibhav G Patel; Kristin L Ayers; Matthew Deitz; Yuqin Cai; Xiang Zhou; Che-Kai Tsao; William K Oh; Rong Chen; Bobby C Liaw
Journal:  JCO Clin Cancer Inform       Date:  2022-06

6.  A customizable deep learning model for nosocomial risk prediction from critical care notes with indirect supervision.

Authors:  Travis R Goodwin; Dina Demner-Fushman
Journal:  J Am Med Inform Assoc       Date:  2020-04-01       Impact factor: 4.497

7.  Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach.

Authors:  Jianlin Shi; Keaton L Morgan; Richard L Bradshaw; Se-Hee Jung; Wendy Kohlmann; Kimberly A Kaphingst; Kensaku Kawamoto; Guilherme Del Fiol
Journal:  JMIR Med Inform       Date:  2022-08-11

8.  Portable Automated Surveillance of Surgical Site Infections Using Natural Language Processing: Development and Validation.

Authors:  Brian T Bucher; Jianlin Shi; Jeffrey P Ferraro; David E Skarda; Matthew H Samore; John F Hurdle; Adi V Gundlapalli; Wendy W Chapman; Samuel R G Finlayson
Journal:  Ann Surg       Date:  2020-10       Impact factor: 13.787

