
Scaling-up NLP Pipelines to Process Large Corpora of Clinical Notes.

G Divita, M Carter, A Redd, Q Zeng, K Gupta, B Trautner, M Samore, A Gundlapalli.

Abstract

INTRODUCTION: This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare".
OBJECTIVES: This paper describes the scale-up efforts at the VA Salt Lake City Health Care System to process large corpora of clinical notes through a natural language processing (NLP) pipeline. The use case described is a current project focused on detecting the presence of an indwelling urinary catheter in hospitalized patients and subsequent catheter-associated urinary tract infections.
METHODS: An NLP algorithm using v3NLP was developed to detect the presence of an indwelling urinary catheter in hospitalized patients. The algorithm was tested on a small corpus of notes on patients for whom the presence or absence of a catheter was already known (reference standard). In planning for a scale-up, we estimated that the original algorithm would have taken 2.4 days to run on a larger corpus of notes for this project (550,000 notes), and 27 days for a corpus of 6 million records representative of a national sample of notes. We approached scaling up NLP pipelines through three techniques: pipeline replication via multi-threading, intra-annotator threading for tasks that can be further decomposed, and remote annotator services, which enable annotator scale-out.
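The first technique, pipeline replication via multi-threading, can be illustrated with a minimal Python sketch. This is not the v3NLP or UIMA API: `ToyPipeline`, its keyword list, `annotate`, and `run_replicated` are hypothetical names invented for illustration, and the keyword match is a toy stand-in for the real catheter-detection algorithm (it does no negation or context handling). The sketch only shows the shape of the idea: each worker thread holds its own pipeline replica and notes are distributed across the workers.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class ToyPipeline:
    """Hypothetical stand-in for one NLP pipeline instance (not the v3NLP API)."""

    # Assumed keyword list, for illustration only.
    TERMS = ("foley", "indwelling catheter", "urinary catheter")

    def process(self, note: str) -> bool:
        # Flag the note if any catheter term appears (toy logic, no negation handling).
        text = note.lower()
        return any(term in text for term in self.TERMS)


# Thread-local storage so every worker thread gets its own pipeline replica.
_local = threading.local()


def _pipeline() -> ToyPipeline:
    # Build the replica lazily on first use in this thread, then reuse it.
    if not hasattr(_local, "pipeline"):
        _local.pipeline = ToyPipeline()
    return _local.pipeline


def annotate(note: str) -> bool:
    return _pipeline().process(note)


def run_replicated(notes, replicas=4):
    # Distribute notes over `replicas` worker threads; map() keeps input order.
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        return list(pool.map(annotate, notes))


notes = [
    "Foley placed in ED.",
    "Patient ambulating, no devices.",
    "Indwelling catheter removed on day 3.",
]
results = run_replicated(notes)
print(results)
```

In standard CPython, threads do not speed up CPU-bound pure-Python work, so a real Python port would more likely replicate the pipeline across processes; threads are used here only to mirror the technique as named in the abstract.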
RESULTS: The scale-up reduced the average time to process a record from 206 milliseconds to 17 milliseconds, a 12-fold increase in performance, when applied to a corpus of 550,000 notes.
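The reported figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in RESULTS (206 ms, 17 ms, 550,000 notes); the helper name `total_hours` is hypothetical, and the totals assume that the per-record average can simply be multiplied by the corpus size, which the abstract does not state. The derived totals therefore need not match the planning estimates quoted in METHODS.

```python
NOTES = 550_000   # corpus size quoted in RESULTS
BEFORE_MS = 206   # average time per record before the scale-up
AFTER_MS = 17     # average time per record after the scale-up


def total_hours(per_note_ms: float, n_notes: int) -> float:
    # Assumption: total time = per-record average x number of records.
    return per_note_ms * n_notes / 1000 / 3600


speedup = BEFORE_MS / AFTER_MS
hours_before = total_hours(BEFORE_MS, NOTES)
hours_after = total_hours(AFTER_MS, NOTES)

print(f"speedup: {speedup:.1f}x")
print(f"before:  {hours_before:.1f} h")
print(f"after:   {hours_after:.1f} h")
```

Under that assumption the ratio comes to roughly 12.1, consistent with the 12-fold improvement stated in the abstract.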
CONCLUSIONS: Purposely simplistic in nature, these scale-up efforts are the straightforward evolution from small-scale NLP processing to larger-scale extraction, without incurring the associated complexities inherited through use of the underlying UIMA framework. These efforts represent generalizable and widely applicable techniques that will aid other computationally complex NLP pipelines that need to be scaled out for processing and analyzing big data.

Keywords:  Natural language processing; big data; scale-up

Year:  2015        PMID: 26534722     DOI: 10.3414/ME14-02-0018

Source DB:  PubMed          Journal:  Methods Inf Med        ISSN: 0026-1270            Impact factor:   2.176


  8 in total

1.  Timely and Efficient AI Insights on EHR: System Design.

Authors:  Parthasarathy Suryanarayanan; Edward A Epstein; Abhishek Malvankar; Burn L Lewis; Lou DeGenaro; Jennifer J Liang; Ching-Huei Tsou; Divya Pathak
Journal:  AMIA Annu Symp Proc       Date:  2021-01-25

2.  Development and application of a high throughput natural language processing architecture to convert all clinical documents in a clinical data warehouse into standardized medical vocabularies.

Authors:  Majid Afshar; Dmitriy Dligach; Brihat Sharma; Xiaoyuan Cai; Jason Boyda; Steven Birch; Daniel Valdez; Suzan Zelisko; Cara Joyce; François Modave; Ron Price
Journal:  J Am Med Inform Assoc       Date:  2019-11-01       Impact factor: 4.497

3.  Trie-based rule processing for clinical NLP: A use-case study of n-trie, making the ConText algorithm more efficient and scalable.

Authors:  Jianlin Shi; John F Hurdle
Journal:  J Biomed Inform       Date:  2018-08-06       Impact factor: 6.317

4.  From Sour Grapes to Low-Hanging Fruit: A Case Study Demonstrating a Practical Strategy for Natural Language Processing Portability.

Authors:  Stephen B Johnson; Prakash Adekkanattu; Thomas R Campion; James Flory; Jyotishman Pathak; Olga V Patterson; Scott L DuVall; Vincent Major; Yindalon Aphinyanaphongs
Journal:  AMIA Jt Summits Transl Sci Proc       Date:  2018-05-18

5.  Natural language processing for the surveillance of postoperative venous thromboembolism.

Authors:  Jianlin Shi; John F Hurdle; Stacy A Johnson; Jeffrey P Ferraro; David E Skarda; Samuel R G Finlayson; Matthew H Samore; Brian T Bucher
Journal:  Surgery       Date:  2021-06-03       Impact factor: 4.348

6.  Toward a Learning Health-care System - Knowledge Delivery at the Point of Care Empowered by Big Data and NLP.

Authors:  Vinod C Kaggal; Ravikumar Komandur Elayavilli; Saeed Mehrabi; Joshua J Pankratz; Sunghwan Sohn; Yanshan Wang; Dingcheng Li; Majid Mojarad Rastegar; Sean P Murphy; Jason L Ross; Rajeev Chaudhry; James D Buntrock; Hongfang Liu
Journal:  Biomed Inform Insights       Date:  2016-06-23

7.  v3NLP Framework: Tools to Build Applications for Extracting Concepts from Clinical Text.

Authors:  Guy Divita; Marjorie E Carter; Le-Thuy Tran; Doug Redd; Qing T Zeng; Scott Duvall; Matthew H Samore; Adi V Gundlapalli
Journal:  EGEMS (Wash DC)       Date:  2016-08-11

8.  Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients With COVID-19: Retrospective Study Using Machine Learning and Natural Language Processing.

Authors:  Jose Luis Izquierdo; Julio Ancochea; Joan B Soriano
Journal:  J Med Internet Res       Date:  2020-10-28       Impact factor: 5.428

