Regular expression-based learning to extract bodyweight values from clinical notes.

Maureen A Murtaugh, Bryan Smith Gibson, Doug Redd, Qing Zeng-Treitler.

Abstract

BACKGROUND: Bodyweight-related measures (weight, height, BMI, abdominal circumference) are extremely important for clinical care, research and quality improvement. These and other vital signs data are frequently missing from the structured tables of electronic health records; however, they are often recorded as text within clinical notes. In this project we sought to develop and validate a learning algorithm that would extract bodyweight-related measures from clinical notes in the Veterans Administration (VA) Electronic Health Record to complement the structured data used in clinical research.
METHODS: We developed the Regular Expression Discovery Extractor (REDEx), a supervised learning algorithm that generates regular expressions from a training set. The regular expressions generated by REDEx were then used to extract the numerical values of interest. To train the algorithm we created a corpus of 268 outpatient primary care notes that were annotated by two annotators. This annotation served to develop the annotation process and to identify terms associated with bodyweight-related measures for training the supervised learning algorithm. Snippets from an additional 300 outpatient primary care notes were subsequently annotated independently by two reviewers to complete the training set. Inter-annotator agreement was calculated. REDEx was applied to a separate test set of 3561 notes to generate a dataset of weights extracted from text. We estimated the number of unique individuals who would otherwise not have bodyweight-related measures recorded in the VA Corporate Data Warehouse (CDW) and the number of additional bodyweight-related measures that would be captured.
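To make the extraction step concrete, here is a minimal Python sketch of applying one regular expression to a note to pull out weight values. It is not REDEx itself: REDEx learns its expressions from annotated snippets, whereas the pattern below is hand-written for illustration. The names `WEIGHT_PATTERN` and `extract_weights`, the cue words, the unit list, and the sample note are all assumptions, not taken from the paper.

```python
import re

# Hypothetical hand-written pattern (REDEx would instead learn patterns from
# annotated snippets): a weight cue, an optional separator, a number, a unit.
WEIGHT_PATTERN = re.compile(
    r"\b(?:weight|wt)\b\.?\s*(?:[:=]|is|of)?\s*"
    r"(?P<value>\d{2,3}(?:\.\d+)?)\s*"
    r"(?P<unit>lbs?|pounds?|kgs?|kilograms?)\b",
    re.IGNORECASE,
)


def extract_weights(note):
    """Return a list of (value, unit) pairs, one per weight mention in the note."""
    return [
        (float(m.group("value")), m.group("unit").lower())
        for m in WEIGHT_PATTERN.finditer(note)
    ]


# Invented example text, not a real clinical note.
note = (
    "Vitals: BP 128/82, Wt: 187.5 lbs, Ht 70 in. "
    "Patient reports weight of 85 kg last year."
)
print(extract_weights(note))  # [(187.5, 'lbs'), (85.0, 'kg')]
```

A single hand-written pattern like this misses many phrasings found in real notes, which is the motivation for learning a set of expressions from annotated examples.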
RESULTS: REDEx's performance was: accuracy=98.3%, precision=98.8%, recall=98.3%, F=98.5%. In the dataset of weights from 3561 notes, 7.7% of notes contained bodyweight-related measures that were not available as structured data. In addition, 2 more bodyweight-related measures were identified per individual per year.
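The reported F value can be checked against the reported precision and recall. The sketch below assumes that F denotes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not state which variant was used, and `f_measure` is a name chosen here for illustration.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the RESULTS paragraph.
f = f_measure(0.988, 0.983)
print(f"{f:.1%}")  # 98.5%
```

Under that assumption the computed value agrees with the reported F=98.5%.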
CONCLUSION: Bodyweight-related measures are frequently stored as text in clinical notes. A supervised learning algorithm can be used to extract these data. Implications for clinical care, epidemiology, and quality improvement efforts are discussed.
Copyright © 2015 Elsevier Inc. All rights reserved.

Keywords:  Bodyweight; Natural language processing; Text classification

Year:  2015        PMID: 25746391     DOI: 10.1016/j.jbi.2015.02.009

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  9 in total

1.  Extracting Information from Electronic Medical Records to Identify the Obesity Status of a Patient Based on Comorbidities and Bodyweight Measures.

Authors:  Rosa L Figueroa; Christopher A Flores
Journal:  J Med Syst       Date:  2016-07-11       Impact factor: 4.460

2.  A method to advance adolescent sexual health research: Automated algorithm finds sexual history documentation.

Authors:  Caryn Robertson; Gargi Mukherjee; Holly Gooding; Swaminathan Kandaswamy; Evan Orenstein
Journal:  Front Digit Health       Date:  2022-07-22

Review 3.  Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review.

Authors:  Kory Kreimeyer; Matthew Foster; Abhishek Pandey; Nina Arya; Gwendolyn Halford; Sandra F Jones; Richard Forshee; Mark Walderhaug; Taxiarchis Botsis
Journal:  J Biomed Inform       Date:  2017-07-17       Impact factor: 6.317

4.  Accurate Identification of Fatty Liver Disease in Data Warehouse Utilizing Natural Language Processing.

Authors:  Joseph S Redman; Yamini Natarajan; Jason K Hou; Jingqi Wang; Muzammil Hanif; Hua Feng; Jennifer R Kramer; Roxanne Desiderio; Hua Xu; Hashem B El-Serag; Fasiha Kanwal
Journal:  Dig Dis Sci       Date:  2017-08-31       Impact factor: 3.199

Review 5.  Overview of the First Natural Language Processing Challenge for Extracting Medication, Indication, and Adverse Drug Events from Electronic Health Record Notes (MADE 1.0).

Authors:  Abhyuday Jagannatha; Feifan Liu; Weisong Liu; Hong Yu
Journal:  Drug Saf       Date:  2019-01       Impact factor: 5.606

6.  Regular Expression-Based Learning for METs Value Extraction.

Authors:  Douglas Redd; Jinqiu Kuang; April Mohanty; Bruce E Bray; Qing Zeng-Treitler
Journal:  AMIA Jt Summits Transl Sci Proc       Date:  2016-07-20

Review 7.  Clinical concept extraction: A methodology review.

Authors:  Sunyang Fu; David Chen; Huan He; Sijia Liu; Sungrim Moon; Kevin J Peterson; Feichen Shen; Liwei Wang; Yanshan Wang; Andrew Wen; Yiqing Zhao; Sunghwan Sohn; Hongfang Liu
Journal:  J Biomed Inform       Date:  2020-08-06       Impact factor: 6.317

Review 8.  Factors influencing the development of primary care data collection projects from electronic health records: a systematic review of the literature.

Authors:  Marie-Line Gentil; Marc Cuggia; Laure Fiquet; Camille Hagenbourger; Thomas Le Berre; Agnès Banâtre; Eric Renault; Guillaume Bouzille; Anthony Chapron
Journal:  BMC Med Inform Decis Mak       Date:  2017-09-25       Impact factor: 2.796

9.  The internal validation of weight and weight change coding using weight measurement data within the UK primary care Electronic Health Record.

Authors:  Brian D Nicholson; Paul Aveyard; Willie Hamilton; Clare R Bankhead; Constantinos Koshiaris; Sarah Stevens; Frederick Dr Hobbs; Rafael Perera
Journal:  Clin Epidemiol       Date:  2019-01-25       Impact factor: 4.790
