Automatic classification of scanned electronic health record documents.

Heath Goodrum, Kirk Roberts, Elmer V Bernstam.

Abstract

OBJECTIVES: Electronic Health Records (EHRs) contain scanned documents from a variety of sources, including identification cards, radiology reports, clinical correspondence, and many other document types. We describe the distribution of scanned documents at one health institution, along with the design and evaluation of a system that categorizes documents as clinically relevant or not clinically relevant and then assigns further sub-classifications. Our objective is to demonstrate that text classification systems can accurately classify scanned documents.
METHODS: We extracted text using Optical Character Recognition (OCR). We then created and evaluated multiple text classification machine learning models, including both "bag of words" and deep learning approaches. We evaluated the system at three levels of classification, using either the entire document or its individual pages as input. Finally, we compared the effects of different text processing methods.
RESULTS: A deep learning model using ClinicalBERT performed best. This model distinguished between clinically-relevant documents and not clinically-relevant documents with an accuracy of 0.973; between intermediate sub-classifications with an accuracy of 0.949; and between individual classes with an accuracy of 0.913.
DISCUSSION: Within the EHR, some document categories such as "external medical records" may contain hundreds of scanned pages without clear document boundaries. Without further sub-classification, clinicians must view every page or risk missing clinically-relevant information. Machine learning can automatically classify these scanned documents to reduce clinician burden.
CONCLUSION: Using machine learning applied to OCR-extracted text has the potential to accurately identify clinically-relevant scanned content within EHRs.
Copyright © 2020 Elsevier B.V. All rights reserved.
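
The three-level evaluation described in the abstract can be illustrated with a short sketch. This is not the authors' code: the class names, the `HIERARCHY` mapping, and the majority-vote rule for combining page predictions into a document label are all hypothetical assumptions made for illustration. The abstract does not say whether coarser levels were scored by rolling up fine-grained predictions, as done here, or by training a separate model per level.

```python
from collections import Counter

# Hypothetical label hierarchy (not from the paper):
# fine-grained class -> (intermediate sub-classification, top-level category)
HIERARCHY = {
    "radiology_report": ("external_medical_record", "clinical"),
    "lab_result": ("external_medical_record", "clinical"),
    "clinical_letter": ("correspondence", "clinical"),
    "id_card": ("identification", "non_clinical"),
    "insurance_form": ("administrative", "non_clinical"),
}


def roll_up(label, level):
    """Map a fine-grained label to the requested level of the hierarchy."""
    if level == "fine":
        return label
    intermediate, top = HIERARCHY[label]
    return intermediate if level == "intermediate" else top


def accuracy(gold, pred, level):
    """Fraction of items whose rolled-up prediction matches the rolled-up gold label."""
    hits = sum(roll_up(g, level) == roll_up(p, level) for g, p in zip(gold, pred))
    return hits / len(gold)


def document_label(page_labels):
    """Assumed aggregation rule: the most frequent page-level label wins."""
    return Counter(page_labels).most_common(1)[0][0]


# Toy gold labels and predictions for four documents (made-up data).
gold = ["radiology_report", "lab_result", "id_card", "clinical_letter"]
pred = ["lab_result", "lab_result", "insurance_form", "clinical_letter"]

for level in ("top", "intermediate", "fine"):
    print(level, accuracy(gold, pred, level))

# Page-level predictions for one multi-page scan, combined into one label.
print(document_label(["lab_result", "lab_result", "id_card"]))
```

On this toy data, accuracy falls as the label set becomes finer, which is the same direction as the reported results (0.973, 0.949, 0.913), though the toy numbers themselves carry no meaning.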

Keywords:  Classification; Electronic health records; Machine learning; Optical character recognition; Patient safety; Scanned documents

Year:  2020        PMID: 33091829      PMCID: PMC7731898          DOI: 10.1016/j.ijmedinf.2020.104302

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046


