John D Osborne (1), Matthew Wyatt (2), Andrew O Westfall (3), James Willig (4), Steven Bethard (5), Geoff Gordon (6)
1. Center for Clinical and Translational Science, University of Alabama at Birmingham, Birmingham, Alabama, USA, 35294; ozborn@uab.edu
2. Center for Clinical and Translational Science, University of Alabama at Birmingham, Birmingham, Alabama, USA, 35294
3. Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, USA, 35294
4. Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA, 35294
5. Department of Computer and Information Science, University of Alabama at Birmingham, Birmingham, Alabama, USA, 35294
6. Informatics Institute, University of Alabama at Birmingham, Birmingham, Alabama, USA, 35294
Abstract
OBJECTIVE: To help cancer registrars efficiently and accurately identify reportable cancer cases.

MATERIALS AND METHODS: The Cancer Registry Control Panel (CRCP) was developed to detect mentions of reportable cancer cases. Its pipeline is built on the Unstructured Information Management Architecture - Asynchronous Scaleout (UIMA-AS) and contains the National Library of Medicine's UIMA MetaMap annotator together with a variety of rule-based UIMA annotators that primarily filter out concepts referring to nonreportable cancers. CRCP inspects pathology reports nightly to identify records containing relevant cancer concepts and combines these with diagnosis codes from the Clinical Electronic Data Warehouse to identify candidate cancer patients using supervised machine learning. Cancer mentions are highlighted in all candidate clinical notes and then sorted in CRCP's web interface for faster validation by cancer registrars.

RESULTS: CRCP achieved an accuracy of 0.872 and detected reportable cancer cases with a precision of 0.843 and a recall of 0.848. CRCP increases throughput by 22.6% over a baseline (manual review) pathology report inspection system while achieving higher precision and recall. Depending on registrar time constraints, CRCP can increase recall to 0.939 at the expense of precision by incorporating a data source information feature.

CONCLUSION: CRCP demonstrates accurate results when applying natural language processing features to the problem of detecting patients with reportable cancer from clinical notes. We show that implementing only a portion of cancer reporting rules as regular expressions is sufficient to increase the precision, recall, and speed of reportable cancer case detection when combined with off-the-shelf information extraction software and machine learning.
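The rule-based filtering step described above can be illustrated with a minimal Python sketch. It is not CRCP's implementation: CRCP runs UIMA annotators alongside MetaMap, whereas this sketch uses only the standard `re` module. The term list, the two exclusion rules, the naive sentence splitter, and the names `Mention`, `candidate_mentions`, and `evaluate` are all assumptions made for illustration. The `evaluate` helper shows how accuracy, precision, and recall follow from confusion-matrix counts.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A cancer term found in a note, with character offsets for highlighting."""
    text: str
    start: int
    end: int


# Hypothetical term list standing in for concept extraction (e.g. by MetaMap).
CANCER_TERM = re.compile(
    r"\b(?:adenocarcinoma|carcinoma|melanoma|lymphoma|sarcoma)\b", re.IGNORECASE
)

# Hypothetical exclusion rules; these are NOT the actual CRCP rule set.
EXCLUSION_RULES = {
    "negated": re.compile(r"\b(?:no evidence of|negative for)\b", re.IGNORECASE),
    "basal_cell_skin": re.compile(r"\bbasal cell carcinoma\b", re.IGNORECASE),
}


def split_sentences(text):
    """Naive splitter on periods; yields (offset, sentence) pairs."""
    for match in re.finditer(r"[^.]+\.?", text):
        yield match.start(), match.group()


def candidate_mentions(text):
    """Return cancer mentions whose sentence triggers no exclusion rule."""
    kept = []
    for offset, sentence in split_sentences(text):
        if any(rule.search(sentence) for rule in EXCLUSION_RULES.values()):
            continue  # the whole sentence is treated as nonreportable context
        for m in CANCER_TERM.finditer(sentence):
            kept.append(Mention(m.group(), offset + m.start(), offset + m.end()))
    return kept


def evaluate(tp, fp, fn, tn):
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    report = (
        "Invasive adenocarcinoma of the colon. "
        "No evidence of lymphoma. "
        "Basal cell carcinoma of the skin."
    )
    for mention in candidate_mentions(report):
        print(mention)
    # Made-up counts, used only to exercise the formulas.
    print(evaluate(tp=8, fp=2, fn=2, tn=8))
```

Filtering at the sentence level keeps the rules simple but is deliberately coarse: a sentence that mixes a negated finding with a positive one would be dropped entirely, so a real rule set would need narrower scoping.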