Kabir Yadav1, Efsun Sarioglu2, Hyeong Ah Choi3, Walter B Cartwright4, Pamela S Hinds5, James M Chamberlain6. 1. Department of Emergency Medicine, Harbor-UCLA Medical Center, Torrance, CA. 2. Computer Science Department, Portland State University, Portland, OR. 3. Computer Science Department, The George Washington University, Washington, DC. 4. Howard University School of Medicine, Washington, DC. 5. Children's Research Institute, Washington, DC. 6. Division of Emergency Medicine, Children's National Health System, Washington, DC.
Abstract
BACKGROUND: The authors have previously demonstrated highly reliable automated classification of free-text computed tomography (CT) imaging reports using a hybrid system that pairs linguistic (natural language processing) and statistical (machine learning) techniques. Although previously validated for identifying the outcome of orbital fracture in unprocessed radiology reports from a clinical data repository, this performance had not been replicated for more complex outcomes. OBJECTIVES: To validate the automated outcome classification performance of a hybrid natural language processing (NLP) and machine learning system for brain CT imaging reports. The hypothesis was that the system would have comparable performance characteristics for identifying pediatric traumatic brain injury (TBI). METHODS: This was a secondary analysis of a subset of 2,121 CT reports from the Pediatric Emergency Care Applied Research Network (PECARN) TBI study. For that project, radiologists dictated CT reports as free text, which were then deidentified and scanned as PDF documents. Trained data abstractors manually coded each report for TBI outcome. Text was extracted from the PDF files using optical character recognition. The data set was randomly split evenly into training and test sets. Training patient reports were used as input to the Medical Language Extraction and Encoding (MedLEE) NLP tool to create structured output containing standardized medical terms and modifiers for negation, certainty, and temporal status. A random subset stratified by site was analyzed using descriptive quantitative content analysis to confirm identification of TBI findings based on the National Institute of Neurological Disorders and Stroke (NINDS) Common Data Elements project. Findings were coded for presence or absence and weighted by frequency of mentions, with past, future, and indication modifiers filtered out. After combining this output with the manual reference standard, a decision tree classifier was created using the data mining tools WEKA 3.7.5 and Salford Predictive Miner 7.0. Performance of the decision tree classifier was evaluated on the test patient reports. RESULTS: The prevalence of TBI in the sampled population was 159 of 2,217 (7.2%). The automated classification of pediatric TBI was comparable to our prior results, with the notable exception of a lower positive predictive value. Manual review of misclassified reports, 95.5% of which were false positives, revealed that a sizable number of the false-positive errors were due to differing outcome definitions between NINDS TBI findings and PECARN clinically important TBI findings, and to report ambiguity that did not meet definition criteria. CONCLUSIONS: A hybrid NLP and machine learning automated classification system continues to show promise for coding free-text electronic clinical data. For complex outcomes, it can reliably identify negative reports, but manual review of positive reports may be required. As such, it can still streamline data collection for clinical research and performance improvement.
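The feature-coding step described in METHODS (coding each NINDS finding for presence or absence, weighting by frequency of mentions, and filtering past/future/indication modifiers) can be sketched as follows. This is a minimal illustration only: the record layout, field names, and finding labels are hypothetical and do not reflect MedLEE's actual output schema.

```python
# Hypothetical sketch of frequency-weighted finding coding from
# MedLEE-style structured output. Field names ("finding", "negated",
# "temporal") and finding labels are illustrative assumptions.
from collections import Counter

EXCLUDED_MODIFIERS = {"past", "future", "indication"}

def code_findings(mentions):
    """Return {finding: mention count}, keeping only affirmed,
    current mentions (past/future/indication filtered out)."""
    counts = Counter()
    for m in mentions:
        if m.get("temporal") in EXCLUDED_MODIFIERS:
            continue  # filter past/future/indication mentions
        if m.get("negated"):
            continue  # drop negated findings
        counts[m["finding"]] += 1  # weight by frequency of mention
    return dict(counts)

# One simulated report: two affirmed mentions, one negated,
# one historical (past) mention.
report = [
    {"finding": "subdural hematoma", "negated": False, "temporal": "current"},
    {"finding": "subdural hematoma", "negated": False, "temporal": "current"},
    {"finding": "skull fracture", "negated": True, "temporal": "current"},
    {"finding": "cerebral contusion", "negated": False, "temporal": "past"},
]

features = code_findings(report)
print(features)  # {'subdural hematoma': 2}
```

Vectors of this form, joined with the manual reference standard, would then serve as input to a decision tree learner such as the WEKA and Salford tools named above.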