Automating Clinical Chart Review: An Open-Source Natural Language Processing Pipeline Developed on Free-Text Radiology Reports From Patients With Glioblastoma.

Joeky T Senders, Logan D Cho, Paola Calvachi, John J McNulty, Joanna L Ashby, Isabelle S Schulte, Ahmad Kareem Almekkawi, Alireza Mehrtash, William B Gormley, Timothy R Smith, Marike L D Broekman, Omar Arnaout.

Abstract

PURPOSE: The aim of this study was to develop an open-source natural language processing (NLP) pipeline for text mining of medical information from clinical reports. We also aimed to provide insight into why certain variables or reports are more suitable for clinical text mining than others.
MATERIALS AND METHODS: Various NLP models were developed to extract 15 radiologic characteristics from free-text radiology reports for patients with glioblastoma. Ten-fold cross-validation was used to optimize the hyperparameter settings and estimate model performance. We examined how model performance was associated with quantitative attributes of the radiologic characteristics and reports.
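The abstract does not name the classifiers or the hyperparameters that were tuned. As an illustration of the general workflow only, here is a minimal sketch of tuning and evaluating a text classifier with ten-fold cross-validation. It assumes a bag-of-words multinomial naive Bayes model whose smoothing parameter `alpha` is the tuned hyperparameter. That model is a stand-in chosen for brevity, not the authors' method, and every function name below is hypothetical.

```python
import math
import random
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer; a deliberate simplification of real preprocessing."""
    return text.lower().split()

def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_nb(docs, labels, alpha):
    """Fit a binary multinomial naive Bayes model with additive smoothing alpha."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(tokenize(doc))
    vocab = set(counts[0]) | set(counts[1])
    return {"counts": counts, "priors": Counter(labels), "vocab": vocab, "alpha": alpha}

def predict_nb(model, doc):
    """Return the label (0 or 1) with the higher log posterior."""
    counts, priors = model["counts"], model["priors"]
    vocab, alpha = model["vocab"], model["alpha"]
    n = sum(priors.values())
    scores = {}
    for y in (0, 1):
        # Add-one smoothed class prior so a class missing from a training fold is not log(0).
        score = math.log((priors[y] + 1) / (n + 2))
        denom = sum(counts[y].values()) + alpha * len(vocab)
        for tok in tokenize(doc):
            if tok in vocab:  # ignore words never seen during training
                score += math.log((counts[y][tok] + alpha) / denom)
        scores[y] = score
    return max(scores, key=scores.get)

def cv_accuracy(docs, labels, alpha, k=10):
    """Accuracy pooled over k held-out folds, each scored by a model trained on the rest."""
    correct = 0
    for test_idx in kfold_indices(len(docs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(docs)) if i not in held_out]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train], alpha)
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)

def select_alpha(docs, labels, grid=(0.1, 0.5, 1.0), k=10):
    """Pick the smoothing value with the best cross-validated accuracy (first wins ties)."""
    return max(grid, key=lambda a: cv_accuracy(docs, labels, a, k))

# Hypothetical toy reports labelled for one binary radiologic characteristic.
reports = ["Enhancement present"] * 10 + ["No enhancement"] * 10
labels = [1] * 10 + [0] * 10
best = select_alpha(reports, labels)
print(best, cv_accuracy(reports, labels, best))
```

This sketch selects `alpha` and reports accuracy from the same folds, which tends to give a slightly optimistic estimate. Nested cross-validation, with an inner loop for tuning and an outer loop for evaluation, is the usual remedy.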
RESULTS: In total, 562 unique brain magnetic resonance imaging reports were retrieved. NLP extracted 15 radiologic characteristics with high to excellent discrimination (area under the curve, 0.82 to 0.98) and accuracy (78.6% to 96.6%). Model performance was correlated with the inter-rater agreement of the manually provided labels (ρ = 0.904; P < .001) but not with the frequency distribution of the variables of interest (ρ = 0.179; P = .52). All variables labeled with near-perfect inter-rater agreement were classified with excellent performance (area under the curve > 0.95). Excellent performance could be achieved for variables with only 50 to 100 observations in the minority group and class imbalances of up to 9:1. Report-level classification accuracy was not associated with the number of words or the vocabulary size of individual reports.
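The results rest on two quantities: a per-variable area under the ROC curve (AUC) and a rank correlation (ρ) between that AUC and inter-rater agreement. The sketch below computes both from scratch. It assumes that ρ is Spearman's rank correlation, computed here as the Pearson correlation of average ranks, and that AUC is the probability that a randomly chosen positive report outscores a randomly chosen negative one, with ties counting one half. The numbers in the usage lines are hypothetical and are not the study's data. The P values reported in the abstract are not reproduced here.

```python
import math

def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC: P(positive score > negative score), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied run order[i..j]
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-variable values, one entry per radiologic characteristic.
variable_auc = [0.82, 0.88, 0.93, 0.98]
variable_agreement = [0.55, 0.70, 0.85, 0.95]
print(spearman(variable_auc, variable_agreement))
```

Because Spearman's ρ depends only on ranks, it captures whether higher agreement tends to accompany higher AUC without assuming a linear relationship between the two scales.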
CONCLUSION: This study provides an open-source NLP pipeline that allows for text mining of narratively written clinical reports. Small sample sizes and class imbalance should not be considered as absolute contraindications for text mining in clinical research. However, future studies should report measures of inter-rater agreement whenever ground truth is based on a consensus label and use this measure to identify clinical variables eligible for text mining.
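The conclusion recommends reporting inter-rater agreement whenever labels come from consensus. The abstract does not specify the agreement statistic. A common choice for two raters assigning categorical labels is Cohen's kappa, and "near perfect" echoes the Landis and Koch convention of calling κ above 0.80 "almost perfect". The sketch below assumes both of those conventions. The rater labels in the usage lines are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1:
        return float("nan")  # undefined: both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

def landis_koch(kappa):
    """Map kappa to the Landis and Koch descriptive bands."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical labels from two reviewers for one variable across eight reports.
reviewer_1 = [1, 1, 0, 0, 1, 0, 1, 0]
reviewer_2 = [1, 1, 0, 0, 1, 0, 0, 0]
k = cohens_kappa(reviewer_1, reviewer_2)
print(round(k, 3), landis_koch(k))
```

Kappa corrects raw percent agreement for the agreement expected by chance from each rater's label frequencies. This correction matters most for imbalanced variables, where two raters can agree often simply by both choosing the majority label.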

Year:  2020        PMID: 31977252     DOI: 10.1200/CCI.19.00060

Source DB:  PubMed          Journal:  JCO Clin Cancer Inform        ISSN: 2473-4276


  3 in total

1.  Comparison and interpretability of machine learning models to predict severity of chest injury.

Authors:  Sujay Kulshrestha; Dmitriy Dligach; Cara Joyce; Richard Gonzalez; Ann P O'Rourke; Joshua M Glazer; Anne Stey; Jacqueline M Kruser; Matthew M Churpek; Majid Afshar
Journal:  JAMIA Open       Date:  2021-03-01

2.  A Semiautomated Chart Review for Assessing the Development of Radiation Pneumonitis Using Natural Language Processing: Diagnostic Accuracy and Feasibility Study.

Authors:  Jordan McKenzie; Rasika Rajapakshe; Hua Shen; Shan Rajapakshe; Angela Lin
Journal:  JMIR Med Inform       Date:  2021-11-12

3.  Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. [Review]

Authors:  Liwei Wang; Sunyang Fu; Andrew Wen; Xiaoyang Ruan; Huan He; Sijia Liu; Sungrim Moon; Michelle Mai; Irbaz B Riaz; Nan Wang; Ping Yang; Hua Xu; Jeremy L Warner; Hongfang Liu
Journal:  JCO Clin Cancer Inform       Date:  2022-07