Towards automated generation of curated datasets in radiology: Application of natural language processing to unstructured reports exemplified on CT for pulmonary embolism.

Thomas Weikert¹, Ivan Nesic², Joshy Cyriac², Jens Bremerich², Alexander W Sauter², Gregor Sommer², Bram Stieltjes².

Abstract

PURPOSE: To design and evaluate a self-trainable natural language processing (NLP)-based procedure for classifying unstructured radiology reports. The method, which enables the generation of curated datasets, is exemplified on CT pulmonary angiogram (CTPA) reports.
METHOD: We extracted the impressions of CTPA reports created at our institution from 2016 to 2018 (n = 4397; language: German). The status (pulmonary embolism: yes/no) was manually labelled for all exams. Data from 2016/2017 (n = 2801) served as ground truth to train three NLP architectures that require only a subset of reference datasets for training to become operative: a convolutional neural network (CNN), a support vector machine (SVM) and a random forest (RF) classifier. Impressions from 2018 (n = 1377) were set aside and used to measure general performance. Furthermore, we used multiple simulations to investigate how classification performance depends on the amount of training data.
RESULTS: The classification performance of all three models was excellent (accuracies: 97 %-99 %; F1 scores: 0.88-0.97; AUCs: 0.993-0.997). The highest accuracy was reached by the CNN at 99.1 % (95 % CI 98.5-99.6 %). Training with 470 labelled impressions was sufficient to reach an accuracy of > 93 % with all three NLP architectures.
CONCLUSION: Our NLP-based approaches allow for an automated and highly accurate retrospective classification of CTPA reports with manageable effort solely using unstructured impression sections. We demonstrated that this approach is useful for the classification of radiology reports not written in English. Moreover, excellent classification performance is achieved at relatively small training set sizes.
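The abstract reports accuracy, F1 score and a 95 % confidence interval. The sketch below shows how such figures can be derived from a binary confusion matrix, with "pulmonary embolism: yes" as the positive class. It is an illustration under stated assumptions, not the authors' evaluation code: the confusion-matrix counts are hypothetical (only their sum, 1377, matches the 2018 test set), the function names are invented for this example, and the Wilson score interval is one common choice, since the abstract does not say how its interval was computed.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95 %)."""
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total
                         + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts, NOT taken from the paper; they sum to 1377.
counts = {"tp": 180, "fp": 6, "fn": 9, "tn": 1182}

metrics = binary_metrics(**counts)
low, high = wilson_interval(counts["tp"] + counts["tn"], sum(counts.values()))

print(f"accuracy = {metrics['accuracy']:.4f}")
print(f"F1       = {metrics['f1']:.4f}")
print(f"95 % CI for accuracy = ({low:.4f}, {high:.4f})")
```

With these made-up counts the F1 score comes out lower than the accuracy. F1 ignores true negatives, so if positive reports are the minority class (an assumption here, as the abstract does not state the prevalence), a handful of errors on that class weighs more heavily on F1 than on accuracy.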

Keywords: Classification; Computed tomography angiography; Data curation; Natural language processing; Pulmonary embolism

Year: 2020 PMID: 32135443 DOI: 10.1016/j.ejrad.2020.108862

Source DB: PubMed Journal: Eur J Radiol ISSN: 0720-048X Impact factor: 3.528

Cited

3 in total

1. The Use of BP Neural Network Algorithm and Natural Language Processing in the Impact of Social Audit on Enterprise Innovation Ability.

Authors: Jie Wang; Xiaomei Wang; Haili Wen
Journal: Comput Intell Neurosci Date: 2022-05-18

2. Deep Learning-Based Natural Language Processing in Radiology: The Impact of Report Complexity, Disease Prevalence, Dataset Size, and Algorithm Type on Model Performance.

Authors: A W Olthof; P M A van Ooijen; L J Cornelissen
Journal: J Med Syst Date: 2021-09-04 Impact factor: 4.460

3. Predicting pulmonary embolism among hospitalized patients with machine learning algorithms.

Authors: Logan Ryan; Jenish Maharjan; Samson Mataraso; Gina Barnes; Jana Hoffman; Qingqing Mao; Jacob Calvert; Ritankar Das
Journal: Pulm Circ Date: 2022-01-11 Impact factor: 2.886