Victor M Torres-Lopez1, Grace E Rovenolt1, Angelo J Olcese1, Gabriella E Garcia1, Sarah M Chacko1, Amber Robinson1, Edward Gaiser1, Julian Acosta1, Alison L Herman1, Lindsey R Kuohn1, Megan Leary1, Alexandria L Soto1, Qiang Zhang1, Safoora Fatima2, Guido J Falcone1, M Seyedmehdi Payabvash3, Richa Sharma1, Aaron F Struck2,4, Kevin N Sheth1, M Brandon Westover5, Jennifer A Kim1.
Abstract
Importance: Clinical text reports from head computed tomography (CT) represent rich, incompletely utilized information regarding acute brain injuries and neurologic outcomes. CT reports are unstructured; thus, extracting information at scale requires automated natural language processing (NLP). However, designing new NLP algorithms for each individual injury category is an unwieldy proposition. An NLP tool that summarizes all injuries in head CT reports would facilitate exploration of large data sets for clinical significance of neuroradiological findings.

Objective: To automatically extract acute brain pathological data and their features from head CT reports.

Design, Setting, and Participants: This diagnostic study developed a 2-part named entity recognition (NER) NLP model to extract and summarize data on acute brain injuries from head CT reports. The model, termed BrainNERD, extracts and summarizes detailed brain injury information for research applications. Model development included building and comparing 2 NER models using a custom dictionary of terms, including lesion type, location, size, and age, then designing a rule-based decoder using NER outputs to evaluate for the presence or absence of injury subtypes. BrainNERD was evaluated against independent test data sets of manually classified reports, including 2 external validation sets. The model was trained on head CT reports from 1152 patients generated by neuroradiologists at the Yale Acute Brain Injury Biorepository. External validation was conducted using reports from 2 outside institutions. Analyses were conducted from May 2020 to December 2021.

Main Outcomes and Measures: Performance of the BrainNERD model was evaluated using precision, recall, and F1 scores based on manually labeled independent test data sets.
Year: 2022 PMID: 35972739 PMCID: PMC9382443 DOI: 10.1001/jamanetworkopen.2022.27109
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
Figure 1. Named Entity Recognition Model Workflow
Figure 2. Graphic Representation of Text Input to Searchable Output Vector
Step 1 of the process is to input unstructured text from a head computed tomography report; step 2, identify clinically relevant injury terms present in the text using the named entity recognition output in long-data format; step 3, identify pertinent injury, negation, uncertainty, and property terms from the long data; step 4, produce an output summary of these terms.
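The 4-step flow described above can be sketched as a single function. This is a minimal toy sketch using dictionary matching and a naive 2-token negation window; the published BrainNERD model uses a trained NER model plus a rule-based decoder, and the term lists here are illustrative assumptions, not the model's actual dictionary.

```python
import re

# Hypothetical term lists standing in for BrainNERD's custom dictionary.
INJURY_TERMS = {"hemorrhage", "edema", "infarct"}
NEGATION_TERMS = {"no", "without", "absent"}
UNCERTAINTY_TERMS = {"possible", "probable", "suspected"}

def report_to_vector(text):
    """Unstructured report text -> searchable presence/absence vector."""
    tokens = re.findall(r"[a-z]+", text.lower())      # step 1: raw text in
    long_data = []                                    # step 2: long-format term hits
    for i, tok in enumerate(tokens):
        if tok in INJURY_TERMS:
            window = set(tokens[max(0, i - 2):i])     # step 3: naive 2-token scope
            long_data.append({
                "term": tok,
                "negated": bool(window & NEGATION_TERMS),
                "uncertain": bool(window & UNCERTAINTY_TERMS),
            })
    vector = {t: "absent" for t in INJURY_TERMS}      # step 4: summary output
    for hit in long_data:
        vector[hit["term"]] = ("negated" if hit["negated"]
                               else "uncertain" if hit["uncertain"]
                               else "present")
    return vector
```

For example, `report_to_vector("No acute hemorrhage. Possible edema in the left frontal lobe.")` marks hemorrhage as negated, edema as uncertain, and infarct as absent.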
Figure 3. Named Entity Recognition Category Tree and Output Visualization
A, The named entity recognition (NER) category tree contains all the entities in the model. The entities are divided into 5 major categories for better conceptualization and visualization. B, Example of an output visualization of the 2-part NLP model using NER and a rule-based decoder (BrainNERD) on an actual head computed tomography report.
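One way to picture the category tree is as a mapping from fine-grained entity labels to major categories. The entity labels below are taken from the terms named in the abstract and Figure 2 caption (lesion type, location, size, age, negation, uncertainty); the exact 5-category grouping in BrainNERD may differ, so treat this as an assumed illustration.

```python
# Hypothetical long-format NER output for one report phrase.
entities = [
    {"text": "subdural hematoma", "label": "lesion_type"},
    {"text": "left frontal",      "label": "location"},
    {"text": "8 mm",              "label": "size"},
    {"text": "acute",             "label": "age"},
    {"text": "no",                "label": "negation"},
]

# Assumed grouping of fine-grained labels into major categories.
CATEGORY_TREE = {
    "injury":      {"lesion_type"},
    "location":    {"location"},
    "property":    {"size", "age"},
    "negation":    {"negation"},
    "uncertainty": {"uncertainty"},
}

def by_category(ents):
    """Group long-format entity hits under their major category."""
    out = {cat: [] for cat in CATEGORY_TREE}
    for e in ents:
        for cat, labels in CATEGORY_TREE.items():
            if e["label"] in labels:
                out[cat].append(e["text"])
    return out
```

This grouped view is what a visualization like panel A could be driven by.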
NER Performance Metrics Compared With Expert Review and External Validation Data Sets
| Metric | 10 000 cross-validation, mean (range), % | Final model, % (95% CI) | Expert review, % (95% CI) | External data set, MGH, % (95% CI) | External data set, UW, % (95% CI) |
|---|---|---|---|---|---|
| Precision | 98.09 (93.62-99.31) | 98.82 (98.37-98.93) | 99.06 (97.89-99.13) | 98.51 (97.91-98.89) | 96.31 (95.39-96.91) |
| Recall | 98.12 (93.46-99.93) | 98.81 (98.46-99.06) | 98.10 (97.93-98.77) | 98.40 (97.89-98.63) | 96.87 (95.65-97.19) |
| F1 score | 98.1X (95.13-99.41) | 98.81 (98.40-98.94) | 98.57 (97.78-99.10) | 98.95 (98.42-99.08) | 96.59 (95.48-97.13) |

| Metric | 10 000 cross-validation, mean (range), % | Final model, % (95% CI) | Expert review, % (95% CI) | External data set, MGH, % (95% CI) | External data set, UW, % (95% CI) |
|---|---|---|---|---|---|
| Precision | NA | 97.50 (97.49-97.52) | 98.60 (98.58-98.63) | 99.16 (99.14-99.19) | 97.71 (97.70-97.74) |
| Recall | NA | 99.32 (99.29-99.34) | 99.10 (99.06-99.13) | 98.75 (98.73-98.78) | 98.70 (98.67-98.73) |
| F1 score | NA | 98.40 (98.39-98.43) | 98.83 (98.82-98.87) | 98.95 (98.91-98.96) | 98.20 (97.17-99.21) |
Abbreviations: MGH, Massachusetts General Hospital; NA, not applicable; NER, named entity recognition; UW, University of Wisconsin.
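The F1 scores above are the harmonic mean of precision and recall, which offers a quick internal-consistency check on the table; for instance, the expert-review column of the second panel reports precision 97.50 and recall 99.32, which reproduce the reported F1 of 98.40.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (F1 score)."""
    return 2 * precision * recall / (precision + recall)

# Second panel, expert-review column: precision 97.50, recall 99.32.
print(round(f1(97.50, 99.32), 2))  # 98.4, matching the reported F1 of 98.40
```

Note this check applies to point estimates; the cross-validation means are averages over iterations, so their F1 need not equal the harmonic mean of the averaged precision and recall exactly.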
Figure 4. Example Applications for the BrainNERD Model Outputs