| Literature DB >> 30646240 |
Maxwell Taggart, Wendy W Chapman, Benjamin A Steinberg, Shane Ruckel, Arianna Pregenzer-Wenzler, Yishuai Du, Jeffrey Ferraro, Brian T Bucher, Donald M Lloyd-Jones, Matthew T Rondina, Rashmee U Shah.
Abstract
Importance: To improve patient safety, health care systems need reliable methods to detect adverse events in large patient populations. Events are often described in clinical notes rather than in structured data, which makes them difficult to identify on a large scale.
Objective: To develop and compare 2 natural language processing methods, a rules-based approach and a machine learning (ML) approach, for identifying bleeding events in clinical notes.
Design, Setting, and Participants: This diagnostic study used deidentified notes from the Medical Information Mart for Intensive Care, which spans 2001 to 2012. A training set of 990 notes and a test set of 660 notes were randomly selected. Physicians classified each note as present or absent for a clinically relevant bleeding event during the hospitalization. A bleeding dictionary was developed for the rules-based approach; bleeding mentions were then aggregated to arrive at a classification for each note. Three ML models (support vector machine, extra trees, and convolutional neural network) were developed and trained using the 990-note training set. Another instance of each ML model was also trained on a sample of 450 notes, with equal numbers of bleeding-present and bleeding-absent notes. The notes were represented using term frequency-inverse document frequency vectors and global vectors for word representation.
Main Outcomes and Measures: The main outcomes were accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for each model. Following training, the models were tested on the test set and sensitivities were compared using a McNemar test.
Year: 2018 PMID: 30646240 PMCID: PMC6324448 DOI: 10.1001/jamanetworkopen.2018.3451
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
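Per the abstract, notes were represented as term frequency-inverse document frequency (TF-IDF) vectors (alongside GloVe word vectors) before classification. As a minimal illustration of the TF-IDF representation, here is a plain-Python sketch; `tfidf_vectors` is a hypothetical helper, not the study's pipeline, which would typically rely on a library implementation such as scikit-learn's `TfidfVectorizer`:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors for a list of tokenized documents.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(N / number of documents containing t)
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: one count per document
    vocab = sorted(df)
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, vectors
```

A term such as "bleeding" that appears in every note gets an idf of 0 and thus carries no weight, while terms distinctive to a few notes are up-weighted.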
Patient Characteristics for Training and Testing Sets
| Characteristic | Training Set | Test Set |
|---|---|---|
| Notes, No. | 990 | 660 |
| Unique patients, No. | 769 | 527 |
| Female, No. (%) | 296 (38.5) | 211 (40.0) |
| Age, mean (SD), y | 67.42 (14.7) | 67.86 (14.7) |
| Bleeding present, % | 22.5 | 22.1 |
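Because bleeding-present notes are a minority class (about 22% above), the study also trained each ML model on a balanced 450-note sample with equal numbers of bleeding-present and bleeding-absent notes, i.e. 225 per class. A sketch of that down-sampling step; `downsample_balanced` and the synthetic counts are illustrative, not the study's code:

```python
import random

def downsample_balanced(notes, labels, n_per_class, seed=0):
    """Draw an equal number of bleeding-present (label 1) and
    bleeding-absent (label 0) notes at random, mirroring the study's
    450-note balanced training sample (225 notes per class)."""
    rng = random.Random(seed)
    present = [n for n, y in zip(notes, labels) if y == 1]
    absent = [n for n, y in zip(notes, labels) if y == 0]
    pairs = ([(n, 1) for n in rng.sample(present, n_per_class)]
             + [(n, 0) for n in rng.sample(absent, n_per_class)])
    rng.shuffle(pairs)  # avoid feeding the classes in two contiguous blocks
    notes_ds = [n for n, _ in pairs]
    labels_ds = [y for _, y in pairs]
    return notes_ds, labels_ds
```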
Figure 1. Frequency of Bleeding-Related Terms Automatically Identified by pyConText in 120 Clinical Notes
Terms that are frequently used by clinicians to describe bleeding in clinical notes were identified during the knowledge generation phase and then used as target terms in the natural language processing algorithm. BRBPR indicates bright red blood per rectum; GIB, gastrointestinal bleeding; ICH, intracerebral hemorrhage; SAH, subarachnoid hemorrhage; and SDH, subdural hematoma.
Figure 2. Receiver Operating Characteristic (ROC) Curves for Identifying Clinically Relevant Bleeding From Clinical Notes Using a Rules-Based Approach
We evaluated 3 natural language processing–derived parameters to identify notes with clinically relevant bleeding in the training data for the total number of mentions (A), the number of bleeding-present mentions (B), and the number of bleeding-absent mentions (C). At least 1 absent or present reference to bleeding, identified by the algorithm, had almost 100% sensitivity for identifying notes with clinically relevant bleeding, but had poor specificity. At least 1 present reference to bleeding was 93% sensitive for identifying notes with clinically relevant bleeding, with greater than 70% specificity.
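The threshold sweep behind these curves can be sketched directly: for each candidate cutoff on the mention count, classify every note and score the rule against the physician labels. The helper below (`sens_spec_at_threshold`) is a hypothetical illustration of that evaluation, not the study's implementation:

```python
def sens_spec_at_threshold(mention_counts, labels, threshold):
    """Score a simple rule: flag a note as bleeding-present when the
    number of bleeding mentions found by the NLP system is at least
    `threshold`, then compare against physician labels (1 = present)."""
    tp = fn = tn = fp = 0
    for count, label in zip(mention_counts, labels):
        predicted = count >= threshold
        if label == 1 and predicted:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Sweeping `threshold` over the observed range of mention counts yields one (sensitivity, 1 − specificity) point per cutoff, i.e. one ROC curve per parameter.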
Figure 3. Test Characteristics for Different Computer Algorithms to Automatically Identify Clinically Relevant Bleeding
Model names that include -DS (down-sampled) were trained on a note set with an equal number of present and absent events (n = 450) and then tested on the full 660-note test set. Those that include -FS (full sample) were trained on the full training set (n = 990) and then tested on the full 660-note test set. CNN indicates convolutional neural network; ET, extra trees; RB, rules based; SVM, support vector machine.
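The abstract states that model sensitivities were compared using a McNemar test, which considers only discordant pairs: notes where one model is correct and the other is not. A pure-Python sketch of the chi-square form with continuity correction; the function name and the restriction to bleeding-present notes follow the sensitivity comparison described in the abstract, but the study's exact procedure may differ:

```python
def mcnemar_statistic(pred_a, pred_b, labels):
    """McNemar chi-square statistic (with continuity correction) for
    comparing two models' sensitivities on the same notes: only
    bleeding-present notes where exactly one model is correct
    (discordant pairs) contribute."""
    b = c = 0
    for pa, pb, y in zip(pred_a, pred_b, labels):
        if y != 1:
            continue  # sensitivity is computed over bleeding-present notes only
        if pa == 1 and pb == 0:
            b += 1    # model A correct, model B wrong
        elif pa == 0 and pb == 1:
            c += 1    # model B correct, model A wrong
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)
```

The statistic is compared against a chi-square distribution with 1 degree of freedom; library implementations (e.g. statsmodels' `mcnemar`) also offer an exact binomial version for small discordant counts.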
Model Performance on the Training Set and Test Set for Identifying Clinically Relevant Bleeding
| Model | Accuracy | Sensitivity | Positive Predictive Value | F Score | Negative Predictive Value | Specificity |
|---|---|---|---|---|---|---|
| Training set | | | | | | |
| SVM-DS | 0.922 | 0.893 | 0.948 | 0.920 | 0.899 | 0.951 |
| ET-DS | 0.878 | 0.996 | 0.806 | 0.891 | 0.994 | 0.760 |
| CNN-DS | 0.924 | 0.929 | 0.921 | 0.925 | 0.928 | 0.920 |
| SVM-FS | 0.928 | 0.747 | 0.923 | 0.826 | 0.929 | 0.982 |
| ET-FS | 0.847 | 0.978 | 0.601 | 0.745 | 0.992 | 0.809 |
| CNN-FS | 0.892 | 0.724 | 0.784 | 0.753 | 0.921 | 0.941 |
| Rules-based | 0.894 | 0.924 | 0.703 | 0.798 | 0.976 | 0.885 |
| Test set | | | | | | |
| SVM-DS | 0.818 | 0.849 | 0.559 | 0.674 | 0.950 | 0.809 |
| ET-DS | 0.767 | 0.938 | 0.486 | 0.640 | 0.976 | 0.718 |
| CNN-DS | 0.582 | 0.616 | 0.290 | 0.395 | 0.840 | 0.572 |
| SVM-FS | 0.871 | 0.712 | 0.707 | 0.710 | 0.918 | 0.916 |
| ET-FS | 0.800 | 0.815 | 0.531 | 0.643 | 0.938 | 0.796 |
| CNN-FS | 0.679 | 0.356 | 0.306 | 0.329 | 0.808 | 0.770 |
| Rules-based | 0.861 | 0.911 | 0.627 | 0.743 | 0.971 | 0.846 |
Abbreviations: CNN, convolutional neural network; DS, down-sampled (the model was trained on a note set with an equal number of present and absent events [n = 450] and then tested on the full 660-note test set); ET, extra trees; FS, full sample (the model was trained on the full training set [n = 990] and then tested on the full 660-note test set); SVM, support vector machine.
F score is defined as the harmonic mean of the sensitivity and positive predictive value.
Boldface in the original table marks the highest-performing model for each metric.
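The footnote's definition of the F score can be checked directly against the table: for example, the training-set SVM-DS row has sensitivity 0.893 and positive predictive value 0.948, whose harmonic mean is 0.920. A one-line sketch of that computation:

```python
def f_score(sensitivity, ppv):
    """Harmonic mean of sensitivity (recall) and positive predictive
    value (precision), as defined in the table footnote."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)
```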