Devin Singh1,2,3, Sujay Nagaraj2,3, Pouria Mashouri4, Erik Drysdale2, Jason Fischer2,3, Anna Goldenberg1,2,5,6, Michael Brudno1,4,5,6.
Abstract
Importance: Increased wait times and long lengths of stay in emergency departments (EDs) are associated with poor patient outcomes. Systems to improve ED efficiency would be useful. Specifically, minimizing the time to diagnosis by developing novel workflows that expedite test ordering can help accelerate clinical decision-making.
Objective: To explore the use of machine learning-based medical directives (MLMDs) to automate diagnostic testing at triage for patients with common pediatric ED diagnoses.
Design, Setting, and Participants: Machine learning models trained on retrospective electronic health record data were evaluated in a decision analytical model study conducted at the ED of the Hospital for Sick Children, Toronto, Canada. Data were collected on all patients aged 0 to 18 years presenting to the ED from July 1, 2018, to June 30, 2019 (77 219 total patient visits).
Exposure: Machine learning models were trained to predict the need for urinary dipstick testing, electrocardiogram, abdominal ultrasonography, testicular ultrasonography, bilirubin level testing, and forearm radiographs.
Main Outcomes and Measures: Models were evaluated using area under the receiver operator curve, true-positive rate, false-positive rate, and positive predictive values. Model decision thresholds were determined to limit the total number of false-positive results and achieve high positive predictive values. The time difference between patient triage completion and test ordering was assessed for each use of MLMD. Error rates were analyzed to assess model bias. In addition, model explainability was determined using Shapley Additive Explanations values.
Year: 2022 PMID: 35294539 PMCID: PMC8928004 DOI: 10.1001/jamanetworkopen.2022.2599
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
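The evaluation described in Main Outcomes and Measures hinges on choosing a decision threshold that keeps the positive predictive value high while still automating some orders. A minimal sketch of that idea follows, using synthetic labels and scores; the helper names and the 0.85 PPV target are assumptions for illustration, not the study's values or code.

```python
# Hedged sketch: compute PPV/TPR/FPR at a threshold and scan for a threshold
# that meets a target PPV (synthetic data; not the authors' implementation).
import numpy as np
from sklearn.metrics import roc_auc_score

def confusion_rates(y_true, y_score, threshold):
    """Return PPV, TPR, and FPR for predictions at a given threshold."""
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return ppv, tpr, fpr

def pick_threshold_for_ppv(y_true, y_score, target_ppv=0.85):
    """Among thresholds whose PPV meets the target, return the one with the
    highest TPR (ie, the most tests that could be ordered automatically)."""
    best = None
    for t in np.unique(y_score):
        ppv, tpr, fpr = confusion_rates(y_true, y_score, t)
        if ppv >= target_ppv and (best is None or tpr > best[1]):
            best = (t, tpr, ppv, fpr)
    return best

# Example with synthetic scores:
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_score = np.clip(y_true * 0.6 + rng.normal(0.3, 0.2, size=1000), 0, 1)
print("AUROC:", roc_auc_score(y_true, y_score))
print("Operating point (threshold, TPR, PPV, FPR):", pick_threshold_for_ppv(y_true, y_score))
```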
Figure 1. Approach to Autonomously Ordering Tests in an Emergency Department (ED) Using Machine Learning Medical Directives (MLMDs)
Standard ED workflow vs MLMD augmentation of preexisting ED workflows with enabling aspects of clinical automation. With MLMDs, patients for whom the directive is activated have immediate testing ordered before being seen by a clinician. When the directive is not activated, patients proceed to the current standard of care pathway and wait for clinician assessment before testing is ordered. Overtesting can be addressed proactively by ensuring model decision thresholds yield high positive predictive values and low false-positive rates. This model threshold approach inevitably produces false-negative cases, but simultaneously allows for true automation of test ordering for a subset of patients as a result of maintaining a high positive predictive value. When a false-negative case occurs (ie, the MLMD is not activated for a patient who ultimately needs the test), the patient simply travels through the standard of care ED process. This dual pathway, streamlining care for patients identified by MLMDs and sending those not identified back into the typical workflow, can allow for clinical automation in the ED for common presenting signs and symptoms without risking missed diagnoses or overtesting. EHR indicates electronic health record.
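A minimal sketch of the dual-pathway routing in Figure 1 is shown below; the function, scoring interface, and threshold are hypothetical stand-ins, not the deployed system.

```python
# Hedged sketch of the Figure 1 dual pathway: order the test at triage when the
# model score clears a high-PPV threshold, otherwise route to standard care.
def route_visit(triage_features, model_score_fn, threshold=0.9):
    """Return the pathway for a triaged patient given a model scoring function."""
    score = model_score_fn(triage_features)
    if score >= threshold:
        return "MLMD activated: test ordered before clinician assessment"
    return "MLMD not activated: standard of care (clinician orders testing)"

# Example with a placeholder scoring function:
demo_score = lambda features: 0.95 if features.get("flank_pain") else 0.2
print(route_visit({"age_years": 14, "flank_pain": True}, demo_score))
```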
MLMD Summary Statistics and Model Performance
| Clinical test | Tests ordered, No. | Associated diagnoses, No. | Mean time from triage completion to test order, min | Patients with test and/or diagnosis, No. | Estimated clinical PPV baseline | MLMD model | PPV (95% CI) | TPR (95% CI) | FPR (95% CI) | AUROC (95% CI) | Proportion of excess testing (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Abdominal ultrasonography | 2259 | 550 | 162.7 | 2709 | 0.11 | NN | 0.86 (0.003) | 0.10 (0.001) | 0.0006 (1.2 × 10⁻⁵) | 0.94 (0.0006) | 1.02 (0.0003) |
| | | | | | | RF | 0.55 (0.003) | 0.10 (0.001) | 0.003 (2.3 × 10⁻⁵) | 0.93 (0.0006) | 1.08 (0.0006) |
| | | | | | | LR | 0.78 (0.003) | 0.10 (0.001) | 0.001 (1.6 × 10⁻⁵) | 0.93 (0.0006) | 1.03 (0.0004) |
| ECG | 1731 | 1054 | 136.2 | 2032 | 0.44 | NN | 0.84 (0.001) | 0.60 (0.002) | 0.003 (2.5 × 10⁻⁵) | 0.96 (0.0008) | 1.12 (0.001) |
| | | | | | | RF | 0.76 (0.001) | 0.60 (0.002) | 0.005 (3.6 × 10⁻⁵) | 0.96 (0.0008) | 1.19 (0.001) |
| | | | | | | LR | 0.80 (0.001) | 0.60 (0.002) | 0.004 (2.8 × 10⁻⁵) | 0.96 (0.0008) | 1.15 (0.001) |
| Urine dipstick | 9348 | 1271 | 183.2 | 9631 | 0.11 | NN | 0.91 (0.006) | 0.30 (0.001) | 0.004 (3.4 × 10⁻⁵) | 0.88 (0.0007) | 1.03 (0.0002) |
| | | | | | | RF | 0.88 (0.001) | 0.30 (0.001) | 0.006 (3.3 × 10⁻⁵) | 0.91 (0.0007) | 1.04 (0.0002) |
| | | | | | | LR | 0.90 (0.001) | 0.30 (0.001) | 0.005 (3.0 × 10⁻⁵) | 0.89 (0.0007) | 1.03 (0.0002) |
| Testicular ultrasonography | 347 | 366 | 77.6 | 460 | 0.60 | NN | 0.88 (0.002) | 0.40 (0.003) | 0.0003 (7.2 × 10⁻⁵) | 0.99 (0.001) | 1.06 (0.001) |
| | | | | | | RF | 0.81 (0.003) | 0.40 (0.003) | 0.005 (8.5 × 10⁻⁵) | 0.99 (0.001) | 1.09 (0.001) |
| | | | | | | LR | 0.78 (0.003) | 0.40 (0.003) | 0.0006 (9.2 × 10⁻⁵) | 0.99 (0.001) | 1.11 (0.002) |
| Bilirubin level | 1321 | 217 | 131.2 | 1344 | 0.15 | NN | 0.94 (0.001) | 0.90 (0.002) | 0.001 (1.6 × 10⁻⁵) | 0.99 (0.001) | 1.06 (0.0008) |
| | | | | | | RF | 0.76 (0.001) | 0.90 (0.002) | 0.005 (3.0 × 10⁻⁵) | 0.99 (0.001) | 1.28 (0.0017) |
| | | | | | | LR | 0.89 (0.001) | 0.90 (0.002) | 0.002 (1.6 × 10⁻⁵) | 0.99 (0.001) | 1.11 (0.0008) |
| Forearm radiograph | 991 | 190 | 123.2 | 1038 | 0.14 | NN | 0.77 (0.005) | 0.10 (0.002) | 0.0004 (1.0 × 10⁻⁵) | 0.98 (0.001) | 1.03 (0.0005) |
| | | | | | | RF | 0.73 (0.006) | 0.10 (0.002) | 0.0005 (1.5 × 10⁻⁵) | 0.98 (0.001) | 1.04 (0.0008) |
| | | | | | | LR | 0.66 (0.005) | 0.10 (0.002) | 0.0007 (1.0 × 10⁻⁵) | 0.98 (0.001) | 1.05 (0.0005) |
| Totals | 15 997 | 3648 | 165 (weighted mean) | 17 214 | NA | NA | NA | NA | NA | NA | NA |
Abbreviations: AUROC, area under the receiver operator curve; ECG, electrocardiogram; ED, emergency department; FPR, false-positive rate; LR, logistic regression; MLMD, machine learning medical directive; NA, not applicable; NN, neural network; PPV, positive predictive value; RF, random forest; TPR, true-positive rate.
Machine learning medical directive use cases are shown with the corresponding total number of tests ordered (excluding patients who presented with testing already completed before the ED visit, such as those transferred in with radiograph or ultrasonography imaging already done at a community site), the number of patients with associated diagnoses for each use case, the total number of patients who had either a positive test result and/or an associated diagnosis, an estimated clinical PPV baseline, the time difference from triage completion to test order, MLMD model outcome metrics (AUROC, PPV, TPR, and FPR), and the proportion of potential excess testing with model automation. The estimated clinical PPV baseline is computed by totaling the number of patients who had a test ordered in the ED and the number of associated diagnoses that were made from that testing specifically. Patients with outside imaging were excluded from this analysis unless a repeat test was ordered in the ED. A negative test result can be informative by ruling out a condition; the clinical PPV baseline thus serves as an aid in the development and optimization of MLMD models but does not represent the sole benchmark for determining model success. All of the 95% CIs were generated using a bootstrap approach with 1000 resamples each.
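The footnote states that the 95% CIs were generated with a 1000-resample bootstrap. A minimal sketch of a percentile bootstrap for a metric such as PPV is given below; the function names and the example labels are hypothetical, not the study's data or code.

```python
# Hedged sketch: percentile bootstrap CI (1000 resamples) for a binary metric.
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a binary-classification metric."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

def ppv(y_true, y_pred):
    """Positive predictive value: fraction of ordered tests that were warranted."""
    ordered = y_pred == 1
    return (y_true[ordered] == 1).mean() if ordered.any() else np.nan

# Example with hypothetical labels and predictions:
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0] * 50)
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 0, 1, 1] * 50)
print("PPV:", ppv(y_true, y_pred), "95% CI:", bootstrap_ci(y_true, y_pred, ppv))
```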
Figure 2. Area Under the Receiver Operator Curve for Each Machine Learning–Based Directive Use Case With Corresponding Model Operating Thresholds for Top-Performing Models
Top-performing models were those with the highest positive predictive value (PPV). Neural network (NN) models obtained the highest PPVs across all use cases: abdominal ultrasonography (true-positive rate [TPR], 0.10; false-positive rate [FPR], 0.0006; PPV, 0.86) (A), electrocardiogram (TPR, 0.60; FPR, 0.003; PPV, 0.84) (B), urine dipstick (TPR, 0.30; FPR, 0.004; PPV, 0.91) (C), and testicular ultrasonography (TPR, 0.40; FPR, 0.0003; PPV, 0.88) (D). The corresponding operating thresholds (gray dots) are displayed for each NN model. Model thresholds can be adjusted such that the true-positive rate is increased to capture more positive cases; however, this comes at the expense of additional false-positive results and potential for overtesting.
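A short sketch of how an operating threshold can be marked on a ROC curve like those in Figure 2 follows; the scores are synthetic and the matplotlib/sklearn usage is an assumption about tooling, not the authors' plotting code.

```python
# Hedged sketch: plot a ROC curve and mark a chosen high-PPV operating threshold.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, 2000)
y_score = np.clip(0.5 * y_true + rng.normal(0.25, 0.2, 2000), 0, 1)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
chosen = 0.9  # hypothetical high threshold chosen to keep PPV high
i = np.argmin(np.abs(thresholds - chosen))  # ROC point nearest the threshold

plt.plot(fpr, tpr, label=f"AUROC = {roc_auc_score(y_true, y_score):.2f}")
plt.scatter(fpr[i], tpr[i], color="gray", zorder=3, label="Operating threshold")
plt.xlabel("False-positive rate")
plt.ylabel("True-positive rate")
plt.legend()
plt.show()
```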
Figure 3. Feature Importance Assessment Using Shapley Additive Explanations (SHAP) Values
The top 20 features for each model are ranked. Blue represents low values (or 0 for a binary feature that is not present) and red high values (or 1 for a binary feature that is present). Individual patient-level explainability was also computed using SHAP values (eFigure 1 in the Supplement). CSN indicates an EHR encounter number that is ordered based on time of patient arrival; CTAS4, Canadian Triage Acuity Scale, score 4; UTI, urinary tract infection.
ᵃConcept unique identifier coded feature input that organizes free-text symptoms into higher-level groupings and does not represent the electronic health record diagnosis label, which is not used as a feature input into our models.
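A minimal sketch of producing a SHAP summary like Figure 3 with the shap package is shown below. The study's models were NN, RF, and LR; a gradient-boosted classifier and toy triage-style features are used here only to keep the example compact and runnable, so all names and values are assumptions.

```python
# Hedged sketch: SHAP feature-importance summary on a stand-in tree model.
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical triage-style features and a binary "test ordered" label
X = pd.DataFrame({
    "age_years": [1, 5, 12, 16, 3, 9, 14, 7],
    "fever": [1, 0, 1, 0, 1, 0, 0, 1],
    "abdominal_pain": [0, 1, 1, 0, 0, 1, 1, 0],
    "ctas_score": [4, 3, 2, 4, 3, 2, 3, 4],
})
y = [0, 1, 1, 0, 0, 1, 1, 0]

model = GradientBoostingClassifier(random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)   # exact SHAP values for tree models
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Global ranking of feature importance (beeswarm-style summary plot);
# a single row's SHAP values give the patient-level explanation.
shap.summary_plot(shap_values, X)
```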
Figure 4. Error Analysis Stratified by Sex for Top-Performing Machine Learning–Based Directive (MLMD) Models Using Pearson χ² Test
A, Overall false-positive rates. B, Subgroup error analysis by age for urine dipstick testing. C, Subgroup error analysis by age for abdominal ultrasonography testing. ECG indicates electrocardiogram.
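A short sketch of the kind of subgroup error check in Figure 4 follows: comparing false-positive counts between sexes with a Pearson chi-square test. The scipy call is standard, but the counts are made up for illustration.

```python
# Hedged sketch: Pearson chi-square test on false-positive counts by sex.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: female, male; columns: false positives, correctly not flagged (hypothetical counts)
contingency = np.array([
    [12, 4988],
    [18, 4982],
])
chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```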