Robert Lindsey, Aaron Daluiski, Sumit Chopra, Alexander Lachapelle, Michael Mozer, Serge Sicular, Douglas Hanel, Michael Gardner, Anurag Gupta, Robert Hotchkiss, Hollis Potter.
Abstract
Suspected fractures are among the most common reasons for patients to visit emergency departments (EDs), and X-ray imaging is the primary diagnostic tool used by clinicians to assess patients for fractures. Missing a fracture in a radiograph often has severe consequences for patients, resulting in delayed treatment and poor recovery of function. Nevertheless, radiographs in emergency settings are often read out of necessity by emergency medicine clinicians who lack subspecialized expertise in orthopedics, and misdiagnosed fractures account for upward of four of every five reported diagnostic errors in certain EDs. In this work, we developed a deep neural network to detect and localize fractures in radiographs. We trained it to accurately emulate the expertise of 18 senior subspecialized orthopedic surgeons by having them annotate 135,409 radiographs. We then ran a controlled experiment with emergency medicine clinicians to evaluate their ability to detect fractures in wrist radiographs with and without the assistance of the deep learning model. The average clinician's sensitivity was 80.8% (95% CI, 76.7-84.1%) unaided and 91.5% (95% CI, 89.3-92.9%) aided, and specificity was 87.5% (95% CI, 85.3-89.5%) unaided and 93.9% (95% CI, 92.9-94.9%) aided. The average clinician experienced a relative reduction in misinterpretation rate of 47.0% (95% CI, 37.4-53.9%). The significant improvements in diagnostic accuracy that we observed in this study show that deep learning methods are a mechanism by which senior medical specialists can deliver their expertise to generalists on the front lines of medicine, thereby providing substantial improvements to patient care.
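The relative reduction in misinterpretation rate combines the changes in sensitivity and specificity. A minimal sketch of the arithmetic, under the hypothetical assumption of a balanced mix of fracture and non-fracture cases (the study's own 47.0% figure is computed per clinician over the experiment's actual case mix, so it differs from what this assumption yields):

```python
# Hedged sketch: relative reduction in misinterpretation (error) rate from
# the reported average sensitivity/specificity, assuming a hypothetical
# 50/50 mix of fracture and non-fracture radiographs.

def error_rate(sensitivity, specificity, prevalence=0.5):
    """Overall misinterpretation rate: missed fractures plus false alarms,
    weighted by the assumed fraction of fracture cases."""
    miss_rate = 1.0 - sensitivity          # fraction of fractures missed
    false_alarm_rate = 1.0 - specificity   # fraction of normals misread
    return prevalence * miss_rate + (1.0 - prevalence) * false_alarm_rate

unaided = error_rate(0.808, 0.875)   # average clinician, unaided
aided = error_rate(0.915, 0.939)     # average clinician, aided
relative_reduction = 1.0 - aided / unaided

print(f"unaided error {unaided:.3f}, aided error {aided:.3f}, "
      f"relative reduction {relative_reduction:.3f}")
```

Under the balanced-mix assumption this gives a somewhat larger reduction than the reported 47.0% point estimate, presumably because the paper averages each clinician's own reduction over the actual radiograph mix.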
Keywords: CAD; X-ray; deep learning; fractures; radiology
Year: 2018 PMID: 30348771 PMCID: PMC6233134 DOI: 10.1073/pnas.1806905115
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. (A, Left) A typical radiograph, provided as an input to the model. (A, Right) A heat map overlaid on the radiograph. When the model determines that a fracture is present, the heat map represents the model's confidence that a particular location is part of the fracture, with yellow and blue being more and less confident, respectively. (B) Close-up views of four additional example inputs and heat map overlays.
Fig. 3. The model accurately detects the presence of visible fractures in wrist radiographs on two separate test datasets. When given a radiograph, one of the model's outputs is a probability that the patient has a fracture visible in the radiograph. A decision threshold has to be chosen such that, for any probability value greater than the threshold, the CAD system alerts the clinician. The above curves show, for all possible values of the decision threshold, what the corresponding sensitivity (true positive rate) and specificity (true negative rate) of the system would be on that test dataset. The dashed black line restricts the analysis to the subset of Test Set 2 on which there was no interexpert disagreement about the presence or absence of a visible fracture (1,243 of 1,400 radiographs).
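The trade-off described in this caption can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: given hypothetical per-radiograph fracture probabilities and ground-truth labels, it sweeps the decision threshold and reports the resulting sensitivity and specificity:

```python
import numpy as np

def sensitivity_specificity(probs, labels, threshold):
    """Sensitivity (true positive rate) and specificity (true negative rate)
    of the rule 'alert the clinician when probability > threshold'."""
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    preds = probs > threshold
    tp = np.sum(preds & (labels == 1))      # fractures correctly flagged
    tn = np.sum(~preds & (labels == 0))     # normals correctly passed
    sensitivity = tp / np.sum(labels == 1)
    specificity = tn / np.sum(labels == 0)
    return sensitivity, specificity

# Toy data: 1 = fracture visible, 0 = no fracture (hypothetical values).
labels = [1, 1, 1, 0, 0, 0]
probs = [0.95, 0.80, 0.40, 0.30, 0.10, 0.05]

# Lower thresholds trade specificity for sensitivity, tracing out the curve.
for t in (0.2, 0.5, 0.9):
    sens, spec = sensitivity_specificity(probs, labels, t)
    print(f"threshold={t:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

As in Fig. 4, a deployed system picks one operating point on this curve by fixing the threshold on a development dataset.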
Fig. 4. Performance of the emergency medicine clinicians in the experiment. Each clinician read each radiograph first unaided (without the assistance of the model) and then aided (with the assistance of the model). The average clinician's sensitivities were 80.8% (95% CI, 76.7–84.1%) unaided and 91.5% (95% CI, 89.3–92.9%) aided, and specificities were 87.5% (95% CI, 85.3–89.5%) unaided and 93.9% (95% CI, 92.9–94.9%) aided. The model operated at 93.9% sensitivity and 94.5% specificity (shown as the star) using a decision threshold set on the model development dataset.
Fig. 5. Each point represents a bin containing one-tenth of the radiographs used in the experiment. The horizontal location of a point indicates the median unaided response time in seconds for the radiographs within the bin. The vertical location of a point indicates the across-clinician average diagnostic accuracy on the radiographs within the bin. The difference in accuracy between the aided and unaided reading conditions increases with unaided reading time, which is a proxy for the radiograph's difficulty. The dashed horizontal black line indicates the accuracy that a clinician would have achieved had he or she reported "no fracture" on every radiograph. The aided reading condition never has an average accuracy worse than this baseline guessing strategy.
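The binning behind this figure can be sketched as follows. The data here are randomly generated placeholders, and the variable names are hypothetical; the sketch only shows the mechanics of splitting radiographs into deciles by unaided response time and summarizing each bin:

```python
import numpy as np

# Sketch of the Fig. 5 analysis: sort radiographs by unaided response time,
# split them into ten equal-size bins, and summarize each bin by its median
# response time and mean diagnostic accuracy.

rng = np.random.default_rng(0)
n = 200
response_time = rng.uniform(2.0, 60.0, n)   # seconds per read (hypothetical)
accuracy = rng.uniform(0.5, 1.0, n)         # across-clinician mean accuracy

order = np.argsort(response_time)
bins = np.array_split(order, 10)            # deciles by response time

for i, idx in enumerate(bins):
    median_t = np.median(response_time[idx])
    mean_acc = accuracy[idx].mean()
    print(f"bin {i}: median unaided time {median_t:5.1f}s, "
          f"mean accuracy {mean_acc:.2f}")
```

Plotting each bin's (median time, mean accuracy) pair for the aided and unaided conditions separately would reproduce the structure of the figure.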