A F Voter (1), M E Larson (2), J W Garrett (2), J-P J Yu (3, 4, 5).
1. School of Medicine and Public Health (A.F.V.), University of Wisconsin-Madison, Madison, Wisconsin.
2. Department of Radiology (M.E.L., J.W.G., J.-P.J.Y.), University of Wisconsin-Madison, Madison, Wisconsin.
3. Department of Radiology (M.E.L., J.W.G., J.-P.J.Y.), University of Wisconsin-Madison, Madison, Wisconsin; jpyu@uwhealth.org.
4. Department of Biomedical Engineering (J.-P.J.Y.), College of Engineering, University of Wisconsin-Madison, Madison, Wisconsin.
5. Department of Psychiatry (J.-P.J.Y.), University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin.
Abstract
BACKGROUND AND PURPOSE: Artificial intelligence decision support systems are a rapidly growing class of tools to help manage ever-increasing imaging volumes. The aim of this study was to evaluate the performance of an artificial intelligence decision support system, Aidoc, for the detection of cervical spinal fractures on noncontrast cervical spine CT scans and to conduct a failure mode analysis to identify areas of poor performance.

MATERIALS AND METHODS: This retrospective study included 1904 emergent noncontrast cervical spine CT scans of adult patients (60 [SD, 22] years, 50.3% men). The presence of cervical spinal fracture was determined by Aidoc and by an attending neuroradiologist; discrepancies were independently adjudicated. Algorithm performance was assessed by calculating diagnostic accuracy, and a failure mode analysis was performed.

RESULTS: Aidoc and the neuroradiologist's interpretation were concordant in 91.5% of cases. Aidoc correctly identified 67 of 122 fractures (54.9%), with 106 false-positive flagged studies. Diagnostic performance was as follows: sensitivity, 54.9% (95% CI, 45.7%-63.9%); specificity, 94.1% (95% CI, 92.9%-95.1%); positive predictive value, 38.7% (95% CI, 33.1%-44.7%); and negative predictive value, 96.8% (95% CI, 96.2%-97.4%). Performance was worse for the detection of chronic fractures; diagnostic performance did not differ by study indication or patient characteristics.

CONCLUSIONS: We observed poor diagnostic accuracy of an artificial intelligence decision support system for the detection of cervical spine fractures. Many similar algorithms have also received little or no external validation, and this study raises concerns about their generalizability, utility, and rapid pace of deployment. Further rigorous evaluations are needed to understand the weaknesses of these tools before widespread implementation.
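The reported accuracy figures follow from the counts given in the RESULTS. The sketch below rebuilds the 2x2 confusion table from those counts and recomputes each metric. It assumes the counts are per study (each of the 122 fractures corresponds to one positive scan). The function names are illustrative and are not taken from the study. The abstract does not state how its confidence intervals were computed, so the Wilson score interval shown here is an assumption and may not match the published bounds exactly.

```python
from math import sqrt


def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard diagnostic accuracy metrics from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "concordance": (tp + tn) / (tp + fp + fn + tn),
    }


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, see lead-in)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Counts taken from the abstract.
total_scans = 1904
fracture_scans = 122   # reference-standard positives
tp = 67                # fractures correctly flagged
fp = 106               # false-positive flagged studies

# Derived cells of the 2x2 table.
fn = fracture_scans - tp                 # missed fractures
tn = total_scans - fracture_scans - fp   # correctly unflagged studies

metrics = diagnostic_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")

sens_low, sens_high = wilson_interval(tp, tp + fn)
print(f"sensitivity 95% CI (Wilson, assumed): {sens_low:.1%} to {sens_high:.1%}")
```

Under the per-study assumption, the point estimates agree with the reported sensitivity, specificity, predictive values, and concordance to one decimal place. The low positive predictive value despite high specificity reflects the low fracture prevalence in the cohort (122 of 1904 scans).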