Daichi Hayashi (1,2), Andrew J Kompel (3), Jeanne Ventre (4), Alexis Ducarouge (4), Toan Nguyen (4,5), Nor-Eddine Regnard (4,6), Ali Guermazi (3,7).
1. Department of Radiology, Boston University School of Medicine, 820 Harrison Avenue, FGH Building, 3rd Floor, Boston, MA, 02118, USA. daichi_hayashi@hotmail.com.
2. Department of Radiology, Stony Brook University Renaissance School of Medicine, HSc Level 4, Room 120, Stony Brook, NY, 11794, USA. daichi_hayashi@hotmail.com.
3. Department of Radiology, Boston University School of Medicine, 820 Harrison Avenue, FGH Building, 3rd Floor, Boston, MA, 02118, USA.
4. Gleamer, 117-119 Quai de Valmy, 75010, Paris, France.
5. Service de Radiopédiatrie, Hôpital Armand-Trousseau, AP-HP, Médecine Sorbonne Université, 26 avenue du Docteur Arnold-Netter, 75012, Paris, France.
6. Réseau d'Imagerie Sud Francilien, 2 avenue de Mousseau, 91000, Evry, France.
7. Department of Radiology, VA Boston Healthcare System, 1400 VFW Parkway, Suite 1B105, West Roxbury, MA, 02132, USA.
Abstract
OBJECTIVE: We aimed to perform an external validation of an existing commercial AI software program (BoneView™) for the detection of acute appendicular fractures in pediatric patients. MATERIALS AND METHODS: This retrospective study included anonymized radiographic exams of extremities, with or without fractures, from pediatric patients (aged 2-21 years). Three hundred exams (150 with fractures and 150 without fractures) were analyzed, comprising 60 exams per body part (hand/wrist, elbow/upper arm, shoulder/clavicle, foot/ankle, leg/knee). The Ground Truth was defined by experienced radiologists. A deep learning algorithm interpreted the radiographs for fracture detection, and its diagnostic performance was compared against the Ground Truth. Receiver operating characteristic analysis was also performed. Statistical analyses included sensitivity per patient (the proportion of patients for whom all fractures were identified), sensitivity per fracture (the proportion of fractures identified by the AI among all fractures), specificity per patient, and false-positive rate per patient. RESULTS: There were 167 boys and 133 girls with a mean age of 10.8 years. For all fractures, sensitivity per patient (average [95% confidence interval]) was 91.3% [85.6, 95.3], specificity per patient was 90.0% [84.0, 94.3], sensitivity per fracture was 92.5% [87.0, 96.2], and false-positive rate per patient in patients who had no fracture was 0.11. The patient-wise area under the curve was 0.93 for all fractures. AI diagnostic performance was consistently high across all anatomical locations and different types of fractures except for avulsion fractures (sensitivity per fracture 72.7% [39.0, 94.0]). CONCLUSION: The BoneView™ deep learning algorithm provides high overall diagnostic performance for appendicular fracture detection in pediatric patients.
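The difference between the per-patient and per-fracture sensitivity definitions above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names and the toy data are hypothetical, and the abstract does not state which confidence-interval method was used, so the Wilson score interval shown here is an assumed stand-in whose bounds may differ slightly from the reported ones. The count 137 of 150 is likewise an assumption, inferred only from the reported 91.3% per-patient sensitivity.

```python
import math


def per_patient_sensitivity(patients):
    """Fraction of fractured patients for whom ALL fractures were detected.

    `patients` is a list with one inner list per fractured patient; each
    boolean in the inner list says whether that fracture was detected.
    """
    fully_detected = sum(1 for fractures in patients if all(fractures))
    return fully_detected / len(patients)


def per_fracture_sensitivity(patients):
    """Fraction of individual fractures detected, pooled over all patients."""
    flags = [detected for fractures in patients for detected in fractures]
    return sum(flags) / len(flags)


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical toy data: four fractured patients with six fractures in total.
toy = [[True], [True, False], [True, True], [False]]
print(per_patient_sensitivity(toy))   # patient counts only if every fracture is found
print(per_fracture_sensitivity(toy))  # every fracture counts individually

# Assumed count: 137 of 150 fractured patients, consistent with 91.3%.
low, high = wilson_interval(137, 150)
print(f"{137 / 150:.1%} [{low:.1%}, {high:.1%}]")
```

In the toy data, a patient with one missed fracture out of two counts as a miss per patient but still contributes one hit per fracture, which is why the two measures can diverge.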