Jarrel C Y Seah1, Cyril H M Tang2, Quinlan D Buchlak3, Xavier G Holt2, Jeffrey B Wardman2, Anuar Aimoldin2, Nazanin Esmaili4, Hassan Ahmad2, Hung Pham2, John F Lambert2, Ben Hachey2, Stephen J F Hogg2, Benjamin P Johnston2, Christine Bennett5, Luke Oakden-Rayner6, Peter Brotchie7, Catherine M Jones8. 1. Annalise.ai, Sydney, NSW, Australia; Department of Radiology, Alfred Health, Melbourne, VIC, Australia. 2. Annalise.ai, Sydney, NSW, Australia. 3. Annalise.ai, Sydney, NSW, Australia. Electronic address: quinlan.buchlak1@my.nd.edu.au. 4. School of Medicine, University of Notre Dame Australia, Sydney, NSW, Australia; Faculty of Engineering and IT, University of Technology Sydney, Sydney, NSW, Australia. 5. School of Medicine, University of Notre Dame Australia, Sydney, NSW, Australia. 6. Australian Institute for Machine Learning, The University of Adelaide, Adelaide, SA, Australia. 7. Annalise.ai, Sydney, NSW, Australia; Department of Radiology, St Vincent's Health Australia, Melbourne, VIC, Australia. 8. I-MED Radiology Network, Brisbane, QLD, Australia.
Abstract
BACKGROUND: Chest x-rays are widely used in clinical practice; however, interpretation can be hindered by human error and a shortage of experienced thoracic radiologists. Deep learning has the potential to improve the accuracy of chest x-ray interpretation. We therefore aimed to assess the accuracy of radiologists with and without the assistance of a deep-learning model.
METHODS: In this retrospective study, a deep-learning model was trained on 821 681 images (284 649 patients) from five datasets from Australia, Europe, and the USA. 2568 enriched chest x-ray cases from adult patients (≥16 years) who had at least one frontal chest x-ray were included in the test dataset; cases were representative of inpatient, outpatient, and emergency settings. 20 radiologists reviewed cases with and without the assistance of the deep-learning model, with a 3-month washout period between reads. We assessed the change in accuracy of chest x-ray interpretation across 127 clinical findings when the deep-learning model was used as a decision-support tool, calculating the area under the receiver operating characteristic curve (AUC) for each radiologist with and without the deep-learning model. We also compared AUCs for the model alone with those of unassisted radiologists. If the lower bound of the adjusted 95% CI of the difference in AUC between the model and the unassisted radiologists was greater than -0·05, the model was considered non-inferior for that finding; if the lower bound exceeded 0, the model was considered superior.
FINDINGS: Unassisted radiologists had a macroaveraged AUC of 0·713 (95% CI 0·645-0·785) across the 127 clinical findings, compared with 0·808 (0·763-0·839) when assisted by the model. The deep-learning model statistically significantly improved the classification accuracy of radiologists for 102 (80%) of 127 clinical findings and was statistically non-inferior for 19 (15%) findings; no finding showed a decrease in accuracy when radiologists used the deep-learning model. Unassisted radiologists had a macroaveraged mean AUC of 0·713 (0·645-0·785) across all findings, compared with 0·957 (0·954-0·959) for the model alone. Model classification alone was significantly more accurate than unassisted radiologists for 117 (94%) of 124 clinical findings predicted by the model and was non-inferior to unassisted radiologists for all other clinical findings.
INTERPRETATION: This study shows the potential of a comprehensive deep-learning model to improve chest x-ray interpretation across a large breadth of clinical practice.
FUNDING: Annalise.ai.
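The superiority/non-inferiority rule described in METHODS can be sketched in code. The following is an illustrative reconstruction, not the study's actual statistical pipeline: the AUC is computed via the Mann-Whitney pairwise formulation, and the CI on the AUC difference is obtained with a simple case bootstrap, whereas the paper used an adjusted 95% CI. The function names, the bootstrap, and the `margin=0.05` parameter (matching the paper's -0·05 bound) are assumptions for demonstration.

```python
import numpy as np

def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs ranked correctly, ties counted as half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def classify_difference(labels, model_scores, reader_scores,
                        margin=0.05, n_boot=2000, seed=0):
    """Bootstrap the AUC difference (model minus unassisted reader) for one
    finding and apply the decision rule to the lower 95% CI bound:
    lower bound > 0 -> superior; lower bound > -margin -> non-inferior."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        lab = labels[idx]
        if lab.min() == lab.max():  # resample must contain both classes
            continue
        diffs.append(auc(lab, model_scores[idx]) - auc(lab, reader_scores[idx]))
    lower = np.percentile(diffs, 2.5)
    if lower > 0:
        return "superior"
    if lower > -margin:
        return "non-inferior"
    return "inconclusive"
```

In the study this comparison was repeated per clinical finding (124 findings predicted by the model), and the macroaveraged AUC reported in FINDINGS is the unweighted mean of the per-finding AUCs.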