Kai-Uwe Lewandrowski (1), Narendran Muraleedharan (2), Steven Allen Eddy (3), Vikram Sobti (4), Brian D Reece (5), Jorge Felipe Ramírez León (6), Sandeep Shah (3).
1. Staff Orthopaedic Spine Surgeon, Center for Advanced Spine Care of Southern Arizona and Surgical Institute of Tucson, Tucson, Arizona.
2. Aptus Engineering, Inc, Scottsdale, Arizona, and Multus Medical, LLC, Phoenix, Arizona.
3. Multus Medical, LLC, Phoenix, Arizona.
4. Innovative Radiology, PC, River Forest, Illinois.
5. The Spine and Orthopedic Academic Research Institute, Lewisville, Texas.
6. Fundación Universitaria Sanitas, Bogotá, Colombia; Research Team, Centro de Columna, Bogotá, Colombia; Centro de Cirugía de Mínima Invasión, CECIMIN-Clínica Reina Sofía, Bogotá, Colombia.
Abstract
BACKGROUND: Artificial intelligence could provide more accurate magnetic resonance imaging (MRI) predictors of successful clinical outcomes in targeted spine care.
OBJECTIVE: To analyze the level of agreement between lumbar MRI reports created by a deep learning neural network (RadBot) and the radiologists' MRI reading.
METHODS: The compressive pathology definitions were extracted from the radiologists' lumbar MRI reports of 65 patients with a total of 383 levels. The central canal was graded as (0) no disc bulge/protrusion/canal stenosis, (1) disc bulge without canal stenosis, (2) disc bulge resulting in canal stenosis, or (3) disc herniation/protrusion/extrusion resulting in canal stenosis. Both neural foramina were assessed as either (0) neural foraminal stenosis absent or (1) neural foraminal stenosis present. Reporting criteria for the pathologies at each disc level and, when available, the grading of severity were extracted, and the Natural Language Processing model was used to generate a verbal and written report. The RadBot report was analyzed in the same way as the radiologist's MRI report. MRI reports were investigated by dichotomizing the data into 2 categories: normal and stenosis. The quality of the RadBot test was assessed by determining its sensitivity, specificity, and positive and negative predictive values, as well as its reliability by calculating the Cronbach alpha and Cohen kappa, using the radiologist's MRI report as the gold standard.
RESULTS: The authors found a RadBot sensitivity of 73.3%, a specificity of 88.4%, a positive predictive value of 80.3%, and a negative predictive value of 83.7%. The reliability analysis revealed a Cronbach alpha of 0.772. The highest individual values of the Cronbach alpha were 0.629 and 0.681 when compared to the MRI report by the radiologist, rendering values of 0.566 and 0.688, respectively. Analysis of interobserver reliability rendered an overall kappa for the RadBot of 0.627. Analysis of receiver operating characteristics (ROC) showed a value of 0.808 for the area under the ROC curve.
CONCLUSIONS: Deep learning algorithms, when used for routine reporting in lumbar spine MRI, showed excellent quality as a diagnostic test that can distinguish the presence of neural element compression (stenosis) from a random event distribution at a statistically significant level (P < .0001). This research should be extended to validated and directly visualized pain generators to improve the accuracy and prognostic value of the routine lumbar MRI scan for favorable clinical outcomes with intervention and surgery.
LEVEL OF EVIDENCE: 3.
CLINICAL RELEVANCE: Validity, clinical teaching, and evaluation study.
This manuscript is generously published free of charge by ISASS, the International Society for the Advancement of Spine Surgery.
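The diagnostic-test measures named in the METHODS can be illustrated with a minimal Python sketch. It assumes each level has already been dichotomized into 0 (normal) or 1 (stenosis) for both the radiologist (reference) and RadBot (test). The names `Confusion`, `tally`, and `diagnostic_metrics` are hypothetical, and the counts in the usage example are made up for illustration; they are not the study's data and this is not the authors' analysis code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table of test (RadBot) versus reference (radiologist) labels."""
    tp: int  # both report stenosis
    fp: int  # test reports stenosis, reference reports normal
    fn: int  # test reports normal, reference reports stenosis
    tn: int  # both report normal


def tally(reference, test):
    """Count agreements and disagreements between paired binary labels."""
    if len(reference) != len(test):
        raise ValueError("reference and test must have the same length")
    tp = fp = fn = tn = 0
    for ref, tst in zip(reference, test):
        if ref == 1 and tst == 1:
            tp += 1
        elif ref == 0 and tst == 1:
            fp += 1
        elif ref == 1 and tst == 0:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def diagnostic_metrics(c):
    """Sensitivity, specificity, PPV, NPV, and Cohen kappa from a 2x2 table.

    Assumes every row and column total is non-zero.
    """
    n = c.tp + c.fp + c.fn + c.tn
    observed = (c.tp + c.tn) / n  # observed agreement
    # Agreement expected by chance, from the marginal totals.
    expected = ((c.tp + c.fp) * (c.tp + c.fn)
                + (c.fn + c.tn) * (c.fp + c.tn)) / n ** 2
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts for illustration only (not taken from the study).
example = diagnostic_metrics(Confusion(tp=40, fp=10, fn=10, tn=40))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because kappa subtracts chance agreement, it is lower than raw percent agreement whenever the two readers would agree on some levels by chance alone.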