Tackeun Kim1, Young-Gon Kim2,3, Seyeon Park2, Jae-Koo Lee1, Chang-Hyun Lee1,4, Seung-Jae Hyun1,5, Chi Heon Kim4,5, Ki-Jeong Kim1,5, Chun Kee Chung4,5,6
1Department of Neurosurgery, Seoul National University Bundang Hospital, Seongnam; 2Transdisciplinary Department of Medicine & Advanced Technology, Seoul National University Hospital, Seoul; 3AI Institute, Seoul National University, Seoul; 4Department of Neurosurgery, Seoul National University Hospital, Seoul; 5Seoul National University College of Medicine, Seoul; and 6Department of Brain and Cognitive Sciences, Seoul National University College of Natural Sciences, Seoul, Republic of Korea.
Abstract
OBJECTIVE: Magnetic resonance imaging (MRI) is the gold-standard tool for diagnosing lumbar spinal stenosis (LSS), but its high cost and limited accessibility make it difficult to promptly examine all suspected cases with MRI. Radiography is an efficient screening technique owing to its low cost, rapid operability, and wide availability, but its diagnostic accuracy is relatively poor. In this study, the authors aimed to develop a deep learning model with a convolutional neural network (CNN) for diagnosing severe central LSS using radiography and to evaluate radiological diagnostic features using gradient-weighted class activation mapping (Grad-CAM).
METHODS: Patients who had undergone both spinal MRI and radiography in the period from May 1, 2005, to December 31, 2017, were screened. According to the formal MRI report, participants were consecutively included in the severe central LSS or healthy control group, and radiographs for both groups were collected. A CNN-based transfer learning algorithm was developed to classify radiographic findings as LSS or normal (binary classification). The proposed models were evaluated using six performance metrics: area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, and positive and negative predictive values.
RESULTS: The VGG19 model achieved the highest accuracy, with an AUROC of 90.0% (95% CI 89.8%-90.3%) after training on 12,442 images. Accuracy averaged across the 5-fold models was 82.8% (95% CI 82.5%-83.1%). Feature points on Grad-CAM were reasonable, and the features could be categorized into reduced disc height, narrow foramina, short pedicle, and hyperdense facet joint. In the extra validation, the AUROC was 89.3% (95% CI 88.7%-90.0%) and accuracy averaged across the 5-fold models was 81.8% (95% CI 80.6%-83.0%). Multivariate logistic regression analysis showed that incorporating demographic factors (age and sex) did not improve model performance.
CONCLUSIONS: The algorithm trained by a CNN to identify central LSS on radiographs showed high diagnostic accuracy and is expected to be useful as a triage tool. The algorithm could accurately localize the stenotic lesion to assist physicians in the identification of LSS.
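As an illustration of the six performance metrics named in METHODS, the following is a minimal sketch of how they can be computed for a binary LSS-versus-normal classifier from true labels and predicted scores. It is not the authors' implementation: the function names (`confusion_counts`, `auroc`, `binary_metrics`) are hypothetical, and the 0.5 decision threshold is an assumption, since the abstract does not state the threshold or how the confidence intervals were derived. AUROC is computed here with the pairwise (Mann-Whitney) formulation.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels, where 1 = LSS and 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binary_metrics(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off; not given in the abstract
) -> Dict[str, float]:
    """Compute AUROC, accuracy, sensitivity, specificity, PPV, and NPV."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "auroc": auroc(y_true, scores),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Toy labels and scores for illustration only; not data from the study.
    labels = [1, 1, 1, 0, 0, 0]
    model_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    for name, value in binary_metrics(labels, model_scores).items():
        print(f"{name}: {value:.3f}")
```

In a 5-fold setting such as the one described in RESULTS, a function like `binary_metrics` would be applied to each fold's held-out predictions and the per-fold values then averaged.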