James Thomas Patrick Decourcy Hallinan1,2, Lei Zhu3,4, Wenqiao Zhang4, Tricia Kuah1, Desmond Shi Wei Lim1, Xi Zhen Low1, Amanda J L Cheng1,2, Sterling Ellis Eide1,2, Han Yang Ong1,2, Faimee Erwan Muhamat Nor1,2, Ahmed Mohamed Alsooreti1,5, Mona I AlMuhaish1,6, Kuan Yuen Yeong7, Ee Chin Teo1, Nesaretnam Barr Kumarakulasinghe8, Qai Ven Yap9, Yiong Huak Chan9, Shuxun Lin10, Jiong Hao Tan11, Naresh Kumar11, Balamurugan A Vellayappan12, Beng Chin Ooi4, Swee Tian Quek1,2, Andrew Makmur1,2.
Abstract
BACKGROUND: Metastatic epidural spinal cord compression (MESCC) is a disastrous complication of advanced malignancy. Deep learning (DL) models for automatic MESCC classification on staging CT were developed to aid earlier diagnosis.
Keywords: Bilsky classification; CT; MRI; deep learning model; epidural spinal cord compression; metastatic epidural spinal cord compression; metastatic spinal cord compression; spinal metastases classification; spinal metastatic disease
Year: 2022 PMID: 35804990 PMCID: PMC9264856 DOI: 10.3390/cancers14133219
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. Bilsky grading of metastatic epidural spinal cord compression (MESCC) on MRI and corresponding staging CT of the thoracic spine. Axial contrast-enhanced staging CT studies (portal-venous phase) and corresponding axial T2-weighted MRI (repetition time in msec/echo time in msec, 5300/100) were used. For deep learning model training, radiologists drew bounding boxes to highlight the region of interest on each axial CT image. Grade 0/normal: no epidural disease. Grade 1a: epidural disease with no thecal sac indentation. Grade 1b: epidural disease with thecal sac indentation. Grade 1c: epidural disease touching the cord with no displacement. Grade 2: spinal cord compression with some cerebrospinal fluid (CSF) visible. Grade 3: spinal cord compression with no CSF visible at the site of compression.
Figure 2. Machine learning pipeline development for both model training and model deployment in a clinical setting. ROI = Region of Interest.
Figure 3. Overview of the developed deep learning pipeline. The pipeline takes input images with information from three different windows. In the first step, a region of interest (ROI) detector is applied to all images of the three windows. In the second step, a window-specific classifier calculates the prediction probability for each windowed image. Both a separated window learning (SWL) method and a combined window learning (CWL) method were developed. In the SWL method, three separate networks are trained, one for each window. In the CWL method, a single network with window-specific batch normalization layers is trained on all windowed images to reduce the memory size. Multimodal fusion techniques, specifically average fusion and max fusion, are applied to the prediction probability vectors to aggregate the complementary information from images of different windows for the final prediction.
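The fusion step described in the Figure 3 caption can be illustrated with a minimal sketch. It assumes that average fusion and max fusion are element-wise operations over the per-window class-probability vectors, which is one common reading of these terms; the source does not give the exact formulation. The function names and the probability values below are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def average_fusion(probs_per_window: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the per-window class-probability vectors."""
    n_windows = len(probs_per_window)
    return [sum(column) / n_windows for column in zip(*probs_per_window)]


def max_fusion(probs_per_window: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum of the per-window class-probability vectors."""
    return [max(column) for column in zip(*probs_per_window)]


def predict(fused: Sequence[float]) -> int:
    """Index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Hypothetical classifier outputs for one ROI; classes are [normal, low, high].
window_probs = {
    "abdomen": [0.50, 0.30, 0.20],
    "bone": [0.20, 0.30, 0.50],
    "spine": [0.20, 0.10, 0.70],
}

avg = average_fusion(list(window_probs.values()))
mx = max_fusion(list(window_probs.values()))
print("average fusion:", avg, "-> class", predict(avg))
print("max fusion:", mx, "-> class", predict(mx))
```

Under this reading, the max-fused vector no longer sums to 1, so it is best treated as a score vector for ranking classes rather than as a probability distribution.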
Patient demographic data and characteristics for the deep learning model training/validation and test datasets.
| Characteristics | Internal Training/Validation Set | Internal Test Set |
|---|---|---|
| Age (years) * | 60 ± 12.1 (18–93) | 58 ± 11.6 (32–76) |
| Women | 77 (49.7) | 12 (40.0) |
| Men | 78 (50.3) | 18 (60.0) |
| Primary cancer | | |
| Lung | 36 (23.2) | 8 (26.7) |
| Breast | 33 (21.3) | 9 (30.0) |
| Colon | 15 (9.7) | 3 (10.0) |
| Prostate | 13 (8.4) | 0 (0) |
| Renal cell carcinoma | 12 (7.7) | 2 (6.7) |
| Multiple Myeloma | 10 (6.5) | 1 (3.3) |
| Hepatocellular carcinoma | 8 (5.2) | 1 (3.3) |
| Nasopharyngeal carcinoma | 6 (3.9) | 0 (0) |
| Others | 22 (14.2) | 6 (20.0) |
| | 316/358 (88) | 42/358 (12) |
| Site of epidural disease | | |
| Diffuse thoracic # | 49 (15.5) | 8 (19.0) |
| C7-T2 | 21 (6.6) | 2 (4.8) |
| T3-T10 | 97 (30.7) | 14 (33.3) |
| T11-L3 | 128 (40.5) | 15 (35.7) |
| No epidural disease | 21 (6.6) | 3 (7.1) |
Note: MESCC = Malignant epidural spinal cord compression. * Values are mean ± SD (range) for numerical variables and n (%) for categorical variables. # Two or more sites of thoracic epidural disease.
Figure 4. Flow chart of the overall study design and deep learning model development. The model performance was compared with a specialist radiologist (reference standard) and four radiologists. * All CT studies had a corresponding MRI of the thoracic region available. ROI = Region of interest. MSK = musculoskeletal radiologist (specialised in reading spine studies).
Reference standard Bilsky grades for metastatic epidural spinal cord compression.
| MESCC Grade on CT | Internal Training/Validation Set | Internal Test Set |
|---|---|---|
| Normal/Bilsky 0 | 10,594 (79.1%) | 2323 (84.9%) |
| Low-grade Bilsky (1a, 1b) | 1477 (11.0%) | 209 (7.6%) |
| High-grade Bilsky (1c, 2, 3) | 1329 (9.9%) | 203 (7.4%) |
| Totals | 13,400 | 2735 |
Note: Values are numbers (%). A region of interest (bounding box) for Bilsky grade was drawn on each axial contrast-enhanced CT image using axial T2-weighted MRI as the reference standard. MESCC = Malignant Epidural Spinal Cord Compression.
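The grade groupings used in the table above (normal, low-grade 1a/1b, high-grade 1c/2/3) can be written as a small lookup, together with the two dichotomous splits reported later. This is only a sketch: the function and variable names are hypothetical, and the counts are copied from the training/validation column of the table to check the stated percentages.

```python
# Mapping from Bilsky grade to the trichotomous class used in the table.
TRICHOTOMOUS = {
    "0": "normal",
    "1a": "low",
    "1b": "low",
    "1c": "high",
    "2": "high",
    "3": "high",
}


def trichotomous(grade: str) -> str:
    """Return 'normal', 'low', or 'high' for a Bilsky grade string."""
    return TRICHOTOMOUS[grade.strip().lower()]


def is_high_grade(grade: str) -> bool:
    """Dichotomous split: normal/low versus high."""
    return trichotomous(grade) == "high"


def has_epidural_disease(grade: str) -> bool:
    """Dichotomous split: normal versus low/high."""
    return trichotomous(grade) != "normal"


# ROI counts from the training/validation column of the table.
train_counts = {"normal": 10594, "low": 1477, "high": 1329}
total = sum(train_counts.values())
shares = {label: round(100 * n / total, 1) for label, n in train_counts.items()}
print(total, shares)
```

The computed shares agree with the 79.1%, 11.0%, and 9.9% listed in the table.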
Internal test set classifications using trichotomous and dichotomous Bilsky gradings on staging CT.
| Reader | Trichotomous (normal, low and high): kappa | p-value | Dichotomous (normal/low vs. high): kappa | p-value | Dichotomous (normal vs. low/high): kappa (95% CI) | p-value |
|---|---|---|---|---|---|---|
| AJLC | 0.907 | <0.001 | 0.960 | <0.001 | 0.915 (0.903–0.928) | <0.001 |
| SEE | 0.907 | <0.001 | 0.963 | <0.001 | 0.928 (0.916–0.940) | <0.001 |
| FEM | 0.820 | <0.001 | 0.954 | <0.001 | 0.816 (0.796–0.836) | <0.001 |
| HYO | 0.726 | <0.001 | 0.975 | <0.001 | 0.683 (0.656–0.710) | <0.001 |
| Combined method | ||||||
| Abdomen-window | 0.891 | <0.001 | 0.966 | <0.001 | 0.929 (0.917–0.941) | <0.001 |
| Bone-window | 0.903 | <0.001 | 0.965 | <0.001 | 0.901 (0.887–0.915) | <0.001 |
| Spine-window | 0.901 | <0.001 | 0.972 | <0.001 | 0.927 (0.915–0.939) | <0.001 |
| Max Fusion-1 | 0.909 | <0.001 | 0.968 | <0.001 | 0.919 (0.906–0.932) | <0.001 |
| Average Fusion-1 | 0.911 | <0.001 | 0.968 | <0.001 | 0.929 (0.917–0.941) | <0.001 |
| Separated method | ||||||
| Abdomen-window | 0.885 | <0.001 | 0.938 | <0.001 | 0.914 (0.900–0.927) | <0.001 |
| Bone-window | 0.897 | <0.001 | 0.953 | <0.001 | 0.908 (0.894–0.921) | <0.001 |
| Spine-window | 0.873 | <0.001 | 0.971 | <0.001 | 0.889 (0.874–0.905) | <0.001 |
| Max Fusion | 0.891 | <0.001 | 0.956 | <0.001 | 0.915 (0.901–0.928) | <0.001 |
| Average Fusion | 0.904 | <0.001 | 0.962 | <0.001 | 0.923 (0.910–0.935) | <0.001 |
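As a reminder of what a kappa value measures, the sketch below computes Cohen's unweighted kappa, that is, observed agreement corrected for the agreement expected by chance. This excerpt does not state which kappa statistic the authors used, so Cohen's kappa is shown here purely as a generic illustration and may differ from the statistic behind the table. The label sequences are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's unweighted kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical grades for ten ROIs: reference standard versus one reader.
reference = ["normal"] * 6 + ["low"] * 2 + ["high"] * 2
reader = ["normal"] * 5 + ["low"] * 3 + ["high", "low"]

kappa = cohen_kappa(reference, reader)
print(f"kappa = {kappa:.3f}")
```

In this toy example the raters agree on 8 of 10 items (0.80) while chance agreement is 0.40, giving a kappa of about 0.667.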
Internal test set sensitivity, specificity, and AUCs for the DL model and radiologists using dichotomous Bilsky grading (normal/low versus high) on CT.
| Reader | Sensitivity (95% CI) | Specificity (95% CI) | AUC (95% CI) |
|---|---|---|---|
| AJLC | 66.5 (59.6–73.0) | 98.9 (98.5–99.3) | 0.827 (0.795–0.860) |
| SEE | 59.1 (52.0–65.9) | 99.8 (99.5–99.9) | 0.794 (0.760–0.828) |
| FEM | 80.8 (74.7–86.0) | 97.3 (96.6–97.9) | 0.891 (0.863–0.918) |
| HYO | 78.8 (72.5–84.2) | 99.3 (98.9–99.6) | 0.891 (0.863–0.919) |
| Combined method | |||
| Abdomen-window | 96.6 (93.0–98.6) | 97.2 (96.5–97.8) | 0.969 (0.956–0.982) |
| Bone-window | 95.6 (91.8–98.0) | 97.1 (96.4–97.8) | 0.964 (0.949–0.978) |
| Spine-window | 95.1 (91.1–97.6) | 97.8 (97.2–98.4) | 0.965 (0.949–0.980) |
| Max Fusion-1 | 96.6 (93.0–98.6) | 97.4 (96.7–97.9) | 0.970 (0.957–0.982) |
| Average Fusion-1 | 96.6 (93.0–98.6) | 97.4 (96.7–97.9) | 0.970 (0.957–0.982) |
| Separated method | |||
| Abdomen-window | 96.6 (93.0–98.6) | 94.8 (93.8–95.6) | 0.957 (0.943–0.970) |
| Bone-window | 97.0 (93.7–98.9) | 96.0 (95.1–96.7) | 0.965 (0.953–0.978) |
| Spine-window | 92.6 (88.1–95.8) | 97.9 (97.3–98.4) | 0.953 (0.934–0.971) |
| Max Fusion-1 | 98.0 (95.0–99.5) | 96.2 (95.3–96.9) | 0.971 (0.961–0.981) |
| Average Fusion-1 | 95.6 (91.8–98.0) | 96.9 (96.1–97.5) | 0.962 (0.948–0.977) |
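The relationship between the three columns of the table above can be checked with a short sketch. For a binary rating with a single operating point, the area under the ROC curve reduces to the mean of sensitivity and specificity, and the AUCs in the table appear consistent with that formula to within rounding. The helper names are hypothetical, and the count of 135 true positives is an illustrative value chosen to match the reported 66.5% for 203 high-grade regions; it is not taken from the source.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


def single_point_auc(sens: float, spec: float) -> float:
    """Area under the ROC curve through (0, 0), (1 - spec, sens), (1, 1)."""
    return (sens + spec) / 2


# (sensitivity %, specificity %, reported AUC) copied from rows of the table.
reported = {
    "AJLC": (66.5, 98.9, 0.827),
    "FEM": (80.8, 97.3, 0.891),
    "Combined, abdomen-window": (96.6, 97.2, 0.969),
    "Separated, max fusion": (98.0, 96.2, 0.971),
}

for name, (sens_pct, spec_pct, auc) in reported.items():
    derived = single_point_auc(sens_pct / 100, spec_pct / 100)
    print(f"{name}: derived {derived:.4f}, reported {auc:.3f}")

# Illustrative count: 135 of the 203 high-grade test regions detected.
example_sens = sensitivity(tp=135, fn=68)
print(f"example sensitivity = {100 * example_sens:.1f}%")
```

Because the derived values depend on sensitivities and specificities that were themselves rounded to one decimal place, small differences in the third decimal of the AUC are expected.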
Internal test set sensitivity, specificity, and AUCs for the DL model and radiologists using dichotomous Bilsky gradings (normal versus low/high) on CT.
| Reader | Sensitivity (95% CI) | Specificity (95% CI) | AUC (95% CI) |
|---|---|---|---|
| AJLC | 58.5 (53.6–63.3) | 99.5 (99.2–99.8) | 0.790 (0.766–0.814) |
| SEE | 68.4 (63.7–72.9) | 99.1 (98.6–99.4) | 0.838 (0.815–0.860) |
| FEM | 85.7 (81.9–88.9) | 87.6 (86.2–88.9) | 0.866 (0.848–0.885) |
| HYO | 89.3 (85.9–92.1) | 78.0 (76.3–79.7) | 0.837 (0.820–0.854) |
| Combined method | |||
| Abdomen-window | 83.0 (79.0–86.5) | 96.8 (96.0–97.5) | 0.899 (0.881–0.918) |
| Bone-window | 88.3 (84.7–91.2) | 93.8 (92.7–94.7) | 0.910 (0.894–0.927) |
| Spine-window | 84.1 (80.2–87.5) | 96.5 (95.7–97.2) | 0.903 (0.885–0.921) |
| Max Fusion-1 | 87.1 (83.5–90.2) | 95.3 (94.4–96.1) | 0.912 (0.895–0.929) |
| Average Fusion-1 | 85.2 (81.4–88.5) | 96.4 (95.6–97.1) | 0.908 (0.891–0.926) |
| Separated method | |||
| Abdomen-window | 89.1 (85.7–91.9) | 94.6 (93.6–95.5) | 0.918 (0.903–0.934) |
| Bone-window | 89.0 (85.6–91.9) | 94.2 (93.1–95.1) | 0.916 (0.900–0.932) |
| Spine-window | 92.7 (89.7–95.0) | 92.2 (91.0–93.2) | 0.924 (0.910–0.938) |
| Max Fusion-1 | 89.6 (86.2–92.3) | 94.6 (93.6–95.5) | 0.921 (0.905–0.936) |
| Average Fusion-1 | 89.3 (85.9–92.1) | 95.3 (94.3–96.1) | 0.923 (0.907–0.938) |