Background: Modic changes (MCs) are the most prevalent classification system for describing magnetic resonance imaging (MRI) signal intensity changes in the vertebrae. However, there is a growing need for novel quantitative and standardized methods of characterizing these anomalies, particularly for lesions of transitional or mixed nature, due to the lack of conclusive evidence of their associations with low back pain. This retrospective imaging study aims to develop an interpretable deep learning-based detection tool for voxel-wise mapping of MCs. Methods: Seventy-five lumbar spine MRI exams that presented with acute-to-chronic low back pain, radiculopathy, and other symptoms of the lumbar spine were enrolled. The pipeline consists of two deep convolutional neural networks to generate an interpretable voxel-wise Modic map. First, an autoencoder was trained to segment vertebral bodies from T1-weighted sagittal lumbar spine images. Next, two radiologists segmented and labeled MCs from a combined T1- and T2-weighted assessment to serve as ground truth for training a second autoencoder that performs segmentation of MCs. The voxels in the detected regions were then categorized to the appropriate Modic type using a rule-based signal intensity algorithm. Post hoc, three radiologists independently graded a second dataset with the aid of the model predictions in an artificial intelligence (AI)-assisted experiment. Results: The model successfully identified the presence of changes in 85.7% of samples in the unseen test set with a sensitivity of 0.71 (±0.072), specificity of 0.95 (±0.022), and Cohen's kappa score of 0.63. In the AI-assisted experiment, the agreement between the junior radiologist and the senior neuroradiologist significantly improved from Cohen's kappa score of 0.52 to 0.58 (p < 0.05). Conclusions: This deep learning-based approach demonstrates substantial agreement with radiologists and may serve as a tool to improve inter-rater reliability in the assessment of MCs.
Low back pain (LBP) is the leading cause of disability globally, accounting for 60.1 million disability‐adjusted life‐years in 2015. While the nociceptive source in the vast majority of LBP cases cannot be identified, there has been a growing collection of evidence showing that properties of vertebral endplates are closely linked to intervertebral disc degeneration and LBP. Modic changes (MCs) are the most commonly used classification system for describing changes in endplate‐adjacent vertebral bone marrow. Despite its prevalence, the association of MCs with LBP is inconsistent.
Hypothesized to cause LBP through structural and inflammatory changes in the bony structures of the spine, MCs are defined as signal variations seen in the combined assessment of T1‐weighted and T2‐weighted magnetic resonance imaging (MRI).
Bone marrow edema‐like changes or fibrovascular changes appear distinctly hypointense on T1‐weighted images and hyperintense on T2‐weighted images (Modic type 1). Meanwhile, conversion of red hematopoietic bone marrow to yellow fatty marrow is hyperintense on T1‐weighted MRI and iso‐ to hyperintense on fat‐saturated and non‐fat‐saturated T2 sequences, respectively (Modic type 2). Lastly, sclerotic bone appears hypointense in both sequences (Modic type 3).
Thus, the semiquantitative nature of the MC classification system is highly susceptible to variability in non‐standardized imaging. Fields et al. detailed how evaluation of MCs is prone to inter‐rater variability through a wide range of factors related to equipment and image acquisition parameters. Magnetic field strength, in particular, has been shown to have significant effects on the prevalence of MCs, with type 2 changes being easily distinguishable in low‐field MRI and type 1 changes visualized more easily in high‐field MRI. Pulse sequence design and parameters can also influence image quality, signal‐to‐noise, fat suppression, and, importantly, tissue contrast. Due to a lack of systemic standardization in spine imaging, it is pivotal to adapt grading procedures with objective and quantitative methodologies.
Several quantitative approaches have been recently applied to the assessment of vertebral changes. Specialized pulse sequences, such as chemical shift encoding‐based water‐fat imaging, magnetic resonance spectroscopy, diffusion, and perfusion, can provide additional information on tissue composition. Post‐acquisition, Wang et al. extracted morphological and signal intensity‐based metrics from contours of MCs, reporting improved inter‐ and intra‐rater agreement as compared to unassisted MC classification.
However, a limitation with these approaches is the need for manual demarcation of MCs, which may be labor‐intensive.
Data‐driven strategies to address these drawbacks have emerged from the recent surge of development in deep learning (DL) and convolutional neural networks. Notable applications to spinal imaging analysis include automated segmentation of spinal structures, detection of spinal anomalies, and predictive modeling of spinal surgery outcomes. Automated endplate assessments have seen relative success, as well. Jamaludin et al. have shown that endplate defects can be detected from MRI using convolutional neural networks with approximately 83.7% and 86.9% accuracy in their test set for upper and lower endplates, respectively. While these efforts automate spinal analysis to near human‐performance, there remain opportunities to translate such models into clinical utility.
The adoption of a DL model into widespread use to address inconsistencies of the assessment and reporting of MCs hinges on its interpretability. Our study aims to (1) develop a DL‐based automatic contouring method to identify MCs in vertebral bodies, (2) classify these changes as Modic types 1, 2, or 3 (MC 1/2/3) on a voxel‐wise level, thereby providing granular, quantitative information about the vertebral bodies as a Modic map, and (3) use the automatic detection as an aid to radiologists to demonstrate capability to improve agreement and pave the way for more consistent evaluations of the relationship between MCs and LBP.
MATERIALS AND METHODS
This retrospective, single‐center study was approved by the local Institutional Review Board, and the informed consent requirement was waived.
Dataset and annotations
Seventy‐five exams with the following inclusion and exclusion criteria were sampled at random from lumbar spine MRIs acquired between 2008 and 2019 at our institution. Inclusion: patients aged 19 years or older presenting with acute‐to‐chronic LBP, radiculopathy, and other symptoms of the lumbar spine including numbness, tingling, weakness, dysesthesia, and tightness. Exclusion: (1) vertebral fractures, (2) post‐operative changes, (3) extensive hardware, (4) primary tumors, (5) metastatic spinal disease, (6) infection, and (7) transitional anatomy. Imaging was performed on GE Signa HDxt 1.5 T and GE Discovery MR750 3.0 T (GE Healthcare, Milwaukee, WI) with acquisition details of the relevant T1‐weighted sagittal and T2‐weighted sagittal sequences provided in Table 1. All images were deidentified for this study.
TABLE 1
Summary of the range of acquisition parameters from dataset curated from clinical magnetic resonance imaging (MRI) exams
Parameter | T1‐weighted | T2‐weighted
Field strength (T) | 1.5, 3.0 | 1.5, 3.0
Matrix | 256 × 256–512 × 512 | 256 × 256–512 × 512
Field‐of‐view (cm) | 24.0–37.0 | 24.0–37.0
Slice thickness (mm) | 3.0–4.0 | 3.0–4.0
Pixel bandwidth (Hz) | 88.8–250.0 | 81.4–325.5
Repetition time (ms) | 377–975 | 2430–6307
Echo time (ms) | 6.8–31.8 | 26.1–107.8
Flip angle (°) | 90–180 | 90–160
To serve as ground truth for the DL components, vertebral bodies with visible MCs were segmented for these changes (types 1, 2, and 3) by a board‐certified neuroradiologist (C. C., with over 25 years of experience) and a junior musculoskeletal radiologist (U. U. B., with 3 years of experience) after initial adjudication for calibration on 15 exams not included in the study cohort. To promote further standardization between grading assessments, MCs with a diameter of less than 5 mm were excluded and mixed MCs were annotated as the predominant type. All manual annotations were performed using the medical imaging platform MD.ai (MD.ai, New York, NY).
Image analysis
This Modic mapping scheme consists of three stages, as depicted in Figure 1: (1) segmentation and localization of the vertebral bodies, (2) binary detection and segmentation of signal variabilities characteristic of MCs, and (3) voxel‐wise classification of the detected regions to classify Modic type.
FIGURE 1
Schematic of the full Modic mapping approach. Vertebral bodies are first segmented and extracted from T1‐weighted magnetic resonance imaging (MRI), allowing extraction of the bodies on the T1 and aligned T2 images. Next, a binary segmentation network localizes and detects regions of Modic changes (MCs). Lastly, each voxel of the detected regions is classified to a Modic type using a nearest neighbor algorithm and T1 and T2 z‐scores to form a Modic map.
Image alignment
As MCs are characterized by local signal variations in both T1‐ and T2‐weighted images, these images were aligned with image position coordinates prior to processing. The rigid alignment was performed by first matching positions of each sagittal slice of the T2‐weighted images to the T1‐weighted images in the frontal axis. Then, T2‐weighted slices were rotated, translated, and scaled to the dimensions of their corresponding T1 counterpart. Finally, each slice was similarly translated and scaled to harmonize in‐plane resolution using bicubic interpolation.
Vertebral body localization
Our first goal was to isolate vertebral bodies to focus on image features pertaining to the vertebral body and endplates. To achieve this, we developed and trained a preliminary V‐Net convolutional neural network for semantic segmentation. A research associate (G. I.) manually segmented vertebral bodies from T1‐weighted images in a subset of 40 exams. These MRIs were randomly split into training (n = 20), validation (n = 17), and test (n = 3) sets and then separated into 2D slices. The V‐Net was trained on a single NVIDIA TITAN X GPU using TensorFlow v1.14 with the following hyperparameters: batch size = 3; optimizer = Adam; learning rate = 1e−4; loss function = Dice (Equation (1)); dropout rate = 0.8. Post‐training, the performance of the segmentation model was assessed using the Dice coefficient overlap between the manual and predicted segmentations. To evaluate inter‐rater variability, a second research associate (K. T. G.) manually segmented vertebral bodies from a subset of five exams.
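For reference, a Dice loss consistent with the quantities defined in the following sentence can be sketched as below. This is only a hedged illustration: it assumes the soft Dice form popularized by the original V‐Net, and the symbols N, p_i, and g_i are assumed names for the number of voxels, the predicted voxel values, and the ground truth voxel values.

```latex
\begin{equation}
  \mathcal{L}_{\mathrm{Dice}}
    = 1 - \frac{2 \sum_{i=1}^{N} p_i \, g_i}
               {\sum_{i=1}^{N} p_i^{2} + \sum_{i=1}^{N} g_i^{2}}
\end{equation}
```

Other common variants drop the squares in the denominator or add a small smoothing constant, so the exact form used in this study may differ.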
where N is the total number of voxels, p_i represents voxel values of the prediction, and g_i represents voxel values of the ground truth.
We utilized this model to segment vertebral bodies of the 75 lumbar spine MRI exams in the dataset. The individual vertebral bodies in the inferred masks were identified using 3D connected component labeling, in which segmented masks joined within a six‐connected neighborhood were given a unique label. The masked vertebral bodies were then zero‐padded to a standardized size of 100 × 100.
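The labeling and padding steps can be illustrated with a minimal NumPy sketch. It is not the study's code: the function names `label_components` and `crop_and_pad` are hypothetical, the breadth‐first search stands in for a library routine such as `scipy.ndimage.label`, and centering the cropped body inside the 100 × 100 frame is an assumption, since the text only states that the masks were zero‐padded.

```python
from collections import deque

import numpy as np

# Offsets of the six face-adjacent neighbours in a 3D grid (six-connectivity).
SIX_NEIGHBOURS = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]


def label_components(mask):
    """Label 3D connected components of a binary mask using six-connectivity."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # voxel already belongs to a labelled component
        current += 1
        labels[seed] = current
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in SIX_NEIGHBOURS:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(n, mask.shape))
                if inside and mask[n] and not labels[n]:
                    labels[n] = current
                    queue.append(n)
    return labels, current


def crop_and_pad(slice_2d, component_mask_2d, size=100):
    """Mask one component, crop to its bounding box, and zero-pad to size x size."""
    rows, cols = np.nonzero(component_mask_2d)
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    patch = (slice_2d * component_mask_2d)[r0:r1, c0:c1]
    h, w = patch.shape
    if h > size or w > size:
        raise ValueError("component is larger than the target size")
    top, left = (size - h) // 2, (size - w) // 2  # assumed: centred placement
    out = np.zeros((size, size), dtype=slice_2d.dtype)
    out[top:top + h, left:left + w] = patch
    return out


# Toy volume (slices, rows, cols) with two blobs standing in for vertebral bodies.
volume_mask = np.zeros((2, 12, 12), dtype=bool)
volume_mask[:, 1:4, 2:6] = True
volume_mask[:, 7:11, 3:8] = True
labels, n_bodies = label_components(volume_mask)

image = np.ones((12, 12), dtype=np.float32)
patch = crop_and_pad(image, labels[0] == 1, size=100)
```

Six‐connectivity only joins voxels that share a face, so two bodies that touch at a single corner or edge would still receive separate labels.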
Modic detection and segmentation
MC detection was achieved using a second segmentation neural network that utilized these localized vertebral bodies and the radiologist‐annotated MCs. In each exam, we used z‐score standardization to convert each voxel to the number of standard deviations from the mean signal intensity in the segmented vertebral bodies. Next, the 100 × 100 vertebral body masks were applied to the T1‐weighted and aligned T2‐weighted images and these images were stacked, producing input images of dimensions 100 × 100 × 2. Binary radiologist‐annotated MC segmentations (presence vs. absence of MCs) were similarly masked. The 75 exams, consisting of 1872 vertebral body image‐Modic segmentation pairs, were randomly split into training (n = 50), validation (n = 15), and test (n = 10) sets. Figure 2 portrays the demographic distribution of the data splits.
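The standardization and stacking described above can be sketched as follows. This is a minimal illustration with hypothetical function names (`zscore_within_mask`, `build_input`) and synthetic intensities. In the study the statistics are pooled over the segmented vertebral bodies of an exam, whereas this toy example uses a single 100 × 100 patch for brevity.

```python
import numpy as np


def zscore_within_mask(volume, body_mask):
    """Express each voxel as standard deviations from the mean vertebral-body signal."""
    values = volume[body_mask]
    return (volume - values.mean()) / values.std()


def build_input(t1_patch, t2_patch, patch_mask):
    """Mask both contrasts and stack them into an (H, W, 2) array."""
    stacked = np.stack([t1_patch, t2_patch], axis=-1)
    return stacked * patch_mask[..., np.newaxis]


# Synthetic T1 and T2 patches with an arbitrary rectangular "vertebral body" mask.
rng = np.random.default_rng(0)
t1 = rng.normal(300.0, 40.0, size=(100, 100))
t2 = rng.normal(180.0, 25.0, size=(100, 100))
mask = np.zeros((100, 100), dtype=bool)
mask[30:70, 25:75] = True

t1_z = zscore_within_mask(t1, mask)
t2_z = zscore_within_mask(t2, mask)
x = build_input(t1_z, t2_z, mask)
```

Because the mean and standard deviation come from bone voxels only, the z‐scores are relative to the vertebral signal of that exam rather than to the whole image, which is what lets the later classification step compare exams acquired with different parameters.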
FIGURE 2
Distribution of subject demographics, including (A) age, (B) BMI, (C) gender, and (D) race, of the 75 magnetic resonance imaging (MRI) exams after randomly splitting into training (n = 50), validation (n = 15), and test sets (n = 10)
We developed and modified the 2D V‐Net for MC segmentation. The network consists of two branches, each with four levels. The encoder branch is responsible for compressing the input to an abstract latent space of representative features. At each level, convolutional layers (1, 2, 3, and 3 layers in the respective levels) extract features with 32 kernels of size 5 × 5 and stride 1 followed by downsampling with a 2 × 2 kernel with stride 2. The subsequent decoder branch deconvolves the latent space back to the input's original dimension and passes the array through a combined cross‐entropy and Dice loss layer with sigmoid activation to ultimately produce probabilistic segmentation masks for MCs. Hyperparameters for training include: batch size = 128; optimizer = Adam; learning rate = 1e−4; loss function = weighted cross entropy and Dice (Equation (2)); loss weights = 20:1 (foreground:background); dropout = 0.2. Training was deemed complete after a designated 15 validation cycles without improvement (500 iterations per cycle).
In Equation (2), the weighting coefficient was set to 0.1.
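One way a weighted cross‐entropy and Dice loss of this kind could be combined is sketched below in NumPy. Several choices here are assumptions rather than details confirmed by the text: the 0.1 coefficient is applied to the Dice term, the 20:1 weights multiply the foreground and background cross‐entropy terms, and the Dice term uses squared sums in the denominator. The names `weighted_cross_entropy`, `dice_loss`, `combined_loss`, and `lam` are hypothetical.

```python
import numpy as np


def weighted_cross_entropy(p, g, w_fg=20.0, w_bg=1.0, eps=1e-7):
    """Binary cross entropy with heavier weighting of foreground (MC) voxels."""
    p = np.clip(p, eps, 1.0 - eps)
    per_voxel = -(w_fg * g * np.log(p) + w_bg * (1.0 - g) * np.log(1.0 - p))
    return float(per_voxel.mean())


def dice_loss(p, g, eps=1e-7):
    """Soft Dice loss with squared sums in the denominator (assumed form)."""
    overlap = 2.0 * np.sum(p * g)
    return float(1.0 - (overlap + eps) / (np.sum(p ** 2) + np.sum(g ** 2) + eps))


def combined_loss(p, g, lam=0.1):
    # Assumption: lam scales the Dice term; the text does not say which term it weights.
    return weighted_cross_entropy(p, g) + lam * dice_loss(p, g)


# Toy 2 x 2 ground truth with two foreground voxels.
g = np.array([[0.0, 1.0], [1.0, 0.0]])
p_fn = np.array([[0.0, 0.1], [1.0, 0.0]])  # one foreground voxel nearly missed
p_fp = np.array([[0.9, 1.0], [1.0, 0.0]])  # one background voxel nearly flagged

loss_perfect = combined_loss(g, g)
loss_fn = combined_loss(p_fn, g)
loss_fp = combined_loss(p_fp, g)
```

With a 20:1 weighting, a missed foreground voxel is penalized far more than a false alarm of the same confidence, which is one way to counter the scarcity of MC voxels relative to normal marrow.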
Voxel‐wise Modic change mapping
With a trained model for Modic segmentation, we then utilized a nearest‐neighbor algorithm to classify each voxel in the detected MCs into one of three types. Again, we utilized the training set; each voxel in the regions annotated by the radiologist was characterized by its T1 z‐score and T2 z‐score and then grouped into the appropriate MC group. The centroid of each [T1 z‐score, T2 z‐score] cluster was computed. To classify the test set and exams in inference, each voxel in detected MCs was similarly characterized by [T1 z‐score, T2 z‐score] and then categorized by the nearest cluster centroid. This ultimately produced voxel‐wise Modic maps.
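The nearest‐centroid step described above can be sketched in a few lines of NumPy. The function names `fit_centroids` and `classify_voxels` are hypothetical, the training values are toy numbers, and Euclidean distance is an assumption, since the text does not name the distance metric.

```python
import numpy as np


def fit_centroids(features, labels):
    """Mean [T1 z-score, T2 z-score] per Modic type from annotated training voxels."""
    types = np.unique(labels)
    centroids = np.stack([features[labels == t].mean(axis=0) for t in types])
    return types, centroids


def classify_voxels(features, types, centroids):
    """Assign each voxel the Modic type of its closest centroid (Euclidean distance)."""
    diffs = features[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return types[np.argmin(distances, axis=1)]


# Toy training voxels: columns are [T1 z-score, T2 z-score]; labels are Modic types.
train_features = np.array([
    [0.2, 1.3], [0.3, 1.1],      # type 1: bright on T2
    [1.1, 0.4], [0.9, 0.3],      # type 2: bright on T1
    [-0.5, -0.6], [-0.6, -0.4],  # type 3: dark on both
])
train_labels = np.array([1, 1, 2, 2, 3, 3])
types, centroids = fit_centroids(train_features, train_labels)

# Voxels from a detected region, classified independently of one another.
detected = np.array([[0.1, 1.5], [1.3, 0.2], [-0.8, -0.7]])
predicted = classify_voxels(detected, types, centroids)
```

Because each voxel is labeled on its own, a single detected region can contain several Modic types, which is what allows the resulting map to show mixed or transitional lesions.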
Statistical analysis
We created a rule‐based algorithm that produces binary labels of each MC in upper and lower vertebral bodies to assess the effectiveness of this scheme as compared to human performance and past works. Upper and lower sections were approximated by finding the convex hull of the vertebral body mask and bisecting it along the long axis. Each half was then described with three binary labels, representing the presence or absence of voxels characteristic of Modic types 1, 2, and 3, respectively. Sensitivity, specificity, and Cohen's kappa score (κ) were computed to evaluate the overall Modic detection performance and the subsequent classification.
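The three agreement metrics named above follow directly from the counts of a 2 × 2 confusion table. The sketch below uses only the Python standard library and invented toy labels; the helper names are hypothetical, and it omits the confidence intervals reported in the study.

```python
def confusion_counts(truth, pred):
    """Return (tp, tn, fp, fn) for two equal-length sequences of binary labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    return tp, tn, fp, fn


def sensitivity_specificity(truth, pred):
    """Sensitivity = tp / (tp + fn); specificity = tn / (tn + fp)."""
    tp, tn, fp, fn = confusion_counts(truth, pred)
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(truth, pred):
    """Chance-corrected agreement between two sets of binary labels."""
    tp, tn, fp, fn = confusion_counts(truth, pred)
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return (observed - expected) / (1 - expected)


# Toy labels for ten vertebral halves (1 = Modic change present).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(truth, pred)
kappa = cohen_kappa(truth, pred)
```

In this toy case the raw agreement is 0.80 but kappa is lower, because part of that agreement would be expected by chance given how often each rater assigns a positive label. The helpers assume that both classes occur at least once; otherwise the divisions are undefined.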
AI‐assisted experiment
A second dataset (n = 20) was curated to explore the effect of the Modic mapping pipeline on inter‐rater agreement in Modic grading. A senior neuroradiologist (C. C., over 25 years of experience), a senior musculoskeletal radiologist (T. M. L., over 25 years of experience), and a junior radiologist in training (U. U. B., 3 years of experience) graded these exams independently. Inter‐rater reliability was assessed using Cohen's kappa coefficient. After a 4‐week washout period, the musculoskeletal radiologist and junior radiologist re‐graded the same dataset with the aid of Modic maps generated from our developed pipeline. Agreement was reassessed to measure differences from the initial trial using Cohen's kappa score and McNemar's test, with the neuroradiologist established as the baseline. The experimental setup is summarized in Figure 3.
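As a rough illustration of how McNemar's test applies to a before‐and‐after design like this one, the sketch below compares whether a reader matched the baseline reader on each label in the initial and the assisted sessions. The exact binomial form of the test, the pairing scheme, and the function name `mcnemar_exact` are assumptions; the text does not state which variant of the test was used, and the labels are invented.

```python
from math import comb


def mcnemar_exact(agree_before, agree_after):
    """Exact two-sided McNemar test on paired binary outcomes.

    Each element states whether a reader matched the baseline reader on one
    label, before and after assistance. Only discordant pairs carry information.
    """
    b = sum(1 for x, y in zip(agree_before, agree_after) if x and not y)
    c = sum(1 for x, y in zip(agree_before, agree_after) if not x and y)
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Toy data: 1 = reader matched the baseline reader on that label.
agree_before = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
agree_after = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

lost, gained, p_value = mcnemar_exact(agree_before, agree_after)
```

Labels on which the reader agreed with the baseline in both sessions, or in neither, do not affect the result, so the test asks only whether agreement was gained more often than it was lost.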
FIGURE 3
Experimental setup of the AI‐assisted assessments in the labeling platform, MD.ai. Three readers graded an independently curated dataset (n = 20). Using the trained Modic mapping schema, predictions for Modic changes (MCs) were generated in the same dataset, and after a 4‐week washout period, readers 2 and 3 re‐graded these exams with the assistance of the model predictions.
RESULTS
Vertebral body localization
Training the vertebral body segmentation network was completed in approximately 10 h with 20 000 iterations. Evaluated with the unseen test set, the model achieved 0.882 ± 0.018 Dice overlap with the ground truth segmentations. This performance is comparable to the inter‐rater Dice overlap between two research associates, which was reported as 0.927 ± 0.011.
Post hoc analysis of vertebral body segmentation was performed (Figure 4). The mean volumetric error of the model prediction was 0.28 cm³ per vertebral body, approximately 1.1% of the average vertebral volume. Manually segmented ground truth and model predictions were well correlated, with an R‐value of 0.94 and p‐value <0.001 using Pearson correlation.
FIGURE 4
Post hoc analysis of vertebral body segmentation of the test set. (A) Bland–Altman plot indicates the average difference in vertebral body volume between model prediction and ground truth was 0.28 cm³. The gray areas portray the 95% confidence intervals. (B) The correlation plot of vertebral body volume has an intercept of 14.8 cm³, demonstrating a measurement bias, and R‐value of 0.94. (C) Representative example of vertebral body segmentation contours on T1‐weighted image
Modic detection and segmentation
The Modic detection model, after training for 11 500 iterations, successfully identified the presence or absence of changes in 85.7% of samples in the unseen test set. The sensitivity and specificity of the model, summarized in Table 2, were 0.71 (±0.072) and 0.95 (±0.022), respectively. Cohen's kappa score, computed against the radiologist‐annotated ground truth, was 0.63, interpreted as substantial agreement.
TABLE 2
Performance of the full pipeline on the unseen test set
Abbreviations: CI, 95% confidence interval; MC, Modic change.
Voxel‐wise Modic change mapping
Figure 5 shows the [T1 z‐score, T2 z‐score] voxel‐wise characterization of MCs in the training set. Cluster centroids of Modic 1, 2, and 3 were centered at [0.23 (±0.73), 1.20 (±1.16)], [1.04 (±1.00), 0.37 (±0.85)], and [−0.53 (±0.41), −0.52 (±0.85)], respectively, corresponding well with the qualitative classification system defined by hyper‐ and hypo‐intensities. Labeling of upper and lower vertebral bodies using the rule‐based classification system resulted in sensitivities of [0.67 (±0.113), 0.67 (±0.102), and 0.44 (±0.324)] and specificities of [0.87 (±0.030), 0.89 (±0.028), and 0.83 (±0.032)] for Modic types 1, 2, and 3, respectively, as seen in Table 2. The overall prevalence of MCs in the test set was 0.27 in the ground truth and, correspondingly, 0.23 in the model predictions. Further stratification of MC prevalence is described in Figure 6. In Figure 7, representative examples of Modic maps are shown with their corresponding T1 and T2 images.
FIGURE 5
Paired T1 and T2 z‐score coordinates of each voxel within Modic changes in the training set. These centroid coordinates align well with the qualitative Modic grading system and its corresponding variations in signal intensity (e.g., Modic type 1 is hyperintense in T2‐weighted imaging, Modic type 2 is hyperintense in T1‐weighted imaging). Detected Modic changes in the test set were classified on a voxel‐by‐voxel basis using a nearest neighbor algorithm to these cluster centroids.
FIGURE 6
Representative examples of the model inputs (T1 and T2 images), radiologist‐annotated ground truth segmentations, and the predicted Modic maps. The mapping technique is advantageous for visualizing heterogeneity and transitional pathology. Notably, in the top row, the model detects Modic change (MC) 3‐like characteristics in the anterior inferior endplate. In the second row, a small MC 1 region in the anterior superior endplate, unnoticed by the radiologist, was annotated by the automatic model.
FIGURE 7
Prevalence of Modic changes (MCs) in the ground truth and prediction of the test set, stratified by vertebral body level. The two distributions share similarities, with the highest number of MCs in the lower lumbar region (L4‐S1). The prevalence is further apportioned by the relative ratios of each Modic type. The model tends to overestimate MC 3s due to low representation in the ground truth and inductive bias.
AI‐assisted experiment
Inter‐rater agreement was initially assessed with an independently curated dataset (n = 20) (Table 3). Among the three radiologists, the two senior readers (reader 1 [C. C.] and reader 2 [T. M. L.]) were in the most agreement, with a Cohen's kappa score of κ = 0.63. The junior radiologist (reader 3 [U. U. B.]) had moderate agreement with reader 1 (κ = 0.52) and with reader 2 (κ = 0.45).
TABLE 3
Cohen's kappa coefficients between three readers in AI‐assisted experiment
Reader pair | Initial agreement (κ) | Post‐AI‐assist experiment (κ) | Δκ | p‐value
Readers 1 and 2 | 0.63 | 0.62 | −0.01 | NS
Readers 1 and 3 | 0.52 | 0.58 | +0.06 | <0.05
Readers 2 and 3 | 0.45 | 0.48 | +0.03 | NS
Abbreviation: NS, not significant.
With the assistance of the model prediction, agreement of reader 3 with reader 1 significantly improved to κ = 0.58 (p < 0.05). Agreement between readers 3 and 2 increased to κ = 0.48, though this result was not statistically significant by McNemar's test. Meanwhile, reliability between readers 1 and 2 decreased slightly to κ = 0.62, again without statistical significance.
DISCUSSION
This study used DL‐based models to automatically localize and map MCs in vertebral bodies. Overall, these results demonstrate substantial agreement of the detection model with radiologist‐annotated grading and a novel Modic mapping technique that provides grading assistance when incorporated into a radiology workflow. A design goal of this schema is to provide clinical utility through objective and interpretable models. We aimed to achieve this in two ways.
The first pertains to reducing and streamlining the semiquantitative Modic classification system into a data‐driven, yet easily understood multistep algorithm. To limit the effective field‐of‐view to regions of the vertebral bone, rather than confounding structures such as the neighboring intervertebral discs, foramina, or spinal cord, we performed vertebral body segmentation using the V‐Net, a widely used encoder‐decoder for biomedical image segmentation. This is particularly important when considering intervertebral disc degeneration due to the strong correlation between the presence of the two anomalies. The performance of this model is consistent with previous works in spinal segmentation, and its output conveys to users of this tool which regions were evaluated by the subsequent Modic detection tool. Similarly, the rule‐based classification system proposed here, based on T1 and T2 z‐scores, intuitively follows the semiquantitative blueprint originally proposed by Modic et al. Ultimately, the availability of intermediary results and interfaces for the pipeline's decision‐making process may build confidence toward the adoption of such methodologies into clinical settings.
The second strategy adopted in this approach capitalizes on the ability of Modic maps to describe heterogeneous tissues. Systematic reviews of works involving MCs note inconsistencies in reporting procedures.
In both research studies and in clinical practice, MCs are dictated as isolated, homogeneous lesions when they are often conglomerated and characterized by spatial heterogeneity. Past literature suggested that MRI changes may progress from Modic type 1 to type 2 to type 3 in a linear fashion, though recent studies have demonstrated that pathologies are often reversible. Not only can MCs be transitional, but it has also been reported that 27.2% of MCs are regarded as mixed, comprising characteristics of multiple Modic types. Capturing the granularity of mixed MCs is challenging for the human eye, yet neural networks have proven capable of identifying detailed textural and shape features from medical imaging. In this work, we chose to implement a voxel‐wise MC segmentation method over a classification model due to the key capability of visualizing the heterogeneity of mixed MCs. In addition, the segmentation methodology offers a higher degree of supervision, where each voxel in an image is attributed with a label. This granular supervision retains context of the neighboring tissue and improves label specificity. Further works using this approach can unravel attributes of progressive or transitional MCs that may interact with LBP, as heterogeneous tissues are often correlated with degeneration.
Performance of the vertebral body segmentation and MC detection components reached or neared human reliabilities. Error analysis showed predictive inaccuracies in the lateral‐most slices where partial volume effects tend to impact the delineation of bone from surrounding tissues. The performance metric is artificially deflated as the research associate manually segmented complete vertebral bodies while the model would be apt to predict all instances of bone, some of which were only partially visible in the prescribed field of view. In the MC detection component, the distribution of predicted MCs across the lumbar vertebrae was predominantly in the L4‐S1 range (74.4%), which matches well with the radiologist annotations (78.8%) and past work (75.5%).
Detection of MCs in L1 was notably underestimated by the model. We speculate this is due to signal loss at the periphery of the coil. Voxel‐wise classification of MCs yielded high predictive value of Modic types 1 and 2, arguably the two groups most important to classify due to their prevalence and the strong association of MC 1 with nonspecific LBP. Notably, the models are trained and evaluated on a dataset with a wide arrangement of acquisition parameters to capture the variability in non‐standardized imaging procedures.
In the pilot AI‐assisted experiment, we found that the additional utility of the model predictions improved agreement of the junior radiologist with the senior radiologists (Δκ = +0.06 and Δκ = +0.03 with reader 1 and reader 2, respectively). However, agreement did not improve, but rather slightly decreased (Δκ = −0.01 with reader 1), for reassessment by reader 2. This is likely explained by the differences in training and preferences between neuroradiology and musculoskeletal radiology. The participating readers reported that a key advantage of the tool was its utility as “attention focuses,” which may have contributed to boosting agreement between reader 3 and reader 1.
The technologies developed in this study can be applied in various ways. With further development, this tool could potentially assist training efforts of junior radiologists by highlighting complex cases which depict the nuances of heterogeneous spinal pathologies. Furthermore, because this model was trained using non‐standardized clinical data, the AI‐assist tool can be adapted to a continuous learning paradigm to improve model generalizability and utility without the need for additional data curation. Specifically, this model demonstrates the capability to predict transitional and heterogeneous MCs which have been hypothesized to be associated with LBP. Using this tool, more data can be gathered on these changes to make consistent associations with LBP and help pave the path to elucidate the mechanisms of nonspecific LBP.
While our results demonstrate that DL‐based approaches can contribute to identifying MCs, there are several notable limitations. First, despite the quantitative nature of this methodology, data‐driven techniques are still biased by their training data and annotators. Two participants of the AI‐assisted experiment were responsible for labeling the training data, which may have biased the agreement metrics against other readers. For these reasons, this algorithm is not intended to be a standalone fully diagnostic tool. Second, relatedly, we acknowledge that the exams used in this study are from a single institution, and the model is not validated with multi‐institutional testing. Lastly, our results are limited by the small sample size with poor representation of Modic type 3. Modic type 3 is described by signal void in both T1‐ and T2‐weighted images, which makes it difficult to grade and susceptible to errors in cases with low signal‐to‐noise ratio. This is impactful in the nearest neighbor component of the pipeline, which is notably sensitive. Fortunately, several collaborative efforts are in progress to amass additional data from other institutions with wider variability in imaging equipment and acquisition parameters. We also aim to extend this work by exploring domain adaptation strategies to improve generalizability and performing longitudinal analysis to further investigate transitional pathologies.
CONCLUSION
In this work, we present a novel DL‐based approach to localize and segment MCs, with results that demonstrate high agreement with radiologist grading. The introduction of this fully automatic, quantitative mapping technique may increase inter‐rater reliability and ultimately improve robustness in understanding the associations of MCs with LBP and spinal degeneration.
AUTHOR CONTRIBUTIONS
Kenneth T. Gao, Radhika Tibrewala, Upasana U. Bharadwaj, Cynthia T. Chin, Valentina Pedoia, and Sharmila Majumdar contributed to study design. Upasana U. Bharadwaj, Gaurav Inamdar, and Cynthia T. Chin manually annotated the MRI exams. Kenneth T. Gao, Radhika Tibrewala, and Madeline Hess developed and trained the deep learning models. Upasana U. Bharadwaj, Thomas M. Link, and Cynthia T. Chin performed the AI‐assisted experiment and interpreted the results. All authors provided critical feedback and approved the final submitted manuscript.