Literature DB >> 29071211

Deep learning of joint myelin and T1w MRI features in normal-appearing brain tissue to distinguish between multiple sclerosis patients and healthy controls.

Youngjin Yoo¹, Lisa Y W Tang², Tom Brosch³, David K B Li², Shannon Kolind⁴, Irene Vavasour⁵, Alexander Rauscher⁶, Alex L MacKay⁷, Anthony Traboulsee⁸, Roger C Tam⁹.

Abstract

Myelin imaging is a form of quantitative magnetic resonance imaging (MRI) that measures myelin content and can potentially allow demyelinating diseases such as multiple sclerosis (MS) to be detected earlier. Although focal lesions are the most visible signs of MS pathology on conventional MRI, it has been shown that even tissues that appear normal may exhibit decreased myelin content as revealed by myelin-specific images (i.e., myelin maps). Current methods for analyzing myelin maps typically use global or regional mean myelin measurements to detect abnormalities, but ignore finer spatial patterns that may be characteristic of MS. In this paper, we present a machine learning method to automatically learn, from multimodal MR images, latent spatial features that can potentially improve the detection of MS pathology at early stage. More specifically, 3D image patches are extracted from myelin maps and the corresponding T1-weighted (T1w) MRIs, and are used to learn a latent joint myelin-T1w feature representation via unsupervised deep learning. Using a data set of images from MS patients and healthy controls, a common set of patches are selected via a voxel-wise t-test performed between the two groups. In each MS image, any patches overlapping with focal lesions are excluded, and a feature imputation method is used to fill in the missing values. A feature selection process (LASSO) is then utilized to construct a sparse representation. The resulting normal-appearing features are used to train a random forest classifier. Using the myelin and T1w images of 55 relapse-remitting MS patients and 44 healthy controls in an 11-fold cross-validation experiment, the proposed method achieved an average classification accuracy of 87.9% (SD = 8.4%), which is higher and more consistent across folds than those attained by regional mean myelin (73.7%, SD = 13.7%) and T1w measurements (66.7%, SD = 10.6%), or deep-learned features in either the myelin (83.8%, SD = 11.0%) or T1w (70.1%, SD = 13.6%) images alone, suggesting that the proposed method has strong potential for identifying image features that are more sensitive and specific to MS pathology in normal-appearing brain tissues.

Entities: Chemical Disease Gene Species

Keywords: Deep learning; Machine learning; Magnetic resonance imaging; Multiple sclerosis; Myelin water imaging

Mesh：

Year: 2017 PMID： 29071211 PMCID： PMC5651626 DOI： 10.1016/j.nicl.2017.10.015

Source DB: PubMed Journal: Neuroimage Clin ISSN： 2213-1582 Impact factor: 4.881

Introduction

Multiple sclerosis (MS) is an autoimmune disorder characterized by inflammation, demyelination, and degeneration in the central nervous system. Magnetic resonance imaging (MRI) is invaluable for monitoring and understanding the pathology of MS in vivo from the earliest stages of the disease. One promising MR imaging modality is myelin water imaging (MWI) (MacKay et al., 1994), which is a quantitative MRI technique that specifically measures myelin content (Fig. 1) in the form of the myelin water fraction (MWF), which is defined as the ratio of water trapped within myelin over the total amount of water. Although white matter lesions have been traditionally considered the hallmark of MS pathology, histological studies and the MWI technique have shown that MS alterations also occur in tissues that appear normal in conventional MRIs. For example, a study using MWI found that a cohort of MS patients had 16% lower mean global MWF in normal-appearing white matter (NAWM) than healthy controls (Laule et al., 2004). In addition, normal-appearing gray matter (NAGM) has also been shown to have reduced MWF in MS patients (Steenwijk et al., 2014). Although myelin imaging has been indispensable in enhancing our understanding of MS, most analyses to date (e.g., MacKay et al., 1994, Laule et al., 2004, Yoo and Tam, 2013 only use mean myelin measurements, either over the whole brain or in predefined regions, and disregard the fine-scale spatial patterns of myelin content that may potentially be useful for MS diagnosis.

Fig. 1

An example of a myelin map of a healthy control subject at several different slices in the dataset described in Section 2.1. The intensity reflects the relative amount of myelin present, except for the extraparenchymal areas. Deep learning (LeCun et al., 2015) is a machine learning approach that uses layered hierarchical, graphical networks to extract features from data at progressively higher levels of abstraction. In recent years, methods based on deep learning have attracted much attention due to their breakthrough performance for classification in many application domains, including image recognition and natural language processing (LeCun et al., 2015). Unsupervised deep learning can be particularly useful in neuroimaging, a domain in which the number of labeled training images is typically limited. For example, unsupervised deep learning of neuroimaging data has been used to perform various tasks such as classification between mild cognitive impairment (MCI) and Alzheimer's disease (AD) (Suk et al., 2014), and to model morphological and lesion variability in MS (Brosch et al., 2014). In view of this, we employ deep learning to extract latent spatial features in myelin maps, both on their own and combined with structural MRIs, to determine whether the deep-learned features can improve the detection of MS pathology. In doing so, we employ multimodal deep learning (Ngiam et al., 2011) to discover and model correlations between hidden patterns in the normal-appearing brain tissues of coregistered pairs of myelin maps and T1-weighted (T1w) MRIs. Myelin and T1w scans are used to provide complementary information in that the former contain myelin-specific features while the latter contain more general morphological features. Both types of features are known to be impacted by MS, but the benefits of deep learning for extracting myelin or myelin-T1w features are unknown. We hypothesize that deep learning can uncover spatial features in myelin maps that are more sensitive and specific to MS pathology than mean myelin measurements, and that multimodal deep learning can extract more sensitive and specific features than those extracted from either myelin or T1w modality alone. Our method uses a four-layer deep belief network (DBN) (Hinton et al., 2006) that is applied to 3D image patches of NAWM and NAGM to learn a latent feature representation. The image patches are selected via a voxel-wise t-test that is performed between the MS and healthy control groups. To target only normal-appearing image patches, any patches overlapping with focal MS lesions are excluded, and a feature imputation technique is used to account for missing features originating from regions with focal lesions. We then apply the least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996) as a feature selection method to construct a sparse feature representation for reducing the risk of overfitting to the training data. The final features are then used to train a random forest (Breiman, 2001) that would discriminate images of MS subjects from those of normal subjects.

Material and methods

Subjects

A cohort of 55 relapsing-remitting MS (RRMS) patients and a cohort of 44 age- and gender-matched normal control (NC) subjects were included in this study. The median age and range for both groups were 45 and 30–60. For the RRMS patients, 63.6% (35/55) were female, and 63.5% (28/44) of the NC subjects were female. The McDonald 2010 criteria (Polman et al., 2011) were used to diagnose the patients for MS. All patients underwent a neurological assessment and were scored on the Expanded Disability Status Scale (EDSS) (Kurtzke, 1983). The median EDSS and range were 4 and 0–5. Informed consent from each participant and ethical approval by the local ethics committee were obtained prior to the study.

MRI acquisition and pre-processing

The T1w images were acquired with a gradient echo sequence (TR = 28 ms, TE = 4 ms, flip angle = 27°, voxel size = 0.977 × 0.977 × 3.000 mm3 and image dimensions = 256 × 256 × 60). The myelin images were acquired with a 3D GRASE sequence (Prasloski et al., 2012b) (32 echoes, TE = 10, 20, 30,…, 320 ms, TR = 1200 ms, voxel size = 0.958 × 0.958 × 2.500 mm3 and image dimensions = 256 × 256 × 40), and processed with the non-negative least squares fitting algorithm with non-local spatial regularization (Yoo and Tam, 2013) and stimulated echo correction (Prasloski et al., 2012a). All images were acquired on a Philips Achieva 3T scanner with an 8-channel SENSE head coil. Lesion masks were produced for the MS images using a semi-automatic segmentation method (McAusland et al., 2010) applied to T2-weighted and proton density-weighted MRI pairs. The T1w images were preprocessed by applying the N3 inhomogeneity correction method (Sled et al., 1998) iteratively over multiple scales (Jones and Wong, 2002), then followed by denoising and skull-stripping. The multi-scale N3 method works similarly to the N4 algorithm (Tustison et al., 2010), but was optimized to work with the magnetic field of our scanner. The non-zero T1w intensities were normalized to have a range from 0 to 1, and then standardized to have zero mean and unit standard deviation (SD) to enable the use of Gaussian visible units (explained in Appendix B). In addition, normalization of the T1w intensities made the appearance of high-contrast edge features between the normal-appearing tissue and cerebrospinal fluid more consistent across individuals. In general, this allows the distribution of these edge features to be modeled more accurately during deep learning, which makes training of the networks faster and more stable. The brain masks computed from the T1w images and intensity standardization were also applied to the myelin images. The myelin images were registered to the T1w images using linear registration. Non-linear registration with FSL FNIRT (Jenkinson et al., 2012) was performed on the T1w images to align them to the MNI152 template (Mazziotta et al., 2001), and the computed transforms were also applied to the myelin images.

Cross-validation procedure

To maximize use of the available data, we performed a cross-validation procedure in which a rotating subset of the subjects acted as the test data, while the rest were used for training. We used an 11-fold cross-validation procedure in which each fold consisted of 9 test subjects (5 MS and 4 NC) and 90 training subjects (50 MS and 40 NC). This partitioning allowed all 99 subjects to be tested once.

Overview of the feature learning and classification pipeline

Fig. 2 shows a schematic diagram of our proposed method. The main steps are as follows. First, a common set of class-discriminative patches are extracted from both modalities in the MNI152 template space. Next, for each subject we exclude those patches that overlap with focal lesions. The resulting normal-appearing patches are then used to learn a latent joint myelin-T1w feature representation via unsupervised deep learning. To account for missing features from focal lesions, an imputation method (Marlin, 2008) is performed on the learned feature vectors. To reduce the risk of overfitting by increasing sparsity, we apply LASSO to the feature vectors. We then train a random forest classifier with the joint myelin-T1w features and class labels.

Fig. 2

A schematic illustration of the proposed algorithm for detecting multiple sclerosis pathology on normal-appearing brain tissues using a latent hierarchical myelin-T1w feature representation.

Normal-appearing patch extraction

Instead of using all voxels in an image, patch extraction is commonly used for medical image classification to improve discriminative task accuracy and to reduce computational burden (Wu et al., 2015). We extract discriminative candidate patches on normal-appearing brain tissue from the myelin and T1w images in the MNI152 template space using a voxel-wise t-test to determine the statistical significance of the group difference between MS and NC images, as similarly done in previous studies Suk et al., 2014, Tong et al., 2014 for AD/MCI diagnosis. Details on normal-appearing patch extraction are provided in Appendix A.

Unsupervised deep learning of joint myelin-T1w features

The network architecture (Fig. 3) for unsupervised deep learning consists of two modality-specific DBNs, one for myelin features and the other for T1w features, which are fed into a joint network that learns multimodal features. The number of network layers and number of hidden units were determined from previous literature Suk et al., 2014, Yoo et al., 2014 and a widely used guide (Hinton, 2012) for training restricted Boltzmann machines (RBMs). Fig. 4 shows visualizations of spatial features learned by this network from both myelin and T1w images. Technical details on modeling a joint myelin-T1w feature representation by unsupervised deep learning are described in Appendix B.

Fig. 3

The multimodal deep learning network architecture used to extract a joint myelin-T1w feature representation.

Fig. 4

Features at two RBM layers learned from myelin images (top) and T1w images (bottom). The deep network is able to learn a large variety of spatial features from both myelin and T1w images, which supports the hypothesis that myelin maps contain potentially useful structural information for detecting MS pathology.

The multimodal deep learning network architecture used to extract a joint myelin-T1w feature representation. Features at two RBM layers learned from myelin images (top) and T1w images (bottom). The deep network is able to learn a large variety of spatial features from both myelin and T1w images, which supports the hypothesis that myelin maps contain potentially useful structural information for detecting MS pathology.

Image-level feature vector construction and random forest training

For input into a supervised image-level classifier, single-modality or multi-modality features can be used. Single-modality feature vectors can be constructed by concatenating the second-layer activations from the individual myelin and T1w DBNs for all normal-appearing patches. Joint multimodal feature vectors can be constructed by concatenating the top-level hidden unit activations of the multimodal DBN. For the patches excluded due to lesions, we model each missing feature element using a normal distribution whose parameters μ and σ are estimated as the mean and SD of all feature values from the training dataset, where i = 1,…,m and m = P × K4, and P is the number of patches and K4 is the number of hidden units in the top layer. We then impute each missing feature element with a value sampled from the normal distribution, as previously described (Marlin, 2008). Since the feature dimension is very high (P × K4, depending on the number of patches P) relative to the number of training samples (90), we construct a sparse representation using a linear regression model to reduce the risk of overfitting during supervised training. In previous work by Kim et al. (2013), it has been shown that applying LASSO (Tibshirani, 1996) as a feature selection method to reduce the dimensionality of the latent features learned by DBNs is beneficial to classification performance. Accordingly, we also explore the impact of using LASSO in our framework. More specifically, LASSO employs the following objective function: where X ∈ℝ and y ∈ℝ denote the data matrix and the label vector respectively, and T is the number of subjects used for training. The vector q ∈ℝ holds the regression coefficients and λ1 is a regularization parameter that controls the sparsity of the model. After LASSO, the non-zero elements in the regression coefficient vector q are used to select the covariates to form a sparse feature representation for each image. Finally, given the feature vectors and labels, we train a random forest (Breiman, 2001) using the information gain to measure the quality of a split for each node. We also compute the relative importance of each patch for classification between MS and NC by permuting the features for each patch among the training data and computing the generalization error as measured by the out-of-bag error (Breiman, 2001). Then, we relate the feature importance to anatomical regions by using the Harvard-Oxford sub-cortical structural atlas (Desikan et al., 2006), which was derived from the MNI152 template (Mazziotta et al., 2001), and the central voxel of each patch, to enhance the interpretability of the results. The feature importance for each anatomical region is determined by averaging feature importance values from all patches belonging to each anatomical region. The procedure of determining random forest and LASSO parameters is described in Appendix C.

Results

Performance evaluation

Let TP, TN, FP and FN denote True Positive, True Negative, False Positive and False Negative, respectively. The ability of our proposed method to extract discriminative features was evaluated by using the deep-learned features of normal-appearing brain tissues to classify each subject as MS or NC, and measuring several different aspects of classification performance: Accuracy = (TP + TN) / (TP + TN + FP + FN) Sensitivity = TP / (TP + FN) Specificity = TN / (TN + FP) Area under the curve (AUC) of the receiver operating characteristic curve We performed an 11-fold cross-validation procedure as described in Section 2.3. We performed the patch selection, unsupervised deep learning and random forest training using only the training data for each fold. We used three regional features as baseline comparators: the regional mean T1w intensity, the regional mean myelin content, and the regional mean myelin-T1w features, which were formed by concatenation of the myelin content and T1w intensity feature vectors for each image. All regional means were computed on the same class-discriminative patches as used for unsupervised deep learning. We independently trained the random forest classifier for each mean-based feature. To determine the LASSO regularization parameter λ1, we performed the same nested cross-validation procedure described in Appendix C. The top three rows of Table 1 show the classification performance for each mean-based feature type. To analyze the effect of LASSO, the supervised training was initially done without this regularization. Rows 4 to 6 of Table 1 show the classification performance of the three mean-based features when including the LASSO regularization.

Table 1

Feature type	Accuracy	Sensitivity	Specificity	AUC
Regional mean without LASSO
T1w intensity	63.6 (16.5)	74.5 (18.7)	50.0 (28.2)	62.3 (16.9)
Myelin content	72.7 (13.7)	74.6 (18.7)	68.2 (17.9)	72.3 (13.8)
Myelin-T1w	67.7 (8.8)	72.7 (22.5)	61.4 (14.7)	67.1 (9.2)
Regional mean with LASSO
T1w intensity	66.7 (10.6)	76.4 (17.2)	54.5 (24.1)	65.5 (11.5)
Myelin content	73.7 (13.7)	76.4 (18.7)	70.5 (17.9)	73.4 (12.6)
Myelin-T1w	70.7 (12.8)	70.9 (21.4)	70.5 (20.8)	70.7 (12.0)
Deep-learned without LASSO
T1w	70.1 (13.6)	81.8 (20.9)	56.8 (22.3)	69.3 (13.7)
Myelin	83.8 (11.0)	85.5 (18.0)	81.8 (14.4)	83.6 (10.5)
Myelin-T1w	86.9 (9.3)	85.5 (15.0)	88.6 (12.5)	87.0 (9.0)
Deep-learned with LASSO
T1w	70.1 (13.6)	81.8 (20.9)	56.8 (22.3)	69.3 (13.7)
Myelin	83.8 (11.0)	85.5 (18.0)	81.8 (14.4)	83.6 (10.5)
Myelin-T1w	87.9 (8.4)	87.3 (12.9)	88.6 (12.5)	88.0 (8.5)

Performance comparison (%) between 6 different feature types with and without LASSO for MS/NC classification on normal-appearing brain tissues. We performed an 11-fold cross-validation on 55 RRMS and 44 NC images and computed the average performance (and standard deviation) for each feature type. The highest value for each measure is in bold. Overall, deep learning improved the classification results over the regional mean-based features across all four measures. In addition, LASSO had a positive effect, but more so for the regional mean-based features than the deep-learned features. To determine the effectiveness of feature extraction by deep learning, we compared three deep-learned feature types, which are deep-learned T1w features, deep-learned myelin features, and the output of the multimodal DBN, which combines the deep-learned myelin and T1w features. These features were also tested with and without LASSO regularization, with the results shown in Table 1. Overall, deep learning improved the classification results over the regional mean-based features across all four evaluation metrics. In addition, LASSO had a positive effect, but more so for the mean-based features than the deep-learned features. The accuracy rate attained by the deep-learned myelin-T1w feature with LASSO was statistically better than the ones attained by all of the regional features and the deep-learned T1w feature with LASSO (p < 0.01, two-sided Wilcoxon test), but it was not statistically better than the one attained by the deep-learned myelin features with LASSO.

Separate analysis in NAWM and NAGM

To determine the relative contributions of white and gray matter to classification performance, we evaluated each deep-learned feature type on predominantly NAWM and NAGM separately. Since LASSO proved to be beneficial for the previous experiments, we applied it to all of the regional NAWM and NAGM analyses. Using the WM and GM masks computed from the T1w MNI152 template, we excluded all patches that did not overlap with WM or GM. For the normal-appearing patches that overlapped with both the WM and GM masks, we labeled each patch as a NAWM patch if the WM voxel count is larger than the GM voxel count, and vice versa. The separate analysis results computed with each deep-learned feature type are summarized in Table 2. For both NAWM and NAGM, the deep-learned myelin features alone provided the best overall classification performance. Using NAWM patches gave higher performance than that attained by using NAGM patches for the myelin images, while for the T1w images, using NAGM patches gave better performance. This is consistent with the observations that the most discriminative patches in the myelin images came from subcortical WM, while the most discriminative patches in the T1w images came from the cortical and periventricular regions. Overall, the maximum classification performance of using NAWM and NAGM patches separately did not approach that of using all normal-appearing patches together.

Table 2

Separate analysis results in NAWM and NAGM. The table shows a performance comparison (%) between deep-learned features for MS/NC classification. We performed an 11-fold cross-validation on 55 RRMS and 44 NC images and computed the average performance (and standard deviation).

Feature type	Accuracy	Sensitivity	Specificity	AUC
Deep-learned on NAWM
T1w	66.7 (7.0)	78.2 (19.2)	56.8 (24.1)	67.3 (10.3)
Myelin	82.8 (11.5)	81.8 (19.9)	84.1 (11.8)	83.2 (10.4)
Myelin-T1w	74.7 (13.5)	74.5 (17.6)	75.0 (16.3)	74.8 (13.7)
Deep-learned on NAGM
T1w	68.7 (6.7)	78.2 (19.1)	59.1 (22.0)	68.9 (9.9)
Myelin	80.8 (12.6)	85.5 (15.0)	75.0 (21.3)	80.2 (13.0)
Myelin-T1w	73.7 (14.4)	74.5 (22.5)	72.7 (14.7)	73.6 (14.7)

Discussion

The regional mean myelin content features were more discriminative than the regional mean T1w intensity features, which is not surprising given that the T1w sequence is a structural MR sequence designed to show tissue contrast and not for direct quantification. The mean myelin features achieved mean classification performance rates of 73.7% (accuracy, SD 13.7%) and 73.4% (AUC, SD 12.6%) with LASSO, which are approximately 7% (accuracy) and 8% (AUC) higher than those of the regional mean T1w intensity features. However, the regional combined mean myelin-T1w features produced mean classification performance rates of 70.7% for both accuracy and AUC, showing that direct concatenation of the regional mean myelin content and T1w intensity features resulted in reduced classification performance when compared to mean myelin content alone, largely due to reduced specificity, but were still better than the performance achieved using T1w intensity features alone. Applying LASSO as a feature selection method improved the classification performance for the regional mean features. When including LASSO regularization in the supervised classifier for the regional mean features, the feature dimensionality reduction rate by LASSO was about 60–80%. For the regional mean T1w intensities, LASSO improved classification performance rates by approximately 3% for both mean accuracy and AUC. The impact of LASSO was smaller for the regional mean myelin features, resulting in about a 1% improvement in classification accuracy. LASSO also improved the classification performance of the regional combined mean myelin-T1w features, but did not beyond than that attained by the regional mean myelin features, again suggesting that direct concatenation of heterogeneous modalities is not an effective strategy for improving the classification performance. Overall, the regional mean myelin content feature type was the most accurate, sensitive, and specific regional mean MRI biomarker for distinguishing between MS and NC on normal-appearing brain tissues. Unsupervised deep learning of the regional myelin contents and T1w intensities yielded superior classification performance over using the regional mean myelin contents and T1w intensities. Without LASSO, the deep-learned T1w features improved the classification performance by about 6% in both mean accuracy and mean AUC over the regional mean T1w intensity features. In addition, the SDs for accuracy and AUC decreased by approximately 3%, showing a more consistent performance across folds. Similarly, the deep-learned myelin features improved the classification performance over the regional mean myelin content features by about 11% in both mean accuracy and mean AUC, demonstrating that spatial feature learning of myelin maps by unsupervised deep learning can produce radiologically useful information associated with MS pathology. Similarly to the deep-learned T1w features, the SDs for accuracy and AUC also decreased by about 3%. The joint deep-learned regional myelin-T1w features were more discriminative than either of the modality-specific deep-learned feature types, and improved accuracy and AUC by about 4% over the deep-learned myelin features, showing that, in contrast to the case of simple concatenation of regional mean T1w and myelin features, deep-learned joint features improved the classification performance, and decreased the SDs by about 2%. Compared to the regional mean myelin-T1w features, the deep-learned multimodal features improved the classification performance by approximately 17% in mean accuracy and AUC. We observed a relatively small impact when including the LASSO regularization in the supervised classifier for the deep-learned features. The feature dimensionality reduction rate by LASSO was about 30–50%, which is smaller than the case of regional mean features, suggesting that the deep-learned features had less redundancy. For both the deep-learned T1w features and the deep-learned myelin features, LASSO did not change the classification performance. For the deep-learned joint myelin-T1w features, LASSO improved the classification performance by about 1% in both mean accuracy and AUC. The impact of LASSO was smaller than for the regional mean features. This could be due to the fact that unsupervised deep learning is already capable of extracting less redundant and more independent feature sets which reduced the impact of dimensionality reduction by LASSO. Overall, the proposed deep-learned joint myelin-T1w features provided the best performance, surpassing all other feature types substantially in accuracy, sensitivity, specificity, and AUC. In addition, they significantly reduced the SDs of both accuracy and AUC compared to the regional mean features, showing a more consistent classification performance across folds. When used independently, the deep-learned myelin features also performed well, surpassing all other regional mean features on all four evaluation measures. All deep-learned features outperformed their regional mean counterparts, with or without LASSO, which indicates that both myelin and T1w modalities contain discriminative latent spatial patterns. Our main conclusion is that the deep-learned myelin features provide valuable pathological information that is more sensitive and specific than the use of regional mean myelin and/or T1w measurements for MS diagnosis on normal-appearing brain tissues, especially when combined jointly with deep-learned T1w features. We analyzed the effect of separately using predominantly NAWM and NAGM patches on the various deep learning models we built. The deep-learned myelin features extracted from NAWM patches achieved mean classification performance rates of 82.8% in accuracy (SD 11.5%) and 83.2% in AUC (SD 10.4%), which are approximately 2–3% higher in accuracy and AUC than those of the deep-learned myelin features with NAGM patches, suggesting that the deep-learned myelin features are more pathologically relevant to NAWM than NAGM, which is expected due to the greater myelin content in WM. When using NAWM and NAGM patches separately, the variety of feature patterns learned by the T1w-specific network was reduced compared to those learned by the T1w-specific network with all normal-appearing patches as shown in Fig. 5. The more limited feature set led to classification accuracy rates of 66.7% with NAWM patches and 68.7% with NAGM patches, which are lower than those of the deep-learned T1w features with all normal-appearing patches (70.1%). Since this limited feature set was used as an input to the multimodal myelin-T1w layer, the deep-learned myelin-T1w features did not improve the classification performance in NAWM nor NAGM patches as shown in Table 2.

Fig. 5

Deep-learned features separately extracted from predominantly NAWM, NAGM and all normal-appearing patches by the T1w modality-specific network. The variety of feature patterns learned by the T1w-specific network with NAWM and NAGM patches is reduced compared to that learned by the T1w-specific network with all normal-appearing patches. To enhance interpretability of the results and examine their relationship to published MS literature, we determined the relative contribution of the deep-learned joint myelin-T1w features in each patch location, and used the Harvard-Oxford sub-cortical atlas to compute the mean importance values in particular sub-cortical regions and structures. As shown in Fig. 6, the six most discriminative sub-cortical brain regions and structures were found to be the cerebral white matter, lateral ventricles, putamen, thalamus, hippocampus, and amygdala. The most discriminative sub-cortical brain regions were the cerebral white matter and lateral ventricles. The high importance of the cerebral white matter and lateral ventricles are likely due to demyelination in the periventricular region, combined with morphological changes caused by brain atrophy, both of which are strongly associated with MS pathology. The observed importance of the sub-cortical gray matter structures is consistent with previous MS studies (e.g., Hulst and Geurts, 2011), which showed that these specific structures undergo substantial structural and/or chemical changes.

Fig. 6

The relative importance of the deep-learned joint myelin-T1w features in different sub-cortical brain areas for RRMS vs. NC classification on normal-appearing brain tissues.

The relative importance of the deep-learned joint myelin-T1w features in different sub-cortical brain areas for RRMS vs. NC classification on normal-appearing brain tissues. It is important to acknowledge limitations of our study. Due to the relatively small training sample size, this study can only provide preliminary results and does not ensure that the proposed model will generalize to produce the exact same results in other cohorts. Secondly, our study only included RRMS patients and did not include progressive MS patients. The proposed model may extract different features for progressive MS cohorts because patients with progressive MS can have different patterns of demyelination and morphological changes throughout the brain. To evaluate this approach for detecting very early MS pathology, our future work should include patients with clinically isolated syndrome, a prodromal stage of MS, with the clinical goal of enabling earlier diagnosis. As we stated above, the T1w sequence is a structural imaging sequence for displaying tissue contrast and not for direct quantification, and this limitation cannot be corrected by intensity normalization, which is likely a main reason why the regional mean T1w intensity features produced relatively low classification accuracy as shown in Table 1. It could be argued that for complementing the myelin scans, a quantitative T1 relaxometry may be appropriate. However, a primary goal of this study was to determine whether myelin scans contained spatial features finer than regional means that would be useful for distinguishing MS, so a comparison to regional mean intensity features seems appropriate. We believe the primary reason that deep learning on T1w images produced useful features for classification is that the model captured spatial variabilities in the high-contrast boundaries between normal-appearing tissue and cerebrospinal fluid as induced by atrophy and other morphological changes. This is visually verifiable by Fig. 4, Fig. 5, which show that the features extracted by deep learning from the T1w images are mostly high-contrast edges in various shapes and orientations. For deep learning, the intensity normalization procedure is meant to enable the use of Gaussian visible units and to allow the distribution of these edge features to be modeled more accurately which generally makes training of the networks faster and more stable as also stated in Section 2.2. In contrast, deep learning appeared to capture more intensity variations in the myelin images, indicative of changes in myelin content, as shown in Fig. 4, especially in RBM layer 1. In summary, our experimental results have demonstrated the following for the task of detecting MS pathology on normal-appearing brain tissues: The regional mean myelin content features were more discriminative than the regional mean T1w intensity features. Direct concatenation of the regional mean myelin content features and the regional mean T1w intensity features did not improve the classification accuracy over using each feature type alone. Unsupervised deep learning of the regional myelin contents and T1w intensities yielded superior classification performance over using the regional mean myelin contents and T1w intensities. The joint deep-learned regional myelin-T1w features were more discriminative than either of the modality-specific deep-learned feature types. Applying LASSO to produce sparser feature representations improved the classification performance for the regional mean features, but the impact of LASSO was marginal for the deep-learned regional features. The maximum classification performance of using predominantly NAWM and NAGM patches separately was achieved by the deep-learned regional myelin features. However, it did not approach that of using all normal-appearing patches together.

Conclusions

We have demonstrated that unsupervised deep learning of normal-appearing brain tissues on myelin and T1w images can extract information that could be useful for early MS detection and provides superior classification performance to the traditional regional mean MRI measurements when using the same supervised classifier. In addition, we have shown that unsupervised deep learning of joint myelin and T1w features improves the classification performance over deep learning of either modality alone. Using a four-layer multimodal deep learning network to learn latent features, unbiased feature imputation to exclude lesion voxels, a feature selection method (LASSO) to construct sparse feature representations, and a random forest for a supervised classifier, we achieved a mean classification accuracy of 87.9% between RRMS and healthy controls on normal-appearing brain tissues, using an 11-fold cross-validation procedure. The local brain structures that were found to be important for MS classification by our method were consistent with known MS pathology and previous MS literature. In future work, we plan to extend the proposed framework to include other MRI modalities used for studying MS pathology such as multi-echo susceptibility-weighted imaging (Denk and Rauscher, 2010). We also plan to apply our framework to other subgroups of MS patients and longitudinal data for applications in MS prognostication.

22 in total

1. Rapid whole cerebrum myelin water imaging using a 3D GRASE sequence.

Authors: Thomas Prasloski; Alexander Rauscher; Alex L MacKay; Madeleine Hodgson; Irene M Vavasour; Corree Laule; Burkhard Mädler
Journal: Neuroimage Date: 2012-07-06 Impact factor: 6.556

2. Optimizing the use of radiologist seed points for improved multiple sclerosis lesion segmentation.

Authors: Jon McAusland; Roger C Tam; Erick Wong; Andrew Riddehough; David K B Li
Journal: IEEE Trans Biomed Eng Date: 2010-11 Impact factor: 4.538

3. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest.

Authors: Rahul S Desikan; Florent Ségonne; Bruce Fischl; Brian T Quinn; Bradford C Dickerson; Deborah Blacker; Randy L Buckner; Anders M Dale; R Paul Maguire; Bradley T Hyman; Marilyn S Albert; Ronald J Killiany
Journal: Neuroimage Date: 2006-03-10 Impact factor: 6.556

4. What explains gray matter atrophy in long-standing multiple sclerosis?

Authors: Martijn D Steenwijk; Marita Daams; Petra J W Pouwels; Lisanne J Balk; Prejaas K Tewarie; Joep Killestein; Bernard M J Uitdehaag; Jeroen J G Geurts; Frederik Barkhof; Hugo Vrenken
Journal: Radiology Date: 2014-04-23 Impact factor: 11.105

5. A nonparametric method for automatic correction of intensity nonuniformity in MRI data.

Authors: J G Sled; A P Zijdenbos; A C Evans
Journal: IEEE Trans Med Imaging Date: 1998-02 Impact factor: 10.048

6. Non-local spatial regularization of MRI T2 relaxation images for myelin water quantification.

Authors: Youngjin Yoo; Roger Tam
Journal: Med Image Comput Comput Assist Interv Date: 2013

7. Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis.

Authors: Heung-Il Suk; Seong-Whan Lee; Dinggang Shen
Journal: Neuroimage Date: 2014-07-18 Impact factor: 6.556

8. N4ITK: improved N3 bias correction.

Authors: Nicholas J Tustison; Brian B Avants; Philip A Cook; Yuanjie Zheng; Alexander Egan; Paul A Yushkevich; James C Gee
Journal: IEEE Trans Med Imaging Date: 2010-04-08 Impact factor: 10.048

9. A probabilistic atlas and reference system for the human brain: International Consortium for Brain Mapping (ICBM).

Authors: J Mazziotta; A Toga; A Evans; P Fox; J Lancaster; K Zilles; R Woods; T Paus; G Simpson; B Pike; C Holmes; L Collins; P Thompson; D MacDonald; M Iacoboni; T Schormann; K Amunts; N Palomero-Gallagher; S Geyer; L Parsons; K Narr; N Kabani; G Le Goualher; D Boomsma; T Cannon; R Kawashima; B Mazoyer
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2001-08-29 Impact factor: 6.237

10. Water content and myelin water fraction in multiple sclerosis. A T2 relaxation study.

Authors: C Laule; I M Vavasour; G R W Moore; J Oger; D K B Li; D W Paty; A L MacKay
Journal: J Neurol Date: 2004-03 Impact factor: 4.849

15 in total

1. Convolutional Neural Network for Automated FLAIR Lesion Segmentation on Clinical Brain MR Imaging.

Authors: M T Duong; J D Rudie; J Wang; L Xie; S Mohan; J C Gee; A M Rauschecker
Journal: AJNR Am J Neuroradiol Date: 2019-07-25 Impact factor: 3.825

Review 2. Machine learning studies on major brain diseases: 5-year trends of 2014-2018.

Authors: Koji Sakai; Kei Yamada
Journal: Jpn J Radiol Date: 2018-11-29 Impact factor: 2.374

Review 3. Post-Mortem MRI and Histopathology in Neurologic Disease: A Translational Approach.

Authors: Laura E Jonkman; Boyd Kenkhuis; Jeroen J G Geurts; Wilma D J van de Berg
Journal: Neurosci Bull Date: 2019-02-21 Impact factor: 5.203

4. Foundations of Lesion Detection Using Machine Learning in Clinical Neuroimaging.

Authors: Manoj Mannil; Nicolin Hainc; Risto Grkovski; Sebastian Winklhofer
Journal: Acta Neurochir Suppl Date: 2022

5. qMRI-BIDS: An extension to the brain imaging data structure for quantitative magnetic resonance imaging data.

Authors: Agah Karakuzu; Stefan Appelhoff; Tibor Auer; Mathieu Boudreau; Franklin Feingold; Ali R Khan; Alberto Lazari; Chris Markiewicz; Martijn Mulder; Christophe Phillips; Taylor Salo; Nikola Stikov; Kirstie Whitaker; Gilles de Hollander
Journal: Sci Data Date: 2022-08-24 Impact factor: 8.501

Review 6. Machine Learning Approaches in Study of Multiple Sclerosis Disease Through Magnetic Resonance Images.

Authors: Faezeh Moazami; Alain Lefevre-Utile; Costas Papaloukas; Vassili Soumelis
Journal: Front Immunol Date: 2021-08-11 Impact factor: 7.561

7. Predicting conversion from clinically isolated syndrome to multiple sclerosis-An imaging-based machine learning approach.

Authors: Haike Zhang; Esther Alberts; Viola Pongratz; Mark Mühlau; Claus Zimmer; Benedikt Wiestler; Paul Eichinger
Journal: Neuroimage Clin Date: 2018-11-05 Impact factor: 4.881

8. Uncovering convolutional neural network decisions for diagnosing multiple sclerosis on conventional MRI using layer-wise relevance propagation.

Authors: Fabian Eitel; Emily Soehler; Judith Bellmann-Strobl; Alexander U Brandt; Klemens Ruprecht; René M Giess; Joseph Kuchling; Susanna Asseyer; Martin Weygandt; John-Dylan Haynes; Michael Scheel; Friedemann Paul; Kerstin Ritter
Journal: Neuroimage Clin Date: 2019-09-06 Impact factor: 4.881

9. Patch individual filter layers in CNNs to harness the spatial homogeneity of neuroimaging data.

Authors: Fabian Eitel; Jan Philipp Albrecht; Martin Weygandt; Friedemann Paul; Kerstin Ritter
Journal: Sci Rep Date: 2021-12-27 Impact factor: 4.996

Review 10. A systematic review of the applications of artificial intelligence and machine learning in autoimmune diseases.

Authors: I S Stafford; M Kellermann; E Mossotto; R M Beattie; B D MacArthur; S Ennis
Journal: NPJ Digit Med Date: 2020-03-09