Decomposition-Based Correlation Learning for Multi-Modal MRI-Based Classification of Neuropsychiatric Disorders.

Liangliang Liu1, Jing Chang1, Ying Wang1, Gongbo Liang2, Yu-Ping Wang3, Hui Zhang1.   

Abstract

Multi-modal magnetic resonance imaging (MRI) is widely used for diagnosing brain disease in clinical practice. However, the high dimensionality of MRI images is challenging when training a convolutional neural network. In addition, utilizing multiple MRI modalities jointly is even more challenging. To overcome these challenges, we developed a method based on decomposition-based correlation learning (DCL) that captures the complex relationship between structural MRI and functional MRI data. Under the guidance of matrix decomposition, DCL takes into account the spike magnitude of leading eigenvalues, the number of samples, and the dimensionality of the matrix. Canonical correlation analysis (CCA) was used to analyze the correlation and construct matrices. We evaluated DCL in the classification of multiple neuropsychiatric disorders listed in the Consortium for Neuropsychiatric Phenomics (CNP) dataset. In experiments, our method had a higher accuracy than several existing methods. Moreover, we found interesting feature connections from brain matrices based on DCL that can differentiate disease and normal cases and different subtypes of the disease. Furthermore, we extended the experiments to a large-sample dataset and a small-sample dataset and compared DCL with several other well-established methods designed for multi-class neuropsychiatric disorder classification; our proposed method achieved state-of-the-art performance on all three datasets.
Copyright © 2022 Liu, Chang, Wang, Liang, Wang and Zhang.

Keywords:  canonical correlation analysis; decomposition-based; matrix decomposition; multi-modal; neuropsychiatric disorders

Year:  2022        PMID: 35692429      PMCID: PMC9174798          DOI: 10.3389/fnins.2022.832276

Source DB:  PubMed          Journal:  Front Neurosci        ISSN: 1662-453X            Impact factor:   5.152


1. Introduction

Many neuropsychiatric disorders (NDs) not only result in a huge socioeconomic burden but are also accompanied by several comorbidities (Kessler et al., 2012). Although NDs arise from physical defects or injuries, they are usually considered a chronic course of mental disease, resulting in the collapse of an understanding of the real world, cognitive problems, and persistent damage (Heinrichs and Zakzanis, 1998). Diagnosis of NDs is important for tracking the development of the disease and for choosing and evaluating the effects of an intervention such as drug treatment. Furthermore, subtyping an ND can help in personalizing treatment. As a result, increasing attention has been paid to the identification of the subtypes of the ND, such as schizophrenia (SZ), bipolar disorder (BD), and attention deficit hyperactivity disorder (ADHD). However, it is difficult to distinguish these subtypes due to a lack of standard clinical criteria (McIntosh et al., 2005; Strasser et al., 2005; Finn et al., 2015; Liu Z. et al., 2018; Hu et al., 2019; Lake et al., 2019; Jiang et al., 2020). Multi-modal magnetic resonance imaging (MRI) is a useful tool for clinical diagnosis of ND. It can provide information on different aspects of the brain. Functional MRI (fMRI) can be used to analyze the functional connections (FCs) between different brain regions. These FCs reveal individual differences in neural activity patterns, which can predict continuous phenotypic measurements (Dubois and Adolphs, 2016; Rosenberg et al., 2018; Hu et al., 2021). On the other hand, structural MRI (sMRI) reflects the location, volume, and lesions of brain tissue (McIntosh et al., 2005; Liu et al., 2019), in addition to providing information about structural connections among brain regions (Wang et al., 2009). 
A number of MRI studies have been conducted on ND classification, including Alzheimer's disease (Fan et al., 2020), ADHD (Connaughton et al., 2022), SZ (de Filippis et al., 2019), BD (Madeira et al., 2020), depression (Han et al., 2019), and autism (Rakić et al., 2020). However, most of these studies focus only on one type of MRI image or one type of ND. They overlook complementary information, resulting in lower classification accuracy. Compared to natural image studies, the limited number of medical MRI samples is a challenge for the state-of-the-art convolutional neural networks and graph convolutional networks (Yu et al., 2019; Willemink et al., 2020). In particular, the high-dimensionality of MRI and nonlinear relations between the matrices of MRIs pose challenges for these machine learning methods. In addition, the imaging principles of sMRI and fMRI are different, and there is no direct correlation between them. Exploring the relationship between them is itself challenging. Previous multi-modal MRI studies have demonstrated the potential of a multi-modal fusion approach in studying the relationship between fMRI and sMRI images (Qiao et al., 2019; Gao et al., 2020; Jiang et al., 2021; Mill et al., 2021). For example, Qiao et al. (2019) proposed a hybrid feature selection method based on statistical approaches and machine learning. This method explored the brain abnormalities in SZ using both fMRI and sMRI images. A multi-kernel support vector machine (SVM) was used for SZ classification, which was based on the similarity of the decomposed components from multi-modal MRI (Gao et al., 2020). Jiang et al. (2021) combined the multi-dimensional features of sMRI and fMRI to predict the state of SZ and guide medication. Different modalities contain complementary information, which can improve the performance of the model (Jiang et al., 2021; Mill et al., 2021). 
However, the poor interpretability of some models has become an issue when identifying significant biomarkers (Olesen et al., 2003; Seghier et al., 2004). Various strategies are widely used in multi-modal data analysis, including multi-modal canonical correlation analysis (CCA) (Correa et al., 2010), deep collaborative learning (Hu et al., 2019), parallel independent component analysis (Liu et al., 2008), and methods similar to independent component analysis (Sui et al., 2009; Calhoun et al., 2010; Groves et al., 2011). Some previous studies have identified a correlation between fMRI and sMRI images in ND groups (Sui et al., 2011; Qiao et al., 2019; Su et al., 2020). Therefore, we propose a prediction method, called decomposition-based correlation learning (DCL), for the multi-modal MRI-based classification of NDs. We first use the shrinkage principal orthogonal complement thresholding method (S-POET) (Fan and Wang, 2015) to estimate spiked fMRI and sMRI matrices. Subsequently, in the DCL method, we use decomposition-based CCA to decompose each pair of matrices into two common matrices and two orthogonal distinctive matrices. Finally, we compute the correlation between the common matrices and the distinctive matrices. We validated the DCL method on the Consortium for Neuropsychiatric Phenomics (CNP) dataset. Our results demonstrate that the proposed DCL model outperforms several other methods. We also discovered interesting feature connections when identifying significant features in fMRI data. The rest of this paper is organized as follows. Section 2 describes the DCL pipeline and provides a quantitative evaluation of our method. The dataset and the experiments applying DCL to NDs are presented in Sections 3 and 4. The results are discussed and analyzed in Section 5. Section 6 concludes the paper.

2. Methodology

The DCL pipeline is shown in Figure 1. DCL has three steps: data processing (feature extraction), S-POET (spiked covariance matrix estimation), and CCA (canonical correlation and matrix construction).
Figure 1

Overview of the architecture of the proposed integration model.


2.1. Overview of Principal Component Analysis (PCA)

Principal component analysis is a powerful tool for feature extraction and data visualization. PCA can extract principal components from multivariate data by maximizing the variance of the features while minimizing the reconstruction error. Let $X \in \mathbb{R}^{m \times n}$ be a matrix, where m and n are the size of the matrix. Hence,
$$X = [x_1, x_2, \ldots, x_n].$$
Let $\bar{x}$ be the average signal, which is defined as follows:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
The normalized vectors are computed by subtracting the average signal from each training vector. They are defined as follows:
$$\phi_i = x_i - \bar{x}, \quad i = 1, \ldots, n.$$
These vectors go through PCA. Let C be a covariance matrix:
$$C = \frac{1}{n} \sum_{i=1}^{n} \phi_i \phi_i^{\top}.$$
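The PCA steps described above (centering, covariance, leading eigenvectors) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `pca`, the column-per-sample layout, and the 1/n normalization of the covariance are assumptions made for the example.

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA sketch. X is m x n with one sample per column (assumed layout)."""
    mean = X.mean(axis=1, keepdims=True)      # average signal
    Phi = X - mean                            # normalized (centered) vectors
    C = Phi @ Phi.T / X.shape[1]              # covariance matrix, 1/n scaling assumed
    eigvals, eigvecs = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]            # leading principal directions
    scores = components.T @ Phi               # projected samples
    return components, eigvals[order], scores

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))
components, variances, scores = pca(X, 2)
```

The returned `variances` are the eigenvalues of C in descending order, so they can be inspected to decide how many components to keep.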

2.2. Overview of S-POET

The shrinkage principal orthogonal complement thresholding method (Fan and Wang, 2015) is a covariance estimator with an approximate factor model. It is based on sparse PCA. Feature matrices from fMRI and sMRI data are input into S-POET, which calculates an asymptotic first-order distribution for the eigenvalues and eigenvectors of the sample correlation matrices. Specifically, let k index the datasets and n be the number of samples in the k-th dataset. A high-dimensional dataset can be written as a matrix $X_k \in \mathbb{R}^{p_k \times n}$. In our experiment, we have two matrices, one from fMRI and one from sMRI, so k = 1, 2. Each of the $p_k$ rows corresponds to a mean-zero variable. S-POET constructs $\tilde{X}_k$, which is the estimate of matrix $X_k$. Before defining $\tilde{X}_k$, we let the full singular value decomposition of $X_k$ be as follows:
$$X_k = V_{k1} \Lambda_k V_{k2}^{\top},$$
where $V_{k1}$ and $V_{k2}$ are two orthogonal matrices. $\Lambda_k$ is a rectangular diagonal matrix whose singular values $\sigma_1 \ge \sigma_2 \ge \cdots$ on the main diagonal are arranged in descending order. $\tilde{X}_k$ is a matrix:
$$\tilde{X}_k = V_{k1}^{[:,1:K]} \, \mathrm{diag}(\hat{\sigma}_1, \ldots, \hat{\sigma}_K) \, \big(V_{k2}^{[:,1:K]}\big)^{\top},$$
where $\hat{\sigma}_\ell = \sqrt{\max(\sigma_\ell^2 - \tau_k p_k, 0)}$ and $\tau_k = \sum_{\ell > K} \sigma_\ell^2 / (n p_k - n K - p_k K)$, with K the number of retained leading components. We summarize the S-POET method in Algorithm 1.
Algorithm 1

S-POET

Input: X_k ∈ ℝ^{p_k×n}
Output: X~k
1:  K ← rank cov(X) //Covariance estimator
2:  p, n ← shape(X)
3:  V, S, Ut ← SVD(X, full_matrices = False)
4:  S ← diag(S)
5:  Lambda ← S**2/n //eigenvalues of the sample covariance
6:  c~ ← Sum(Lambda.diagonal()[K:])/(p-K-p*K/n)
7:  Lambdas ← Maximum(Lambda[:K,:K]-c~*p/n, 0)
8:  X~k ← V[:,:K]@Sqrt(Lambdas*n)@Ut[:K,:]
9:  return X~k, Lambdas, V[:, :K], K
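Algorithm 1 translates almost line by line into NumPy. The sketch below is a hedged reading of it rather than the authors' code: the function name `s_poet` is hypothetical, the number of spiked components `K` is passed in as an argument instead of being estimated, and the synthetic low-rank-plus-noise data exist only to exercise the function.

```python
import numpy as np

def s_poet(X, K):
    """S-POET sketch following Algorithm 1.

    X: p x n matrix whose rows are mean-zero variables.
    K: number of spiked (leading) components; supplied by the caller here,
       which is an assumption of this sketch.
    """
    p, n = X.shape
    V, s, Ut = np.linalg.svd(X, full_matrices=False)
    lam = s**2 / n                                # eigenvalues of the sample covariance
    c = lam[K:].sum() / (p - K - p * K / n)       # average noise level from the tail
    lam_s = np.maximum(lam[:K] - c * p / n, 0.0)  # shrunken leading eigenvalues
    X_tilde = V[:, :K] @ np.diag(np.sqrt(lam_s * n)) @ Ut[:K, :]
    return X_tilde, lam_s, V[:, :K]

# Synthetic rank-2 signal plus noise, for illustration only.
rng = np.random.default_rng(0)
signal = 5.0 * rng.normal(size=(50, 2)) @ rng.normal(size=(2, 100))
X = signal + rng.normal(size=(50, 100))
X = X - X.mean(axis=1, keepdims=True)             # make each row mean-zero
X_tilde, lam_s, V_K = s_poet(X, K=2)
```

The shrinkage step only ever reduces the leading singular values, so `X_tilde` is a rank-K matrix whose singular values are no larger than the corresponding ones of `X`.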

2.3. Overview of CCA

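Canonical correlation analysis is a multivariate statistical analysis method. It determines the overall correlation between two groups of indicators. We use CCA to examine the cross-covariances of multi-modal MRI data. Let $X_1 \in \mathbb{R}^{n \times r}$ and $X_2 \in \mathbb{R}^{n \times s}$ be two matrices, where n is the number of samples, and r and s are the feature sizes of the two matrices, respectively. CCA is used to find two coefficient vectors $u_1$ and $u_2$ by optimizing the Pearson correlation between $X_1 u_1$ and $X_2 u_2$, which is defined as follows:
$$(u_1^*, u_2^*) = \arg\max_{u_1, u_2} \; u_1^{\top} \Phi_{12} u_2 \quad \text{s.t.} \quad u_1^{\top} \Phi_{11} u_1 = 1, \; u_2^{\top} \Phi_{22} u_2 = 1,$$
where $u_1 \in \mathbb{R}^{r}$, $u_2 \in \mathbb{R}^{s}$, $\Phi_{11} = X_1^{\top} X_1$, $\Phi_{22} = X_2^{\top} X_2$, and $\Phi_{12} = X_1^{\top} X_2$. $z_1 = X_1 u_1$ and $z_2 = X_2 u_2$ are two identified canonical vectors, both of which are linear combinations of raw features in the original data, $X_1$ and $X_2$, respectively. $z_1$ and $z_2$ facilitate the interpretation of multi-omics associations by reducing the dimensionality. We use Equation (9) as a constraint, and $\rho$ can be used as the cross-data correlation, i.e.,
$$\rho = \mathrm{corr}(z_1, z_2) = u_1^{\top} \Phi_{12} u_2.$$
Canonical correlation analysis is used to guarantee the highest total correlation of the pair-wise independent canonical vectors, which is defined as follows:
$$(U_1^*, U_2^*) = \arg\max_{U_1, U_2} \; \mathrm{tr}\big(U_1^{\top} \Phi_{12} U_2\big) \quad \text{s.t.} \quad U_1^{\top} \Phi_{11} U_1 = U_2^{\top} \Phi_{22} U_2 = I_d,$$
where $U_1 \in \mathbb{R}^{r \times d}$, $U_2 \in \mathbb{R}^{s \times d}$, $I_d$ is the d × d identity matrix, and $d \le \min(\mathrm{rank}(X_1), \mathrm{rank}(X_2))$. Since Φ11 and Φ22 may be singular when calculating the loading vectors, matrix regularization is usually enforced on them to ensure that they are positive definite:
$$\hat{\Phi}_{11} = \Phi_{11} + \gamma_1 I, \quad \hat{\Phi}_{22} = \Phi_{22} + \gamma_2 I, \quad \gamma_1, \gamma_2 > 0.$$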
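A regularized CCA of the kind described above can be sketched with a whitening step followed by a singular value decomposition. The sketch is illustrative only: the function names, the single ridge parameter `reg`, and the 1/n scaling of the covariance blocks are assumptions (the scaling does not change the canonical directions), and the data are synthetic.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

def regularized_cca(X1, X2, d, reg=1e-3):
    """CCA sketch. X1 is n x r, X2 is n x s (rows are samples); d pairs are returned.

    reg is a hypothetical ridge term added to the within-set covariances
    so that they are positive definite.
    """
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    S11 = X1.T @ X1 / n + reg * np.eye(X1.shape[1])
    S22 = X2.T @ X2 / n + reg * np.eye(X2.shape[1])
    S12 = X1.T @ X2 / n
    W1, W2 = inv_sqrt(S11), inv_sqrt(S22)
    A, rho, Bt = np.linalg.svd(W1 @ S12 @ W2)   # singular values = canonical correlations
    U1 = W1 @ A[:, :d]                          # loading vectors for X1
    U2 = W2 @ Bt.T[:, :d]                       # loading vectors for X2
    return U1, U2, rho[:d]

# Two views sharing one latent variable z, for illustration only.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X1 = np.outer(z, np.ones(4)) + 0.5 * rng.normal(size=(500, 4))
X2 = np.outer(z, np.ones(3)) + 0.5 * rng.normal(size=(500, 3))
U1, U2, rho = regularized_cca(X1, X2, d=2)
```

Because both views share the latent variable, the first canonical correlation in `rho` is close to 1, while the second reflects only noise.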

2.4. Decomposition-Based Correlation Learning

Let X1 and X2 be paired matrices of fMRI and sMRI, which are the input of the S-POET method. We use the DCL method to decompose this pair of matrices into two common matrices and two orthogonal distinctive matrices. Then, we collect these two types of matrices into a common matrix (C) and a distinctive matrix (D), respectively. Based on the output ($\tilde{X}_1$, $\tilde{X}_2$) of S-POET, we develop two estimators for C and D. First, we define the common variable c under the constraints X1 = C1 + D1, X2 = C2 + D2, corr(D1, D2) = 0, and c ∈ [0, 1]. Then, the estimator of C is defined so that C1 and C2 have the maximum correlation between each other, while the vectors within each are uncorrelated and whitened. Their correlations are called the canonical correlation coefficients. The estimator of D follows from the constraints above:
$$\tilde{D}_k = \tilde{X}_k - \tilde{C}_k, \quad k = 1, 2.$$
In our experiment, we use the relationship between the two estimated distinctive matrices to represent the orthogonal relationship between the two distinctive matrices, D1 and D2. Finally, $\hat{X}$, the estimator of X, is obtained by combining the common and distinctive matrices. We summarize DCL in Algorithm 2.
Algorithm 2

DCL

Input: X1 ∈ ℝ^{p×n}, X2 ∈ ℝ^{s×n} //Input of sMRI and fMRI, respectively.
Output: X^1, X^2
1:  X~1, Lambda1, U1 ← S-POET(X1) //processed by S-POET method
2:  X~2, Lambda2, U2 ← S-POET(X2) //processed by S-POET method
3:  Lambda11 ← Construct diag(Lambda1)
4:  Lambda22 ← Construct diag(Lambda2)
5:  Theta ← (Lambda11@U1.T@X~1)@(X~2.T@U2@Lambda22)/n
6:  Vtheta, Dtheta ← SVD(Theta, full_matrices = True) //Singular Value Decomposition
7:  Gamma1 ← U1@Lambda11@Vtheta
8:  Gamma2 ← U2@Lambda22@Vtheta
9:  Amat ← diag(Dtheta) //Diagonal matrix
10:  Cbase ← Common variables corr(X~1, X~2)
11:  C~1 ← Common matrix (X~1, Cbase, Amat)
12:  C~2 ← Common matrix (X~2, Cbase, Amat)
13:  D~1 ← Distinctive matrix (X~1, C~1)
14:  D~2 ← Distinctive matrix (X~2, C~2)
15:  X^1 ← Combination of common and distinctive matrices
16:  X^2 ← Combination of common and distinctive matrices
17:  return X^1, X^2

3. Methods

3.1. CNP Dataset

We evaluated the proposed DCL method in classifying NDs in the CNP dataset (Poldrack et al., 2016). The CNP dataset was collected by a consortium at the University of California, Los Angeles (UCLA), with financial support provided by the National Institutes of Health. This dataset has been used to elucidate the association between the human genome and complex psychological syndromes and to promote the development of new therapies for NDs. All of this research was based on imaging phenotypic features of mental disease. The CNP dataset was obtained from the OpenfMRI project (Gorgolewski et al., 2016). It includes sMRI data, task-based fMRI data, and resting-state fMRI data. These MRI images were acquired on one of two 3T Siemens Trio scanners at UCLA. The database contains extensive details of neuropsychologic assessments, neurocognitive tasks, and demographic information (including biological sex, age, and education). In addition, there are details of the medication taken by those in the ND groups. The present study includes 272 images of subjects in one of four categories: 130 healthy controls (HCs), 50 SZ subjects, 49 BD subjects, and 43 ADHD subjects. These 272 images were from people in the Los Angeles area aged between 21 and 50 years who were recruited through community advertisements. The details of the CNP dataset are listed in Table 1.
Table 1

Details of the Consortium for Neuropsychiatric Phenomics (CNP) dataset.

ID | Subtype | Number | Details
0 | Healthy controls (HC) | 130 |
1 | Schizophrenia (SZ) | 50 | Disorganized, paranoid, or residual types
2 | Bipolar disorder (BD) | 49 | Most recent hypomanic or manic episode, mild or moderate
3 | Attention deficit hyperactivity disorder (ADHD) | 43 | Predominantly inattentive, combined, or predominantly hyperactive-impulsive types

3.2. Brain Connectivity Data

Brain connectivity information may be reflected in fMRI images. In the CNP dataset, each sample has seven fMRI modalities, which were collected during different task states: BOLD contrast, resting state (with physiological monitoring), breath-holding tasks (with physiological monitoring), balloon analog risk tasks, stop-signal tasks, task switching, and spatial working memory capacity tasks. In this study, we attempted to classify NDs using resting-state fMRI images. Resting-state fMRI is an imaging technique that obtains a brain activity function map when the subject is in a resting state undisturbed by other activities, which is better for distinguishing ND groups. The CNP dataset has resting-state fMRI images with scans lasting 304 s. The participants were relaxed with their eyes open. They were not stimulated or asked to respond during scanning (Poldrack et al., 2016). The fMRI data were collected under the following parameters: the slice thickness was 4 mm, 34 slices were taken, TR was 2 s, TE was 30 ms, the flip angle was 90°, the matrix size was 64 × 64, the field of view was 192 mm, and the orientation was an oblique slice. In addition, high-resolution anatomical MP-RAGE data were collected under the following parameters: TR was 1.9 s, TE was 2.26 ms, the field of view was 250 mm, the matrix size was 256 × 256, the slices were in the sagittal plane, the slice thickness was 1 mm, and 176 slices were taken. We excluded 24 samples for which the whole-brain image volumes were unavailable or the head had moved excessively, which left 248 samples. Before subsequent experiments, we preprocessed the fMRI data according to Gorgolewski et al. (2017). The preprocessing steps included slice timing correction, head motion correction, spatial smoothing, band-pass filtering (0.01–0.1 Hz), nuisance signal regression, and normalization to Montreal Neurological Institute (MNI) space.
Then, we used FSL to skull-strip the images and co-registered the fMRI volumes to the corresponding T1-weighted volume using boundary-based registration with 9 degrees of freedom, as implemented in FreeSurfer. Finally, we obtained the functional connectivity matrix of the brain through the following steps. First, we used the BioImage Suite (Joshi et al., 2011) to calculate connectivity matrices for the fMRI images. We then used the Anatomical Automatic Labeling 90 (AAL90) brain atlas, which divides the brain images into 90 regions. The Pearson correlation coefficient between each pair of regions was used as the edge value. The Fisher transformation was then applied to convert the correlation coefficients to z scores. Finally, we obtained a 90 × 90 symmetric connectivity matrix for each sample. These connectivity matrices were not thresholded or binarized.
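The last steps of this pipeline (Pearson correlation between regional time series, then the Fisher transformation) can be sketched as follows. This is only an illustration under stated assumptions: the regional mean time series are taken as already extracted, the random input stands in for real data, the 152 time points follow from a 304 s scan at TR = 2 s, and zeroing the diagonal before the transformation is a choice made here to avoid infinite values.

```python
import numpy as np

def connectivity_matrix(ts):
    """Functional connectivity sketch.

    ts: (n_timepoints, n_regions) array of regional mean time series
        (assumed to be extracted beforehand with an atlas such as AAL90).
    Returns a symmetric n_regions x n_regions matrix of Fisher z values.
    """
    r = np.corrcoef(ts, rowvar=False)        # Pearson correlation between regions
    np.fill_diagonal(r, 0.0)                 # self-correlation of 1 would map to infinity
    r = np.clip(r, -0.999999, 0.999999)      # guard against |r| = 1 off the diagonal
    return np.arctanh(r)                     # Fisher r-to-z transformation

# Random time series as a placeholder: 304 s at TR = 2 s gives 152 volumes.
rng = np.random.default_rng(0)
ts = rng.normal(size=(152, 90))
fc = connectivity_matrix(ts)
```

No thresholding or binarization is applied, so every off-diagonal entry of `fc` carries a continuous edge value.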

3.3. Brain Structure Data

Structural MRI data are also used as inputs to the DCL method. They were obtained with the same parameter values used for the fMRI images. We used the open-source software FreeSurfer to process and analyze these sMRI images. FreeSurfer is used to analyze and visualize cross-sectional structural images. It can be used for stripping the skull, correcting the B1 bias field, registering an image, reconstructing the cortical surface, and estimating the cortical thickness. We used FreeSurfer to generate high-precision gray and white matter segmentation surfaces and gray matter and cerebrospinal fluid segmentation surfaces. From these two surfaces, we calculated the cortical thickness and other surface features, such as the cortical surface area, curvature, and gray matter volume. For each of the 248 subjects, we obtained 2,196 features from the sMRI image. Finally, we constructed a 248 × 2,196 matrix from the sMRI images of the 248 subjects.

4. Experiments and Results

4.1. Experimental Design and Metrics

In our experiments, we focused on two aspects of brain connectivity: (1) classifying NDs into different subtypes using fMRI and sMRI data and (2) extracting important features from the fMRI and sMRI images. The classification task was to validate the performance of the DCL method for the different ND groups, whereas the feature extraction task was used to assess the capability of the method in detecting correlated features. We obtained the correlation matrices by inputting the 248 fMRI (90 × 90) and sMRI (248 × 2,196) matrices into S-POET. Then, we decomposed each pair of canonical matrices and computed their correlations. Finally, we used the leave-one-out (LOO) method to select the important features in the test sample matrix. For a dataset with n samples, verification based on LOO is carried out over n iterations. In each iteration, the classifier uses n − 1 samples as training samples and the remaining sample as the test sample. In our experiments, accuracy (ACC), precision (PRE), recall (REC), and F-score (F1) are used to measure the classification performance. They are defined as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{PRE} = \frac{TP}{TP + FP}, \quad \mathrm{REC} = \frac{TP}{TP + FN}, \quad \mathrm{F1} = \frac{2 \times \mathrm{PRE} \times \mathrm{REC}}{\mathrm{PRE} + \mathrm{REC}},$$
where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives. The values of these metrics were obtained from a LOO-based cross-validation. Our experiments were implemented in Python on an NVIDIA Titan X Pascal CUDA GPU processor.
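The LOO procedure and the four metrics can be illustrated with a small, self-contained example. Everything here is a sketch under assumptions: the nearest-centroid classifier is a hypothetical stand-in for the classifiers used in the paper, the two-class synthetic data are for demonstration only, and the metrics are the standard binary definitions in terms of TP, TN, FP, and FN.

```python
import numpy as np

def loo_splits(n):
    """Yield (train_indices, test_index) for leave-one-out over n samples."""
    idx = np.arange(n)
    for i in range(n):
        yield np.delete(idx, i), i

def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pre * rec / (pre + rec)
    return acc, pre, rec, f1

def nearest_centroid_predict(X_train, y_train, x):
    """Stand-in classifier (not the paper's): assign x to the closest class mean."""
    labels = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in labels])
    return labels[np.argmin(np.linalg.norm(centroids - x, axis=1))]

# Two well-separated synthetic classes, for illustration only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

pred = np.empty_like(y)
for train_idx, test_idx in loo_splits(len(y)):
    pred[test_idx] = nearest_centroid_predict(X[train_idx], y[train_idx], X[test_idx])

tp = int(np.sum((pred == 1) & (y == 1)))
tn = int(np.sum((pred == 0) & (y == 0)))
fp = int(np.sum((pred == 1) & (y == 0)))
fn = int(np.sum((pred == 0) & (y == 1)))
loo_acc, loo_pre, loo_rec, loo_f1 = binary_metrics(tp, tn, fp, fn)
```

For a multi-class task such as the four-group problem studied here, the same counts would be computed per class and then averaged; how the averaging was done in the paper is not stated, so it is left out of the sketch.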

4.2. LOO Classification Method

We compared the performance of the DCL method with other methods: SVM, random forest (RF), XGBoost, PCA+SVM, PCA+RF, PCA+XGBoost, CCA+SVM, CCA+RF, and CCA+XGBoost. The linear kernel was used in the SVM classifier, as it provided better experimental performance than other kernels. As a trade-off between performance and computational cost, we set the number of trees in RF to 100. To prevent overfitting by XGBoost, we set the maximum tree depth for base learners and the tuning parameter for the L2 regularization term to 10 and 5, respectively. In the experiments, SVM, RF, and XGBoost use concatenated fMRI and sMRI matrices as their input, while the fMRI and sMRI matrices input to the other methods were first processed by the PCA, CCA, or DCL methods. The classification results for the DCL method and the other classifiers are shown in Table 2. Each experiment was verified with 10-fold cross-validation. The conventional machine learning classifiers (SVM, RF, and XGBoost) had the lowest accuracy. These classifiers cannot capture distinguishable information from the union matrix. Compared with SVM, RF, and XGBoost, the PCA-based and CCA-based classifiers achieved better classification results. The best accuracy for both was 49.00%, which demonstrates that correlation information can be incorporated to improve the classification. The classifiers based on DCL performed much better than those based on PCA or CCA, with a best accuracy of 72.00%. Our proposed DCL method is a natural extension of the traditional CCA method. Based on the CCA decomposition, DCL determines the common and distinctive matrices and establishes an orthogonal relationship between the two distinctive matrices.
Table 2

Mean values in the evaluation of the classification performance on the CNP dataset.

Classifier | ACC (%) | PRE (%) | REC (%) | F1 (%)
SVM | 38.00 (4.00) | 40.00 (10.00) | 39.00 (5.00) | 37.00 (6.00)
RF | 41.00 (10.00) | 32.00 (11.00) | 42.00 (7.00) | 35.00 (9.00)
XGBoost | 45.00 (6.00) | 32.00 (9.00) | 46.00 (4.00) | 36.00 (5.00)
PCA+SVM | 46.00 (2.00) | 43.00 (7.00) | 50.00 (7.00) | 40.00 (3.00)
PCA+RF | 47.00 (9.00) | 49.00 (7.00) | 46.00 (6.00) | 44.00 (3.00)
PCA+XGBoost | 49.00 (11.00) | 45.00 (8.00) | 49.00 (8.00) | 45.00 (7.00)
CCA+SVM | 45.00 (9.00) | 42.00 (18.00) | 49.00 (15.00) | 38.00 (12.00)
CCA+RF | 47.00 (13.00) | 48.00 (11.00) | 48.00 (10.00) | 43.00 (10.00)
CCA+XGBoost | 49.00 (8.00) | 46.00 (14.00) | 49.00 (12.00) | 44.00 (14.00)
DCL+SVM | 64.00 (9.00) | 69.00 (7.00) | 66.00 (6.00) | 65.00 (8.00)
DCL+RF | 68.00 (10.00) | 73.00 (3.00) | 72.00 (4.00) | 72.00 (4.00)
DCL+XGBoost | 72.00 (8.00) | 81.00 (2.00) | 70.00 (3.00) | 75.00 (3.00)
Our comparative experiment was based on a sample size of 248. As shown in Table 2, we used three typical machine learning methods (SVM, RF, and XGBoost) as the baseline. The performance of these three methods on the concatenated matrices was lower than their performance on the matrices processed by the PCA, CCA, or DCL methods. There are two reasons. First, machine learning methods can be effective for classifying simple images, but because medical images are very complex, these three methods were overwhelmed. Second, the limited sample size does not meet the training requirements of the three methods. In addition, the multi-class classification task increased the imbalance of the samples, making it difficult for these methods to obtain key feature information from the high-dimensional and limited samples. Unlike the other methods, the DCL method first preprocesses the complex relationship between the sMRI and fMRI data, which reduces the complexity of the input data. Table 2 shows that, despite the limited sample size, DCL can better deal with the relations in high-dimensional data and improve the performance of machine learning. Of the DCL-based classifiers, XGBoost had the best results in the multi-class classification task, with an accuracy of 72.00%. The receiver operating characteristic (ROC) curves for XGBoost in multi-class classification are plotted in Figure 2. The areas under the micro-averaged and macro-averaged ROC curves in Figures 2B,C are much larger than those in Figure 2A. Moreover, the areas under the curves for the four subtypes in Figures 2B,C are much larger than those in Figure 2A. These results indicate that the correlation information obtained by PCA or CCA can improve the performance of a classifier. The classification results for DCL are much better than those for PCA or CCA. The areas under all the ROC curves in Figure 2D are larger than those in Figures 2B,C. This indicates that our DCL method can better describe brain connection networks and thus improve the performance of the classifiers.
Figure 2

Receiver operating characteristics (ROC) curves of XGBoosts with different pretreatment methods. (A) XGBoost method is used in classification task. (B) The PCA-based XGBoost is used in classification task. (C,D) CCA and DCL-based XGBoosts are used in classification task.


4.3. Feature Selection Based on the LOO Method

Besides assessing the performance of the DCL method, we also identified the important features with the DCL+XGBoost method. The aim was to find which edges contribute to brain connectivity. The extracted features are mapped back into the brain space, which facilitates the interpretation of the known relationship between brain structure and function. However, the high dimensionality of the connectivity network makes visualization challenging. In the LOO method, we used a weight-based method to evaluate the importance of features in the test sample matrix. The weight in XGBoost is the number of times a feature is used as a split point across all trees. Finally, for each feature, we counted the number of samples for which its weight was >0. We visualized the representations of all important features for both the sMRI and fMRI data.
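The counting step can be sketched as follows, assuming the per-fold weights have already been collected into an array (in the xgboost Python package, split counts of this kind are what `importance_type="weight"` reports). The function names, the toy weight array, and in particular the layout of connectivity features as the vectorized upper triangle of the 90 × 90 matrix are assumptions of this sketch; the source does not state how features were ordered.

```python
import numpy as np

def count_selected(weights):
    """weights: (n_folds, n_features) array of split counts, one row per LOO fold.

    Returns, for each feature, the number of folds in which its weight was > 0.
    """
    return (weights > 0).sum(axis=0)

def edge_of_feature(k, n_regions=90):
    """Map feature index k back to a region pair (i, j).

    Assumes features are the upper triangle (excluding the diagonal) of the
    connectivity matrix, flattened row by row; this layout is hypothetical.
    """
    rows, cols = np.triu_indices(n_regions, k=1)
    return int(rows[k]), int(cols[k])

# Toy weights for three folds and three features, for illustration only.
weights = np.array([[3, 0, 1],
                    [0, 0, 2],
                    [1, 0, 0]])
counts = count_selected(weights)
first_edge = edge_of_feature(0)
```

Features with high counts are selected consistently across folds, and `edge_of_feature` gives the pair of atlas regions to draw when such a feature is mapped back into brain space.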

4.4. Visualization of FCs

It is interesting to investigate how different brain networks cooperate and connect with each other. We found that there were significant differences between the FCs of each group, which indicates that these FCs not only reflect the information common to the different groups but also the differences among them. We used the BrainnetViewer software (https://www.nitrc.org/projects/bnv/) to visualize which FCs have the strongest relationships in the brain network. The first row in Figure 3 is for the HC group, whereas the second row is for the ND group. Figure 3A shows 3D plots of the brain network to visualize the selected edges. A sphere denotes the center of a node. Different colors denote different brain regions. If two brain regions are functionally related, they are connected by a colored line. The colors of the lines indicate the edge strength and whether there is a positive correlation between the behaviors and the FCs. The brain network visualization has a small number of edges, which demonstrate the degree of the distribution across the whole brain network.
Figure 3

Visualizations of the connectivity of HC and neuropsychiatric disorders (NDs) in different manners on the CNP dataset. The first row and second row show the HC group and ND group, respectively. (A) Shows the connectivity in glass brain plots. (B) Shows the connectivity in circle plots. (C) Shows the connectivity in symmetric matrices.

The 2D circle plots in Figure 3B are also used to visualize relationships between pairs of brain regions. The wider the edge between two regions, the closer their relationship is. These circle plots indicate how many FCs a region has with other brain regions. Figure 3C has mappings of the 90 × 90 connectivity matrices, which are used to visualize aggregate statistics within and between predefined regions or networks. In a connectivity matrix, nodes represent brain regions and links measure conditional dependence between the brain regions. Brain connectivity analysis is equivalently transformed into the estimation of a spatial partial correlation matrix.

4.5. Analysis of HCs and NDs

Most of the FCs are common to the HC group (the first row in Figure 3) and the ND group (the second row in Figure 3). These overlapping FCs are mainly within or across the temporal lobes or across the frontal, occipital, and parietal lobes, which is consistent with the results of previous studies. For instance, Haier et al. (2005) and Rubia et al. (2007) showed that temporal lobe dysfunction is strongly correlated with ADHD. Several brain regions in the frontal, parietal, temporal, and occipital lobes have been identified as significant predictors of ND (Gaudio et al., 2019; Zhang et al., 2020). Furthermore, Figures 3A,B show that there are significant differences between the FCs of the two groups. Compared with the HCs, the ND group has abnormal brain regions, mainly in the supramarginal gyrus, cingulate gyrus, and middle frontal gyrus. Other studies have also found that there are fewer FCs in the middle frontal gyrus and anterior cingulate regions in SZ brains compared to HCs (Camchong et al., 2011; Liu et al., 2011). However, the FCs in the ND group are more complicated than those in the HC group, which may be due to their mental illness. These differences may affect the behaviors and mental states of the ND group. There are many highlighted cells in the HC matrix in Figure 3C, whereas the highlighted cells in the ND matrix are more dispersed. This also indicates that NDs may affect the FCs between brain regions.

4.6. Analysis of Different NDs

To study the specificity of subtypes in NDs, we visualized the FCs of the three ND subtypes in Figure 4. Figure 4A is for all the ND subtypes. Figure 4B is for the SZ subtype. Figures 4C,D are for the BD and ADHD subtypes, respectively.
Figure 4

Visualizations of the connectivity of three ND subtypes in glass brain plot graph, circle plot graph, and symmetric matrix graph on CNP dataset. (A) Shows all the ND subtypes. (B) Shows the SZ subtype. (C,D) Show the BD and ADHD subtypes, respectively.

The brain networks clearly suggest that the FCs of these diseases are very similar, but their differences are also very obvious. In particular, the FCs in the ADHD plots are obviously different from those in the SZ and BD plots. This is why classifying ADHD is usually a separate task in most approaches to classifying NDs. Moreover, the connections between brain regions shown in the circle plots in the second column are obviously different for the three diseases.

4.7. Features Distribution of PCA and DCL

Figure 5 compares the principal components found by the PCA method with those found by the proposed DCL method. Figures 5A,B visualize the fMRI and sMRI feature matrices found by PCA. Figure 5C is the visualization of the combined feature matrix for the fMRI and sMRI images for PCA. Figure 5D is the feature matrix produced by DCL.
Figure 5

Representation of feature distribution on CNP dataset. (A,B) Visualize the fMRI and sMRI feature matrices processed by PCA, respectively. (C) Visualizes the combined feature matrix for the fMRI and sMRI images processed by PCA. (D) Visualizes the feature matrix produced by DCL. In the legend, 0 represents HC, 1 represents SZ, 2 represents BD, and 3 represents ADHD.

As shown in Figure 5, the three feature distributions produced by PCA are disordered (Figures 5A–C). Although the distribution of the combined PCA-processed fMRI and sMRI matrices (Figure 5C) is relatively concentrated, the markers of the four classes are still indistinguishable. It would be difficult for classifiers to distinguish the features of the four classes. In contrast, the distribution of the fMRI and sMRI matrices after DCL processing shows clear aggregation (Figure 5D), and the features of the four classes can be clearly distinguished. Therefore, the performance of a classifier would be greatly improved by using a feature matrix produced by the DCL method. At the same time, in order to reduce the differences in the distributions of the classes, we normalized the matrices in the DCL method, so that the classes are distributed over a smaller range.
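The PCA baseline and the normalization step mentioned above can be illustrated with a minimal sketch. The feature matrix here is random stand-in data, and min-max scaling is an assumption: the text does not state which normalization was used in DCL.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # (n_samples, n_components)

def min_max_normalize(Z, eps=1e-12):
    """Rescale each column to [0, 1] (an assumed normalization choice)."""
    z_min = Z.min(axis=0)
    z_range = Z.max(axis=0) - z_min
    return (Z - z_min) / (z_range + eps)

rng = np.random.default_rng(0)
# Hypothetical stand-in for a subjects-by-features matrix.
features = rng.normal(size=(40, 300))
# Class codes follow the Figure 5 legend: 0 = HC, 1 = SZ, 2 = BD, 3 = ADHD.
labels = rng.integers(0, 4, size=40)

embedding = min_max_normalize(pca_project(features))
```

Plotting `embedding` colored by `labels` would give a scatter plot of the same kind as the panels in Figure 5.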

5. Ablation Experiments and Discussion

We proposed the DCL framework to classify psychiatric disorders using fMRI and sMRI. In this section, we discuss several factors that influence the experimental results. To validate the performance of DCL on datasets of different sizes, we also ran experiments on a larger dataset (a subset of ADNI) and a smaller dataset (a subset of OpenfMRI).

5.1. Influence of S-POET

The shrinkage principal orthogonal complement thresholding (S-POET) method is a covariance estimator under the approximate factor model, which is based on sparse PCA. In our method, we used S-POET to obtain the asymptotic first-order distribution of the eigenvalues and eigenvectors of the fMRI and sMRI correlation matrices, respectively. To verify the effect of S-POET in our proposed DCL method, we built two variants of DCL on top of XGBoost: one based on PCA [DCL(PCA)] and another based on S-POET [DCL(S-POET)]. The experiments were run on the CNP dataset, and the results are shown in Table 3. Compared with the DCL(PCA)-based XGBoost, the DCL(S-POET)-based XGBoost performed better on every metric. The accuracy improved by 13 percentage points on CNP (from 59% to 72%). Although S-POET is an extension of sparse PCA, it is more suitable for sparse high-dimensional data. PCA is widely recognized as a powerful tool for dimensionality reduction and data visualization. However, its theoretical properties, such as the consistency and asymptotic distributions of the empirical eigenvalues and eigenvectors, are challenging to establish, especially in the high-dimensional regime. In contrast, S-POET takes into account the spike magnitude of the leading eigenvalues, the sample size, and the dimensionality. In addition, a new covariance estimator is introduced in S-POET to correct the bias of the PCA estimates of the leading eigenvalues and eigenvectors. S-POET is therefore better suited to the analysis of fMRI and sMRI matrices with high dimensionality and sparse features (Fan and Wang, 2015). For these reasons, we built the final DCL method with S-POET.
Table 3

Influence of shrinkage principal orthogonal complement thresholding method (S-POET) on XGBoost with CNP dataset.

Method         ACC (%)          PRE (%)          REC (%)          F1 (%)
DCL(PCA)       59.00 (5.00)     60.00 (9.00)     61.00 (11.00)    59.00 (7.00)
DCL(S-POET)    72.00 (8.00)     81.00 (2.00)     70.00 (3.00)     75.00 (3.00)
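To make the bias-correction idea in Section 5.1 concrete, the following is a simplified sketch of shrinking the leading sample eigenvalues in the spirit of S-POET. It is not the authors' implementation: the noise level `c_bar` is estimated here simply as the average of the non-leading eigenvalues, the number of leading components `m` is chosen by hand, and the exact estimator in Fan and Wang (2015) may differ in how the noise level is computed.

```python
import numpy as np

def shrink_leading_eigenvalues(X, m):
    """Shrink the m leading sample eigenvalues by an estimated bias term.

    X is an (n_samples, n_features) matrix. The correction c_bar * p / n
    grows with the dimensionality p and shrinks with the sample size n.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n                           # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending order
    order = np.argsort(eigvals)[::-1]             # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Assumed noise-level estimate: mean of the non-leading eigenvalues.
    c_bar = eigvals[m:].sum() / (p - m)
    shrunk = np.maximum(eigvals[:m] - c_bar * p / n, 0.0)
    return shrunk, eigvals[:m], eigvecs[:, :m]

rng = np.random.default_rng(0)
n, p, m = 50, 200, 2                              # few samples, many features
factors = rng.normal(size=(n, m))
loadings = 3.0 * rng.normal(size=(m, p))
X = factors @ loadings + rng.normal(size=(n, p))  # spiked data plus noise

shrunk, raw, vecs = shrink_leading_eigenvalues(X, m)
```

When `p` is much larger than `n`, the raw leading eigenvalues are inflated by noise, so the shrunk values are strictly smaller than the raw ones.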

5.2. Effectiveness of Different Inputs on XGBoosts

To verify the influence of the different MRI modalities on the model, we separately used the fMRI, sMRI, and fMRI+sMRI matrices as inputs to three XGBoost-based methods, namely PCA+XGBoost, CCA+XGBoost, and DCL-XGBoost. The results are shown in Table 4. Within each method, the classification results obtained with a single fMRI or sMRI matrix as input are similar. However, the results improve considerably when the PCA-, CCA-, or DCL-processed fMRI and sMRI matrices are used together as input to the XGBoost classifier. For the DCL-XGBoost method in particular, the accuracy improves by at least 14 percentage points on the CNP dataset (from 56–58% with a single modality to 72% with both). As the two modalities complement each other, their combination results in higher classification accuracy. Furthermore, the PCA- and CCA-processed matrices do not perform as well as the DCL-processed matrices as XGBoost input.
Table 4

Evaluation of different inputs to the different combinations of XGBoost on the CNP dataset.

Method         Input        ACC (%)          PRE (%)          REC (%)          F1 (%)
PCA+XGBoost    fMRI         38.00 (7.00)     35.00 (4.00)     36.16 (3.00)     33.00 (2.00)
PCA+XGBoost    sMRI         37.00 (8.00)     35.00 (3.00)     36.00 (10.00)    32.00 (9.00)
PCA+XGBoost    fMRI+sMRI    49.00 (11.00)    45.00 (8.00)     49.00 (8.00)     45.00 (7.00)
CCA+XGBoost    fMRI         36.00 (10.00)    34.00 (4.00)     36.00 (7.00)     35.00 (8.00)
CCA+XGBoost    sMRI         38.20 (1.00)     37.06 (7.00)     35.00 (9.00)     36.00 (4.00)
CCA+XGBoost    fMRI+sMRI    49.00 (8.00)     46.00 (14.00)    49.00 (12.00)    44.00 (14.00)
DCL-XGBoost    fMRI         56.00 (2.00)     58.00 (8.00)     60.00 (3.00)     53.00 (9.00)
DCL-XGBoost    sMRI         58.00 (6.00)     62.00 (8.00)     52.00 (11.00)    55.00 (6.00)
DCL-XGBoost    fMRI+sMRI    72.00 (8.00)     81.00 (2.00)     70.00 (3.00)     75.00 (3.00)
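As an illustration of how two modalities can be combined before classification, the sketch below implements plain regularized CCA with NumPy and concatenates the canonical variates of the two views. It is a generic textbook formulation, not the DCL procedure itself; the function names, the ridge term `reg`, and the synthetic two-view data are all assumptions made for the example.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca_fuse(X, Y, k=2, reg=1e-3):
    """Return concatenated canonical variates and the top-k correlations."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Regularized within-view covariances and the cross-covariance.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the
    # (regularized) canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky, full_matrices=False)
    A = Kx @ U[:, :k]                             # projection for view X
    B = Ky @ Vt[:k].T                             # projection for view Y
    fused = np.hstack([Xc @ A, Yc @ B])           # (n_samples, 2 * k)
    return fused, s[:k]

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 1))                # shared signal
# Hypothetical stand-ins for the fMRI and sMRI feature matrices.
view_f = latent @ rng.normal(size=(1, 5)) + 0.1 * rng.normal(size=(100, 5))
view_s = latent @ rng.normal(size=(1, 4)) + 0.1 * rng.normal(size=(100, 4))

fused, corrs = cca_fuse(view_f, view_s, k=2)
```

The `fused` matrix could then be passed to any classifier, which corresponds to the fMRI+sMRI rows for the CCA-based method in Table 4.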

5.3. Influence of Medication Taken

Some patients in the ND group had taken medication for their mental illness. To analyze the impact of these medications on the patients, we visualized the selected FCs for a group who had taken medication and for a group who had not. There are significant differences between these two groups, as shown in Figure 6. Figure 6A shows NDs without medication. Figure 6B shows NDs with medication. The representations of the FCs over the whole brain are similar, but for the group who had not used medication, there are more edges over the boundary of the brain. This may be due to the fact that some FCs are interrupted by the patient taking certain medication, resulting in remission or deepening of mental illness.
Figure 6

Visualizations of the connectivity of NDs who took medicine or not in the glass brain plot graph on the CNP dataset. (A) Shows NDs without medication. (B) Shows NDs with medication.


5.4. Extended Experiments

To verify the performance of DCL on different datasets, we ran additional experiments on a larger dataset (a subset of ADNI) and a smaller dataset (a subset of OpenfMRI). The Alzheimer's Disease Neuroimaging Initiative (ADNI) (Carrillo et al., 2012) is a large dataset that includes subjects with Alzheimer's disease (AD) and mild cognitive impairment (MCI). We selected a subset of the ADNI dataset to evaluate our proposed DCL method. This subset includes 420 samples with sMRI (T1w MRI) and fMRI (rs-fMRI). It consists of 105 subjects with AD, 105 late mild cognitive impairment (LMCI) subjects, 105 early mild cognitive impairment (EMCI) subjects, and 105 HC subjects. OpenfMRI (Poldrack et al., 2013) was designed to serve as a repository for the open sharing and dissemination of task-based fMRI data. As it has grown, it has broadened to encompass other data types as well, including EEG, MEG, rs-fMRI (fMRI), and diffusion MRI (sMRI), which were acquired on both healthy and clinical populations. We selected a small resting-state subset of the OpenfMRI dataset. This subset includes 93 samples with sMRI and fMRI. It consists of 20 HC subjects, 16 BD subjects, 28 SZ subjects, and 29 ADHD subjects. In our study, the subsets of ADNI and OpenfMRI are used as external datasets to evaluate the performance of DCL. The data processing steps follow those in Section 3, and the experimental design and metrics follow those in Section 4. The classification results for the two subsets are shown in Tables 5 and 6, respectively. We also used three typical machine learning methods (SVM, RF, and XGBoost) as baselines. As shown in Tables 5, 6, the accuracy trend of the experimental results is similar to that in Table 2. The DCL-based classifiers achieve much better classification results, which further supports the view that the DCL method can reduce the complexity of the data by preprocessing the two types of MRI, thereby improving the classification performance of the classifiers.
By comparing Tables 2, 5, 6, it can be seen that the three classifiers perform best on the subset of ADNI and worst on the subset of OpenfMRI. Apart from differences in the samples themselves, the subset of ADNI has the largest sample size of the three datasets, which allows better training and prediction by the machine learning methods, whereas the subset of OpenfMRI has the smallest sample size, which limits training and prediction. Furthermore, even with the limited sample size of the OpenfMRI subset, the DCL-based methods show a clear advantage over the other methods.
Table 5

Mean values in the evaluation of the classification performance on the subset of Alzheimer's Disease Neuroimaging Initiative (ADNI).

Classifier     ACC (%)          PRE (%)          REC (%)          F1 (%)
SVM            54.00 (8.00)     53.00 (5.00)     59.00 (4.00)     57.00 (7.00)
RF             54.00 (4.00)     52.00 (10.00)    58.00 (4.00)     55.00 (8.00)
XGBoost        55.00 (10.00)    52.00 (15.00)    58.00 (7.00)     56.00 (9.00)
PCA+SVM        60.00 (10.00)    62.00 (4.00)     61.00 (7.00)     60.00 (6.00)
PCA+RF         65.00 (7.00)     61.00 (8.00)     63.00 (10.00)    63.00 (3.00)
PCA+XGBoost    72.00 (3.00)     68.00 (10.00)    73.00 (13.00)    72.00 (9.00)
CCA+SVM        62.00 (10.00)    63.00 (11.00)    65.00 (8.00)     63.00 (7.00)
CCA+RF         62.00 (4.00)     64.00 (3.00)     66.00 (6.00)     62.00 (6.00)
CCA+XGBoost    75.00 (4.00)     73.00 (6.00)     76.00 (7.00)     75.00 (9.00)
DCL+SVM        77.00 (12.00)    78.00 (3.00)     77.00 (9.00)     79.00 (10.00)
DCL+RF         78.00 (6.00)     79.00 (4.00)     78.00 (10.00)    80.00 (13.00)
DCL+XGBoost    80.00 (9.00)     79.00 (9.00)     80.00 (5.00)     82.00 (7.00)

Table 6

Mean values in the evaluation of the classification performance on the subset of OpenfMRI.

Classifier     ACC (%)          PRE (%)          REC (%)          F1 (%)
SVM            33.00 (7.00)     33.00 (3.00)     35.00 (9.00)     34.00 (8.00)
RF             34.00 (6.00)     34.00 (11.00)    33.00 (2.00)     35.00 (7.00)
XGBoost        35.00 (6.00)     35.00 (7.00)     34.00 (9.00)     36.00 (2.00)
PCA+SVM        39.00 (7.00)     38.00 (10.00)    39.00 (10.00)    40.00 (4.00)
PCA+RF         41.00 (2.00)     40.00 (10.00)    41.00 (6.00)     41.00 (13.00)
PCA+XGBoost    43.00 (9.00)     44.00 (8.00)     45.00 (8.00)     42.00 (7.00)
CCA+SVM        43.00 (2.00)     44.00 (5.00)     46.00 (9.00)     45.00 (2.00)
CCA+RF         45.00 (7.00)     46.00 (7.00)     47.00 (3.00)     46.00 (10.00)
CCA+XGBoost    51.00 (14.00)    53.00 (7.00)     56.00 (7.00)     50.00 (5.00)
DCL+SVM        55.00 (6.00)     57.00 (8.00)     56.00 (8.00)     57.00 (6.00)
DCL+RF         62.00 (10.00)    64.00 (3.00)     64.00 (7.00)     63.00 (9.00)
DCL+XGBoost    67.00 (8.00)     69.00 (10.00)    68.00 (9.00)     68.00 (10.00)
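The four metrics reported in the tables (ACC, PRE, REC, and F1) can be computed from a confusion matrix as in the sketch below. Two points are assumptions rather than statements from this section: that the per-class precision, recall, and F1 are macro-averaged, and that the values in parentheses in the tables are standard deviations across repeated runs or folds.

```python
import numpy as np

def macro_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                             # rows: true, columns: predicted
    tp = np.diag(cm).astype(float)
    n_pred = cm.sum(axis=0)                       # predicted count per class
    n_true = cm.sum(axis=1)                       # true count per class
    pre = np.divide(tp, n_pred, out=np.zeros_like(tp), where=n_pred > 0)
    rec = np.divide(tp, n_true, out=np.zeros_like(tp), where=n_true > 0)
    denom = pre + rec
    f1 = np.divide(2 * pre * rec, denom, out=np.zeros_like(tp), where=denom > 0)
    return {"ACC": tp.sum() / cm.sum(), "PRE": pre.mean(),
            "REC": rec.mean(), "F1": f1.mean()}

def summarize(fold_metrics):
    """Mean and standard deviation (in percent) of each metric over folds."""
    return {name: (100 * np.mean([m[name] for m in fold_metrics]),
                   100 * np.std([m[name] for m in fold_metrics]))
            for name in fold_metrics[0]}

# Two toy folds with two classes each, used only to exercise the functions.
folds = [
    macro_metrics([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2),
    macro_metrics([0, 1, 1, 0], [0, 1, 1, 0], n_classes=2),
]
summary = summarize(folds)
```

Each entry of `summary` is a `(mean, std)` pair, which matches the "value (value)" layout used in the tables under the assumption stated above.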
We compared DCL+XGBoost with several other well-established methods that were designed for the classification of multiple neuropsychiatric disorders: mMLDA (Janousova et al., 2015), MFMK-SVM (Liu J. et al., 2018), KFCM (Baskar et al., 2019), MK-SVM (Zhuang et al., 2019), and mRMR-SVM (Zhang et al., 2021). These methods use one or more types of MRI data as model input. They were originally trained on different datasets and use very different predictive architectures. We either re-implemented them exactly as described by the authors or used the code released by the authors. To ensure that the comparative evaluation is fair, we used the same training data and test data for all considered methods on the three datasets. The results are shown in Table 7. Our proposed method achieves the highest accuracy on the CNP and ADNI datasets and ties for the highest accuracy on OpenfMRI, although mRMR-SVM scores higher on some of the other metrics. The competing methods require much more feature selection work and parameter tuning. For example, mRMR-SVM relies on mutual information as a measure to balance feature redundancy and relevance (Morgado et al., 2015), which increases the difficulty of model optimization. In addition, the performance of these methods improved as the sample size increased, which suggests that sample size and model performance are positively correlated.
Table 7

Comparison results with other methods on three datasets.

Dataset     Classifier                        MRIs          ACC (%)          PRE (%)          REC (%)          F1 (%)
CNP         mMLDA (Janousova et al., 2015)    sMRI          65.00 (7.00)     65.00 (6.00)     67.00 (9.00)     64.00 (6.00)
CNP         MFMK-SVM (Liu J. et al., 2018)    sMRI, DTI     67.00 (9.00)     64.00 (12.00)    65.00 (7.00)     68.00 (9.00)
CNP         KFCM (Baskar et al., 2019)        sMRI          70.00 (7.00)     71.00 (7.00)     70.00 (6.00)     69.00 (10.00)
CNP         MK-SVM (Zhuang et al., 2019)      sMRI, fMRI    70.00 (11.00)    75.00 (4.00)     72.00 (4.00)     74.00 (7.00)
CNP         mRMR-SVM (Zhang et al., 2021)     sMRI, fMRI    71.00 (9.00)     78.00 (7.00)     71.00 (6.00)     72.00 (10.00)
CNP         DCL+XGBoost                       sMRI, fMRI    72.00 (8.00)     81.00 (2.00)     70.00 (3.00)     75.00 (3.00)
ADNI        mMLDA (Janousova et al., 2015)    sMRI          70.00 (8.00)     72.00 (8.00)     70.00 (10.00)    69.00 (9.00)
ADNI        MFMK-SVM (Liu J. et al., 2018)    sMRI, DTI     73.00 (9.00)     72.00 (10.00)    74.00 (6.00)     75.00 (7.00)
ADNI        KFCM (Baskar et al., 2019)        sMRI          75.00 (9.00)     74.00 (1.00)     76.00 (4.00)     74.00 (8.00)
ADNI        MK-SVM (Zhuang et al., 2019)      sMRI, fMRI    75.00 (11.00)    74.00 (9.00)     75.00 (8.00)     75.00 (2.00)
ADNI        mRMR-SVM (Zhang et al., 2021)     sMRI, fMRI    79.00 (12.00)    82.00 (10.00)    79.00 (6.00)     81.00 (7.00)
ADNI        DCL+XGBoost                       sMRI, fMRI    80.00 (9.00)     79.00 (9.00)     80.00 (5.00)     82.00 (7.00)
OpenfMRI    mMLDA (Janousova et al., 2015)    sMRI          54.00 (7.00)     53.00 (10.00)    55.00 (7.00)     53.00 (9.00)
OpenfMRI    MFMK-SVM (Liu J. et al., 2018)    sMRI, DTI     57.00 (5.00)     58.00 (7.00)     56.00 (10.00)    57.00 (6.00)
OpenfMRI    KFCM (Baskar et al., 2019)        sMRI          63.00 (11.00)    64.00 (8.00)     64.00 (4.00)     64.00 (9.00)
OpenfMRI    MK-SVM (Zhuang et al., 2019)      sMRI, fMRI    66.00 (8.00)     65.00 (12.00)    67.00 (8.00)     64.00 (10.00)
OpenfMRI    mRMR-SVM (Zhang et al., 2021)     sMRI, fMRI    67.00 (5.00)     70.00 (7.00)     71.00 (6.00)     72.00 (10.00)
OpenfMRI    DCL+XGBoost                       sMRI, fMRI    67.00 (8.00)     69.00 (10.00)    68.00 (9.00)     68.00 (10.00)

5.5. Limitations

There are several limitations to this study. (1) We used only MRI data as the input. However, the classification of complex disorders could be made more accurate by including phenotypic information. (2) The amount and uneven quality of the MRI data have a significant influence on the performance of a model and reduce the accuracy of classification.

6. Conclusion

This work demonstrated that the DCL method can effectively combine different information from fMRI and sMRI images. DCL identifies both the common and distinct information between the two input MRI matrices. The decomposition-based CCA is used to analyze the correlation and construct the required matrices. As a result, DCL performs better in both classification and the identification of FCs. The DCL method can be used to detect complex and nonlinear relationships between the two types of MRI images. Our experiments showed that the DCL method can improve classification performance, which makes it a suitable method for classifying mental illnesses.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.

Author Contributions

LL, HZ, and GL contributed to the conception and design of the study. JC organized the database. YW performed the statistical analysis. LL wrote the first draft of the manuscript. Y-PW modified the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

Funding

This work was funded partially by the National Natural Science Foundation of China under Grant Nos. 62172444, 61877059, and 62102454, the 111 Project (No. B18059), and the Henan Provincial Key Research and Promotion Projects (No. 222102310085).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
  45 in total

1.  Combined analysis of DTI and fMRI data reveals a joint maturation of white and grey matter in a fronto-parietal network.

Authors:  Pernille J Olesen; Zoltan Nagy; Helena Westerberg; Torkel Klingberg
Journal:  Brain Res Cogn Brain Res       Date:  2003-12

2.  Linked independent component analysis for multimodal data fusion.

Authors:  Adrian R Groves; Christian F Beckmann; Steve M Smith; Mark W Woolrich
Journal:  Neuroimage       Date:  2010-10-14       Impact factor: 6.556

3.  White matter density in patients with schizophrenia, bipolar disorder and their unaffected relatives.

Authors:  Andrew M McIntosh; Dominic E Job; T William J Moorhead; Lesley K Harrison; Stephen M Lawrie; Eve C Johnstone
Journal:  Biol Psychiatry       Date:  2005-08-01       Impact factor: 13.382

4.  Unified framework for development, deployment and robust testing of neuroimaging algorithms.

Authors:  Alark Joshi; Dustin Scheinost; Hirohito Okuda; Dominique Belhachemi; Isabella Murphy; Lawrence H Staib; Xenophon Papademetris
Journal:  Neuroinformatics       Date:  2011-03

Review 5.  Worldwide Alzheimer's disease neuroimaging initiative.

Authors:  Maria C Carrillo; Lisa J Bain; Giovanni B Frisoni; Michael W Weiner
Journal:  Alzheimers Dement       Date:  2012-07       Impact factor: 21.566

6.  Deep Collaborative Learning With Application to the Study of Multimodal Brain Development.

Authors:  Wenxing Hu; Biao Cai; Aiying Zhang; Vince D Calhoun; Yu-Ping Wang
Journal:  IEEE Trans Biomed Eng       Date:  2019-03-13       Impact factor: 4.538

7.  A Parallel Independent Component Analysis Approach to Investigate Genomic Influence on Brain Function.

Authors:  Jingyu Liu; Oguz Demirci; Vince D Calhoun
Journal:  IEEE Signal Process Lett       Date:  2008-01-01       Impact factor: 3.109

8.  Distinct temporal brain dynamics in bipolar disorder and schizophrenia during emotion regulation.

Authors:  Liwen Zhang; Hui Ai; Esther M Opmeer; Jan-Bernard C Marsman; Lisette van der Meer; Henricus G Ruhé; André Aleman; Marie-José van Tol
Journal:  Psychol Med       Date:  2019-02-18       Impact factor: 7.723

9.  Multi-set canonical correlation analysis for the fusion of concurrent single trial ERP and functional MRI.

Authors:  Nicolle M Correa; Tom Eichele; Tülay Adali; Yi-Ou Li; Vince D Calhoun
Journal:  Neuroimage       Date:  2010-01-25       Impact factor: 6.556

Review 10.  Deep learning in mental health outcome research: a scoping review.

Authors:  Chang Su; Zhenxing Xu; Jyotishman Pathak; Fei Wang
Journal:  Transl Psychiatry       Date:  2020-04-22       Impact factor: 6.222
