
Stable biomarker identification for predicting schizophrenia in the human connectome.

Leonardo Gutiérrez-Gómez¹, Jakub Vohryzek², Benjamin Chiêm³, Philipp S Baumann⁴, Philippe Conus⁵, Kim Do Cuenod⁶, Patric Hagmann⁷, Jean-Charles Delvenne⁸.

Abstract

Schizophrenia, as a psychiatric disorder, has recognized brain alterations at both the structural and the functional magnetic resonance imaging level. The developing field of connectomics has attracted much attention because it allows researchers to apply powerful network-analysis tools to study structural and functional connectivity abnormalities in schizophrenia. Many methods have been proposed to identify biomarkers in schizophrenia, focusing mainly on improving classification performance or on performing statistical comparisons between groups. However, the stability of biomarker selection has long been overlooked in the connectomics field. In this study, we follow a machine learning approach in which the identification of biomarkers is addressed as a feature selection problem for a classification task. We use a recursive feature elimination and support vector machines (RFE-SVM) approach to identify the most meaningful biomarkers from the structural, functional, and multi-modal connectomes of healthy controls and patients. Furthermore, the stability of the retrieved biomarkers is assessed across different subsamplings of the dataset, allowing us to identify the affected core of the pathology. Taken together, our technique demonstrates a principled way to obtain biomarkers that are both accurate and stable, while highlighting the importance of multi-modal approaches to brain pathology, as they tend to reveal complementary information.



Substances: Biomarkers

Year: 2020 PMID: 32623137 PMCID: PMC7334612 DOI: 10.1016/j.nicl.2020.102316

Source DB: PubMed Journal: Neuroimage Clin ISSN: 2213-1582 Impact factor: 4.881

Introduction

Schizophrenia (SZ) is a severe psychiatric disorder characterized by hallucinations and delusions, as well as impairments in memory, attention, executive function and other higher-order cognitive domains (van Os et al., 2010). The development of magnetic resonance imaging (MRI) has offered an effective way to examine white and grey matter changes in the brain and has motivated numerous scientists to explore the underlying neuropathology of SZ (Fornito et al., 2009, Bora et al., 2011). Over the past few years, advances in high-field structural and functional neuroimaging have made it possible to map the macroscopic neural wiring system of the human brain (Hagmann et al., 2008, Sporns et al., 2005), with many studies showing reproducible alterations of both structural and functional connectivity in SZ (Fornito et al., 2012a, Griffa et al., 2013, Lynall et al., 2010, Gilson, 2020). These studies of connectivity abnormalities have supported a shift towards conceptualising schizophrenia as a dysconnectivity syndrome (Friston et al., 1996) and have thus demonstrated the importance of structural and functional connectivity in the characterisation of schizophrenia (Fornito et al., 2012b). As such, a thorough characterisation of the structural and functional connectome is important for the development of novel biomarkers for both prediction and treatment (Bassett et al., 2008, Kim et al., 2010). From the connectivity standpoint, multi-modal analysis has demonstrated aberrant fronto-striatal, fronto-thalamic and fronto-temporal coupling in SZ (Cocchi et al., 2014). This is supported by structural connectivity studies showing decreased white-matter integrity in frontal, temporal and parietal regions, and by functional connectivity studies showing changes in activation in frontal and parietal areas (Van Den Heuvel, 2014).
Studies from a topological perspective have shown alterations of both structural and functional brain topology in SZ, characterized by a less efficient global brain network organization and a limited capacity for functional integration (Griffa et al., 2013). Further research using diffusion spectrum imaging has identified the brain areas mainly responsible for the loss of global integration and segregation properties, namely prefrontal, pericentral, superior, left temporal-occipital and thalamic areas, as well as the striatum (Griffa et al., 2013, Griffa et al., 2015, Griffa et al., 2019). However, these findings were obtained with conventional univariate strategies that perform a separate statistical test at each edge of the connectome, thereby requiring very stringent corrections for multiple comparisons. Multivariate methods, on the other hand, are promising, although they require specialized approaches when the number of parameters exceeds the number of observations (Bühlmann and van de Geer, 2011). In this study, we adopt a machine learning approach that aims to discover the most relevant set of biomarkers for discriminating subject groups, and thus to describe the group differences quantitatively in terms of both classification accuracy and stability of the selected features.

Machine learning and automatic biomarker selection

The identification of regions or connections of interest associated with a neural disorder is referred to as biomarker discovery. Identifying such biomarkers in schizophrenia could lead to clinically useful tools for both diagnosis and prognosis. From a machine learning perspective, the choice of biomarkers can be addressed as a feature selection problem: finding a subset of relevant features that accurately differentiates patients from control subjects. In this work, we perform an automatic feature selection procedure to identify biomarkers relevant to the diagnosis of schizophrenia from brain connectivity data. In this context, biomarkers correspond to structural or functional links between neural Regions of Interest (ROIs). A key challenge in feature selection is that different feature selection methods may retrieve different sets of features. Even the same technique may produce different results when applied to different splits of the data. When the dimensionality of the input data is large and exceeds the number of training examples, the complexity of the problem grows by several orders of magnitude (Boser et al., 1992, Guyon and Elisseeff, 2003). Therefore, deciding between two subsets of features involves significant uncertainty that needs to be addressed (He and Weichuan, 2010). These issues underline the need to integrate stability into the feature selection process, so that the method retrieves consistent features across random subsamplings of the dataset. This is especially true in a biomedical context (Abraham et al., 2017), where many authors have focused on improving classification performance in several brain disorders, such as schizophrenia (Lu et al., 2016), Alzheimer's disease (Dai et al., 2012) and depression (Chi et al., 2015).
Even though the stability of biomarker selection has been studied mainly in genomics and proteomics (He and Weichuan, 2010, Abeel et al., 2010), it has been overlooked in the connectomics community. We therefore propose a general framework for the stability analysis of selected features, enabling the robust identification of impaired connections in the connectomes of patients with schizophrenia. The proposed approach can also be extended to other brain disorders. In the present work, we use Support Vector Machines (SVM) as a classifier (Boser et al., 1992). The SVM is a supervised machine learning method that classifies data points by maximizing the margin between classes in a high-dimensional space. It offers state-of-the-art classification performance in a wide range of applications and is particularly appropriate for high-dimensional problems with few examples. The SVM classifier can be integrated into an embedded feature selection approach (Saeys et al., 2007). The resulting Recursive Feature Elimination with Support Vector Machine (RFE-SVM) technique was first introduced to perform gene selection for cancer diagnosis on microarray data (Guyon et al., 2002). It has since been used for mapping and classifying fMRI spatial patterns at the voxel level (De Martino et al., 2008) and functional connectivity (Pallarés et al., 2018). More recently, it has also been applied to human brain networks to identify gender-related differences in structural connectivity (Chiêm et al., 2018). The method trains an SVM classifier, removes the least important features, and iteratively re-estimates the classifier with the remaining features until the desired number of features is reached. Accordingly, we adopt the RFE-SVM approach to automatically select the brain connections that best discriminate patients from controls, and thereby to highlight the brain regions implicated in the disease.
The aim of the present work is threefold. First, we investigate the effect of structural, functional, and multi-modal (structural + functional) connectomes at different resolutions on the classification performance for schizophrenia. Second, we perform a careful feature selection procedure across modalities to assess the robustness of the selected features and to find the best trade-off between high accuracy and stability. Finally, the analysis of the retrieved biomarkers allows us to identify a distributed set of brain regions involved in discriminating patients from control subjects. This paper is organized as follows: Section 2 introduces the properties of the dataset, the procedure for estimating the connectomes, and the general protocol used for biomarker identification. In Section 3, we present the results on stability, classification performance, and the identification of brain areas indicative of the pathology. Finally, in Section 4, we discuss our findings and present our conclusions.

Materials and methods

Subjects

For this study, two age-balanced groups were considered. The cohort consisted of a schizophrenic group of 27 subjects with a mean age of years and a control group of 27 healthy subjects with a mean age of years. The patients were recruited from the Service of General Psychiatry at the Lausanne University Hospital and met DSM-IV criteria for schizophrenic and schizoaffective disorders (American Psychiatric Association, 2000). Healthy controls were recruited through advertisement and assessed with the Diagnostic Interview for Genetic Studies (Preisig et al., 1999). Control subjects with a major mood, psychotic, or substance-use disorder, or with a first-degree relative with a psychotic disorder, were excluded. Moreover, a history of neurological disease was an exclusion criterion for all subjects. At the time of the study, 24 of the 27 patients were under medication, quantified as chlorpromazine-equivalent dose (CPZ) (average medication mg) (Andreasen et al., 2010). Written consent was obtained from all subjects following the institutional guidelines approved by the Ethics Committee of Clinical Research of the Faculty of Biology and Medicine, University of Lausanne, Switzerland. The dataset is public and available on the Zenodo platform (Vohryzek et al., 2020).

Brain network estimation

Magnetic resonance imaging

All subjects were scanned on a 3 Tesla Siemens Trio scanner with a 32-channel head coil. The MRI session comprised three acquisition protocols: structural, functional and diffusion MRI scans. Structural MRI: a magnetization-prepared rapid acquisition gradient echo (MPRAGE) sequence with an in-plane resolution of 1 mm, a slice thickness of 1.2 mm, a total of 240 × 257 × 160 voxels, and TR, TE and TI of 2300, 2.98 and 900 ms, respectively. Diffusion MRI: a diffusion spectrum imaging (DSI) sequence with 128 diffusion-weighted images, a b0 reference image and a maximum b-value of 8000 s/mm². The acquisition time was 13 min 27 s. The number of voxels was 96 × 96 × 34 with a resolution of 2.2 × 2.2 × 3.0 mm, and TR and TE were 6100 and 144 ms, respectively. Motion artifacts linked to signal drop-outs were checked by visually inspecting the signal, and no subject had to be excluded as a result (Yendiki et al., 2014). Functional MRI: a resting-state functional MRI (fMRI) scan acquired over 8 min (3.3 × 3.3 × 3.3 mm voxel size, TR = 1920 ms, TE = 30 ms, 32 slices, flip angle). During the fMRI acquisition, subjects were asked not to fall asleep and to let their minds wander while fixating on a cross on the screen.

Structural networks

Structural and diffusion MRI data were used to estimate weighted, undirected structural connectivity matrices with the Connectome Mapping Toolkit (Daducci et al., 2012, Cammoun et al., 2012, Griffa et al., 2015). First, white matter, grey matter and cerebrospinal fluid were segmented from the structural data, and the result was linearly registered to the b0 volume of the DSI dataset. Second, the first three scales of the Lausanne multi-scale atlas were used to parcellate the grey matter: the first scale consists of 68 cortical and 14 subcortical regions, while scales two and three subdivide the cortical regions of the first scale into 114 and 219 cortical regions, respectively (Cammoun et al., 2012). Deterministic streamline tractography, estimating 32 diffusion directions per voxel, was then used to reconstruct the structural connections from the DSI data (Van Wedeen et al., 2005). The structural connectivity between brain regions was quantified by the normalized connection density, defined as

$$w_{ij} = \frac{2}{S_i + S_j} \sum_{f \in E_{ij}} \frac{1}{l(f)}, \qquad (1)$$

where $w_{ij}$ is the weight of the edge between brain regions $i$ and $j$, $S_i$ and $S_j$ are the surface areas of regions $i$ and $j$, $f$ is a streamline in the set $E_{ij}$ of streamlines connecting the two regions, and $l(f)$ is the length of streamline $f$ (Hagmann et al., 2008, Griffa et al., 2015). Normalizing by the region surface areas accounts for their varying sizes, and normalizing by streamline length accounts for the bias towards longer connections imposed by the tractography algorithm.
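As an illustration of the normalized connection density $w_{ij} = \frac{2}{S_i + S_j} \sum_{f} \frac{1}{l(f)}$, the following minimal sketch assumes that tractography has already produced one `(i, j, length)` tuple per streamline and that ROI surface areas are available as an array. The input format and the function name `connection_density` are hypothetical and are not part of the Connectome Mapping Toolkit API.

```python
import numpy as np

def connection_density(streamlines, surface_areas):
    """Normalized connection density w_ij = 2 / (S_i + S_j) * sum_f 1 / l(f).

    streamlines: iterable of (i, j, length) tuples, one per streamline f
        (hypothetical input format, assumed for this sketch).
    surface_areas: 1D array with the surface area S_i of each ROI.
    Returns a symmetric r x r matrix with a zero diagonal.
    """
    S = np.asarray(surface_areas, dtype=float)
    r = S.shape[0]
    W = np.zeros((r, r))
    for i, j, length in streamlines:
        if i == j:
            continue  # self-connections are not used as features
        W[i, j] += 1.0 / length  # inverse-length weight of this streamline
    W = W + W.T  # undirected: merge both orientations of each region pair
    W *= 2.0 / (S[:, None] + S[None, :])  # divide by the mean endpoint surface
    np.fill_diagonal(W, 0.0)
    return W
```

For example, two streamlines of lengths 2 and 4 between regions with unit surface area give $w = \frac{2}{2}(0.5 + 0.25) = 0.75$.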

Functional networks

Functional connectivity matrices were computed from the fMRI BOLD time series. First, the first four time points were discarded (Jenkinson et al., 2002). Rigid-body registration was applied to the individual time frames for motion correction. The signal was then linearly detrended and corrected for physiological confounds and residual motion artifacts by regressing out white-matter, cerebrospinal-fluid and six motion (three translation and three rotation) signals. Lastly, the signal was spatially smoothed and band-pass filtered between 0.01 and 0.1 Hz with a Hamming-windowed sinc FIR filter. Linear registration between the average fMRI and MPRAGE images was performed to obtain the ROI time series (Jenkinson et al., 2012). An average time course for each brain region was computed for each of the three atlas scales. To obtain the functional matrices, the absolute value of the Pearson correlation was computed between the time courses of each pair of brain regions. All of the above was performed in subject native space with the Connectome Mapper and custom Python and Matlab scripts (Daducci et al., 2012, Griffa et al., 2017).
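The last step, going from ROI time courses to a functional matrix of absolute Pearson correlations, can be sketched as below. This assumes the time series have already been preprocessed (detrended, confound-regressed and band-pass filtered) and stored as a `(T, r)` array with one column per ROI; the function name `functional_connectivity` is hypothetical.

```python
import numpy as np

def functional_connectivity(ts):
    """Absolute Pearson correlation between ROI time courses.

    ts: array of shape (T, r), one column per ROI, assumed to be
        already preprocessed as described in the text.
    Returns an r x r matrix of |Pearson r| with a zero diagonal.
    """
    fc = np.abs(np.corrcoef(ts, rowvar=False))  # columns are variables (ROIs)
    np.fill_diagonal(fc, 0.0)  # self-loops are discarded as features anyway
    return fc
```

Taking the absolute value means that strongly anti-correlated regions are treated as strongly connected, as in the text.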

Biomarker evaluation protocol

Our evaluation methodology follows Abeel et al. (2010), who used it for biomarker identification in cancer diagnosis from microarray data. To assess the robustness of the biomarker selection process, we generate slight variations of the dataset and compare the sets of features selected across these variations. For a stable marker selection algorithm, small variations in the training set should not produce substantial changes in the retrieved set of features. We use a nested 5-fold cross-validation (CV) scheme: the outer CV provides an unbiased estimate of the performance of the method, whereas the inner CV is used for fitting, tuning and selecting the optimal parameters of the model. Concretely, we generate 100 subsamplings of the original dataset by shuffling the outer 5-fold CV scheme 20 times. In each subsampling, 80% of the data, i.e., four folds of the outer CV (pink in Fig. 1), is used as the training set for the inner CV, where the best model and features are selected. Within this inner CV, four folds are used for training and the held-out fold for validation, in order to tune the parameters of the model. The model achieving the best performance on the validation set is selected together with the features retained by the RFE-SVM method. The remaining 20% of the outer CV, i.e., the held-out outer fold, is used as the test set to provide an unbiased evaluation of the final model and to assess the performance of the classifier. The overall accuracy is therefore the average test accuracy across subsamplings. See Fig. 1 for a schematic view of the methodology.
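A minimal scikit-learn sketch of this nested protocol is given below. It assumes a feature matrix `X` (subjects × vectorized connectome edges) and labels `y`. The function name, the candidate C values and the use of stratified folds are illustrative assumptions rather than settings reported here, and `RepeatedStratifiedKFold` stands in for reshuffling the outer 5-fold scheme 20 times.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     StratifiedKFold)
from sklearn.svm import SVC

def nested_cv_rfe_svm(X, y, n_select, step=0.1, C_grid=(0.01, 0.1, 1.0, 10.0),
                      n_repeats=20, seed=0):
    """Outer 5-fold CV repeated n_repeats times (5 * n_repeats subsamplings).
    On each outer training part (80%), an inner 5-fold CV selects C for an
    RFE-SVM model. Returns outer test accuracies and one signature per split."""
    outer = RepeatedStratifiedKFold(n_splits=5, n_repeats=n_repeats,
                                    random_state=seed)
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    accuracies, signatures = [], []
    for train, test in outer.split(X, y):
        # A float step in (0, 1) makes scikit-learn remove that fraction of the
        # *initial* number of features at every RFE iteration.
        rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_select, step=step)
        # Inner CV: 4 folds train / 1 fold validation, on the 80% only.
        search = GridSearchCV(rfe, {"estimator__C": list(C_grid)}, cv=inner)
        search.fit(X[train], y[train])
        best = search.best_estimator_  # refit on the whole 80% with the best C
        accuracies.append(best.score(X[test], y[test]))  # held-out 20%
        signatures.append(np.flatnonzero(best.support_))  # selected feature indices
    return np.array(accuracies), signatures
```

With the default `n_repeats=20`, this yields the 100 accuracies and 100 feature signatures on which accuracy and stability are averaged.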

Fig. 1

Overview of the proposed method. The figure represents the nested 5-fold CV subsampling of the entire dataset (top-left gray bar). (Left) The outer CV is used to evaluate the performance of the model: 80% of the data, i.e., four folds (pink box), is used as the training set, where the best model and features are selected, and the remaining 20% is used as the test set to evaluate the model. (Right) Within the inner CV, four folds are used for training and the held-out fold for validation. The best model, features and parameters are selected according to the best CV accuracy. The outer CV is shuffled 20 times, generating 100 subsamplings of the dataset and therefore the same number of selected-feature ‘fingerprints’. The stability of the selected biomarkers and the final accuracy are assessed over all subsampling estimations.

Embedded feature selection (RFE-SVM)

In this work we use a linear SVM classifier (Boser et al., 1992). SVMs have shown state-of-the-art performance in computational biology (Ben-Hur et al., 2008), in particular for very high-dimensional problems, and scale well with the number of examples. Given a set of data examples $\mathbf{x}_i \in \mathbb{R}^N$, $i = 1, \dots, n$, and a vector $\mathbf{y} \in \{-1, +1\}^n$ representing the group membership of the data points, the SVM seeks the hyperplane with the largest distance to the nearest training points of either class. The problem can be written in its primal form (Boser et al., 1992):

$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \qquad (2)$$

where $\xi_i$ are slack variables accounting for the overlap between classes caused by noisy examples, and $\mathbf{w}$ is the normal vector of the hyperplane. A classifier that generalizes well is then found by controlling both the classifier capacity $\|\mathbf{w}\|^2$ and the sum of the slacks $\sum_i \xi_i$ (Boser et al., 1992). The solution of the dual problem expresses the hyperplane coefficients as a combination of the training examples $\mathbf{x}_i$ weighted by the Lagrange multipliers $\alpha_i$, which are non-zero only for the support vectors, i.e., the points lying on or inside the margin:

$$\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i. \qquad (3)$$

The coefficient $w_j$ can be interpreted as the strength of the contribution of feature $j$ to the decision function. Consequently, the squared weight $w_j^2$ can be used as a score to rank features from the most to the least important. Recursive feature elimination SVM (RFE-SVM) (Guyon et al., 2002) is an iterative algorithm that uses this ranking criterion to eliminate features in a backward fashion. Starting with the whole set of features, a linear SVM is trained on the training set and the features are ranked according to their weights. The least important features according to this criterion are then removed and the remaining ones are used to train a new model, repeating the process until a desired minimal subset of features is reached.
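To make the elimination loop explicit, here is a from-scratch sketch that ranks features by squared linear-SVM weights and drops a fraction of the *remaining* features at each iteration. The function `rfe_svm` and its `drop_fraction` parameter are hypothetical names, and this is not the scikit-learn `RFE` implementation used in the study (which, for a float step, removes a fraction of the initial feature count). For a linear kernel, scikit-learn's `coef_` corresponds to the dual expansion $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ over the support vectors (`dual_coef_ @ support_vectors_`); the sign convention of the binary case does not matter here because the scores are squared.

```python
import numpy as np
from sklearn.svm import SVC

def rfe_svm(X, y, n_select, drop_fraction=0.1, C=1.0):
    """Backward elimination ranked by squared linear-SVM weights w_j**2.
    Each iteration drops drop_fraction of the remaining features (at least one)
    until n_select features are left. Returns the kept column indices."""
    remaining = np.arange(X.shape[1])
    while remaining.size > n_select:
        svm = SVC(kernel="linear", C=C).fit(X[:, remaining], y)
        w = svm.coef_.ravel()  # primal weights, i.e. sum_i alpha_i y_i x_i
        scores = w ** 2        # ranking criterion
        n_drop = max(1, int(drop_fraction * remaining.size))
        n_drop = min(n_drop, remaining.size - n_select)  # do not overshoot
        keep = np.argsort(scores)[n_drop:]  # discard the lowest-scoring features
        remaining = np.sort(remaining[keep])
    return remaining
```

Recomputing the SVM after every elimination step is what distinguishes RFE from a single ranking: weights of the surviving features are re-estimated once correlated or redundant features have been removed.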
The RFE-SVM algorithm has a set of internal parameters influencing its computational cost and accuracy. The fraction E of features removed at each step of RFE (also called the step size) is critical for the running time. Dropping one feature at a time allows a finer selection but at a prohibitive computational cost, while setting the step size to 100% reduces RFE to a single SVM estimation that ranks all the features in one step. Following Abeel et al. (2010), by default we drop a fixed fraction E of the least relevant features at each iteration, and we report an additional sensitivity analysis of this parameter, with E varying up to 100 percent. A stopping criterion is also needed to end the iterative process: in our experiments, we drop features until a minimum percentage of selected features is reached (stopping criterion). Another critical parameter is the regularization constant C of the SVM (Eq. (2)). C controls the trade-off between the width of the margin and the misclassification of training points: a larger value yields a smaller-margin hyperplane that fits the training data more closely, at the risk of losing generalization, whereas a smaller C yields a larger margin at the cost of more misclassified points. This parameter therefore influences the classification accuracy of the model. We select the optimal C by a cross-validated grid search using only the training set, i.e., the four training folds (80% of the data) of the outer CV scheme. Finally, as a baseline, we introduce a random feature selector for each subsampling of the dataset, where a subset of features is selected uniformly at random before training the classifier. We used Python 2.7 to implement our approach, with the RFE-SVM implementation of scikit-learn 0.22.2.
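The random feature selector used as a baseline can be sketched as follows, under the assumption that it picks the same number of features as RFE-SVM, uniformly without replacement, and trains the same linear SVM on them. The function name `random_selector_accuracy` and its arguments are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def random_selector_accuracy(X_train, y_train, X_test, y_test, n_select,
                             C=1.0, seed=None):
    """Baseline: draw n_select feature indices uniformly at random (without
    replacement), train a linear SVM on them and return the test accuracy
    together with the random signature."""
    rng = np.random.default_rng(seed)
    signature = np.sort(rng.choice(X_train.shape[1], size=n_select, replace=False))
    clf = SVC(kernel="linear", C=C).fit(X_train[:, signature], y_train)
    return clf.score(X_test[:, signature], y_test), signature
```

Because its signatures are independent across subsamplings, this selector gives a chance-level reference for both accuracy and stability.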

Measuring the classification performance

Because our dataset is class-balanced and the task is binary classification, we adopt accuracy as the metric for classification performance. Accuracy is defined as the ratio of correct classifications to the total number of classifications:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad (4)$$

where TP is the number of true positives (control subjects classified correctly), TN the number of true negatives (patients classified correctly), FP the number of false positives (patients classified as control subjects) and FN the number of false negatives (control subjects misclassified as patients).
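A small worked example of the accuracy formula, assuming (hypothetically) that controls are encoded as label 1 and treated as the positive class, as in the definitions above:

```python
import numpy as np

def confusion_counts(y_true, y_pred, positive=1):
    """TP, TN, FP, FN with `positive` as the control label (assumed encoding)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == positive) & (y_pred == positive)))
    tn = int(np.sum((y_true != positive) & (y_pred != positive)))
    fp = int(np.sum((y_true != positive) & (y_pred == positive)))
    fn = int(np.sum((y_true == positive) & (y_pred != positive)))
    return tp, tn, fp, fn

def accuracy(y_true, y_pred, positive=1):
    """(TP + TN) / (TP + TN + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return (tp + tn) / (tp + tn + fp + fn)

# Three controls (1) and three patients (0): one error in each class.
print(accuracy([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))  # 4 / 6
```

Here TP = 2, TN = 2, FP = 1 and FN = 1, so the accuracy is 4/6.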

Assessing the stability of feature selection

We use the vectorized connectivity matrices of the connectomes as input features for the biomarker selection process. For a given connectome, a structural feature is thus the normalized connection density between two brain regions, whereas a functional feature is the absolute Pearson correlation between the time courses of two brain regions. The number of input features for each modality and resolution is shown in Table 1.

Table 1

Number of input features for each modality and resolution.

Resolution (number of ROIs)	83	129	234
Number of features (SC or FC)	3403	8256	27261
Number of features (MM)	6806	16512	54522

We consider a dataset with n = 54 subjects and N features, excluding self-loops, i.e., discarding the diagonal entries of the connectivity matrices. If we denote the connectome resolution (number of ROIs) by r, the number of input features is

$$N = \frac{r(r-1)}{2}, \qquad (5)$$

because the connectivity matrices are symmetric. Drawing k subsamplings and applying the feature selection procedure (RFE-SVM) to the 80% training portion of each subsampling, we obtain for each subsampling a feature signature, i.e., the set of indices of the selected features. Considering two signatures $S_i$ and $S_j$ obtained from different subsamplings $i$ and $j$, the stability index (Kuncheva, 2007) between $S_i$ and $S_j$ is defined as

$$KI(S_i, S_j) = \frac{oN - s^2}{s(N - s)}, \qquad (6)$$

where $o = |S_i \cap S_j|$ is the overlap between the two signatures and $s = |S_i| = |S_j|$, the size of the signature, is the number of selected features. This quantity measures the consistency between pairs of signatures. For fixed $s$ and $N$, KI increases with the overlap and reaches its maximum of 1 when the two subsets are identical. Its minimum, $-1$, is reached when the subsets are disjoint and $s = N/2$. The overall stability index for a sequence of signatures $\mathcal{S} = \{S_1, \dots, S_k\}$ is defined as the average of all pairwise stability indices over the k subsamplings:

$$KI(\mathcal{S}) = \frac{2}{k(k-1)} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} KI(S_i, S_j). \qquad (7)$$

Since KI is bounded between $-1$ and 1, the greater its value, the better the agreement between the selected subsets of features. A value of zero corresponds to the overlap expected by chance ($o = s^2/N$), so a value near or below zero indicates that any agreement between the selected biomarkers is no better than chance. In the sequel, we refer to the overall stability as the Kuncheva index (Kuncheva, 2007).
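The feature count and the Kuncheva index can be checked with a short sketch. It assumes that each connectome is vectorized as its upper triangle without the diagonal and that signatures are collections of feature indices of equal size; the helper names are hypothetical. The first test line reproduces the counts of Table 1.

```python
import itertools
import numpy as np

def n_edge_features(r):
    """Off-diagonal upper-triangular entries of a symmetric r x r matrix."""
    return r * (r - 1) // 2

def vectorize_connectome(M):
    """Flatten a symmetric connectivity matrix into its upper triangle (no diagonal)."""
    rows, cols = np.triu_indices(M.shape[0], k=1)
    return M[rows, cols]

def kuncheva_index(a, b, n_features):
    """KI(A, B) = (o * N - s**2) / (s * (N - s)) for two signatures of size s."""
    a, b = set(a), set(b)
    s = len(a)
    if len(b) != s:
        raise ValueError("both signatures must have the same size s")
    if s == 0 or s == n_features:
        raise ValueError("KI is undefined when s = 0 or s = N")
    overlap = len(a & b)  # o = size of the intersection
    return (overlap * n_features - s ** 2) / (s * (n_features - s))

def overall_stability(signatures, n_features):
    """Mean KI over all k(k-1)/2 unordered pairs of signatures."""
    scores = [kuncheva_index(a, b, n_features)
              for a, b in itertools.combinations(signatures, 2)]
    return float(np.mean(scores))
```

Subtracting $s^2$, the overlap expected by chance times $N$, is what makes KI comparable across signature sizes: two random signatures score around zero regardless of $s$.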

Results

Connectomes classification and features stability

First, we investigated the effect of different brain connectivity modalities and scales on the discrimination of patients and healthy controls. For each case, we varied the step size and the percentage of selected features of the RFE-SVM algorithm, assessing their impact on the classification accuracy and on the stability of the selected features. Fig. 2, Fig. 4 and Fig. 5 show the average classification accuracy after performing RFE-SVM, as well as the stability of the selected biomarkers, across modalities and scales. Across scales, the functional connectivity matrices (Fig. 4) achieve better accuracies than the structural ones (Fig. 2), whereas the structural matrices are more stable than the functional ones. When combining the two modalities, i.e., concatenating the features of both modalities and letting the algorithm choose a blend of structural and functional features, we achieve the best overall performance (Fig. 5), in the sense of the best trade-off between accuracy and stability. Note that in all figures the curves for different step sizes largely overlap, showing that the number of features dropped at each iteration of the RFE-SVM algorithm, i.e., the step size, is not a critical parameter for this dataset. In the multi-modal case, the percentage of finally selected features is divided by two, since we combine twice as many features as with structural or functional connectivity alone. Table 2 shows the resulting number of selected biomarkers across modalities and resolutions.
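The multi-modal input and the halved percentage can be sketched as follows. The sketch assumes that structural features occupy the first N columns and functional features the next N, and that the number of selected features is the percentage of N rounded down; this rounding appears consistent with every entry of Table 2 but is an assumption, not a documented setting. The function names are hypothetical.

```python
import numpy as np

def multimodal_features(sc_vec, fc_vec):
    """Concatenate SC and FC edge vectors (subjects x N each) into subjects x 2N.
    Columns [0, N) are structural, columns [N, 2N) are functional."""
    return np.hstack([sc_vec, fc_vec])

def n_selected(n_features, percent):
    """Number of features kept at a given percentage, rounded down (assumed)."""
    return int(n_features * percent / 100)

# Selecting p/2 percent of 2N multimodal features keeps as many as p percent of N.
n = 83 * 82 // 2  # 3403 features at the 83-ROI resolution
print(n_selected(n, 0.5), n_selected(2 * n, 0.25))  # 17 17
```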

Fig. 2

Each column represents the mean accuracy of the outer CV folds ± standard deviation (top) and stability (bottom) for a given scale of the structural connectome. The step size corresponds to the percentage of features dropped at each step of the RFE-SVM algorithm. The red curve represents a random selection of features.

Fig. 4

Each column represents the mean accuracy of the outer CV folds ± standard deviation (top) and stability (bottom) for a given scale of the functional connectome. The step size corresponds to the percentage of features dropped at each step of the RFE-SVM algorithm. The red curve represents a random selection of features.

Fig. 5

Each column represents the mean accuracy of the outer CV folds ± standard deviation (top) and stability (bottom) for a given scale of the multimodal connectome. The step size corresponds to the percentage of features dropped at each step of the RFE-SVM algorithm. The red curve represents a random selection of features.

Table 2

Number of selected features for the SC, FC and MM connectomes.

% selected (SC/FC)	% selected (MM)	83×83	129×129	234×234
0.5 %	0.25 %	17	41	136
1 %	0.5 %	34	82	272
2 %	1 %	68	165	545
5 %	2.5 %	170	412	1363
10 %	5 %	340	825	2726
25 %	12.5 %	850	2064	6815
50 %	25 %	1701	4128	13630

We observe in all cases that the stability increases with the percentage of selected features, which is expected since overlap between signatures in Eq. (6) is more likely when more features are selected. To further investigate the effect of modalities and resolutions on both classification accuracy and stability, we select the best scores, i.e., the best accuracy with its associated stability, from Fig. 2, Fig. 4 and Fig. 5, and plot them in Fig. 3. As can be seen in Fig. 3, the 234 × 234 structural resolution has the lowest accuracy and stability. A plausible explanation is that smaller ROIs are more susceptible to noise introduced by differences in the quality of the alignment of the parcellation to each subject's native scan. Moreover, averaging over fewer voxels makes the signal more variable across subjects, which affects the stability of the method.

Fig. 3

Best accuracy versus stability for all considered modalities and resolutions.

To quantify the respective involvement of features extracted from both modalities, Fig. 7 summarizes the share of structural features among all selected multimodal features. Our method always selects more functional features than structural ones. This is consistent with the better accuracies obtained with functional features and with the fact that RFE-SVM ultimately optimizes accuracy. However, the share of selected structural features is always non-zero (although it decreases with connectome resolution), indicating that structural features contribute to the retrieved feature sets. Interestingly, structural and functional features alone achieve the best stability and the best accuracy, respectively, whereas the multimodal combination achieves the best trade-off between both metrics at all resolutions.
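The share of structural features shown in Fig. 7 can be computed from the multimodal signatures with a one-line sketch, assuming the column layout in which indices below N are structural and the remaining ones functional. The function name `structural_share` is hypothetical.

```python
import numpy as np

def structural_share(signatures, n_per_modality):
    """Mean fraction of selected multimodal features that are structural,
    assuming columns [0, N) are SC and [N, 2N) are FC."""
    shares = [np.mean(np.asarray(sig) < n_per_modality) for sig in signatures]
    return float(np.mean(shares))
```

Averaging the per-subsampling fractions gives one value per resolution, which is what a summary such as Fig. 7 reports.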

Fig. 7

Percentage of structural and functional features selected from the multi-modal connectome across resolutions.

Identification of brain regions in schizophrenia diagnosis

We now identify the brain areas involved in the classification of patients and controls. To simplify the identification of brain regions and the comparison with other studies, we analyze the results for the multi-modal connectome and report the results for the other resolutions in the appendix. Selected features in graph space correspond to links representing either connection densities in the structural matrices or absolute Pearson correlations in the functional connectome (see Fig. 6). Furthermore, the frequency with which each feature is selected across subsamplings reflects the overall relevance of the corresponding edge for the classification. In other words, the frequency of an edge indicates the importance of the associated ROIs in the classification task.
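Mapping signatures back to graph space and counting selection frequencies can be sketched as below. This assumes features were vectorized in the row-major upper-triangle order of `np.triu_indices(r, k=1)`; for the multimodal connectome it would be applied separately to the structural indices (below N) and to the functional indices (shifted by N). The function name is hypothetical.

```python
import numpy as np

def selection_frequency_matrix(signatures, r):
    """W[i, j] = number of subsamplings in which edge (i, j) was selected.
    Feature index f is mapped to the f-th upper-triangular pair (k=1)."""
    rows, cols = np.triu_indices(r, k=1)
    W = np.zeros((r, r), dtype=int)
    for sig in signatures:
        sig = np.asarray(sig)
        np.add.at(W, (rows[sig], cols[sig]), 1)  # count each selected edge once
    return W + W.T  # symmetric, zero diagonal
```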

Fig. 6

Selected markers in the multimodal connectome. Colormap shows the number of times each feature is selected across subsamplings.

Given a brain connectivity matrix at resolution r, we define W as the r × r matrix whose element $W_{ij}$ encodes the frequency with which edge $(i, j)$ is selected as relevant across subsamplings (see Fig. 6). The degree of relevance of ROI $i$ then reads:

$$d_i = \sum_{j=1}^{r} W_{ij}. \qquad (8)$$

Fig. 8 and Fig. 9 show the degree of relevance of brain regions for the multi-modal connectome (structural and functional modes, respectively), sorted in decreasing order. We define the affected core (a-core) as the brain areas whose degree of relevance is higher than the overall average. As can be seen in Fig. 8, our affected core partially overlaps the a-core defined by Griffa et al. (2015) (shown as blue bars). In the left hemisphere, the caudalmiddlefrontal, inferiortemporal, postcentral, precentral, parsopercularis and parsorbitalis regions are included in Griffa’s a-core, whereas our method assigns them a lower degree of relevance and therefore excludes them. Similarly, in the right hemisphere, our method excludes regions of Griffa’s a-core such as caudalmiddlefrontal, medialorbitofrontal, parstriangularis, postcentral, rostralmiddlefrontal and supramarginal.
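Given the frequency matrix W, the degree of relevance $d_i = \sum_j W_{ij}$ and the above-average a-core can be sketched as follows. The function names and the ROI labels in the example are hypothetical.

```python
import numpy as np

def degree_of_relevance(W):
    """d_i = sum_j W_ij: how often edges touching ROI i were selected."""
    return np.asarray(W).sum(axis=1)

def affected_core(W, roi_names):
    """ROIs whose degree of relevance exceeds the average, sorted by degree."""
    d = degree_of_relevance(W)
    order = np.argsort(d)[::-1]  # decreasing degree of relevance
    return [roi_names[i] for i in order if d[i] > d.mean()]

W = np.array([[0, 3, 1],
              [3, 0, 0],
              [1, 0, 0]])
print(affected_core(W, ["A", "B", "C"]))  # degrees 4, 3, 1; mean 8/3
```

Here regions A and B, with degrees 4 and 3, lie above the mean of 8/3 and form the a-core.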

Fig. 8

Degree of relevance for ROIs in the structural mode of the multimodal connectome. Blue bars mark ROIs that also belong to Griffa’s a-core. The horizontal line is the average degree of relevance. ROIs above the mean belong to our a-core.

Fig. 9

Degree of relevance for ROIs in the functional mode of the multimodal connectome. The horizontal line is the average degree of relevance. ROIs above the mean belong to our a-core.

We plot these regions on the brain surface in Fig. 10 and Fig. 11, after normalizing both SC and FC by the sum of all their connections, showing only the regions above the mean of the distribution.
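One possible reading of the normalization step is sketched below, under the assumption that "the sum of all their connections" means the total sum of each mode's selection-frequency matrix. Dividing by that total puts SC and FC relevance on a common scale (the normalized degrees of each mode sum to 1) while leaving the set of above-mean regions unchanged. The function names and matrices are hypothetical.

```python
def normalized_relevance(W):
    """Degree of relevance divided by the total sum of W, so the values sum to 1."""
    total = sum(sum(row) for row in W)
    if total == 0:
        raise ValueError("W has no selected edges")
    return [sum(row) / total for row in W]


def above_mean(values):
    """Indices of the entries strictly above the mean of the distribution."""
    mean_v = sum(values) / len(values)
    return [i for i, v in enumerate(values) if v > mean_v]


# Toy SC and FC selection-frequency matrices on the same 4 hypothetical ROIs.
W_sc = [[0, 3, 0, 0], [3, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
W_fc = [[0, 10, 20, 0], [10, 0, 0, 0], [20, 0, 0, 10], [0, 0, 10, 0]]

sc = normalized_relevance(W_sc)
fc = normalized_relevance(W_fc)
print(above_mean(sc), above_mean(fc))  # [0, 1] [0, 2]
```

Because the threshold is the mean of the same distribution, rescaling W does not change which regions are plotted; the normalization only matters for displaying both modes on comparable color scales.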

Fig. 10

Brain surface representation of the brain areas whose degree of relevance is above the average, for the structural mode of the multi-modal connectome.

Fig. 11

Brain surface representation of the brain areas whose degree of relevance is above the average, for the functional mode of the multi-modal connectome.

Discussion

This paper has investigated the effect of different connectivity modes of the human connectome on the selection of robust biomarkers for identifying subjects with schizophrenia. We performed an automatic feature selection process on the edge space, aiming to retrieve a compact subset of meaningful biomarkers that accurately discriminate patients with schizophrenia from healthy controls. We also analyzed the robustness of the retrieved features with respect to sample variation, based on the premise that stable biomarkers should not change dramatically across different subsamplings of the underlying dataset (Abeel et al., 2010). We found that combining structural and functional connectivity matrices into a multi-modal representation of the connectome provides the best trade-off between highly accurate and stable biomarkers in schizophrenia classification. From the retrieved features, we identified the affected core of the pathology and carried out a confirmatory analysis of the structural alterations of this core, which showed a decrease in both gFA and iADC for edges within the a-core, with no change for edges outside the a-core, in patients compared to healthy subjects. It is well known that anatomical connections constrain functional communication (Deco et al., 2013). Structural and functional connections have been shown to correlate (Hagmann et al., 2008), and white-matter pathways have accurately predicted functional connectivity (Honey et al., 2009). Nevertheless, each modality can yield unique information about the pathology. The fact that the best accuracy and stability were achieved by combining SC and FC features suggests that the two modalities are not tightly coupled (Skudlarski et al., 2010). Furthermore, it highlights the importance of considering both structure and function when explaining complex neural disorders, at least insofar as their prediction is concerned.
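This section does not specify which stability index was used. As one common choice for comparing feature subsets of equal size, such as those obtained when every subsampling retains the same number of features, the sketch below computes Kuncheva's consistency index averaged over all pairs of subsamplings. Treat it as an illustrative stand-in rather than the paper's exact measure; the function names are hypothetical.

```python
from itertools import combinations


def kuncheva_index(a, b, n_features):
    """Kuncheva's consistency index between two feature subsets of equal size k
    drawn from n_features features: (r*n - k^2) / (k*(n - k)), r = |a & b|.
    Equals 1 for identical subsets and has expected value 0 for random subsets."""
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k or not 0 < k < n_features:
        raise ValueError("subsets must have equal size k with 0 < k < n_features")
    r = len(a & b)
    return (r * n_features - k * k) / (k * (n_features - k))


def average_stability(subsets, n_features):
    """Mean pairwise consistency over all pairs of subsamplings."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)


# Three hypothetical subsamplings, each keeping 2 of 10 edge features.
runs = [{0, 1}, {0, 1}, {0, 2}]
print(kuncheva_index({0, 1}, {0, 1}, 10))  # 1.0
print(kuncheva_index({0, 1}, {0, 2}, 10))  # 0.375
print(average_stability(runs, 10))
```

Unlike a raw overlap count, this index corrects for the overlap expected by chance, which grows as k approaches the total number of features.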
Based on the frequency with which the RFE-SVM algorithm selects relevant edges, we can map edges to the node space through node strengths, where each edge represents a connection density (SC) or a correlation (FC) between two brain regions. The degree of relevance of the nodes allows us to identify the affected core as the brain regions that are most involved in the classification during the training phase, that is, whose edges are most often selected as relevant. Our findings partly overlap with the results of Griffa et al. (2015) and provide further evidence of the brain regions involved in the pathology. Furthermore, they offer a new perspective by defining the affected core from both SC and FC, yielding an SC-FC a-core that might give a richer description of the regions affected in SCHZ (Griffa et al., 2015, Lynall et al., 2010). Finally, this work was performed on a small dataset, which does not allow regions to be identified with great confidence. In addition, medication may introduce a confound, since 24 of the 27 patients were medicated, unlike the control group. We therefore present this work as a proof of concept, keeping in mind the importance of applying it to larger datasets and especially to subjects with an At Risk Mental State (ARMS) who have not yet developed a full-blown illness.
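The confirmatory comparison of edges within and outside the a-core, mentioned above, can be sketched as follows. We assume, as a hypothetical convention not stated in this section, that an edge lies within the a-core when both of its endpoints are a-core ROIs and outside otherwise, and that absent connections are marked None; the toy edge-wise values stand in for a measure such as gFA or iADC.

```python
def split_edges_by_core(n_rois, core):
    """Partition all edges (i, j), i < j, into those with both endpoints in the
    a-core and all remaining edges (assumed convention)."""
    core = set(core)
    within, outside = [], []
    for i in range(n_rois):
        for j in range(i + 1, n_rois):
            if i in core and j in core:
                within.append((i, j))
            else:
                outside.append((i, j))
    return within, outside


def mean_over_edges(M, edges):
    """Mean of an edge-wise measure M[i][j] over the given edges, skipping
    absent connections (None)."""
    values = [M[i][j] for i, j in edges if M[i][j] is not None]
    return sum(values) / len(values) if values else float("nan")


# Toy symmetric edge-wise measure for 4 hypothetical ROIs; a-core = {0, 1}.
M = [[None, 0.5, 0.25, None],
     [0.5, None, 0.75, None],
     [0.25, 0.75, None, 0.25],
     [None, None, 0.25, None]]
within, outside = split_edges_by_core(4, core=[0, 1])
print(within)  # [(0, 1)]
print(mean_over_edges(M, within), mean_over_edges(M, outside))
```

In a group study, such per-subject means would then be compared between patients and controls; the statistical test is outside the scope of this sketch.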

Conclusion

In summary, we investigated the classification of schizophrenic patients versus healthy subjects using the structural, functional and multimodal connectomes at varying atlas resolutions. Moreover, we focused on the robustness of the selected features, thus extending the evaluation of the classification to the trade-off between high accuracy and stability. We showed that combining structural and functional connections achieves the best performance, possibly due to the complementary information the two modalities carry. Lastly, we used the biologically relevant features to define the affected core of the pathology and confirmed structural alterations in the affected-core edges of schizophrenic patients compared with healthy controls. By adding stability analysis to the classification of human pathologies, we hope this study is a step in the right direction for diagnosis and treatment in clinical practice.

CRediT authorship contribution statement

Leonardo Gutiérrez-Gómez: Conceptualization, Methodology, Software, Writing - review & editing. Jakub Vohryzek: Conceptualization, Methodology, Investigation, Writing - review & editing. Benjamin Chiêm: Conceptualization, Methodology, Software, Writing - original draft. Philipp S. Baumann: Validation, Writing - original draft. Philippe Conus: Investigation. Kim Do Cuenod: Investigation. Patric Hagmann: Supervision, Validation. Jean-Charles Delvenne: Conceptualization, Supervision.

References (37 in total)

1. Stable feature selection for biomarker discovery.

Authors: Zengyou He; Weichuan Yu
Journal: Comput Biol Chem Date: 2010-08-10

2. The environment and schizophrenia.

Authors: Jim van Os; Gunter Kenis; Bart P F Rutten
Journal: Nature Date: 2010-11-11

3. Resting-state functional connectivity emerges from structurally and dynamically shaped slow linear fluctuations.

Authors: Gustavo Deco; Adrián Ponce-Alvarez; Dante Mantini; Gian Luca Romani; Patric Hagmann; Maurizio Corbetta
Journal: J Neurosci Date: 2013-07-03

4. Structural connectomics in brain diseases.

Authors: Alessandra Griffa; Philipp S Baumann; Jean-Philippe Thiran; Patric Hagmann
Journal: Neuroimage Date: 2013-04-25

5. Antipsychotic dose equivalents and dose-years: a standardized method for comparing exposure to different drugs.

Authors: Nancy C Andreasen; Marcus Pressler; Peg Nopoulos; Del Miller; Beng-Choon Ho
Journal: Biol Psychiatry Date: 2009-11-07

6. Mapping grey matter reductions in schizophrenia: an anatomical likelihood estimation analysis of voxel-based morphometry studies.

Authors: A Fornito; M Yücel; J Patti; S J Wood; C Pantelis
Journal: Schizophr Res Date: 2009-01-20

7. Mapping the structural core of human cerebral cortex.

Authors: Patric Hagmann; Leila Cammoun; Xavier Gigandet; Reto Meuli; Christopher J Honey; Van J Wedeen; Olaf Sporns
Journal: PLoS Biol Date: 2008-07-01

8. Discriminative analysis of schizophrenia using support vector machine and recursive feature elimination on structural MRI images.

Authors: Xiaobing Lu; Yongzhe Yang; Fengchun Wu; Minjian Gao; Yong Xu; Yue Zhang; Yongcheng Yao; Xin Du; Chengwei Li; Lei Wu; Xiaomei Zhong; Yanling Zhou; Ni Fan; Yingjun Zheng; Dongsheng Xiong; Hongjun Peng; Javier Escudero; Biao Huang; Xiaobo Li; Yuping Ning; Kai Wu
Journal: Medicine (Baltimore) Date: 2016-07

9. Hierarchical organization of human cortical networks in health and schizophrenia.

Authors: Danielle S Bassett; Edward Bullmore; Beth A Verchinski; Venkata S Mattay; Daniel R Weinberger; Andreas Meyer-Lindenberg
Journal: J Neurosci Date: 2008-09-10

10. Model-based whole-brain effective connectivity to study distributed cognition in health and disease.

Authors: Matthieu Gilson; Gorka Zamora-López; Vicente Pallarés; Mohit H Adhikari; Mario Senden; Adrià Tauste Campo; Dante Mantini; Maurizio Corbetta; Gustavo Deco; Andrea Insabato
Journal: Netw Neurosci Date: 2020-04-01
