Amirali Kazeminejad, Roberto C Sotero.
Abstract
Automatic algorithms for disease diagnosis are being thoroughly researched for use in clinical settings. They usually rely on pre-identified biomarkers to highlight the existence of certain problems. However, finding such biomarkers for neurodevelopmental disorders such as Autism Spectrum Disorder (ASD) has challenged researchers for many years. With enough data and computational power, machine learning (ML) algorithms can be used to interpret the data and extract the best biomarkers from thousands of candidates. In this study, we used the fMRI data of 816 individuals enrolled in the Autism Brain Imaging Data Exchange (ABIDE) to introduce a new biomarker extraction pipeline for ASD that relies on the use of graph theoretical metrics of fMRI-based functional connectivity to inform a support vector machine (SVM). Furthermore, we split the dataset into 5 age groups to account for the effect of aging on functional connectivity. Our methodology achieved better results than most state-of-the-art investigations on this dataset, with the best model for the >30 years age group achieving an accuracy, sensitivity, and specificity of 95, 97, and 95%, respectively. Our results suggest that measures of centrality provide the highest contribution to the classification power of the models.
Keywords: ABIDE; SVM–support vector machine; brain connectivity; fMRI; graph theory; machine learning
Year: 2019 PMID: 30686984 PMCID: PMC6335365 DOI: 10.3389/fnins.2018.01018
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
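Distribution of the data.
| Site (samples per time-series) | 5–10 | 10–15 | 15–20 | 20–30 | >30 | Total |
| --- | --- | --- | --- | --- | --- | --- |
| CALTECH (146) | 0 | 0 | 3 | 13 | 5 | 20 |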
| CMUA (236) | 0 | 0 | 0 | 1 | 0 | 1 |
| KKI (152) | 20 | 18 | 0 | 0 | 0 | 38 |
| LEUVEN1 (246) | 0 | 0 | 8 | 21 | 1 | 29 |
| LEUVEN2 (246) | 0 | 24 | 7 | 0 | 0 | 31 |
| MAXMUNA (116) | 0 | 0 | 1 | 3 | 7 | 11 |
| MAXMUNB (116) | 0 | 0 | 0 | 2 | 5 | 6 |
| MAXMUNC (116) | 0 | 0 | 0 | 13 | 3 | 14 |
| MAXMUND (196) | 2 | 5 | 1 | 0 | 2 | 9 |
| NYU (176) | 35 | 66 | 26 | 34 | 5 | 166 |
| OHSU (78) | 7 | 15 | 1 | 0 | 0 | 23 |
| OLIN (206) | 0 | 9 | 14 | 7 | 0 | 25 |
| PITT (196) | 0 | 16 | 4 | 7 | 5 | 32 |
| SBL (196) | 0 | 0 | 0 | 3 | 6 | 8 |
| SDSU (176) | 1 | 17 | 9 | 0 | 0 | 27 |
| STANFORD (176–236) | 21 | 15 | 0 | 0 | 0 | 36 |
| TRINITY (146) | 0 | 14 | 20 | 9 | 0 | 43 |
| UCLA1 (116) | 3 | 38 | 14 | 0 | 0 | 55 |
| UCLA2 (116) | 1 | 18 | 1 | 0 | 0 | 20 |
| UM1 (296) | 9 | 42 | 32 | 0 | 0 | 82 |
| UM2 (296) | 0 | 13 | 16 | 2 | 0 | 31 |
| USM (236) | 0 | 6 | 21 | 22 | 12 | 61 |
| YALE (196) | 10 | 26 | 12 | 0 | 0 | 48 |
| All Sites | 109 | 342 | 190 | 137 | 51 | 816 |
Number of participants from each site in each age group (in years), together with the overall number of participants from that site used in this study. The last row shows the total number of subjects in each age range. The number of samples per fMRI time-series is given in parentheses in the first column. The Stanford time-series did not have a consistent number of samples, so the number is presented as a range.
Figure 1. Graphical framework of the experiment. (A) Raw fMRI images of subjects; (B) After preprocessing, the brain is divided into 116 regions of interest (ROI); (C) By averaging the BOLD activity in each ROI, a time-series is extracted representing brain activity in that region; (D) Using different measures of connectivity, a connectivity matrix is generated from the ROI time-series, quantifying the connectivity level between individual ROIs; (E) By treating the ROIs as graph nodes and the connectivity matrix as graph weights, the brain network is expressed in graph form; (F) A threshold is applied to keep only the strongest connections; (G) Graph theoretical analysis is applied to the resulting graph from part F to obtain a feature vector for each subject; (H) A wrapper method called sequential feature selection is applied to choose a handful of features that contribute to the highest classification accuracy; (I,J) The resulting feature subset is passed to a linear SVM, which trains a model to distinguish between ASD and HC.
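Steps C through G of Figure 1 can be illustrated with a minimal Python sketch. It assumes Pearson correlation as the connectivity measure, a proportional threshold on absolute edge weights, and degree centrality as the only graph metric. These choices, the function names, and the toy data are illustrative assumptions, not necessarily the settings used in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length time-series.

    Assumes neither series is constant (no zero-variance handling).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def connectivity_matrix(series):
    """Step D: symmetric ROI-by-ROI matrix with a zero diagonal."""
    n = len(series)
    conn = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            conn[i][j] = conn[j][i] = pearson(series[i], series[j])
    return conn


def threshold_strongest(conn, keep_fraction):
    """Step F: keep the given fraction of edges with the largest |weight|.

    Returns a binary adjacency matrix. Using absolute values is an
    assumption made for this sketch.
    """
    n = len(conn)
    weights = sorted(
        (abs(conn[i][j]) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    n_keep = max(1, round(keep_fraction * len(weights)))
    cutoff = weights[n_keep - 1]
    return [
        [1 if i != j and abs(conn[i][j]) >= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


def degree_centrality(adj):
    """Step G: fraction of the other nodes each node is connected to."""
    n = len(adj)
    return [sum(row) / (n - 1) for row in adj]


# Toy example: 4 hypothetical ROIs with 5 samples each.
roi_series = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 4, 3, 2, 1],
    [1, 3, 2, 5, 4],
]
conn = connectivity_matrix(roi_series)
adj = threshold_strongest(conn, keep_fraction=0.5)
features = degree_centrality(adj)  # one feature per ROI for this subject
print(features)
```

In a full pipeline, the per-subject feature vector would concatenate several such graph metrics before feature selection and classification.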
Figure 2. Comparison of model performance. Left column: accuracy of the models trained using features extracted from the pipeline specified on the x axis, for the age range specified on the far left (in years). Y-axis labels specify the chance level for the classification task. The top-performing model is highlighted in dark blue. Middle column: p-values of the Welch's t-test performed on the models trained on different pipelines. Statistical significance (p < 0.05) is highlighted in dark blue. Right column: FDR-corrected p-values based on the Benjamini-Hochberg method (Benjamini and Hochberg, 1995). The corrected p-values were capped at 1, so any value above that threshold was set to 1.
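The two statistical steps named in the Figure 2 caption can be sketched as follows. The first function computes the Welch t statistic and its Welch-Satterthwaite degrees of freedom; turning these into a p-value needs the Student t distribution, which is not in the Python standard library (in practice one might use `scipy.stats.ttest_ind` with `equal_var=False`). The second function computes Benjamini-Hochberg adjusted p-values capped at 1. Function names and the example values are hypothetical.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for two samples
    with possibly unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    se_a, se_b = var_a / n_a, var_b / n_b
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(
        se_a + se_b
    )
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is scaled by m / rank, then made monotone by taking the
    running minimum from the largest rank downward. Starting the running
    minimum at 1.0 caps every adjusted value at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from pairwise pipeline comparisons.
raw_p = [0.01, 0.04, 0.03, 0.5]
print(benjamini_hochberg(raw_p))
```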
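Classification performance of the best models.
| Age range (years) | Pipeline | Accuracy (%) | Sensitivity (%) | Specificity (%) |
| --- | --- | --- | --- | --- |
| 5–10 | Concatenation | 86 | 91 | 79 |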
| 10–15 | Concatenation | 69 | 80 | 55 |
| 15–20 | Spearman | 78 | 80 | 76 |
| 20–30 | Mutual information | 80 | 87 | 69 |
| >30 | Covariance | 95 | 91 | 97 |
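For reference, accuracy, sensitivity, and specificity follow directly from the counts in a binary confusion matrix. The sketch below assumes ASD is treated as the positive class and uses made-up counts; neither detail is taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp / fn: positive-class subjects (assumed here to be ASD) classified
             correctly / incorrectly.
    tn / fp: negative-class subjects (assumed here to be HC) classified
             correctly / incorrectly.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts, for illustration only.
acc, sens, spec = classification_metrics(tp=9, fn=1, tn=8, fp=2)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Because sensitivity and specificity are computed per class, they can differ widely even when accuracy looks reasonable, as in the 10–15 row above.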
Figure 3. Visualization of the top 10 selected features for each age range. Two age ranges show only 9 features: in the 5–10 range, PreCG.L was selected twice, and in the >30 group, the last selected feature was the global characteristic path length. The full region names along with the abbreviations can be found in Supplementary Table B.
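Previous model performance on the ABIDE dataset.
| Accuracy (%) | Sensitivity (%) | Specificity (%) | Classifier | Study |
| --- | --- | --- | --- | --- |
| 69 | 72 | 67 | Linear SVM | Plitt et al., |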
| 66 | 60 | 72 | Gaussian SVM | Chen et al., |
| 90.8 | 89 | 93 | Random forest | Chen et al., |
| 67 | NA | NA | SVC | Abraham et al., |
| 70 | 74 | 63 | Deep neural network | Heinsfeld et al., |