Yaser ElNakieb1, Mohamed T Ali1, Ahmed Elnakib1, Ahmed Shalaby1, Ahmed Soliman1, Ali Mahmoud1, Mohammed Ghazal2, Gregory Neal Barnes3, Ayman El-Baz1.
Abstract
Autism spectrum disorder (ASD) is a combination of developmental anomalies that causes social and behavioral impairments, affecting around 2% of US children. Common symptoms include difficulties in communication and interaction, and behavioral disabilities. Symptoms can begin in early childhood, yet repeated visits to a pediatric specialist are needed before a diagnosis is reached. Even then, the diagnosis is usually subjective, and scores can vary from one specialist to another. Previous literature suggests that differences in brain development, environmental factors, and/or genetic factors play a role in developing autism, yet the exact pathology of this disorder is still unknown. Currently, the gold standard diagnosis of ASD is a set of diagnostic evaluations, such as the Autism Diagnostic Observation Schedule (ADOS) or the Autism Diagnostic Interview-Revised (ADI-R) report. These gold standard instruments involve an intensive, lengthy, and subjective process comprising a set of behavioral and communication tests and clinical history information, conducted by a team of qualified clinicians. Emerging advancements in neuroimaging and machine learning techniques can provide a fast and objective alternative to conventional repetitive observational assessments. This paper provides a thorough study of implementing feature engineering tools to find discriminant insights from brain imaging of white matter connectivity and of using a machine learning framework for accurate classification of autistic individuals. This work highlights important findings on the impacted brain areas that contribute to an autism diagnosis and presents promising accuracy results. We verified our proposed framework on a large publicly available DTI dataset of 225 subjects from the Autism Brain Imaging Data Exchange-II (ABIDE-II) initiative, achieving a high global balanced accuracy over the 5 sites of up to 99% with 5-fold cross validation. The data used was slightly unbalanced, including 125 autistic subjects and 100 typically developed (TD) ones. The achieved balanced accuracy of the proposed technique is the highest in the literature, which underscores the importance of the feature engineering steps involved in extracting useful knowledge and the promising potential of adopting neuroimaging for the diagnosis of autism.
Keywords: ABIDE-II; DTI; autism spectrum disorder (ASD); diagnosis; neuroimaging
Year: 2021 PMID: 34960265 PMCID: PMC8703859 DOI: 10.3390/s21248171
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. (a) Pipeline of the DTI-diagnosis algorithm. (b) Usage of the new derived feature representation and feature selection before classification.
Hyper-parameter values used in a cross-validated grid search. Names in parentheses are the parameter names in the ML package.
| Classifier | Hyper-Parameter | Range/Values |
|---|---|---|
| (1) LSVM | Regularization (C) | 0.1, 1, 5, 10 |
| | Loss function (loss) | squared_hinge, hinge |
| | Penalization strategy (penalty) | L1, L2 |
| (2) LR | Penalization strategy (penalty) | L1, L2, elastic |
| | Regularization (C) | 0.1, 1, 5, 10 |
| | Solver algorithm (solver) | newton-cg, lbfgs, liblinear, sag, saga |
| (3) PassiveAggressive | Regularization (C) | 0.1, 1, 5, 10 |
| | N idle iterations before stop (n_iter_no_change) | 1, 5, 10 |
| (4) Nonlinear-SVM | Regularization (C) | 0.1, 1, 5, 10 |
| | Kernel used (kernel) | rbf, poly, sigmoid |
| | Polynomial kernel degree (degree) | 2–6 |
| | Kernel coefficient (gamma) | scale, auto |
| | Independent term in kernel function (coef0) | 0.0, 0.01, 0.1, 1, 5, 10, 50, 100 |
| (5) GNB | Default parameters | priors = None, var_smoothing = |
| (6) RF | Number of features to consider when looking for the best split (max_features) | auto, sqrt, log2 |
| | Number of trees in the forest (n_estimators) | 50, 100, 200, 500, 1000 |
| | Function to measure the quality of a split (criterion) | gini, entropy |
| | Bootstrap samples when building trees (bootstrap) | True, False |
| | Min # of samples required to split an internal node (min_samples) | 1, 2, 5, 10 |
| (7) XGB | Which booster to use (booster) | gbtree, gblinear, dart |
| | Learning rate (learning_rate) | 0.001, 0.01, 0.1, 0.3, 0.5, 1 |
| | Min loss reduction required to make a further partition on a leaf node (gamma) | 0, 0.1, 0.5, 1, 1.5, 2, 5, 20, 50, 100 |
| | Min sum of instance weight needed in a child (min_child_weight) | 0.1, 0.5, 1, 5, 10 |
| | Subsample ratio of columns when constructing each tree (colsample_bytree) | 0.6, 0.8, 1.0 |
| | L2 regularization term on weights (lambda) | 0, 0.001, 0.5, 1, 10 |
| | L1 regularization term on weights (alpha) | 0, 0.001, 0.5, 1, 10 |
| (8) NN | Hidden layer sizes (hidden_layer_sizes) | (150,100,50,), (100,50,25,), (100,) |
| | Activation function (activation) | tanh, relu, logistic |
| | Solver used for weight optimization (solver) | lbfgs, sgd, adam |
| | L2 regularization penalty (alpha) | 0.0001, 0.001, 0.01, 0.05, 0.1, 0.5 |
| | Initial learning rate (learning_rate) | constant, adaptive |
| | Exponential decay rate for estimates of first moment vector in adam (beta_1) | 0, 0.001, 0.01, 0.1, 0.3, 0.5, 0.9 |
| | Exponential decay rate for estimates of second moment vector in adam (beta_2) | 0, 0.001, 0.01, 0.1, 0.3, 0.5, 0.9 |
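To give a sense of what a grid search over these ranges involves, the following minimal sketch enumerates every candidate setting for the nonlinear SVM row of the table. The helper `expand_grid` and the variable names are hypothetical and not taken from the paper; the values are copied from the table, with the degree range 2–6 written out as integers.

```python
from itertools import product

# Search ranges for the nonlinear SVM, copied from the table above.
# The dictionary layout and names are illustrative assumptions.
svm_grid = {
    "C": [0.1, 1, 5, 10],
    "kernel": ["rbf", "poly", "sigmoid"],
    "degree": [2, 3, 4, 5, 6],
    "gamma": ["scale", "auto"],
    "coef0": [0.0, 0.01, 0.1, 1, 5, 10, 50, 100],
}

def expand_grid(grid):
    """Yield one dict per combination of hyper-parameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

candidates = list(expand_grid(svm_grid))
print(len(candidates), "candidate settings")
```

An exhaustive search fits each candidate once per fold, so the cost grows with the product of the range sizes. Some combinations are redundant in practice (for example, a polynomial degree has no effect on a non-polynomial kernel), which is one reason a grid like this is often pruned before use.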
Figure 2. Histogram of types of selected summary statistic features. (a) Occurrences of each feature type; (b) occurrences of each summary statistic.
The fixed hyper-parameters found to optimize performance on the set of tested classifiers.
| Classifier | Fixed hyper-parameters |
|---|---|
| lSVM | {'penalty': 'l2', 'loss': 'hinge', 'C': 1} |
| pagg | {'n_iter_no_change': 5, 'C': 0.1} |
| LR | {'solver': 'newton-cg', 'penalty': 'none', 'C': 0.1} |
| XGB | {'reg_lambda': 0.001, 'reg_alpha': 0, 'min_child_weight': 10, 'learning_rate': 1, |
| GNB | defaults |
| SVC | {'kernel': 'poly', 'gamma': 'scale', 'degree': 3, 'coef0': 5, 'C': 0.1} |
| Rf | {'n_estimators': 50, 'min_samples_split': 2, 'min_samples_leaf': 0.1, |
| nn | {'solver': 'adam', 'learning_rate': 'adaptive', 'hidden_layer_sizes': (100,), |
Mean accuracy ± standard deviation across the k-folds, with k = 2, 4, 5, 10.
| Classifier | k = 2 | k = 4 | k = 5 | k = 10 |
|---|---|---|---|---|
| lSVM | 0.92 ± 0.018 | 0.991 ± 0.015 | 0.999 ± 0.002 | 0.999 ± 0.002 |
| pagg | 0.893 ± 0.018 | 0.951 ± 0.037 | 0.96 ± 0.026 | 0.982 ± 0.03 |
| LR | 0.902 ± 0.0 | 0.964 ± 0.018 | 0.978 ± 0.02 | 0.991 ± 0.018 |
| XGB | 0.556 ± 0.011 | 0.604 ± 0.021 | 0.591 ± 0.041 | 0.609 ± 0.119 |
| GNB | 0.644 ± 0.025 | 0.618 ± 0.079 | 0.613 ± 0.08 | 0.684 ± 0.133 |
| RBF-SVM | 0.511 ± 0.038 | 0.529 ± 0.021 | 0.573 ± 0.022 | 0.582 ± 0.076 |
| RF | 0.609 ± 0.02 | 0.591 ± 0.04 | 0.591 ± 0.05 | 0.596 ± 0.054 |
| NN | 0.871 ± 0.004 | 0.969 ± 0.019 | 0.973 ± 0.026 | 0.964 ± 0.034 |
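As an illustration of how entries of the form "mean ± standard deviation" can be produced, the sketch below computes a balanced accuracy for one fold and then summarizes a list of per-fold scores. The labels and fold scores are made-up toy values, not data from the study, and the use of the population standard deviation is an assumption, since the record does not state which estimator was used.

```python
from statistics import mean, pstdev

def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall, so both classes weigh equally."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return mean(recalls)

def summarize(fold_scores):
    """Return (mean, population standard deviation) across folds."""
    return mean(fold_scores), pstdev(fold_scores)

# Toy fold: 5 ASD (1) and 4 TD (0) subjects, hypothetical predictions.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1]
fold_score = balanced_accuracy(y_true, y_pred)

# Hypothetical scores from a 5-fold run.
fold_scores = [1.0, 0.95, 1.0, 1.0, 0.975]
avg, spread = summarize(fold_scores)
print(f"{avg:.3f} ± {spread:.3f}")
```

Balanced accuracy is a natural choice here because the two groups differ in size (125 versus 100 subjects), so plain accuracy would slightly favor the larger class.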
Calculated area under the curve for each classifier across the k-folds, with k = 2, 4, 5, 10.
| Classifier | k = 2 | k = 4 | k = 5 | k = 10 |
|---|---|---|---|---|
| lSVM | 0.919 | 0.991 | 0.999 | 0.999 |
| pagg | 0.891 | 0.948 | 0.959 | 0.982 |
| LR | 0.9 | 0.962 | 0.977 | 0.991 |
| XGB | 0.543 | 0.593 | 0.583 | 0.606 |
| GNB | 0.644 | 0.618 | 0.608 | 0.683 |
| RBF-SVM | 0.509 | 0.529 | 0.565 | 0.575 |
| RF | 0.571 | 0.549 | 0.548 | 0.552 |
| NN | 0.873 | 0.969 | 0.975 | 0.963 |
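For readers unfamiliar with the metric, the area under the ROC curve can be read as the probability that a randomly chosen positive subject receives a higher classifier score than a randomly chosen negative one. The sketch below computes it directly from that definition; the function name and the toy scores are hypothetical, and the record does not say how the reported values were actually calculated.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Assumes binary labels 1 (positive) and 0 (negative), both present.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: two positives and two negatives with made-up scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise form is quadratic in the number of subjects, which is acceptable for a dataset of a few hundred subjects; rank-based formulas give the same value more efficiently on larger data.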
Figure 3. Sorted coefficients of importance for the top 50 selected features of the area-pair correlations.
Top 12 WM brain area pairs whose feature correlations were highly ranked through RFE-CV selection. A trailing L or R denotes the left or right hemisphere, respectively.
| WM area 1 | | WM area 2 |
|---|---|---|
| Retrolenticular Part of Internal Capsule L | & | Fornix Cres/Stria Terminalis |
| Anterior Limb of Internal Capsule L | & | Uncinate Fasciculus R |
| Body of Corpus Callosum | & | Tapetum L |
| Corticospinal Tract R | & | Posterior Corona Radiata R |
| Posterior Limb of Internal Capsule R | & | Retrolenticular Part of Internal Capsule R |
| External Capsule R | & | Tapetum L |
| Middle Cerebellar Peduncle | & | Inferior Cerebellar Peduncle R |
| Anterior Limb of Internal Capsule R | & | Tapetum R |
| Middle Cerebellar Peduncle | & | Cingulum Cingulate Gyrus L |
| Anterior Limb of Internal Capsule R | & | Fornix Cres/Stria Terminalis R |
| Inferior Cerebellar Peduncle R | & | Retrolenticular Part of Internal Capsule R |
| Cingulum Hippocampus L | & | Superior Fronto-occipital Fasciculus R |
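The table lists pairs of areas because each candidate feature is a correlation between two white matter areas. As a rough sketch of how such pairwise features might be formed, the code below computes a Pearson correlation for every pair of areas from per-area feature vectors. What the per-area vectors contain is an assumption here, and the numbers are invented purely for illustration; the record does not give these details.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def pair_correlations(area_features):
    """Map each unordered pair of area names to their correlation."""
    return {
        (a, b): pearson(area_features[a], area_features[b])
        for a, b in combinations(sorted(area_features), 2)
    }

# Hypothetical per-area feature vectors for one subject (made-up values).
area_features = {
    "Body of Corpus Callosum": [0.60, 0.55, 0.70, 0.65],
    "Tapetum L": [0.30, 0.25, 0.40, 0.35],
    "Corticospinal Tract R": [0.50, 0.60, 0.40, 0.45],
}
features = pair_correlations(area_features)
for pair, r in features.items():
    print(pair, round(r, 3))
```

Each subject then contributes one value per area pair, and a selection step such as recursive feature elimination with cross validation can rank those pair features, which is how a list like the one above could arise.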