Lina Abou-Abbas, Stefon van Noordt, James A Desjardins, Mike Cichonski, Mayada Elsabbagh.
Abstract
Event-related potentials (ERPs) activated by faces and gaze processing are found in individuals with autism spectrum disorder (ASD) in the early stages of their development and may serve as a putative biomarker to supplement behavioral diagnosis. We present a novel approach to the classification of visual ERPs collected from 6-month-old infants using intrinsic mode functions (IMFs) derived from empirical mode decomposition (EMD). Selected features were used as inputs to two machine learning methods (support vector machines and k-nearest neighbors (k-NN)) using nested cross validation. Different runs were executed for the modelling and classification of the participants in the control and high-risk (HR) groups and the classification of diagnosis outcome within the high-risk group: HR-ASD and HR-noASD. The highest accuracy in the classification of familial risk was 88.44%, achieved using a support vector machine (SVM). A maximum accuracy of 74.00% for classifying infants at risk who go on to develop ASD vs. those who do not was achieved through k-NN. IMF-based extracted features were highly effective in classifying infants by risk status, but less effective by diagnostic outcome. Advanced signal analysis of ERPs integrated with machine learning may be considered a first step toward the development of an early biomarker for ASD.
Keywords: autism spectrum disorder; empirical mode decomposition; event-related potential; intrinsic mode functions; k-nearest neighbor; support vector machine
Year: 2021 PMID: 33804986 PMCID: PMC8063929 DOI: 10.3390/brainsci11040409
Source DB: PubMed Journal: Brain Sci ISSN: 2076-3425
Number of participants available for analysis in each group: controls, high-risk infants who did not receive a diagnosis (HR-noASD), and high-risk infants who received a diagnosis (HR-ASD).
| | Control | HR-noASD | HR-ASD | All |
|---|---|---|---|---|
| Male | 15 (43%) | 9 (26%) | 11 (31%) | 35 (37%) |
| Female | 29 (49%) | 24 (41%) | 6 (10%) | 59 (63%) |
| Total | 44 (47%) | 33 (35%) | 17 (18%) | 94 (100%) |
Figure 1. Event-related potential (ERP) task corresponding to the visual stimuli used in the original study: static gaze (direct and averted), face and noise, and gaze shift (toward and away). Reprinted from ref. [6] (Supplemental Data Section, Figure S2).
Figure 2. Block diagram of the system: (a) The BASIS database is split into training and testing sets using a nested cross-validation technique. (b) Preprocessing and ERP extraction: raw EEG data are preprocessed using the EEG Integrated Platform (EEG-IP-L) pipeline [24] to suppress noise and artifacts; the clean data are then segmented into fixed-length epochs. (c) Signal decomposition and feature extraction: empirical mode decomposition (EMD) is applied to decompose the signals into intrinsic mode functions (IMFs); features are then extracted per channel and per IMF, and a selection step reduces the number of features. (d) Classification block: using nested cross validation, the selected features are used as input to two classifiers, support vector machine (SVM) and k-nearest neighbors (k-NN), at the training and testing stages. (e) Class labels are predefined to distinguish familial risk first, then diagnostic outcome within the high-risk group.
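
The nested cross-validation in blocks (a) and (d) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the fold counts, the candidate values of k, the helper names (`knn_predict`, `folds`, `nested_cv`), and the synthetic two-class data are all assumptions made for illustration. The inner loop tunes k on the outer-training data only, and the outer loop scores the tuned model on held-out data.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test points whose k-NN prediction matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)

def folds(indices, n_folds):
    """Split a list of indices into n_folds disjoint, roughly equal parts."""
    return [indices[i::n_folds] for i in range(n_folds)]

def pick(seq, idx):
    return [seq[i] for i in idx]

def nested_cv(X, y, k_grid=(1, 3, 5), outer=5, inner=3, seed=0):
    """Mean outer-fold accuracy, with k chosen inside each outer-training set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    outer_scores = []
    for test_idx in folds(idx, outer):
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]

        # Inner loop: estimate each candidate k using only outer-training data.
        def inner_score(k):
            scores = []
            for val_idx in folds(train_idx, inner):
                val = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val]
                scores.append(accuracy(pick(X, fit_idx), pick(y, fit_idx),
                                       pick(X, val_idx), pick(y, val_idx), k))
            return sum(scores) / len(scores)

        best_k = max(k_grid, key=inner_score)
        # Outer loop: score the tuned model on data never used for tuning.
        outer_scores.append(accuracy(pick(X, train_idx), pick(y, train_idx),
                                     pick(X, test_idx), pick(y, test_idx), best_k))
    return sum(outer_scores) / len(outer_scores)

# Synthetic, well-separated two-class data (not the study's features).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)] + \
    [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(30)]
y = [0] * 30 + [1] * 30
print(nested_cv(X, y))
```

Keeping model selection inside the outer-training folds avoids tuning on the same data that produces the reported accuracy. Whether the study also performed feature selection inside the inner loop is not stated in this record.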
Figure 3. Grand average visual ERPs for all task conditions showing components typically observed during infant face processing, including P100, N290, and P400.
For the classification of high-risk and low-risk groups: comparison of accuracy rates, sensitivity, and specificity between the k-NN and SVM classifiers. The first six rows show the results for each of the six stimulus conditions analyzed separately. Rows 7, 8, and 9 show the classification performance when using features extracted from one IMF at a time. Row 10 shows the results of analyzing the six stimulus conditions together with all three IMFs, with features selected by weight correlation before being passed to the classifiers: with k-NN, the best accuracy of 86.22% is obtained with 11 features; with SVM, the best accuracy of 88.44% is obtained with the top 30 features.
| Classification of HR and Control | ||||||||
|---|---|---|---|---|---|---|---|---|
| Condition | Component | Number of Features after Reduction | k-NN Accuracy Rate | k-NN Sensitivity | k-NN Specificity | SVM Accuracy Rate | SVM Sensitivity | SVM Specificity |
| Direct gaze | IMF1–3 | 18 | 76.60% | 72.00% | 82.00% | 77.70% | 82.00% | 73.00% |
| Averted gaze | IMF1–3 | 18 | 70.20% | 74.00% | 66.00% | 74.50% | 76.00% | 73.00% |
| Static direct | IMF1–3 | 18 | 60.60% | 68.00% | 52.00% | 62.80% | 68.00% | 57.00% |
| Static averted | IMF1–3 | 18 | 73.40% | 68.00% | 80.00% | 74.50% | 86.00% | 61.00% |
| Face | IMF1–3 | 18 | 71.30% | 72.00% | 70.00% | 74.50% | 78.00% | 70.00% |
| Noise | IMF1–3 | 18 | 68.10% | 72.00% | 64.00% | 63.80% | 72.00% | 55.00% |
| All | IMF1 | 36 | 63.80% | 56.00% | 73.00% | 68.10% | 68.00% | 68.00% |
| All | IMF2 | 36 | 77.70% | 80.00% | 75.00% | 80.90% | 78.00% | 84.00% |
| All | IMF3 | 36 | 74.50% | 76.00% | 73.00% | 76.60% | 68.00% | 86.00% |
| All | IMF1–3 | 11 | 86.22% | 80.00% | 93.18% | - | - | - |
| All | IMF1–3 | 30 | - | - | - | 88.44% | 84.00% | 93.18% |
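
The three metrics reported in the table follow from the counts of correct and incorrect predictions per class. The sketch below shows the standard definitions on a small toy example. The function name `binary_metrics` and the toy labels are hypothetical, and treating the high-risk group as the positive class is an assumption rather than something this record states.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true-positive rate), and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of positives correctly detected
        "specificity": tn / (tn + fp),   # share of negatives correctly rejected
    }

# Toy labels only (1 = positive class, 0 = negative class); not study data.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With unequal group sizes, accuracy is a weighted mix of sensitivity and specificity, so a classifier can show high specificity and lower sensitivity at the same accuracy as the reverse.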
For the classification of the diagnosis outcome: comparison of accuracy rates, sensitivity, and specificity between the k-NN and SVM classifiers. The first six rows show the results for each of the six stimulus conditions analyzed separately. Rows 7, 8, and 9 show the classification performance when using features extracted from one IMF at a time. Row 10 shows the results of analyzing the six stimulus conditions together with all three IMFs, using 11 features selected by weight correlation. The best accuracies of 74% and 70.48% were obtained using k-NN and SVM, respectively.
| Classification of HR-ASD and HR-noASD | ||||||||
|---|---|---|---|---|---|---|---|---|
| Condition | Component | Number of Features after Reduction | k-NN Accuracy Rate | k-NN Sensitivity | k-NN Specificity | SVM Accuracy Rate | SVM Sensitivity | SVM Specificity |
| Direct gaze | IMF1–3 | 18 | 64.70% | 53.00% | 76.00% | 64.70% | 53.00% | 76.00% |
| Averted gaze | IMF1–3 | 18 | 47.10% | 29.00% | 65.00% | 50.00% | 41.00% | 59.00% |
| Static direct | IMF1–3 | 18 | 55.90% | 59.00% | 53.00% | 67.60% | 71.00% | 65.00% |
| Static averted | IMF1–3 | 18 | 55.90% | 65.00% | 47.00% | 50.00% | 47.00% | 53.00% |
| Face | IMF1–3 | 18 | 52.90% | 29.00% | 76.00% | 55.90% | 53.00% | 59.00% |
| Noise | IMF1–3 | 18 | 73.50% | 53.00% | 94.00% | 67.60% | 76.00% | 59.00% |
| All | IMF1 | 36 | 64.70% | 47.00% | 82.00% | 67.60% | 59.00% | 76.00% |
| All | IMF2 | 36 | 47.10% | 47.00% | 47.00% | 58.80% | 53.00% | 65.00% |
| All | IMF3 | 36 | 55.90% | 53.00% | 59.00% | 61.80% | 47.00% | 76.00% |
| All | IMF1–3 | 11 | 74.00% | 78.00% | 70.00% | 70.48% | 76.47% | 64.71% |
Best results obtained for all experiments of this study for both HR vs. control and HR-ASD vs. HR-noASD classifications: the names of the best classifiers are presented along with their accuracy rate and the number of features used in each experiment.
| | | | HR vs. Control | | HR-ASD vs. HR-noASD | |
|---|---|---|---|---|---|---|
| Condition | Component | Number of Features | Best Classifier | Accuracy Rate (%) | Best Classifier(s) | Accuracy Rate (%) |
| Direct gaze | IMF1–3 | 18 | SVM | 77.70 | k-NN, SVM | 64.70 |
| Averted gaze | IMF1–3 | 18 | SVM | 74.50 | SVM | 50.00 |
| Static direct | IMF1–3 | 18 | SVM | 62.80 | SVM | 67.60 |
| Static averted | IMF1–3 | 18 | SVM | 74.50 | k-NN | 55.90 |
| Face | IMF1–3 | 18 | SVM | 74.50 | SVM | 55.90 |
| Noise | IMF1–3 | 18 | k-NN | 68.10 | k-NN | 73.50 |
| All | IMF1 | 36 | SVM | 68.10 | SVM | 67.60 |
| All | IMF2 | 36 | SVM | 80.90 | SVM | 58.80 |
| All | IMF3 | 36 | SVM | 76.60 | SVM | 61.80 |
| All | IMF1–3 | 30/11 | SVM | 88.44 | k-NN | 74.00 |
Ranked predictor importance for the k-NN classifier predicting familial risk from six features and six averaged epochs.
| Prediction Importance | |||
|---|---|---|---|
| Rank | Feature | IMF# | Stimulus |
| 1 | Skewness | IMF3 | Averted gaze |
| 2 | Skewness | IMF1 | Face |
| 3 | Std | IMF2 | Static direct |
| 4 | Std | IMF1 | Noise |
| 5 | Std | IMF2 | Face |
| 6 | Skewness | IMF2 | Direct gaze |
| 7 | Shannon entropy | IMF1 | Direct gaze |
| 8 | Std | IMF2 | Static averted |
| 9 | Mean | IMF2 | Noise |
| 10 | Moment | IMF1 | Noise |
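
The feature names in the importance tables (mean, Std, skewness, moment, energy, Shannon entropy) are standard per-signal statistics. The sketch below computes one plausible version of each for a single IMF, using only the standard library. The exact definitions used in the study are not given in this record, so the population standard deviation, the third-order central moment (`moment_order=3`), the histogram-based entropy with `n_bins=8`, and the function name `imf_features` are all assumptions.

```python
import math
from collections import Counter

def imf_features(samples, n_bins=8, moment_order=3):
    """Six summary statistics of one IMF (a list of floats); definitions are assumed."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    std = math.sqrt(sum(c * c for c in centered) / n)        # population std
    skewness = (sum(c ** 3 for c in centered) / n) / std ** 3 if std > 0 else 0.0
    moment = sum(c ** moment_order for c in centered) / n     # central moment
    energy = sum(s * s for s in samples)                      # sum of squared samples

    # Shannon entropy (bits) of the amplitude histogram with n_bins equal-width bins.
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins or 1.0                         # guard for a constant signal
    counts = Counter(min(int((s - lo) / width), n_bins - 1) for s in samples)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {"mean": mean, "std": std, "skewness": skewness,
            "moment": moment, "energy": energy, "shannon_entropy": entropy}

# Toy stand-in for an IMF: a decaying oscillation (synthetic; not study data).
toy_imf = [math.exp(-t / 50) * math.sin(2 * math.pi * t / 20) for t in range(200)]
features = imf_features(toy_imf)
print(features)
```

Six such statistics over three IMFs would give 18 values per stimulus condition, and six statistics over six conditions would give 36 values per IMF, which appears consistent with the feature counts listed in the classification tables.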
Figure 4. Plot of error rates on the testing set for different values of k. A lower error rate in the testing stage is obtained with k = 11 for the classification of the diagnosis outcome as well as for the classification of risk status.
Figure 5. Plot of error rates on the testing set obtained with different kernel functions. For the classification of both risk group and outcome status, the minimal error rate is obtained with the linear SVM. The maximal error rate is obtained with the coarse Gaussian SVM when classifying risk status, and with the fine Gaussian and coarse Gaussian SVMs when classifying the diagnosis outcome.
Predictor importance for the k-NN classifier predicting diagnostic outcome from six features and six averaged epochs.
| Prediction Importance | |||
|---|---|---|---|
| Rank | Feature | IMF# | Stimulus |
| 1 | Skewness | IMF2 | Noise |
| 2 | Skewness | IMF3 | Noise |
| 3 | Energy | IMF2 | Static direct |
| 4 | Shannon entropy | IMF1 | Static direct |
| 5 | Moment | IMF3 | Static direct |
| 6 | Skewness | IMF3 | Static direct |
| 7 | Skewness | IMF1 | Static averted |
| 8 | Skewness | IMF3 | Averted gaze |
| 9 | Skewness | IMF3 | Static averted |
| 10 | Skewness | IMF1 | Static direct |