
Automated and ERP-Based Diagnosis of Attention-Deficit Hyperactivity Disorder in Children.

Hossein R Jahanshahloo¹, Mousa Shamsi¹, Elham Ghasemi², Abolfazl Kouhi¹.

Abstract

Event-related potential (ERP) is one of the most informative and dynamic methods for monitoring cognitive processes and is widely used in clinical research on psychiatric and neurological disorders such as attention-deficit/hyperactivity disorder (ADHD). In this study, there were 60 participants: 30 patients with ADHD and 30 control subjects. Their ERP signals were recorded by three electrodes in two modalities. After preprocessing, several features were extracted from the recorded signals: band power, fractal dimension, autoregressive (AR) model coefficients, and wavelet coefficients. The aim of this study was to achieve a high classification rate. The fractal dimension–wavelet combination features provided good discriminative capability. This improvement was achieved by combining all sets of features and applying a feature selection algorithm, which yielded maximum accuracy rates of 88.77% and 95.39% with the support vector machine (SVM) and v_SVM classification algorithms, respectively, using a 10-fold cross-validation approach. ERP has been widely used for clinical diagnosis and for studying cognitive processing deficits in children with ADHD. To increase the accuracy of the diagnostic process for ADHD, ERP signals were recorded to extract specific ERP features related to this disorder and to classify the two groups. The Fra-wave characterization produced the best average accuracy, 99.43% with the v_SVM classifier, compared with 97.65% for the wavelet features and the other features.


Keywords: Algorithms; attention; attention deficit disorder with hyperactivity child; cognition; cognition disorders; control groups; electrodes; evoked potentials; fractals; humans; nervous system diseases; support vector machine

Year: 2017 PMID: 28487830 PMCID: PMC5394803

Source DB: PubMed Journal: J Med Signals Sens ISSN: 2228-7477

Introduction

Attention-deficit/hyperactivity disorder (ADHD) is one of the most common psychiatric disorders in school-aged children and is characterized by a persistent pattern of impaired attention, impulsive behavior, and excessive motor hyperactivity. These manifestations directly affect a child's academic activities, family dynamics, and performance in the social environment, and can negatively influence the child's personality development.[1] Clinical diagnosis of the disorder requires several medical specialists and is costly, which makes it hard for low-income families to access. Event-related potentials (ERPs) are changes in the ongoing electroencephalogram (EEG) that are time-locked to a sensory stimulus.[2] Because these potentials are physiologically correlated with neurocognitive functions, they have been widely used in clinical diagnosis, in brain–computer interfaces, and especially in investigations of perceptual and cognitive processing deficits in children with ADHD. The ERP features most commonly assessed to interpret cognitive processes are the areas and peaks of ERP components, defined by the mean and peak-to-peak voltages, respectively.[3] These parameters are usually analyzed in the time domain: the amplitudes and latencies of prominent peaks in the averaged potentials are measured and correlated with information processing mechanisms.[4] Although quantifying ERP components by areas and peaks is the standard procedure in fundamental ERP research, this conventional approach has two drawbacks. First, ERPs are time-varying signals that reflect the sum of underlying neural events during stimulus processing, and these events operate on time scales ranging from milliseconds to seconds.
Various procedures, such as ERP subtraction and statistical methods, have been employed to separate functionally meaningful events that partly or completely overlap in time. However, reliably identifying these components in ERP waveforms remains a problem. Second, frequency-domain analyses have revealed that EEG/ERP components in different bands (delta, theta, alpha, beta, and gamma) are functionally related to information processing. However, the Fourier transform (FT) of an ERP does not localize transient neural events in time. Efficient time–frequency analysis algorithms are therefore very important for extracting and relating distinct functional components. These limitations, as well as the problems associated with time-invariant methods, can be overcome with the wavelet formalism. The wavelet transform (WT) is a time–frequency representation that provides good resolution in both the time and frequency domains, and it has been successfully applied to the study of EEG/ERP signals.[5] ERP feature extraction in the time–frequency domain based on the discrete WT (DWT) has become increasingly popular, and this approach gives better results for pathology detection, particularly ADHD identification. The fractal dimension is another important parameter, with significant applications in various fields including signal processing. Fractal analysis is a high-level signal processing technique for identifying signal features such as roughness, smoothness, and solidity. Moreover, because there is no suitable test or biomarker for diagnosing ADHD in children, identifying affected children requires a complete evaluation of their behavior both at home and at school. As a result, diagnosis is less accurate and more time-consuming.
Given the impact of ADHD on the future lives of affected children, there is a clear need for a biological marker capable of diagnosing ADHD early, as soon as its symptoms appear in childhood. In search of working biomarkers, significant research on EEG signals and ERPs for diagnosing ADHD in children has been conducted over the past few decades. The main goal of this study is to present a supporting diagnostic tool that uses signal processing for feature selection and machine learning algorithms for diagnosis. In particular, an information-theoretic feature selection method is proposed, based on the Higuchi fractal dimension algorithm and wavelet coefficient measures. In this method, a maximal discrepancy criterion is introduced to select the features that best distinguish the two groups, together with a semi-supervised formulation for efficiently updating the training set. A support vector machine (SVM) classifier is then trained and tested to identify robust ERP markers for accurate diagnosis of the ADHD group. The results show that the proposed approach provides higher diagnostic accuracy for ADHD than the few currently available methods. The rest of the paper is organized as follows. The “Materials and Methods” section describes the method and the data acquisition conditions. The “Feature Extraction” section details the extracted features: band power, Higuchi fractal dimension, autoregressive (AR) coefficients, and wavelet coefficients. The “Classifiers” section briefly describes the SVM and v_SVM classifiers. The “Results” section reports the performance of the SVM and v_SVM classifiers on the extracted features using a 10-fold cross-validation approach. The “Conclusion and Discussion” section summarizes the conclusions of this study.

Materials and Methods

Subjects

The participants were children from educational institutions in the metropolitan area of Manizales, aged between 4 and 15 years, whose diagnosis had been established by neurophysiological evaluation based on the clinical criteria of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV).[6] The children were tested in a room under the same lighting and noise conditions, and all met the following criteria: no abnormalities on physical examination, normal vision and hearing, and an intelligence quotient (IQ) greater than 80. Patients under pharmacologic treatment (methylphenidate, 20 mg) did not take their medication during the 24 h before the test. Comorbidities (oppositional defiant disorder, specific phobia, and learning problems) were also taken into account. In total, 60 children participated: 30 children with ADHD and 30 children in the control group. Their ERP signals were recorded by three midline electrodes (Pz, Cz, and Fz), placed according to the international 10–20 system, in two modalities (auditory and visual) at a sampling rate of 640 samples per second.

Data acquisition

The examination protocol followed the oddball paradigm in the auditory and visual modalities. In the auditory test, an 80 dB tone lasting 50 ms was presented every 1.5 s in random order, at 1000 Hz for the frequent stimulus and 3000 Hz for the infrequent stimulus. In the visual test, the children watched a monitor placed 1 m away that showed a consistent pattern (a checkerboard of 16 squares) as the frequent stimulus. The rare stimulus was a target presented in the center of the screen over the same background pattern; the child had to press a button each time this rare stimulus appeared. The experiment consisted of 200 stimuli, of which 80% were frequent and 20% were infrequent.
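To make the stimulus timing concrete, here is a minimal Python sketch that generates a randomized oddball schedule with the counts given above (200 stimuli, 20% rare, one stimulus every 1.5 s). The function name `oddball_schedule` and the shuffling scheme are illustrative assumptions; the study does not say how its random order was generated or whether consecutive rare stimuli were allowed.

```python
import numpy as np

def oddball_schedule(n_stimuli=200, p_rare=0.2, isi_s=1.5, seed=0):
    """Return (onset_times_s, is_rare) for a randomized two-stimulus oddball run.

    Hypothetical helper: assumes a fixed onset-to-onset interval and an exact
    frequent/rare split, shuffled uniformly at random.
    """
    rng = np.random.default_rng(seed)
    n_rare = int(round(n_stimuli * p_rare))
    is_rare = np.zeros(n_stimuli, dtype=bool)
    is_rare[:n_rare] = True
    rng.shuffle(is_rare)                    # random order of frequent and rare stimuli
    onsets = np.arange(n_stimuli) * isi_s   # one stimulus every isi_s seconds
    return onsets, is_rare

onsets, is_rare = oddball_schedule()
# Auditory modality: 1000 Hz tone for frequent stimuli, 3000 Hz for rare ones.
tone_hz = np.where(is_rare, 3000, 1000)
print(is_rare.sum(), (~is_rare).sum(), onsets[-1])
```

With these numbers the last stimulus starts 199 × 1.5 s = 298.5 s into the run, so one modality takes roughly 5 min.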

Preprocessing

Each classifier was trained with its corresponding dataset. Because the ERP measurements yield a large set of features (values from all three electrodes in all bands), it was advisable to reduce this set and feed each classifier the most appropriate subset, which also reduced complexity and dimensionality. For each SVM classifier [Figure 1], a forward selection scheme was applied to choose the features best suited to that classifier. Forward selection is an algorithm that iteratively adds features to a subset, at each step choosing the feature that makes the subset most relevant for discriminating the data. Applying this technique to each classifier, each of which corresponded to a different condition, yielded the features most relevant for discriminating the data in that condition.
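The forward selection loop can be sketched as follows. This is a generic greedy version that assumes a user-supplied scoring function; `centroid_accuracy` (training accuracy of a nearest-centroid rule) is a hypothetical stand-in for the classifier-specific criterion used in the study, and the synthetic data are not study data.

```python
import numpy as np

def forward_selection(X, y, score_fn, max_features=None):
    """Greedy forward selection: repeatedly add the single feature that most
    improves score_fn(X_subset, y); stop when no candidate improves it."""
    n_features = X.shape[1]
    max_features = max_features or n_features
    selected, best_score = [], -np.inf
    remaining = list(range(n_features))
    while remaining and len(selected) < max_features:
        # Score every candidate subset "selected + one new feature".
        candidates = [(score_fn(X[:, selected + [j]], y), j) for j in remaining]
        score, j_best = max(candidates)
        if score <= best_score:
            break                        # no improvement, so stop growing the subset
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = score
    return selected, best_score

def centroid_accuracy(Xs, y):
    """Hypothetical criterion: training accuracy of a nearest-centroid rule."""
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)
    return float(np.mean(pred == (y == 1)))

# Synthetic example: only feature 2 separates the two groups.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 4))
X[:, 2] += 6.0 * y
selected, score = forward_selection(X, y, centroid_accuracy)
print(selected, score)
```

In a real pipeline the score should be a cross-validated estimate rather than training accuracy; otherwise the chosen subset tends to overfit the training data.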

Figure 1

Block diagram of the overall procedure for feature extraction and classification using raw ERP signals

Feature extraction

During signal acquisition, the recordings were band-pass filtered from 0.3 to 100 Hz to improve the signal-to-noise ratio and to prepare them for later stages. Bad channels were then identified and replaced, and an average reference was computed. The signals were analyzed for artifact rejection and power analysis: blinks and other artifacts were identified automatically with a threshold of ±100 μV and excluded from the analyses. The clean data were then submitted to a fast Fourier transform (FFT) with a 1-s Hanning window and 50% overlap. Next, independent component analysis (ICA) was applied to remove dependence among the input signals. Features were then estimated from the dataset to form the initial feature space. Finally, the classifier was trained and its classification performance was assessed with the selected features [Figure 1]. This section explains the features extracted from the ERP signals: fractal-based features,[7,8,9] AR coefficients,[10] band power,[11] and wavelet coefficients.[12]
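Several of these preprocessing steps are simple array operations. The sketch below, with hypothetical helper names, shows average re-referencing, ±100 μV amplitude rejection, and a Welch-style spectrum from 1-s Hann windows with 50% overlap at the reported 640 Hz sampling rate. Band-pass filtering, bad-channel replacement, and ICA are not shown, and rejecting whole epochs (rather than individual samples) is an assumption.

```python
import numpy as np

FS = 640  # samples per second, as reported for these recordings

def average_reference(eeg):
    """Re-reference (channels x samples) data to the mean across channels."""
    return eeg - eeg.mean(axis=0, keepdims=True)

def reject_epochs(epochs, threshold_uv=100.0):
    """Keep epochs (n_epochs x channels x samples, in µV) whose absolute
    amplitude never exceeds the threshold on any channel."""
    keep = np.abs(epochs).max(axis=(1, 2)) <= threshold_uv
    return epochs[keep], keep

def hann_spectrum(x, fs=FS, win_s=1.0, overlap=0.5):
    """Average (unnormalized) power spectrum of a 1-D signal over Hann
    windows of win_s seconds with the given fractional overlap."""
    nper = int(win_s * fs)
    step = int(nper * (1 - overlap))
    win = np.hanning(nper)
    segs = [x[i:i + nper] * win for i in range(0, x.size - nper + 1, step)]
    psd = np.mean([np.abs(np.fft.rfft(s)) ** 2 for s in segs], axis=0)
    freqs = np.fft.rfftfreq(nper, d=1 / fs)
    return freqs, psd

# Example: a 10 Hz test tone peaks in the 10 Hz bin (1 Hz resolution for 1 s windows).
t = np.arange(10 * FS) / FS
freqs, psd = hann_spectrum(np.sin(2 * np.pi * 10 * t))
print(freqs[np.argmax(psd)])
```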

Higuchi fractal dimension

Consider a time series x(1), x(2), …, x(N). From it, k new subsequences can be constructed as

$$x_m^k = \left\{x(m),\ x(m+k),\ x(m+2k),\ \ldots,\ x\!\left(m + \left\lfloor \tfrac{N-m}{k} \right\rfloor k\right)\right\}, \qquad m = 1, 2, \ldots, k,$$

where m is the index of the first sample and k is the time interval (delay) between points in each subsequence. The average length of each subsequence is defined as

$$L_m(k) = \frac{1}{k}\left[\left(\sum_{i=1}^{\lfloor (N-m)/k \rfloor} \left| x(m+ik) - x(m+(i-1)k) \right|\right) \frac{N-1}{\lfloor (N-m)/k \rfloor\, k}\right],$$

where N is the length of the data sequence x. The total average length L(k) for scale k is computed as

$$L(k) = \frac{1}{k}\sum_{m=1}^{k} L_m(k),$$

and L(k) is proportional to $k^{-D}$, where D is the fractal dimension defined by Higuchi's method.[8] Higuchi's dimension is closely related to Takens' theory,[13] in which a signal is considered the output of a complex system. To characterize this system, signal samples are assigned to the activity of state variables. Each subsequence in Higuchi's method represents the activity of a state variable through time, and the average length of the subsequences is interpreted as the average length of the signals produced by the state variables.
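The Higuchi procedure described above translates directly into NumPy. This is a minimal sketch, not the authors' code; the maximum delay `k_max = 10` is an assumed value because the text does not state it, and D is estimated as the least-squares slope of log L(k) against log(1/k).

```python
import numpy as np

def higuchi_fd(x, k_max=10):
    """Higuchi fractal dimension of a 1-D signal (k_max is an assumed setting)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2 * k_max:
        raise ValueError("signal too short for the chosen k_max")
    ks = np.arange(1, k_max + 1)
    mean_lengths = []
    for k in ks:
        lm = []
        for m in range(1, k + 1):                # m is 1-based, as in the text
            idx = np.arange(m - 1, n, k)         # x(m), x(m+k), x(m+2k), ...
            n_steps = idx.size - 1               # floor((N - m) / k)
            dist = np.abs(np.diff(x[idx])).sum()
            norm = (n - 1) / (n_steps * k)       # normalization factor
            lm.append(dist * norm / k)           # L_m(k)
        mean_lengths.append(np.mean(lm))         # L(k): mean over m
    # L(k) ~ k^(-D), so D is the slope of log L(k) versus log(1/k).
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(mean_lengths), 1)
    return slope

rng = np.random.default_rng(0)
print(higuchi_fd(np.arange(1000.0)))           # straight line
print(higuchi_fd(rng.standard_normal(2000)))   # white noise
```

A straight line gives D = 1 and white noise gives D close to 2, the two limiting values of the Higuchi dimension for a one-dimensional signal.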

Autoregressive coefficients

AR models describe a time series by estimating each sample as a weighted sum of previous samples through a recursive linear filter. The integer parameter p is called the order of the AR model. The AR model of a signal x(t) in discrete time t is defined as

$$x(t) = \sum_{i=1}^{p} a_i\, x(t-i) + \varepsilon(t),$$

where $a_1, a_2, \ldots, a_p$ are the coefficients of the recursive filter, p is the model order, and ε(t) is uncorrelated noise. The Burg method[10] is applied to fit a pth-order AR model to the input signal x by minimizing (in the least-squares sense) the forward and backward prediction errors while constraining the AR parameters to satisfy the Levinson–Durbin recursion. These coefficients can model the whole variation of a signal, but they are sensitive to additive noise.
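Burg's recursion can be written in a few lines. The sketch below is a textbook formulation, not the authors' implementation; it returns coefficients in the sign convention x(t) = Σ a_i x(t − i) + ε(t) used in this section, and the AR(2) test process with coefficients 0.75 and −0.5 is a hypothetical example.

```python
import numpy as np

def burg_ar(x, p):
    """Fit an order-p AR model with Burg's method.

    Returns a_1..a_p in the convention x(t) = sum_i a_i x(t-i) + e(t).
    """
    x = np.asarray(x, dtype=float)
    f = x[1:].copy()      # forward prediction errors (stage 0)
    b = x[:-1].copy()     # backward prediction errors, delayed by one sample
    a = np.array([1.0])   # prediction-error filter [1, c_1, ..., c_m]
    for _ in range(p):
        # Reflection coefficient minimizing forward + backward error power.
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        # Levinson-Durbin order update of the prediction-error filter.
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        # Update both error sequences, then realign them for the next stage.
        f, b = (f + k * b)[1:], (b + k * f)[:-1]
    # Error filter is x(t) + sum c_i x(t-i) = e(t), so a_i = -c_i.
    return -a[1:]

# Hypothetical AR(2) test signal.
rng = np.random.default_rng(1)
n = 20000
e = rng.standard_normal(n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.75 * x[t - 1] - 0.5 * x[t - 2] + e[t]
coef = burg_ar(x, 2)
print(coef)
```

The estimated coefficients should lie close to [0.75, −0.5]; with real ERP segments, the fitted $a_i$ for a chosen order p would form the feature vector.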

Band power

ERP reflects brain activity at different frequencies.[11] The raw ERP is generally described in terms of five frequency bands: gamma (>30 Hz), beta (13–30 Hz), alpha (8–12 Hz), theta (4–8 Hz), and delta (<4 Hz). To determine the power in these bands, the raw ERP signal was passed through band-pass filters (sixth-order elliptic) that separate its content into the five successive frequency bands. At the output of each band-pass filter, each sample was squared and averaged over consecutive samples, giving an estimate of band power in a 1-s window.
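A minimal sketch of this band-power estimate with SciPy's elliptic filter design follows. Several details are assumptions not given in the text: the passband ripple `rp` and stopband attenuation `rs`, zero-phase filtering with `sosfiltfilt`, a low-pass filter for delta, and a 100 Hz upper edge for gamma (matching the 0.3–100 Hz pre-filter). For band-pass designs SciPy doubles the prototype order, so `order=6` may not match the study's exact filter order.

```python
import numpy as np
from scipy.signal import ellip, sosfiltfilt

FS = 640  # sampling rate reported for the recordings (Hz)

# Band edges in Hz from the text. Delta is treated as low-pass (<4 Hz) and
# gamma is capped at 100 Hz to match the 0.3-100 Hz pre-filter (assumptions).
BANDS = {"delta": (None, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 12.0),
         "beta": (13.0, 30.0), "gamma": (30.0, 100.0)}

def band_filter(low, high, fs=FS, order=6, rp=0.5, rs=40.0):
    """Elliptic filter in second-order sections. rp (passband ripple, dB) and
    rs (stopband attenuation, dB) are assumed values, not taken from the study."""
    if low is None:
        return ellip(order, rp, rs, high, btype="lowpass", output="sos", fs=fs)
    return ellip(order, rp, rs, [low, high], btype="bandpass", output="sos", fs=fs)

def band_power(x, fs=FS, win_s=1.0):
    """Square each filtered sample and average within consecutive 1 s windows."""
    n = int(win_s * fs)
    n_win = x.size // n
    powers = {}
    for name, (low, high) in BANDS.items():
        y = sosfiltfilt(band_filter(low, high, fs), x)   # zero-phase filtering
        powers[name] = (y[: n_win * n].reshape(n_win, n) ** 2).mean(axis=1)
    return powers

# Example: a 10 Hz sinusoid should put nearly all of its power in alpha.
t = np.arange(10 * FS) / FS
bp = band_power(np.sin(2 * np.pi * 10 * t))
print({name: round(float(p.mean()), 4) for name, p in bp.items()})
```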

Wavelet

The DWT is an efficient tool for signal representation.[12] A mother wavelet can be compressed in dyadic steps to provide higher resolution and shifted at each resolution to model different locations of a signal in the time domain. A wavelet function at scale a and time shift b is expressed as

$$\psi_{a,b}(t) = 2^{a/2}\,\psi\!\left(2^{a}(t - b)\right) \qquad (4)$$

where ψ(t) is the mother wavelet. Compressed and stretched versions of the wavelet function model the high- and low-frequency components of a signal, respectively. Hence, by projecting a signal onto different resolutions, its details can be obtained.[12] With the DWT, the signal is decomposed at each level into a low band and a high band, called the approximation and the detail. Approximations represent the low-frequency components of the signal, while details represent the high-frequency components. In this study, the signals were decomposed into subspaces using the mother wavelet “db4.” The energies of the approximations and details were then used as signal features in each time frame.[14]
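The db4 energy features can be sketched without an external wavelet library. This minimal NumPy version uses the 8-tap Daubechies filter (named "db4" in PyWavelets) with periodic boundary handling. The periodization, the five decomposition levels, and the helper names are assumptions, and edge coefficients will differ from PyWavelets' default signal-extension mode.

```python
import numpy as np

# Daubechies filter with 8 taps ("db4" in PyWavelets naming).
DB4_LO = np.array([0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
                   -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
                   0.032883011666982945, -0.010597401784997278])
# Quadrature-mirror high-pass filter: g[n] = (-1)^n h[L-1-n].
DB4_HI = ((-1) ** np.arange(DB4_LO.size)) * DB4_LO[::-1]

def dwt_step(x):
    """One level of a periodized DWT: circular filtering, keeping every second output."""
    n = x.size
    idx = (np.arange(0, n, 2)[:, None] + np.arange(DB4_LO.size)[None, :]) % n
    seg = x[idx]                         # each row is one circular filter window
    return seg @ DB4_LO, seg @ DB4_HI    # approximation, detail

def dwt_energies(x, levels=5):
    """Energies [d1, ..., d_levels, a_levels] of a multilevel db4 decomposition.
    The signal length should be divisible by 2**levels."""
    a = np.asarray(x, dtype=float)
    energies = []
    for _ in range(levels):
        a, d = dwt_step(a)
        energies.append(np.sum(d ** 2))
    energies.append(np.sum(a ** 2))
    return np.array(energies)

rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)        # stand-in for one ERP time frame
e = dwt_energies(frame)
print(e, e.sum(), np.sum(frame ** 2))
```

Because the periodized transform is orthogonal, the six energies sum to the frame's total energy. At 640 Hz, five levels correspond roughly to detail bands of 160–320, 80–160, 40–80, 20–40, and 10–20 Hz, with the approximation covering 0–10 Hz.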

Feature selection

Feature selection is a commonly used dimensionality reduction method that keeps a subset of the original features. It differs from feature extraction methods such as principal component analysis (PCA), which produce a new low-dimensional embedding from the original features. PCA is one of the most popular appearance-based methods and is used mainly for dimensionality reduction in compression and recognition problems. It reduces the dimensionality of the feature vectors and removes all statistical covariance among the components of the transformed feature vectors.
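To illustrate the contrast with feature selection, here is a minimal PCA via the singular value decomposition in NumPy (the helper name `pca` is hypothetical). Unlike forward selection, every output column mixes all input features, and the covariance matrix of the projected data is diagonal, which is the decorrelation property described above.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)                   # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]            # principal axes, one per row
    Z = Xc @ components.T                     # low-dimensional embedding
    explained = S ** 2 / np.sum(S ** 2)       # variance ratio per component
    return Z, components, explained[:n_components]

# Correlated synthetic features (not study data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
Z, components, ratio = pca(X, 3)
print(np.round(np.cov(Z, rowvar=False), 6))   # off-diagonal entries are ~0
```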

Classifiers

SVM

After the relevant features were selected for each dataset, the datasets were fed to SVM classifiers and the models were built. To assess how well each classifier model generalizes, 10-fold cross-validation was used. Cross-validation estimates how a classifier model will generalize to an independent dataset, different from the one used to train it. It partitions the data into n complementary subsets, uses n − 1 of them to train the model, and uses the remaining subset to test it. This procedure is repeated n times so that each subset is used exactly once as the test set, and the results are averaged over the rounds to obtain the final estimate.
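The n-fold procedure can be sketched as below with n = 10. To keep the example dependency-free, a nearest-centroid rule stands in for the SVM (an assumption, not the study's classifier), and folds are formed by random shuffling without stratification.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and split them into n_folds disjoint test folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

def cross_validate(X, y, fit, predict, n_folds=10, seed=0):
    """Train on n_folds - 1 folds, test on the held-out fold, repeat, and average."""
    accs = []
    for test_idx in kfold_indices(len(y), n_folds, seed):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        model = fit(X[train_mask], y[train_mask])
        accs.append(np.mean(predict(model, X[test_idx]) == y[test_idx]))
    return float(np.mean(accs)), accs

# Hypothetical stand-in classifier (nearest class centroid), not an SVM.
def fit_centroids(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_centroids(model, X):
    classes = list(model)
    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

# Two well-separated synthetic groups of 30 (not study data).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 2)) + 10 * y[:, None]
mean_acc, accs = cross_validate(X, y, fit_centroids, predict_centroids)
print(mean_acc, len(accs))
```

With only 60 subjects each test fold holds 6 samples, so per-fold accuracies move in steps of about 17 percentage points; stratified folds keep the class balance equal across folds.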

ν-SVM

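SVM is a supervised learning method used for classification or regression analysis: given training samples, each marked as belonging to one of two classes, SVM builds a model that assigns new data points to one of the two classes.[15] For independent and identically distributed training data

$$(x_i, y_i) \in \mathbb{R}^N \times \{+1, -1\}, \qquad i = 1, 2, \ldots, l, \qquad (5)$$

the primal optimization problem of the ν-SVM classification algorithm is

$$\min_{w,\,\xi,\,\rho,\,b}\ \tau(w, \xi, \rho) = \frac{1}{2}\lVert w \rVert^2 - \nu\rho + \frac{1}{l}\sum_{i=1}^{l} \xi_i \qquad (6)$$

subject to

$$y_i\left[(x_i \cdot w) + b\right] \ge \rho - \xi_i, \qquad (7)$$

$$\rho \ge 0, \qquad \xi_i \ge 0. \qquad (8)$$

Applying Lagrangian optimization theory and replacing dot products with a kernel k leads to the dual quadratic optimization problem

$$\max_{\alpha}\ W(\alpha) = -\frac{1}{2}\sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j\, k(x_i, x_j) \qquad (9)$$

subject to

$$0 \le \alpha_i \le \frac{1}{l}, \qquad (10)$$

$$\sum_{i=1}^{l} \alpha_i y_i = 0, \qquad (11)$$

$$\sum_{i=1}^{l} \alpha_i \ge \nu. \qquad (12)$$

The parameters in Eqs. (5)–(12) are defined as follows: ν, with 0 ≤ ν ≤ 1, is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors (SVs). To understand the role of ρ, note that for ξ = 0, the constraint of Eq. (7) simply states that the two classes are separated by the margin 2ρ/‖w‖; the other parameters have the same meanings as in the original SVM.[16] To compute b and ρ, two sets $S_\pm$ of identical size s > 0 are considered, containing the SVs $x_i$ with $0 < \alpha_i < 1/l$ and $y_i = \pm 1$, respectively. By the Karush–Kuhn–Tucker (KKT) conditions, constraint (7) then becomes an equality with $\xi_i = 0$, which gives

$$b = -\frac{1}{2s}\sum_{x \in S_+ \cup S_-}\ \sum_{j=1}^{l} \alpha_j y_j\, k(x, x_j), \qquad \rho = \frac{1}{2s}\left[\sum_{x \in S_+}\sum_{j=1}^{l} \alpha_j y_j\, k(x, x_j) - \sum_{x \in S_-}\sum_{j=1}^{l} \alpha_j y_j\, k(x, x_j)\right].$$

Hence, in terms of kernels, the resulting decision function takes the form

$$f(x) = \operatorname{sgn}\!\left(\sum_{i=1}^{l} \alpha_i y_i\, k(x, x_i) + b\right).$$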
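For readers who want to try a ν-SVM, scikit-learn's `NuSVC` wraps LIBSVM's ν-SVC, which solves a closely related formulation of the dual problem described in this section. The sketch below uses synthetic data shaped like this study's cohort (30 + 30 subjects); the feature dimension, the RBF kernel, and ν = 0.3 are assumptions, and stratified 10-fold cross-validation mirrors the evaluation protocol described in the text.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import NuSVC

rng = np.random.default_rng(0)
# Synthetic stand-in for a 60 x 5 feature matrix (30 ADHD, 30 controls); not study data.
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 5)) + 1.5 * y[:, None]

nu = 0.3  # hypothetical value; feasibility requires nu <= 2 * min(n_pos, n_neg) / l
clf = NuSVC(nu=nu, kernel="rbf", gamma="scale")
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print(f"10-fold accuracy: {scores.mean():.3f}")

# nu-property: nu lower-bounds the fraction of support vectors on the training set.
clf.fit(X, y)
print("fraction of SVs:", clf.support_.size / len(y), "nu:", nu)
```

Raising ν tolerates more margin errors and produces more support vectors, which is the trade-off that the ν parameter makes explicit compared with the C parameter of the standard SVM.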

Results

Each patient was recorded with three electrodes (Fz, Cz, and Pz) under two types of stimuli, auditory and visual, so six recordings were acquired per patient. To avoid correlations between electrodes, ICA was applied in each modality to generate three uncorrelated sources (one per electrode). Tables 1–4 show the classification rates of the band power, fractal dimension, AR coefficient, wavelet coefficient, and fractal–wavelet (Fra-wave) features in the delta, theta, alpha, and beta frequency bands, respectively, with the SVM and v_SVM classifiers.

Table 1

Classification rate in delta band frequency

Table 4

Classification rate in beta band frequency

This study presents an implementation of an automatic ADHD detection system, focusing on the discriminative capability of four different feature sets and of Fra-wave to obtain the best classification performance. Table 1 presents the performance of the SVM and v_SVM classifiers with different features in the delta band. Compared with the other features, the Fra-wave characterization produced the best mean accuracy: 92.93% with SVM and 96.07% with v_SVM. As Table 1 shows, v_SVM is more accurate than SVM across the whole set of features. The theta band features were classified using the SVM and v_SVM methods, and the results are presented in Table 2. According to this table, the best average accuracy is obtained for the Fra-wave characterization, with an efficiency of 92.62% for the v_SVM classifier, whereas it is 98.77% for the wavelet features and the other features.

Table 2

Classification rate in theta band frequency

Overall, Tables 1–4 show that the auditory modality produced better accuracy than the visual modality.

Conclusion and Discussion

Many studies have tried to identify behavioral disorders in children using different diagnostic methods and tools. Some researchers have gathered comprehensive information from multiple sources,[17,18] such as questionnaires, interviews with families, parents, and children, and clinical observation. However, considering so much information simultaneously took a long time and also caused diagnostic errors. Some studies focused on teachers' ratings,[19] so the risk of misdiagnosis was high. Others used information from both parents' and teachers' ratings to diagnose behavioral disorders;[20] however, diagnosing from this amount of information is time-consuming and costly, with a high risk of errors. ERP and EEG have been applied to make diagnosis more accurate. ERP has been widely used for clinical diagnosis and for studying cognitive processing deficits in children with ADHD. To increase the accuracy of ADHD diagnosis, ERP signals were recorded to extract specific ERP features related to this disorder, and the participants were classified into two groups. After preprocessing, several features were extracted from the recorded signals: some reflected the complexity and roughness of the signals (fractal-based features), and the others modeled the signals in the time domain (AR coefficients), the spectral domain (band power), and the time–frequency domain (wavelet coefficients). These features were then fed to the SVM and v_SVM classifiers. The results of this experiment showed the superiority of the v_SVM classifier. This study presents an implementation of an automatic ADHD detection system, focused on the discriminative capability of four different feature sets, to obtain the best classification performance.
For example, Table 2 presents the performance results, showing that the Fra-wave characterization produced the best average accuracy, 98.77% for the v_SVM classifier, compared with 96.68% for the wavelet features and the other features. The combination of the features, however, produced better accuracy for the v_SVM classifier. Tables 1–4 also show that the auditory modality produced better accuracy than the visual modality. These results can be compared with previous research. Jiaojiao and colleagues used channel 128 with K-nearest neighbors (K-NN) and SVM classifiers to classify ADHD; the best classification accuracy, 83.33%, was achieved by the K-NN classifier.[21] Mueller et al. studied two groups of age-matched adults (75 with ADHD and 75 controls) performing a visual two-stimulus go/no-go task. ERP responses were decomposed into independent components, and a selected set of independent ERP component features was used for SVM classification. With a 10-fold cross-validation approach, the classification accuracy was 91%. The predictive power of the SVM classifier was verified on an independent ADHD sample (17 ADHD patients), resulting in a classification accuracy of 94%.[22] In contrast, our research used both visual and auditory stimuli to achieve better classification accuracy. Zeynab et al. obtained 215 wavelet coefficients and 20 recurrence quantification analysis (RQA) features that significantly discriminated between 31 ADHD and 37 normal subjects. They used 5-fold cross-validation, in which all data were randomly divided into five groups. Their classification results demonstrated that combining the two kinds of features, classified with Morlet and Mexican hat kernels, represents the signal better.
There was almost no difference between the accuracies of the Morlet and Mexican hat kernel classification approaches; both reached 98%.[23] Our study could be improved in two main ways: by using more recording channels and by increasing the number of participants. This study is the first attempt to classify ADHD patients using SVM and independent ERP signals with the combined Fra-wave feature. The results demonstrate the efficiency of this approach. In future work, the independent component decomposition and feature extraction procedures can be examined further. Moreover, this promising approach can easily be applied to other clinical problems.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

Table 3

Classification rate in alpha band frequency


1. BCI Competition 2003--Data sets Ib and IIb: feature extraction from event-related brain potentials with the continuous wavelet transform and the t-value scalogram.

Authors: Vladimir Bostanov
Journal: IEEE Trans Biomed Eng Date: 2004-06

2. A comparison of the behavioral and emotional disorders of primary school-going orphans and non-orphans in Uganda.

Authors: Seggane Musisi; Eugene Kinyanda; Noeline Nakasujja; Janet Nakigudde
Journal: Afr Health Sci Date: 2007-12

3. Simultaneous EEG and EDA measures in adolescent attention deficit hyperactivity disorder.

Authors: I Lazzaro; E Gordon; W Li; C L Lim; M Plahn; S Whitmont; S Clarke; R J Barry; A Dosen; R Meares
Journal: Int J Psychophysiol Date: 1999-11

4. Assessing children with ADHD in primary care settings.

Authors: Joshua M Langberg; Tanya E Froehlich; Richard E A Loren; Jessica E Martin; Jeffery N Epstein
Journal: Expert Rev Neurother Date: 2008-04

5. Machine learning approach for classification of ADHD adults.

Authors: Aleksandar Tenev; Silvana Markovska-Simoska; Ljupco Kocarev; Jordan Pop-Jordanov; Andreas Müller; Gian Candrian
Journal: Int J Psychophysiol Date: 2013-01-27

6. P300 subcomponents in obsessive-compulsive disorder.

Authors: Paraskevi Mavrogiorgou; Georg Juckel; Thomas Frodl; Jürgen Gallinat; Walter Hauke; Michael Zaudig; Gerhard Dammann; Hans-Jürgen Möller; Ulrich Hegerl
Journal: J Psychiatr Res Date: 2002 Nov-Dec

7. Discriminating between ADHD adults and controls using independent ERP components and a support vector machine: a validation study.

Authors: Gian Candrian; Venke Arntsberg Grane; Juri D Kropotov; Valery A Ponomarev; Gian-Marco Baschera; Andreas Mueller
Journal: Nonlinear Biomed Phys Date: 2011-07-19

8. Individual analysis of EEG frequency and band power in mild Alzheimer's disease.

Authors: Davide V Moretti; Claudio Babiloni; Giuliano Binetti; Emanuele Cassetta; Gloria Dal Forno; Florinda Ferreric; Raffaele Ferri; Bartolo Lanuzza; Carlo Miniussi; Flavio Nobili; Guido Rodriguez; Serenella Salinari; Paolo M Rossini
Journal: Clin Neurophysiol Date: 2004-02
