Literature DB >> 32595441

Seizure Classification From EEG Signals Using an Online Selective Transfer TSK Fuzzy Classifier With Joint Distribution Adaption and Manifold Regularization.

Yuanpeng Zhang1,2, Ziyuan Zhou1, Heming Bai2, Wei Liu2, Li Wang1,2.   

Abstract

To recognize abnormal electroencephalogram (EEG) signals of epileptics, in this study, we propose an online selective transfer TSK fuzzy classifier underlying joint distribution adaption and manifold regularization. Compared with most existing transfer classifiers, our classifier has its own characteristics: (1) because the labeled EEG epochs from the source domain cannot accurately represent the primary EEG epochs in the target domain, our classifier can make use of very few calibration data in the target domain to induce the target predictive function; (2) a joint distribution adaption is used to minimize the marginal distribution distance and the conditional distribution distance between the source domain and the target domain; (3) clustering techniques are used to select source domains so that the computational complexity of our classifier is reduced. We construct six transfer scenarios based on the original EEG signals provided by the University of Bonn to verify the performance of our classifier and introduce four baselines and a transfer support vector machine (SVM) for benchmarking studies. Experimental results indicate that our classifier achieves the best performance and is not very sensitive to its parameters.
Copyright © 2020 Zhang, Zhou, Bai, Liu and Wang.


Keywords:  TSK fuzzy classifier; brain-computer interface; joint distribution adaption; manifold regularization; seizure classification; transfer learning

Year:  2020        PMID: 32595441      PMCID: PMC7300255          DOI: 10.3389/fnins.2020.00496

Source DB:  PubMed          Journal:  Front Neurosci        ISSN: 1662-453X            Impact factor:   4.677


Introduction

The maturity of brain–computer interface (BCI) technology has provided an important channel for humans to use artificial intelligence (AI) to explore the cognitive activities of the brain. For example, many AI methods have been proposed for the intelligent diagnosis of epilepsy from electroencephalogram (EEG) signals in place of neurological physicians (Ghosh-Dastidar et al., 2008; Van Hese et al., 2009; Wang et al., 2016). In this study, we also focus on the intelligent diagnosis of epilepsy through EEG signals. The classic diagnostic procedure for epilepsy using intelligent models is illustrated in Figure 1. We observe that, for an emerging task, a large number of labeled EEG epochs are required to train an intelligent model; therefore, considerable effort must be spent manually labeling EEG epochs. Because the EEG responses of different patients during the same cognitive activity show a certain degree of similarity, we expect to leverage abundant labeled EEG epochs available in a related source domain to train an accurate intelligent model that can be reused in the target domain. To this end, transfer learning is often used, and it has been proven promising for epilepsy EEG signal recognition. For example, Yang et al. (2014) proposed a transfer model, LMPROJ, for epilepsy EEG signal recognition under the support vector machine (SVM) framework. In LMPROJ, the marginal probability distribution distance between the source domain and the target domain, measured by the maximum mean discrepancy (MMD), is minimized to reduce the distribution difference. Jiang et al. (2017c) improved LMPROJ and proposed the model A-TL-SSL-TSK for epilepsy EEG signal recognition under the TSK fuzzy system framework. Compared with LMPROJ, A-TL-SSL-TSK not only used the marginal probability distribution consensus as a transfer principle but also introduced semisupervised learning (the cluster assumption) for regularization.
Additionally, in our previous work (Jiang et al., 2020), we proposed an online multiview and transfer model O-MV-T-TSK-FS for EEG-based drivers' drowsiness estimation. It minimized not only the marginal distribution differences but also the conditional distribution differences between the source domain and the target domain. But it did not derive any information from unlabeled data. More references about transfer learning for epilepsy EEG signal recognition can be found in Jiang et al. (2019) and Parvez and Paul (2016).
Figure 1

The classic diagnostic procedure for epilepsy.

Although existing intelligent models underlying the transfer learning framework, for example, LMPROJ and A-TL-SSL-TSK, are effective for epilepsy EEG signal recognition, some issues should still be addressed. (1) To tolerate the distribution difference between the source domain and the target domain, it is not enough to minimize only the marginal distribution difference between the two domains. (2) Most of the existing models use only one source domain for knowledge transfer; that is to say, all available labeled data in the source domain are leveraged for model training. However, some labeled data may cause negative transfer. Therefore, in this study, considering the above two issues together, we propose a new intelligent TSK fuzzy classifier (online selective transfer TSK fuzzy classifier with joint distribution adaption and manifold regularization, OS-JDA-MR-T-TSK-FC) for epilepsy EEG signal recognition. First, it goes beyond marginal probability distribution adaptation between the source domain and the target domain in two respects: it additionally introduces conditional probability distribution adaptation to further minimize the distribution difference, and it preserves manifold consistency underlying the marginal probability distribution. Second, it can selectively leverage knowledge from multiple source domains. The remaining sections are organized as follows: in Data and Methods, we describe the EEG data and our proposed method. In Results, we report the experimental results. Discussions of the experimental results are presented in Discussions, and conclusions are summarized in the last section.

Data and Methods

Data

In this study, we download the widely used epilepsy EEG data to verify our proposed intelligent model. The data from the University of Bonn are open to the public for scientific research. Table 1 gives the data archive and collection conditions. Additionally, Figure 2 illustrates the amplitudes during the collection procedure for one volunteer in each group. The original EEG data cannot be directly used for model training (Jiang et al., 2017b; Tian et al., 2019); we must employ feature extraction methods to extract robust features before model training.
Table 1

Epilepsy EEG data archive and collection condition.

Volunteers  Group  No. per group  Collection conditions
Healthy     A      100            Volunteers with eyes open
Healthy     B      100            Volunteers with eyes closed
Epileptic   C      100            From hippocampal formation during seizure-free intervals
Epileptic   D      100            From within the epileptogenic zone during seizure-free intervals
Epileptic   E      100            During seizure activity

Sampling rate: 173.6 Hz; duration: 23.6 s.

Figure 2

The amplitude of one volunteer in each group during the collection procedure. From top to bottom corresponds to (A–E), respectively.


Feature Extraction

Three feature extraction algorithms, that is, wavelet packet decomposition (WPD) (Li, 2011), short-time Fourier transform (STFT) (Pei et al., 1999), and kernel principal component analysis (KPCA) (Li et al., 2005), are employed to extract three kinds of features from the original epilepsy EEG signals.

Wavelet Packet Decomposition

Wavelet packet decomposition is used to extract time-frequency features from epilepsy EEG signals. More specifically, the epilepsy EEG signals are decomposed into six different frequency bands with the Daubechies 4 wavelet coefficients. Each band is considered as one feature. Figure 3 illustrates the six features of group A.
Figure 3

Features extracted by wavelet packet decomposition.
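As a concrete illustration of this decomposition step, the sketch below derives six sub-band energy features from one epoch with a five-level Daubechies-4 filter ladder in NumPy. The periodic boundary handling, the detail/approximation ladder (rather than a full packet tree), and the use of band energies as features are our own simplifying assumptions, not necessarily the paper's exact WPD configuration.

```python
import numpy as np

# Daubechies-4 low-pass taps; the high-pass filter is its quadrature mirror.
_SQ3 = np.sqrt(3.0)
_D4_LO = np.array([1 + _SQ3, 3 + _SQ3, 3 - _SQ3, 1 - _SQ3]) / (4 * np.sqrt(2.0))
_D4_HI = _D4_LO[::-1] * np.array([1.0, -1.0, 1.0, -1.0])

def _dwt_step(x):
    """One analysis step: filter with periodic extension, downsample by 2."""
    xp = np.concatenate([x, x[:3]])  # wrap 3 samples for the 4-tap filters
    lo = np.convolve(xp, _D4_LO[::-1], mode="valid")[::2]
    hi = np.convolve(xp, _D4_HI[::-1], mode="valid")[::2]
    return lo, hi

def wpd_band_energies(epoch, levels=5):
    """Five-level dyadic decomposition -> six sub-bands (5 detail + 1
    approximation); each band contributes one energy feature."""
    bands, approx = [], np.asarray(epoch, dtype=float)
    for _ in range(levels):
        approx, detail = _dwt_step(approx)
        bands.append(detail)
    bands.append(approx)
    return np.array([np.sum(b ** 2) for b in bands])
```

In practice one would rather call an established wavelet library; the point here is only that one epoch maps to a six-dimensional band-feature vector.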

Short-Time Fourier Transform

Short-time Fourier transform is used to extract frequency-domain features from epilepsy EEG signals. More specifically, the epilepsy EEG signals are divided into locally stationary signal segments, and the Fourier transform is then used to extract a group of spectra of the local segments, which exhibit evident time-varying characteristics at different times. Finally, six frequency bands are extracted from each group of spectra. Figure 4 illustrates the six features of group A.
Figure 4

Features extracted by short time Fourier transform.
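The segmentation-plus-spectrum step can be sketched as follows; the Hann window, the window/hop lengths, and the equal-width band layout are our illustrative choices, not the authors' exact configuration.

```python
import numpy as np

def stft_band_features(epoch, win=256, hop=128, n_bands=6):
    """Hann-windowed STFT, then mean spectral power in n_bands equal-width
    frequency bands; each band is one feature of the epoch."""
    epoch = np.asarray(epoch, dtype=float)
    w = np.hanning(win)
    n_frames = 1 + (len(epoch) - win) // hop
    frames = np.stack([epoch[i * hop:i * hop + win] * w for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # per-frame power spectra
    mean_spec = spec.mean(axis=0)                    # time-averaged spectrum
    edges = np.linspace(0, len(mean_spec), n_bands + 1).astype(int)
    return np.array([mean_spec[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])
```

A pure sinusoid, for instance, concentrates its power in the single band containing its frequency, which is the behavior the band features exploit.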

Kernel Principal Component Analysis

Kernel principal component analysis is used to extract time-domain features from epilepsy EEG signals. More specifically, the Gaussian function is chosen as the kernel to map the original features nonlinearly. Then six kinds of features are selected from the top six principal component eigenvectors. Figure 5 illustrates the six features of group A.
Figure 5

Features extracted by kernel principal component analysis.

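A minimal NumPy sketch of Gaussian-kernel PCA, under the standard double-centering formulation; the kernel width gamma is a free parameter we leave illustrative.

```python
import numpy as np

def kpca_features(X, n_components=6, gamma=1.0):
    """Gaussian-kernel PCA: double-center the kernel matrix, eigendecompose,
    and project every sample onto the top n_components directions."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                               # centering in feature space
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]  # keep the top components
    alphas = vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))
    return Kc @ alphas                           # embedding, shape (n, n_components)
```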

Online Transfer Scenario Construction

We construct six online transfer scenarios from the EEG data after feature extraction (Table 2). Each scenario consists of five source domains and one target domain. Specifically, the two healthy groups (A, B) and the three epileptic groups (C, D, E) are combined to generate six different pairs of combinations, that is, AC, AD, AE, BC, BD, and BE. Five pairs are selected from the six combinations as source domains, and the remaining one is taken as the target domain, such that each pair has the opportunity to become the target domain.
Table 2

Six online transfer scenarios.

Scenarios  Source domains      Target domain  No. of subject-specific objects
SC-1       BD, BC, AE, AD, AC  BE             20
SC-2       BE, BC, AE, AD, AC  BD             20
SC-3       BE, BD, AE, AD, AC  BC             20
SC-4       BE, BD, BC, AD, AC  AE             20
SC-5       BE, BD, BC, AE, AC  AD             20
SC-6       BE, BD, BC, AE, AD  AC             20
In general, calibration in BCIs can be divided into two types, that is, offline calibration and online calibration (Jiang et al., 2020). Offline calibration means that we have already obtained a pool of unlabeled EEG epochs; some of the unlabeled EEG epochs are labeled by experts to train a classifier, and unseen epochs are then classified by the trained classifier. Online calibration means that the training EEG epochs are obtained on the fly; that is to say, the classifier is trained online. Both calibration methods have their own advantages and disadvantages. For example, in offline calibration, unlabeled EEG epochs can be used to assist the labeled ones in classifier training, for example, through semisupervised learning (Mallapragada et al., 2009; Zhang et al., 2013; Dornaika and El Traboulsi, 2016). Additionally, if necessary, we can easily obtain the label of any EEG epoch at any time. In online calibration, we not only have no unlabeled EEG epochs available for classifier training but also have little control over which epochs arrive next. However, online calibration is more attractive because it is more in line with the needs of practical application scenarios. Therefore, in this study, we only consider online calibration for seizure classification. To simulate online calibration in the aforementioned six transfer scenarios, we first generate M = 20 subject-specific objects from the target domain. The online calibration flowchart is shown in Figure 6. We repeat all rounds 10 times to obtain statistically meaningful results, where each time has a random starting position m0.
Figure 6

Online calibration flowchart.

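The online calibration loop above can be mimicked as follows; here a toy nearest-mean classifier stands in for the fuzzy classifier, and the batch size of four subject-specific epochs follows the batching used in the experiments. Names and shapes are our own illustrative assumptions.

```python
import numpy as np

def nearest_mean_fit(X, y):
    """Toy stand-in classifier: one mean per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def nearest_mean_predict(model, X):
    classes, means = model
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]

def online_calibration(Xs, ys, Xt_stream, yt_stream, Xt_test, yt_test, batch=4):
    """After each arriving batch of labeled subject-specific epochs, retrain on
    source data + calibration data and score on held-out target epochs."""
    accs = []
    for m in range(0, len(Xt_stream) + 1, batch):  # M = 0, 4, 8, ...
        X = np.vstack([Xs, Xt_stream[:m]]) if m else Xs
        y = np.concatenate([ys, yt_stream[:m]]) if m else ys
        model = nearest_mean_fit(X, y)
        accs.append(float((nearest_mean_predict(model, Xt_test) == yt_test).mean()))
    return accs
```

With M = 20 and a batch of four, this yields one accuracy per calibration size M = 0, 4, ..., 20, which is exactly the axis reported in the result tables.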

Methods

In this section, we elaborate on the method we propose for seizure classification. We first mathematically state the transfer problem; then we present the online transfer learning framework and hence the online transfer TSK fuzzy classifier (OS-JDA-MR-T-TSK-FC). Lastly, we give the detailed algorithm steps of OS-JDA-MR-T-TSK-FC, including how to select source domains.

Problem Statement

A domain Ψ = {X, P(x)} in the transfer learning or domain adaption scenario consists of a d-dimensional feature space X ⊆ R^d and a marginal distribution P(x), and a task Γ = {Y, P(y|x)} in the same scenario consists of a one-dimensional label space Y and a conditional distribution P(y|x), where y ∈ Y. Suppose that Ψ_s and Ψ_t are two domains derived from Ψ; they are deemed different when X_s ≠ X_t and/or P_s(x) ≠ P_t(x). Homoplastically, two tasks Γ_s and Γ_t derived from Γ are different when Y_s ≠ Y_t and/or P_s(y|x) ≠ P_t(y|x). Based on the above definitions, the target of OS-JDA-MR-T-TSK-FC is to train a predictive function on a source domain Ψ_s having N labeled EEG epochs and a target domain Ψ_t having M labeled subject-specific EEG epochs to predict the class label of an unseen epoch in the target domain with a low expected error, under the hypotheses that X_s = X_t, Y_s = Y_t, P_s(x) ≠ P_t(x), and P_s(y|x) ≠ P_t(y|x).

Online Transfer Learning Framework

OS-JDA-MR-T-TSK-FC

Because the classic one-order TSK fuzzy classifier (1-TSK-FC) (Deng et al., 2015; Jiang Y. et al., 2017a; Zhang J. et al., 2018; Zhang et al., 2019) is considered as the basic component of our online transfer learning framework, we first give some details about 1-TSK-FC before introducing our framework. The kth fuzzy rule involved in 1-TSK-FC is formulated as the following if–then form:

IF x_1 is A_1^k ∧ x_2 is A_2^k ∧ … ∧ x_d is A_d^k, THEN f^k(x) = p_0^k + p_1^k x_1 + … + p_d^k x_d,  (1)

where k = 1, 2, …, K, K represents the total number of fuzzy rules 1-TSK-FC uses, x = (x_1, x_2, …, x_d)^T represents an object containing d features, A_i^k in (1) represents a fuzzy set subscribed by x_i for the kth fuzzy rule, and ∧ represents a fuzzy conjunction operator. Each fuzzy rule is premised on the feature space and maps the fuzzy sets in the feature space into a varying singleton f^k(x). After the steps of inference and defuzzification, the predictive function y(·) for an unseen object x is formulated as the following form:

y(x) = Σ_{k=1}^{K} [ μ^k(x) / Σ_{k'=1}^{K} μ^{k'}(x) ] f^k(x),  (2)

in which the firing strength μ^k(x) is expressed as

μ^k(x) = Π_{i=1}^{d} μ_{A_i^k}(x_i),  (3)

where μ_{A_i^k}(x_i) can be expressed as the following form when the Gaussian kernel function is employed:

μ_{A_i^k}(x_i) = exp( −(x_i − c_i^k)^2 / (2 δ_i^k) ),  (4)

where c_i^k and δ_i^k are two parameters representing the kernel center and kernel width, respectively. Therefore, training of 1-TSK-FC means to find the optimal c_i^k and δ_i^k in the if parts and p_j^k in the then parts. Referring to the literature (Zhang et al., 2019), we know that parameters in the if parts can be trained by clustering techniques. For instance, c_i^k and δ_i^k can be trained by fuzzy c-means (FCM) (Gu et al., 2017) as

c_i^k = Σ_j μ_jk x_ji / Σ_j μ_jk,  (5)

δ_i^k = h Σ_j μ_jk (x_ji − c_i^k)^2 / Σ_j μ_jk,  (6)

where μ_jk is the fuzzy membership degree of x_j belonging to the kth cluster, and h is a regularized parameter that can always be set to 0.5 according to the suggestions in Jiang Y. et al. (2017a). When c_i^k and δ_i^k in the if parts are determined by FCM or other similar techniques, for an object x in the training set, let

x_e = (1, x^T)^T,  x̃^k = μ̄^k(x) x_e,  x_g = ((x̃^1)^T, (x̃^2)^T, …, (x̃^K)^T)^T,
p^k = (p_0^k, p_1^k, …, p_d^k)^T,  p = ((p^1)^T, (p^2)^T, …, (p^K)^T)^T,  (7)

where μ̄^k(x) = μ^k(x) / Σ_{k'=1}^{K} μ^{k'}(x); then we can rewrite the predictive function y(·) in (2) as the following linear form:

y(x) = p^T x_g.  (8)

Referring to Zhou et al. (2017) and Zhang Y. et al. (2018), we formulate an objective function as follows to solve p:

J_{1-TSK-FC}(p) = ||p||^2 + η Σ_{i=1}^{N} ( y_i − p^T x_gi )^2,  (9)

where the first term is a generalization term, the second is a square-error term, and η > 0 is a balance parameter used to control the tolerance of errors and the complexity of 1-TSK-FC. By setting the partial derivative of the objective function w.r.t. p to zero, that is, ∂J_{1-TSK-FC}(p)/∂p = 0, we can compute p analytically as

p = ( X_g^T X_g + (1/η) I )^{−1} X_g^T y,  (10)

where X_g is the matrix whose ith row is x_gi^T and y = (y_1, …, y_N)^T.

In this study, 1-TSK-FC is taken as the basic learning component to support the transfer learning framework. Many previous works (Yang et al., 2014; Jiang et al., 2017c) explored the marginal distribution adaption between the source domain and the target domain for transfer learning. In our framework, we introduce conditional distribution adaption to further minimize the distribution difference. Additionally, we impose manifold consistency on the marginal distribution. Therefore, the transfer learning framework can be formulated as

min_p Σ_{x_i ∈ Ψ_s} ( y_i − p^T x_gi )^2 + ω Σ_{x_j ∈ Ψ_t} ( y_j − p^T x_gj )^2 + λ_1 D(J_s, J_t) + λ_2 M(P_s, P_t),  (11)

where ω in the first term is the overall weight of the subject-specific objects. Generally, ω should be larger than 1 so that more emphasis is given to objects in Ψ_t than in Ψ_s; therefore, we set ω = max(2, σ · N/M). λ_1 and λ_2 are regularization parameters. The first term contains two parts: the first measures the loss on Ψ_s, and the second measures the loss on Ψ_t. The second term, D(J_s, J_t), is the joint distribution adaption term, and the third, M(P_s, P_t), is the manifold regularization term. Below, we explain how to embody them formally.

Objective Function of OS-JDA-MR-T-TSK-FC

Under the framework shown in (11), we specify each term to get the objective function of our online transfer TSK fuzzy classifier OS-JDA-MR-T-TSK-FC.
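A minimal numerical sketch of 1-TSK-FC training, assuming the antecedent centers and widths have already been obtained by clustering (here they are simply supplied as arguments, with the FCM step omitted); the consequents follow the ridge-style closed form described above.

```python
import numpy as np

def tsk_design_matrix(X, centers, widths):
    """Build the rows x_g: normalized Gaussian firing strengths times (1, x^T)."""
    N, K = len(X), len(centers)
    mu = np.empty((N, K))
    for k in range(K):
        mu[:, k] = np.exp(-((X - centers[k]) ** 2) / (2.0 * widths[k])).prod(axis=1)
    mu /= np.maximum(mu.sum(axis=1, keepdims=True), 1e-12)  # normalize over rules
    Xe = np.hstack([np.ones((N, 1)), X])                     # (1, x^T)
    return np.hstack([mu[:, [k]] * Xe for k in range(K)])    # shape N x K(d+1)

def train_1_tsk_fc(X, y, centers, widths, eta=100.0):
    """Closed-form consequents: p = (Xg^T Xg + I/eta)^-1 Xg^T y."""
    Xg = tsk_design_matrix(X, centers, widths)
    A = Xg.T @ Xg + np.eye(Xg.shape[1]) / eta
    return np.linalg.solve(A, Xg.T @ y)

def predict_1_tsk_fc(X, p, centers, widths):
    return tsk_design_matrix(X, centers, widths) @ p
```

Because the normalized firing strengths sum to one, a linear target function is exactly representable, which is a convenient sanity check for the solver.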

Loss Function

The squared loss is taken as the loss function to measure the sum of squared training errors on both Ψ_s and Ψ_t; hence, the first term in (11) can be formulated as

Σ_{x_i ∈ Ψ_s} ( y_i − p^T x_gi )^2 + ω Σ_{x_j ∈ Ψ_t} ( y_j − p^T x_gj )^2,  (12)

where y(x) = p^T x_g is the predictive function of 1-TSK-FC. Suppose we have a diagonal matrix Θ ∈ R^{(N+M)×(N+M)} in which each element is defined as

θ_ii = 1 if x_i ∈ Ψ_s, and θ_ii = ω if x_i ∈ Ψ_t.  (13)

By substituting (13) into (12), (12) can be rewritten as

( y − X_g p )^T Θ ( y − X_g p ),  (14)

where X_g = (x_g1, x_g2, …, x_g(N+M))^T, in which each row is derived from x_i by using (7).

Joint distribution adaptation

As we all know, even when the EEG epoch features in Ψ_s and Ψ_t are extracted in the same way, the joint distributions (marginal and conditional distributions) of Ψ_s and Ψ_t are generally different. In order to meet practical requirements, we assume that P_s(x) ≠ P_t(x) and P_s(y|x) ≠ P_t(y|x). Therefore, a joint distribution adaptation should be designed to minimize the distribution distance D(J_s, J_t) between Ψ_s and Ψ_t. First, the projected MMD (Gangeh et al., 2016; Jia et al., 2018; Lin et al., 2018) is employed to measure the marginal distribution distance D(P_s, P_t) between Ψ_s and Ψ_t. As a result, D(P_s, P_t) can be expressed as

D(P_s, P_t) = || (1/N) Σ_{x_i ∈ Ψ_s} p^T x_gi − (1/M) Σ_{x_j ∈ Ψ_t} p^T x_gj ||^2 = p^T X_g^T Φ X_g p,  (15)

where Φ is the MMD matrix, which can be defined as

Φ_ij = 1/N^2 if x_i, x_j ∈ Ψ_s;  Φ_ij = 1/M^2 if x_i, x_j ∈ Ψ_t;  Φ_ij = −1/(NM) otherwise.  (16)

Second, we suppose that Ψ_s^c belongs to Ψ_s and its objects are selected by {x | x ∈ Ψ_s ∧ y = c}, and Ψ_t^c belongs to Ψ_t and its objects are selected by {x | x ∈ Ψ_t ∧ y = c}, where c means the cth class in one domain. Also, for the source domain, N_c is used to denote the number of objects in the cth class, and for the subject-specific objects in the target domain, M_c is used to denote the number of objects in the cth class. Hence, the conditional distribution distance D(Q_s, Q_t) can be expressed as

D(Q_s, Q_t) = Σ_{c=1}^{C} || (1/N_c) Σ_{x_i ∈ Ψ_s^c} p^T x_gi − (1/M_c) Σ_{x_j ∈ Ψ_t^c} p^T x_gj ||^2 = Σ_{c=1}^{C} p^T X_g^T Δ_c X_g p,  (17)

where Δ_c is an MMD matrix defined as follows:

(Δ_c)_ij = 1/N_c^2 if x_i, x_j ∈ Ψ_s^c;  1/M_c^2 if x_i, x_j ∈ Ψ_t^c;  −1/(N_c M_c) if x_i ∈ Ψ_s^c and x_j ∈ Ψ_t^c, or vice versa;  0 otherwise.  (18)

According to probability theory, the joint adaptation satisfies D(J_s, J_t) = D(P_s, P_t) + D(Q_s, Q_t), so the joint distribution adaptation term can be formulated as

D(J_s, J_t) = p^T X_g^T ( Φ + Σ_{c=1}^{C} Δ_c ) X_g p.  (19)
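The projected-MMD construction can be checked numerically: for features F stacked from both domains, tr(F^T Φ F) equals the squared distance between the two domain means. A small self-contained sketch (domain sizes and features are arbitrary):

```python
import numpy as np

def mmd_matrix(n_s, n_t):
    """MMD matrix Phi = e e^T with e = (1/n_s, ..., -1/n_t, ...), so that
    tr(F^T Phi F) is the squared distance between the projected domain means."""
    e = np.concatenate([np.ones(n_s) / n_s, -np.ones(n_t) / n_t])
    return np.outer(e, e)

def marginal_mmd(Fs, Ft):
    """Squared MMD between source rows Fs and target rows Ft."""
    F = np.vstack([Fs, Ft])
    Phi = mmd_matrix(len(Fs), len(Ft))
    return float(np.trace(F.T @ Phi @ F))
```

The class-wise matrices Δ_c are built the same way, restricted to the rows of each class, with zeros elsewhere.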

Manifold regularization

In the manifold assumption (Lin and Zha, 2008; Chen and Wang, 2011; Geng et al., 2012), it is assumed that if two objects x_i and x_j are very close in the intrinsic geometry in terms of P_s(x) and P_t(x), then the corresponding conditional distributions Q(y|x_i) and Q(y|x_j) are considered similar. That is to say, for the objects in Ψ_s and the calibration objects in Ψ_t, if they lie on the same manifold, it is expected that their output (conditional probability distribution) differences should be as small as possible. Therefore, the manifold regularization can be formulated as follows under geodesic smoothness:

M(P_s, P_t) = f^T L f = p^T X_g^T L X_g p,  (20)

where f = X_g p is the vector of predictions and W = [w_ij]_{(N+M)×(N+M)} is the graph affinity matrix in which each element is defined as

w_ij = 1 if x_i ∈ ξ_v(x_j) or x_j ∈ ξ_v(x_i), and w_ij = 0 otherwise,

where ξ_v(x) represents the set of v-nearest neighbors of object x. L = [l_ij]_{(N+M)×(N+M)} is the corresponding normalized graph Laplacian matrix of W, which can be computed by L = I − D^{−1/2} W D^{−1/2}, where D is the degree matrix in which each diagonal element is computed by d_ii = Σ_j w_ij. By embedding the manifold regularization into the transfer learning framework, the marginal probability distributions of objects in the target domain and the source domain are fully utilized to guarantee the consistency between the predictive structure of the decision function f and the intrinsic manifold data structure. By substituting (14), (19), and (20) into our transfer learning framework shown in (11), we can obtain the transfer learning model, that is, OS-JDA-MR-T-TSK-FC:

J(p) = ( y − X_g p )^T Θ ( y − X_g p ) + λ_1 p^T X_g^T ( Φ + Σ_{c=1}^{C} Δ_c ) X_g p + λ_2 p^T X_g^T L X_g p.  (21)

By setting the derivative of (21) w.r.t. p to zero, we can deduce the closed-form solution

p = ( X_g^T Θ X_g + λ_1 X_g^T ( Φ + Σ_{c=1}^{C} Δ_c ) X_g + λ_2 X_g^T L X_g )^{−1} X_g^T Θ y.  (22)
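A sketch of building the normalized graph Laplacian used here from a v-nearest-neighbor affinity graph; the Gaussian edge weights are our illustrative choice (the simplest variant uses 0/1 weights).

```python
import numpy as np

def normalized_laplacian(X, v=5, sigma=1.0):
    """v-nearest-neighbor affinity graph W, then L = I - D^{-1/2} W D^{-1/2}."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:v + 1]       # v nearest neighbors, self excluded
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)                       # symmetrize the kNN graph
    deg = W.sum(axis=1)
    Dm = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    return np.eye(n) - Dm @ W @ Dm
```

By construction L is symmetric positive semidefinite, so the penalty p^T X_g^T L X_g p is always nonnegative.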

Algorithm of OS-JDA-MR-T-TSK-FC

Different from most of the existing transfer models, OS-JDA-MR-T-TSK-FC can leverage knowledge from multiple source domains. However, as we know, too many source domains increase the computational complexity. Additionally, some source domains having significant differences from the target domain may bring negative transfer knowledge. Therefore, according to Wu et al. (2017), we adopt a distance-based scheme to select relevant source domains. We use v_z^c to denote the mean vector of the cth class in the zth source domain, where z = 1, 2, …, Z. Similarly, v_t^c is used to denote the mean vector of the cth class in the target domain. The Euclidean distance between the zth source domain and the target domain can be computed as

d(z, t) = Σ_{c=1}^{C} || v_z^c − v_t^c ||_2.  (23)

With (23), we can get a distance set {d(1, t), d(2, t), …, d(Z, t)} that contains Z domain distances. The distance set is then partitioned by k-means into k groups (in this study, k is set to 2), and the source domains are selected from the cluster that has the smallest center. As a whole, the training of OS-JDA-MR-T-TSK-FC contains three parts: the first is source domain selection, the second is model training on each selected source domain combined with the target domain, and the last is classifier combination. Algorithm 1 shows the detailed training steps of OS-JDA-MR-T-TSK-FC. OS-JDA-MR-T-TSK-FC can also be used for multiclassification tasks. According to Zhou et al. (2017), we can convert each label y_i from the space R to the space R^C by setting y_ij = 1 if y(x_i) = j and y_ij = 0 otherwise, where i = 1, 2, …, N + M, j = 1, 2, …, C, and C represents the number of classes. Thus, the label space becomes Y ∈ R^{(N+M)×C}, and p is correspondingly converted from R^{K(d+1)} to R^{K(d+1)×C}.
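The distance-based selection step can be sketched as follows; the hand-rolled one-dimensional two-means routine is a stand-in for a generic k-means implementation, and the class-mean inputs are assumed precomputed.

```python
import numpy as np

def select_source_domains(source_means, target_mean):
    """2-means on source-to-target distances; keep the closer cluster's domains."""
    d = np.array([np.linalg.norm(np.asarray(m) - target_mean) for m in source_means])
    c = np.array([d.min(), d.max()])               # initialize the two 1-D centers
    for _ in range(100):
        assign = np.abs(d[:, None] - c[None, :]).argmin(axis=1)
        new_c = np.array([d[assign == j].mean() if np.any(assign == j) else c[j]
                          for j in range(2)])
        if np.allclose(new_c, c):
            break
        c = new_c
    return np.flatnonzero(assign == c.argmin())    # indices of selected domains
```

For example, source domains whose class means sit at distances roughly 1 and 5 from the target split into two clusters, and only the nearby group is retained for transfer.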

Results

Experimental setups and comparison results are reported in this section.

Setups

For fairness, we introduce three baselines and two transfer learning algorithms for the comparison study. The three baselines all use 1-TSK-FC for training, but their training sets differ.

Baseline 1 (BL1). Its training set consists of the five source domains directly concatenated, and its testing set is the target domain. BL1 is therefore considered a calibration-independent classifier, which does not use the subject-specific data in the target domain for training.

Baseline 2 (BL2). It uses only the subject-specific calibration EEG data in the target domain for training. Its testing set is the unlabeled data in the target domain. BL2 is therefore considered a source-domain-independent classifier, which does not consider the EEG data in the source domains at all.

Baseline 3 (BL3). BL3 is trained on five training sets, respectively; each set consists of one source domain plus the subject-specific data in the target domain. The five trained models are finally combined by the weighting scheme that is also used in Algorithm 1. Its testing set is the unlabeled data in the target domain.

Transfer support vector machine (TSVM) (Chapelle et al., 2008). It trains five TSVM classifiers by combining unlabeled EEG data in the target domain for semisupervised learning. The five trained models are finally combined by the weighting scheme that is also used in Algorithm 1.

ARRLS (Long et al., 2014). It trains five ARRLS classifiers by combining unlabeled EEG data in the target domain. The five trained models are finally combined by the weighting scheme that is also used in Algorithm 1.

Experimental Results

In this section, we report the experimental results from several aspects, that is, classification performance, interpretability, and robustness.

Classification Performance

Table 3 shows the average classification performance of the six scenarios in the KPCA, WPD, and STFT feature spaces, respectively. Table 4 shows the classification performance on KPCA features, Table 5 on WPD features, and Table 6 on STFT features. The best results are marked in bold.
Table 3

Average classification performance of the six scenarios in three feature spaces.

Feature space  Method              M=0     M=4     M=8     M=12    M=16    M=20
KPCA           BL1                 0.7962  0.7962  0.7962  0.7962  0.7962  0.7962
               BL2                 n/a     0.6837  0.7460  0.7899  0.8270  0.8536
               BL3                 0.7881  0.7761  0.8016  0.8086  0.8048  0.8174
               TSVM                0.8723  0.8765  0.8810  0.8864  0.8811  0.8927
               ARRLS               0.8684  0.8217  0.8742  0.8684  0.8821  0.8823
               OS-JDA-MR-T-TSK-FC  0.8701  0.8943  0.9164  0.9191  0.9214  0.9251
WPD            BL1                 0.8618  0.8618  0.8618  0.8618  0.8618  0.8618
               BL2                 n/a     0.7151  0.8597  0.8867  0.9057  0.9176
               BL3                 0.8505  0.8503  0.8661  0.8685  0.8751  0.8795
               TSVM                0.9232  0.9271  0.9269  0.9312  0.9292  0.9344
               ARRLS               0.9157  0.9204  0.9224  0.9287  0.9312  0.9336
               OS-JDA-MR-T-TSK-FC  0.8864  0.9073  0.9278  0.9314  0.9332  0.9376
STFT           BL1                 0.9129  0.9129  0.9129  0.9129  0.9129  0.9129
               BL2                 n/a     0.7619  0.8531  0.8674  0.8873  0.8962
               BL3                 0.9011  0.8923  0.8924  0.8951  0.8989  0.9107
               TSVM                0.9365  0.9459  0.9467  0.9502  0.9581  0.9524
               ARRLS               0.9425  0.9410  0.9356  0.9478  0.9452  0.9550
               OS-JDA-MR-T-TSK-FC  0.9031  0.9214  0.9500  0.9517  0.9585  0.9619

n/a, BL2 cannot be trained when M = 0.

The best performance is marked in bold.

Table 4

Classification performance on six scenarios in the KPCA feature space.

Scenario  Method              M=0     M=4     M=8     M=12    M=16    M=20
SC-1      BL1                 0.7254  0.7254  0.7253  0.7253  0.7253  0.7253
          BL2                 n/a     0.6507  0.6949  0.7285  0.7438  0.8124
          BL3                 0.7845  0.7899  0.8283  0.8535  0.8332  0.8404
          TSVM                0.8527  0.8564  0.8661  0.8675  0.8684  0.8690
          ARRLS               0.8455  0.8631  0.8874  0.8584  0.8632  0.8741
          OS-JDA-MR-T-TSK-FC  0.8835  0.9124  0.9187  0.9123  0.9201  0.9206
SC-2      BL1                 0.8050  0.8050  0.8050  0.8050  0.8050  0.8050
          BL2                 n/a     0.6031  0.7458  0.8727  0.9242  0.9447
          BL3                 0.7811  0.7912  0.8821  0.8642  0.8097  0.8358
          TSVM                0.9231  0.9305  0.9289  0.9359  0.9399  0.9378
          OS-JDA-MR-T-TSK-FC  0.9187  0.9364  0.9397  0.9415  0.9434  0.9439
SC-3      BL1                 0.9045  0.9045  0.9045  0.9045  0.9045  0.9045
          BL2                 n/a     0.8079  0.8689  0.8667  0.8418  0.9191
          BL3                 0.8008  0.7838  0.8037  0.8165  0.7804  0.8239
          TSVM                0.9235  0.9214  0.9298  0.9311  0.9287  0.9324
          ARRLS                0.9154  0.9200  0.9147  0.9228  0.9142  0.9364
          OS-JDA-MR-T-TSK-FC  0.9111  0.9125  0.9341  0.9399  0.9421  0.9433
SC-4      BL1                 0.6657  0.6657  0.6657  0.6657  0.6657  0.6657
          BL2                 n/a     0.7132  0.7819  0.7745  0.8431  0.8397
          BL3                 0.7944  0.7564  0.7506  0.7587  0.7988  0.7993
          TSVM                0.8789  0.8897  0.8942  0.8864  0.8911  0.9001
          ARRLS               0.8654  0.8412  0.8553  0.8631  0.8745  0.8924
          OS-JDA-MR-T-TSK-FC  0.8542  0.8596  0.9241  0.9321  0.9365  0.9387
SC-5      BL1                 0.8498  0.8498  0.8498  0.8498  0.8498  0.8498
          BL2                 n/a     0.6349  0.7119  0.7333  0.7425  0.7773
          BL3                 0.7751  0.7607  0.7758  0.7677  0.8121  0.8364
          TSVM                0.9024  0.9354  0.9142  0.9321  0.9368  0.9410
          ARRLS               0.8963  0.9224  0.9021  0.9361  0.9556  0.9254
          OS-JDA-MR-T-TSK-FC  0.8654  0.8684  0.9023  0.9234  0.9257  0.9341
SC-6      BL1                 0.8267  0.8267  0.8267  0.8267  0.8267  0.8267
          BL2                 n/a     0.6921  0.6723  0.7636  0.8667  0.8283
          BL3                 0.7926  0.7743  0.7689  0.7908  0.7946  0.7683
          TSVM                0.8756  0.8632  0.8786  0.8801  0.8698  0.8841
          ARRLS               0.8654  0.8604  0.8552  0.8742  0.8536  0.8774
          OS-JDA-MR-T-TSK-FC  0.8120  0.8763  0.8796  0.8652  0.8605  0.8697
Table 5

Classification performance on six scenarios in the WPD feature space.

Scenario  Method              M=0     M=4     M=8     M=12    M=16    M=20
SC-1      BL1                 0.9711  0.9711  0.9711  0.9711  0.9711  0.9711
          BL2                 n/a     0.6718  0.9166  0.9142  0.9243  0.9513
          BL3                 0.8632  0.7986  0.8542  0.8611  0.8511  0.8442
          TSVM                0.9735  0.9653  0.9842  0.9811  0.9765  0.9647
          ARRLS               0.9632  0.9553  0.8745  0.9567  0.9651  0.9663
          OS-JDA-MR-T-TSK-FC  0.9271  0.9365  0.9654  0.9689  0.9714  0.9736
SC-2      BL1                 0.8626  0.8626  0.8626  0.8626  0.8626  0.8626
          BL2                 n/a     0.5873  0.8135  0.8363  0.8627  0.8751
          BL3                 0.7895  0.8463  0.8468  0.8532  0.8324  0.8574
          TSVM                0.9021  0.9234  0.9145  0.9310  0.9256  0.9345
          ARRLS               0.8954  0.9321  0.9236  0.9524  0.9125  0.9263
          OS-JDA-MR-T-TSK-FC  0.8852  0.9024  0.9210  0.9253  0.9356  0.9363
SC-3      BL1                 0.8388  0.8388  0.8388  0.8388  0.8388  0.8388
          BL2                 n/a     0.8095  0.8067  0.8327  0.8287  0.8865
          BL3                 0.7986  0.8023  0.8235  0.8310  0.8352  0.8298
          TSVM                0.8836  0.8896  0.8658  0.8874  0.8697  0.8920
          ARRLS               0.8759  0.8963  0.8741  0.8523  0.8478  0.8623
          OS-JDA-MR-T-TSK-FC  0.7968  0.8541  0.8553  0.8687  0.8723  0.8852
SC-4      BL1                 0.9024  0.9024  0.9024  0.9024  0.9024  0.9024
          BL2                 n/a     0.7778  0.9830  0.9818  0.9882  0.9957
          BL3                 0.9123  0.9089  0.9189  0.9214  0.9241  0.9298
          TSVM                0.9436  0.9426  0.9463  0.9500  0.9431  0.9498
          ARRLS               0.9355  0.9664  0.9354  0.9632  0.9311  0.9522
          OS-JDA-MR-T-TSK-FC  0.8936  0.9214  0.9386  0.9399  0.9289  0.9400
SC-5      BL1                 0.7930  0.7930  0.7930  0.7930  0.7930  0.7930
          BL2                 n/a     0.9047  0.8757  0.8460  0.9454  0.9091
          BL3                 0.8826  0.8854  0.8898  0.8754  0.9356  0.9367
          TSVM                0.9241  0.9265  0.9321  0.9222  0.9412  0.9398
          ARRLS               0.9021  0.9214  0.8954  0.8857  0.9145  0.9236
          OS-JDA-MR-T-TSK-FC  0.9311  0.9354  0.9512  0.9568  0.9612  0.9544
SC-6      BL1                 0.8029  0.8029  0.8029  0.8029  0.8029  0.8029
          BL2                 n/a     0.5397  0.7627  0.9090  0.8849  0.8879
          BL3                 0.8569  0.8601  0.8635  0.8686  0.8720  0.8789
          TSVM                0.9124  0.9154  0.9187  0.9156  0.9189  0.9257
          ARRLS               0.9214  0.9220  0.9201  0.9258  0.9361  0.9123
          OS-JDA-MR-T-TSK-FC  0.8845  0.8942  0.9354  0.9289  0.9298  0.9364
Table 6

Classification performance on six scenarios in the STFT feature space.

Scenario  Method              M=0     M=4     M=8     M=12    M=16    M=20
SC-1      BL1                 0.8915  0.8915  0.8915  0.8915  0.8915  0.8915
          BL2                 n/a     0.6825  0.7627  0.8400  0.8248  0.8680
          BL3                 0.8469  0.8500  0.8598  0.8541  0.8745  0.9021
          TSVM                0.9235  0.9265  0.9211  0.9365  0.9410  0.9389
          ARRLS               0.9123  0.9025  0.9145  0.9452  0.9321  0.9225
          OS-JDA-MR-T-TSK-FC  0.9231  0.9212  0.9536  0.9456  0.9589  0.9610
SC-2      BL1                 0.9572  0.9572  0.9572  0.9572  0.9572  0.9572
          BL2                 n/a     0.8412  0.9152  0.8363  0.9215  0.9148
          BL3                 0.9356  0.9398  0.9410  0.9369  0.9459  0.9502
          TSVM                0.9578  0.9689  0.9712  0.9754  0.9741  0.9710
          ARRLS               0.9421  0.9532  0.9456  0.9623  0.9456  0.9361
          OS-JDA-MR-T-TSK-FC  0.9241  0.9254  0.9698  0.9789  0.9874  0.9863
SC-3      BL1                 0.9452  0.9452  0.9452  0.9452  0.9452  0.9452
          BL2                 n/a     0.8730  0.8983  0.9600  0.9346  0.9148
          BL3                 0.9563  0.9541  0.9568  0.9642  0.9687  0.9610
          TSVM                0.9478  0.9620  0.9536  0.9587  0.9641  0.9638
          ARRLS               0.9361  0.9521  0.9357  0.9430  0.9347  0.9637
          OS-JDA-MR-T-TSK-FC  0.9147  0.9689  0.9700  0.9453  0.9432  0.9564
SC-4      BL1                 0.9004  0.9004  0.9004  0.9004  0.9004  0.9004
          BL2                 n/a     0.7619  0.8813  0.8363  0.8823  0.9078
          BL3                 0.9214  0.9154  0.9354  0.9410  0.9258  0.9320
          TSVM                0.9425  0.9489  0.9631  0.9562  0.9511  0.9468
          ARRLS               0.9364  0.9258  0.9567  0.9412  0.9368  0.9387
          OS-JDA-MR-T-TSK-FC  0.9023  0.9128  0.9587  0.9599  0.9610  0.9632
SC-5      BL1                 0.9064  0.9064  0.9064  0.9064  0.9064  0.9064
          BL2                 n/a     0.7778  0.9322  0.8727  0.9424  0.9177
          BL3                 0.8921  0.8525  0.8651  0.8621  0.8547  0.8854
          TSVM                0.9257  0.9365  0.9278  0.9421  0.9532  0.9544
          ARRLS               0.9025  0.9236  0.9123  0.9367  0.9458  0.9422
          OS-JDA-MR-T-TSK-FC  0.8789  0.9024  0.9268  0.9541  0.9587  0.9635
SC-6      BL1                 0.8766  0.8766  0.8766  0.8766  0.8766  0.8766
          BL2                 n/a     0.6349  0.7288  0.8593  0.8183  0.8539
          BL3                 0.8541  0.8423  0.7963  0.8125  0.8236  0.8333
          TSVM                0.9214  0.9325  0.9432  0.9323  0.9654  0.9398
          ARRLS               0.9123  0.9236  0.9347  0.9415  0.9523  0.9225
          OS-JDA-MR-T-TSK-FC  0.8756  0.8974  0.9214  0.9265  0.9421  0.9412
Interpretability

Unlike TSVM, which works in a black-box manner, the proposed OS-JDA-MR-T-TSK-FC has high interpretability because 1-TSK-FC is taken as its basic component. Table 7 shows the five trained fuzzy rules (antecedent and consequent parameters) on SC-1 in the KPCA feature space.
Table 7

Fuzzy rules trained on SC-1 in the KPCA feature space.

OS-JDA-MR-T-TSK-FC
Fuzzy rules: IF x_1 is A_1^k ∧ x_2 is A_2^k ∧ … ∧ x_d is A_d^k, THEN f^k(x) = p_0^k + p_1^k x_1 + … + p_d^k x_d, k = 1, 2, …, K
SC-1  Rule No.  Antecedent parameters c^k = [c_1^k, c_2^k, …, c_d^k]^T, δ^k = [δ_1^k, δ_2^k, …, δ_d^k]^T  Consequent parameters p^k = [p_0^k, p_1^k, …, p_d^k]^T
1c1 = [0.0081, -0.0014, -0.0027, -0.0032, -0.0043, -0.0031], δ1 = [0.0023, 0.0055, 0.0036, 0.0041, 0.0021, 0.0028]p1 = [0.2531, 0.4321, −0.5123, 025623, 0.2415, −0.0423, 0.0012; 0.3135, 0.5287, 0.4452, −0.5342, 0.2342, −0.9734, −0.3244]T
2c2 = [0.0055, 0.0031, -0.0023, 0.0022, -0.0098, -0.0021], δ2 = [0.0050, 0.0036, 0.0043, 0.0044, 0.0041, 0.0033]p2 = [0.1213, −0.5354, 0.5653, −0.1243, 0.3452, 0.0642, 0.0043; 0.0633, −0.6342, 0.1453, 0.3345, −0.0234, 0.0078, −0.0015]T
3c3 = [0.0498, 0.0411, 0.0014, 0.0056, 0.0016, -0.0028], δ3 = [0.0046, 0.0034, 0.0057, 0.0057, 0.0046, 0.0037]p3 = [0.2342, −0.8456, −0.6345, −0.0134, −0.0267, 0.0111, −0.0042; −0.0534, 0.0324, 0.0434, 0.0116, 0.0362, −0.0632, 0.0027]T
4c4 = [0.0673, 0.0432, 0.0014, 0.0057, 0.0014, -0.0033], δ4 = [0.0041, 0.0032, 0.0032, 0.0011, 0.0034, 0.0015]p4 = [0.0454, −0.4345, −0.2563, −0.0412, 0.0345, 0.0163, 0.0423; 0.0123, −0.0532, 0.1634, 0.2134, −0.0745, 0.0122, 0.0011]T
5c5 = [0.0042, 0.0098, 0.0015, 0.0034, 0.0047, -0.0011], δ5 = [0.0047, 0.0032, 0.0044, 0.0076, 0.0034, 0.0043]p5 = [0.0177, 0.0134, 0.0214, 0.0034, −0.0045, 0.0023, −0.0013; 0.0034, 0.0053, −0.0123, 0.0054, 0.0053, 0.0016, 0.0014]T
Robustness

From the objective function of OS-JDA-MR-T-TSK-FC, we see that three parameters, that is, σ, λ1, and λ2, must be fixed before a classification task. So, we should consider the robustness of OS-JDA-MR-T-TSK-FC with respect to them. The sensitivity analysis results are shown in Figure 7.
Figure 7

Average accuracy of OS-JDA-MR-T-TSK-FC in the KPCA feature space with different parameters. (A) Robustness w.r.t. σ; (B) robustness w.r.t. λ1; (C) robustness w.r.t. λ2.

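The sensitivity analysis behind Figure 7 amounts to a grid sweep over the three free parameters. A generic sketch of such a sweep follows, where `train_eval` is a hypothetical callback standing in for training and scoring the classifier at one parameter setting:

```python
import itertools

def sensitivity_grid(train_eval, sigmas, lambda1s, lambda2s):
    """Evaluate the classifier at every (sigma, lambda1, lambda2) setting.
    A flat accuracy surface over a region of the grid indicates that the
    classifier is robust to the parameters within that region."""
    return {
        (s, l1, l2): train_eval(s, l1, l2)
        for s, l1, l2 in itertools.product(sigmas, lambda1s, lambda2s)
    }
```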

Discussions

We observe from Table 3 that the proposed OS-JDA-MR-T-TSK-FC achieves the best average performance across the six transfer scenarios in all feature spaces when the number of subject-specific objects is more than 4. Compared with the three baselines in particular, the advantage is more obvious. Moreover, the classification results in Tables 4–6 exhibit the following four characteristics:

1. BL1 does not use the subject-specific objects, so its accuracy is independent of M, whereas the other four classifiers depend on M, and it is intuitive that they gradually outperform BL1 as M increases.

2. BL2 is trained only on the subject-specific objects. Therefore, BL2 becomes unusable when M is set to 0, whereas BL1, BL3, TSVM, and OS-JDA-MR-T-TSK-FC still work because, besides subject-specific objects, they also leverage training objects from the source domains. When M is too small, BL2 performs badly compared with the other algorithms because it cannot obtain enough training patterns from the subject-specific objects.

3. When M is set to 0, TSVM always achieves the best performance. As subject-specific objects are gradually added into the training set, OS-JDA-MR-T-TSK-FC soon performs better than TSVM, which indicates that significant differences exist among the domains. Hence, a domain-dependent classifier such as TSVM is not well suited to our online transfer scenarios.

4. When one batch (four subject-specific objects are taken as a batch in our experiments) or at most two batches of subject-specific objects are added into the training set, the classification performance of OS-JDA-MR-T-TSK-FC becomes stable. That is to say, the number of subject-specific objects OS-JDA-MR-T-TSK-FC needs is very small, so it meets practical requirements because subject-specific objects are very few in real-world applications.

In addition to classification performance, interpretability is also a main characteristic of the proposed OS-JDA-MR-T-TSK-FC.
From Table 7, we see that it generates five interpretable fuzzy rules on SC-1 in the KPCA feature space. Each feature in a fuzzy rule can be interpreted as the energy of an EEG signal band, and each fuzzy membership function can be endowed with a linguistic description. For example, "x1 is A_1^k" in the antecedent of a fuzzy rule can be interpreted as "the energy of an EEG band is a little high," where the term "a little high" can be replaced by others such as "a little low," "medium," or "high." In this way, an expert in EEG signal analysis may assign five kinds of linguistic descriptions to each fuzzy membership function, that is, "low," "a little low," "medium," "a little high," and "high." The first fuzzy rule in Table 7 can then be interpreted as follows: if the energy of EEG band 1 is "high," the energy of EEG band 2 is "a little low," and the energies of EEG bands 3 through 6 are "low," then the consequent of the first fuzzy rule is −0.5278x1 + 0.4452x2 − 0.5342x3 + 0.2342x4 − 0.9734x5 − 0.3244x6. From Figure 7, we observe that OS-JDA-MR-T-TSK-FC is robust to σ in the range of [0.1, 0.4] and is not very sensitive to λ1 and λ2 over wide ranges.
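As a rough illustration of how such linguistic terms could be attached automatically, one can rank a feature's membership centers across the rules and map the ranks onto the five terms. This is a hypothetical heuristic for illustration only, not the paper's procedure:

```python
import numpy as np

LABELS = ["low", "a little low", "medium", "a little high", "high"]

def linguistic_labels(centers):
    """Assign the five linguistic terms to the five rules of one feature by
    ranking their Gaussian membership centers: the smallest center reads
    'low' and the largest reads 'high' (hypothetical labeling heuristic)."""
    ranks = np.argsort(np.argsort(centers))  # rank of each center, 0..4
    return [LABELS[r] for r in ranks]
```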

Conclusions

In this study, we propose a seizure classification model, OS-JDA-MR-T-TSK-FC, an online selective transfer TSK fuzzy classifier with joint distribution adaption and manifold regularization. We use the epilepsy EEG signals provided by the University of Bonn as the original data and construct six transfer scenarios in three kinds of feature spaces to demonstrate the promising performance of OS-JDA-MR-T-TSK-FC. We also generate four baselines and introduce a transfer SVM model for fair comparison. The experimental results show that OS-JDA-MR-T-TSK-FC performs better than the baselines and the introduced transfer model. However, in this study, we only consider how to select the source domains. Recent studies show that dynamically selecting useful samples from the source domain can effectively guide the learning on the target domain. Therefore, in our future work, we will try to develop a mechanism, for example, classification error consensus, to select the most useful samples from the source domain.

Data Availability Statement

The original EEG data are available at http://www.meb.unibonn.de/epileptologie/science/physik/eegdata.html.

Author Contributions

YZ designed the whole algorithm and the experiments. ZZ, HB, and WL contributed to the code implementation, and LW gave suggestions on the writing.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Algorithm 1

OS-JDA-MR-T-TSK-FC

Input:
1. [(x1, y1), (x2, y2), …, (xN, yN), …, (xN+M, yN+M)]^T;
2. ωt, λ1, λ2, and the number of fuzzy rules K.
Output:
1. Training accuracy αz of each classifier;
2. Final decision function f.
Procedure:
For z = 1 to Z
    Calculate the Euclidean distance d(z, t) between the zth source domain and the target domain by (24);
End
Partition the distance set {d(1, t), d(2, t), …, d(Z, t)} into two groups;
Select Z/2 (as Z′) source domains from the Z source domains;
For z = 1 to Z′
    Map X to Xg by (7.c);
    Calculate Θ, Φ, Δ, and L by (13), (16), and (18), respectively;
    Calculate pg and record it as (pg)z by (23);
    Use (pg)z to predict the Nz + M objects and record the training accuracy as αz;
End
Return f(x) = α1 (pg^T)1 xg + α2 (pg^T)2 xg + … + αZ′ (pg^T)Z′ xg
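Under stated assumptions, the outer loop of Algorithm 1 can be sketched as follows. The `train_fn` callback is a hypothetical stand-in for any trainer returning a classifier with a `predict` method (e.g. a 1-TSK-FC trainer), and domain means are used as the distance statistic; the paper's equations (7.c), (13)–(24) are not reproduced here:

```python
import numpy as np

def selective_transfer_ensemble(source_domains, target_X, target_y, train_fn):
    """Sketch of Algorithm 1's selective ensemble (hypothetical helpers):
    rank the Z source domains by Euclidean distance between domain means,
    keep the closest half, train one classifier per kept domain on the
    source data plus the few target calibration objects, and weight each
    classifier by its training accuracy alpha_z."""
    target_mean = target_X.mean(axis=0)
    means = np.array([Xs.mean(axis=0) for Xs, _ in source_domains])
    dists = np.linalg.norm(means - target_mean, axis=1)  # d(z, t)
    keep = np.argsort(dists)[: max(1, len(source_domains) // 2)]  # closest Z/2

    models, weights = [], []
    for z in keep:
        Xs, ys = source_domains[z]
        X = np.vstack([Xs, target_X])          # source + calibration objects
        y = np.concatenate([ys, target_y])
        model = train_fn(X, y)
        weights.append((model.predict(X) == y).mean())  # training accuracy
        models.append(model)
    weights = np.asarray(weights)

    def decision(x):
        # alpha-weighted combination of the per-domain classifiers
        votes = np.array([m.predict(x) for m in models])
        return np.sign(weights @ votes)
    return decision
```

The returned `decision` function mirrors the algorithm's final α-weighted decision function f.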
References (14 in total)

1.  Semi-supervised learning via regularized boosting working on multiple semi-supervised assumptions.

Authors:  Ke Chen; Shihai Wang
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2011-01       Impact factor: 6.226

2.  Riemannian manifold learning.

Authors:  Tong Lin; Hongbin Zha
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2008-05       Impact factor: 6.226

3.  Learning Flexible Graph-Based Semi-Supervised Embedding.

Authors:  Fadi Dornaika; Youssof El Traboulsi
Journal:  IEEE Trans Cybern       Date:  2015-02-26       Impact factor: 11.448

4.  Transductive domain adaptive learning for epileptic electroencephalogram recognition.

Authors:  Changjian Yang; Zhaohong Deng; Kup-Sze Choi; Yizhang Jiang; Shitong Wang
Journal:  Artif Intell Med       Date:  2014-10-17       Impact factor: 5.326

5.  Deep Multi-View Feature Learning for EEG-Based Epileptic Seizure Detection.

Authors:  Xiaobin Tian; Zhaohong Deng; Wenhao Ying; Kup-Sze Choi; Dongrui Wu; Bin Qin; Jun Wang; Hongbin Shen; Shitong Wang
Journal:  IEEE Trans Neural Syst Rehabil Eng       Date:  2019-09-11       Impact factor: 3.802

6.  Epileptic Seizure Prediction by Exploiting Spatiotemporal Relationship of EEG Signals Using Phase Correlation.

Authors:  Mohammad Zavid Parvez; Manoranjan Paul
Journal:  IEEE Trans Neural Syst Rehabil Eng       Date:  2015-07-22       Impact factor: 3.802

7.  Automatic detection of spike and wave discharges in the EEG of genetic absence epilepsy rats from Strasbourg.

Authors:  Peter Van Hese; Jean-Pierre Martens; Liesbeth Waterschoot; Paul Boon; Ignace Lemahieu
Journal:  IEEE Trans Biomed Eng       Date:  2008-11-07       Impact factor: 4.538

8.  SemiBoost: boosting for semi-supervised learning.

Authors:  Pavan Kumar Mallapragada; Rong Jin; Anil K Jain; Yi Liu
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2009-11       Impact factor: 6.226

9.  Seizure Classification From EEG Signals Using Transfer Learning, Semi-Supervised Learning and TSK Fuzzy System.

Authors:  Yizhang Jiang; Dongrui Wu; Zhaohong Deng; Pengjiang Qian; Jun Wang; Guanjin Wang; Fu-Lai Chung; Kup-Sze Choi; Shitong Wang
Journal:  IEEE Trans Neural Syst Rehabil Eng       Date:  2017-09-01       Impact factor: 3.802

10.  Computer Aided Theragnosis Using Quantitative Ultrasound Spectroscopy and Maximum Mean Discrepancy in Locally Advanced Breast Cancer.

Authors:  Mehrdad J Gangeh; Hadi Tadayyon; Lakshmanan Sannachi; Ali Sadeghi-Naini; William T Tran; Gregory J Czarnota
Journal:  IEEE Trans Med Imaging       Date:  2015-10-27       Impact factor: 10.048

