Literature DB >> 24695550

Overlapped partitioning for ensemble classifiers of P300-based brain-computer interfaces.

Akinari Onishi1, Kiyohisa Natsume1.   

Abstract

A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI, and their classification performances were assessed. However, they were usually trained on a large amount of training data (e.g., 15,300 ERPs). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training data. In addition, the classification performances of the ensemble classifier with naive partitioning and of a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved a better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning, and that the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance.


Year:  2014        PMID: 24695550      PMCID: PMC3973688          DOI: 10.1371/journal.pone.0093045

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The P300 is a component of an event-related potential (ERP) in a non-invasive scalp electroencephalogram (EEG) that was discovered by Sutton et al. [1]. The P300 appears as a positive peak approximately 300 milliseconds (ms) after a rare or surprising stimulus. The P300 is elicited by the oddball paradigm: rare (target) and non-rare (non-target) stimuli are presented to a participant, and the participant silently counts the occurrences of the target stimuli. The P300 can be seen in the ERPs corresponding to the target stimuli. Visual and auditory stimuli have often been used to elicit the P300 [2], [3]. Currently, the P300 is used in brain-computer interfaces (BCIs) for controlling devices. The P300 was first utilized for spelling out letters by Farwell and Donchin in 1988 [4]. They proposed a BCI system that typed letters according to the detected P300 elicited by the visual target stimuli, referred to as a P300-based BCI or a P300 speller. The P300-based BCI can control not only a speller but also a wheelchair [5], [6], computer mouse [7], web browser [8], virtual reality system [9], game [10], or smartphone [11]. Since the BCI does not depend on muscle activity, it constitutes a new interface that will provide a better quality of life for patients disabled by neuromuscular diseases, such as amyotrophic lateral sclerosis (ALS) [12]. The interface, classification methods, and their extensions have been studied for more than 20 years (e.g., [13]–[15]). Stepwise linear discriminant analysis (SWLDA) has been widely used as a standard classification algorithm for the P300-based BCI [16]–[19]. Farwell and Donchin first proposed the SWLDA, together with the entire classification protocol for the P300 [4]. Schalk et al. proposed a general-purpose BCI system, named BCI2000, in which the P300-based BCI was implemented together with the SWLDA [20]. Krusienski et al. compared the classification algorithms for BCI [21].
Specifically, they compared the classification accuracy of Pearson's correlation method, linear discriminant analysis (LDA), SWLDA, linear support vector machine (SVM), and Gaussian kernel SVM. The results showed that LDA and SWLDA achieved a better performance than the others. Blankertz et al. proposed an LDA with shrinkage for the P300-based BCI that yielded a better performance than SWLDA when a small amount of training data was given [22]. Ensemble classifiers are among the most powerful classifiers for the P300-based BCI; however, they were developed and evaluated using relatively large amounts of training data. The ensemble of SVMs proposed by Rakotomamonjy and Guigue won BCI competition III data set II, which provides a large amount of training data (15,300 ERP data) [23]. They applied the ensemble classifiers to reduce the influence of signal variability using the classifier output averaging technique [24]. Salvaris et al. compared the classification accuracies of ensemble LDA and ensemble SVM classifiers using BCI competition III data set II and BCI competition II data set IIb (7560 training data) [25]. They also employed an ensemble of six linear SVM classifiers and evaluated classification accuracies on their own data by 16-fold cross-validation [26]. An ensemble SWLDA classifier was first proposed by Johnson et al. and evaluated on their own P300-based BCI data (6480 training ERP data) [27]. Arjona et al. evaluated a variety of ensemble LDA classifiers using 3024 training data [28]. In online (real-time) P300-based BCI experiments, smaller amounts of training data than those in BCI competition III data set II and BCI competition II data set IIb tended to be used. Townsend et al. recorded 3230 ERP training data for a row-column paradigm and 4560 ERP training data for a checkerboard paradigm [15]. Guger et al. evaluated the online performances of P300-based BCI, where LDA was trained on 1125 ERP training data [29].
EEG data are usually high dimensional, and the target training data that contain the P300 are rare (e.g., 1/6 of all data) and have different statistical properties from the non-target data. In other words, researchers must address the class imbalance problem [30], which makes classifiers severely prone to overfitting. Thus, even thousands of training data can be considered a small amount in this field. For practical use, the amount of training data should be small in order to reduce the training time [21]. However, most of the studies on ensemble classifiers for the P300-based BCI did not evaluate the classification accuracy using a practical amount of training data, e.g., less than 1000 ERP data. In an online experiment where less than 1000 training data are given, the ensemble classifier may not perform well because of its method of partitioning the training data. Most ensemble classifiers employ naive partitioning, which divides the training data into partitions by sets of data associated with a target command [23]. With naive partitioning, the training data are partitioned without overlaps. Johnson et al. also employed the same partitioning method [27]. Due to the naive partitioning method, however, each weak learner in the ensemble classifier is trained on a smaller amount of training data than a single classifier. In addition, the dimension of the EEG data is usually high. In such cases, classifiers are prone to overfitting [32]. Thus, the classification performance of the ensemble classifiers may deteriorate when the amount of training data is small, and ensemble classifiers should therefore be evaluated when less than 1000 training data are given. To develop a better classifier that requires less than 1000 training data, we propose a new overlapped partitioning method to train an ensemble LDA classifier, which we evaluated when 900 training data were given.
Overlapped partitioning allows a larger amount of training data to be contained in a partition, although part of the training data is reused. The proposed classifiers were evaluated on our original P300-based BCI data set and on BCI competition III data set II, using small (900) and large (over 8000) amounts of training data. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. Our objective was to clarify how the ensemble LDA classifiers with overlapped or naive partitioning and the single LDA classifier performed when 900 training data were given. Overlapped partitioning is a new partitioning method applied in the training of an ensemble classifier, designed to be suitable for the P300-based BCI. When we evaluated the performance of the new method, we also assessed the influence of the dimension reduction methods. The algorithms were first compared under the condition that 900 training data were used, which is the smallest amount of data used to date for the evaluation of ensemble classifiers for the P300-based BCI. In addition, the influence of the degree of overlap used in the ensemble classifier with overlapped partitioning was demonstrated for the first time. We consider that overlapped partitioning is essential to implement ensemble classifiers in an online system. This study contributes towards reducing the required amount of training data and achieving better classification performance in an online experiment.

Methods

Ethics Statement

This research plan was approved by the Internal Ethics Committee at Kyushu Institute of Technology. The possible risks, mental task, and approximate measurement time were explained to all participants. In addition, all participants gave their written informed consent before participating in this experiment.

Experimental Design

Ensemble classifiers with the proposed overlapped partitioning were evaluated on our original P300-based BCI data set (data set A) and BCI competition III data set II (data set B), as shown in Figure 1. The primary objective is to clarify how the overlapped partitioning for ensemble classifiers influences the classification accuracy. The secondary objective is to confirm how the three conditions for dimension reduction (stepwise method, PCA, or none) improve the classification performance.
Figure 1

Experimental design.

We analyzed two P300-based BCI data sets, A and B. Data set A was recorded in this online experiment. The recorded data set A is divided into pairs of training and test data by cross-validation (see Figure 4). Then the classification is performed for all pairs to compute the classification accuracy (see Figure 5). The overlapped partitioning is employed to train ensemble classifiers. Data set B (BCI competition III data set II) contains separated training data and test data. This data set was also classified by the proposed classifiers.

Figure 4

Procedure of cross-validation used for the evaluation on data set A.

In this case, p = 1 and q = 10. ERP data sets corresponding to the fifty letters inputted by a participant were measured. The square aligned at the top illustrates a data set that contained 180 ERP data, of which 30 were labeled as target ERPs, while the other 150 were labeled as non-target ERPs. These data sets were sorted according to measurement time. The data sets were divided into ten groups. Then, two successive groups were selected: the former group was assigned to the training data and the latter to the test data. Each weak learner in the ensemble classifier was then trained on the assigned training data and tested using the following test data.

Figure 5

Training and testing procedure of the ensemble classifiers for P300-based BCI.

Training data flows are represented by blue lines and test data flows are illustrated by red lines. The training data are divided into overlapped partitions (see Figure 6). One of three conditions for dimension reduction (DR) is applied to each partitioned data set: the stepwise method, PCA, or none. Then, LDA weak learners are trained on these dimension-reduced data. The training data are used only for the training of weak learners, as illustrated by blue lines. After the training session, the test data are processed to compute scores for decision making.

Data Set A: Our Original P300-based BCI Data Set

To evaluate the ensemble classifiers, we recorded EEG data using an online P300-based BCI, and then computed the classification accuracy offline. During the EEG recording, visual stimuli were presented to the participant. At the same time, the participant performed a mental task. The recorded signals were amplified, digitized, and then preprocessed before a letter was predicted. Our data set contains P300-based BCI data from 10 participants, enabling more reliable statistical analysis. Parameters of the stimulus and the recording method for data set A are summarized in Table 1.
Table 1

Parameters of stimulators, data acquisition, and preprocessing methods for data sets A and B.

Parameter                           | Data set A | Data set B
#letters                            | 36         | 36
#rows                               | 6          | 6
#columns                            | 6          | 6
#intensification sequences          | 15         | 15
Intensification duration (ms)       | 100        | 100
Blank duration (ms)                 | 75         | 75
Target presentation duration (s)    | 3          | 2.5
Feedback presentation duration (s)  | 1          | 2.5
#participants                       | 10         | 2
#recorded letters                   | 50         | training: 85, test: 100
#channels                           | 8          | 64
Sampling rate (Hz)                  | 128        | 240
Bandpass filter (Hz)                | 0.11–30    | 0.1–60
ERP buffer length (ms)              | 700        | 700
Baseline buffer length (ms)         | pre-100    | pre-100
Moving average (window size)        | 3          | 18
Downsampling (Hz)                   | 43         | 20

Participants

Eleven healthy participants (ten males and one female, aged 22–28 years) participated in this study. They had no prior experience of controlling a P300-based BCI. During the experiment, we monitored the participants' recorded waveforms as well as their health status. However, one male participant could not complete the task due to sickness. Thus, data from ten participants were analyzed offline.

Devices

The P300-based BCI consisted of a stimulator, amplifier, A/D converter, and computer, as shown in Figure 2. EEG signals were recorded at the Fz, Cz, P3, Pz, P4, PO7, Oz, and PO8 scalp sites according to the international 10–20 system, which is the electrode alignment commonly used for P300-based BCI [9]. The ground electrode was located at the AFz site and the reference electrodes were located on the mastoids. The EEG signals were filtered (0.11–30 Hz band-pass filter) and amplified 25000 times with a BA1008 (TEAC Co. Ltd., Japan). Then, the signals were digitized by an AIO-163202FX-USB analog I/O unit (CONTEC Co. Ltd., Japan). The sampling rate was 128 Hz. The P300-based BCI was implemented in MATLAB/Simulink (Mathworks Inc., USA). The recorded signals were analyzed offline using MATLAB. Stimuli for the P300-based BCI were presented on a TFT LCD display (HTBTF-24W, 24.6-inch wide; Princeton Technology Ltd., Japan) located 60 cm in front of the participant.
Figure 2

Structure of the P300-based BCI system.

A target letter is presented to a participant, then letters on the stimulator are intensified by row or by column. The participant must perform a mental task: silently count when the target letter is intensified. During this, the event-related potentials (ERPs) that contain the P300 component are recorded from the scalp. The signals are amplified, digitized, and then stored in a computer. After all intensifications have finished, the signals are processed to predict a letter, and then the feedback is displayed.


Stimuli

We employed most of the parameters of the stimulator that were used in BCI competition III data set II [23]. The stimulator of the P300-based BCI consists of 36 gray letters that form a 6 × 6 matrix, a target indicator, and a feedback indicator (see Figure 3). All the columns and rows of the matrix were numbered to manage intensifications and for the subsequent prediction of a letter. The set of column numbers was {1, …, 6}, while the set of row numbers was {7, …, 12}. In addition, the set of all intensifications was {1, …, 12}. A row or a column of gray letters in the matrix turned white for 100 ms (intensification duration), and then changed to gray again for 75 ms (blank duration). At least 12 intensifications were required to identify an input letter out of the 36 letters. This is called a sequence. Each row or column in a sequence was selected by a random permutation. The number of intensification sequences was fixed to 15 in the online experiment (i.e., 180 intensifications), while it was varied from 1 to 15 in the offline analysis.
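The stimulus schedule described above can be sketched as follows. This is an illustrative Python snippet, not the paper's MATLAB implementation; the function name `intensification_order` is our own.

```python
import random

def intensification_order(n_sequences, seed=None):
    """Sketch of the stimulus schedule for one letter: each sequence is a
    random permutation of the 12 intensifications (columns 1-6, rows 7-12)."""
    rng = random.Random(seed)
    order = []
    for _ in range(n_sequences):
        seq = list(range(1, 13))   # 6 column + 6 row intensifications
        rng.shuffle(seq)           # one random permutation per sequence
        order.extend(seq)
    return order

order = intensification_order(15, seed=0)
print(len(order))  # 180 intensifications, as in the online experiment
```

With 15 sequences, the target row and the target column are each intensified 15 times, yielding 30 target ERPs out of 180 per letter.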
Figure 3

Stimulator for the P300-based BCI.

It has 36 gray letters that form a 6 × 6 matrix in the center. Each column of the matrix is numbered 1–6 and each row 7–12. A target letter is provided at the top center of the stimulator and the predicted letter is shown at the top right as feedback in test sessions.


Preprocessing

EEG data were preprocessed identically for online recording and offline analysis. The data were trimmed from the beginning of each intensification to 700 ms (8 channels × 89 samples). Each 100 ms pre-stimulus baseline was subtracted from the corresponding ERP data. Subsequently, the ERP data were smoothed (moving average, window size 3), downsampled to 43 Hz (8 channels × 30 samples), and vectorized into a 240-dimensional feature vector (8 channels × 30 samples).
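The preprocessing chain for data set A can be sketched as below. This is a minimal Python/NumPy illustration under assumptions stated in the docstring (the paper's system was implemented in MATLAB); the function name `preprocess_epoch` is hypothetical.

```python
import numpy as np

def preprocess_epoch(eeg, onset, fs=128, erp_ms=700, base_ms=100, win=3, out_fs=43):
    """Sketch of the preprocessing for data set A. Assumptions: eeg is a
    channels x time array; onset is the stimulus-onset sample index.
    Trims a 700 ms ERP buffer, subtracts the 100 ms pre-stimulus baseline,
    smooths with a moving average, downsamples, and vectorizes."""
    n_erp = int(erp_ms / 1000 * fs)                   # 89 samples at 128 Hz
    n_base = int(base_ms / 1000 * fs)                 # 100 ms of baseline
    epoch = eeg[:, onset:onset + n_erp].astype(float)
    baseline = eeg[:, onset - n_base:onset].mean(axis=1, keepdims=True)
    epoch -= baseline                                  # baseline correction
    kernel = np.ones(win) / win                        # moving average, window 3
    smoothed = np.apply_along_axis(
        lambda x: np.convolve(x, kernel, mode="same"), 1, epoch)
    step = round(fs / out_fs)                          # 128/43 -> every 3rd sample
    return smoothed[:, ::step].reshape(-1)             # 8 ch x 30 samples -> 240 dims

eeg = np.random.default_rng(0).normal(size=(8, 1000))  # synthetic recording
vec = preprocess_epoch(eeg, onset=100)
print(vec.shape)  # (240,)
```

For data set B, the same sketch applies with fs=240, window 18, and downsampling to 20 Hz, giving 64 × 14 = 896 dimensions.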

Sessions and a mental task

EEG data of the P300-based BCI were recorded through one training session and ten test sessions, where only the data in the test sessions were evaluated by our proposed cross-validation in the offline analysis. In each session, a participant was required to spell out five letters using the P300-based BCI. A target letter to be inputted was selected randomly by the system. Thus, 900 ERPs (5 letters × 1 session × 12 intensifications × 15 sequences) were recorded in the training session and 9000 ERPs (5 letters × 10 sessions × 12 intensifications × 15 sequences) in the test sessions. A target letter was displayed for 3 s, and then the intensifications were presented. The participant was asked to perform the oddball task to elicit the P300: the participant had to focus on the cued target letter and count silently when the letter was intensified. During the sessions, the observed EEG data were recorded. In the training session, no feedback was displayed. In the test sessions, the feedback was shown in the feedback indicator for 1 s at the end of all 15 intensification sequences for the target letter. The online feedback was computed using a single LDA classifier [21] and was presented to the participants in order to confirm whether the participant conducted the mental task appropriately in the test sessions. Feedback of success or failure also helps to motivate participants [33], even though presenting feedback does not improve the classification accuracy of P300-based BCI [34]. In addition, feedback is essential for participants to acquire the appropriate mental task [35]. The experimenter also checked the feedback to make sure that appropriate classification performance was observed using the LDA. All the data gathered before the current session were used for training the classifier in the online recording.

Data Set B: BCI Competition III Data Set II

We also evaluated the proposed ensemble classifiers using BCI competition III data set II, because many novel and traditional BCI algorithms have been evaluated on this data set. Since the competition data set contains a large amount of training data, we evaluated the classification performance using limited training data (900 ERPs) in addition to the full training data (15,300 ERPs). Parameters of the stimulus and data recording for data set B are also summarized in Table 1.

Overview of the data set and stimulator

The data set contains EEG data for participants A and B. The EEG data were recorded from 64 channels. The recorded signals were bandpass filtered (0.1–60 Hz) and digitized at 240 Hz. The same procedure of intensifications and mental tasks as for data set A was applied to data set B. The differences in the stimulators between data sets A and B were in the size, font, and brightness of the letters, the horizontal/vertical distances between letters, and the method of presenting the target and feedback letters. It should be noted that the target and feedback presentation times were different between these two data sets, though these parameters were not directly related to the offline analysis. The data set contains EEG data corresponding to 85 target letters for training (85 letters × 12 intensifications × 15 sequences = 15,300 ERPs) and EEG data of 100 target letters for testing (18,000 ERPs) for each participant. A more detailed description of the data set can be found in [36]. The same preprocessing method was used for data sets A and B; however, different parameters were employed because the sampling rate and the number of channels for data set B were larger than those for data set A. Data from all 64 channels were used for the offline analysis. The data were trimmed from the beginning of each intensification to 700 ms (64 channels × 168 samples). Each 100 ms pre-stimulus baseline was subtracted from the ERP data. The ERP data were smoothed (moving average, window size 18), downsampled to 20 Hz (64 channels × 14 samples), and vectorized into an 896-dimensional feature vector (64 channels × 14 samples). The vectorized data are handled as feature vectors in the classification.

Ensemble classifiers with overlapped partitioning

The ensemble classifier divides the given training data into partitions, and those partitions are then used to train multiple classifiers within the ensemble. Each classifier in the ensemble is called a "weak learner," and the number of weak learners equals the number of partitions. The training data were divided into partitions using overlapped partitioning. A dimension reduction method was applied to these partitioned data, and then the LDA weak learners were trained. The test data corresponding to a letter were processed to compute scores, and the scores were then translated into a predicted letter. To evaluate the classification performance with a reduced amount of training data, the proposed p/q cross-validation was applied.

p/q cross-validation

p/q cross-validation is a special form of cross-validation that can reduce the amount of training data. For a fair comparison of the classification accuracy, the amount of training data used in the offline analysis should be reduced to less than 1000. The traditional cross-validation method is not suitable because it provides at least 4500 training data in this case. Instead, we employed the proposed p/q cross-validation, which performs q-fold cross-validation in which p/q of all data are assigned to the training data. First, the ERP training data are divided into q groups. Second, assuming that the q groups are aligned around a circle, p + 1 groups starting from the i-th group (i = 1, …, q) are sequentially selected. Then, the first p consecutive groups are assigned to the training data, and the last single group is assigned to the test data. The above procedure is repeated for all i. In total, q pairs of training and test data are prepared. For each pair, classification is performed. The classification accuracy can be computed as 100 × c/n (%), where n is the total number of letters classified over all pairs and c is the number of correct predictions. It should be noted that (q − 1)/q cross-validation is equivalent to the conventional q-fold cross-validation. In the present study, we evaluated data set A using 1/10 cross-validation, as shown in Figure 4. In other words, five letters out of fifty were assigned to the training data, which contained 900 ERPs (9000 ERPs × 1/10). It takes 180.125 seconds to spell out five letters under the conditions of this online experiment, which does not overly tire the participant. In addition to the 1/10 cross-validation, we also used the conventional 10-fold cross-validation (9/10 cross-validation) in order to compare the ensemble classifiers when a large amount of training data was provided. In that case, the ERPs of 45 letters out of 50 were used as training data, i.e., 8100 ERPs (9000 ERPs × 9/10). Cross-validation was not applied to data set B because the competition data set has separate training and test data.
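The p/q train/test split can be sketched as follows; indices stand in for letters sorted by recording time, and the function name `pq_cross_validation_pairs` is our own illustration.

```python
def pq_cross_validation_pairs(n_letters, p, q):
    """Sketch of p/q cross-validation: letters sorted by recording time are
    split into q equal groups; assuming the groups lie on a circle, the p
    consecutive groups starting at group i form the training set and the
    next group is the test set, for i = 1..q."""
    assert n_letters % q == 0
    size = n_letters // q
    groups = [list(range(g * size, (g + 1) * size)) for g in range(q)]
    pairs = []
    for i in range(q):
        train = [x for k in range(p) for x in groups[(i + k) % q]]
        test = groups[(i + p) % q]
        pairs.append((train, test))
    return pairs

pairs = pq_cross_validation_pairs(50, p=1, q=10)       # 1/10 CV: 5 training letters
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))  # 10 5 5
```

With p = 9 and q = 10, each fold trains on 45 of the 50 letters, reproducing the conventional 10-fold cross-validation.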


Overlapped partitioning

In a BCI study on ensemble classifiers, naive partitioning was used [23]. With naive partitioning, the given training data are divided into partitions by letters, without overlaps. Because the partitions do not overlap, the amount of training data in a partition becomes small, so that covariance matrices might not be estimated precisely. Instead of this method, we propose a generalized partitioning method. All the procedures for training and testing the proposed ensemble classifier for the P300-based BCI are shown in Figure 5. In overlapped partitioning, the training data are divided into partitions that are allowed to overlap. In the first step of the overlapped partitioning method, the training data assigned to input commands are sorted by recorded time and divided into blocks without overlaps. Then, assuming that the blocks are aligned around a circle, a fixed number of consecutive blocks starting from the i-th block (i = 1, 2, …) are selected to form the i-th partition. The procedure is repeated for all i, so the number of partitions equals the number of blocks. An example of overlapped partitioning is shown in Figure 6. Each weak learner is trained on the partitioned data (see Figure 5). The advantage of this partitioning method compared to naive partitioning is that a larger amount of data is stored in each partition. Thus, overlapped partitioning may be robust against a shortage of training data. In the present study, the number of blocks was fixed, while the number of consecutive blocks per partition (the degree of overlap) was varied in the offline analysis.
Figure 6

Overlapped partitioning with five blocks, where each partition consists of three consecutive blocks.

Training data were first divided into five blocks. Assuming that those five blocks were aligned around a circle, three continuous blocks were selected to form a partition. As a result, five partitions were prepared. The partitioned training data sets were used to train weak learners in the ensemble classifier.
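The construction in Figure 6 can be sketched in a few lines; the function name `overlapped_partitions` is our own, and block contents are represented by index lists.

```python
def overlapped_partitions(blocks, n_consecutive):
    """Sketch of overlapped partitioning: blocks are assumed to lie on a
    circle; partition i is formed by n_consecutive blocks starting at block
    i, so the number of partitions equals the number of blocks."""
    n = len(blocks)
    return [
        [x for k in range(n_consecutive) for x in blocks[(i + k) % n]]
        for i in range(n)
    ]

# Figure 6 example: five blocks, three consecutive blocks per partition.
blocks = [[0], [1], [2], [3], [4]]
parts = overlapped_partitions(blocks, 3)
print(parts[0], parts[4])  # [0, 1, 2] [4, 0, 1]
```

Setting n_consecutive = 1 recovers naive partitioning, and n_consecutive = len(blocks) makes every partition a copy of the whole training set (the single-classifier special case discussed later).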


An ensemble classifier with overlapped partitioning can be considered a special case of bagging as used in pattern recognition [37]. In bagging, random sampling with replacement from the available training data is used, which is also referred to as bootstrap sampling. In contrast, overlapped partitioning does not involve any randomness, so no duplicated partitions are made except in a special case. Unlike a standard pattern recognition problem, a set of EEG data is recorded for every letter, in which 30 ERPs contain the P300 and the other 150 ERPs do not. Random sampling from the full set of EEG data runs the risk that only a few ERPs containing the P300 are selected in a partition, which may deteriorate the classification performance. Likewise, random sampling of the blocks is not effective because duplicated partitions could be prepared. The proposed overlapped partitioning has no such risks and provides different partitions with a constant ratio of EEG data containing the P300 to those without it. Thus, the weak learners of the ensemble classifier can be trained efficiently using overlapped partitioning.

Dimension reduction

A dimension reduction method has often been applied in BCI because EEG data are usually high dimensional. However, the influence of dimension reduction methods has not been evaluated for ensemble classifiers. In this study, one of three conditions for dimension reduction was applied: two dimension reduction methods (the stepwise method and PCA) and a control condition without dimension reduction (none).

Stepwise method

The stepwise method selects suitable spatiotemporal predictor variables for classification by forward and backward steps. First, an empty linear regression model is prepared; variables are then appended through the following steps. In the forward step, a variable is appended to the model, and the model is evaluated by an F-test. The F-test yields a p-value, the probability that the observed result occurred by chance. The variable is added if the p-value of the F-test is lower than a threshold p_enter. The forward step is repeated until no variable is appended. In the following backward step, a variable of the temporary model is removed, and the model is again evaluated by the F-test. The variable is removed if its p-value is higher than a threshold p_remove. The backward step is continued until no variable is removed. Then, the forward step is repeated again. The final model is determined when no variable is appended to or removed from the model. The remaining variables in the final model are used for classification. More details are given in [21], [38]. We set p_enter = 0.1 and p_remove = 0.15, values commonly used for P300-based BCI [21], [22].

Principal component analysis

Principal component analysis (PCA) is a typical dimension reduction method based on the eigenvalue decomposition [39], and has also been applied to P300-based BCI [10], [40]. In summary, the covariance matrix of the training data is computed and then the eigenvalue decomposition is performed. The projection of the data onto the normalized eigenvector corresponding to the largest eigenvalue is called the first principal component (PC). The other PCs are calculated likewise. We applied PCA to the data in each partition, and then used 1–140 PCs for classification on data set A and 1–400 PCs for classification on data set B.
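The PCA projection described above can be sketched as follows; this is an illustrative NumPy version (the function name `pca_projection` is our own), not the paper's implementation.

```python
import numpy as np

def pca_projection(X, n_components):
    """Sketch of PCA dimension reduction: project feature vectors (rows of X)
    onto the eigenvectors of the training covariance matrix that have the
    largest eigenvalues."""
    Xc = X - X.mean(axis=0)                   # center the data
    cov = np.cov(Xc, rowvar=False)            # covariance over features
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components]  # leading principal axes
    return Xc @ top                           # projected scores (PCs)

X = np.random.default_rng(0).normal(size=(180, 240))  # 180 ERPs, 240 dims
scores = pca_projection(X, n_components=140)
print(scores.shape)  # (180, 140)
```

In practice the projection matrix must be estimated on the partition's training data only and then reused, unchanged, on the test data.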

Linear discriminant analysis

Linear discriminant analysis (LDA) is a frequently used classifier for the P300-based BCI. In the ensemble classifier, LDA weak learners are implemented. One of the three conditions for dimension reduction is applied to the m-th partitioned data, and then the weight vector w_m of the m-th LDA weak learner is trained as

w_m = Σ_m⁻¹ (μ_m⁺ − μ_m⁻),

where Σ_m is the total covariance matrix over the target and non-target training data, and μ_m⁺ and μ_m⁻ are the mean vectors of the target and non-target training data in the m-th partition. The trained weight vectors of the LDA weak learners are used to compute the score for the decision making. See [22] for more details of a single LDA classifier.
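The weight vector of one weak learner follows directly from the partition's class statistics. A minimal sketch, assuming the standard LDA direction given above; the ridge term is our added safeguard for small partitions, not part of the paper's formulation:

```python
import numpy as np

def train_lda_weak_learner(target, nontarget, ridge=1e-3):
    """Weight vector of one LDA weak learner: w = Sigma^{-1} (mu+ - mu-),
    where Sigma is the total covariance over the target and non-target
    training data of the partition."""
    X = np.vstack([target, nontarget])
    sigma = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
    return np.linalg.solve(sigma, target.mean(axis=0) - nontarget.mean(axis=0))
```

A test feature vector x is then scored as w @ x; higher scores indicate the target class.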

Decision making

To predict a letter, its corresponding test data were processed to compute scores for the decision making. Let x_{i,j,m} denote the test feature vector of the i-th intensification in the j-th sequence in the m-th partition after dimension reduction. The score of the i-th intensification was computed by summing the weak learners' outputs over the sequences and partitions: s_i = Σ_j Σ_m w_m · x_{i,j,m}. In the offline analysis, the number of intensification sequences was varied from 1 to 15. The input letters were then predicted by finding the maximum scores among the column intensifications and among the row intensifications, respectively. The first element of the prediction represents the column number of a predicted letter, while the second represents the row number; for example, the pair of column 2 and row 3 denotes "N" in Figure 3.
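The decision rule can be sketched as below. The nested-list feature layout and the convention that intensifications 1–6 index the columns and 7–12 the rows of the 6 × 6 speller matrix are our assumptions for illustration.

```python
import numpy as np

def predict_letter(w_list, feats):
    """Ensemble decision making for one letter.

    w_list : weight vectors of the M weak learners.
    feats  : feats[i][j][m] is the test feature vector of the i-th
             intensification (0-based) in the j-th sequence for the
             m-th partition, after dimension reduction.
    Returns (column, row), 1-indexed into the speller matrix.
    """
    scores = np.zeros(len(feats))
    for i in range(len(feats)):
        for j in range(len(feats[i])):           # sum over sequences used
            for m, w in enumerate(w_list):       # sum over weak learners
                scores[i] += float(w @ feats[i][j][m])
    col = int(np.argmax(scores[:6])) + 1         # column intensifications
    row = int(np.argmax(scores[6:])) + 1         # row intensifications
    return col, row
```

Using more intensification sequences simply adds more terms to each score before the argmax, which is why accuracy rises with the number of sequences in the offline analysis.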

Special cases of overlapped partitioning

The ensemble classifier with the proposed overlapped partitioning is equivalent to an ensemble classifier with naive partitioning, or to a single classifier, in two special cases. First, it becomes the ensemble classifier with naive partitioning when the degree of overlap is 1, so that each block of training data is assigned to exactly one weak learner. In this case, the partitions do not overlap each other, which can be easily seen in Figure 6. Second, the ensemble classifier behaves as a single classifier when the degree of overlap equals the number of blocks. To see this, note first that the scores used for the decision making can be multiplied by an arbitrary positive constant without changing the predicted letter. When the degree of overlap equals the number of blocks, all the partitioned data sets are duplications of the entire training data, so after a dimension reduction method has been applied, the same data are stored in all partitions. As a result, all the weight vectors of the weak learners become identical. Since the final model of the stepwise method, or the projection of the PCA, is fitted to the same training data, the test data after the dimension reduction are also identical across partitions. The ensemble score is therefore the single classifier's score multiplied by the number of weak learners, and because this scaling does not affect the maximization in the decision making, the ensemble classifier with maximum overlap corresponds to a single classifier.
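A concrete way to see the two special cases is to write down which blocks each weak learner receives. The circular block assignment below is an illustrative assumption; the paper's exact assignment scheme may differ, but the two limiting cases hold for any scheme in which each of the n_blocks weak learners takes `degree` distinct blocks.

```python
def overlapped_partitions(n_blocks, degree):
    """Block indices assigned to each of n_blocks weak learners, where
    the m-th learner takes `degree` consecutive blocks starting at
    block m, wrapping around circularly."""
    return [sorted((m + k) % n_blocks for k in range(degree))
            for m in range(n_blocks)]

# degree = 1        -> disjoint partitions  (naive partitioning)
# degree = n_blocks -> every partition holds all blocks (single classifier)
```

With degree 1 the partitions are disjoint singletons; with degree equal to the number of blocks every partition duplicates the full training set, reproducing the single-classifier case discussed above.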

Comparison Protocol

We evaluated varieties of ensemble classifiers with overlapped partitioning in order to assess the influence of the degree of overlap together with dimension reduction methods. One of three different conditions for dimension reduction was applied: stepwise, PCA, or none; the resulting classifiers are denoted by overlapped ensemble SWLDA (OSWLDA), overlapped ensemble PCA LDA (OPCALDA), and overlapped ensemble LDA (OLDA), respectively. These three classifiers were evaluated on data sets A and B. Data set A, recorded by us, was analyzed in the small training data case using 1/10 cross-validation and in the large training data case using 9/10 cross-validation (conventional 10-fold cross-validation). Thus, the same amount of training data was provided for each ensemble classifier: 900 training data for the former and 8100 for the latter. Additionally, in the cross-validation, the training and test data were strictly separated so that none of the training data were used as test data. Data set B (BCI competition III data set II) was also analyzed using limited training data (ERPs corresponding to the first 5 letters) and using full training data (ERPs corresponding to 85 letters); the former contained 900 ERPs for training while the latter contained 15300. To confirm the influence of overlapped partitioning, the degree of overlap was varied while the number of weak learners was fixed in the offline analysis. The evaluated combinations of the number of weak learners and the degree of overlap for data sets A and B are summarized in Tables 2 and 3, respectively. In particular, when the degree of overlap equals the number of blocks, the ensemble classifier with overlapped partitioning is equivalent to the single classifier, and when the degree of overlap is 1, it behaves as a conventional ensemble classifier with naive partitioning.
It should be noted that the algorithms were trained on 900 training data for both data sets, which is much smaller than the amounts used in previous studies, for example, the 15300 training data used for the BCI competition III data set II [23] and the 7560 used for the BCI competition II data set IIb [41]. In our comparison, the single SWLDA, which is commonly used in this field, and the ensemble SWLDA proposed by Johnson et al. were compared.
Table 2

Evaluation parameters of ensemble classifiers with overlapped partitioning on data set A.

Evaluation method | #training letters | #test letters | #weak learners | Degree of overlap | #training data for a weak learner (ERPs)
1/10 cross-validation | 5 letters (900 ERPs) | 50 letters | 5 | 1, 2, 3, 4, 5 | 180, 360, 540, 720, 900
9/10 cross-validation (conventional 10-fold cross-validation) | 45 letters (8100 ERPs) | 50 letters | 45 | 1, 5, 10, 15, 20, 25, 30, 35, 40, 45 | 180, 900, 1800, 2700, 3600, 4500, 5400, 6300, 7200, 8100

Data set A was evaluated by 1/10 cross-validation (900 training data) and 9/10 cross-validation (8100 training data). The number of weak learners and the degree of overlap were the parameters of the overlapped partitioning; together with the evaluation method, they determine the amount of training data for a weak learner in an ensemble classifier. The number of training letters (#training letters) is given by the 50 entire letters × the training fraction (1/10 or 9/10). The number of training data for a weak learner (#training data for a weak learner) is given by the 9000 entire ERPs × the training fraction × the degree of overlap / the number of blocks.
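The counts in the last column of Tables 2 and 3 follow directly from this formula; a small helper (the function name is ours, not the paper's) reproduces them from the training ERPs available in each evaluation:

```python
def erps_per_weak_learner(train_erps, degree, n_blocks):
    """Training ERPs available to one weak learner: the given training
    ERPs scaled by (degree of overlap / number of blocks)."""
    return round(train_erps * degree / n_blocks)

# Data set A, 1/10 CV: 900 training ERPs, 5 blocks, degrees 1-5 -> 180 ... 900
# Data set A, 9/10 CV: 8100 training ERPs, 45 blocks, degree 45 -> 8100
# Data set B, full:    15300 training ERPs, 17 blocks, degree 17 -> 15300
```

At maximum overlap the weak learner sees the full training set, matching the single-classifier special case.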

Table 3

Evaluation parameters of ensemble classifiers with overlapped partitioning on data set B (BCI competition III data set II).

Evaluation method | #training letters | #test letters | #weak learners | Degree of overlap | #training data for a weak learner (ERPs)
Limited training data (first 5 letters) | 5 letters (900 ERPs) | 100 letters | 5 | 1, 2, 3, 4, 5 | 180, 360, 540, 720, 900
Full training data | 85 letters (15300 ERPs) | 100 letters | 17 | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 | 900, 1800, 2700, 3600, 4500, 5400, 6300, 7200, 8100, 9000, 9900, 10800, 11700, 12600, 13500, 14400, 15300

The ensemble classifiers were trained on limited training data (900 ERPs) or full training data (15300 ERPs). The number of weak learners and the degree of overlap were the parameters of the overlapped partitioning; together with the evaluation method, they determine the amount of training data for a weak learner in an ensemble classifier. The number of training data for a weak learner (#training data for a weak learner) is given by the given training ERPs × the degree of overlap / the number of blocks.

For the statistical analysis of data set A, the effects of the intensification sequence (1–15), the dimension reduction condition (stepwise, PCA, or none), and the degree of overlap were evaluated by a three-way repeated-measures ANOVA followed by post hoc pairwise t-tests with Bonferroni's method. No statistical analysis was applied to data set B because of the limited number of participants.

Results

The classification performances of OSWLDA, OPCALDA, and OLDA were evaluated on data set A using 1/10 or 9/10 cross-validation and on data set B with limited or full training data. The degree of overlap used in the overlapped partitioning was varied while the number of weak learners in the ensemble classifier was fixed. As mentioned above, an overlapped ensemble classifier behaves as an ensemble classifier with naive partitioning when the degree of overlap is 1, and becomes a single classifier when the degree of overlap equals the number of blocks.

Data Set A Using 1/10 Cross-validation

EEG data in data set A were classified by OSWLDA, OPCALDA, and OLDA using 1/10 cross-validation with the parameters in Table 2. The classification performances of these classifiers for each participant are shown in Figure 7. The mean accuracies of these algorithms are shown in Figure 8 and in Table 4.
Figure 7

Classification performances of ensemble classifiers on data set A using 1/10 cross-validation.

OSWLDA, OPCALDA and OLDA were trained on 900 ERPs. The influence of overlapped partitioning was evaluated by changing the degree of overlap and the number of intensification sequences. The classification performances of all participants are presented.

Figure 8

Mean classification performances of ensemble classifiers on data set A using 1/10 cross-validation.

OSWLDA, OPCALDA and OLDA were trained on 900 ERPs. The classification accuracies were averaged over the ten participants.

Table 4

Mean classification accuracies (%) of OSWLDA, OPCALDA, and OLDA evaluated on data set A using 1/10 cross-validation.

Algorithm (5 weak learners) | Degree of overlap D | Accuracy (%) at intensification sequences 1–15
OSWLDA | D=1 | 31.2 48.0 60.0 68.4 72.8 75.4 80.8 83.6 86.0 87.4 89.8 91.4 90.0 90.6 91.8
OSWLDA | D=2 | 37.6 54.8 68.0 74.0 81.4 82.4 87.6 89.8 91.2 92.4 93.8 94.2 94.4 95.2 95.0
OSWLDA | D=3 | 36.4 57.4 69.4 77.2 82.4 85.0 87.8 90.0 91.4 92.6 94.6 94.6 94.6 95.0 94.8
OSWLDA | D=4 | 37.2 55.4 71.4 77.6 84.0 85.4 87.6 90.2 92.2 93.6 93.8 94.6 95.0 95.2 95.6
OSWLDA | D=5 | 35.6 53.6 66.8 74.8 79.2 83.4 85.6 89.4 91.6 93.0 93.0 94.4 94.0 94.2 95.2
OPCALDA | D=1 | 29.4 44.6 57.6 63.6 71.8 74.6 76.8 80.6 82.6 84.8 87.6 88.0 88.6 88.4 90.6
OPCALDA | D=2 | 36.6 53.4 66.4 75.0 81.2 84.2 85.6 89.6 90.8 91.0 92.6 92.4 93.8 94.2 95.0
OPCALDA | D=3 | 38.2 54.4 67.6 75.2 81.6 84.2 87.4 90.4 92.0 91.8 93.2 93.6 94.4 95.6 96.0
OPCALDA | D=4 | 38.6 55.8 67.8 74.4 82.2 85.8 88.0 91.4 93.0 92.4 94.0 93.6 95.0 95.6 96.0
OPCALDA | D=5 | 37.8 55.4 68.6 74.8 82.2 85.2 87.4 90.4 92.0 92.4 93.8 94.4 94.8 95.8 95.8
OLDA | D=1 | 2.6 4.0 3.4 3.8 4.0 4.2 3.2 4.0 4.4 4.6 4.6 5.2 4.6 5.0 5.4
OLDA | D=2 | 25.6 37.0 49.8 56.8 65.2 69.6 75.2 79.8 80.0 82.6 83.6 83.6 85.2 85.4 88.0
OLDA | D=3 | 28.2 43.2 57.8 65.8 72.8 76.4 79.4 82.4 86.2 88.0 89.0 89.4 91.0 91.8 92.8
OLDA | D=4 | 29.2 44.8 59.8 67.8 73.2 78.8 80.8 84.2 86.8 89.6 91.0 90.6 91.8 93.2 94.4
OLDA | D=5 | 28.2 44.4 59.8 67.6 73.2 78.8 80.4 83.4 85.4 89.6 91.0 90.4 91.6 93.2 94.2

An overlapped ensemble classifier becomes an ensemble classifier with naive partitioning when the degree of overlap D = 1, and is equivalent to a single classifier when D = 5, the number of blocks.

The key finding was that OSWLDA showed higher classification performance than the single SWLDA classifier (D = 5) and the ensemble SWLDA classifier with naive partitioning (D = 1) when 900 training data were provided. As can be seen in Table 4, most algorithms achieved their best performance at intermediate degrees of overlap, while the worst accuracy was observed with naive partitioning (D = 1). Regarding OLDA with D = 1, the classification accuracy was close to the chance level (1/36). As can be seen in Figure 8, OSWLDA with overlapped partitioning achieved a higher classification accuracy than the single SWLDA classifier (D = 5), especially in the early intensification sequences, and a higher accuracy than the ensemble SWLDA classifier with naive partitioning (D = 1). Moreover, OPCALDA with intermediate overlap achieved better classification accuracies than OPCALDA with D = 5, although the differences were small. In contrast, the accuracy of OLDA with overlapped partitioning was close to that of the single LDA classifier (D = 5), although it achieved slightly higher accuracies in some sequences. A three-way repeated-measures ANOVA on the intensification sequence, dimension reduction condition, and degree of overlap showed that all three main effects and all their interactions were significant. In addition, post hoc pairwise t-tests with Bonferroni's method revealed significant differences between the dimension reduction conditions and between all but one of the pairs of degrees of overlap.

Data Set A Using 9/10 Cross-validation

EEG data in data set A were also classified by the three algorithms using 9/10 cross-validation with the parameters in Table 2. Classification performances of the three algorithms for each individual participant are shown in Figure 9. The mean classification performances are shown in Figure 10 and Table 5.
Figure 9

Classification performances of ensemble classifiers on data set A using 9/10 cross-validation.

OSWLDA, OPCALDA and OLDA were trained on 8100 ERPs. Data set A was then classified by those classifiers while the degree of overlap and the number of intensification sequences were varied. The classification performances of all participants are displayed.

Figure 10

Mean classification performances of ensemble classifiers on data set A using 9/10 cross-validation.

OSWLDA, OPCALDA and OLDA were trained on 8100 ERPs. The mean classification accuracies over the ten participants are presented.

Table 5

Mean classification accuracies (%) of OSWLDA, OPCALDA, and OLDA evaluated on data set A using 9/10 cross-validation.

Algorithm (45 weak learners) | Degree of overlap D | Accuracy (%) at intensification sequences 1–15
OSWLDA | D=1 | 40.8 62.6 75.8 81.6 85.0 88.0 90.6 92.6 92.2 93.6 94.0 93.6 93.4 94.0 94.6
OSWLDA | D=5 | 47.4 67.0 79.0 83.8 88.0 89.4 91.6 94.0 94.6 94.2 94.6 94.8 95.4 95.6 96.0
OSWLDA | D=10 | 47.2 68.2 80.4 86.2 90.4 90.4 93.2 94.8 95.4 95.4 96.2 96.0 97.0 96.4 96.8
OSWLDA | D=15 | 47.8 68.6 80.6 85.4 91.4 91.6 94.0 95.4 96.2 96.4 97.0 96.6 97.2 97.8 97.8
OSWLDA | D=20 | 48.4 68.0 81.2 86.4 91.8 92.8 94.2 96.2 96.0 96.6 97.2 97.2 97.8 97.6 98.2
OSWLDA | D=25 | 47.8 69.0 81.0 86.2 91.6 92.8 94.6 96.6 96.2 97.0 97.8 97.4 98.4 98.6 98.2
OSWLDA | D=30 | 47.8 69.2 80.8 85.6 91.0 93.0 94.8 96.4 97.0 96.8 98.2 97.6 98.4 98.6 98.8
OSWLDA | D=35 | 47.4 68.8 80.0 85.8 91.2 92.8 94.4 96.4 97.2 97.4 98.6 98.0 98.4 98.6 98.8
OSWLDA | D=40 | 46.2 68.0 79.6 85.0 91.2 92.2 94.8 97.0 97.4 97.2 98.4 98.4 98.4 98.6 98.8
OSWLDA | D=45 | 43.4 68.2 80.2 86.0 90.2 93.4 95.4 96.4 97.4 97.2 98.2 98.2 98.4 98.6 99.0
OPCALDA | D=1 | 46.0 63.8 76.6 83.2 87.8 88.8 91.2 92.8 92.6 92.8 93.6 94.2 93.8 94.0 94.2
OPCALDA | D=5 | 46.0 68.8 78.8 83.8 89.2 90.0 91.2 93.2 94.2 94.2 95.4 95.0 94.6 94.4 95.4
OPCALDA | D=10 | 46.4 67.6 79.2 84.6 90.0 90.0 92.2 93.4 95.6 95.6 96.0 96.0 95.8 96.4 96.8
OPCALDA | D=15 | 47.4 68.2 80.0 86.0 90.6 92.0 93.8 95.2 96.2 96.4 96.8 96.8 96.8 97.2 98.0
OPCALDA | D=20 | 47.0 68.2 80.2 85.4 90.4 93.2 94.2 95.6 96.4 96.6 97.0 97.0 97.4 97.8 98.0
OPCALDA | D=25 | 47.0 66.6 80.2 85.4 91.0 92.4 93.8 95.6 96.6 97.0 97.4 97.2 97.8 98.2 98.2
OPCALDA | D=30 | 47.4 67.0 80.6 85.4 91.0 92.2 94.4 95.4 97.0 96.8 97.6 97.4 98.0 98.2 98.6
OPCALDA | D=35 | 47.0 67.4 80.2 85.4 91.2 92.0 94.0 95.6 97.2 97.2 98.0 97.8 98.0 98.2 98.6
OPCALDA | D=40 | 46.6 67.0 80.6 85.4 91.4 92.2 94.2 96.0 97.6 97.4 98.4 98.0 98.0 98.2 98.6
OPCALDA | D=45 | 46.8 67.2 80.4 86.0 91.4 91.8 94.2 96.2 97.4 97.2 98.4 98.0 98.0 98.2 98.6
OLDA | D=1 | 3.8 3.4 4.4 3.8 2.6 2.8 3.4 2.8 2.6 2.6 2.8 3.6 4.0 4.2 4.2
OLDA | D=5 | 46.6 67.2 79.0 84.4 89.4 90.4 92.2 93.6 94.6 94.6 95.4 95.6 95.2 95.6 96.2
OLDA | D=10 | 47.8 68.4 80.0 84.2 89.8 90.6 93.2 94.6 95.2 95.6 96.8 96.8 96.6 97.0 97.4
OLDA | D=15 | 47.6 68.6 81.8 85.4 90.6 91.8 94.2 95.2 96.0 96.2 96.8 96.8 97.8 97.8 98.6
OLDA | D=20 | 47.2 68.6 81.8 85.6 90.8 92.0 94.4 96.0 96.4 96.8 97.8 97.8 98.4 98.8 99.2
OLDA | D=25 | 46.8 68.2 82.0 86.2 91.2 92.2 94.8 96.4 96.6 97.0 98.0 97.8 98.6 98.8 99.6
OLDA | D=30 | 46.4 67.4 81.8 86.4 91.0 93.0 94.6 96.6 96.8 97.2 98.2 98.2 98.6 98.8 99.4
OLDA | D=35 | 46.4 68.0 81.6 86.8 91.0 93.0 94.6 96.2 97.0 97.6 98.8 98.4 98.6 98.8 99.4
OLDA | D=40 | 46.0 68.0 81.4 86.8 90.8 93.0 94.6 96.4 97.2 97.6 98.8 98.4 98.6 98.6 99.4
OLDA | D=45 | 46.0 68.0 81.0 86.8 90.6 93.0 94.6 96.4 97.2 97.6 98.8 98.6 98.6 98.6 99.4

An overlapped ensemble classifier becomes an ensemble classifier with naive partitioning when the degree of overlap D = 1, and is equivalent to a single classifier when D = 45, the number of blocks.

The classification performances of the ensemble classifiers with overlapped partitioning were as good as, or slightly better than, that of the single classifier when 8100 training data were provided. As shown in Figure 10, the worst classification performance for all algorithms was achieved by the ensemble classifiers with naive partitioning (D = 1), the same as in the analysis of data set A using 1/10 cross-validation. However, only a small performance improvement of the overlapped ensemble classifiers can be found when compared to the single classifier (D = 45). A three-way repeated-measures ANOVA on the intensification sequence, dimension reduction condition, and degree of overlap showed that all three main effects and all their interactions were significant. In addition, post hoc pairwise t-tests revealed significant differences between the dimension reduction conditions and between several pairs of degrees of overlap.

Data Set B with Limited Training Data

EEG data in data set B were classified by OSWLDA, OPCALDA and OLDA using 900 training data with the parameters in Table 3. Classification performances of OSWLDA, OPCALDA, and OLDA evaluated on data set B using a limited amount of training data (900 ERPs) are shown in Tables 6, 7, and 8, respectively.
Table 6

Classification accuracies (%) of OSWLDA on data set B with limited training data.

Degree of overlap D (5 weak learners) | Participant | Accuracy (%) at intensification sequences 1–15
D=1 | A | 5 3 7 8 9 11 6 7 7 10 8 10 10 7 11
D=1 | B | 5 4 8 9 9 8 7 8 5 9 9 9 11 15 13
D=1 | Mean | 5.0 3.5 7.5 8.5 9.0 9.5 6.5 7.5 6.0 9.5 8.5 9.5 10.5 11.0 12.0
D=2 | A | 6 2 6 9 7 4 6 10 7 9 10 10 10 13 10
D=2 | B | 3 3 3 7 5 5 1 2 1 3 3 1 0 1 0
D=2 | Mean | 4.5 2.5 4.5 8.0 6.0 4.5 3.5 6.0 4.0 6.0 6.5 5.5 5.0 7.0 5.0
D=3 | A | 14 17 24 30 24 33 36 37 40 47 54 56 61 65 67
D=3 | B | 20 23 34 40 49 51 53 49 59 61 65 67 67 72 74
D=3 | Mean | 17.0 20.0 29.0 35.0 36.5 42.0 44.5 43.0 49.5 54.0 59.5 61.5 64.0 68.5 70.5
D=4 | A | 10 24 21 32 28 36 39 43 51 53 57 59 64 63 67
D=4 | B | 18 28 41 44 60 67 62 63 66 68 74 79 81 80 82
D=4 | Mean | 14.0 26.0 31.0 38.0 44.0 51.5 50.5 53.0 58.5 60.5 65.5 69.0 72.5 71.5 74.5
D=5 | A | 3 14 20 26 23 24 31 38 41 53 50 57 53 62 68
D=5 | B | 15 24 36 37 50 49 50 48 49 56 58 61 66 70 71
D=5 | Mean | 9.0 19.0 28.0 31.5 36.5 36.5 40.5 43.0 45.0 54.5 54.0 59.0 59.5 66.0 69.5

An overlapped ensemble classifier becomes an ensemble classifier with naive partitioning when the degree of overlap D = 1, and is equivalent to a single classifier when D = 5, the number of blocks.

Table 7

Classification accuracies (%) of OPCALDA on data set B with limited training data.

Degree of overlap D (5 weak learners) | Participant | Accuracy (%) at intensification sequences 1–15
D=1 | A | 4 7 6 5 6 6 10 8 6 7 4 6 6 4 4
D=1 | B | 4 7 2 4 4 5 3 3 4 6 5 5 4 4 3
D=1 | Mean | 4.0 7.0 4.0 4.5 5.0 5.5 6.5 5.5 5.0 6.5 4.5 5.5 5.0 4.0 3.5
D=2 | A | 0 2 1 1 3 3 3 1 4 4 4 3 3 4 4
D=2 | B | 7 4 1 2 4 3 4 4 6 5 4 4 2 2 2
D=2 | Mean | 3.5 3.0 1.0 1.5 3.5 3.0 3.5 2.5 5.0 4.5 4.0 3.5 2.5 3.0 3.0
D=3 | A | 12 17 22 22 31 36 37 41 47 57 55 60 59 60 65
D=3 | B | 10 24 29 34 39 34 41 39 46 48 53 54 59 61 61
D=3 | Mean | 11.0 20.5 25.5 28.0 35.0 35.0 39.0 40.0 46.5 52.5 54.0 57.0 59.0 60.5 63.0
D=4 | A | 7 17 17 19 30 31 42 41 48 53 56 57 57 62 65
D=4 | B | 11 26 27 33 40 39 46 49 49 49 51 59 63 64 62
D=4 | Mean | 9.0 21.5 22.0 26.0 35.0 35.0 44.0 45.0 48.5 51.0 53.5 58.0 60.0 63.0 63.5
D=5 | A | 7 16 17 18 26 27 36 38 43 48 49 53 53 56 60
D=5 | B | 7 23 25 32 42 40 45 44 50 47 48 52 60 63 60
D=5 | Mean | 7.0 19.5 21.0 25.0 34.0 33.5 40.5 41.0 46.5 47.5 48.5 52.5 56.5 59.5 60.0

An overlapped ensemble classifier becomes an ensemble classifier with naive partitioning when the degree of overlap D = 1, and is equivalent to a single classifier when D = 5, the number of blocks.

Table 8

Classification accuracies (%) of OLDA on data set B with limited training data.

Degree of overlap D (5 weak learners) | Participant | Accuracy (%) at intensification sequences 1–15
D=1 | A | 1 3 5 4 2 2 4 5 3 3 2 3 3 5 3
D=1 | B | 6 3 2 3 4 6 6 4 6 4 4 4 8 7 8
D=1 | Mean | 3.5 3.0 3.5 3.5 3.0 4.0 5.0 4.5 4.5 3.5 3.0 3.5 5.5 6.0 5.5
D=2 | A | 3 4 7 4 6 3 2 3 2 2 1 1 2 2 2
D=2 | B | 2 7 6 6 4 6 10 11 7 8 8 10 11 9 11
D=2 | Mean | 2.5 5.5 6.5 5.0 5.0 4.5 6.0 7.0 4.5 5.0 4.5 5.5 6.5 5.5 6.5
D=3 | A | 2 1 0 0 1 2 1 1 1 1 1 1 1 1 1
D=3 | B | 2 3 5 5 4 2 1 4 3 5 3 3 1 1 2
D=3 | Mean | 2.0 2.0 2.5 2.5 2.5 2.0 1.0 2.5 2.0 3.0 2.0 2.0 1.0 1.0 1.5
D=4 | A | 0 1 2 2 3 3 1 1 2 2 2 2 1 4 2
D=4 | B | 2 1 2 2 3 2 3 4 4 4 4 3 3 5 4
D=4 | Mean | 1.0 1.0 2.0 2.0 3.0 2.5 2.0 2.5 3.0 3.0 3.0 2.5 2.0 4.5 3.0
D=5 | A | 7 8 5 8 5 5 6 10 8 6 8 8 7 8 8
D=5 | B | 4 3 4 7 7 6 8 9 11 9 8 8 5 8 6
D=5 | Mean | 5.5 5.5 4.5 7.5 6.0 5.5 7.0 9.5 9.5 7.5 8.0 8.0 6.0 8.0 7.0

An overlapped ensemble classifier becomes an ensemble classifier with naive partitioning when the degree of overlap D = 1, and is equivalent to a single classifier when D = 5, the number of blocks.

OSWLDA and OPCALDA with overlapped partitioning achieved better classification accuracies than those with naive partitioning (D = 1) and the corresponding single classifiers (D = 5) when 900 training data were available. For OSWLDA, the best classification accuracies can be seen at D = 4. Further, most of the best mean classification performances of OPCALDA can be seen at D = 3 or D = 4. These tendencies are similar to the analysis of data set A using 1/10 cross-validation. OSWLDA achieved about 10% (15% at best) higher mean classification accuracy than the single SWLDA classifier (D = 5). OPCALDA also achieved up to a 5.5% higher mean classification accuracy than the single PCALDA classifier (D = 5). However, all of the classification performances of OLDA were close to the chance level.

Data Set B with Full Training Data

EEG data of data set B were classified by OSWLDA, OPCALDA and OLDA using 15300 training data with the parameters in Table 3. Classification performances of the three algorithms evaluated on data set B using full training data (15300 ERPs) are presented in Tables 9, 10, and 11, respectively.
Table 9

Classification accuracies (%) of OSWLDA on data set B with full training data.

Degree of overlap D (17 weak learners) | Participant | Accuracy (%) at intensification sequences 1–15
D=1 | A | 17 28 49 53 60 62 65 72 79 82 83 85 86 91 90
D=1 | B | 45 62 66 70 77 84 87 89 91 93 94 97 96 97 96
D=1 | Mean | 31.0 45.0 57.5 61.5 68.5 73.0 76.0 80.5 85.0 87.5 88.5 91.0 91.0 94.0 93.0
D=2 | A | 19 30 48 59 64 68 75 76 82 86 85 88 91 94 96
D=2 | B | 46 63 67 70 79 87 90 92 91 94 94 97 97 98 98
D=2 | Mean | 32.5 46.5 57.5 64.5 71.5 77.5 82.5 84.0 86.5 90.0 89.5 92.5 94.0 96.0 97.0
D=3 | A | 21 38 59 62 69 76 81 82 85 85 85 92 93 97 96
D=3 | B | 49 64 69 73 81 86 87 91 93 95 94 95 96 97 97
D=3 | Mean | 35.0 51.0 64.0 67.5 75.0 81.0 84.0 86.5 89.0 90.0 89.5 93.5 94.5 97.0 96.5
D=4 | A | 19 36 54 65 67 71 79 80 81 85 85 90 89 94 96
D=4 | B | 51 64 71 70 80 85 86 91 93 95 94 95 96 94 97
D=4 | Mean | 35.0 50.0 62.5 67.5 73.5 78.0 82.5 85.5 87.0 90.0 89.5 92.5 92.5 94.0 96.5
D=5 | A | 21 37 59 63 64 72 80 79 82 85 87 91 92 94 97
D=5 | B | 49 64 69 71 78 86 87 92 93 95 94 96 96 96 97
D=5 | Mean | 35.0 50.5 64.0 67.0 71.0 79.0 83.5 85.5 87.5 90.0 90.5 93.5 94.0 95.0 97.0
D=6 | A | 20 37 54 60 61 73 77 78 85 87 87 89 91 94 96
D=6 | B | 46 63 69 70 82 86 88 92 94 95 95 95 96 95 97
D=6 | Mean | 33.0 50.0 61.5 65.0 71.5 79.5 82.5 85.0 89.5 91.0 91.0 92.0 93.5 94.5 96.5
D=7 | A | 22 39 55 63 63 74 79 78 84 87 86 91 93 95 99
D=7 | B | 48 63 70 70 81 87 88 92 94 95 94 95 96 96 97
D=7 | Mean | 35.0 51.0 62.5 66.5 72.0 80.5 83.5 85.0 89.0 91.0 90.0 93.0 94.5 95.5 98.0
D=8 | A | 22 36 52 59 64 72 76 79 81 87 87 90 94 94 98
D=8 | B | 46 68 69 70 81 88 89 92 94 95 95 95 95 95 97
D=8 | Mean | 34.0 52.0 60.5 64.5 72.5 80.0 82.5 85.5 87.5 91.0 91.0 92.5 94.5 94.5 97.5
D=9 | A | 27 37 51 60 65 73 77 80 83 89 89 93 95 95 99
D=9 | B | 45 64 69 69 79 87 89 92 94 95 95 95 95 95 97
D=9 | Mean | 36.0 50.5 60.0 64.5 72.0 80.0 83.0 86.0 88.5 92.0 92.0 94.0 95.0 95.0 98.0
D=10 | A | 22 35 54 62 63 72 74 77 83 88 87 93 95 94 98
D=10 | B | 48 66 70 69 80 85 91 92 93 95 94 95 96 96 97
D=10 | Mean | 35.0 50.5 62.0 65.5 71.5 78.5 82.5 84.5 88.0 91.5 90.5 94.0 95.5 95.0 97.5
D=11 | A | 22 36 56 59 65 75 76 79 83 88 86 91 94 95 99
D=11 | B | 44 66 71 70 80 87 91 92 94 94 94 95 96 95 97
D=11 | Mean | 33.0 51.0 63.5 64.5 72.5 81.0 83.5 85.5 88.5 91.0 90.0 93.0 95.0 95.0 98.0
D=12 | A | 22 34 55 62 66 75 74 77 82 88 87 93 95 96 98
D=12 | B | 43 67 71 72 83 86 91 92 94 95 95 96 95 96 97
D=12 | Mean | 32.5 50.5 63.0 67.0 74.5 80.5 82.5 84.5 88.0 91.5 91.0 94.5 95.0 96.0 97.5
D=13 | A | 23 34 53 59 65 74 75 77 83 87 87 93 95 97 97
D=13 | B | 42 63 69 70 81 86 92 92 94 94 94 95 95 96 97
D=13 | Mean | 32.5 48.5 61.0 64.5 73.0 80.0 83.5 84.5 88.5 90.5 90.5 94.0 95.0 96.5 97.0
D=14 | A | 24 37 53 60 67 73 74 79 82 87 89 93 95 95 97
D=14 | B | 43 65 69 71 82 86 91 92 93 95 94 95 95 95 97
D=14 | Mean | 33.5 51.0 61.0 65.5 74.5 79.5 82.5 85.5 87.5 91.0 91.5 94.0 95.0 95.0 97.0
D=15 | A | 23 34 52 61 66 71 74 78 81 87 87 91 92 95 96
D=15 | B | 44 62 69 72 81 86 92 92 94 95 94 95 95 95 97
D=15 | Mean | 33.5 48.0 60.5 66.5 73.5 78.5 83.0 85.0 87.5 91.0 90.5 93.0 93.5 95.0 96.5
D=16 | A | 22 34 49 60 67 70 73 79 81 88 87 90 94 95 96
D=16 | B | 45 65 69 72 83 87 93 92 94 95 94 95 95 95 97
D=16 | Mean | 33.5 49.5 59.0 66.0 75.0 78.5 83.0 85.5 87.5 91.5 90.5 92.5 94.5 95.0 96.5
D=17 | A | 21 32 51 51 60 65 68 76 79 86 85 89 94 93 94
D=17 | B | 42 62 69 70 82 84 88 91 92 95 94 94 94 94 97
D=17 | Mean | 31.5 47.0 60.0 60.5 71.0 74.5 78.0 83.5 85.5 90.5 89.5 91.5 94.0 93.5 95.5

An overlapped ensemble classifier becomes an ensemble classifier with naive partitioning when the degree of overlap D = 1, and is equivalent to a single classifier when D = 17, the number of blocks.

Table 10

Classification accuracies (%) of OPCALDA on data set B with full training data.

Degree of overlap D (17 weak learners) | Participant | Accuracy (%) at intensification sequences 1–15
D=1 | A | 16 34 50 54 62 66 72 76 79 84 86 91 90 95 95
D=1 | B | 39 57 64 73 80 86 89 91 91 94 93 94 93 94 94
D=1 | Mean | 27.5 45.5 57.0 63.5 71.0 76.0 80.5 83.5 85.0 89.0 89.5 92.5 91.5 94.5 94.5
D=2 | A | 21 32 49 57 64 69 73 76 78 86 84 91 92 95 97
D=2 | B | 43 63 69 75 81 85 89 91 90 93 95 96 94 95 96
D=2 | Mean | 32.0 47.5 59.0 66.0 72.5 77.0 81.0 83.5 84.0 89.5 89.5 93.5 93.0 95.0 96.5
D=3 | A | 18 37 50 60 65 72 74 76 79 87 88 91 94 95 96
D=3 | B | 44 62 63 75 79 86 89 92 90 94 94 98 96 96 97
D=3 | Mean | 31.0 49.5 56.5 67.5 72.0 79.0 81.5 84.0 84.5 90.5 91.0 94.5 95.0 95.5 96.5
D=4 | A | 19 37 50 61 64 72 76 78 79 86 88 92 94 95 96
D=4 | B | 42 62 62 77 78 85 88 90 90 94 94 96 95 97 97
D=4 | Mean | 30.5 49.5 56.0 69.0 71.0 78.5 82.0 84.0 84.5 90.0 91.0 94.0 94.5 96.0 96.5
D=5 | A | 19 36 52 62 65 72 76 78 80 85 88 92 94 95 96
D=5 | B | 42 62 63 75 78 85 88 90 90 94 94 97 95 96 97
D=5 | Mean | 30.5 49.0 57.5 68.5 71.5 78.5 82.0 84.0 85.0 89.5 91.0 94.5 94.5 95.5 96.5
D=6 | A | 19 34 51 63 64 73 75 81 80 86 87 92 94 95 97
D=6 | B | 43 60 62 75 78 85 88 90 90 94 94 96 95 97 97
D=6 | Mean | 31.0 47.0 56.5 69.0 71.0 79.0 81.5 85.5 85.0 90.0 90.5 94.0 94.5 96.0 97.0
D=7 | A | 19 35 50 63 65 73 75 81 80 87 87 92 94 95 97
D=7 | B | 43 61 62 75 79 86 89 91 89 94 94 95 95 96 96
D=7 | Mean | 31.0 48.0 56.0 69.0 72.0 79.5 82.0 86.0 84.5 90.5 90.5 93.5 94.5 95.5 96.5
D=8 | A | 18 34 50 62 62 70 76 81 80 86 87 92 94 95 97
D=8 | B | 44 61 61 75 79 86 89 90 89 94 93 95 94 96 96
D=8 | Mean | 31.0 47.5 55.5 68.5 70.5 78.0 82.5 85.5 84.5 90.0 90.0 93.5 94.0 95.5 96.5
D=9 | A | 19 34 50 62 60 68 76 82 80 86 87 92 94 94 97
D=9 | B | 44 61 61 74 79 86 89 90 89 94 93 95 94 95 96
D=9 | Mean | 31.5 47.5 55.5 68.0 69.5 77.0 82.5 86.0 84.5 90.0 90.0 93.5 94.0 94.5 96.5
D=10 | A | 19 34 50 61 60 67 76 82 80 86 87 92 94 94 97
D=10 | B | 44 62 61 74 80 87 89 91 89 93 93 95 93 94 96
D=10 | Mean | 31.5 48.0 55.5 67.5 70.0 77.0 82.5 86.5 84.5 89.5 90.0 93.5 93.5 94.0 96.5
D=11 | A | 20 33 51 59 58 66 76 82 80 86 87 92 94 94 97
D=11 | B | 44 61 61 73 81 87 89 91 89 93 93 95 93 95 96
D=11 | Mean | 32.0 47.0 56.0 66.0 69.5 76.5 82.5 86.5 84.5 89.5 90.0 93.5 93.5 94.5 96.5
D=12 | A | 19 33 51 60 57 65 76 81 80 86 88 92 94 94 97
D=12 | B | 44 61 61 73 81 87 90 91 89 93 93 95 93 95 96
D=12 | Mean | 31.5 47.0 56.0 66.5 69.0 76.0 83.0 86.0 84.5 89.5 90.5 93.5 93.5 94.5 96.5
D=13 | A | 18 33 51 59 56 65 75 81 80 86 88 92 92 94 96
D=13 | B | 44 62 61 73 81 88 90 90 89 93 93 95 93 95 97
D=13 | Mean | 31.0 47.5 56.0 66.0 68.5 76.5 82.5 85.5 84.5 89.5 90.5 93.5 92.5 94.5 96.5
D=14 | A | 18 33 50 58 56 65 76 80 81 86 87 92 92 94 96
D=14 | B | 44 62 61 73 81 88 91 90 89 93 93 94 93 95 97
D=14 | Mean | 31.0 47.5 55.5 65.5 68.5 76.5 83.5 85.0 85.0 89.5 90.0 93.0 92.5 94.5 96.5
D=15 | A | 18 35 49 57 56 64 76 80 81 86 84 92 92 93 95
D=15 | B | 44 63 61 73 81 90 91 90 89 93 93 95 93 95 97
D=15 | Mean | 31.0 49.0 55.0 65.0 68.5 77.0 83.5 85.0 85.0 89.5 88.5 93.5 92.5 94.0 96.0
D=16 | A | 18 35 49 56 56 64 76 79 80 86 84 91 92 93 95
D=16 | B | 44 63 62 73 81 90 91 90 90 93 93 94 93 95 97
D=16 | Mean | 31.0 49.0 55.5 64.5 68.5 77.0 83.5 84.5 85.0 89.5 88.5 92.5 92.5 94.0 96.0
D=17 | A | 18 34 49 56 56 63 76 78 79 84 84 91 92 92 95
D=17 | B | 46 64 62 73 81 90 91 90 90 93 93 93 93 95 97
D=17 | Mean | 32.0 49.0 55.5 64.5 68.5 76.5 83.5 84.0 84.5 88.5 88.5 92.0 92.5 93.5 96.0

An overlapped ensemble classifier becomes an ensemble classifier with naive partitioning when the degree of overlap D = 1, and is equivalent to a single classifier when D = 17, the number of blocks.

Table 11

Classification accuracies (%) of OLDA on data set B with full training data.

Degree of overlap D (17 weak learners) | Participant | Accuracy (%) at intensification sequences 1–15
D=1 | A | 3 4 2 4 4 5 8 9 11 9 8 14 11 12 15
D=1 | B | 0 4 4 1 0 1 0 1 4 0 2 2 3 3 3
D=1 | Mean | 1.5 4.0 3.0 2.5 2.0 3.0 4.0 5.0 7.5 4.5 5.0 8.0 7.0 7.5 9.0
D=2 | A | 20 36 48 55 59 64 73 78 78 90 87 92 92 93 95
D=2 | B | 35 52 63 63 69 78 82 81 83 88 91 93 89 91 95
D=2 | Mean | 27.5 44.0 55.5 59.0 64.0 71.0 77.5 79.5 80.5 89.0 89.0 92.5 90.5 92.0 95.0
D=3 | A | 24 39 53 54 62 67 75 77 81 87 88 92 95 95 97
D=3 | B | 43 65 67 74 78 85 87 87 88 94 94 94 93 93 94
D=3 | Mean | 33.5 52.0 60.0 64.0 70.0 76.0 81.0 82.0 84.5 90.5 91.0 93.0 94.0 94.0 95.5
D=4 | A | 25 35 52 56 62 68 76 78 82 87 89 95 97 95 97
D=4 | B | 43 67 67 75 79 85 87 89 89 94 94 94 93 93 94
D=4 | Mean | 34.0 51.0 59.5 65.5 70.5 76.5 81.5 83.5 85.5 90.5 91.5 94.5 95.0 94.0 95.5
D=5 | A | 25 33 51 55 62 69 76 77 81 87 89 96 96 95 97
D=5 | B | 43 68 70 75 79 84 87 89 89 94 95 94 94 95 93
D=5 | Mean | 34.0 50.5 60.5 65.0 70.5 76.5 81.5 83.0 85.0 90.5 92.0 95.0 95.0 95.0 95.0
D=6 | A | 25 31 52 58 64 69 75 78 82 86 91 96 96 96 97
D=6 | B | 43 67 69 75 79 84 88 89 89 94 95 94 94 94 93
D=6 | Mean | 34.0 49.0 60.5 66.5 71.5 76.5 81.5 83.5 85.5 90.0 93.0 95.0 95.0 95.0 95.0
D=7 | A | 25 31 51 58 62 69 75 78 82 86 91 96 97 96 97
D=7 | B | 41 67 70 73 79 85 88 90 89 94 95 94 94 94 93
D=7 | Mean | 33.0 49.0 60.5 65.5 70.5 77.0 81.5 84.0 85.5 90.0 93.0 95.0 95.5 95.0 95.0
D=8 | A | 24 32 49 59 63 68 76 79 82 86 91 96 95 96 97
D=8 | B | 40 67 71 73 79 84 88 90 88 94 95 94 94 94 93
D=8 | Mean | 32.0 49.5 60.0 66.0 71.0 76.0 82.0 84.5 85.0 90.0 93.0 95.0 94.5 95.0 95.0
D=9 | A | 26 34 49 60 63 67 76 79 81 85 91 96 95 96 97
D=9 | B | 41 66 71 73 79 85 87 90 88 93 95 94 94 93 93
D=9 | Mean | 33.5 50.0 60.0 66.5 71.0 76.0 81.5 84.5 84.5 89.0 93.0 95.0 94.5 94.5 95.0
D=10 | A | 25 34 48 60 62 67 76 80 81 85 92 96 95 97 97
D=10 | B | 40 65 70 71 79 86 87 90 88 93 95 94 94 93 94
D=10 | Mean | 32.5 49.5 59.0 65.5 70.5 76.5 81.5 85.0 84.5 89.0 93.5 95.0 94.5 95.0 95.5
D=11 | A | 24 33 48 60 61 65 77 80 81 86 92 96 96 96 96
D=11 | B | 40 65 69 71 79 86 87 90 88 93 94 94 94 93 95
D=11 | Mean | 32.0 49.0 58.5 65.5 70.0 75.5 82.0 85.0 84.5 89.5 93.0 95.0 95.0 94.5 95.5
D=12 | A | 24 31 48 59 59 64 77 81 80 86 92 96 96 95 95
D=12 | B | 39 65 69 70 80 85 87 90 89 93 92 94 94 92 95
D=12 | Mean | 31.5 48.0 58.5 64.5 69.5 74.5 82.0 85.5 84.5 89.5 92.0 95.0 95.0 93.5 95.0
D=13 | A | 25 31 48 58 58 64 77 80 80 86 91 96 96 95 95
D=13 | B | 39 65 69 69 80 85 87 89 88 93 92 94 94 93 95
D=13 | Mean | 32.0 48.0 58.5 63.5 69.0 74.5 82.0 84.5 84.0 89.5 91.5 95.0 95.0 94.0 95.0
D=14 | A | 25 31 48 59 58 65 76 80 80 84 90 95 96 95 95
D=14 | B | 40 65 69 70 80 85 87 89 88 93 92 94 93 93 95
D=14 | Mean | 32.5 48.0 58.5 64.5 69.0 75.0 81.5 84.5 84.0 88.5 91.0 94.5 94.5 94.0 95.0
D=15 | A | 22 31 47 59 59 65 76 80 80 84 89 94 96 95 95
D=15 | B | 40 65 70 69 80 85 87 89 88 93 92 94 93 92 95
D=15 | Mean | 31.0 48.0 58.5 64.0 69.5 75.0 81.5 84.5 84.0 88.5 90.5 94.0 94.5 93.5 95.0
D=16 | A | 22 30 47 59 59 65 76 80 80 84 88 94 95 95 95
D=16 | B | 40 65 70 69 80 85 87 89 88 93 92 94 93 92 95
D=16 | Mean | 31.0 47.5 58.5 64.0 69.5 75.0 81.5 84.5 84.0 88.5 90.0 94.0 94.0 93.5 95.0
D=17 | A | 22 30 46 59 58 65 76 80 80 83 88 93 95 95 95
D=17 | B | 40 65 70 69 80 85 87 89 88 93 92 94 93 92 95
D=17 | Mean | 31.0 47.5 58.0 64.0 69.0 75.0 81.5 84.5 84.0 88.0 90.0 93.5 94.0 93.5 95.0

An overlapped ensemble classifier becomes an ensemble classifier with naive partitioning when the degree of overlap D = 1, and is equivalent to a single classifier when D = 17, the number of blocks.

The classification performances of ensemble classifiers with overlapped partitioning (OSWLDA, OPCALDA, and OLDA, , ) were comparable to, or slightly better than, those with naive partitioning ( and ) and the single classifiers ( and ) in most sequences when 15300 training data were available. The best classification performance, 98%, was achieved by OSWLDA when , , . In other words, OSWLDA achieved a higher classification performance than the ensemble of SVMs used by the winner of BCI competition III data set II [23]. OSWLDA achieved about a 3% improvement over the single SWLDA (, ). However, little improvement of the ensemble classifier with overlapped partitioning over the single classifier can be seen, just as in the analysis of data set A using cross-validation.
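The ensemble decision making evaluated above can be sketched as follows. This is a minimal illustrative version, not the authors' exact implementation: each weak learner contributes a linear discriminant score, the ensemble averages these scores, and scores are accumulated over intensification sequences before choosing the stimulus with the largest total (in the spirit of the paper's Equation 2, whose symbols did not survive extraction). All names and array shapes here are assumptions.

```python
import numpy as np

def ensemble_score(feature_vectors, weights, biases):
    """Average the linear discriminant scores of K weak learners.

    feature_vectors: (n_stimuli, n_features) ERP features, one row per
    candidate stimulus; weights: (K, n_features); biases: (K,).
    """
    scores = feature_vectors @ weights.T + biases   # (n_stimuli, K)
    return scores.mean(axis=1)                      # ensemble average

def decide_target(per_sequence_features, weights, biases):
    """Accumulate ensemble scores over repeated intensification
    sequences and pick the stimulus with the largest total score."""
    total = sum(ensemble_score(f, weights, biases)
                for f in per_sequence_features)
    return int(np.argmax(total))
```

With a single classifier, `weights` has one row; with an ensemble, each row is one weak learner trained on its own (possibly overlapped) data partition.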

Discussion

In order to assess the influence of the overlapped partitioning compared to traditional naive partitioning and a single classifier, the classification accuracies of ensemble classifiers with these partitioning methods were compared when 900 training data were given. Two different P300-based BCI data sets were evaluated: data set A with cross-validation and data set B using limited training data. The single classifier ( ) and the traditional ensemble classifier with naive partitioning ( and ) were also compared at the same time. One of three dimension reduction conditions (stepwise, PCA, or none) was also applied. The results show that OSWLDA trained on 900 ERPs achieved higher classification accuracy than the single SWLDA classifier ( ) and the ensemble SWLDA classifier with naive partitioning ( ) for both data sets (see Tables 4 and 6). More specifically, the proposed OSWLDA trained on 900 ERPs achieved a higher accuracy than the single SWLDA for data set A (, , ) and for data set B (, , ), where the single SWLDA is an established and commonly used classification algorithm for the P300-based BCI. The performance improvement of the proposed classifiers trained on 900 ERPs was due to the mutual effect of the overlapped partitioning and the dimension reduction. In the statistical analysis of data set A using cross-validation, the main effects of the intensification sequence, degree of overlap (), dimension reduction conditions, and their interactions were significant. Indeed, according to the results shown in Figure 8 (c), the overlapped ensemble LDA classifier without dimension reduction (OLDA) did not achieve higher classification accuracies than a single LDA classifier () in many cases. One might therefore suppose that applying a dimension reduction method is in itself sufficient to improve the classification performance of the ensemble classifier with naive partitioning.
However, as shown in Figures 8 (a) and (b), when , the dimension reduction method alone did not improve the classification accuracy compared to the corresponding single classifiers. On the other hand, as also shown in Figures 8 (a) and (b), the overlapped ensemble LDA classifier together with the stepwise method (OSWLDA, ) or PCA (OPCALDA, ) achieved higher classification accuracy than its single-classifier counterpart (). This tendency was obvious, especially for OSWLDA. Thus, the improvement in classification accuracy was due not to the dimension reduction or the partitioning method alone but to their mutual effect. Taking this into consideration, the overlapped partitioning method, together with a dimension reduction method, effectively improved the classification performance of the P300-based BCI. The performance improvement of the proposed classifiers over the single classifier was small when a large amount of training data was provided. However, the classification performances of the proposed classifiers trained on a large amount of data were high, reaching 99.6% for data set A (see Table 5) and 98% for data set B (see Table 9). In those cases, however, a major performance improvement from overlapped partitioning was not confirmed. This was because the training data were large enough that overfitting should not occur in most cases. Thus, the advantage of overlapped partitioning appears when a small amount of high-dimensional training data is provided, as in the analysis of data set A using cross-validation and data set B with limited training data. We suggest using conventional cross-validation to find the optimal overlapping ratio before an online experiment. However, this prolongs the training time for the classifier. Alternatively, we suggest using (e.g., and ) because it showed suboptimal results for both data sets.
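For concreteness, the partitioning itself can be sketched as below. This is a minimal sketch under the assumption that each of the K disjoint subsets used by naive partitioning is simply extended by a fraction of neighbouring samples; the paper's exact scheme and parameter symbols were lost in extraction, and all names here are illustrative.

```python
import numpy as np

def overlapped_partitions(n_samples, n_classifiers, overlap):
    """Index sets for the training data of K ensemble weak learners.

    overlap = 0.0 reproduces naive (disjoint) partitioning; larger
    values let adjacent subsets share training samples, so each weak
    learner sees more data without enlarging the training session.
    """
    base = n_samples // n_classifiers        # naive subset size
    extra = int(round(base * overlap))       # samples borrowed from neighbours
    parts = []
    for i in range(n_classifiers):
        lo = max(0, i * base - extra)
        hi = min(n_samples, (i + 1) * base + extra)
        parts.append(np.arange(lo, hi))
    return parts

# e.g. 900 training ERPs shared among 10 weak learners
parts = overlapped_partitions(900, 10, overlap=0.5)
```

Each index set would then be used to train one LDA weak learner (after stepwise selection or PCA), and the overlapping ratio could be chosen by the cross-validation suggested above.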
In the small training data case (900 ERP data), OSWLDA and OPCALDA with ( and ) were suboptimal for both data sets A and B, but OLDA with performed as well only for data set A. In the large training data case, OSWLDA, OPCALDA, and OLDA with ( and ) evaluated on data set A and with ( and ) evaluated on data set B achieved reasonable classification accuracies. In this way, the overlapping ratio was suboptimal, and it can be employed to avoid cross-validation. This study is the first to show that ensemble LDA classifiers with conventional naive partitioning were not effective compared to the single LDA classifier and the ensemble classifier with overlapped partitioning when 900 training data were given. This result implies that the ensemble LDA classifier with naive partitioning requires a longer training session, to obtain more than 900 training data, before an online experiment. It should be noted that 900 training data is the smallest amount used for the evaluation of an ensemble classifier to date. In contrast, the ensemble classifiers with the proposed overlapped partitioning method showed a significant improvement in classification accuracy, performing even better than a single classifier when the stepwise method or PCA was applied for dimension reduction. Thus, overlapped partitioning was shown to be more practical than naive partitioning when the given training data were small (e.g., 900 training data). The performance deterioration of the ensemble LDA classifiers with naive partitioning may be due to the poor estimation of the covariance matrices of the LDA weak learners. Such performance deterioration can be seen in the results of OLDA on data set A using cross-validation (, ), OLDA on data set A using cross-validation, OSWLDA and OPCALDA on data set B with limited training data (, ), OLDA on data set B with limited training data, and OLDA on data set B with full training data (, ).
The problem can be seen when because a small amount of training data is provided to the weak learners (see Tables 2 and 3). Regarding data set B, 900 training data were not sufficient to train the weak learners of OLDA ( with limited training data and with full training data). Compared to data set A, data set B seems to require more training data because its EEG data were of higher dimension (896 dimensions). Estimated covariance matrices are imprecise when a small amount of high-dimensional training data is given [22]. Johnson and Krusienski first evaluated the classification performance of the ensemble SWLDA classifier with naive partitioning [27]. They evaluated the algorithm by changing the number of classifiers ( was changed while was fixed to 1). In addition, three weighting methods for the ensemble classifier were evaluated. As a result, they found that the ensemble SWLDA classifier showed better performance than the single SWLDA classifier for some participants, though no statistically significant difference was found. They also noted that the classification performance decreased when and because the amount of training data for each weak learner becomes small. We consider that a similar problem arose in the application of the ensemble classifier with overlapped partitioning when and , which is similar to their conditions. Such a problem can be avoided by applying overlapped partitioning together with a dimension reduction method. The ensemble classifiers with overlapped partitioning trained on 900 ERPs showed better classification performances than a single classifier for the middle intensification sequence condition in the offline analysis. According to Figure 8 (a), OSWLDA () achieved higher classification accuracy than the single SWLDA classifier ( ) among . In contrast, OPCALDA () showed higher classification accuracy than the single PCA LDA classifier ( ) when .
This result implies that the ensemble classifier with overlapped partitioning was beneficial for a middle number of intensification sequences. determines the terms used to compute the score for decision making according to Equation 2. Performance saturation can be seen as becomes larger, while the classification performance is imprecise when is smaller. In both cases, differences between the classification performances were hard to confirm. This may explain why the difference in classification performance was obvious for a middle number of sequences. The selection of the number of intensification sequences in an online P300-based BCI experiment depends on the application of the BCI system. One criterion is the information transfer rate (ITR), which takes the accuracy, the number of outputs, and the output time (the number of sequences) into consideration [35]. OSWLDA on data set A using cross-validation () achieved the highest ITR (15.7 bits per minute) at , although only a 71.4% accuracy was expected in an online experiment. On the other hand, accuracy must be prioritized, for example, when the BCI is used to provide precise control of a robotic manipulator that could be dangerous. To decide parameters such as the number of intensification sequences, we should consider which criterion (accuracy, speed, or ITR) should be optimized for the BCI application at hand. The amount of training data also determines the expected online classification accuracy. If the system needs over 70% mean classification accuracy, only 900 training data are required. If over 95% mean accuracy is required, a large amount of training data should be prepared. Most BCI applications do not require over 95% classification accuracy because they do not involve danger. Thus, 900 training data are sufficient to achieve over 70% mean accuracy for most BCI applications.
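The ITR criterion cited above [35] is commonly computed with Wolpaw's formula. The sketch below is a generic version, not the authors' code: the parameter names are ours, and the example value of 36 choices (a 6x6 speller matrix) is an assumption.

```python
import math

def wolpaw_itr(p, n_choices, selection_time_s):
    """Wolpaw information transfer rate in bits per minute.

    p: classification accuracy in (0, 1]; n_choices: number of
    selectable symbols (e.g. 36 for a 6x6 P300 speller matrix);
    selection_time_s: seconds per selection, which grows with the
    number of intensification sequences.
    """
    if p >= 1.0:
        bits = math.log2(n_choices)          # perfect accuracy
    elif p <= 0.0:
        bits = 0.0
    else:
        bits = (math.log2(n_choices)
                + p * math.log2(p)
                + (1 - p) * math.log2((1 - p) / (n_choices - 1)))
    return bits * 60.0 / selection_time_s
```

The formula makes the trade-off in the text explicit: raising accuracy increases the bits per selection, but adding sequences increases `selection_time_s`, so the ITR can peak at a middle number of sequences even when the accuracy there is modest.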
We would like to emphasize that the ensemble classifiers with overlapped partitioning required less training data than those with naive partitioning. OSWLDA and OPCALDA performed better than the ensemble classifier with naive partitioning, achieving over 90% classification accuracy using only 900 training data. In particular, the mean classification accuracy of OSWLDA () with the small training data was as high as that of the ensemble SWLDA with naive partitioning () for data set A. In this way, the ensemble classifier with overlapped partitioning requires fewer training samples than that with naive partitioning, so it may make costly training experiments unnecessary. In this research, PCA and the stepwise method were applied for dimension reduction. The two have different statistical properties: PCA finds the projection that maximizes the data variance, while the stepwise method selects spatiotemporal variables. Although no great difference in classification accuracy was found for data set A using and cross-validation and for data set B with full training data, OSWLDA showed better performance than OPCALDA for data set B with limited training data. In this way, the stepwise method was robust for both P300-based BCI data sets. The difference between the two also appears in the online/offline test computational cost: the stepwise method requires a smaller processing burden than PCA because it does not project the data at test time. The difference will be more pronounced when becomes large. Considering the computational cost, the stepwise method is preferable when a large number of classifiers is required. In future research, LDA with shrinkage [22] or Bayesian LDA [32] will be applied to the ensemble classifier with overlapped partitioning. These two methods estimate covariance matrices in different ways, so that the LDA itself becomes robust against a lack of training data.
Thus, it may be possible to achieve better classification accuracy with a smaller amount of training data by applying these two methods. The proposed ensemble classifiers with overlapped partitioning may also be applicable to other types of BCIs, such as event-related desynchronization/synchronization (ERD/ERS)-based BCIs [42]. In fact, some ensemble classifiers for ERD/ERS-based BCIs have been evaluated [43], and our proposed overlapped ensemble classifiers might be applicable there as well. Moreover, the ensemble classifier with overlapped partitioning can be used in other pattern recognition problems, e.g., cancer classification [44] or fMRI data analysis [45]. Furthermore, clustering algorithms such as k-means clustering [46] could be used as a new overlapped partitioning for ensemble classifiers. By clustering the data with overlaps, classifiers that perform well for specific features can be trained. Thus, clustered partitioning with overlaps may show an even better classification performance.
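As a hint of why shrinkage helps with the covariance estimation problem discussed above, the standard shrinkage estimator blends the sample covariance with a scaled identity target. The sketch below is the generic estimator, not necessarily the exact variant of [22]; the function name and the fixed shrinkage intensity are our assumptions.

```python
import numpy as np

def shrunk_covariance(X, lam):
    """Shrinkage covariance estimate of training ERPs X
    (n_samples x n_features).

    Blends the sample covariance with a scaled identity target,
    which keeps the estimate invertible when the number of training
    samples is small relative to the feature dimension.
    """
    S = np.cov(X, rowvar=False)              # sample covariance
    nu = np.trace(S) / S.shape[0]            # average eigenvalue
    return (1 - lam) * S + lam * nu * np.eye(S.shape[0])
```

With `lam = 0` this is the ordinary sample covariance, which is singular whenever there are fewer samples than features (the regime of 896-dimensional ERPs and small partitions); any `lam > 0` restores invertibility, so each LDA weak learner stays well-defined.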

Conclusion

In this study, ensemble LDA classifiers with the newly proposed overlapped partitioning method were evaluated on our original P300-based BCI data set and on BCI competition III data set II. In the comparison, the classifiers were trained on limited training data (900) and on large training data. The ensemble LDA classifier with traditional naive partitioning and the single classifier were also evaluated. One of three dimension reduction conditions (stepwise, PCA, or none) was applied. As a result, the ensemble LDA classifier with overlapped partitioning and the stepwise method (OSWLDA) showed higher accuracy than the commonly used single SWLDA classifier and the ensemble SWLDA classifier with naive partitioning when 900 training data were available. In addition, the ensemble LDA classifiers with naive partitioning showed the worst performance under most conditions. We suggest using the stepwise method for dimension reduction in online implementations. In future research, LDA with shrinkage or Bayesian LDA will be applied to the ensemble classifier with overlapped partitioning.
References (29 in total; 10 shown)

1.  Auditory and visual P300 topography from a 3 stimulus paradigm.

Authors:  J Katayama; J Polich
Journal:  Clin Neurophysiol       Date:  1999-03       Impact factor: 3.708

Review 2.  Brain-computer interfaces for communication and control.

Authors:  Jonathan R Wolpaw; Niels Birbaumer; Dennis J McFarland; Gert Pfurtscheller; Theresa M Vaughan
Journal:  Clin Neurophysiol       Date:  2002-06       Impact factor: 3.708

3.  Novel protocols for P300-based brain-computer interfaces.

Authors:  Mathew Salvaris; Caterina Cinel; Luca Citi; Riccardo Poli
Journal:  IEEE Trans Neural Syst Rehabil Eng       Date:  2011-12-12       Impact factor: 3.802

4.  Documenting, modelling and exploiting P300 amplitude changes due to variable target delays in Donchin's speller.

Authors:  Luca Citi; Riccardo Poli; Caterina Cinel
Journal:  J Neural Eng       Date:  2010-09-01       Impact factor: 5.379

5.  Single-trial analysis and classification of ERP components--a tutorial.

Authors:  Benjamin Blankertz; Steven Lemm; Matthias Treder; Stefan Haufe; Klaus-Robert Müller
Journal:  Neuroimage       Date:  2010-06-28       Impact factor: 6.556

6.  Visual P300-based BCI to steer a wheelchair: a Bayesian approach.

Authors:  Gabriel Pires; Miguel Castelo-Branco; Urbano Nunes
Journal:  Conf Proc IEEE Eng Med Biol Soc       Date:  2008

7.  The MindGame: a P300-based brain-computer interface game.

Authors:  Andrea Finke; Alexander Lenhardt; Helge Ritter
Journal:  Neural Netw       Date:  2009-07-16

8.  Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials.

Authors:  L A Farwell; E Donchin
Journal:  Electroencephalogr Clin Neurophysiol       Date:  1988-12

9.  The P300-based brain-computer interface (BCI): effects of stimulus rate.

Authors:  Dennis J McFarland; William A Sarnacki; George Townsend; Theresa Vaughan; Jonathan R Wolpaw
Journal:  Clin Neurophysiol       Date:  2010-11-09       Impact factor: 3.708

10.  A P300-based brain-computer interface for people with amyotrophic lateral sclerosis.

Authors:  F Nijboer; E W Sellers; J Mellinger; M A Jordan; T Matuz; A Furdea; S Halder; U Mochty; D J Krusienski; T M Vaughan; J R Wolpaw; N Birbaumer; A Kübler
Journal:  Clin Neurophysiol       Date:  2008-06-20       Impact factor: 3.708

