| Literature DB >> 26090799 |
Hossein Bashashati, Rabab K. Ward, Gary E. Birch, Ali Bashashati.
Abstract
A problem that impedes progress in Brain-Computer Interface (BCI) research is the difficulty of reproducing the results of different papers, which makes comparing different algorithms very difficult. Some improvement has come from the use of standard datasets to evaluate different algorithms; however, a common comparison framework is still lacking. In this paper, we construct a new general comparison framework to compare different algorithms on several standard datasets. All of these datasets correspond to sensory-motor BCIs and were obtained from 21 subjects during their operation of synchronous BCIs and 8 subjects using self-paced BCIs. Other researchers can use our framework to compare their own algorithms on their own datasets. We have compared the performance of popular classification algorithms over these 29 subjects and performed statistical tests to validate our results. Our findings suggest that, for a given subject, the choice of the classifier for a BCI system depends on the feature extraction method used in that BCI system. This contrasts with most publications in the field, which have used Linear Discriminant Analysis (LDA) as the classifier of choice for BCI systems.
Year: 2015 PMID: 26090799 PMCID: PMC4474725 DOI: 10.1371/journal.pone.0129435
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The BCI framework.
Our framework has three main steps: 1. Filtering, 2. Feature Extraction and 3. Classification. The output of each step is fed to the next step.
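The three-step pipeline above can be sketched as follows. This is a minimal illustration on random placeholder data, not the paper's implementation: the sampling rate, frequency band, epoch length, and the nearest-class-mean classifier are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 250                                  # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2, 2 * fs))  # 40 toy trials x 2 channels x 2 s
y = rng.integers(0, 2, size=40)           # e.g. left hand vs. right hand

# Step 1: filtering -- band-pass in an assumed mu band (8-12 Hz).
sos = butter(4, [8, 12], btype="bandpass", fs=fs, output="sos")
X_filt = sosfiltfilt(sos, X, axis=-1)

# Step 2: feature extraction -- log band power per channel.
features = np.log(np.mean(X_filt ** 2, axis=-1))

# Step 3: classification -- a nearest-class-mean stand-in; in the paper
# this slot is filled by LDA, QDA, SVM, LR, RF, Boosting or MLP.
means = np.stack([features[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((features[:, None, :] - means) ** 2).sum(axis=-1), axis=1)
```

Each step's output feeds the next, so any of the classifiers compared in this paper can be swapped into step 3 without touching steps 1 and 2.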
Specification of datasets used in this paper.
| Dataset | Type | Number of Subjects | Task | Number of Channels |
|---|---|---|---|---|
| BCICIII3b | Synchronous | 3 | left hand vs. right hand | 2 |
| BCICIV2b | Synchronous | 9 | left hand vs. right hand | 3 |
| BCICIV2a | Synchronous | 9 | left hand vs. right hand vs. both feet vs. tongue | 22 |
| BCICIV1 | Self-paced | 4 | left hand, right hand and foot vs. No-Control | 59 |
| SM2 | Self-paced | 4 | right index finger vs. No-Control | 10 |
The accuracy of classifiers for synchronous BCI operation for all subjects.
For each subject, the accuracy on the test data is shown. For each classification algorithm, the first column shows the results for band power (BP) features and the second column the results for Morlet features.
| Subject | BP | Morlet | BP | Morlet | BP | Morlet | BP | Morlet | BP | Morlet | BP | Morlet | BP | Morlet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Subject1 (O3) | 75.47 | 80.5 | 80.5 | 82.39 | 74.84 | 79.25 | 81.76 | 77.36 | 81.13 | 74.21 | 79.25 | 60.38 | 78.62 | |
| Subject2 (S4) | 71.11 | 77.96 | 70.19 | | 68.52 | 79.26 | 71.11 | 83.52 | 70 | 81.11 | 70.37 | 72.41 | 70.37 | 82.22 |
| Subject3 (X11) | 72.22 | | 74.26 | | 71.48 | 77.78 | 72.78 | 77.78 | 74.07 | 76.48 | 74.81 | 72.22 | 74.81 | 76.67 |
| Subject4 (100) | 66.23 | 61.84 | 60.96 | 68.86 | 63.6 | 65.35 | 64.04 | 64.91 | 61.4 | | 63.16 | 58.77 | 63.6 | 69.3 |
| Subject5 (200) | 53.47 | 51.84 | 56.33 | 58.37 | 54.29 | 54.29 | 56.73 | | 56.33 | 55.92 | 54.69 | 55.51 | 56.33 | 58.37 |
| Subject6 (300) | 56.52 | 54.35 | 56.09 | 53.48 | 51.74 | 51.74 | 51.74 | 45.22 | 56.09 | 49.13 | 54.35 | 46.52 | | 51.74 |
| Subject7 (400) | 89.25 | 90.88 | 94.79 | 94.79 | 91.21 | 92.83 | 95.11 | 94.14 | 93.81 | 94.46 | | 83.39 | 92.18 | 94.46 |
| Subject8 (500) | 61.9 | 86.45 | 67.77 | | 63.37 | 85.71 | 68.13 | 85.71 | 65.57 | 87.18 | 67.03 | 76.56 | 67.4 | 85.71 |
| Subject9 (600) | 74.1 | 79.68 | 75.7 | 82.87 | 74.1 | 80.48 | 65.74 | 80.88 | 76.1 | 82.07 | 75.3 | 60.96 | 76.89 | |
| Subject10 (700) | 54.31 | 70.69 | 53.02 | 72.84 | 49.14 | | 52.59 | 70.69 | 59.05 | 71.55 | 57.33 | 64.66 | 51.29 | 71.12 |
| Subject11 (800) | 91.74 | 83.91 | | 83.48 | 90 | 86.52 | | 80.87 | | 80.87 | 86.52 | 67.83 | 90.87 | 80 |
| Subject12 (900) | 78.37 | 82.45 | 77.55 | | 75.92 | 82.86 | 76.73 | | 77.96 | 76.33 | 77.96 | 68.57 | 77.14 | 83.67 |
| Subject13 (1) | 79 | 71.53 | 79 | 60.85 | | 73.31 | 81.14 | 61.57 | 74.38 | 50.89 | 52.67 | 26.33 | | 58.01 |
| Subject14 (2) | 51.59 | 53.36 | 61.13 | 54.42 | 53.36 | 51.24 | 57.24 | 57.6 | | 32.86 | 38.87 | 25.8 | 58.3 | 54.77 |
| Subject15 (3) | 78.75 | 83.15 | 86.45 | | 78.02 | 82.78 | 84.25 | 78.02 | 80.22 | 50.55 | 41.03 | 27.47 | 85.35 | 83.15 |
| Subject16 (4) | 71.49 | 32.02 | | 41.23 | | 45.18 | 71.05 | 36.84 | 60.96 | 35.53 | 36.84 | 30.26 | 72.81 | 34.65 |
| Subject17 (5) | 56.16 | 33.33 | | 40.58 | 58.33 | 35.14 | 56.88 | 41.67 | 50 | 28.26 | 34.06 | 28.26 | | 38.77 |
| Subject18 (6) | 51.63 | 24.65 | 56.74 | 26.98 | 52.09 | 26.05 | 57.21 | 25.58 | 54.42 | 27.91 | 33.95 | 26.98 | | 26.05 |
| Subject19 (7) | 84.48 | 64.98 | | 56.32 | 81.95 | 71.48 | | 64.26 | 70.4 | 32.49 | 25.27 | 31.41 | 87 | 60.29 |
| Subject20 (8) | 77.12 | 54.98 | 80.81 | 61.62 | 79.34 | 63.47 | | 62.36 | 75.65 | 37.27 | 46.49 | 31 | 80.44 | 61.62 |
| Subject21 (9) | 78.03 | 46.97 | 83.71 | 39.39 | 82.95 | 56.44 | 84.09 | 45.08 | 73.86 | 41.29 | 34.47 | 25 | | 44.32 |
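The two feature families compared above can be sketched as follows. This is an illustrative sketch, not the paper's code: the sampling rate, bands, centre frequencies, and number of wavelet cycles are all assumed values, and the data are random placeholders.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 250                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(1)
trial = rng.standard_normal(2 * fs)        # one toy single-channel epoch

def band_power(x, lo, hi, fs):
    """Log band power: band-pass filter, then mean squared amplitude."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return np.log(np.mean(sosfiltfilt(sos, x) ** 2))

def morlet_power(x, freq, fs, cycles=5.0):
    """Log power of the trial convolved with a complex Morlet wavelet."""
    s = cycles / (2 * np.pi * freq)                  # Gaussian std (s)
    t = np.arange(-4 * s, 4 * s, 1 / fs)             # wavelet support
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * s**2))
    wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit energy
    return np.log(np.mean(np.abs(np.convolve(x, wavelet, mode="same")) ** 2))

# Assumed mu and beta bands / centre frequencies for illustration.
bp_features = [band_power(trial, lo, hi, fs) for lo, hi in [(8, 12), (16, 24)]]
morlet_features = [morlet_power(trial, f, fs) for f in (10, 20)]
```

Band power captures the average energy in a fixed band, while the Morlet features measure energy around a centre frequency with a time-frequency trade-off set by the number of cycles; this difference is what makes the two feature types favour different classifiers.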
The AUC of classifiers for self-paced subjects.
For each classification algorithm, the first column shows the results for band power (BP) features and the second column the results for Morlet features.
| Subject | BP | Morlet | BP | Morlet | BP | Morlet | BP | Morlet | BP | Morlet | BP | Morlet | BP | Morlet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Subject22 (22) | 0.46 | 0.56 | | 0.58 | 0.49 | 0.49 | 0.41 | 0.48 | | 0.57 | 0.62 | 0.55 | 0.63 | 0.56 |
| Subject23 (23) | 0.54 | 0.61 | 0.67 | 0.68 | 0.53 | 0.55 | 0.5 | 0.59 | 0.64 | 0.68 | 0.63 | 0.53 | 0.66 | |
| Subject24 (24) | 0.6 | 0.58 | | 0.58 | 0.58 | 0.51 | 0.6 | 0.52 | 0.65 | 0.58 | 0.64 | 0.54 | 0.63 | 0.57 |
| Subject25 (25) | 0.36 | 0.31 | 0.78 | 0.77 | 0.6 | 0.69 | 0.47 | 0.6 | | 0.77 | 0.68 | 0.66 | | 0.73 |
| Subject26 (a) | 0.59 | 0.53 | | 0.55 | 0.64 | 0.53 | 0.56 | 0.53 | 0.65 | 0.53 | 0.61 | 0.5 | | 0.57 |
| Subject27 (b) | 0.79 | 0.77 | 0.82 | | 0.78 | 0.81 | 0.72 | 0.76 | 0.66 | 0.8 | 0.72 | 0.73 | 0.82 | |
| Subject28 (f) | 0.49 | | 0.51 | 0.53 | 0.51 | 0.53 | 0.49 | 0.51 | 0.5 | 0.53 | 0.48 | 0.51 | 0.49 | 0.52 |
| Subject29 (g) | 0.52 | 0.5 | 0.53 | | 0.52 | 0.51 | 0.53 | 0.51 | 0.53 | 0.55 | | 0.52 | 0.53 | |
Average Rankings of the classification algorithms for both synchronous and self-paced datasets.
The number in parentheses corresponds to the average rank of the algorithm across subjects. For each feature extraction method, the classifiers typed in bold are the recommended ones, selected based on the results of the statistical tests.
| Rank | Synchronous (BP) | Synchronous (Morlet) | Self-paced (BP) | Self-paced (Morlet) |
|---|---|---|---|---|
| 1 | | | | |
| 2 | | | | |
| 3 | | | | |
| 4 | | | QDA(4.18) | BST(4.25) |
| 5 | BST(4.54) | BST(4.16) | RF(4.87) | RF(4.87) |
| 6 | QDA(5.11) | LDA(4.45) | SVM(5.5) | SVM(5.81) |
| 7 | RF(5.21) | QDA(6.61) | BST(5.81) | QDA(5.81) |
P-values corresponding to pairwise comparison of different classifiers.
α is chosen to be 0.1. For settings 1 and 2, all hypotheses with p-values less than 0.0333 are rejected. For settings 3 and 4, all hypotheses with p-values less than 0.05 are rejected. The results are rounded to 4 decimal places.
| Setting 1 (Synchronous, BP) | | Setting 2 (Synchronous, Morlet) | | Setting 3 (Self-paced, BP) | | Setting 4 (Self-paced, Morlet) | |
|---|---|---|---|---|---|---|---|
| hypothesis | p-value | hypothesis | p-value | hypothesis | p-value | hypothesis | p-value |
| RF vs. MLP | 0.0006 | QDA vs. LR | 0.0 | SVM vs. LR | 0.0002 | SVM vs. LR | 0.0002 |
| QDA vs. MLP | 0.0010 | LDA vs. LR | 0.0026 | BST vs. LR | 0.0006 | QDA vs. LR | 0.0002 |
| BST vs. MLP | 0.0151 | BST vs. LR | 0.0101 | RF vs. LR | 0.0045 | RF vs. LR | 0.0054 |
| LDA vs. MLP | 0.0801 | SVM vs. LR | 0.0932 | QDA vs. LR | 0.0278 | BST vs. LR | 0.0278 |
| SVM vs. MLP | 0.7750 | MLP vs. LR | 0.1431 | LDA vs. LR | 0.2471 | LDA vs. LR | 0.3854 |
| LR vs. MLP | 0.9430 | RF vs. LR | 0.1985 | MLP vs. LR | 0.3854 | MLP vs. LR | 0.5244 |
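The ranking and pairwise-comparison procedure behind these tables can be sketched as follows: rank the classifiers within each subject, run a Friedman test for any overall difference, then compare each classifier against the top-ranked control with Holm's step-down correction at α = 0.1 (consistent with the α/3 and α/2 thresholds quoted above). The accuracy matrix and classifier names here are random placeholders, not the paper's data.

```python
import numpy as np
from scipy.stats import friedmanchisquare, norm, rankdata

rng = np.random.default_rng(2)
# Placeholder accuracy matrix: 21 subjects x 7 classifiers.
acc = rng.uniform(0.5, 0.9, size=(21, 7))
names = ["LDA", "QDA", "SVM", "LR", "RF", "BST", "MLP"]
n, k = acc.shape

# Rank classifiers within each subject (rank 1 = most accurate).
ranks = np.apply_along_axis(rankdata, 1, -acc)
avg_rank = ranks.mean(axis=0)

# Friedman test: is there any difference among the classifiers?
stat, p_friedman = friedmanchisquare(*acc.T)

# Post-hoc: compare everything against the top-ranked control classifier,
# then apply Holm's step-down correction at alpha = 0.1.
control = int(np.argmin(avg_rank))
se = np.sqrt(k * (k + 1) / (6 * n))
z = (avg_rank - avg_rank[control]) / se
pvals = 2 * norm.sf(np.abs(z))
alpha, rejected = 0.1, []
others = sorted((i for i in range(k) if i != control), key=lambda i: pvals[i])
for step, i in enumerate(others):
    if pvals[i] < alpha / (k - 1 - step):   # thresholds alpha/6, alpha/5, ...
        rejected.append(names[i])
    else:
        break                               # step-down: stop at first failure
```

The step-down thresholds explain the rejection cutoffs in the caption: the correction divides α by the number of remaining hypotheses at each step, so the quoted cutoff is the threshold at which the procedure stopped.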
Average Rankings of the classification algorithms for binary and multi-class classification in synchronous datasets.
The number of subjects in the binary task was 12, and the number of subjects in the multi-class task was 9. The number in parentheses corresponds to the average rank of the algorithm across subjects. For each feature extraction method, the classifiers typed in bold are the recommended ones, selected based on the results of the statistical tests.
| Rank | Binary (BP) | Multi-class (BP) | Binary (Morlet) | Multi-class (Morlet) |
|---|---|---|---|---|
| 1 | | | | |
| 2 | | | | |
| 3 | | | RF(3.91) | |
| 4 | | | LDA(3.91) | |
| 5 | | BST(4.94) | SVM(4.0) | |
| 6 | | LDA(5.0) | BST(4.33) | LDA(5.16) |
| 7 | RF(6.20) | QDA(7.0) | QDA(6.74) | QDA(6.44) |
P-values corresponding to pairwise comparison of different classifiers.
α is chosen to be 0.1. For binary-task BCIs with BP features, all hypotheses with p-values less than 0.02 are rejected. For multi-class BCIs with BP features, all hypotheses with p-values less than 0.0333 are rejected. For binary-task BCIs with Morlet features, all hypotheses with p-values less than 0.1 are rejected. For multi-class BCIs with Morlet features, all hypotheses with p-values less than 0.025 are rejected. The results are rounded to 4 decimal places.
| Binary (BP) | | Multi-class (BP) | | Binary (Morlet) | | Multi-class (Morlet) | |
|---|---|---|---|---|---|---|---|
| hypothesis | p-value | hypothesis | p-value | hypothesis | p-value | hypothesis | p-value |
| RF vs. SVM | 0.0011 | QDA vs. MLP | 0.0 | QDA vs. LR | 0.0 | QDA vs. RF | 0.0001 |
| BST vs. SVM | 0.2986 | LDA vs. MLP | 0.0045 | BST vs. LR | 0.0053 | LDA vs. RF | 0.0088 |
| QDA vs. SVM | 0.6706 | BST vs. MLP | 0.0053 | SVM vs. LR | 0.0159 | BST vs. RF | 0.1560 |
| LR vs. SVM | 0.8132 | RF vs. MLP | 0.0808 | RF vs. LR | 0.0206 | MLP vs. RF | 0.2300 |
| MLP vs. SVM | 0.8132 | SVM vs. MLP | 0.4781 | LDA vs. LR | 0.0206 | LR vs. RF | 0.4781 |
| LDA vs. SVM | 0.9247 | LR vs. MLP | 0.9131 | MLP vs. LR | 0.1305 | SVM vs. RF | 0.6234 |
List of classifier parameters tuned in the training phase.
| Classifier | Parameters tuned |
|---|---|
| Random Forest | Number of trees |
| | Maximum number of features evaluated to split each node |
| | Maximum depth of each tree |
| | Minimum number of samples in each leaf |
| SVM | C |
| | Gamma |
| LR | Regularization type |
| | Regularizer coefficient |
| Boosting | Number of trees |
| | Maximum number of features evaluated to split each node |
| | Maximum depth of each tree |
| | Learning rate |
| MLP | Number of neurons in hidden layer |
| | L1 coefficient |
| | L2 coefficient |
| | Learning rate |
| GDA | None |
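Tuning a parameter from this table on training data can be sketched with a cross-validated grid search. This is an illustrative sketch, not the paper's implementation: it tunes only the LR regularizer coefficient, with a toy dataset, a hand-rolled logistic regression, and an assumed candidate grid.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy training set: 60 trials with 4 band-power-like features.
X = rng.standard_normal((60, 4))
y = (X @ np.array([1.5, -2.0, 0.0, 0.5]) + 0.5 * rng.standard_normal(60) > 0)
y = y.astype(float)

def fit_lr(X, y, lam, steps=500, eta=0.1):
    """L2-regularised logistic regression fitted by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-np.clip(X @ w, -30, 30)))
        w -= eta * (X.T @ (p - y) / len(y) + lam * w)
    return w

def cv_accuracy(X, y, lam, folds=5):
    """Mean accuracy of fit_lr over simple interleaved k-fold splits."""
    idx = np.arange(len(y))
    accs = []
    for f in range(folds):
        test = idx % folds == f
        w = fit_lr(X[~test], y[~test], lam)
        accs.append(np.mean((X[test] @ w > 0) == y[test].astype(bool)))
    return float(np.mean(accs))

# Grid search over the regularizer coefficient (the LR row above);
# the same loop applies to any parameter listed in the table.
grid = [0.001, 0.01, 0.1, 1.0]
best_lam = max(grid, key=lambda lam: cv_accuracy(X, y, lam))
```

Each classifier's parameters are selected this way on the training data only, so the test-set accuracies and AUCs reported above reflect unseen data.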