Wonjun Ko, Eunjin Jeon, Seungwoo Jeong, Jaeun Phyo, Heung-Il Suk.
Abstract
Brain-computer interfaces (BCIs) utilizing machine learning techniques are an emerging technology that enables a communication pathway between a user and an external system, such as a computer. Owing to its practicality, electroencephalography (EEG) is one of the most widely used measurements for BCI. However, EEG has complex patterns, and EEG-based BCIs mostly involve a costly and time-consuming calibration phase; thus, acquiring sufficient EEG data is rarely possible. Recently, deep learning (DL) has had a theoretical/practical impact on BCI research because of its use in learning representations of complex patterns inherent in EEG. Moreover, algorithmic advances in DL facilitate short/zero-calibration in BCI, thereby reducing the data acquisition phase. These advancements include data augmentation (DA), which increases the number of training samples without acquiring additional data, and transfer learning (TL), which takes advantage of representative knowledge obtained from one dataset to address the so-called data insufficiency problem in other datasets. In this study, we review DL-based short/zero-calibration methods for BCI. Further, we elaborate on methodological/algorithmic trends, highlight intriguing approaches in the literature, and discuss directions for further research. In particular, we survey generative model-based and geometric manipulation-based DA methods. Additionally, we categorize TL techniques in DL-based BCIs into explicit and implicit methods. Our systematization reveals advances in the DA and TL methods. Among the studies reviewed herein, ~45% of DA studies used generative model-based techniques, whereas ~45% of TL studies used an explicit knowledge-transfer strategy. Moreover, based on our literature review, we recommend an appropriate DA strategy for DL-based BCIs and discuss trends of TL used in DL-based BCIs.
Keywords: brain–computer interface; data augmentation; deep learning; electroencephalography; transfer learning
Year: 2021 PMID: 34140883 PMCID: PMC8204721 DOI: 10.3389/fnhum.2021.643386
Source DB: PubMed Journal: Front Hum Neurosci ISSN: 1662-5161 Impact factor: 3.169
Figure 1. Overview of DL-based short/zero-calibration approaches.
Geometric manipulation data augmentation methods.
| Data type | Method | Study | Paradigm | Description |
| Raw data | Geometric | Zhang et al. | Motor imagery | Rotated (180°), shifted, and changed RGB values of STFT images estimated from raw EEGs |
| | | Shovon et al. | | Rotated (5°), flipped, zoomed, and brightened (±30%) STFT images estimated from raw EEGs |
| | | Schirrmeister et al. | | Cropped raw EEG using a sliding window |
| | | Ko et al. | | Cropped raw EEG using a sliding window |
| | | Majidov and Whangbo | | Cropped raw EEG using a sliding window |
| | | Freer and Yang | | Flipped raw EEG |
| | | Mousavi et al. | Sleep | Cropped raw EEG using a sliding window |
| | | Supratak and Guo | | Shifted raw EEG |
| | | Sakai et al. | Cognition | Shifted raw EEG |
| | Noise addition | Zhang et al. | Motor imagery | Added Gaussian noise (std of 0.1) |
| | | Freer and Yang | | Used uniform noise ([−0.5, 0.5]) |
| | | Wang F. et al. | Emotion | Added Gaussian noise (std of 0.001~0.5) |
| | Recombination | Freer and Yang | Motor imagery | Segmented and recombined raw EEGs |
| | | Cho et al. | | Segmented and recombined raw EEGs |
| | | Dai et al. | | Segmented and recombined raw EEGs |
| | | Huang et al. | | Segmented and recombined STFT images |
| | | Fahimi et al. | Motor | Segmented and recombined both raw EEGs and STFT images |
| | | Zhao X. et al. | Seizure | Segmented and recombined DCT images |
| | | Fan et al. | Sleep | Segmented and recombined raw EEGs; compared synthesis quality with other DA methods |
| | | Supratak and Guo | | Segmented and recombined raw EEGs |
| | SMOTE | Lee T. et al. | ERP | Applied the borderline-SMOTE algorithm to raw EEGs |
| | | Sun et al. | Sleep | Applied the SMOTE algorithm to hand-crafted features |
| | Amplifying | Freer and Yang | Motor imagery | Amplified raw EEG by ±2~20% |
| | | Sakai et al. | Cognition | Amplified raw EEG by ±10% |
| | Mixup | | | |
| Intrinsic mode | EMD | Zhang et al. | Motor imagery | Estimated and recombined IMFs of raw EEGs |
| | | Dinarès-Ferran et al. | | Estimated and recombined IMFs of raw EEGs |
| | | Kalaganis et al. | Cognition | Estimated and recombined IMFs of graphs estimated from raw EEGs |
| | SOM (Kohonen) | Liu et al. | Drowsy | Applied the ASSOM algorithm |
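The SMOTE rows above interpolate between minority-class samples rather than perturbing them. A minimal NumPy sketch of this idea, on toy random feature vectors (`smote_sample` and `X_min` are illustrative names, not from the reviewed studies):

```python
import numpy as np

def smote_sample(X, k=3, rng=None):
    """Generate one synthetic sample by interpolating between a random
    minority-class feature vector and one of its k nearest neighbours."""
    rng = np.random.default_rng(rng)
    i = rng.integers(len(X))
    # Euclidean distances from the chosen sample to every sample
    d = np.linalg.norm(X - X[i], axis=1)
    neighbours = np.argsort(d)[1:k + 1]   # skip the sample itself (distance 0)
    j = rng.choice(neighbours)
    gap = rng.random()                    # interpolation factor in [0, 1)
    return X[i] + gap * (X[j] - X[i])

# toy minority class: 10 samples of 4-dimensional hand-crafted features
X_min = np.random.default_rng(0).normal(size=(10, 4))
synthetic = smote_sample(X_min, k=3, rng=1)
```

Each synthetic sample lies on the line segment between two real minority-class samples, which densifies the minority region instead of merely duplicating it.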
Implicit transfer learning methods.
| Strategy | Technique | Study | Paradigm | Description |
| Fine-tuning | Whole | Shovon et al. | Motor imagery | Pre-trained with natural images |
| | | Aznan et al. | SSVEP | Pre-trained with synthetic SSVEP samples |
| | | Andreotti et al. | Sleep | Trained their network on source subjects and fine-tuned it on the target subject (LOO) |
| | | Phan et al. | | Pre-trained the network with a different dataset |
| | | Vilamala et al. | | Pre-trained the network with natural images |
| | | Fahimi et al. | Cognition | Trained their network on source subjects and fine-tuned it on the target subject (LOO) |
| | Partial | Zhang et al. | Motor imagery | Fine-tuned only the fully-connected layers |
| | | Zhao et al. | | Conducted ablation studies to identify which layers should be transferred to the target |
| | | Raghu et al. | Seizure | Fine-tuned the last few layers of the pre-trained network |
| | | Olesen et al. | Sleep | Fine-tuned a subset of the parameters |
| Enhancing | Attention | Zhang et al. | Motor imagery | Designed a self-attention module to find more class-discriminative segments |
| | | Zhang et al. | | Designed a recurrent self-attention module |
| | | Zhang et al. | | Represented raw EEG as a spatial graph and designed a recurrent self-attention module |
| | | Zhang et al. | | Represented raw EEG as a spatial graph and designed two attention modules, one for attentive temporal dynamics and the other for attentive channels |
| | Multi-scale features | Kwon et al. | | Extracted spatio-spectral features in multiple frequency bands using CSP and selected the top bands as inputs |
| | | Ko et al. | Multi | Extracted multi-scale features including spatio-temporal-spectral patterns |
| | Maximize mutual information | Jeon et al. | Motor imagery | Decomposed an intermediate feature into class-relevant and class-irrelevant features and maximized mutual information between low-level and high-level representations |
| Meta-learning | MAML (Finn et al.) | Duan et al. | Multi | Trained optimal initial parameters through gradient-based optimization and fine-tuned with a small amount of target data |
| | Relation network (Sung et al.) | An et al. | Motor imagery | Estimated relation scores between support and query sets among source subjects in few-shot scenarios |
Figure 2. Cropping strategy using a sliding window (Schirrmeister et al., 2017; Ko et al., 2018). For a raw EEG signal, a sliding window shorter than the signal moves along the EEG with a predefined stride; the window then crops a part of the signal for augmentation.
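The cropping strategy in Figure 2 can be sketched in a few lines of NumPy; `crop_sliding` and the trial dimensions below are illustrative, not taken from the cited implementations:

```python
import numpy as np

def crop_sliding(eeg, win_len, stride):
    """Crop one EEG trial (channels x time) into overlapping windows.

    Every crop inherits the trial's label, multiplying the number of
    training samples without recording any new data."""
    n_ch, n_t = eeg.shape
    starts = range(0, n_t - win_len + 1, stride)
    return np.stack([eeg[:, s:s + win_len] for s in starts])

# toy trial: 22 channels, 1000 time points (e.g., 4 s sampled at 250 Hz)
trial = np.random.default_rng(0).normal(size=(22, 1000))
crops = crop_sliding(trial, win_len=500, stride=100)  # 6 crops of 2 s each
```

At test time, predictions over the crops of one trial are typically aggregated (e.g., averaged) to yield a single trial-level decision.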
Figure 3. EEG segmentation and recombination method (adapted from Dai et al., 2020). EEG samples are segmented into pieces of constant length; the divided pieces are then randomly recombined to generate new signals.
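The segmentation-and-recombination idea in Figure 3 can be sketched as follows, assuming all trials share one class label (`segment_recombine` and the array shapes are illustrative):

```python
import numpy as np

def segment_recombine(trials, n_seg, rng=None):
    """Create one artificial trial by concatenating time segments drawn
    from different same-class trials.

    `trials` has shape (n_trials, n_channels, n_time)."""
    rng = np.random.default_rng(rng)
    n_trials, n_ch, n_t = trials.shape
    seg_len = n_t // n_seg
    pieces = []
    for s in range(n_seg):
        donor = rng.integers(n_trials)  # random donor trial for this slot
        pieces.append(trials[donor, :, s * seg_len:(s + 1) * seg_len])
    return np.concatenate(pieces, axis=1)

# toy class: 8 trials, 22 channels, 1000 time points
trials = np.random.default_rng(0).normal(size=(8, 22, 1000))
artificial = segment_recombine(trials, n_seg=4, rng=1)
```

Because each segment keeps its original temporal position, the artificial trial preserves the within-trial time course of the paradigm while mixing inter-trial variability.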
Figure 4. Illustration of empirical mode decomposition (EMD) (Flandrin et al., 2004), which decomposes an input signal into its modes, i.e., intrinsic mode functions (IMFs) (re-illustrated from Dinarès-Ferran et al., 2018).
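Once the IMFs of Figure 4 are available (e.g., computed with an EMD library such as PyEMD), the EMD-based DA studies in the table recombine IMFs across same-class trials and sum them back into a signal. A hedged NumPy sketch on toy precomputed IMFs (`recombine_imfs` is an illustrative name):

```python
import numpy as np

def recombine_imfs(imfs, rng=None):
    """Build an artificial trial by mixing IMFs across same-class trials
    and summing them back up.

    `imfs` has shape (n_trials, n_imfs, n_time); the IMFs are assumed
    to be precomputed by EMD."""
    rng = np.random.default_rng(rng)
    n_trials, n_imfs, n_t = imfs.shape
    donors = rng.integers(n_trials, size=n_imfs)  # one donor trial per IMF slot
    # pick IMF m from trial donors[m], then sum the selected IMFs
    return np.sum(imfs[donors, np.arange(n_imfs), :], axis=0)

# toy precomputed decomposition: 6 trials, 5 IMFs each, 500 time points
imfs = np.random.default_rng(0).normal(size=(6, 5, 500))
new_signal = recombine_imfs(imfs, rng=1)
```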
Figure 5. Illustration of a generative adversarial network (GAN) (Goodfellow et al., 2014). The generator produces synthetic data from a random noise vector z; the discriminator then distinguishes the generated data from real data x.
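The two-player objective behind Figure 5 can be written down concretely. The sketch below computes the standard discriminator loss and the non-saturating generator loss from discriminator output probabilities (`gan_losses` is an illustrative helper, not part of any cited codebase):

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-12):
    """GAN objectives (Goodfellow et al., 2014) from discriminator outputs.

    `d_real`/`d_fake` hold D's probabilities for real and generated batches.
    The discriminator pushes d_real -> 1 and d_fake -> 0; the generator
    (non-saturating form) pushes d_fake -> 1."""
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss

# a confident discriminator yields a low d_loss and a high g_loss
d_loss, g_loss = gan_losses(np.array([0.9, 0.95]), np.array([0.05, 0.1]))
```

Training alternates gradient steps on these two losses; for the EEG studies in the tables, the "real data" are recorded trials (or their STFT images) and the generator output serves as augmentation.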
Figure 6. Illustration of a variational autoencoder (VAE) (Kingma and Welling, 2014). The encoder learns a latent space for the input data x; a latent code z is sampled from the learned latent space and fed to the decoder, which reconstructs the input data.
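Two ingredients of Figure 6 are easy to show in isolation: the reparameterization trick that keeps the sampling step differentiable, and the KL term that regularizes the latent space. A minimal NumPy sketch (function names are illustrative):

```python
import numpy as np

def reparameterize(mu, log_var, rng=None):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), so the sampling
    step stays differentiable w.r.t. the encoder outputs."""
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior, the
    regularizer in the VAE objective (Kingma and Welling, 2014)."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

mu, log_var = np.zeros(8), np.zeros(8)   # encoder output for one sample
z = reparameterize(mu, log_var, rng=0)
kl = kl_to_standard_normal(mu, log_var)  # exactly 0 when q equals N(0, I)
```

The full VAE loss adds a reconstruction term (e.g., mean squared error between the input and the decoder output) to this KL term.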
Deep generative data augmentation methods.
| Model | Variant | Study | Paradigm | Description |
| GAN | GAN | Roy et al. | Motor imagery | Devised an LSTM-based generator and discriminator; qualitatively analyzed the generated signals |
| | | Krishna et al. | Speech | Devised a GRU-based generator and discriminator |
| | LSGAN | Pascual et al. | Seizure | Devised a U-Net-based generator and discriminator; used the conditional GAN concept |
| | DCGAN | Zhang et al. | Motor imagery | Generated STFT images estimated from raw EEGs; compared synthesis quality with other DA methods |
| | | Zhang and Liu | | Compared classification accuracy on the test dataset for different ratios of raw to artificial data; used the conditional GAN concept |
| | | Fahimi et al. | Motor | Used a feature vector together with the random noise as the generator input |
| | | Lee Y. E. et al. | ERP | Used features of EEG signals recorded during walking as the generator input to reconstruct EEG signals similar to those recorded during standing |
| | | Truong et al. | Seizure | Generated STFT images estimated from raw EEGs |
| | | Truong et al. | | Generated STFT images estimated from raw EEGs |
| | | Fan et al. | Sleep | Compared synthesis quality with other DA methods |
| | WGAN | Ko et al. | Motor imagery | Applied a gradient penalty rather than weight clipping; used the semi-supervised GAN concept |
| | | Hartmann et al. | Motor | Applied a gradient penalty rather than weight clipping |
| | | Aznan et al. | SSVEP | Compared synthesis quality with VAE-based DA methods; experimented in a TL setting |
| | | Panwar et al. | RSVP | Applied a gradient penalty rather than weight clipping; used the conditional GAN concept |
| | | Luo et al. | Emotion | Applied a gradient penalty rather than weight clipping; used the conditional GAN concept |
| | | Luo and Lu | | Applied a gradient penalty rather than weight clipping; used the conditional GAN concept |
| | | Panwar et al. | Drowsy | Applied a gradient penalty rather than weight clipping |
| | | Hwang et al. | Cognition | Designed zero-calibration experiments |
| VAE | AE | Zhang et al. | Motor imagery | Generated STFT images estimated from raw EEGs; compared synthesis quality with other DA methods |
| | VAE | Zhang et al. | Motor imagery | Generated STFT images estimated from raw EEGs; compared synthesis quality with other DA methods |
| | | Fahimi et al. | Motor | Compared synthesis quality with other DA methods |
| | | Aznan et al. | SSVEP | Compared synthesis quality with VAE-based DA methods; experimented in a TL setting |
| | | Luo et al. | Emotion | Compared synthesis quality with VAE-based DA methods |
Figure 7. Concept of explicit TL-based methods. The alignment can be achieved by minimizing a divergence between different domains.
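One concrete divergence used for the alignment in Figure 7 is the maximum mean discrepancy (MMD), listed in the non-parametric rows of the explicit TL table. A small NumPy sketch of the biased squared-MMD estimator with an RBF kernel (`rbf_mmd2` and the toy data are illustrative):

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy between
    samples X (source features) and Y (target features) under an RBF
    kernel; minimizing it aligns the two feature distributions."""
    def k(A, B):
        # pairwise squared Euclidean distances, then the RBF kernel
        d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
# same distribution -> near-zero MMD; shifted target -> clearly larger MMD
same = rbf_mmd2(rng.normal(size=(100, 4)), rng.normal(size=(100, 4)))
shifted = rbf_mmd2(rng.normal(size=(100, 4)), rng.normal(size=(100, 4)) + 2.0)
```

In the reviewed methods, such a term is added to the classification loss so the feature extractor learns representations on which source and target subjects are indistinguishable.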
Figure 8. Illustration of the domain adversarial neural network (DANN) (Ganin et al., 2016). The network is trained with a classification loss and a domain loss. Through a GRL, where the gradients of the domain loss are reversed by multiplying them by a negative value, the domain loss is minimized in the domain discriminator and maximized in the feature extractor.
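The gradient reversal layer (GRL) of Figure 8 is just an identity mapping whose backward pass flips the gradient sign. A minimal sketch with a hand-written backward pass (class name and lambda value are illustrative; real implementations hook into an autograd framework):

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer: identity in the forward pass, gradient
    multiplied by -lambda in the backward pass, so the feature extractor
    is trained to *maximize* the domain loss that the discriminator
    minimizes."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                        # identity mapping

    def backward(self, grad_output):
        return -self.lam * grad_output  # flip (and scale) the gradient

grl = GradReverse(lam=0.5)
x = np.array([1.0, -2.0])
y = grl.forward(x)                      # unchanged activations
g = grl.backward(np.array([0.3, 0.3]))  # reversed gradient: [-0.15, -0.15]
```

Placing this layer between the feature extractor and the domain discriminator turns ordinary backpropagation into the min-max game that yields domain-invariant features.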
Explicit transfer learning methods.
| Approach | Measure | Study | Paradigm | Description |
| Non-parametric | MMD | Hang et al. | Motor imagery | Minimized MMD at the feature level and introduced CDFL |
| | | Chai et al. | Emotion | Minimized MMD at the feature level and trained the AE and classifier separately |
| | KLD | Zhang et al. | Sleep | Minimized KLD at the feature level and trained with the classifier in an end-to-end manner |
| | EA | Kostas and Rudzicz | Multi | Constrained the mean covariance matrix to be an identity matrix at the raw-data level |
| Adversarial | A-cVAE | Özdenizci et al. | Motor imagery | Added an adversarial network to a cVAE and trained the cVAE and classifier separately |
| | DANN | Özdenizci et al. | Motor imagery | Devised DANNs exploiting various CNN-based architectures as the feature extractor |
| | | Zhao H. et al. | | Added a center loss for the target to improve intra-class compactness and inter-class separability |
| | | Tang and Zhang | | Fed the output of the classifier into a domain discriminator |
| | | Jeon et al. | | Selected sources based on resting-state EEG signals |
| | | Wei et al. | RSVP | Selected sources based on a ranking of the performances of subject-specific classifiers |
| | | Wang et al. | Emotion | Selected sources based on a ranking of the performances of subject-specific classifiers and devised a centroid alignment loss |
| | | Nasiri and Clifford | Sleep | Estimated attention maps using channel-wise domain discriminators |
| | | Ma et al. | Drowsy | Trained additional parameters to capture subject-specific features |
Figure 9. Conceptual illustration of meta-learning in BCI. Meta-learning aims to learn how to adapt quickly to new subjects by updating parameters across a variety of tasks acquired from multiple subjects. The trained feature space can then be regarded as a subject-invariant space that can be applied efficiently to new subjects with short/zero calibration.
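The MAML-style meta-learning in Figure 9 can be illustrated on a deliberately tiny problem where the gradients are computable by hand. Here each "task" is regressing a single scalar parameter onto a per-task target (one target per source subject, say); the inner loop adapts the parameter per task and the outer loop updates the initialization so that one adaptation step works well on every task. Everything below is a toy sketch, not the cited methods:

```python
import numpy as np

def maml_step(theta, tasks, alpha=0.1, beta=0.05):
    """One second-order MAML meta-update on a toy task family.

    Task i has loss (theta - t_i)^2. The inner step adapts theta per task;
    the meta-gradient differentiates the post-adaptation loss w.r.t. the
    *initial* theta (chain rule through the inner update)."""
    meta_grad = 0.0
    for t in tasks:
        inner_grad = 2.0 * (theta - t)
        theta_i = theta - alpha * inner_grad            # task-specific adaptation
        # d/d theta of (theta_i - t)^2, with d theta_i / d theta = 1 - 2*alpha
        meta_grad += 2.0 * (theta_i - t) * (1.0 - 2.0 * alpha)
    return theta - beta * meta_grad / len(tasks)

tasks = [-1.0, 0.0, 1.0]   # e.g., one toy "task" per source subject
theta = 5.0
for _ in range(200):
    theta = maml_step(theta, tasks)
# theta converges toward an initialization (here 0.0, the task mean) from
# which a single inner step does well on any task
```

In the EEG setting of the table, the scalars become network weights and each task is one subject's classification problem, but the inner/outer structure is the same.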
Abbreviations.
| Abbreviation | Definition |
| A-cVAE | Adversarial conditional variational autoencoder |
| AE | Autoencoder |
| ASSOM | Adaptive subspace self-organizing map |
| BCI | Brain–computer interface |
| BMU | Best matching unit |
| BN | Batch normalization |
| CCE | Categorical cross-entropy |
| CDFL | Center-based discriminative feature learning |
| CNN | Convolutional neural network |
| CSP | Common spatial pattern |
| cVAE | Conditional variational autoencoder |
| DA | Data augmentation |
| DANN | Domain adversarial neural network |
| DCGAN | Deep convolutional generative adversarial network |
| DCT | Discrete cosine transform |
| DL | Deep learning |
| EA | Euclidean alignment |
| EEG | Electroencephalography |
| EMD | Empirical mode decomposition |
| ERP | Event-related potential |
| GAN | Generative adversarial network |
| GRL | Gradient reversal layer |
| GRU | Gated recurrent unit |
| IMF | Intrinsic mode function |
| JSD | Jensen-Shannon distance |
| KLD | Kullback-Leibler divergence |
| LOO | Leave-one-subject-out |
| LSGAN | Least squares generative adversarial network |
| LSTM | Long short-term memory |
| MAML | Model-agnostic meta-learning |
| MINE | Mutual information neural estimator |
| MMD | Maximum mean discrepancy |
| RKHS | Reproducing kernel Hilbert space |
| RSVP | Rapid serial visual presentation |
| SMOTE | Synthetic minority oversampling technique |
| SOM | Self-organizing map |
| SPD | Symmetric positive definite |
| SSVEP | Steady-state visual evoked potential |
| STFT | Short-time Fourier transform |
| TL | Transfer learning |
| VAE | Variational autoencoder |
| WGAN | Wasserstein generative adversarial network |