Jerrin Thomas Panachakel, Angarai Ganesan Ramakrishnan.
Abstract
Over the past decade, many researchers have proposed systems for decoding covert or imagined speech from the electroencephalogram (EEG). These systems differ in several aspects, from data acquisition to machine learning algorithms, which often makes comparison between implementations difficult. This review article brings all the relevant works published in the last decade on decoding imagined speech from EEG into a single framework. Every important aspect of designing such a system is reviewed: selection of the words to be imagined, number of electrodes to be recorded, temporal and spatial filtering, feature extraction, and the classifier. This helps a researcher compare the relative merits and demerits of the different approaches and choose the most suitable one. Because speech is the most natural form of communication, one that human beings acquire even without formal education, imagined speech is an ideal choice of prompt for evoking brain activity patterns for a brain-computer interface (BCI) system, although research on real-time (online) speech imagery based BCI systems is still in its infancy. Covert speech based BCI can help people with disabilities improve their quality of life. It can also be used for covert communication in environments that do not support vocal communication. This paper also discusses some future directions that will aid the deployment of speech imagery based BCI for practical applications, rather than only for laboratory experiments.
Keywords: brain-computer interfaces (BCI); covert speech; electroencephalogram (EEG); imagined speech; inner speech; neurorehabilitation; speech imagery
Year: 2021 PMID: 33994922 PMCID: PMC8116487 DOI: 10.3389/fnins.2021.642251
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
Comparison of various modalities for decoding imagined speech.
| Modality | Temporal resolution | Spatial resolution | Invasiveness | Portability |
|---|---|---|---|---|
| EEG | 0.06 ms | 25 mm2 (Yamazaki et al.) | Non-invasive | Portable |
| MEG | 0.1 ms | 1 mm (Singh) | Non-invasive | Non-portable |
| ECoG | 0.02 ms | 4 mm (Muller et al.) | Invasive | Portable |
| fMRI | 500 ms (Yoo et al.) | 0.7 mm (Kashyap et al.) | Non-invasive | Non-portable |
| fNIRS | 100 ms (Metzger et al.) | 100 mm (Lu et al.) | Non-invasive | Portable |
| ICE | 3 ms (Ayodele et al.) | 0.05 mm (Ayodele et al.) | Invasive | Portable |
The actual temporal and spatial resolution may be lower due to volume conduction effects (Burle et al.).
Figure 1. Distribution of the modalities used in the literature on decoding imagined speech. “Others” include functional magnetic resonance imaging (fMRI), functional near-infrared spectroscopy (fNIRS), intracortical electroencephalography (ICE), etc.
Figure 2. Flowchart detailing the database searches, the number of abstracts screened, the criteria applied for screening the papers, and the full texts retrieved. The number of records in each stage is given in parentheses.
Figure 3. Various steps involved in the development of a system for decoding imagined speech from EEG. This paper is organized in the same order as above.
Figure 4. Simplified representation of the dual stream prediction model (DSPM) for imagined speech. The dorsal stream is shown in yellow boxes and the ventral stream in blue boxes. The red circle represents the truncation of information at the primary motor cortex in the case of speech imagery. pSTG, posterior superior temporal gyrus; STS, superior temporal sulcus. The primary auditory cortex lies in the superior temporal gyrus and extends into Heschl's gyri. Though Heschl's gyri are involved in speech perception, the region is not activated during speech imagery.
Figure 5. Graph showing the number of electrodes used for data acquisition in various works on decoding imagined speech from EEG. The X and Y axes represent the number of electrodes and the number of articles, respectively.
Figure 6. Graph showing the sampling rates used for data acquisition in the literature on decoding imagined speech from EEG. The X-axis gives the sampling rates and the Y-axis gives the number of articles using each sampling rate.
Comparison of the types of EEG systems, sampling rate, decoding strategy and maximum number of degrees of freedom of various studies reviewed in this work.
| No. | Study | Type of EEG system | Sampling rate | Downsampled to | Decoding strategy | Degrees of freedom |
|---|---|---|---|---|---|---|
| 1 | Jahangiri et al. | Research | 2 kHz | 256 Hz | Offline | 4 |
| 2 | Wang et al. | Research | 250 Hz | N/A | Offline | 2 |
| 3 | Jahangiri et al. | Commercial | 500 Hz | 256 Hz | Offline | 4 |
| 4 | Tøttrup et al. | Commercial | 500 Hz | N/A | Offline | 6 |
| 5 | Saha et al. | Research | 1 kHz | N/A | Offline | 2 |
| 6 | Koizumi et al. | Research | 1 kHz | N/A | Offline | 12 |
| 7 | Sereshkeh et al. | Research | 1 kHz | N/A | Offline | 2 |
| 8 | Deng et al. | Research | 1 kHz | N/A | Offline | 6 |
| 9 | Zhang et al. | Research | 500 Hz | N/A | Offline | 4 |
| 10 | Cooney et al. | Commercial | 1 kHz | N/A | Offline | 6 |
| 11 | Chengaiyan et al. | Commercial | 256 Hz | N/A | Offline | 5 |
| 12 | Brigham and Kumar | Research | 1 kHz | N/A | Offline | 2 |
| 13 | Cooney et al. | Research | 1 kHz | N/A | Offline | 11 |
| 14 | Pawar and Dhage | Research | 1 kHz | N/A | Offline | 4 |
| 15 | Nguyen et al. | Research | 1 kHz | 256 Hz | Offline | 3 |
| 16 | Sereshkeh et al. | Research | 1 kHz | N/A | Online | 2 |
| 17 | Watanabe et al. | Research | 1 kHz | N/A | Offline | 3 |
| 18 | Jahangiri and Sepulveda | Research | 2 kHz | 256 Hz | Offline | 4 |
| 19 | Jahangiri and Sepulveda | Research | 2 kHz | 256 Hz | Offline | 4 |
| 20 | García et al. | Commercial | 128 Hz | N/A | Offline | 5 |
| 21 | Min et al. | Research | 1 kHz | 250 Hz | Offline | 2 |
| 22 | Saha and Fels | Research | 1 kHz | 256 Hz | Offline | 3 |
| 23 | Saha et al. | Research | 1 kHz | N/A | Offline | 2 |
| 24 | Panachakel et al. | Research | 1 kHz | 256 Hz | Offline | 2 |
| 25 | Panachakel et al. | Research | 1 kHz | N/A | Offline | 11 |
| 26 | García-Salinas et al. | Commercial | 128 Hz | N/A | Offline | 5 |
| 27 | Cooney et al. | Commercial | 1 kHz | 128 Hz | Offline | 5 |
| 28 | Balaji et al. | Research | 250 Hz | N/A | Offline | 4 |
Figure 7. A typical experimental setup used for recording EEG during speech imagery. The subject wears an EEG electrode cap. A monitor cues the subject with the prompt to be imagined. An optional chin rest prevents artifacts due to unintentional head movements. Figure adapted with permission from Prof. Supratim Ray, Centre for Neuroscience, Indian Institute of Science, Bangalore.
Five common prompts used in decoding imagined speech and their significance.
| No. | Prompts | Significance |
|---|---|---|
| 1 | /ba/, /fo/, /le/ and /ry/ | Differences in place and manner of articulation. |
| 2 | “up”, “down”, “left” and “right” | Useful in controlling a computer mouse. |
| 3 | “yes” and “no” | Differences in place and manner of articulation, |
| 4 | /a/, /e/, /i/, /o/ and /u/ | Acoustic stationarity, |
| 5 | “in” and “cooperate” | Difference in complexity. |
Prompts which are not common in the literature are not tabulated here.
Figure 8. Comparison of the popularity of frequency bands used in works on decoding imagined speech from EEG. Darker shades of black represent more popular frequency bands. Common EEG frequency bands are given in different colors.
Comparison of the accuracies reported in several works (reviewed in this manuscript) on decoding imagined speech from EEG.
| No. | Study | Prompts | Features | Classifier | Accuracy | Remarks |
|---|---|---|---|---|---|---|
| 1 | García et al. | “arriba”, “abajo”, “izquierda”, “derecha”, “seleccionar” | Discrete wavelet transform | RF | 43.6 ± 2.4% | - |
| 2 | Brigham and Kumar | “/ba/”, “/ku/” | Autoregressive model coefficients | NN | 68.8 ± 14.4% | - |
| 3 | Min et al. | “/a/”, “/e/”, “/i/”, “/o/”, “/u/” | Mean, variance, standard deviation, and skewness | ELM-R | 87.0 ± 11.4% | Pairwise classification of |
| 4 | Sereshkeh et al. | “yes”, “no” | Discrete wavelet transform | RNN | 75.7 ± 9.6% | Classification of |
| 5 | Nguyen et al. | “/a/”, “/i/”, “/u/”; “in”, “out”, “up”; | Tangent vectors in Riemannian manifold | mRVM | 80.0 ± 7.3% | Classification of words |
| 6 | Panachakel et al. | “in”, “cooperate” | Temporal and discrete wavelet transform | DNN | 72.0 ± 8.5% | Classification of words |
| 7 | Panachakel et al. | “/iy/”, “/uw/”, “/piy/”, “/tiy/”, “/diy/”, “/m/”, “/n/”; | Discrete wavelet transform | DNN | 57.1 ± 15.2% | - |
| 8 | Cooney et al. | “/iy/”, “/uw/”, “/piy/”, “/tiy/”, “/diy/”, “/m/”, “/n/”; | MFCC, statistical features etc. | SVM | 22.7 ± 5.2% | - |
| 9 | Saha and Fels | “/a/”, “/i/”, “/u/”; “in”, “out”, “up”; | Channel cross-covariance (CCV) | CNN+RNN+DAE | 79.9 ± 6.9% | Classification of words |
| 10 | García-Salinas et al. | “arriba”, “abajo”, “izquierda”, “derecha”, “seleccionar” | Bag of features and transfer learning | Naive Bayes | 61.4 ± 12.4% | Representation of |
| 11 | Cooney et al. | “/a/”, “/e/”, “/i/”, “/o/”, “/u/” | | CNN | 35.7 ± 3.0% | Uses transfer learning |
| 12 | Tøttrup et al. | “go”, “stop” and “Viborg” | Spectral and temporal features | RF | 67.0 ± 9.0% | - |
| 13 | Balaji et al. | “Haan”, “Na” and “Yes” and “No” | Spectral power | ANN | 73.4% | Subject-wise accuracy |
| 14 | Jahangiri et al. | “/ba/”, “/fo/”, “/le/” and “/ry/” | Discrete Gabor transform | LDA | 82.5 ± 4.1% | |
| 15 | Pawar and Dhage | “left”, “right”, “up” and “down” | Discrete wavelet transform | ELM-G | 47.9 ± 6.9% | |
| 16 | Jahangiri et al. | “/ba/”, “/fo/”, “/le/” and “/ry/” | Discrete Gabor transform | LDA | 82.5 ± 24.1% | |
| 17 | Saha et al. | “/iy/”, “/uw/”, “/piy/”, “/tiy/”, “/diy/”, “/m/”, “/n/”; | Channel cross-covariance (CCV) | CNN+LSTM | 77.5 ± 4.2% | Classification of |
| 18 | Koizumi et al. | “ue”, “shita”, “hidari”, “migi”, “mae”, “ushiro” | Spectral power | SVM | 81.3% | Subject-wise accuracy |
| 19 | Deng et al. | Constructed using “/ba/” and “/ku/” | Hilbert spectrum | LDA | 58.1 ± 8.0% | Classification of rhythm |
| 20 | Zhang et al. | Mandarin lexical tones | Common spatial patterns | SVM | 80.1 ± 1.2% | |
| 21 | Watanabe et al. | Constructed using “/ba/” | - | NN | 38.5 ± 5.3% | |
| 22 | Jahangiri and Sepulveda | “/ba/”, “/fo/”, “/le/” and “/ry/” | Discrete Gabor transform | LDA | 80.7 ± 3.1% | Pairwise classification |
| 23 | Jahangiri and Sepulveda | “/ba/”, “/fo/”, “/le/” and “/ry/” | Discrete Gabor transform | LDA | 96.4 ± 2.3% | One-vs-all classification |
| 24 | Chengaiyan et al. | 50 CVC words | Brain connectivity estimators and | DBN | 80.0% | Subject-wise accuracy |
| 25 | Saha et al. | “/iy/”, “/uw/”, “/piy/”, “/tiy/”, “/diy/”, “/m/”, “/n/”; | Channel cross-covariance (CCV) | CNN+DAE+XGBoost | 53.36% | Subject-wise accuracy |
| 26 | Zhao and Rudzicz | “/iy/”, “/uw/”, “/piy/”, “/tiy/”, “/diy/”, “/m/”, “/n/”; | Statistical features | SVM | 55.4 ± 20% | Classification of |
| 27 | Cooney et al. | “/a/”, “/e/”, “/i/”, “/o/”, “/u/”; | - | CNN | 30.1 ± 2.7% | Classification of |
| 28 | Sereshkeh et al. | “yes”, “no” | AR coefficients and DWT | SVM | 75.9 ± 11.4% | Online classification |
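Several of the works tabulated above list simple statistical descriptors (mean, variance, standard deviation, and skewness) as features. The following is a minimal sketch of how such features might be computed for one EEG epoch. It assumes an epoch is a list of per-channel sample lists and uses population moments; the function names `statistical_features` and `epoch_feature_vector` are hypothetical and are not taken from any of the reviewed studies, whose exact feature definitions may differ.

```python
import math
from statistics import fmean, pvariance


def statistical_features(samples):
    """Return mean, variance, standard deviation and skewness of one channel.

    Population moments are used here; this is an assumption, since the
    reviewed works may use sample (n - 1) estimators instead.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples per channel")
    mean = fmean(samples)
    var = pvariance(samples, mu=mean)
    std = math.sqrt(var)
    # Skewness is the third standardized moment; define it as 0 for a flat signal.
    if std == 0.0:
        skew = 0.0
    else:
        skew = sum((x - mean) ** 3 for x in samples) / (n * std ** 3)
    return {"mean": mean, "variance": var, "std": std, "skewness": skew}


def epoch_feature_vector(epoch):
    """Concatenate the four features of every channel into one flat list."""
    vector = []
    for channel in epoch:
        feats = statistical_features(channel)
        vector.extend([feats["mean"], feats["variance"],
                       feats["std"], feats["skewness"]])
    return vector


if __name__ == "__main__":
    # Toy two-channel epoch with made-up values, for illustration only.
    toy_epoch = [[1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 0.0, 1.0, 0.0, 0.0]]
    print(epoch_feature_vector(toy_epoch))
```

The resulting vector has four entries per channel and could be passed to any of the classifiers listed in the table; the choice of classifier is independent of this sketch.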
Figure 9. Comparison of popular machine learning algorithms used for decoding imagined speech from EEG. The x-axis gives the number of articles using each algorithm.
Comparison of κ values of different works using (a) directional prompts (shaded in gray), (b) polar prompts (shaded in pink) and (c) vowel prompts (shaded in cyan).
| No. | Study | Prompts | Classifier | Accuracy (%) | Chance level (%) | κ | Remarks |
|---|---|---|---|---|---|---|---|
| 1 | García et al. | “arriba”, “abajo”, “izquierda”, | RF | 43.6 | 20 | 0.3 | - |
| 2 | García-Salinas et al. | “arriba”, “abajo”, “izquierda”, | Naive Bayes | 61.4 | 20 | 0.5 | - |
| 3 | Pawar and Dhage | “left”, “right”, “up” and “down” | ELM-G | 47.9 | 25 | 0.3 | Uses gamma band |
| 4 | Koizumi et al. | “ue”, “shita”, “hidari”, | SVM | 81.3 | 16.7 | 0.8 | Uses gamma band |
| 5 | Cooney et al. | “arriba”, “abajo”, “derecha”, | CNN | 25 | 16.7 | 0.1 | - |
| 6 | Sereshkeh et al. | Decision “yes” | RNN | 63.2 | 57.8 | 0.1 | - |
| 7 | Balaji et al. | Decision “yes” | ANN | 85.2 | 50 | 0.7 | Uses bilingual prompts |
| 8 | Sereshkeh et al. | Decision “yes” | SVM | 69.3 | 60 | 0.2 | Employs online decoding |
| 9 | Min et al. | Pairwise combinations of | ELM-R | 68.5 | 50 | 0.4 | Accuracy is the mean of |
| 10 | Nguyen et al. | /a/, /i/ and /u/ | mRVM | 49.0 | 33.3 | 0.2 | - |
| 11 | Saha and Fels | /a/, /i/ and /u/ | CNN+RcNN+DAE | 74.3 | 33.3 | 0.6 | - |
| 12 | Cooney et al. | /a/, /e/, /i/, /o/, and /u/ | CNN | 30.3 | 20 | 0.1 | - |
RF, Random forest; ELM-G, Extreme learning machine (Gaussian kernel); SVM, Support vector machine; CNN, Convolutional neural networks; RNN, Regularized neural network; ANN, Artificial neural network; ELM-R, Extreme learning machine (radial basis function); mRVM, multiclass relevance vector machine; RcNN, Recurrent neural network; DAE, Deep autoencoder.
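The κ values tabulated above are consistent with the usual chance-corrected definition κ = (p_o − p_e) / (1 − p_e), where p_o is the reported accuracy and p_e is the chance level, both expressed as fractions. This excerpt does not state the formula explicitly, so the sketch below rests on that assumption; the function name `kappa` is hypothetical. It recomputes κ for a few rows of the table so that works with different numbers of classes can be compared on a common scale.

```python
def kappa(accuracy_pct, chance_pct):
    """Chance-corrected accuracy: (p_o - p_e) / (1 - p_e).

    Both arguments are percentages, as in the table above.
    The formula is assumed; it is not quoted from the source.
    """
    p_o = accuracy_pct / 100.0
    p_e = chance_pct / 100.0
    if not 0.0 <= p_e < 1.0:
        raise ValueError("chance level must be in [0, 100) percent")
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # (study, accuracy %, chance level %) copied from rows 1, 4, 7 and 12.
    rows = [
        ("García et al.", 43.6, 20.0),
        ("Koizumi et al.", 81.3, 16.7),
        ("Balaji et al.", 85.2, 50.0),
        ("Cooney et al.", 30.3, 20.0),
    ]
    for study, acc, chance in rows:
        print(f"{study}: kappa = {kappa(acc, chance):.1f}")
```

Note that the chance level is not always 100 divided by the number of classes: rows 6 and 8 report 57.8% and 60% for a two-class problem, which suggests unbalanced classes in those works.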
Details of the three most popular publicly available speech imagery EEG datasets.
| Authors | Prompts | Number of electrodes | Sampling rate | Number of subjects |
|---|---|---|---|---|
| Shunan Zhao and Frank Rudzicz | Phonemic/syllabic prompts (/iy/, /uw/, | 64 | 1 kHz | 14 |
| German A. Pressel Coretto, | Vowels (/a/, /e/, /i/, /o/, /u/) | 6 | 1 kHz | 15 |
| Chuong H Nguyen, George K Karavas | Vowels (/a/, /i/, /u/) | 64 | 1 kHz | 15 |