Ciaran Cooney, Raffaella Folli, Damien Coyle.
Abstract
A direct-speech brain-computer interface (DS-BCI) acquires neural signals corresponding to imagined speech, then processes and decodes these signals to produce a linguistic output in the form of phonemes, words, or sentences. Recent research has shown the potential of neurolinguistics to enhance decoding approaches to imagined speech with the inclusion of semantics and phonology in experimental procedures. As neurolinguistics research findings are beginning to be incorporated within the scope of DS-BCI research, it is our view that a thorough understanding of imagined speech, and its relationship with overt speech, must be considered an integral feature of research in this field. With a focus on imagined speech, we provide a review of the most important neurolinguistics research informing the field of DS-BCI and suggest how this research may be utilized to improve current experimental protocols and decoding techniques. Our review of the literature supports a cross-disciplinary approach to DS-BCI research, in which neurolinguistics concepts and methods are utilized to aid development of a naturalistic mode of communication.
Keywords: Cognitive Neuroscience; Computer Science; Hardware Interface
Year: 2018 PMID: 30296666 PMCID: PMC6174918 DOI: 10.1016/j.isci.2018.09.016
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Categorization of Types of Speech Typically Used in DS-BCI Experiments
| Type of Speech | Production | Perception |
|---|---|---|
| Overt | Fully articulated speech with audible output | Active or passive hearing of audible speech (one's own speech or from another source) |
| Intended | Intention to produce overt speech but without the capacity to produce audible output | Perception of one's own intended speech production |
| Imagined | Internal pronunciation of words, independent of movement and without any audible output | Perception of one's own imagined speech production |
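The production column of the table above can be read as a decision over two properties of a trial: whether the participant attempts overt speech, and whether audible output results. The Python sketch below illustrates that reading only; the names `SpeechType` and `speech_type` and the two boolean flags are hypothetical, introduced here for illustration, and are not part of any DS-BCI toolkit.

```python
from enum import Enum


class SpeechType(Enum):
    OVERT = "overt"
    INTENDED = "intended"
    IMAGINED = "imagined"


def speech_type(attempts_overt_speech: bool, audible_output: bool) -> SpeechType:
    """Hypothetical helper: map two trial properties onto the production column."""
    if audible_output:
        if not attempts_overt_speech:
            raise ValueError("audible output without an attempt to speak is not covered")
        return SpeechType.OVERT  # fully articulated speech with audible output
    if attempts_overt_speech:
        return SpeechType.INTENDED  # intention to speak, no capacity for audible output
    return SpeechType.IMAGINED  # internal pronunciation, independent of movement


# A participant who attempts to speak but cannot produce sound:
print(speech_type(attempts_overt_speech=True, audible_output=False))  # SpeechType.INTENDED
```

The flags capture only the production column; the perception column would need a separate label for what the participant hears or perceives.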
Figure 1. Seeking a Naturalistic Form of Communication through Direct-Speech BCI
DS-BCI is a system that decodes neural signals (B), e.g., electroencephalography (EEG) or electrocorticography (ECoG), corresponding to imagined speech (A). The recorded signals are processed to maximize information extraction and improve the signal-to-noise ratio (C). The feature extraction (D) and classification (E) stages compute the most discriminative information in the recorded signals and classify each segment as a unit of speech (e.g., a word). The outputs of a DS-BCI system are a textual representation of the imagined speech (F) and an auditory representation (G), which can be used for both communication and feedback. In this example, the user actively produces the words “I am thirsty!” using imagined speech. The acquired signals are temporally aligned with each word to facilitate feature extraction and classification, and the system produces two outputs: a text printout of the imagined words and a synthesized audio rendering of “I am thirsty!”
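The processing chain in Figure 1 (word-aligned segmentation, then stages C to F) can be sketched end to end. This is a minimal, self-contained Python illustration under explicit assumptions: the two-channel "recordings" are synthetic Gaussian noise produced by a hypothetical `make_trial` function, preprocessing is reduced to DC-offset removal, features are per-channel log-variance, and the classifier is a nearest-centroid rule. These are stand-ins chosen for brevity, not the methods of any study reviewed here, and audio synthesis (G) is omitted.

```python
import math
import random
from statistics import fmean

Trial = list[list[float]]  # channels x samples


def make_trial(word: str, rng: random.Random, n_samples: int = 200) -> Trial:
    """Hypothetical synthetic 2-channel signal for one imagined word (stage B).
    'yes' has more power on channel 0, 'no' on channel 1; 5.0 is a DC offset."""
    gains = {"yes": (3.0, 1.0), "no": (1.0, 3.0)}[word]
    return [[5.0 + g * rng.gauss(0.0, 1.0) for _ in range(n_samples)] for g in gains]


def epoch(recording: Trial, onsets: list[int], length: int) -> list[Trial]:
    """Cut a continuous recording into windows aligned with each word onset."""
    return [[ch[t:t + length] for ch in recording] for t in onsets]


def preprocess(trial: Trial) -> Trial:
    """Stage C stand-in: remove each channel's DC offset (its mean)."""
    out = []
    for ch in trial:
        mu = fmean(ch)
        out.append([x - mu for x in ch])
    return out


def extract_features(trial: Trial) -> list[float]:
    """Stage D stand-in: log-variance per channel (mean already removed)."""
    return [math.log(fmean([x * x for x in ch]) + 1e-12) for ch in trial]


class NearestCentroid:
    """Stage E stand-in: assign a feature vector to the closest class mean."""

    def fit(self, feats: list[list[float]], labels: list[str]) -> "NearestCentroid":
        groups: dict[str, list[list[float]]] = {}
        for f, y in zip(feats, labels):
            groups.setdefault(y, []).append(f)
        self.centroids = {y: [fmean(col) for col in zip(*fs)] for y, fs in groups.items()}
        return self

    def predict(self, feats: list[float]) -> str:
        return min(self.centroids, key=lambda y: math.dist(feats, self.centroids[y]))


def decode(epochs: list[Trial], model: NearestCentroid) -> str:
    """Stages C to F: preprocess, extract features, classify, emit text."""
    return " ".join(model.predict(extract_features(preprocess(e))) for e in epochs)


rng = random.Random(0)
train_words = ["yes", "no"] * 20
train_feats = [extract_features(preprocess(make_trial(w, rng))) for w in train_words]
model = NearestCentroid().fit(train_feats, train_words)

# One continuous recording of four imagined words, then segmentation by word onset.
spoken = ["yes", "no", "no", "yes"]
trials = [make_trial(w, rng) for w in spoken]
recording = [[x for trial in trials for x in trial[c]] for c in range(2)]
onsets = [i * 200 for i in range(len(spoken))]
print(decode(epoch(recording, onsets, 200), model))  # prints: yes no no yes
```

Each stage is a separate function so that any one of them (for example, band-pass filtering in place of DC removal) could be replaced without touching the others; the data flow from word-aligned segment to text stays the same.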
Figure 2. Direct-Speech BCI Studies Categorized According to Recording Techniques and Types of Speech
(A) Cross-categorization of DS-BCI studies according to the recording technique applied and the type of speech investigated. The analysis covers the period from Blakely et al. (2008), the first study based on this BCI paradigm, to 2018. Studies were included if they used one of these recording techniques to decode speech production (overt, imagined, or intended) directly from neural activity. EEG and ECoG are the most frequently used recording approaches, and both offer high temporal resolution. Although micro-electrodes offer high spatial and temporal resolution, their use is not always possible or appropriate. Overt speech has been used as a proxy for imagined speech or in comparative studies, a trend explained at least in part by the behavioral difficulty of studying imagined speech. The two bar graphs (B) show the distribution of recording techniques and of types of speech across all studies: ECoG is used in 20 studies and EEG in 16.
See Table 2.
Overview of DS-BCI Studies Attempting to Decode Speech from Neural Activity
| Reference | Recording Technique | Type of Speech | Experimental Paradigm |
|---|---|---|---|
| | Micro-electrode | Overt | Phoneme pronunciation |
| | EEG | Imagined | Imagined speech of two syllables spoken in one of three rhythms |
| | Micro-electrode | Intended | Vowel production involving movement from a central vowel location to one of three peripheral vowel locations |
| | EEG | Imagined | Five words, presented in block, sequential, or random order |
| | EEG | Imagined | Imagined speech of two syllables, /ba/ and /ku/, at two rhythms |
| | EEG | Imagined | Imagined speech of two syllables spoken in one of three rhythms |
| | Micro-electrode | Overt | Repetition of one of ten words |
| | Micro-electrode | Intended | Intended production of 38 American English phonemes |
| | EEG | Imagined | Generation of five types of phonemes that differ in their manner of vocal articulation |
| | ECoG | Overt/Imagined | Overt and imagined phoneme articulation |
| | ECoG | Overt/Imagined | Overt and imagined repetition of 36 monosyllabic words |
| | ECoG | Overt | Three language tasks based on picture naming |
| | ECoG | Overt/Imagined | Word repetition using overt or covert speech in response to visual or auditory stimuli |
| | ECoG | Overt | Spontaneous speech in a non-experimental setup |
| | fNIRS | Overt/Imagined | Utterances produced in auditory, silent, and imagined speech |
| | ECoG | Overt | Articulation of Chinese sentences |
| | EEG | Overt/Imagined | Speech of monosyllabic Korean words representing two categories of meaning (number and face) |
| | ECoG | Overt | Reading of consonant-vowel syllables |
| | ECoG | Overt | Spontaneous speech in a non-experimental setup |
| | ECoG | Imagined | Imagined speech production of three Japanese vowels |
| | ECoG | Overt | Two-syllable repetition tasks |
| | ECoG | Overt/Imagined | Overt and covert reading of short stories |
| | ECoG | Overt | Overt speech used to identify phonemes by their position within different words |
| | ECoG | Overt | Overt speech used to identify phonemes by their position within different words |
| | EEG | Overt/Imagined | High tone production in overt, inhibited, and imagined speech |
| | ECoG | Overt | Reading from well-known texts |
| | EEG | Imagined | Imagined speech of vowels /a/ and /u/, and no action |
| | EEG | Imagined | Imagined speech of vowels /a/ and /u/, and no action |
| | ECoG | Overt | Reading from well-known texts |
| | EEG | Overt/Imagined | Imagined speech production of seven phonemes and two pairs of phonologically similar words |
| | ECoG | Overt | Recitation of a presented sentence |
| | ECoG | Overt/Imagined | Overt and imagined speech production of words selected to maximize variability in number of syllables and semantic category |
| | EEG/fMRI | Imagined | Imagined speech production of Japanese vowels /a/ and /i/ |
| | EEG | Imagined | Imagined speech production of five Spanish words |
| | EEG | Imagined | Imagined speech of short words, long words, and vowels |
| | ECoG | Overt | Overt speech production of four phonemes |
| | EEG | Imagined | Imagined speech repetition of the words "yes" or "no" |
| | EEG | Imagined | Imagined speech repetition of the words "yes" or "no" |
| | EEG | Overt | Overt word production corresponding to presented pictures |
| | EEG | Imagined | Imagined speech word production |
| | ECoG | Overt | Overt speech of 15 Japanese syllables |
| | ECoG | Overt | Overt speech of 57 different consonant-vowel syllables |
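The cross-categorization described for Figure 2 can be checked against the rows of the table above. The sketch below is an illustration under stated assumptions, not the authors' analysis: it transcribes each row as a (recording technique, type of speech) pair, keeps combined entries such as `EEG/fMRI` and `Overt/Imagined` as categories of their own, and cross-tabulates them with `collections.Counter`.

```python
from collections import Counter

# (recording technique, type of speech) for each table row, in table order.
ROWS = """
Micro-electrode,Overt
EEG,Imagined
Micro-electrode,Intended
EEG,Imagined
EEG,Imagined
EEG,Imagined
Micro-electrode,Overt
Micro-electrode,Intended
EEG,Imagined
ECoG,Overt/Imagined
ECoG,Overt/Imagined
ECoG,Overt
ECoG,Overt/Imagined
ECoG,Overt
fNIRS,Overt/Imagined
ECoG,Overt
EEG,Overt/Imagined
ECoG,Overt
ECoG,Overt
ECoG,Imagined
ECoG,Overt
ECoG,Overt/Imagined
ECoG,Overt
ECoG,Overt
EEG,Overt/Imagined
ECoG,Overt
EEG,Imagined
EEG,Imagined
ECoG,Overt
EEG,Overt/Imagined
ECoG,Overt
ECoG,Overt/Imagined
EEG/fMRI,Imagined
EEG,Imagined
EEG,Imagined
ECoG,Overt
EEG,Imagined
EEG,Imagined
EEG,Overt
EEG,Imagined
ECoG,Overt
ECoG,Overt
"""

rows = [tuple(line.split(",")) for line in ROWS.strip().splitlines()]
by_technique = Counter(technique for technique, _ in rows)
by_speech = Counter(speech for _, speech in rows)
cross = Counter(rows)  # technique x speech-type contingency counts

for (technique, speech), n in sorted(cross.items()):
    print(f"{technique:<16}{speech:<16}{n}")
print(by_technique.most_common())
print(by_speech.most_common())
```

With this transcription, the tally gives 20 ECoG rows and 16 EEG-only rows, plus one EEG/fMRI row. Whether a study coded `Overt/Imagined` should instead be counted once under each type of speech is a design choice this sketch leaves open.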
Figure 3. Neuroanatomical Regions Associated with Imagined Speech Production
The diagram depicts brain regions typically associated with language function in the left hemisphere (Berwick et al., 2013); each numbered section indicates a Brodmann area (BA). The inferior frontal gyrus (IFG), which includes BA44 and BA45, is the region most commonly associated with imagined speech production. Both single-word and sentence production activate the IFG, and the region (in particular BA45) is thought to be involved in word retrieval and the retrieval of associated meanings. Both the superior temporal gyrus (STG) and the middle temporal gyrus (MTG) have been implicated in imagined speech studies, in relation to the phonological loop and to the production of dialogic imagined speech. The dorsal pathways between BA44 and the posterior superior temporal cortex (pSTC) support core syntactic processes, whereas the ventral pathways, including that between BA45 and the temporal cortex (TC), support the processing of semantic and conceptual information. Reprinted with permission from Berwick et al. 2013, copyright 2013, Elsevier.
Figure 4. Speech Production Models with Estimated Time Courses
Although models differ in the number of components, there is general agreement that speech production is a staged, hierarchical process with a temporal structure, as indicated in the diagram. In (A), estimated time courses (in milliseconds, ms) for the stages of production (Indefrey, 2011) are shown alongside a production model with two major components: the word (lemma) level and the phonological level (Hickok, 2012). In (B), a more detailed model depicts the phases of the production process (Levelt et al., 1999). The initial stage is conceptual preparation, in which the message to be expressed is formulated and a lexical concept is produced. Next is lexical selection, in which a word, or lemma, is retrieved for use. Following lemma selection, the morphological encoding stage bridges the conceptual domain and the phonological, or articulatory, domain. The word is then encoded in syllabic form before being encoded in phonetic form, from which the audible output is produced. In (C), a truncated version of the model in (B) highlights the stages of production corresponding to imagined speech; its estimated time courses end with the phonological encoding/syllabification stage. (A) is adapted with permission from Hickok 2012, copyright 2012, Springer Nature. (B) is adapted with permission from Levelt et al. 1999, copyright 1999, Cambridge University Press. *Upper boundary.
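The ordering of stages in (B) and the truncation in (C) can be made concrete with a small ordered data structure. This Python sketch assumes the Levelt et al. (1999) stage names and, following the caption's description of (C), that the imagined-speech model stops at phonological encoding/syllabification. No timing values are encoded, and `stages_for` is a hypothetical helper introduced for illustration.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Production stages in the order given for model (B); values encode order only."""
    CONCEPTUAL_PREPARATION = 1  # message formulated, lexical concept produced
    LEXICAL_SELECTION = 2       # lemma retrieved
    MORPHOLOGICAL_ENCODING = 3  # bridge from conceptual to phonological domain
    PHONOLOGICAL_ENCODING = 4   # syllabification
    PHONETIC_ENCODING = 5
    ARTICULATION = 6            # audible output


def stages_for(imagined: bool) -> list[Stage]:
    """Return the stages traversed: the full model (B) or the truncated model (C)."""
    last = Stage.PHONOLOGICAL_ENCODING if imagined else Stage.ARTICULATION
    return [stage for stage in Stage if stage <= last]


print([s.name for s in stages_for(imagined=True)])
# prints: ['CONCEPTUAL_PREPARATION', 'LEXICAL_SELECTION', 'MORPHOLOGICAL_ENCODING', 'PHONOLOGICAL_ENCODING']
```

Using `IntEnum` makes stage order an ordinary integer comparison, so the truncation in (C) reduces to a single filter over the full model.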