Youngmin Na, Hyosung Joo, Le Thi Trang, Luong Do Anh Quan, Jihwan Woo.
Abstract
Auditory prostheses provide an opportunity for rehabilitation of hearing-impaired patients. Speech intelligibility can be used to estimate the extent to which an auditory prosthesis improves the user's speech comprehension. Although behavior-based speech intelligibility is the gold standard, its subjectiveness limits precise evaluation. Here, we used a convolutional neural network to predict speech intelligibility from electroencephalography (EEG). Sixty-four-channel EEGs were recorded from 87 adult participants with normal hearing. Sentences spectrally degraded by a 2-, 3-, 4-, 5-, or 8-channel vocoder were used to set relatively low speech intelligibility conditions, and a Korean sentence recognition test was used. The speech intelligibility scores were divided into 41 discrete levels ranging from 0 to 100% in steps of 2.5%; three scores (30.0, 37.5, and 40.0%) were not collected, leaving 38 classes. The speech features, i.e., the speech temporal envelope (ENV) and phoneme (PH) onset, were used to extract continuous-speech EEGs for speech intelligibility prediction. The deep learning model was trained on event-related potentials (ERP), or on correlation coefficients between the ERPs and the ENVs, between the ERPs and the PH onsets, or between the ERPs and the product of PH and ENV (PHENV). The speech intelligibility prediction accuracies were 97.33% (ERP), 99.42% (ENV), 99.55% (PH), and 99.91% (PHENV). The models were interpreted using the occlusion sensitivity approach. The informative electrodes of the ENV model were located in the occipital area, whereas, according to the occlusion sensitivity maps, the informative electrodes of the phoneme models (PH and PHENV) were located in the language-processing area. Of the models tested, the PHENV model achieved the best speech intelligibility prediction accuracy. This model may promote clinical prediction of speech intelligibility with a more comfortable speech intelligibility test.
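The vocoding step in the abstract can be made concrete with a short sketch: split the speech into bands, extract each band's envelope, and use it to modulate band-limited noise. The band edges, filter order, and normalization below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels, lo=80.0, hi=7000.0):
    """n-channel noise vocoder sketch: log-spaced analysis bands, Hilbert
    envelopes, and band-limited noise carriers. Band edges and filter
    order are illustrative assumptions."""
    edges = np.logspace(np.log10(lo), np.log10(hi), n_channels + 1)
    noise = np.random.randn(len(x))
    out = np.zeros(len(x))
    for k in range(n_channels):
        sos = butter(4, [edges[k], edges[k + 1]], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))        # slowly varying band envelope
        carrier = sosfiltfilt(sos, noise)  # band-limited noise carrier
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

# e.g., a 4-channel vocoded version of a 16 kHz recording `speech`:
# degraded = noise_vocode(speech, fs=16000, n_channels=4)
```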
Keywords: EEG; continuous speech; deep-learning; occlusion sensitivity; speech intelligibility
Year: 2022 PMID: 36061597 PMCID: PMC9433707 DOI: 10.3389/fnins.2022.906616
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 5.152
FIGURE 1. Summary of the experimental procedure for the behavioral speech intelligibility test and EEG recording. During the behavioral test, noise-vocoded and natural sentences are played in random order, and the participants are asked to repeat each sentence. The EEG responses to the speech stimuli are recorded during a passive listening task.
FIGURE 2. Schematic diagram of deep learning training and testing. A training dataset is used to build the speech intelligibility prediction model, and an unseen (test) dataset determines the performance of the model.
FIGURE 3. The overall scheme of speech intelligibility prediction, including speech stimuli, EEG data, feature extraction, and model prediction. The speech features, i.e., the envelope (ENV), phoneme onset train (PH), and ENV ⊗ PH, generate the ENV continuous-speech evoked potential (CSEP), PH CSEP, and PHENV CSEP features by cross-correlation with the EEG response.
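A minimal sketch of the cross-correlation feature extraction described in Figure 3, assuming a common EEG/feature sampling rate and lag range (neither is given in this record); `csep` and the synthetic signals are hypothetical illustrations.

```python
import numpy as np
from scipy.signal import correlate

FS = 64  # assumed common sampling rate (Hz) of EEG and speech features

def csep(eeg, feature, max_lag_s=0.5, fs=FS):
    """Cross-correlate one EEG channel with a speech feature (ENV, PH, or
    their product, PHENV) at lags 0..max_lag_s, i.e., EEG lagging the
    stimulus. Lag range and normalization are assumptions."""
    eeg = (eeg - eeg.mean()) / eeg.std()
    feature = (feature - feature.mean()) / feature.std()
    r = correlate(eeg, feature, mode="full") / len(eeg)
    zero_lag = len(feature) - 1  # index of lag 0 in the full output
    return r[zero_lag:zero_lag + int(max_lag_s * fs)]

# Hypothetical 60 s traces at FS Hz:
env = np.random.rand(FS * 60)                        # speech envelope
ph = (np.random.rand(FS * 60) > 0.95).astype(float)  # phoneme onset train
phenv = ph * env                                     # ENV ⊗ PH
eeg_channel = np.random.randn(FS * 60)               # one EEG channel
feature_vector = csep(eeg_channel, phenv)
```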
TABLE 1. Deep learning layers and their specifications.
| Deep learning layer | Filters | Kernel | Output |
| --- | --- | --- | --- |
| Input | | | 299 × 299 × 3 |
| Conv2D | 32 | 16 × 16 | 299 × 299 × 32 |
| LeakyReLU | | | 299 × 299 × 32 |
| Conv2D | 8 | 8 × 8 | 299 × 299 × 8 |
| LeakyReLU | | | 299 × 299 × 8 |
| MaxPooling2D | | 2 × 2 | 149 × 149 × 8 |
| Conv2D | 8 | 4 × 4 | 149 × 149 × 8 |
| LeakyReLU | | | 149 × 149 × 8 |
| MaxPooling2D | | 2 × 2 | 148 × 148 × 8 |
| Conv2D | 3 | 3 × 3 | 148 × 148 × 3 |
| LeakyReLU | | | 148 × 148 × 3 |
| MaxPooling2D | | 2 × 2 | 147 × 147 × 3 |
| Batch normalization | | | 147 × 147 × 3 |
| Fully connected | | 1 × 38 | 1 × 1 × 38 |
| Softmax | | | 1 × 1 × 38 |
| Classification | | | 38 |
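Read as a network, the table above could be implemented as follows. This is a sketch, not the authors' code: the framework is not stated in this record, and the stride-1 pooling in the later layers is inferred from the 149 → 148 → 147 output sizes.

```python
import torch
import torch.nn as nn

class SpeechIntelligibilityCNN(nn.Module):
    """CNN matching the layer table above; a PyTorch sketch under the
    stated assumptions."""

    def __init__(self, n_classes=38):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=16, padding="same"),  # 299 x 299 x 32
            nn.LeakyReLU(),
            nn.Conv2d(32, 8, kernel_size=8, padding="same"),   # 299 x 299 x 8
            nn.LeakyReLU(),
            nn.MaxPool2d(2, stride=2),                         # -> 149 x 149 x 8
            nn.Conv2d(8, 8, kernel_size=4, padding="same"),    # 149 x 149 x 8
            nn.LeakyReLU(),
            nn.MaxPool2d(2, stride=1),                         # -> 148 x 148 x 8 (stride 1 inferred)
            nn.Conv2d(8, 3, kernel_size=3, padding="same"),    # 148 x 148 x 3
            nn.LeakyReLU(),
            nn.MaxPool2d(2, stride=1),                         # -> 147 x 147 x 3
            nn.BatchNorm2d(3),
        )
        self.classifier = nn.Linear(147 * 147 * 3, n_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)  # softmax is folded into the loss at training time

model = SpeechIntelligibilityCNN()
print(model(torch.randn(1, 3, 299, 299)).shape)  # torch.Size([1, 38])
```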
FIGURE 4. The behavioral speech intelligibility scores (left panel) of individuals in response to vocoded and natural continuous-speech stimuli. The bars (right panel) indicate the incidence of each score.
TABLE 2. Results of behavioral speech intelligibility scores with natural and noise-vocoded sentences.
| Sentence type | Behavioral score, mean (%) | SD (%) |
| --- | --- | --- |
| Noise vocoded, 2 channels | 7.5 | 8.08 |
| Noise vocoded, 3 channels | 55.2 | 17.03 |
| Noise vocoded, 4 channels | 77.3 | 12.63 |
| Noise vocoded, 5 channels | 86.4 | 8.01 |
| Noise vocoded, 8 channels | 97.9 | 3.44 |
| Natural sentence | 99.6 | 1.01 |
SD, standard deviation.
TABLE 3. Comparison of the performance of deep learning models using event-related potentials (ERP), stimuli envelopes (ENV), phonemes (PH), and phoneme-envelopes (PHENV).
| Deep learning using | ERP | ENV | PH | PHENV |
| --- | --- | --- | --- | --- |
| Accuracy | 97.33% | 99.42% | 99.55% | 99.91% |
FIGURE 5. The topographic map of occlusion sensitivity visualizing the brain regions important for classification. Dark to bright red denotes relatively low to high contribution to the prediction.
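The occlusion sensitivity analysis behind Figure 5 and the channel ranking below can be sketched generically: slide an occluding patch over the input and record the drop in the target-class probability. The patch size, stride, and fill value below are assumptions, not the paper's values.

```python
import torch

def occlusion_sensitivity(model, image, target, patch=16, stride=8, fill=0.0):
    """Generic occlusion sensitivity map for a (C, H, W) input image.
    Returns a grid where large values mark regions whose occlusion most
    reduces the target-class probability."""
    model.eval()
    _, h, w = image.shape
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = torch.zeros(rows, cols)
    with torch.no_grad():
        base = torch.softmax(model(image[None]), dim=1)[0, target].item()
        for i in range(rows):
            for j in range(cols):
                occluded = image.clone()
                y, x = i * stride, j * stride
                occluded[:, y:y + patch, x:x + patch] = fill
                p = torch.softmax(model(occluded[None]), dim=1)[0, target].item()
                heat[i, j] = base - p  # large drop = important region
    return heat

# e.g., with the CNN sketched earlier:
# heat = occlusion_sensitivity(model, torch.randn(3, 299, 299), target=0)
```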
TABLE 4. Summary of regions contributing significantly to deep learning. The ten EEG channels contributing most to speech intelligibility prediction, and their brain regions, were selected using occlusion sensitivity analysis.
| Deep learning using | EEG channels | Brain regions |
| --- | --- | --- |
| ERP | Cz, C6, C4, FCz, P6, F1, CP2, C3, Pz, FC3 | Central, frontal, parietal |
| ENV | O2, T8, PO4, CP4, C6, C1, T7, CP2, FT8, P5 | Occipital, temporal, parietal |
| PH | C3, C1, F1, P10, F3, F8, P9, C5, F6, PO8 | Central, frontal, parietal |
| PHENV | C3, C1, F1, F3, F5, P4, C5, TP7, C6, C4 | Central, frontal, parietal |
EEG, electroencephalography; ERP, event-related potential; ENV, stimuli envelope; PH, phoneme; PHENV, phoneme-envelope.