Chantel Ritter, Tara Vongpaisal.
Abstract
For cochlear implant (CI) users, degraded spectral input hampers the understanding of prosodic vocal emotion, especially in difficult listening conditions. Using a vocoder simulation of CI hearing, we examined the extent to which informative multimodal cues in a talker's spoken expressions improve normal-hearing (NH) adults' speech and emotion perception under different levels of spectral degradation (two, three, four, and eight spectral bands). Participants repeated the words verbatim and identified emotions (among four alternative options: happy, sad, angry, and neutral) in meaningful sentences that were semantically congruent with the expression of the intended emotion. Sentences were presented in their natural speech form and in speech sampled through a noise-band vocoder in sound (auditory-only) and video (auditory-visual) recordings of a female talker. Visual information had a more pronounced benefit in enhancing speech recognition in the lower spectral band conditions. Spectral degradation, however, did not interfere with emotion recognition performance when dynamic visual cues in a talker's expression were provided, as participants scored at ceiling levels across all spectral band conditions. Our use of familiar sentences that contained congruent semantic and prosodic information has high ecological validity, which likely optimized listener performance under simulated CI hearing and may better predict CI users' outcomes in everyday listening contexts.
Keywords: auditory-visual perception; cochlear implant; emotion; speech; vocoder
Year: 2018 PMID: 30378469 PMCID: PMC6236866 DOI: 10.1177/2331216518804966
Source DB: PubMed Journal: Trends Hear ISSN: 2331-2165 Impact factor: 3.293
Stimuli Sentences.
| Happy sentences |
| Speech decoding condition: |
| 1. “It’s so good to see you” |
| 2. “The puppy is coming home today” |
| 3. “It’s beautiful outside today” |
| 4. “I love you so much” |
| 5. “My favorite movie is on” |
| Emotion recognition: |
| 1. “Let’s go play outside!” |
| 2. “We’re going on a trip today!” |
| 3. “We’re going to Grandma’s today” |
| 4. “We won our game today” |
| 5. “I am so proud of you” |
| Angry sentences |
| Speech decoding condition: |
| 1. “Why did you do that?” |
| 2. “Don’t talk to me like that!” |
| 3. “Give that back, it’s mine!” |
| 4. “Don’t be so rude!” |
| 5. “Turn that game off right now!” |
| Emotion recognition: |
| 1. “Stop yelling at me!” |
| 2. “I can’t believe you broke the toy!” |
| 3. “You never share anything!” |
| 4. “I don’t like this toy!” |
| 5. “It’s my turn on the computer!” |
| Neutral sentences |
| Speech decoding condition: |
| 1. “I’m going to the store” |
| 2. “The pencil is on the table” |
| 3. “He can run very fast” |
| 4. “My neighbor is outside” |
| 5. “I bought chips for the movie” |
| Emotion recognition: |
| 1. “The movie is playing now” |
| 2. “I have read many books” |
| 3. “The store opens in ten minutes” |
| 4. “The glass is beside you” |
| 5. “Dinner is at five o’clock” |
| Sad sentences |
| Speech decoding condition: |
| 1. “We had a fight yesterday” |
| 2. “I am feeling sick today” |
| 3. “I wish it would stop raining” |
| 4. “I’m too tired to play” |
| 5. “I can’t go, I’m grounded” |
| Emotion recognition: |
| 1. “I miss my mom very much” |
| 2. “I just want to go home” |
| 3. “My goldfish died yesterday” |
| 4. “I can’t find my friends” |
| 5. “My arm is really sore” |
Speech Feature Characteristics of the Female Talker.
| | Emotion | | | |
|---|---|---|---|---|
| Speech feature | Neutral | Angry | Sad | Happy |
| Average voice pitch (F0), Hz | 235.0 | 280.4 | 217.1 | 284.5 |
| SD | 13.6 | 38.8 | 16.9 | 28.5 |
| Average speech rate (words/s) | 2.24 | 1.95 | 1.94 | 1.99 |
| SD | 0.35 | 0.46 | 0.50 | 0.32 |
| Average intensity range (dB) | 31.9 | 39.0 | 29.8 | 35.4 |
| SD | 3.2 | 5.8 | 2.3 | 3.7 |
Note. SD = standard deviation.
Cutoff Frequencies (Hz) of Filterbanks for Each Spectral Band Condition.
| Spectral band condition | Cutoff frequencies (Hz) | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Two bands | 300 | 1528 | 6000 | | | | | | |
| Three bands | 300 | 814 | 2210 | 6000 | | | | | |
| Four bands | 300 | 722 | 1528 | 3066 | 6000 | | | | |
| Eight bands | 300 | 477 | 722 | 1061 | 1528 | 2174 | 3066 | 4298 | 6000 |
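The table lists the band edges but not the rule that produced them. As an observation only, and not a statement of the authors' method, the two-, four-, and eight-band edges are consistent with equal spacing on a Greenwood-type frequency-to-place map, while the three-band edges are consistent with equal spacing in log frequency. The Python sketch below checks this against the listed values. The constants `A = 165.4` and `K = 1.0` and the function names are assumptions introduced for illustration.

```python
import math

# Assumed Greenwood-type constants; the source does not state them.
A = 165.4
K = 1.0

# Band edges as listed in the table above (Hz).
TABLE = {
    2: [300, 1528, 6000],
    3: [300, 814, 2210, 6000],
    4: [300, 722, 1528, 3066, 6000],
    8: [300, 477, 722, 1061, 1528, 2174, 3066, 4298, 6000],
}


def greenwood_edges(n_bands, f_lo=300.0, f_hi=6000.0):
    """Edges equally spaced on an assumed Greenwood-type place scale."""
    pos_lo = math.log10(f_lo / A + K)
    pos_hi = math.log10(f_hi / A + K)
    step = (pos_hi - pos_lo) / n_bands
    return [A * (10 ** (pos_lo + i * step) - K) for i in range(n_bands + 1)]


def log_edges(n_bands, f_lo=300.0, f_hi=6000.0):
    """Edges equally spaced in log frequency."""
    ratio = f_hi / f_lo
    return [f_lo * ratio ** (i / n_bands) for i in range(n_bands + 1)]


if __name__ == "__main__":
    for n_bands, listed in TABLE.items():
        print(n_bands, "bands, listed:   ", listed)
        print("   Greenwood-type:", [round(f) for f in greenwood_edges(n_bands)])
        print("   log-spaced:    ", [round(f) for f in log_edges(n_bands)])
```

Running the script prints both candidate spacings next to the listed edges for each condition, so any mismatch is easy to spot by eye.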
Figure 1. Spectrogram of an original and noise-band vocoded sentence.
Figure 2. Mean speech decoding accuracy per spectral band condition for auditory and auditory–visual modalities. Error bars represent standard error of the mean.
Figure 3. Latency of speech decoding per spectral band condition for auditory and auditory–visual modalities. Error bars represent standard error of the mean.
Figure 4. Mean emotion recognition accuracy per spectral band condition for auditory and auditory–visual modalities. Error bars represent standard error of the mean.
Figure 5. Latency of emotion recognition per spectral band condition for auditory and auditory–visual modalities. Error bars represent standard error of the mean.
Figure 6. Confusion matrices, auditory-only emotion recognition. Darker colors represent more errors, while lighter colors represent fewer errors. The observed frequency of errors for each response–target emotion combination is reported in each cell.
Figure 7. Number of participants per score across spectral band conditions in the auditory-only emotion recognition task.
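The error counts described for Figure 6 can be tallied from (target, response) pairs. The following is a minimal Python sketch under stated assumptions: the function names and the example trials are hypothetical and are not data or code from the study. Only the four emotion labels come from the source.

```python
from collections import Counter

# The four response alternatives named in the abstract.
EMOTIONS = ["happy", "sad", "angry", "neutral"]


def confusion_matrix(trials):
    """Count responses per target emotion.

    trials: iterable of (target, response) pairs.
    Returns a list of rows (one per target) with one column per response,
    both ordered as in EMOTIONS.
    """
    counts = Counter(trials)
    return [[counts[(target, response)] for response in EMOTIONS]
            for target in EMOTIONS]


def error_counts(matrix):
    """Zero the diagonal so only misidentifications remain."""
    return [[0 if i == j else count for j, count in enumerate(row)]
            for i, row in enumerate(matrix)]


# Hypothetical trials for illustration only; not data from the study.
example_trials = [
    ("happy", "happy"),
    ("happy", "angry"),
    ("sad", "neutral"),
    ("sad", "neutral"),
    ("angry", "angry"),
    ("neutral", "sad"),
]

if __name__ == "__main__":
    for target, row in zip(EMOTIONS, error_counts(confusion_matrix(example_trials))):
        print(f"{target:>8}: {row}")
```

Each printed row is a target emotion and each column a response, so off-diagonal cells correspond to the error frequencies a figure of this kind would shade.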