
Voice emotion recognition by Mandarin-speaking pediatric cochlear implant users in Taiwan.

Yung-Song Lin1,2, Che-Ming Wu3,4,5, Charles J Limb6, Hui-Ping Lu7, I Jung Feng8, Shu-Chen Peng9, Mickael L D Deroche10, Monita Chatterjee11.   

Abstract

OBJECTIVES: To explore the effects of obligatory lexical tone learning on speech emotion recognition, and the cross-cultural differences between the United States and Taiwan in speech emotion understanding, in children with cochlear implants.
METHODS: This cohort study enrolled 60 cochlear-implanted (cCI) Mandarin-speaking, school-aged children who underwent cochlear implantation before 5 years of age and 53 normal-hearing children (cNH) in Taiwan. Emotion recognition and sensitivity to fundamental frequency (F0) changes were examined for these school-aged cNH and cCI (6-17 years old) in a tertiary referral center.
RESULTS: The mean emotion recognition score of the cNH group was significantly better than that of the cCI group. Female speakers' vocal emotions were more easily recognized than male speakers'. There was a significant effect of age at test on voice emotion recognition performance. The average score of the cCI with full-spectrum speech was close to the average score of the cNH with eight-channel narrowband vocoder speech. The average voice emotion recognition performance across speakers for the cCI could be predicted by their sensitivity to changes in F0.
CONCLUSIONS: Better pitch discrimination ability comes with better voice emotion recognition for Mandarin-speaking cCI. Besides the F0 cues, cCI likely adapt their voice emotion recognition by relying more on secondary cues such as intensity and duration. Although cross-cultural differences exist in the acoustic features of voice emotion, Mandarin-speaking cCI and their English-speaking cCI peers both showed a positive effect of age at test on emotion recognition, suggesting learning effects and brain plasticity. Therefore, further device/processor development to improve the presentation of pitch information, and more rehabilitative efforts, are needed to improve the transmission and perception of voice emotion in Mandarin. LEVEL OF EVIDENCE: 3.
© 2022 The Authors. Laryngoscope Investigative Otolaryngology published by Wiley Periodicals LLC on behalf of The Triological Society.

Keywords:  cochlear implant; lexical tone; pitch discrimination; voice emotion

Year:  2022        PMID: 35155805      PMCID: PMC8823186          DOI: 10.1002/lio2.732

Source DB:  PubMed          Journal:  Laryngoscope Investig Otolaryngol        ISSN: 2378-8038


INTRODUCTION

Vocal expression of emotion is necessary for social interaction across civilizations. In addition to facial expressions, people often use prosodic vocal cues to convey their emotions. Consequently, when prosodic vocal cues are absent, the expression and perception of emotion are hindered, negatively affecting social interactions and development. These effects have been frequently observed in children with cochlear implants (CIs). Although cochlear implant development has achieved remarkable results, allowing people with profound hearing loss to perceive lexical meaning in certain environments, limitations remain in present CI systems. In particular, limitations in a CI system's transmission of prosodic cues (e.g., pitch) constrain the user's interpretation and communication of voice emotion. In tonal languages, slow pitch changes convey prosodic/emotional information, whereas rapid inflections within syllables convey lexical meaning. The unique demand for adequate pitch perception in a tonal language may alter the fundamental frequency (F0)-processing mechanisms of a developing auditory system, and these mechanisms might be altered further in children with CIs. The present study supposes that covarying (secondary) cues, such as changes in intensity and duration, convey some of the same information when F0 processing is degraded. To further explore the effects of obligatory lexical tone learning on speech emotion recognition, the current study was conducted with Mandarin-speaking children in Taiwan. This work, involving collaborations between labs in the United States and Taiwan, explores cross-cultural differences in speech emotion understanding and carries important implications for both CI technology and rehabilitative therapies for children with CIs.

MATERIALS AND METHODS

Study design and oversight

This study was part of a multicenter, retrospective cohort research project sponsored by the National Institutes of Health (NIH R01-DC014233-01). English-speaking subjects were recruited and tested at the Johns Hopkins University School of Medicine in Baltimore, MD, and Boys Town National Research Hospital in Omaha, NE; the methods and results for the English-speaking participants have been published previously. Mandarin-speaking children were recruited and tested at Chi Mei Medical Center and Chang Gung Memorial Hospital in Taiwan. Sixty cochlear-implanted (cCI) and 53 normal-hearing (cNH) Mandarin-speaking children participated in this study (Tables 1 and 2). The Institutional Review Board and the Committee of Human Subjects Protection of Chi Mei Medical Center approved this study. The authors declare that there is no conflict of interest.
TABLE 1

Participants, children with cochlear implants

Participant | Sex | Age at implantation | Age at test | Device experience | Average residual hearing | TONI-3 | Device | Insertion length/active channels | Strategy | Daily listening condition
CM13 | F | 2.71 | 7.50 | 4.80 | 90 | 79 | Nucleus 24RE | Full/20 | ACE | CI only
CX15 | M | 2.94 | 7.92 | 4.98 | 100 | 124 | Nucleus 24RE | Full/20 | ACE | CI only
CM16 | F | 3.21 | 9.47 | 6.26 | >100 | 125 | Nucleus 24RE | Full/20 | ACE | CI only
CX17 | F | 2.73 | 7.18 | 4.45 | 90 | 112 | Nucleus 24RE | Full/20 | ACE | CI only
CG18 | M | 1.85 | 15.81 | 13.97 | 90 | 111 | Nucleus 24RE | Full/21 | ACE | CI only
NT40 | M | 2.27 | 12.55 | 10.28 | 100 | 111 | MED-EL Pulsar | Full/12 | FSP | CI only
ZX43 | M | 2.16 | 8.14 | 5.98 | 100 | 97 | Nucleus 24RE | Full/21 | ACE | CI only
ZX44 | M | 2.40 | 6.81 | 4.41 | 100 | 126 | AB HiRes90k | Full/120 | HiRes-P/Fidelity-120 | CI only
MJ45 | M | 4.52 | 8.23 | 3.71 | 85 | 95 | Nucleus 24RE | Full/20 | ACE | CI only
CG12 | M | 2.54 | 12.76 | 10.22 | 100 | 105 | Nucleus 24CS | Full/20 | ACE | CI only
CX19 | M | 3.80 | 15.50 | 11.70 | 90 | 98 | Nucleus 24CS | Full/20 | ACE | CI only
CX20 | F | 1.56 | 8.95 | 7.39 | 100 | 105 | Nucleus 24RE | Full/21 | ACE | CI only
CG21 | M | 2.54 | 7.04 | 4.50 | 90 | 92 | Nucleus 24RE | Full/20 | ACE | CI only
ZX22 | M | 2.42 | 7.95 | 5.53 | >100 | 93 | AB HiRes90k | Full/120 | HiRes-P/Fidelity-120 | CI only
CG23 | F | 1.50 | 7.28 | 5.78 | 90 | 106 | Nucleus 24RE | Full/21 | ACE | CI only
CR24 | M | 1.92 | 7.83 | 5.91 | 100 | 122 | Nucleus 24RE | Full/20 | ACE | CI only
CG25 | M | 2.52 | 8.57 | 6.05 | 90 | 127 | Nucleus 24RE | Full/21 | ACE | CI only
CX26 | F | 2.18 | 10.35 | 8.18 | 100 | 83 | Nucleus 24RE | Full/21 | ACE | CI only
XG27 | M | 2.13 | 15.94 | 13.81 | >100 | 81 | Nucleus 24RE | Full/21 | ACE | CI only
CG28 | F | 2.74 | 8.77 | 6.03 | 100 | 95 | Nucleus 24RE | Full/20 | ACE | CI only
CX29 | M | 1.54 | 7.24 | 5.70 | 90 | 112 | Nucleus 24RE | Full/21 | ACE | CI only
CL30 | F | 2.03 | 17.21 | 15.17 | 100 | Over | Nucleus 24RE | Full/20 | ACE | CI only
CG31 | M | 2.81 | 15.76 | 12.95 | 100 | 94 | Nucleus 24CS | Full/20 | ACE | CI only
MC37 | M | 1.29 | 9.56 | 8.27 | 90 | 126 | Nucleus 24RE | Full/21 | ACE | CI only
XG32 | F | 2.33 | 9.65 | 7.32 | 100 | 94 | Nucleus 24RE | Full/20 | ACE | CI only
XG33 | F | 3.31 | 14.57 | 11.27 | 90 | 109 | Nucleus 24CS | Full/20 | ACE | CI only
CG34 | M | 1.60 | 9.78 | 8.18 | 90 | 116 | Nucleus 24RE | Full/21 | ACE | CI only
CM35 | F | 2.30 | 13.56 | 11.25 | >100 | 86 | Nucleus 24CS | Full/20 | N24 | CI only
CM51 | F | 1.96 | 6.59 | 4.63 | 100 | 114 | MED-EL Sonata | Full/12 | FSP | CI only
CG36 | F | 2.60 | 9.03 | 6.43 | 90 | 105 | Nucleus 24RE | Full/20 | ACE | CI only
CG37 | F | 3.06 | 10.10 | 7.05 | 90 | 100 | Nucleus 24RE | Full/20 | ACE | CI only
CG08 | F | 2.53 | 11.13 | 8.60 | 90 | 116 | Nucleus 24RE | Full/20 | ACE | CI only
CR30 | F | 3.29 | 17.38 | 14.09 | 100 | Over | Nucleus 24RE | Full/20 | ACE | CI only
XG38 | M | 1.62 | 16.69 | 15.06 | 90 | Over | Nucleus 24RE | Full/21 | ACE | CI only
XG39 | M | 3.32 | 7.87 | 4.55 | >100 | 103 | Nucleus 24RE | Full/19 | ACE | CI only
CG40 | M | 2.64 | 10.13 | 7.49 | 90 | 115 | Nucleus 24RE | Full/20 | ACE | CI only
XG04 | F | 3.77 | 15.27 | 11.50 | 90 | 93 | Nucleus 24SC | Full/20 | ACE | CI only
CM49 | M | 1.40 | 9.75 | 8.35 | 100 | 109 | Nucleus 24RE | Full/21 | ACE | CI only
CG41 | F | 1.93 | 7.15 | 5.22 | 90 | 110 | Nucleus 24RE | Full/21 | ACE | CI only
CX42 | F | 2.44 | 7.80 | 5.36 | 90 | 112 | Nucleus 24RE | Full/20 | ACE | CI only
CX43 | M | 4.44 | 11.30 | 6.86 | 90 | 97 | Nucleus 24RE | Full/20 | ACE | CI only
CG44 | F | 3.60 | 9.21 | 5.61 | 90 | 87 | Nucleus 24RE | Full/19 | ACE | CI only
CL45 | M | 1.01 | 7.29 | 6.28 | 100 | 97 | Nucleus 24RE | Full/21 | ACE | CI only
MG12 | F | 2.63 | 15.84 | 13.22 | >100 | 98 | Nucleus 24RE | Full/21 | ACE | CI only
MG46 | F | 1.07 | 9.06 | 7.98 | 100 | 115 | Nucleus 24RE | Full/21 | ACE | CI only
CG47 | M | 1.11 | 6.66 | 5.55 | 100 | 122 | Nucleus 24RE | Full/21 | ACE | CI only
XG48 | M | 2.87 | 17.19 | 14.32 | 90 | Over | Nucleus 24RE | Full/20 | ACE | CI only
CG49 | F | 2.84 | 7.87 | 5.03 | 90 | 101 | Nucleus 24RE | Full/20 | ACE | CI only
CG51 | M | 1.52 | 7.69 | 6.17 | 100 | 97 | Nucleus 24RE | Full/21 | ACE | CI only
CM53 | F | 1.28 | 10.52 | 9.23 | 100 | 122 | MED-EL Concerto | Full/12 | FSP | CI only
CM54 | M | 1.13 | 7.30 | 6.17 | 90 | 106 | AB HiRes90k | Full/120 | HiRes-P/Fidelity-120 | CI only
CM52 | F | 1.64 | 13.33 | 11.70 | 100 | 103 | Nucleus 24RE | Full/20 | ACE | CI only
XG53 | M | 2.36 | 9.50 | 7.14 | 90 | 103 | Nucleus 24RE | Full/20 | ACE | CI only
CG54 | M | 4.21 | 9.50 | 5.29 | 90 | 92 | Nucleus 24RE | Full/20 | ACE | CI only
CM55 | M | 1.15 | 8.31 | 7.16 | 100 | 106 | Nucleus 24RE | Full/21 | ACE | CI only
CM56 | F | 1.96 | 6.41 | 4.44 | 100 | 118 | MED-EL Concerto | Full/12 | FSP | CI only
CM57 | F | 4.42 | 6.46 | 2.04 | 90 | 108 | MED-EL Concerto | Full/12 | FSP | CI only
XG56 | M | 3.68 | 7.24 | 3.56 | 90 | 103 | AB HiRes90k | Full/120 | HiRes-P/Fidelity-120 | CI only
CG57 | F | 3.22 | 13.43 | 10.21 | 90 | 95 | Nucleus 24RE | Full/20 | ACE | CI only
XG58 | M | 3.07 | 8.97 | 5.89 | 90 | 95 | Nucleus 24RE | Full/19 | ACE | CI only

Abbreviations: CI, Cochlear implant; F, female; M, male; TONI‐3, Test of Nonverbal Intelligence—Third edition; ACE, advanced combination encoder; FSP, fine structure processing.

TABLE 2

Participants, normal‐hearing children

Participant | Gender | Age at testing | TONI-3 score | Average PTA
1 | F | 10.44 | 102 | 5.83
2 | M | 13.78 | 116 | 7.50
3 | M | 12.71 | 113 | 10.00
4 | M | 10.44 | 116 | 8.33
5 | F | 10.76 | 94 | 5.83
6 | F | 6.92 | 105 | 10.00
7 | F | 13.42 | 89 | 2.08
8 | F | 13.54 | 100 | 2.92
9 | F | 10.71 | 100 | 14.58
10 | F | 16.22 | 122 | 2.92
11 | M | 8.11 | 95 | 10.42
12 | M | 10.02 | 111 | 7.08
13 | F | 7.04 | 119 | 5.00
14 | F | 16.43 | 94 | 5.00
15 | F | 9.23 | 118 | 3.75
16 | F | 11.09 | 92 | 10.42
17 | F | 14.01 | 107 | 3.75
18 | F | 16.78 | -- | 5.00
19 | F | 9.02 | 113 | 5.00
20 | F | 7.76 | 120 | 6.25
21 | F | 7.38 | 121 | 9.17
22 | F | 15.58 | 108 | 8.33
23 | M | 13.76 | 91 | 10.42
24 | F | 13.16 | 111 | 4.58
25 | F | 13.76 | 100 | 8.33
26 | M | 8.72 | 113 | 7.92
27 | F | 7.92 | 124 | 7.92
28 | F | 9.81 | 118 | 6.67
29 | F | 15.27 | 140 | 9.17
30 | F | 10.84 | 109 | 9.58
31 | M | 14.51 | 94 | 9.58
32 | M | 9.10 | 107 | 7.92
33 | M | 10.95 | 104 | 11.25
34 | M | 9.91 | 113 | 7.08
35 | M | 10.09 | 104 | 5.00
36 | F | 9.25 | 92 | 6.25
37 | M | 8.62 | 120 | 11.25
38 | F | 12.10 | 106 | 9.17
39 | F | 14.03 | 129 | 10.83
40 | F | 15.33 | 112 | 3.33
41 | F | 15.15 | 109 | 3.75
42 | F | 14.69 | 124 | 0.00
43 | M | 9.02 | 107 | 5.83
44 | F | 8.52 | 105 | 6.25
45 | M | 10.30 | 129 | 5.83
46 | F | 7.39 | 102 | 12.08
47 | M | 9.29 | 115 | 10.00
48 | M | 8.89 | 110 | 7.50
49 | M | 8.45 | 110 | 8.75
50 | M | 8.21 | 91 | 10.42
51 | M | 7.25 | 106 | 7.92
52 | M | 6.52 | 108 | 2.92
53 | F | 8.44 | 130 | 5.42
Average | | 10.96 | 109.38 | 7.24

Abbreviations: F, Female; M, male; PTA, Pure‐tone audiometry; TONI‐3, Test of Nonverbal Intelligence—Third edition.

In this study, we measured emotion recognition by school-aged cNH and cCI (6-17 years old). The cNH performed the task with both original (full-spectrum) speech and spectrally degraded 4-, 8-, and 16-channel narrowband vocoder (NBV) speech. Because the cCI were expected to find the task difficult, the stimuli were recorded in a child-directed manner. Sensitivity to F0 changes was also tested for both the cNH and the cCI. All participants gave informed consent prior to participation.

Participants

Sixty profoundly hearing-impaired children without physical or visual disabilities who underwent cochlear implantation before 5 years of age (32 boys, 28 girls; age range: 6.41-17.38 years; mean age 10.23 ± 3.22 years) and 53 cNH (21 boys, 32 girls; age range: 6.52-16.78 years; mean age 10.96 ± 2.92 years) participated in this study (Tables 1 and 2). There was no significant difference in mean age between the two groups (t = −1.193, p = .236). Table 1 shows the clinical characteristics of the participating children with CIs: 51 were implanted with Nucleus devices, 5 with MED-EL devices, and 4 with Advanced Bionics (AB) devices. The Test of Nonverbal Intelligence—Third edition (TONI-3) was used to evaluate general intelligence. There was no significant difference in mean intelligence between the two groups (cCI: mean = 102.91, SD = 12.08; cNH: mean = 109.38, SD = 11.40; p = .09).

Tasks

Recording

Twelve emotionally neutral sentences (Table 3) from the Hearing In Noise Test (HINT) corpus were translated from English to Mandarin and recorded by two speakers (one male and one female) in five different emotions (happy, scared, neutral, sad, and angry) in a child-directed manner. The two speakers were 25 and 27 years old and native speakers of Mandarin.
TABLE 3

List of sentences

Item # | English sentence (six syllables each) | Mandarin sentence
1 | Her coat is on the chair. | 她外套在椅子上。
2 | The road goes up the hill. | 這條路通山上。
3 | They're going out tonight. | 他們今晚要外出。
4 | He wore his yellow shirt. | 他穿了黃襯衫。
5 | They took some food outside. | 他們拿了一些食物去外面。
6 | The truck drove up the road. | 卡車開上路。
7 | The tall man tied his shoes. | 那男生綁緊鞋帶。
8 | The mailman shut the gate. | 郵差關上門。
9 | The lady wore a coat. | 那女孩穿著大衣。
10 | The chicken laid some eggs. | 雞生了幾顆蛋。
11 | A fish swam in the pond. | 魚在池裡游。
12 | Snow falls in the winter. | 冬天會下雪。

Listening task

Inclusion criteria were (1) children aged 6 to 18 with normal hearing, or (2) prelingually deaf children aged 6 to 18 who underwent cochlear implantation at <5 years of age; (3) participants could not have any other physical or visual disability. All children received a hearing test (pure-tone audiometry and sound-field audiometry) and a nonverbal intelligence test before starting the assigned task. The mother's educational level, an important predictor of performance, was also recorded. Nonverbal intelligence was measured using the matrix reasoning and block design subtests of the Wechsler Abbreviated Scale of Intelligence; linguistic ability was measured using the Peabody Picture Vocabulary Test.

Stimuli

Emotion recognition

Speakers for the recording task were seated in a sound-treated booth, positioned 12 in. in front of a SHURE SM63 microphone connected to a Marantz PMD661 solid-state recorder, and produced the sentences in the five emotions three times each. The sentences selected from the HINT corpus were translated from English to Mandarin on the basis of their semantically emotion-neutral content. The original recordings (44.1 kHz sampling rate, 16 bit) were edited using Adobe Audition version 1.5. Noise-vocoded versions of these sentences were created with 4, 8, and 16 channels using AngelSim software (Emily Shannon Fu Foundation, www.tigerspeech.com); the noise-vocoding method paralleled that described by Shannon et al. All stimuli were presented via a soundcard and a single loudspeaker located approximately 2 ft from the listeners, at an average level of 65 dB sound pressure level (SPL).
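The channel-vocoding idea described above can be sketched as follows. This is not the AngelSim implementation: the function below is an illustrative stand-in that uses FFT "brick-wall" bands, log-spaced between assumed corner frequencies of 200 and 7000 Hz, and Hilbert-transform envelopes. The name `noise_vocode` and all parameter defaults are this sketch's assumptions, not values from the study.

```python
import numpy as np

def _analytic(x):
    """Analytic signal via FFT (same idea as scipy.signal.hilbert)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def noise_vocode(x, fs, n_channels=8, lo=200.0, hi=7000.0, seed=0):
    """Shannon-style noise vocoder sketch: split the input into
    log-spaced bands, take each band's amplitude envelope, and
    re-impose it on band-limited noise."""
    n = len(x)
    edges = np.geomspace(lo, hi, n_channels + 1)   # log-spaced band edges
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    X = np.fft.rfft(x)
    N = np.fft.rfft(np.random.default_rng(seed).standard_normal(n))
    out = np.zeros(n)
    for i in range(n_channels):
        band = (freqs >= edges[i]) & (freqs < edges[i + 1])
        xb = np.fft.irfft(np.where(band, X, 0), n)   # analysis band
        env = np.abs(_analytic(xb))                  # amplitude envelope
        nb = np.fft.irfft(np.where(band, N, 0), n)   # band-limited noise carrier
        out += env * nb
    return out / (np.abs(out).max() + 1e-12)
```

With fewer channels, less spectral detail survives, which is exactly the degradation the cNH listeners heard in the 4-, 8-, and 16-channel conditions.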

Acoustic analysis of the stimuli sentences

Praat v. 5.3.56 (Boersma, 2001) was used to analyze the intensity range (max − min in dB), mean intensity (dB SPL), overall duration (s), mean F0 height (Hz), and F0 range (ratio of maximum to minimum F0) across all recordings. Repeated-measures analyses of variance were applied to the results of the acoustic analysis. The discriminability of the stimuli for different pairs of emotions was further analyzed: all pairwise discriminabilities (d′) within the matrix for each cue were summed to form a measure of the net discriminability provided by that cue, with d′ formulated as described by Chatterjee et al. Figure 1 shows the acoustic features of all sentences. For the male speaker, the F0 height and intensity cues carried the greater weight of discriminability; in contrast, the female speaker's voice did not emphasize specific acoustic cues, as discriminability was spread more homogeneously across the five metrics (although F0 height and mean intensity were again the most useful cues). For comparison, the corresponding analyses from our previous study of English-speaking cCI are plotted in the bottom right panel of Figure 1.
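As a rough sketch of the summed-d′ measure described above (the exact formulation is in Chatterjee et al.; the pooled-standard-deviation form of pairwise d′ used here is a common convention and an assumption of this sketch, as is the helper name `net_discriminability`):

```python
import itertools
import math
import statistics

def net_discriminability(cue_values):
    """Net discriminability of one acoustic cue.

    cue_values maps each emotion to a list of per-sentence values of
    that cue. Pairwise d' = |difference of means| / pooled SD, and the
    sum over all emotion pairs is the net discriminability that the
    cue provides."""
    total = 0.0
    for a, b in itertools.combinations(cue_values, 2):
        xa, xb = cue_values[a], cue_values[b]
        pooled_sd = math.sqrt(
            (statistics.variance(xa) + statistics.variance(xb)) / 2.0)
        total += abs(statistics.mean(xa) - statistics.mean(xb)) / pooled_sd
    return total
```

Running this once per cue (F0 height, F0 range, mean intensity, intensity range, duration) yields the per-cue discriminability profile plotted in the bottom panels of Figure 1.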
FIGURE 1

Results of acoustic analyses of the male (red circles) and female (blue squares) speakers' utterances in five emotions (abscissa). Each of the top five panels corresponds to a different acoustic cue; each point represents the mean of all 12 sentences for each speaker, and error bars represent standard deviations. The bottom left panel shows the acoustic discriminability of the Mandarin sentences, whereas the bottom right panel shows the acoustic discriminability of the English sentences used in our previous study by Chatterjee et al. SPL, Sound pressure level


Dynamic F0 changes

F0-sweep stimuli were generated from broadband harmonic complexes with 100 partials, all in sine phase with equal amplitude (44.1 kHz sampling rate). The overall signal was low-pass-filtered at 10 kHz to ensure similar access to the bandwidth by cCI and cNH listeners. All stimuli were 300 ms long with 30-ms onset and offset ramps. The F0 of the complex varied linearly from beginning to end at sweep rates of 0.5, 1, 2, 4, 8, and 16 semitones per second in two directions (rising or falling), yielding 12 final/initial F0 ratios ranging from 0.25 semitones (an increase of 1.4% over the initial F0) to 8 semitones (an increase of 58.74% over the initial F0). The starting F0 was chosen randomly from trial to trial from one of 10 bins uniformly distributed between 120 and 140 Hz, without replacement. In the discrimination task, stimuli with opposite sweep directions had the same F0 range. All stimuli were equalized at 65 dB SPL and presented with a ±3 dB level rove. For both tasks, each experimental condition was repeated 10 times (6 sweep rates × 2 directions × 10 repetitions = 120 trials).
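A minimal sketch of such an F0-sweep harmonic complex, assuming the sweep is specified by its final/initial F0 ratio in semitones. The helper name `f0_sweep` and the crude low-pass by simply omitting partials above the cutoff are this sketch's simplifications, not the study's exact synthesis code.

```python
import numpy as np

def f0_sweep(f0_start, ratio_semitones, fs=44100, dur=0.3, ramp=0.03,
             n_partials=100, lp_cutoff=10000.0):
    """Harmonic complex whose F0 glides linearly from f0_start to
    f0_start * 2**(ratio_semitones / 12) over dur seconds. Partials are
    in sine phase with equal amplitude; any partial that would exceed
    lp_cutoff is omitted (a crude stand-in for the 10-kHz low-pass)."""
    t = np.arange(int(fs * dur)) / fs
    f0_end = f0_start * 2.0 ** (ratio_semitones / 12.0)
    # fundamental phase = 2*pi * integral of the linear F0 trajectory
    phase = 2 * np.pi * (f0_start * t
                         + 0.5 * (f0_end - f0_start) * t ** 2 / dur)
    sig = np.zeros_like(t)
    for n in range(1, n_partials + 1):
        if n * max(f0_start, f0_end) > lp_cutoff:
            break
        sig += np.sin(n * phase)
    # 30-ms raised-cosine onset/offset ramps
    nr = int(fs * ramp)
    env = np.ones_like(sig)
    env[:nr] = 0.5 * (1 - np.cos(np.pi * np.arange(nr) / nr))
    env[-nr:] = env[:nr][::-1]
    out = sig * env
    return out / np.abs(out).max()
```

Randomizing `f0_start` between 120 and 140 Hz from trial to trial, as the study did, prevents listeners from using the absolute starting pitch rather than the sweep itself.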

Test procedures

Emotion recognition test

The participants heard each sentence and indicated which emotion was best associated with it by clicking on one of five choices on the screen. The 12 sentences and 5 emotions were fully randomized within each condition. Four conditions were tested in all: full-spectrum speech and 16-, 8-, and 4-channel NBV speech. The cCI heard only full-spectrum speech, whereas the cNH heard all four conditions (in randomized order). Sentences were presented in blocks of a given speaker (male or female, counterbalanced) and condition. Listeners were given passive training with sentences not used in testing to familiarize themselves with the speakers' styles. Participants were encouraged to take breaks between blocks. No feedback was provided during the test.

Discrimination of F0 changes

Participants first completed 20 practice trials using the highest sweep rate, with rising and falling directions and no level roving. The tasks used a child-friendly interface with an animated cartoon figure of an animal of the participant's choice: a smiley face provided encouragement after a correct response, and a sad face followed an incorrect response. Points were earned after completing certain numbers of trials to keep the child engaged. The discrimination task used a three-interval, two-alternative forced-choice procedure: a reference F0-sweep stimulus, either rising or falling, was presented first, followed by two further stimuli, one identical to the reference and the other sweeping in the opposite direction (the latter two in random order). The listener was asked which of Intervals 2 and 3 sounded different from the reference (Interval 1). Reaction times were recorded for each trial. Percentage-correct scores were converted into d′ and β values for statistical analyses.
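One common signal-detection convention for deriving d′ and β treats one response class as "hits" and the other as "false alarms" and applies the inverse-normal transform to each rate. Whether the study used exactly this convention is not stated, so the sketch below (including the function name `dprime_beta` and the 1/(2N) correction for perfect scores) is illustrative only.

```python
import math
from statistics import NormalDist

def dprime_beta(hit_rate, fa_rate, n_trials):
    """d' (sensitivity) and beta (response bias) from hit and
    false-alarm rates. Rates of exactly 0 or 1 are clamped to
    1/(2N) and 1 - 1/(2N), a common correction for perfect scores."""
    lo, hi = 1.0 / (2 * n_trials), 1.0 - 1.0 / (2 * n_trials)
    z_hit = NormalDist().inv_cdf(min(max(hit_rate, lo), hi))
    z_fa = NormalDist().inv_cdf(min(max(fa_rate, lo), hi))
    d_prime = z_hit - z_fa
    beta = math.exp((z_fa ** 2 - z_hit ** 2) / 2.0)
    return d_prime, beta
```

A symmetric case (80% hits, 20% false alarms) gives an unbiased observer: β is 1, and all of the performance difference is carried by d′.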

RESULTS

Error patterns of emotion recognition

Figure 2 shows the error patterns for the cCI and cNH groups, for the male and female speakers' sentences, under each condition of spectral resolution tested. The cells are color-coded to represent the strength of the numerical values, and the actual values are also indicated. The matrix patterns for the cNH become increasingly diagonally dominant as spectral clarity increases from 4, 8, and 16 channels to full spectrum. The cCI matrix pattern was closer to that of the cNH with eight-channel NBV speech than to those of the other conditions. "Scared" was the most difficult voice emotion to recognize for the cCI across speakers and for the cNH listening to the female speaker. The most common errors were that cCI misinterpreted scared as happy (25.97%) when listening to the female speaker and misinterpreted scared as angry (37.42%) when listening to the male speaker.
FIGURE 2

Error patterns of voice emotion recognition for the cCI and cNH groups, for the male and female speakers' sentences, under each condition of spectral resolution tested. The cells are color-coded to represent the strength of the numerical values, and the actual values are also indicated. cCI, Cochlear-implanted children; cNH, normal-hearing children


Group mean emotion recognition scores

Spectral degradation (for cNH)

A linear mixed-effects (LME) analysis with rationalized arcsine unit (RAU)-transformed scores as the dependent variable; age, condition (spectral resolution), and speaker as fixed effects; and subject-based random intercepts showed significant effects of age (p < .0001), condition (p < .0001), and speaker (p = .0156), and a significant interaction between speaker and condition (p = .0001; Figure 3). The cNH performance declined as the spectral resolution worsened.
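The RAU transform referred to here is, in its usual formulation (Studebaker, 1985), an arcsine-based variance-stabilizing transform of proportion-correct scores; a minimal sketch (the function name `rau` is ours):

```python
import math

def rau(num_correct, num_trials):
    """Rationalized arcsine unit (Studebaker, 1985): a variance-
    stabilizing transform of a proportion-correct score. It is roughly
    linear mid-range, and the full scale runs from about -23 to 123."""
    theta = (math.asin(math.sqrt(num_correct / (num_trials + 1.0)))
             + math.asin(math.sqrt((num_correct + 1.0) / (num_trials + 1.0))))
    return (146.0 / math.pi) * theta - 23.0
```

The point of the transform is that scores near floor (0%) and ceiling (100%) have compressed variance; on the RAU scale they can be compared with mid-range scores in a linear model such as the LME analysis above.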
FIGURE 3

Mean voice emotion recognition score for cCI and cNH under different spectral degradation, speaker, and age. In the left panel, an LME analysis with RAU-transformed scores as the dependent variable; age, condition (spectral resolution), and speaker as fixed effects; and subject-based random intercepts showed significant effects of age, F(1, 51) = 21.42, p < .0001; condition, F(3, 51) = 2758.43, p < .0001; speaker, F(1, 51) = 6.26, p = .0156; and a significant interaction between speaker and condition, F(3, 51) = 8.49, p = .0001. In the central panel, the average score of the cCI with full-spectrum speech was close to the average score of the cNH with eight-channel NBV speech. In the right panel, voice emotion recognition score as a function of F0 threshold (semitones) shows that the average performance across talkers for the cCI could be predicted by their sensitivity to changes in F0 (thresholds extracted from the Weibull fits at a d′ of 0.77; R² = .3302; p = .0064). cCI, Cochlear-implanted children; cNH, normal-hearing children; LME, linear mixed effects; NBV, narrowband vocoder; RAU, rationalized arcsine unit


Full‐spectrum speech

An LME model with RAU-transformed scores as the dependent variable; age at test, group (cNH or cCI), and speaker (male or female) as fixed effects; and subject-based random intercepts showed significant effects of age (p = .0003), group (p < .0001), and speaker (p = .0003), and a marginally significant interaction between age and group (p = .0340) on mean emotion recognition scores (Figure 3). The mean emotion recognition score of the cNH group was significantly better than that of the cCI group. The female speaker's vocal emotions were more easily recognized; this difference was most apparent for the cCI group.

Comparison between cCI and cNH

The cCI group showed a range of performance (including age dependency) similar to that of cNH attending to four- and eight-channel noise-vocoded speech (Figure 3). The average score of the cCI with full-spectrum speech was close to the average score of the cNH with eight-channel NBV speech.

Sensitivity to changes in F0

Large variability in pitch sensitivity was observed among the implanted children. Figure 3 (right panel) shows that the average voice emotion recognition performance across speakers for the cCI could be predicted by their sensitivity to changes in F0 (R² = .3302; p = .0064). No such relationship was observed for the cNH listening to full-spectrum sentences. Moreover, there was no significant effect of age at implantation (p = .7552), age at test (p = .5998), or duration of CI experience (p = .7364) on the F0-change discrimination task.

DISCUSSION

Acoustic analysis of the Mandarin test sentences revealed a substantial difference in the pattern of summed discriminability indices for the different cues compared with the English test sentences used in our previous study (Figure 1). Happy was spoken with the greatest F0 range and mean F0 height in Chatterjee et al.'s study, whereas in the present study scared was spoken with the greatest mean F0 height and sad with the greatest F0 range. The discriminability measure (d′) in the present study showed that F0 height is the acoustic characteristic carrying the critical information, with F0 range additionally helpful for the female voice and mean intensity for the male voice. In the previous study by Chatterjee et al., the male speakers' sentences carried more information in the mean intensity patterns, whereas the female speakers' sentences carried more information in the F0 range and the intensity range. The error patterns of voice emotion recognition revealed large variability among the cCI. Visual inspection shows that, for the cNH, the matrices become more and more diagonally dominant as spectral clarity increases; the diagonal dominance observed for the cCI is similar to that observed in the cNH for four- and eight-channel NBV speech. The error patterns for Mandarin-speaking cCI are not the same as for English-speaking cCI: for the Mandarin-speaking cCI, "scared" was the most difficult voice emotion to recognize across speakers, whereas for the English-speaking cCI the most difficult emotion was "scared" for the male speakers and "neutral" for the female speakers. In general, both the cCI and cNH groups, in the full-spectrum and NBV conditions, obtained higher voice emotion recognition scores when listening to the female speaker than when listening to the male speaker; this difference was most apparent for the cCI group.
This is inconsistent with our previous study of English-speaking peers. More information in the F0 range was noted in the female speakers' sentences, whereas more information in the mean intensity patterns and duration was noted in the male speakers' sentences. This may suggest that F0 is the primary cue used for voice emotion recognition. Nevertheless, because F0 cues are severely degraded both in CIs and in four- or eight-channel NBV speech, children might recognize voice emotion on the basis of secondary cues (such as intensity and duration) rather than F0 range, whether they are cCI or cNH listening to degraded NBV speech. Studies investigating music emotion processing have found that CI users depend on tempo rather than pitch when processing musical emotion. The present study suggests a similar auditory processing strategy for vocal emotion: increased reliance, for CI users compared with NH listeners, on cues such as intensity and duration that are closer to the tempo-based aspects of music. Some cCI in this study achieved high emotion recognition scores, and it would be interesting to investigate the underlying auditory emotion processing strategy of these high performers; although F0 cues are severely degraded in CIs and in four- or eight-channel NBV speech, they might possess an unaccounted-for way of interpreting F0 information. Age at test had a significant effect on voice emotion recognition for both the cCI and the cNH with degraded NBV speech in this study, whereas age at implantation showed no effect on the CI children's performance, suggesting that the effect is genuinely developmental in nature. We suppose that brain maturation plays a role in voice emotion recognition for cCI. A tonal-language benefit in pitch perception for children with CIs has been reported in the literature.
The present results further revealed that high sensitivity to changes in F0 predicted better emotion recognition performance across speakers for Mandarin-speaking cCI (R² = .3302; p = .0064). However, there was no significant effect of age at implantation, age at test, or duration of CI experience on the F0-change discrimination task. This suggests that, in addition to a psychological representation of F0, brain plasticity also integrates other secondary auditory cues. Together with the positive effect of age at test on emotion recognition for the cCI in the present study, this suggests that cCI grow up with developing cognitive systems that adapt alternative ways to process auditory emotion. We suppose that improved sensitivity to tempo and intensity changes is a major part of this development of the cognitive systems for auditory emotion in cCI.

CONCLUSION

As a result of device limitations in prosody processing, Mandarin-speaking cCI showed deficits in voice emotion recognition. Mandarin-speaking cCI performed comparably with cNH listening to spectrally degraded speech, suggesting that cCI may have developed adaptive strategies to interpret emotion from degraded auditory signals. Better pitch discrimination ability was accompanied by better voice emotion recognition. Besides the F0 cues, cCI adapted their voice emotion recognition to rely more on secondary cues such as intensity and duration. Although cross-cultural differences existed in the acoustic features of voice emotion, Mandarin-speaking cCI and their English-speaking cCI peers both exhibited a positive effect of age at test on emotion recognition, suggesting learning effects or possibly maturation effects. Therefore, further device/processor development to improve the presentation of F0 information, and more rehabilitative efforts, are needed to improve the transmission and perception of voice emotion.

CONFLICT OF INTEREST

The authors declare that they have no conflicts of interest.
REFERENCES (25 in total; first 10 shown)

1. Wilson BS, Dorman MF. The surprising performance of present-day cochlear implants. IEEE Trans Biomed Eng. 2007.
2. Fu Q-J, Galvin JJ. Vocal emotion recognition by normal-hearing listeners and cochlear implant users. Trends Amplif. 2007.
3. Lin Y-S, Lu H-P, Hung S-C, Chang C-P. Lexical tone identification and consonant recognition in acoustic simulations of cochlear implants. Acta Otolaryngol. 2009.
4. Wilson BS, Finley CC, Lawson DT, Wolford RD, Eddington DK, Rabinowitz WM. Better speech recognition with cochlear implants. Nature. 1991.
5. Paulmann S, Uskul AK. Cross-cultural emotional prosody recognition: evidence from Chinese and British listeners. Cogn Emot. 2013.
6. Deroche MLD, Felezeu M, Paquette S, Zeitouni A, Lehmann A. Neurophysiological differences in emotional processing by cochlear implant users, extending beyond the realm of speech. Ear Hear. 2019.
7. Waaramaa T, Kukkonen T, Mykkänen S, Geneid A. Vocal emotion identification by children using cochlear implants, relations to voice quality, and musical interests. J Speech Lang Hear Res. 2018.
8. Paquette S, Ahmed GD, Goffi-Gomez MV, Hoshino ACH, Peretz I, Lehmann A. Musical and vocal emotion perception for cochlear implant users. Hear Res. 2018.
9. Gilbers S, Fuller C, Gilbers D, Broersma M, Goudbeek M, Free R, Başkent D. Normal-hearing listeners' and cochlear implant users' perception of pitch cues in emotional speech. Iperception. 2015.
10. Deroche MLD, Kulkarni AM, Christensen JA, Limb CJ, Chatterjee M. Deficits in the sensitivity to pitch sweeps by school-aged children wearing cochlear implants. Front Neurosci. 2016.
