
Voice emotion recognition by Mandarin-speaking pediatric cochlear implant users in Taiwan.

Yung-Song Lin1,2, Che-Ming Wu3,4,5, Charles J Limb6, Hui-Ping Lu7, I Jung Feng8, Shu-Chen Peng9, Mickael L D Deroche10, Monita Chatterjee11.   

Abstract

OBJECTIVES: To explore the effects of obligatory lexical tone learning on speech emotion recognition, and the cross-cultural differences between the United States and Taiwan in speech emotion understanding, in children with cochlear implants.
METHODS: This cohort study enrolled 60 cochlear-implanted (cCI) Mandarin-speaking, school-aged children who underwent cochlear implantation before 5 years of age and 53 normal-hearing children (cNH) in Taiwan. Emotion recognition and sensitivity to fundamental frequency (F0) changes were examined for these school-aged cNH and cCI (6-17 years old) in a tertiary referral center.
RESULTS: The mean emotion recognition score of the cNH group was significantly better than that of the cCI group. Female speakers' vocal emotions were more easily recognized than male speakers'. There was a significant effect of age at test on voice emotion recognition performance. The average score of the cCI with full-spectrum speech was close to the average score of the cNH with eight-channel narrowband vocoder speech. The average voice emotion recognition performance across speakers for the cCI could be predicted by their sensitivity to changes in F0.
CONCLUSIONS: Better pitch discrimination ability comes with better voice emotion recognition for Mandarin-speaking cCI. Besides the F0 cues, cCI likely adapt their voice emotion recognition by relying more on secondary cues such as intensity and duration. Although cross-cultural differences exist in the acoustic features of voice emotion, Mandarin-speaking cCI and their English-speaking cCI peers both showed a positive effect of age at test on emotion recognition, suggesting learning effects and brain plasticity. Therefore, further device/processor development to improve the presentation of pitch information, and more rehabilitative efforts, are needed to improve the transmission and perception of voice emotion in Mandarin. LEVEL OF EVIDENCE: 3.
© 2022 The Authors. Laryngoscope Investigative Otolaryngology published by Wiley Periodicals LLC on behalf of The Triological Society.

Keywords:  cochlear implant; lexical tone; pitch discrimination; voice emotion

Year:  2022        PMID: 35155805      PMCID: PMC8823186          DOI: 10.1002/lio2.732

Source DB:  PubMed          Journal:  Laryngoscope Investig Otolaryngol        ISSN: 2378-8038


INTRODUCTION

Vocal expression of emotion is necessary for social interaction across civilizations. In addition to facial expressions, people often use prosodic vocal cues to convey their emotions. Consequently, when prosodic vocal cues are absent, the expression and perception of emotion are hindered, negatively affecting social interactions and development. These effects have been frequently observed in children with cochlear implants (CIs). Although cochlear implant development has achieved remarkable results, allowing people with profound hearing loss to perceive lexical meaning in certain environments, limitations remain in present CI systems. In particular, limitations in a CI system's transmission of prosodic cues (e.g., pitch) constrain the user's interpretation and communication of voice emotion. In tonal languages, slow pitch changes convey prosodic/emotional information, whereas rapid inflections within syllables convey lexical meaning. The unique demand for adequate pitch perception in a tonal language may alter the fundamental frequency (F0)-processing mechanisms of a developing auditory system, and these mechanisms might be altered further in children with CIs. The present study supposes that covarying (secondary) cues, such as changes in intensity and duration, convey some of the same information when F0 processing is degraded. To further explore the effects of obligatory lexical tone learning on speech emotion recognition, the current study was conducted with Mandarin-speaking children in Taiwan. This work, involving collaborations between labs in the United States and Taiwan, explores cross-cultural differences in speech emotion understanding and carries important implications for both CI technology and rehabilitative therapies for children with CIs.

MATERIALS AND METHODS

Study design and oversight

This study was part of a multicenter, retrospective cohort research project sponsored by the National Institutes of Health (NIH R01-DC014233-01). English-speaking subjects were recruited and tested at the Johns Hopkins University School of Medicine in Baltimore, MD, and Boys Town National Research Hospital in Omaha, NE; the methods and results for the English-speaking participants have been published previously. Mandarin-speaking children were recruited and tested at Chi Mei Medical Center and Chang Gung Memorial Hospital in Taiwan. Sixty cochlear-implanted (cCI) and 53 normal-hearing (cNH) Mandarin-speaking children participated in this study (Tables 1 and 2). The Institutional Review Board and the Committee of Human Subjects Protection of Chi Mei Medical Center approved this study. The authors declare that there is no conflict of interest.
TABLE 1

Participants, children with cochlear implants

Participant | Sex | Age at implantation | Age at test | Device experience | Average residual hearing | TONI-3 | Device | Insertion length/active channels | Strategy | Daily listening condition
CM13 | F | 2.71 | 7.50 | 4.80 | 90 | 79 | Nucleus 24RE | Full/20 | ACE | CI only
CX15 | M | 2.94 | 7.92 | 4.98 | 100 | 124 | Nucleus 24RE | Full/20 | ACE | CI only
CM16 | F | 3.21 | 9.47 | 6.26 | >100 | 125 | Nucleus 24RE | Full/20 | ACE | CI only
CX17 | F | 2.73 | 7.18 | 4.45 | 90 | 112 | Nucleus 24RE | Full/20 | ACE | CI only
CG18 | M | 1.85 | 15.81 | 13.97 | 90 | 111 | Nucleus 24RE | Full/21 | ACE | CI only
NT40 | M | 2.27 | 12.55 | 10.28 | 100 | 111 | MED-EL Pulsar | Full/12 | FSP | CI only
ZX43 | M | 2.16 | 8.14 | 5.98 | 100 | 97 | Nucleus 24RE | Full/21 | ACE | CI only
ZX44 | M | 2.40 | 6.81 | 4.41 | 100 | 126 | AB HiRes90k | Full/120 | HiRes-P/Fidelity-120 | CI only
MJ45 | M | 4.52 | 8.23 | 3.71 | 85 | 95 | Nucleus 24RE | Full/20 | ACE | CI only
CG12 | M | 2.54 | 12.76 | 10.22 | 100 | 105 | Nucleus 24CS | Full/20 | ACE | CI only
CX19 | M | 3.80 | 15.50 | 11.70 | 90 | 98 | Nucleus 24CS | Full/20 | ACE | CI only
CX20 | F | 1.56 | 8.95 | 7.39 | 100 | 105 | Nucleus 24RE | Full/21 | ACE | CI only
CG21 | M | 2.54 | 7.04 | 4.50 | 90 | 92 | Nucleus 24RE | Full/20 | ACE | CI only
ZX22 | M | 2.42 | 7.95 | 5.53 | >100 | 93 | AB HiRes90k | Full/120 | HiRes-P/Fidelity-120 | CI only
CG23 | F | 1.50 | 7.28 | 5.78 | 90 | 106 | Nucleus 24RE | Full/21 | ACE | CI only
CR24 | M | 1.92 | 7.83 | 5.91 | 100 | 122 | Nucleus 24RE | Full/20 | ACE | CI only
CG25 | M | 2.52 | 8.57 | 6.05 | 90 | 127 | Nucleus 24RE | Full/21 | ACE | CI only
CX26 | F | 2.18 | 10.35 | 8.18 | 100 | 83 | Nucleus 24RE | Full/21 | ACE | CI only
XG27 | M | 2.13 | 15.94 | 13.81 | >100 | 81 | Nucleus 24RE | Full/21 | ACE | CI only
CG28 | F | 2.74 | 8.77 | 6.03 | 100 | 95 | Nucleus 24RE | Full/20 | ACE | CI only
CX29 | M | 1.54 | 7.24 | 5.70 | 90 | 112 | Nucleus 24RE | Full/21 | ACE | CI only
CL30 | F | 2.03 | 17.21 | 15.17 | 100 | Over | Nucleus 24RE | Full/20 | ACE | CI only
CG31 | M | 2.81 | 15.76 | 12.95 | 100 | 94 | Nucleus 24CS | Full/20 | ACE | CI only
MC37 | M | 1.29 | 9.56 | 8.27 | 90 | 126 | Nucleus 24RE | Full/21 | ACE | CI only
XG32 | F | 2.33 | 9.65 | 7.32 | 100 | 94 | Nucleus 24RE | Full/20 | ACE | CI only
XG33 | F | 3.31 | 14.57 | 11.27 | 90 | 109 | Nucleus 24CS | Full/20 | ACE | CI only
CG34 | M | 1.60 | 9.78 | 8.18 | 90 | 116 | Nucleus 24RE | Full/21 | ACE | CI only
CM35 | F | 2.30 | 13.56 | 11.25 | >100 | 86 | Nucleus 24CS | Full/20 | N24 | CI only
CM51 | F | 1.96 | 6.59 | 4.63 | 100 | 114 | MED-EL Sonata | Full/12 | FSP | CI only
CG36 | F | 2.60 | 9.03 | 6.43 | 90 | 105 | Nucleus 24RE | Full/20 | ACE | CI only
CG37 | F | 3.06 | 10.10 | 7.05 | 90 | 100 | Nucleus 24RE | Full/20 | ACE | CI only
CG08 | F | 2.53 | 11.13 | 8.60 | 90 | 116 | Nucleus 24RE | Full/20 | ACE | CI only
CR30 | F | 3.29 | 17.38 | 14.09 | 100 | Over | Nucleus 24RE | Full/20 | ACE | CI only
XG38 | M | 1.62 | 16.69 | 15.06 | 90 | Over | Nucleus 24RE | Full/21 | ACE | CI only
XG39 | M | 3.32 | 7.87 | 4.55 | >100 | 103 | Nucleus 24RE | Full/19 | ACE | CI only
CG40 | M | 2.64 | 10.13 | 7.49 | 90 | 115 | Nucleus 24RE | Full/20 | ACE | CI only
XG04 | F | 3.77 | 15.27 | 11.50 | 90 | 93 | Nucleus 24SC | Full/20 | ACE | CI only
CM49 | M | 1.40 | 9.75 | 8.35 | 100 | 109 | Nucleus 24RE | Full/21 | ACE | CI only
CG41 | F | 1.93 | 7.15 | 5.22 | 90 | 110 | Nucleus 24RE | Full/21 | ACE | CI only
CX42 | F | 2.44 | 7.80 | 5.36 | 90 | 112 | Nucleus 24RE | Full/20 | ACE | CI only
CX43 | M | 4.44 | 11.30 | 6.86 | 90 | 97 | Nucleus 24RE | Full/20 | ACE | CI only
CG44 | F | 3.60 | 9.21 | 5.61 | 90 | 87 | Nucleus 24RE | Full/19 | ACE | CI only
CL45 | M | 1.01 | 7.29 | 6.28 | 100 | 97 | Nucleus 24RE | Full/21 | ACE | CI only
MG12 | F | 2.63 | 15.84 | 13.22 | >100 | 98 | Nucleus 24RE | Full/21 | ACE | CI only
MG46 | F | 1.07 | 9.06 | 7.98 | 100 | 115 | Nucleus 24RE | Full/21 | ACE | CI only
CG47 | M | 1.11 | 6.66 | 5.55 | 100 | 122 | Nucleus 24RE | Full/21 | ACE | CI only
XG48 | M | 2.87 | 17.19 | 14.32 | 90 | Over | Nucleus 24RE | Full/20 | ACE | CI only
CG49 | F | 2.84 | 7.87 | 5.03 | 90 | 101 | Nucleus 24RE | Full/20 | ACE | CI only
CG51 | M | 1.52 | 7.69 | 6.17 | 100 | 97 | Nucleus 24RE | Full/21 | ACE | CI only
CM53 | F | 1.28 | 10.52 | 9.23 | 100 | 122 | MED-EL Concerto | Full/12 | FSP | CI only
CM54 | M | 1.13 | 7.30 | 6.17 | 90 | 106 | AB HiRes90k | Full/120 | HiRes-P/Fidelity-120 | CI only
CM52 | F | 1.64 | 13.33 | 11.70 | 100 | 103 | Nucleus 24RE | Full/20 | ACE | CI only
XG53 | M | 2.36 | 9.50 | 7.14 | 90 | 103 | Nucleus 24RE | Full/20 | ACE | CI only
CG54 | M | 4.21 | 9.50 | 5.29 | 90 | 92 | Nucleus 24RE | Full/20 | ACE | CI only
CM55 | M | 1.15 | 8.31 | 7.16 | 100 | 106 | Nucleus 24RE | Full/21 | ACE | CI only
CM56 | F | 1.96 | 6.41 | 4.44 | 100 | 118 | MED-EL Concerto | Full/12 | FSP | CI only
CM57 | F | 4.42 | 6.46 | 2.04 | 90 | 108 | MED-EL Concerto | Full/12 | FSP | CI only
XG56 | M | 3.68 | 7.24 | 3.56 | 90 | 103 | AB HiRes90k | Full/120 | HiRes-P/Fidelity-120 | CI only
CG57 | F | 3.22 | 13.43 | 10.21 | 90 | 95 | Nucleus 24RE | Full/20 | ACE | CI only
XG58 | M | 3.07 | 8.97 | 5.89 | 90 | 95 | Nucleus 24RE | Full/19 | ACE | CI only

Abbreviations: CI, Cochlear implant; F, female; M, male; TONI‐3, Test of Nonverbal Intelligence—Third edition; ACE, advanced combination encoder; FSP, fine structure processing.

TABLE 2

Participants, normal‐hearing children

Participant | Gender | Age at testing | TONI-3 score | Average PTA
1 | F | 10.44 | 102 | 5.83
2 | M | 13.78 | 116 | 7.50
3 | M | 12.71 | 113 | 10.00
4 | M | 10.44 | 116 | 8.33
5 | F | 10.76 | 94 | 5.83
6 | F | 6.92 | 105 | 10.00
7 | F | 13.42 | 89 | 2.08
8 | F | 13.54 | 100 | 2.92
9 | F | 10.71 | 100 | 14.58
10 | F | 16.22 | 122 | 2.92
11 | M | 8.11 | 95 | 10.42
12 | M | 10.02 | 111 | 7.08
13 | F | 7.04 | 119 | 5.00
14 | F | 16.43 | 94 | 5.00
15 | F | 9.23 | 118 | 3.75
16 | F | 11.09 | 92 | 10.42
17 | F | 14.01 | 107 | 3.75
18 | F | 16.78 | -- | 5.00
19 | F | 9.02 | 113 | 5.00
20 | F | 7.76 | 120 | 6.25
21 | F | 7.38 | 121 | 9.17
22 | F | 15.58 | 108 | 8.33
23 | M | 13.76 | 91 | 10.42
24 | F | 13.16 | 111 | 4.58
25 | F | 13.76 | 100 | 8.33
26 | M | 8.72 | 113 | 7.92
27 | F | 7.92 | 124 | 7.92
28 | F | 9.81 | 118 | 6.67
29 | F | 15.27 | 140 | 9.17
30 | F | 10.84 | 109 | 9.58
31 | M | 14.51 | 94 | 9.58
32 | M | 9.10 | 107 | 7.92
33 | M | 10.95 | 104 | 11.25
34 | M | 9.91 | 113 | 7.08
35 | M | 10.09 | 104 | 5.00
36 | F | 9.25 | 92 | 6.25
37 | M | 8.62 | 120 | 11.25
38 | F | 12.10 | 106 | 9.17
39 | F | 14.03 | 129 | 10.83
40 | F | 15.33 | 112 | 3.33
41 | F | 15.15 | 109 | 3.75
42 | F | 14.69 | 124 | 0.00
43 | M | 9.02 | 107 | 5.83
44 | F | 8.52 | 105 | 6.25
45 | M | 10.30 | 129 | 5.83
46 | F | 7.39 | 102 | 12.08
47 | M | 9.29 | 115 | 10.00
48 | M | 8.89 | 110 | 7.50
49 | M | 8.45 | 110 | 8.75
50 | M | 8.21 | 91 | 10.42
51 | M | 7.25 | 106 | 7.92
52 | M | 6.52 | 108 | 2.92
53 | F | 8.44 | 130 | 5.42
Average | | 10.96 | 109.38 | 7.24

Abbreviations: F, Female; M, male; PTA, Pure‐tone audiometry; TONI‐3, Test of Nonverbal Intelligence—Third edition.

In this study, we measured emotion recognition by school-aged cNH and cCI (6-17 years old). The cNH performed the task with both original (full-spectrum) speech and spectrally degraded 4-, 8-, and 16-channel narrowband vocoder (NBV) speech. Because the cCI were expected to find the task difficult, the stimuli were recorded in a child-directed manner. Sensitivity to F0 changes was also tested for both the cNH and the cCI. All participants gave informed consent prior to participation.

Participants

Sixty profoundly hearing-impaired children without physical or visual disabilities who underwent cochlear implantation before 5 years of age (32 boys, 28 girls; age range: 6.41-17.38 years; mean age 10.23 ± 3.22 years) and 53 cNH (21 boys, 32 girls; age range: 6.52-16.78 years; mean age 10.96 ± 2.92 years) participated in this study (Tables 1 and 2). There was no significant difference in mean age between the two groups (t = −1.193, p = .236). Table 1 shows the clinical characteristics of the participating children with CIs: 51 were implanted with Nucleus devices, 5 with MED-EL devices, and 4 with Advanced Bionics (AB) devices. The Test of Nonverbal Intelligence—Third edition (TONI-3) was used to evaluate general intelligence. There was no significant difference in mean intelligence between the two groups (cCI: mean = 102.91, SD = 12.08; cNH: mean = 109.38, SD = 11.40; p = .09).

Tasks

Recording

Twelve emotionally neutral sentences (Table 3) from the Hearing In Noise Test (HINT) corpus were translated from English to Mandarin and recorded by two speakers (one male and one female) in five different emotions (happy, scared, neutral, sad, and angry) in a child-directed manner. The two speakers were 25 and 27 years old and native speakers of Mandarin.
TABLE 3

List of sentences

Item # | English sentence (six syllables each) | Mandarin sentence
1 | Her coat is on the chair. | 她外套在椅子上。
2 | The road goes up the hill. | 這條路通山上。
3 | They're going out tonight. | 他們今晚要外出。
4 | He wore his yellow shirt. | 他穿了黃襯衫。
5 | They took some food outside. | 他們拿了一些食物去外面。
6 | The truck drove up the road. | 卡車開上路。
7 | The tall man tied his shoes. | 那男生綁緊鞋帶。
8 | The mailman shut the gate. | 郵差關上門。
9 | The lady wore a coat. | 那女孩穿著大衣。
10 | The chicken laid some eggs. | 雞生了幾顆蛋。
11 | A fish swam in the pond. | 魚在池裡游。
12 | Snow falls in the winter. | 冬天會下雪。

Listening task

Inclusion criteria were (1) children aged 6 to 18 with normal hearing, or (2) prelingually deaf children aged 6 to 18 who underwent cochlear implantation at <5 years of age; (3) participants could not have any other physical or visual disability. All children received a hearing test (pure-tone audiometry and sound-field audiometry) and a nonverbal intelligence test before starting the assigned task. The mother's educational level, an important predictor of performance, was also recorded. Nonverbal intelligence was measured using the matrix reasoning and block design subtests of the Wechsler Abbreviated Scale of Intelligence; linguistic ability was measured using the Peabody Picture Vocabulary Test.

Stimuli

Emotion recognition

Speakers for the recording task were seated in a sound-treated booth, positioned 12 in. in front of a SHURE SM63 microphone connected to a Marantz PMD661 solid-state recorder, and produced the sentences in the five emotions three times each. The sentences selected from the HINT corpus were translated from English to Mandarin on the basis of their semantically emotion-neutral content. The original recordings (44.1 kHz sampling rate, 16 bit) were edited using Adobe Audition version 1.5. Noise-vocoded versions of these sentences were created with 4, 8, and 16 channels using AngelSim software (Emily Shannon Fu Foundation, www.tigerspeech.com); the noise-vocoding method paralleled that described by Shannon et al. All stimuli were presented via a soundcard and a single loudspeaker located approximately 2 ft from the listeners, at an average level of 65 dB sound pressure level (SPL).
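The channel-vocoding idea described above can be sketched as follows. This is not the AngelSim implementation: the function below is an illustrative stand-in that uses FFT "brick-wall" bands, log-spaced between assumed corner frequencies of 200 and 7000 Hz, and Hilbert-transform envelopes. The name `noise_vocode` and all parameter defaults are this sketch's assumptions, not values from the study.

```python
import numpy as np

def _analytic(x):
    """Analytic signal via FFT (same idea as scipy.signal.hilbert)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def noise_vocode(x, fs, n_channels=8, lo=200.0, hi=7000.0, seed=0):
    """Shannon-style noise vocoder sketch: split the input into
    log-spaced bands, take each band's amplitude envelope, and
    re-impose it on band-limited noise."""
    n = len(x)
    edges = np.geomspace(lo, hi, n_channels + 1)   # log-spaced band edges
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    X = np.fft.rfft(x)
    N = np.fft.rfft(np.random.default_rng(seed).standard_normal(n))
    out = np.zeros(n)
    for i in range(n_channels):
        band = (freqs >= edges[i]) & (freqs < edges[i + 1])
        xb = np.fft.irfft(np.where(band, X, 0), n)   # analysis band
        env = np.abs(_analytic(xb))                  # amplitude envelope
        nb = np.fft.irfft(np.where(band, N, 0), n)   # band-limited noise carrier
        out += env * nb
    return out / (np.abs(out).max() + 1e-12)
```

With fewer channels, less spectral detail survives, which is exactly the degradation the cNH listeners heard in the 4-, 8-, and 16-channel conditions.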

Acoustic analysis of the stimuli sentences

Praat v. 5.3.56 (Boersma, 2001) was used to analyze the intensity range (max − min in dB), mean intensity (dB SPL), overall duration (s), mean F0 height (Hz), and F0 range (ratio of maximum to minimum F0) across all recordings. Repeated-measures analyses of variance were applied to the results of the acoustic analysis. The discriminability of the stimuli for different pairs of emotions was further analyzed: all pairwise discriminabilities (d′) within the matrix for each cue were summed to form a measure of the net discriminability provided by that cue, with d′ formulated as described by Chatterjee et al. Figure 1 shows the acoustic features of all sentences. For the male speaker, the F0 height and intensity cues carried the greater weight of discriminability; in contrast, the female speaker's voice did not emphasize specific acoustic cues, as discriminability was spread more homogeneously across the five metrics (although F0 height and mean intensity were again the most useful cues). For comparison, the corresponding analyses from our previous study of English-speaking cCI are plotted in the bottom right panel of Figure 1.
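As a rough sketch of the summed-d′ measure described above (the exact formulation is in Chatterjee et al.; the pooled-standard-deviation form of pairwise d′ used here is a common convention and an assumption of this sketch, as is the helper name `net_discriminability`):

```python
import itertools
import math
import statistics

def net_discriminability(cue_values):
    """Net discriminability of one acoustic cue.

    cue_values maps each emotion to a list of per-sentence values of
    that cue. Pairwise d' = |difference of means| / pooled SD, and the
    sum over all emotion pairs is the net discriminability that the
    cue provides."""
    total = 0.0
    for a, b in itertools.combinations(cue_values, 2):
        xa, xb = cue_values[a], cue_values[b]
        pooled_sd = math.sqrt(
            (statistics.variance(xa) + statistics.variance(xb)) / 2.0)
        total += abs(statistics.mean(xa) - statistics.mean(xb)) / pooled_sd
    return total
```

Running this once per cue (F0 height, F0 range, mean intensity, intensity range, duration) yields the per-cue discriminability profile plotted in the bottom panels of Figure 1.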
FIGURE 1

Results of acoustic analyses of the male (red circles) and female (blue squares) speakers' utterances in five emotions (abscissa). Each of the top five panels corresponds to a different acoustic cue; each point represents the mean of all 12 sentences for each speaker, and error bars represent standard deviations. The bottom left panel shows the acoustic discriminability of the Mandarin sentences, whereas the bottom right panel shows the acoustic discriminability of the English sentences used in our previous study by Chatterjee et al. SPL, Sound pressure level


Dynamic F0 changes

F0-sweep stimuli were generated from broadband harmonic complexes with 100 partials, all in sine phase with equal amplitude (44.1 kHz sampling rate). The overall signal was low-pass-filtered at 10 kHz to ensure similar access to the bandwidth by cCI and cNH listeners. All stimuli were 300 ms long with 30-ms onset and offset ramps. The F0 of the complex varied linearly from beginning to end at sweep rates of 0.5, 1, 2, 4, 8, and 16 semitones per second in two directions (rising or falling), yielding 12 final/initial F0 ratios ranging from 0.25 semitones (an increase of 1.4% over the initial F0) to 8 semitones (an increase of 58.74% over the initial F0). The starting F0 was chosen randomly from trial to trial from one of 10 bins uniformly distributed between 120 and 140 Hz, without replacement. In the discrimination task, stimuli with opposite sweep directions had the same F0 range. All stimuli were equalized at 65 dB SPL and presented with a ±3 dB level rove. For both tasks, each experimental condition was repeated 10 times (6 sweep rates × 2 directions × 10 repetitions = 120 trials).
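A minimal sketch of such an F0-sweep harmonic complex, assuming the sweep is specified by its final/initial F0 ratio in semitones. The helper name `f0_sweep` and the crude low-pass by simply omitting partials above the cutoff are this sketch's simplifications, not the study's exact synthesis code.

```python
import numpy as np

def f0_sweep(f0_start, ratio_semitones, fs=44100, dur=0.3, ramp=0.03,
             n_partials=100, lp_cutoff=10000.0):
    """Harmonic complex whose F0 glides linearly from f0_start to
    f0_start * 2**(ratio_semitones / 12) over dur seconds. Partials are
    in sine phase with equal amplitude; any partial that would exceed
    lp_cutoff is omitted (a crude stand-in for the 10-kHz low-pass)."""
    t = np.arange(int(fs * dur)) / fs
    f0_end = f0_start * 2.0 ** (ratio_semitones / 12.0)
    # fundamental phase = 2*pi * integral of the linear F0 trajectory
    phase = 2 * np.pi * (f0_start * t
                         + 0.5 * (f0_end - f0_start) * t ** 2 / dur)
    sig = np.zeros_like(t)
    for n in range(1, n_partials + 1):
        if n * max(f0_start, f0_end) > lp_cutoff:
            break
        sig += np.sin(n * phase)
    # 30-ms raised-cosine onset/offset ramps
    nr = int(fs * ramp)
    env = np.ones_like(sig)
    env[:nr] = 0.5 * (1 - np.cos(np.pi * np.arange(nr) / nr))
    env[-nr:] = env[:nr][::-1]
    out = sig * env
    return out / np.abs(out).max()
```

Randomizing `f0_start` between 120 and 140 Hz from trial to trial, as the study did, prevents listeners from using the absolute starting pitch rather than the sweep itself.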

Test procedures

Emotion recognition test

The participants heard each sentence and indicated which emotion was best associated with it by clicking on one of five choices on the screen. The 12 sentences and 5 emotions were fully randomized within each condition. Four conditions were tested in all: full-spectrum speech and 16-, 8-, and 4-channel NBV speech. The cCI heard only full-spectrum speech, whereas the cNH heard all four conditions (in randomized order). Sentences were presented in blocks of a given speaker (male or female, counterbalanced) and condition. Listeners were given passive training with sentences not used in testing to familiarize themselves with the speakers' styles. Participants were encouraged to take breaks between blocks. No feedback was provided during the test.

Discrimination of F0 changes

Participants first completed 20 practice trials using the highest sweep rate, with rising and falling directions and no level roving. The tasks used a child-friendly interface with an animated cartoon figure of an animal of the participant's choice: a smiley face provided encouragement after a correct response, and a sad face followed an incorrect response. Points were earned after completing certain numbers of trials to keep the child engaged. The discrimination task used a three-interval, two-alternative forced-choice procedure: a reference F0-sweep stimulus, either rising or falling, was presented first, followed by two further stimuli, one identical to the reference and the other sweeping in the opposite direction (the latter two in random order). The listener was asked which of Intervals 2 and 3 sounded different from the reference (Interval 1). Reaction times were recorded for each trial. Percentage-correct scores were converted into d′ and β values for statistical analyses.
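One common signal-detection convention for deriving d′ and β treats one response class as "hits" and the other as "false alarms" and applies the inverse-normal transform to each rate. Whether the study used exactly this convention is not stated, so the sketch below (including the function name `dprime_beta` and the 1/(2N) correction for perfect scores) is illustrative only.

```python
import math
from statistics import NormalDist

def dprime_beta(hit_rate, fa_rate, n_trials):
    """d' (sensitivity) and beta (response bias) from hit and
    false-alarm rates. Rates of exactly 0 or 1 are clamped to
    1/(2N) and 1 - 1/(2N), a common correction for perfect scores."""
    lo, hi = 1.0 / (2 * n_trials), 1.0 - 1.0 / (2 * n_trials)
    z_hit = NormalDist().inv_cdf(min(max(hit_rate, lo), hi))
    z_fa = NormalDist().inv_cdf(min(max(fa_rate, lo), hi))
    d_prime = z_hit - z_fa
    beta = math.exp((z_fa ** 2 - z_hit ** 2) / 2.0)
    return d_prime, beta
```

A symmetric case (80% hits, 20% false alarms) gives an unbiased observer: β is 1, and all of the performance difference is carried by d′.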

RESULTS

Error patterns of emotion recognition

Figure 2 shows the error patterns for the cCI and cNH groups, for the male and female speakers' sentences, under each condition of spectral resolution tested. The cells are color-coded to represent the strength of the numerical values, and the actual values are also indicated. The matrix patterns for the cNH become increasingly diagonally dominant as spectral clarity increases from 4, 8, and 16 channels to full spectrum. The cCI matrix pattern was closer to that of the cNH with eight-channel NBV speech than to those of the other conditions. "Scared" was the most difficult voice emotion to recognize for the cCI across speakers and for the cNH listening to the female speaker. The most common errors were that cCI misinterpreted scared as happy (25.97%) when listening to the female speaker and misinterpreted scared as angry (37.42%) when listening to the male speaker.
FIGURE 2

Error patterns of voice emotion recognition for the cCI and cNH groups, for the male and female speakers' sentences, under each condition of spectral resolution tested. The cells are color-coded to represent the strength of the numerical values, and the actual values are also indicated. cCI, Cochlear-implanted children; cNH, normal-hearing children


Group mean emotion recognition scores

Spectral degradation (for cNH)

A linear mixed-effects (LME) analysis with rationalized arcsine unit (RAU)-transformed scores as the dependent variable; age, condition (spectral resolution), and speaker as fixed effects; and subject-based random intercepts showed significant effects of age (p < .0001), condition (p < .0001), and speaker (p = .0156), and a significant interaction between speaker and condition (p = .0001; Figure 3). The cNH performance declined as the spectral resolution worsened.
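The RAU transform referred to here is, in its usual formulation (Studebaker, 1985), an arcsine-based variance-stabilizing transform of proportion-correct scores; a minimal sketch (the function name `rau` is ours):

```python
import math

def rau(num_correct, num_trials):
    """Rationalized arcsine unit (Studebaker, 1985): a variance-
    stabilizing transform of a proportion-correct score. It is roughly
    linear mid-range, and the full scale runs from about -23 to 123."""
    theta = (math.asin(math.sqrt(num_correct / (num_trials + 1.0)))
             + math.asin(math.sqrt((num_correct + 1.0) / (num_trials + 1.0))))
    return (146.0 / math.pi) * theta - 23.0
```

The point of the transform is that scores near floor (0%) and ceiling (100%) have compressed variance; on the RAU scale they can be compared with mid-range scores in a linear model such as the LME analysis above.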
FIGURE 3

Mean voice emotion recognition score for cCI and cNH under different spectral degradation, speaker, and age. In the left panel, an LME analysis with RAU-transformed scores as the dependent variable; age, condition (spectral resolution), and speaker as fixed effects; and subject-based random intercepts showed significant effects of age, F(1, 51) = 21.42, p < .0001; condition, F(3, 51) = 2758.43, p < .0001; speaker, F(1, 51) = 6.26, p = .0156; and a significant interaction between speaker and condition, F(3, 51) = 8.49, p = .0001. In the central panel, the average score of the cCI with full-spectrum speech was close to the average score of the cNH with eight-channel NBV speech. In the right panel, voice emotion recognition score as a function of F0 threshold (semitones) shows that the average performance across talkers for the cCI could be predicted by their sensitivity to changes in F0 (thresholds extracted from the Weibull fits at a d′ of 0.77; R² = .3302; p = .0064). cCI, Cochlear-implanted children; cNH, normal-hearing children; LME, linear mixed effects; NBV, narrowband vocoder; RAU, rationalized arcsine unit


Full‐spectrum speech

An LME model with RAU-transformed scores as the dependent variable; age at test, group (cNH or cCI), and speaker (male or female) as fixed effects; and subject-based random intercepts showed significant effects of age (p = .0003), group (p < .0001), and speaker (p = .0003), and a marginally significant interaction between age and group (p = .0340) on mean emotion recognition scores (Figure 3). The mean emotion recognition score of the cNH group was significantly better than that of the cCI group. The female speaker's vocal emotions were more easily recognized; this difference was most apparent for the cCI group.

Comparison between cCI and cNH

The cCI group showed a range of performance (including age dependency) similar to that of cNH attending to four- and eight-channel noise-vocoded speech (Figure 3). The average score of the cCI with full-spectrum speech was close to the average score of the cNH with eight-channel NBV speech.

Sensitivity to changes in F0

Large variability in pitch sensitivity was observed among the implanted children. Figure 3 (right panel) shows that the average voice emotion recognition performance across speakers for the cCI could be predicted by their sensitivity to changes in F0 (R² = .3302; p = .0064). No such relationship was observed for the cNH listening to full-spectrum sentences. Moreover, there was no significant effect of age at implantation (p = .7552), age at test (p = .5998), or duration of CI experience (p = .7364) on the F0-change discrimination task.

DISCUSSION

Acoustic analysis of the Mandarin test sentences revealed a substantial difference in the pattern of summed discriminability indices for the different cues compared with the English test sentences used in our previous study (Figure 1). Happy was spoken with the greatest F0 range and mean F0 height in Chatterjee et al.'s study, whereas in the present study scared was spoken with the greatest mean F0 height and sad with the greatest F0 range. The discriminability measure (d′) in the present study showed that F0 height is the acoustic characteristic carrying the critical information, with F0 range additionally helpful for the female voice and mean intensity for the male voice. In the previous study by Chatterjee et al., the male speakers' sentences carried more information in the mean intensity patterns, whereas the female speakers' sentences carried more information in the F0 range and the intensity range. The error patterns of voice emotion recognition revealed large variability among the cCI. Visual inspection shows that, for the cNH, the matrices become more and more diagonally dominant as spectral clarity increases; the diagonal dominance observed for the cCI is similar to that observed in the cNH for four- and eight-channel NBV speech. The error patterns for Mandarin-speaking cCI are not the same as for English-speaking cCI: for the Mandarin-speaking cCI, "scared" was the most difficult voice emotion to recognize across speakers, whereas for the English-speaking cCI the most difficult emotion was "scared" for the male speakers and "neutral" for the female speakers. In general, both the cCI and cNH groups, in the full-spectrum and NBV conditions, obtained higher voice emotion recognition scores when listening to the female speaker than when listening to the male speaker; this difference was most apparent for the cCI group.
This is inconsistent with our previous study of English-speaking peers. More information in the F0 range was noted in the female speakers' sentences, whereas more information in the mean intensity patterns and duration was noted in the male speakers' sentences. This may suggest that F0 is the primary cue used for voice emotion recognition. Nevertheless, because F0 cues are severely degraded both in CIs and in four- or eight-channel NBV speech, children might recognize voice emotion on the basis of secondary cues (such as intensity and duration) rather than F0 range, whether they are cCI or cNH listening to degraded NBV speech. Studies investigating music emotion processing have found that CI users depend on tempo rather than pitch when processing musical emotion. The present study suggests a similar auditory processing strategy for vocal emotion: increased reliance, for CI users compared with NH listeners, on cues such as intensity and duration that are closer to the tempo-based aspects of music. Some cCI in this study achieved high emotion recognition scores, and it would be interesting to investigate the underlying auditory emotion processing strategy of these high performers; although F0 cues are severely degraded in CIs and in four- or eight-channel NBV speech, they might possess an unaccounted-for way of interpreting F0 information. Age at test had a significant effect on voice emotion recognition for both the cCI and the cNH with degraded NBV speech in this study, whereas age at implantation showed no effect on the CI children's performance, suggesting that the effect is genuinely developmental in nature. We suppose that brain maturation plays a role in voice emotion recognition for cCI. A tonal-language benefit in pitch perception for children with CIs has been reported in the literature.
The present results further revealed that high sensitivity to changes in F0 predicted better emotion recognition performance across speakers for Mandarin-speaking cCI (R² = .3302; p = .0064). However, there was no significant effect of age at implantation, age at test, or duration of CI experience on the F0-change discrimination task. This suggests that, in addition to a psychological representation of F0, brain plasticity also integrates other secondary auditory cues. Together with the positive effect of age at test on emotion recognition for the cCI in the present study, this suggests that cCI grow up with developing cognitive systems that adapt alternative ways to process auditory emotion. We suppose that improved sensitivity to tempo and intensity changes is a major part of this development of the cognitive systems for auditory emotion in cCI.

CONCLUSION

As a result of device limitations in prosody processing, Mandarin-speaking cCI showed deficits in voice emotion recognition. Mandarin-speaking cCI performed comparably with cNH listening to spectrally degraded speech, suggesting that cCI may have developed adaptive strategies to interpret emotion from degraded auditory signals. Better pitch discrimination ability was accompanied by better voice emotion recognition. Besides the F0 cues, cCI adapted their voice emotion recognition to rely more on secondary cues such as intensity and duration. Although cross-cultural differences existed in the acoustic features of voice emotion, Mandarin-speaking cCI and their English-speaking cCI peers both exhibited a positive effect of age at test on emotion recognition, suggesting learning effects or possibly maturation effects. Therefore, further device/processor development to improve the presentation of F0 information, and more rehabilitative efforts, are needed to improve the transmission and perception of voice emotion.

CONFLICT OF INTEREST

The authors declare that they have no conflicts of interest.
REFERENCES (25 in total; first 10 shown)

1. Wilson BS, Dorman MF. The surprising performance of present-day cochlear implants. IEEE Trans Biomed Eng. 2007.
2. Fu Q-J, Galvin JJ. Vocal emotion recognition by normal-hearing listeners and cochlear implant users. Trends Amplif. 2007.
3. Lin Y-S, Lu H-P, Hung S-C, Chang C-P. Lexical tone identification and consonant recognition in acoustic simulations of cochlear implants. Acta Otolaryngol. 2009.
4. Wilson BS, Finley CC, Lawson DT, Wolford RD, Eddington DK, Rabinowitz WM. Better speech recognition with cochlear implants. Nature. 1991.
5. Paulmann S, Uskul AK. Cross-cultural emotional prosody recognition: evidence from Chinese and British listeners. Cogn Emot. 2013.
6. Deroche MLD, Felezeu M, Paquette S, Zeitouni A, Lehmann A. Neurophysiological differences in emotional processing by cochlear implant users, extending beyond the realm of speech. Ear Hear. 2019.
7. Waaramaa T, Kukkonen T, Mykkänen S, Geneid A. Vocal emotion identification by children using cochlear implants, relations to voice quality, and musical interests. J Speech Lang Hear Res. 2018.
8. Paquette S, Ahmed GD, Goffi-Gomez MV, Hoshino ACH, Peretz I, Lehmann A. Musical and vocal emotion perception for cochlear implant users. Hear Res. 2018.
9. Gilbers S, Fuller C, Gilbers D, Broersma M, Goudbeek M, Free R, Başkent D. Normal-hearing listeners' and cochlear implant users' perception of pitch cues in emotional speech. Iperception. 2015.
10. Deroche MLD, Kulkarni AM, Christensen JA, Limb CJ, Chatterjee M. Deficits in the sensitivity to pitch sweeps by school-aged children wearing cochlear implants. Front Neurosci. 2016.
