Emilia Parada-Cabaleiro, Anton Batliner, Alice Baird, Björn Schuller.
Abstract
Most typically developed individuals have the ability to perceive emotions encoded in speech; yet, factors such as age or environmental conditions can restrict this inherent skill. Noise pollution and multimedia over-stimulation are common components of contemporary society, and have been shown to particularly impair a child's interpersonal skills. Assessing the influence of such factors on the perception of emotion over different developmental stages will advance child-related research. The presented work evaluates how background noise and emotionally connoted visual stimuli affect a child's perception of emotional speech. A total of 109 subjects from Spain and Germany (4–14 years) evaluated 20 multi-modal instances of nonsense emotional speech under several environmental and visual conditions. A control group of 17 Spanish adults performed the same perception test. Results suggest that visual stimulation, gender, and the two sub-cultures with different language backgrounds do not influence a child's perception; yet, background noise does compromise their ability to correctly identify emotion in speech, a phenomenon that seems to decrease with age.
Keywords: Cross-cultural; Developmental age; Emotion perception; Multi-modality; Noise; Nonsense speech; Paralinguistics
Year: 2020 PMID: 34867074 PMCID: PMC8602134 DOI: 10.1007/s10772-020-09675-1
Source DB: PubMed Journal: Int J Speech Technol ISSN: 1381-2416
Distribution of the 109 children considering: age (preoperational stage: 4–6 years; concrete operational stage: 7–10 years; formal operational stage: 11–14 years), gender (male and female), and nationality (Spanish and German)
| Age | # | Spanish male | Spanish female | German male | German female |
|---|---|---|---|---|---|
| Preop. | | | | | |
| 4 | 10 | – | 5 | 1 | 4 |
| 5 | 16 | 5 | 3 | 2 | 6 |
| 6 | 8 | 1 | 3 | 3 | 1 |
| Total | 34 | 6 | 11 | 6 | 11 |
| Concrete | | | | | |
| 7 | 12 | 12 | – | – | – |
| 8 | 16 | 8 | 8 | – | – |
| 9 | 24 | 14 | 10 | – | – |
| 10 | 1 | – | 1 | – | – |
| Total | 53 | 34 | 19 | – | – |
| Formal | | | | | |
| 11 | 13 | 4 | 9 | – | – |
| 12 | 4 | 1 | 3 | – | – |
| 13 | 1 | – | 1 | – | – |
| 14 | 4 | 3 | 1 | – | – |
| Total | 22 | 8 | 14 | – | – |
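The stage totals in the table can be re-derived from the per-age counts. The following is a minimal sketch of that bookkeeping, assuming only the age boundaries given in the caption; the names `piaget_stage`, `CHILDREN_PER_AGE`, and `stage_totals` are illustrative and do not come from the study.

```python
def piaget_stage(age: int) -> str:
    """Map an age in years to the developmental stage used in the caption."""
    if 4 <= age <= 6:
        return "preoperational"
    if 7 <= age <= 10:
        return "concrete operational"
    if 11 <= age <= 14:
        return "formal operational"
    raise ValueError(f"age {age} is outside the 4-14 range of the sample")


# Number of children per age, copied from the "#" column of the table.
CHILDREN_PER_AGE = {
    4: 10, 5: 16, 6: 8,
    7: 12, 8: 16, 9: 24, 10: 1,
    11: 13, 12: 4, 13: 1, 14: 4,
}


def stage_totals(counts: dict[int, int]) -> dict[str, int]:
    """Sum the per-age counts within each developmental stage."""
    totals: dict[str, int] = {}
    for age, n in counts.items():
        stage = piaget_stage(age)
        totals[stage] = totals.get(stage, 0) + n
    return totals


if __name__ == "__main__":
    totals = stage_totals(CHILDREN_PER_AGE)
    print(totals, "overall:", sum(totals.values()))
```

Running the sketch reproduces the three "Total" rows and the overall sample size stated in the abstract.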
Fig. 1 Spectrograms of the clean (left) and noisified (right) nonsense emotional utterances: anger, happiness, and sadness (from top to bottom), produced by a female speaker; x-axis: duration in seconds; y-axis: frequency between 0 and 8 kHz
Fig. 2 The emotionally connoted drawings used as visual stimuli in the presented study: positive (left) and negative (right)
Fig. 3 The emotionally expressive images of faces used in our study, representing each of the three forced-choice categorical test responses: anger, happiness, and sadness (from left to right)
Summary of variables considering: hierarchical structure (user variables at Level-1 and task variables at Level-2), type, measurement, and values
| Variable | Level | Type | Measurement | Values |
|---|---|---|---|---|
| user-id | 1 (user var.) | Independent | Nominal | Cross-cultural: participant 1–34 |
| | | | Nominal | General: participant 1–109 |
| | | | Nominal | Adult/children: participant 1–34 |
| age | 1 (user var.) | Independent | Scale | Cross-cultural: 4–6 years |
| | | | Scale | General: 4–14 years |
| | | | Binary | Adult/children: |
| | 1 (user var.) | Independent | Binary | |
| | 1 (user var.) | Independent | Binary | |
| | 1 (user var.) | Dependent | Binary | |
| | 2 (task var.) | Independent | Nominal | |
| | 2 (task var.) | Independent | Binary | |
| | 2 (task var.) | Independent | Nominal | |
| | 2 (task var.) | Independent | Nominal | |
Note that for user-id and age, different values are considered for each assessment: cross-cultural, general, and adults vs. children
Results for the fixed effects computed in the cross-cultural assessment considering Level-1 variables
| Factor | F | df1 | df2 | B | exp(B) | d | p | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|---|---|
| | 2.44 | 1 | 608 | 0.24 | 1.28 | 0.14 | .119 | 0.94 | 1.74 |
| | 0.11 | 1 | 608 | −0.08 | 0.92 | −0.05 | .739 | 0.58 | 1.47 |
| | 0.04 | 1 | 608 | −0.04 | 0.96 | −0.02 | .842 | 0.63 | 1.46 |
A Generalized Linear Mixed Model (GLMM) was fitted to the responses given by 17 Spanish and 17 German children from the preoperational stage (4–6 years), with task-id and user-id as crossed random effects and age, gender, and nationality as fixed effects. The F-statistic (F), degrees of freedom 1 (df1) and 2 (df2), coefficient (B), effect sizes exp(B) and Cohen's d, p-value, and lower and upper bounds of the 95% confidence interval (CI) are given
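The two effect sizes in the table are linked to the coefficient. exp(B) is the odds ratio of a logistic model. The text does not say how Cohen's d was obtained, but the tabulated values agree with the widely used logit conversion d = ln(OR) · √3 / π. The sketch below applies that conversion as an assumption, not as the authors' documented procedure; the function names are illustrative.

```python
import math


def odds_ratio(b: float) -> float:
    """Odds ratio exp(B) for a logistic-regression coefficient B."""
    return math.exp(b)


def cohens_d_from_odds_ratio(odds: float) -> float:
    """Approximate Cohen's d from an odds ratio.

    Assumption: the common logit-to-d conversion d = ln(OR) * sqrt(3) / pi.
    The source does not state which conversion was used.
    """
    return math.log(odds) * math.sqrt(3) / math.pi


if __name__ == "__main__":
    # Reported odds ratios taken from the fixed-effects tables.
    for reported_or in (1.28, 0.92, 0.96):
        d = cohens_d_from_odds_ratio(reported_or)
        print(f"exp(B) = {reported_or:.2f} -> d = {d:.2f}")
```

Because B is printed with two decimals, recomputing exp(B) from the rounded B can differ from the tabulated odds ratio in the last digit, so the sketch starts from the reported exp(B) instead.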
Results for the fixed effects computed in the general assessment considering Level-1 variables
| Factor | F | df1 | df2 | B | exp(B) | d | p | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|---|---|
| | 6.80 | 1 | 1959 | 0.07 | 1.08 | 0.04 | .009 | 1.02 | 1.14 |
| | 0.15 | 1 | 1959 | 0.06 | 1.06 | 0.03 | .695 | 0.80 | 1.41 |
The GLMM was fitted to the responses given by the 109 children, with task-id and user-id as crossed random effects and age and gender as fixed effects. The F-statistic (F), degrees of freedom 1 (df1) and 2 (df2), coefficient (B), effect sizes exp(B) and Cohen's d, p-value, and lower and upper bounds of the 95% confidence interval (CI) are given
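The 95% CI in the table is reported on the odds-ratio scale. How the interval was computed is not stated; the sketch below assumes a Wald-type interval that is symmetric on the log-odds scale, exp(B ± z · SE), and shows how a standard error can be backed out of a reported interval under that assumption. The helper names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def odds_ratio_ci(b: float, se: float) -> tuple[float, float]:
    """Wald-type 95% CI for exp(B), assuming symmetry on the log-odds scale."""
    return math.exp(b - Z_95 * se), math.exp(b + Z_95 * se)


def se_from_odds_ratio_ci(lower: float, upper: float) -> float:
    """Standard error of B implied by a 95% CI given on the odds-ratio scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def log_midpoint(lower: float, upper: float) -> float:
    """Coefficient B implied by the CI bounds (midpoint on the log scale)."""
    return (math.log(lower) + math.log(upper)) / 2


if __name__ == "__main__":
    # First row of the table above: reported CI bounds 1.02 and 1.14.
    se = se_from_odds_ratio_ci(1.02, 1.14)
    b = log_midpoint(1.02, 1.14)
    print(f"implied SE = {se:.3f}, implied B = {b:.3f}")
    print("reconstructed CI:", odds_ratio_ci(b, se))
```

Since the published bounds are rounded to two decimals, the implied B and SE are only approximate and should not be read as the model's exact estimates.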
Results for the fixed effects computed in the general assessment considering Level-1 and Level-2 variables
| Factor | F | df1 | df2 | B | exp(B) | d | p | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|---|---|
| | 31.28 | 1 | 1955 | 0.85 | 2.34 | 0.47 | .000 | 1.74 | 3.76 |
| | 0.01 | 1 | 1955 | 0.02 | 1.02 | 0.01 | .920 | 0.71 | 1.47 |
| | 0.24 | 1 | 1955 | −0.09 | 0.91 | −0.05 | .628 | 0.63 | 1.32 |
| | 5.43 | 1 | 1955 | 0.44 | 1.55 | 0.24 | .020 | 1.07 | 2.23 |
| | 2.56 | 1 | 1955 | 0.30 | 1.35 | 0.17 | .110 | 0.94 | 1.94 |
| | 3.91 | 1 | 1955 | 0.06 | 1.06 | 0.03 | .048 | 1.00 | 1.12 |
The GLMM was fitted to the responses given by the 109 children, with task-id and user-id as crossed random effects and the Level-1 slope age as randomly varying. The fixed effects are the Level-2 predictors snr, reinforcement (reinf: positive and negative w.r.t. the reference, no reinforcement), and emotion (emo: the negative emotions sadness and anger w.r.t. the positive emotion happiness), together with the Level-1 predictor age. The F-statistic (F), degrees of freedom 1 (df1) and 2 (df2), coefficient (B), effect sizes exp(B) and Cohen's d, p-value, and lower and upper bounds of the 95% confidence interval (CI) are given
Results for the fixed effects computed in the adults versus children’s assessment considering Level-1 variables
| Factor | F | df1 | df2 | B | exp(B) | d | p | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|---|---|
| | 20.52 | 1 | 609 | 1.02 | 2.80 | 0.57 | .000 | 1.79 | 4.38 |
| | 0.74 | 1 | 609 | 0.19 | 1.21 | 0.11 | .391 | 0.78 | 1.89 |
The GLMM was fitted to the responses given by the 17 Spanish children from the preoperational stage (4–6 years) and the 17 Spanish adults (17–48 years), with task-id and user-id as crossed random effects and age and gender as fixed effects. The F-statistic (F), degrees of freedom 1 (df1) and 2 (df2), coefficient (B), effect sizes exp(B) and Cohen's d, p-value, and lower and upper bounds of the 95% confidence interval (CI) are given
Results for the fixed effects computed in the adults versus children’s assessment considering Level-1 and Level-2 variables
| Factor | F | df1 | df2 | B | exp(B) | d | p | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|---|---|
| | 14.42 | 1 | 605 | 0.90 | 2.45 | 0.49 | .000 | 1.54 | 3.90 |
| | 0.03 | 1 | 605 | 0.05 | 1.05 | 0.03 | .886 | 0.60 | 1.84 |
| | 0.00 | 1 | 605 | 0.01 | 1.01 | 0.01 | .976 | 0.57 | 1.78 |
| | 9.63 | 1 | 605 | 0.89 | 2.45 | 0.49 | .002 | 1.39 | 4.31 |
| | 7.76 | 1 | 605 | 0.79 | 2.20 | 0.43 | .006 | 1.26 | 3.84 |
| | 14.64 | 1 | 605 | 1.01 | 2.74 | 0.56 | .000 | 1.63 | 4.59 |
The GLMM was fitted to the responses given by the 17 Spanish children from the preoperational stage (4–6 years) and the 17 Spanish adults (17–48 years), with task-id and user-id as crossed random effects and the Level-1 slope age (binary: child vs. adult) as randomly varying. The fixed effects are the Level-2 predictors snr, reinforcement (reinf: positive and negative w.r.t. the reference, no reinforcement), and emotion (emo: the negative emotions sadness and anger w.r.t. the positive emotion happiness), together with the Level-1 predictor age. The F-statistic (F), degrees of freedom 1 (df1) and 2 (df2), coefficient (B), effect sizes exp(B) and Cohen's d, p-value, and lower and upper bounds of the 95% confidence interval (CI) are given
Fig. 4 Percentage of wrong (✗) and correct (✓) responses in the identification of the emotional speech in both snr conditions, noisy (N) and clean (C), by Spanish and German children of the preoperational stage (4–6 years)
Fig. 5 Percentage of wrong (✗) and correct (✓) responses in the identification of the emotional speech in both snr conditions (noisy and clean), by children of 4, 7, 9, 11, and 14 years; note that for children of 4 years, Spanish and German children are considered together
Confusion matrix for the percentage of accuracy in the perception of each emotion (emo): anger (ang), happiness (hap), and sadness (sad); by children of 4, 7, 9, 11, and 14 years (cf. Table 1) and adults (17–48 years); in both snr (noisy and clean)
In each row, the reference is given (emotions indicated in bold); in each column, ‘identified as’ is given (emotions indicated in italics). Darker shadowing represents higher levels of accuracy; Unweighted Average Recall (UAR) and number of responses encoded in each row (#) are given as well
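Unweighted Average Recall is the mean of the per-class recalls, so each emotion counts equally regardless of how many responses fall in its row. A minimal sketch of the computation follows; the confusion counts in the usage example are made up for illustration and are not the study's results.

```python
def unweighted_average_recall(confusion: list[list[int]]) -> float:
    """Mean per-class recall of a square confusion matrix.

    Rows are the reference classes, columns the 'identified as' classes,
    matching the layout described in the caption.
    """
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total == 0:
            raise ValueError(f"reference class {i} has no responses")
        recalls.append(row[i] / total)  # diagonal cell over row sum
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical counts; rows and columns ordered as ang, hap, sad.
    example = [
        [8, 1, 1],
        [2, 6, 2],
        [0, 5, 5],
    ]
    print(f"UAR = {unweighted_average_recall(example):.3f}")
```

With three equally weighted classes, chance level for UAR is one third, which makes the measure easy to compare across age groups with different numbers of responses.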