Andrey Anikin, Rasmus Bååth, Tomas Persson.
Abstract
Recent research on human nonverbal vocalizations has led to considerable progress in our understanding of vocal communication of emotion. However, in contrast to studies of animal vocalizations, this research has focused mainly on the emotional interpretation of such signals. The repertoire of human nonverbal vocalizations as acoustic types, and the mapping between acoustic and emotional categories, thus remain underexplored. In a cross-linguistic naming task (Experiment 1), verbal categorization of 132 authentic (non-acted) human vocalizations by English-, Swedish- and Russian-speaking participants revealed the same major acoustic types: laugh, cry, scream, moan, and possibly roar and sigh. The association between call type and perceived emotion was systematic but non-redundant: listeners associated every call type with a limited, but in some cases relatively wide, range of emotions. The speed and consistency of naming the call type predicted the speed and consistency of inferring the caller's emotion, suggesting that acoustic and emotional categorizations are closely related. However, participants preferred to name the call type before naming the emotion. Furthermore, nonverbal categorization of the same stimuli in a triad classification task (Experiment 2) was more compatible with classification by call type than by emotion, indicating the former's greater perceptual salience. These results suggest that acoustic categorization may precede attribution of emotion, highlighting the need to distinguish between the overt form of nonverbal signals and their interpretation by the perceiver. Both within- and between-call acoustic variation can then be modeled explicitly, bringing research on human nonverbal vocalizations more in line with the work on animal communication.
Keywords: Cross-linguistic naming study; Emotion; Non-linguistic vocalizations; Semantic spaces; Triad classification task
Year: 2017 PMID: 29497221 PMCID: PMC5816134 DOI: 10.1007/s10919-017-0267-y
Source DB: PubMed Journal: J Nonverbal Behav ISSN: 0191-5886
Fig. 3 Co-occurrence of sound names (vertical) and emotion names (horizontal) in English, Swedish, and Russian. Each cell shows the number of times a participant chose a particular sound name and emotion name to describe the same sound. The labels are ordered by hierarchical clustering, so that similar terms are placed close together, resulting in the dendrograms shown next to each table
Variables used to construct the acoustic space and their weights optimized for maximum correlation between acoustic and semantic spaces
| Variable | Interpretation | Weight | Loading on PC1 | Loading on PC2 |
|---|---|---|---|---|
| Amplitude, median | Median root square amplitude (loudness) | 0.61 | 0.13 | −0.17 |
| Proportion of voiced frames | How much of the sound is voiced | 1.28 | 0.24 | −0.41 |
| Pitch, median | Fundamental frequency or perceived pitch (manually checked) | 1.89 | 0.75 | 0.19 |
| Pitch, SD | | 0.79 | 0.21 | 0.1 |
| First quartile, median | First quartile of spectral energy distribution | 1.42 | 0.53 | 0.04 |
| First quartile, SD | | 0.74 | 0.15 | 0.16 |
| Spectral entropy, SD | SD of the entropy of spectral energy distribution | 0.9 | 0.03 | 0.16 |
| Interburst interval, median | Time between vocal bursts (amplitude peaks) | 0.58 | 0.01 | −0.03 |
| Interburst interval, SD | | 1.73 | 0.04 | −0.07 |
| Number of bursts | Total number of amplitude peaks per sound | 1.61 | −0.11 | 0.81 |
| Syllable length, median | Length of continuous vocal segments | 0.77 | 0 | −0.16 |
| Syllable length, SD | | 0.48 | 0.04 | −0.08 |
Pearson’s correlations between the distance matrices in Experiments 1 and 2
| | Acoustic analysis | Sound names EN | Sound names SV | Sound names RU | Sound names EN + SV + RU | Emotion names EN | Emotion names SV | Emotion names RU | Emotion names EN + SV + RU | Triad classif. EN | Triad classif. SV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sound names EN | 0.47 | | | | | | | | | | |
| Sound names SV | 0.46 | 0.85 | | | | | | | | | |
| Sound names RU | 0.48 | 0.8 | 0.8 | | | | | | | | |
| Sound names EN + SV + RU | | 0.94 | 0.95 | 0.92 | | | | | | | |
| Emotion names EN | 0.34 | 0.68 | 0.65 | 0.62 | 0.69 | | | | | | |
| Emotion names SV | 0.33 | 0.65 | 0.64 | 0.58 | 0.67 | 0.87 | | | | | |
| Emotion names RU | 0.43 | 0.69 | 0.68 | 0.73 | 0.74 | 0.8 | 0.77 | | | | |
| Emotion names EN + SV + RU | | 0.72 | 0.7 | 0.68 | | 0.95 | 0.95 | 0.91 | | | |
| Triad classif. EN | 0.5 | 0.62 | 0.68 | 0.63 | 0.69 | 0.45 | 0.46 | 0.52 | 0.5 | | |
| Triad classif. SV | 0.48 | 0.67 | 0.71 | 0.67 | 0.73 | 0.52 | 0.52 | 0.55 | 0.57 | 0.75 | |
| Triad classif. EN + SV | | 0.69 | 0.74 | 0.69 | | 0.51 | 0.51 | 0.57 | | 0.96 | 0.91 |
Bold values indicate the most relevant information
Acoustic analysis: distance matrix based on 12 acoustic variables with weights optimized for maximum correlation with each target distance matrix (see Figure S2), thus producing the highest achievable correlation
Sound names EN/SV/RU: distance matrix based on English/Swedish/Russian sound names
Sound names EN + SV + RU: averaged distance matrix for sound names in all three languages
Emotion names EN/SV/RU: distance matrix based on English/Swedish/Russian emotion names
Emotion names EN + SV + RU: averaged distance matrix for emotion names in all three languages
Triad classif. EN/SV: distance matrix based on the triad classification task in the English/Swedish sample
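The table notes above describe comparing distance matrices by Pearson's correlation, with the acoustic matrix built from weighted variables. The following is only a minimal sketch of that idea, not the authors' procedure: the z-scoring, the Euclidean metric, the use of the lower triangle, the function names `weighted_distance_matrix` and `matrix_correlation`, and the random data are all assumptions made for illustration.

```python
import numpy as np

def weighted_distance_matrix(features, weights):
    """Pairwise Euclidean distances between rows (stimuli).

    Assumption: columns are z-scored and then multiplied by their weights
    before distances are computed; the source does not specify this.
    """
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    scaled = z * weights
    diff = scaled[:, None, :] - scaled[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def matrix_correlation(d1, d2):
    """Pearson's r over the off-diagonal lower triangles of two square matrices."""
    idx = np.tril_indices_from(d1, k=-1)
    return np.corrcoef(d1[idx], d2[idx])[0, 1]

# Hypothetical data: 20 stimuli described by 12 acoustic variables.
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 12))
weights = np.ones(12)  # placeholder weights, not the optimized values in the table

acoustic = weighted_distance_matrix(features, weights)

# A made-up "semantic" matrix: the acoustic one plus symmetric noise.
noise = rng.normal(scale=0.1, size=acoustic.shape)
semantic = np.abs(acoustic + (noise + noise.T) / 2)
np.fill_diagonal(semantic, 0.0)

r = matrix_correlation(acoustic, semantic)
print(f"Pearson's r between distance matrices: {r:.2f}")
```

Only one triangle is used because a distance matrix is symmetric with a zero diagonal, so including both halves would double-count every pair of stimuli. Optimizing the weights, as the notes mention, would amount to searching for the `weights` vector that maximizes `r` for a given target matrix.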
Fig. 2 Acoustic models for classifying vocalizations based on sound names chosen by English, Swedish, and Russian participants. Shaded areas show the acoustic class predicted by a multinomial regression model using two first principal components of 12 acoustic features (see Table 1). Small labels show the position of individual stimuli and their call type
Fig. 1 Top panel: semantic space representing naming distinctions in English, Swedish, and Russian. Text labels are positioned in prototypicality-adjusted cluster centroids. Bottom panel: cladograms of the major call types. Affinity propagation clustering with q selected manually (0.5 for English, 0.3 for Swedish, 0.45 for Russian). DK don’t know
Fig. 4 The probability of expressing different levels of certainty in the chosen sound names and emotion names for all three language groups combined. Median of the posterior distribution and 95% CI
Fig. 5 Model fit as a function of its dimensionality for the triad classification task. Shown: Pearson’s correlation with distance matrices from Experiment 1 based on naming the sound or emotion and normalized negative WAIC (larger is better)
Fig. 6 Three-dimensional configuration of stimuli in the triad task. Each sound is labeled and color-coded by its most common name in Experiment 1 (cf. Fig. 1). Prototypicality-adjusted centroids are shown in large bold font. Origin lies in the center of gravity of each cloud. DK don’t know