Beáta Korcsok, Tamás Faragó, Bence Ferdinandy, Ádám Miklósi, Péter Korondi, Márta Gácsi.
Abstract
Emotionally expressive non-verbal vocalizations can play a major role in human-robot interactions. Humans can assess the intensity and emotional valence of animal vocalizations based on simple acoustic features such as call length and fundamental frequency. These simple encoding rules have been suggested to be general across terrestrial vertebrates. To test the degree of this generalizability, we synthesized a set of artificial sounds by systematically varying call length and fundamental frequency, and examined how humans attribute emotional valence and intensity to them. Starting from sine wave sounds, we generated sound samples in seven categories of increasing complexity by incorporating different characteristics of animal vocalizations. We used an online questionnaire to measure the perceived emotional valence and intensity of the sounds within a two-dimensional model of emotions. The results show that sounds with low fundamental frequency and shorter call lengths were rated as more positive in valence, and that samples with high fundamental frequency were rated as more intense across all categories, regardless of sound complexity. We conclude that applying the basic rules of vocal emotion encoding can be a good starting point for developing novel non-verbal vocalizations for artificial agents.
Year: 2020 PMID: 32341387 PMCID: PMC7184580 DOI: 10.1038/s41598-020-63504-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The categories of the artificial sounds across three levels of complexity. In each category, the basis of the sound (sine wave or pulse train) is followed by the changed parameters in parentheses. Level 1: category 1, simple sine wave. Level 2: category 2, pulse train; category 3, sine wave sounds with pitch contour down; category 4, sine wave sounds with pitch contour up; category 5, variable sine wave. Level 3: category 6, complex pulse train sounds with pitch contour down; category 7, complex pulse train sounds with pitch contour up.
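The sine-wave categories differ only in how a single call is shaped. As a minimal sketch (pure Python; the function name `sine_call`, the 44.1 kHz sampling rate and the linear shape of the pitch contour are illustrative assumptions, not the authors' synthesis code), one call with an optional upward or downward contour can be generated by ramping the instantaneous frequency and accumulating phase. Pulse-train and formant-filtered calls (categories 2, 6 and 7) are not covered here.

```python
import math


def sine_call(f0_hz, call_len_s, sr=44100, contour=0.0):
    """Synthesise one call as a list of float samples in [-1, 1].

    contour is the relative frequency change from call onset to offset,
    e.g. -0.1 for a 10% downward contour (like 3.pitch_d), +0.1 for an
    upward one (like 4.pitch_u) and 0.0 for a flat call (like 1.sin).
    Hypothetical helper for illustration only.
    """
    n = int(round(call_len_s * sr))
    f_end = f0_hz * (1.0 + contour)
    samples = []
    phase = 0.0
    for i in range(n):
        # Linear frequency ramp from f0_hz at onset to f_end at offset.
        f = f0_hz + (f_end - f0_hz) * i / max(n - 1, 1)
        samples.append(math.sin(phase))
        # Accumulate phase so the waveform stays continuous as f changes.
        phase += 2.0 * math.pi * f / sr
    return samples
```

For example, `sine_call(440, 0.07)` yields a flat 70 ms call, while `sine_call(440, 0.46, contour=-0.1)` yields a 460 ms call gliding down by 10%. Integrating frequency into phase, rather than computing `sin(2*pi*f*t)` with a changing `f`, avoids audible clicks as the pitch moves.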
Parameters of sound samples.
| Parameters | Value or range (across all samples) | Variance in categories 1.sin; 2.pulse; 3.pitch_d; 4.pitch_u | Variance in categories 5.var_sin; 6.comp_d; 7.comp_u | Reference |
|---|---|---|---|---|
| Fundamental frequency (f0) | 65–1365 Hz | | uniformly distributed random value, ±5% of f0 | ~50–1600 Hz […] |
| Total length (call length + interval length) | ~2 s (+ silence until 3 s total duration) | | | 2 s […] |
| Call length | 0.07; 0.16; 0.46; 0.76; 1.06; 1.96 s | | uniformly distributed random value, ±25% of call length | 0.11–2 s […] |
| Intercall interval length | 0.2 s | uniformly distributed random value, ±25% of interval length | uniformly distributed random value, ±50% of interval length | 0.05–1.7 s […] |
| Pitch contour change in categories 3.pitch_d; 4.pitch_u; 6.comp_d; 7.comp_u | uniformly distributed random value, ±10% of f0 | | | […] |
| Vocal tract length in categories 6.comp_d; 7.comp_u | 20 cm | | | Modelling a medium-sized dog […] |
| Number of formants in categories 6.comp_d; 7.comp_u | 10 | | | |
| First formant (f1) in categories 6.comp_d; 7.comp_u | 550 Hz | | | |
Categories: 1.sin: Simple sine wave; 2.pulse: Pulse train; 3.pitch_d: Sine wave sounds with pitch contour down; 4.pitch_u: Sine wave sounds with pitch contour up; 5.var_sin: Variable sine wave; 6.comp_d: Complex pulse train sounds with pitch contour down; 7.comp_u: Complex pulse train sounds with pitch contour up. The sounds of categories 5.var_sin, 6.comp_d and 7.comp_u were given more variance than those of the other categories. Pitch contour changes were present only in categories 3.pitch_d, 4.pitch_u, 6.comp_d and 7.comp_u, and formants were modelled only in categories 6.comp_d and 7.comp_u.
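The "uniformly distributed random value, ±x%" entries describe multiplicative jitter around a nominal value. A minimal sketch of how one sound sample's call sequence could be planned under that reading is below. The helper names `jitter` and `call_schedule` are hypothetical, the jitter percentages are plain parameters (which categories use which percentage is given by the variance columns of the parameter table), and the stopping rule (add calls while the next one still ends within 2 s, always keeping the first call) is an assumption, not the authors' documented procedure.

```python
import random


def jitter(value, rel, rng):
    """Uniform random value within +/- rel of value (rel=0.25 means +/-25%)."""
    return value * rng.uniform(1.0 - rel, 1.0 + rel)


def call_schedule(f0_hz, call_len_s, interval_s=0.2, f0_rel=0.0,
                  cl_rel=0.0, int_rel=0.25, max_len_s=2.0, seed=None):
    """Plan one sound sample as a list of (onset_s, duration_s, f0_hz) tuples.

    Calls separated by jittered intercall intervals are added while the next
    call still ends within max_len_s (the first call is always kept); the
    rest of the 3 s file would be left silent. Hypothetical helper.
    """
    rng = random.Random(seed)
    calls = []
    t = 0.0
    while t < max_len_s:
        cl = jitter(call_len_s, cl_rel, rng)
        if calls and t + cl > max_len_s:
            break  # the next call would overrun the ~2 s sound portion
        calls.append((t, cl, jitter(f0_hz, f0_rel, rng)))
        t += cl + jitter(interval_s, int_rel, rng)
    return calls
```

With the default arguments only the interval is jittered, so `call_schedule(440, 0.16)` produces several identical 160 ms calls, whereas `call_schedule(440, 0.16, f0_rel=0.05, cl_rel=0.25, int_rel=0.5)` varies pitch, call length and spacing together. The longest call length (1.96 s) leaves room for a single call.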
Figure 2. The intensity (scale from 0 to 100) and valence (scale from −50 to 50) axes of the questionnaire. Image first published in [25].
The linear mixed models used for statistical analysis. Cat: category, f0: fundamental frequency, cl: call length, age: age of the participant, lang: language of the query (English or Hungarian), dog: participants’ dog ownership status, gender: gender of the participant, loud: loudness of sound samples, testid: participant ID, soundid: ID of the sound samples.
| Model | Formula | Fixed effects | Random effects |
|---|---|---|---|
| Intensity | intensity ~ cat + f0 + cl + age + lang + dog + gender + cat:f0 + cat:cl + lang:f0 + lang:cl + cat:lang + loud + cat:loud + (1\|testid) + (1\|soundid) | cat, f0, cl, age, lang, dog, gender, loud | testid, soundid |
| Valence | valence ~ cat + f0 + cl + age + lang + dog + gender + cat:f0 + cat:cl + lang:f0 + lang:cl + cat:lang + loud + cat:loud + (1\|testid) + (1\|soundid) | cat, f0, cl, age, lang, dog, gender, loud | testid, soundid |
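Read with the usual lme4-style convention (an assumption about notation, not a restatement of the authors' exact parameterisation or factor coding), the `(1|testid) + (1|soundid)` terms add one random intercept per participant and one per sound, so the rating of sound j by participant i can be sketched as:

$$
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + w_j + \varepsilon_{ij}, \qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
w_j \sim \mathcal{N}(0, \sigma_w^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

Here $\mathbf{x}_{ij}$ holds the intercept and the fixed-effect terms (cat, f0, cl, age, lang, dog, gender, loud and the listed interactions such as cat:f0), $u_i$ is the participant (testid) intercept and $w_j$ the sound (soundid) intercept. Because each participant presumably rated many sounds and each sound was rated by many participants, the two random intercepts are crossed rather than nested.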
Comparison of predicted and actual ratings of valence and intensity. Predictive models are based on a Linear Mixed Effects Model of category 1 (Simple sine wave) sounds. 1.sin: Simple sine wave; 2.pulse: Pulse train; 3.pitch_d: Sine wave sounds with pitch contour down; 4.pitch_u: Sine wave sounds with pitch contour up; 5.var_sin: Variable sine wave; 6.comp_d: Complex pulse train sounds with pitch contour down; 7.comp_u: Complex pulse train sounds with pitch contour up, f0: fundamental frequency, cl: call length.
| Predictive model (based on 1.sin) | Intensity Est. | Intensity t | Intensity p | Valence Est. | Valence t | Valence p |
|---|---|---|---|---|---|---|
| Intercept | 41.0812 | 31.8 | <2.2e-16 | −8.2916 | −7.121 | <0.001 |
| f0 | 9.5927 | 16.2 | <2.2e-16 | −5.4035 | −7.945 | <0.001 |
| cl | | | | −3.2561 | −4.777 | <0.001 |

| Category | Intensity r | Intensity df | Intensity p value | Intensity t | Valence r | Valence df | Valence p value | Valence t |
|---|---|---|---|---|---|---|---|---|
| 1.sin | 0.71 | 1670 | <2.2e-16 | 41.268 | 0.70 | 1670 | <2.2e-16 | 40.088 |
| 2.pulse | 0.49 | 1690 | <2.2e-16 | 23.345 | 0.46 | 1690 | <2.2e-16 | 21.223 |
| 3.pitch_d | 0.55 | 1672 | <2.2e-16 | 26.595 | 0.46 | 1672 | <2.2e-16 | 21.203 |
| 4.pitch_u | 0.53 | 1657 | <2.2e-16 | 25.225 | 0.53 | 1657 | <2.2e-16 | 25.191 |
| 5.var_sin | 0.60 | 1681 | <2.2e-16 | 30.587 | 0.58 | 1681 | <2.2e-16 | 29.015 |
| 6.comp_d | 0.53 | 1675 | <2.2e-16 | 25.465 | 0.47 | 1675 | <2.2e-16 | 21.986 |
| 7.comp_u | 0.50 | 1685 | <2.2e-16 | 23.493 | 0.48 | 1685 | <2.2e-16 | 22.491 |
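The comparison applies the category-1 coefficients to every category and correlates predictions with actual ratings. A minimal sketch of that pipeline is below. It assumes, as the table layout suggests, that the cl coefficient belongs to the valence model only, and it assumes f0 and cl are supplied in whatever transformed or scaled units the authors used when fitting, which the table does not state. The names `predict`, `pearson_r` and `t_from_r` are illustrative.

```python
import math

# Fixed-effect estimates of the 1.sin predictive models, copied from the table.
# ASSUMPTION: f0 and cl must be given in the (unstated) units used for fitting.
INTENSITY = {"intercept": 41.0812, "f0": 9.5927}
VALENCE = {"intercept": -8.2916, "f0": -5.4035, "cl": -3.2561}


def predict(coefs, f0, cl):
    """Linear prediction from fixed effects only (random effects ignored)."""
    return coefs["intercept"] + coefs.get("f0", 0.0) * f0 + coefs.get("cl", 0.0) * cl


def pearson_r(xs, ys):
    """Pearson correlation between predicted and actual ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, df):
    """t statistic for testing a Pearson r, with df = n - 2."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)
```

As a consistency check, `t_from_r(0.71, 1670)` gives about 41.2, close to the reported 41.268 (the gap comes from r being rounded to two decimals), and df = 1670 implies 1672 paired ratings for 1.sin.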
Figure 3. (a) The interaction of f0 and sound category on the ratings of intensity. Colouring of the dots shows the call length. (b) The interaction of call length and sound category on the ratings of intensity. Colouring of the dots shows the fundamental frequency. (c) The effect of the participants’ age on the ratings of intensity. Categories in (a) and (b): 1.sin: Simple sine wave; 2.pulse: Pulse train; 3.pitch_d: Sine wave sounds with pitch contour down; 4.pitch_u: Sine wave sounds with pitch contour up; 5.var_sin: Variable sine wave; 6.comp_d: Complex pulse train sounds with pitch contour down; 7.comp_u: Complex pulse train sounds with pitch contour up. The dots represent the mean intensity ratings of the sounds, and the grey shaded area around the regression line indicates the 95% confidence interval.
Results of the linear mixed model fit of the intensity ratings. Pr(>F): the p-value associated with the F statistic. Cat: category, f0: fundamental frequency, cl: call length, age: age of the participant, lang: language of the query, loud: loudness of sound samples.
| Term | Sum Sq | Mean Sq | NumDF | DenDF | F value | Pr(>F) | Signif. |
|---|---|---|---|---|---|---|---|
| age | 2274 | 2274.3 | 1 | 226.8 | 4.7418 | 0.030470 | * |
| cat:f0 | 47594 | 7932.3 | 6 | 545.7 | 16.5385 | <2.2e-16 | *** |
| cat:cl | 10459 | 1743.2 | 6 | 547.2 | 3.6345 | 0.001515 | ** |
| f0:lang | 4282 | 4282.1 | 1 | 11444.2 | 8.9279 | 0.002814 | ** |
| cl:lang | 29122 | 29122.1 | 1 | 11447.5 | 60.7184 | 7.154e-15 | *** |
| cat:lang | 26478 | 4413.0 | 6 | 11436.8 | 9.2009 | 4.462e-10 | *** |
| cat:loud | 20891 | 3481.9 | 6 | 558.9 | 7.2595 | 1.763e-07 | *** |
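The columns of the table are linked by simple arithmetic: Mean Sq = Sum Sq / NumDF, and in lmerTest-style ANOVA tables (an assumption about how this table was produced) F = Mean Sq / σ̂², where σ̂² is the residual variance of the fitted model. The sketch below, with illustrative names, recovers σ̂² from each row and reproduces R's default significance codes. For the "<2.2e-16" entries it stores 2.2e-16 as an upper bound.

```python
# Rows copied from the intensity table: term -> (Sum Sq, NumDF, F value, Pr(>F)).
# "<2.2e-16" is stored as its upper bound, 2.2e-16.
INTENSITY_ROWS = {
    "age": (2274, 1, 4.7418, 0.030470),
    "cat:f0": (47594, 6, 16.5385, 2.2e-16),
    "cat:cl": (10459, 6, 3.6345, 0.001515),
    "f0:lang": (4282, 1, 8.9279, 0.002814),
    "cl:lang": (29122, 1, 60.7184, 7.154e-15),
    "cat:lang": (26478, 6, 9.2009, 4.462e-10),
    "cat:loud": (20891, 6, 7.2595, 1.763e-07),
}


def mean_sq(sum_sq, num_df):
    """Mean square of a term: Sum Sq divided by its numerator df."""
    return sum_sq / num_df


def implied_residual_var(sum_sq, num_df, f_value):
    """Residual variance implied by F = Mean Sq / sigma^2."""
    return mean_sq(sum_sq, num_df) / f_value


def signif_code(p):
    """R's default significance codes."""
    for cutoff, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p < cutoff:
            return code
    return ""
```

Every intensity row implies a residual variance of roughly 479.6, and the same check on the valence table gives roughly 383.5, which is consistent with each table summarising a single model fit.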
Figure 4. (a) The effect of fundamental frequency on the ratings of valence. Colouring of the dots shows the call length. (b) The interaction of call length and sound category on the ratings of valence. Colouring of the dots shows the fundamental frequency. (c) The effect of the participants’ age on the ratings of valence. Categories in (b): 1.sin: Simple sine wave; 2.pulse: Pulse train; 3.pitch_d: Sine wave sounds with pitch contour down; 4.pitch_u: Sine wave sounds with pitch contour up; 5.var_sin: Variable sine wave; 6.comp_d: Complex pulse train sounds with pitch contour down; 7.comp_u: Complex pulse train sounds with pitch contour up. The dots represent the mean valence ratings of the sounds, and the grey shaded area around the regression line indicates the 95% confidence interval.
Results of the linear mixed model fit of the valence ratings. Pr(>F): the p-value associated with the F statistic. Cat: category, f0: fundamental frequency, cl: call length, age: age of the participant, lang: language of the query, loud: loudness of sound samples.
| Term | Sum Sq | Mean Sq | NumDF | DenDF | F value | Pr(>F) | Signif. |
|---|---|---|---|---|---|---|---|
| f0 | 39154 | 39154 | 1 | 567.0 | 102.0804 | <2.2e-16 | *** |
| age | 3531 | 3531 | 1 | 220.0 | 9.2069 | 0.002701 | ** |
| cat:cl | 15431 | 2572 | 6 | 569.3 | 6.7053 | 7.127e-07 | *** |
| cat:lang | 21056 | 3509 | 6 | 11348.5 | 9.1494 | 5.150e-10 | *** |
| cat:loud | 31721 | 5287 | 6 | 574.7 | 13.7834 | 1.155e-14 | *** |