Patrik N. Juslin, Petri Laukka, Tanja Bänziger
Abstract
It has been the subject of much debate in the study of vocal expression of emotions whether posed expressions (e.g., actor portrayals) are different from spontaneous expressions. In the present investigation, we assembled a new database consisting of 1877 voice clips from 23 datasets, and used it to systematically compare spontaneous and posed expressions across 3 experiments. Results showed that (a) spontaneous expressions were generally rated as more genuinely emotional than were posed expressions, even when controlling for differences in emotion intensity, (b) there were differences between the two stimulus types with regard to their acoustic characteristics, and (c) spontaneous expressions with a high emotion intensity conveyed discrete emotions to listeners to a similar degree as has previously been found for posed expressions, supporting a dose-response relationship between intensity of expression and discreteness in perceived emotions. Our conclusion is that there are reliable differences between spontaneous and posed expressions, though not necessarily in the ways commonly assumed. Implications for emotion theories and the use of emotion portrayals in studies of vocal expression are discussed.
Keywords: Communication; Emotion; Expression; Nonverbal; Voice
Year: 2017 PMID: 29497220 PMCID: PMC5816122 DOI: 10.1007/s10919-017-0268-x
Source DB: PubMed Journal: J Nonverbal Behav ISSN: 0191-5886
Fig. 1 Plutchik’s ‘cone model’ of emotions
From Plutchik (1994)
Descriptive statistics (mean, standard deviation, and 95% confidence intervals) for intensity, valence, verbal cues, and sound quality of selected clips in Study 1
| Condition | Intensity | Valence | Verbal cues | Sound quality |
|---|---|---|---|---|
| *Low intensity* |  |  |  |  |
| Posed | 1.59 (0.28) | −0.71 (0.70) | 0.73 (1.00) | 2.75 (0.60) |
|  | [1.46, 1.72] | [−1.03, −0.38] | [0.26, 1.20] | [2.47, 3.03] |
| Spontaneous | 1.48 (0.26) | −0.31 (0.78) | 0.65 (0.85) | 2.95 (0.56) |
|  | [1.36, 1.61] | [−0.67, 0.06] | [0.25, 1.05] | [2.69, 3.21] |
| *Medium intensity* |  |  |  |  |
| Posed | 2.38 (0.23) | −0.67 (0.85) | 0.46 (0.72) | 2.87 (0.45) |
|  | [2.27, 2.48] | [−1.06, −0.27] | [0.12, 0.79] | [2.66, 3.08] |
| Spontaneous | 2.44 (0.25) | −0.56 (1.11) | 1.29 (0.89) | 2.76 (0.56) |
|  | [2.32, 2.55] | [−1.08, −0.04] | [0.87, 1.71] | [2.50, 3.03] |
| *High intensity* |  |  |  |  |
| Posed | 3.32 (0.22) | −1.15 (1.28) | 0.48 (0.45) | 3.31 (0.27) |
|  | [3.21, 3.44] | [−1.81, −0.49] | [0.25, 0.71] | [3.18, 3.45] |
| Spontaneous | 3.36 (0.22) | −1.63 (0.61) | 1.52 (1.06) | 2.50 (0.44) |
|  | [3.26, 3.46] | [−1.91, −1.35] | [1.03, 2.02] | [2.29, 2.71] |
| Grand mean | 2.40 (0.77) | −0.83 (0.99) | 0.87 (0.94) | 2.85 (0.54) |
|  | [2.26, 2.55] | [−1.01, −0.65] | [0.69, 1.04] | [2.75, 2.94] |
Fig. 2 Means and 95% confidence intervals of listeners’ ratings (0–4) of the extent to which it sounds as if the speaker is experiencing a genuine emotion, for spontaneous and posed clips, respectively, as a function of emotion intensity level in Study 1
Summary of selected acoustic cues in Study 2
| Acoustic cue | Description | Factor loading |
|---|---|---|
| *Frequency cues* |  |  |
| F0M | Mean fundamental frequency (F0) on a semitone frequency scale | Factor 2: 0.94 |
| F0PercRange | Range of the 20th to the 80th percentile of F0 | Factor 6: 0.92 |
| F0SlopeRise | Mean slope of signal parts with rising F0 | Factor 8: 0.89 |
| F0SlopeFall | Mean slope of signal parts with falling F0 | Factor 9: 0.84 |
| F1M | Mean frequency of the first formant (F1) | Factor 11: 0.75 |
| F1Bandwidth | Mean bandwidth of the first formant (F1) | Factor 3: −0.86 |
| Jitter | Average deviation of individual consecutive F0 period lengths | (Factor 6: 0.64) |
| *Intensity cues* |  |  |
| IntPercRange | Range of the 20th to the 80th percentile of voice intensity | Factor 5: 0.90 |
| *Spectral balance cues* |  |  |
| Alpha Ratio | Ratio of the summed energy from 50 to 1000 Hz and from 1000 to 5000 Hz | Factor 4: 0.73 |
| H1-A3 | Ratio of the energy of the first F0 harmonic to that of the highest harmonic in the third formant range | Factor 13: −0.71 |
| *Temporal cues* |  |  |
| VoicedSegPerSec | Number of continuous voiced regions per second (pseudo syllable rate) | (Factor 7: 0.61) |
| VoicedSegM | Mean length of continuously voiced regions | Factor 7: −0.86 |
| UnvoicedSegM | Mean length of unvoiced regions (approximating pause duration) | Factor 1: −0.88 |
For a more comprehensive description of the acoustic cues, including algorithms used, see Eyben et al. (2013) and Eyben et al. (2016)
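The frequency and temporal cues above are straightforward to compute once a frame-wise F0 contour is available. As an illustration only (the study itself extracted features with the openSMILE-based configurations described by Eyben et al.), here is a minimal Python sketch of three cues: F0M (mean F0 on a semitone scale), F0PercRange (range of the 20th to 80th percentile of F0), and VoicedSegPerSec (pseudo syllable rate). The 100 Hz frame rate, the 27.5 Hz semitone reference, and the convention that unvoiced frames are coded as 0 are assumptions of the sketch, not details taken from the paper.

```python
import math
from statistics import mean, quantiles

def semitones(f_hz, ref_hz=27.5):
    # Convert Hz to semitones relative to an assumed 27.5 Hz reference (A0).
    return 12.0 * math.log2(f_hz / ref_hz)

def f0_stats(f0_contour_hz, frame_rate_hz=100.0):
    """f0_contour_hz: per-frame F0 in Hz, with 0.0 marking unvoiced frames.

    Returns (F0M, F0PercRange, VoicedSegPerSec), loosely following the
    cue definitions in Table 2 (illustrative, not the openSMILE algorithms).
    """
    voiced = [semitones(f) for f in f0_contour_hz if f > 0]
    f0m = mean(voiced)  # F0M: mean F0 on the semitone scale
    # F0PercRange: quantiles(n=10) returns the 9 decile cut points,
    # so index 1 is the 20th percentile and index 7 the 80th.
    deciles = quantiles(voiced, n=10)
    f0_perc_range = deciles[7] - deciles[1]
    # VoicedSegPerSec: count runs of consecutive voiced frames.
    n_segments, prev_voiced = 0, False
    for f in f0_contour_hz:
        if f > 0 and not prev_voiced:
            n_segments += 1
        prev_voiced = f > 0
    duration_s = len(f0_contour_hz) / frame_rate_hz
    return f0m, f0_perc_range, n_segments / duration_s
```

For example, a one-second contour that holds 110 Hz, falls silent, and then holds 220 Hz contains two voiced segments, so the pseudo syllable rate is 2 per second, and F0M falls between the two plateaus on the semitone scale.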
Robust analysis of acoustic cue variability in low and medium intensity posed and spontaneous vocal expressions in Study 2
| Acoustic cue | Emotion F* | p | Stimulus type F* | p | Interaction F* | p | df1ᵃ,ᵇ | df2 | Trends for the main effects of stimulus type and emotion |
|---|---|---|---|---|---|---|---|---|---|
| F0M |  |  |  |  | 1.86 | .149 | 2.65 | 77.99 | P > S; Hap, Fea, Ang > Sad |
| F0PercRange |  |  | 1.35 | .999 |  |  | 2.70 | 79.11 | Hap, Ang > Sad |
| F0SlopeRise | 1.90 | .139 | 0.34 | .562 | 0.47 | .688 | 2.80 | 84.41 | na |
| F0SlopeFall | 1.55 | .217 | 1.38 | .245 | 1.52 | .223 | 2.52 | 54.61 | na |
| F1M | 2.50 | .077 | 0.08 | .775 | 1.59 | .206 | 2.50 | 71.43 | na |
| F1Bandwidth |  |  | 0.56 | .457 | 0.46 | .653 | 2.17 | 41.11 | Sad > Ang, Hap, Fea |
| Jitter | 2.01 | .123 | 0.38 | .539 | 1.27 | .290 | 2.77 | 86.02 | na |
| IntPercRange |  |  |  |  |  |  | 2.62 | 66.87 | P > S; Hap > Sad |
| Alpha Ratio |  |  | 2.74 | .101 | 1.51 | .219 | 2.79 | 100.28 | Ang > Sad |
| H1-A3 |  |  |  |  | 0.62 | .592 | 2.73 | 72.95 | P > S; Sad > Ang, Hap |
| VoicedSegPerSec |  |  | 0.39 | .533 |  |  | 2.62 | 59.75 | Ang > Sad, Fea |
| VoicedSegM | 2.67 | .059 |  |  |  |  | 2.73 | 70.97 | P > S; na |
| UnvoicedSegM |  |  | 0.11 | .736 |  |  | 2.55 | 58.08 | Sad > Ang |
Significant effects from ANOVA-type analyses are marked in bold. Multiple comparisons assessing main trends for emotion were conducted using robust rank-based Tukey-type nonparametric contrasts (ps < .05)
F* = the ANOVA-type statistic, P = posed clips, S = spontaneous clips, Ang = anger, Fea = fear, Hap = happiness, Sad = sadness. Abbreviations of acoustic cues are explained in Table 2
ᵃ df1 = 1 for the main effect of stimulus type
ᵇ Reported df values are Box-corrected; see Brunner et al. (1997)
Descriptive statistics for acoustic cues in low intensity voice clips of Study 2
| Acoustic cue | Statistic | Anger P | Anger S | Fear P | Fear S | Happiness P | Happiness S | Sadness P | Sadness S |
|---|---|---|---|---|---|---|---|---|---|
| F0M | Q | 0.49 | 0.51 | 0.61 | 0.52 | 0.70 | 0.48 | 0.38 | 0.29 |
|  | M | −0.30 | −0.24 | 0.11 | −0.31 | 0.36 | −0.38 | −0.68 | −0.99 |
|  | SD | 0.79 | 0.82 | 1.01 | 1.07 | 0.65 | 0.82 | 0.92 | 0.80 |
| F0PercRange | Q | 0.58 | 0.58 | 0.44 | 0.55 |  |  | 0.31 | 0.40 |
|  | M | −0.09 | −0.01 | −0.42 | −0.06 | 0.35 | −0.37 | −0.78 | −0.40 |
|  | SD | 0.64 | 0.92 | 0.79 | 1.25 | 0.95 | 0.98 | 0.69 | 1.08 |
| F0SlopeRise | Q | 0.55 | 0.54 | 0.55 | 0.52 | 0.46 | 0.55 | 0.40 | 0.44 |
|  | M | 0.19 | 0.04 | −0.06 | 0.00 | −0.07 | 0.04 | −0.30 | −0.28 |
|  | SD | 1.11 | 0.94 | 0.91 | 1.06 | 0.90 | 0.89 | 1.03 | 0.86 |
| F0SlopeFall | Q | 0.51 | 0.59 | 0.45 | 0.57 | 0.56 | 0.45 | 0.37 | 0.47 |
|  | M | −0.09 | 0.29 | −0.23 | 0.01 | 0.06 | −0.41 | −0.51 | −0.10 |
|  | SD | 0.94 | 1.10 | 0.86 | 1.04 | 0.81 | 1.18 | 0.84 | 0.96 |
| F1M | Q | 0.44 | 0.55 | 0.45 | 0.36 | 0.51 | 0.59 | 0.50 | 0.44 |
|  | M | −0.35 | −0.01 | −0.36 | −0.65 | −0.16 | −0.02 | −0.28 | −0.41 |
|  | SD | 0.82 | 1.03 | 0.72 | 0.89 | 0.92 | 0.91 | 0.57 | 0.70 |
| F1Bandwidth | Q | 0.44 | 0.51 | 0.40 | 0.35 | 0.43 | 0.49 | 0.60 | 0.65 |
|  | M | −0.20 | 0.10 | −0.31 | −0.51 | −0.20 | 0.00 | 0.30 | 0.64 |
|  | SD | 0.55 | 1.09 | 0.75 | 1.19 | 0.92 | 1.49 | 0.90 | 0.99 |
| Jitter | Q | 0.56 | 0.53 | 0.50 | 0.53 | 0.57 | 0.52 | 0.35 | 0.50 |
|  | M | −0.02 | 0.02 | −0.15 | 0.06 | 0.20 | 0.01 | −0.52 | −0.13 |
|  | SD | 0.83 | 0.99 | 0.93 | 1.24 | 1.01 | 1.06 | 0.96 | 1.04 |
| IntPercRange | Q | 0.43 | 0.52 |  |  |  |  | 0.49 | 0.41 |
|  | M | −0.38 | −0.17 | −0.14 | −0.92 | 0.47 | −0.19 | −0.36 | −0.54 |
|  | SD | 0.97 | 0.93 | 1.03 | 0.86 | 1.16 | 1.15 | 0.55 | 0.69 |
| Alpha Ratio | Q | 0.53 | 0.50 | 0.54 | 0.66 | 0.42 | 0.59 | 0.45 | 0.45 |
|  | M | −0.11 | −0.08 | −0.23 | 0.39 | −0.56 | 0.26 | −0.57 | −0.24 |
|  | SD | 0.91 | 0.88 | 1.14 | 1.09 | 1.27 | 0.81 | 0.95 | 0.69 |
| H1-A3 | Q | 0.52 | 0.42 | 0.51 | 0.51 | 0.45 | 0.35 | 0.70 | 0.55 |
|  | M | 0.36 | 0.08 | 0.32 | 0.23 | 0.06 | −0.35 | 0.90 | 0.44 |
|  | SD | 0.86 | 0.88 | 0.90 | 0.95 | 0.87 | 1.03 | 0.93 | 0.90 |
| VoicedSegPerSec | Q |  |  | 0.43 | 0.30 | 0.46 | 0.56 | 0.48 | 0.41 |
|  | M | −0.29 | 0.29 | −0.36 | −0.82 | −0.23 | 0.03 | −0.22 | −0.34 |
|  | SD | 0.99 | 0.88 | 0.80 | 0.86 | 0.87 | 1.06 | 0.92 | 1.00 |
| VoicedSegM | Q | 0.57 | 0.46 |  |  |  |  | 0.47 | 0.52 |
|  | M | 0.02 | −0.19 | −0.14 | −0.92 | 0.37 | −0.29 | −0.24 | 0.02 |
|  | SD | 0.90 | 0.79 | 0.74 | 1.05 | 1.00 | 1.03 | 1.10 | 0.99 |
| UnvoicedSegM | Q | 0.53 | 0.37 | 0.54 | 0.71 | 0.46 | 0.45 | 0.59 | 0.64 |
|  | M | 0.13 | −0.06 | 0.30 | 1.05 | 0.19 | 0.15 | 0.60 | 0.60 |
|  | SD | 0.87 | 0.94 | 0.93 | 1.48 | 1.05 | 1.28 | 0.87 | 1.23 |
Q = relative effects, P = posed clips, S = spontaneous clips. Significant differences in values between posed and spontaneous clips (posthoc pairwise Bonferroni corrected Brunner Munzel tests, p < .05) are marked in bold type. Cue abbreviations are explained in Table 2
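The Q values in these tables are relative effects: rank-based analogues of group means, where Q near 0.5 indicates that a group's values are neither systematically higher nor lower than the pooled sample. As a hedged sketch of the general estimator (not the authors' exact implementation, which followed the robust rank-based procedures of Brunner et al. 1997), the relative effect of group *i* can be computed as (mean pooled midrank of group *i* − 0.5) / N:

```python
def midranks(values):
    # Assign ranks 1..n; tied values receive the average (mid) rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def relative_effects(groups):
    """groups: dict mapping group name -> list of values.

    Returns dict mapping group name -> Q, where
    Q_i = (mean pooled midrank of group i - 0.5) / N.
    Illustrative sketch of the rank-based relative-effect estimator.
    """
    pooled, labels = [], []
    for name, vals in groups.items():
        pooled.extend(vals)
        labels.extend([name] * len(vals))
    n_total = len(pooled)
    r = midranks(pooled)
    out = {}
    for name in groups:
        ri = [r[k] for k in range(n_total) if labels[k] == name]
        out[name] = (sum(ri) / len(ri) - 0.5) / n_total
    return out
```

A group whose values stochastically dominate the pooled sample gets Q above 0.5, and two identically distributed groups both get Q of 0.5; ties are handled via midranks.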
Descriptive statistics for acoustic cues in medium intensity voice clips of Study 2
| Acoustic cue | Statistic | Anger P | Anger S | Happiness P | Happiness S | Sadness P | Sadness S |
|---|---|---|---|---|---|---|---|
| F0M | Q | 0.54 | 0.54 | 0.65 | 0.48 | 0.37 | 0.20 |
|  | M | 0.65 | 0.98 | 0.06 | 0.67 | 0.40 | −0.46 |
|  | SD | 0.90 | 0.81 | 1.23 | 0.69 | 0.80 | 0.68 |
| F0PercRange | Q | 0.55 | 0.44 | 0.54 | 0.45 | 0.39 | 0.55 |
|  | M | 0.57 | 0.49 | 0.13 | 0.18 | 0.23 | 0.51 |
|  | SD | 1.01 | 0.99 | 1.47 | 0.81 | 0.74 | 1.00 |
| F0SlopeRise | Q | 0.57 | 0.48 |  |  | 0.56 | 0.60 |
|  | M | 0.29 | −0.44 | 0.23 | 0.12 | 0.41 | 0.53 |
|  | SD | 1.24 | 0.72 | 1.10 | 1.08 | 1.06 | 1.01 |
| F0SlopeFall | Q | 0.45 | 0.57 | 0.44 | 0.56 | 0.29 | 0.65 |
|  | M | 0.06 | 0.02 | −0.61 | 0.32 | 0.47 | 0.47 |
|  | SD | 0.72 | 1.19 | 0.84 | 0.93 | 1.06 | 0.90 |
| F1M | Q | 0.56 | 0.57 | 0.59 | 0.40 | 0.39 | 0.26 |
|  | M | 0.64 | 0.66 | −0.09 | 0.68 | 0.05 | −0.52 |
|  | SD | 1.20 | 0.94 | 0.80 | 1.10 | 1.15 | 0.80 |
| F1Bandwidth | Q | 0.46 | 0.58 | 0.51 | 0.36 | 0.55 | 0.46 |
|  | M | −0.15 | −0.06 | 0.09 | 0.18 | −0.55 | −0.17 |
|  | SD | 0.81 | 1.08 | 0.69 | 1.09 | 0.92 | 0.62 |
| Jitter | Q |  |  | 0.43 | 0.56 |  |  |
|  | M | 0.33 | −0.04 | −0.25 | −0.15 | 0.48 | 0.83 |
|  | SD | 0.87 | 1.00 | 1.24 | 0.76 | 1.01 | 1.02 |
| IntPercRange | Q | 0.59 | 0.53 |  |  | 0.54 | 0.43 |
|  | M | 0.49 | 0.50 | −0.42 | 0.60 | 0.27 | 0.27 |
|  | SD | 0.93 | 0.93 | 0.69 | 0.98 | 1.06 | 0.71 |
| Alpha Ratio | Q | 0.46 | 0.58 | 0.54 | 0.44 | 0.45 | 0.43 |
|  | M | 0.07 | 0.53 | −0.04 | 0.64 | 0.29 | 0.20 |
|  | SD | 1.15 | 0.83 | 1.14 | 0.69 | 0.65 | 0.74 |
| H1-A3 | Q | 0.51 | 0.48 | 0.44 | 0.52 | 0.54 | 0.58 |
|  | M | −0.39 | −0.68 | −0.37 | −0.55 | −0.38 | −0.19 |
|  | SD | 0.91 | 0.93 | 0.77 | 0.94 | 0.77 | 0.94 |
| VoicedSegPerSec | Q | 0.54 | 0.54 | 0.52 | 0.55 | 0.31 | 0.43 |
|  | M | 0.42 | 0.14 | −0.44 | 0.45 | 0.26 | −0.02 |
|  | SD | 1.19 | 1.01 | 0.43 | 1.11 | 1.01 | 0.86 |
| VoicedSegM | Q | 0.48 | 0.49 | 0.51 | 0.52 | 0.44 | 0.57 |
|  | M | −0.01 | 0.29 | −0.03 | 0.22 | 0.27 | 0.52 |
|  | SD | 1.02 | 1.14 | 0.93 | 1.04 | 1.17 | 0.96 |
| UnvoicedSegM | Q | 0.50 | 0.45 | 0.46 | 0.42 | 0.76 | 0.54 |
|  | M | −0.42 | −0.45 | 0.40 | −0.45 | −0.59 | −0.30 |
|  | SD | 0.70 | 0.88 | 1.05 | 0.72 | 0.57 | 0.79 |
Q = relative effects, P = posed clips, S = spontaneous clips. Significant differences in values between posed and spontaneous clips (posthoc pairwise Bonferroni corrected Brunner Munzel tests, p < .05) are marked in bold type. Cue abbreviations are explained in Table 2
Fig. 3 Box-and-whisker diagrams for all significant Stimulus type × Emotion interactions in Study 2, for low and medium emotion intensity, respectively. P = posed clips, S = spontaneous clips, Ang = anger, Fea = fear, Hap = happiness, Sad = sadness. Values indicate z-scores. Cue abbreviations are explained in Table 2
Frequency distribution of perceived emotion categories in spontaneous vocal expressions with different levels of emotion intensity in Study 3
| Perceived emotion | Low | Medium | High |
|---|---|---|---|
| Anger | 0.11 | 0.25 | 0.45 |
| Fear | 0.06 | 0.08 | 0.11 |
| Sadness | 0.10 | 0.15 | 0.24 |
| Happiness | 0.08 | 0.16 | 0.03 |
| Disgust | 0.05 | 0.06 | 0.11 |
| Surprise | 0.08 | 0.07 | 0.02 |
| Boredom | 0.16 | 0.08 | 0.01 |
| Contentment | 0.24 | 0.07 | 0.01 |
| Other | 0.14 | 0.08 | 0.03 |
| Total | 1.00 | 1.00 | 1.00 |
Summary of the 23 datasets included in the database
| Database | Description of content | Language | Initial content | Number of selected files | Type |
|---|---|---|---|---|---|
| Berlin (Burkhardt et al.) | Portrayals of 6 emotions (anger, boredom, disgust, fear, joy, sadness) by 10 actors. 10 standard content sentences (same for all emotions) | German | 535 audio files | 119 (randomly selected) | Acted (professional) |
| eNTERFACE’05 (Martin et al.) | Portrayals of 6 emotions (anger, disgust, fear, happiness, sadness, surprise) by 42 actors. 5 standard content sentences (different for each emotion) | English (non-native speakers) | 1293 video files | 89 (randomly selected) | Acted (non-professional) |
| GEMEP (Bänziger et al.) | Portrayals of 18 emotions (admiration, amusement, anxiety, cold anger, contempt, despair, disgust, elation, hot anger, interest, panic fear, pleasure, pride, relief, sadness, shame, surprise, tenderness) by 10 actors. Free speech content (different for each portrayal) | French | 1463 video files | 83 (randomly selected) | Acted (professional) |
| Hawk et al. | Portrayals of 9 emotions (anger, contempt, disgust, embarrassment, fear, joy, pride, sadness, surprise) by 8 actors. One standard content sentence (same for all emotions) | English | 72 audio files | 0 (no content with full sentences) | Acted (acting students) |
| Juslin and Laukka | Portrayals of 5 emotions (anger, disgust, fear, happiness, sadness; each with 2 levels of emotion intensity) by 8 actors. 2 standard content sentences per language (same for all emotions) | English, Swedish | 160 audio files | 25 (randomly selected) | Acted (professional) |
| SAVEE (Haq and Jackson) | Portrayals of 6 emotions (anger, disgust, fear, happiness, sadness, surprise) by 4 actors. 15 standard content sentences per emotion (3 common for all emotions) | English | 360 video files | 90 (randomly selected) | Acted (non-professional) |
| SU Voices (Nordström and Laukka) | Portrayals of 13 emotions (anger, contempt, disgust, fear, happiness, interest, lust, pride, relief, sadness, serenity, shame, tenderness; each with 2 levels of intensity) by 14 actors. One standard content sentence (same for all emotions) | Swedish | 364 audio files | 27 (randomly selected) | Acted (professional and non-professional) |
| VENEC (Laukka et al.) | Portrayals of 18 emotions (affection, anger, amusement, contempt, disgust, distress, fear, guilt, happiness, interest, lust, negative surprise, positive surprise, pride, relief, sadness, serenity, shame; each with 3 levels of emotion intensity) by 20 actors. 2 standard content sentences (same for all emotions) | English | 1020 audio files | 102 (randomly selected) | Acted (professional) |
| Belfast Naturalistic (Douglas-Cowie et al.) | Recordings of human interactions. Annotated for arousal and valence | English | 22 video files | 20 (selected based on observer ratings) | Spontaneous (interviews) |
| BINED (Sneddon et al.) | Recordings of participants engaging in various emotion inducing tasks in laboratory settings. Annotated for arousal and valence. Only a subset of recordings contains speech | English | 28 long video files | 67 (manually extracted and selected based on observer ratings) | Spontaneous (emotion inducing laboratory tasks) |
| DIT IE (Cullen et al.) | Recordings of participants engaging in a computer game task. Annotated for arousal and valence | English | 160 segmented and pre-selected audio files | 54 (selected based on observer ratings) | Spontaneous (emotion inducing laboratory tasks) |
| E-Wiz (Aubergé et al.) | Recordings of participants engaging in a Wizard-of-Oz task. We used a pre-selected set of annotated stimuli, as described in Laukka et al. | French | 36 segmented and pre-selected audio files | 5 (selected based on observer ratings; most of the content consisted of single word utterances) | Spontaneous (emotion inducing laboratory tasks) |
| HUMAINE (Douglas-Cowie et al.) | Recordings from various sources, including reality television shows, emotion inducing lab tasks, and human interactions. Some of the content is annotated for arousal and valence | English, French | 47 long video files | 55 (manually extracted and selected based on observer ratings) | Spontaneous (interviews, emotion inducing laboratory tasks, television shows) |
| Lego (Kehrein) | Recordings of dialogues in which participants cooperatively attempt an impossible Lego assembly task. Annotated with regard to various affect labels | German | 5 long audio files | 118 (manually extracted and selected based on observer ratings) | Spontaneous (emotion inducing laboratory task) |
| Nimitek (Gnjatović and Rösner) | Audio recordings of participants engaging in a Wizard-of-Oz task. Partly annotated with regard to various affect labels | German | 10 long audio files | 113 (selected based on observer ratings) | Spontaneous (emotion inducing laboratory tasks) |
| SEMAINE (McKeown et al.) | Video recordings of human interactions. Annotated for arousal and valence | English | 140 long video files | 210 (manually extracted and selected based on observer ratings) | Spontaneous (interviews) |
| SSPNet Conflict (Kim et al.) | Recordings from televised political debates. Annotated with regard to conflict level | French | 1430 audio segments | 162 (manually extracted and selected based on observer ratings) | Spontaneous (television shows) |
| TIVAC (Juslin and Laukka) | Recordings of emotional speech from various sources available online | English, Swedish | 84 segmented and pre-selected audio files | 84 (selected based on observer ratings) | Spontaneous (documentaries, television shows, interviews, etc.) |
| TNO Gaming (Truong et al.) | Recordings of persons engaging in a computer game task. Annotated for self-report and observer ratings of arousal and valence | Dutch | 2400 segmented audio files | 49 (selected based on self-report and observer ratings) | Spontaneous (emotion inducing laboratory task) |
| Vera am Mittag (Grimm et al.) | Recordings from a television talk show. Annotated for perceived arousal, valence and dominance levels | German | 947 video files | 53 (selected based on observer ratings) | Spontaneous (television shows) |
| With and Kaiser | Recordings from clinical interviews. Annotated with regard to perceived affect labels | French | 202 segmented and pre-selected video files | 77 (selected based on observer ratings) | Spontaneous (interviews) |
| Voice provider (Neiberg et al.) | Recordings from call center human–computer interactions. We used a pre-selected set of annotated stimuli, as described in Laukka et al. | Swedish | 200 segmented and pre-selected audio files | 4 (selected based on observer ratings; most of the content consisted of single word utterances) | Spontaneous (call center data) |
| Emovox (Klasmeyer et al.) | Recordings of participants engaging in acting tasks and various emotion inducing lab tasks. Annotated by emotional self-ratings | English, French, German | Nearly 14,000 segmented audio files | Acted: 108 (randomly selected); Spontaneous: 163 (based on self-reported emotion ratings) | Acted (non-professional); Spontaneous (emotion inducing laboratory tasks) |
Distribution of randomly selected voice clips in Study 1 across datasets
| Database | Low intensity | Medium intensity | High intensity | Total |
|---|---|---|---|---|
| *Posed* |  |  |  |  |
| Berlin (Burkhardt et al.) | 1 | 4 | 5 | 10 |
| Emovox (Klasmeyer et al.) | 3 | 1 | 4 |  |
| eNTERFACE’05 (Martin et al.) | 4 | 3 | 7 |  |
| GEMEP (Bänziger et al.) | 6 | 4 | 10 | 20 |
| Juslin and Laukka | 1 | 2 | 1 | 4 |
| SAVEE (Haq and Jackson) | 3 | 6 | 9 |  |
| SU Voices (Nordström and Laukka) | 1 | 1 |  |  |
| VENEC (Laukka et al.) | 1 | 1 | 2 |  |
| *Spontaneous* |  |  |  |  |
| Belfast Naturalistic (Douglas-Cowie et al.) | 2 | 2 |  |  |
| DIT IE (Cullen et al.) | 1 | 1 | 1 | 3 |
| Emovox (Scherer) | 1 | 1 |  |  |
| HUMAINE (Douglas-Cowie et al.) | 1 | 1 |  |  |
| Lego (Kehrein) | 2 | 2 |  |  |
| Nimitek (Gnjatović and Rösner) | 1 | 3 | 1 | 5 |
| SEMAINE (McKeown et al.) | 5 | 4 | 9 |  |
| SSPNet Conflict (Kim et al.) | 5 | 2 | 7 |  |
| TIVAC_E (Juslin and Laukka) | 1 | 10 | 11 |  |
| TIVAC_S (Juslin and Laukka) | 5 | 6 | 11 |  |
| Vera am Mittag (Grimm et al.) | 4 | 2 | 6 |  |
| With and Kaiser | 2 | 2 |  |  |
Distribution of voice clips included in the acoustic comparisons of Study 2 across the datasets
| Database | Anger Low | Anger Medium | Fear Low | Happiness Low | Happiness Medium | Sadness Low | Sadness Medium |
|---|---|---|---|---|---|---|---|
| *Posed* |  |  |  |  |  |  |  |
| Total | 25 | 34 | 32 | 36 | 33 | 44 | 13 |
| Berlin (Burkhardt et al.) | 7 | 5 | 14 | 17 | 2 |  |  |
| Emovox (Klasmeyer et al.) | 7 | 1 | 6 | 3 | 1 | 4 | 1 |
| eNTERFACE’05 (Martin et al.) | 5 | 8 | 9 | 4 | 8 | 9 | 1 |
| GEMEP (Bänziger et al.) | 4 | 4 | 2 | 4 | 3 | 3 |  |
| Juslin and Laukka | 2 | 3 | 4 | 2 | 2 | 2 | 1 |
| SAVEE (Haq and Jackson) | 4 | 9 | 8 | 12 | 1 | 6 | 1 |
| SU Voices (Nordström and Laukka) | 1 | 1 | 1 | 2 | 1 | 1 |  |
| VENEC (Laukka et al.) | 2 | 1 | 4 | 6 | 3 | 2 | 3 |
| *Spontaneous* |  |  |  |  |  |  |  |
| Total | 73 | 39 | 11 | 25 | 14 | 28 | 21 |
| Belfast Naturalistic (Douglas-Cowie et al.) | 1 |  |  |  |  |  |  |
| DIT IE (Cullen et al.) | 1 |  |  |  |  |  |  |
| Emovox (Scherer) | 16 | 2 | 11 | 2 |  |  |  |
| E-Wiz (Aubergé et al.) | 1 |  |  |  |  |  |  |
| Lego (Kehrein) | 22 | 1 | 5 | 2 | 4 |  |  |
| Nimitek (Gnjatović and Rösner) | 19 | 15 | 2 | 8 | 9 |  |  |
| SEMAINE (McKeown et al.) | 1 | 1 | 1 |  |  |  |  |
| SSPNet Conflict (Kim et al.) | 2 | 3 |  |  |  |  |  |
| TIVAC_E (Juslin and Laukka) | 7 | 1 | 5 | 2 | 10 |  |  |
| TIVAC_S (Juslin and Laukka) | 8 | 7 | 3 | 3 | 1 | 9 |  |
| Vera am Mittag (Grimm et al.) | 3 |  |  |  |  |  |  |
| Voice Provider (Neiberg et al.) | 2 | 2 |  |  |  |  |  |
| With and Kaiser | 4 | 8 | 5 | 2 |  |  |  |