Ekaterina Volkova, Stephan de la Rosa, Heinrich H. Bülthoff, Betty Mohler.
Abstract
Emotion expression in human-human interaction takes place via various types of information, including body motion. Research on the perceptual-cognitive mechanisms underlying the processing of natural emotional body language can benefit greatly from datasets of natural emotional body expressions that facilitate stimulus manipulation and analysis. The existing databases have so far focused on few emotion categories which display predominantly prototypical, exaggerated emotion expressions. Moreover, many of these databases consist of video recordings which limit the ability to manipulate and analyse the physical properties of these stimuli. We present a new database consisting of a large set (over 1400) of natural emotional body expressions typical of monologues. To achieve close-to-natural emotional body expressions, amateur actors were narrating coherent stories while their body movements were recorded with motion capture technology. The resulting 3-dimensional motion data recorded at a high frame rate (120 frames per second) provides fine-grained information about body movements and allows the manipulation of movement on a body joint basis. For each expression it gives the positions and orientations in space of 23 body joints for every frame. We report the results of physical motion properties analysis and of an emotion categorisation study. The reactions of observers from the emotion categorisation study are included in the database. Moreover, we recorded the intended emotion expression for each motion sequence from the actor to allow for investigations regarding the link between intended and perceived emotions. The motion sequences along with the accompanying information are made available in a searchable MPI Emotional Body Expression Database. We hope that this database will enable researchers to study expression and perception of naturally occurring emotional body expressions in greater depth.
Year: 2014 PMID: 25461382 PMCID: PMC4252031 DOI: 10.1371/journal.pone.0113647
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
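As the abstract describes, each motion sequence stores the positions and orientations of 23 body joints per frame at 120 frames per second. Below is a minimal sketch of how such a sequence could be represented in code; the class and field names are illustrative assumptions, not the database's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple

FRAME_RATE_HZ = 120  # recording rate reported in the abstract
NUM_JOINTS = 23      # body joints recorded per frame

@dataclass
class JointState:
    """Pose of a single body joint in one frame."""
    position: Tuple[float, float, float]            # (x, y, z) in metres
    orientation: Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)

@dataclass
class MotionSequence:
    """One emotional body expression: frames, each holding 23 joint states."""
    motion_id: str
    frames: List[List[JointState]]

    @property
    def duration_sec(self) -> float:
        """Duration follows directly from the frame count and the 120 Hz rate."""
        return len(self.frames) / FRAME_RATE_HZ
```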
Overview of several databases described in the introduction section.
| Name | Format | Emotions or actions | Modalities | Mode | Samples, Actors, Raters | Contexts |
| CMU MoBo | | actions and interactions | | E | | various |
| FABO | V | 10 emotions | F, | E | | emotion elicitation by situation vignettes, free expression |
| KUG | | 55 actions and gestures | | E | | directed motion capture in studio |
| | | actions modulated by 4 emotions | | E | | emotion and action elicitation by situation vignettes |
| GEMEP | A, V | 17 emotions | | E | | pseudospeech sentences and a nonverbal vocalization |
| IEMOCAP | A, | 6 emotions and 3 affect dimensions | F, | E, | | scripted and spontaneous dyadic interactions |
| USC CreativeIT | A, | affect dimensions, interaction tendencies, and performance ratings | | E | | two-sentence exercise and paraphrases |
| AffectME | | 4 emotions and 2 affect dimensions | | | 103, 11, 8 | emotions were triggered naturally while playing a video game |
| | | 4 emotions | | | 161, 9, 7 | Wii Grand Slam Tennis game |
| | A, V | 8 emotions | | | 1400, 256, self-report | laboratory-based emotion induction tasks |
| MMLI | A, | laughter episodes | | E, | | word games, humorous videos, tongue twisters |
| UCL-ILHAIRE | | 4 laughter types + non-laughter | | E, | 126, 9, 32 | word and collaborative games, humorous videos |
| | | 11 emotions | | E, | | short scenarios, sentences and narrations |
The databases included in this table share one or more features with our new database, such as a focus on body motion, motion capture format, a rich set of emotion categories (more than the six basic emotions), naturalness of the produced motion, and a large number of motion samples. These shared features are highlighted in bold. The table lists the following properties: the major formats the data was recorded in (Audio, Motion capture, Video), the emotions expressed, the modalities recorded (Body, Face, Respiration, Speech), the mode of the emotions (Elicited vs. Naturalistic), the size of the database/dataset, the number of actors, the number of observers/raters, and the context in which the data collection took place. The initial letters highlighted in bold correspond to the code letters in the format, modalities and mode columns. The last row describes the new database for ease of comparison. Table 2 lists the emotion categories used in the databases.
Emotion categories used in the databases included in Table 1.
| Name | N of categories | Emotion categories |
| FABO | 10 emotions | anger, anxiety, boredom, disgust, fear, happiness, neutral, sadness, surprise, uncertainty |
| | 4 emotions | neutral, angry, happy, sad |
| GEMEP | 17 emotions | amusement, anxiety, cold anger (irritation), despair, hot anger (rage), fear (panic), interest, joy (elation), pleasure (sensory), pride, relief, sadness, admiration, contempt, disgust, surprise, tenderness |
| IEMOCAP | 6 emotions and 3 affect dimensions | neutral, anger, happiness, sadness, frustration, excited; valence, activation, dominance |
| USC CreativeIT | affect dimensions, interaction tendencies, and performance ratings | valence, activation, dominance; approach-avoidance; interest, naturalness, creativity |
| AffectME | 4 emotions and 2 affect dimensions | concentrating, defeated, frustrated, triumphant; valence, arousal |
| | 4 emotions | happiness, concentrated, high-intensity negative emotion, low-intensity negative emotion |
| | 8 emotions | disgust, surprise, fear, relaxed, sadness, anger, amusement, frustration |
| MMLI | laughter episodes | laughter, non-laughter |
| UCL-ILHAIRE | 4 laughter types + non-laughter | hilarious laughter, social laughter, awkward laughter, fake laughter, non-laughter |
| | 11 emotions | amusement, joy, pride, relief, surprise, anger, disgust, fear, sadness, shame, neutral |
The CMU MoBo [11] and the KUG [18] databases were not included in this table because their design was aimed not at capturing emotional body expressions but rather at actions and neutral gestures.
Figure 1. Setup for motion capture sessions.
The individual in this figure has given written informed consent (as outlined in the PLOS consent form) to publish the photo with the face unmasked. (A) An actor in the Xsens Moven motion capture suit, in T-pose. (B) Acting setup: an actor in neutral pose, with stool, pedals and display. (C) An actor expressing pride.
Stories narrated by actors during motion capture sessions.
| Story Title | AnBh | DiMi | HeGa | LeSt | MaMa | NoVo | PaPi | SiGl | Count |
| Blue Beard (TBB) | √ | √ | √ | √ | 4 | ||||
| Flower Princess (TFP) | √ | √ | √ | √ | 4 | ||||
| Golden Goose (TGG) | √ | √ | √ | 3 | |||||
| Hoodie Crow (THC) | √ | √ | 2 | ||||||
| Jack My Hedgehog (TJH) | √ | √ | √ | 3 | |||||
| Owl And Eagle (TOE) | √ | 1 | |||||||
| Six Swans (TSS) | √ | 1 | |||||||
| Swineherd (TSH) | √ | √ | 2 | ||||||
| White Duck (TWD) | √ | √ | √ | √ | 4 | ||||
| Sum | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 24 |
Each actor performed three stories in total, one story per session. To increase the actors' motivation and comfort, the choice of three out of the nine stories was left to the actor rather than the researcher; as a result, some stories were acted out by several actors (e.g., TBB, TWD) and some by only one actor (e.g., TSS).
Figure 2. Stick-figure representations of human upper body used in the emotion categorisation studies [51].
Regardless of the body proportions of the actor, the motion trajectories were mapped onto a skeleton of average body size. Note that the motion capture files included in the database contain data for the full body and have the original actors' body size and proportions.
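One simple way to map trajectories onto an average-size skeleton, sketched below, is to keep every bone's direction in each frame but rescale it to the average bone length. This is an illustrative approximation under the assumption that joints are listed parent-before-child, not the authors' actual retargeting pipeline; the function and argument names are hypothetical.

```python
import numpy as np

def retarget_to_average(positions: np.ndarray, parents: list, avg_lengths: dict) -> np.ndarray:
    """Rescale one posed frame so that every bone has the average length.

    positions   -- (J, 3) joint positions of one frame, in metres
    parents     -- parents[j] is the index of joint j's parent (-1 for the root)
    avg_lengths -- avg_lengths[j] is the average length of the bone ending at joint j

    Assumes joints are ordered so that a parent always precedes its children.
    """
    out = positions.copy()
    for j, p in enumerate(parents):
        if p < 0:
            continue  # the root joint keeps its original position
        bone = positions[j] - positions[p]       # original bone direction
        norm = np.linalg.norm(bone)
        if norm > 0:
            bone = bone / norm
        out[j] = out[p] + bone * avg_lengths[j]  # reattach at the parent's new position
    return out
```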
Figure 3. Motion sequence frequencies across intended (A) and perceived (B, C) emotions.
Intended emotions originate from the actors' text annotations, while perceived emotions are the categories forming a unique modal value in the observers' response distribution for each motion sequence. The perceived emotion frequencies are split into two graphs to allow the same y-axis scale for plots (A) and (B). The "emotional" category in plot (C) is the sum of all frequencies in plot (B).
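The unique modal value mentioned above can be determined per motion sequence roughly as follows; this is a sketch of the idea, not the authors' analysis code.

```python
from collections import Counter

def perceived_category(responses):
    """Return the emotion category forming a unique modal value among the
    observers' responses, or None when the most frequent category is tied."""
    ranked = Counter(responses).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no unique mode, hence no perceived emotion
    return ranked[0][0]

# e.g. five observer responses for one hypothetical sequence
print(perceived_category(["joy", "joy", "pride", "joy", "surprise"]))  # -> "joy"
```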
Figure 4. Emotion frequency distribution across consistency levels.
(A) Histogram of consistency rates across motion sequences. The minimum possible consistency rate equals two divided by the number of observations for a given stimulus, because at least two observers must assign the same category to the stimulus to form a modal value. (B) Distribution of perceived emotions across categories with a consistency rate of 0.3 or more. (C) Distribution of perceived emotions across categories with a consistency rate of 0.5 or more. (D) Distribution of perceived emotions across categories with a consistency rate of 0.7 or more.
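For illustration, the consistency rate can be computed as the share of responses taken by the modal category, which makes the 2/N lower bound explicit; a sketch under the same assumptions as above:

```python
from collections import Counter

def consistency_rate(responses):
    """Proportion of responses taken by the modal category. With N observers
    the smallest possible rate is 2/N, since at least two matching responses
    are needed to form a modal value at all."""
    return Counter(responses).most_common(1)[0][1] / len(responses)

responses = ["fear", "fear", "surprise", "anger", "fear",
             "sadness", "fear", "fear", "neutral", "fear"]
print(consistency_rate(responses))  # 0.6 -> included in plots (B) and (C), not (D)
```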
Frequencies in the final set of motion sequences across intended emotion categories and acting tasks.
| Emotion | Short scenarios | Narrations | |||||||||||
| Non-verbal | Sentences | ||||||||||||
| NS | NC | SD | SN | TBB | TFP | TGG | THC | TJH | TOE | TSS | TSH | TWD | |
| amusement | 7 | 6 | 8 | 6 | 4 | 31 | 8 | 6 | |||||
| anger | 5 | 8 | 8 | 8 | 24 | 9 | 17 | 4 | 17 | 5 | 13 | 23 | |
| disgust | 4 | 6 | 5 | 6 | 8 | 9 | 18 | 3 | 12 | 1 | 11 | 23 | |
| fear | 5 | 7 | 7 | 8 | 29 | 12 | 4 | 4 | 6 | 2 | 6 | 3 | 28 |
| joy | 7 | 5 | 8 | 8 | 13 | 52 | 29 | 16 | 15 | 23 | 3 | 21 | 27 |
| neutral | 6 | 25 | 11 | 5 | 10 | 2 | 5 | 38 | |||||
| pride | 8 | 5 | 8 | 8 | 18 | 39 | 13 | 1 | 17 | 6 | 2 | 12 | 8 |
| relief | 6 | 7 | 4 | 4 | 9 | 18 | 4 | 10 | 6 | 3 | 3 | 1 | 16 |
| sadness | 7 | 6 | 8 | 7 | 19 | 14 | 34 | 12 | 12 | 3 | 9 | 6 | 74 |
| shame | 6 | 7 | 7 | 8 | 3 | 8 | 8 | 2 | 4 | 2 | 3 | ||
| surprise | 7 | 6 | 5 | 7 | 20 | 18 | 26 | 21 | 12 | 4 | 11 | 32 | |
The abbreviations stand for: NS — non-verbal solitary, NC — non-verbal communicative, SD — sentences with direct speech, SN — sentences without direct speech, TBB — Blue Beard, TFP — Flower Princess, TGG — Golden Goose, THC — Hoodie Crow, TJH — Jack My Hedgehog, TOE — Owl and Eagle, TSS — Six Swans, TSH — Swineherd, TWD — White Duck.
Frequencies in the final set of motion sequences across actors (columns) and intended emotion categories (rows).
| AnBh | DiMi | HeGa | LeSt | MaMa | NoVo | PaPi | SiGl | total | |
| amusement | 7 | 3 | 4 | 4 | 4 | 16 | 35 | 3 | 76 |
| anger | 30 | 22 | 20 | 15 | 4 | 24 | 6 | 20 | 141 |
| disgust | 14 | 11 | 13 | 30 | 3 | 8 | 12 | 15 | 106 |
| fear | 40 | 14 | 9 | 17 | 5 | 16 | 8 | 12 | 121 |
| joy | 52 | 27 | 33 | 46 | 26 | 10 | 17 | 16 | 227 |
| neutral | 25 | 10 | 18 | 20 | 2 | 12 | 10 | 5 | 102 |
| pride | 27 | 21 | 23 | 10 | 10 | 19 | 23 | 12 | 145 |
| relief | 25 | 14 | 4 | 22 | 6 | 5 | 6 | 9 | 91 |
| sadness | 41 | 31 | 15 | 61 | 6 | 30 | 8 | 19 | 211 |
| shame | 7 | 13 | 5 | 13 | 4 | 3 | 5 | 8 | 58 |
| surprise | 24 | 27 | 35 | 37 | 7 | 34 | 1 | 4 | 169 |
| total | 292 | 193 | 179 | 275 | 77 | 177 | 131 | 123 | 1447 |
Frequencies in the final set of motion sequences across actors and acting tasks.
| Actor | Short scenarios | Narrations | |||||||||||
| Non-verbal | Sentences | ||||||||||||
| NS | NC | SD | SN | TBB | TFP | TGG | THC | TJH | TOE | TSS | TSH | TWD | |
| AnBh | 7 | 8 | 9 | 10 | 98 | 67 | 93 | ||||||
| DiMi | 9 | 9 | 9 | 9 | 57 | 48 | 52 | ||||||
| HeGa | 9 | 7 | 9 | 8 | 57 | 35 | 54 | ||||||
| LeSt | 7 | 9 | 8 | 8 | 81 | 78 | 84 | ||||||
| MaMa | 9 | 7 | 8 | 9 | 44 | ||||||||
| NoVo | 8 | 8 | 8 | 8 | 55 | 41 | 49 | ||||||
| PaPi | 6 | 7 | 7 | 10 | 54 | 47 | |||||||
| SiGl | 7 | 8 | 10 | 8 | 31 | 28 | 31 | ||||||
The abbreviations stand for: NS — non-verbal solitary, NC — non-verbal communicative, SD — sentences with direct speech, SN — sentences without direct speech, TBB — Blue Beard, TFP — Flower Princess, TGG — Golden Goose, THC — Hoodie Crow, TJH — Jack My Hedgehog, TOE — Owl and Eagle, TSS — Six Swans, TSH — Swineherd, TWD — White Duck.
Figure 5. Emotion recognition accuracy for intended emotions (A) and consistency rates for perceived emotions (B) across acting tasks.
All error bars represent 95% CI.
Frequencies of motion sequences across actors where intended and perceived emotion categories coincide, sorted by the total frequency within each emotion category (rows) and actor (columns).
| NoVo | HeGa | DiMi | AnBh | LeSt | SiGl | PaPi | MaMa | total | |
| neutral | 9 | 18 | 9 | 18 | 20 | 3 | 9 | 2 | 88 (.86) |
| anger | 12 | 17 | 9 | 9 | 5 | 7 | 2 | 4 | 65 (.46) |
| sadness | 15 | 1 | 16 | 8 | 1 | 2 | 2 | 2 | 47 (.22) |
| fear | 14 | 3 | 10 | 3 | 2 | 32 (.26) | |||
| joy | 7 | 14 | 2 | 2 | 1 | 2 | 1 | 29 (.12) | |
| pride | 4 | 10 | 1 | 3 | 1 | 1 | 3 | 1 | 24 (.16) |
| surprise | 9 | 3 | 1 | 2 | 15 (.08) | ||||
| amusement | 2 | 2 | 1 | 3 | 8 (.10) | ||||
| shame | 5 | 1 | 1 | 1 | 8 (.14) | ||||
| disgust | 1 | 1 | 1 | 1 | 4 (.03) | ||||
| relief | 1 | 2 | 3 (.03) | ||||||
| total | 71 (.40) | 69 (.38) | 54 (.28) | 46 (.15) | 28 (.10) | 22 (.16) | 21 (.16) | 12 (.15) | 323 (.22) |
Values in round brackets represent the proportions of the frequencies relative to the corresponding totals for the whole database (see Table 6); for example, 88 accurately recognised neutral sequences out of 102 neutral sequences overall gives .86.
Figure 6. Average recognition accuracy across actors (A) and observers' response consistency across actors (B).
All error bars represent 95% CI.
Figure 7. Physical properties of motion sequences across acting tasks and individual actors.
All error bars represent 95% CI. The panels show: (A) Duration, sec; (B) Peaks in motion trajectories of right and left wrists across acting tasks; (C) Average motion speed; (D) Average motion span. As the bar plots show, physical properties of motion sequences depend both on the acting tasks and on the individual actors.
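These physical properties can be approximated from the wrist trajectories alone, as sketched below for (T, 3) position arrays sampled at 120 Hz. The exact filtering and peak-detection criteria are the authors', so the definitions here are illustrative assumptions.

```python
import numpy as np

FPS = 120  # motion capture frame rate

def motion_properties(left_wrist: np.ndarray, right_wrist: np.ndarray):
    """Approximate duration, average speed, average span and peak count
    from (T, 3) wrist position arrays in metres."""
    duration = len(left_wrist) / FPS  # seconds

    # average speed: mean frame-to-frame displacement of both wrists, times FPS
    steps = [np.linalg.norm(np.diff(w, axis=0), axis=1)
             for w in (left_wrist, right_wrist)]
    speed = float(np.mean(np.concatenate(steps))) * FPS  # m/sec

    # average span: mean distance between the two wrists
    span = float(np.linalg.norm(left_wrist - right_wrist, axis=1).mean())  # metres

    # peaks and valleys: sign changes of the first difference of the
    # vertical wrist coordinate (a crude local-extrema count)
    dz = np.diff(left_wrist[:, 2])
    peaks = int(np.sum(np.diff(np.sign(dz)) != 0))

    return duration, speed, span, peaks
```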
Online Database Overview.
| Column Name | Description |
| Actor-dependent Motion Properties | |
| Motion Id | Unique motion sequence file name |
| Intended Emotion | One of the eleven emotion categories as intended by the actor |
| Intended Polarity | |
| Duration | Duration of the motion in seconds |
| Peaks | Number of peaks and valleys in motion trajectory along |
| Speed | Average speed in m/sec for the left and the right wrist joints |
| Span | Average span in meters between the left and the right wrist joints |
| Acting Task | Nonverbal, Sentences, Narration |
| Acting Sub-task | Specific sub-task or story title (see |
| Actor | The ID name of one of the eight actors who performed the motion sequence |
| Gender | Actor's gender (“f” – female, “m” – male) |
| Age | Actor's age ( |
| Handedness | Actor's handedness (“r” – right-handed, “l” – left-handed) |
| Native Tongue | Actor's mother language (German, English, Hindi) |
| Observer-dependent Motion Properties | |
| Perceived Category | One of the eleven emotion categories as perceived by the majority of observers |
| Perceived Polarity | |
| Accurate Category | “1” when intended and perceived emotions coincide, “0” otherwise |
| Accurate Polarity | “1” when intended and perceived polarity coincide, “0” otherwise |
| Responses | The list of eleven responses to the motion sequence from all the observers |
| Consistency | The proportion of responses taken by the unique modal value, which is also recorded in “Perceived Category” |
| Text | The text that served as acting motivation (not spoken out loud in non-verbal tasks) |
The 1447 motion sequences can be filtered and sorted by their metadata represented in the columns, such as intended emotion, perceived emotion, physical properties of the motion, actor information, etc. The database is available online (ebmdb.tuebingen.mpg.de), accompanied by usage instructions and the license agreement.
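To illustrate the kind of filtering the overview describes, the sketch below assumes the metadata has been exported to a CSV file with the column names listed above; the file name is hypothetical.

```python
import pandas as pd

db = pd.read_csv("ebmdb_metadata.csv")  # hypothetical export of the metadata table

# select well-recognised sequences: the perceived category matches the
# intended one and at least half of the observers agreed on it
selection = db[(db["Accurate Category"] == 1) & (db["Consistency"] >= 0.5)]

# how many such sequences exist per intended emotion
print(selection.groupby("Intended Emotion")["Motion Id"].count())
```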