Sudarsana Reddy Kadiri, Paavo Alku.
Abstract
Understanding how humans perceive emotions or affective states is important for developing emotion-aware systems that work in realistic scenarios. In this paper, the perception of emotions in naturalistic human interaction (audio-visual data) is studied using perceptual evaluation. For this purpose, a naturalistic audio-visual emotion database collected from TV broadcasts such as soap operas and movies, called the IIIT-H Audio-Visual Emotion (IIIT-H AVE) database, is used. The database consists of audio-alone, video-alone, and audio-visual data in English. Using data of all three modes, perceptual tests are conducted for four basic emotions (angry, happy, neutral, and sad) based on category labeling, and for two dimensions, namely arousal (active or passive) and valence (positive or negative), based on dimensional labeling. The results indicated that the participants' perception of emotions differed remarkably between the audio-alone, video-alone, and audio-video data. This finding emphasizes the importance of emotion-specific features over commonly used features in the development of emotion-aware systems.
Keywords: emotion analysis; emotion recognition; emotion synthesis; feature extraction; naturalistic audio–visual emotion database
Year: 2022 PMID: 35808423 PMCID: PMC9269694 DOI: 10.3390/s22134931
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
List of emotions and expressive states.
| Emotions | Expressive States |
|---|---|
| 1. Angry | 1. Confused |
| 2. Disgusted | 2. Excited |
| 3. Frightened | 3. Interested |
| 4. Happy | 4. Relaxed |
| 5. Neutral | 5. Sarcastic |
| 6. Sad | 6. Worried |
| 7. Surprised | |
Figure 1. Histogram of the signal lengths (durations).
Order of preference in the perception of emotions with respect to neutral. The notation '***' denotes the highest preference, '**' the medium preference, and '*' the lowest preference.
| | Audio-Alone | Video-Alone | Audio–Video |
|---|---|---|---|
| Angry | *** | * | ** |
| Happy | ** | *** | ** |
| Sad | * | ** | *** |
Identified affective states (denoted by *) for the four emotions in the audio-alone data.
| | Angry | Happy | Neutral | Sad | Excited | Worried | Surprised |
|---|---|---|---|---|---|---|---|
| Angry | * | - | - | - | * | - | - |
| Happy | - | * | - | - | - | - | * |
| Neutral | - | - | * | * | - | - | - |
| Sad | - | - | * | * | - | * | - |
Identified affective states (denoted by *) for the four emotions in the video-alone data.
| | Angry | Happy | Neutral | Sad | Excited | Worried | Surprised |
|---|---|---|---|---|---|---|---|
| Angry | * | - | - | - | * | - | - |
| Happy | - | * | - | - | - | - | * |
| Neutral | - | - | * | * | - | - | - |
| Sad | - | - | - | * | - | * | - |
Identified affective states (denoted by *) for the four emotions in the audio–video data.
| | Angry | Happy | Neutral | Sad | Excited | Worried | Surprised |
|---|---|---|---|---|---|---|---|
| Angry | * | - | - | - | * | - | - |
| Happy | - | * | - | - | - | - | * |
| Neutral | - | - | * | - | - | - | - |
| Sad | - | - | - | * | - | * | - |
Figure 2. Illustrations of the pitch (shown in (b)) and intensity (shown in (c)) contours of a speech signal (shown in (a)) in neutral emotion.
Figure 3. Illustrations of the pitch (shown in (b)) and intensity (shown in (c)) contours of a speech signal (shown in (a)) in high-arousal angry emotion.
Figure 4. Illustrations of the pitch (shown in (b)) and intensity (shown in (c)) contours of a speech signal (shown in (a)) in high-arousal happy emotion.
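The pitch and intensity contours illustrated in Figures 2–4 can be extracted in several ways; the record does not specify the method used in the paper. Below is a minimal NumPy sketch, under the assumption of a simple autocorrelation-based pitch estimator and frame-level RMS intensity in dB (both choices are illustrative, not the authors' method), demonstrated on a synthetic 220 Hz tone as a stand-in for a voiced speech segment.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def pitch_autocorr(frame, sr, fmin=75.0, fmax=500.0):
    """Estimate pitch (Hz) of one frame by picking the autocorrelation
    peak within the plausible speech pitch range [fmin, fmax]."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo : hi + 1])
    return sr / lag

def intensity_db(frame, eps=1e-10):
    """Frame intensity as RMS energy in decibels."""
    rms = np.sqrt(np.mean(frame ** 2))
    return 20.0 * np.log10(rms + eps)

# Demo: a 1 s synthetic 220 Hz sine (hypothetical input, not IIIT-H AVE data).
sr = 16000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 220.0 * t)
frames = frame_signal(x, frame_len=1024, hop=256)
f0 = np.array([pitch_autocorr(f, sr) for f in frames])  # pitch contour
db = np.array([intensity_db(f) for f in frames])        # intensity contour
```

Plotting `f0` and `db` against frame index reproduces the kind of contour shown in panels (b) and (c) of the figures; for high-arousal emotions such as angry or happy, speech typically shows elevated pitch and intensity relative to neutral.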
Figure 5. A hierarchical approach for analyzing audio–visual data.