Miraj Shah, David G Cooper, Houwei Cao, Ruben C Gur, Ani Nenkova, Ragini Verma.
Abstract
Automatic recognition of emotion using facial expressions in the presence of speech poses a unique challenge because talking reveals clues for the affective state of the speaker but distorts the canonical expression of emotion on the face. We introduce a corpus of acted emotion expression where speech is either present (talking) or absent (silent). The corpus is uniquely suited for analysis of the interplay between the two conditions. We use a multimodal decision-level fusion classifier to combine models of emotion from talking and silent faces as well as from audio to recognize five basic emotions: anger, disgust, fear, happy and sad. Our results strongly indicate that emotion prediction from action unit facial features is less accurate when the person is talking. Modeling talking and silent expressions separately and fusing the two models greatly improves accuracy of prediction in the talking setting. The advantages are most pronounced when silent and talking face models are fused with predictions from audio features. In this multimodal prediction, both the combination of modalities and the separate models of talking and silent facial expression of emotion contribute to the improvement.
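As an illustration of what decision-level fusion can look like, the sketch below combines per-model class posteriors for the five emotions with a weighted sum rule. The abstract does not state which fusion rule, weights, or classifiers the authors used, so the sum rule, the function names (`fuse_decisions`, `predict`), and all probability values here are hypothetical and serve only to show the general idea.

```python
# Minimal sketch of decision-level fusion (assumed weighted sum rule;
# the fusion rule actually used in the paper is not given in the abstract).

EMOTIONS = ["anger", "disgust", "fear", "happy", "sad"]


def fuse_decisions(posteriors, weights=None):
    """Return the weighted average of per-model class posteriors.

    posteriors: dict mapping a model name to a list of five probabilities,
                ordered as in EMOTIONS.
    weights:    optional dict mapping a model name to a non-negative weight;
                defaults to equal weights.
    """
    if weights is None:
        weights = {name: 1.0 for name in posteriors}
    total = sum(weights[name] for name in posteriors)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * len(EMOTIONS)
    for name, probs in posteriors.items():
        if len(probs) != len(EMOTIONS):
            raise ValueError(f"{name}: expected {len(EMOTIONS)} probabilities")
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total
    return fused


def predict(posteriors, weights=None):
    """Return the emotion with the highest fused score, plus all fused scores."""
    fused = fuse_decisions(posteriors, weights)
    best = max(range(len(EMOTIONS)), key=fused.__getitem__)
    return EMOTIONS[best], fused


# Hypothetical posteriors from three separately trained models.
example = {
    "talking_face": [0.30, 0.25, 0.20, 0.15, 0.10],
    "silent_face": [0.20, 0.40, 0.15, 0.15, 0.10],
    "audio": [0.50, 0.10, 0.20, 0.10, 0.10],
}
label, fused = predict(example)
print(label, [round(score, 3) for score in fused])
```

With equal weights, the audio model's strong vote for anger outweighs the silent-face model's preference for disgust. Passing a `weights` dict would let one modality count more than another, for example if one model were found to be more reliable on held-out data.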
Keywords: emotion; face; multimodal; silent; talking; voice
Year: 2013
PMID: 25525561
PMCID: PMC4267560
DOI: 10.1109/ACII.2013.15
Source DB: PubMed
Journal: Int Conf Affect Comput Intell Interact Workshops
ISSN: 2156-8103