Camilo Salazar¹, Edwin Montoya-Múnera¹, Jose Aguilar¹·².
Abstract
In this work, the affective state of users in virtual learning environments is recognized in terms of continuous arousal and valence dimensions, using multimodal information (audio, text and video) whenever any of these modalities is available. Virtual learning environments in which all three modalities are available at all times are uncommon: at some moments only the video modality is available, while at others only text, and/or video, and/or audio. Different approaches based on feature-level fusion and decision-level fusion are proposed for multimodal recognition with missing data. Recognition according to the available modalities is studied following the ideas of dropout from neural networks and of variable input length from recurrent neural networks. This proposal is innovative because it represents emotions in the continuous space, which is not common in virtual education, and because it uses whichever modalities are available at a given moment, which matters in virtual learning environments since people are not speaking or writing all the time.
Keywords: Affective state; Arousal and valence dimensions; Emotional recognition; Multimodal recognition
Year: 2021 PMID: 34189306 PMCID: PMC8220333 DOI: 10.1016/j.heliyon.2021.e07253
Source DB: PubMed Journal: Heliyon ISSN: 2405-8440
Figure 1. Differences between decision-level and feature-level fusion.
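The two fusion schemes contrasted in Figure 1 can be sketched briefly. This is a minimal illustration, not the paper's implementation: the feature dimensions, weights and prediction values are hypothetical. Feature-level fusion concatenates the per-modality feature vectors before a single model; decision-level fusion lets each modality produce its own (arousal, valence) prediction and then combines those predictions, here by weighted average (wavg).

```python
import numpy as np

# Hypothetical per-modality feature vectors (dimensions are illustrative).
audio_feats = np.random.rand(32)
video_feats = np.random.rand(64)
text_feats = np.random.rand(16)

def feature_level_fusion(features):
    """Concatenate modality features into one vector for a single downstream model."""
    return np.concatenate(features)

def decision_level_fusion(predictions, weights):
    """Combine per-modality (arousal, valence) predictions by weighted average."""
    return np.average(np.asarray(predictions, dtype=float), axis=0, weights=weights)

# Feature level: one long input vector (32 + 64 + 16 = 112 features here).
fused = feature_level_fusion([audio_feats, video_feats, text_feats])

# Decision level: each modality model first emits its own (arousal, valence) pair.
preds = [(0.2, 0.6), (0.4, 0.5), (0.3, 0.7)]  # audio, video, text (made-up values)
arousal, valence = decision_level_fusion(preds, weights=[1, 2, 1])
```

With weights [1, 2, 1] the video prediction counts double, giving (arousal, valence) = (0.325, 0.575) for these made-up inputs.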
Examples of categorical emotion models (table extracted from: [15]).
| Theorist | Basic Emotions |
|---|---|
| James (1884) | Fear, love, grief, rage |
| McDougall (1926) | Anger, elation, fear, disgust, subjection, wonder, tender-emotion |
| Watson (1930) | Love, fear, rage |
| Arnold (1960) | Love, fear, anger, aversion, courage, dejection, desire, despair, hate, sadness, hope |
| Mowrer (1960) | Pain, pleasure |
| Izard (1971) | Fear, anger, contempt, joy, distress, guilt, interest, shame, surprise, disgust |
| Plutchik (1980) | Fear, Acceptance, anger, anticipation, disgust, sadness, surprise, joy |
| Ekman, Friesen, and Ellsworth (1982) | Fear, anger, disgust, joy, sadness, surprise |
| Gray (1982) | Rage and terror, anxiety, joy |
| Panksepp (1982) | Fear, expectancy, rage, panic |
| Tomkins (1984) | Fear, anger, interest, contempt, disgust, distress, shame, surprise, joy |
| Weiner and Graham (1984) | Happiness, sadness |
| Frijda (1986) | Desire, happiness, interest, surprise, wonder, sorrow |
| Oatley and Johnson-Laird (1987) | Anger, disgust, anxiety, sadness, happiness |
Figure 2. Example of mapping of discrete emotions to the valence-arousal dimensional model (figure taken from: [16]).
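The mapping idea of Figure 2 can be sketched as a lookup table plus a nearest-neighbour search. The coordinates below are illustrative only (not the paper's exact values): each discrete emotion gets a (valence, arousal) point in [-1, 1]², and a continuous prediction can be mapped back to the closest discrete label.

```python
# Illustrative (valence, arousal) coordinates for a few discrete emotions,
# following the dimensional-model mapping idea of Figure 2. Values are made up.
EMOTION_TO_VA = {
    "joy": (0.8, 0.5),
    "anger": (-0.6, 0.7),
    "sadness": (-0.7, -0.4),
    "calm": (0.4, -0.6),
}

def nearest_emotion(valence, arousal):
    """Map a continuous (valence, arousal) point to the closest discrete label."""
    def sq_dist(item):
        v, a = item[1]
        return (v - valence) ** 2 + (a - arousal) ** 2
    return min(EMOTION_TO_VA.items(), key=sq_dist)[0]
```

For example, a prediction near (0.7, 0.4) would fall closest to "joy" under this toy table.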
Comparison with related works.
| Work | A | B | C | D | E |
|---|---|---|---|---|---|
| X | |||||
| X | X | X | |||
| X | |||||
| X | X | ||||
| X | X | ||||
| X | X | ||||
| X | X | X | |||
| X | X | X | |||
| X | X | X | |||
| Our work | X | X | X | X | X |
Figure 3. General procedure.
Figure 4. Feature selection macro-algorithm.
Figure 5. First approach when the video modality is not available.
Figure 6. Second approach when the video modality is not available.
Figure 7. Third approach when the video modality is not available.
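The approaches in Figures 5-7 differ in how a missing modality is handled; the two core ideas can be sketched as follows (a minimal illustration with hypothetical feature dimensions, not the paper's code). Zero padding keeps the fused input at a fixed length by filling the absent modality with zeros, analogous to dropout zeroing units; the dynamic-input variant instead combines only the predictions of the modalities actually present, as with variable-length inputs in recurrent networks.

```python
import numpy as np

AUDIO_DIM, VIDEO_DIM, TEXT_DIM = 32, 64, 16  # hypothetical dimensions

def fuse_with_zero_padding(audio=None, video=None, text=None):
    """Zero-padding idea: replace a missing modality with zeros so the fused
    input keeps a fixed length (analogous to dropout zeroing units)."""
    parts = [
        audio if audio is not None else np.zeros(AUDIO_DIM),
        video if video is not None else np.zeros(VIDEO_DIM),
        text if text is not None else np.zeros(TEXT_DIM),
    ]
    return np.concatenate(parts)

def fuse_dynamic(predictions):
    """Dynamic-input idea: average only the (arousal, valence) predictions of
    the modalities that are actually present."""
    present = [p for p in predictions if p is not None]
    return np.mean(np.asarray(present, dtype=float), axis=0)

# Only video is available: audio and text slots are zero-filled.
padded = fuse_with_zero_padding(video=np.ones(VIDEO_DIM))

# Video prediction missing: average the audio and text predictions only.
avg = fuse_dynamic([(0.2, 0.4), None, (0.6, 0.8)])
```

In the zero-padding case the downstream model always sees the same input length (112 features here); in the dynamic case the combination step adapts to however many modality predictions arrive.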
Figure 8. Distribution of data from the IEMOCAP, HCI-tagging and arousal-valence Facebook posts datasets over the arousal-valence space (horizontal axis: valence; vertical axis: arousal).
Summary of the comparison of continuous emotion approaches (B: base approach; F: first approach, feature-level fusion with zero padding; S: second approach, decision-level fusion with zero padding; T: third approach, decision-level fusion with dynamic input; wavg: weighted average) for each modality (A: audio, V: video, T: text), with different metrics (N/A: not applicable; MLE: mean linear error).
| Approach | Modals. | Data | Model | | RMSE | SRE | MLE |
|---|---|---|---|---|---|---|---|
| B “naive” | A,V,T | Artificial | wavg,PLS | 0.370 | 0.334 | 0.404 | N/A |
| F “naive” | A,V,T | Artificial | PLS,PLS | 0.300 | 0.355 | 0.192 | N/A |
| S “naive” | A,V,T | Artificial | PLS,PLS | 0.400 | 0.326 | 0.401 | N/A |
| T “naive” | A,V,T | Artificial | NN,PLS | 0.346 | 0.340 | 0.400 | N/A |
| B | A,V,T | Artificial | wavg,RF | 0.394 | 0.357 | 0.338 | N/A |
| F | A,V,T | Artificial | RF,RF | 0.443 | 0.345 | 0.300 | N/A |
| S | A,V,T | Artificial | NN | 0.387 | 0.371 | 0.259 | N/A |
| T | A,V,T | Artificial | NN,NN | 0.281 | 0.409 | 0.474 | N/A |
| | V | SEMAINE | SVM | N/A | 0.310 | N/A | N/A |
| | A,V | SEMAINE | SVM | N/A | N/A | N/A | 0.190 |
| | A,V,T | AVEC 2017 | NN | N/A | <0.1 | N/A | N/A |