Valentina Franzoni, Giulio Biondi, Alfredo Milani.
Abstract
Crowds express emotions as a collective individual; this is evident in the sounds a crowd produces at particular events, e.g., collective booing, laughing, or cheering at sports matches, movies, theaters, concerts, political demonstrations, and riots. A critical question concerning the novel concept of crowd emotions is whether the emotional content of crowd sounds can be characterized by frequency-amplitude features, using analysis techniques similar to those applied to individual voices, in which deep learning classification is applied to spectrogram images derived from sound transformations. In this work, we present a technique based on the generation of sound spectrograms from fixed-length fragments extracted from audio clips recorded at high-attendance events, where the crowd acts as a collective individual. Transfer learning is applied to a convolutional neural network pre-trained on low-level features using ImageNet, the well-known large-scale dataset of visual knowledge. The original sound clips are filtered and amplitude-normalized for correct spectrogram generation, on which we fine-tune the domain-specific features. Experiments on the final trained convolutional neural network show promising performance of the proposed model in classifying crowd emotions.
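The abstract describes extracting fixed-length fragments ("blocks") from crowd audio clips and normalizing amplitude before spectrogram generation. A minimal sketch of that preprocessing step, with hypothetical function and parameter names (not the authors' code):

```python
# Hypothetical sketch of the block-extraction and amplitude-normalization
# steps described in the abstract; names and block length are assumptions.

def split_into_blocks(samples, sample_rate, block_seconds):
    """Split a mono sample sequence into consecutive fixed-length blocks,
    discarding any trailing remainder shorter than a full block."""
    block_len = int(sample_rate * block_seconds)
    n_blocks = len(samples) // block_len
    return [samples[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]

def peak_normalize(block):
    """Scale a block so its peak absolute amplitude is 1.0 (no-op on silence)."""
    peak = max(abs(s) for s in block)
    if peak == 0:
        return list(block)
    return [s / peak for s in block]

# Example: a 2.5-second toy clip at 8 kHz split into 1-second blocks -> 2 blocks.
clip = [0.25 * ((i % 80) - 40) / 40 for i in range(20_000)]  # toy sawtooth
blocks = [peak_normalize(b) for b in split_into_blocks(clip, 8_000, 1.0)]
```

Each normalized block would then be transformed into a spectrogram image and fed to the CNN.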
Keywords: CNN; Crowd computing; Crowd emotions; Emotion recognition; Image recognition; Transfer learning
Year: 2020 PMID: 32837250 PMCID: PMC7429201 DOI: 10.1007/s11042-020-09428-x
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.757
Fig. 1 System architecture of the experimental method.
Fig. 2 Example spectrograms of blocks from different categories, for each scale.
Fig. 3 Example spectrograms of the same block at different scales.
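The figures and result tables compare four frequency scales (Mel, ERB, Bark, Log). The standard textbook conversion formulas for these scales are sketched below; the exact variants used by the authors' spectrogram tool may differ.

```python
# Standard frequency-scale conversions (textbook formulas; the authors'
# spectrogram software may use slightly different variants).
import math

def hz_to_mel(f):
    # O'Shaughnessy Mel formula
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    # Zwicker & Terhardt approximation
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def hz_to_erb_rate(f):
    # Glasberg & Moore ERB-rate scale
    return 21.4 * math.log10(1.0 + 0.00437 * f)

def hz_to_log(f):
    # Plain logarithmic scale (octaves above 1 Hz)
    return math.log2(f)
```

All four compress high frequencies relative to a linear axis; e.g., 1 kHz maps to roughly 1000 mel, 8.5 Bark, and 15.6 ERB-rate units.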
Per-class number of clips, number of blocks, and total duration in the dataset
| Class | Distinct clips | Total duration (s) | Total blocks |
|---|---|---|---|
| Approval | 39 | 518 | 1787 |
| Disapproval | 15 | 118 | 388 |
| Neutral | 15 | 1874 | 7340 |
| Total | 69 | 2510 | 9515 |
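Recomputing the totals and class shares from the table above confirms the row sums and highlights the strong class imbalance: the Neutral class contributes about 77% of all blocks. A small sketch:

```python
# Recompute Table 1 totals and per-class block shares to show the
# class imbalance (values copied from the table above).
dataset = {
    "Approval":    {"clips": 39, "duration_s": 518,  "blocks": 1787},
    "Disapproval": {"clips": 15, "duration_s": 118,  "blocks": 388},
    "Neutral":     {"clips": 15, "duration_s": 1874, "blocks": 7340},
}

totals = {k: sum(row[k] for row in dataset.values())
          for k in ("clips", "duration_s", "blocks")}
shares = {name: row["blocks"] / totals["blocks"]
          for name, row in dataset.items()}
```

The Disapproval class is by far the smallest (388 blocks), which makes its classification results in the tables below particularly noteworthy.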
Results for experimental setting 1
| Scale | Accuracy (avg. over 5 re-trainings) |
|---|---|
| Mel | 0.9983 |
| Erb | 0.9981 |
| Bark | 0.9983 |
| Log | 0.9968 |
Results for experimental setting 2
| Scale | Accuracy (avg. over 5 re-trainings) |
|---|---|
| Mel | 0.9292 |
| Erb | 0.9636 |
| Bark | 0.9646 |
| Log | 0.9924 |
Confusion matrix (per spectrogram) for the Mel scale, network 2
| Real/Predicted | Approval | Disapproval | Neutral |
|---|---|---|---|
| Approval | 229 | 6 | 0 |
| Disapproval | 0 | 51 | 0 |
| Neutral | 119 | 34 | 883 |
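From the confusion matrix above (rows = true class, columns = predicted class), the overall accuracy and per-class recall can be derived directly:

```python
# Derive accuracy and per-class recall from the confusion matrix above
# (rows = true class, columns = predicted class).
labels = ["Approval", "Disapproval", "Neutral"]
cm = [
    [229,  6,   0],   # true Approval
    [  0, 51,   0],   # true Disapproval
    [119, 34, 883],   # true Neutral
]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(labels)))
accuracy = correct / total
recall = {labels[i]: cm[i][i] / sum(cm[i]) for i in range(len(labels))}
```

The resulting accuracy is about 0.8797, matching the Mel/network-2 row in the majority-vote table below; most errors are Neutral blocks misclassified as Approval, while Disapproval is recognized perfectly.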
Results for the majority-vote classification scheme
| Scale | Network | Accuracy | Correctly classified clips (majority vote) | Misclassified clips (majority vote) |
|---|---|---|---|---|
| Mel | 0 | 0.8994 | 12 | 0 |
| Mel | 1 | 0.9561 | 12 | 0 |
| Mel | 2 | 0.8797 | 12 | 0 |
| Mel | 3 | 0.9781 | 12 | 0 |
| Mel | 4 | 0.9327 | 12 | 0 |
| Erb | 0 | 0.8805 | 12 | 0 |
| Erb | 1 | 0.9728 | 10 | 2 |
| Erb | 2 | 0.9849 | 12 | 0 |
| Erb | 3 | 0.9917 | 12 | 0 |
| Erb | 4 | 0.9879 | 12 | 0 |
| Bark | 0 | 0.9773 | 10 | 2 |
| Bark | 1 | 0.9433 | 10 | 2 |
| Bark | 2 | 0.9652 | 12 | 0 |
| Bark | 3 | 0.9758 | 10 | 2 |
| Bark | 4 | 0.9614 | 10 | 2 |
| Log | 0 | 0.9947 | 12 | 0 |
| Log | 1 | 0.9894 | 12 | 0 |
| Log | 2 | 0.9992 | 12 | 0 |
| Log | 3 | 0.9803 | 12 | 0 |
| Log | 4 | 0.9985 | 12 | 0 |
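The "majority classification" columns above aggregate block-level predictions to a clip-level label: each clip's blocks are classified independently, and the clip receives the most frequent block label. A minimal sketch of such a scheme (illustrative; the labels below are made up):

```python
# Sketch of a majority-vote scheme over block-level predictions: each clip is
# split into blocks, every block is classified independently, and the clip
# label is the most frequent block label.
from collections import Counter

def majority_vote(block_predictions):
    """Return the most common label among a clip's block-level predictions."""
    return Counter(block_predictions).most_common(1)[0][0]

# Hypothetical block-level predictions for one clip:
clip_label = majority_vote(
    ["Approval", "Neutral", "Approval", "Approval", "Disapproval"]
)
```

This explains why clip-level results can be perfect (12/12) even when per-block accuracy is below 1.0: occasional block errors are outvoted by the majority.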