| Literature DB >> 22438733 |
Sergio Escalera, Xavier Baró, Jordi Vitrià, Petia Radeva, Bogdan Raducanu.
Abstract
Social interactions are a very important component in people's lives. Social network analysis has become a common technique used to model and quantify the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. For our study, we used a set of videos belonging to the New York Times' Blogging Heads opinion blog. The social network is represented as an oriented graph, whose directed links are determined by the Influence Model. The links' weights are a measure of the "influence" one person has over the other. The states of the Influence Model encode audio/visual features automatically extracted from our videos using state-of-the-art algorithms. Our results are reported in terms of the accuracy of audio/visual data fusion for speaker segmentation, and of the centrality measures used to characterize the extracted social network.
Keywords: audio/visual data fusion; influence model; social interaction; social network analysis
Year: 2012 PMID: 22438733 PMCID: PMC3304135 DOI: 10.3390/s120201702
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Block diagram of our integrated framework for Social Network extraction and analysis.
Figure 2. (a) Face and mouth detection and (b) segmented mouth regions.
Figure 3. Result of a one-class classification process for a five-minute conversation excerpt. The legend shows the true labels of the samples. The samples are linearly separable using the DTW-based one-class classifier.
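The DTW-based one-class classifier of Figure 3 accepts a sample when its warping distance to a reference pattern is small enough. A minimal sketch of the underlying dynamic time warping distance, with an illustrative threshold and sequences that are not taken from the paper:

```python
# Classic O(n*m) DTW with absolute-difference cost. Illustrative
# sketch only; the paper's actual features and classifier are not
# reproduced here.

def dtw_distance(a, b):
    """DTW distance between two 1-D sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of the best warping path aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# One-class decision: accept a sample if its DTW distance to a
# reference template falls below a (hypothetical) threshold.
template = [0.0, 1.0, 2.0, 1.0, 0.0]
sample = [0.0, 0.9, 2.1, 1.0, 0.1]
print(dtw_distance(template, sample) < 0.5)  # close match -> True
```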
Figure 4. Stacked Sequential Learning scheme.
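Stacked Sequential Learning (Figure 4) augments each sample's features with a base classifier's predictions over a temporal neighborhood before training a second classifier on the extended vectors. A minimal sketch of the feature-extension step, with made-up features, predictions, and window size:

```python
# Sketch of the stacked sequential feature extension: append the base
# classifier's predictions for the w neighbors on each side of sample
# t to sample t's own feature vector. Boundary indices are clamped.

def extend_with_window(features, base_preds, w=1):
    """Return feature vectors extended with a window of predictions."""
    n = len(features)
    extended = []
    for t in range(n):
        ctx = [base_preds[min(max(t + d, 0), n - 1)]
               for d in range(-w, w + 1)]
        extended.append(list(features[t]) + ctx)
    return extended

# Made-up per-frame features and base (speaking/not-speaking) predictions.
ext = extend_with_window([[0.1], [0.2], [0.9]], [0, 0, 1], w=1)
print(ext[1])  # -> [0.2, 0, 0, 1]
```

A second-stage classifier would then be trained on `ext`, letting it exploit the temporal coherence of speaking turns.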
Figure 5. The Influence Model architecture.
Figure 6. The significance of the α parameters in the case of a two-person conversation.
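In the Influence Model (Figures 5 and 6), the α parameters weight how strongly each participant's current state shapes the other's next state: chain i's next-state distribution is a convex combination, weighted by α[i][j], of Markov transitions conditioned on each chain j's current state. A hedged sketch with illustrative two-state (speaking/silent) matrices, none of which are values from the paper:

```python
# Sketch of the two-chain Influence Model update. alpha[i][j] is the
# influence of chain j on chain i (rows sum to 1); A[j][i] is the
# transition matrix from chain j's current state to chain i's next
# state (rows sum to 1). All numbers below are made up.

def next_distribution(i, states, alpha, A):
    """Distribution over chain i's next state given all current states."""
    n_states = len(A[0][0][0])
    dist = [0.0] * n_states
    for j, s_j in enumerate(states):
        row = A[j][i][s_j]          # transition row for j's current state
        for k in range(n_states):
            dist[k] += alpha[i][j] * row[k]
    return dist

alpha = [[0.7, 0.3], [0.4, 0.6]]          # influence weights
self_T = [[0.9, 0.1], [0.2, 0.8]]         # self-transition matrix
cross_T = [[0.3, 0.7], [0.6, 0.4]]        # cross-chain transition matrix
A = [[self_T, cross_T], [cross_T, self_T]]

# Chain 0 is silent (0), chain 1 is speaking (1).
d = next_distribution(0, [0, 1], alpha, A)
print(d)  # -> [0.81, 0.19], a proper distribution
```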
Table 1. Video, Audio, and A–V speaking classification.
| 1–2 | 66.58 | 58.36 | 75.77 | 68.32 |
| 1–3 | 58.01 | 63.90 | 64.23 | 72.52 |
| 1–4 | 68.37 | 78.50 | 56.38 | 57.35 |
| 1–5 | 88.99 | 72.50 | 88.80 | 84.02 |
| 1–6 | 69.51 | 61.86 | 79.14 | 80.31 |
| 9–3 | 82.63 | 61.95 | 83.83 | 73.92 |
| 9–10 | 65.01 | 63.71 | 96.79 | 65.44 |
| 3–11 | 65.77 | 74.91 | 83.84 | 86.72 |
| 4–3 | 75.35 | 64.09 | 79.10 | 82.07 |
| 4–12 | 94.13 | 75.36 | 91.46 | 94.21 |
| 13–15 | 70.96 | 71.95 | 76.43 | 77.22 |
| 13–14 | 65.11 | 43.10 | 74.97 | 56.56 |
| 12–14 | 86.20 | 64.02 | 88.90 | 72.28 |
| 12–7 | 97.75 | 85.26 | 97.13 | 92.54 |
| 8–10 | 61.44 | 55.93 | 79.18 | 88.29 |
| 9–11 | 67.09 | 66.88 | 89.72 | 91.37 |
| 7–14 | 55.88 | 63.54 | 91.37 | 60.12 |
| Mean Rank | 2.82 | 2.01 | 1.17 |
Figure 7. Social Network showing participant labels and influence direction.
Figure 8. Comparison between speaking prediction with the fusion methodology and ground truth vectors.
Table 2. Centrality Measures.
| 1 | 5 | 0 | 0.5344 | 0.7582 | |
| 2 | 0 | 1 | 0.4628 | 0 | 0.1278 |
| 3 | 2 | 2 | 0.7898 | 0.7692 | 0.3241 |
| 4 | 2 | 1 | 0.7510 | 0.4321 | |
| 5 | 0 | 1 | 0.5339 | 0 | 0.0152 |
| 6 | 0 | 1 | 0.3985 | 0 | 0.2832 |
| 7 | 1 | 1 | 0.5052 | 0 | 0.1224 |
| 8 | 0 | 1 | 0 | 0 | 0 |
| 9 | 0 | 0.5165 | 0.0437 | ||
| 10 | 2 | 0 | 0.5375 | 0.1429 | 0.0187 |
| 11 | 1 | 1 | 0.6177 | 0 | 0.1084 |
| 12 | 0 | 0.6453 | 0.8352 | 0.2159 | |
| 13 | 0 | 2 | 0.4897 | 0.2747 | 0.0131 |
| 14 | 3 | 0 | 0.5344 | 0.6264 | 0.0896 |
| 15 | 1 | 0 | 0.3434 | 0 | 0.0063 |
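The degree-style columns in the centrality table can be computed directly from the directed influence graph. A minimal sketch on a made-up edge list (not the network extracted in the paper):

```python
# In-degree and out-degree per node of a directed graph, the simplest
# of the centrality measures used to characterize a social network.
# The edge list below is an illustrative example only.

def degree_centrality(edges, nodes):
    """Return (in_degree, out_degree) dicts for a directed graph."""
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for u, v in edges:
        outdeg[u] += 1  # u influences v
        indeg[v] += 1   # v is influenced by u
    return indeg, outdeg

nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (3, 1), (4, 3)]  # (u, v): u -> v
indeg, outdeg = degree_centrality(edges, nodes)
print(outdeg[1], indeg[3])  # -> 2 2
```

Weighted variants would sum the Influence Model's link weights instead of counting edges; closeness and betweenness require shortest-path computations over the same graph.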