Wasim Muhammad, Imran Ahmed, Jamil Ahmad, Muhammad Nawaz, Eatedal Alabdulkreem, Yazeed Ghadi.
Abstract
As in other business domains, digital monitoring has become an integral part of almost every academic institution. Campus surveillance systems cover all routine activities while producing a massive volume of video data. Selecting and searching for a desired video segment in such a vast repository is highly time-consuming, so effective video summarization methods are needed for fast navigation and retrieval of video content. This paper introduces a keyframe extraction method that summarizes academic activities, producing a short representation of the target video while preserving all the essential activities present in the original. First, we perform fine-grained activity recognition on a realistic Campus Activities Dataset (CAD), modeling activity attention scores with a deep CNN model. Second, we use the attention scores generated for each activity category to extract significant video frames. Finally, we evaluate an inter-frame similarity index to reduce the number of redundant frames and extract only the representative keyframes. The proposed framework is tested on different videos, and the experimental results show the performance of the proposed summarization process.
Keywords: Data science; Deep learning; Emerging technologies; Machine learning
Year: 2022 PMID: 35494862 PMCID: PMC9044333 DOI: 10.7717/peerj-cs.911
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. CNN-based campus surveillance video summarization framework.
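The three stages named in the abstract (per-frame attention scores, selection of significant frames, and removal of redundant frames by inter-frame similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, both thresholds, and the similarity measure (one minus the mean absolute pixel difference) are assumptions introduced here, and the record does not say which similarity index the paper uses.

```python
from typing import List, Sequence

Frame = Sequence[float]  # hypothetical: a flattened frame with pixel values in [0, 1]


def select_significant(scores: Sequence[float], threshold: float) -> List[int]:
    """Return indices of frames whose attention score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def similarity(a: Frame, b: Frame) -> float:
    """Stand-in similarity in [0, 1]: 1 minus the mean absolute difference.

    This is an assumed placeholder for the paper's inter-frame similarity index.
    """
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def extract_keyframes(
    frames: Sequence[Frame],
    scores: Sequence[float],
    score_threshold: float = 0.9,  # assumed value
    sim_threshold: float = 0.8,    # assumed value
) -> List[int]:
    """Keep significant frames that differ enough from the last kept keyframe."""
    keyframes: List[int] = []
    for i in select_significant(scores, score_threshold):
        if not keyframes or similarity(frames[keyframes[-1]], frames[i]) < sim_threshold:
            keyframes.append(i)
    return keyframes


# Toy input: five two-pixel "frames" and their attention scores.
frames = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.0, 0.9], [0.5, 0.5]]
scores = [0.95, 0.97, 0.99, 0.92, 0.40]
print(extract_keyframes(frames, scores))
```

Comparing each candidate only against the most recently kept keyframe keeps the pass linear in the number of significant frames; comparing against every kept keyframe would remove more repeats at a higher cost.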
Characteristics of campus activities dataset.
| S. No. | Category | No. of video samples | Duration (H:M:S) | No. of frames |
|---|---|---|---|---|
| 1 | Writing on Board (WB) | 104 | 00:28:52 | 14,507 |
| 2 | Explanation (Exp) | 161 | 00:39:27 | 15,052 |
| 3 | Board Cleaning (BC) | 63 | 00:16:11 | 9,237 |
| 4 | Class Entry (CE) | 136 | 00:19:03 | 15,271 |
| 5 | Class Exit (CEx) | 96 | 00:13:09 | 11,918 |
| 6 | Class Break (CB) | 42 | 00:09:36 | 6,634 |
| 7 | Paper Distribution (PD) | 86 | 00:21:43 | 14,194 |
| 8 | Paper Attempt (PA) | 121 | 00:18:50 | 12,871 |
| 9 | Paper Signature (PS) | 87 | 00:22:49 | 15,227 |
| 10 | Attendance (Att) | 109 | 00:23:36 | 11,829 |
| 11 | Paper Collection (PC) | 207 | 00:26:19 | 16,102 |
| 12 | Exam Entry (EE) | 121 | 00:15:17 | 15,363 |
| 13 | Exam Exit (EEx) | 131 | 00:15:59 | 13,488 |
| 14 | Exam Break (EB) | 52 | 00:11:42 | 6,357 |
Architecture of the CNN model.
| Layer | Kernels | Kernel size / units | Stride | Activation |
|---|---|---|---|---|
| Conv-1 | 128 | (3,3) | (1,1) | ReLU |
| Max-Pooling | – | (2,2) | (2,2) | – |
| Conv-2 | 64 | (3,3) | (1,1) | ReLU |
| Max-Pooling | – | (2,2) | (2,2) | – |
| Conv-3 | 32 | (3,3) | (1,1) | ReLU |
| Max-Pooling | – | (2,2) | (2,2) | – |
| Conv-4 | 16 | (3,3) | (1,1) | ReLU |
| Max-Pooling | – | (2,2) | (2,2) | – |
| FC-01 | – | 500 | – | – |
| FC-02 | – | 1000 | – | – |
| Output | – | 14 | – | Softmax |
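To make the layer table easier to follow, the sketch below traces the tensor shape through the stack. Two things are assumptions rather than facts from the source: the 224×224×3 input size and the use of unpadded ("valid") convolutions. Neither is stated in this record, so the resulting shapes are illustrative only.

```python
# Layer stack transcribed from the table: (name, kind, number of kernels).
CONV_STACK = [
    ("Conv-1", "conv", 128), ("Max-Pooling-1", "pool", None),
    ("Conv-2", "conv", 64), ("Max-Pooling-2", "pool", None),
    ("Conv-3", "conv", 32), ("Max-Pooling-3", "pool", None),
    ("Conv-4", "conv", 16), ("Max-Pooling-4", "pool", None),
]
DENSE_STACK = [("FC-01", 500), ("FC-02", 1000), ("Output", 14)]


def out_size(size: int, kernel: int, stride: int) -> int:
    """Spatial output size of an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1


def trace_shapes(height: int, width: int, channels: int = 3):
    """Return per-layer output shapes and the flattened feature count."""
    shapes = []
    h, w, c = height, width, channels
    for name, kind, kernels in CONV_STACK:
        if kind == "conv":   # 3x3 kernel, stride 1 (from the table)
            h, w, c = out_size(h, 3, 1), out_size(w, 3, 1), kernels
        else:                # 2x2 max-pooling, stride 2 (from the table)
            h, w = out_size(h, 2, 2), out_size(w, 2, 2)
        shapes.append((name, (h, w, c)))
    flattened = h * w * c
    for name, units in DENSE_STACK:
        shapes.append((name, (units,)))
    return shapes, flattened


# Assumed input size; the record does not state the real one.
shapes, flattened = trace_shapes(224, 224)
for name, shape in shapes:
    print(f"{name:14s} {shape}")
print("features entering FC-01:", flattened)
```

The kernel count halves at each convolution (128, 64, 32, 16) while pooling halves the spatial size, so the feature map shrinks quickly before the fully connected layers.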
Figure 2. Training progress and train-validation accuracy.
Figure 3. Confusion matrix representing the prediction results for the test frames.
Figure 4. Various classification metrics showing the classification performance of the trained model.
Figure 5. Activity attention plot showing all attention values of Video 1.
Figure 6. Video frames corresponding to each activity category.
Figure 7. Graph showing salient activity frames from all the recognized activities.
Figure 8. Graphs showing the attention scores and the corresponding salient frames for Video 2, Video 3 and Video 4.
Performance of the proposed summarization process on test videos.
| Test video | Total frames | Significant frames | Detection accuracy (%) | Key frames | Summary size (% of total frames) |
|---|---|---|---|---|---|
| Video 1 | 7,078 | 6,660 | 94.1 | 295 | 4 |
| Video 2 | 7,570 | 7,193 | 95.0 | 82 | 1 |
| Video 3 | 9,539 | 8,991 | 94.2 | 363 | 3.8 |
| Video 4 | 3,070 | 2,031 | 66.1 | 215 | 7 |
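The two percentage columns appear to be ratios against the total frame count: detection accuracy matches significant frames divided by total frames, and the final column matches key frames divided by total frames. This reading is inferred from the numbers and is not stated in the record. The short check below recomputes both ratios from the table.

```python
# (video, total frames, significant frames, reported accuracy %,
#  key frames, reported last-column %), transcribed from the table.
ROWS = [
    ("Video 1", 7078, 6660, 94.1, 295, 4.0),
    ("Video 2", 7570, 7193, 95.0, 82, 1.0),
    ("Video 3", 9539, 8991, 94.2, 363, 3.8),
    ("Video 4", 3070, 2031, 66.1, 215, 7.0),
]


def percent(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


def recompute(rows):
    """Return (video, significant/total %, key/total %) for each row."""
    return [
        (name, percent(significant, total), percent(key, total))
        for name, total, significant, _, key, _ in rows
    ]


for name, accuracy, size in recompute(ROWS):
    print(f"{name}: significant/total = {accuracy:.2f}%, key/total = {size:.2f}%")
```

Under this reading, the recomputed accuracies agree with the reported values to within 0.1 percentage point, and the recomputed key-frame ratios agree with the last column to within 0.2.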
Figure 9. A representative key frame from each activity category in the Video 1 storyboard.
Figure 10. Sample key frames extracted from Video 2, Video 3 and Video 4.