| Literature DB >> 34975282 |
Asma Zahra1,2, Mubeen Ghafoor3, Kamran Munir4, Ata Ullah2, Zain Ul Abideen2.
Abstract
Smart video surveillance helps build a more robust smart-city environment. Cameras at varied angles act as smart sensors, collecting visual data from the smart-city environment and transmitting it for further visual analysis. The transmitted visual data must be of high quality for efficient analysis, which is challenging when transmitting videos over low-bandwidth communication channels. In the latest smart surveillance cameras, high-quality video transmission is maintained through video encoding techniques such as High Efficiency Video Coding (HEVC). However, these coding techniques still offer limited capabilities, and the demand for high-quality encoding of salient regions such as pedestrians, vehicles, cyclists/motorcyclists, and roads in video surveillance systems remains unmet. This work contributes an efficient salient-region-based surveillance framework for smart cities. The proposed framework integrates a deep-learning-based video surveillance technique that extracts salient regions from a video frame without information loss and then encodes the frame at a reduced size. We applied this approach in diverse smart-city case-study environments to test the applicability of the framework. The proposed work achieves a bit-rate reduction of 56.92%, a peak signal-to-noise ratio gain of 5.35 dB, and SR-based segmentation accuracies of 92% and 96% on two different benchmark datasets. Consequently, the generation of computationally lighter region-based video data makes the framework well suited to improving surveillance solutions in smart cities.
Keywords: Deep learning; Smart cities and towns; Smart city applications; Surveillance cameras; Video surveillance
Year: 2021 PMID: 34975282 PMCID: PMC8710820 DOI: 10.1007/s11042-021-11468-w
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.757
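The framework's core idea, extracting salient regions (S-Rs) and encoding them at higher quality than the background, can be illustrated with a minimal sketch of per-block QP assignment. All names, the block size choice, and the QP offset below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

BLOCK = 64  # assumed HEVC CTU size in pixels

def qp_map_from_salient_mask(mask: np.ndarray, base_qp: int = 32,
                             salient_offset: int = -6) -> np.ndarray:
    """Build a per-CTU QP map: blocks touching a salient region get a
    lower QP (i.e. higher quality) than the background.

    mask: HxW binary array, 1 where a pixel belongs to a salient region.
    """
    h, w = mask.shape
    rows = (h + BLOCK - 1) // BLOCK
    cols = (w + BLOCK - 1) // BLOCK
    qp = np.full((rows, cols), base_qp, dtype=int)
    for r in range(rows):
        for c in range(cols):
            block = mask[r * BLOCK:(r + 1) * BLOCK, c * BLOCK:(c + 1) * BLOCK]
            if block.any():  # block overlaps a salient region
                qp[r, c] = base_qp + salient_offset
    return np.clip(qp, 0, 51)  # valid HEVC QP range

# toy frame: a salient object in the top-left corner
mask = np.zeros((128, 128), dtype=np.uint8)
mask[:40, :40] = 1
print(qp_map_from_salient_mask(mask, base_qp=32).tolist())  # → [[26, 32], [32, 32]]
```

Such a map can then be handed to an HEVC encoder that supports per-CTU QP control; the paper's modified HEVC presumably consumes an analogous structure (the QM notation below).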
State-of-the-art techniques
| Author and References | Dataset | Technique(s) | Results/accuracy | Weakness/remarks |
|---|---|---|---|---|
| Kumar et al. [ | Own extracted dataset | Kalman filtering and KL-divergence technique | 96.20% (classification) | Limited features (traffic only); the dataset is not publicly available; no other road objects are monitored |
| Hwang et al. [ | KITTI | DBSCAN | 67%–96% (pixel accuracy) | Struggles under varying weather/lighting conditions |
| [ | | Mean-shift segmentation | 76% (pixel accuracy) | Works specifically for pedestrians |
| Badrinarayanan et al. [ | CamVid | SegNet | 50.02 mIoU | High computational cost due to the extensive depth of layers |
| Ronneberger et al. [ | ISBI cell tracking dataset | U-Net | 92.03% (pixel accuracy), 77.56% mIoU | High delays due to the large number of parameters; designed for binary-class segmentation |
| Hyeonwoo et al. [ | PASCAL VOC 2012 | Deconvolution network | 69.6 mIoU | Needs more inference time due to a dense network |
| Jonathan et al. [ | SIFT Flow | Fully convolutional network | 85.2% (pixel accuracy) | Requires features from previously learned networks, making it unsuitable for road-security scenarios |
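The pixel-accuracy and mIoU figures quoted throughout the table above can be reproduced from a class confusion matrix. A short sketch with toy labels (not the paper's datasets):

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray, n_classes: int):
    """Pixel accuracy and mean IoU from flat integer label arrays."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (gt, pred), 1)          # confusion matrix: rows = ground truth
    pixel_acc = np.trace(cm) / cm.sum()   # correctly labelled pixels / all pixels
    inter = np.diag(cm)                   # per-class intersection
    union = cm.sum(0) + cm.sum(1) - inter # per-class union
    iou = inter / np.maximum(union, 1)
    return pixel_acc, iou.mean()

gt   = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 0, 1, 0, 2, 2])
acc, miou = segmentation_metrics(pred, gt, 3)
print(round(acc, 4), round(miou, 4))  # → 0.8333 0.7222
```

Pixel accuracy rewards dominant classes, which is why papers usually report mIoU alongside it, as the table does.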
Fig. 1 Proposed ESSE phases for S-R extraction, QP-map generation, and HEVC encoding–decoding
List of notations
| Sr | Notation | Description |
|---|---|---|
| 1 | | Total number of pixels belonging to a class |
| 2 | | Row index of a frame |
| 3 | frame | A 2D image from the video with r rows and c columns |
| 4 | Video_Frames | Number of frames in an input video |
| 5 | fn | Frame index, starting from the first frame |
| 6 | SM[] | An array of Salient_Map containing frames with S-Rs |
| 7 | N | Number of video frames |
| 8 | wd | Diagonal weight |
| 9 | wn | Neighbor weight |
| 10 | QM1…N | QP map for frames 1…N |
| 11 | QP | Quantization parameter of HEVC |
| 12 | L1–LN | Layers of the proposed S-SSN |
| 13 | i | S-SSN layer index |
| 14 | | Activation function on each S-SSN layer |
| 15 | | Upsampling stage for S-SSN layers |
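The diagonal and neighbor weights (wd, wn) in the notation list suggest that a salient block's quality boost is also spread, at reduced strength, into its edge-adjacent and diagonally adjacent blocks, smoothing quality transitions in the QP map. A hedged sketch of that idea; the weight and offset values are illustrative, not the paper's:

```python
import numpy as np

def weighted_qp_map(salient_blocks, base_qp=32, delta=6, wn=0.5, wd=0.25):
    """Spread the quality boost of salient blocks into their neighbours.

    salient_blocks: 2-D binary array, 1 where a CTU contains a salient region.
    wn / wd: fractions of the full QP reduction applied to edge-adjacent and
    diagonally adjacent blocks (illustrative values).
    """
    s = np.asarray(salient_blocks, dtype=float)
    pad = np.pad(s, 1)
    # strongest influence reaching each block from its 3x3 neighbourhood
    influence = s.copy()
    shifts = {(-1, 0): wn, (1, 0): wn, (0, -1): wn, (0, 1): wn,
              (-1, -1): wd, (-1, 1): wd, (1, -1): wd, (1, 1): wd}
    for (dr, dc), w in shifts.items():
        shifted = pad[1 + dr:1 + dr + s.shape[0], 1 + dc:1 + dc + s.shape[1]]
        influence = np.maximum(influence, w * shifted)
    qp = base_qp - np.round(delta * influence).astype(int)
    return np.clip(qp, 0, 51)

blocks = np.array([[0, 0, 0],
                   [0, 1, 0],
                   [0, 0, 0]])
print(weighted_qp_map(blocks).tolist())
# → [[30, 29, 30], [29, 26, 29], [30, 29, 30]]
```

The salient block receives the full QP reduction, its four edge neighbours half of it, and the diagonals a quarter, so quality decays gradually rather than abruptly at S-R boundaries.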
Fig. 2 S-SSN and modified HEVC: (a) input frame, (b) S-Rs, (c) SM and QP map, (d) encoded frame
Fig. 3 (a) Bit-rate comparison with PICO; (b) PSNR gain of the proposed ESSE
Fig. 4 For segmentation networks: (a) number of parameters, (b) mean IoU, (c) and (d) validation and training accuracies, (e) average training time
Fig. 5 Visual segmentation results: (a) frame taken from the tested video, (b) S-SSN, (c) SegNet, (d) DN
Fig. 6 For different video sequences: (a) pixel accuracy, (b) mean IoU
BD-PSNR and BD-BR results for the ESSE framework versus default HEVC on the tested surveillance video
| QP | Default HEVC bit-rate (kb/s) | Proposed ESSE bit-rate (kb/s) | Avg Δ BD-BR (%) | S-R PSNR, default HEVC (dB) | S-R PSNR, proposed ESSE (dB) | Avg Δ BD-PSNR (dB) | Frame PSNR, default HEVC (dB) | Frame PSNR, proposed ESSE (dB) |
|---|---|---|---|---|---|---|---|---|
| 0 | 44,319.73 | 16,865.82 | −56.92 | 67.43 | 67.56 | 5.35 | 64.4205 | 60.2447 |
| 5 | 29,600.50 | 13,299.33 | | 59.80 | 59.91 | | 58.1584 | 53.2501 |
| 10 | 17,587.23 | 7204.11 | | 56.11 | 56.25 | | 54.0684 | 50.7699 |
| 15 | 10,377.49 | 4312.31 | | 52.71 | 52.84 | | 50.8862 | 47.3834 |
| 20 | 5939.27 | 2552.19 | | 49.61 | 49.77 | | 47.9496 | 42.1739 |
| Medium-quality setting | | | | | | | | |
| 25 | 3506.91 | 1525.07 | −52.03 | 46.47 | 46.75 | 4.23 | 44.9181 | 39.4887 |
| 30 | 2028.57 | 960.64 | | 43.40 | 43.29 | | 41.9901 | 36.7874 |
| 35 | 1120.18 | 556.39 | | 40.26 | 40.45 | | 38.9366 | 33.1159 |
| Low-quality setting | | | | | | | | |
| 40 | 610.03 | 320.29 | −25.87 | 37.15 | 37.00 | 1.18 | 35.9031 | 31.1556 |
| 45 | 319.66 | 216.45 | | 34.02 | 33.50 | | 32.8828 | 30.5697 |
| 51 | 136.70 | 136.70 | | 30.50 | 30.50 | | 29.4725 | 29.4725 |
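The Avg Δ BD-BR and Avg Δ BD-PSNR columns follow the standard Bjøntegaard metric: fit PSNR as a cubic polynomial of log bit-rate for each codec and average the gap over the overlapping rate interval. A minimal BD-PSNR sketch on synthetic rate–distortion points (not the paper's data):

```python
import numpy as np

def bd_psnr(rate_a, psnr_a, rate_b, psnr_b):
    """Bjontegaard delta-PSNR of codec B over codec A, in dB.

    Fits cubic polynomials of PSNR vs. log10(bit-rate) and integrates the
    difference over the overlapping rate interval.
    """
    la, lb = np.log10(rate_a), np.log10(rate_b)
    pa = np.polyfit(la, psnr_a, 3)
    pb = np.polyfit(lb, psnr_b, 3)
    lo = max(la.min(), lb.min())          # overlapping log-rate interval
    hi = min(la.max(), lb.max())
    ia, ib = np.polyint(pa), np.polyint(pb)
    avg_a = (np.polyval(ia, hi) - np.polyval(ia, lo)) / (hi - lo)
    avg_b = (np.polyval(ib, hi) - np.polyval(ib, lo)) / (hi - lo)
    return avg_b - avg_a

# synthetic case: codec B reaches the same PSNR at half the bit-rate of A
rate_a = np.array([1000.0, 2000.0, 4000.0, 8000.0])
psnr_a = np.array([34.0, 37.0, 40.0, 43.0])
rate_b = rate_a / 2
print(round(bd_psnr(rate_a, psnr_a, rate_b, psnr_a), 2))  # → 3.0
```

BD-BR is computed analogously with the axes swapped (log bit-rate fitted as a function of PSNR), yielding the average percentage rate change at equal quality.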
Bit-rate savings of ESSE for different smart-city surveillance videos encoded at a base QP value of 20
| Surveillance video | Default HM bit-rate (kb/s) | ESSE bit-rate (kb/s) | Bit-rate savings (%) | Default HEVC PSNR (dB) | ESSE PSNR (dB) |
|---|---|---|---|---|---|
| Cross-road [ | 2256.851 | 774.339 | 65.68 | 45.15 | 45.18 |
| Dash-cam [ | 5939.277 | 2552.192 | 57.02 | 46.94 | 46.95 |
| Main-road [ | 4939.265 | 1952.183 | 60.47 | 46.61 | 46.62 |
| Bank [ | 3256.762 | 1274.345 | 60.87 | 44.86 | 44.89 |
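The bit-rate savings column is simply the relative reduction of the ESSE bit-rate against the default encoder:

```python
def bitrate_saving(default_kbps: float, esse_kbps: float) -> float:
    """Percentage bit-rate reduction of ESSE relative to default HEVC."""
    return (1.0 - esse_kbps / default_kbps) * 100.0

# Cross-road sequence from the table above
print(round(bitrate_saving(2256.851, 774.339), 2))  # → 65.69 (table lists 65.68)
```

The small discrepancy against the tabulated 65.68 is consistent with truncation rather than rounding of the published figure.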
Fig. 7 Case study results for smart-city surveillance pedestrian scenarios: (a) original frames, (b) default HEVC, (c) ESSE framework
Fig. 8 Case study results for smart-city surveillance traffic scenarios: (a) original frames, (b) default HEVC, (c) ESSE framework
Fig. 9 Case study results for smart-city surveillance bicycle/motorbike scenarios: (a) original frames, (b) default HEVC, (c) ESSE framework