Mostafa Ahmed Ezzat1, Mohamed A Abd El Ghany2,3, Sultan Almotairi4, Mohammed A-M Salem1,5.
Abstract
The automation strategy of today's smart cities relies on large IoT (Internet of Things) systems that collect big data and analyze it to gain insights. Although there have been recent reviews in this field, a notable gap remains: none of them addresses four sides of the problem together, namely the application of video surveillance in smart cities, algorithms, datasets, and embedded systems. In this paper, we discuss the latest datasets and algorithms used, and introduce the recent advances in embedded systems that enable edge vision computing. Moreover, future trends and challenges are addressed.
Keywords: IOT; computer vision; smart city; surveillance
Year: 2021 PMID: 34066509 PMCID: PMC8124810 DOI: 10.3390/s21093222
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Complete System Architecture.
Figure 2. Video Management Components.
Figure 3. Cloud Services.
Figure 4. End Users Applications.
Summary of existing review papers.
| Reference | Smart City | Surveillance | Internet of Things | Artificial Intelligence | Edge Computing |
|---|---|---|---|---|---|
| Dlodlo et al. | 🗸 | × | 🗸 | × | × |
| Eigenraam et al. | 🗸 | 🗸 | 🗸 | × | × |
| Roman et al. | × | 🗸 | × | × | 🗸 |
| Bilal et al. | × | × | 🗸 | × | 🗸 |
| Ai et al. | × | × | 🗸 | × | 🗸 |
| Hu et al. | × | 🗸 | × | × | 🗸 |
| Achmad et al. | 🗸 | × | × | × | × |
| Yu et al. | × | 🗸 | × | × | 🗸 |
| Gharaibeh et al. | 🗸 | × | × | × | × |
| Lim et al. | 🗸 | 🗸 | × | × | × |
| Zhaohua et al. | 🗸 | × | 🗸 | 🗸 | × |
| Chen et al. | 🗸 | × | 🗸 | 🗸 | × |
| Ke et al. | 🗸 | 🗸 | 🗸 | × | × |
| Jameel et al. | 🗸 | 🗸 | 🗸 | × | × |
| Hassan et al. | × | 🗸 | × | × | 🗸 |
| Our review paper | 🗸 | 🗸 | 🗸 | 🗸 | 🗸 |
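
The gap described in the abstract can be checked directly against the summary table. The following is a minimal sketch, not part of the original paper: it transcribes each prior review's row as a five-character flag string (the encoding and the names `TOPICS`, `REVIEWS`, `topic_counts`, and `max_topics_covered` are illustrative assumptions) and tallies how many reviews cover each topic and how many topics any single prior review covers.

```python
# Column order of the summary table.
TOPICS = (
    "Smart City",
    "Surveillance",
    "Internet of Things",
    "Artificial Intelligence",
    "Edge Computing",
)

# One flag string per prior review, one character per topic in TOPICS order
# ("1" = covered, "0" = not covered), transcribed from the table rows.
# "Our review paper" is deliberately left out.
REVIEWS = {
    "Dlodlo et al.": "10100",
    "Eigenraam et al.": "11100",
    "Roman et al.": "01001",
    "Bilal et al.": "00101",
    "Ai et al.": "00101",
    "Hu et al.": "01001",
    "Achmad et al.": "10000",
    "Yu et al.": "01001",
    "Gharaibeh et al.": "10000",
    "Lim et al.": "11000",
    "Zhaohua et al.": "10110",
    "Chen et al.": "10110",
    "Ke et al.": "11100",
    "Jameel et al.": "11100",
    "Hassan et al.": "01001",
}


def topic_counts(reviews):
    """Return {topic: number of reviews that cover it}."""
    return {
        topic: sum(flags[i] == "1" for flags in reviews.values())
        for i, topic in enumerate(TOPICS)
    }


def max_topics_covered(reviews):
    """Return the largest number of topics covered by any single review."""
    return max(flags.count("1") for flags in reviews.values())


if __name__ == "__main__":
    for topic, count in topic_counts(REVIEWS).items():
        print(f"{topic:<24} {count}")
    print("Most topics covered by one prior review:", max_topics_covered(REVIEWS))
```

Under this transcription, no prior review covers more than three of the five topics, and Artificial Intelligence is the least covered column.
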
Different Embedded Systems used in Computer Vision.
| Embedded System | Advantages | Disadvantages |
|---|---|---|
| | Used mainly in running | More expensive |
| | Small in Size | Smaller version of |
| | Excellent prototyping | Used for classification |
| | Mainly manufacturers use it | Used for classification |
| | Adapted for general- | High cost |
Comparison of different hardware (FPGA, ASIC, CPU, and GPU) running different algorithms.
| Algorithm | Hardware Type | Frequency | Latency | Power |
|---|---|---|---|---|
| MLP | | - | 800 ns | 294 mW |
| MLP | | - | 19,968 ns | 123 mW |
| MLP | | - | 540 ns | 1776 mW |
| MLP | | 100 MHz | 270 ns | 240 mW |
| MLP | | 577 MHz | | 372 mW |
| MLP | | 577 MHz | | 216 mW |
| PCA(DT) | | - | 795 ns | - |
| PCA(DT) | | - | 746 ns | - |
| PCA(KNN) | | - | 3073 ns | - |
| DNN | | 200 MHz | - | 168 mW |
| DNN | | 200 MHz | - | 304 mW |
| DNN | | 200 MHz | - | 379 mW |
| DNN | | 400 MHz | - | 386 mW |
| YOLO (Tiny) | | 3.07 GHz | 1.12 s | - |
| YOLO (Tiny) | | Up to 1 GHz | 36.92 s | - |
| YOLO (GoogLeNet) | | 3.07 GHz | 13.54 s | - |
| YOLO (GoogLeNet) | | Up to 1 GHz | 0.744 s | - |
| Faster RCNN (VGG16) | | Up to 1 GHz | Failed | - |
| YOLO (Tiny) | | 1531 MHz | 0.0037 s | 178 W |
| SVM | | 1.124 MHz | 0.03 ms | - |
| SVM | | 0.852 MHz | 1.23 ms | - |
| Markov Chain | | 1600 MHz | 33.4 ns | - |
| Markov Chain | | 1.25 GHz | N/A | - |
| Markov Chain | | 2500 MHz | N/A | - |
| Faster RCNN (ZF) | | 3.07 GHz | 2.547 s | - |
| Faster RCNN (ZF) | | 1 t GHz | 71.53 s | - |
| Faster RCNN (ZF) | | 0.2 GHz | - | - |
| CNN Size 2.74 GMAC | | 150 MHz | - | - |
| YOLO (GoogLeNet) | | 1531 MHz | 0.010 s | 230 W |
| Faster RCNN (ZF) | | 1531 MHz | 0.043 s | 69 W |
| F-RCNN (VGG16) | | 1531 MHz | 0.062 s | 81 W |
| MLP | | - | 540 ns | 1.556 W |
| STFT and MLP | | 25.237 MHz | - | 0.123 W |
| STFT and MLP | | 27.889 MHz | - | 3.456 W |
| ANN | | 5.332 MHz | - | |
| SVM | | 875 MHz | 1.23 ms | - |
| SVM | | 1480 MHz | 0.047 ms | - |
| TABLA | | 852 MHz | - | 5 W |
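
Latency and power are easier to compare across rows when combined into energy per inference. The sketch below is illustrative and not taken from the paper: it assumes the reported power is the average draw during a single inference and that the latency is per inference, neither of which the table states. The names `ROWS`, `energy_joules`, and `summarize` are hypothetical, and only a few rows that report both values are included.

```python
# Illustrative only: energy = power x latency, under the assumption that the
# table's power figure is the average draw over one inference.

# (algorithm, latency in seconds, power in watts), copied from table rows
# that report both a latency and a power value.
ROWS = [
    ("MLP", 800e-9, 0.294),              # 800 ns, 294 mW
    ("MLP", 270e-9, 0.240),              # 270 ns, 240 mW
    ("YOLO (Tiny)", 0.0037, 178.0),      # 0.0037 s, 178 W
    ("YOLO (GoogLeNet)", 0.010, 230.0),  # 0.010 s, 230 W
    ("Faster RCNN (ZF)", 0.043, 69.0),   # 0.043 s, 69 W
    ("F-RCNN (VGG16)", 0.062, 81.0),     # 0.062 s, 81 W
]


def energy_joules(latency_s: float, power_w: float) -> float:
    """Energy for one inference in joules, assuming constant power."""
    return latency_s * power_w


def summarize(rows):
    """Return (algorithm, energy in joules) for each row."""
    return [(name, energy_joules(lat, pw)) for name, lat, pw in rows]


if __name__ == "__main__":
    for name, energy in summarize(ROWS):
        print(f"{name:<18} {energy:.3e} J")
```

Because the rows come from different algorithms and hardware, the resulting figures only indicate orders of magnitude; they are not a like-for-like benchmark.
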
Frequently used datasets.
| Applications | Name | Description | Type | Size and Resolution | Paper |
|---|---|---|---|---|---|
| | | - Includes a ROI and the | Videos | - It is a 2000-frames video dataset from | |
| People | | - Dataset contains images | Images | - The counts of persons | |
| | | - consists of 1244 images, | Images | - It is a 2000-frames video dataset from | |
Comparison between methods used in application.
| Application | Method Used | Advantages | Disadvantages |
|---|---|---|---|
| People | Conventional | - Multiple features | - Count can be performed |
| | Machine Learning | - Count in heavily | - Difficult in low resolution cameras |
| | Deep Learning | - Novel loss function | - Time consuming when coming to |
Frequently used datasets.
| Applications | Name | Description | Type | Size and Resolution | Paper |
|---|---|---|---|---|---|
| Age and | | - 19,906 images in the training set | Images | - Size: 48 MB (Compressed) | |
| | | - Color images of faces at various | Images | - different quality cameras | |
| | | - 21,000 frontal face images | Images | - 500–600 per class | |
| | | - face images from 20,284 celebrities | Images | - 523,051 Images | |
Comparison between methods used in application.
| Application | Method Used | Advantages | Disadvantages |
|---|---|---|---|
| Age and Gender Estimation | Conventional Techniques | - Fast calculation speed | - Large feature dimension |
| | Machine Learning | - Capture low redundancy colors | - Works in unsupervised way |
| | Deep Learning | - the age estimation task is split into several comparative stages | - Time consuming when coming to training |
Frequently used datasets.
| Applications | Name | Description | Type | Size and Resolution | Paper |
|---|---|---|---|---|---|
| | | - The videos are captured | Videos | - Contains 16 training | |
| | | - Scenes are taken from inside | Videos | - 1st scene consists of 1450 frames | |
| | | - Videos from 20 different TV | Videos | - 6766 video clips | |
| Action Recognition | | - 500,000 videos with 600 | Videos | - 10 s duration | |
| | | - One million labeled 3 s | Videos | - 399 classes | |
| | | - 520K untrimmed videos | Videos | - an average length of 2.6 min- | |
| | | - a large collection of densely- | Videos | - 220,847 videos | |
| | | - A dataset which guides our research | Videos | - 157 action classes | |
| | | - a wide range of complex human | Videos | - 200 classes | |
| | | - diversity in terms of actions and with | Videos | - action categories can be divided | |
Comparison between methods used in application.
| Application | Method Used | Advantages | Disadvantages |
|---|---|---|---|
| Action | Trajectory analysis | - Efficient in non-jammed scenes | - Can’t detect irregular |
| | Deep Learning: | - Works well in understanding | - Both normal and abnormal |
Frequently used datasets.
| Applications | Name | Description | Type | Size and Resolution | Paper |
|---|---|---|---|---|---|
| Fire | | - It is composed by 149 videos | Videos | - Contains smoke and fire | |
| | | - Seven smoke videos and | Videos | - cover indoor and outdoor with | |
| | | - Early fire and smoke detection based | Videos | | |
Comparison between methods used in application.
| Application | Method Used | Advantages | Disadvantages |
|---|---|---|---|
| Fire and | Conventional | - Train the model to classify regions | - Low Accuracy |
| | Machine Learning | - Classification of fire/smoke pixels | - Detection delay of |
| | Deep Learning | - adaptive background subtraction | - False alarms |
Frequently used datasets.
| Applications | Name | Description | Type | Size and Resolution | Paper |
|---|---|---|---|---|---|
| Vehicle | | From the Beijing Laboratory of | Images | - six categories by vehicle type: | |
| | | - 3425 rear-angle images of | Images | - 360 × 256 pixels recorded in highways | |
Comparison between methods used in application.
| Application | Method Used | Advantages | Disadvantages |
|---|---|---|---|
| Vehicle Detection, | Conventional | - Motion detection problem | - Background noise |
| | Machine Learning | - Provides better perf- | - In multiple objects moving |
| | Deep Learning | - Detect small objects | - Restricted with the functionality |