Naveed Ilyas, Ahsan Shahzad, Kiseon Kim.
Abstract
Machine-learning and artificial-intelligence techniques are transforming traditional handcrafted image crowd-counting techniques into intelligent ones. This paradigm shift offers many advanced features for the adaptive monitoring and control of dynamic crowd gatherings. Adaptive monitoring, identification/recognition, and management of diverse crowd gatherings can improve many crowd-management-related tasks in terms of efficiency, capacity, reliability, and safety. Despite many challenges, such as occlusion, clutter, irregular object distribution, and nonuniform object scale, convolutional neural networks are a promising technology for intelligent image crowd counting and analysis. In this article, we review, categorize, and analyze (limitations and distinctive features) the latest convolutional-neural-network-based crowd-counting techniques, and provide a detailed performance evaluation of them. We also highlight their potential applications. Finally, we conclude by presenting our key observations, which provide a strong foundation for future research directions in designing convolutional-neural-network-based crowd-counting techniques. Further, the article discusses new advancements toward understanding crowd counting in smart cities using the Internet of Things (IoT).
Keywords: crowd analysis; deep learning; smart cities
Year: 2019 PMID: 31861734 PMCID: PMC6983207 DOI: 10.3390/s20010043
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Categorization of crowd-counting techniques.
Figure 2. Unique challenges of convolutional-neural-network (CNN) crowd-counting (CC) techniques in an image.
Figure 3. General form of a CNN-CC algorithm. The crowd-counting mechanism runs from object annotation in an image, through density estimation, to object counting. The general crowd-counting framework is shown at the top, and the CNN's operation is expanded at the bottom.
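The annotation, density-estimation, and counting stages named in the Figure 3 caption can be illustrated with a minimal sketch. It assumes a common convention from the density-estimation literature: each annotated head is replaced by a Gaussian blob of unit mass, so that the sum of the density map equals the number of people. The function names, the fixed `sigma`, and the toy annotations are hypothetical and are not taken from any specific method surveyed here; in a CNN-CC system, the network would be trained to regress such a map from the image.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    size = 2 * radius + 1
    kernel = [
        [
            math.exp(-((x - radius) ** 2 + (y - radius) ** 2) / (2 * sigma ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


def density_map(points, height, width, sigma=2.0):
    """Build a density map from (row, col) head annotations.

    Assumption: a fixed sigma for every head (no perspective adaptation).
    """
    radius = int(3 * sigma)
    kernel = gaussian_kernel(radius, sigma)
    density = [[0.0] * width for _ in range(height)]
    for r, c in points:
        # Collect the kernel cells that fall inside the image.
        cells = []
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = r + dy, c + dx
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy + radius][dx + radius]))
        mass = sum(weight for _, _, weight in cells)
        if mass == 0.0:
            continue  # annotation lies entirely outside the image
        # Renormalize so each person contributes exactly 1, even near borders.
        for y, x, weight in cells:
            density[y][x] += weight / mass
    return density


def count_from_density(density):
    """The estimated count is the integral (sum) of the density map."""
    return sum(sum(row) for row in density)


# Toy example: three annotated heads in a 32 x 32 image.
heads = [(5, 5), (10, 20), (0, 0)]
dmap = density_map(heads, height=32, width=32)
estimated_count = count_from_density(dmap)
```

Renormalizing at the image border is one possible design choice; it keeps the map's sum equal to the annotation count even when a blob is clipped.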
Figure 4. Categorization of CNN-CC techniques.
Summary of different crowd-counting datasets with their intrinsic features.
| Dataset | UCSD [ | Mall [ | UCF [ | WE [ | STA [ | STB [ |
|---|---|---|---|---|---|---|
| Number of images | 2000 | 2000 | 50 | 3980 | 482 | 716 |
| Resolution | 158 × 238 | 320 × 240 | Varied | 576 × 720 | Varied | 768 × 1024 |
| Minimum count | 11 | 13 | 94 | 1 | 33 | 9 |
| Average count | 25 | - | 1279 | 50 | 501 | 123 |
| Maximum count | 46 | 53 | 4543 | 253 | 3139 | 578 |
| Total count | 49,885 | 62,325 | 63,974 | 199,923 | 241,677 | 88,488 |
| Features | Collected from a video camera, ground-truth annotation, low-density dataset, no perspective variation | Collected from a surveillance camera, diverse illumination conditions; higher density than UCSD, no scene-perspective variation | Collected from various places such as concerts and marathons, diverse scenes with a wide range of densities, more challenging than UCSD and Mall | Specific to cross-scene crowd counting, large diversity but limited compared to UCF, less dense than UCF, more images | Chosen from the Internet, large scale, largest in terms of number of annotated people, higher density than STB, diverse scenes, and varying densities | Collected from Shanghai, varying scale and perspective, nonuniform density level in many images, making it tilt towards the low-density level |
Figure 5. Architectures of different subcategories: (a) basic-CNN-CC, (b) context-aware CNN-CC techniques (context-CNN-CC), (c) patch-based-CNN-CC, (d) scale-aware CNN-CC techniques (scale-CNN-CC), (e) multitask-CNN-CC, (f) whole-image-CNN-CC, (g) aerial-view-CNN-CC, and (h) perspective-CNN-CC.
Summary of advantages and limitations of basic-CNN-CC algorithms.
| Technique | Features | Datasets | Negative Samples | Data Driven | Architecture | ||
|---|---|---|---|---|---|---|---|
| Yes | No | Yes | No | ||||
| Fu et al. [ | Real-time approach | PETS_2009, Subway video, | ✓ | ✓ | ConvNets | ||
| Mundhenk et al. [ | Contextual information, | Cars Overhead with Context | ✓ | ✓ | AlexNet, | ||
| Wang et al. [ | End-to-end deep CNN | UCF | ✓ | ✓ | FCN | ||
| Zhao et al. [ | Joint learning of crowd | UCSD, [LHI, TS, CNN] * | ✓ | ✓ | FlowNet | ||
| Hu et al. [ | Two supervisory signals: | UCF, UCSD | ✓ | ✓ | ConvNets | ||
| Walach et al. [ | Gradient boosting and selective | UCF, UCSD, [Bacterial Cell, Make 3D] * | ✓ | ✓ | Boosting Net | ||
* Private datasets.
Summary of advantages and limitations of context-CNN-CC algorithms.
| Technique | Features | Datasets | Negative Samples | Data Driven | Architecture | ||
|---|---|---|---|---|---|---|---|
| Yes | No | Yes | No | ||||
| Chattopadhyay et al. [ | Associative subitizing | PASCAL VOC, COCO | ConvNet | ||||
| Zhang et al. [ | Attention model for head detection | UCF, STA, STB | AM-CNN | ||||
| Li et al. [ | Dilated convolution and | UCF, STA, STB, WE | CSRNet | ||||
| Han et al. [ | Combination of correlation and MRF | UCF | ResNet | ||||
| Wang et al. [ | Density adaption network | ST, UCF | DAN, LCN, HCN | ||||
| Liu et al. [ | Spatially aware network | ST, UCF, WE | Local Refinement | ||||
Summary of advantages and limitations of scale-CNN-CC algorithms.
| Technique | Features | Datasets | Negative Samples | Data Driven | Architecture | ||
|---|---|---|---|---|---|---|---|
| Yes | No | Yes | No | ||||
| Liu et al. [ | Geometry-aware crowd counting | ST, WE, Venice | ✓ | ✓ | Siamese | ||
| Huang et al. [ | Exploits cross-scale similarity | ST, WE | ✓ | ✓ | Wide and Deep | ||
| Kang et al. [ | Image pyramid to deal with scale variation | ST, WE, UCSD | ✓ | ✓ | VGG network | ||
| Boominathan et al. [ | Combination of deep and shallow networks | UCF | ✓ | ✓ | VGG-16 | ||
| Zeng et al. [ | Single multiscale column | ST, UCF | ✓ | ✓ | Inception | ||
| Kumagai et al. [ | Integration of multiple | UCF, Mall | ✓ | ✓ | MoC-CNN | ||
| Onoro-Rubio et al. [ | CCNN for mapping the appearance of | UCF, UCSD, | ✓ | ✓ | CCNN, Hydra | ||
| Shi et al. [ | Dynamic data-augmentation strategy, NetVLAD | ST, UCF, WE | ✓ | ✓ | VGG-like net | ||
| Cao et al. [ | Multi-scale feature extraction with | UCF, STA, STB, UCSD | ✓ | ✓ | SANet | ||
| Shen et al. [ | GANs-based network, novel regularizer | ST, UCF, UCSD | ✓ | ✓ | ACSCP | ||
Summary of advantages and limitations of multitask-CNN-CC algorithms.
| Technique | Features | Datasets | Negative Samples | Data Driven | Architecture | ||
|---|---|---|---|---|---|---|---|
| Yes | No | Yes | No | ||||
| Arteta et al. [ | Multitasking: foreground and background | Penguins dataset | ✓ | ✓ | ConvNet | ||
| Idrees et al. [ | Multitasking with loss optimization | UCF-QNRF | ✓ | ✓ | DenseNet | ||
| Zhu et al. [ | Combination of pedestrian flow | UCF, | ✓ | ✓ | VGGNet-16 | ||
| Huang et al. [ | Body structure-aware methods | STB, UCF, UCSD | ✓ | ✓ | Multi-column body-part | ||
| Yang et al. [ | Multicolumn multitask CNN focusing | ST, UCF, UCSD, | ✓ | ✓ | MMCNN | ||
| Liu et al. [ | Self-supervised tasking | UCF, STA, STB | ✓ | ✓ | VGG-16 | ||
* Private Datasets.
Summary of advantages and limitations of aerial-view-CNN-CC algorithms.
| Technique | Features | Datasets * | Negative Samples | Data Driven | Architecture | ||
|---|---|---|---|---|---|---|---|
| Yes | No | Yes | No | ||||
| Khan et al. [ | Automatic approach to select a region | Time-lapse image | ✓ | ✓ | Architecture of | ||
| Ribera et al. [ | Plants are estimated by using the | RGB UAV images of | ✓ | ✓ | Inception-v2 | ||
| Hernández et al. [ | Feature pyramid network | BBBC005 | ✓ | ✓ | VGG-Style NN | ||
| Xie et al. [ | Two convolutional regression networks | RPE, T and LBL cells | ✓ | ✓ | VGG-net | ||
* Private Datasets.
Summary of advantages and limitations of perspective-CNN-CC algorithms.
| Technique | Features | Datasets | Negative Samples | Data Driven | Architecture | ||
|---|---|---|---|---|---|---|---|
| Yes | No | Yes | No | ||||
| Kang et al. [ | Incorporating side information | UCSD | ✓ | ✓ | ACNN | ||
| Zhao et al. [ | Perspective embedded deconvolution network | WE | ✓ | ✓ | PE-CFCN-DCN | ||
| Marsden et al. [ | Multidomain patch-based regressor | ST, Penguin, Dublin cell * | ✓ | ✓ | VGG16 | ||
| Zhang et al. [ | Cross scene crowd counting, human body | UCF | ✓ | ✓ | Crowd CNN model | ||
| Shi et al. [ | Perspective-aware weighting layer | UCF, WE, STA, STB | ✓ | ✓ | PACNN | ||
| Yao et al. [ | General model based on CNN and LSTM | ST, UCF, WE | ✓ | ✓ | DSRM with ResNet | ||
* Private Datasets.
Summary of advantages and limitations of patch-based-CNN-CC algorithms.
| Technique | Features | Datasets | Negative Samples | Data Driven | Architecture | ||
|---|---|---|---|---|---|---|---|
| Yes | No | Yes | No | ||||
| Cohen et al. [ | Smaller network used for | [VGG, MBM] * | ✓ | ✓ | Count-ception | ||
| Liu et al. [ | Detection and density-estimation network | Mall, STB, WE | ✓ | ✓ | DecideNet | ||
| Onoro-Rubio et al. [ | Joint feature extraction and pixelwise | ST, UCSD, TRANSCOS | ✓ | ✓ | GU-Net | ||
| Xu et al. [ | Depth-information-based method | STB, Mall, ZZU-CIISR | ✓ | ✓ | Multi-scale | ||
| Shami et al. [ | Head-detector-based crowd-estimation method | ST, UCF | ✓ | ✓ | ImageNet | ||
| Zhang et al. [ | Aggregated framework | UCF, AHU-CROWD | ✓ | ✓ | count-net | ||
| Zhang et al. [ | Multicolumn CNN with varying receptive fields | ST, UCF | ✓ | ✓ | MCNN | ||
| Wang et al. [ | Skip-connection CNN with scale-related training | ST, UCF | ✓ | ✓ | SCNN | ||
| Sam et al. [ | Switch CNN multidomain patch-based regressor | ST, UCF, WE | ✓ | ✓ | Switch CNN | ||
* Private Datasets.
Summary of advantages and limitations of whole-image-CNN-CC algorithms.
| Technique | Features | Datasets | Negative Samples | Data Driven | Architecture | ||
|---|---|---|---|---|---|---|---|
| Yes | No | Yes | No | ||||
| Rahnemoonfar et al. [ | Simulated learning, and synthetic data for training, | Fruit dataset * | ✓ | ✓ | Inception-ResNet | ||
| Sheng et al. [ | Pixel-level semantic-feature map, | UCSD, | ✓ | ✓ | Semantic-feature map | ||
| Marsden et al. [ | Simultaneous multiobjective method for violent-behavior | UCF | ✓ | ✓ | ResNetCrowd | ||
| Marsden et al. [ | Multiscale averaging to handle scale variation | ST, UCF | ✓ | ✓ | FCN | ||
| Sindagi et al. [ | Multitask end-to-end cascaded network | ST, UCF | ✓ | ✓ | Cascaded network | ||
* Private Datasets.
Figure 6. Applications of crowd analysis in different fields.
Figure 7. Normalized mean absolute error (nMAE) of network-CNN-CC algorithms tested on different datasets: (a) basic-CNN-CC, (b) context-CNN-CC, (c) scale-CNN-CC, and (d) multitask-CNN-CC.
Figure 8. nMAE of CNN-CC algorithms tested on different datasets: (a) perspective-CNN-CC, (b) patch-based-CNN-CC, and (c) whole-image-CNN-CC.
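The nMAE metric named in the Figure 7 and Figure 8 captions is not defined in this record. The sketch below therefore assumes one common definition, in which each image's absolute count error is divided by its ground-truth count before averaging; the article itself may normalize differently. The function names and the sample counts are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Plain MAE: average absolute difference between predicted and true counts."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def normalized_mae(predicted, actual):
    """Assumed nMAE: average of |predicted - actual| / actual over all images."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a <= 0 for a in actual):
        raise ValueError("ground-truth counts must be positive to normalize")
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical per-image counts, for illustration only.
predicted_counts = [90, 220, 50]
true_counts = [100, 200, 50]

mae_value = mean_absolute_error(predicted_counts, true_counts)
nmae_value = normalized_mae(predicted_counts, true_counts)
```

Under this assumed definition, normalizing by the ground-truth count makes errors comparable between sparse datasets and very dense ones, where raw MAE values differ by orders of magnitude.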