Bens Pardamean, Faizal Abid, Tjeng Wawan Cenggoro, Gregorius Natanael Elwirehardja, Hery Harjono Muljo.
Abstract
In recent years, the performance of people-counting models has been dramatically increased that they can be implemented in practical cases. However, the current models can only count all of the people captured in the inputted closed circuit television (CCTV) footage. Oftentimes, we only want to count people in a specific Region-of-Interest (RoI) in the footage. Unfortunately, simple approaches such as covering the area outside of the RoI are not applicable without degrading the performance of the models. Therefore, we developed a novel learning strategy that enables a deep-learning-based people counting model to count people only in a certain RoI. In the proposed method, the people counting model has two heads that are attached on top of a crowd counting backbone network. These two heads respectively learn to count people inside the RoI and negate the people count outside the RoI. We named this proposed method Gap Regularizer and tested it on ResNet-50, ResNet-101, CSRNet, and SFCN. The experiment results showed that Gap Regularizer can reduce the mean absolute error (MAE), root mean square error (RMSE), and grid average mean error (GAME) of ResNet-50, which is the smallest CNN model, with the highest reduction of 45.2%, 41.25%, and 46.43%, respectively. On shallow models such as the CSRNet, the regularizer can also drastically increase the SSIM by up to 248.65% in addition to reducing the MAE, RMSE, and GAME. The Gap Regularizer can also improve the performance of SFCN which is a deep CNN model with back-end features by up to 17.22% and 10.54% compared to its standard version. Moreover, the impacts of the Gap Regularizer on these two models are also generally statistically significant (P-value < 0.05) on the MOT17-09, MOT20-02, and RHC datasets. However, it has a limitation in which it is unable to make significant impacts on deep models without back-end features such as the ResNet-101.Entities:
Keywords: Convolutional neural networks; Deep learning; People counting; Region-of-Interest
Year: 2022 PMID: 36262152 PMCID: PMC9575879 DOI: 10.7717/peerj-cs.1067
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. (A) A sample input image for a people counting model; (B) the corresponding density map as ground truth; (C) the input image overlaid with its density map.
Figure 2. An illustration of the RoI problem in people counting, where the RoI is the area inside a room for the input image in Fig. 1.
The RoI is marked in green in (A), while the area outside the RoI is marked in red. The simplest solution is to cover the outside area, as in (B). However, this approach also removes the two persons inside the room in this case, as illustrated in (C). This is to be contrasted with the correct density map depicted in Fig. 1C.
Figure 3. An illustration of the ground truth: (A) the input image, (B) the positive density map covering only the people inside the RoI, and (C) the negative density map covering only the people outside the RoI.
The positive Gaussian distribution in (B) is shown as a 3D plot in (D), and the negative Gaussian distribution in (C) is shown as a 3D plot in (E).
Figure 4. Proposed model.
Figure 5. (A) A sample input image from the MOT17-09 dataset. (B) The defined RoI of this dataset is marked in green.
Figure 6. (A) A sample input image from the MOT20-02 dataset. (B) The defined RoI of this dataset is marked in green.
Figure 7. (A) A sample input image from the RHC dataset. (B) The defined RoI is marked in green, while the area outside the RoI is marked in red.
Figure 8. Distribution of the RHC dataset.
Figure 9. Data annotation process of the RHC dataset.
Figure 10. Selecting σ1 and σ2 from two images of the same person.
Figure 11. Examples of the generated density maps.
The larger the y-axis coordinate of the person’s head, the wider the Gaussian distribution, because its standard deviation σ is larger. However, the peak value becomes smaller because the sum of the Gaussian distribution is normalized to 1.
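The density-map construction described above can be sketched in NumPy. This is a minimal illustration; the linear σ(y) schedule below is an assumed stand-in for the paper's actual selection of σ1 and σ2 from two images of the same person (Fig. 10).

```python
import numpy as np

def density_map(shape, heads, sigma_of_y):
    """Build a ground-truth density map: one 2-D Gaussian per head,
    each normalized to sum to 1 so the map's total equals the count.
    sigma_of_y maps a head's y-coordinate to its std-dev (farther down
    the image -> larger sigma, wider and flatter blob, as in Fig. 11)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape)
    for (x0, y0) in heads:
        s = sigma_of_y(y0)
        g = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * s ** 2))
        dmap += g / g.sum()  # normalize: each person contributes exactly 1
    return dmap

heads = [(30, 10), (50, 40)]  # (x, y) head positions
dmap = density_map((64, 64), heads, lambda y: 2.0 + 0.05 * y)
```

Because every Gaussian is renormalized over the visible grid, the map's total stays equal to the head count even for people near the image border.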
Figure 12. The train and validation loss plots of the (A) ResNet-50 + Gap Regularizer, (B) ResNet-101 + Gap Regularizer, (C) CSRNet + Gap Regularizer, and (D) SFCN + Gap Regularizer models.
The performance of all models on the MOT17-09 dataset.
SM denotes the standard model and +GR the model with the Gap Regularizer. Bold entries mark the better performance between SM and +GR.
| Metric | Model | ResNet-50 | ResNet-101 | CSRNet | SFCN |
|---|---|---|---|---|---|
| | | 0.9 | 0.9 | 0.8 | 0.8 |
| MAE | SM | 2.776 | 1.454 | 0.938 | – |
| | +GR | – | – | – | 1.128 |
| RMSE | SM | 3.166 | – | 1.183 | – |
| | +GR | **1.781** | – | – | 1.305 |
| GAME(1) | SM | 3.689 | 1.943 | 1.117 | 1.359 |
| | +GR | – | – | – | – |
| GAME(2) | SM | 4.043 | 2.265 | 1.484 | 1.7 |
| | +GR | – | – | – | – |
| SSIM | SM | 0.675 | 0.786 | 0.559 | 0.775 |
| | +GR | – | – | – | – |
Note:
Significant difference at P-value < 0.05.
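For reference, the grid average mean error GAME(L) reported in these tables partitions each density map into 4^L non-overlapping cells (2^L per axis) and sums the per-cell absolute count errors, so larger L also penalizes density that is correctly sized but wrongly placed. A minimal NumPy sketch:

```python
import numpy as np

def game(pred, gt, L):
    """Grid Average Mean Error for one image: split both maps into
    4**L cells and sum the absolute count error of each cell."""
    err = 0.0
    for ps, gs in zip(np.array_split(pred, 2 ** L, axis=0),
                      np.array_split(gt, 2 ** L, axis=0)):
        for pc, gc in zip(np.array_split(ps, 2 ** L, axis=1),
                          np.array_split(gs, 2 ** L, axis=1)):
            err += abs(pc.sum() - gc.sum())
    return err

pred = np.zeros((8, 8)); pred[0, 0] = 3.0  # all mass top-left
gt = np.zeros((8, 8));   gt[7, 7] = 3.0    # all mass bottom-right
```

GAME(0) is simply the absolute count error of the whole image, i.e., the per-image term of the MAE, while higher L exposes localization errors that GAME(0) hides.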
The performance of all models on the MOT20-02 dataset.
SM denotes the standard model and +GR the model with the Gap Regularizer. Bold entries mark the better performance between SM and +GR.
| Metric | Model | ResNet-50 | ResNet-101 | CSRNet | SFCN |
|---|---|---|---|---|---|
| | | 0.7 | 0.9 | 0.9 | 0.9 |
| MAE | SM | 5.756 | 4.015 | 0.707 | – |
| | +GR | – | – | – | 1.299 |
| RMSE | SM | 6.537 | – | 0.921 | – |
| | +GR | **4.987** | – | – | 1.816 |
| GAME(1) | SM | 6.436 | – | 1.264 | – |
| | +GR | **4.477** | – | – | 1.872 |
| GAME(2) | SM | 8.068 | – | – | – |
| | +GR | **5.408** | – | 1.788 | 2.44 |
| SSIM | SM | 0.587 | – | 0.53 | 0.806 |
| | +GR | **0.796** | – | – | – |
Note:
Significant difference at P-value < 0.05.
The performance of all models on the RHC dataset.
SM denotes the standard model and +GR the model with the Gap Regularizer. Bold entries mark the better performance between SM and +GR.
| Metric | Model | ResNet-50 | ResNet-101 | CSRNet | SFCN |
|---|---|---|---|---|---|
| | | 0.7 | 0.8 | 0.7 | 0.9 |
| MAE | SM | – | – | 0.237 | 0.235 |
| | +GR | 0.228 | 0.228 | – | – |
| RMSE | SM | 0.445 | – | – | 0.45 |
| | +GR | **0.436** | 0.366 | – | – |
| GAME(1) | SM | 0.347 | – | 0.367 | 0.364 |
| | +GR | **0.333** | – | – | – |
| GAME(2) | SM | 0.374 | – | 0.421 | 0.381 |
| | +GR | **0.35** | – | – | – |
| SSIM | SM | – | 0.9647 | 0.276 | – |
| | +GR | 0.963 | – | **0.963** | – |
Note:
Significant difference at P-value < 0.05.
Number of parameters and depth of all backbone networks provided by the C-3-Framework.
| Backbone network | Number of parameters | Depth |
|---|---|---|
| ResNet-50 | 8,674,625 | 102 layers |
| ResNet-101 | 27,666,753 | 221 layers |
| CSRNet | 16,263,489 | 36 layers |
| SFCN | 38,596,801 | 237 layers |