Lorenzo Monti, Rita Tse, Su-Kit Tang, Silvia Mirri, Giovanni Delnevo, Vittorio Maniezzo, Paola Salomoni.
Abstract
Studies and systems aimed at detecting the presence of people in indoor environments and monitoring their activities and flows have received growing attention in recent years, particularly since the beginning of the COVID-19 pandemic. This paper proposes a people-counting approach based on cameras and Raspberry Pi platforms, together with an edge-based transfer learning framework enriched with specific image processing strategies, so that the approach can be adopted in different indoor environments without tailored training phases. The system was deployed on a university campus, chosen as the case study, and was able to work in classrooms with different characteristics. The paper reports a proposed architecture that makes the system scalable and privacy compliant, along with evaluation tests conducted in different types of classrooms, which demonstrate the feasibility of the approach. Overall, the system counted the number of people in classrooms with a maximum mean absolute error of 1.23.
Keywords: Internet of Things; ambient intelligence; deep learning; occupancy detection; smart buildings; smart environments; smart sensing; transfer learning
Year: 2022 PMID: 35632101 PMCID: PMC9143913 DOI: 10.3390/s22103692
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
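The counting approach described in the abstract, detecting people in each camera frame and taking the number of detections as the occupancy estimate, can be sketched as follows. This is a minimal illustration only: the detection tuple layout, class labels, and the 0.25 confidence threshold are assumptions, not the authors' exact configuration.

```python
# Minimal sketch: turn raw object detections for one frame into a people count.
# Each detection is (class_label, confidence, bounding_box).
def count_people(detections, conf_threshold=0.25):
    """Count detections labeled 'person' whose confidence clears the threshold."""
    return sum(
        1
        for label, conf, _box in detections
        if label == "person" and conf >= conf_threshold
    )

# Invented example frame: two confident person detections, one chair,
# and one low-confidence person that falls below the threshold.
frame = [
    ("person", 0.91, (10, 20, 50, 120)),
    ("chair", 0.80, (60, 40, 90, 100)),
    ("person", 0.18, (200, 30, 240, 110)),  # below threshold, ignored
    ("person", 0.74, (120, 25, 160, 115)),
]
```

Applied to this frame, the function returns 2, i.e., only the detections that are both labeled `person` and above the confidence threshold contribute to the count.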
Figure 1The proposed fat client–thin server architecture. The server side is at the center of the image, with the client nodes around the edges.
Figure 2The proposed transfer learning framework. The model, pre-trained on the ImageNet dataset, was fine-tuned using the CSC and filtered COCO datasets. Once deployed, the model counted the number of people present in the images captured by the Intel RealSense D415 cameras.
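The "filtered COCO" fine-tuning data mentioned above suggests restricting a COCO-style annotation set to the classes of interest. A hedged sketch of such a filter is shown below; the dict layout mirrors the COCO JSON schema (`images`, `annotations`, `categories`), but the toy data and the exact set of kept classes are invented, not taken from the paper.

```python
# Sketch: filter a COCO-style annotation dict down to selected class names
# (here just 'person'), dropping images that no longer have any annotations.
def filter_coco(coco, keep_names=frozenset({"person"})):
    keep_ids = {c["id"] for c in coco["categories"] if c["name"] in keep_names}
    anns = [a for a in coco["annotations"] if a["category_id"] in keep_ids]
    img_ids = {a["image_id"] for a in anns}
    return {
        "images": [im for im in coco["images"] if im["id"] in img_ids],
        "annotations": anns,
        "categories": [c for c in coco["categories"] if c["id"] in keep_ids],
    }

# Invented two-image example: one person annotation kept, one dog dropped.
toy = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"image_id": 1, "category_id": 1},
        {"image_id": 2, "category_id": 18},  # 'dog' in COCO, filtered out
    ],
    "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
}
```

After filtering, only image 1 and its single person annotation remain.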
Figure 3Distribution of the number of people that were counted.
Figure 4Average IoU and mean AP, computed on the validation set during the training process.
Batch iterations over the course of the training process. For each iteration, the mean average precision (mAP, with a 50% threshold parameter), precision, recall, F1 score, and average intersection over union (AIoU, with a threshold of 0.25) are reported. The best results, obtained at iteration 65,000, are shown in bold.
| Batch | mAP@50% | Precision@0.25 | Recall@0.25 | F1@0.25 | AIoU@0.25 |
|---|---|---|---|---|---|
| 1,000 | 17.49 | 0.61 | 0.24 | 0.34 | 41.27 |
| 5,000 | 56.35 | 0.83 | 0.57 | 0.67 | 64.83 |
| 10,000 | 59.42 | 0.76 | 0.63 | 0.69 | 59.41 |
| 15,000 | 62.27 | 0.85 | 0.6 | 0.7 | 67.54 |
| 20,000 | 63.74 | 0.83 | 0.63 | 0.71 | 64.97 |
| 25,000 | 65.93 | 0.8 | 0.6 | 0.72 | 63.48 |
| 30,000 | 66.34 | 0.79 | 0.67 | 0.73 | 63.31 |
| 35,000 | 66.66 | 0.76 | 0.69 | 0.72 | 60.61 |
| 40,000 | 66.51 | 0.77 | 0.69 | 0.72 | 60.81 |
| 45,000 | 67.7 | 0.74 | 0.71 | 0.73 | 59.02 |
| 50,000 | 67.25 | 0.74 | 0.7 | 0.72 | 59.13 |
| 55,000 | 67.99 | 0.78 | 0.69 | 0.73 | 62.89 |
| 60,000 | 68.03 | 0.78 | 0.69 | 0.73 | 62.83 |
| 70,000 | 67.26 | 0.69 | 0.73 | 0.71 | 54.5 |
| 75,000 | 68.23 | 0.76 | 0.71 | 0.73 | 60.73 |
| 80,000 | 68 | 0.75 | 0.7 | 0.73 | 60.51 |
| 85,000 | 67.31 | 0.72 | 0.72 | 0.72 | 57.54 |
| 90,000 | 68.89 | 0.74 | 0.72 | 0.73 | 59.11 |
| 95,000 | 68.17 | 0.74 | 0.72 | 0.73 | 59.45 |
| 100,000 | 67.99 | 0.74 | 0.72 | 0.73 | 59.36 |
| 105,000 | 67.67 | 0.74 | 0.71 | 0.73 | 59.6 |
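The AIoU and mAP columns above are both built on the intersection-over-union overlap measure between predicted and ground-truth bounding boxes. The standard computation for axis-aligned boxes, given here as a generic sketch rather than the authors' implementation, is:

```python
# Intersection over union (IoU) for axis-aligned boxes in (x1, y1, x2, y2)
# form: intersection area divided by the area of the union of the two boxes.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if boxes are disjoint
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

For example, two 10x10 boxes offset horizontally by 5 overlap in a 5x10 region, giving an IoU of 50 / 150 = 1/3; the mAP@50% column counts a detection as correct only when this value reaches 0.5.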
Figure 5An example of an acquired image, with the prediction and labeling of people, jackets, backpacks and chairs (two cameras; large classroom).
Performance of the system on the testing set.
| Classroom | Average Accuracy | Standard Deviation | RMSE | MAE |
|---|---|---|---|---|
| **Small classrooms** | | | | |
| 1 | 97.10% | 6.78 | 0.65 | 0.33 |
| 2 | 96.76% | 9.05 | 0.55 | 0.21 |
| 3 | 94.07% | 13.83 | 0.62 | 0.25 |
| 4 | 95.10% | 8.58 | 0.79 | 0.47 |
| 5 | 95.80% | 8.09 | 2.89 | 1.22 |
| **Large classrooms** | | | | |
| 1 | 95.58% | 6.70 | 2.23 | 1.14 |
| 2 | 93.44% | 15.17 | 1.93 | 1.23 |
| 3 | 91.00% | 14.76 | 1.78 | 1.12 |
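The RMSE and MAE columns in the table above are the standard error measures between predicted and actual per-image people counts. A minimal sketch of both, with invented example counts rather than the paper's data, is:

```python
import math

# Mean absolute error and root mean square error between predicted and
# actual per-image people counts.
def mae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

# Invented example: four images with predicted vs. ground-truth counts.
predicted = [12, 30, 25, 18]
actual = [12, 28, 26, 18]
```

On this toy data the absolute errors are 0, 2, 1, 0, so MAE = 0.75 and RMSE = sqrt(1.25) ≈ 1.118; RMSE is always at least as large as MAE because it weights larger miscounts more heavily.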
Performance of the system in large classrooms when evaluating the images that were captured by the two cameras separately.
| Classroom | Avg. Accuracy (Camera 1) | Std. Dev. (Camera 1) | Avg. Accuracy (Camera 2) | Std. Dev. (Camera 2) |
|---|---|---|---|---|
| 1 | 94.73% | 7.43 | 96.42% | 5.80 |
| 2 | 93.91% | 17.51 | 92.97% | 12.47 |
| 3 | 90.42% | 15.03 | 91.57% | 14.53 |
Figure 6Critical areas for the occupancy detection system in both small and large classrooms.