Karolis Ryselis, Tomas Blažauskas, Robertas Damaševičius, Rytis Maskeliūnas.
Abstract
Identifying human activities from video is important for many applications. Three-dimensional (3D) depth images or image sequences (videos) can be used for this task; they represent the positions of objects in a 3D scene as captured by depth sensors. This paper presents a framework for creating foreground-background masks from depth images for human body segmentation. The framework can speed up manual depth image annotation when no semantics are known beforehand: it applies segmentation using a performant algorithm, while the user only adjusts the parameters, corrects the automatic segmentation results, or provides hints by drawing a boundary around the desired object. The approach was tested on two different datasets, each showing a human in a real-world closed environment. The results are promising: the solution reduces manual segmentation time in terms of both processing time and human input time.
Keywords: depth images; human body segmentation; image processing; point cloud
Year: 2022 PMID: 35591221 PMCID: PMC9102319 DOI: 10.3390/s22093531
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Activity diagrams: autosegmentation (left), segmentation inside the point cloud (right).
Figure 2. A screenshot of the implemented software solution.
Figure 3. Example depth images; different colors represent different distances from the sensor.
Algorithm runtime comparison.
| Algorithm | Kinect, ms | RealSense, ms | Worst Case, ms |
|---|---|---|---|
| PCL Euclidean clustering | 980 | 344,944 | 236 |
| Bounding box | 184 | 247 | 228 |
| Expanding bounding box | 16.3 | 45.7 | 145 |
Algorithm node traverse count comparison.
| Algorithm | Kinect, M | RealSense, M | Worst Case, M |
|---|---|---|---|
| PCL Euclidean clustering | 193.7 | 38,099.9 | 39.8 |
| Bounding box | 39.6 | 37.1 | 21.1 |
| Expanding bounding box | 2.5 | 4.2 | 21.1 |
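The two comparison tables are easier to read as ratios against the PCL Euclidean clustering baseline. The short sketch below only re-expresses the tabulated figures; the names `RUNTIME_MS`, `TRAVERSALS_MILLIONS`, and `ratio_vs_baseline` are illustrative and not part of the published software, and it assumes that the commas in "344,944" and "38,099.9" are thousands separators.

```python
# Figures copied from the runtime and node-traverse comparison tables.
# Assumption: "344,944" and "38,099.9" use a comma as a thousands separator.
COLUMNS = ("Kinect", "RealSense", "Worst case")

RUNTIME_MS = {
    "PCL Euclidean clustering": (980.0, 344944.0, 236.0),
    "Bounding box": (184.0, 247.0, 228.0),
    "Expanding bounding box": (16.3, 45.7, 145.0),
}

TRAVERSALS_MILLIONS = {
    "PCL Euclidean clustering": (193.7, 38099.9, 39.8),
    "Bounding box": (39.6, 37.1, 21.1),
    "Expanding bounding box": (2.5, 4.2, 21.1),
}


def ratio_vs_baseline(table, baseline, candidate):
    """Return baseline / candidate per column; values above 1 favour the candidate."""
    return {
        column: base / cand
        for column, base, cand in zip(COLUMNS, table[baseline], table[candidate])
    }


if __name__ == "__main__":
    baseline = "PCL Euclidean clustering"
    for name in ("Bounding box", "Expanding bounding box"):
        runtime = ratio_vs_baseline(RUNTIME_MS, baseline, name)
        nodes = ratio_vs_baseline(TRAVERSALS_MILLIONS, baseline, name)
        for column in COLUMNS:
            print(f"{name:24s} {column:10s} "
                  f"runtime x{runtime[column]:.1f}  traversals x{nodes[column]:.1f}")
```

Under these assumptions, the expanding bounding box runs roughly 60 times faster than the baseline on the Kinect data, while in the worst case the gain shrinks to roughly 1.6 times.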
Figure 4. Frequencies of accuracy values.
Figure 5. Examples of different accuracy value frames: (a) accuracy score 11%, (b) accuracy score 12%, (c) accuracy score 16%, (d) accuracy score 96%, (e) accuracy score 97%, (f) accuracy score 100%.
Figure 6. Accuracy distributions of both datasets. (a) Accuracy distribution of the first dataset by view side; (b) Accuracy distribution of the second dataset by view side and human pose.
Comparison of state-of-the-art segmentation solutions.
| Solution | Accuracy | Segments | Based on | Data Type |
|---|---|---|---|---|
| RGB–Depth–Thermal [ | 79% | Human | Random forest | RGBD + IR |
| Body part models + GC [ | 65% | Human | Geometrical + prior knowledge | RGB |
| Pictorial structures + GC [ | 58% | Human | Geometrical + prior knowledge | Depth |
| Semantic CNN [ | 65% | Any object | CNN | RGBD |
| Depth aware CNN [ | 49–61% | Any object | CNN | RGBD |
| Suggested | 24–76% | Any object | Geometrical | Depth |