Pascal Schneider, Yuriy Anisimov, Raisul Islam, Bruno Mirbach, Jason Rambach, Didier Stricker, Frédéric Grandidier.
Abstract
We present TIMo (Time-of-flight Indoor Monitoring), a dataset for video-based monitoring of indoor spaces captured using a time-of-flight (ToF) camera. The resulting depth videos feature people performing a set of different predefined actions, for which we provide detailed annotations. Person detection for people counting and anomaly detection are the two targeted applications. Most existing surveillance video datasets provide either grayscale or RGB videos. Depth information, on the other hand, is still a rarity in this class of datasets, despite being popular and much more common in other research fields within computer vision. Our dataset addresses this gap in the landscape of surveillance video datasets. The recordings took place at two different locations, with the ToF camera set up in either a top-down or a tilted perspective on the scene. Moreover, we provide experimental evaluation results from baseline algorithms.
Keywords: anomaly detection; dataset; deep learning; depth imaging; machine learning; neural networks; person detection; time-of-flight
Year: 2022 PMID: 35684612 PMCID: PMC9182984 DOI: 10.3390/s22113992
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Example frames from our dataset. (Top): a scene from the tilted view; (Bottom): a scene from the top-down view. (Left): IR image; (Right): depth image with depth encoded as color.
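As the caption notes, the depth channel is visualized by encoding depth as color. A minimal sketch of one such encoding (the 500–5000 mm range and the blue-to-green-to-red ramp are illustrative assumptions, not the camera's actual specification):

```python
def depth_to_color(depth_mm, d_min=500.0, d_max=5000.0):
    """Map a depth value in millimeters to an RGB triple for visualization.

    Near depths map to blue, far depths to red, via a linear
    blue -> green -> red ramp; values are clamped to [d_min, d_max].
    The depth range here is an assumption for illustration only.
    """
    # Normalize depth to [0, 1] after clamping to the valid range.
    t = (min(max(depth_mm, d_min), d_max) - d_min) / (d_max - d_min)
    if t < 0.5:                        # first half: blue -> green
        g = int(round(510 * t))
        return (0, g, 255 - g)
    b = int(round(510 * (t - 0.5)))    # second half: green -> red
    return (b, 255 - b, 0)

print(depth_to_color(500))   # → (0, 0, 255), i.e. blue at the near limit
print(depth_to_color(5000))  # → (255, 0, 0), i.e. red at the far limit
```

Applying this per pixel to a depth frame yields false-color images like the one shown on the right of the figure.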
Comparison of related datasets to our dataset. "3D joints" refers to joints of the human body as used in human pose estimation.
| Dataset | Year | # Sequences | Data Modalities | Camera Hardware | Annotations | Environment |
|---|---|---|---|---|---|---|
| TIMo Anomaly Detection (ours) | 2021 | 1588 | IR, Depth | MS Kinect | Anomaly Frames | Indoor |
| TIMo Person Detection (ours) | 2021 | 243 | IR, Depth | MS Kinect | 2D/3D Object BBox, Segmentation Masks | Indoor |
| ShanghaiTech Campus | 2018 | 437 | RGB | RGB | Anomaly Frames, Pixel Masks | Outdoor |
| UTD-MHAD | 2015 | 861 | RGB, Depth, Skeleton, Inertial | MS Kinect v1 | Action Classes | Indoor |
| NTU-RGB+D 120 | 2019 | 114 K | RGB, Depth, IR, 3D Joints | MS Kinect v2 | Action Classes | Indoor |
| UCF-Crime | 2018 | 1900 | RGB | RGB | Anomaly Frames | Indoor + Outdoor |
| TiCAM (Real) | 2021 | 533 | RGB, IR, Depth | MS Kinect | 2D/3D Object BBox, Segmentation Masks | Car Cabin |
| DAD | 2020 | 386 | IR, Depth | CamBoard | Anomaly Frames | Car Cabin |
| CUHK Avenue | 2013 | 37 | RGB | RGB | Anomaly Frames, Pixel Masks | Outdoor |
| UCSD Ped 1 + 2 | 2010 | 70 + 28 | Grayscale | Grayscale | Anomaly Frames, Pixel Masks | Outdoor |
| Subway Exit + Entrance | 2008 | 1 + 1 | Grayscale | Grayscale | Anomaly Frames | Subway |
| IITB-Corridor | 2020 | 368 | RGB | RGB | Anomaly Frames | Outdoor |
Figure 2. Recording setups used for capturing the dataset. The position of the Azure Kinect is marked with a red square. (a) Camera mounting for the tilted view in Scene 1. (b) Setup for the top-down view in Scene 2 (camera not yet installed). (c) Setup for the top-down view in Scene 1. (d) Entrances in Scene 1. Subjects were told to use specific entrances during recording (e.g., enter at A and leave through D).
Figure 3. Example frames of anomalies. Top row: RGB; bottom row: infrared. (Note that the RGB modality is only used for visualization here and is not provided in the dataset.)
Figure 4. Visualization of annotations for the person detection/people counting dataset, generated by [40].
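Given per-frame person detections like those visualized above, the people-counting application reduces to thresholding detection confidences and tallying detections per frame. A minimal sketch, assuming a hypothetical `(frame_id, confidence, bbox)` detection format rather than the dataset's actual annotation schema:

```python
from collections import Counter

def count_people(detections, conf_threshold=0.5):
    """Count detected persons per frame.

    Each detection is a (frame_id, confidence, bbox) tuple; only
    detections at or above the confidence threshold are counted.
    Returns a {frame_id: person_count} mapping.
    """
    counts = Counter(
        frame_id
        for frame_id, conf, _bbox in detections
        if conf >= conf_threshold
    )
    return dict(counts)

dets = [
    (0, 0.92, (10, 20, 50, 120)),    # confident person detection, frame 0
    (0, 0.30, (200, 40, 240, 130)),  # low-confidence detection, ignored
    (1, 0.88, (12, 22, 52, 122)),
    (1, 0.75, (198, 42, 242, 128)),
]
print(count_people(dets))  # → {0: 1, 1: 2}
```

In practice the confidence threshold would be tuned on the training split to trade off missed persons against spurious counts.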
Data statistics of the anomaly dataset's train and test splits. The train split contains no anomalies, since it is intended for use with unsupervised methods. Note that some choreographies are used in both the tilted and the top-down view, so the total number of unique choreographies is less than the sum across the two configurations.
TIMo Anomaly Dataset – Train Split

| View | Normal Seq. | Anomalous Seq. | Total Seq. | Normal Frames | Anomalous Frames | Total Frames | Normal Choreogr. | Anomalous Choreogr. |
|---|---|---|---|---|---|---|---|---|
| Tilted View | 285 | 0 | 285 | 185,620 | 0 | 185,620 | 31 | 0 |
| Top-down View | 624 | 0 | 624 | 180,359 | 0 | 180,359 | 19 | 0 |
| Total | 909 | 0 | 909 | 365,979 | 0 | 365,979 | 36 | 0 |
TIMo Anomaly Dataset – Test Split

| View | Normal Seq. | Anomalous Seq. | Total Seq. | Normal Frames | Anomalous Frames | Total Frames | Normal Choreogr. | Anomalous Choreogr. |
|---|---|---|---|---|---|---|---|---|
| Tilted View | 31 | 151 | 182 | 66,508 | 25,617 | 92,125 | 29 | 20 |
| Top-down View | 79 | 418 | 497 | 104,165 | 49,528 | 153,693 | 18 | 12 |
| Total | 110 | 569 | 679 | 170,673 | 75,145 | 245,818 | 34 | 22 |
Data statistics of the TIMo person detection dataset.
TIMo Person Detection Dataset

| Split | # Sequences | # Frames | # Person Instances |
|---|---|---|---|
| Training | 125 | 6415 | 8501 |
| Complex Training | 34 | 7675 | 8186 |
| Total (Train) | 159 | 14,090 | 16,687 |
| Testing | 72 | 5089 | 6129 |
| Complex Testing | 12 | 3533 | 4971 |
| Total (Test) | 84 | 8622 | 11,000 |
Results of our anomaly detection baseline algorithm measured as the relative area under the ROC curve (AUROC). The tilted view data was recorded at Scene 1 and the top-down view data at Scene 2.
Anomaly Detection Dataset

| View | AUROC (IR) | AUROC (Depth) |
|---|---|---|
| Tilted View | 66.4% | 62.8% |
| Top-down View | 56.4% | 62.2% |
Results of person detection using Mask R-CNN [42] and YOLACT [43].
Person Detection Dataset

| Model | AP (IR) | AP (Depth) |
|---|---|---|
| Mask R-CNN | 92.9% | 92.8% |
| YOLACT | 88.6% | 93.0% |