Najmath Ottakath, Omar Elharrouss, Noor Almaadeed, Somaya Al-Maadeed, Amr Mohamed, Tamer Khattab, Khalid Abualsaud.
Abstract
The COVID-19 outbreak has accentuated the need for an AI-based monitoring system that can check face-mask adherence and social distancing. Building on existing video surveillance systems, a deep learning approach is proposed for mask detection and social distance measurement. State-of-the-art object detection and recognition models such as Mask RCNN, YOLOv4, YOLOv5, and YOLOR were trained for mask detection and evaluated both on existing datasets and on a newly proposed video mask detection dataset, ViDMASK. The best obtained result was a comparatively high mean average precision of 92.4%, achieved by YOLOR. After mask detection, the distance between people's faces is measured and classified as high-risk or low-risk. Furthermore, the new large-scale video mask dataset, ViDMASK, diversifies the subjects in terms of pose, environment, image quality, and subject characteristics, producing a challenging dataset. The tested models detect face masks with high performance on the existing dataset MOXA. However, on the ViDMASK dataset, most models are less accurate because of the complexity of the dataset and the number of people in each scene. The ViDMASK dataset and the base code are available at https://github.com/ViDMask/VidMask-code.git.
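The post-detection step described above (measuring inter-face distance and labeling pairs high- or low-risk) can be illustrated with a minimal sketch. The bounding-box format, the pixel threshold, and the function names here are illustrative assumptions, not the paper's implementation; a deployed system would calibrate pixel distances to metric units per camera.

```python
import math

# Hypothetical risk threshold in pixels; real systems calibrate
# pixel distance against a known reference in the scene.
HIGH_RISK_PX = 150

def centroid(box):
    """Center (x, y) of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def pairwise_risk(boxes, threshold=HIGH_RISK_PX):
    """Label each pair of detected faces as high- or low-risk by the
    Euclidean distance between their bounding-box centroids."""
    results = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            (xa, ya), (xb, yb) = centroid(boxes[i]), centroid(boxes[j])
            d = math.hypot(xa - xb, ya - yb)
            results.append((i, j, d, "high" if d < threshold else "low"))
    return results
```

For example, two 10×10 boxes whose centroids are 100 px apart would be flagged "high" under the assumed 150 px threshold, while a pair 300 px apart would be flagged "low".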
Keywords: Faster Mask RCNN with Resnet backbone and FPN; Mask Video dataset; Mask detection; Social distancing; YOLOR; YOLOV4; YOLOV4-tiny; YOLOV5
Year: 2022 PMID: 35574253 PMCID: PMC9085388 DOI: 10.1016/j.displa.2022.102235
Source DB: PubMed Journal: Displays ISSN: 0141-9382 Impact factor: 3.074
Fig. 1. Flowchart of the mask detection techniques and proposed dataset source.
ViDMASK dataset.
| Dataset | Number of videos | Frames | Video format | Annotated masks |
|---|---|---|---|---|
| ViDMASK | 60 | 10,000+ | MP4/AVI | 50,000+ |
Fig. 3. Sample of the annotated ViDMASK dataset.
Fig. 4. Illustration of social distance measurement.
Fig. 2. Social distancing flowchart.
Precision, recall, AP and mAP evaluation with MOXA3K.
| Method | Precision | Recall | mAP @50% IOU | Mask AP | Non-mask AP |
|---|---|---|---|---|---|
| YOLOV4 416 × 416 | 0.91 | 0.85 | 68.2% | 76.31% | 60.13% |
| YOLOV4-tiny 416 × 416 | 0.73 | 0.61 | 59.06% | 66.59% | 50.50% |
| YOLOV5 416 × 416 | 0.45 | 0.65 | 65.5% | 73.3% | 57.0% |
| Mask RCNN with FPN 800 × 800 | 0.95 | 0.95 | 74.717% | 33.729% | 33.445% |
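The mAP @ 50% IoU figures in the tables rest on matching predicted boxes to ground truth at an IoU threshold of 0.5. As a rough, generic sketch of that matching step (not the authors' evaluation code; the greedy-by-confidence matching shown is the common convention, and the function names are assumptions):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def tp_fp_at_50(preds, gts, thr=0.5):
    """preds: list of (box, confidence); gts: list of boxes.
    Match predictions to ground truth greedily by descending
    confidence; each ground-truth box may be matched once.
    Returns one True (TP) / False (FP) flag per prediction,
    in descending-confidence order."""
    used = set()
    flags = []
    for box, _ in sorted(preds, key=lambda p: -p[1]):
        best, best_iou = None, thr
        for g, gt in enumerate(gts):
            if g not in used:
                v = iou(box, gt)
                if v >= best_iou:
                    best, best_iou = g, v
        if best is None:
            flags.append(False)
        else:
            used.add(best)
            flags.append(True)
    return flags
```

Precision and recall then follow from the cumulative TP/FP counts, and AP is the area under the resulting precision-recall curve, averaged over classes (mask, non-mask) to give mAP.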
Precision, recall, AP and mAP evaluation with part of the ViDMASK dataset.
| Method | Precision | Recall | mAP @50% IOU | Mask AP | Non-mask AP |
|---|---|---|---|---|---|
| YOLOV4 416 × 416 | 0.78 | 0.85 | 49.03% | 84.77% | 4.64% |
| YOLOV4-tiny 416 × 416 | 0.83 | 0.84 | 51.42% | 86% | 5.83% |
| YOLOV5 416 × 416 | 0.443 | 0.577 | 44.1% | 84.5% | 3.74% |
| Mask RCNN and FPN 800 × 800 | 0.577 | 0.386 | 47.712% | 53.153% | 2.255% |
| YOLOR 1280 × 1280 | – | – | – | – | – |
Fig. 8. Predicted results of the deep learning models with the MOXA dataset.
Comparison with state-of-the-art literature.
| Model | mAP@50-MOXA | mAP@50-VIDMASK | FPS |
|---|---|---|---|
| YOLOV3 414 × 414 | 63.99% | – | 21.2 |
| YOLOV3 608 × 608 | 66.84% | – | 10.9 |
| YOLOV3 832 × 832 | 61.73% | – | 6.9 |
| YOLOV3Tiny 414 × 414 | 56.27% | – | 138 |
| YOLOV3Tiny 608 × 608 | 55.08% | – | 72 |
| YOLOV3Tiny 832 × 832 | 56.57% | – | 46.5 |
| SSD 300 MobileNetv2 | 46.52% | – | 67.1 |
| F-RCNN 300 Inceptionv2 | 60.5% | – | 14.8 |
| YOLOV4 416 × 416 | 68.2% | 49.03% | – |
| YOLOV4-tiny 416 × 416 | 59.06% | 51.42% | – |
| YOLOV5 416 × 416 | 65.5% | 44.1% | – |
| Mask RCNN with FPN 800 × 800 | 74.717% | 47.71% | – |
| YOLOR 1280 × 1280 | – | – | – |
Fig. 6. YOLOV4 training results (MOXA3K).
Fig. 7. YOLOV4 training results (ViDMASK).
Fig. 9. Obtained results on ViDMASK.
Fig. 5. mAP, precision and recall plot for YOLOV5 (ViDMASK).
Fig. 10. Social distance measurement.