Iram Javed, Muhammad Atif Butt, Samina Khalid, Tehmina Shehryar, Rashid Amin, Adeel Muzaffar Syed, Marium Sadiq.
Abstract
Coronavirus causes several respiratory symptoms and infections, such as sneezing, coughing, and pneumonia, and is transmitted from human to human through airborne droplets. According to World Health Organization guidelines, the spread of COVID-19 can be mitigated by avoiding close-proximity public interactions and following standard operating procedures (SOPs), including wearing a face mask and maintaining social distancing in schools, shopping malls, and crowded areas. However, enforcing the adoption of these SOPs on a large scale remains challenging. With the emergence of deep learning-based visual object detection networks, numerous methods have been proposed to perform face mask detection in public spots. However, these methods require a huge amount of data to ensure robustness in real-time applications. Also, to the best of our knowledge, there is no standard outdoor surveillance-based dataset available to evaluate the efficacy of face mask detection and social distancing methods in public spots. To this end, we present a large-scale dataset comprising 10,000 outdoor images labeled with two classes, i.e., masked and non-masked people, to accelerate the development of automated face mask detection and social distance measurement in public spots. Alongside, we present an end-to-end pipeline to perform real-time face mask detection and social distance measurement in an outdoor environment. Initially, existing state-of-the-art single- and multi-stage object detection networks are fine-tuned on the proposed dataset to evaluate their performance in terms of accuracy and inference time. Based on its better performance, the YOLO-v3 architecture is further optimized by tuning its feature extraction and region proposal generation layers to improve performance in real-time applications. Our results indicate that the presented pipeline performed better than the baseline version, showing an improvement of 5.3% in terms of accuracy.
Keywords: Coronavirus; Face mask detection; Single and multi-stage detectors; Social distance measurement
Year: 2022 PMID: 36196269 PMCID: PMC9522539 DOI: 10.1007/s11042-022-13913-w
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.577
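The social distance measurement step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe how distances are computed, so everything here is an assumption: the ground-plane coordinates, the person identifiers, the `find_violations` helper, and the 6 ft threshold are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import hypot

# Hypothetical ground-plane positions of detected people, in feet.
# A real pipeline would derive these from detector output plus camera
# calibration; both steps are outside the scope of this sketch.
positions_ft = {
    "person_a": (0.0, 0.0),
    "person_b": (3.0, 4.0),
    "person_c": (10.0, 0.0),
}

# Assumed minimum allowed distance; not a value reported in the paper.
MIN_DISTANCE_FT = 6.0


def find_violations(positions, min_distance):
    """Return (id1, id2, distance) for every pair closer than min_distance."""
    violations = []
    for (id1, p1), (id2, p2) in combinations(positions.items(), 2):
        distance = hypot(p1[0] - p2[0], p1[1] - p2[1])
        if distance < min_distance:
            violations.append((id1, id2, distance))
    return violations


violations = find_violations(positions_ft, MIN_DISTANCE_FT)
for id1, id2, distance in violations:
    print(f"{id1} and {id2} are {distance:.2f} ft apart")
```

Checking every pair is quadratic in the number of detected people, which is usually acceptable for the handful of people visible in a single surveillance frame.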
An Overview of Existing Machine Learning Methods Used for Face Mask Detection and Recognition Tasks
| Author | Methods | Dataset | Accuracy | Limitation |
|---|---|---|---|---|
| Roy et al. | MOXA, including YOLO-v3, Tiny YOLO-v3, SSD, and Faster R-CNN | Kaggle's medical masks dataset (3000 images) | YOLO-v3: 63.99% mAP; Tiny YOLO-v3: 56.27% mAP; SSD: 46.52% mAP; Faster R-CNN: 60.5% mAP | The unmanned MOXA approach requires improvement, including more innovative object detectors |
| Nagrath et al. | Single-shot multibox detector with MobileNetV2 (SSDMNV2) | Kaggle's medical masks and PyImageSearch datasets, containing 1,376 images | SSDMNV2 attained 92.64% accuracy | Trained on artificially produced images; not yet tested in real situations or with real-time CCTV |
| Hussain et al. | Transfer learning with CNN, VGG-16, MobileNetV2, ResNet-50, and Inception-v3 | MAFA, MaskedFace-Net, and Bing datasets | VGG-16: 99.81% accuracy; MobileNetV2: 99.6% accuracy | Publicly available datasets are noisy and artificially constructed, making them unsuitable for real-time systems |
| Snyder et al. | ResNet-50 with FPN and Multi-Task CNN | MCelebFaces Attributes, Microsoft Common Objects in Context, WIDER FACE, and Custom Mask Community datasets | 87.7% detection accuracy | Misclassifies masked and unmasked faces |
| Kodali et al. | CNN model | Kaggle dataset with 853 images | 96% detection accuracy | Misclassifies masked and unmasked faces |
| Sagayam et al. | OpenCV and MobileNet-V2 for face mask detection | Kaggle's medical masks and PyImageSearch datasets | 99% accuracy with MobileNet-V2 | Trained on a limited dataset; does not perform well in real-time situations |
| Degadwala et al. | YOLO-v4 | MAFA and WIDER FACE datasets | 99.98% accuracy with YOLO-v4 | Needs more computational power and a 30 FPS camera |
| Taneja et al. | Lightweight MobileNet-V2 CNN model for face mask detection | Medical Masks Dataset and Face Mask Dataset | 99.98% accuracy | MobileNet-V2 is less accurate than Faster R-CNN and Inception-V2 |
| Chadav et al. | Multi-stage CNN model | Kaggle dataset with 853 images | 98% accuracy | The dual-stage CNN model does not detect side views of the face |
| Bhuiyan et al. | YOLO-v3 | Google Colab datasets with 650 images | 96% accuracy | Limited dataset; not tested under real-time conditions |
| Ejaz et al. | Principal Component Analysis (PCA) for recognizing faces with and without masks | ORL face dataset used for masked faces, containing 500 images | 72% accuracy with masks; 95% accuracy without masks | PCA gave poor results on masked faces; the dataset uses only frontal face images |
| Qin et al. | SRCNet for facial image classification, automatically identifying faces wearing masks | Medical Masks dataset with 3835 images | 98.70% accuracy with the image super-resolution classification network (SRCNet) | Limited number of images |
| Jiang et al. | SSD with FPN for face classification | 7959 images collected from the internet | Without mask: 89.6% precision; with mask: 91.9% precision | Does not properly differentiate between masked and unmasked faces |
| Rahman et al. | Facial mask detection in smart cities through CCTV | 1539 images collected from different sources | 98.7% accuracy | Confused by hand-covered faces |
| Punn et al. | YOLO-v3 for real-time social distance monitoring | 800 images taken from the OID dataset | YOLO-v3 with Deep SORT achieved better results as compared to FPS | Privacy issues; does not record violations |
| Yang et al. | Faster R-CNN and YOLOv4 for real-time detection of social distance and critical density | 12300 images taken from the MS-COCO dataset | Accuracy and performance reported as good for monitoring social distance | Does not record data; crowd analysis is still a challenge |
| Militante et al. | Single-shot detector for face mask and physical distance detection with an alarm system | 20000 images collected from the web | 97% accuracy | Does not detect face mask and distance at the same time |
| Yadav et al. | SSD-based face mask and social distance detection that generates an alert signal | Custom dataset of 3165 images | 85% and 95% accuracy | N/A |
Fig. 1 The proposed pipeline for developing face mask detection and social distance measurement in public places
Fig. 2 Sample images from our MUST Face Dataset
Evaluation of existing state-of-the-art object detection networks on proposed dataset
| Method | Mean Accuracy | mAP | mAP @ 0.95 | Inf. Time (ms) |
|---|---|---|---|---|
| YOLO-V3 | 64.1% | 59.6% | 53.1% | 28 |
| SSD | 61.8% | 56.2% | 48.6% | 34 |
| RETINA-NET 50 | 55.2% | 51.9% | 44.7% | 37 |
| RETINA-NET 101 | 51.0% | 46.3% | 41.8% | 39 |
| FAST-RCNN | 41.7% | 39.4% | 37.1% | 132 |
| FASTER-RCNN (FPN) | 47.3% | 44.0% | 41.5% | 119 |
| FASTER-RCNN (ResNet-50) | 59.0% | 57.4% | 55.6% | 108 |
| FASTER-RCNN (ResNet-101) | 62.7% | 61.3% | 59.0% | 98 |
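To relate the inference times above to real-time use, a per-image latency can be converted into an approximate frame rate. The sketch below is a rough illustration only: it assumes one image is processed at a time and ignores capture, pre-processing, and post-processing overhead. The `frames_per_second` helper is a hypothetical name introduced here.

```python
# Inference times in milliseconds per image, copied from the table above.
inference_ms = {
    "YOLO-V3": 28,
    "SSD": 34,
    "RETINA-NET 50": 37,
    "RETINA-NET 101": 39,
    "FAST-RCNN": 132,
    "FASTER-RCNN (FPN)": 119,
    "FASTER-RCNN (ResNet-50)": 108,
    "FASTER-RCNN (ResNet-101)": 98,
}


def frames_per_second(ms_per_image):
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / ms_per_image


fps = {name: frames_per_second(ms) for name, ms in inference_ms.items()}

# Print the detectors from fastest to slowest.
for name, value in sorted(fps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:26s} {value:5.1f} FPS")
```

Under these assumptions the single-stage detectors land in the range of roughly 25 to 36 frames per second, while the two-stage detectors stay near or below 10.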
Evaluation of improved YOLO-v3 on proposed dataset
| Method | Mean Accuracy | mAP | mAP @ 0.95 | Inf. Time (ms) |
|---|---|---|---|---|
| Existing YOLO-V3 | 64.1% | 59.6% | 53.1% | 28 |
| Proposed YOLO-V3 | 69.4% | 64.7% | 62.0% | 25 |
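The gains in the table above can be checked with simple arithmetic. The short sketch below subtracts the existing values from the proposed ones; the dictionary keys are labels chosen for this illustration. The differences are absolute, in percentage points for the accuracy columns and in milliseconds for inference time.

```python
# Values copied from the table above.
existing = {"mean_accuracy": 64.1, "mAP": 59.6, "mAP@0.95": 53.1, "inference_ms": 28}
proposed = {"mean_accuracy": 69.4, "mAP": 64.7, "mAP@0.95": 62.0, "inference_ms": 25}

# Absolute difference per metric, rounded to one decimal place to hide
# floating-point noise.
deltas = {key: round(proposed[key] - existing[key], 1) for key in existing}

for key, delta in deltas.items():
    print(f"{key}: {delta:+}")
```

The mean accuracy difference matches the 5.3% improvement stated in the abstract, which is therefore an absolute gain in percentage points.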
Fig. 3 Qualitative examples of our masked/non-masked face detection method on our face mask dataset
Results of proposed distance measurement methods
| Sr. No. | Ground Truth (ft) | Predictions (ft) | RMSE |
|---|---|---|---|
| Distance 1 | 2.44 | 2.37 | 0.035 |
| Distance 2 | 2.99 | 2.95 | 0.020 |
| Distance 3 | 3.16 | 3.10 | 0.030 |
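As a cross-check on the measurements above, the sketch below computes the absolute error for each row and a single root-mean-square error over all three rows, using the standard definition. The per-row formula behind the RMSE column is not stated in this excerpt, so these numbers are not expected to reproduce that column.

```python
from math import sqrt

# (ground truth, prediction) pairs in feet, copied from the table above.
pairs = [(2.44, 2.37), (2.99, 2.95), (3.16, 3.10)]

# Absolute error per measurement, in feet.
abs_errors = [abs(truth - pred) for truth, pred in pairs]

# Standard RMSE over all measurements: square root of the mean squared error.
rmse = sqrt(sum(error ** 2 for error in abs_errors) / len(abs_errors))

for index, error in enumerate(abs_errors, start=1):
    print(f"Distance {index}: absolute error {error:.2f} ft")
print(f"RMSE over all rows: {rmse:.3f} ft")
```

Every prediction is within 0.07 ft of its ground truth, and the aggregate RMSE is roughly 0.058 ft.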