| Literature DB >> 34171485 |
Shilpa Sethi, Mamta Kathuria, Trilok Kaushik.
Abstract
Effective strategies to restrain COVID-19 pandemic need high attention to mitigate negatively impacted communal health and global economy, with the brim-full horizon yet to unfold. In the absence of effective antiviral and limited medical resources, many measures are recommended by WHO to control the infection rate and avoid exhausting the limited medical resources. Wearing a mask is among the non-pharmaceutical intervention measures that can be used to cut the primary source of SARS-CoV2 droplets expelled by an infected individual. Regardless of discourse on medical resources and diversities in masks, all countries are mandating coverings over the nose and mouth in public. To contribute towards communal health, this paper aims to devise a highly accurate and real-time technique that can efficiently detect non-mask faces in public and thus, enforcing to wear mask. The proposed technique is ensemble of one-stage and two-stage detectors to achieve low inference time and high accuracy. We start with ResNet50 as a baseline and applied the concept of transfer learning to fuse high-level semantic information in multiple feature maps. In addition, we also propose a bounding box transformation to improve localization performance during mask detection. The experiment is conducted with three popular baseline models viz. ResNet50, AlexNet and MobileNet. We explored the possibility of these models to plug-in with the proposed model so that highly accurate results can be achieved in less inference time. It is observed that the proposed technique achieves high accuracy (98.2%) when implemented with ResNet50. Besides, the proposed model generates 11.07% and 6.44% higher precision and recall in mask detection when compared to the recent public baseline model published as RetinaFaceMask detector. The outstanding performance of the proposed model is highly suitable for video surveillance devices.Entities:
Keywords: COVID-19; Face mask detection; Object detection; One-stage detector; Transfer learning; Two-stage detector
Year: 2021 PMID: 34171485 PMCID: PMC8223067 DOI: 10.1016/j.jbi.2021.103848
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 8.000
Fig. 1 Various Pre-trained Models based on CNN Architectures.
Different Categories of Datasets.
| Type of Datasets | Dataset | Scale | #Faces | #Masked face images | Occlusion |
|---|---|---|---|---|---|
| Masked face detection datasets | FDDB | 2845 | 5171 | – | – |
| | MALF | 5250 | 11931 | – | ✓ |
| | CelebA | 200000 | 202599 | – | – |
| | WIDERFACE | 32203 | 194000 | – | ✓ |
| Face masked datasets | MAFA | 30811 | 37824 | 35806 | ✓ |
| | RMFRD | 95000 | 9200 | 5000 | ✓ |
| | SMFRD | 85000 | 5000 | 5000 | ✓ |
| | MFDD | 500000 | 500000 | 24771 | ✓ |
Fig. 2 Proposed Architecture.
Fig. 3 Fine-tuning of ResNet50.
Fig. 4 Variety of Occlusions Present in Dataset.
Comparison between MobileNet-SSD, ResNet50 and Their Various Combinations based on Random vs. Hard/Soft Complexity of Test Data.
| Comparison Parameters (MobileNet-SSD to ResNet50 split, left to right) | 100–0% | 75–25% | 50–50% | 25–75% | 0–100% |
|---|---|---|---|---|---|
| Random split (mAP) | 0.8868 | 0.9095 | 0.9331 | 0.9650 | 0.9899 |
| Image complexity prediction time (ms) | – | 0.05 | 0.05 | 0.05 | – |
| Total Computation Time (ms) | 0.05 | 1.97 | 3.13 | 5.12 | 6.02 |
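The split experiments above reflect the paper's core ensemble idea: a cheap image-complexity predictor decides whether each frame goes to the fast one-stage detector (MobileNet-SSD) or the slower, more accurate two-stage path (ResNet50-based). The sketch below illustrates that routing only; the complexity predictor, detector stand-ins, and the 0.5 threshold are all hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of complexity-based routing between a one-stage
# and a two-stage detector. All names and the threshold are illustrative.

def predict_complexity(image):
    # Stand-in: a real predictor would score occlusion, crowding, etc.
    return image.get("complexity", 0.0)

def one_stage_detector(image):
    # Fast path (e.g. a MobileNet-SSD-style detector).
    return {"detector": "one-stage", "boxes": image.get("boxes", [])}

def two_stage_detector(image):
    # Slower but more accurate path (e.g. a ResNet50-based detector).
    return {"detector": "two-stage", "boxes": image.get("boxes", [])}

def ensemble_detect(image, threshold=0.5):
    """Route easy images to the fast detector, hard ones to the accurate one."""
    if predict_complexity(image) < threshold:
        return one_stage_detector(image)
    return two_stage_detector(image)
```

Varying `threshold` trades accuracy for speed, which is what the split ratios in the table sweep over.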
Fig. 5 Affine Transformation for Localizing the Face with No Mask.
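The bounding box transformation of Fig. 5 can be illustrated with a minimal sketch: apply a 2×3 affine matrix to the box corners and take the axis-aligned bounds of the result. The matrix layout and helper below are generic, not the paper's exact formulation.

```python
def apply_affine(box, matrix):
    """Apply a 2x3 affine matrix [[a, b, tx], [c, d, ty]] to an
    axis-aligned box (x1, y1, x2, y2) and return the axis-aligned
    bounding box of the four transformed corners."""
    x1, y1, x2, y2 = box
    (a, b, tx), (c, d, ty) = matrix
    corners = [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]
    warped = [(a * x + b * y + tx, c * x + d * y + ty) for x, y in corners]
    xs = [p[0] for p in warped]
    ys = [p[1] for p in warped]
    return (min(xs), min(ys), max(xs), max(ys))
```

For example, scaling by 2 maps the box (10, 10, 20, 20) to (20, 20, 40, 40).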
Fig. 6 Confusion Matrix Obtained for Various Pre-trained Models.
Fig. 7 Comparison of Various Models on Different Performance Criteria.
Fig. 8 Correlation between Ground Truth Visual Difficulty Score and Predicted Image Complexity Score.
Fig. 9 Identity Detection of Faces Violating Mask Norms.
Comparison of the Proposed Model with Recent Face Mask Detection Models.
| Model | Face Detection Precision (%) | Face Detection Recall (%) | Mask Detection Precision (%) | Mask Detection Recall (%) |
|---|---|---|---|---|
| RetinaFaceMask based on MobileNet | 83.0 | 95.6 | 82.3 | 89.1 |
| RetinaFaceMask based on ResNet | 91.9 | 96.3 | 93.4 | 94.5 |
| Proposed model based on ResNet50 | 99.2 | 99.0 | 98.92 | 98.24 |
Fig. 10 Training and Testing Accuracy over 60 Epochs.