Ruchi Jayaswal, Manish Dixit.
Abstract
The whole world is suffering from a novel coronavirus, which has become an epidemic. According to a World Health Organization report, this is a communicable disease, i.e., it spreads from an infected person to a healthy person. Therefore, wearing a mask is the most important precaution against COVID-19. This paper presents a deep learning-based approach to designing a Face Mask Detection framework that predicts whether a person is wearing a mask. The proposed method (SSDIV3) uses a Single Shot Multibox Detector as the face detector and a deep Inception V3 architecture to extract the pertinent image features and classify them into mask and no-mask labels. Optimizing the SSDIV3 approach over different modeling parameters is a genuine contribution of this work. In addition, the system is tested and analyzed on the VGG16, VGG19, Xception, and MobileNet V2 models under different modeling parameters. Furthermore, two novel synthesized Face Mask Datasets are introduced, containing diversified masks (2d_printed, 3d_printed, handkerchief, transparent, and natural-looking masks) and unmasked images of humans collected in outdoor and indoor environments such as parks, homes, and laboratories. The experimental outcomes demonstrate that the proposed system achieves an accuracy of 98% on the synthesized benchmark datasets, outperforming other state-of-the-art methods and datasets in a real-time environment.
Keywords: CNN models; Computer vision; Covid-19; Deep neural network; Face mask dataset; Face mask detection
Year: 2022 PMID: 36101885 PMCID: PMC9454394 DOI: 10.1007/s11042-022-13697-z
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.577
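The two-stage design described in the abstract (a face detector followed by a mask classifier) can be sketched in plain Python. This is only an illustrative outline, not the authors' implementation: `detect_masks`, `crop`, `Prediction`, the 0.5 threshold, and the toy detector and classifier are hypothetical names and values introduced here, standing in for the SSD and Inception V3 stages.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A box is (x, y, width, height) in pixel coordinates.
Box = Tuple[int, int, int, int]
# An image is a 2-D grid of pixel values here; a real system would use arrays.
Image = Sequence[Sequence[float]]


@dataclass
class Prediction:
    box: Box
    label: str        # "mask" or "no_mask"
    mask_prob: float  # classifier's probability for the "mask" class


def crop(image: Image, box: Box) -> List[List[float]]:
    """Cut the face region out of the image."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in image[y:y + h]]


def detect_masks(
    image: Image,
    face_detector: Callable[[Image], List[Box]],  # stand-in for the SSD stage
    mask_classifier: Callable[[Image], float],    # stand-in for Inception V3
    threshold: float = 0.5,                       # assumed decision threshold
) -> List[Prediction]:
    """Stage 1 finds faces; stage 2 labels each cropped face."""
    predictions = []
    for box in face_detector(image):
        prob = mask_classifier(crop(image, box))
        label = "mask" if prob >= threshold else "no_mask"
        predictions.append(Prediction(box, label, prob))
    return predictions


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any trained model.
    toy_image = [[0.0] * 4 + [1.0] * 4 for _ in range(4)]
    toy_detector = lambda img: [(0, 0, 4, 4), (4, 0, 4, 4)]
    toy_classifier = lambda face: sum(map(sum, face)) / (len(face) * len(face[0]))
    for p in detect_masks(toy_image, toy_detector, toy_classifier):
        print(p.box, p.label, round(p.mask_prob, 2))
```

Keeping the detector and the classifier as separate callables mirrors the way the study swaps the classifier backbone (VGG16, VGG19, Xception, MobileNet V2) while the face detection stage stays the same.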
Fig. 1 Possible spread of COVID-19 with or without precautions globally
Fig. 2 The outline of the study
Review of the existing face mask detection techniques
| Authors | Technicalities involved | Dataset Used | Research findings | Limitations | Performance |
|---|---|---|---|---|---|
| Preeti Nagrath et al. | MobileNetV2, fine-tuning, SSD | Medical Face Mask Dataset and Face Mask Dataset | SSD is used as the face detector and MobileNetV2 as the classifier. | More diverse datasets could be used to cover facial landmarks and the facial part detection process. | Achieved an accuracy of 92.64% and an F1 score of 93% |
| Mingjie Jiang et al. | RetinaFaceMask with MobileNet; RetinaFaceMask with ResNet | Wider Face and Masked Faces combined into a Face Mask Dataset | Transfer learning and an attention mechanism are used to predict masked images. | Only ordinary masks, such as 2D printed and N-95 masks, are considered. | Achieved accuracy up to 93.49% |
| Sunil Singh et al. | YOLOv3 and Faster R-CNN | MAFA, Wider Face, and manually collected face mask images | YOLOv3 produced better results than Faster R-CNN for face mask detection. | Sensitive to the camera's spatial location; artificial masks and two masks are considered. | Precision of 55 for YOLOv3 and 62 for Faster R-CNN |
| Susanto et al. | YOLOv4 | Surgical and fabric face mask images | YOLOv4 is used for face mask detection. | Different 3D masks and other 2D masks are not considered. | Average FPS of about 11.1 |
| Shilpa Sethi et al. | ResNet50, AlexNet, and MobileNet | Self-synthesized unbiased face mask dataset | ResNet50 is used as the proposed framework with a transfer learning approach. | Not integrated into high-resolution video surveillance devices. | Achieved accuracy up to 98.2% |
| Arjya Das et al. | CNN model | Face Mask Dataset and Medical Mask Dataset | A pre-trained CNN model with two 2D convolution layers is applied. | Mask or no mask is not predicted accurately for moving images. | Achieved an accuracy of 95.77% on dataset 1 and 94.58% on dataset 2 |
| G K Jakir Hussain et al. | CNN model (SVM and symbolic classifiers) | A dataset of 50 images | A CNN model predicts mask or no mask, then checks human body temperature, and then uses an Arduino controller to interrupt people. | Trained on a tiny dataset of 50 images; different types of masks are not included. | Achieved an accuracy of 91.11% with the proposed CNN |
| Walid Hariri | VGG-16 model, bag of features, and multilayer perceptron (MLP) | Real World Masked Face Dataset | VGG-16 is used as the feature extractor, the features are mapped by bag of features, and classification is done by an MLP classifier. | The mask dataset is simulated, i.e., artificially created. | Achieved a face mask recognition accuracy of 91.3% |
Fig. 3 The schematic of the FMD System
Fig. 4 Fine-tuned Inception V3 architecture
Accuracies and training times of models under various optimizers (DL = 512). Each cell gives accuracy (%) / training time (milliseconds).
| Models | S.G.D | RMSProp | AdaGrad | Adam | AdaDelta | NAdam | Ftrl |
|---|---|---|---|---|---|---|---|
| Mobilenet_V2 | 91.45 / 192.2 | 93.17 / 184.17 | 90.32 / 187.74 | 94.45 / 146.4 | 84.45 / 185.08 | 96.25 / 188.3 | 89.25 / 185.3 |
| VGG16 | 60.45 / 250.5 | 84.13 / 227.97 | 56.14 / 227.55 | 57.25 / 152.1 | 53.62 / 228.41 | 84.85 / 229.91 | 52.35 / 227.6 |
| VGG19 | 60.32 / 244.7 | 77.01 / 243.94 | 57.14 / 244.51 | 78.98 / 181.46 | 38.85 / 243.54 | 79.74 / 244.8 | 52.15 / 243.7 |
| Xception | 90.41 / 427 | 94.43 / 429 | 89.20 / 431.26 | 95.82 / 274.5 | 71.51 / 426.09 | 93.58 / 428.4 | 52.22 / 430.25 |
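The optimizer table above can be read programmatically to see which optimizer gives the highest accuracy and which trains fastest for each model. The snippet below is a small sketch over the reported numbers only; the helper names `most_accurate` and `fastest` are introduced here for illustration, and the S.G.D entry for Mobilenet_V2 is read as 91.45% accuracy and 192.2 ms.

```python
# Accuracy (%) and training time (ms) per optimizer, copied from the table above.
# Assumption: the Mobilenet_V2 / S.G.D cell is read as 91.45 and 192.2.
OPTIMIZERS = ["S.G.D", "RMSProp", "AdaGrad", "Adam", "AdaDelta", "NAdam", "Ftrl"]

RESULTS = {
    "Mobilenet_V2": [(91.45, 192.2), (93.17, 184.17), (90.32, 187.74),
                     (94.45, 146.4), (84.45, 185.08), (96.25, 188.3),
                     (89.25, 185.3)],
    "VGG16": [(60.45, 250.5), (84.13, 227.97), (56.14, 227.55),
              (57.25, 152.1), (53.62, 228.41), (84.85, 229.91),
              (52.35, 227.6)],
    "VGG19": [(60.32, 244.7), (77.01, 243.94), (57.14, 244.51),
              (78.98, 181.46), (38.85, 243.54), (79.74, 244.8),
              (52.15, 243.7)],
    "Xception": [(90.41, 427.0), (94.43, 429.0), (89.20, 431.26),
                 (95.82, 274.5), (71.51, 426.09), (93.58, 428.4),
                 (52.22, 430.25)],
}


def most_accurate(model: str) -> str:
    """Name of the optimizer with the highest reported accuracy."""
    rows = RESULTS[model]
    best = max(range(len(rows)), key=lambda i: rows[i][0])
    return OPTIMIZERS[best]


def fastest(model: str) -> str:
    """Name of the optimizer with the lowest reported training time."""
    rows = RESULTS[model]
    best = min(range(len(rows)), key=lambda i: rows[i][1])
    return OPTIMIZERS[best]


if __name__ == "__main__":
    for model in RESULTS:
        print(f"{model}: most accurate = {most_accurate(model)}, "
              f"fastest = {fastest(model)}")
```

On these figures the most accurate and the fastest optimizer are not always the same one, which is the kind of trade-off the table is meant to expose.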
Fig. 5 The layout of both versions of the proposed RTFMD dataset
Fig. 6 The different types of masks used in the RTFMD-V2 dataset
Fig. 7 Sample images of the RTFMD-V2 dataset
Fig. 8 Sample images of the RTFMD dataset
Fig. 9 Instances of the RFMD dataset with masked and no-mask faces
Fig. 10 Sample images of the Face Mask dataset with and without masks
Accuracies and training times of models obtained by modifying the α and m hyperparameters. Each cell gives accuracy (%) / training time (milliseconds).
| S.no | Models | Learning_Rate (α) | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mobilenet | 92.32 / 152.69 | 92.45 / 154.69 | 93.63 / 146.71 | 92.54 / 153.77 | 90.14 / 151.52 | 93.86 / 146.4 | 90.28 / 150.22 | 92.32 / 145.76 | 91.88 / 146.87 |
| 2 | VGG16 | 69.34 / 163.21 | 71.52 / 165.22 | 67.38 / 152.68 | 84.47 / 163.69 | 85.89 / 155.02 | 80.09 / 152.18 | 86.35 / 163.49 | 79.16 / 155.86 | 79.99 / 152.18 |
| 3 | VGG19 | 74.59 / 170.63 | 71.46 / 159 | 64.88 / 156.0 | 83.78 / 172.7 | 82.35 / 160.65 | 78.61 / 184.62 | 81.26 / 171.80 | 84.64 / 161.92 | 81.98 / 160.2 |
| 4 | Xception | 93.45 / 296.77 | 94.12 / 281.66 | 93.41 / 280.49 | 95.82 / 290.05 | 96.76 / 281.29 | 96.43 / 274.59 | 98.57 / 282.61 | 94.24 / 276.43 | 96.32 / 269.62 |
Accuracies and training times of models obtained by modifying the activation function (A.F.), dense layer (D.L.), and number of epochs (n)
| Accuracy (%) and Trained time (ms) | ||||||
|---|---|---|---|---|---|---|
| Models | AF = Softmax, DL = 256 and n = 30 | DL = 512, AF=Sigmoid and n = 30 | n = 50, DL = 256 and AF=Sigmoid | |||
| MobilenetV2 | 92.32 | 145.46 | 92.25 | 227.69 | ||
| VGG16 | 80.67 | 158.17 | 90.06 | 253 | ||
| VGG19 | 81.33 | 156.11 | 84.65 | 258.39 | ||
| Xception | 93.24 | 271.66 | 95.05 | 454.48 | ||
Fig. 11 Trade-off between training accuracies and training losses of the (a) SSDIV3, (b) Xception, (c) MobileNetV2, and (d) VGG19 models
Fig. 12 Bar graph depicting the differences among optimizers for the mentioned models
Comparative analysis with other deep convolutional models on the synthesized dataset (RTFMD-V2). Learning rate α = 1e-4, Epochs n = 50, Activation function = Sigmoid, Optimizer = ADAM.
| Model used | Batch size | Train_Accuracy | Train_Loss | Val_Accuracy | Val_Loss | Training time |
|---|---|---|---|---|---|---|
| MobileNet | 16 | 0.95 | 0.14 | 0.94 | 0.21 | 765.16 |
| MobileNet | 64 | 0.96 | 0.08 | 0.96 | 0.094 | 767.23 |
| VGG16 | 16 | 0.87 | 0.31 | 0.84 | 0.31 | 873.92 |
| VGG16 | 64 | 0.83 | 0.34 | 0.84 | 0.33 | 863.92 |
| VGG19 | 16 | 0.88 | 0.35 | 0.85 | 0.35 | 940.12 |
| VGG19 | 32 | | | | | |
| VGG19 | 64 | 0.832 | 0.34 | 0.86 | 0.32 | 930.48 |
| Xception | 16 | 0.97 | 0.07 | 0.97 | 0.08 | 1798.54 |
| Xception | 64 | 0.96 | 0.08 | 0.98 | 0.072 | 1734.77 |
| | 16 | 0.98 | 0.064 | 0.97 | 0.107 | 1694.29 |
| | 64 | 0.97 | 0.06 | 0.97 | 0.097 | 1497.63 |
Experimental Outcomes with synthesized datasets using the proposed model
| Synthesized dataset | Image size | Precision | Recall | Accuracy(%) | Loss | Time elapsed |
|---|---|---|---|---|---|---|
| RTFMD | 620 | 96.93 | 97.23 | 98.45 | 0.09 | 261.94 |
| RTFMD-V2 | 1180 | 98.12 | 97.45 | 98.75 | 0.05 | 1503.29 |
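The table above reports precision and recall but not their combination. As a worked example, the F1 score (the harmonic mean of precision and recall) can be derived from the two reported values. This is a derived illustration, not a figure taken from the paper, and it assumes the Precision and Recall columns are percentages.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the proposed model in the table above.
# Assumption: both columns are percentages.
REPORTED = {
    "RTFMD": (96.93, 97.23),
    "RTFMD-V2": (98.12, 97.45),
}

if __name__ == "__main__":
    for dataset, (precision, recall) in REPORTED.items():
        print(f"{dataset}: F1 = {f1_score(precision, recall):.2f}%")
```

Because the harmonic mean is pulled toward the smaller of the two inputs, the derived F1 always lies between the reported precision and recall.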
Fig. 13 Comparative analysis among different models: (a) accuracy, (b) elapsed time
Fig. 14 Trade-off between training accuracy and training loss, and between validation accuracy and validation loss, of the models: (a) SSDIV3, (b) VGG16, (c) VGG19, (d) Xception, (e) MobileNet
Evaluated results on the Existing dataset using the proposed methodology
| Name_of Dataset | Image Size | Batch size | Precision(%) | Recall(%) | Accuracy(%) | Loss | Time elapsed(ms) |
|---|---|---|---|---|---|---|---|
| RFMD | 3000 | 16 | 94.24 | 96.48 | 95.64 | 0.18 | 4124.32 |
| RFMD | 3000 | 64 | 96.78 | 96.81 | 96.98 | 0.10 | 4006.29 |
| Face_Mask Dataset | 7000 | 16 | 91.48 | 93.72 | 92.68 | 0.35 | 11,038.65 |
| Face_Mask Dataset | 7000 | 64 | 93.55 | 96.48 | 95.90 | 0.19 | 10,012.26 |
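To make the effect of batch size in the table above explicit, the short sketch below computes, for each existing dataset, the accuracy gain and the relative reduction in elapsed time when moving from batch size 16 to 64. It uses only the reported numbers; the function name `batch_size_effect` is introduced here for illustration.

```python
# (accuracy %, time elapsed ms) per batch size, copied from the table above.
RESULTS = {
    "RFMD": {16: (95.64, 4124.32), 64: (96.98, 4006.29)},
    "Face_Mask Dataset": {16: (92.68, 11038.65), 64: (95.90, 10012.26)},
}


def batch_size_effect(dataset: str, small: int = 16, large: int = 64):
    """Return (accuracy gain in points, time saved in percent) for small -> large."""
    acc_small, time_small = RESULTS[dataset][small]
    acc_large, time_large = RESULTS[dataset][large]
    accuracy_gain = round(acc_large - acc_small, 2)
    time_saved_pct = round(100 * (time_small - time_large) / time_small, 2)
    return accuracy_gain, time_saved_pct


if __name__ == "__main__":
    for name in RESULTS:
        gain, saved = batch_size_effect(name)
        print(f"{name}: +{gain} accuracy points, {saved}% less time")
```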
Fig. 15 Comparison of evaluated metrics on all datasets
Fig. 16 Comparison of the loss metric on all datasets at different batch sizes
Comparison of proposed model against the state-of-the-art models on RTFMD-V2 dataset
| Models | Accuracy | Time Required(ms) | F1 Score | Total number of parameters | Model size |
|---|---|---|---|---|---|
| MobileNetV2 | 96% | 766.24 | 0.96 | 2 Million | 20 MB |
| VGG16 | 85% | 865.82 | 0.85 | 14 Million | 64 MB |
| VGG19 | 86% | 932.67 | 0.86 | 21 Million | 83.8 MB |
| Xception | 98% | 1774.1 | 0.98 | 21.9 Million | 98 MB |
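The comparison above trades accuracy against time. One way to read it is to keep only the models that no other listed model beats on both accuracy and time at once (a Pareto front). The sketch below applies that idea to the four rows of the table; the helper names `dominates` and `pareto_front` are introduced here, and the analysis is illustrative rather than part of the original study.

```python
# (accuracy %, time required ms) for each model, copied from the table above.
MODELS = {
    "MobileNetV2": (96.0, 766.24),
    "VGG16": (85.0, 865.82),
    "VGG19": (86.0, 932.67),
    "Xception": (98.0, 1774.1),
}


def dominates(a, b) -> bool:
    """True if a is at least as accurate and as fast as b, and strictly better in one."""
    acc_a, time_a = a
    acc_b, time_b = b
    no_worse = acc_a >= acc_b and time_a <= time_b
    strictly_better = acc_a > acc_b or time_a < time_b
    return no_worse and strictly_better


def pareto_front(models: dict) -> list:
    """Names of models that are not dominated by any other model."""
    return [
        name
        for name, stats in models.items()
        if not any(dominates(other, stats) for other in models.values())
    ]


if __name__ == "__main__":
    print(pareto_front(MODELS))
```

On these four rows, VGG16 and VGG19 are both less accurate and slower than MobileNetV2, so only MobileNetV2 and Xception remain on the front.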
Fig. 17 Modeling parameters of different models over the RTFMD dataset: (a) comparison at different activation functions and dense layers, (b) total time elapsed for various optimizers with all models
Fig. 18 Total training time elapsed in the presented model with varying batch sizes for (a) the existing datasets and (b) the self-synthesized face mask datasets
Fig. 19 Resultant images for the existing face mask dataset predicting a person (a) wearing a mask or (b) without a mask
Fig. 20 Resultant images for the RTFMD dataset: (a) face with no mask, (b) partially covered face, (c) face covered with a mask, (d) right-tilted face covered with a mask, (e) left-tilted face covered with a mask, (f) multiple faces covered with masks
Fig. 21 Resultant images for the RTFMD-V2 dataset: (a) multiple persons covered with natural-looking masks and a 2d_printed mask, (b) multiple faces covered with a natural-looking mask and no mask, (c) multiple persons covered with natural-looking masks
Comparison of proposed model and dataset with existing approaches and datasets. All values are accuracy (%). Face Mask Dataset and Medical Face Mask dataset are previous face mask datasets; RTFMD and RTFMD-V2 are the synthesized face mask datasets.
| Methods | Face Mask Dataset | Medical Face Mask dataset | RTFMD | RTFMD-V2 |
|---|---|---|---|---|
| Preeti Nagrath et al. | 92.48 | 93.02 | 93.25 | 95.6 |
| Xinqi et al. | 93.49 | 94.29 | 95.45 | 96.45 |
| Arjya Das et al. | 95.77 | 94.58 | 93.45 | 94.76 |