Neeru Jindal, Harpreet Singh, Prashant Singh Rana.
Abstract
With the outbreak of the coronavirus disease in 2019, life seemed to have come to a standstill. To combat transmission of the virus, the World Health Organization (WHO) announced the wearing of face masks as an imperative way to limit its spread. However, manually checking whether people in a public area are wearing face masks is a cumbersome task. The urgency of monitoring mask wearing necessitated building an automatic system. Currently, several machine learning and deep learning methods can be used effectively for this purpose. In this paper, all the essential requirements for such a model are reviewed. The need for and the structural outline of the proposed model are discussed extensively, followed by a comprehensive study of the available techniques and a comparative analysis of their performance. Further, the pros and cons of each method are analyzed in depth. Subsequently, sources for multiple datasets are listed, and the software needed for implementation is discussed. Finally, the use cases, limitations, and observations for the system are discussed, and the paper concludes with several directions for future research.
Keywords: COVID-19; Classification; Face mask detection; Object detection
Year: 2022 PMID: 35528282 PMCID: PMC9069221 DOI: 10.1007/s11042-022-12999-6
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.577
Fig. 1 Precautions to avoid COVID-19
Fig. 2 Methodology used
Fig. 3 The number of publications on face mask detectors from 2000 to 2022 (2022 includes data up to January 11), as retrieved from Semantic Scholar using the search term “Face Mask Detector”
Fig. 4 The proposed flow diagram for the face mask detection system
Fig. 5 Steps involved in data cleaning
Fig. 6 Video pre-processing techniques
Fig. 7 Approach for object detection methods
Fig. 8 Object detection segments
Fig. 9 Faster R-CNN [33]
Fig. 10 R-FCN [34]
Fig. 11 Working of YOLO [32]
Fig. 12 Single Shot Multibox Architecture [35]
Comparison table of state-of-the-art detection models [78]
| Model | Accuracy | FPS |
|---|---|---|
| Faster R-CNN | 70.4 | 17 |
| R-FCN | 77.6 | 6 |
| YOLOv3 | 78.6 | 91 |
| SSD | 78.5 | 59 |
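The trade-off in the table above can be read as a simple selection rule: among the models that meet a required frame rate, prefer the most accurate one. The following is a minimal sketch of that rule, assuming the table's figures as given; the name `best_model` and the `min_fps` parameter are illustrative and do not come from the paper.

```python
# Figures copied from the comparison table above: (accuracy, frames per second).
MODELS = {
    "Faster R-CNN": (70.4, 17),
    "R-FCN": (77.6, 6),
    "YOLOv3": (78.6, 91),
    "SSD": (78.5, 59),
}


def best_model(min_fps):
    """Return the most accurate model running at min_fps or faster, else None."""
    # Keep only the models that satisfy the speed requirement.
    candidates = {
        name: accuracy
        for name, (accuracy, fps) in MODELS.items()
        if fps >= min_fps
    }
    if not candidates:
        return None
    # Among the remaining models, pick the one with the highest accuracy.
    return max(candidates, key=candidates.get)
```

With these figures, YOLOv3 is returned for any threshold up to 91 FPS, since it is both the most accurate and the fastest entry in the table.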
Fig. 13 Feature extraction techniques in face mask detection [12, 15, 42, 93]
Fig. 14 Face mask detection can be implemented by first performing face detection, followed by face mask classification for each individual
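The two-stage approach described in the Fig. 14 caption can be sketched as a small pipeline in which the face detector and the mask classifier are interchangeable components. This is only an illustration under stated assumptions: the image is a nested list of grayscale values, boxes use an `(x, y, width, height)` convention, and `fake_detector` and `fake_classifier` are hypothetical stand-ins for trained models such as those compared in this review.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height): an assumed convention
Image = Sequence[Sequence[int]]  # grayscale rows; a stand-in for a real image array


def detect_masks(
    image: Image,
    detect_faces: Callable[[Image], List[Box]],
    classify_mask: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Stage 1 finds face boxes; stage 2 labels each cropped face."""
    results = []
    for x, y, w, h in detect_faces(image):
        # Crop the face region before handing it to the classifier.
        face = [row[x:x + w] for row in image[y:y + h]]
        results.append(((x, y, w, h), classify_mask(face)))
    return results


# Hypothetical stand-ins so the sketch runs without a trained model.
def fake_detector(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 0, 1, 3)]


def fake_classifier(face: Image) -> str:
    pixels = [p for row in face for p in row]
    return "mask" if sum(pixels) / len(pixels) > 100 else "no mask"


image = [[200, 200, 10], [200, 200, 10], [10, 10, 10]]
print(detect_masks(image, fake_detector, fake_classifier))
```

Keeping the two stages behind plain callables makes it straightforward to swap one face detector or classifier for another when comparing them.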
Fig. 15 Face detection speed analysis [64]
Detection accuracy comparison of algorithms
| Model | Face Detection Precision (%) | Face Detection Recall (%) | Mask Detection Precision (%) | Mask Detection Recall (%) |
|---|---|---|---|---|
| Dlib based on ResNet50 [ | 99.20 | 99.0 | 98.92 | 98.24 |
| MTCNN based on MobileNet [ | 94.50 | 86.38 | 84.39 | 80.92 |
| RetinaFace based on MobileNet [ | 83.0 | 95.60 | 82.30 | 89.10 |
| RetinaFace based on ResNet [ | 91.9 | 96.3 | 93.4 | 94.5 |
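For reference, the precision and recall percentages reported above follow the standard definitions based on true positives (TP), false positives (FP), and false negatives (FN). The sketch below shows those definitions only; the function name is illustrative, and the counts in the usage note are made up and are not taken from any of the cited studies.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) as percentages from raw detection counts."""
    # Precision: share of reported detections that are correct.
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual objects that were detected.
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

For example, 90 correct detections with 10 false alarms and 30 missed faces would give a precision of 90% and a recall of 75%.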
Fig. 16 Comparison of various classification algorithms
Detection accuracy comparison of algorithms
| Real-time technique | Algorithm | Runtime | Theoretical Significance |
|---|---|---|---|
| Face Mask Detection, Su et al. [ | YOLOv3 | 14.62 fps | A novel fusion transfer learning approach combined with YOLOv3 is proposed for face mask detection, with EfficientNet as the backbone architecture for feature extraction. |
| Real-time Face Mask Detection, Gadge et al. [ | YOLOv4 | 49.5 fps | A face mask detector built with a deep learning model that works on real-time streams. Training runs over several thousand epochs to achieve notable real-time results. |
| Real-Time AI based Face Mask Detection, Teboulbi et al. [ | Neural Networks | – | Multiple neural network architectures, after evaluation, are used with a Raspberry Pi and a webcam to enforce effective real-time face mask detection in public places. |
| Real-Time Face Mask Detection in Video, Ding et al. [ | YOLOv5 | 52 fps | Deep learning techniques are shown to produce improved detection results in real time. The availability of an enriched dataset also helps improve the efficiency of the model. |
| Personal Protective Equipment Detection, Nath et al. [ | YOLOv3 | 11 fps | A deep learning model built to verify in real time whether construction workers wear personal protective equipment, including hard hats, vests, etc. Different approaches to the task are also compared. |
| Front Vehicle Detection, Cao et al. [ | SSD | 66.6 fps | A vehicle detection model with an improved SSD architecture is proposed for intelligent cars. The system is also shown to be robust in heavy traffic. |
| Illegal Parking Detection, Tang et al. [ | SSD | 40 fps | A deep learning-based model with an improved SSD is proposed to detect illegal vehicle parking. The system achieves a precision of 97.3%. |
| Vehicle Type Recognition, Kim et al. [ | Faster R-CNN; YOLOv4; SSD | 36.32 fps; 82.1 fps; 105.14 fps | The study analyzes several techniques for real-time vehicle type recognition. The analysis found YOLOv4 to perform best. |
| Social Distancing monitoring, Jindal et al. [ | YOLOv3 | 21 fps | The study uses YOLOv3 to implement a bird's-eye-view-based social distancing monitoring system. An alert is raised by displaying red boxes when social distancing is violated. |
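The runtime column above is expressed in frames per second (fps). One common way to obtain such a figure is to time a detector over a sequence of frames and divide the frame count by the elapsed time. The sketch below illustrates this under assumptions: `measure_fps` and `process_frame` are hypothetical names, and the cited studies may have measured runtime differently (for example, on specific hardware or excluding I/O).

```python
import time


def measure_fps(process_frame, frames):
    """Return the average number of frames processed per second."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        process_frame(frame)  # e.g. run the detector on one video frame
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on extremely fast runs.
    return count / elapsed if elapsed > 0 else float("inf")
```

Averaging over many frames smooths out per-frame variation; a short warm-up run before timing is often advisable when a model is loaded lazily.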
Fig. 17 Deep learning and machine learning usage trends over the years 2011–2021 (until April 2021)
Fig. 18 Records of the different (a) object detection, (b) face detection, and (c) classification techniques analyzed over the years 2000–2021 (until April 2021) using Semantic Scholar
List of advantages and disadvantages of some object detection algorithms (deep learning approach)
| Name | Faster R-CNN [ | R-FCN [ | YOLO [ | SSD [ |
|---|---|---|---|---|
| Advantages | Performs well on small object detection because of the presence of nine anchors in a single grid | Faster than other region-based CNNs | Detects objects efficiently and at high speed, making it suitable for real-time applications | Faster at object localization than Faster R-CNN because of its one-step architecture |
| Challenges | Training takes longer and, despite its efficiency, it fails to perform real-time detection because of its two-step architecture | The mAP of R-FCN is appreciable but lower than that of R-CNN | Difficulty in detecting small objects and objects that are close together | Requires more training data and is relatively less accurate |
List of advantages and disadvantages of various face detection methods
| Name | Dlib [ | MTCNN [ | RetinaFace [ |
|---|---|---|---|
| Advantages | Easy to train, works well with different face orientations, and runs fast | Detects faces efficiently even when they are not initially aligned, because it performs face detection and alignment jointly | Can generate an accurate rectangular face bounding box together with five facial landmark points |
| Challenges | Fails to detect small faces, as it is trained for a minimum face size of 80 × 80 | Slower than other models and does not perform well with small faces | Tends to fail when input images contain a large face |
List of advantages and disadvantages of various classifiers
| Name | CNN [ | SVM [ | Decision Trees [ | Ensemble |
|---|---|---|---|---|
| Advantages | High accuracy on image recognition tasks; automatically detects crucial features without any intervention | Requires little input data and is highly efficient in high-dimensional spaces. It is also memory efficient | Compared with other models, has fewer data pre-processing requirements and performs well on categorical data | Capable of making better predictions and achieving better performance than any single model |
| Challenges | Requires a large amount of input data and has difficulty classifying images that vary from the input data | Performs poorly when target classes overlap and when the number of features per data point exceeds the number of training samples | Often has a longer training time, and a small change in the data leads to a large change in structure, which results in instability | Usually computationally expensive, which results in learning time and memory constraints |
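As an illustration of the ensemble column above, one simple way to combine several classifiers is hard (majority) voting over their predicted labels. The sketch below assumes this strategy for illustration only; the review does not state which ensemble method is meant, and `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers.

    `predictions` holds one label per classifier for a single sample.
    On a tie, the label that appears first in `predictions` wins.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]
```

For instance, if a CNN and an SVM predict "mask" while a decision tree predicts "no mask", the ensemble outputs "mask".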
List of different datasets available on online platforms for the study
| NAME | DESCRIPTION | SOURCE(URLs) |
|---|---|---|
| YOLO Medical Mask Dataset | Contains ~631 images of people with medical masks | |
| Dataset by Prajna Bhandari at PyImageSearch | Contains ~1376 images, including 690 with a mask and 686 without a mask | |
| Real-World Masked Face Dataset (RMFD) | Contains ~5000 masked face images of 525 people and ~90,000 normal face images | |
| LFW- Simulated Masked face | The training dataset contains ~13,027 masked face images of 5713 people, and the testing dataset contains ~70 masked face images of 48 people | |
| Face Mask Detection Dataset | Contains ~3725 images of faces with masks and ~3828 images of faces without masks | |
Fig. 19 Requirements for face mask detection system [36]
Fig. 20 List of tools used in data collection [36]
Fig. 21 List of tools used in image annotation [36]. *The dataset could be enriched by deploying techniques like data augmentation
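The data augmentation mentioned in the note to Fig. 21 can be sketched with two common transformations, horizontal flipping and brightness scaling. This is a minimal illustration under assumptions: the image is represented as a nested list of 8-bit grayscale values, the brightness factors 0.8 and 1.2 are arbitrary, and the function names are hypothetical. In practice an image library would typically be used instead.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, factor):
    """Scale every pixel by `factor`, clamping to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def augment(image):
    """Return the original image plus three augmented variants."""
    return [
        image,
        flip_horizontal(image),
        adjust_brightness(image, 0.8),  # darker copy (assumed factor)
        adjust_brightness(image, 1.2),  # brighter copy (assumed factor)
    ]
```

Each source image here yields four training samples, which is one way a small mask dataset can be enlarged without collecting new photographs.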
Fig. 22 List of useful Python libraries [18]
Fig. 23 Popularity of useful Python libraries based on GitHub star statistics (until April 2021)
Fig. 24 List of other supporting libraries [18]
Fig. 25 Upsurge of deep learning from March 2013 to August 2021 (created with Google Trends)