Face mask detection in COVID-19: a strategic review.

Neeru Jindal¹, Harpreet Singh², Prashant Singh Rana².

Abstract

With the outbreak of Coronavirus Disease 2019, life seemed to have come to a standstill. To combat the transmission of the virus, the World Health Organization (WHO) announced the wearing of face masks as an imperative way to limit its spread. However, manually checking whether people are wearing face masks in a public area is a cumbersome task. The exigency of monitoring people wearing face masks necessitated building an automatic system. Currently, distinct methods using machine learning and deep learning can be used effectively. In this paper, all the essential requirements for such a model are reviewed. The need for and the structural outline of the proposed model are discussed extensively, followed by a comprehensive study of the available techniques and a comparative analysis of their performance. Further, the pros and cons of each method are analyzed in depth. Subsequently, sources of multiple datasets are mentioned. The software needed for the implementation is also discussed. Finally, the use cases, limitations, and observations for the system are discussed, and the paper concludes with several directions for future research.

Keywords: COVID-19; Classification; Face mask detection; Object detection

Year: 2022 PMID: 35528282 PMCID: PMC9069221 DOI: 10.1007/s11042-022-12999-6

Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.577

Introduction

The spread of Coronavirus disease 2019, commonly known as COVID-19, is a significant concern worldwide. It is a contagious disease that has affected human life globally [108, 117]. Health specialists suggest that the virus may be transmitted by direct or indirect contact with an infected person [119]; hence measures like the compulsory wearing of face masks [40], as illustrated in Fig. 1, have been strictly put into effect by medical bodies. Numerous studies advise wearing a face mask even if a person is not feeling sick. COVID-19 is not the first occasion on which wearing face masks has been stressed to combat transmission. The practice can be dated back to the 1910–11 Manchurian epidemic in China [60], and face masks have been worn during various pandemics in history. Besides, it is well supported by various studies that not just wearing face masks but wearing them properly limits the transmission of the virus to quite an extent. The observation that the greater the proportion of the population wearing face masks in a country, the fewer the cases of COVID-19 in that country has created the need for an automated face mask detector.

Fig. 1

Precautions to avoid COVID-19

Further, the coronavirus pandemic has necessitated scientific contributions from across the globe to help battle it. Leveraging contemporary technical advancements, numerous solutions to prevent the transmission of the virus have been formulated. In [71], the authors put forward an updated mask detection architecture working with a noteworthy efficiency of 97%. In [5], the detection of face masks involved PyTorch, with results being 97% accurate. Further, [95] proposed the detection of several kinds of masks using an ultramodern method, and the output was obtained after applying the model in real time. CNN-based detectors have been used on custom-collected face mask datasets in [21]. Another study formulated an application that inspects people wearing face masks in public areas [31]. Additionally, an existing dataset was enriched with more images in [79]; that work used the Faster R-CNN model and achieved an accuracy of 99.8%. In [30], the authors put forward a system for verifying the correct position of an individual's face mask, while [72] discusses the various technological methods available to deal with the virus. With the advancements in technology that the world has been witnessing, there are various available techniques [7, 48, 74, 76, 113] that could prove valuable to society if used effectively. A real-time system that could itself classify a person it sees into one of two categories [77], a person wearing a face mask or a person not wearing a face mask, could be useful in recent times. Such systems could find applications in public areas like hospitals, airports, malls, etc. One way to build the detector is to first detect the faces in real time: after the faces are detected in the webcam stream, the frames containing them are saved and a classifier is then applied to them.
The numerous algorithms that could be used for categorization are discussed in the subsequent section. Another way to accomplish the same task is to use an object detection model. The contributions of this work in view of the current state of the art are as follows. Although several precautions are recommended to stay safe from COVID-19, face masking and social distancing remain significant factors. It was therefore necessary to bring the many face mask detection techniques under one umbrella for the research community. Pertaining to the need of the current time, the proposed work reviews several studies conducted in the field of face mask detection. The strengths of many publications on face masking are discussed; such a discussion, covering observations, current and future trends, and a large number of references, has so far been missing. Performance parameters of several algorithms are compared, and discussions on them are presented to increase the efficacy of the review paper.

Motivation and trends in recent years

With time, the surge in COVID-19 cases urged people to be cautious, alert, and take all possible safety measures. In situations such as this, where a mere sneeze could be harmful to many people, safety remains the priority. To ensure everyone's well-being, a system that can itself monitor whether a face mask is being worn is needed. It would protect not only the individual but also the people in the vicinity. With access to ultra-modern technological methods, implementing such a system could be a boon to society. After analyzing the problem statement, numerous studies on the topic were scrutinized to commence the research. Then, the content relevant to the issue was filtered, and an in-depth understanding of the topic was attained. Further, several existing datasets were explored, along with the available techniques. A literature survey of the available methods was conducted, followed by a comparison of the different algorithms. Further, the software and, subsequently, the applications were explored. Eventually, the future scope was inspected, as shown in Fig. 2.

Fig. 2

Methodology used

Initially, around 180 papers were identified from various publishers such as Springer. The collected documents were then checked for duplication, and any duplicates were removed. Next, the articles were screened for eligibility based on their relevance to the problem statement, leaving about 140 papers. Further, the papers were assessed for quality, bringing the count down to 130. Of these, around 100 papers were analyzed to understand the various techniques available, including the state of the art. A few more publications were investigated to gather knowledge about the available datasets. The paper is organised as follows: Section 2 deals with the general flow chart of the face mask detector. Section 3 discusses the various techniques that could be used to implement a face mask detector, while Section 4 reviews some of the real-time methods. Section 5 analyses the trends of techniques in the last two decades along with the advantages and challenges of the techniques discussed in Section 3. In Section 6, the URLs of multiple datasets available online are mentioned. Section 7 suggests several useful software tools that could be used to carry out the process, followed by Section 8, which states the use cases, drawbacks and observations made for the process. Section 9 provides the conclusions of the study along with future directions. Figure 3 illustrates the number of publications on face mask detectors in the last two decades. Owing to COVID-19, such detectors became a hot topic of study among researchers in 2020.

Fig. 3

The number of publications on face mask detectors from 2000 to 2022 (the year 2022 includes data until January 11), as taken from Semantic Scholar using the words “Face Mask Detector”

General flow chart

The implementation of the face mask detector system could be executed in two phases, as shown in Fig. 4.

Fig. 4

The proposed flow diagram for face mask detection system

The first phase is the training phase. This stage begins with the collection of the dataset. One of the most crucial steps is to have data of good quantity and quality [1]. One can prepare the dataset or use existing datasets from the various available sources. If the dataset is prepared from scratch, its size can be increased using techniques like data augmentation. Also, the data has to be cleaned before use because data quality plays a significant role in building a model. The various steps involved in data cleaning are shown in Fig. 5. After obtaining a good-quality dataset, a model is selected according to the system’s demands and trained on the chosen dataset. Multiple techniques could be used to accomplish the target.
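As a rough illustration of how data augmentation can enlarge a self-prepared dataset, the sketch below mirrors each image horizontally and keeps its label. The function names, the list-of-rows image representation, and the assumption that mirroring does not change the mask label are illustrative choices, not a procedure taken from the reviewed studies.

```python
def horizontal_flip(image):
    """Mirror a grayscale image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def augment_with_flips(dataset):
    """Return the original (image, label) pairs plus one mirrored copy of each.

    Assumes that mirroring a face does not change its label.
    """
    augmented = list(dataset)
    for image, label in dataset:
        augmented.append((horizontal_flip(image), label))
    return augmented


# Tiny 2x3 "image" with a hypothetical label, just to show the effect.
dataset = [([[1, 2, 3], [4, 5, 6]], "with_mask")]
augmented = augment_with_flips(dataset)
```

In practice, augmentation usually combines several such transformations (rotation, scaling, brightness changes), which this sketch leaves out.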

Fig. 5

Steps involved in data cleaning

The first phase comes to an end once the most suitable trained model has been acquired. In the subsequent phase, frames from the live video feed or images are used as input to the trained model. The live video feed could be obtained using a mobile phone, a camera, or a surveillance camera and hence could vary in format, e.g., H.265, H.264, etc. There are several cases where the video cannot capture images as desired; the recorded video may be blurred or noisy, for example. In scenarios like these, image pre-processing comes to the rescue. There are several methods in OpenCV that could be used to enhance the quality of the image. For instance, blurriness could be reduced by applying a sharpening kernel with the filter2D function of OpenCV, which enhances the sharpness of the picture. Also, the image denoising techniques of the same library are helpful for dealing with noisy images. Various transforms or histograms could be used for the same purpose. Additionally, object tracking could also be considered to detect faces. Though these are ways to deal with the discrepancies, the aim should be to capture good-quality videos (Fig. 6).
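To make the sharpening step more concrete, the sketch below applies a 3×3 sharpening kernel to a small grayscale image in plain Python. It mimics the kind of kernel filtering that OpenCV's filter2D performs, but it is not OpenCV code: the kernel values, the function name `sharpen`, and the simplified border handling (border pixels are copied unchanged) are assumptions made for illustration.

```python
# A commonly used 3x3 sharpening kernel (illustrative choice; weights sum to 1).
SHARPEN_KERNEL = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def sharpen(image, kernel=SHARPEN_KERNEL):
    """Filter a grayscale image (list of rows of 0-255 ints) with a 3x3 kernel.

    Border pixels are copied unchanged to keep the sketch short; a library
    routine would normally pad the image instead.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            total = 0
            for ky in range(3):
                for kx in range(3):
                    total += kernel[ky][kx] * image[y + ky - 1][x + kx - 1]
            # Clamp to the valid 8-bit range.
            result[y][x] = max(0, min(255, total))
    return result


flat = [[100] * 4 for _ in range(4)]                # uniform region
edge = [[50, 50, 200, 200] for _ in range(4)]       # vertical edge
sharpened_flat = sharpen(flat)
sharpened_edge = sharpen(edge)
```

A uniform region passes through unchanged, while the contrast across the edge is exaggerated, which is what makes a slightly blurred frame look sharper.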

Fig. 6

Video pre-processing techniques

Face mask detection techniques

Some of the several techniques used in face mask detection are discussed below (Fig. 7):

Fig. 7

Approach for object detection methods

Object detection

Deep learning techniques have recently gained momentum because of their ability to train on vast amounts of data with high accuracy [102]. These state-of-the-art methods prioritise accuracy in some cases and speed in others. In view of the advantages of deep learning techniques in real-time applications, this section discusses object detection using the deep learning approach [19, 29, 42, 46, 109, 114]. Within computer vision, object detection identifies and locates objects of certain classes in images and videos. This is illustrated in Fig. 8. The technique uses bounding boxes to localize the objects in the input image and can also count the objects in the given image. Various object detection algorithms are available [37, 41, 121]. They are categorized into [92]:

Two-Shot Detection
Single-Shot Detection

Fig. 8

Object detection segments

Two shot detectors

This model achieves the target in two steps: region proposal, followed by classification of those regions and refinement of the location prediction. Various models in this category are:

Faster Region-Based Convolutional Neural Network

It is an improved version of the earlier proposed R-CNN [91] and Fast R-CNN, with a better region-based CNN architecture [25]. Moreover, it is one of the most extensively employed advanced algorithms built on the R-CNN line of models. Compared to the earlier models, it replaces the selective search algorithm used to identify regions of interest (RoI) with a region proposal network. The detailed diagram is shown in Fig. 9. Additionally, when accuracy is the main concern, this algorithm is given preference. In [82], the author performs company logo detection using this technique. Also, in [22], the algorithm is used to identify the stages in malaria-infected blood. In [39], the author uses this state-of-the-art model to monitor people wearing face masks in public areas. Furthermore, several researchers [6, 14, 63, 87, 94, 103, 115] have leveraged this method.

Fig. 9

Faster R-CNN [33]

Region-Based Fully Convolutional Network

It is a two-shot architecture developed by taking inspiration from Faster R-CNN. Unlike in Faster R-CNN, all the complex computation is finished before RoI pooling, which is applied to score maps. All region proposals use the same score maps to perform average voting. Also, all the layers are convolutional and computed on the entire image. It can be regarded as a hybrid of one-shot and two-shot models. The architecture is shown in Fig. 10. The related works are discussed closely in [15, 54, 106].

Fig. 10

R-FCN [34]

Single-shot detectors

They are usually used when speed is a priority. This is because they predict the bounding boxes and the classes with a single deep neural network, without a dedicated step for proposing bounding boxes. Therefore, they find numerous applications in real-time detection.

You Only Look Once

Rather than processing an image in parts, the algorithm performs categorization in a single pass. The input image is passed through multiple layers of the network, which eventually produces a prediction as output [62]. YOLOv3 makes use of DarkNet-53, a 53-layer CNN trained on ImageNet, to extract features. It also uses residual blocks with skip connections [80]. Besides, anchor boxes serve as predefined reference shapes from which the bounding boxes locating the detected objects are predicted. The model also predicts class probabilities for each grid cell. In this model, the non-max suppression algorithm is used to eliminate predicted boxes that are not required: boxes that overlap a higher-scoring box too strongly, as measured by IoU (Intersection over Union), are discarded (Fig. 11).

Fig. 11

Working of Yolo [32]
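The IoU measure and the non-max suppression step mentioned for YOLO can be sketched in a few lines of plain Python. This is a generic greedy formulation rather than the code of any particular YOLO release; the box format `(x1, y1, x2, y2)`, the function names, and the 0.5 threshold are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Two near-duplicate detections of one face and one separate detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Here the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the third box does not overlap anything and is kept.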

Further, YOLO has gained attention because of its speed [58]. Moreover, its ability to learn generalized representations of objects and make predictions with high accuracy helps it outperform other models. In [85], the author enhanced the traditional YOLOv4 series to propose a novel detector. Likewise, in [11], this state-of-the-art technique was implemented to improve the performance of mask detectors. A similar approach is elucidated in various works [2, 8, 38, 43, 49, 52, 55, 57, 83, 88, 98].

Single shot multibox detector

It uses VGG-16 as its backbone architecture, discarding the fully connected layers [12]. The model can be divided into two components: extraction of feature maps, followed by application of convolution filters to detect objects. It works by matching objects with default boxes of different aspect ratios. Whenever a default box meets the set minimum IoU threshold, it is considered a match. Besides, each feature map location is scaled, and predictions are made from multiple feature maps to handle objects of multiple sizes, as shown in Fig. 12.

Fig. 12

Single Shot Multibox Architecture [35]

In [65], real-time face mask detection is discussed with changes in the architecture used. [68] provides a way to execute the algorithm. Also, [81] discusses the model's use in detecting objects for blind people. Further, a different approach to object detection is used in [23]. In [17], an improved way of detecting face masks using SSD is presented; the authors improved the algorithm by using inverse convolution and feature fusion. [53] uses a similar technique. It can be observed from Table 1 that single-shot detectors, including YOLO and SSD, have higher inference speed than Faster R-CNN, owing to faster localization and categorization. The algorithm to be used is chosen depending on the requirements of the problem. Generally, Faster R-CNN, because of its lower detection speed, is employed when the results need not be obtained in real time, whereas YOLO is the choice of practitioners when working with a live data feed. SSD maintains a balance between speed and detection effectiveness.

Table 1

Comparison table of state-of-the-art detection models [78]

Model	Accuracy	FPS
Faster R-CNN	70.4	17
R-FCN	77.6	6
YOLOv3	78.6	91
SSD	78.5	59

Feature extraction

Feature extraction is a way to discard unnecessary information from the data, thereby reducing the computational cost while retaining the important and relevant data. The reduced data also helps the model learn faster. Real-time face mask detection leverages machine learning and deep learning techniques for feature extraction. In deep learning, neural networks themselves extract features without human intervention. The input data is passed to the feature extraction network, which can use different backbone architectures, including MobileNetV2 and Xception [71]. Subsequently, the result is forwarded to the classifier network, which categorizes a person as with or without a mask. On the other hand, algorithms like histogram of oriented gradients (HOG) and Principal Component Analysis (PCA) could be used to obtain features in a machine learning model [29, 71]. Additionally, features could be extracted manually using the methods mentioned in Fig. 13.

Fig. 13

Features extraction techniques in face mask detection [12, 15, 42, 93]

Other techniques

Alternatively, the problem can be approached in two stages. The face mask detector can be constructed by first performing face detection on the frames coming from the video feed and then passing the frames with faces as input to a classifier, which produces the desired output, i.e., faces with or without masks (Fig. 14).
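The two-stage structure described above can be sketched as a small pipeline that takes any face detector and any mask classifier as plug-in functions. Everything named here (`detect_masks`, `crop`, the dummy detector and classifier, the `(x, y, width, height)` box format) is hypothetical and only stands in for trained models so that the sketch runs on its own.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Frame = List[List[int]]          # grayscale frame as rows of pixel values


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region described by box out of the frame."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def detect_masks(
    frame: Frame,
    face_detector: Callable[[Frame], List[Box]],
    mask_classifier: Callable[[Frame], str],
) -> List[Tuple[Box, str]]:
    """Stage 1: locate faces. Stage 2: classify each cropped face."""
    results = []
    for box in face_detector(frame):
        results.append((box, mask_classifier(crop(frame, box))))
    return results


# Hypothetical stand-ins for trained models, so the sketch is runnable.
def dummy_face_detector(frame: Frame) -> List[Box]:
    return [(0, 0, 2, 2), (2, 2, 2, 2)]


def dummy_mask_classifier(face: Frame) -> str:
    mean = sum(map(sum, face)) / (len(face) * len(face[0]))
    return "mask" if mean > 127 else "no mask"


frame = [
    [200, 200, 0, 0],
    [200, 200, 0, 0],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
results = detect_masks(frame, dummy_face_detector, dummy_mask_classifier)
```

Keeping the two stages behind separate callables means either one can be swapped, for example a different face detector, without touching the other.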

Fig. 14

The Face Mask Detection could be implemented by first performing face detection followed by face mask classification on an individual

Elaborating on the above-mentioned points, FACE DETECTION is a technique by which human faces are detected in an image. The location of each face is marked using a bounding box. Numerous aspects have to be considered to perform successful detection [51]. Owing to their advantages, neural networks are also used for detection [104]. The technique is in use in various applications. Some of the methods to perform it are listed below:

Dlib

Dlib performs face detection using deep learning through Convolutional Neural Networks. It performs better than the HOG-based method, even on faces at odd angles. An implementation of the library is illustrated in [86, 111].

Multi-task Cascaded Convolutional Neural Network

A CNN-based framework that works in three stages to detect and localize faces and key facial points [120]. Besides, [110] conducted facial recognition using MTCNN. In [28], a real-time application for detecting people with or without face masks using this method is illustrated. Likewise, a detailed study is presented in [50].

RetinaFace

It is a single-stage detector that performs pixel-wise face localization and simultaneously predicts the face box, face score, and facial key points. An elaborate discussion is presented in multiple studies [16, 26, 69].

Performance analysis

From the analysis in Fig. 15, it can be observed that all the algorithms perform efficiently on images. However, some studies report poor performance of Dlib on images containing many faces. While analyzing the performance of the different methods, the quality of the image should be considered. Also, a model’s accuracy varies with the angle of the face in an image, as studied in [64].

Fig. 15

Face detection speed analysis [64]

Although the effectiveness of an architecture can be influenced by the size and quality of the dataset, there are precisely defined parameters for assessing classification outcomes. Precision and recall are the evaluation metrics used to check the performance of the model. Precision is the proportion of positive identifications that are actually correct, while recall is the proportion of actual positives that are correctly identified. The closer the values of precision and recall are to 1, the more accurate the backbone network used. From Table 2, Dlib based on ResNet50 has precision and recall values closest to 1 (i.e., 100%) in comparison to the other algorithms, making it conducive to an effective model.
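The two metrics follow directly from the counts of true positives, false positives, and false negatives. The sketch below shows the calculation; the function name and the example counts are made up for illustration and are not taken from Table 2.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from detection counts."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # Guard against division by zero when nothing was predicted or present.
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    return precision, recall


# Hypothetical counts: 90 masks found correctly, 10 false alarms, 30 masks missed.
precision, recall = precision_recall(
    true_positives=90, false_positives=10, false_negatives=30
)
```

With these counts, precision is 90/100 = 0.90 and recall is 90/120 = 0.75, showing that a detector can raise few false alarms and still miss a sizeable share of the masks.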

Table 2

Detection accuracy comparison of algorithms

Model	Face Detection Precision (%)	Face Detection Recall (%)	Mask Detection Precision (%)	Mask Detection Recall (%)
Dlib based on ResNet50 [84]	99.20	99.0	98.92	98.24
MTCNN based on MobileNet [44]	94.50	86.38	84.39	80.92
RetinaFace based on MobileNet [44]	83.0	95.60	82.30	89.10
RetinaFace based on ResNet [84]	91.9	96.3	93.4	94.5

After face detection has been performed successfully, the next step is to classify the detected faces. CLASSIFICATION is a supervised learning task in machine learning [90] that assigns the input data to a class label. The methods that can be used to perform it are considered below.

Convolutional Neural Network

In deep learning, a CNN model is usually fed an image as input, which is then passed through multiple layers [3]. To begin with, the input is passed through successive convolutional layers with kernels, followed by a pooling layer. The pooling layer reduces the number of learnable parameters, and hence the computation, by reducing the size of the feature maps. The result is then passed through fully connected layers, at the end of which a softmax function predicts a probability for each class. The class with the maximum value is taken to be the class to which the object belongs. A CNN can use various backbone architectures to achieve the task. In [13], the VGG-16 architecture of CNN is discussed. Further, a real-time face mask detector that could be helpful in times like those of COVID-19 is demonstrated in [27]. Besides, [75, 93, 97, 99, 122] analyse the usage of the technique.

Support Vector Machines

It is a method that divides the input data into different classes by drawing boundaries using hyperplanes. When working on multi-class data, each class is considered to have its own binary classifier. [59] describes and demonstrates how SVM is used for image classification; it applies SVM to several datasets and compares the performance on each dataset and with multiple other classifiers. Similar aspects are discussed in [45, 123].

Decision Trees

It is among the most useful algorithms for dealing with classification problems. It is a flow-chart-like structure in which each internal node tests a feature, each branch represents a test result, and each leaf node represents the decision, i.e., the class label [24]. In [73], decision trees and their specific algorithms are reviewed in depth. Correspondingly, [100] discusses work in the same domain.

Ensemble

This type of learning aims to produce a better predictive model by combining several other models. The model can work, for example, by bagging (bootstrap aggregation). [4] reviews the available hybrid and ensemble methods in detail. Besides, an assessment of the process is described in [20].

The accuracy comparison chart shown in Fig. 16 analyses the results of several algorithms on the Simulated Masked Face Dataset (SMFD), as studied in [67, 75]. Although it can be observed that SVM achieved the highest accuracy, other components, like the selection of hyperparameters, play a crucial role in deciding the feasibility of an algorithm. The combination of architecture, dataset, pre-processing, and requirements of the problem statement determines the technique to be used.
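The last step of the CNN classifier described above, a softmax over the class scores followed by picking the most probable class, can be sketched as follows. The raw scores and label names are invented for illustration; in a real network they would come from the final fully connected layer.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the most probable label together with its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical outputs of the final layer for one detected face.
labels = ["with mask", "without mask"]
label, prob = classify([2.0, 0.5], labels)
```

For these scores the "with mask" class receives a probability of roughly 0.82, so it is the predicted class.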

Fig. 16

Comparison of various classification algorithms

Analysis of real-time techniques

The comparison of different contemporary real-time detection techniques is shown in Table 3.

Table 3

Detection accuracy comparison of algorithms

Real-time technique	Algorithm	Runtime	Theoretical Significance
Face Mask Detection, Su et al. [92]	YOLOv3	14.62 fps	A novel fusion transfer learning in the amalgamation of YOLOv3 has been proposed for face mask detection, with EfficientNet being the backbone architecture for feature extraction.
Real-time Face Mask Detection, Gadge et al. [62]	YOLOv4	49.5 fps	A face mask detector has been fabricated using deep learning model which works with real-time streams. The training is performed over several thousand epochs to achieve notable results in real-time.
Real-Time AI based Face Mask Detection, Teboulbi et al. [97]	Neural Networks	–	Multiple neural network architectures, after evaluation, have been used with Raspberry Pi and webcam to enforce effective real-time methods for face mask detection in public places.
Real-Time Face Mask Detection in Video, Ding et al. [19]	YOLOv5	52 fps	Deep Learning techniques are established to be producing improved detection results in real-time. Also, the availability of enriched dataset assists in improving the efficiency of the model.
Personal Protective Equipment Detection, Nath et al. [66]	YOLOv3	11 fps	A deep learning model built to verify whether construction workers wear personal protective equipment, including hard hats, vests, etc., in real-time. Also, different approaches have been compared to execute the task.
Front Vehicle Detection, Cao et al. [10]	SSD	66.6 fps	A vehicle detection model is proposed for intelligent cars having an improvised SSD architecture. Also, the system has shown to be robust in excessive traffic.
Illegal Parking Detection, Tang et al. [96]	SSD	40 fps	A deep learning-based model with an improved SSD has been proposed to detect illegal vehicle parking. The system has achieved a precision of 97.3%.
Vehicle Type Recognition, Kim et al. [47]	Faster R-CNN / YOLOv4 / SSD	36.32 fps / 82.1 fps / 105.14 fps	The study aims to analyze several techniques for vehicle type recognition in real-time. The analysis found the performance of YOLOv4 to be the best.
Social Distancing monitoring, Jindal et al. [61]	YOLOv3	21 fps	The proposed study uses YOLOv3 to implement a bird-eye view based social distancing monitoring system. Also, by showcasing red boxes, an alert is shown if social distancing is violated.

Face mask detection techniques analysis

In terms of the approach used to execute the task, Fig. 17 shows the growth of each method since 2011. It can be observed that deep learning has gained much attention recently. The data has been collected using Semantic Scholar.

Fig. 17

Deep Learning and Machine Learning usage trends over the years 2011–2021 (till April 2021)

Figure 18 shows the comparative percentage usage of the reviewed techniques in articles available on different e-sources from 2000 till April 2021. The articles in Fig. 18a were selected from Semantic Scholar using the keywords “technique name” + Face Mask Detection. Further, Fig. 18b depicts the articles chosen from Semantic Scholar using “technique name” + Face Detection, whereas for Fig. 18c, the keywords “technique name” + Classification were used.

Fig. 18

Records of the different (a) object detection, (b) face detection and (c) classification techniques analyzed over the years 2000–2021 (April) using Semantic Scholar

Popular techniques with advantages and challenges

No single algorithm can suffice for all needs. The choice of algorithm relies on many factors. The specific parameters that govern the decision include the size of the training data, speed, accuracy, training time, number of features, etc. None of the models can be declared the best among its counterparts, but a comparison can be put together to help in the selection process [101, 105] (Tables 4, 5 and 6).

Table 4

List of advantages and disadvantages of some object detection algorithms (deep learning approach)

Name	Faster R-CNN [91]	R-FCN [15]	YOLO [62]	SSD [56]
Advantages	Performs well on small object detection because of the presence of nine anchors in a single grid	The speed is faster in comparison to that of other region-based CNN	Objects are detected efficiently and with great speed, hence finding application in real-time	Faster in object localization than Faster R-CNN because of one step architecture
Challenges	The training time is more and, despite efficiency, fails to perform real-time detection because of two-step architecture	The mAP of R-FCN is appreciable but lesser than R-CNN	Difficulty in detecting small objects and close objects	More training data is required and relatively less accuracy

Table 5

List of advantages and disadvantages of various face detection methods

Name	Dlib [9]	MTCNN [118]	RetinaFace [112]
Advantages	Along with an easy training process, it can work well with different face orientations and that too at a fast speed	Efficient detection even when faces are not aligned initially because of jointly performing face detection and alignment	Can generate an accurate rectangular face bounding box together with five points facial landmark
Challenges	Fails in detecting small faces as it is trained for a minimum 80 × 80 face size	It is slow as compared to other models and also does not perform well with small faces	It tends to fail when input images contain a large face

Table 6

List of advantages and disadvantages of various classifiers

Name	CNN [70]	SVM [89]	Decision Trees [24]	Ensemble [4, 100]
Advantages	High accuracy in image recognition tasks; automatically detects crucial features without any intervention	Requires little input data and is highly efficient in high-dimensional spaces. Also, it is memory efficient	In comparison to other models, has fewer requirements for data pre-processing and performs well for categorical data	Capable of making better predictions and achieving better performance than any single model
Challenges	Requires a large amount of input data and has difficulty classifying images that differ from the training data	Poor performance with overlapping target classes and in cases where the number of features exceeds the number of training samples	Often higher training time, and a small change in data leads to a large change in structure, which results in instability	Usually computationally expensive, which results in learning time and memory constraints

Dataset

A dataset is a collection of instances used to train a model. It can be created either by scraping the internet or by accessing various online websites [107]. A few of the datasets currently available online are listed in this article (Table 7).

Table 7

List of different datasets available on online platforms for the study

Name	Description	Source (URL)
YOLO Medical Mask Dataset	Contains ~631 images of people with medical masks	https://www.kaggle.com/gooogr/yolo-medical-mask-dataset
Dataset by Prajna Bhandari at PyImageSearch	Contains ~1376 images: 690 with a mask and 686 without a mask	https://github.com/prajnasb/observations
Real-World Masked Face Dataset (RMFD)	Contains ~5000 masked face images of 525 people and ~90,000 normal face images	https://github.com/X-zhangyang/Real-World-Masked-Face-Dataset
LFW Simulated Masked Face	The training dataset contains ~13,027 masked face images of 5713 people, and the testing dataset contains ~70 masked face images of 48 people	https://www.kaggle.com/muhammeddalkran/lfw-simulated-masked-face-dataset
Face Mask Detection Dataset	Contains ~3725 images of faces with masks and ~3828 images of faces without masks	https://www.kaggle.com/omkargurav/face-mask-dataset
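
Datasets such as those above are typically divided into training and testing portions before a classifier is trained. The following is a minimal sketch of a stratified split, under stated assumptions: the file names are hypothetical placeholders, the 80/20 ratio and the function name `stratified_split` are illustrative choices, and only the 690/686 class counts are taken from Table 7.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (path, label) pairs so each label keeps a similar share in both parts."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label, paths in sorted(by_label.items()):
        paths = sorted(paths)
        rng.shuffle(paths)
        n_test = round(len(paths) * test_fraction)
        test += [(p, label) for p in paths[:n_test]]
        train += [(p, label) for p in paths[n_test:]]
    return train, test


# Hypothetical file names; the class counts mirror the 690 / 686 split in Table 7.
samples = [(f"with_mask/{i}.jpg", "with_mask") for i in range(690)]
samples += [(f"without_mask/{i}.jpg", "without_mask") for i in range(686)]

train, test = stratified_split(samples)
print(len(train), len(test))
```

Splitting per class rather than over the whole list keeps the mask and no-mask proportions roughly equal in both portions, which matters more for the imbalanced datasets in the table than for this nearly balanced one.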

Supporting software

These days, there is a plethora of programming languages, tools, libraries, and frameworks to choose from when working on a project, and there are no stringent rules for choosing among them. Nonetheless, this article lists specific tools that could be useful in a study. The basic requirements for completing a face mask detector project are illustrated below (Fig. 19).

Fig. 19

Requirements for face mask detection system [36]

Dataset

A rich and relevant dataset can be obtained using the methods described below:

Data collection

Data collection involves gathering content pertinent to the problem at hand, and is usually performed in accordance with the task to be executed. Various methods are available for preparing one's own dataset. Some of the tools that could be used for the purpose are shown below (Fig. 20).

Fig. 20

List of tools used in data collection [36]

Annotating image

One of the essential steps when dealing with an image dataset is annotation, i.e., labeling the images so that they can later be used by the machine learning model. Various approaches are now available for this task. Some of them are shown below (Fig. 21):

Fig. 21

List of tools used in Image Annotation [36]

*The dataset could be enriched by deploying techniques like data augmentation
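
Annotation tools commonly export bounding boxes in different conventions, so a conversion step is often needed before training. The sketch below is illustrative only: it assumes boxes annotated as pixel corners (xmin, ymin, xmax, ymax) and converts them to the normalised (centre x, centre y, width, height) layout used by YOLO-style label files, then shows how a horizontal-flip augmentation changes such a label. The function names and sample values are hypothetical.

```python
def voc_to_yolo(box, img_w, img_h):
    """Convert pixel corners (xmin, ymin, xmax, ymax) to normalised (cx, cy, w, h)."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h


def hflip_yolo(box):
    """Label for the same object after the image is mirrored left to right."""
    cx, cy, w, h = box
    return 1.0 - cx, cy, w, h


# Hypothetical face box in a 400 x 500 pixel image.
box = voc_to_yolo((100, 50, 200, 250), img_w=400, img_h=500)
flipped = hflip_yolo(box)
print(box, flipped)
```

When an image is augmented geometrically, its labels must be transformed in the same way; for a horizontal flip only the centre x coordinate changes.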

Model

Below are the libraries and frameworks typically used with the implementation techniques mentioned above. They can be installed as required by the task and the model used, and the desired module can then be imported from the relevant library (Fig. 22).

Fig. 22

List of useful Python libraries [18]
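
Before starting a project, it can help to confirm which libraries are already installed. The sketch below uses only the Python standard library; the package names in `REQUIRED` are an assumed example list rather than the contents of Fig. 22, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if find_spec(name) is None]


# Assumed example list; these are import names, which may differ from the
# name used at installation time (for example, cv2 is installed as opencv-python).
REQUIRED = ["cv2", "numpy", "tensorflow"]

print("Missing:", missing_packages(REQUIRED))
```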

Open-source libraries and frameworks play a significant role in model creation. Figure 23 ranks these libraries by GitHub star count, as reported by the official documentation of the respective library on PyPI up to April 2021. The assessment could help the uninitiated begin working with such user-friendly libraries.

Fig. 23

Popularity of useful Python libraries based on statistics of GitHub stars (till April 2021)

Python

Some other useful open-source libraries that can be combined with the essential packages are listed in this section (Fig. 24).

Fig. 24

List of other supporting libraries [18]

Applications, limitations, and observations

Certain areas where face mask detection can be effectively employed are discussed below.

Transit hubs

At places like airports and railway stations, face mask detectors integrated with security cameras can be used to check whether travellers are wearing face masks. Passengers' faces could be detected throughout the premises, and the authorities could be informed immediately if any violation is detected.

Workplaces

A mechanism to observe whether an employee is wearing a face mask could be incorporated in an office. A warning message could be sent to people who are not following the safety precautions. Also, a daily record of people not complying with the regulations could be maintained.

Healthcare centres

In healthcare organizations and hospitals, a face mask detection system could track whether health workers wear face masks during their shifts. It could also help alert visitors entering the site without face masks, and officials could be informed immediately in case of defiance.

Surveillance systems

Face mask detection systems combined with surveillance cameras can help strictly track whether people in public areas are wearing face masks.

Limitations

Although the system performs efficiently in real time, it faces the following challenges. Different network architectures perform better in mask detection tasks, yet the model is still limited by its performance on large datasets [65]. Irregularities in images, such as insufficient light and side angles, need proper attention [116]. Another major challenge is achieving high accuracy in the least possible time [97]. Additionally, video analysis has its own difficulties, including motion blur and transitions between frames [64].
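
One simple way to flag images with insufficient light is to compare their average intensity against a threshold. The sketch below is a rough illustration under stated assumptions: the image is represented as nested lists of 0 to 255 grayscale values, and the threshold of 50 is an arbitrary placeholder that would need tuning on real data. It is not a method taken from the cited studies.

```python
def mean_brightness(gray):
    """Average intensity of a grayscale image given as rows of 0-255 values."""
    total = sum(sum(row) for row in gray)
    count = sum(len(row) for row in gray)
    return total / count


def is_underexposed(gray, threshold=50):
    """Flag an image whose average intensity falls below the (assumed) threshold."""
    return mean_brightness(gray) < threshold


# Tiny hypothetical 2 x 2 images standing in for real frames.
dark = [[10, 20], [30, 40]]
bright = [[200, 210], [220, 230]]

print(is_underexposed(dark), is_underexposed(bright))
```

Flagged images could be reviewed, brightened, or excluded, depending on whether the deployed system is expected to handle low-light scenes.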

Observations

Two-stage detectors excel in accuracy, but one-stage detectors outperform them under real-time requirements. For real-time video feeds, algorithms such as YOLO and SSD are therefore appropriate. Because training a deep neural network is computationally expensive, transfer learning, i.e., reusing pre-trained models such as MobileNet or VGG-16, is recommended.

Owing to the exceptional results they produce, deep learning models have become the choice of many practitioners. They already achieve high accuracy, but pairing different backbone architectures with different hyperparameters could improve it further. Poor-quality images in the dataset, such as those with insufficient light or a side angle, have degraded model performance, so dataset quality could be improved for future use. Although many studies are now dedicated to COVID-19, there remains scope for considerably more analysis in the healthcare domain.

After reviewing many studies, it can be inferred that, despite the variety of techniques available to implement the model, one-stage object detectors are the preferred choice for real-time requirements: the accuracy they achieve in real time makes the application feasible. Drawbacks arising from computational cost could be addressed by altering the architecture, hyperparameters, input size, etc.
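
Whether a detector meets a real-time requirement can be checked empirically by timing it over a batch of frames. The sketch below is a minimal harness, not a benchmark from the reviewed studies: `measure_fps` and `dummy_detector` are hypothetical names, and the dummy function merely stands in for a real one-stage or two-stage model.

```python
import time


def measure_fps(detector, frames, warmup=1):
    """Return the average number of frames processed per second by `detector`."""
    for frame in frames[:warmup]:
        detector(frame)  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for frame in frames:
        detector(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def dummy_detector(frame):
    """Placeholder for a real model: returns one fixed (label, score, box) result."""
    return [("mask", 0.9, (0, 0, 10, 10))]


frames = [None] * 100  # stand-ins for decoded video frames
fps = measure_fps(dummy_detector, frames)
print(f"{fps:.0f} frames per second")
```

Running the same harness on two candidate models with identical frames and input size gives a like-for-like view of the speed side of the accuracy versus speed trade-off.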

Conclusion and future directions

To deal with the pandemic more effectively, developing central systems that automatically detect whether a person is wearing a face mask has become an engaging topic for people working in this sphere. Numerous studies have been initiated in this domain lately. This paper aims to provide a detailed review of the various ways in which such a system could be implemented. After inspecting all the implementation techniques, it can be safely stated that deep learning has become popular among researchers in recent times, and the efficiency of the approach makes it suitable for such tasks. Additionally, although many datasets are available, the RMFD dataset is widely used. If used constructively, deployment of the model could be beneficial in public areas.

For future work, the proposed system could be upgraded by integrating it with automated thermal detection systems. A check on whether social distancing, i.e., sufficient physical distance between people, is being maintained in crowded areas could be an add-on. Facial landmark detection could be added for biometric purposes, and the system could be combined with a design that detects the type of mask a person is wearing. Moreover, owing to the versatility of state-of-the-art techniques, their architectures could be enhanced to achieve better results at a faster speed. As shown in Fig. 25, there has been an upsurge in the usage of deep learning methods; taking advantage of their enormous utility, various future studies could be carried out in this domain. The quality of datasets could be improved by removing images with insufficient light. Besides, new feature extraction techniques could be explored using machine learning algorithms.
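
The physical-distance check suggested as a future add-on can be sketched from detector output alone. The example below assumes each detected person is given as a pixel bounding box (xmin, ymin, xmax, ymax) and flags pairs whose box centres are closer than a chosen threshold. The function names, boxes, and the 50-pixel threshold are all hypothetical, and pixel distance is only a crude proxy: a deployed system would need camera calibration to relate pixels to real-world distance.

```python
from itertools import combinations
from math import dist


def centroid(box):
    """Centre point of a box given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)


def too_close_pairs(boxes, min_distance):
    """Return index pairs of boxes whose centres are closer than min_distance pixels."""
    centres = [centroid(b) for b in boxes]
    return [
        (i, j)
        for (i, ci), (j, cj) in combinations(enumerate(centres), 2)
        if dist(ci, cj) < min_distance
    ]


# Three hypothetical detections: the first two are adjacent, the third is far away.
boxes = [(0, 0, 10, 10), (12, 0, 22, 10), (200, 200, 210, 210)]
pairs = too_close_pairs(boxes, min_distance=50)
print(pairs)
```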

Fig. 25

Upsurge of deep learning from March 2013 to August 2021 (Created by Google Trends)
