Shabana Habib, Majed Alsanea, Mohammed Aloraini, Hazim Saleh Al-Rawashdeh, Muhammad Islam, Sheroz Khan.
Abstract
Since December 2019, the COVID-19 pandemic has led to a dramatic loss of human life and caused severe economic crises worldwide. COVID-19 transmission generally occurs through small respiratory droplets ejected from the mouth or nose of an infected person. To reduce and prevent the spread of COVID-19, the World Health Organization (WHO) advises the public to wear face masks as one of the most practical and effective prevention methods. Early face mask detection is therefore important for limiting the spread of COVID-19. For this purpose, we investigate several deep learning architectures, namely VGG16, VGG19, InceptionV3, ResNet-101, ResNet-50, EfficientNet, MobileNetV1, and MobileNetV2. Based on these experiments, we propose an efficient and effective model for face mask detection with the potential to be deployed on edge devices. Our proposed model is based on the MobileNetV2 architecture, which extracts salient features from the input data; these features are then passed to an autoencoder to form more abstract representations prior to the classification layer. The proposed model also adopts extensive data augmentation techniques (e.g., rotation, flip, Gaussian blur, sharpening, emboss, skew, and shear) to increase the number of samples for effective training. The performance of our proposed model is evaluated on three publicly available datasets, on which it achieves the highest performance compared to other state-of-the-art models.
Keywords: COVID-19; MobileNet; autoencoder; classification; convolution neural network; data augmentation; deep learning; face mask; machine learning
Year: 2022 PMID: 35408217 PMCID: PMC9003465 DOI: 10.3390/s22072602
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The visual representation of the proposed model.
The data augmentation techniques and their parameter ranges.
| S. No | Technique | Parameter |
|---|---|---|
| 1 | Rotation (degree angle) | −15 to +15 |
| 2 | Flip | Right, left |
| 3 | Gaussian Blur (value of sigma) | 0.25, 0.50, 0.75, 1.0 |
| 4 | Sharpening (value of lightness) | 0.50, 1.00, 1.50, 2.00 |
| 5 | Emboss (value of strength) | 0.50, 1.00, 1.50, 2.0 |
| 6 | Skew (Tilt) | Right, left |
| 7 | Shear | x-axis and y-axis, 10 degrees |
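Three of the augmentations above (flip, Gaussian blur, and sharpening) can be sketched in plain NumPy. This is an illustrative reimplementation under assumed conventions, not the authors' pipeline; in particular, implementing sharpening as unsharp masking is an assumption.

```python
import numpy as np

def horizontal_flip(img):
    # Mirror the image left-right (the "Flip: right, left" row).
    return img[:, ::-1]

def gaussian_blur(img, sigma=0.5):
    # Separable Gaussian blur; a radius of 3*sigma covers ~99.7% of the mass.
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()  # normalize so the blur preserves intensity
    # Convolve rows, then columns (separability of the 2-D Gaussian).
    blurred = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    blurred = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, blurred)
    return blurred

def sharpen(img, lightness=1.0):
    # Unsharp masking: boost the difference between the image and its blur;
    # `lightness` plays the role of the table's sharpening parameter.
    return img + lightness * (img - gaussian_blur(img, sigma=1.0))
```

The sigma and lightness values from the table (e.g., sigma in {0.25, 0.50, 0.75, 1.0}) would be sampled per image to multiply the training set.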
The internal architecture of MobileNetV2.
| Layer | Repetition | Stride |
|---|---|---|
| Convolution 3 × 3 | 1 | 2 |
| Bottleneck | 1 | 1 |
| Bottleneck | 2 | 2 |
| Bottleneck | 3 | 2 |
| Bottleneck | 4 | 2 |
| Bottleneck | 3 | 1 |
| Bottleneck | 3 | 2 |
| Bottleneck | 1 | 1 |
| Convolution 1 × 1 | 1 | 1 |
| Pooling 7 × 7 | 1 | - |
| Convolution 1 × 1 | 1 | - |
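As a sanity check on the strides in the table: the five stride-2 stages downsample the input by a factor of 32, which is exactly what makes the final 7 × 7 pooling window fit. A short sketch, assuming the standard 224 × 224 MobileNetV2 input resolution (not stated in the table itself):

```python
# Strides per stage, read down the table: the initial 3x3 convolution,
# the seven bottleneck stages, then the final 1x1 convolution.
strides = [2, 1, 2, 2, 2, 1, 2, 1, 1]

size = 224  # assumed input resolution (the MobileNetV2 default)
for s in strides:
    size = -(-size // s)  # ceiling division models 'same' padding

print(size)  # spatial size entering the 7 x 7 pooling layer
```

With 224 / 32 = 7, the feature map entering the pooling layer is 7 × 7, matching the table and the 7 × 7 × 1280 output shape reported for the MobileNetV2 backbone below.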
The internal architecture of the proposed model.
| Type of Layer | Output Shape | Params. |
|---|---|---|
| MobileNetV2 | 7 × 7 × 1280 | 2,257,984 |
| Global average pooling | 1280 | - |
| Encoder1 | 640 | 819,840 |
| Encoder2 | 320 | 205,120 |
| Dense | 64 | 20,544 |
| Dense | 32 | 2080 |
| Dense | 2 | 66 |
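The parameter counts in the table follow the usual fully connected formula (in_units × out_units weights plus out_units biases); a quick check, with layer sizes read from the Output Shape column:

```python
def dense_params(n_in, n_out):
    # A weight matrix of n_in x n_out plus one bias per output unit.
    return n_in * n_out + n_out

# (input units, output units) for each encoder/dense layer in the table:
# 1280 -> 640 -> 320 -> 64 -> 32 -> 2
layers = [(1280, 640), (640, 320), (320, 64), (64, 32), (32, 2)]
counts = [dense_params(i, o) for i, o in layers]
print(counts)  # [819840, 205120, 20544, 2080, 66]
```

Each computed value matches the corresponding Params. entry, confirming that the encoder and dense layers are plain fully connected layers on top of the 1280-dimensional pooled MobileNetV2 features.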
Figure 2. The training loss and accuracy of the proposed model: (a) loss and accuracy over the FMD dataset; (b) loss and accuracy over the FM dataset.
The number of samples in the original and augmented datasets.
| Dataset | Mask (Original) | Normal (Original) | Mask (Augmented) | Normal (Augmented) |
|---|---|---|---|---|
| FMD | 3725 | 3828 | 7450 | 7656 |
| FM | 690 | 686 | 6900 | 6860 |
Figure 3. The detailed performance of each model in terms of TP, TN, FP, and FN: (a) FMD original dataset; (b) FMD augmented dataset; (c) FM original dataset; (d) FM augmented dataset.
Figure 4. The detailed performance of each model in terms of TP, TN, FP, and FN over RMFR: (a) original dataset; (b) balanced dataset.
The detailed comparative analysis of different models for face mask detection.
| Dataset | Data Type | Model | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|---|---|
| FMD | Original data | VGG16 | 0.8295 | 0.8431 | 0.8363 | 0.8397 |
| FMD | Original data | VGG19 | 0.8389 | 0.8527 | 0.8457 | 0.8490 |
| FMD | Original data | InceptionV3 | 0.7893 | 0.8011 | 0.7951 | 0.7993 |
| FMD | Original data | ResNet-101 | 0.8698 | 0.8840 | 0.8769 | 0.8795 |
| FMD | Original data | ResNet-50 | 0.8792 | 0.8585 | 0.8687 | 0.8689 |
| FMD | Original data | EfficientNet | 0.8094 | 0.9178 | 0.8602 | 0.8702 |
| FMD | Original data | MobileNetV1 | 0.8698 | 0.8493 | 0.8594 | 0.8596 |
| FMD | Original data | MobileNetV2 | 0.8792 | 0.8948 | 0.8869 | 0.8894 |
| FMD | Augmented data | VGG16 | 0.9099 | 0.8985 | 0.9042 | 0.9048 |
| FMD | Augmented data | VGG19 | 0.9199 | 0.9371 | 0.9284 | 0.9300 |
| FMD | Augmented data | InceptionV3 | 0.8599 | 0.8838 | 0.8717 | 0.8751 |
| FMD | Augmented data | ResNet-101 | 0.9499 | 0.9584 | 0.9542 | 0.9550 |
| FMD | Augmented data | ResNet-50 | 0.9399 | 0.9580 | 0.9488 | 0.9500 |
| FMD | Augmented data | EfficientNet | 0.9799 | 0.9596 | 0.9696 | 0.9697 |
| FMD | Augmented data | MobileNetV1 | 0.9299 | 0.9476 | 0.9387 | 0.9401 |
| FMD | Augmented data | MobileNetV2 | 0.9699 | 0.9895 | 0.9796 | 0.9801 |
| FM | Original data | VGG16 | 0.8087 | 0.8267 | 0.8176 | 0.8190 |
| FM | Original data | VGG19 | 0.8493 | 0.8669 | 0.8580 | 0.8590 |
| FM | Original data | InceptionV3 | 0.8290 | 0.8137 | 0.8212 | 0.8190 |
| FM | Original data | ResNet-101 | 0.7797 | 0.8042 | 0.7918 | 0.7943 |
| FM | Original data | ResNet-50 | 0.7797 | 0.8127 | 0.7959 | 0.7994 |
| FM | Original data | EfficientNet | 0.8493 | 0.8254 | 0.8371 | 0.8343 |
| FM | Original data | MobileNetV1 | 0.8290 | 0.8827 | 0.8550 | 0.8590 |
| FM | Original data | MobileNetV2 | 0.8493 | 0.8852 | 0.8669 | 0.8692 |
| FM | Augmented data | VGG16 | 0.8799 | 0.8983 | 0.8890 | 0.8898 |
| FM | Augmented data | VGG19 | 0.9199 | 0.9296 | 0.9247 | 0.9249 |
| FM | Augmented data | InceptionV3 | 0.8899 | 0.9086 | 0.8991 | 0.8999 |
| FM | Augmented data | ResNet-101 | 0.8699 | 0.8535 | 0.8616 | 0.8599 |
| FM | Augmented data | ResNet-50 | 0.8699 | 0.8973 | 0.8834 | 0.8848 |
| FM | Augmented data | EfficientNet | 0.9299 | 0.9033 | 0.9164 | 0.9149 |
| FM | Augmented data | MobileNetV1 | 0.9299 | 0.9589 | 0.9442 | 0.9448 |
| FM | Augmented data | MobileNetV2 | 0.9899 | 0.9707 | 0.9802 | 0.9799 |
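The precision, recall, F1-score, and accuracy values in the tables derive from the TP, TN, FP, and FN counts shown in Figures 3 and 4 in the standard way; a minimal sketch:

```python
def metrics(tp, tn, fp, fn):
    # Standard binary-classification metrics from confusion counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy
```

The tabulated F1 values are consistent with this: for instance, the MobileNetV2 row on the augmented FMD data (precision 0.9699, recall 0.9895) gives F1 = 2 · 0.9699 · 0.9895 / (0.9699 + 0.9895) ≈ 0.9796, as reported.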
The detailed comparative analysis of different models over the RMFR balanced and unbalanced dataset.
| Data Type | Model | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|---|
| Original data | VGG16 | 0.8798 | 0.2894 | 0.4355 | 0.8884 |
| Original data | VGG19 | 0.8998 | 0.3333 | 0.4864 | 0.9071 |
| Original data | InceptionV3 | 0.9198 | 0.3897 | 0.5475 | 0.9221 |
| Original data | ResNet-101 | 0.8898 | 0.3100 | 0.4598 | 0.8965 |
| Original data | ResNet-50 | 0.9298 | 0.4246 | 0.5829 | 0.9394 |
| Original data | EfficientNet | 0.9098 | 0.3596 | 0.5155 | 0.9173 |
| Original data | MobileNetV1 | 0.9198 | 0.3897 | 0.5475 | 0.9245 |
| Original data | MobileNetV2 | 0.9298 | 0.4246 | 0.5829 | 0.9328 |
| Balanced data | VGG16 | 0.9298 | 0.4246 | 0.5829 | 0.9334 |
| Balanced data | VGG19 | 0.9498 | 0.5134 | 0.6665 | 0.9529 |
| Balanced data | InceptionV3 | 0.9398 | 0.4652 | 0.6224 | 0.9422 |
| Balanced data | ResNet-101 | 0.9898 | 0.8460 | 0.9123 | 0.9934 |
| Balanced data | ResNet-50 | 0.9798 | 0.7312 | 0.8374 | 0.9881 |
| Balanced data | EfficientNet | 0.9898 | 0.8460 | 0.9123 | 0.9935 |
| Balanced data | MobileNetV1 | 0.9798 | 0.7312 | 0.8374 | 0.9874 |
| Balanced data | MobileNetV2 | 0.9993 | 0.9973 | 0.9983 | 0.9998 |
A comparative analysis of the proposed model with other state-of-the-art models.
| Model | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| Militante et al. [ ] | 0.9750 | 0.9450 | 0.9550 | 0.9600 |
| Chen et al. [ ] | - | - | - | 0.9480 |
| Hariri et al. [ ] | - | - | - | 0.9130 |
| Oumina et al. [ ] | 0.9484 | 0.9508 | - | 0.9711 |
| Loey et al. [ ] | 0.9963 | 0.9963 | 0.9945 | 0.9964 |
The hardware specification of each setting.
| Setting | Memory | Model |
|---|---|---|
| Raspberry Pi | 4 GB | Raspberry Pi 4 B+ |
| CPU | 32 GB | AMD Ryzen 5 5600X 6-Core Processor |
| GPU | 8 GB | RTX 2070 |