Fevziye Irem Eyiokur¹, Hazım Kemal Ekenel², Alexander Waibel¹
Abstract
Health organizations advise social distancing, wearing a face mask, and avoiding touching the face to prevent the spread of coronavirus. Based on these protective measures, we developed a computer vision system to help prevent the transmission of COVID-19. Specifically, the system performs face mask detection and face-hand interaction detection, and it measures social distance. To train and evaluate the system, we collected and annotated images that represent face mask usage and face-hand interaction in the real world. Besides assessing the system's performance on our own datasets, we also tested it on existing datasets from the literature without any adaptation to them. In addition, we proposed a module to track social distance between people. Experimental results indicate that our datasets represent the diversity of the real world well. The proposed system achieved very high performance and generalization capacity on unseen data in a real-world scenario for face mask usage detection, face-hand interaction detection, and social distance measurement (97.95%, 93.84%, and 96.51% accuracy, respectively, on the test videos). The datasets are available at https://github.com/iremeyiokur/COVID-19-Preventions-Control-System.
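The abstract does not say how the social distance module decides that two people are too close. One common approach, sketched here purely as an assumption and not as the authors' method, is to project each detected person onto a ground plane in metres and flag every pair whose Euclidean distance falls below a threshold. The threshold `MIN_DISTANCE_M = 1.5` and the helper name `close_pairs` are hypothetical, and the image-to-ground-plane projection is assumed to happen upstream.

```python
import math
from itertools import combinations

# Hypothetical threshold in metres; the source does not state the value used.
MIN_DISTANCE_M = 1.5


def close_pairs(positions, min_distance=MIN_DISTANCE_M):
    """Return index pairs (i, j), i < j, of people closer than min_distance.

    positions: list of (x, y) ground-plane coordinates in metres. Mapping
    image detections to this plane (e.g. via a calibrated homography) is
    assumed to be done before calling this function.
    """
    violations = []
    # Check every unordered pair of people exactly once.
    for (i, p), (j, q) in combinations(enumerate(positions), 2):
        if math.dist(p, q) < min_distance:
            violations.append((i, j))
    return violations


if __name__ == "__main__":
    people = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    print(close_pairs(people))  # [(0, 1)]
```

Checking all pairs is quadratic in the number of people, which is cheap for the handful of subjects per frame seen in the test videos.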
Keywords: CNN; COVID-19; Face mask detection; Face-hand interaction detection; Social distance measurement
Year: 2022 PMID: 35910402 PMCID: PMC9307220 DOI: 10.1007/s11760-022-02308-x
Source DB: PubMed Journal: Signal Image Video Process ISSN: 1863-1703 Impact factor: 1.583
Comparison of the face mask datasets
| Dataset name | No mask | Mask | Improper Mask | Face Mask Type | Ethnicities | Head Pose |
|---|---|---|---|---|---|---|
| ISL-UFMD | 10698 | 10618 | 500 | Real | Various | Various |
| RMFD [ | 90468 | 2203 | – | Real | Asian | Frontal to Profile |
| RWMFD [ | 858 | 4075 | 238 | Real | Mostly Asian | Frontal to Profile |
| Face mask [ | 718 | 3239 | 123 | Real | Mostly Asian | Various |
| MaskedFace-Net [ | – | 67049 | 66734 | Artificial | Various | Mostly Frontal |
(*) Although the RMFD dataset [10] is stated to contain 5000 face images with masks, the publicly available version contains only 2203 face images with masks.
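Summing the per-class counts in the comparison table gives each dataset's size. This is a small arithmetic check, not something reported in the paper: the totals match the test-set image counts in the cross-dataset table for ISL-UFMD (21816), RMFD (92671), RWMFD (5171), and Face mask (4080), and RMFD + RWMFD gives the 97842 training images listed there. MaskedFace-Net sums to 133783, one more than the 133782 reported in that table. The names `COUNTS` and `total_images` are illustrative.

```python
# Per-class image counts from the comparison table, in the order
# (no mask, mask, improper mask); None marks a class the dataset lacks.
COUNTS = {
    "ISL-UFMD": (10698, 10618, 500),
    "RMFD": (90468, 2203, None),
    "RWMFD": (858, 4075, 238),
    "Face mask": (718, 3239, 123),
    "MaskedFace-Net": (None, 67049, 66734),
}


def total_images(per_class):
    """Sum the per-class counts, skipping classes that are absent."""
    return sum(n for n in per_class if n is not None)


totals = {name: total_images(per_class) for name, per_class in COUNTS.items()}

for name, n in totals.items():
    print(f"{name}: {n}")

# Combined training set used in the cross-dataset experiments.
print("RMFD + RWMFD:", totals["RMFD"] + totals["RWMFD"])
```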
Fig. 1 Example images from ISL-UFMD belonging to three different classes: no mask, face mask, and improper face mask
Fig. 2 Example images from ISL-UFHD that represent face-hand interaction and no interaction
Fig. 3 Proposed system for checking compliance with COVID-19 preventive measures
Face mask detection results on the proposed ISL-UFMD dataset for three classes
| Model | Accuracy | Precision (No Mask) | Precision (Mask) | Precision (Improper Mask) | Recall (No Mask) | Recall (Mask) | Recall (Improper Mask) |
|---|---|---|---|---|---|---|---|
| Inception-v3 | | 0.985 | 0.986 | 0.833 | 0.988 | 0.984 | 0.800 |
| ResNet50 | 95.63% | 0.965 | 0.954 | 0.636 | 0.973 | 0.973 | 0.389 |
| MobileNetV2 | 97.91% | 0.988 | 0.975 | 0.842 | 0.983 | 0.640 | |
| EfficientNet-b0 | 97.82% | 0.973 | 0.984 | 0.986 | 0.520 | ||
| EfficientNet-b1 | 97.91% | 0.979 | 0.986 | 0.800 | 0.990 | 0.984 | 0.711 |
| EfficientNet-b2 | 97.91% | 0.977 | 0.792 | 0.977 | 0.760 | ||
| EfficientNet-b3 | 98.19% | 0.988 | 0.733 | 0.986 | 0.982 | ||
Bold values indicate the best scores
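The table reports per-class precision and recall but not F1. As a derived illustration (not a number from the paper), the per-class F1 score, the harmonic mean F1 = 2PR / (P + R), can be computed from the ResNet50 row, which lists all six per-class values. It makes the weakness on the improper-mask class, the smallest class in ISL-UFMD, explicit. The helper name `f1` is illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ResNet50 row of the ISL-UFMD table: (precision, recall) per class.
resnet50 = {
    "No Mask": (0.965, 0.973),
    "Mask": (0.954, 0.973),
    "Improper Mask": (0.636, 0.389),
}

for cls, (p, r) in resnet50.items():
    print(f"{cls}: F1 = {f1(p, r):.3f}")
# No Mask: F1 = 0.969
# Mask: F1 = 0.963
# Improper Mask: F1 = 0.483
```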
Fig. 4 Class activation maps (CAMs): (a) face mask detection task, (b) face-hand interaction detection task, (c) misclassified samples of the face mask detection task, (d) misclassified samples of the face-hand interaction detection task
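The caption does not say which CAM variant produced these maps. The sketch below assumes the original CAM formulation of Zhou et al. (2016), which applies to networks that end in global average pooling followed by a linear classifier, as Inception-v3, ResNet50, MobileNetV2, and EfficientNet do. For class c the map is M_c(x, y) = Σ_k w_k^c · f_k(x, y), where f_k are the last convolutional feature maps and w_k^c are the classifier weights for class c. The function names are hypothetical, and a real pipeline would upsample the map to the input resolution before overlaying it on the face image.

```python
from typing import List

Matrix = List[List[float]]


def class_activation_map(feature_maps: List[Matrix], class_weights: List[float]) -> Matrix:
    """CAM for one class: M_c(x, y) = sum_k w_k^c * f_k(x, y).

    feature_maps: K maps of shape H x W from the last conv layer
    (before global average pooling).
    class_weights: the K weights linking the pooled features to the
    target class in the final linear layer.
    """
    if len(feature_maps) != len(class_weights):
        raise ValueError("need exactly one weight per feature map")
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    # Accumulate the weighted sum of feature maps, pixel by pixel.
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += weight * fmap[y][x]
    return cam


def normalize(cam: Matrix) -> Matrix:
    """Min-max scale to [0, 1] so the map can be shown as a heatmap."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(v - lo) / span for v in row] for row in cam]


if __name__ == "__main__":
    maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
    cam = class_activation_map(maps, [2.0, -1.0])
    print(cam)             # [[2.0, 0.0], [0.0, -1.0]]
    print(normalize(cam))  # top-left is hottest, bottom-right coldest
```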
Results for cross-dataset experiments. All models are trained and tested on the corresponding datasets. Note that all experiments use the 3-class classification setup to allow a fair comparison
| Architecture | Training set | Test set | # Train images | # Test images | Accuracy |
|---|---|---|---|---|---|
| MobileNetV2 | ISL-UFMD | RMFD [ | 20764 | 92671 | 91.4% |
| MobileNetV2 | ISL-UFMD | RWMFD [ | 20764 | 5171 | 94.7% |
| MobileNetV2 | ISL-UFMD | MaskedFace-Net [ | 20764 | 133782 | 88.11% |
| MobileNetV2 | ISL-UFMD | Face mask [ | 20764 | 4080 | |
| Inception-v3 | ISL-UFMD | RMFD [ | 20764 | 92671 | |
| Inception-v3 | ISL-UFMD | RWMFD [ | 20764 | 5171 | |
| Inception-v3 | ISL-UFMD | MaskedFace-Net [ | 20764 | 133782 | |
| Inception-v3 | ISL-UFMD | Face mask [ | 20764 | 4080 | 94.7% |
| MobileNetV2 | RMFD + RWMFD | ISL-UFMD | 97842 | 21816 | 86.59% |
| MobileNetV2 | RMFD + RWMFD | Face mask [ | 97842 | 4080 | 91.07% |
| MobileNetV2 | MaskedFace-Net + FFHQ | ISL-UFMD | 211936 | 21816 | 51.49% |
| MobileNetV2 | MaskedFace-Net + FFHQ | Face mask [ | 211936 | 4080 | 20.4% |
| Inception-v3 | RMFD + RWMFD | ISL-UFMD | 97842 | 21816 | 88.92% |
| Inception-v3 | RMFD + RWMFD | Face mask [ | 97842 | 4080 | 88.4% |
| Inception-v3 | MaskedFace-Net + FFHQ | ISL-UFMD | 211936 | 21816 | 51.39% |
| Inception-v3 | MaskedFace-Net + FFHQ | Face mask [ | 211936 | 4080 | 19.2% |
Bold values indicate the best scores
Face-hand interaction detection results on the proposed ISL-UFHD dataset
| Model | Accuracy | Precision | Recall |
|---|---|---|---|
| Inception-v3 | 93.20% | 0.932 | 0.932 |
| ResNet50 | 91.76% | 0.918 | 0.918 |
| MobileNetV2 | 92.37% | 0.924 | 0.924 |
| EfficientNet-b0 | 92.37% | 0.926 | 0.924 |
| EfficientNet-b1 | 92.90% | 0.929 | 0.929 |
| EfficientNet-b2 | |||
| EfficientNet-b3 | 92.44% | 0.925 | 0.924 |
Bold values indicate the best scores
Evaluation of the overall system on the test videos
| Video | # Frames | # Subjects | Face mask acc. | Face-hand acc. | Social distance acc. |
|---|---|---|---|---|---|
| V1 | 179 | 2 | 100% | 99.16% | 98.32% |
| V2 | 307 | 2 | 99.51% | 96.25% | 100% |
| V3 | 303 | 3 | 96.91% | 89.43% | 96.69% |
| V4 | 192 | 3 | 100% | 86.97% | 97.22% |
| V5 | 207 | 5 | 99.03% | 95.45% | 100% |
| V6 | 105 | 7 | 87.07% | 99.86% | 74.55% |
| Total | 1293 | 22 | 97.95% | 93.84% | 96.51% |
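The table does not state how the Total row is aggregated. Recomputing it shows that it is consistent with a frame-weighted mean of the per-video accuracies, whereas a plain mean over the six videos would give about 97.09% for face mask accuracy instead of 97.95%. The sketch below reproduces all three totals under that assumption; the names `VIDEOS` and `frame_weighted_mean` are illustrative.

```python
# Per-video rows from the table: (frames, mask acc., face-hand acc., distance acc.)
VIDEOS = {
    "V1": (179, 100.00, 99.16, 98.32),
    "V2": (307, 99.51, 96.25, 100.00),
    "V3": (303, 96.91, 89.43, 96.69),
    "V4": (192, 100.00, 86.97, 97.22),
    "V5": (207, 99.03, 95.45, 100.00),
    "V6": (105, 87.07, 99.86, 74.55),
}


def frame_weighted_mean(rows, column):
    """Average one accuracy column, weighting each video by its frame count."""
    total_frames = sum(r[0] for r in rows)
    return sum(r[0] * r[column] for r in rows) / total_frames


rows = list(VIDEOS.values())
totals = {
    "mask": frame_weighted_mean(rows, 1),
    "face_hand": frame_weighted_mean(rows, 2),
    "distance": frame_weighted_mean(rows, 3),
}

for name, value in totals.items():
    print(f"{name}: {value:.2f}%")
# mask: 97.95%
# face_hand: 93.84%
# distance: 96.51%
```

Weighting by frames means the short, crowded V6 (105 frames, 7 subjects) pulls the totals down less than it would under a per-video average.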