Peishu Wu, Han Li, Nianyin Zeng, Fengping Li.
Abstract
Coronavirus disease 2019 (COVID-19) is a worldwide epidemic, and efficient prevention and control of the disease has become a focus of the global scientific community. In this paper, a novel face mask detection framework, FMD-Yolo, is proposed to monitor whether people in public wear masks correctly, which is an effective way to block virus transmission. In particular, the feature extractor employs Im-Res2Net-101, which combines the Res2Net module with a deep residual network; its hierarchical convolutional structure, deformable convolution, and non-local mechanisms enable thorough information extraction from the input. Afterwards, an enhanced path aggregation network, En-PAN, is applied for feature fusion, where high-level semantic information and low-level details are sufficiently merged so that model robustness and generalization ability are enhanced. Moreover, a localization loss is designed and adopted in the training phase, and the Matrix NMS method is used at inference to improve detection efficiency and accuracy. Benchmark evaluation is performed on two public databases, with results compared against eight other state-of-the-art detection algorithms. At the IoU = 0.5 level, the proposed FMD-Yolo achieves the best precision, with AP50 of 92.0% and 88.4% on the two datasets, and its AP75 at IoU = 0.75 improves on the second-best method by 5.5% and 3.9%, respectively, which demonstrates the superiority of FMD-Yolo in face mask detection in both theoretical and practical terms.
Keywords: COVID-19; Face mask detection; Feature extraction and fusion; Improved YoloV3 algorithm
Year: 2021 PMID: 34848910 PMCID: PMC8612756 DOI: 10.1016/j.imavis.2021.104341
Source DB: PubMed Journal: Image Vis Comput ISSN: 0262-8856 Impact factor: 2.818
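The abstract states that Matrix NMS is used at inference. Matrix NMS (introduced in SOLOv2) replaces hard suppression with score decay: each box's score is reduced according to its overlap with higher-scored boxes, compensated by how much those boxes are themselves overlapped. The linear-decay variant below is a minimal self-contained sketch, not the paper's implementation:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def matrix_nms(boxes, scores):
    """Linear Matrix NMS: decay each score by its overlap with every
    higher-scored box, compensated by that box's own max overlap with
    boxes ranked above it. Returns the decayed scores."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    out = list(scores)
    for rank, j in enumerate(order):
        decay = 1.0
        for pos, i in enumerate(order[:rank]):
            ov = iou(boxes[i], boxes[j])
            # compensation: max IoU of box i with boxes scored above it
            comp = max((iou(boxes[m], boxes[i]) for m in order[:pos]),
                       default=0.0)
            decay = min(decay, (1.0 - ov) / (1.0 - comp))
        out[j] = scores[j] * decay
    return out
```

A heavily overlapped lower-scored box keeps only a fraction of its score, while distant boxes are untouched; a final score threshold then filters detections.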
Fig. 1. General flowchart of the face mask detection algorithm (FMDA) framework.
Fig. 2. Comparison of the conventional bottleneck structure with the Res2Net module.
Fig. 3. Four types of feature fusion methods.
Fig. 4. The architecture of the YoloV3 network.
Fig. 5. The anchor cluster algorithm flow used in this paper.
Fig. 6. Overall structure of the proposed FMD-Yolo.
Fig. 7. The structure of the Im-Res2Net-101 backbone.
Fig. 8. The implementation of the En-PAN structure.
Datasets information.
| Numbers | MD-2 | MD-3 |
|---|---|---|
| Training set | 6362 | 683 |
| Validation set | 1590 | 170 |
| Training-set boxes in category a1 / b1 | 9937 | 567 |
| Training-set boxes in category a2 / b2 | 3212 | 2388 |
| Training-set boxes in category b3 | – | 107 |
| Mean boxes per image | 2.067 | 4.490 |
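The "mean boxes per image" row can be sanity-checked from the category box counts, assuming the mean is taken over training images only:

```python
# Box totals from the category rows of the table; image counts from the
# training-set row. Assumes the mean covers the training split only.
md2_mean = (9937 + 3212) / 6362        # MD-2: categories a1 + a2
md3_mean = (567 + 2388 + 107) / 683    # MD-3: categories b1 + b2 + b3
print(round(md2_mean, 3))  # 2.067, matching the table
print(round(md3_mean, 3))  # 4.483, near the reported 4.490
```

MD-2 matches exactly; the small MD-3 gap (4.483 vs. 4.490) suggests the reported figure may also count validation boxes or use slightly different totals.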
Training parameter settings.
| Parameters | MD-2 | MD-3 |
|---|---|---|
| Max Iterations | 120,000 | 150,000 |
| Base Learning Rate | 0.000625 | 0.000625 |
| PiecewiseDecay Iters | 110,000 | 130,000 |
| Warmup Steps | 4000 | 4000 |
| Optimizer | SGD with Momentum (factor = 0.9) | SGD with Momentum (factor = 0.9) |
| Regularizer | L2 (factor = 0.0005) | L2 (factor = 0.0005) |
| Train Batch Size | 6 | 6 |
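The warmup and piecewise-decay settings above can be sketched as a step-dependent learning-rate function. The linear warmup shape and the decay factor of 0.1 are assumptions (the table gives only the base rate, warmup length, and decay iteration); `decay_step=110_000` matches the MD-2 column:

```python
def learning_rate(step, base_lr=0.000625, warmup_steps=4000,
                  decay_step=110_000, decay_factor=0.1):
    """Linear warmup to base_lr, then a single piecewise decay.
    Warmup shape and decay_factor=0.1 are assumed, not from the paper."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps   # ramp up linearly
    if step < decay_step:
        return base_lr                          # constant plateau
    return base_lr * decay_factor               # decayed tail
```

For MD-3 the same function applies with `decay_step=130_000`.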
Performance comparison of FMD-Yolo and eight other detection algorithms on the MD-2 dataset.
| Methods | Evaluation metrics | | | | |
|---|---|---|---|---|---|
| Faster RCNN baseline | 0.565 | 0.869 | 0.668 | 0.623 | 0.895 |
| Faster RCNN with FPN | 0.597 | 0.886 | 0.713 | 0.655 | 0.900 |
| Yolo V3 | 0.574 | 0.888 | 0.659 | 0.670 | 0.929 |
| Yolo V4 | 0.599 | 0.899 | 0.718 | 0.661 | 0.920 |
| RetinaNet | 0.616 | 0.887 | 0.726 | 0.664 | 0.909 |
| FCOS | 0.605 | 0.894 | 0.710 | 0.674 | 0.932 |
| EfficientDet | 0.613 | 0.870 | 0.726 | 0.665 | 0.910 |
| HRNet | 0.618 | 0.902 | 0.745 | 0.671 | 0.916 |
| FMD-Yolo (ours) | | | | | |
Performance comparison of FMD-Yolo and eight other detection algorithms on the MD-3 dataset.
| Methods | Evaluation metrics | | | | |
|---|---|---|---|---|---|
| Faster RCNN baseline | 0.521 | 0.821 | 0.604 | 0.601 | 0.889 |
| Faster RCNN with FPN | 0.536 | 0.846 | 0.551 | 0.615 | 0.907 |
| Yolo V3 | 0.507 | 0.824 | 0.585 | 0.609 | 0.927 |
| Yolo V4 | 0.525 | 0.843 | 0.591 | 0.587 | 0.897 |
| RetinaNet | 0.489 | 0.792 | 0.536 | 0.579 | 0.875 |
| FCOS | 0.521 | 0.796 | 0.558 | 0.628 | 0.916 |
| EfficientDet | 0.454 | 0.699 | 0.518 | 0.569 | 0.841 |
| HRNet | 0.512 | 0.802 | 0.556 | 0.571 | 0.845 |
| FMD-Yolo (ours) | | | | | |
Fig. 9. Overall P-R curves for each model on the MD-2 and MD-3 datasets (IoU = 0.5).
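The AP50 figures in the tables are areas under P-R curves like those in Fig. 9, computed at an IoU threshold of 0.5. A minimal all-point-interpolation sketch follows; the exact interpolation scheme used in the paper's evaluation is not stated here, so this is an assumption of the common COCO-style envelope rule:

```python
def average_precision(recalls, precisions):
    """All-point interpolated AP: area under the upper envelope of the
    P-R curve. `recalls` must be sorted in ascending order, with
    `precisions[i]` the precision at `recalls[i]`."""
    ap, prev_r = 0.0, 0.0
    for i, r in enumerate(recalls):
        p = max(precisions[i:])   # best precision at recall >= r
        ap += (r - prev_r) * p    # rectangle under the envelope
        prev_r = r
    return ap
```

For example, a two-point curve with precision 1.0 at recall 0.5 and precision 0.5 at recall 1.0 yields AP = 0.5 · 1.0 + 0.5 · 0.5 = 0.75.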
Performance comparison on each category of the MD-2 dataset.
| Methods | | | | |
|---|---|---|---|---|
| Faster RCNN baseline | 0.776 | 0.811 | 0.962 | 0.979 |
| Faster RCNN with FPN | 0.811 | 0.824 | 0.961 | 0.975 |
| Yolo V3 | 0.804 | 0.863 | 0.971 | 0.994 |
| Yolo V4 | 0.829 | 0.855 | 0.969 | 0.985 |
| RetinaNet | 0.796 | 0.825 | 0.978 | 0.993 |
| FCOS | 0.819 | 0.870 | 0.969 | 0.994 |
| EfficientDet | 0.762 | 0.826 | 0.993 | |
| HRNet | 0.834 | 0.852 | 0.971 | 0.980 |
| FMD-Yolo (ours) | 0.974 | | | |
Performance comparison on each category of the MD-3 dataset.
| Methods | Category (b1 / b2 / b3) | |
|---|---|---|
| Faster RCNN baseline | 0.817 / 0.917 / 0.730 | 0.860 / 0.930 / 0.875 |
| Faster RCNN with FPN | 0.838 / 0.913 / 0.787 | 0.860 / 0.922 / |
| Yolo V3 | 0.864 / 0.928 / 0.681 | 0.900 / 0.943 / |
| Yolo V4 | 0.840 / 0.903 / 0.785 | 0.887 / 0.928 / 0.875 |
| RetinaNet | 0.774 / 0.870 / 0.732 | 0.800 / 0.886 / |
| FCOS | 0.842 / 0.917 / 0.627 | 0.933 / 0.940 / 0.875 |
| EfficientDet | 0.696 / 0.870 / 0.530 | 0.793 / 0.917 / 0.812 |
| HRNet | 0.825 / 0.915 / 0.666 | 0.860 / 0.925 / 0.750 |
| FMD-Yolo (ours) | | |
Fig. 10. P-R curves of FMD-Yolo on each category (IoU = 0.5).