Abstract
To address the low accuracy, poor real-time performance, and weak robustness caused by complex environments, this paper proposes a face mask recognition and standard-wear detection algorithm based on an improved YOLO-v4. First, an improved CSPDarkNet53 is introduced into the backbone feature extraction network, which reduces the computational cost of the network and improves the learning ability of the model. Second, an adaptive image scaling algorithm effectively reduces computation and redundancy. Third, an improved PANet structure is introduced so that the feature layers carry more semantic information. Finally, a face mask detection data set is built according to the standard for wearing masks. Several evaluation metrics are compared against other deep-learning object detection algorithms to evaluate the effectiveness of the model. The comparison shows that the mAP of face mask recognition reaches 98.3% at a frame rate of 54.57 FPS, which is more accurate than the existing algorithms compared.
Keywords: CSPDarkNet53; PANet; YOLO-v4; adaptive image scaling; face mask recognition
Year: 2021 PMID: 34066802 PMCID: PMC8125872 DOI: 10.3390/s21093263
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Flow chart of the proposed approach.
Figure 2. YOLO-v4 network structure.
Figure 3. CSPDarkNet53 module structure.
Figure 4. CSP1_X module structure.
Figure 5. CSP2_X module structure.
Figure 6. Image scaling in YOLO-v4.
Figure 7. Adaptive image scaling.
Figure 8. Network model of mask detection.
Figure 9. The process of object positioning and prediction.
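Figures 6 and 7 contrast fixed image scaling with adaptive image scaling. The sketch below assumes the YOLOv5-style variant of adaptive scaling: the image is resized so its longer side matches the input size, and the shorter side is padded only up to the next multiple of the network stride instead of all the way to a square. The function name, the 416-pixel target and the stride of 32 are illustrative assumptions, not the paper's code.

```python
def adaptive_letterbox_shape(h, w, target=416, stride=32):
    """Return (new_h, new_w, pad_h, pad_w) for minimal-padding letterboxing.

    Hypothetical helper (assumption): aspect ratio is preserved, the longer side
    becomes `target`, and the shorter side is padded only to a multiple of `stride`.
    """
    r = min(target / h, target / w)             # one scale factor keeps the aspect ratio
    new_h, new_w = round(h * r), round(w * r)   # resized, unpadded shape
    pad_h = (stride - new_h % stride) % stride  # smallest padding to a stride multiple
    pad_w = (stride - new_w % stride) % stride
    return new_h, new_w, pad_h, pad_w


# A 1280 x 720 camera frame: fixed scaling would pad it to 416 x 416.
new_h, new_w, pad_h, pad_w = adaptive_letterbox_shape(720, 1280)
print((new_h + pad_h, new_w + pad_w))  # (256, 416)
```

For this frame the padded input is 256 × 416 instead of 416 × 416, so roughly 38% fewer pixels pass through the network, which is the kind of saving in computation and redundancy the abstract refers to. In practice the padding is usually split between both sides of the image.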
The size of the prior box.
| Feature Map | Receptive Field | Prior Box Size |
|---|---|---|
| 13 × 13 | large object | (221 × 245) |
| 26 × 26 | medium object | (165 × 175) |
| 52 × 52 | small object | (46 × 51) |
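Figure 9 and the prior-box table tie each prediction to a prior box on one of three feature maps. A minimal sketch of how raw network outputs are typically decoded into a box, assuming the standard YOLO-v3/v4 parameterisation (sigmoid offsets inside a grid cell, exponential scaling of the prior's width and height) and a 416 × 416 input. The table lists one prior per scale, so the sketch uses exactly those; YOLO-v4's grid-sensitivity scaling factor is omitted, and all names are hypothetical.

```python
import math

INPUT_SIZE = 416  # assumed network input size (the results tables use 416 x 416)
# One prior (width, height) per feature-map size, taken from the table above.
PRIORS = {13: (221, 245), 26: (165, 175), 52: (46, 51)}


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(tx, ty, tw, th, cx, cy, grid):
    """Decode raw offsets predicted at grid cell (cx, cy) into a box in input pixels.

    Assumed parameterisation: bx = (sigmoid(tx) + cx) * stride, bw = pw * exp(tw).
    Returns (center_x, center_y, width, height).
    """
    stride = INPUT_SIZE / grid  # pixels covered by one grid cell (32, 16 or 8)
    pw, ph = PRIORS[grid]
    bx = (sigmoid(tx) + cx) * stride
    by = (sigmoid(ty) + cy) * stride
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# Zero offsets at the centre cell of the 13 x 13 map reproduce the prior itself.
print(decode_box(0.0, 0.0, 0.0, 0.0, 6, 6, 13))  # (208.0, 208.0, 221.0, 245.0)
```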
Distribution of different types of samples in the data set.
| Class | Training Images | Training Objects | Validation Images | Validation Objects | Testing Images | Testing Objects |
|---|---|---|---|---|---|---|
| face | 2556 | 2670 | 338 | 350 | 721 | 753 |
| face_mask | 2685 | 2740 | 219 | 228 | 716 | 730 |
| WMI | 2585 | 2604 | 311 | 311 | 724 | 730 |
| total | 7826 | 8014 | 868 | 889 | 2161 | 2213 |
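As a quick consistency check, the per-class image counts in the table sum to the "total" row and correspond to roughly a 72/8/20 train/validation/test split. The sketch below only re-does that arithmetic; the dictionary layout is an illustrative choice.

```python
# Image counts per class from the table above: (train, validation, test).
IMAGES = {
    "face": (2556, 338, 721),
    "face_mask": (2685, 219, 716),
    "WMI": (2585, 311, 724),
}


def split_fractions(counts):
    """Return per-split image totals and each split's share of all images."""
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    grand = sum(totals)
    return totals, [t / grand for t in totals]


totals, fractions = split_fractions(IMAGES)
print(totals)                             # [7826, 868, 2161]
print([round(f, 3) for f in fractions])   # [0.721, 0.08, 0.199]
```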
Figure 10. Division of key parts.
Figure 11. Sample diagram from the data set.
Configuration parameters.
| Item | Configuration |
|---|---|
| Operating system | Windows 10 |
| Processor | Intel(R) Core i7-9700K |
| GPU acceleration libraries | CUDA 10.1, cuDNN 7.6 |
| GPU | NVIDIA RTX 2070 Super, 8 GB |
| Frameworks | PyTorch, Keras, TensorFlow |
| Development tools | PyCharm, Anaconda |
| Scripting language | Python 3.7 |
| Camera | A4tech USB2.0 Camera |
The hyperparameters of the model.
| Hyperparameters | Before Initialization | After Initialization |
|---|---|---|
| initial learning rate | 0.01000 | 0.00320 |
| optimizer weight decay | 0.00050 | 0.00036 |
| momentum | 0.93700 | 0.84300 |
| classification coefficient | 0.50000 | 0.24300 |
| object coefficient | 1.00000 | 0.30100 |
| hue | 0.01500 | 0.01380 |
| saturation | 0.70000 | 0.66400 |
| value | 0.40000 | 0.46400 |
| scale | 0.50000 | 0.89800 |
| shear | 0.00000 | 0.60200 |
| mosaic | 1.00000 | 1.00000 |
| mix-up | 0.00000 | 0.24300 |
| flip up-down | 0.00000 | 0.00856 |
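The hue, saturation and value rows sit among other data-augmentation settings (mosaic, mix-up, flip), so they are read here, as an assumption, as HSV colour-jitter gains. The sketch follows the convention used in YOLOv5-style training code: each channel gets a random multiplicative gain in [1 − g, 1 + g], hue wraps around, and saturation and value are clipped. The function names and the per-pixel formulation are illustrative, not the paper's implementation.

```python
import random

# Post-initialization values from the table above, assumed to be HSV jitter gains.
HUE_GAIN, SAT_GAIN, VAL_GAIN = 0.01380, 0.66400, 0.46400


def random_hsv_gains(rng=random):
    """Sample one multiplicative gain per HSV channel, uniformly in [1 - g, 1 + g]."""
    return [1.0 + rng.uniform(-1.0, 1.0) * g for g in (HUE_GAIN, SAT_GAIN, VAL_GAIN)]


def augment_hsv_pixel(h, s, v, gains):
    """Apply the gains to one HSV pixel whose channels all lie in [0, 1]."""
    gh, gs, gv = gains
    h = (h * gh) % 1.0               # hue is circular, so it wraps around
    s = min(max(s * gs, 0.0), 1.0)   # saturation is clipped to [0, 1]
    v = min(max(v * gv, 0.0), 1.0)   # value (brightness) is clipped to [0, 1]
    return h, s, v


gains = random_hsv_gains(random.Random(0))
print(gains, augment_hsv_pixel(0.5, 0.5, 0.5, gains))
```

Under this reading, the small tuned hue gain (0.0138) keeps skin and mask colours almost unchanged, while the larger saturation and value gains simulate lighting variation.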
Comparison of different models in parameters, model size, and training time.
| Model | Parameters (×2^20) | Model Size | Training Time |
|---|---|---|---|
| Proposed work | 45.2 | 91.0 MB | 2.834 h |
| YOLO-v4 | 61.1 | 245 MB | 9.730 h |
| YOLO-v3 | 58.7 | 235 MB | 8.050 h |
| SSD | 22.9 | 91.7 MB | 3.350 h |
| Faster R-CNN | 27.1 | 109 MB | 45.830 h |
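The "Parameters" column is a parameter count, not a file size: dividing the total counts from the module-level parameter table in this section by 2^20 reproduces it to within 0.1. The sketch below shows that conversion; the helper name is hypothetical.

```python
# Total parameter counts from the "All parameters" row of the module-level table.
PARAM_COUNTS = {
    "Proposed work": 47_398_900,
    "YOLO-v4": 64_014_760,
    "YOLO-v3": 61_587_112,
    "SSD": 24_013_232,
    "Faster R-CNN": 28_362_685,
}


def to_binary_millions(count):
    """Express a raw parameter count in units of 2**20 parameters."""
    return count / 2**20


for model, count in PARAM_COUNTS.items():
    print(f"{model}: {to_binary_millions(count):.2f}")
```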
Comparison of different models in single-image test time, total inference time, and FPS.
| Model | Single-Image Test Time | Total Inference Time | FPS |
|---|---|---|---|
| Proposed work | 0.022 s | 144.7 s | 54.57 |
| YOLO-v4 | 0.042 s | 151.1 s | 23.83 |
| YOLO-v3 | 0.047 s | 153.1 s | 21.39 |
| SSD | 0.029 s | 97.0 s | 34.69 |
| Faster R-CNN | 0.410 s | 1620.7 s | 2.44 |
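The exact timing protocol behind this table is not stated. A minimal sketch of one common way to measure per-image latency and FPS, assuming FPS is defined as images processed per second of wall-clock time and excluding a few warm-up calls; `dummy_detector` is a hypothetical placeholder for a real model's forward pass.

```python
import time


def measure_fps(infer, images, warmup=5):
    """Time `infer` over `images`; return (mean seconds per image, frames per second).

    Warm-up calls are excluded because the first calls of a model are usually slower.
    """
    for img in images[:warmup]:
        infer(img)
    start = time.perf_counter()
    for img in images:
        infer(img)
    elapsed = time.perf_counter() - start
    per_image = elapsed / len(images)
    return per_image, 1.0 / per_image


def dummy_detector(img):
    """Hypothetical stand-in workload instead of a real detector."""
    return sum(img)


per_image, fps = measure_fps(dummy_detector, [list(range(10_000))] * 50)
print(f"{per_image * 1000:.3f} ms/image, {fps:.1f} FPS")
```

When timing a PyTorch model on a GPU, `torch.cuda.synchronize()` should be called before reading the clock, since CUDA kernels run asynchronously.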
The parameter distribution of different modules in different models.
| Module | Faster R-CNN | SSD | YOLO-v3 | YOLO-v4 | Proposed Work |
|---|---|---|---|---|---|
| Backbone | - | - | 40,620,740 | 30,730,448 | 9,840,832 |
| Neck | - | - | 14,722,972 | 27,041,012 | 37,514,988 |
| Prediction | - | - | 6,243,400 | 6,657,945 | 43,080 |
| All parameters | 28,362,685 | 24,013,232 | 61,587,112 | 64,014,760 | 47,398,900 |
| All CSPx | - | - | - | 26,816,384 | - |
| All CSP1_X | - | - | - | - | 8,288,896 |
| All CSP2_X | - | - | - | - | 18,687,744 |
| All layers | 185 | 69 | 256 | 370 | 335 |
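For YOLO-v3 and the proposed work, the Backbone, Neck and Prediction rows add up exactly to the "All parameters" row, so the table can be read as a breakdown of where each network's capacity sits. The sketch computes each module's share; the dictionary layout is illustrative.

```python
# Per-module parameter counts from the table above.
MODULES = {
    "YOLO-v3": {"Backbone": 40_620_740, "Neck": 14_722_972, "Prediction": 6_243_400},
    "Proposed work": {"Backbone": 9_840_832, "Neck": 37_514_988, "Prediction": 43_080},
}


def module_shares(counts):
    """Return the total parameter count and each module's share of it in percent."""
    total = sum(counts.values())
    return total, {name: 100.0 * n / total for name, n in counts.items()}


for model, counts in MODULES.items():
    total, shares = module_shares(counts)
    print(model, total, {k: round(v, 1) for k, v in shares.items()})
```

In YOLO-v3 about two thirds of the parameters are in the backbone, while in the proposed network the backbone holds roughly a fifth and the neck almost all of the rest.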
Sample detection results of different models on the test set.
| Model | Class | Input Size | Objects | TP | FP | FN | P | R | F1 |
|---|---|---|---|---|---|---|---|---|---|
| Proposed work | face | 416 × 416 | 753 | 737 | 50 | 16 | 0.936 | 0.979 | 0.957 |
| | face_mask | 416 × 416 | 730 | 725 | 23 | 5 | 0.969 | 0.993 | 0.980 |
| | WMI | 416 × 416 | 730 | 712 | 39 | 18 | 0.948 | 0.975 | 0.961 |
| | Total | 416 × 416 | 2213 | 2174 | 112 | 39 | 0.951 | 0.982 | 0.967 |
| YOLO-v4 | face | 416 × 416 | 753 | 666 | 42 | 87 | 0.941 | 0.885 | 0.910 |
| | face_mask | 416 × 416 | 730 | 705 | 199 | 25 | 0.780 | 0.966 | 0.860 |
| | WMI | 416 × 416 | 730 | 670 | 195 | 60 | 0.775 | 0.918 | 0.840 |
| | Total | 416 × 416 | 2213 | 2041 | 436 | 172 | 0.832 | 0.923 | 0.870 |
| YOLO-v3 | face | 416 × 416 | 753 | 640 | 53 | 113 | 0.924 | 0.850 | 0.890 |
| | face_mask | 416 × 416 | 730 | 686 | 23 | 44 | 0.968 | 0.940 | 0.950 |
| | WMI | 416 × 416 | 730 | 623 | 26 | 107 | 0.960 | 0.853 | 0.900 |
| | Total | 416 × 416 | 2213 | 1949 | 102 | 264 | 0.950 | 0.881 | 0.913 |
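The P, R and F1 columns follow from the TP, FP and FN counts with the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = their harmonic mean). Applied to the proposed model's Total row, the sketch below reproduces P and R exactly and F1 to within 0.001.

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 score from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = prf1(tp=2174, fp=112, fn=39)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.951 R=0.982 F1=0.966
```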
The comparative experiments of AP of different models in three categories.
| Model | Input Size | Metric | Face | Face_Mask | WMI |
|---|---|---|---|---|---|
| Proposed work | 416 × 416 | AP@.50 | 0.979 | 0.995 | 0.973 |
| | 416 × 416 | AP@.75 | 0.978 | 0.995 | 0.983 |
| | 416 × 416 | AP@.50:.95 | 0.767 | 0.939 | 0.834 |
| YOLO-v4 | 416 × 416 | AP@.50 | 0.943 | 0.969 | 0.944 |
| | 416 × 416 | AP@.75 | 0.680 | 0.899 | 0.800 |
| | 416 × 416 | AP@.50:.95 | 0.541 | 0.740 | 0.670 |
| YOLO-v3 | 416 × 416 | AP@.50 | 0.921 | 0.981 | 0.941 |
| | 416 × 416 | AP@.75 | 0.617 | 0.888 | 0.835 |
| | 416 × 416 | AP@.50:.95 | 0.559 | 0.789 | 0.724 |
| SSD | 300 × 300 | AP@.50 | 0.941 | 0.986 | 0.988 |
| | 300 × 300 | AP@.75 | 0.503 | 0.920 | 0.926 |
| | 300 × 300 | AP@.50:.95 | 0.518 | 0.789 | 0.790 |
| Faster R-CNN | 600 × 600 | AP@.50 | 0.943 | 0.974 | 0.950 |
| | 600 × 600 | AP@.75 | 0.700 | 0.927 | 0.866 |
| | 600 × 600 | AP@.50:.95 | 0.612 | 0.824 | 0.769 |
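The AP@.50 and AP@.75 rows differ only in the IoU a detection must reach with a ground-truth box to count as a true positive, and AP@.50:.95 averages AP over ten thresholds from 0.50 to 0.95 in steps of 0.05 (the COCO convention, assumed here). The sketch shows the IoU computation for corner-format boxes and which thresholds a given detection passes.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# IoU thresholds averaged for AP@.50:.95: 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

pred, gt = (0, 0, 10, 10), (2, 0, 12, 10)  # a prediction shifted by 2 pixels
value = iou(pred, gt)
print(round(value, 3), [t for t in THRESHOLDS if value >= t])  # 0.667 [0.5, 0.55, 0.6, 0.65]
```

This detection counts as a true positive for AP@.50 but not for AP@.75, which is why AP@.75 and AP@.50:.95 reward tighter localisation.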
The mAP comparison experiments of different models in all categories.
| Model | mAP@.50 | mAP@.75 | mAP@.50:.95 |
|---|---|---|---|
| Proposed work | 0.983 | 0.985 | 0.847 |
| YOLO-v4 | 0.952 | 0.793 | 0.680 |
| YOLO-v3 | 0.948 | 0.780 | 0.689 |
| SSD | 0.972 | 0.783 | 0.691 |
| Faster R-CNN | 0.956 | 0.831 | 0.735 |
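The mAP values are consistent with an unweighted mean of the three per-class AP values in the preceding table. For the proposed model, the sketch below reproduces the mAP row to within 0.001.

```python
# Per-class AP (face, face_mask, WMI) for the proposed model, from the AP table.
PROPOSED_AP = {
    "AP@.50": (0.979, 0.995, 0.973),
    "AP@.75": (0.978, 0.995, 0.983),
    "AP@.50:.95": (0.767, 0.939, 0.834),
}


def mean_ap(per_class_ap):
    """mAP as the unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


for setting, aps in PROPOSED_AP.items():
    print(setting, round(mean_ap(aps), 3))
```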
Figure 12. Visualization of different models in performance testing.
Influence of different activation functions.
| Function | Training Time | Face AP@.50 | Face_Mask AP@.50 | WMI AP@.50 | mAP@.50 |
|---|---|---|---|---|---|
| H-swish | 2.834 h | 0.979 | 0.995 | 0.973 | 0.983 |
| Mish | 3.902 h | 0.971 | 0.995 | 0.973 | 0.980 |
| L-ReLU | 2.812 h | 0.975 | 0.985 | 0.974 | 0.978 |
| ReLU | 3.056 h | 0.970 | 0.972 | 0.969 | 0.970 |
| Sigmoid | 2.985 h | 0.966 | 0.968 | 0.963 | 0.966 |
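For reference, the three best-scoring activations in the table have simple closed forms. H-swish needs only a clamp and a multiply, whereas Mish needs an exponential, a logarithm and a tanh, which is one plausible contributor to Mish's longer training time here. The Leaky ReLU slope of 0.1 is an assumption (a common choice in Darknet-style YOLO models); the paper's value is not stated.

```python
import math


def hard_swish(x):
    """H-swish: x * ReLU6(x + 3) / 6."""
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0


def softplus(x):
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def leaky_relu(x, slope=0.1):
    """Leaky ReLU; the 0.1 slope is an assumed default, not taken from the paper."""
    return x if x > 0 else slope * x


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(x, round(hard_swish(x), 4), round(mish(x), 4), leaky_relu(x))
```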
Ablation experiments.
| CSP1_X | CSP2_X | H-swish | Face AP@.50 | Face_Mask AP@.50 | WMI AP@.50 | mAP@.50 | FPS |
|---|---|---|---|---|---|---|---|
| × | × | × | 0.943 | 0.969 | 0.944 | 0.952 | 23.83 |
| √ | × | × | 0.982 | 0.984 | 0.972 | 0.979 | 43.47 |
| × | √ | × | 0.969 | 0.993 | 0.962 | 0.975 | 45.45 |
| √ | √ | × | 0.971 | 0.993 | 0.967 | 0.977 | 47.65 |
| √ | √ | √ | 0.979 | 0.995 | 0.973 | 0.983 | 54.57 |