Feng Hong, Changhua Lu, Chun Liu, Ruru Liu, Weiwei Jiang, Wei Ju, Tao Wang.
Abstract
Human key-point detection is a challenging research field in computer vision. Convolutional neural networks limit the number of parameters and mine local structure, and have made great progress in salient object detection and key-point detection. However, the features extracted by shallow layers lack semantic information, while the features extracted by deep layers contain rich semantic information but lack spatial information; this results in an imbalance of information and of feature extraction. As network structures become more complex and the amount of computation grows, the balance between communication time and computation time becomes increasingly important. With improvements in hardware, network running time can be greatly reduced by optimizing the network structure and the way data are processed. However, as networks become deeper, the communication cost between network stages also increases; network computing capacity has been well optimized, so communication overhead has become a recent focus of attention. We propose a novel network structure, PGNet, which contains three parts: a pipeline guidance strategy (PGS), a Cross-Distance-IoU loss (CIoU), and a cascaded fusion feature model (CFFM).
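The abstract names a Cross-Distance-IoU loss as one of PGNet's components. As an illustration only, here is a minimal sketch of the standard Distance-IoU penalty (IoU minus a normalized center-distance term), not the authors' exact formulation; the box format `(x_min, y_min, x_max, y_max)` is an assumption:

```python
def diou_loss(box_a, box_b):
    """Distance-IoU loss between two boxes (x_min, y_min, x_max, y_max).

    DIoU = IoU - d^2 / c^2, where d is the distance between box centers
    and c is the diagonal of the smallest enclosing box; loss = 1 - DIoU.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area, clamped to zero for disjoint boxes.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter)

    # Squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
       + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    cx = max(ax2, bx2) - min(ax1, bx1)
    cy = max(ay2, by2) - min(ay1, by1)
    c2 = cx ** 2 + cy ** 2

    return 1.0 - (iou - d2 / c2)
```

Unlike plain IoU loss, the distance term still provides a gradient when the boxes do not overlap at all.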
Keywords: IoU; feature fusion; key-point detection; object detection
Year: 2020 PMID: 33286143 PMCID: PMC7516841 DOI: 10.3390/e22030369
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. The proposed network to find key points of the human body.
Performance comparison of various network structures.
| Method | Backbone | Decoder | Postprocessing | Performance |
|---|---|---|---|---|
| Mask R-CNN | ResNet-50-FPN | conv+deconv | offset regression | 63.1 AP@COCO |
| DHN | ResNet-152 | deconv | flip/sub-pixel shift | 73.7 AP@COCO |
| CNN | VGG-19 | conv | flip/sub-pixel shift | 61.8 AP@COCO |
| PGNN | ResNet-50 | GlobalNet | flip/sub-pixel shift | 68.7 AP@COCO |
| DetNet | ResNet-50 | deconv | flip/sub-pixel shift | 69.7 AP@COCO |
| DENSENETS | ResNet-50 | deconv | - | 61.8 AP@COCO |
| LCR-Net++ | ResNet-50 | deconv | flip/sub-pixel shift | 73.2 AP@COCO |
| HRNet | HRNet-152 | 1×1 conv | flip/sub-pixel shift | 77.0 AP@COCO |
| | ResNet-101 | deconv | flip/sub-pixel shift | 69.9 AP@COCO |
| PFAN | VGG-19 | multi-stage CNN | flip/sub-pixel shift | 70.2 AP@COCO |
| Proposed method | ResNet-50-Pipeline | deconv+1×1 conv | offset regression | 77.2 AP@COCO |
Figure 2. The number of positive bounding boxes after NMS, grouped by their IoU with the matched ground truth. In traditional NMS (blue bars), a significant portion of accurately localized bounding boxes is mistakenly suppressed due to the misalignment between classification confidence and localization accuracy, while IoU-guided NMS (yellow bars) preserves more accurately localized bounding boxes [35].
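The difference Figure 2 illustrates comes down to the ranking key used by greedy NMS: traditional NMS ranks candidates by classification confidence, while IoU-guided NMS ranks them by a predicted localization score. A simplified sketch (the full IoU-guided NMS of [35] also updates the kept box's classification score, which is omitted here):

```python
def box_iou(b1, b2):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ow = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    oh = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = ow * oh
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter)

def greedy_nms(boxes, ranking_scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the top-ranked box and suppress all
    remaining boxes overlapping it by more than iou_thresh.

    Pass classification confidences for traditional NMS, or predicted
    localization (IoU) scores for IoU-guided NMS; returns kept indices.
    """
    order = sorted(range(len(boxes)), key=lambda i: ranking_scores[i],
                   reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order
                 if box_iou(boxes[best], boxes[j]) < iou_thresh]
    return keep
```

With a well-calibrated localization score, the box that survives each overlapping cluster is the most accurately localized one rather than the most confidently classified one.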
Logic operation based on bounding box regression.

Require: two bounding boxes B1 = (x1_min, y1_min, x1_max, y1_max) and B2 = (x2_min, y2_min, x2_max, y2_max)
Ensure: IoU value
1: A1 ← area of B1: (x1_max − x1_min) × (y1_max − y1_min)
2: A2 ← area of B2: (x2_max − x2_min) × (y2_max − y2_min)
3: A_overlap ← area of overlap: max(0, min(x1_max, x2_max) − max(x1_min, x2_min)) × max(0, min(y1_max, y2_max) − max(y1_min, y2_min))
4: IoU ← A_overlap / (A1 + A2 − A_overlap)
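The IoU computation described above translates directly into a short Python function (box format `(x_min, y_min, x_max, y_max)` assumed):

```python
def iou(b1, b2):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Steps 1-2: areas of the individual boxes.
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    # Step 3: overlap area, clamped to zero for disjoint boxes.
    ow = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    oh = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    overlap = ow * oh
    # Step 4: IoU = overlap / union.
    return overlap / (a1 + a2 - overlap)
```

For example, two 2×2 boxes offset by one unit in each direction share a 1×1 overlap, giving an IoU of 1/7.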
Figure 3. An overview of the proposed PGNet. ResNet-50 is used as the backbone. Using the cascaded fusion feature model (CFFM), the backbone network is divided into five stages, and a feature-guided network applied after the image is convolved extracts key-point features.
Figure 4. An example pipeline-parallel assignment with four machines, and an example timeline at one of the machines, highlighting the temporal overlap of computation and activation/gradient communication.
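As a toy illustration of the overlap shown in Figure 4 (not the authors' scheduler), the following sketch computes forward-pass start times for an idealized pipeline in which each stage's compute on one microbatch takes one time unit and activation transfer is fully hidden behind the next microbatch's compute:

```python
def pipeline_schedule(n_stages, n_microbatches):
    """Start time of forward compute for each (stage, microbatch) pair
    in an idealized pipeline: a stage may begin microbatch m once it has
    finished microbatch m-1 and the previous stage has produced m's
    activations (communication cost assumed fully overlapped)."""
    start = {}
    for m in range(n_microbatches):
        for s in range(n_stages):
            after_prev_microbatch = start.get((s, m - 1), -1) + 1
            after_prev_stage = start.get((s - 1, m), -1) + 1
            start[(s, m)] = max(after_prev_microbatch, after_prev_stage, 0)
    return start
```

Under these assumptions, stage `s` starts microbatch `m` at time `s + m`, so the whole forward pass finishes at `(n_stages - 1) + n_microbatches`: the pipeline-fill cost of `n_stages - 1` is paid once, after which every stage computes and communicates concurrently.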
Figure 5. Distribution of bounding boxes for iterative training.
Performance comparison of various network structures under different normalization methods.
| Backbone | Norm | AP^bbox | AP_50^bbox | AP_75^bbox | AP_S^bbox | AP_M^bbox | AP_L^bbox |
|---|---|---|---|---|---|---|---|
| ResNet50+FPN | GN | 37.8 | 59.0 | 40.8 | 22.3 | 41.2 | 48.4 |
| ResNet50+FPN | syncGN | 37.7 | 58.5 | 41.1 | 22.3 | 40.2 | 48.9 |
| ResNet50+FPN | CBN | 37.8 | 59.8 | 40.3 | 22.5 | 40.5 | 49.1 |
| ResNet101+FPN | GN | 39.3 | 60.6 | 42.7 | 22.5 | 42.5 | 48.8 |
| ResNet101+FPN | syncGN | 39.3 | 59.8 | 43.0 | 22.3 | 42.9 | 51.6 |
| ResNet101+FPN | CBN | 39.2 | 60.0 | 42.2 | 22.3 | 42.6 | 51.8 |
| ResNet50+proposed | GN | 39.3 | 60.7 | 42.6 | 22.5 | 43.2 | 48.1 |
| ResNet50+proposed | syncGN | 39.3 | 59.8 | 43.5 | 23.4 | 43.7 | 51.9 |
| ResNet50+proposed | CBN | 39.4 | 59.8 | 43.2 | 23.1 | 42.9 | 52.6 |
Figure 6. Comparison of the training epochs required by this method with those of other training methods.