Yaonan Dai, Jiuyang Yu, Dean Zhang, Tianhao Hu, Xiaotao Zheng.
Abstract
To address Transformers' lack of a local spatial receptive field and the discontinuous boundary loss in rotating object detection, we propose RODFormer, a Transformer-based high-precision rotating object detection model. First, RODFormer uses a structured Transformer architecture to collect feature information at different resolutions, which widens the range over which features are gathered. Second, a new feed-forward network (spatial-FFN) is constructed. Spatial-FFN fuses the local spatial features of 3 × 3 depthwise separable convolutions with the global channel features of a multilayer perceptron (MLP) to compensate for the FFN's weakness in local spatial modeling. Finally, a detection head is built on the spatial-FFN architecture using the CIOU-smooth L1 loss function; it regresses only the horizontal frame when the rotating frame is close to horizontal, which alleviates the loss discontinuity of the rotating frame. Ablation experiments on the DOTA dataset show that the structured Transformer module, the spatial-FFN module, and the CIOU-smooth L1 loss function module each improve the detection accuracy of RODFormer. Compared with 12 rotating object detection models on the DOTA dataset, RODFormer achieves the highest average detection accuracy (75.60%), which makes it competitive in rotating object detection accuracy.
Keywords: RODFormer; rotating object detection; spatial-FFN; structured transformers
Year: 2022 PMID: 35408247 PMCID: PMC9003240 DOI: 10.3390/s22072633
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Structure of RODFormer.
Comparison of target sizes across datasets (proportion of targets in each size range).
| Dataset | 10–50 Pixels | 50–300 Pixels | >300 Pixels |
|---|---|---|---|
| DOTA | 0.57 | 0.41 | 0.02 |
| NWPU VHR-10 | 0.15 | 0.83 | 0.02 |
| MSCOCO | 0.43 | 0.49 | 0.08 |
| PASCAL VOC | 0.14 | 0.61 | 0.25 |
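The proportions in the table above can be reproduced from a list of per-target sizes with a few lines of Python. This is a minimal sketch: the function name `size_distribution`, the definition of "size" as a single pixel measurement supplied by the caller, and the handling of the bin boundaries at 50 and 300 pixels are all assumptions, since the record does not state them.

```python
def size_distribution(sizes):
    """Return the fraction of targets in the 10-50, 50-300 and >300 pixel ranges.

    `sizes` is an iterable of target sizes in pixels (hypothetical input;
    how size is measured is not specified in the source table).
    """
    counts = {"10-50": 0, "50-300": 0, ">300": 0}
    for s in sizes:
        if 10 <= s < 50:
            counts["10-50"] += 1
        elif 50 <= s <= 300:
            counts["50-300"] += 1
        elif s > 300:
            counts[">300"] += 1
        # Sizes below 10 pixels fall outside the table's ranges and are skipped.
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


# Toy input: two small targets, one medium, one large.
example = size_distribution([12, 40, 60, 400])
```

Each row of the table sums to 1, which is consistent with reading the entries as fractions of this kind.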
Figure 2. Structure of block.
Figure 3. Structure of spatial-FFN.
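The abstract describes spatial-FFN as fusing local spatial features from a 3 × 3 depthwise convolution with global channel features from an MLP. The pure-Python sketch below illustrates that idea on a tiny feature map. It is not the authors' implementation: fusing the two branches by element-wise addition, using ReLU as the activation, and omitting biases, normalization, and the separate pointwise step of the depthwise separable convolution are all assumptions, and every function name is hypothetical.

```python
def depthwise_conv3x3(x, kernels):
    """Depthwise 3x3 convolution with zero padding.

    x[c][i][j] is a C x H x W feature map; kernels[c] is the 3x3 kernel
    applied to channel c only (no mixing across channels).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < H and 0 <= jj < W:
                            acc += kernels[c][di + 1][dj + 1] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


def channel_mlp(x, w1, w2):
    """Two-layer MLP applied independently at every pixel, across channels.

    w1 has shape hidden x C, w2 has shape C_out x hidden; ReLU in between.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(len(w2))]
    for i in range(H):
        for j in range(W):
            v = [x[c][i][j] for c in range(C)]
            hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w1]
            for c, row in enumerate(w2):
                out[c][i][j] = sum(a * b for a, b in zip(row, hidden))
    return out


def spatial_ffn(x, kernels, w1, w2):
    """Assumed fusion: add the local (conv) and global (MLP) branches."""
    local = depthwise_conv3x3(x, kernels)
    glob = channel_mlp(x, w1, w2)
    return [[[l + g for l, g in zip(lr, gr)] for lr, gr in zip(lc, gc)]
            for lc, gc in zip(local, glob)]


# Toy check: identity kernels and identity MLP weights pass x through each
# branch unchanged, so the fused output is 2 * x.
x = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
identity_kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
eye = [[1, 0], [0, 1]]
y = spatial_ffn(x, [identity_kernel, identity_kernel], eye, eye)
```

The depthwise branch sees a 3 × 3 neighbourhood within a single channel, while the MLP branch sees all channels at a single pixel, which is the local/global split the abstract refers to.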
Figure 4. Structure of Neck. (a) PANet structure; (b) Bidirectional fusion structure.
Figure 5. Structure of the head.
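The detection head is described as using a CIOU-smooth L1 loss. The record does not give the combined formula, so the sketch below shows only the two standard ingredients separately for horizontal boxes: the smooth L1 penalty and the CIoU loss. How RODFormer combines them is not reproduced here; the function names, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def smooth_l1(x, beta=1.0):
    """Smooth L1 penalty on a scalar residual: quadratic near 0, linear beyond beta."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss between two horizontal boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Plain IoU of the two boxes.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)

    # Squared centre distance, normalized by the squared diagonal of the
    # smallest box enclosing both.
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw * cw + ch * ch + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (
        math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - iou + rho2 / c2 + alpha * v


same = ciou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
apart = ciou_loss((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0))
```

Unlike plain IoU, the CIoU loss keeps growing as disjoint boxes move further apart, so it still provides a useful signal when the predicted and target boxes do not overlap.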
Figure 6. Structure of eight-parameter regression.
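Eight-parameter regression is commonly understood as predicting the four corner points of a rotated box, (x1, y1, ..., x4, y4), which is also how DOTA annotates objects. Assuming that reading applies here, the sketch below converts a five-parameter box (centre, width, height, angle) into eight corner coordinates. The function name, the corner ordering, and the angle convention (radians, rotation about the centre in the usual mathematical sense) are illustrative assumptions and may differ from the paper's figure.

```python
import math


def rotated_box_to_corners(cx, cy, w, h, theta):
    """Return [x1, y1, x2, y2, x3, y3, x4, y4] for a box rotated by theta radians.

    Corners are emitted in the order (-w/2, -h/2), (+w/2, -h/2),
    (+w/2, +h/2), (-w/2, +h/2) relative to the unrotated box.
    """
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the offset about the centre, then translate.
        corners.append(cx + dx * c - dy * s)
        corners.append(cy + dx * s + dy * c)
    return corners


horizontal = rotated_box_to_corners(0.0, 0.0, 4.0, 2.0, 0.0)
quarter_turn = rotated_box_to_corners(0.0, 0.0, 4.0, 2.0, math.pi / 2)
```

With `theta = 0` the eight values reduce to the corners of an ordinary horizontal box, the near-horizontal case in which, according to the abstract, only the horizontal frame is regressed.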
Experimental environment and parameter settings.
| Configure | Setting | Parameter | Setting |
|---|---|---|---|
| Experiment system | Ubuntu 20.04 | Backbone | ViT-B4 |
| Learning framework | 1.10 | Total batch size | 16 |
| GPU | Nvidia RTX 3090 Ti | Epoch | 300 |
| Initial weight | Xavier init | Initial learning rate | 10⁻⁴ |
| Programming language | Python 3.9 | Weight decay rate | 0.0001 |
| Number stage | 1, 2, 5, 8 | | |
| IoU | 0.1 | | |
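For reproducibility, the settings in the table above can be collected into a single configuration object. The sketch below is one way to do that: the class and field names are hypothetical, and the table does not say what "Number stage" or "IoU" control, so they are stored verbatim without interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from the experimental-settings table; names are illustrative."""
    backbone: str = "ViT-B4"
    total_batch_size: int = 16
    epochs: int = 300
    initial_lr: float = 1e-4          # listed as 10^-4
    weight_decay: float = 1e-4        # listed as 0.0001
    number_stage: tuple = (1, 2, 5, 8)  # meaning not stated in the table
    iou: float = 0.1                  # meaning not stated in the table
    weight_init: str = "xavier"


cfg = TrainConfig()
```

Freezing the dataclass keeps the recorded settings from being changed accidentally during a run.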
Results of ablation experiments with RODFormer.
| Backbone | STS | SFM | C-SL1 | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet50 | × | × | × | 88.03 | 74.49 | 38.02 | 66.34 | 60.24 | 46.56 | 68.20 | 86.39 | 77.12 | 78.28 | 52.50 | 61.15 | 50.82 | 60.21 | 49.99 | 63.89 |
| ResNet152 | × | × | × | 88.92 | 77.82 | 41.50 | 61.86 | 67.32 | 53.97 | 72.19 | 89.88 | 78.65 | 74.92 | 53.25 | 58.41 | 52.47 | 68.85 | 62.78 | 66.85 |
| ViT-B4 | × | × | × | 88.21 | 76.63 | 45.81 | 70.25 | 66.21 | 71.59 | 80.69 | 89.64 | 80.69 | 80.25 | 57.06 | 58.61 | 62.93 | 60.96 | 48.51 | 69.20 |
| ViT-B4 | √ | × | × | 89.51 | 77.58 | 47.51 | 69.03 | 68.94 | 77.69 | 82.01 | 86.50 | 82.36 | 82.12 | 59.08 | 56.02 | 58.90 | 62.38 | 53.06 | 70.38 |
| ViT-B4 | √ | √ | × | | 79.59 | 48.93 | 71.43 | 72.54 | 80.51 | 87.95 | | 86.09 | | 60.01 | 60.39 | 62.94 | 68.02 | 58.98 | 73.44 |
| ViT-B4 | √ | √ | √ | 89.76 | 79.64 | | 71.57 | | | | 90.53 | 87.73 | 83.05 | 60.19 | 60.34 | 66.03 | 69.75 | 64.95 | 75.60 |
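The final value in each complete row of the ablation table appears to be the mean of the 15 per-category accuracies, i.e., the average detection accuracy quoted in the abstract. A quick check on the ResNet50 row supports this reading; the variable and function names below are illustrative.

```python
# Per-category accuracies for the ResNet50 row (PL ... HC), copied from the table.
resnet50_row = [
    88.03, 74.49, 38.02, 66.34, 60.24, 46.56, 68.20, 86.39,
    77.12, 78.28, 52.50, 61.15, 50.82, 60.21, 49.99,
]


def mean_accuracy(per_category):
    """Arithmetic mean of the per-category accuracies."""
    return sum(per_category) / len(per_category)


resnet50_mean = mean_accuracy(resnet50_row)
print(f"{resnet50_mean:.2f}")  # prints 63.89, matching the row's last column
```

The same check can be applied to any row whose 15 category values are all present.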
Comparison results of various models.
| Model | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | mAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R2CNN (2017) | 80.94 | 65.67 | 35.34 | 67.44 | 59.92 | 50.91 | 55.81 | 90.67 | 66.92 | 72.39 | 55.06 | 52.23 | 55.14 | 53.35 | 48.22 | 60.67 |
| RRPN (2018) | 88.52 | 71.20 | 31.66 | 59.30 | 51.85 | 56.19 | 57.25 | 90.81 | 72.84 | 67.38 | 56.69 | 52.84 | 53.08 | 51.94 | 53.58 | 61.01 |
| RoI- | 88.64 | 78.52 | 43.44 | | 68.81 | 73.68 | 83.59 | 90.74 | 77.27 | 81.46 | 58.39 | 53.54 | 62.83 | 58.93 | 47.67 | 69.56 |
| CADNet (2019) | 87.80 | 82.40 | 49.40 | 73.50 | 71.10 | 63.50 | 76.60 | | 79.20 | 73.30 | 48.40 | 60.90 | 62.00 | 67.00 | 62.20 | 69.90 |
| DRN (2020) | 89.71 | 82.34 | 47.22 | 64.10 | 76.22 | 74.43 | 85.84 | 90.57 | 86.18 | 84.89 | 57.65 | 61.93 | | 69.63 | 58.48 | 73.23 |
| ICN (2018) | 81.40 | 74.30 | 47.70 | 70.30 | 64.90 | 67.80 | 70.00 | 90.80 | 79.10 | 78.20 | 53.60 | 62.90 | 67.00 | 64.20 | 50.20 | 68.20 |
| RADet (2020) | 79.45 | 76.99 | 48.05 | 65.83 | 65.46 | 74.40 | 68.86 | 89.70 | 78.14 | 74.97 | 49.92 | 64.63 | 66.14 | 71.58 | 62.16 | 69.09 |
| SCRDet (2019) | | 80.65 | 52.09 | 68.36 | 68.36 | 60.32 | 72.41 | 90.85 | | | 65.02 | 66.68 | 66.25 | 68.24 | 65.21 | 72.61 |
| MFIAR-Net (2020) | 89.62 | 84.03 | 52.41 | 70.30 | 70.13 | 67.64 | 77.81 | 90.85 | 85.40 | 86.22 | 63.21 | 64.14 | 68.31 | 70.21 | 62.11 | 73.49 |
| IRetinaNet (2021) | 88.70 | 82.46 | 52.81 | 68.75 | 78.51 | 81.45 | 86.41 | 90.02 | 85.37 | 86.31 | | 65.20 | 67.80 | 69.29 | 64.83 | 75.53 |
| PolarDet (2021) | 89.73 | | 45.30 | 63.32 | 78.44 | 76.65 | 87.13 | 90.79 | 80.58 | 85.89 | 60.97 | | 68.20 | | | 75.02 |
| S2A-Net (2021) | 89.11 | 82.84 | 48.37 | 71.11 | 78.11 | 78.39 | 87.25 | 90.83 | 84.90 | 85.64 | 60.36 | 62.60 | 65.26 | 69.13 | 57.94 | 74.12 |
| RODFormer | 89.76 | 79.64 | | 71.57 | | | | 90.53 | 87.73 | 83.05 | 60.19 | 60.34 | 66.03 | 69.75 | 64.95 | 75.60 |
Figure 7. Comparison of RODFormer and IRetinaNet.
Figure 8. Visualization results of some detected objects.