Yukuan Liu, Guanglin He, Zehu Wang, Weizhe Li, Hongfei Huang.
Abstract
To address the tiny objects and high image resolution encountered in remote sensing object detection, methods based on coarse-grained image cropping have been widely studied. However, these methods are often inefficient and complex because of their two-stage architecture and the heavy computation required for split images. This article therefore builds on YOLO and presents an improved architecture, NRT-YOLO. The improvements are: an extra prediction head with its related feature fusion layers; a novel nested residual Transformer module, C3NRT; a nested residual attention module, C3NRA; and multi-scale testing. The C3NRT module boosts accuracy while reducing the complexity of the network. The effectiveness of the proposed method is demonstrated by three kinds of experiments. NRT-YOLO achieves 56.9% mAP0.5 with only 38.1 M parameters on the DOTA dataset, exceeding YOLOv5l by 4.5 percentage points. The per-class results also show its strong ability to detect small sample objects. For the C3NRT module, the ablation study and the comparison experiment verify that it makes the largest contribution to the accuracy increment (2.7 percentage points in mAP0.5) among the improvements. In conclusion, NRT-YOLO improves accuracy while reducing the number of parameters, which makes it suitable for tiny remote sensing object detection.
Keywords: YOLOv5; nested residual transformer; remote sensing imagery; tiny object detection
Year: 2022 PMID: 35808445 PMCID: PMC9269754 DOI: 10.3390/s22134953
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. The architecture of YOLOv5 v6.1.
Figure 2. The structures of C3 and BottleNeck.
Figure 3. The architecture of the proposed NRT-YOLO.
Figure 4. The structures of C3NRT and NRT.
Figure 5. The comparison between C3NRT and other structures.
Figure 6. The structure of C3NRA.
Figure 7. Sample images in DOTA.
Figure 8. Distribution of classes and sizes of objects in DOTAv1.5: (a) Distribution histogram of the classes of the labels; (b) Heat map of the size of the objects.
Comparison results between NRT-YOLO and other YOLO methods.
| Method | P (%) | R (%) | mAP0.5 (%) | mAP (%) | Parameters (M) | GFLOPs | Latency (ms) |
|---|---|---|---|---|---|---|---|
| YOLOv4 | 76.8 | 43.2 | 48.1 | 27.6 | 26.9 | 51.0 | 37.8 |
| YOLOv5m | 79.0 | 46.2 | 51.0 | 30.2 | | | |
| YOLOv5l | 79.3 | 47.6 | 52.4 | 31.4 | 46.2 | 108.0 | 41.9 |
| YOLOv5x | | 49.2 | 54.1 | 32.2 | 86.3 | 204.3 | 51.6 |
| NRT-YOLO | 78.1 | | **56.9** | **33.5** | 38.1 | 115.2 | 48.7 |
* The best results of every metric are bolded.
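The mAP0.5 column counts a prediction as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal Python sketch of that matching criterion, not the evaluation code used by the authors; the function names `iou` and `is_true_positive` and the corner-format boxes `(x1, y1, x2, y2)` are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: a prediction matches if IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


# A 10 x 10 pixel object and the same box shifted right by 2 and by 4 pixels.
gt = (0, 0, 10, 10)
shift_2 = (2, 0, 12, 10)
shift_4 = (4, 0, 14, 10)
print(is_true_positive(shift_2, gt), is_true_positive(shift_4, gt))
```

For a 10 × 10 pixel object, a 4-pixel shift already lowers the IoU to about 0.43, below the 0.5 threshold, which illustrates why tiny objects are sensitive to localization error.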
Comparison results of COCO evaluation.
| Method | mAPsmall (%) | mAPmedium (%) | mAPlarge (%) |
|---|---|---|---|
| YOLOv4 | 6.7 | 21.5 | 39.1 |
| YOLOv5m | 8.3 | 23.9 | 42.5 |
| YOLOv5l | 9.0 | 25.8 | 43.2 |
| YOLOv5x | 10.6 | 26.8 | 46.6 |
| NRT-YOLO | | | |
* The best results of every metric are bolded.
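The small, medium and large columns follow the COCO convention of grouping objects by pixel area, with boundaries at 32² and 96² pixels. The sketch below shows that bucketing in Python. It is an illustration under assumptions: the function name `coco_size_bucket` is hypothetical, the area is approximated by the bounding-box area, and the handling of objects exactly on a boundary is a simplification.

```python
SMALL_MAX_AREA = 32 ** 2   # 1024 square pixels
MEDIUM_MAX_AREA = 96 ** 2  # 9216 square pixels


def coco_size_bucket(width, height):
    """Return 'small', 'medium' or 'large' for an object of the given pixel size.

    Assumption: bounding-box area stands in for the object area.
    """
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area <= MEDIUM_MAX_AREA:
        return "medium"
    return "large"


for size in [(10, 10), (50, 50), (100, 100)]:
    print(size, coco_size_bucket(*size))
```

Under this grouping, any object narrower than 32 pixels on both sides falls into the small bucket, the category with the lowest scores for every baseline in the table.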
Comparison results (mAP0.5) of different classifications between NRT-YOLO and other YOLO methods.
| Class | YOLOv4 | YOLOv5m | YOLOv5l | YOLOv5x | NRT-YOLO |
|---|---|---|---|---|---|
| Small vehicle | 28.6 | 31.6 | 31.9 | 32.1 | **33.2** |
| Large vehicle | 66.0 | 69.0 | 68.8 | 68.7 | **71.7** |
| Plane | 77.2 | 79.9 | 80.0 | 79.3 | |
| Storage tank | 42.4 | 50.2 | 44.6 | 46.4 | **52.2** |
| Ship | 56.9 | 58.8 | 61.0 | 61.5 | **65.6** |
| Harbor | 69.3 | 71.2 | 72.0 | | 72.4 |
| Ground track field | 32.8 | 35.9 | 41.8 | 40.3 | |
| Soccer ball field | 38.2 | 38.2 | 38.6 | | 38.1 |
| Tennis court | 90.4 | 93.0 | 92.8 | | 93.0 |
| Swimming pool | 49.9 | 53.7 | 54.7 | 57.1 | |
| Baseball diamond | 59.1 | 61.4 | 68.2 | 69.9 | |
| Roundabout | 36.7 | 37.0 | 42.2 | | 43.3 |
| Basketball-court | 45.2 | 47.8 | 49.3 | 51.0 | |
| Bridge | 27.4 | 34.9 | 35.6 | | 39.9 |
| Helicopter | 1.5 | 2.5 | 5.0 | 5.6 | **27.7** |
* The best results of each classification are bolded.
Figure 9. The detection examples: (a) Detection result of an example from DOTA validation set using YOLOv5l; (b) Detail view of (a); (c) Detection of the same picture; (d) Detail view of (c).
Results of ablation study.
| Method | mAP0.5 (%) | mAP (%) | Parameters (M) | GFLOPs | Latency (ms) |
|---|---|---|---|---|---|
| YOLOv5l-1024 | 52.4 | 31.4 | 46.2 | 108.0 | 41.9 |
| +prediction head | 53.4 (+1.0) * | 31.5 (+0.1) | 47.2 | 127.8 | 47.9 |
| +C3NRT | 56.1 (+2.7) | 33.4 (+1.9) | 44.5 | 126.0 | 49.9 |
| +C3NRA | 56.0 (−0.1) | 33.4 (−−) | 37.9 | 113.8 | 48.7 |
| +ms testing | 56.9 (+0.9) | 33.5 (+0.1) | 37.9 | 113.8 | 48.7 |
* The increments of mAP0.5 and mAP are noted in parentheses.
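The increments in parentheses are differences between consecutive rows, and their sum gives the overall gain over the YOLOv5l baseline. The short Python check below reproduces that arithmetic from the mAP0.5 values in the table; the variable and function names are illustrative only.

```python
# mAP0.5 (%) after each cumulative step, copied from the ablation table.
ablation_map50 = [
    ("YOLOv5l-1024", 52.4),
    ("+prediction head", 53.4),
    ("+C3NRT", 56.1),
    ("+C3NRA", 56.0),
    ("+ms testing", 56.9),
]


def step_increments(rows):
    """Difference of each row from the previous one, rounded to one decimal."""
    return [
        (name, round(value - prev_value, 1))
        for (_, prev_value), (name, value) in zip(rows, rows[1:])
    ]


increments = step_increments(ablation_map50)
total_gain = round(ablation_map50[-1][1] - ablation_map50[0][1], 1)

for name, delta in increments:
    print(f"{name}: {delta:+.1f}")
print(f"total gain over baseline: {total_gain:+.1f}")
```

The per-step differences match the values noted in the table, with C3NRT contributing the largest single step, and the total agrees with the 4.5-point gain over YOLOv5l.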
Ablation study of different classifications (mAP0.5).
| Class | YOLOv5l | +Detection Head | +C3NRT | +C3NRA | +ms Testing |
|---|---|---|---|---|---|
| Small vehicle | 31.9 | | 33.4 | 33.3 | 33.2 |
| Large vehicle | 68.8 | | 71.7 | 70.9 | 71.7 |
| Plane | 80.0 | 80.6 | 80.8 | 81.2 | |
| Storage tank | 44.6 | | | 53.7 | 52.2 |
| Ship | 61.0 | | | 66.3 | 65.6 |
| Harbor | 72.0 | | 74.6 | 74.6 | 72.4 |
| Ground track field | 41.8 | 42.0 | | 44.7 | |
| Soccer ball field | 38.6 | 36.0 | | 37.6 | 38.1 |
| Tennis court | 92.8 | 91.7 | | 92.6 | 93.0 |
| Swimming pool | 54.7 | 54.4 | | 54.7 | |
| Baseball diamond | 68.2 | 67.8 | | 68.9 | |
| Roundabout | 42.2 | 37.9 | | 40.5 | **43.3** |
| Basketball-court | 49.3 | | | 52.7 | |
| Bridge | 35.6 | | | 39.7 | 39.9 |
| Helicopter | 5.0 | 6.2 | | | 27.7 |
* The data with improvement over 1% compared to previous measure is bolded.
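The last column reports multi-scale testing. The record does not describe how the authors implement it, so the following is only a generic Python sketch of one common approach: run the detector on several rescaled copies of the image, map the boxes back to the original coordinates, and merge them with greedy non-maximum suppression. The names `multi_scale_detect`, `nms` and `fake_detector`, the scale set, and the class-agnostic merging are all assumptions.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy, class-agnostic NMS over (x1, y1, x2, y2, score) tuples."""
    kept = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        if all(iou(det[:4], k[:4]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


def multi_scale_detect(detect_fn, scales=(0.5, 1.0, 2.0), iou_threshold=0.5):
    """detect_fn(scale) is a hypothetical callable that returns detections
    in the pixel coordinates of the image resized by `scale`."""
    merged = []
    for scale in scales:
        for x1, y1, x2, y2, score in detect_fn(scale):
            # Map the box back to the coordinates of the original image.
            merged.append((x1 / scale, y1 / scale, x2 / scale, y2 / scale, score))
    return nms(merged, iou_threshold)


def fake_detector(scale):
    """Stand-in detector: finds one object, more confidently at larger scales."""
    base = (100.0, 100.0, 120.0, 120.0)
    score = {0.5: 0.6, 1.0: 0.8, 2.0: 0.9}[scale]
    return [tuple(c * scale for c in base) + (score,)]


result = multi_scale_detect(fake_detector)
print(result)
```

In this toy run the three per-scale detections of the same object collapse into a single box carrying the highest score, which is the intended effect of merging across scales.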
Comparison results between C3NRT and other Transformer blocks.
| Method | P (%) | R (%) | mAP0.5 (%) | mAP (%) | Parameters (M) | GFLOPs |
|---|---|---|---|---|---|---|
| C3-Trans | 78.8 | 49.1 | 53.5 | 31.6 | 46.7 | |
| MixBlock | 80.1 | | 56.1 | | 59.0 | 138.7 |
| C3NRT | | 50.2 | **56.1** | 33.4 | **44.5** | 126.0 |
* The best results of every metric are bolded.