Yan Dong1, Yundong Liu1, Haonan Kang2, Chunlei Li1, Pengcheng Liu3, Zhoufeng Liu1.
Abstract
Advancements in deep neural networks have enabled remarkable leaps forward in crop detection. However, detecting wheat ears remains an important yet challenging task due to complex backgrounds, dense targets, and overlaps between wheat ears. Many current detectors have made significant progress in improving detection accuracy, but some of them cannot strike a good balance between computational cost and precision to meet the needs of real-world deployment. To address these issues, a lightweight and efficient wheat ear detector with Shuffle Polarized Self-Attention (SPSA) is proposed in this paper. Specifically, we first utilize a lightweight backbone network with asymmetric convolution for effective feature extraction. Next, SPSA attention is applied to adaptively select focused positions and produce a more discriminative representation of the features. This strategy introduces polarized self-attention into the spatial and channel dimensions and adopts Shuffle Units to combine the two types of attention mechanisms effectively. Finally, the TanhExp activation function is adopted to accelerate inference and reduce training time, and CIOU loss is used as the bounding-box regression loss to enhance the detection of occluded and overlapping targets. Experimental results on the Global Wheat Head Detection dataset show that our method achieves superior detection performance compared with other state-of-the-art approaches.
Keywords: Deep neural networks; Lightweight; Polarized self-attention; Wheat ears
Year: 2022 PMID: 35494849 PMCID: PMC9044259 DOI: 10.7717/peerj-cs.931
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. The overall structure of LE-SPSANet.
(A) Asymmetric backbone with SPSA attention; (B) the neck adopts a PANet-like structure; (C) three YOLO detection heads use the feature maps from the encoder blocks in the neck; (D) the ASB and TanhExp components are detailed in Fig. 2C and Eq. (1), respectively.
Figure 2. Different types of basic convolution blocks.
(A) Bottleneck. (B) Inverted bottleneck. (C) AsymmBlock.
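The exact AsymmBlock (ASB) used in the backbone is the one depicted in Fig. 2C and is not reproduced here. For orientation only, the two generic patterns it is contrasted with in Figs. 2A and 2B, the classic bottleneck and the MobileNetV2-style inverted bottleneck, can be sketched in PyTorch as follows; channel counts and layer choices are illustrative assumptions, not the paper's configuration.

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Classic ResNet-style bottleneck (the Fig. 2A pattern): 1x1 reduce -> 3x3 -> 1x1 expand."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.block = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False), nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Residual connection around the reduce-transform-expand path.
        return self.relu(x + self.block(x))

class InvertedBottleneck(nn.Module):
    """MobileNetV2-style inverted bottleneck (the Fig. 2B pattern):
    1x1 expand -> 3x3 depthwise -> 1x1 project, residual only when shapes match."""
    def __init__(self, channels, expansion=4, stride=1):
        super().__init__()
        mid = channels * expansion
        self.use_res = stride == 1
        self.block = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False), nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_res else out
```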
Backbone network structure.
| Stage | Input | Type | Exp size | Out | SPSA | Stride |
|---|---|---|---|---|---|---|
| | 1024×1024×3 | conv2d | – | 16 | – | 2 |
| P1 | 512×512×16 | ASB, 3×3 | 16 | 16 | – | 1 |
| | 512×512×16 | ASB, 3×3 | 64 | 24 | – | 2 |
| P2 | 256×256×24 | ASB, 3×3 | 72 | 24 | – | 1 |
| | 256×256×24 | ASB, 5×5 | 72 | 40 | + | 2 |
| | 128×128×40 | ASB, 5×5 | 120 | 40 | + | 1 |
| P3 | 128×128×40 | ASB, 5×5 | 120 | 40 | + | 1 |
| | 128×128×40 | ASB, 3×3 | 240 | 80 | – | 2 |
| | 64×64×80 | ASB, 3×3 | 200 | 80 | – | 1 |
| | 64×64×80 | ASB, 3×3 | 184 | 80 | – | 1 |
| | 64×64×80 | ASB, 3×3 | 184 | 80 | – | 1 |
| | 64×64×80 | ASB, 3×3 | 480 | 112 | + | 1 |
| P4 | 64×64×112 | ASB, 3×3 | 672 | 112 | + | 1 |
| | 64×64×112 | ASB, 5×5 | 672 | 160 | + | 2 |
| | 32×32×160 | ASB, 5×5 | 960 | 160 | + | 1 |
| P5 | 32×32×160 | ASB, 5×5 | 960 | 160 | + | 1 |
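As a small sanity check of the table (a minimal illustrative script, not from the paper), the per-layer strides above determine the spatial resolution of each stage for a 1024 × 1024 input:

```python
# Per-layer strides, listed in the same order as the backbone table above.
strides = [2, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1]

size = 1024
for i, s in enumerate(strides, start=1):
    size //= s  # each stride-2 layer halves the feature map
    print(f"layer {i:2d}: stride {s} -> {size}x{size}")
# The cumulative downsampling yields 512, 256, 128, 64 and 32 pixel maps,
# i.e. overall strides of 2, 4, 8, 16 and 32 (stages P1-P5) relative to the input.
```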
Figure 3. Different activation functions.
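The TanhExp activation adopted by the detector (referenced as Eq. (1) in the Figure 1 caption) is one of the functions compared in Fig. 3. For convenience, its standard definition from the TanhExp literature is reproduced below; this is the generally published form, not a transcription of the paper's own equation.

```latex
\mathrm{TanhExp}(x) = x \, \tanh\!\left(e^{x}\right)
```

For positive inputs the function quickly approaches the identity, while negative inputs are smoothly suppressed toward zero, which is what makes it attractive for fast-converging lightweight networks.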
Figure 4. An overview of the proposed SPSA module.
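The precise wiring of the SPSA module is given in Fig. 4; since only the high-level description (polarized self-attention in the channel and spatial dimensions, combined via Shuffle Units) is available here, the following PyTorch sketch shows the two standard polarized self-attention branches plus a ShuffleNet-style channel shuffle. It is an illustrative reconstruction under those assumptions, not the paper's implementation; the layer names and the way the branches are fused are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def channel_shuffle(x, groups=2):
    """ShuffleNet-style channel shuffle: interleave channels across groups."""
    b, c, h, w = x.size()
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

class PolarizedSelfAttention(nn.Module):
    """Channel-only and spatial-only polarized self-attention branches
    (after Liu et al., 2021), applied sequentially to the input."""
    def __init__(self, channels):
        super().__init__()
        c = channels
        # channel-only branch
        self.ch_wq = nn.Conv2d(c, 1, kernel_size=1)
        self.ch_wv = nn.Conv2d(c, c // 2, kernel_size=1)
        self.ch_wz = nn.Conv2d(c // 2, c, kernel_size=1)
        self.ln = nn.LayerNorm(c)
        # spatial-only branch
        self.sp_wq = nn.Conv2d(c, c // 2, kernel_size=1)
        self.sp_wv = nn.Conv2d(c, c // 2, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.size()
        # --- channel-only attention: produces a (B, C, 1, 1) gate ---
        q = F.softmax(self.ch_wq(x).view(b, -1, 1), dim=1)   # (B, HW, 1)
        v = self.ch_wv(x).view(b, c // 2, -1)                 # (B, C/2, HW)
        z = self.ch_wz(torch.matmul(v, q).unsqueeze(-1))      # (B, C, 1, 1)
        ch_att = torch.sigmoid(self.ln(z.view(b, c))).view(b, c, 1, 1)
        x = x * ch_att
        # --- spatial-only attention: produces a (B, 1, H, W) gate ---
        q = F.adaptive_avg_pool2d(self.sp_wq(x), 1).view(b, 1, c // 2)
        q = F.softmax(q, dim=-1)                               # (B, 1, C/2)
        v = self.sp_wv(x).view(b, c // 2, -1)                  # (B, C/2, HW)
        sp_att = torch.sigmoid(torch.matmul(q, v).view(b, 1, h, w))
        x = x * sp_att
        # Mix the attended channels, loosely echoing the Shuffle-Unit idea.
        return channel_shuffle(x, groups=2)
```

For example, `PolarizedSelfAttention(40)` could be dropped after a P3-stage block with 40 output channels; the module preserves the input shape.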
Figure 5. IOU diagram.
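Figure 5 illustrates the overlap between a predicted box B and its ground truth B^gt. For reference, the standard IoU and CIOU loss definitions (following the commonly published formulation of Zheng et al., not transcribed from this paper's equations) are:

```latex
\mathrm{IoU} = \frac{\lvert B \cap B^{gt} \rvert}{\lvert B \cup B^{gt} \rvert},
\qquad
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU}
  + \frac{\rho^{2}\!\left(\mathbf{b}, \mathbf{b}^{gt}\right)}{c^{2}} + \alpha v,

v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2},
\qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
```

Here b and b^gt are the centers of the predicted and ground-truth boxes, ρ(·) is the Euclidean distance between them, and c is the diagonal length of the smallest box enclosing both; the extra terms penalize center offset and aspect-ratio mismatch, which helps separate occluded and overlapping targets.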
Some of the image characteristics of the sub-datasets composing the GWHD dataset.
| Sub-dataset name | Target stage | Row spacing (cm) | Seeding density (seeds/m²) | Focal length (mm) |
|---|---|---|---|---|
| UTokyo_1 | Post-flowering | 15.0 | 186 | 10.0 |
| UTokyo_2 | Flowering | 12.5 | 200 | 7.0 or 4.0 |
| Arvalis_1 | Post-flowering-Ripening | 17.5 | 300 | 50.0 and 60.0 |
| Arvalis_2 | Post-flowering | 17.5 | 300 | 7.7 |
| Arvalis_3 | Post-flowering-Ripening | 17.5 | 300 | 7.7 |
| INRAE_1 | Post-flowering-Ripening | 16.0 | 300 | 7.7 |
| USask_1 | n.a. | 30.5 | 250 | 16.0 |
| RRes_1 | n.a. | n.a. | 350 | 50.0 |
| ETHZ_1 | n.a. | 12.5 | 400 | 35.0 |
| NAU_1 | Flowering | 20.0 | 300 or 450 | 24.0 |
| UQ_1 | Flowering-Ripening | 22.0 | 150 | 55.0 |
Hardware and software configuration.
| Name | Parameter |
|---|---|
| System | Ubuntu 18.04 |
| GPU | NVIDIA GTX 1080 Ti (12 GB) |
| CUDA | 10.0 |
| cuDNN | 7.3.1 |
| PyTorch | 1.8.1 |
Figure 6. (A–F) Visualization results.
The results of each detection algorithm.
| Algorithm | Input size | Model size (MB) | FPS (frames/s) | mAP (%) |
|---|---|---|---|---|
| EfficientDet (D0) | 512×512 | 15.07 | 14 | 85.5 |
| YOLOv4 | 416×416 | 244.29 | 14 | 91.2 |
| RetinaNet | 416×416 | 138.91 | 16 | 91.9 |
| YOLOv3 | 416×416 | 235.04 | 39 | 92.1 |
| YOLOv5(x) | 1024×1024 | 177.5 | 10 | 93.6 |
| LE-SPSANet | 1024×1024 | 9.0 | 25 | 94.4 |
Comparison of ablation results.
| No. | AsymmNet | CIOU | SPSA | TanhExp | Model size (MB) | FPS (frames/s) | mAP (%) |
|---|---|---|---|---|---|---|---|
| 1 | – | – | – | – | 177.5 | 10 | 93.6 |
| 2 | + | – | – | – | 12.9 | 23 | 91.7 |
| 3 | + | + | – | – | 12.9 | 27 | 93.5 |
| 4 | + | + | + | – | 9.0 | 21 | 94.3 |
| 5 | + | + | + | + | 9.0 | 25 | 94.4 |
Figure 7. Comparison of the effects of different modules on wheat ear detection.
The three pictures in each row (representing high overlap, densely packed wheat ears, and similar backgrounds, respectively) form a group, namely Group 1–Group 5.