Yue Teng1,2, Jie Zhang1, Shifeng Dong1,2, Shijian Zheng3, Liu Liu4.
Pest disaster severely reduces crop yield and recognizing them remains a challenging research topic. Existing methods have not fully considered the pest disaster characteristics including object distribution and position requirement, leading to unsatisfactory performance. To address this issue, we propose a robust pest detection network by two customized core designs: multi-scale super-resolution (MSR) feature enhancement module and Soft-IoU (SI) mechanism. The MSR (a plug-and-play module) is employed to improve the detection performance of small-size, multi-scale, and high-similarity pests. It enhances the feature expression ability by using a super-resolution component, a feature fusion mechanism, and a feature weighting mechanism. The SI aims to emphasize the position-based detection requirement by distinguishing the performance of different predictions with the same Intersection over Union (IoU). In addition, to prosper the development of agricultural pest detection, we contribute a large-scale light-trap pest dataset (named LLPD-26), which contains 26-class pests and 18,585 images with high-quality pest detection and classification annotations. Extensive experimental results over multi-class pests demonstrate that our proposed method achieves the best performance by 67.4% of mAP on the LLPD-26 while being 15.0 and 2.7% gain than state-of-the-art pest detection AF-RCNN and HGLA respectively. Ablation studies verify the effectiveness of the proposed components.Entities:
Keywords: Soft-IoU; agricultural pest detection; convolutional neural network; feature enhancement; wisdom agriculture
Year: 2022 PMID: 35310676 PMCID: PMC8927730 DOI: 10.3389/fpls.2022.810546
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1 Schematic diagram of different prediction bounding boxes with the same Intersection over Union (IoU). (A) The prediction box contains all object pixels. (B) The prediction box contains almost all object pixels. (C) The prediction box contains pixels of another category (motorbike). (D) Most of the pixels in the prediction box belong to another category (motorbike).
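The situation Figure 1 illustrates, where a plain IoU score cannot separate a prediction that misses object pixels from one that swallows background, can be reproduced with a small numeric sketch (purely illustrative; the helper below is not the paper's implementation):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union

gt = (0.0, 0.0, 100.0, 100.0)            # ground-truth box
inside = (10.0, 10.0, 90.0, 90.0)        # prediction fully inside the object
around = (-12.5, -12.5, 112.5, 112.5)    # prediction fully enclosing the object

# Both predictions score IoU = 0.64 against the ground truth, yet they make
# opposite errors: one misses object pixels, the other includes background.
# A score based on IoU alone cannot separate these two cases, which is the
# motivation for a position-aware mechanism such as the paper's Soft-IoU.
print(iou(gt, inside), iou(gt, around))  # 0.64 0.64
```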
Figure 2 The overall framework of the MSR-RCNN.
Figure 3 The super-resolution feature enhancement component.
Figure 4 The feature full fusion mechanism.
Figure 5 The feature full weighting mechanism.
The overall performance comparison.

| Method | MSR | SI | mAP | AP50 | AP75 | Recall |
|---|---|---|---|---|---|---|
| Faster R-CNN (Ren et al.) | | | 35.4 | 62.3 | 37.7 | 50.5 |
| Cascade R-CNN (Cai and Vasconcelos) | | | 36.0 | 62.6 | 38.5 | 50.2 |
| Libra R-CNN (Pang et al.) | | | 37.4 | 65.2 | 40.2 | 52.8 |
| FCOS (Tian et al.) | | | 33.3 | 57.4 | 36.2 | – |
| RetinaNet (Lin et al.) | | | 27.9 | 48.8 | 29.4 | 53.1 |
| AF-RCNN (Jiao et al.) | | | 33.1 | 58.6 | 34.6 | 48.8 |
| HGLA (Liu et al.) | | | 37.0 | 65.6 | 38.3 | 52.0 |
| MSR-RCNN | √ | | 38.0 | 66.9 | 40.0 | 52.4 |
| MSR-RCNN | √ | √ | – | 67.4 | – | 52.0 |
Per-category comparison results on our LLPD-26 dataset (AP50).

| No. | Faster R-CNN | Cascade R-CNN | Libra R-CNN | RetinaNet | AF-RCNN | HGLA | MSR-RCNN |
|---|---|---|---|---|---|---|---|
| 1 | 16.1 | 19.2 | **21.7** | 4.5 | 12.8 | 20.1 | 20.4 |
| 2 | 58.7 | 58.9 | 63.5 | 54.4 | 58.7 | 63.3 | **66.1** |
| 3 | 70.2 | 67.9 | 70.1 | 60.9 | 65.5 | 71.7 | **72.8** |
| 4 | 69.6 | 69.4 | 70.9 | 58.0 | 66.0 | **72.8** | 72.3 |
| 5 | 84.9 | 85.2 | 85.0 | 80.7 | 83.5 | **86.1** | 85.8 |
| 6 | 72.1 | 71.1 | 74.4 | 66.0 | 70.4 | 76.2 | **77.4** |
| 7 | 72.5 | 71.9 | 73.4 | 62.4 | 70.9 | 74.0 | **74.5** |
| 8 | 62.0 | 60.6 | **66.7** | 57.5 | 59.4 | 66.1 | 65.5 |
| 9 | 47.5 | 47.5 | 50.9 | 43.0 | 47.3 | 51.9 | **53.5** |
| 10 | 70.9 | 70.5 | 74.2 | 59.6 | 68.5 | 74.2 | **77.2** |
| 11 | 79.3 | 78.2 | 80.3 | 73.2 | 76.1 | **81.6** | 81.0 |
| 12 | 27.7 | 26.9 | 26.7 | 0.1 | 25.5 | 29.5 | **32.3** |
| 13 | 55.3 | **58.3** | 54.6 | 41.5 | 53.4 | 55.4 | 56.8 |
| 14 | 66.7 | 64.5 | 66.4 | 57.3 | 62.0 | 67.4 | **67.5** |
| 15 | 39.8 | 45.3 | 47.3 | 8.1 | 33.1 | 45.2 | **48.0** |
| 16 | 40.2 | 45.2 | **51.7** | 7.5 | 33.0 | 50.7 | 49.6 |
| 17 | 57.9 | 65.1 | 66.8 | 15.0 | 55.4 | **70.8** | 70.6 |
| 18 | 56.1 | 58.7 | 60.5 | 35.9 | 55.0 | 58.0 | **63.3** |
| 19 | 56.6 | 58.4 | 64.9 | 54.3 | 57.7 | 61.9 | **65.1** |
| 20 | 83.0 | 82.7 | 82.1 | 78.1 | 80.6 | **83.7** | 83.3 |
| 21 | 89.5 | 89.5 | 89.5 | 86.9 | 87.5 | **90.0** | **90.0** |
| 22 | 93.1 | 92.4 | **94.4** | 93.8 | 91.7 | **94.4** | **94.4** |
| 23 | 59.9 | 51.7 | 63.2 | 54.1 | 54.1 | 61.2 | **63.9** |
| 24 | 72.8 | 73.3 | **74.9** | 64.1 | 71.4 | 74.8 | 74.0 |
| 25 | 53.3 | 49.7 | 54.8 | 1.2 | 14.8 | 50.0 | **56.4** |
| 26 | 64.8 | **70.4** | 65.0 | 49.6 | 68.2 | 70.2 | 63.1 |
| Mean | 62.3 | 62.8 | 65.2 | 48.8 | 58.6 | 65.6 | **66.9** |

Bold values indicate the best performance.
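Each per-class value above is an AP50 score, and the Mean row averages the 26 classes into the reported mAP. As a reminder of how a single class's AP is obtained, here is a generic VOC-style all-point-interpolation sketch (not the paper's evaluation code; the example detections are hypothetical):

```python
def average_precision(scores, is_tp, n_gt):
    """AP for one class: rank detections by confidence, build the
    precision-recall curve, and integrate its upper envelope."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # All-point interpolation: area under the precision envelope.
    ap, prev_r = 0.0, 0.0
    for k, r in enumerate(recalls):
        p_max = max(precisions[k:])   # best precision at recall >= r
        ap += (r - prev_r) * p_max
        prev_r = r
    return ap

# Three detections for one class, two ground-truth objects; the first- and
# third-ranked detections match a ground truth at IoU >= 0.5 (hence AP50).
ap = average_precision([0.9, 0.8, 0.7], [True, False, True], n_gt=2)
print(ap)  # 0.5 * 1.0 + 0.5 * (2/3) ≈ 0.833
```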
Figure 6 Improved performance of our MSR-RCNN on pest data of different sizes.
Figure 7 The training loss and mAP50. (A) Comparison of training loss. (B) Comparison of test accuracy.
Figure 8 Ablation of β in the Soft-IoU (SI).
Performance comparison of the MSR-RCNN network using different backbones.

| Metric | Backbone A | Backbone B | Backbone C |
|---|---|---|---|
| AP50 | 66.1 | 66.3 | **66.7** |
| AP75 | **40.0** | 39.4 | 39.6 |
| mAP | 37.4 | **37.8** | – |

Bold values indicate the best performance.
The performance of MSR with various detection methods.

| Method | MSR | mAP | AP50 | AP75 | Recall |
|---|---|---|---|---|---|
| Faster R-CNN (Ren et al.) | | 34.8 | 61.8 | 36.1 | 51.5 |
| Faster R-CNN + FPN (Lin et al.) | | 35.4 | 62.3 | 37.7 | 50.5 |
| Faster R-CNN + MSR | √ | – | – | – | – |
| Cascade R-CNN + FPN (Cai and Vasconcelos) | | 36.0 | 62.6 | 38.5 | 50.2 |
| Cascade R-CNN + MSR | √ | – | – | – | – |
| FCOS + FPN (Tian et al.) | | 33.1 | 57.0 | 35.9 | – |
| FCOS + MSR | √ | – | – | – | 54.8 |
| RetinaNet + FPN (Lin et al.) | | 27.9 | 48.8 | 29.4 | – |
| RetinaNet + MSR | √ | – | – | – | 52.7 |

Bold values indicate the best performance.
Detection performance comparison on general object detection datasets.

| Dataset | Method | Backbone | mAP | AP50 | AP75 | APs | APm |
|---|---|---|---|---|---|---|---|
| PASCAL VOC | Faster R-CNN | ResNet-50 | – | 81.0 | – | – | – |
| PASCAL VOC | MSR-RCNN | ResNet-50 | – | 81.8 | – | – | – |
| COCO | Faster R-CNN | ResNet-50 | 37.4 | 58.1 | 40.4 | 21.2 | 41.0 |
| COCO | MSR-RCNN | ResNet-50 | 37.5 | 59.8 | 40.0 | 21.7 | 41.4 |

Reproduced results were obtained with the MMDetection toolbox.
Figure 9 Performance comparison between MSR-RCNN and Faster R-CNN on different datasets.
Figure 10 Visualization results.