Weikuan Jia, Jinmeng Wei, Qi Zhang, Ningning Pan, Yi Niu, Xiang Yin, Yanhui Ding, Xinting Ge.
Abstract
Fruit- and vegetable-picking robots operate in complex orchard environments, which degrade the recognition and segmentation performance of their vision systems. The orchard environment is complex and changeable: changes in light intensity can blur the surface characteristics of the target fruit, and fruits often overlap one another or are occluded by branches and leaves, leaving their shapes incomplete and difficult to identify and segment one by one. To address these difficulties, a two-stage instance segmentation method based on an optimized mask region-based convolutional neural network (mask RCNN) is proposed. The new model adopts the lightweight MobileNetv3 backbone, which not only speeds up the model but also substantially improves its accuracy and meets the storage-resource constraints of mobile picking robots. To further improve segmentation quality, a boundary patch refinement (BPR) post-processing module is added to refine the rough mask boundaries of the model output and reduce erroneous boundary pixels. The resulting model combines a high recognition rate with an efficient segmentation strategy, improving robustness and stability. The method is validated on a green persimmon dataset: the optimized mask RCNN achieves a mean average precision (mAP) of 76.3% and a mean average recall (mAR) of 81.1%, improvements of 3.1 and 3.7 percentage points, respectively, over the baseline mask RCNN. Experiments show that the new model delivers higher accuracy and segmentation quality and can be widely deployed in smart agriculture.
Keywords: MobileNetv3; boundary patch refinement; green fruit; instance segmentation; mask RCNN
Year: 2022 PMID: 36035694 PMCID: PMC9399748 DOI: 10.3389/fpls.2022.955256
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
Overview of previous research work.
| Method | Description | Representative work | Advantages | Disadvantages |
|---|---|---|---|---|
| Support Vector Machine (SVM) | SVM is a linear binary classifier defined by the maximum-margin separating hyperplane in feature space | Fruit detection and grading algorithm (Bhargava and Bansal) | A. Strong robustness; B. High accuracy | A. Cannot directly perform multi-class classification |
| Cluster Analysis (CA) | CA groups a dataset into multiple classes of similar objects | K-means clustering algorithm (Pham and Lee) | A. Fast convergence; B. Simple and efficient | A. Poor anti-interference ability; B. Cluster centers require manual intervention, which easily introduces subjective errors |
| Convolutional Neural Network (CNN) | CNN is a deep neural network with a convolutional structure that can be applied directly to image processing | Deep leaf algorithm (Triki et al.) | A. Anchor-based methods achieve high accuracy; B. Anchor-free methods have lower complexity and higher speed | A. Difficult to balance speed and accuracy; B. Poor recognition of small and occluded objects |
| Generative Adversarial Network (GAN) | A generator network and a discriminator network compete to learn the data distribution | GAN data augmentation (Bird et al.) | A. Fast sample generation; B. Reduced data-preparation effort | A. Unstable training; B. Difficult convergence |
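The cluster-analysis row can be illustrated with a minimal k-means sketch that separates fruit-like from branch-like pixel colours. The RGB values, blob parameters, and the deterministic farthest-point initialization below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def kmeans2(points, iters=20):
    """Minimal two-cluster k-means with a deterministic
    farthest-point initialization. Returns (centers, labels)."""
    # init: first point, plus the point farthest from it
    far = np.argmax(np.linalg.norm(points - points[0], axis=1))
    centers = points[[0, far]].astype(float)
    for _ in range(iters):
        # assign each point to its nearest center
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned points
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Synthetic "pixels": green fruit/foliage tones vs. brown branch tones (RGB).
rng = np.random.default_rng(1)
green = rng.normal([80, 160, 60], 10, size=(200, 3))
brown = rng.normal([120, 80, 40], 10, size=(200, 3))
pixels = np.vstack([green, brown])
centers, labels = kmeans2(pixels)
```

This also shows the table's caveat in miniature: the result depends entirely on how the initial centers are chosen, which is why the disadvantages column flags manual intervention at the cluster centers.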
Figure 1 Green persimmon fruit images under different complex orchard environments. (A) Overlapping fruits, (B) Back-sunlighting fruit, (C) Occlusions, (D) Direct-sunlighting fruit, (E) Distant fruit, (F) LED-lighting fruit, (G) Close-shot fruit, (H) Side-back-sunlighting fruit, (I) Fruit after rain.
Figure 2 The overall structure of the optimized mask RCNN. Features of the input image are extracted by the MobileNetv3 backbone and the FPN structure, and are used for the subsequent classification, regression, and mask operations. Finally, the segmentation result is obtained by refining the rough mask boundaries through the BPR module.
Figure 3 Overall structure of the MobileNetv3 network.
Figure 4 Overall structure of the squeeze-and-excitation module.
Specification for MobileNetv3.
| Input | Operator | Exp size | Out | SE | Activation | Stride |
|---|---|---|---|---|---|---|
| 224² × 3 | Conv2d, 3 × 3 | - | 16 | - | H-Swish | 2 |
| 112² × 16 | Bneck, 3 × 3 | 16 | 16 | √ | ReLU | 2 |
| 56² × 16 | Bneck, 3 × 3 | 72 | 24 | - | ReLU | 2 |
| 28² × 24 | Bneck, 3 × 3 | 88 | 24 | - | ReLU | 1 |
| 28² × 24 | Bneck, 5 × 5 | 96 | 40 | √ | H-Swish | 2 |
| 14² × 40 | Bneck, 5 × 5 | 240 | 40 | √ | H-Swish | 1 |
| 14² × 40 | Bneck, 5 × 5 | 240 | 40 | √ | H-Swish | 1 |
| 14² × 40 | Bneck, 5 × 5 | 120 | 48 | √ | H-Swish | 1 |
| 14² × 48 | Bneck, 5 × 5 | 144 | 48 | √ | H-Swish | 1 |
| 14² × 48 | Bneck, 5 × 5 | 288 | 96 | √ | H-Swish | 2 |
| 7² × 96 | Bneck, 5 × 5 | 576 | 96 | √ | H-Swish | 1 |
| 7² × 96 | Bneck, 5 × 5 | 576 | 96 | √ | H-Swish | 1 |
| 7² × 96 | Conv2d, 1 × 1 | - | 576 | √ | H-Swish | 1 |
| 7² × 576 | Pool, 7 × 7 | - | - | - | - | 1 |
| 1² × 576 | Conv2d, 1 × 1, NBN | - | 1280 | - | H-Swish | 1 |
| 1² × 1280 | Conv2d, 1 × 1, NBN | - | k | - | - | 1 |
NBN represents no batch normalization. “-” represents not available. Bneck represents bottleneck. “Exp size” represents the size of the expanded dimension. “Out” represents the output feature matrix channel. “SE” represents squeeze-and-excitation.
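The squeeze-and-excitation block marked "√" in the SE column of the table (and shown in Figure 4) can be sketched in a few lines of PyTorch. The channel count and reduction ratio below are illustrative; the structure — global average pooling ("squeeze"), two 1 × 1 convolutions ("excite"), and a hard-sigmoid channel gate — follows the MobileNetv3 design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SqueezeExcite(nn.Module):
    """Squeeze-and-excitation: pool each channel to a scalar, pass the
    vector through a small bottleneck, and rescale the channels with the
    resulting gates in [0, 1]."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        squeezed = max(1, channels // reduction)
        self.fc1 = nn.Conv2d(channels, squeezed, kernel_size=1)
        self.fc2 = nn.Conv2d(squeezed, channels, kernel_size=1)

    def forward(self, x):
        s = x.mean(dim=(2, 3), keepdim=True)   # squeeze: B x C x 1 x 1
        s = torch.relu(self.fc1(s))            # excite, bottleneck step
        s = F.hardsigmoid(self.fc2(s))         # per-channel gates in [0, 1]
        return x * s                           # reweight the feature map

se = SqueezeExcite(40)  # e.g. the 40-channel bneck stages in the table
out = se(torch.rand(1, 40, 14, 14))
print(out.shape)  # same shape as the input
```

Because the gates only rescale channels, the block is shape-preserving and can be dropped into any bottleneck where the SE column is checked.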
Figure 5 Structure diagram of the RPN.
Figure 6 Overall structure diagram of BPR. Panels (5) and (6) show the rough mask patches and the refined mask patches, respectively. "Before" shows the result before rough-mask optimization, and "After" shows the result after mask optimization.
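The first step of BPR, extracting small crops along the predicted mask boundary so that only error-prone regions are refined, can be sketched with numpy. The patch size, stride, and boundary definition below are assumptions for illustration; the actual module crops overlapping patches and refines each with a small network:

```python
import numpy as np

def boundary_patches(mask, patch=8, stride=4):
    """Return patch centers along the mask boundary, plus the boundary map.
    Boundary = mask pixels with at least one background 4-neighbour."""
    m = mask.astype(bool)
    pad = np.pad(m, 1)  # zero-pad so edge pixels have 4 neighbours
    interior = (pad[2:, 1:-1] & pad[:-2, 1:-1] &
                pad[1:-1, 2:] & pad[1:-1, :-2])
    boundary = m & ~interior
    half = patch // 2
    centers = []
    rows, cols = np.nonzero(boundary)
    # subsample boundary pixels so neighbouring patches overlap loosely
    for r, c in zip(rows[::stride], cols[::stride]):
        if half <= r < m.shape[0] - half and half <= c < m.shape[1] - half:
            centers.append((r, c))
    return centers, boundary

# Toy rough mask: a filled 16 x 16 square; its one-pixel rim is the boundary.
mask = np.zeros((32, 32), dtype=np.uint8)
mask[8:24, 8:24] = 1
centers, boundary = boundary_patches(mask)
```

Each center would then index a crop of the image and rough mask, and the refined crops are pasted back to replace the coarse boundary, which is exactly where the error pixels the abstract mentions concentrate.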
Figure 7 Overall experimental flow chart based on the optimized mask RCNN model.
Average precision and average recall of the optimized mask RCNN model on the persimmon dataset, analyzed at different IoU thresholds, target sizes, and detection counts.
| Metric | mAP | AP50 | APS | APM | APL | mAR | ARS | ARM | ARL |
|---|---|---|---|---|---|---|---|---|---|
| Value (%) | 76.3 | 93.0 | 33.2 | 78.4 | 90.7 | 81.1 | 45.0 | 83.3 | 92.5 |
The subscript letters on AP and AR denote the size of the target fruit: S (small) covers bounding-box areas in [0, 32 × 32]; M (medium) covers [32 × 32, 96 × 96]; L (large) covers [96 × 96, ∞]. A numeric subscript (50) indicates an IoU threshold of 0.5.
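The size split described above is the standard COCO-style bucketing, and can be written as a small helper (the function name is ours, for illustration):

```python
def size_bucket(w, h):
    """COCO-style size bucket used for the AP_S / AP_M / AP_L split:
    small < 32*32 <= medium < 96*96 <= large (box areas in pixels)."""
    area = w * h
    if area < 32 * 32:
        return "S"
    if area < 96 * 96:
        return "M"
    return "L"

print(size_bucket(20, 20), size_bucket(50, 50), size_bucket(100, 100))  # S M L
```

Note the boundaries are half-open: a box of exactly 32 × 32 pixels counts as medium, and 96 × 96 as large.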
Figure 8 Visualized test results of the optimized mask RCNN. (A) Distant fruit, (B) LED-lighting fruit, (C) Fruit after rain, (D) Overlapping fruits, (E) Close-shot fruit, (F) Occlusions.
Comparison of instance segmentation performance of five different networks.
| Model | mAP | AP50 | APS | APM | APL | mAR | ARS | ARM | ARL |
|---|---|---|---|---|---|---|---|---|---|
| SOLOv2 | 58.9 | 85.1 | 18.7 | 63.2 | 68.7 | 67.9 | 29.9 | 70.7 | 78.2 |
| Mask RCNN | 73.2 | 90.6 | 29.6 | 75.2 | 87.6 | 77.4 | 41.6 | 79.3 | 90.3 |
| YOLACT | 64.6 | 88.6 | 20.0 | 67.1 | 81.2 | 72.5 | 34.5 | 74.2 | 86.9 |
| Cascade_RCNN | 71.4 | 90.7 | 23.9 | 72.7 | 88.5 | 75.5 | 36.9 | 77.2 | 90.6 |
| Ours | 76.3 | 93.0 | 33.2 | 78.4 | 90.7 | 81.1 | 45.0 | 83.3 | 92.5 |
Figure 9 Comparison of segmentation effects of five different models.
Comparison of three different models with and without the BPR module.
| Model | mAP | AP50 | APS | APM | APL | mAR | ARS | ARM | ARL |
|---|---|---|---|---|---|---|---|---|---|
| SOLO | 37.8 | 60.0 | 5.3 | 36.6 | 62.1 | 49.4 | 10.4 | 49.2 | 72.1 |
| SOLO+ | 39.7 | 62.3 | 7.1 | 39.0 | 64.5 | 53.0 | 12.7 | 51.2 | 74.6 |
| D2Det | 58.0 | 83.2 | 40.6 | 65.0 | 87.8 | 61.9 | 45.9 | 69.1 | 89.6 |
| D2Det+ | 61.1 | 86.0 | 43.3 | 67.9 | 90.1 | 65.1 | 48.8 | 72.0 | 92.7 |
| Mask RCNN | 73.2 | 90.6 | 29.6 | 75.2 | 87.6 | 77.4 | 41.6 | 79.3 | 90.3 |
| Mask RCNN+ | 75.8 | 92.9 | 32.5 | 77.9 | 89.0 | 80.6 | 44.5 | 82.0 | 92.1 |
“+” represents that BPR module is added to the model.
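As a quick sanity check, the per-model gain from adding the BPR module can be computed directly from the mAP and mAR columns of the ablation table above:

```python
# (mAP, mAR) pairs taken from the ablation table, without vs. with BPR.
baseline = {"SOLO": (37.8, 49.4), "D2Det": (58.0, 61.9), "Mask RCNN": (73.2, 77.4)}
with_bpr = {"SOLO": (39.7, 53.0), "D2Det": (61.1, 65.1), "Mask RCNN": (75.8, 80.6)}

for name in baseline:
    d_map = with_bpr[name][0] - baseline[name][0]
    d_mar = with_bpr[name][1] - baseline[name][1]
    print(f"{name}: +{d_map:.1f} mAP, +{d_mar:.1f} mAR")
```

Every model improves on both metrics, which supports the claim that BPR is a model-agnostic post-processing step rather than a fix specific to mask RCNN.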