Hong Lin1, Rita Tse1,2, Su-Kit Tang1,2, Zhen-Ping Qiang3, Giovanni Pau1,4,5.
Abstract
Image-based deep learning is a promising approach to plant disease diagnosis, but it relies on large-scale datasets. At present, the shortage of data is an obstacle to applying deep learning methods. Few-shot learning can generalize to new categories with the support of only a few samples, which is very helpful for plant disease categories where only a few samples are available. However, few-shot learning faces two challenging problems: (1) the features extracted from a few shots are very limited; (2) generalizing to new categories, especially to another domain, is very difficult. In response to these two issues, we propose a network based on the Meta-Baseline few-shot learning method that combines cascaded multi-scale features and channel attention. The network uses multi-scale features to enrich the feature representation and uses channel attention as a compensation module to learn more efficiently from the significant channels of the fused features. In addition, we propose a group of training strategies, defined from a data configuration perspective, to match various generalization requirements. Extensive experiments verify that the combination of multi-scale feature fusion and channel attention can alleviate the problem of limited features caused by few shots. To imitate different generalization scenarios, we use different data settings and suggest the optimal training strategies for the intra-domain case and the cross-domain case, respectively. The effects of important factors in the few-shot learning paradigm are analyzed. With the optimal configuration, the accuracies of the 1-shot and 5-shot tasks reach 61.24% and 77.43%, respectively, in the task targeting a single plant, and 82.52% and 92.83% in the task targeting multiple plants. Our results outperform existing related works. This demonstrates that few-shot learning is a feasible solution for plant disease recognition in future applications.
Keywords: attention; cross-domain; few-shot learning; meta-learning; multi-scale feature fusion; plant disease recognition; sub-class classification; training strategy
Year: 2022 PMID: 36186021 PMCID: PMC9523606 DOI: 10.3389/fpls.2022.907916
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
The 14 species and 38 categories in PV.
| Species | Number of classes | Classes |
|---|---|---|
| Apple | 4 | Apple scab, black rot, cedar apple rust, healthy |
| Blueberry | 1 | Healthy |
| Cherry | 2 | Healthy, powdery mildew |
| Corn | 4 | Gray leaf spot, common rust, healthy, northern leaf blight |
| Grape | 4 | Black rot, black measles, healthy, leaf blight |
| Orange | 1 | Huanglongbing |
| Peach | 2 | Bacterial spot, healthy |
| Pepper | 2 | Bacterial spot, healthy |
| Potato | 3 | Early blight, healthy, late blight |
| Raspberry | 1 | Healthy |
| Soybean | 1 | Healthy |
| Squash | 1 | Powdery mildew |
| Strawberry | 2 | Healthy, leaf scorch |
| Tomato | 10 | Bacterial spot, early blight, healthy, late blight, leaf mold, septoria leaf spot, spider mites, target spot, mosaic virus, yellow leaf curl virus |
Figure 1. (A) The original samples of AFD. (B) The leaf detection result by YOLO-v3. (C) The samples of 10 classes after segmentation and resizing.
Figure 2. (A) The network architecture of our method. The training includes two stages: the base-training stage and the meta-learning stage. The CMSFF+CA Encoder is unfolded into the CMSFF module and the CA module. (B) The parallel multi-scale feature fusion and cascaded multi-scale feature fusion.
The algorithm of meta-learning.
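As a minimal sketch of one meta-learning episode, assuming the usual Meta-Baseline formulation: each class prototype is the mean of its support features, queries are scored by cosine similarity scaled by a temperature, and the loss is cross-entropy over the queries. The function names, the temperature value `tau=10.0`, and the toy 2-D features are illustrative assumptions, not the authors' code; the encoder and the gradient update are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototypes(support):
    """Average the support features of each class into one prototype.

    `support` maps a class label to a list of feature vectors (the K shots).
    """
    protos = {}
    for label, feats in support.items():
        dim = len(feats[0])
        protos[label] = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return protos


def episode_loss(support, queries, tau=10.0):
    """Return (mean cross-entropy, accuracy) over the queries of one episode.

    `queries` is a list of (feature_vector, true_label) pairs.
    `tau` is an assumed temperature that scales the cosine logits.
    """
    protos = prototypes(support)
    labels = sorted(protos)
    total_loss, correct = 0.0, 0
    for feat, true_label in queries:
        logits = [tau * cosine(feat, protos[c]) for c in labels]
        # Numerically stable log-sum-exp over the class logits.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total_loss += log_z - logits[labels.index(true_label)]
        predicted = labels[logits.index(max(logits))]
        correct += int(predicted == true_label)
    n = len(queries)
    return total_loss / n, correct / n


# Toy 2-way, 1-shot episode with hand-made 2-D features (hypothetical data).
support = {"early_blight": [[1.0, 0.1]], "late_blight": [[0.1, 1.0]]}
queries = [([0.9, 0.2], "early_blight"), ([0.2, 0.8], "late_blight")]
loss, acc = episode_loss(support, queries)
```

In a real training loop the feature vectors would come from the encoder, and the loss would be back-propagated through it once per episode.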
Three data settings of PV used in our experiments.
| Setting | Training | Validation | Test |
|---|---|---|---|
| PV-Setting-1 | (PV-1-22): apple-3, blueberry-1, cherry-2, corn-3, grape-3, orange-1, peach-2, pepper-1, potato-2, raspberry-1, soybean-1, squash-1, strawberry-1 | Apple-1, corn-1, grape-1, pepper-1, potato-1, strawberry-1 | (PV-1-10T): tomato-10 |
| PV-Setting-2 | (PV-2-22): apple-2, blueberry-1, cherry-1, corn-2, grape-2, orange-1, peach-1, pepper-1, potato-1, raspberry-1, soybean-1, squash-1, strawberry-1, tomato-6 | Apple-1, corn-1, grape-1, potato-1, tomato-2 | (PV-2-10): apple-1, cherry-1, corn-1, grape-1, peach-1, pepper-1, potato-1, strawberry-1, tomato-2 |
| PV-Setting-3 | (PV-3-10): apple-1, cherry-1, corn-1, grape-1, peach-1, pepper-1, potato-1, strawberry-1, tomato-2 | Apple-1, corn-1, grape-1, potato-1, tomato-2 | (PV-3-22): apple-2, blueberry-1, cherry-1, corn-2, grape-2, orange-1, peach-1, pepper-1, potato-1, raspberry-1, soybean-1, squash-1, strawberry-1, tomato-6 |
In each setting, the 38 classes are split into three parts for training, validation, and test, respectively. “Apple-1” means one class of the apple species.
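The three settings can be checked mechanically against the species table. The sketch below encodes the per-species class counts and confirms that, in every setting, training, validation, and test together account for all 38 classes. The dictionary and function names are illustrative assumptions, and because the table gives counts rather than class names, the check covers counts only, not which specific classes land in each part.

```python
# Classes per species in PV, as listed in the species table (38 in total).
PV_CLASSES = {
    "apple": 4, "blueberry": 1, "cherry": 2, "corn": 4, "grape": 4,
    "orange": 1, "peach": 2, "pepper": 2, "potato": 3, "raspberry": 1,
    "soybean": 1, "squash": 1, "strawberry": 2, "tomato": 10,
}

# The 22-class and 10-class mixed-species parts shared by Settings 2 and 3.
MIXED_22 = {
    "apple": 2, "blueberry": 1, "cherry": 1, "corn": 2, "grape": 2,
    "orange": 1, "peach": 1, "pepper": 1, "potato": 1, "raspberry": 1,
    "soybean": 1, "squash": 1, "strawberry": 1, "tomato": 6,
}
MIXED_10 = {
    "apple": 1, "cherry": 1, "corn": 1, "grape": 1, "peach": 1,
    "pepper": 1, "potato": 1, "strawberry": 1, "tomato": 2,
}
MIXED_VAL = {"apple": 1, "corn": 1, "grape": 1, "potato": 1, "tomato": 2}

SETTINGS = {
    "PV-Setting-1": {
        "train": {
            "apple": 3, "blueberry": 1, "cherry": 2, "corn": 3, "grape": 3,
            "orange": 1, "peach": 2, "pepper": 1, "potato": 2, "raspberry": 1,
            "soybean": 1, "squash": 1, "strawberry": 1,
        },
        "val": {"apple": 1, "corn": 1, "grape": 1, "pepper": 1,
                "potato": 1, "strawberry": 1},
        "test": {"tomato": 10},
    },
    "PV-Setting-2": {"train": MIXED_22, "val": MIXED_VAL, "test": MIXED_10},
    "PV-Setting-3": {"train": MIXED_10, "val": MIXED_VAL, "test": MIXED_22},
}


def check_setting(parts):
    """Verify the parts add up to PV_CLASSES and return the size of each part."""
    merged = {}
    for counts in parts.values():
        for species, n in counts.items():
            merged[species] = merged.get(species, 0) + n
    if merged != PV_CLASSES:
        raise ValueError("parts do not add up to the PV class counts")
    return {part: sum(counts.values()) for part, counts in parts.items()}


totals = {name: check_setting(parts) for name, parts in SETTINGS.items()}
```

Setting-1 holds out a whole species (tomato) for testing, while Settings 2 and 3 mix species across the parts and swap the roles of the 22-class and 10-class groups.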
Figure 3. (A) The testing classes of PV-Setting-1. (B) The testing classes of PV-Setting-2. (C) The testing classes of PV-Setting-3.
Figure 4. The data formats used in base-training, meta-learning, and test. The five training strategies.
The group of experiments with different training strategies and different data settings.
| ID | Method | TS | Base-training | Meta-learning | Test | 1-shot | 5-shot | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| e1 | MB | S1 | Mini | Mini | PV-1-10T | 41.08 | 60.59 | 66.27 | 69.87 | 71.26 | 71.86 | 72.30 |
| e2 | MB | S2 | Mini | PV-1-22 | PV-1-10T | 56.07 | 72.90 | 76.62 | 78.87 | 79.74 | 79.81 | 80.11 |
| e3 | MB | S3 | PV-1-22 | PV-1-22 | PV-1-10T | | | | | | | |
| | | | | | | | | | | | | |
| e4 | MB | S1 | Mini | Mini | PV-2-10 | 60.23 | 83.08 | 87.02 | 88.97 | 89.61 | 89.76 | 90.12 |
| e5 | MB | S2 | Mini | PV-2-22 | PV-2-10 | 80.88 | | | | | | |
| e6 | MB | S3 | PV-2-22 | PV-2-22 | PV-2-10 | | 91.47 | 93.14 | 94.00 | 94.29 | 94.41 | 94.53 |
| | | | | | | | | | | | | |
| e7 | MB | S1 | Mini | Mini | PV-3-22 | 65.46 | 85.37 | 88.81 | 90.54 | 91.09 | 91.33 | 91.45 |
| e8 | MB | S2 | Mini | PV-3-10 | PV-3-22 | | | | | | | |
| e9 | MB | S3 | PV-3-10 | PV-3-10 | PV-3-22 | 74.58 | 84.77 | 86.82 | 87.82 | 88.29 | 88.43 | 88.57 |
| | | | | | | | | | | | | |
| e10 | MB | S1 | Mini | Mini | AFD-10 | 28.26 | 39.12 | 44.20 | 47.83 | 49.02 | 50.31 | 51.32 |
| e11 | MB | S4 | Mini | PV-2-22 | AFD-10 | | | | | | | |
| e12 | MB | S5 | PV-2-22 | PV-2-22 | AFD-10 | 36.19 | 49.16 | 54.05 | 57.13 | 58.47 | 59.25 | 59.46 |
(Task in meta-learning: 5-way, 1-shot, 15-query; backbone network: Resnet12; batchsize: 128; Lr: 0.1 in base-training, 0.001 in meta-learning; distance metric: cosine similarity; Mini, Mini-ImageNet; TS, training strategy).
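The "5-way, 1-shot, 15-query" task format in the note above can be illustrated with a small episode sampler. This is a sketch under stated assumptions: the function name, the `images_by_class` mapping, and the file-name pattern are hypothetical, and it only shows how one episode is assembled, not how the authors' data loader works.

```python
import random


def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15, rng=None):
    """Draw one N-way K-shot episode with `n_query` queries per class.

    `images_by_class` maps a class name to a list of image identifiers.
    Returns (support, query) as lists of (image_id, episode_label) pairs,
    where episode_label is an index in range(n_way).
    """
    if rng is None:
        rng = random.Random()
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Sample without replacement so support and query never overlap.
        picked = rng.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in picked[:k_shot]]
        query += [(img, label) for img in picked[k_shot:]]
    return support, query


# Hypothetical pool: 6 classes with 20 image ids each.
pool = {
    f"class_{c}": [f"class_{c}/img_{i}.jpg" for i in range(20)]
    for c in range(6)
}
support, query = sample_episode(pool, rng=random.Random(0))
```

With the default arguments, each episode contains 5 support images (one per class) and 75 query images (15 per class).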
Figure 5. (A) Intra-domain experiments with three data settings. (B) Cross-domain experiments with AFD. (C) The accuracy decreases as Way increases. (D) Distance metrics.
The ablation experiment results of MB, MB+CMSFF, and MB+CMSFF+CA.
| ID | Method | TS | 1-shot | 5-shot | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| e2 | MB | S2 | 56.07 | 72.90 | 76.62 | 78.87 | 79.74 | 79.81 | 80.11 |
| e13 | MB+CMSFF | S2 | 61.20 | 77.09 | 80.92 | 83.03 | 84.05 | 84.34 | 84.56 |
| e14 | MB+CMSFF+CA | S2 | | | | | | | |
| | | | | | | | | | |
| e5 | MB | S2 | 81.05 | 91.47 | 93.14 | 94.00 | 94.29 | 94.41 | 94.53 |
| e15 | MB+PMSFF | S2 | 81.46 | 91.86 | 93.51 | 94.57 | 94.81 | 94.88 | 95.03 |
| e16 | MB+CMSFF | S2 | 82.21 | 92.32 | 93.87 | 94.71 | 95.03 | 95.15 | 95.31 |
| e17 | MB+PMSFF+CA | S2 | 81.87 | 92.39 | 93.93 | 94.86 | 95.29 | 95.31 | 95.50 |
| e18 | MB+CMSFF+CA | S2 | | | | | | | |
| | | | | | | | | | |
| e8 | MB | S2 | 74.58 | 84.77 | 86.82 | 87.82 | 88.29 | 88.43 | 88.57 |
| e19 | MB+CMSFF | S2 | 76.61 | 88.45 | 90.17 | 91.32 | 91.78 | 91.86 | 92.14 |
| e20 | MB+CMSFF+CA | S2 | | | | | | | |
| | | | | | | | | | |
| e11 | MB | S4 | 38.41 | 51.71 | 55.58 | 58.08 | 58.84 | 59.70 | 60.09 |
| e21 | MB+CMSFF | S4 | 40.77 | 54.14 | 57.68 | 60.13 | 61.30 | 62.03 | 62.69 |
| e22 | MB+CMSFF+CA | S4 | | | | | | | |
(Base-training: Mini-ImageNet; backbone network: Resnet12; distance metric: cosine similarity; TS, training strategy).
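To make the ablation easier to read, the gain from adding CMSFF to MB can be computed directly from the table. The sketch below uses the first accuracy column only, assuming it is the 1-shot task, and pairs each MB row with the MB+CMSFF row of the same group; the test-set names in the keys follow the experiment IDs in the training-strategy table. The variable names are illustrative.

```python
# First-column accuracy (%) copied from the ablation table: (MB, MB+CMSFF).
FIRST_COLUMN = {
    "PV-1-10T (e2 vs. e13)": (56.07, 61.20),
    "PV-2-10 (e5 vs. e16)": (81.05, 82.21),
    "PV-3-22 (e8 vs. e19)": (74.58, 76.61),
    "AFD-10 (e11 vs. e21)": (38.41, 40.77),
}

# Absolute gain in percentage points, rounded to the table's precision.
gains = {
    name: round(fused - base, 2)
    for name, (base, fused) in FIRST_COLUMN.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} points")
```

On these figures the gain is largest in the single-plant setting (PV-1-10T) and smallest in PV-2-10, where the MB baseline is already above 80%.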
The results of different distance metrics.
| ID | Distance metric | 1-shot | 5-shot | | | | | |
|---|---|---|---|---|---|---|---|---|
| e23 | Dot product | 77.58 | 86.2 | 87.52 | 88.05 | 88.55 | 88.65 | 88.88 |
| e5 | Cosine similarity | | | | | | | |
| e24 | Euclidean distance | 75.96 | 89.17 | 91.52 | 92.64 | 93.17 | 93.23 | 93.42 |
(Method: MB; backbone network: Resnet12; batchsize: 128; Lr: 0.1 in base-training, 0.001 in meta-learning).
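The three metrics compared above can rank the same query differently. As a small sketch, the code below scores one query against two class prototypes with each metric and picks the best-scoring class. Using the negative squared Euclidean distance as a similarity score is an assumption (a common convention), and the prototype values and function names are made up for illustration.

```python
import math


def dot(u, v):
    """Dot product: sensitive to both direction and magnitude."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity: direction only, magnitude is normalized away."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def neg_sq_euclidean(u, v):
    """Negative squared Euclidean distance, so that larger means closer."""
    return -sum((a - b) ** 2 for a, b in zip(u, v))


def classify(query, protos, score):
    """Return the class whose prototype scores highest for the query."""
    return max(protos, key=lambda c: score(query, protos[c]))


# Hypothetical prototypes: "A" has a large norm, "B" is better aligned.
protos = {"A": [3.0, 0.0], "B": [0.6, 0.8]}
query = [1.0, 1.0]

predictions = {
    "dot": classify(query, protos, dot),
    "cosine": classify(query, protos, cosine),
    "euclidean": classify(query, protos, neg_sq_euclidean),
}
```

Here the dot product favors the large-norm prototype "A", while cosine similarity and Euclidean distance both choose "B", which shows how the choice of metric alone can change a prediction.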
The experiment efficiencies of different backbone networks.
| ID | Backbone | Base-training parameters | Base-training Lr | Base-training time | Base-training epochs | Meta-learning parameters | Meta-learning Lr | Meta-learning time | Meta-learning epochs |
|---|---|---|---|---|---|---|---|---|---|
| e25 | Convnet4 | 215.6 K | 0.01 | 40 m | 100 | 113.1 K | 0.001 | 31 m | 50 |
| e26 | AlexNet | 3.8 M | 0.01 | 40 m | 100 | 3.7 M | 0.001 | 17 m | 50 |
| e5 | Resnet12 | 8.0 M | 0.1 | 1.2 h | 100 | 8.0 M | 0.001 | 18 m | 20 |
| e27 | Resnet18 | 11.2 M | 0.1 | 1.4 h | 100 | 11.2 M | 0.001 | 40 m | 50 |
| e28 | Resnet50 | 23.6 M | 0.1 | 2.3 h | 100 | 23.5 M | 0.001 | 38 m | 30 |
| e29 | Resnet101 | 42.6 M | 0.01 | 3.3 h | 100 | 42.5 M | 0.001 | 35 m | 20 |
| e30 | DenseNet | 791.1 K | 0.1 | 3.8 h | 100 | 769.2 K | 0.001 | 1.9 h | 50 |
| e31 | MobileNet-v2 | 3.6 M | 0.1 | 2.2 h | 100 | 3.5 M | 0.001 | 1.0 h | 50 |
(Base-training: Mini-ImageNet; meta-learning: PV-2-22; distance metric: cosine similarity).
The results of different backbone networks.
| ID | Backbone | 1-shot | 5-shot | | | | | |
|---|---|---|---|---|---|---|---|---|
| e25 | Convnet4 | 69.06 | 85.91 | 89.91 | 91.88 | 92.35 | 92.79 | 93.11 |
| e26 | AlexNet | 68.35 | 83.12 | 85.73 | 87.00 | 87.27 | 87.44 | 87.92 |
| e5 | Resnet12 | 80.88 | | | | | | |
| e27 | Resnet18 | 78.58 | 89.16 | 91.36 | 91.96 | 92.26 | 92.44 | 92.78 |
| e28 | Resnet50 | | 90.91 | 92.56 | 93.86 | 94.08 | 94.15 | 94.33 |
| e29 | Resnet101 | 74.93 | 85.59 | 87.63 | 89.12 | 89.67 | 89.91 | 89.91 |
| e30 | DenseNet | 79.39 | 89.21 | 90.82 | 91.84 | 92.21 | 92.10 | 92.50 |
| e31 | MobileNet-v2 | 78.17 | 89.21 | 91.48 | 92.42 | 92.83 | 93.02 | 93.41 |
(Method: MB; backbone network: resnet12; batchsize: 128; Lr: 0.1 in base-training, 0.001 in meta-learning; Data: Mini-imageNet in base-training, PV-setting-2 in meta-learning and test).
Figure 6. The best validation accuracy (%) of the “1-shot, 5-way” task in base-training and meta-learning. The red numbers show the accuracy gains (%) from meta-learning.
The results compared with related works.
| ID | Method | | | | |
|---|---|---|---|---|---|
| | Finetuning (Argüeso et al., 2020) | 18.2 | 25.4 | 30.3 | 41.1 |
| | Siamese contrastive (Argüeso et al., 2020) | 50.2 | 64.2 | 70.2 | 74.1 |
| | Siamese triplet (Argüeso et al., 2020) | 65.2 | 72.3 | 76.8 | 81.8 |
| | Single SS (Li and Chao, 2021b) | 74.5 | 89.7 | 92.6 | 93.9 |
| | Iterative SS (Li and Chao, 2021b) | 75.1 | 90.0 | 92.7 | 93.9 |
| e32 | | 76.4 | 91.0 | 93.2 | 94.2 |
| e33 | | 80.0 | 91.9 | 93.7 | |
| e34 | | | | | |
| | | | | | |
| | Baseline (Li and Chao, 2021b) | 32.8 | 46.7 | 64 | 73.2 |
| | Single SS (Li and Chao, 2021b) | 33.7 | 50.9 | 66.7 | 74.7 |
| | Iterative SS (Li and Chao, 2021b) | 34 | 53.1 | 68.8 | 75.6 |
| e35 | | 55.7 | 72.8 | 76.7 | 79.5 |
| e36 | | 60.6 | | | 84.3 |
| e37 | | | 78.1 | 82.2 | |
| | | | | | |
| | Baseline (Li and Chao, 2021b) | 43.9 | 68.5 | 78.7 | 89.1 |
| | Single SS (Li and Chao, 2021b) | 44.7 | 74.7 | 85.7 | 89.7 |
| | Iterative SS (Li and Chao, 2021b) | 46.4 | 76.9 | 89.2 | 91.9 |
| e38 | | 77.1 | 91.1 | 92.9 | 93.8 |
| e39 | | 78.8 | 91.6 | 93.5 | 94.6 |
| e40 | | | | | |
| | | | | | |
| | Baseline (Li and Chao, 2021b) | 50.7 | 63.1 | 77.2 | 89.3 |
| | Single SS (Li and Chao, 2021b) | 52.3 | 67.6 | 79.9 | 90.1 |
| | Iterative SS (Li and Chao, 2021b) | 55.2 | 69.3 | 80.8 | 91.5 |
| e41 | | 78.1 | 89.4 | 91.4 | 92.6 |
| e42 | | 80.6 | 90.8 | 92.4 | 93.3 |
| e43 | | | | | |
(Ours: backbone network: Resnet12; distance metric: cosine similarity; base-training: Mini-ImageNet).
Figure 7. The results compared with related works. (A) Comparison of our work with Argüeso et al. (2020) and Li and Chao (2021b). (B) Comparison of our work with Li and Chao (2021b) using the data split-1. (C) Comparison of our work with Li and Chao (2021b) using the data split-1. (D) Comparison of our work with Li and Chao (2021b) using the data split-1.