| Literature DB >> 35548295 |
Muhammad Hammad Saleem1, Kesini Krishnan Velayudhan1, Johan Potgieter2, Khalid Mahmood Arif1.
Abstract
The accurate identification of weeds is an essential step for a site-specific weed management system. In recent years, deep learning (DL) has seen rapid advancement in performing complex agricultural tasks. Previous studies emphasized evaluating advanced training techniques or modifying well-known DL models to improve overall accuracy. In contrast, this research attempted to improve the mean average precision (mAP) for the detection and classification of eight classes of weeds by proposing a novel DL-based methodology. First, a comprehensive analysis of single-stage and two-stage neural networks, including the Single-shot MultiBox Detector (SSD), You Only Look Once (YOLO-v4), EfficientDet, CenterNet, RetinaNet, the Faster Region-based Convolutional Neural Network (RCNN), and the Region-based Fully Convolutional Network (RFCN), was performed. Next, the effects of image resizing techniques along with four image interpolation methods were studied. This led to the final stage of the research: optimizing the weights of the best-acquired model through initialization techniques, batch normalization, and DL optimization algorithms. The effectiveness of the proposed work is demonstrated by a high mAP of 93.44%, validated by the stratified k-fold cross-validation technique. This was a 5.8% improvement over the results obtained with the default settings of the best-suited DL architecture (Faster RCNN ResNet-101). The presented pipeline can serve as a baseline for the research community to explore tasks such as real-time detection and reducing computation/training time. All the relevant data, including the annotated dataset, configuration files, and inference graph of the final model, are provided with this article. Furthermore, the selection of the DeepWeeds dataset shows the robustness/practicality of the study because it contains images collected in a real, complex agricultural environment.
Therefore, this research would be a considerable step toward an efficient and automatic weed control system.
Keywords: convolutional neural network; deep learning; optimization algorithms; transfer learning; weed detection
Year: 2022 PMID: 35548295 PMCID: PMC9083231 DOI: 10.3389/fpls.2022.850666
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
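The stratified k-fold validation mentioned in the abstract can be illustrated with a short sketch. The class labels, sample counts, and k = 5 below are placeholder assumptions (the record above does not state the paper's k), and scikit-learn's `StratifiedKFold` stands in for whatever tooling the authors actually used:

```python
# Hypothetical sketch of stratified k-fold cross-validation, as used to
# validate the reported mAP. Labels, sample count, and k = 5 are
# illustrative assumptions, not values taken from the paper.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(40).reshape(-1, 1)   # placeholder image indices
y = np.repeat(np.arange(8), 5)     # 8 weed classes, 5 samples each

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(X, y):
    # Stratification preserves the class balance in every validation
    # fold: here each fold holds exactly one sample of each class.
    classes, counts = np.unique(y[val_idx], return_counts=True)
    assert len(classes) == 8 and all(counts == 1)
```

Stratification matters for this dataset because the DeepWeeds classes are not equally represented, so a plain random split could leave a fold with almost no examples of a rare weed.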
Summary of research articles related to weed detection by deep learning (DL) (divided in terms of novelty and research ideas of the work).
| Research focus | DL models | Weed classes | Dataset conditions | Results | References |
|---|---|---|---|---|---|
| Investigation of DL models for the identification of weeds | AlexNet, GoogLeNet, VGGNet, DetectNet | Three | Various surface condition regimes. | DetectNet F1-score = 0.9843 | Yu et al., |
| DL architectures were leveraged for weed detection and classification | DetectNet, GoogLeNet, and VGGNet | Three | Different stages and densities of growth | F1-score by DetectNet > 0.99 | Yu et al., |
| Speed-optimized CNN models were proposed | CNN model | Two | Images were taken with a field robot in a real environment. | A speed-up factor of 31 | Knoll et al., |
| DL model used with color index-based segmentation | CenterNet | One | Different illumination conditions, backgrounds, and growth stages | F1-score by the CenterNet model = 0.953 | Jin et al., |
| A tiny version of the YOLO model was proposed to reduce the computation time | Modified tiny YOLO-v3, YOLO-v3-tiny | Two | Synthetic images were generated | Mean average precision: 0.829 | Gao et al., |
| Various factors to develop weed identification system along with the significance of transfer learning | AlexNet, VGG-F, VGG-VD-16, Inception-v1, ResNet-50, ResNet-101 | Two | A robotic platform was used to take images on the field. | Accuracy by ResNet-101: 97.1+/-0.1% | Kounalakis et al., |
| Two DL detectors were used through a UAV | Faster RCNN and SSD | Six | Images were taken by a camera mounted on a UAV | Mean IoU by Faster RCNN: 0.85 | Veeranampalayam Sivakumar et al., |
| An improved DL model was proposed | Proposed Faster RCNN, KNN, SVM, and YOLO-v3 | Two | A camera mounted on a UAV in two agricultural fields | Overall average identification accuracy: 94.7% | Khan et al., |
| A CNN model was optimized for real-time weed recognition | ResNet-18 | Six | Dataset images were collected by a UAV | Overall accuracy: 94% | De Camargo et al., |
| Three ML and DL-based methods were used and compared | SVM, YOLO-v3, and Mask R-CNN | Two | A multispectral camera mounted on a drone was used | F1-score by YOLO and RCNN models: 94% | Osorio et al., |
| DL-based classification and detection models were used | VGG-16, ResNet-50, Inception-v3, YOLO-v3 | Four | Images were collected in a real field environment | mAP: 54.3% | Ahmad et al., |
| A graph CNN-based model was proposed to detect weeds | GCN-ResNet-101, AlexNet, ResNet-101, VGG-16 | Four | Weeds were collected in three crops, and a fourth dataset was obtained by combining the three datasets. | Average recognition accuracy: 98.15% | Jiang et al., |
| A combination of DL and ML methods was considered | Xception, Inception-ResNet, VGGNets, MobileNet, DenseNet, SVM, XGBoost, and Logistic Regression | Two | The dataset was collected under variable soil, color, and illumination conditions. | F1-score: 99.29% | Espejo-Garcia et al., |
Figure 1. Framework of this research.
Figure 2. Sample of the annotated dataset for each class.
Hyperparameters of deep learning optimization algorithms with their respective DL architectures.
| DL architecture | Optimizer | Hyperparameters |
|---|---|---|
| YOLO-v4 | SGD with momentum | learning rate = 1 × 10−3, momentum = 0.9 |
| RetinaNet | SGD with momentum | learning rate = 3 × 10−4, momentum = 0.9 |
| EfficientDet | SGD with momentum | learning rate = 2 × 10−4, momentum = 0.9 |
| RFCN ResNet-101 | SGD with momentum | learning rate = 4 × 10−4, momentum = 0.9 |
| Faster RCNN Inception-v2 | SGD with momentum | learning rate = 2 × 10−4, momentum = 0.9 |
| Faster RCNN ResNet-50 | SGD with momentum | learning rate = 3 × 10−4, momentum = 0.9 |
| SSD MobileNet | RMSProp | learning rate = 2 × 10−3, rho = 0.9, momentum = 0.9, epsilon = 1.0 × 10−2 |
| SSD Inception-v2 | RMSProp | learning rate = 2 × 10−4, rho = 0.9, momentum = 0.9, epsilon = 1.0 × 10−4 |
| CenterNet ResNet-50 | Adam | learning rate = 1 × 10−3, epsilon = 1 × 10−7 |
| Faster RCNN ResNet-101 | SGD with momentum | learning rate = 3 × 10−4, momentum = 0.9 |
| Faster RCNN ResNet-101 | Adam | learning rate = 1 × 10−5, epsilon = 1 × 10−2 |
| Faster RCNN ResNet-101 | RMSProp | learning rate = 3 × 10−4, rho = 0.9, momentum = 0.9, epsilon = 1.0 |
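As a minimal illustration of how the tabulated hyperparameters enter an update rule, the SGD-with-momentum step listed for most of the two-stage detectors can be sketched in NumPy. The paper set these values through detector configuration files; the standalone function below is our own illustrative form, not the authors' code:

```python
# NumPy sketch of the SGD-with-momentum update, using the learning rate
# (3 x 10^-4) and momentum (0.9) listed for Faster RCNN ResNet-101.
import numpy as np

def sgd_momentum_step(w, grad, v, lr=3e-4, momentum=0.9):
    """One update: v <- momentum*v - lr*grad;  w <- w + v."""
    v = momentum * v - lr * grad
    return w + v, v

w, v = np.array([1.0]), np.zeros(1)
w, v = sgd_momentum_step(w, np.array([2.0]), v)
# On the first step the velocity history is zero, so the weight moves
# by exactly -lr * grad = -6e-4.
```

The momentum term accumulates a running direction across steps, which is why the table's learning rates can stay small while training still converges within a reasonable number of iterations.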
Figure 3. Performance of the You Only Look Once (YOLO)-v4 model: (A) loss plot; (B) true positives for parthenium, rubber vine, and siam weed; (C) examples of undetected images for the lantana and snake weed classes.
Summary of the weed detection results of the DL single-stage and two-stage object detectors in terms of the average precision (in %) of each class.
| Class | YOLO-v4 | SSD Inception-v2 | SSD MobileNet | RetinaNet | EfficientDet | CenterNet | RFCN ResNet-101 | Faster RCNN Inception-v2 | Faster RCNN ResNet-50 | Faster RCNN ResNet-101 |
|---|---|---|---|---|---|---|---|---|---|---|
| Chinee apple | 67.4 | 26.25 | 18.83 | 45.31 | 43.97 | 26.34 | 34.29 | 100 | 98.21 | 99.87 |
| Lantana | 66.61 | 62.22 | 31.65 | 9.09 | 28.79 | 9.09 | 100 | 96.83 | 99.45 | 82.46 |
| Prickly acacia | 73.87 | 34.45 | 0.75 | 9.09 | 9.67 | 1.82 | 56.5 | 28.64 | 94.08 | 70.06 |
| Parthenium | 93.48 | 54.16 | 26.36 | 17.88 | 33.84 | 23.85 | 38.83 | 99.94 | 99.94 | 99.33 |
| Parkinsonia | 79.51 | 53.93 | 30.92 | 17.05 | 44.23 | 35.77 | 99.7 | 99.24 | 99.89 | 88.85 |
| Rubber vine | 96.33 | 60.44 | 76.99 | 27.27 | 44.18 | 35.88 | 92.9 | 99.77 | 100 | 99.84 |
| Siam weed | 98.6 | 66.06 | 26.35 | 54.55 | 63.29 | 53.31 | 41.61 | 82.49 | 100 | 99.85 |
| Snake weed | 58.19 | 62.4 | 21.36 | 14.91 | 34.79 | 33.57 | 0.55 | 4.17 | 15.38 | 86.17 |
| Negative | 83.17 | 13.2 | 81.72 | 0.11 | 26.57 | 26.61 | 31.18 | 51.28 | 78.13 | 62.35 |
| mAP (%) | 79.68 | 48.12 | 34.99 | 21.69 | 36.59 | 27.36 | 55.06 | 73.59 | 87.23 | **87.64** |
The bold value shows the highest mAP, used to select the best DL architecture.
Figure 4. Performance of the single-shot multibox detector (SSD) architecture: (A) total loss with the Inception model; (B) total loss with the MobileNet model; (C) example of a false-positive result for the negative class with the Inception-v2 model; (D) example of false positives for the eight classes of weeds with the MobileNet model. TP: true positive; FP: false positive.
Figure 5. Performance of RetinaNet: (A) total loss plot; (B) examples of false-positive results for the negative class.
Figure 6. Performance of the EfficientDet model: (A) total loss plots; (B) false positives of different classes with the negative class.
Figure 7. Performance of the CenterNet model: (A) total loss; (B) false-positive results for the chinee apple and lantana classes.
Figure 8. Performance of the Region-based Fully Convolutional Network (RFCN) model: (A) total loss plot; (B) false-positive results for the negative class; (C) false-positive results for the snake weed class.
Figure 9. Performance of the Faster Region-based Convolutional Neural Network (RCNN) with various backbone models: (A) total loss plot for Inception-v2; (B) total loss plot for ResNet-50; (C) total loss plot for ResNet-101; (D) false positives of prickly acacia with Inception-v2; (E) false positives of the snake weed class with Inception-v2; (F) true positives with ResNet-50; (G) true positives with ResNet-101.
Summary of results and conclusions from each step of the proposed methodology.
| Step | DL architecture | Configuration (resizer/initializer/optimizer) | Total loss | Training time | mAP (%) | Remarks |
|---|---|---|---|---|---|---|
| Training with default settings | YOLO-v4 | FS (608 × 608) | 2.83 | 12 | 79.68 | Few of the weed classes were successfully identified |
| | SSD Inception-v2 | FS (300 × 300) | 4–6 | 11 | 48.12 | None of the weed classes achieved an AP of more than 90% |
| | SSD MobileNet-v2 | FS (300 × 300) | 3–6 | 5 | 34.99 | Fastest model convergence, but unsatisfactory testing outcomes |
| | SSD ResNet-50 (RetinaNet) | FS (640 × 640) | 0.55–0.75 | 14 | 21.69 | Achieved the lowest mAP among all the DL models |
| | EfficientDet EfficientNet | AR (min: 512, max: 512) | 0.25–0.45 | 11.5 | 36.59 | Eight classes attained an AP of <50% |
| | CenterNet ResNet-50 | AR (min: 512, max: 512) | 1.5–2.5 | 12.5 | 27.36 | None of the classes achieved a satisfactory AP |
| | RFCN ResNet-101 | AR (min: 600, max: 1,000) | 1.50 | 10 | 55.06 | The model successfully detected three classes of weeds |
| | Faster RCNN Inception-v2 | AR (min: 600, max: 1,000) | 1.50 | 8.5 | 73.59 | The model successfully detected five classes of weeds |
| | Faster RCNN ResNet-50 | AR (min: 600, max: 1,000) | 0–1 | 9 | 87.23 | Detected seven classes of weeds with high AP (more than 90%) |
| | Faster RCNN ResNet-101 | AR (min: 600, max: 1,000) | 0–1 | 10 | 87.64 | The most suitable DL architecture for this study, with the highest mAP among all the DL architectures |
| Effects of image resizers/interpolation methods | Faster RCNN ResNet-101 | AR with bicubic | 0–1.4 | 10 | 81.33 | Did not contribute to better detection results |
| | | AR with area | 0–0.87 | 10 | 91.55 | Found to be the best interpolator |
| | | AR with NN | 0–0.98 | 10 | 86.93 | Almost similar performance to the bilinear method |
| | | FS with bilinear | 0–0.92 | 9.5 | 85.09 | Provided a comparatively lower mAP |
| | | FS with bicubic | 0–1.2 | 9.5 | 82.38 | Attained a low mAP, just like with AR |
| | | FS with area | 0–1.5 | 9.5 | 85.68 | The area interpolator did not work well with the fixed-shape resizer |
| | | FS with NN | 0–1.4 | 9.5 | 82.64 | Attained a low AP for the weed classes |
| Effects of initializers and batch normalization | Faster RCNN ResNet-101 | Tr (std: 0.01); SV (sf: 1.0, nd: true, mode: Fan_avg); RN (std: 0.01) | 0–0.87 | 10 | 91.55 | Very small std values close to zero should be avoided; a normal distribution averaging the input and output units of the weight tensor should be considered |
| | | BN (decay: 0.99, eps: 0.01) | 0–0.82 | 8.5 | 93.37 | An improvement of 1.82% was obtained with BN, with fast training convergence |
| Effects of optimizers | Faster RCNN ResNet-101 | SGD with momentum | 0–0.87 | 8.5 | 93.37 | The default optimizer attained a high AP except for the negative class |
| | | Adam | 0–0.94 | 7.75 | 91.56 | Faster convergence with the adaptive algorithm |
| | | RMSProp | 0–0.86 | 7.75 | 93.44 | The best DL optimizer obtained; slightly improved the mAP without BN |
FS, Fixed-shape resizer; AR, Aspect ratio resizer; NN, Nearest neighbor; Tr, Truncated normal initializer; std, standard deviation; SV, Scaling variance initializer; sf, scaling factor; nd, normal distribution; RN, random normal initializer; BN, Batch normalization; eps, epsilon.
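The fixed-shape (FS) and aspect-ratio (AR) resizers compared above differ only in how the target dimensions are computed. A small sketch of the two policies follows; the min/max clamping mirrors the keep-aspect-ratio behaviour implied by settings such as "AR (min: 600, max: 1,000)", and the function names are ours:

```python
# Sketch of the two resizing policies from the table: a fixed-shape (FS)
# resizer forces an exact output size, while the aspect-ratio (AR) resizer
# scales the short side up to min_dim unless the long side would exceed
# max_dim, in which case the long side is clamped to max_dim instead.
def aspect_ratio_dims(h, w, min_dim=600, max_dim=1000):
    scale = min_dim / min(h, w)
    if max(h, w) * scale > max_dim:
        scale = max_dim / max(h, w)
    return round(h * scale), round(w * scale)

def fixed_shape_dims(h, w, size=300):
    return size, size

print(aspect_ratio_dims(480, 640))   # short side reaches 600: (600, 800)
print(aspect_ratio_dims(400, 1200))  # long side clamped: (333, 1000)
```

The AR policy preserves object proportions, which matters for detectors whose anchor shapes assume undistorted inputs; FS distorts non-square images but yields a constant input tensor shape.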
Figure 10. Training plots for the bilinear and area interpolation methods with the aspect ratio image resizer (iteration steps from 50K onwards are shown, where the models reached training convergence).
Figure 11. Average precision of each class by the Faster RCNN ResNet-101 model trained with three DL optimizers, in the absence and presence of batch normalization.
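Since batch normalization (decay: 0.99, eps: 0.01) produced the 1.82% mAP improvement noted in the methodology table, a sketch of the training-time batch-norm computation may help. The function below is an illustrative textbook form using the table's decay and epsilon values, not the detector's internal implementation:

```python
# NumPy sketch of batch normalization at training time, with the decay
# (0.99) and epsilon (0.01) values listed in the methodology table.
import numpy as np

def batch_norm(x, running_mean, running_var, gamma, beta,
               decay=0.99, eps=0.01):
    mu, var = x.mean(axis=0), x.var(axis=0)
    # Normalize the batch, then apply the learned scale and shift.
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Update running statistics (used at inference) with exponential decay.
    running_mean = decay * running_mean + (1 - decay) * mu
    running_var = decay * running_var + (1 - decay) * var
    return gamma * x_hat + beta, running_mean, running_var

x = np.array([[1.0], [3.0]])  # toy batch of two 1-feature activations
out, rm, rv = batch_norm(x, np.zeros(1), np.ones(1), gamma=1.0, beta=0.0)
# The normalized batch has (near-)zero mean per feature.
```

Normalizing each layer's activations keeps gradients well-scaled, which is consistent with the faster convergence the table reports for the BN run.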