Sandro Augusto Magalhães, Luís Castro, Germano Moreira, Filipe Neves Dos Santos, Mário Cunha, Jorge Dias, António Paulo Moreira.
Abstract
The development of robotic solutions for agriculture requires advanced perception capabilities that can work reliably in any crop stage. For example, to automatise the tomato harvesting process in greenhouses, the visual perception system needs to detect the tomato at any stage of its life cycle (from flower to ripe tomato). The state of the art for visual tomato detection focuses mainly on ripe tomatoes, which have a distinctive colour against the background. This paper contributes an annotated visual dataset of green and reddish tomatoes. Datasets of this kind are uncommon and not otherwise available for research purposes. This will enable further developments in edge artificial intelligence for the in situ, real-time visual tomato detection required for the development of harvesting robots. Using this dataset, five deep learning models were selected, trained and benchmarked to detect green and reddish tomatoes grown in greenhouses. Given our robotic platform's specifications, only the Single-Shot MultiBox Detector (SSD) and YOLO architectures were considered. The results proved that the system can detect green and reddish tomatoes, even those occluded by leaves. SSD MobileNet v2 had the best performance when compared against SSD Inception v2, SSD ResNet 50, SSD ResNet 101 and YOLOv4 Tiny, reaching an F1-score of 66.15%, an mAP of 51.46% and an inference time of 16.44 ms on an NVIDIA Turing architecture platform (an NVIDIA Tesla T4 with 12 GB). YOLOv4 Tiny also had impressive results, mainly concerning inference times of about 5 ms.
Keywords: SSD benchmarking; fruit detection; machine learning; object detection; robotics vision; vision system
Year: 2021 PMID: 34065568 PMCID: PMC8160895 DOI: 10.3390/s21103569
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Tomatoes' ripeness levels: (a) physiological or horticultural maturation; (b) early phase of ripening; and (c) ripened tomato.
Algorithms, methods and techniques proposed by different authors regarding tomato detection at different ripeness levels (N/A—Not Available; …—value not recovered from the source record).

| Method | Tomato Ripeness | Accuracy | Inference Time | Authors/Year |
|---|---|---|---|---|
| L*a*b* colour space and K-means clustering | Ripe | N/A | … | Yin et al. [ ] |
| L*a*b* colour space and bi-level partition fuzzy logic entropy | Ripe | N/A | N/A | Huang et al. [ ] |
| L*a*b* colour space and threshold algorithm | Green, intermediate and ripe | 93% | N/A | Zhao et al. [ ] |
| RGB, HSI and YIQ colour spaces and morphological characteristics | Ripe | … | N/A | Arefi et al. [ ] |
| RGB colour space images into an HSI colour model | Ripe | 4 | … | Feng et al. [ ] |
| RGB colour space into an HSI colour space, threshold method and Canny operator | Ripe | N/A | N/A | Zhang [ ] |
| R component of the RGB images and Sobel operator | Ripe | Clustered tomatoes: …; beef tomatoes: … | N/A | Benavides et al. [ ] |
| HSV colour space and watershed segmentation method | Ripe | N/A | … | Malik et al. [ ] |
| Mathematical morphology and Fuzzy C-Means-based method | Ripe | N/A | N/A | Zhu et al. [ ] |
| Mathematical morphology, difference and iterative erosion course; normalised colour | Ripe | 50 / 70 | N/A | Xiang et al. [ ] |
| Pixel-based segmentation, blob-based segmentation and X-means clustering | Green, intermediate and ripe | 88% | N/A | Yamamoto et al. [ ] |
| Haar-like features of grey-scale image and AdaBoost classifier | Ripe | 96% | N/A | Zhao et al. [ ] |
| Histograms of oriented gradients and SVM | Ripe | N/A | … | Liu et al. [ ] |
| Analysis and selection of multiple features, RVM and bi-layer classification strategy | Ripe | N/A | … | Wu et al. [ ] |
| Otsu segmentation algorithm | Ripe | N/A | … | Lili et al. [ ] |
| Improved YOLOv3-tiny method | Ripe | F1-score: … | N/A | Xu et al. [ ] |
| YOLOv3 detection model to create the proposed YOLOTomato model | Green, intermediate and ripe | N/A | … | Liu et al. [ ] |
| Feature pyramid network | Green, intermediate and ripe | N/A | … | Sun et al. [ ] |
| Faster R-CNN structure with the deep CNN ResNet-101 | Green | N/A | … | Mu et al. [ ] |
| Comparison: R-CNN vs. SSD | Green, intermediate and ripe | R-CNN: …; SSD: … | N/A | de Luna et al. [ ] |
| SSD-based algorithm used to train and develop network models such as VGG16, MobileNet and Inception V2 | Green, intermediate and ripe | Best performance: Inception V2 (…) | N/A | Yuan et al. [ ] |
Figure 2. Scheme for the SSD architecture using VGG16 as the backbone. Adapted from ref. [11].
Figure 3. Anchor box shapes used in the SSD architecture. Adapted with permission from ref. [11].
Figure 4. Overview of the performed methods: training and evaluation pipeline.
Figure 5. Greenhouses' entrance.
Figure 6. AgRob v16 inside an uncultivated greenhouse.
Figure 7. Images split into px sub-images with an overlapping ratio of 20%. The different colours are for reference only, to distinguish the different sub-images.
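The exact tile dimensions were lost from this caption, so the sketch below assumes a hypothetical 300 px tile size; it shows one way to compute tile origins so that neighbouring tiles overlap by the stated 20% and the final tile still reaches the image border:

```python
def tile_origins(length, tile, overlap=0.2):
    """Top-left coordinates of tiles of size `tile` along an axis of
    size `length`, with roughly `overlap` fractional overlap."""
    stride = max(1, int(tile * (1 - overlap)))
    origins = list(range(0, max(length - tile, 0) + 1, stride))
    # Ensure the last tile touches the image border.
    if origins[-1] + tile < length:
        origins.append(length - tile)
    return origins

def tiles(width, height, tile, overlap=0.2):
    """(x, y) origins for all tiles covering a width x height image."""
    return [(x, y)
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]
```

For a 1000 px axis and 300 px tiles, the stride is 240 px and the origins come out as 0, 240, 480 and a final border-aligned 700.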
Transformations applied to the images of the split dataset for data augmentation and the characteristics of those transformations.
| Transformation | Value |
|---|---|
| Rotation | −60° to 60° |
| Scaling | 50% to 150% |
| Translation | 0% to 30% left or right |
| Flip | Image mirroring |
| Blur (Gaussian filter) | … |
| Gaussian noise | … |
| Combination | Random combination of three of the previous transformations with random values |
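A subset of these transformations can be expressed in plain Python as below; the authors' actual augmentation tooling is not named in this record, so this is only a sketch (blur and rotation are omitted, and parameter ranges follow the table):

```python
import random

def hflip(img):
    """Mirror a greyscale image (a list of pixel rows) left-right."""
    return [row[::-1] for row in img]

def gaussian_noise(img, sigma=5.0, rng=random):
    """Add zero-mean Gaussian noise, clipping pixels to 0-255."""
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in img]

def scale_factor(rng=random):
    """Draw a scaling factor from the table's 50%-150% range."""
    return rng.uniform(0.5, 1.5)

def combination3(img, ops, rng=random):
    """Apply a random combination of three of the given transformations,
    mirroring the table's 'Combination' entry."""
    out = img
    for op in rng.sample(ops, 3):
        out = op(out)
    return out
```

In a real pipeline the same geometric transformations would also have to be applied to the bounding-box annotations, which is why detection work usually relies on an augmentation library rather than hand-rolled code like this.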
Figure 8. Example of augmentation applied to an image. Panel (h) is a random combination of three of the other transformations.
Model location in TensorFlow and Darknet databases. All SSD models are in the TensorFlow Models database at http://download.tensorflow.org/models/object_detection/filename. YOLOv4 Tiny is in the Darknet database at https://github.com/AlexeyAB/darknet/releases/download/filename.
| Model | File Name |
|---|---|
| SSD MobileNet v2 | ssd_mobilenet_v2_coco_2018_03_29.tar.gz |
| SSD Inception v2 | ssd_inception_v2_coco_2018_01_28.tar.gz |
| SSD ResNet 50 | ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.tar.gz |
| SSD ResNet 101 | ssd_resnet101_v1_fpn_shared_box_predictor_oid_512x512_sync_2019_01_20.tar.gz |
| YOLOv4 Tiny | darknet_yolo_v4_pre/yolov4-tiny.conv.29 |
Training batch size for each model.
| Model | Batch Size |
|---|---|
| SSD MobileNet v2 | 24 |
| SSD Inception v2 | 32 |
| SSD ResNet 50 | 8 |
| SSD ResNet 101 | 8 |
| YOLOv4 Tiny | 64 |
Figure 9. Evolution of the F1-score with the variation of the confidence threshold for all DL models on the validation set without augmentation.
Figure 10. Evolution of the number of TPs, FPs and FNs as the confidence threshold increases.
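Counting TPs, FPs and FNs for a detector requires matching predicted boxes to ground-truth boxes, typically greedily by intersection-over-union (IoU) at a fixed overlap threshold. The paper's exact matching rule is not reproduced in this record, so the sketch below uses a common greedy scheme with an assumed 0.5 IoU threshold:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def count_matches(preds, gts, conf_thr, iou_thr=0.5):
    """TP/FP/FN counts: preds are (confidence, box) pairs, matched
    greedily (highest confidence first) to unmatched ground truths."""
    kept = sorted((p for p in preds if p[0] >= conf_thr), reverse=True)
    matched, tp = set(), 0
    for _, box in kept:
        best, best_iou = None, iou_thr
        for i, gt in enumerate(gts):
            overlap = iou(box, gt)
            if i not in matched and overlap >= best_iou:
                best, best_iou = i, overlap
        if best is not None:
            matched.add(best)
            tp += 1
    return tp, len(kept) - tp, len(gts) - tp
```

Raising the confidence threshold removes low-confidence predictions, which converts FPs into nothing and TPs into FNs — the trade-off plotted in Figure 10.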
Confidence threshold for each DL model that optimises the F1-score metric.
| Model | Confidence ⩾ | F1-Score |
|---|---|---|
| YOLOv4 Tiny | 49% | … |
| SSD Inception v2 | 21% | … |
| SSD MobileNet v2 | 40% | … |
| SSD ResNet 50 | 46% | … |
| SSD ResNet 101 | 34% | … |
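Given TP/FP/FN counts at a grid of candidate confidence thresholds, the F1-optimising threshold can be found by a direct sweep. A sketch of that calibration step (the counts in the test values are made up for illustration):

```python
def f1(tp, fp, fn):
    """F1-score from true-positive, false-positive and false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0  # precision
    r = tp / (tp + fn) if tp + fn else 0.0  # recall
    return 2 * p * r / (p + r) if p + r else 0.0

def best_threshold(counts_at):
    """counts_at maps a confidence threshold to its (tp, fp, fn) counts;
    return the threshold with the highest F1-score, and that score."""
    best_f1, best_t = max((f1(*c), t) for t, c in counts_at.items())
    return best_t, best_f1
```

This is the procedure implied by Figure 9: evaluate F1 at each threshold on the validation set and keep the argmax, giving the per-model thresholds listed above.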
Figure 11. Precision × recall curve on the test set considering all the predictions.
Results of the different SSD and YOLO models across several metrics, considering both all predictions (confidence ⩾ 0%) and the best computed confidence threshold.
| Model | Confidence ⩾ | Inference Time | mAP | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| YOLOv4 Tiny | 0% | … | … | … | … | … |
| SSD Inception v2 | 0% | … | … | … | … | … |
| SSD MobileNet v2 | 0% | … | … | … | … | … |
| SSD ResNet 50 | 0% | … | … | … | … | … |
| SSD ResNet 101 | 0% | … | … | … | … | … |
| YOLOv4 Tiny | 49% | … | … | … | … | … |
| SSD Inception v2 | 21% | … | … | … | … | … |
| SSD MobileNet v2 | 40% | … | … | … | … | … |
| SSD ResNet 50 | 46% | … | … | … | … | … |
| SSD ResNet 101 | 34% | … | … | … | … | … |
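The mAP values reported here summarise the precision × recall curves of Figures 11 and 12: the average precision (AP) of each class is the area under its curve, and mAP is the mean over classes. A common way to compute AP is all-point interpolation; whether the authors used this exact protocol is not stated in this record, so the following is only a sketch of the general technique:

```python
def average_precision(recalls, precisions):
    """Area under a precision-recall curve using all-point interpolation:
    precision at each recall level is replaced by the maximum precision
    achieved at any equal-or-higher recall (the 'envelope')."""
    pairs = sorted(zip(recalls, precisions))
    rs = [r for r, _ in pairs]
    ps = [p for _, p in pairs]
    # Make the precision envelope monotonically non-increasing in recall.
    for i in range(len(ps) - 2, -1, -1):
        ps[i] = max(ps[i], ps[i + 1])
    # Integrate precision over recall, starting from recall = 0.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(rs, ps):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

A perfect detector (precision 1.0 at every recall level up to 1.0) scores an AP of 1.0; any precision drop at high recall pulls the area, and hence the reported mAP, down.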
Figure 12. Precision × recall curve on the test set using the calibrated confidence threshold.
Figure 13. Comparison between unfiltered predictions (a–e) and predictions filtered by the computed confidence threshold (f–j).
Figure 14. Result comparison for darkened images.
Figure 15. Result comparison for occluded tomatoes.
Figure 16. Result comparison for overlapped tomatoes.