Luca Ghiani, Alberto Sassu, Francesca Palumbo, Luca Mercenaro, Filippo Gambella.
Abstract
An early estimate of the exact number of fruits, flowers, and trees helps farmers make better decisions on cultivation practices, plant disease prevention, and the size of the harvest labor force. The current practice of yield estimation, in which workers count fruits or flowers by hand, is time-consuming and expensive, and it is not feasible for large fields. Automatic yield estimation based on robotic agriculture provides a viable solution in this regard. In a typical image classification process, the task is not only to specify the presence or absence of a given object at a specific location, but also to count how many objects are present in the scene. The success of these tasks largely depends on the availability of a large number of training samples. This paper presents a detector for bunches of a single fruit, grape, based on a deep convolutional neural network trained to detect vine bunches directly in the field. Experimental results show a 91% mean Average Precision.
Keywords: deep learning; grape detection; object detection; precision agriculture; precision viticulture
Year: 2021 PMID: 34198844 PMCID: PMC8201373 DOI: 10.3390/s21113908
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Comparison of the main characteristics of the proposed grape detection methods.
| Reference | Fully Automated Detection Process | Large Data Set (More Than a Thousand) | Large Grape Variety (More Than Ten) |
|---|---|---|---|
| [ | Yes | No | No |
| [ | No | No | No |
| [ | Yes | Yes | No |
| [ | Yes | No | No |
| [ | Yes | No | No |
| [ | No | No | No |
| [ | Yes | No | No |
| [ | No | No | Yes |
| [ | Yes | No | No (Shiraz and Cabernet Sauvignon) |
| [ | Yes | Yes | No |
| [ | Yes | Yes | No |
| [ | Yes | Yes | Yes |
| This work | Yes | Yes | Yes |
Figure 1. Detailed workflow of the proposed methodology. After labeling, the data set is divided into training, validation, and test sets. A pre-trained Mask R-CNN framework is fine-tuned using the augmented training set and the validation set. The experimental results are obtained by applying the detector to both the test set and our internal dataset.
Figure 2. MATLAB Image Labeler used in the labeling process. For each image, the smallest bounding box was hand-drawn around every bunch of grapes.
Figure 3. Sample images from GrapeCS-ML dataset 2: (a–c) include a color reference; (d–f) contain a volume reference.
Figure 4. Sample images from our internal dataset: (a) cv. Cannonau; (b) cv. Cagnulari; (c,d) cv. Vermentino at different stages of maturation.
Figure 5. Mask R-CNN framework (He et al. [28]). In this two-stage procedure, the first stage, called Region Proposal Network (RPN), estimates the position of bounding boxes. The second stage performs a classification and a bounding box regression, and extracts a binary mask.
Number of images contained in the GrapeCS-ML Dataset and in the internal dataset.
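| Dataset | Split | Subset | Number of images |
|---|---|---|---|
| GrapeCS-ML Dataset | Train | Set 1 | 1114 images |
| GrapeCS-ML Dataset | Validation | Set 2 | 505 images |
| GrapeCS-ML Dataset | Test | Set 3 | 204 images |
| GrapeCS-ML Dataset | Test | Set 4 | 242 images |
| GrapeCS-ML Dataset | Test | Set 5 | 49 images |
| Internal Dataset | | | 451 images |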
Number of images (in brackets) for each image size in the GrapeCS-ML dataset and in the internal dataset.
| Dataset | Image sizes in pixels (number of images) |
|---|---|
| GrapeCS-ML Set 1 | 480 × 640 (1102), 640 × 480 (7), 1200 × 1600 (5) |
| GrapeCS-ML Set 2 | 480 × 640 (253), 640 × 480 (198), 1200 × 1600 (28), 1600 × 1200 (26) |
| GrapeCS-ML Set 3 | 480 × 640 (81), 640 × 480 (81), 1200 × 1600 (21), 1600 × 1200 (21) |
| GrapeCS-ML Set 4 | 480 × 640 (35), 640 × 480 (206) |
| GrapeCS-ML Set 5 | 640 × 480 (1), 3024 × 3024 (12), 3024 × 4032 (36), 3402 × 3752 (1) |
| Internal Dataset | 360 × 640 (1), 480 × 640 (29), 640 × 480 (17), 1600 × 2128 (2), 1904 × 2528 (3), 2048 × 1536 (36), 2112 × 2816 (23), 2304 × 3072 (1), 2320 × 3088 (120), 2560 × 1536 (3), 2816 × 2112 (139), 3072 × 2304 (9), 3088 × 2320 (43), 3456 × 4608 (2), 4160 × 2340 (1), 4608 × 3456 (22) |
Figure 6. Examples of training dataset augmentation: (a) original image; (b) horizontal flipping; (c) image blurring.
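When an image is flipped horizontally for augmentation, its bounding-box labels have to be mirrored as well, whereas blurring leaves them unchanged. The following is a minimal sketch of that bookkeeping, not the authors' pipeline. It assumes boxes are stored as `(x, y, width, height)` tuples with the origin at the top-left corner and represents the image as a plain list of pixel rows; both function names are hypothetical.

```python
def flip_image_horizontally(rows):
    """Mirror an image, given as a list of pixel rows, left to right."""
    return [list(reversed(row)) for row in rows]


def flip_boxes_horizontally(boxes, image_width):
    """Mirror (x, y, width, height) boxes to match a horizontally flipped image.

    Only the x coordinate changes: the new left edge is the distance of the
    old right edge from the right border of the image.
    """
    return [(image_width - x - w, y, w, h) for (x, y, w, h) in boxes]


# Usage: a 640-pixel-wide image with one labeled bunch (hypothetical values).
original_boxes = [(10, 20, 30, 40)]
flipped_boxes = flip_boxes_horizontally(original_boxes, image_width=640)
```

Flipping twice returns the original boxes, which is a convenient sanity check for this kind of transform.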
Figure 7. Evaluation of the IoU (Intersection over Union). This value is the ratio between the intersection and the union of the areas of the blue bounding box obtained by the classifier (Prediction) and the green one hand-drawn during the labeling process (Ground Truth). (a) A sample image; (b) a description of the calculation process.
Figure 8. Two examples of IoU. In the example on the left, the ratio between the intersection and the union of the ground truth and prediction bounding boxes is higher than 0.5 (0.52), while in the example on the right the ratio is lower (0.23).
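The intersection-over-union ratio described in these captions can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: boxes are assumed to be `(x, y, width, height)` tuples with the origin at the top-left corner, and the function name `iou` is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h

    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


# Usage: two 10 x 10 boxes that overlap over half of their width.
ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 10, 10)
overlap = iou(ground_truth, prediction)  # 50 / 150
```

In this invented example the boxes share half of their area, yet the IoU is only one third, which shows how strict the 0.5 threshold is in practice.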
Figure 9. Example of a Precision–Recall curve obtained during our experiments. The Average Precision, that is, the area under the curve, has a value of 0.833. In this example there are three changes in the Precision or Recall value, but the number of changes can be different for each image.
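As a rough illustration of how an area under a Precision–Recall curve can be computed, the sketch below walks through detections sorted by descending confidence and accumulates the area step by step. It assumes one common convention (the precision envelope, sometimes called all-point interpolation), which may differ from the exact procedure used in the paper. The function name and the input values are hypothetical.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Area under the precision-recall curve.

    is_true_positive: one boolean per detection, sorted by descending confidence.
    num_ground_truth: number of labeled objects the detector should have found.
    """
    if num_ground_truth == 0:
        return 0.0

    precisions, recalls = [], []
    true_positives = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        if hit:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangles between consecutive recall values.
    area, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


# Usage: three detections (hit, miss, hit) against two labeled bunches.
ap = average_precision([True, False, True], num_ground_truth=2)
```

With these invented inputs the precision and recall values change three times and the area is 0.5 × 1 + 0.5 × 2/3 ≈ 0.833; the inputs were chosen for illustration and are not taken from the figure.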
Figure 10. Training and validation loss profile over the number of epochs, that is, the number of times the learning algorithm updates the model by analyzing the entire training dataset. The two curves show the performance improvement on training and validation data.
Experimental results on both GrapeCS-ML and our internal dataset. The detector was trained in three different ways: using the entire set 1 as the training set, with dataset augmentation; using only 10% of set 1 as the training set, with dataset augmentation; and using only 10% of set 1 as the training set, without dataset augmentation.
| Dataset Name | mAP: Train Complete, with Augmentation | mAP: Train 10%, with Augmentation | mAP: Train 10%, without Augmentation |
|---|---|---|---|
| Validation (Set 2) | 93.97% | 90.95% | 85.24% |
| Test (Set 3 + Set 4 + Set 5) | 92.78% | 90.98% | 87.65% |
| Set 3 | 98.77% | 98.69% | 97.30% |
| Set 4 | 89.18% | 86.70% | 83.40% |
| Set 5 | 85.64% | 80.07% | 68.44% |
| Internal Dataset | 89.90% | 86.41% | 70.75% |
Figure 11. Images from the three GrapeCS-ML subsets included in the test set: (a–c) set 3; (d–f) set 4; (g–i) set 5.
Experimental results on both GrapeCS-ML and our internal dataset based on the number of bunches present in the images. After each mAP value, in brackets, the number of examined images is shown.
| Dataset Name | 1 bunch | 2 bunches | 3 bunches | 4 bunches | 5 bunches | 6 bunches |
|---|---|---|---|---|---|---|
| Validation (Set 2) | 98.85% (369) | 82.61% (126) | 57.22% (10) | | | |
| Test (Set 3, 4, 5) | 99.75% (395) | 65.41% (73) | 72.59% (15) | 51.72% (8) | 60.00% (2) | 64.63% (2) |
| Set 3 | 100.00% (195) | 72.22% (9) | | | | |
| Set 4 | 99.45% (181) | 57.70% (53) | 65.28% (8) | | | |
| Set 5 | 100.00% (19) | 96.97% (11) | 80.95% (7) | 51.72% (8) | 60.00% (2) | 64.63% (2) |
| Internal Dataset | 96.79% (218) | 85.39% (166) | 76.11% (46) | 80.89% (17) | 99.17% (4) | |
Figure 12. Example of correct detection on a test image from the GrapeCS-ML dataset. The green box represents the ground truth while the blue one is the detection result. The IoU of the two boxes is greater than 0.5.
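One way to turn the IoU threshold of 0.5 into a per-detection verdict is a greedy matching between predicted and ground-truth boxes. The sketch below is a common convention and is not confirmed to be the authors' exact procedure. It assumes `(x, y, width, height)` boxes and one confidence score per prediction; the function names and the sample values are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def match_detections(predictions, ground_truth, threshold=0.5):
    """Greedily match (score, box) predictions to ground-truth boxes.

    Returns one boolean per prediction, in descending score order (True for a
    correct detection), and the number of ground-truth boxes left undetected.
    """
    matched = set()
    flags = []
    for _score, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_index = 0.0, None
        for index, truth in enumerate(ground_truth):
            if index in matched:
                continue  # each ground-truth box can be matched only once
            value = iou(box, truth)
            if value > best_iou:
                best_iou, best_index = value, index
        if best_index is not None and best_iou > threshold:
            matched.add(best_index)
            flags.append(True)
        else:
            flags.append(False)
    return flags, len(ground_truth) - len(matched)


# Usage: one wide prediction covering two adjacent bunches (invented values).
truth_boxes = [(0, 0, 10, 10), (12, 0, 10, 10)]
merged = [(0.9, (0, 0, 22, 10))]
flags, missed = match_detections(merged, truth_boxes)
```

In this invented case the merged box overlaps each bunch with an IoU of about 0.45, so it counts as a false detection and both bunches count as missed.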
Figure 13. Examples of errors in the GrapeCS-ML dataset. The green boxes represent the ground truth while the blue ones are the detection results. In (a) only one out of two bounding boxes is correctly detected, in (b,c) the two bunches are detected but as a single element, and in (d) only the larger of the two bunches is correctly detected.
Figure 14. Examples of errors in the internal dataset. The green boxes represent the ground truth while the blue ones are the detection results. (a) Incorrect detection of overlapping bunches; (b) undetected shaded parts; (c) leaves incorrectly detected as bunches.
Figure 15. Example of the same bunches correctly detected in two similar images. Image (a) is significantly overexposed compared to image (b). The green boxes represent the ground truth while the blue ones are the detection results.