Prabhakar Maheswari, Purushothamman Raja, Vinh Truong Hoang.
Abstract
Yield estimation (YE) of a crop is one of the main tasks in fruit management and marketing. Based on the results of YE, farmers can make better decisions on the harvesting period, disease-prevention strategies, subsequent cultivation practices, etc. Currently, crop YE is performed manually, which has many limitations: experts are required for larger fields, decisions are subjective and the process is time-consuming. To overcome these issues, an intelligent YE system was proposed which detects, localizes and counts the tomatoes in the field using SegNet with VGG19 (a deep learning-based semantic segmentation architecture). A dataset of 672 images was given as input to the SegNet with VGG19 architecture for training. It extracts features corresponding to the tomato in each layer, and detection was performed based on the feature score. The results were compared against other semantic segmentation architectures such as U-Net and SegNet with VGG16; the proposed method performed better and produced reasonable results. For testing the trained model, a case study was conducted in a real tomato field at Manapparai village, Trichy, India. The proposed method achieved test precision, recall and F1-score values of 89.7%, 72.55% and 80.22%, respectively, along with reasonable localization capability for tomatoes.
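As a quick consistency check on the metrics reported in the abstract, the F1-score is the harmonic mean of precision and recall. A minimal plain-Python sketch (values taken from the abstract; not code from the paper):

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 = harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported test precision and recall from the field case study
p, r = 0.897, 0.7255
f1 = f1_score(p, r)
print(round(100 * f1, 2))  # → 80.22, matching the reported F1-score
```

This confirms the three reported test metrics are internally consistent.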
Year: 2022 PMID: 35948597 PMCID: PMC9365763 DOI: 10.1038/s41598-022-17840-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Literature related to various fruit YE using DL-based semantic segmentation.
| Author | Fruit | Description | Results |
|---|---|---|---|
| Apolo-Apolo et al. | Citrus | Detection, counting and size estimation were performed using faster R-CNN. The images were captured from 20 sample trees of a citrus grove using a UAV | Precision, recall and F1-score of 96%, 94% and 95% were obtained, respectively |
| Chen et al. | Citrus | Detection and localization were performed using an improved YOLO version 4 (YOLOv4). Citrus tree images were captured using a Kinect V2 camera, and the architecture detects small fruits accurately against a complex background | Accuracy varies from 92.89 to 96.04% |
| Kestur et al. | Mango | Detection was performed using a CNN with a pixel-based prediction method. Training and testing used 11,096 and 1,500 patches, obtained from 40 and 4 original images, respectively | Accuracy and F1-score of 73.6% and 84.4% were obtained, respectively |
| Koirala et al. | Mango | Detection and counting were done using MangoYOLO, a modified version of YOLOv3 and tiny YOLOv2. Training, validation and test datasets of 1,300, 130 and 300 images were used, respectively | The overall F1-score is 89% |
| Borianne et al. | Mango | Detection was done using faster R-CNN. The mango cultivars Kent, Keitt and Boucodiekhal were detected using confidence and non-maximal suppression thresholds of 0.7 and 0.25, respectively | F1-score is 90% |
| Fu et al. | Kiwi | Detection was done using faster R-CNN with the Zeiler-Fergus network (ZFNet). 2,100 sub-images (784 × 784 pixels) were obtained from 700 original field images (2352 × 1568 pixels) | The overall accuracy is 92.3% |
| Bargoti and Underwood | Mango, apple and almond | Detection and counting were done by faster R-CNN. Tree images for the three fruit categories were captured using a Digital Single-Lens Reflex (DSLR) camera with resolution varying from 2 to 17 megapixels | The overall F1-score is greater than 90% |
| Sa et al. | Sweet pepper | Detection was performed using multi-modal faster R-CNN on two modalities, RGB and NIR (near-infrared) images. Early and late fusion methods were employed to combine the modalities | F1-score is 83.8% |
Literature for the YE of tomato fruit using DL-based semantic segmentation.
| Author | Fruit | Description | Results |
|---|---|---|---|
| Mu et al. | Tomato | Detection, counting and size estimation were performed using faster R-CNN with ResNet101. The architecture was pre-trained on the public Common Objects in Context (COCO) dataset, and transfer learning was employed for tomato detection. This method gave an appreciable prediction of tomato yield; however, the long training time and miscounting when fruits are considerably shaded by leaves could be further improved | Precision is 87.83% for an IoU threshold greater than 0.5, together with a reported coefficient of determination (R²) |
| Liu et al. | Tomato | Detection and localization were done by a modified version of YOLOv3 called YOLO-Tomato. Better localization of tomatoes was achieved by replacing the traditional rectangular bounding box with a circular one | Precision, recall and F1-score of 94.75%, 93.09% and 93.91% were obtained, respectively |
| Rahnemoonfar and Sheppard | Tomato | Detection and counting were done using a modified Inception-ResNet-A module. The architecture was trained on a synthetic (tomato) image dataset and tested on real data. It was efficient even in challenging conditions such as shadow and some degree of overlap; however, it could not count green fruits, as training covered only ripe and half-ripe fruits | Training and testing accuracy of 93% and 91% were obtained, respectively |
Figure 1 Image data before and after pre-processing: (a) sample original images from the tomato dataset (Afonso et al.[18]), (b) sample ground-truth images.
Figure 2 Upsampling operation in the proposed method of SegNet with VGG19.
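SegNet's decoder upsamples by reusing the max-pooling indices recorded in the encoder: each pooled value is placed back at its original argmax location and the remaining positions are zero-filled. The following is a hypothetical plain-Python toy sketch of that idea on nested lists (2×2 pooling, stride 2), not the paper's implementation:

```python
def maxpool2x2(x):
    """2x2 max pooling with stride 2; returns pooled values and argmax indices."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h, 2):
        prow, irow = [], []
        for j in range(0, w, 2):
            best, bidx = x[i][j], (i, j)
            for di in range(2):
                for dj in range(2):
                    if x[i + di][j + dj] > best:
                        best, bidx = x[i + di][j + dj], (i + di, j + dj)
            prow.append(best)
            irow.append(bidx)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices

def max_unpool2x2(pooled, indices, out_h, out_w):
    """SegNet-style unpooling: put each pooled value back at its recorded
    argmax location; all other positions stay zero (sparse upsampled map)."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for prow, irow in zip(pooled, indices):
        for v, (i, j) in zip(prow, irow):
            out[i][j] = v
    return out
```

For example, pooling `[[1, 3], [2, 0]]` yields value `3` at index `(0, 1)`, and unpooling restores the `3` to that exact position with zeros elsewhere.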
Figure 3 Training the tomato dataset using SegNet with VGG19.
Figure 4 Overall flow of the algorithm of the proposed method.
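After segmentation, counting the tomatoes amounts to counting connected foreground regions in the predicted binary mask. The paper does not publish its counting code, so this is a hypothetical plain-Python sketch using 4-connected flood fill:

```python
from collections import deque

def count_blobs(mask):
    """Count 4-connected foreground regions in a binary mask (lists of 0/1)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                count += 1  # new region found; flood-fill it
                seen[i][j] = True
                q = deque([(i, j)])
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return count
```

A mask with three separate tomato regions would return 3; in practice, touching or occluded fruits merge into one region, which is one reason recall drops in the field test.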
Figure 5 Performance metric graphs for training and validation: (a) precision, (b) recall, (c) IoU, (d) loss.
Performance comparison with other architectures.
| Architecture | Precision (train) % | Precision (val) % | Recall (train) % | Recall (val) % | IoU (train) % | IoU (val) % | F1-score (train) % | F1-score (val) % |
|---|---|---|---|---|---|---|---|---|
| U-Net | 90.29 | 85.4 | 80.26 | 70.55 | 88.2 | 80.1 | 84.98 | 77.28 |
| SegNet with VGG16 | 94.25 | 90.3 | 81.22 | 72.44 | 90.0 | 82.9 | 87.25 | 80.39 |
| SegNet with VGG19 | 99.05 | 97.62 | 84.82 | 75.37 | 91.16 | 84.57 | 91.38 | 85.06 |
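The IoU column above measures pixel-wise overlap between the predicted and ground-truth masks: intersection divided by union. A minimal sketch on flat binary masks (plain Python, illustration only):

```python
def iou(pred, gt):
    """Pixel-wise Intersection-over-Union of two flat binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0  # two empty masks agree perfectly

# 3 pixels overlap out of 5 covered by either mask
print(iou([1, 1, 1, 0, 1], [1, 1, 1, 1, 0]))  # → 0.6
```

Unlike precision, IoU penalizes both false positives and false negatives in a single number, which is why it is the standard segmentation metric alongside the classification-style precision/recall columns.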
Figure 6 Tomato detection and counting on a test image using the trained SegNet with VGG19 architecture.
Figure 7 Prediction under occlusion: (a) original test image, (b) predicted test image with precision (in %).