Prabhakar Maheswari, Purushothaman Raja, Orly Enrique Apolo-Apolo, Manuel Pérez-Ruiz.
Abstract
Smart farming employs intelligent systems in every domain of agriculture to achieve sustainable economic growth from the available resources using advanced technologies. Deep Learning (DL) is a sophisticated artificial neural network architecture that provides state-of-the-art results in smart farming applications. One of the main tasks in this domain is yield estimation. Manual yield estimation faces many hurdles, such as labor intensity, long processing times, and imprecise results. These issues motivate the development of intelligent fruit yield estimation systems that help farmers decide on harvesting, marketing, and related activities. Semantic segmentation combined with DL produces promising results in fruit detection and localization by performing pixel-based prediction. This paper reviews the literature employing various techniques for fruit yield estimation using DL-based semantic segmentation architectures. It also discusses the challenges that arise during intelligent fruit yield estimation, such as sampling, data collection, annotation and augmentation, fruit detection, and counting. The results show that fruit yield estimation employing DL-based semantic segmentation techniques outperforms earlier techniques because of the human-like cognition incorporated into the architecture. Future directions, such as customizing DL architectures for smartphone applications that predict yield and developing more comprehensive models that handle challenging situations like occlusion, overlapping, and illumination variation, are also discussed.
Keywords: deep learning; fruit detection and localization; precision agriculture; semantic segmentation; yield estimation
Year: 2021 PMID: 34249054 PMCID: PMC8267528 DOI: 10.3389/fpls.2021.684328
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
FIGURE 1. Illustration of a typical Convolutional Neural Network (CNN).
FIGURE 2. Intelligent fruit yield estimation in orchards.
Various sampling methods for tree sampling in an orchard.
| Sampling techniques | Description | Merits | Demerits |
| Simple random sampling | Samples are randomly selected from the whole population. | Completely represents the entire population. | Expensive and time-consuming. |
| Systematic random sampling | Samples are selected at a fixed interval from an ordered population, balancing estimator precision against the sampling interval. | Faster than simple random sampling. | Difficult to realize without knowing all the members of the population. |
| Stratified sampling | The population is divided into strata (sub-categories), and samples are taken randomly from each stratum. | High precision and requires smaller samples. | If the strata are not properly chosen, it is challenging to represent the entire population. |
| Smooth fractionator | The population is divided into units by attributes such as shape, size, and texture, and systematic sampling is applied to each unit. | Robust for heterogeneous populations. | Inefficient when the population of interest is sparsely distributed. |
| Cluster sampling | The population is divided into clusters, and whole clusters are sampled; suitable for large and complex populations. | Minimum resources for the sampling process. | High sampling error. |
| Multistage sampling | Sampling is performed at several successive levels. | More flexible. | Errors accumulate from clusters at the different stages. |
| Probability proportional to size sampling | The probability of selecting a unit is proportional to its size. | Well suited for sparsely distributed populations. | Reduced precision when sampling more clustered units. |
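As an illustration, the stratified approach above can be sketched in a few lines of Python. The tree records and the block-based strata here are hypothetical, not taken from any of the reviewed studies:

```python
import random

def stratified_sample(trees, strata_key, frac, seed=0):
    """Randomly sample a fixed fraction of trees from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for tree in trees:
        strata.setdefault(strata_key(tree), []).append(tree)
    sample = []
    for members in strata.values():
        k = max(1, round(frac * len(members)))  # at least one tree per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical orchard: 120 trees grouped into 4 row blocks
trees = [{"id": i, "block": i % 4} for i in range(120)]
subset = stratified_sample(trees, lambda t: t["block"], frac=0.1)
```

Because every block contributes its own 10% share, each stratum is guaranteed representation, which is exactly the precision advantage noted in the table.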
FIGURE 3. Different types of camera models using various sensor technologies.
Different types of camera models available on the market.
| Sensors | Model | Resolution | Sensor size | References |
| Black and white sensor | Leica Q2 | 8,368 × 5,584 | 36 × 24 mm | |
| Nikon z7 | 3,840 × 2,160 | 35.9 × 23.9 mm | ||
| Canon EOS 5D | 4,368 × 2,912 | 36 × 24 mm | ||
| RGB sensor | Sony a5100 | 6,000 × 4,000 | 23.5 × 15.6 mm | |
| Ricoh GR III | 6,000 × 4,000 | 23.5 × 15.6 mm | ||
| Fujifilm X-E3 | 6,000 × 4,000 | 23.5 × 15.6 mm | ||
| Thermal sensor | Flir c2 | 320 × 240 | 128 × 96 mm | |
| Testo 871 | 320 × 240 | Not available | ||
| Fluke TI450 | 320 × 240 | Not available | ||
| Multispectral sensor | AGX710 | 12.3 MP | 89 × 88 × 98 mm | |
| MSC-AGRI-1-A | 512 × 512 | 5.5 × 5.5 μm | ||
| MSC-RGBN-1-A | 512 × 512 | 5.5 × 5.5 μm | ||
| Hyperspectral sensor | MC124MG-SY | 4,112 × 3,008 | 14.2 × 10.4 mm | |
| MQ022HG | 2,048 × 1,088 | 11.3 × 6.0 mm | ||
| OCI-UAV-1000 | 2048 | Not available |
FIGURE 4. Random transformations intended for data augmentation.
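A minimal sketch of such random transformations (a coin-flip horizontal mirror followed by 0-3 quarter turns), applied here to a toy pure-Python pixel grid rather than a real image library:

```python
import random

def augment(image, seed=None):
    """Randomly mirror an image and rotate it by a multiple of 90 degrees."""
    rng = random.Random(seed)
    out = [row[:] for row in image]
    if rng.random() < 0.5:                      # horizontal mirror
        out = [list(reversed(row)) for row in out]
    for _ in range(rng.randrange(4)):           # 0-3 clockwise quarter turns
        out = [list(row) for row in zip(*out[::-1])]
    return out

# Toy 2 x 2 "image"; every augmented variant keeps the same pixel values
sample = augment([[1, 2], [3, 4]], seed=0)
```

In practice the same idea is applied per training image (with crops, color jitter, etc.) so the network sees geometric variants without new field data being collected.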
FIGURE 5. Fruit detection and localization using various DL-based semantic segmentation techniques.
Literature on CNNs with pixel-based prediction.
| Methodology | Authors and year | Dataset | Results |
| Apple detection and yield estimation using a multilayer perceptron and CNN | | 8,000 images of 1,232 × 1,616 pixels; 32 sub-images of 308 × 202 pixels obtained from each image | F1 score was 0.791, and detection F1 score was 0.861. Squared correlation coefficient R2 was 0.826. |
| Mango fruit detection and localization using multiple-view geometry | | 71,609 mangoes scanned from 522 trees | Single-view squared correlation coefficient R2 was 0.81; dual-view and multi-view R2 was ≥ 0.90. |
| Apple yield estimation using a multi-scale sparse autoencoder feature-learning method | | 8,000 apple images of 1,232 × 1,616 pixels | Squared correlation coefficient R2 was 0.81. Global accuracy was 92.5%, average accuracy was 85.1%, and F1 score was 87.3%. |
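The 32 sub-images per frame in the first row above follow directly from tiling a 1,232 × 1,616 image into non-overlapping 308 × 202 patches. A minimal sketch of such tiling (the function name is illustrative, not from the reviewed work):

```python
def tile_origins(image_w, image_h, tile_w, tile_h):
    """Yield the top-left corner of each non-overlapping tile covering the image."""
    for y in range(0, image_h - tile_h + 1, tile_h):
        for x in range(0, image_w - tile_w + 1, tile_w):
            yield (x, y)

# A 1,232 x 1,616 image split into 308 x 202 sub-images -> 4 x 8 = 32 tiles
origins = list(tile_origins(1232, 1616, 308, 202))
```

Each sub-image is then just the crop at `(x, y)` with the tile width and height, which keeps every training patch at a fixed input size for the network.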
FIGURE 6. Fully Convolutional Network (FCN).
FIGURE 7. Encoder-decoder architecture of SegNet.
Literature on fruit yield estimation using R-CNN variants and single-shot detectors.
| Work | Authors and year | Dataset | Results |
| Citrus fruit yield and size estimation using faster R-CNN | | Images taken from 20 sample trees of a citrus grove using a UAV during three consecutive campaigns | Standard errors of 13.74 and 7.22% for manual and processed-model predictions, respectively. |
| Orange fruit detection using faster mask R-CNN | | Original images of 2,816 × 1,880 pixels; 150 sub-images of 256 × 256 pixels obtained for training. RGB and HSV multimodal data were used. | For RGB images, F1 score and precision were 0.88 and 0.89, respectively; for combined RGB and HSV images, 0.88 and 0.97. |
| Apple fruit detection and counting using U-Net, GMM, and faster R-CNN | | 103 images of 1,920 × 1,080 pixels | Overall accuracy across the different architectures ranged from 95.56 to 97.83%. |
| Citrus fruit detection using mask R-CNN | | 200 images of 800 × 800 pixels | Detection accuracy was 97%. |
| Kiwifruit detection using faster R-CNN with the Zeiler and Fergus network (ZFNet) | | Training phase: 700 field images of 2,352 × 1,568 pixels, yielding 2,100 sub-images of 784 × 784 pixels. Testing phase: 100 field images | Average precision during training was 89.3%. Occluded fruit: 82.5%; overlapping fruit: 85.6%; adjacent fruit: 94.3%; separated fruit: 96.7%. Overall recognition ratio was 92.3%. |
| Grape detection using mask R-CNN, YOLOv2, and YOLOv3 | | 300 images with 4,432 boxed clusters and 2,020 masked clusters | F1 score on the test set was 0.889, precision was 0.92, and recall was 0.86. |
| Apple and pear fruit detection using modified YOLOv2 | | Original images: 5,000 (apple). Augmented images: 20,000 | F1 score before and after augmentation was 0.79 and 0.90, respectively. |
| Mango fruit load estimation using MangoYOLO | | Two video sets (with low and high frame counts) were taken to assess the MangoYOLO architecture: the first test set had 110 frames and the second 1,162 frames | R2 values of 0.665 and 0.988 were achieved for the first and second test sets, respectively. |
| Apple, almond, and mango detection using faster R-CNN | | Training images: 729 (apple), 385 (almond), 1,154 (mango). Testing images: 112 (apple), 100 (almond), 270 (mango). | F1 scores: 0.904 (apple), 0.775 (almond), 0.908 (mango). |
Performance metrics used for evaluating semantic segmentation architectures.
| Performance metric | Description | Formulae |
| Root Mean Squared Error (RMSE) | Measures the average squared difference between the actual output and the predicted output. | RMSE = √((1/n) Σᵢ (yᵢ − ŷᵢ)²) |
| Squared correlation coefficient (R2) | Measures the squared value of the linear (Pearson) correlation between two variables. | R2 = (Σᵢ (xᵢ − x̄)(yᵢ − ȳ))² / (Σᵢ (xᵢ − x̄)² · Σᵢ (yᵢ − ȳ)²) |
| Pixel Accuracy (PA) | Measures the fraction of pixels classified correctly in each class. | PA = correctly classified pixels / total pixels |
| Precision (P) | Fraction of detected fruits that are actual fruits. | P = TP / (TP + FP) |
| Recall (R) | Fraction of actual fruits that are detected; commonly used to measure architecture efficiency. | R = TP / (TP + FN) |
| F1 score (F1) | Harmonic mean of precision and recall, indicating overall fruit-detection performance. | F1 = 2PR / (P + R) |
| Intersection over Union (IoU) | Measures the ratio between the intersection and union of the ground-truth pixels and the predicted pixels of the segmented output for each class. | IoU = TP / (TP + FP + FN) |
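A minimal sketch of the pixel-level detection metrics above, computed from flat binary masks in pure Python (no image library assumed); the toy masks are illustrative only:

```python
def segmentation_metrics(y_true, y_pred):
    """Pixel-wise precision, recall, F1, and IoU for a binary fruit mask.

    y_true, y_pred: flat sequences of 0/1 labels (1 = fruit pixel).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou

# Toy 3 x 3 masks, flattened: ground truth vs. prediction
truth = [1, 1, 0, 1, 0, 0, 0, 0, 1]
pred  = [1, 0, 0, 1, 1, 0, 0, 0, 1]
p, r, f1, iou = segmentation_metrics(truth, pred)
```

Note that IoU is always the strictest of the four: its denominator counts every disagreement pixel, whereas precision and recall each ignore one error type.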
FIGURE 8. Simple and complex structures of a mango orchard.
FIGURE 9. Data collection and annotation.