Manuel Córdova1, Allan Pinto2, Christina Carrozzo Hellevik3, Saleh Abdel-Afou Alaliyat4, Ibrahim A Hameed4, Helio Pedrini1, Ricardo da S Torres4,5.
Abstract
Pollution in the form of litter in the natural environment is one of the great challenges of our times. Automated litter detection can help assess waste occurrences in the environment. Different machine learning solutions have been explored to develop litter detection tools, thereby supporting research, citizen science, and volunteer clean-up initiatives. However, to the best of our knowledge, no work has investigated the performance of state-of-the-art deep learning object detection approaches in the context of litter detection. In particular, no studies have assessed those methods with a view to their use on devices with low processing capabilities, e.g., mobile phones, which are typically employed in citizen science activities. In this paper, we fill this gap in the literature. We performed a comparative study involving state-of-the-art CNN architectures (e.g., Faster R-CNN, Mask R-CNN, EfficientDet, RetinaNet, and YOLO-v5), two litter image datasets, and a smartphone. We also introduce a new dataset for litter detection, named PlastOPol, composed of 2418 images and 5300 annotations. The experimental results demonstrate that object detectors based on the YOLO family are promising for the construction of litter detection solutions, with superior performance in terms of detection accuracy, processing time, and memory footprint.
Keywords: citizen science; deep learning; litter; litter detection; machine learning; marine litter; neural networks; object detection; portable devices
Year: 2022 PMID: 35062507 PMCID: PMC8812282 DOI: 10.3390/s22020548
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Litter detection in real scenarios.
Comparison of deep learning approaches for object detection.
| CNN Methods | One-Stage | Two-Stage | Backbone | Inference Scale | Model Size |
|---|---|---|---|---|---|
| Faster R-CNN | | ✓ | VGG-16 | Shorter side 600 | |
| | | ✓ | ZF | Shorter side 600 | |
| Mask R-CNN | | ✓ | ResNet-101-FPN | Shorter side 600 | |
| RetinaNet | ✓ | | ResNet-50-FPN | | |
| | ✓ | | ResNet-101-FPN | | |
| | ✓ | | ResNet-101-FPN | | |
| EfficientDet | ✓ | | EfficientNet-B0-BiFPN | | |
| | ✓ | | EfficientNet-B1-BiFPN | | |
| | ✓ | | EfficientNet-B2-BiFPN | | |
| | ✓ | | EfficientNet-B3-BiFPN | | |
| | ✓ | | EfficientNet-B4-BiFPN | | |
| | ✓ | | EfficientNet-B5-BiFPN | | |
| | ✓ | | EfficientNet-B6-BiFPN | | |
| | ✓ | | EfficientNet-B7-BiFPN | | |
| YOLO | ✓ | | Own | | – |
| | ✓ | | VGG-16 | | – |
| YOLOv2 | ✓ | | Own-Darknet-19 | | 194.0 |
| | ✓ | | Own-Darknet-19 | | 194.0 |
| YOLOv3 | ✓ | | Own-Darknet-53 | | 237.0 |
| | ✓ | | Own-Darknet-53 | | 237.0 |
| | ✓ | | Own-Darknet-53 | | 237.0 |
| YOLOv4 | ✓ | | CSPDarknet53 | | 246.0 |
| | ✓ | | CSPDarknet53 | | 246.0 |
| | ✓ | | CSPDarknet53 | | 246.0 |
| YOLOv5 | ✓ | | Own-5s | | |
| | ✓ | | Own-5m | | |
| | ✓ | | Own-5l | | |
| | ✓ | | Own-5x | | |
|
Datasets employed in the comparative study.
| Dataset | # Images | # Small Boxes ¹ | # Medium Boxes ² | # Large Boxes ³ | # Annotations |
|---|---|---|---|---|---|
| TACO [37] | 1500 | 384 | 1305 | 3095 | 4784 |
| PlastOPol | 2418 | 33 | 445 | 4822 | 5300 |

Bounding boxes are grouped by area. ¹ Small: area ≤ 32². ² Medium: 32² < area ≤ 96². ³ Large: area > 96².
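The size grouping in the dataset table can be reproduced with a few lines of code. The following is a minimal sketch, assuming the thresholds are 32² and 96² (the usual COCO convention) and that each box is given as a (width, height) pair; the names `size_bucket` and `count_by_size` are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed area thresholds (COCO-style convention).
SMALL_MAX_AREA = 32 ** 2    # 1024
MEDIUM_MAX_AREA = 96 ** 2   # 9216


def size_bucket(width: float, height: float) -> str:
    """Return 'small', 'medium', or 'large' for a box of the given size."""
    area = width * height
    if area <= SMALL_MAX_AREA:
        return "small"
    if area <= MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def count_by_size(boxes: Iterable[Tuple[float, float]]) -> Counter:
    """Count boxes per size bucket; each box is a (width, height) pair."""
    return Counter(size_bucket(w, h) for w, h in boxes)


if __name__ == "__main__":
    # Illustrative boxes only, not taken from TACO or PlastOPol.
    example_boxes = [(20, 20), (32, 32), (50, 60), (96, 96), (300, 200)]
    print(count_by_size(example_boxes))
```

Summing the three buckets for a dataset should give its total number of annotations, which matches the rows above (for example, 33 + 445 + 4822 = 5300 for PlastOPol).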
Figure 2. Examples from PlastOPol dataset. (a) Types of litter. (b) Types of environment. (c) Natural backgrounds. (d) Occlusion. (e) Lighting.
Figure 3. Bounding boxes by size. (a) PlastOPol. (b) TACO [37]. (c) MJU-waste [33].
Figure 4. Bounding boxes by location. (a) PlastOPol. (b) TACO [37]. (c) MJU-waste [33].
Figure 5. Examples from TACO dataset.
Training protocol values (hyper-parameters per method).
| Method | Input Size | Lr | Epochs | Batch Size | Lr decay epochs | Post-Processing | Confidence Threshold |
|---|---|---|---|---|---|---|---|
| EfficientDet-d0 | | | 300 | 48 | 200, 250 | Soft-NMS | |
| EfficientDet-d5 | | | 300 | 12 | 200, 250 | Soft-NMS | |
| Faster R-CNN | | | 300 | 8 | 243 | NMS | |
| Mask R-CNN | | | 300 | 8 | 243 | NMS | |
| RetinaNet | | | 300 | 8 | 243 | NMS | |
| YOLO-v5x | | | 100 | 12 | – | NMS | |
| YOLO-v5s | | | 100 | 12 | – | NMS | |

1 https://github.com/google/automl/tree/master/efficientdet (accessed on 2 July 2021). 2 https://github.com/facebookresearch/detectron2 (accessed on 2 July 2021). 3 https://github.com/ultralytics/yolov5 (accessed on 2 July 2021). Lr: learning rate. Lr decay epochs: epochs in which the learning rate is decayed.
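The post-processing column distinguishes standard NMS from Soft-NMS. The sketch below illustrates the difference in plain Python. It is not the code used by the repositories listed above. It assumes boxes in (x1, y1, x2, y2) format, uses the linear variant of Soft-NMS, and the thresholds `iou_thr` and `score_thr` are illustrative values rather than the paper's settings.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thr: float = 0.5) -> List[int]:
    """Greedy NMS: keep a box only if no kept higher-scoring box overlaps it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


def soft_nms_linear(boxes: Sequence[Box], scores: Sequence[float],
                    iou_thr: float = 0.5, score_thr: float = 0.001) -> List[Tuple[int, float]]:
    """Linear Soft-NMS: overlapping boxes have their scores decayed instead of being dropped."""
    current = list(scores)
    remaining = list(range(len(boxes)))
    keep: List[Tuple[int, float]] = []
    while remaining:
        best = max(remaining, key=lambda i: current[i])
        remaining.remove(best)
        if current[best] < score_thr:
            break  # every remaining score is even lower
        keep.append((best, current[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if overlap > iou_thr:
                current[i] *= 1.0 - overlap
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
    print(soft_nms_linear(boxes, scores))
```

With standard NMS the second box is discarded because it overlaps the first by more than the threshold. With Soft-NMS it survives with a reduced score, so the confidence threshold applied afterwards decides whether it is reported.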
Litter detection results on PlastOPol (best results appear in bold).
| Methods | AP50 | AP@ | AR@ | F1@ |
|---|---|---|---|---|
| RetinaNet | | | | |
| Faster R-CNN | | | | |
| Mask R-CNN | | | | |
| EfficientDet-d0 | | | | |
| EfficientDet-d5 | | | | |
| YOLO-v5s | | | | |
| YOLO-v5x | | | | |
|
PlastOPol visual results.
| Ground Truth | Faster R-CNN | Mask R-CNN | RetinaNet | EfficientDet-d0 | EfficientDet-d5 | YOLO-v5s | YOLO-v5x |
|---|---|---|---|---|---|---|---|
|
Litter detection results on TACO (best results appear in bold).
| Methods | AP50 | AP@ | AR@ | F1@ |
|---|---|---|---|---|
| RetinaNet | | | | |
| Faster R-CNN | | | | |
| Mask R-CNN | | | | |
| EfficientDet-d0 | | | | |
| EfficientDet-d5 | | | | |
| YOLO-v5s | | | | |
| YOLO-v5x | | | | |
|
TACO visual results.
| Ground Truth | Faster R-CNN | Mask R-CNN | RetinaNet | EfficientDet-d0 | EfficientDet-d5 | YOLO-v5s | YOLO-v5x |
|---|---|---|---|---|---|---|---|
|
Figure 6. Efficiency (GPU) vs. effectiveness vs. model size. (a) PlastOPol. (b) TACO.
Figure 7. Efficiency on Motorola Moto-G6.