Guillermo Sánchez-Brizuela1, Francisco-Javier Santos-Criado2, Daniel Sanz-Gobernado1, Eusebio de la Fuente-López1, Juan-Carlos Fraile1, Javier Pérez-Turiel1, Ana Cisnal1.
Abstract
Medical instrument detection in laparoscopic video has been carried out to increase the autonomy of surgical robots, evaluate skills, or index recordings. However, it has not been extended to surgical gauzes. Gauzes can provide valuable information for numerous tasks in the operating room, but the lack of an annotated dataset has hampered research on them. In this article, we present a segmentation dataset with 4003 hand-labelled frames from laparoscopic video. To demonstrate the dataset's potential, we analyzed several baselines: detection using YOLOv3, coarse segmentation, and segmentation with a U-Net. Our results show that YOLOv3 runs in real time but provides only modest recall. Coarse segmentation yields satisfactory results but lacks inference speed. Finally, the U-Net baseline achieves a good speed-quality compromise, running above 30 FPS while obtaining an IoU of 0.85. The accuracy reached by the U-Net and its execution speed demonstrate that precise, real-time gauze segmentation can be achieved by training convolutional neural networks on the proposed dataset.
Keywords: convolutional neural networks; image object detection; image segmentation; minimally invasive surgery; surgical tool detection
Year: 2022 PMID: 35890857 PMCID: PMC9319965 DOI: 10.3390/s22145180
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. The videos have been recorded using a laparoscopic simulator with a Storz Telecam.
Figure 2. Different frames sampled from the dataset.
Figure 3. Colorized outline of gauze masks from the dataset over their corresponding frame. Best viewed in color.
Video-wise train and evaluation splits.
| Split | Videos |
|---|---|
| Train | VID00 {02, 03, 06, 07, 10, 13, 17, 18, 22, 23, 25, 30} |
| Evaluation | VID00 {04, 11, 16, 21, 24, 28} |
Distribution of fragments and masks in each split.
| Split | Fragments (Gauze) | Fragments (No Gauze) | Masks |
|---|---|---|---|
| Train | 61,860 (76.45%) | 64,938 (74.46%) | 3019 (75.42%) |
| Evaluation | 19,058 (23.55%) | 22,270 (25.54%) | 984 (24.58%) |
Figure 4. Each video frame is divided into square cells of size 100 × 100 so that each cell can be classified as gauze or background. Cells on the left are discarded due to their irrelevant dimensions, and those in the last row partially overlap the previous row to maintain the 100 × 100 size.
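The tiling scheme described in Figure 4 can be sketched as follows. This is a minimal reconstruction, not the authors' code: it keeps every tile exactly `cell × cell` by shifting the final row/column back so it overlaps the previous one, and it omits the discarding of irrelevant border cells.

```python
def tile_coordinates(width, height, cell=100):
    """Return (x, y) top-left corners of cell x cell tiles covering a frame.

    If the frame size is not a multiple of `cell`, a final row/column of
    tiles is shifted back so it partially overlaps the previous one,
    keeping every tile exactly cell x cell (as in Figure 4).
    """
    def starts(length):
        s = list(range(0, length - cell + 1, cell))
        if not s:
            return []  # frame smaller than one cell: nothing to tile
        if s[-1] + cell < length:
            s.append(length - cell)  # overlapping last tile
        return s

    return [(x, y) for y in starts(height) for x in starts(width)]
```

For a 250 × 250 frame this yields a 3 × 3 grid whose last row and column start at pixel 150, overlapping the tiles that start at pixel 100.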
Summary of the tested CNN architectures.
| Network | Layers | Parameters [M] | Input Size [px] |
|---|---|---|---|
| InceptionV3 | 48 | 23.8 | 299 × 299 |
| MobileNetV2 | 28 | 2.5 | 224 × 224 |
| ResNet-50 | 50 | 25.6 | 224 × 224 |
Technical specifications of the computing system.
| Hardware | Model |
|---|---|
| Processor | AMD Ryzen 7 3800X |
| Memory | DDR4 16GB × 2 (3000 MHz) |
| Graphics Processing Unit | NVIDIA GeForce RTX 3070, 8GB GDDR6 VRAM |
| Operating System | Ubuntu 20.04 LTS, 64 bits |
Overall results obtained using the evaluation videos.
| Network | Precision [%] | Recall [%] | F1 Score [%] | mAP [%] | FPS |
|---|---|---|---|---|---|
| DarkNet-53 | 94.34 | 76.00 | 84.18 | 74.61 | 34.94 |
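As a sanity check, the reported F1 score is the harmonic mean of the reported precision and recall, which can be verified directly:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)

# DarkNet-53 row of the table above
print(round(f1_score(94.34, 76.00), 2))  # → 84.18, matching the table
```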
Figure 5. Results obtained with YOLOv3 on (a) a blood-stained gauze and (b) both clean and blood-stained gauzes in the same image.
Figure 6. YOLOv3 detection errors. (a) The surgical tool has been marked as gauze due to a large glint on its metallic surface. (b) The gauze in the lower right corner has been overlooked.
Overall results provided by the pretrained networks in the evaluation videos.
| Network | Precision [%] | Accuracy [%] | Recall [%] | F1 Score [%] | MCC [%] | FPS |
|---|---|---|---|---|---|---|
| InceptionV3 | 75.67 | 77.68 | 89.08 | 81.82 | 54.60 | 21.78 |
| MobileNetV2 | 84.23 | 75.67 | 76.68 | 80.28 | 49.08 | 18.77 |
| ResNet-50 | 94.09 | 90.16 | 91.31 | 92.67 | 77.76 | 13.70 |
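The Matthews correlation coefficient (MCC) reported above summarizes a binary confusion matrix in a single value between −1 and 1. A minimal sketch of its computation (the counts in the usage example are illustrative, not taken from the paper):

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0  # convention: 0 when undefined

# Illustrative counts only (not from the paper)
print(mcc(90, 80, 10, 20))
```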
Figure 7. Results obtained using (a) ResNet-50, (b) MobileNetV2, and (c) InceptionV3. Green squares denote the fragments classified as gauze.
Results of the U-Net model in the evaluation videos.
| Model | IoU | Frames per Second (FPS) |
|---|---|---|
| U-Net | 0.8531 | 31.88 |
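The IoU metric in this table can be computed per frame from binary segmentation masks. A minimal NumPy sketch (not the authors' implementation):

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two binary masks of equal shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union
```

The dataset-level score would then be the mean of this value over all evaluation frames.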
Figure 8. Segmentation results on the evaluation set. Left column: input image; center column: hand-labeled ground truth; right column: U-Net prediction. Each row (a–d) corresponds to a different frame from the evaluation videos. Best viewed in color.
Overview of the characteristics of each video.
| Video | Gauzes | Stained | Objects | Camera | Duration |
|---|---|---|---|---|---|
| VID0002 | 1 | Yes | Tools (1) | Movement | 0 min 49 s |
| VID0003 | 1 | Yes | Tools (1) | Movement | 1 min 3 s |
| VID0004 | 1 | Yes | Tools (1) | Movement | 1 min 7 s |
| VID0005 | 0 | - | None | Movement | 1 min 7 s |
| VID0006 | 1 | Yes | Tools (1) | Movement | 1 min 40 s |
| VID0007 | 1 | Yes | Tools (1), Plastic bag | Movement | 2 min 45 s |
| VID0008 | 0 | - | Tools (1) | Movement | 2 min 43 s |
| VID0009 | 0 | - | None | Movement | 0 min 23 s |
| VID0010 | 1 | Yes | Tools (1) | Movement | 1 min 42 s |
| VID0011 | 1 | Yes | Tools (1) | Movement | 1 min 15 s |
| VID0012 | 0 | - | Tools (1) | Movement | 0 min 54 s |
| VID0013 | 1 | Yes | Tools (1) | Movement | 0 min 32 s |
| VID0014 | 0 | - | Tools (2) | Movement | 1 min 19 s |
| VID0015 | 0 | - | Tools (2) | Movement | 0 min 44 s |
| VID0016 | 1 | Yes | Tools (2) | Static | 1 min 1 s |
| VID0017 | 1 | Yes | Tools (1) | Static | 0 min 50 s |
| VID0018 | 1 | Yes | Tools (2), Plastic bag | Static | 0 min 20 s |
| VID0019 | 0 | - | Tools (1), Plastic bag | Static | 0 min 52 s |
| VID0020 | 0 | - | None | Movement | 0 min 25 s |
| VID0021 | 1 | Yes | Tools (1) | Movement | 0 min 41 s |
| VID0022 | 1 | No | Tools (1) | Movement | 1 min 28 s |
| VID0023 | 2 | Both | Tools (1) | Movement | 2 min 47 s |
| VID0024 | 2 | Both | Tools (1), Plastic bag | Movement | 1 min 40 s |
| VID0025 | 2 | Both | Tools (1), Plastic bag | Movement | 1 min 1 s |
| VID0026 | 0 | - | None | Movement | 0 min 11 s |
| VID0027 | 0 | - | Tools (1) | Movement | 2 min 53 s |
| VID0028 | 2 | Both | Tools (1) | Static | 0 min 48 s |
| VID0029 | 0 | - | Tools (1) | Movement | 0 min 48 s |
| VID0030 | 2 | Both | Tools (1) | Movement | 5 min 7 s |
| VID0031 | 0 | - | None | Movement | 0 min 33 s |
| VID0100 | 1 | No | Tools (1) | Static | 0 min 38 s |
| VID0101 | 1 | No | Tools (1) | Static | 0 min 33 s |
| VID0102 | 1 | No | Tools (1) | Static | 0 min 43 s |
| VID0103 | 1 | Yes | Tools (1) | Static | 0 min 28 s |
| VID0104 | 1 | Yes | Tools (1) | Movement | 0 min 39 s |
| VID0105 | 1 | Yes | Tools (1) | Movement | 0 min 39 s |
| VID0106 | 1 | Yes | Tools (1) | Static | 0 min 24 s |
| VID0107 | 1 | Yes | Tools (1) | Static | 0 min 27 s |
| VID0108 | 1 | Yes | Tools (1) | Static | 0 min 32 s |
| VID0110 | 1 | Both | Tools (1) | Static | 0 min 51 s |
| VID0111 | 1 | Yes | Tools (1) | Static | 0 min 27 s |
| VID0112 | 1 | Yes | Tools (1) | Static | 0 min 19 s |