| Literature DB >> 35624404 |
Felix Nickel, Lena Maier-Hein, Lucas-Raphael Müller, Jens Petersen, Amine Yamlahi, Philipp Wise, Tim J Adler, Alexander Seitel, Karl-Friedrich Kowalewski, Beat Müller, Hannes Kenngott.
Abstract
PURPOSE: As human failure has been shown to be one primary cause for post-operative death, surgical training is of the utmost socioeconomic importance. In this context, the concept of surgical telestration has been introduced to enable experienced surgeons to efficiently and effectively mentor trainees in an intuitive way. While previous approaches to telestration have concentrated on overlaying drawings on surgical videos, we explore the augmented reality (AR) visualization of surgical hands to imitate the direct interaction with the situs.
Keywords: Computer vision; Deep learning; Hand tracking; Surgical data science; Telestration
MeSH:
Year: 2022 PMID: 35624404 PMCID: PMC9307534 DOI: 10.1007/s11548-022-02637-9
Source DB: PubMed Journal: Int J Comput Assist Radiol Surg ISSN: 1861-6410 Impact factor: 3.421
Fig. 1 Our telestration approach compared to the state of the art. (a) Previous approaches to surgical telestration rely on overlaying drawings on laparoscopic videos, while our concept is based on (b) the augmented reality (AR) visualization of the expert surgeon's hand
Fig. 2 Concept overview. Our approach to surgical telestration relies on a camera that continuously captures the hand of the mentor, who observes the operation either on-site or remotely. The camera data are processed by a two-stage neural network, which outputs both the skeleton (represented by 21 keypoints) and the segmented hand. The hand segmentation is overlaid on the surgical screen for intuitive coaching, while the skeleton representation is stored for long-term analysis
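As a rough illustration of the overlay step described in Fig. 2, the following Python sketch alpha-blends a binary hand mask into a laparoscopic video frame. The function name, overlay color, and blending weight are illustrative assumptions and are not taken from the paper's implementation.

```python
# Hypothetical sketch of the AR overlay step: blend a colored hand silhouette
# into the current surgical video frame. Names and parameters are assumptions.
import cv2
import numpy as np

def overlay_hand(frame: np.ndarray, hand_mask: np.ndarray,
                 color=(0, 255, 0), alpha: float = 0.5) -> np.ndarray:
    """Blend the mentor's hand silhouette into a BGR video frame.

    frame:     H x W x 3 uint8 laparoscopic image
    hand_mask: H x W boolean (or 0/1) segmentation of the mentor's hand
    """
    overlay = frame.copy()
    overlay[hand_mask.astype(bool)] = color                     # paint the hand region
    return cv2.addWeighted(overlay, alpha, frame, 1.0 - alpha, 0.0)
```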
Fig. 3 Overview of the models used for real-time hand tracking. Our approach comprises three core components: (1) a bounding box module using the YOLOv5s architecture, (2) a skeleton tracking module using an EfficientNet B3, and (3) a segmentation module using an FPN-EfficientNet B1. Modules (2) and (3) operate on images cropped to the respective bounding boxes (see the "Real-time hand localization", "Real-time skeleton tracking", and "Real-time hand segmentation" sections)
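A minimal sketch of the three-stage inference flow outlined in Fig. 3 is given below. The model objects (detector, keypoint_net, seg_net) stand in for the trained YOLOv5s, EfficientNet B3, and FPN-EfficientNet B1 networks; their loading code and exact interfaces are assumptions made for illustration only.

```python
# Sketch of the three-stage pipeline: detect hand -> crop -> keypoints + segmentation.
import numpy as np

def track_hand(frame: np.ndarray, detector, keypoint_net, seg_net):
    # (1) Localize the hand with a bounding box (assumed to return one box).
    x1, y1, x2, y2 = detector(frame)
    crop = frame[y1:y2, x1:x2]

    # (2) Regress the 21 skeleton keypoints on the cropped image.
    keypoints = keypoint_net(crop)            # assumed shape (21, 2), crop coordinates

    # (3) Segment the hand on the same crop.
    mask_crop = seg_net(crop)                 # assumed binary mask of the crop

    # Map both outputs back to full-frame coordinates.
    keypoints = keypoints + np.array([x1, y1])
    mask = np.zeros(frame.shape[:2], dtype=bool)
    mask[y1:y2, x1:x2] = mask_crop.astype(bool)
    return keypoints, mask
```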
Fig. 4 Representative results for a diverse set of gestures. The outputs of the three models for bounding box prediction (top) as well as skeleton tracking and segmentation (bottom) are shown
Fig. 5 Skeleton tracking performance for our method (orange) vs. MediaPipe (blue) as the baseline. The fraction of successful localizations (left) and the mean regression distance for successful localizations (right) are shown with respect to the different hand properties. Note that for MediaPipe there are only very few successful localizations for blue gloves and none for green ones
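The two metrics reported in Fig. 5 can be computed along the lines of the following hedged sketch. The criterion used here for a "successful localization" (the tracker returned keypoints at all) and the pixel units are assumptions; the paper's exact definitions may differ.

```python
# Sketch of the Fig. 5 metrics: fraction of successful localizations and
# mean regression distance over the 21 keypoints (assumed definitions).
import numpy as np

def skeleton_metrics(pred_keypoints: list, gt_keypoints: list):
    """pred_keypoints: per-frame arrays of shape (21, 2), or None if tracking failed.
    gt_keypoints:      per-frame ground-truth arrays of shape (21, 2)."""
    successes, distances = 0, []
    for pred, gt in zip(pred_keypoints, gt_keypoints):
        if pred is None:
            continue
        successes += 1
        # Mean Euclidean distance over the 21 keypoints of this frame.
        distances.append(np.linalg.norm(pred - gt, axis=1).mean())
    fraction_successful = successes / len(gt_keypoints)
    mean_regression_distance = float(np.mean(distances)) if distances else float("nan")
    return fraction_successful, mean_regression_distance
```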
Fig. 6 Representative failure cases of the skeleton extraction model (top row) and the segmentation model (bottom row)
Fig. 7 Results of the prospective validation study. Skeleton tracking performance (upper row), quantified by the mean regression distance, and hand segmentation performance (lower row), quantified by the Dice similarity coefficient (DSC), are shown for the camera used in the training data set (D435i) as well as a previously unseen camera (L515). Each color corresponds to a different mentor. No notable differences were observed between the different gestures
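For reference, the Dice similarity coefficient used in Fig. 7 to score the predicted hand segmentation against a reference mask can be computed as in this minimal sketch; the handling of two empty masks is an assumption.

```python
# Minimal DSC computation for two binary masks of equal shape.
import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """DSC = 2|P intersect R| / (|P| + |R|)."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    denom = pred.sum() + ref.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0  # both empty -> perfect by convention
```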