Jinglu Zhang1, Yinyu Nie2, Yao Lyu1, Xiaosong Yang1, Jian Chang1, Jian Jun Zhang1.
Abstract
PURPOSE: Surgical gesture recognition is an essential task for providing intraoperative context-aware assistance and scheduling clinical resources. However, previous methods are limited in capturing long-range temporal information, and many of them require additional sensors. To address these challenges, we propose a symmetric dilated network, SD-Net, that jointly recognizes surgical gestures and assesses surgical skill levels using only RGB surgical video sequences.
Keywords: Self-attention; Surgical gesture recognition; Surgical skill assessment; Temporal convolutional network
Year: 2021 PMID: 34655392 PMCID: PMC8580939 DOI: 10.1007/s11548-021-02495-x
Source DB: PubMed Journal: Int J Comput Assist Radiol Surg ISSN: 1861-6410 Impact factor: 2.924
Fig. 1. The multi-task architecture for joint surgical gesture segmentation and skill assessment
Fig. 2. The architecture of SD-Net. The network has a symmetric structure that encodes and decodes signals with dilated convolutions to aggregate spatial features across hierarchical temporal spans. A self-attention module in the middle bridges the global frame-to-frame adjacency across the full temporal domain. Each node represents a feature vector computed from previous nodes
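The way stacked dilated convolutions widen the temporal span can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the kernel size, the doubling dilation rates and the mirrored decoder schedule in `symmetric_dilations` are assumptions made for illustration, since the caption does not state them.

```python
def dilated_conv1d(signal, weights, dilation):
    """Same-length 1D dilated convolution with zero padding (odd kernel size)."""
    half = (len(weights) - 1) // 2
    output = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - half) * dilation  # taps are spaced `dilation` frames apart
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        output.append(acc)
    return output


def symmetric_dilations(depth):
    """Hypothetical schedule: rates double in the encoder and mirror in the decoder."""
    encoder = [2 ** i for i in range(depth)]
    return encoder + encoder[::-1]


def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

Under these assumptions, three encoder layers and three mirrored decoder layers with kernel size 3 already cover 29 frames per output frame, which is why a shallow dilated stack can capture long gestures.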
Fig. 3. Self-attention module in SD-Net
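As a rough illustration of frame-to-frame self-attention over a whole sequence, the sketch below applies scaled dot-product attention to a list of per-frame feature vectors. The exact form of the SD-Net module is not given here, so the scaled dot-product formulation and the absence of learned query, key and value projections are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """frames: list of T feature vectors (lists of floats) of equal length d.

    Returns (attended, weights), where weights[i][j] is how strongly frame i
    attends to frame j. Assumption: no learned projections are applied.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    weights = []
    for query in frames:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        weights.append(softmax(scores))
    attended = [
        [sum(w * value[c] for w, value in zip(row, frames)) for c in range(dim)]
        for row in weights
    ]
    return attended, weights
```

Every frame attends to every other frame regardless of temporal distance, which complements the local, hierarchical view of the dilated layers.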
Performance comparison of surgical gesture recognition on the suturing task, averaged over eight cross-validation runs under the leave-one-user-out (LOUO) scheme. Acc., Edit and F1@{10, 25, 50} denote the frame-wise accuracy, the edit distance and the F1 score at overlap thresholds of 10%, 25% and 50%, respectively
| Suturing (LOUO) | Acc. | Edit | F1@10 | F1@25 | F1@50 |
|---|---|---|---|---|---|
| Bi-LSTM | 77.4 | 66.8 | 77.8 | – | – |
| ED-TCN | 80.8 | 84.7 | 89.2 | – | – |
| RL | 81.43 | 87.96 | 92.0 | 90.5 | 82.2 |
| 3D-CNN | 84.3 | 80.0 | 87.0 | – | – |
| C3D-MTL-VF | 82.0 | 86.6 | 90.6 | 89.1 | 80.3 |
| SD-Net | 90.1 | 89.9 | 92.5 | 92.0 | 88.2 |
| SD-Net (w. multi-task) | | | | | |
Bold values indicate the highest score in each column (evaluation metric) among the listed methods
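The three metrics can be sketched as follows, using the definitions commonly adopted for action segmentation: a normalized Levenshtein distance over segment labels for Edit, and IoU-based segment matching for F1@k. Whether the paper's evaluation code matches this sketch in every detail is an assumption.

```python
def segments(labels):
    """Collapse frame labels into (label, start, end) runs; end is exclusive."""
    runs, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            runs.append((labels[start], start, t))
            start = t
    return runs


def frame_accuracy(pred, true):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == t for p, t in zip(pred, true)) / len(true)


def edit_score(pred, true):
    """100 * (1 - normalized Levenshtein distance between segment label sequences)."""
    p = [s[0] for s in segments(pred)]
    g = [s[0] for s in segments(true)]
    prev = list(range(len(g) + 1))
    for i in range(1, len(p) + 1):
        cur = [i] + [0] * len(g)
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return 100.0 * (1.0 - prev[-1] / max(len(p), len(g)))


def f1_at_k(pred, true, threshold):
    """Segmental F1: a predicted segment counts as a true positive when its IoU
    with an unmatched ground-truth segment of the same label reaches threshold."""
    p_segs, g_segs = segments(pred), segments(true)
    matched = [False] * len(g_segs)
    tp = fp = 0
    for label, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(g_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= threshold and not matched[best_j]:
            tp += 1
            matched[best_j] = True
        else:
            fp += 1
    fn = matched.count(False)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)
```

A prediction with one spurious single-frame segment keeps a high frame-wise accuracy but loses heavily on Edit and F1, which is why the segment-level metrics are reported alongside accuracy.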
Ablation experiments of surgical gesture recognition on the suturing task, averaged over eight cross-validation runs under the leave-one-user-out (LOUO) scheme. Acc., Edit and F1@{10, 25, 50} denote the frame-wise accuracy, the edit distance and the F1 score at overlap thresholds of 10%, 25% and 50%, respectively
| Suturing (LOUO) | Acc. | Edit | F1@10 | F1@25 | F1@50 |
|---|---|---|---|---|---|
| Self-attn only | 87.8 | 44.0 | 54.8 | 53.5 | 49.0 |
| Dilation only | 90.1 | 76.8 | 81.9 | 81.5 | 78.5 |
| Encoder dilation + attn | | 76.9 | 82.5 | 81.8 | 79.3 |
| Decoder dilation + attn | 90.5 | 77.9 | 83.4 | 83.4 | 79.7 |
| Symmetric dilation + attn | 90.7 | 83.7 | 87.7 | 86.9 | 83.6 |
| Symmetric dilation + attn + pooling | 90.1 | 89.9 | 92.5 | 92.0 | 88.2 |
| SD-Net (w. multi-task) | 90.5 | | | | |
Bold values indicate the highest score in each column (evaluation metric) among the listed configurations
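The LOUO scheme behind both tables holds out every trial of one user per run and trains on the remaining users. A minimal sketch of such a split is shown below; the `(user_id, trial_name)` layout and the function name `louo_folds` are hypothetical and do not reflect the dataset's actual file structure.

```python
def louo_folds(trials):
    """Yield (held_out_user, train_trials, test_trials), one fold per user.

    trials: list of (user_id, trial_name) tuples (hypothetical layout).
    """
    users = sorted({user for user, _ in trials})
    for held_out in users:
        train = [trial for trial in trials if trial[0] != held_out]
        test = [trial for trial in trials if trial[0] == held_out]
        yield held_out, train, test
```

With eight users this produces eight runs, and the reported scores are the averages over those runs. Because no user appears in both sets of a fold, the scores reflect generalization to an unseen operator.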
Fig. 4. Visualization results from the ablation study
Fig. 5. Confusion matrix of the suturing task, obtained from LOUO evaluation