Yaqing Hou, Wenkai Zhang, Qian Liu, Hongwei Ge, Jun Meng, Qiang Zhang, Xiaopeng Wei.
Abstract
Computer vision (CV) technologies are assisting the health care industry in many respects, e.g., disease diagnosis. However, the inventory of surgical instruments, a pivotal procedure before and after surgery, has not yet been studied with CV-powered technologies. To reduce the risk and hazard of losing surgical tools, we propose a systematic study of surgical instrument classification and introduce a novel attention-based deep neural network called SKA-ResNet, which is mainly composed of: (a) a feature extractor with a selective kernel attention module that automatically adjusts the receptive fields of neurons and enhances the learned representation, and (b) a multi-scale regularizer with a KL-divergence constraint to exploit the relationships between feature maps. Our method is easily trained end-to-end in a single stage with little additional computational burden. Moreover, to facilitate our study, we create a new surgical instrument dataset called SID19 (19 kinds of surgical tools, 3800 images in total) for the first time. Experimental results show the superiority of SKA-ResNet for the classification of surgical tools on SID19 when compared with state-of-the-art models. The classification accuracy of our method reaches 97.703%, which well supports the inventory and recognition of surgical tools. Our method also achieves state-of-the-art performance on four challenging fine-grained visual classification datasets.
Keywords: Attention mechanism; Deep learning; Fine-grained classification; Health care
Year: 2021 PMID: 34539089 PMCID: PMC8435567 DOI: 10.1007/s00521-021-06368-x
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.606
Fig. 1Two distinct forceps categories from the proposed SID19 dataset. a Alice forceps with different states, views and angles; b Appendix forceps with different states, views and angles
Fig. 2The framework of our SKA-ResNet consists of two parts. A: A feature extractor with SKA module embedded to extract high expression feature maps; B: A multi-scale regularizer taking relationships between feature maps as a constraint to exploit multi-scale learning
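The abstract and Fig. 2 describe the regularizer only at a high level: a KL-divergence constraint on the relationships between feature maps at different scales. A minimal numpy sketch of one plausible reading is given below; the paper's exact formulation is not reproduced in this record, so `multiscale_kl_penalty` and its pooling/normalization choices are assumptions for illustration only.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def multiscale_kl_penalty(feat_a, feat_b):
    """Hypothetical multi-scale constraint: pool each (C, H, W) feature map
    over space, turn the channel descriptors into distributions via softmax,
    and penalize the divergence between the two scales' channel statistics."""
    p = softmax(feat_a.mean(axis=(1, 2)))  # (C,) distribution from scale A
    q = softmax(feat_b.mean(axis=(1, 2)))  # (C,) distribution from scale B
    return kl_divergence(p, q)
```

In this reading, the penalty vanishes when the two scales carry identical channel statistics and grows as they diverge, which is one way a KL term can tie multi-scale branches together during training.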
Fig. 3The detailed procedure of our SKA module with two selective kernel branches is illustrated
Fig. 4An example SKA block: a standard residual block with the SKA module embedded, in which “Residual” refers to the sequence of convolutional layers in a standard ResNet residual block
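The figures above describe the SKA module as selecting among branches with different kernel sizes via attention. The numpy forward-pass below sketches the selective-kernel idea (fuse branches, pool globally, softmax-weight the branches per channel) in the spirit of SKNet; the paper's actual SKA module likely differs in detail, and the direct logit computation here stands in for the small FC bottleneck a real module would use.

```python
import numpy as np

def selective_kernel_attention(branches):
    """Illustrative selective-kernel fusion (an assumption, not the paper's code).
    branches: list of (C, H, W) feature maps, conceptually produced by
    convolutions with different kernel sizes (e.g., 3x3 and 5x5)."""
    fused = np.sum(branches, axis=0)          # element-wise fusion -> (C, H, W)
    s = fused.mean(axis=(1, 2))               # global average pooling -> (C,)
    # One logit per branch and channel, derived from the pooled descriptor.
    logits = np.stack([b.mean(axis=(1, 2)) * s for b in branches])  # (B, C)
    # Softmax across branches gives per-channel selection weights.
    e = np.exp(logits - logits.max(axis=0))
    weights = e / e.sum(axis=0, keepdims=True)
    # Output is a convex combination of the branches, channel by channel.
    return sum(w[:, None, None] * b for w, b in zip(weights, branches))
```

Because the weights softmax to one across branches, the output for each channel always lies between the corresponding branch responses, which is what lets the network "select" a receptive field per channel.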
The four columns refer to ResNet50 backbone, SGE-ResNet50, SK-ResNet50 and the proposed SKA-ResNet50, respectively
| Stage | Output | ResNet50 | SGE-ResNet50 | SK-ResNet50 | SKA-ResNet50 |
|---|---|---|---|---|---|
| conv1 | | | | | |
| conv2 | | | | | |
| conv3 | | | | | |
| conv4 | | | | | |
| conv5 | | | | | |
| #Params. | | 25.56M | 25.56M | 26.15M | 26.15M |
| GFLOPs | | 4.122 | 4.127 | 4.185 | 4.195 |
Inside the brackets is the general shape of a residual block, including filter sizes and feature dimensionalities. The number of stacked blocks on each stage is presented outside the brackets. All modules are embedded into the end of the standard residual block. #Params. denotes the number of parameters and GFLOPs represents the number of multiply-adds
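The #Params. and GFLOPs rows above follow the standard accounting for convolutional layers. The helper functions below are hypothetical (not from the paper) and illustrate the arithmetic for a single layer; totals like 25.56M/4.122G come from summing such terms over the whole network.

```python
def conv2d_params(c_in, c_out, k):
    """Weight count of a k x k convolution (bias omitted, as in ResNet convs)."""
    return c_in * c_out * k * k

def conv2d_multiply_adds(c_in, c_out, k, h_out, w_out):
    """Multiply-add count: one (k*k*c_in)-long dot product per output element."""
    return conv2d_params(c_in, c_out, k) * h_out * w_out

# Example: the 3x3 convolution inside a ResNet50 bottleneck at stage conv2
# (64 channels in and out, 56x56 output resolution).
p = conv2d_params(64, 64, 3)                   # 36864 weights
m = conv2d_multiply_adds(64, 64, 3, 56, 56)    # ~0.116 G multiply-adds
print(p, m / 1e9)
```

This also explains why the attention variants in the table cost so little: a lightweight module adds only small FC layers on pooled descriptors, so the increments over the 25.56M/4.122G baseline stay in the hundredths of a GFLOP.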
Fig. 5The contrast between coarse-grained classes and fine-grained classes. a coarse-grained classes: Alice forceps and Tissue tweezers; b fine-grained classes: Alice forceps and Appendix forceps
Performance of ResNet50 and SKA-ResNet50 as a function of batch size
| Batch_size | 16 | 32 | 64 | 128 |
|---|---|---|---|---|
| ResNet50 top-1 acc (%) | 93.026 | 94.817 | 95.014 | 95.313 |
| SKA-ResNet50 top-1 acc (%) | 95.925 | 97.703 | 97.754 | 97.758 |
Performance of ResNet50 and SKA-ResNet50 as a function of scale of input data
| Scale | 112 | 224 | 336 | 448 |
|---|---|---|---|---|
| ResNet50 top-1 acc (%) | 91.341 | 95.014 | 95.483 | 95.804 |
| SKA-ResNet50 top-1 acc (%) | 95.187 | 97.703 | 97.739 | 97.813 |
Ablation performance on SID19 with the CBAM layer and different MSL regularizer strategies alternately employed on ResNet-50 with the SKA module (both value columns report top-1 acc (%))
| Method | w/o CBAM layer | w/ CBAM layer |
|---|---|---|
| w/o MSL | 97.042 | - |
| | 96.985 | 97.117 |
| | 97.476 | 97.611 |
| | 97.002 | 97.183 |
| | 97.538 | 97.646 |
| | 97.497 | 97.619 |
| | 97.602 | |
Fig. 6Top-1 error curves on SID19 based on ResNet50, SGE-ResNet50, SK-ResNet50 and SKA-ResNet50
Comparison of our SKA-ResNet50 with ResNet50, SGE-ResNet50 and SK-ResNet50 on SID19
| Method | #Params. | GFLOPs | top-1 acc (%) |
|---|---|---|---|
| ResNet50 [ | 25.56M | 4.122 | 95.014 |
| SGE-ResNet50 [ | 25.56M | 4.127 | 96.203 |
| SK-ResNet50 [ | 26.15M | 4.185 | 96.197 |
| SKA-ResNet50 (w/o MSL) | 26.15M | 4.195 | |
Experimental results on SID19. All state-of-the-art methods with lightweight attention modules are displayed
| Model | #Params. | GFLOPs | top-1 acc (%) |
|---|---|---|---|
| ResNet [ | 25.56M | 4.122 | 95.621 |
| SE-ResNet [ | 28.09M | 4.130 | 96.607 |
| BAM-ResNet [ | 25.92M | 4.205 | 96.622 |
| CBAM-ResNet [ | 28.09M | 4.139 | 96.618 |
| SK-ResNet [ | 26.15M | 4.185 | 96.947 |
| SGE-ResNet [ | 25.56M | 4.127 | 96.913 |
| SKA-ResNet (ours) | 26.15M | 4.195 | |
All these methods are implemented with the proposed multi-scale regularizer
Experimental results based on specific attention-based methods for FGVC
| Method | Backbone | 1-Stage | top-1 acc (%) |
|---|---|---|---|
| FCAN [ | ResNet50 | | 97.227 |
| RA-CNN [ | VGG-19 | | 96.932 |
| MA-CNN [ | VGG-19 | | 96.859 |
| MAMC [ | ResNet50 | | 96.883 |
| DT-RAM [ | ResNet50 | | 96.765 |
| DFL-CNN [ | ResNet50 | | 97.197 |
| NTS-Net [ | ResNet50 | | 97.314 |
| SKA-ResNet (ours) | ResNet50 | | |
The third column indicates whether the method is trained and tested in one stage or not
Experimental results based on other methods for FGVC
| Method | Backbone | Top-1 acc (%) |
|---|---|---|
| | VGG-D | 94.978 |
| Bilinear Pooling [ | ResNet50 | 96.574 |
| Compact Bilinear Pooling [ | ResNet50 | 96.697 |
| iSQRT-COV [ | ResNet50 | 97.185 |
| | ResNet50 | 97.220 |
| Cross-X [ | SENet | 97.213 |
| SKA-ResNet (ours) | ResNet50 | |
The displayed methods include high-order statistics learning and multi-scale feature relationship learning
Comparison of our approach to recent results on four standard FGVC datasets: CUB-200-2011 (Birds), Stanford Cars, Stanford Dogs and FGVC-Aircraft. Accuracy (%) is reported
| Method | Birds | Cars | Dogs | Aircraft |
|---|---|---|---|---|
| ResNet-50 [ | 84.5 | 92.9 | 88.1 | 90.3 |
| FCAN [ | 84.7 | 93.1 | 88.9 | – |
| RA-CNN [ | 85.3 | 92.5 | 87.3 | 88.2 |
| MA-CNN [ | 86.5 | 92.8 | – | 89.9 |
| MAMC [ | 86.2 | 93.0 | 84.8 | – |
| DFL-CNN [ | 87.4 | 93.1 | – | 91.7 |
| NTS-Net [ | 87.5 | 93.9 | – | 91.4 |
| B-CNN [ | 84.1 | 91.3 | – | 84.1 |
| iSQRT-COV [ | 88.1 | 92.8 | – | 90.0 |
| Cross-X [ | 87.7 | 94.6 | 88.9 | 92.6 |
| SKA-ResNet(ours) | ||||
Fig. 7Visualization results of ResNet50, SK-ResNet50, SGE-ResNet50 and SKA-ResNet50. The activation map is calculated from the last convolutional outputs. The ground-truth label is shown at the top of each input image, and P denotes the softmax score of each network for the ground-truth class