
Vision Transformer-based recognition of diabetic retinopathy grade.

Jianfang Wu, Ruo Hu, Zhenghong Xiao, Jiaxu Chen, Jingwei Liu.

Abstract

BACKGROUND: In the domain of natural language processing, Transformers are recognized as state-of-the-art models, which, unlike typical convolutional neural networks (CNNs), do not rely on convolution layers. Instead, Transformers employ multi-head attention mechanisms as the main building block to capture long-range contextual relations between image pixels. Until recently, CNNs have dominated deep learning solutions for diabetic retinopathy grade recognition. However, spurred by the advantages of Transformers, we propose a Transformer-based method that is appropriate for recognizing the grade of diabetic retinopathy.
PURPOSE: The purposes of this work are to demonstrate that (i) the pure attention mechanism is suitable for diabetic retinopathy grade recognition and (ii) Transformers can replace traditional CNNs for diabetic retinopathy grade recognition.
METHODS: This paper proposes a Vision Transformer-based method to recognize the grade of diabetic retinopathy. Fundus images are subdivided into non-overlapping patches, which are flattened into sequences and then passed through a linear embedding combined with a positional embedding to preserve positional information. The resulting sequence is fed into several multi-head attention layers to generate the final representation. In the classification stage, the first token of the sequence (the classification token) is input to a softmax classification layer to produce the recognition output.
RESULTS: The dataset for training and testing employs fundus images of different resolutions, subdivided into patches. We evaluate our method against current CNNs and extreme learning machines and achieve competitive performance. Specifically, the proposed deep learning architecture attains an accuracy of 91.4%, specificity = 0.977 (95% confidence interval (CI) (0.951-1)), precision = 0.928 (95% CI (0.852-1)), sensitivity = 0.926 (95% CI (0.863-0.989)), quadratic weighted kappa score = 0.935, and area under the curve (AUC) = 0.986.
CONCLUSION: Our comparative experiments against current methods conclude that our model is competitive and highlight that an attention mechanism based on a Vision Transformer model is promising for the diabetic retinopathy grade recognition task.
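The pipeline described in METHODS (patching, flattening, linear and positional embedding, multi-head attention, classification from the first token) can be sketched as follows. This is a minimal, randomly initialized illustration under assumed hyperparameters (image size, patch size, embedding dimension, head count), not the authors' implementation; the 5 output classes correspond to the standard diabetic retinopathy grades 0-4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hyperparameters; the abstract does not specify them.
IMG, PATCH, DIM, HEADS, CLASSES = 32, 8, 16, 4, 5  # 5 DR grades (0-4)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vit_forward(image):
    """Single-attention-layer Vision Transformer forward pass (sketch)."""
    # 1. Subdivide the image into non-overlapping patches and flatten each.
    n = IMG // PATCH
    patches = image.reshape(n, PATCH, n, PATCH).transpose(0, 2, 1, 3)
    patches = patches.reshape(n * n, PATCH * PATCH)

    # 2. Linear embedding, prepend a classification token, add a
    #    positional embedding to preserve positional information.
    W_embed = rng.standard_normal((PATCH * PATCH, DIM)) * 0.02
    tokens = patches @ W_embed
    cls_token = rng.standard_normal((1, DIM)) * 0.02
    tokens = np.vstack([cls_token, tokens])
    tokens = tokens + rng.standard_normal(tokens.shape) * 0.02  # positions

    # 3. Multi-head self-attention over the token sequence.
    d = DIM // HEADS
    heads = []
    for _ in range(HEADS):
        Wq = rng.standard_normal((DIM, d)) * 0.02
        Wk = rng.standard_normal((DIM, d)) * 0.02
        Wv = rng.standard_normal((DIM, d)) * 0.02
        q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
        attn = softmax(q @ k.T / np.sqrt(d))
        heads.append(attn @ v)
    x = np.concatenate(heads, axis=-1)

    # 4. Classify from the first (classification) token via softmax.
    W_head = rng.standard_normal((DIM, CLASSES)) * 0.02
    return softmax(x[0] @ W_head)

probs = vit_forward(rng.standard_normal((IMG, IMG)))
```

A real model would stack several such attention layers with layer normalization, residual connections, and learned (rather than random) weights; the sketch only traces the data flow the abstract describes.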
© 2021 American Association of Physicists in Medicine.


Keywords:  Vision Transformer; deep learning; diabetic retinopathy; multi-head attention


Year:  2021        PMID: 34693536     DOI: 10.1002/mp.15312

Source DB:  PubMed          Journal:  Med Phys        ISSN: 0094-2405            Impact factor:   4.071


  3 in total

1.  Swin Transformer Improves the IDH Mutation Status Prediction of Gliomas Free of MRI-Based Tumor Segmentation.

Authors:  Jiangfen Wu; Qian Xu; Yiqing Shen; Weidao Chen; Kai Xu; Xian-Rong Qi
Journal:  J Clin Med       Date:  2022-08-08       Impact factor: 4.964

2.  Predicting demographic characteristics from anterior segment OCT images with deep learning: A study protocol.

Authors:  Yun Jeong Lee; Sukkyu Sun; Young Kook Kim
Journal:  PLoS One       Date:  2022-08-11       Impact factor: 3.752

3.  A more effective CT synthesizer using transformers for cone-beam CT-guided adaptive radiotherapy.

Authors:  Xinyuan Chen; Yuxiang Liu; Bining Yang; Ji Zhu; Siqi Yuan; Xuejie Xie; Yueping Liu; Jianrong Dai; Kuo Men
Journal:  Front Oncol       Date:  2022-08-25       Impact factor: 5.738

