Shuai Wang1, Yang Cong2, Jun Cao3, Yunsheng Yang4, Yandong Tang2, Huaici Zhao5, Haibin Yu6. 1. State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Nanta Street 114, Shenyang 110016, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China. Electronic address: shuaiwang@sia.cn. 2. State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Nanta Street 114, Shenyang 110016, China. 3. Department of Computer Science, Arizona State University, 1711 South Rural Road, Tempe, AZ 85287, USA. 4. Department of Gastroenterology and Hepatology, Chinese PLA General Hospital, 28 Fuxing Road, Beijing 100000, China. 5. Key Laboratory of Image Understanding and Computer Vision, Shenyang Institute of Automation, Chinese Academy of Sciences, Nanta Street 114, Shenyang 110016, China. 6. Key Laboratory of Networked Control Systems, Shenyang Institute of Automation, Chinese Academy of Sciences, Nanta Street 114, Shenyang 110016, China.
Abstract
OBJECTIVE: This paper aims to develop an automated gastroscopic video summarization algorithm that helps clinicians review the abnormal content of a video more efficiently. METHODS AND MATERIALS: To select the most representative frames from the original video sequence, we formulate gastroscopic video summarization as a dictionary selection problem. Unlike traditional dictionary selection methods, which consider only the number and reconstruction ability of the selected key frames, our model introduces a similar-inhibition constraint to reinforce the diversity of the selected key frames. We calculate an attention cost by merging both gaze and content change into a prior cue, which helps select frames carrying more high-level semantic information. Moreover, we adopt an image quality evaluation step to eliminate interference from poor-quality images and a segmentation step to reduce computational complexity. RESULTS: For the experiments, we build a new gastroscopic video dataset captured from 30 volunteers, comprising more than 400k images, and compare our method against state-of-the-art methods using content consistency, index consistency, and content-index consistency with the ground truth. Compared with all competitors, our method obtains the best results on 23 of 30 videos evaluated by content consistency, 24 of 30 videos evaluated by index consistency, and all videos evaluated by content-index consistency. CONCLUSIONS: For gastroscopic video summarization, we propose an automated annotation method based on similar-inhibition dictionary selection. Our model achieves better performance than other state-of-the-art models and supplies key frames better suited for diagnosis. The developed algorithm can be adapted to various real applications, such as the training of young clinicians, computer-aided diagnosis, or medical report generation.
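The abstract does not give the exact optimization used by the authors, but the core idea of dictionary selection with a similar-inhibition (diversity) constraint can be illustrated with a minimal greedy sketch: each selected key frame should improve the least-squares reconstruction of the whole sequence while being dissimilar to frames already chosen. The function name, the greedy strategy, the cosine-similarity penalty, and the `diversity_weight` parameter below are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def select_key_frames(frames, k, diversity_weight=0.5):
    """Greedy key-frame selection sketch (illustrative only).

    frames : (n, d) array of per-frame feature vectors.
    k      : number of key frames to select.

    At each step, pick the frame that most reduces the least-squares
    reconstruction error of the whole sequence, penalized by its
    maximum cosine similarity to frames already selected -- a simple
    stand-in for a similar-inhibition constraint.
    """
    n, d = frames.shape
    # Row-normalize once so dot products give cosine similarity.
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    unit = frames / np.maximum(norms, 1e-12)

    selected = []
    for _ in range(k):
        best_idx, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            cand = selected + [i]
            D = frames[cand]  # candidate dictionary, shape (m, d)
            # Least-squares codes: solve D.T @ X = frames.T for X.
            coef, *_ = np.linalg.lstsq(D.T, frames.T, rcond=None)
            recon_err = np.linalg.norm(frames - (D.T @ coef).T)
            # Similar-inhibition: penalize closeness to chosen frames.
            sim = max((float(unit[i] @ unit[j]) for j in selected),
                      default=0.0)
            score = -recon_err - diversity_weight * sim
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected
```

This greedy loop costs O(k·n) least-squares solves, so for real gastroscopic videos (hundreds of thousands of frames) one would first segment the video and run selection per segment, as the abstract's segmentation step suggests.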