Recent Development of Computer Vision Technology to Improve Capsule Endoscopy.

Junseok Park¹, Youngbae Hwang², Ju-Hong Yoon², Min-Gyu Park², Jungho Kim², Yun Jeong Lim³, Hoon Jai Chun⁴.

Abstract

Capsule endoscopy (CE) is a preferred diagnostic method for analyzing small bowel diseases. However, capsule endoscopes capture a sparse number of images because of their mechanical limitations. Post-procedural management using computational methods can enhance image quality. Additional information, including depth, can be obtained by using recently developed computer vision techniques. It is possible to measure the size of lesions and track the trajectory of capsule endoscopes using computer vision technology, without requiring additional equipment. Moreover, the computational analysis of CE images can help detect lesions more accurately within a shorter time. Newly introduced deep learning-based methods have shown more remarkable results than traditional computerized approaches. A large-scale standard dataset should be prepared to develop optimal algorithms for improving the diagnostic yield of CE. Close collaboration between information technology and medical professionals is needed.

Keywords: Capsule endoscopy; Computer vision technology; Deep learning

Year: 2019 PMID: 30786704 PMCID: PMC6680009 DOI: 10.5946/ce.2018.172

Source DB: PubMed Journal: Clin Endosc ISSN: 2234-2400

Introduction

Since its introduction in 2000, capsule endoscopy (CE) has become the preferred diagnostic method for small bowel diseases because of its low invasiveness. However, the diagnostic yield of CE can be influenced by many factors. Several quality indicators have been suggested to standardize the methods of CE and reduce interpretation-related errors [1]. Generally, CE video sequences are reviewed after post-procedural reconstruction. This process is time-consuming. In addition, there is the possibility of misinterpretation due to the limits of human concentration. The evolution of computer vision technology can improve the diagnostic capabilities of CE. Computational methods for modifying and interpreting CE images may reduce the image review time and error rates significantly [2]. Moreover, the introduction of deep learning to computer vision has resulted in an outstanding improvement in lesion recognition [3,4]. Since capsule endoscopes remain passively moving devices, only limited information can be obtained from their images. Many mechanical improvements to capsule endoscopes have been studied, but their immediate clinical application is difficult for safety reasons. Advances in computer vision have allowed us to extract more detail from the images of the current generation of capsule endoscopes. It is possible to measure the size of lesions and predict their location more accurately than with other methods. Subsequent therapeutic procedures can be performed more systematically with this information. This review focuses on important advances in computer vision technology that can be applied to CE in the deep-learning era. These advances are organized into four categories: image enhancement for improved visual quality, depth sensing for three-dimensional image interpretation, Simultaneous Localization and Mapping (SLAM) for the exact localization of capsule endoscopes, and automated lesion detection for reducing review time. Technical information in this review will be explained with scenarios familiar to clinicians. This review is expected to promote active communication between medical and information technology (IT) experts.

Image enhancement for better visual quality

Capsule endoscopes capture images under low lighting and with limited power. Videos with low resolution and low frame rates are transmitted wirelessly to a recorder outside the body. In addition, blurred images are often captured because of the shallow depth of field of capsule endoscopes. These degradations in image quality can make accurate diagnosis more difficult. Computational processing of these images can mitigate these fundamental limitations of CE (Fig. 1).

Fig. 1.

Examples of deblurred capsule endoscopy images using computer vision technology. The two images on the left side with blur were obtained directly from one of the capsule endoscope’s cameras. The blur of images is corrected, as shown on the right side, using depth information measured with two cameras at different angles.

Noise is an inevitable problem in imaging systems. The hardware limitations of commercially available capsule endoscopes can produce noisy images that need to be corrected post-procedurally. Classical noise suppression methods, including bilateral filters and Gaussian blur filters, may produce erroneous and unusual results for CE images [5]. CE requires the ability to reduce noise while maintaining image details. Non-local means filters, adaptive median (AM) filters, block-matching and 3D filtering, and K-nearest neighbor filters have been compared in terms of their ability to correct endoscopic images. The AM filter, in particular, reduced impulse noise while preserving image details better than the other three methods. Gopi et al. have proposed a double density dual-tree complex wavelet transform (DDDT-CWT) method for reducing image noise (Table 1) [6]. These authors first converted images into the YCbCr color space. They then applied a DDDT-CWT-based grayscale noise reduction method separately to each color channel. They demonstrated the performance of the DDDT-CWT method by comparing it with three other methods.
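To make the impulse-noise behavior of an adaptive median filter concrete, the following is a minimal sketch of the textbook algorithm in pure Python. It is not the implementation used in the cited comparison; the function name `adaptive_median_filter`, the `max_window` parameter, and the toy 5 × 5 patch are assumptions introduced for illustration only.

```python
def adaptive_median_filter(image, max_window=7):
    """Textbook adaptive median filter for a 2-D grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    output = [row[:] for row in image]  # the input image is left untouched
    for y in range(height):
        for x in range(width):
            window = 3
            while True:
                half = window // 2
                # Collect the neighbourhood, clipping it at the image border.
                values = sorted(
                    image[j][i]
                    for j in range(max(0, y - half), min(height, y + half + 1))
                    for i in range(max(0, x - half), min(width, x + half + 1))
                )
                z_min, z_max = values[0], values[-1]
                z_med = values[len(values) // 2]
                if z_min < z_med < z_max:
                    # The median is not an extreme value, so it is trustworthy.
                    # Replace the pixel only if the pixel itself is an extreme.
                    if not (z_min < image[y][x] < z_max):
                        output[y][x] = z_med
                    break
                window += 2  # median looks like an impulse: enlarge the window
                if window > max_window:
                    output[y][x] = z_med
                    break
    return output


# Hypothetical patch: gentle texture (values 100..102) with one "salt" impulse.
patch = [[100 + (r + c) % 3 for c in range(5)] for r in range(5)]
patch[2][2] = 255
cleaned = adaptive_median_filter(patch)
print(cleaned[2][2])
```

The window grows only where the local median itself looks like an impulse, which is why this family of filters tends to smooth less aggressively than a fixed-size median filter.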

Table 1.

Computer Vision Technologies for the Enhancement of Capsule Endoscopy Images

Study	Suggested algorithm	Purpose	Outcome
Gopi et al. [6]	DDDT-CWT	Noise reduction	Higher PSNR and SSIM than three other algorithms
Liu et al. [7]	TV minimization on MFISTA/FGP framework	De-blurring	Improved PSNR for the simulation results of CE images
Peng et al. [8]	Synthesis from DPM with aligned nearby sharp frames	De-blurring	Improved SSD errors, showing experimental result on video sample
Duda et al. [9]	Average of upsampled and registered low-resolution images	De-blurring	Improved PSNR
Singh et al. [12]	Interpolation function using DWT	De-blurring	Improved PSNR, MSE, and ME
Wang et al. [13]	Adaptive dictionary pair learning	De-blurring	Improved PSNR for the dataset of CE images

CE, capsule endoscopy; DDDT-CWT, double density dual-tree complex wavelet transform; DPM, direct patch matching; DWT, discrete wavelet transform; FGP, fast gradient projection; ME, maximum error; MFISTA, monotone fast iterative shrinkage/thresholding algorithm; MSE, mean square error; PSNR, peak signal-to-noise ratio; SSD, sum of squared differences; SSIM, structural similarity index.
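Several of the outcome measures abbreviated in Table 1 have simple standard definitions. The sketch below computes MSE, ME, and PSNR for small grayscale images; it assumes 8-bit data (peak value 255), and the function names and the 2 × 2 sample values are hypothetical.

```python
import math


def mean_squared_error(reference, test):
    """MSE between two equally sized grayscale images (lists of rows)."""
    diffs = [
        (r - t) ** 2
        for ref_row, test_row in zip(reference, test)
        for r, t in zip(ref_row, test_row)
    ]
    return sum(diffs) / len(diffs)


def maximum_error(reference, test):
    """ME: the largest absolute per-pixel difference."""
    return max(
        abs(r - t)
        for ref_row, test_row in zip(reference, test)
        for r, t in zip(ref_row, test_row)
    )


def psnr(reference, test, peak=255.0):
    """PSNR in decibels: 10 * log10(peak^2 / MSE)."""
    mse = mean_squared_error(reference, test)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical reference image and restored image.
reference = [[50, 60], [70, 80]]
restored = [[52, 60], [70, 76]]
print(mean_squared_error(reference, restored))  # 5.0
print(maximum_error(reference, restored))       # 4
print(round(psnr(reference, restored), 2))      # 41.14
```

A higher PSNR and lower MSE or ME indicate that the processed image is closer to the reference, which is how the "improved" outcomes in Table 1 should be read.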

Capsule endoscopes are usually equipped with fisheye lenses that have small depths of field. Blurred images may result from fast camera motion at low frame rates and from incorrect lens focus. Liu et al. have introduced a deblurring method that uses a total variation minimization framework and the monotone fast iterative shrinkage/thresholding technique combined with a fast gradient projection algorithm (Table 1) [7]. They demonstrated the effectiveness of this algorithm by presenting simulation results for images to which noise and blur had been experimentally added. Furthermore, blurry video frames can be corrected by using images synthesized with reference to nearby sharp frames. Peng et al. have proposed a synthesis method that follows a non-parametric mesh-based motion model to align sharp frames with blurry frames [8]. Their method can sufficiently correct various endoscopic video samples with blurred frames.

Capsule endoscopes are restricted in terms of size and data transmission bandwidth, which limits the use of better optical or imaging sensors to capture high-resolution images. The computational resolution enhancement of images after transmission is therefore an efficient way to support accurate diagnoses. The algorithm proposed by Duda et al. is simpler than other methods and can be computed in real time [9]. It averages upsampled and registered low-resolution images from an image sequence. Häfner et al. have introduced a method to prevent the over-sharpening problem that occurs in the super-resolution process and evaluated their method in the context of colonic polyp classification [10,11]. In addition, Singh et al. have introduced an interpolation function based on the discrete wavelet transform [12]. Their algorithm showed superior results in enhancing endoscopic images compared with other traditional image super-resolution techniques [12]. Wang et al. have also proposed an adaptive dictionary pair learning technique [13]. They formed the dictionary pair by selecting relevant normalized patches of high-resolution images and low-resolution images. Their method can restore the textures and edges of CE images effectively.
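The idea of averaging upsampled and registered low-resolution frames can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of Duda et al.: it assumes nearest-neighbor upsampling, whole-pixel offsets that are already known (a real system would have to estimate them by registration), and hypothetical function names.

```python
def upsample_nearest(frame, factor):
    """Enlarge a 2-D frame by an integer factor using nearest-neighbor copies."""
    return [
        [frame[y // factor][x // factor] for x in range(len(frame[0]) * factor)]
        for y in range(len(frame) * factor)
    ]


def shift(image, dy, dx):
    """Shift an image by whole pixels, replicating values at the border."""
    height, width = len(image), len(image[0])
    return [
        [
            image[min(max(y - dy, 0), height - 1)][min(max(x - dx, 0), width - 1)]
            for x in range(width)
        ]
        for y in range(height)
    ]


def fuse_frames(frames, offsets, factor=2):
    """Upsample each frame, align it with its known offset, and average."""
    aligned = [
        shift(upsample_nearest(frame, factor), dy, dx)
        for frame, (dy, dx) in zip(frames, offsets)
    ]
    height, width = len(aligned[0]), len(aligned[0][0])
    count = len(aligned)
    return [
        [sum(img[y][x] for img in aligned) / count for x in range(width)]
        for y in range(height)
    ]


# Hypothetical scene: a horizontal ramp 0, 2, 4, 6 on the high-resolution grid.
# Frame A sampled columns 0 and 2; frame B sampled columns 1 and 3.
frame_a = [[0, 4]]
frame_b = [[2, 6]]
fused = fuse_frames([frame_a, frame_b], offsets=[(0, 0), (0, 1)], factor=2)
print(fused[0])
```

In this toy case each fused row is [1, 1, 3, 5], which has a lower mean squared error against the true ramp than the single upsampled frame [0, 0, 4, 4]. The gain comes entirely from the complementary sampling positions of the two frames.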

Depth sensing for three-dimensional interpretation

Depth can provide additional information about a subject. However, commonly used endoscopic imaging systems produce flat images without depth information. Depth information can be obtained by the computerized analysis of endoscopic images. Various cues, including focus, shading, and motion, can be used for depth estimation; the family of Shape-from-X techniques is named after the cue (X) that each technique uses. Karargyris et al. have developed a Shape-from-Shading technique for CE (Table 2) [14]. They reconstructed three-dimensional surfaces of protruding features from video frames. Moreover, the Shape-from-Motion technique takes a video sequence as input and recovers camera motion and geometric structure. Fan et al. have adopted this technique to construct three-dimensional meshes through Delaunay triangulation [15].

Table 2.

Computer Vision Technologies for Depth Sensing and Capsule Endoscope Localization

Study	Suggested algorithm	Purpose	Outcome
Karargyris et al. [14]	Shape-from-shading	Depth sensing	Create three dimensional-surfaced CE videos
Fan et al. [15]	SIFT, epipolar geometry	Depth sensing	Three-dimensional reconstruction of the GI tract’s inner surfaces from CE images
Park et al. [16]	Stereo-type capsule endoscope, direct attenuation model	Depth sensing	Create three-dimensional depth map, size estimation for lesions observed in stereo-type CE images
Turan et al. [24]	Vision-based SLAM, Shape-from-shading	Capsule localization	Improved RMSE for the three-dimensional reconstruction of stomach model and capsule trajectory length

CE, capsule endoscopy; GI, gastrointestinal; RMSE, root mean square error; SIFT, scale invariant feature transform; SLAM, simultaneous localization and mapping.

Recently developed capsule endoscopes with stereo-vision can accurately and robustly estimate depth maps of the gastrointestinal tract. Park et al. have used a novel capsule endoscope consisting of two cameras for depth-sensing and three-dimensionally rendering intestinal structures (Fig. 2) [16]. They also accurately measured the size of lesions in a large bowel phantom model.

Fig. 2.

Depth map and three-dimensional reconstruction sample of a capsule endoscope with a stereo-camera. Depth maps are calculated with capsule endoscope stereo-cameras. Bright pixels in the second image from the left indicate regions that are farther away than dark ones. The depth information allows us to construct three-dimensional models of the structure, as shown in the two images on the right side.
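The relationship between stereo disparity, depth, and lesion size can be illustrated with the standard pinhole stereo model. The sketch below is a simplification under stated assumptions: rectified images with lens distortion already removed, and hypothetical values for the focal length, baseline, disparity, and lesion extent. It does not reproduce the method of Park et al., which also relies on a direct attenuation model.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_mm):
    """Depth Z = f * B / d for a rectified stereo pair (pinhole model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def lesion_size_mm(extent_px, depth_mm, focal_px):
    """Physical extent of an object that spans extent_px pixels at depth_mm."""
    return extent_px * depth_mm / focal_px


# Hypothetical camera and measurement values, for illustration only.
focal_px = 300.0      # focal length expressed in pixels
baseline_mm = 4.0     # distance between the two cameras
disparity_px = 60.0   # horizontal shift of the lesion between the two views

depth = depth_from_disparity(disparity_px, focal_px, baseline_mm)
size = lesion_size_mm(extent_px=90.0, depth_mm=depth, focal_px=focal_px)
print(depth, size)  # 20.0 6.0
```

Because depth is inversely proportional to disparity, small disparity errors matter most for distant surfaces, which is one reason a short-baseline capsule is best suited to measuring nearby lesions.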

SLAM for the exact localization of capsule endoscopes

The exact location of lesions is important for determining the interventions that follow CE. The three-dimensional position of a capsule endoscope in the abdominal cavity can be obtained with the external sensor arrays of a CE system [17]. However, this three-dimensional spatial position does not represent the intraluminal location of the capsule endoscope in the gastrointestinal tract. It is necessary to track the trajectory of capsule endoscopes and measure their distance from specific landmarks of the intestine in order to determine their intraluminal location. The analysis of the color and texture of images can help divide CE videos into specific regions and estimate the motion of capsule endoscopes, including their rotation and displacement [18-21]. The intestine is a dynamic environment because of continuous peristalsis. Its internal surface also has many textureless regions. To overcome these environmental difficulties, SLAM technology, which simultaneously performs camera position estimation and three-dimensional reconstruction, can be applied. Mahmoud et al. have tracked specific points on organs using epipolar geometry [22,23]. Information about these points from two different perspectives can be used to reconstruct a semi-dense map of organs [22,23]. Moreover, a recent non-rigid map fusion-based direct SLAM method has achieved high accuracy in an extensive evaluation of pose estimation and map reconstruction (Table 2) [24]. By analyzing shapes and shades, vision-based SLAM methods can add depth information to CE images. Furthermore, the experimental results of image reconstruction have suggested the effectiveness of both looping the trajectory of capsule endoscopes and scanning the inner surface of organs.
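Tracking a trajectory from frame-to-frame motion estimates amounts to chaining relative poses. The following minimal sketch shows only that bookkeeping step, assuming the per-frame rotations and displacements have already been estimated by some vision method. The helper names, the restriction to rotation about a single axis, and the two sample steps are hypothetical simplifications, not part of any cited SLAM system.

```python
import math


def matmul4(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def relative_pose(yaw_rad, translation):
    """4x4 transform of the new camera frame expressed in the previous frame.

    The rotation is about the z-axis only, to keep the example short.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def integrate_trajectory(relative_poses):
    """Chain relative poses and return the camera position after each step."""
    pose = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    positions = [(0.0, 0.0, 0.0)]
    for step in relative_poses:
        pose = matmul4(pose, step)
        positions.append((pose[0][3], pose[1][3], pose[2][3]))
    return positions


def path_length(positions):
    """Travelled distance: sum of the distances between consecutive positions."""
    return sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))


# Hypothetical motion: advance 10 mm while turning 90 degrees, then 10 mm more.
steps = [
    relative_pose(math.pi / 2, (10.0, 0.0, 0.0)),
    relative_pose(0.0, (10.0, 0.0, 0.0)),
]
positions = integrate_trajectory(steps)
print(positions[-1], path_length(positions))
```

The travelled path length (about 20 mm here) differs from the straight-line distance between start and end, which illustrates why an external three-dimensional position alone does not give the intraluminal location. In practice, errors in each relative pose accumulate along the chain, and loop closure in SLAM is one way to limit that drift.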

Automated lesion detection for reducing review time

CE image analysis requires long and tedious review. In addition, only a small fraction of CE images contain clinically significant lesions [25]. These long review times can lead to high lesion miss rates, even when interpretation is performed by well-trained professionals [26]. Choosing the appropriate images to review will shorten the review time and contribute to accurate diagnoses. However, the automatic detection of pathology in CE images has long been a challenge. Recent studies analyzing the color and texture of images have shown adequate results in detecting hemorrhages and other representative lesions [27-32]. Since the introduction of deep learning methods to computer vision, image recognition performance on large-scale datasets has greatly improved (Fig. 3). Deep learning-based image recognition technology has been applied to endoscopic image analysis and has shown surprising results in pathology detection [33-36]. Zou et al. have analyzed 75,000 images with a convolutional neural network-based method to categorize images by organ of origin (Table 3) [37]. For detecting polyps and classifying normal CE images, a deep learning-based stacked sparse autoencoder method has shown improved pathology detection results for 10,000 images [38]. Recent deep learning work has shown better performance than traditional approaches. However, deep learning methods need large datasets to overcome the fundamental problem of overfitting [39].

Fig. 3.

Scheme of automated lesion detection for capsule endoscopy images using deep learning. The input images are numerically weighted via hidden layers trained on large datasets. The image with the most weight is selected on the output layer.
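The flow from input through hidden layers to a selected output can be sketched with a tiny feed-forward network. This is only a schematic illustration: the weights are set by hand rather than learned, the two input features and the class labels are hypothetical, and real CE classifiers use convolutional networks with far more parameters.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Convert raw output scores into probabilities that sum to one."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, hidden_w, hidden_b, out_w, out_b, labels):
    """Forward pass: hidden layer, output layer, then pick the top class."""
    hidden = relu(dense(features, hidden_w, hidden_b))
    probabilities = softmax(dense(hidden, out_w, out_b))
    best = max(range(len(labels)), key=probabilities.__getitem__)
    return labels[best], probabilities


# Hand-set, hypothetical weights and features (not a trained model).
labels = ["normal", "bleeding"]
hidden_w, hidden_b = [[2.0, -1.0], [-1.0, 2.0]], [0.0, 0.0]
out_w, out_b = [[-1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]
features = [0.9, 0.1]  # e.g. a redness score and a texture score

label, probabilities = classify(features, hidden_w, hidden_b, out_w, out_b, labels)
print(label, [round(p, 3) for p in probabilities])
```

Training replaces the hand-set numbers with weights fitted to annotated images, which is where the need for large datasets and the risk of overfitting arise.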

Table 3.

Deep Learning-Based Computer Vision Technologies for Analyzing Capsule Endoscopy Images

Study	No. of images for training	No. of images for testing	Outcome
Zou et al. [37]	60,000	15,000	Classify CE images according to the organ of origin, accuracy: 95%
Jia et al. [38]	8,200 (2,050 positives)	1,800 (800 positives)	Bleeding detection for annotated CE images, F1 score: 0.9955 a)

CE, capsule endoscopy.

a) The harmonic average of the precision and recall, $F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$.
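For readers less familiar with this metric, the F1 score can be computed directly from detection counts. The sketch below uses the standard definitions of precision and recall; the function name and the example counts are hypothetical and are not taken from the cited study.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from detection counts."""
    if true_positives == 0:
        return 0.0  # no correct detections: precision and recall are both zero
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts: 90 lesions found, 10 false alarms, 30 lesions missed.
score = f1_score(true_positives=90, false_positives=10, false_negatives=30)
print(round(score, 4))  # 0.8182
```

Because the harmonic mean is dominated by the smaller of the two values, a detector cannot reach a high F1 score by maximizing only precision or only recall.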

Conclusions

The computational analysis of images can improve the clinical yield of CE without the assistance of mechanical augmentation. Image enhancement techniques can correct errors and improve the quality of images, depth information can be used to measure lesions and track the movement of capsule endoscopes, and automated lesion recognition can reduce CE image review times. Moreover, the recently introduced stereo-vision capsule endoscope and deep-learning methods in computer vision can lead to outstanding improvements in CE image analysis. Lastly, close collaboration between medical and IT professionals would enable CE to achieve higher diagnostic yields.
