Artificial intelligence-assisted colonoscopy: A review of current state of practice and research.

Mahsa Taghiakbari¹, Yuichi Mori², Daniel von Renteln³.

Abstract

Colonoscopy is an effective screening procedure in colorectal cancer prevention programs; however, colonoscopy practice can vary in terms of lesion detection, classification, and removal. Artificial intelligence (AI)-assisted decision support systems for endoscopy are an area of rapid research and development. These systems promise improved detection, classification, screening, and surveillance for colorectal polyps and cancer. Several recently developed applications for AI-assisted colonoscopy have shown promising results for the detection and classification of colorectal polyps and adenomas. However, their value for real-time application in clinical practice has yet to be determined owing to limitations in the design, validation, and testing of AI models under real-life clinical conditions. Despite these current limitations, ambitious attempts to expand the technology further are at the concept stage; these aim to develop more complex systems capable of assisting and supporting the endoscopist throughout the entire colonoscopy examination, including polypectomy procedures. Further work is required to address the barriers and challenges of AI integration into broader colonoscopy practice, to navigate the approval process from regulatory organizations and societies, and to support physicians and patients on their journey to accepting the technology by providing strong evidence of its accuracy and safety. This article takes a closer look at the current state of AI integration into the field of colonoscopy and offers suggestions for future research. ©The Author(s) 2021. Published by Baishideng Publishing Group Inc. All rights reserved.

Keywords: Adenoma; Artificial intelligence; Colonoscopy; Computational intelligence; Endoscopy; Surveillance

Year: 2021 PMID: 35068857 PMCID: PMC8704267 DOI: 10.3748/wjg.v27.i47.8103

Source DB: PubMed Journal: World J Gastroenterol ISSN: 1007-9327 Impact factor: 5.742

Core Tip: Artificial intelligence (AI)-assisted decision support systems for endoscopy have shown promising results for the detection and classification of colorectal lesions. However, their integration into clinical practice is currently limited by the lack of design, validation, and testing under real-life clinical conditions. Further work is required to address the challenges of AI integration, to navigate the regulatory approval process, and to support physicians and patients on their journey to accepting the technology by providing strong evidence of accuracy and safety. This article describes the current state of AI integration into colonoscopy practice and offers suggestions for future research.

INTRODUCTION

Colorectal cancer (CRC) was the fourth most commonly diagnosed and the third most fatal cancer worldwide in 2018[1]. In the United States, the prevalence costs of cancer care for CRC were estimated at $14.1 billion in 2010[2]. Over the past decade, CRC incidence and mortality have declined as a result of the increase in CRC screening and prevention examinations[3]. Colonoscopy is a screening tool with high sensitivity for the detection of precancerous and cancerous lesions, and may contribute to a reduction of approximately 80% in CRC incidence and of up to 60% in CRC mortality[4-8]. Colonoscopy prevents CRC by breaking the adenoma-carcinoma sequence through detection and removal of premalignant colorectal polyps[3]. Furthermore, it is a cost-effective procedure that often allows surgery to be avoided in patients with adenomas or CRCs that do not invade deeper than the superficial submucosa[9]. However, the quality of colonoscopy procedures depends on the experience of the endoscopists and the techniques and technology used[10]. A suboptimal colonoscopy examination can result in interval cancers, which are CRCs that occur after a colonoscopy and before the next surveillance examination, and are usually due to non-detection and/or incomplete resection of premalignant polyps. Recent research has shown that CRC precursor lesions are incompletely resected in about 14% of colonoscopy procedures[11]. Quality indicators have been established to describe and measure the quality of colonoscopy examinations[12], and the use of pre- and intraprocedural quality metrics has been shown to result in both an increase in colonoscopy quality and standardization of procedures[12,13]. 
One of the most recognized quality metrics is the adenoma detection rate (ADR), which is the proportion of an endoscopist’s patients undergoing screening colonoscopy who have at least one adenoma detected; every 1% increase in the ADR has been shown to result in a 3% decrease in the risk of post-colonoscopy CRC[10]. Over 90% of colorectal polyps are diminutive (≤ 5 mm) or small (≤ 10 mm), and most of these polyps are non-neoplastic[10]. Recent advances in image-enhanced endoscopy [IEE; e.g., blue-light imaging, narrow-band imaging (NBI), and i-Scan] have resulted in enhanced visualization of the polyp surface pattern. IEE can be employed for the optical classification of colorectal polyps during colonoscopy, obviating the need for pathology[14,15]. The American Society for Gastrointestinal Endoscopy (ASGE) Technology Committee, in its Preservation and Incorporation of Valuable endoscopic Innovations (PIVI) statement, has recommended the optical evaluation of diminutive polyps, adopting a “resect and discard” strategy for all diminutive colorectal polyps, and a “diagnosis and leave” strategy for diminutive rectosigmoid polyps, if the endoscopist can reach the recommended threshold of ≥ 90% agreement with histopathology results for surveillance interval assignment and ≥ 90% negative predictive value (NPV) for diagnosis of adenomatous histology, respectively[14,15]. Optical diagnosis can distinguish between neoplastic and non-neoplastic polyps and therefore deliver clinical and cost benefits by reducing the number of unnecessary histopathology examinations and providing immediate surveillance interval recommendations to patients. However, despite the demonstrated high accuracy of optical diagnosis for diminutive polyps, endoscopists have been reluctant to support its broad implementation because of concerns about incorrect diagnoses, assignment of inappropriate surveillance intervals, and related medicolegal issues[16]. 
To address the shortcomings in current colonoscopy practice, research has been directed at standardizing colonoscopy procedures among endoscopists through the integration of artificial intelligence (AI) into colonoscopy practice. AI could provide real-time support to physicians by automatically recognizing specific polyp patterns in colonoscopy images and/or videos, as well as suggesting the most probable histology and providing a confidence level for the predicted histology. The use of such technology would help to mitigate the effects of endoscopist experience in optical diagnosis. Computer-assisted, or most recently, AI-assisted colonoscopy diagnostic systems (CAD) for detection (CADe) and classification (CADx) of colorectal polyps are currently the two main areas of research and implementation of AI in clinical practice. AI-assisted colonoscopy improves ADR and allows for reliable, operator-independent pathology prediction of colorectal polyps. However, there is still a substantial communication gap between computer and medical fields, with scientists in these two disciplines divided in terms of background knowledge, available resources, research typology, and awareness of unmet needs in clinical practice. In this review, we summarize the most important aspects of the application of CADe and CADx in routine colonoscopy practice.

DEVELOPMENT OF COMPUTER-ASSISTED DIAGNOSTIC SYSTEMS

Pairing colonoscopy devices with image-enhanced technology (i.e., white-light endoscopy and chromoendoscopy) has improved the quality of care to patients by increasing the precision of colonoscopy procedures[4]. Recently, research efforts have focused on integrating computational power and previously collected data to enhance the simultaneous detection and classification of colonoscopy images or videos and support endoscopists in their decisions about the presence and/or histology of a polyp. Machine learning is a subset of AI in which mathematical methods develop an algorithm based on given data (e.g., polyp images or videos) so that it can recognize the same pattern or perform a specific task in unseen or unknown data[17]. The final output of these systems (e.g., detection or classification of polyps) is based on pre-defined features or extraction of the most relevant image features (e.g., polyps), which may help in the specification, detection, or classification of a new image. In conventional machine learning (i.e., handcrafted models), a researcher manually introduces the clinically relevant polyp features to the machine learning algorithm. In contrast, in the most advanced machine learning method, which is called deep learning, polyp features, clinically relevant or not, are automatically extracted by the algorithm without prior introduction by a researcher. As a result, the output is based on the capture and summary of complex polyp characteristics, either for detection (i.e., discrimination of polyp from background mucosa) or prediction of histopathology (i.e., neoplastic or non-neoplastic)[17]. Deep learning employs deep neural networks (DNNs), which imitate the complex interconnected neural network in the human brain. The artificial neurons are arranged in several detection and pooling layers, each taking weighted data from the preceding layer, processing it, and passing the output (processed data) to the next layer. 
Each layer performs as a “step of abstraction”[17], which forms a hierarchy of common features that grow in complexity throughout the layers (i.e., edge → basic shape → object → class prediction). In other words, each layer extracts useful and relevant features from the given data that facilitate the classification of the images. When data are presented, the DNN performs repeated iterations of a previously chosen model (i.e., support vector machines, random forests, or neural networks) throughout the deeper layers, so-called hierarchical feature learning[17]. For computer-assisted colonoscopy, the development of the AI model is primarily based on supervised data, where data are retrospectively labeled by one or a group of expert endoscopists. For example, in CADx, colonoscopy images or videos will be labeled as neoplastic or non-neoplastic based on the reference standard of pathology results (Figure 1), which would have been reviewed and finalized following consensus by several pathologists. In CADe, however, polyp images or videos will be reviewed by experienced endoscopists, and polyp borders will be delineated based on consensus by endoscopists. Ultimately, the output of the AI algorithm will identify the presence of a polyp, or be able to discriminate between a neoplastic and non-neoplastic polyp (Figure 2)[17]. However, there are some shortcomings and barriers to the development and implementation of CAD systems in real-time endoscopy practice, as discussed below.
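The layer-by-layer processing described above can be illustrated with a very small sketch. It is a simplification under stated assumptions: it uses fully connected layers rather than the detection and pooling layers of a real colonoscopy DNN, and every name, weight, and input value (`forward`, `TOY_LAYERS`, the three input features) is hypothetical and chosen only for illustration.

```python
import math


def dense_layer(inputs, weights, biases):
    """Each neuron takes a weighted sum of the preceding layer's outputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Simple non-linearity applied to the output of each hidden layer."""
    return [max(0.0, v) for v in values]


def sigmoid(z):
    """Map the final score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(features, layers):
    """Pass the data through every layer in turn and return a probability."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense_layer(activations, weights, biases))
    weights, biases = layers[-1]
    (logit,) = dense_layer(activations, weights, biases)  # single output neuron
    return sigmoid(logit)


# Hypothetical toy network: 3 input features -> 2 hidden neurons -> 1 output.
TOY_LAYERS = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.2, -0.7]], [-0.1]),
]

probability = forward([0.9, 0.4, 0.2], TOY_LAYERS)
print(f"Predicted probability of the 'neoplastic' class: {probability:.3f}")
```

In a trained system the weights would be learned from labeled images rather than written by hand; the sketch only shows how the output of one layer becomes the input of the next.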

Figure 1

Prediction of colorectal polyp histology by the ENDOBRAIN computer-aided classification system for colonoscopy.

Figure 2

Detection of a colorectal polyp by the ENDOAID computer-aided detection system for colonoscopy. The green box delineates the area containing a polyp.

Datasets

The data used to develop a CAD system will be divided into three or more datasets: One training dataset to build the AI model, one validation dataset to check the generalizability of the model, and at least one test dataset from another source of data to test the performance of the model[17]. Commonly, training and validation data are derived from the same source (i.e., colonoscopies performed at a single center). However, it is crucial to avoid overlap between the datasets; otherwise, evaluation of the model hyperparameters would be flawed and would lead to “model overfitting.” Model overfitting is an error in modeling that occurs when the model is too tightly fitted to the training data and random fluctuations in the training data are learned as concepts by the model. The problem is that the fitted model does not generalize to new data due to its low bias and high variance. Overfitting can be avoided by tight monitoring of the model during training, that is, by constantly evaluating the model performance in the training and validation data[17]. Researchers should use large and heterogeneous datasets, including normal and abnormal colonoscopies. A sufficient number of colonoscopy images or video frames would ensure a robust evaluation of model performance. Data should ideally be collected from multiple centers and from patients who are diverse in terms of race, age, sex, and medical conditions. A lack of ground truth data or reliable annotated “big data” for generating effective and high-performance AI models could limit the broad application of CAD systems in clinical settings[18]. This is a challenging goal to achieve as it requires millions of colonoscopy images and videos to be annotated by multiple highly experienced experts to ensure a consensus on ambiguous images. Annotation and data labeling by experts should follow a uniform and standardized protocol; otherwise, the generalizability and performance evaluation of the model will be unreliable.
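One common way to avoid overlap between datasets is to split at the patient level, so that frames from the same colonoscopy never appear in more than one dataset. The sketch below illustrates this idea only; the record layout (`patient_id`, `frame`), the function name `split_by_patient`, and the 70/15/15 proportions are assumptions, not a protocol from the cited studies. As noted above, the test dataset should ideally come from another source altogether.

```python
import random


def split_by_patient(frames, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign all frames of a patient to exactly one of train/validation/test."""
    patients = sorted({f["patient_id"] for f in frames})
    random.Random(seed).shuffle(patients)  # reproducible shuffle of patient IDs
    n_train = round(len(patients) * fractions[0])
    n_val = round(len(patients) * fractions[1])
    groups = {
        "train": set(patients[:n_train]),
        "validation": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),
    }
    return {name: [f for f in frames if f["patient_id"] in ids]
            for name, ids in groups.items()}


# Hypothetical records: 20 patients with 5 frames each.
frames = [{"patient_id": p, "frame": i} for p in range(20) for i in range(5)]
splits = split_by_patient(frames)

for name, part in splits.items():
    n_patients = len({f["patient_id"] for f in part})
    print(f"{name}: {n_patients} patients, {len(part)} frames")
```

Splitting by frame instead of by patient would place near-identical images in both the training and validation data, which is the kind of overlap the text warns against.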

Gold standard comparison

The absence of a “gold standard” for diagnosing polyp histology would affect the accuracy of CAD performance. Although pathology results are currently regarded as the reference standard, the interobserver agreement among pathologists is not 100%; polyp histology determined by one pathologist might be different from that of another pathologist when reassessing the same specimen slides[19-22]. Therefore, the pathology data used for AI models must be re-evaluated by several pathologists prior to inclusion to ensure agreement on polyp pathology.
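Agreement between two pathologists is often summarized with a chance-corrected statistic such as Cohen's kappa. The source does not name a specific statistic, so its use here is an assumption, and the slide readings below are invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical readings of ten specimen slides by two pathologists.
pathologist_1 = ["neoplastic"] * 6 + ["non-neoplastic"] * 4
pathologist_2 = ["neoplastic"] * 5 + ["non-neoplastic"] * 5

kappa = cohens_kappa(pathologist_1, pathologist_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; slides on which the raters disagree are natural candidates for consensus review before inclusion.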

Technical transparency

The application of CAD in routine practice is a product of an interdisciplinary collaboration between medical and AI researchers. A recent review demonstrated that researchers failed to report the AI model characteristics effectively[23]. Researchers should ensure that they clearly define and report the AI model architecture and hyperparameters, including the number of deep layers and the learning rate. The definition and testing of hyperparameters are crucial to the validation process owing to their direct effect on the model’s performance; optimal model generalizability in the validation step implies the correct choice of hyperparameters. Researchers should briefly explain the source of data, the process of data selection, and the number of patients, images/video frames, normal colonoscopies (i.e., without polyp identification), colonoscopy centers, and participating endoscopists together with their level of expertise[17]. Furthermore, researchers should adopt appropriate techniques to prevent model overfitting. Data leakage may occur when the results from the test dataset, rather than those from the validation dataset, are used to tune the model parameters. The model may then overfit to the test data, which are no longer truly unseen, risking a biased estimate of model performance. Relying strictly on high-quality still images, instead of videos that contain large variability in colonoscopy images, may also increase the risk of overfitting.
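The difference between tuning on the validation dataset and tuning on the test dataset can be shown with a toy example. The configuration names and scores below are invented for illustration and do not come from any cited study.

```python
def pick_best(configs, scores):
    """Return the configuration with the highest score in the given table."""
    return max(configs, key=lambda name: scores[name])


# Hypothetical accuracy of three hyperparameter settings on each dataset.
validation_scores = {
    "4 layers, lr 1e-2": 0.88,
    "8 layers, lr 1e-3": 0.93,
    "16 layers, lr 1e-4": 0.91,
}
test_scores = {
    "4 layers, lr 1e-2": 0.86,
    "8 layers, lr 1e-3": 0.90,
    "16 layers, lr 1e-4": 0.92,
}

# Correct procedure: choose on validation data, then consult the test set once.
best = pick_best(validation_scores, validation_scores)
reported = test_scores[best]

# Leaky procedure, shown only as a contrast: choose on the test set itself.
leaky_best = pick_best(test_scores, test_scores)
leaky_reported = test_scores[leaky_best]

print(f"Validation-selected: {best} -> test accuracy {reported:.2f}")
print(f"Test-selected (leaky): {leaky_best} -> test accuracy {leaky_reported:.2f}")
```

The leaky procedure can never report a lower test score than the correct one, which is why its estimate of performance is optimistically biased.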

Computer-assisted polyp detection system

In the context of CAD, although the shift from separate engineering and medical disciplines to combined medical and engineering research has gained momentum over the last decade, pilot studies established the idea of CADe as early as 2003[24,25]. The primary hand-crafted AI models used the pre-described polyp features (e.g., color and/or texture-based features) and annotated colonoscopy videos for the detection of colorectal polyps[25-29]. Other studies used the same idea and developed several AI models that resulted in up to 90% sensitivity[30-32]. However, these studies used small and homogeneous datasets to develop and validate the AI models, raising doubts over the model’s optimal performance. The hand-crafted features used to build the model led to suboptimal performance, probably because of impaired feature recognition and description, and a high level of false-positive detection owing to the presence of colonic folds, blood vessels, and feces in the lateral view. After the invention of DNNs, important polyp features could be automatically recognized. Subsequently, the accuracy and sensitivity of models improved, signaling the great potential for CADe application. Recently, Yamada et al[33] developed a CADe system using a supervised DNN, and validated the system using a dataset of 705 still images of 752 lesions and 4135 still images of noncancerous tissue. This system performed well, with a sensitivity and specificity of 97.3% and 99.0%, respectively, and an area under the curve (AUC) of 0.975 in the validation set. Misawa et al[34] developed a model based on 546 short colonoscopy videos, comprising 155 polyp-positive and 391 polyp-negative videos. Two experts retrospectively annotated videos for polyp presentation to provide a gold standard for comparison. The model presented sensitivity, specificity, and accuracy of 90.0%, 63.3%, and 76.5%, respectively. The polyp detection rate and false-positive detection rate were 95% and 60%, respectively. 
Other significant research used a large dataset for training an AI model, which comprised 8641 annotated images from over 2000 colonoscopies[35]. The model showed excellent detection capability, with an AUC of 0.991 and an accuracy of 96.4%. The performance of this model was also superior to that of experts. The authors tested model performance in 20 colonoscopy videos with a total duration of 5 h, during which colonoscopists removed 28 polyps. When four independent experts reviewed the videos, they identified eight additional polyps without AI assistance (36 polyps in total) and 17 additional polyps with AI assistance (45 polyps in total). The model had a false-positive rate of 7%. Research with a prospective design that evaluates the real-time performance of CADe is scarce. Wang et al[36] conducted a prospective non-blinded clinical trial, which aimed to measure ADR with and without the application of CADe. Using 536 and 522 colonoscopies in the control and intervention arms, respectively, the authors found a statistically significant increase in ADR (29.1% vs 20.3%) and an increased number of adenomas per patient (0.53 vs 0.31) when CADe was used. The false-positive rate was 7.5% per colonoscopy, and there was no significant difference in the procedure time. However, CADe detected a higher number of diminutive adenomas and hyperplastic polyps, which carries a higher risk of unnecessary polypectomies, pathology examinations, and longer procedure times. To date, the generalizability of this system has not been tested in Western clinical settings. In contrast to the results of the latter study, Klare et al[37] prospectively evaluated endoscopist performance using CADe assistance during the real-time colonoscopy procedures of 55 patients. 
However, the endoscopists only observed the regular monitor, while an independent investigator in a separate room observed the monitor displaying the real-time outputs of the CADe system, which was hidden from the endoscopists’ view. Therefore, the endoscopists were blinded to the real-time CADe outputs. This system did not increase the precision of polyp detection in real-time practice: In per-patient analysis, the application of CADe resulted in endoscopists achieving a lower ADR (29.1% vs 30.9%); in per-polyp analysis, CADe could only detect 55 out of 73 polyps previously detected by endoscopists. Tables 1 and 2 summarize the recent studies evaluating a CADe system.
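The per-patient comparison reported for Wang et al[36] can be approximated from the published percentages. The sketch below is illustrative only: the function names are hypothetical, the adenoma counts (152 of 522 and 109 of 536) are back-calculated from the ADRs and arm sizes listed in Table 1 rather than taken from the trial report, and the Wald interval on the log odds ratio is an assumed method that may differ from the one used by the trial authors.

```python
import math


def adenoma_detection_rate(patients_with_adenoma, patients_examined):
    """Proportion of examined patients with at least one adenoma detected."""
    return patients_with_adenoma / patients_examined


def odds_ratio_with_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Odds ratio of group A vs group B with a Wald 95% CI on the log scale."""
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * standard_error)
    high = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, low, high


# Assumed counts, back-calculated from 29.1% of 522 (CADe) and 20.3% of 536 (control).
adr_cade = adenoma_detection_rate(152, 522)
adr_control = adenoma_detection_rate(109, 536)
odds_ratio, low, high = odds_ratio_with_ci(152, 522, 109, 536)

print(f"ADR with CADe: {adr_cade:.1%}; ADR without CADe: {adr_control:.1%}")
print(f"Odds ratio: {odds_ratio:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The same functions apply to the per-polyp figure from Klare et al[37]: 55 of 73 polyps corresponds to a per-polyp sensitivity of about 75.3%, as listed in Table 2.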

Table 1

Summary of the randomized controlled trials involving computer-aided detection for colonoscopy

Ref.	Year	Study design	Study aim	CADe system	Image modality	Number of patients in the CADe group	Number of patients in the control group	Number of polyps (CADe vs control group)	Adenoma detection rate (%) (CADe vs control group)	Polyp detection rate (%) (CADe vs control group)	Number or rate (%) of false positives (CADe vs control group)	Withdrawal time (CADe vs control group), mean ± SD, min
Wang et al[36]	2019	Non-blinded prospective randomised controlled study	To investigate whether a high-performance real-time CADe system can increase polyp and adenoma detection rates in the real clinical setting	The real-time automatic polyp detection system (Shanghai Wision AI Co., Ltd.) based on artificial neural network-SegNet architecture	Real-time video stream	522	536	767 (498 vs 269)	29.1 vs 20.3; P < 0.001; 95%CI = 1.21-2.135	45.0 vs 29.1; P < 0.001; 95%CI = 1.532-2.544	39 vs 0	6.18 ± 1.38 vs 6.07 ± 1.11; P = 0.15
Wang et al[74]	2020	Double-blind prospective randomised trial	To assess the effectiveness of a CADe system for improving detection of colon adenomas and polyps; to analyse the characteristics of polyps missed by endoscopists	The real-time automatic polyp detection system (Shanghai Wision AI Co., Ltd.) based on artificial neural network-SegNet architecture	Real-time video stream	484	478	809 (501 vs 308)	34.0 vs 28.0; P = 0.030; OR = 1.36, 95%CI = 1.03–1.79	52.0 vs 37.0; P < 0.0001; OR = 1.86, 95%CI = 1.44–2.41	48 in CADe group (control group not reported)	6.48 ± 1.32 vs 6.37 ± 1.09; P = 0.14
Su et al[75]	2020	Single-blind prospective randomised trial	To develop an automatic quality control system; to investigate whether the system could increase the detection of polyps and adenomas in real clinical practice	Five deep learning convolutional neural networks (DCNNs) based on AlexNet, ZFNet, and YOLO V2	Real-time video stream	308	315	273 (177 vs 96)	28.9 vs 16.5; P < 0.001; OR = 2.055, 95%CI = 1.397-3.024	38.3 vs 25.4; P = 0.00; OR = 1.824, 95%CI = 1.296-2.569	62 in CADe system (control group not reported)	7.03 ± 1.01 vs 5.6 ± 1.26; P < 0.001
Gong et al[76]	2020	Single-blind prospective randomised trial	To evaluate whether the CADe system could improve polyp yield during colonoscopy	ENDOANGEL based on the deep neural networks and perceptual hash algorithms	Real-time video stream	355	349	302 (178 vs 124)	16 vs 8; P = 0.001; OR = 2.30, 95%CI = 1.40-3.77	47 vs 34; P = 0.0016; OR = 1.69, 95%CI = 1.22-2.34	For endoscope being inside = 0.8; for identification of the caecum = 2; for prediction of slipping = 0	6.38 ± 2.48 vs 4.76 ± 2.54; P < 0.0001
Liu et al[77]	2020	Double-blind prospective randomised trial	To study the impact of CADe system on the detection rate of polyps and adenomas in colonoscopy	The convolutional three-dimensional (3D) neural network	Real-time video stream	508	518	734 (486 vs 248)	39.1 vs 23.9; P < 0.001; OR = 1.637, 95%CI = 1.201-2.220	43.7 vs 27.8; P < 0.001; OR = 1.57, 95%CI = 1.586-2.483	36 in CADe system (control group not reported)	6.82 ± 1.78 vs 6.74 ± 1.62; P < 0.001
Luo et al[78]	2021	Non-blinded prospective randomised trial	To explore whether CADe could improve the polyp detection rate in the actual clinical environment	A CNN algorithm based on a YOLO network architecture	Real-time video stream	150	150	185 (105 vs 80)	38.7 vs 34.0; P < 0.001	-	52 in CADe system (control group not reported)	6.22 ± 0.55 vs 6.17 ± 0.52; P = 0.102
Repici et al[79]	2020	Single-blind prospective randomised trial	To assess the safety and efficacy of a CADe system for the detection of colorectal neoplasia	The CNN (GI-Genius; Medtronic)	Real-time video stream	341	344	596 (353 vs 243)	54.8 vs 40.4; P < 0.001; RR = 1.30, 95%CI = 1.14-1.45	279/341 (82) vs 214/344 (62)	-	417 ± 101 seconds for the CADe group vs 435 ± 149 seconds for controls; P = 0.1
Wang et al[80]¹	2020	Single-blind prospective randomised trial	To investigate the impact of CADe on adenoma miss and detection rate	The artificial neural network (EndoScreener, Shanghai Wision AI Co., Ltd., Shanghai, China)	Real-time video stream	184 (CADe-routine group)²	185 (Routine-CADe group)³	529 (244 vs 285)	42.39 vs 35.68; P = 0.186; OR = 1.327, 95%CI = 0.872–2.018	63.59 vs 55.14; P = 0.09; OR = 1.421, 95%CI = 0.936–2.157	67 in CADe system (control group not reported)	6.55 (5.34–7.77) vs 6.51 (5.45–7.57)⁴; P = 0.745

¹The total adenoma miss rate was 13.89% [95% confidence interval (CI) = 8.24%–19.54%] with computer-assisted detection system (CADe) colonoscopy vs 40.00% (95%CI = 31.23%–48.77%) with routine colonoscopy, P < 0.0001. The total polyp miss rate was 12.98% (95%CI = 9.08%–16.88%) with CADe colonoscopy vs 45.90% (95%CI = 39.65%–52.15%) with routine colonoscopy, P < 0.0001. Visible adenoma miss rate: Routine-CADe group = 24.21% vs CADe-routine group = 1.59%, P < 0.001; visible polyp miss rate: Routine-CADe group = 30.89% vs CADe-routine group = 2.36%, P < 0.001.

²Colonoscopy was performed first with the CADe system and then with the conventional method.

³Colonoscopy was performed first with the conventional method and then with the CADe system.

⁴Median (interquartile range).

CADe: Computer-assisted detection system; CNN: Convolutional neural network; DCNN: Deep learning convolutional neural network; SD: Standard deviation; OR: Odds ratio; RR: Relative risk; CI: Confidence interval.

Table 2

Summary of the non-controlled studies involving computer-aided detection for colonoscopy

Ref.	Year	Study design	System	Image modality	Number of patients/colonoscopies used for training/test datasets (total)	Number of colonoscopy/polyp images/videos used for training/test datasets	Diagnostic properties
Park and Sargent[81]	2016	Retrospective	CADe based on DCNN using a conditional random field model	Still images	35 (colonoscopy videos)	562/562 (colonoscopy still images)	Sensitivity = 86%; specificity = 85%; AUC = 0.8585
Fernández-Esparrach et al[73]	2016	Retrospective	CADe based on energy map	Still images	NA/24 colonoscopy videos containing 31 different polyps	NA/Experiment A: 612 polyp images from all 24 videos. Experiment B: 47886 frames from the 24 videos	Experiment A: accuracy = small vs all polyps = 77.5%, 95%CI = 71.5%–82.6% vs 66.2%, 95%CI = 61.4%–70.7%; P < 0.01. Experiment B: The AUC = high quality frames vs all Frames = 0.79, 95%CI = 0.70–0.87 vs 0.75, 95%CI = 0.66–0.83
Yu et al[82]	2017	Retrospective	CADe based on three-dimensional (3-D) deep learning integration framework by leveraging the 3-D fully CNN (3D-FCN)	Videos	20/18 (colonoscopy videos)	3799 frames with polyps in total	Sensitivity = 71%; PPV = 88%; precision = 88.1%
Billah et al[83]	2017	Retrospective	CADe based on CNN and color wavelet features using a linear support vector machine	Still images	100 (colonoscopy videos for combined training and test datasets)	14000 still images (combined for training and test datasets)	Accuracy = 98.65%; sensitivity = 98.79%; specificity = 98.52%
Zhang et al[84]	2017	Retrospective	CADe based on DCNN	Still images	NA	2262/150 random, 30 NBI (colonoscopy still images)	Accuracy = 85.9%; sensitivity = 98%; PPV = 99%; precision = 87.3%; recall rate = 87.6%; AUC = 1.0
Wang et al[85]	2018	Retrospective	CADe based on DNN	Still images	1290/1138 (2428) patients	27113/5545 (colonoscopy images)	Sensitivity = 94.38%, 95%CI = 93.80%-94.96% in images with polyp; AUC = 0.984
Misawa et al[34]	2018	Retrospective	CADe based on CNN	Videos	59/14 (73)	411/135 (colonoscopy videos containing 150 polyps)	Per-polyp sensitivity = 94%; per-frame sensitivity = 90%; specificity = 63.3%; accuracy = 76.5%; false positive rate = 60%; AUC = 0.87
Yamada et al[33]	2019	Retrospective	CADe based on DNN	Videos	NA/77 (number of videos)	13983/4840 (colonoscopy videos)	Sensitivity = 97.3%, 95%CI = 95.9%–98.4%; specificity = 99.0%, 95%CI = 98.6%–99.2%; AUC = 0.975, 95%CI = 0.964–0.986
Urban et al[35]	2018	Retrospective	CADe based on deep learning CNN	Videos	Several training and validation sets: (1) Cross-validation on the 8641 images; (2) Training on the 8641 images and testing on the 9 videos, 11 videos, and independent dataset; and (3) Training on the 8641 images and 9 videos and testing on the 11 videos and independent dataset		Sensitivity = 96.9%; specificity = 95%; AUC = 0.991; accuracy = 96.4%; false positive rate = 7%
Klare et al[37]	2019	Prospective	Automated polyp detection software (“KoloPol,” Fraunhofer IIS, Erlangen, Germany) based on CNN	Live colonoscopy videos	NA	NA/55 (colonoscopy videos)	Per-polyp sensitivity = 75.3%, 95%CI = 62.3%-84.9%; PDR = 50.9%, 95%CI = 37.1%-64.4%; ADR = 29.1%, 95%CI = 17.6%-42.9%
Ozawa et al[86]	2020	Retrospective	CADe based on DCNN	Still images	12895 patients	16418/7077	Sensitivity = 92%; PPV = 86%; accuracy = 83%; identified adenomas = 97%

CADe: Computer-assisted detection system; CNN: Convolutional neural network; DCNN: Deep learning convolutional neural network; AUC: Area Under the Receiver Operating Characteristic curve; PPV: Positive predictive value; NPV: Negative predictive value; PDR: Polyp detection rate; ADR: Adenoma detection rate; CI: Confidence interval.

Computer-assisted polyp classification system

Computer-assisted diagnosis of the histopathology of colorectal polyps has become an area of significant research interest because of its potential to prevent the resection of low-risk polyps and reduce the number of unnecessary histopathology examinations. Many studies have successfully developed and validated CADx models, the use of which would allow the “diagnosis and leave strategy” to be implemented. In a prospective pilot study, in which the data from 128 patients undergoing colonoscopy using NBI were used to test a CADx system (209 polyps detected and removed), three polyp features were used to build the AI model: Mean vessel length, vessel circumference, and mean brightness within detected blood vessels[38]. The results showed that the endoscopists’ ability to predict polyp histology was superior to that of CADx, which had a sensitivity of 90% and specificity of 70.2% in differentiating neoplastic from non-neoplastic images compared with histopathology as the gold standard. The system's diagnostic performance was compared with that of endoscopists, who were blinded to the histopathology reference standard. Endoscopists accurately predicted polyp histology with a sensitivity of 93.8% and specificity of 85.7% when there was interobserver agreement. In cases of disagreement between endoscopists, the suggested safe prediction of polyp histology (i.e., classification as neoplastic) produced a sensitivity of 96.9% and specificity of 71.4%. Overall, CADx could predict polyp histology with an approximate sensitivity and specificity of 90% and 70%, respectively; however, the overall correct classification rate was moderate (85.3%). Notably, this AI algorithm was not fully automated; thus, its real-time performance in a clinical setting remains to be determined. Another limitation of this study was the use of data from NBI colonoscopies. 
Although NBI may assist polyp classification, its use may cast doubt on the generalizability of the model, especially in clinical settings where NBI is not available. The real-time evaluation of CADx is important if the technology is to be integrated into clinical practice. Some studies have used the real-time decision outputs from support vector machines for building CADx algorithms, with promising results[39-43]. Moreover, Chen et al[44] demonstrated that an AI model could accurately predict the histopathology of 284 diminutive polyps, comprising 96 hyperplastic and 188 neoplastic polyps diagnosed using NBI, with 96.3% sensitivity, 78.1% specificity, 91.5% NPV, and 89.6% PPV. This study and the study by Byrne et al[45], which used a combination of CADe and CADx systems (described below), are remarkable in that they achieved the threshold NPV of ≥ 90% recommended by the ASGE PIVI statement, favoring the implementation of the “diagnose and leave” strategy for diminutive rectosigmoid polyps[46]. However, the results of the former study need to be confirmed in a prospective study, ideally a controlled trial, in which the probability of selection bias is lower and the AI model can be compared with a conventional setting (without AI). More prospective studies assessing CADx are required to support its integration into clinical practice. The existing prospective studies reported high diagnostic performance, which provided strong evidence to support the real-time application of CADx[47,48]. In contrast, the AI models developed and tested in a prospective trial by Kuiper et al[49] did not show sufficient power for differentiating adenomatous from non-adenomatous lesions. Another CADx model in a prospective study by Rath et al[50] could only produce moderate accuracy, sensitivity, and specificity (84.7%, 81.8%, and 85.2%, respectively), although the NPV was relatively high at 96.1%. 
This model would therefore allow diminutive rectosigmoid polyps to be diagnosed and left in situ without resection. The authors suggested that the low prevalence of neoplastic polyps relative to hyperplastic polyps in their dataset could explain the model's moderate diagnostic performance, and might proportionately result in an overestimation of the NPV and an underestimation of the accuracy and PPV of the model. Table 3 summarizes the recent studies evaluating CADx systems.
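The dependence of predictive values on prevalence can be made concrete with Bayes' rule. The following is a minimal sketch, not the analysis of any cited study: the function name `predictive_values` and the three prevalences in the loop are assumptions introduced for illustration, whereas the sensitivity and specificity values are those quoted in the text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV, accuracy) for a binary test at a given prevalence."""
    tp = sensitivity * prevalence                  # true-positive fraction
    fn = (1.0 - sensitivity) * prevalence          # false-negative fraction
    tn = specificity * (1.0 - prevalence)          # true-negative fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false-positive fraction
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = tp + tn
    return ppv, npv, accuracy


# Sensitivity and specificity as quoted for Rath et al; the prevalences are
# illustrative assumptions, not values reported by the study.
for prevalence in (0.15, 0.50, 0.75):
    ppv, npv, acc = predictive_values(0.818, 0.852, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}  accuracy={acc:.3f}")
```

With sensitivity and specificity held fixed, the NPV falls and the PPV rises as the share of neoplastic polyps grows, so an NPV measured in a low-prevalence dataset need not transfer to other settings. As a consistency check, the figures quoted for Tischendorf et al (sensitivity 90%, specificity 70.2%, 160 of 209 polyps neoplastic) give an accuracy of about 85.3%, in line with the reported correct classification rate.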

Table 3

Summary of the non-controlled studies involving computer-aided diagnosis for colonoscopy including studies with combined detection and diagnosis systems

Ref.	Year	Study design	Study aim	System	Number of patients/colonoscopies used for training/test datasets (total)	Number of colonoscopy/polyp images/videos used in training/test datasets	Diagnostic properties
Tischendorf et al[38]	2010	Prospective pilot	Distinguishing adenomas from non-adenomas	CADx based on SVMs	NA/128; Colonoscopy videos	NA/209 polyps containing 160 neoplastic and 49 non-neoplastic polyps in the test dataset	CADx: Sensitivity = 90%, specificity = 70%, correct classification rate = 85.3%. Consensus decision between the human observers: Sensitivity = 93.8%, specificity = 85.7%, correct classification rate = 91.9%. “Safe” decision, when there was interobserver discrepancy: Sensitivity = 96.9%, specificity = 71.4%, correct classification rate = 90.9%
Aihara et al[47]	2013	Prospective	Distinguishing neoplastic from non-neoplastic lesion	CADx based on numerical color analysis of autofluorescence endoscopy as an Adobe AIR application	NA/32 patients in the test dataset	NA/102 lesions containing 75 neoplastic lesions in the test dataset	Sensitivity = 94.2%; specificity = 88.8%; PPV = 95.6%; NPV = 85.2%
Mori et al[87]	2015	Retrospective pilot	Distinguishing small (≤ 10 mm) neoplastic from non-neoplastic lesion	CADx (EC-CAD) based on CNN	NA/152 patients in the test dataset	NA/176 small polyps in the test dataset containing 137 neoplastic and 39 non-neoplastic polyps	Accuracy = 89.2%, 95%CI = 83.7%-93.4%; sensitivity = 92.0%, 95%CI = 86.1%-95.9%; specificity = 79.5%, 95%CI = 63.5%-90.7%
Kuiper et al[49]	2015	Retrospective	Distinguishing small (≤ 9 mm) neoplastic from non-neoplastic lesion	CADx (WavSTAT) based on CNN	NA/87 patients in the test dataset	NA/207 small lesions in the test dataset	Accuracy = 74.4%, 95%CI = 68.1%–79.9%; sensitivity = 85.3%, 95%CI = 78%–90%; specificity = 58.8%, 95%CI = 48%–69%; PPV = 74.8%, 95%CI = 67%–81%; NPV = 73.5%; accuracy of on-site recommended surveillance interval = 73.7%
Misawa et al[34]	2018	Retrospective	Distinguishing neoplastic from non-neoplastic lesion categorized	CADx based on SVMs	NA	979 images containing 381 non-neoplasms and 598 neoplasms in the training dataset/100 images containing 50 non-neoplasms and 50 neoplasms in the test dataset	Accuracy = 90.0%, 95%CI = 82.4–95.1; sensitivity = 84.5%, 95%CI = 72.6–92.7; specificity = 97.6%, 95%CI = 87.4–99.9; PPV = 98.0%, 95%CI = 89.4–99.9; NPV = 82.0%, 95%CI = 68.6–91.4
Byrne et al[51]	2018	Retrospective	Distinguishing neoplastic from non-neoplastic lesions	CADx + CADe based on an improved DCNN model using NBI	NA	NA/21804 unseen frames in the test dataset	Accuracy = 99.94%; sensitivity = 95.95%; specificity = 91.66%; NPV = 93.6%; prediction of polyp videos = 97.6%
Mori et al[48]	2018	Prospective	Distinguishing diminutive (≤ 5 mm) neoplastic from non-neoplastic lesions	CADx based on SVMs used with NBI and endocytoscope	NA/791 patients in the test dataset	61925/466 polyps from 325 patients in the test dataset	CADx-NBI: Sensitivity = 92.7%, 95%CI = 89.1–95.4; specificity = 89.8%, 95%CI = 84.4–93.9; PPV = 93.7%, 95%CI = 90.2–96.2; NPV = 88.3%, 95%CI = 82.7–92.6. CADx-endocytoscope: Sensitivity = 91.3%, 95%CI = 87.5–94.3; specificity = 88.7%, 95%CI = 83.1–93.0; PPV = 92.9%, 95%CI = 89.3–95.6; NPV = 86.3%, 95%CI = 80.4–90.9
Byrne et al[45]	2019	Retrospective	Distinguishing diminutive (≤ 5 mm) neoplastic from non-neoplastic lesions	CADx based on DCNN		Training dataset: 60089 frames from 223 polyp videos (29% NICE type 1, 53% NICE type 2 and 18% of normal mucosa with no polyp)/validation dataset: 40 videos (NICE type 1, NICE type 2 and two videos of normal mucosa)/test dataset: 125 consecutively identified diminutive polyps, comprising 51 hyperplastic polyps and 74 adenomas	Accuracy = 94%, 95%CI = 86%-97%; sensitivity = 98%, 95%CI = 92%-100%; Specificity = 83%, 95%CI = 67%-93%; NPV = 97%; PPV = 90%
Song et al[88]	2020	Retrospective	Distinguishing adenomas from SPs	CADx based on DCNN	NA	12480 image patches of 624 polyps/two test datasets of 545 polyps	Agreement between the true polyp histology and CADx = 0.614–0.642; accuracy = 81.3%–82.4%; sensitivity = 82.1%; specificity = 93.7%; PPV = 78%; NPV = 95%; the AUC = 0.93–0.95, 0.86–0.89, and 0.89–0.91 for serrated polyps, benign adenoma/mucosal or superficial submucosal cancer, and deep submucosal cancer, respectively
Kudo et al[89]	2020	Retrospective	Distinguishing small (≤ 10 mm) neoplastic from non-neoplastic lesions	The EndoBRAIN system (CADx + CADe based on DCNN)	NA/89 patients in the test dataset	69,142 images taken at 520-fold magnification and 2,000 polyps/100 lesions (≤ 10 mm) in the test dataset	CADe: Accuracy = 98%, 95%CI = 97.3%–98.6%; sensitivity = 96.9%, 95%CI = 95.8%–97.8%; specificity = 100%, 95%CI = 99.6%–100%; PPV = 100%, 95%CI = 99.8%–100%; NPV = 94.6%, 95%CI = 92.7%–96.1%; CADx: Accuracy = 96%, 95%CI = 95.1%–96.8%; sensitivity = 96.9%, 95%CI = 95.8%–97.8%; specificity = 94.3%, 95%CI = 92.3%–95.9%; PPV = 96.9%, 95%CI = 95.8%–97.8%; NPV = 94.3%, 95%CI = 92.3%–95.9%

CADe: Computer-assisted detection system; CADx: Computer-assisted diagnosis system; CNN: Convolutional neural network; DCNN: Deep learning convolutional neural network; AUC: Area Under the Receiver Operating Characteristic curve; PPV: Positive predictive value; NPV: Negative predictive value; SVM: Support vector machine; SP: Serrated polyps; CI: Confidence interval.


Combined CADe and CADx models

The ideal CAD system would support the simultaneous detection and classification of polyps to optimize colonoscopy outcomes and achieve the best level of CRC prevention. A recent study evaluated the real-time application of CADx in combination with CADe[45]. The validated model was tested on a series of 125 diminutive polyps, comprising 51 hyperplastic polyps and 74 adenomas. The combined model could not predict histology with sufficient confidence for 15% of the polyps (19 of 125). For the remaining 106 polyps, whose histology was predicted with high confidence, the AI model demonstrated an accuracy of 94%, sensitivity of 98%, specificity of 83%, NPV of 97%, and PPV of 90%. In a significant study, Byrne et al[51] developed a new platform using three distinct AI CADe and CADx algorithms to provide endoscopists with a full workflow from detection to classification: An NBI light detector, a polyp detector, and an optical biopsy. The NBI light detector runs throughout the colonoscopy procedure to ensure the detection of all colorectal polyps with white light imaging, and the optical biopsy provides an accurate polyp classification using NBI light. The NBI light model achieved an excellent accuracy of 99.94% when tested on 21804 unseen colonoscopy video frames. However, the detection mode using white light achieved a sensitivity of only 79%. The optical biopsy model could accurately classify 97.6% of polyps, which was significantly higher than a previous CADx model tested by the same research team[45], and had a sensitivity of 95.95%, specificity of 91.66%, and NPV of 93.6% for polyp classification.
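The split between polyps classified with high confidence and those left unclassified can be pictured as a confidence threshold applied to the model output. The following is a minimal sketch under stated assumptions: it supposes that the model emits a per-polyp probability of adenoma, and the function name `triage_predictions`, the threshold of 0.8, and the example probabilities are hypothetical and are not taken from Byrne et al[45].

```python
def triage_predictions(probabilities, threshold=0.8):
    """Split per-polyp adenoma probabilities into confident calls and abstentions.

    A call is made only when the probability of the favored class reaches
    `threshold`; otherwise the polyp index is returned as an abstention,
    i.e., a polyp that would still need histopathology.
    """
    calls, abstained = [], []
    for index, p_adenoma in enumerate(probabilities):
        confidence = max(p_adenoma, 1.0 - p_adenoma)
        if confidence >= threshold:
            label = "adenoma" if p_adenoma >= 0.5 else "hyperplastic"
            calls.append((index, label, confidence))
        else:
            abstained.append(index)
    return calls, abstained


# Hypothetical model outputs for five polyps (assumed values).
calls, abstained = triage_predictions([0.97, 0.55, 0.08, 0.81, 0.35])
print(calls)
print(abstained)
```

Raising the threshold generally trades a larger share of abstentions for higher accuracy on the polyps that are classified, which is why performance reported on high-confidence predictions should be read alongside the proportion of polyps excluded.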

QUALITY ASSESSMENT OF COLONOSCOPY BY COMPUTER

Few studies have evaluated an AI-assisted system for the ability to accurately and automatically assess the quality of a colonoscopy procedure, including the identification of critical anatomical landmarks, especially when the endoscopic field is blurry[52,53]. Filip et al[53] developed a “Colometer” system that could rate colonoscopy quality based on the percentage of the withdrawal time with adequate visualization. This system could detect the factors associated with optimal real-time visualization of the mucosa, including image clarity, withdrawal velocity, and level of bowel cleanliness. A dataset of expert-annotated images and videos was used to train the AI model. The authors compared the quality rated by this system with that of three independent experts. There was a strong correlation between AI and expert quality ratings (ρ coefficient 0.65, P = 0.01). In another study, a system comprising two AI algorithms was designed to automatically detect the appendiceal orifice on a colon image or video[54]. The first algorithm was developed to detect the appendiceal orifice on endoscopic images based on the local shape, lighting, and intensity differences from a normal edge direction. The second algorithm was designed to detect the appendiceal orifice in the colonoscopy videos using a frame intensity histogram. The system could detect the orifice in images with an average sensitivity and specificity of 96.86% and 90.47%, respectively, and correctly classified 21 out of 23 colonoscopy videos (accuracy 91.30%).
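The quality measure described above, the percentage of withdrawal time with adequate visualization, reduces to a simple proportion once each frame has been labeled. The following is a minimal sketch, not the Colometer implementation: it assumes a constant frame rate (so that the share of frames equals the share of time) and an upstream classifier that flags each frame as clear or not; the function name and the example flags are hypothetical.

```python
def adequate_visualization_pct(frame_is_clear):
    """Return the percentage of withdrawal frames flagged as clearly visualized."""
    frames = [bool(flag) for flag in frame_is_clear]
    if not frames:
        raise ValueError("no withdrawal frames supplied")
    return 100.0 * sum(frames) / len(frames)


# Hypothetical per-frame flags from an assumed upstream clarity classifier.
flags = [True, True, False, True, False, True, True, True]
print(f"adequate visualization: {adequate_visualization_pct(flags):.1f}% of withdrawal")

# Video-level accuracy quoted in the text: 21 of 23 videos classified correctly.
video_accuracy = 100.0 * 21 / 23
print(f"video accuracy: {video_accuracy:.2f}%")
```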

RECOMMENDATIONS FOR FUTURE RESEARCH

Despite potential benefits of AI in colonoscopy, regulatory approval and standardization of AI models are difficult goals to achieve for a number of reasons described below.

Polyp morphology

Datasets might underrepresent particular polyp morphologies that are not common findings during colonoscopy. For example, non-polypoid lesions with Paris classification of flat and/or depressed morphology are more likely to harbor advanced histology or malignancy but are not a common finding during colonoscopy[55]. The endoscopic detection of non-polypoid lesions is problematic because of their surface pattern resemblance to normal mucosa[56]. Moreover, serrated polyps comprise about 30% of colon polyps, with sessile serrated polyp/adenoma (SSA/P) prevalence being less than 10%[57]. It has been proven that SSA/Ps can be responsible for CRC through a serrated (hyperplastic-SSA/P-serrated-CRC) sequence[58]. However, SSA/Ps are difficult to distinguish from normal mucosa or hyperplastic polyps by features of crypt distortion. Research has shown that previously diagnosed hyperplastic polyps might be reclassified as SSA/Ps after pathological reassessment[19-22], particularly for larger (> 5 mm) or right-sided polyps, and co-existing adenomas containing advanced histology[19,21,59]. A recent meta-analysis showed that pathological reassessment of resected polyps led to a significant change in diagnosis from hyperplastic to SSA for polyps in the right colon and polyps ≥ 5 mm (odds ratio 4.401 and 8.336, respectively)[59]. Moreover, there is poor agreement among pathologists in the determination of high-risk polyp features owing to the various approaches used for preparing biopsy specimens or level of expertise[19,60]. Therefore, the development of an AI platform capable of detecting and distinguishing subtle adenomatous features from normal mucosa with a high level of accuracy would be a valuable clinical tool.

Metadata

Most studies have failed to assess the performance and accuracy of AI models according to polyp size, polyp location, bowel preparation score, or withdrawal time[18]. Patient information, including demographic and clinical characteristics (e.g., colonoscopy indication, disease status), procedure-related quality characteristics (i.e., bowel preparation level, withdrawal time), procedure time and room, and endoscopist fatigue (i.e., whether the procedure was performed in the morning or afternoon), comprises important factors that are linked with the long-term non-endoscopic outcome of interest. In other words, the detection and classification of colorectal polyps are intermediate outcomes of colonoscopy, whereas the ultimate goals, depending on the primary indication of the procedure, are the prevention of interval cancer during the surveillance period or the evaluation of the effectiveness of medical therapy and the need for surgical treatment in patients with inflammatory bowel diseases. As mentioned in Kudo et al[61], metadata is a critical component in establishing optimal AI platforms that can perform well in real-world practice with suboptimal conditions. For example, SSA/Ps are mainly located in the right colon, where endoscopic access and complete inspection of the mucosa are challenging[58]. Collecting a large number of colonoscopy videos containing many SSA/Ps and cross-linking them with patient data would increase the accuracy and effectiveness of colonoscopy. Future AI models must incorporate information on polyp size and location as well as clinical, pre-procedural, and polyp morphological characteristics rather than focusing on polyp images and videos alone.

Prospective real-time studies

The robustness of AI platforms has not been widely estimated in real-time clinical settings through prospective studies. Most studies have been retrospective in design and subject to selection bias. Therefore, the comparison of accuracy between model and endoscopists may falsely deviate in favor of CAD. For example, in CADe, the researcher might exclude unclear colonoscopy or polyp images/videos; a fuzzy or blurred endoscopic view may occur when water or blood obscures the field, or when feces cover the bowel surface preventing a complete examination. There should also be a mixture of polyp-positive and polyp-negative images from abnormal and normal colonoscopies in all training, validation, and test datasets. The development of AI models must be rigorously based on a training dataset that is preferably gathered during real-time colonoscopies. Data should be collected prospectively by both experienced and novice endoscopists to represent the actual state of practice when assessing the model. The elimination of selection bias is most relevant to CADe systems and less so to CADx systems. Studies should be based in several centers to ensure the reproducibility of the results at the testing level. Testing CAD systems in non-academic settings will demonstrate whether the model represents actual real-world practice, where more polyps are missed and/or there is no access to advanced technologies such as NBI. In addition, real-time and multicenter studies may help to clarify the place of AI in the diagnostic process. Prospective studies would provide robust evidence to support the application of CAD and enhance endoscopists’ trust in optical polyp classification[62]. Nevertheless, CAD is still an operator-dependent technology as it is the experienced endoscopists who must provide the annotated datasets for the development of the system, and the accuracy of the AI output relies on the endoscopist presenting a clear endoscopic field to the system. 
Certain challenges, such as prolonged procedure times, a high positivity rate, and the inability to predict histology in the presence of feces or blood in the visual field, should be mitigated to prevent suboptimal diagnosis. Physicians should continue to follow the recommended procedural measures, including sufficient bowel preparation and photo documentation, to avoid legal and insurance issues. Researchers should prioritize prospective controlled trials to allow a precise comparison between settings that do and do not use AI platforms; otherwise, the real benefits of the AI system cannot be determined. Crossover studies, in which patients act as their own controls and undergo colonoscopy both with and without AI support, would be useful because fewer patients would be needed. In practice, the endoscopist would first detect and classify a polyp before using the AI support system to confirm the accuracy of their classification. This process should be performed in a time-efficient manner, as the benefit of AI assistance would be irrelevant if the procedure was significantly prolonged.

Standardization of endpoints

All research evaluating the diagnostic accuracy of CAD systems should use standardized research endpoints derived from the latest guidelines. As in other diagnostic evaluation studies, sensitivity, specificity, PPV, NPV, and AUC must be reported, as well as confusion matrices and mean average precision for multiclass classifications, and intersection over union (IoU) or the Dice coefficient for segmentation (i.e., delineation) in particular situations[63,64]. The use of such a comprehensive set of metrics would provide convincing evidence, reassuring physicians about the reliability of AI tools. For example, ADR must be reported for all research related to the evaluation of CADe systems, as such systems aim to achieve complete detection of all colorectal lesions. Similarly, the NPV of CADx systems must be reported to confirm the ability of CADx to achieve the recommended NPV benchmark of ≥ 90% according to the PIVI statement[46]. In addition, for surveillance interval assignment, the agreement between AI-based assignment and that of the histopathology reference standard must reach the ≥ 90% threshold recommended by the PIVI statement[46].
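The endpoints listed above follow directly from a confusion matrix and, for segmentation, from the overlap between predicted and reference masks. The following is a minimal sketch of the standard definitions; the function names, the example counts, and the example masks are assumptions introduced for illustration and do not come from any cited study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic accuracy metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def iou_and_dice(predicted, reference):
    """Return (IoU, Dice) for two masks given as collections of pixel coordinates."""
    predicted, reference = set(predicted), set(reference)
    intersection = len(predicted & reference)
    union = len(predicted | reference)
    if union == 0:
        return 1.0, 1.0  # two empty masks are treated as a perfect match
    iou = intersection / union
    dice = 2 * intersection / (len(predicted) + len(reference))
    return iou, dice


# Hypothetical counts for a CADx evaluation (neoplastic = positive class).
metrics = diagnostic_metrics(tp=90, fp=10, tn=40, fn=10)
meets_npv_benchmark = metrics["npv"] >= 0.90  # NPV benchmark of >= 90% from the text
print(metrics, meets_npv_benchmark)

# Hypothetical 2 x 2 pixel masks that overlap in one column.
predicted_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
reference_mask = {(0, 1), (1, 1), (0, 2), (1, 2)}
iou, dice = iou_and_dice(predicted_mask, reference_mask)
print(iou, dice)
```

In this illustrative case the sensitivity is 90% but the NPV is only 80%, so the benchmark check fails; reporting sensitivity alone would not reveal this.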

Transparency of AI analyses

We should avoid the black-box phenomenon, in which the decision-making process of a convolutional neural network cannot be deconvoluted owing to its complexity[65,66]. An important aspect of the wide application of AI platforms is the trust that physicians and responsible regulatory officials place in the AI analyses. Research should move toward facilitating extreme transparency in the generation and validation of AI models to avoid hesitancy about their public implementation.

Safety and cost-effectiveness

Finally, as well as CADe and CADx systems, a computer-based support system that aids endoscopists in selecting the most appropriate polypectomy procedure is necessary. Current practice involves the use of forceps to remove diminutive polyps, especially for the resection of polyps up to 2 mm[67]; however, the rate of incomplete resection is lower for the removal of polyps ≥ 3 mm when a snare is used[68]. In addition to providing a suggestion for an appropriate polypectomy device, AI can also help to estimate polyp size, delineate the extent of the lesion and a safe polypectomy margin, and identify post-resection lesion remnants that indicate an incomplete resection and the need for further tissue removal at colonoscopy follow-up. The goal of this system is to provide a complete polypectomy that will reduce the risk of interval cancer, as about 30% of all interval cancers are thought to be caused by incomplete resection of CRC precursors[11,69,70]. In addition to addressing the challenges associated with the development of reliable AI models that can be confidently employed in routine practice with high efficacy, research is needed to assess the cost-effectiveness of these systems related to the reduction in the number of patients diagnosed with interval cancer, reduction in the number of unnecessary pathology evaluations for low-confidence predictions of polyp histology by optical diagnosis, and facilitation of efficient physician-patient communication concerning future clinical arrangements. Adapting the newly developed AI-based techniques in routine practice and enhancing endoscopists’ trust in the new devices is only possible by a symbiotic relationship between academia and industry. It would facilitate obtaining regulatory approval from health authorities regarding research involving human subjects, constructing large “ground truth” data for developing AI models, and transporting knowledge and technology to ultimately access the market[71]. 
Several manufacturers have obtained regulatory approval to launch and commercialize their AI-based colonoscopy devices around the world (Table 4); however, many of them have not provided a detailed report of their devices’ performance. Further research should compare the performance of different AI-based systems in real-time settings by conducting prospective controlled trials with multiple intervention arms using different commercially available AI-based colonoscopy systems. Because these studies are time-consuming and costly, an alternative method for accelerating research is to test the “benchmarks” using publicly available datasets such as the ASU-Mayo colonoscopy video database[29], the CVC-ClinicDB database[28], the Kvasir dataset[72], and the ETIS-Larib Polyp database. Nonetheless, these datasets contain a limited number of colonoscopy videos and images and may not reflect the true performance of an AI-based system.

Table 4

Commercially available computer-assisted colonoscopy tools that have cleared regulatory approval

Computer-assisted system	Product	Manufacturer	Year of regulatory approval	Place of regulatory approval
CADx	EndoBRAIN	Cybernet System Corp./Olympus Corp.	2018	Japan
CADe	GI Genius	Medtronic Corp.	2019 in Europe; 2021 in United States	Europe/United States
CADe	ENDO-AID	Olympus Corp.	2020	Europe
CADe/CADx	CAD EYE	Fujifilm Corp.	2020	Europe/Japan
CADe	DISCOVERY	Pentax Corp.	2020	Europe
CADe	EndoBRAIN-EYE	Cybernet System Corp./Olympus Corp.	2020	Japan
CADe	EndoAngel	Wuhan EndoAngel Medical Technology Company	2020	China
CADe	EndoScreener	WISION A.I.	2020	China
CADx	EndoBRAIN-PLUS	Cybernet System Corp./Olympus Corp.	2020	Japan
CADx	EndoBRAIN-UC	Cybernet System Corp./Olympus Corp.	2020	Japan
CADe	WISE VISION	NEC Corp.	2021	Europe/Japan
CADe	ME-APDS	Magentiq Eye	2021	Europe
CADe	CADDIE	Odin Vision	2021	Europe

CADe: Computer-assisted detection system; CADx: Computer-assisted diagnosis system.


CONCLUSION

AI research is a rapidly evolving discipline that promises to enhance physicians’ performance. AI models have demonstrated the ability to compete with and outperform endoscopists, suggesting that all endoscopists would benefit from becoming familiar with CAD technology and comfortable with the integration of AI-assisted devices in colonoscopy practice. The decision support systems are being offered as reliable tools for the detection and classification of colorectal polyps, with the primary aim of outperforming endoscopists by detecting all CRC precursors; however, the new era of AI platforms has seen attempts to establish considerably more complex systems, in which the detection and classification of polyps are supported. Despite the recent achievements in designing and validating such systems, the current lack of AI-assisted systems that support endoscopists in monitoring colonoscopy quality, and that automatically annotate colonoscopy videos, suggest appropriate polypectomy devices, and indicate the completeness of polypectomy, limits the role of AI in colonoscopy practice. Through the integration of the most recent advances in computer science into colonoscopy practice, it appears possible to improve the quality of diagnosis, treatment, and screening in patients. However, AI platforms are still in their infancy in terms of clinical establishment and require much more exploration and innovation. They must be trusted by all physicians, regulatory organizations responsible for approval for clinical use, and patients. The AI-assisted colonoscopy is highly dependent on the endoscopist, who must attempt to present the clearest possible image or video to the AI model for analysis, and then take account of other concurrent patient factors such as the family history of CRC or the results of previous colonoscopies. 
The human qualities of respect and empathy must be apparent when communicating with patients to overcome any mistrust or reservations patients may have toward the new technology. Therefore, at the current stage of AI development, AI models can only “serve as a second observer, or a concurrent observer, but not an independent decision-maker”[73].

78 in total

1. WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians.

Authors: Jorge Bernal; F Javier Sánchez; Gloria Fernández-Esparrach; Debora Gil; Cristina Rodríguez; Fernando Vilariño
Journal: Comput Med Imaging Graph Date: 2015-03-20 Impact factor: 4.790

2. SVM-MRF segmentation of colorectal NBI endoscopic images.

Authors: Tsubasa Hirakawa; Tom Tamaki; Bisser Raytchev; Kazufumi Kaneda; Tetsushi Koide; Yoko Kominami; Shigeto Yoshida; Shinji Tanaka
Journal: Conf Proc IEEE Eng Med Biol Soc Date: 2014

3. Integrating Online and Offline Three-Dimensional Deep Learning for Automated Polyp Detection in Colonoscopy Videos.

Authors:
Journal: IEEE J Biomed Health Inform Date: 2016-12-07 Impact factor: 5.772

4. Making optical biopsy a clinical reality in colonoscopy.

Authors: James E East; Colin J Rees
Journal: Lancet Gastroenterol Hepatol Date: 2018-01

5. Lower Adenoma Miss Rate of Computer-Aided Detection-Assisted Colonoscopy vs Routine White-Light Colonoscopy in a Prospective Tandem Study.

Authors: Pu Wang; Peixi Liu; Jeremy R Glissen Brown; Tyler M Berzin; Guanyu Zhou; Shan Lei; Xiaogang Liu; Liangping Li; Xun Xiao
Journal: Gastroenterology Date: 2020-06-17 Impact factor: 22.682

Review 6. Changing pathological diagnosis from hyperplastic polyp to sessile serrated adenoma: systematic review and meta-analysis.

Authors: Yaron Niv
Journal: Eur J Gastroenterol Hepatol Date: 2017-12 Impact factor: 2.566

7. Factors Associated With Classification of Hyperplastic Polyps as Sessile Serrated Adenomas/Polyps on Morphologic Review.

Authors: Joseph C Anderson; Mikhail Lisovsky; Mary A Greene; Catherine Hagen; Amitabh Srivastava
Journal: J Clin Gastroenterol Date: 2018-07 Impact factor: 3.062