Ziyi Jin, Tianyuan Gan, Peng Wang, Zuoming Fu, Chongan Zhang, Qinglai Yan, Xueyong Zheng, Xiao Liang, Xuesong Ye.
Abstract
Gastric disease is a major health problem worldwide. Gastroscopy is the main method and the gold standard used to screen and diagnose many gastric diseases. However, several factors, such as the experience and fatigue of endoscopists, limit its performance. With recent advancements in deep learning, an increasing number of studies have used this technology to provide on-site assistance during real-time gastroscopy. This review summarizes the latest publications on deep learning applications in overcoming disease-related and nondisease-related gastroscopy challenges. The former aims to help endoscopists find lesions and characterize them when they appear in the field of view of the gastroscope. The purpose of the latter is to avoid missing lesions due to poor-quality frames, incomplete inspection coverage of gastroscopy, and other factors, thus improving the quality of gastroscopy. This study aims to provide technical guidance and a comprehensive perspective for physicians to understand deep learning technology in gastroscopy. Key issues to be addressed before the clinical application of deep learning technology, as well as future directions for disease-related and nondisease-related applications of deep learning to gastroscopy, are discussed herein.
Keywords: Computer-aided; Deep learning; Gastroscopy; Stomach
Year: 2022 PMID: 35148764 PMCID: PMC8832738 DOI: 10.1186/s12938-022-00979-8
Source DB: PubMed Journal: Biomed Eng Online ISSN: 1475-925X Impact factor: 2.819
Fig. 1 Diagram of the screening process of publications included in the analysis of this review. Duplication means that the same record is retrieved using different keywords. Relation means that the record applies deep learning technology in gastroscopy image processing (excluding wireless capsule endoscopy)
Fig. 2 Illustration of the four main tasks of gastric image analysis using deep learning technology. The CNN is the most popular neural network that has already been applied to gastroscopic image analysis. Recently, RNNs and GANs have been used to enhance the performance of CNN-based gastroscopic image processing methods for the four main tasks
Fig. 3 Diagnosis with a CAD system for the endoscopic video of a post-eradication subject. The computer-aided diagnosis system for white-light imaging (WLI-CAD, upper side) returned a prediction value of 0.492 for a post-eradication status, which turned out to be an incorrect prediction. However, linked colour imaging (LCI-CAD, lower side) returned a prediction value of 0.985 for a post-eradication status, which turned out to be the correct prediction. The lower heatmap demonstrates that hot spots were drawn against the contrast between a pale reddish tone and a whitish tone of the gastric mucosa in the captured LCI image.
(Reproduced with permission from Ref. [37]. Copyright 2020 Springer Nature Publishing)
Fig. 4 Sample images for the early detection of gastric cancer using a convolutional neural network (CNN) system. Real-time detection using the CNN system is displayed on the left-hand-side screen (video image).
(Reproduced with permission from Ref. [53]. Copyright 2018 John Wiley and Sons Publishing)
Fig. 5 Example of an original ME-NBI image and the feature-extraction procedure for its classification. The original ME-NBI image is classified as one of three types: CGT, LGN, or EGC.
(Reproduced with permission from Ref. [36]. Copyright 2020 Elsevier Publishing)
Fig. 6 Examples of images with different differentiation statuses. A Cancerous image (differentiated type). B Cancerous image (undifferentiated type). C Noncancerous image.
(Reproduced with permission from Ref. [32]. Copyright 2020 Elsevier Publishing)
Fig. 7 Delineation results. a–c Successful gastric cancer detection and delineation, d false positives (FPs) in healthy subjects, and e false negatives in abnormal cases.
(Reproduced with permission from Ref. [82]. Copyright 2020 MDPI Publishing)
Fig. 8 Examples of detection results using a conventional SSD and SSD-GPNet. The first column shows the "ground truth" as labelled by experienced doctors. The second column shows the SSD detection results. The last column shows the SSD-GPNet detection results
(Reproduced with permission from Ref. [54]. Copyright 2019 Zhang et al.)
Fig. 9 Attention maps generated using the Grad-CAM method. The yellow wireframe areas are the lesion regions of GIM annotated by an experienced endoscopist. The attention map is a heatmap laid over the original image, where a warmer colour indicates a higher contribution to the classification decision.
(Reproduced with permission from Ref. [22]. Copyright 2020 Elsevier Publishing)
Fig. 10 Interpretable heatmaps for the automatic diagnosis of chronic atrophic gastritis. a Original images. The red boxes are the areas of focus labelled by a doctor. b Heatmaps generated with class activation mapping. The orange-red regions of the heatmaps are consistent with the atrophic mucosa labelled by the doctors according to pathological results
(Adapted with permission from Ref. [21]. Copyright 2020 Elsevier Publishing)
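Figures 9 and 10 both explain classifier decisions with class-activation heatmaps. As a rough illustration of how such maps are produced, below is a minimal Grad-CAM sketch in PyTorch; the untrained ResNet-50 backbone, the hooked `layer4`, and the random input tensor are illustrative assumptions, not the actual models or data of Refs. [21, 22].

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=None)  # stand-in for a trained lesion classifier
model.eval()

activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["feat"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    gradients["feat"] = grad_output[0].detach()

model.layer4.register_forward_hook(fwd_hook)        # last convolutional block
model.layer4.register_full_backward_hook(bwd_hook)

image = torch.rand(1, 3, 224, 224)      # placeholder for a preprocessed frame
logits = model(image)
logits[0, logits.argmax()].backward()   # gradient of the top class score

# Weight each feature map by its spatially averaged gradient, sum, and rectify.
weights = gradients["feat"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["feat"]).sum(dim=1, keepdim=True))

# Upsample to image resolution and normalise to [0, 1] for overlaying.
cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```

The resulting `cam` map is what gets colour-mapped and alpha-blended over the endoscopic frame to produce heatmaps like those in the figures above.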
Fig. 11 Upper gastrointestinal anatomy detection with multitask convolutional neural networks. a Workflow of the proposed method. b Examples of MT-AD detection results
(Adapted with permission from Ref. [13]. Copyright 2019 John Wiley and Sons Publishing)
Fig. 12 EGD images classified into 31 sites and representative images identified by IDEA. The images show 31 sites determined by IDEA and the prediction confidence. Class 0, hypopharynx; 1, oesophagus; 2, gastroesophageal junction; 3, gastric cardia P in the antegrade view; 4–7, lower body (A, G, P, L); 8–11, middle-upper body in the antegrade view (A, G, P, L); 12–15, gastric cardia in the retroflex view (A, G, P, L); 16–18, middle-upper body in the retroflex view (A, P, L); 19–21, angularis (A, P, L); 22–26, antrum (whole, A, G, P, L); 27, pylorus; 28–29, duodenal bulb (A, P); 30, duodenal descending. A, anterior wall; G, greater curvature; P, posterior wall; L, lesser curvature.
(Reproduced with permission from Ref. [2]. Copyright 2021 Elsevier Publishing)
Fig. 13 Artefact detection results on video sequence data with three object-detection baselines. Bounding boxes are shown for two different video sequences (frames sampled approximately 10 frames apart).
(Reproduced with permission from Ref. [7]. Copyright 2021 Elsevier Publishing)
Fig. 14 Depth estimation results for endoscopy data of the stomach. The model performed prediction on the Kvasir dataset for the stomach. Since no ground-truth depth information is available for this dataset, depth estimations are provided only with their corresponding raw images.
(Reproduced with permission from Ref. [101]. Copyright 2021 Elsevier Publishing)
Fig. 15 3D reconstruction results of the input frames from endoscopy data of the stomach. a Original frames from endoscopy data of the stomach. b Result of 3D reconstruction using these input frames.
(Adapted with permission from Ref. [101]. Copyright 2021 Elsevier Publishing)
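Figures 14 and 15 link monocular depth estimation to 3D reconstruction: once a network predicts a per-pixel depth map, each pixel can be back-projected into a 3D point through the pinhole camera model. The sketch below shows only that geometric step, with made-up intrinsics and a random placeholder depth map standing in for a network prediction such as that of Endo-SfMLearner [101].

```python
import numpy as np

H, W = 256, 256
fx = fy = 200.0             # assumed focal lengths in pixels
cx, cy = W / 2.0, H / 2.0   # assumed principal point at the image centre

# Placeholder for a network-predicted depth map, in metres.
depth = 0.01 + 0.05 * np.random.rand(H, W)

# Pixel grid: u indexes columns, v indexes rows.
u, v = np.meshgrid(np.arange(W), np.arange(H))

# Pinhole back-projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
x = (u - cx) * depth / fx
y = (v - cy) * depth / fy
points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)  # (H*W, 3) point cloud
```

Accumulating such point clouds across frames with estimated camera poses (e.g., from structure from motion) yields reconstructions like the one in Fig. 15b.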
Disease-related application of deep learning to gastroscopic image processing
| Target disease | Main purpose | Reference | Imaging modality | DL task type | Dataset information | Network architecture | Result |
|---|---|---|---|---|---|---|---|
| GC | Detection of GC | Wang et al. [ | WLI | Image classification | A total of 1350 images depicting cancer (highly suspicious) and 103,514 normal images Train:validation:test = 6:2:2 | AlexNet GoogLeNet VGGNet | Sensitivity: 79.622% (missed-diagnosis rate: 20.377%) Specificity: 78.48% (misdiagnosis rate: 21.51%) |
| | | Hirasawa et al. [ | WLI CE NBI | Object detection | Training dataset: 13,584 endoscopic images of gastric cancer Testing dataset: 2296 stomach images collected from 69 consecutive patients with 77 gastric cancer lesions | SSD | The CNN required 47 s to analyse the 2296 test images The CNN correctly diagnosed 71 of 77 gastric cancer lesions, for an overall sensitivity of 92.2% 161 noncancerous lesions were detected as gastric cancer, resulting in a positive predictive value of 30.6% 70 of the 71 lesions (98.6%) with a diameter of 6 mm or more, as well as all invasive cancers, were correctly detected |
| | | Ishioka et al. [ | WLI CE NBI | Object detection | Training dataset: 13,584 endoscopic images of gastric cancer Testing dataset: video images collected from 68 endoscopic submucosal dissection procedures for early gastric cancer in 62 patients | SSD | The CNN correctly diagnosed 64 of 68 lesions (94.1%) The median time to lesion detection was 1 s (range: 0–44 s) after the lesion first appeared on the screen |
| | | Ikenoyama et al. [ | WLI CE NBI | Object detection | Training dataset: 13,584 endoscopic images from 2639 gastric cancer lesions Testing dataset: 2940 images from 140 cases | SSD | The average diagnostic times for analysing the 2940 test endoscopic images were 45.5 ± 1.8 s for the CNN and 173.0 ± 66.0 min for the endoscopists The sensitivity, specificity, and positive and negative predictive values for the CNN were 58.4%, 87.3%, 26.0%, and 96.5%, respectively; for 67 endoscopists, these values were 31.9%, 97.2%, 46.2%, and 94.9% The CNN had a significantly higher sensitivity than the endoscopists (by 26.5 percentage points) |
| | | Luo et al. [ | WLI | Semantic segmentation | A total of 1,036,496 endoscopy images from 84,424 individuals Train:validation:test = 8:1:1 | DeepLabv3+ | The diagnostic accuracy in identifying upper gastrointestinal cancers was 0.955 in the internal validation set, 0.927 in the prospective set, and 0.915–0.977 in the five external validation sets The diagnostic sensitivity was similar to that of the expert endoscopist (0.942 vs. 0.945) and superior to that of competent (0.858) and trainee (0.722) endoscopists The positive predictive value was 0.814 for the system, 0.932 for the expert, 0.974 for the competent endoscopist, and 0.824 for the trainee The negative predictive value was 0.978 for the system, 0.980 for the expert, 0.951 for the competent endoscopist, and 0.904 for the trainee |
| | Diagnosis of GC | Sakai et al. [ | WLI | Image classification | Training dataset: 9587 cancer images and 9800 normal images Testing dataset: 4653 cancer images and 4997 normal images | GoogLeNet | Accuracy: 87.6% Sensitivity: 80.0% Specificity: 94.8% |
| | | Cho et al. [ | WLI | Image classification | Training dataset: 4205 images from 1057 patients Testing dataset: 812 images from 212 patients; an additional 200 images from 200 patients were collected for prospective validation | Inception-ResNet-v2 | The weighted average accuracy of the model reached 84.6% for the five-category classification The mean areas under the curve (AUCs) of the model for differentiating gastric cancer and neoplasm were 0.877 and 0.927, respectively In prospective validation, the model underperformed the best endoscopist (five-category accuracy: 76.4% vs. 87.6%; cancer: 76.0% vs. 97.5%; neoplasm: 73.5% vs. 96.5%; P < 0.001), but did not differ significantly from the worst-performing endoscopist in differentiating gastric cancer (accuracy: 76.0% vs. 82.0%) or neoplasm (AUC: 0.776 vs. 0.865) |
| | | Lee et al. [ | WLI | Image classification | Training dataset: 200 ulcer images, 337 cancer images, 180 normal images Testing dataset: 20 ulcer images, 30 cancer images, 20 normal images | ResNet-50 VGGNet-16 Inception-v4 | The AUCs were 0.95, 0.97, and 0.85 for the three classifiers; ResNet-50 performed best The classifications involving normal images (normal vs. ulcer and normal vs. cancer) reached accuracies above 90%, whereas ulcer vs. cancer classification had a lower accuracy of 77.1% |
| | | Li et al. [ | ME-NBI | Image classification | Training dataset: 386 images of noncancerous lesions and 1702 images of early gastric cancer Testing dataset: 341 endoscopic images (171 noncancerous lesions and 170 early gastric cancers) | Inception-v3 | The sensitivity, specificity, and accuracy of the CNN system in diagnosing early gastric cancer were 91.18%, 90.64%, and 90.91%, respectively There was no significant difference in diagnostic specificity or accuracy between the CNN and the experts, but the diagnostic sensitivity of the CNN was significantly higher The diagnostic sensitivity, specificity, and accuracy of the CNN were all significantly higher than those of the nonexperts |
| | | Horiuchi et al. [ | ME-NBI | Image classification | Training dataset: 1492 EGC and 1078 gastritis images Testing dataset: 151 EGC and 107 gastritis images | GoogLeNet | Accuracy: 85.3% Sensitivity: 95.4% Specificity: 71.0% PPV: 82.3% NPV: 91.7% The overall test speed was 51.83 images/s (0.02 s/image) |
| | | Horiuchi et al. [ | ME-NBI | Image classification | Training dataset: 1492 cancerous and 1078 noncancerous ME-NBI images Testing dataset: 174 videos (87 cancerous and 87 noncancerous) The system was compared with 11 experts skilled in diagnosing EGC using ME-NBI, each with more than 1 year of clinical experience | GoogLeNet | AUC: 0.8684 Accuracy: 85.1% Sensitivity: 87.4% Specificity: 82.8% PPV: 83.5% NPV: 86.7% The CAD system was significantly more accurate than two experts, significantly less accurate than one expert, and not significantly different from the remaining eight |
| | | Hu et al. [ | ME-NBI | Image classification | A total of 1777 ME-NBI images from 295 cases collected at 3 centres Training cohort (TC, n = 170) Internal test cohort (ITC, n = 73) External test cohort (ETC, n = 52) The model was compared with eight endoscopists of varying experience | VGG-19 | AUC: 0.808 in the ITC and 0.813 in the ETC The predictive performance was similar to that of the senior endoscopists (accuracy: 0.770 vs. 0.755; sensitivity: 0.792 vs. 0.767; specificity: 0.745 vs. 0.742) and better than that of the junior endoscopists (accuracy: 0.770 vs. 0.728) After referring to the system's results, the endoscopists' average accuracy, sensitivity, PPV, and NPV improved significantly |
| | | Liu et al. [ | ME-NBI | Image classification | A total of 3871 ME-NBI images, including 1130 CGT, 1114 LGN, and 1627 EGC Tenfold cross-validation | ResNet-50 VGG-16 Inception-v3 Inception-ResNet-v2 | ResNet-50 performed best among the four networks Accuracy: 0.96 F1-scores: 0.92, 0.91, and 0.99 for classifying ME-NBI images as CGT, LGN, and EGC, respectively |
| | | Ueyama et al. [ | ME-NBI | Image classification | Training dataset: 5574 ME-NBI images (3797 EGCs, 1777 noncancerous mucosa and lesions) Testing dataset: 2300 ME-NBI images (1430 EGCs, 870 noncancerous mucosa and lesions) | ResNet-50 | The AI-assisted CNN-CAD system required 60 s to analyse the 2300 test images Accuracy: 98.7% Sensitivity: 98% Specificity: 100% Positive predictive value: 100% Negative predictive value: 96.8% All misdiagnosed EGC images were either of low quality or showed superficially depressed, intestinal-type intramucosal cancers that are difficult to distinguish from gastritis, even for experienced endoscopists |
| | | Zhang et al. [ | WLI | Image classification Semantic segmentation | Training dataset: 21,217 gastroscopic images of peptic ulcer (PU), early gastric cancer (EGC), high-grade intraepithelial neoplasia (HGIN), advanced gastric cancer (AGC), gastric submucosal tumours (SMTs), and normal gastric mucosa without lesions Testing dataset: 1091 images The CNN diagnoses were compared with those of 10 endoscopists with over 8 years of experience in endoscopic diagnosis | ResNet-34 DeepLabv3 | The diagnostic specificity and PPV of the CNN were higher than those of the endoscopists for the EGC and HGIN images (specificity: 91.2% vs. 86.7%; PPV: 55.4% vs. 41.7%) The diagnostic accuracy of the CNN was close to that of the endoscopists for the lesion-free, EGC and HGIN, PU, AGC, and SMT images The CNN required 42 s to recognize all images in the test set |
| | Determining the invasion depth of GC | Zhu et al. [ | WLI | Image classification | Training dataset: 790 images Testing dataset: 203 images | ResNet-50 | At a threshold value of 0.5: sensitivity 76.47%, specificity 95.56%, AUC 0.94, overall accuracy 89.16%, PPV 89.66%, NPV 88.97% The CNN-CAD system achieved significantly higher accuracy (by 17.25 percentage points) and specificity (by 32.21 percentage points) than the human endoscopists |
| | | Cho et al. [ | WLI | Image classification | Internal dataset: 2899 images in total Train:validation:test = 8:1:1 External dataset: 206 images for testing | DenseNet-161 | In the internal test, the mean AUC for discriminating submucosal invasion was 0.887 In the external test, the mean AUC also reached 0.887 A clinical simulation showed that 6.7% of the patients who underwent gastrectomy in the external test were accurately qualified by the established algorithm for potential endoscopic resection, avoiding unnecessary surgery |
| | Delineating the margin of GC | An et al. [ | WLI CE ME-NBI | Semantic segmentation | Training dataset: WLI: 343 images from 260 patients CE: 546 images from 67 patients Testing dataset: WLI: 321 images from 218 patients CE: 34 images from 14 patients | UNet++ | The system reached an accuracy of 85.7% on the CE images and 88.9% on the WLE images under an overlap-ratio threshold of 0.60 relative to the manual markings by the experts On the ESD videos, the resection margins predicted by the system covered all areas of high-grade intraepithelial neoplasia and cancer The minimum distance between the predicted margins and the histological cancer boundary was 3.44 ± 1.45 mm, outperforming the resection margin based on ME-NBI |
| | Detection of GC; Anatomical classification | Wu et al. [ | NBI BLI WLI | Image classification | Training dataset: 3170 gastric cancer and 5981 benign images for detecting GC; 24,549 images from different parts of the stomach for monitoring blind spots Testing dataset: 100 gastric cancer and 100 benign images for detecting GC; 170 images for monitoring blind spots | VGG-16 ResNet-50 | The DCNN identified EGC from nonmalignancy with an accuracy of 92.5%, a sensitivity of 94.0%, a specificity of 91.0%, a positive predictive value of 91.3%, and a negative predictive value of 93.8% The DCNN classified gastric locations into 10 or 26 parts with an accuracy of 90% or 65.9%, respectively |
| | Detection of GC; Determining the invasion depth of GC | Yoon et al. [ | WLI | Image classification | A total of 11,539 images (896 T1a-EGC, 809 T1b-EGC, and 9834 non-EGC) Train:validation:test = 6:2:2 | VGG-16 | AUC for EGC detection: 0.981 AUC for invasion-depth prediction: 0.851 |
| | Detection of GC; Delineating the margin of GC | Shibata et al. [ | WLI | Image classification Semantic segmentation | A total of 1208 healthy and 533 cancer images Fivefold cross-validation | Mask R-CNN | Detection task: sensitivity of 96.0% with 0.10 false positives (FPs) per image Segmentation task: average Dice index of 71% |
| | Classifying the type of GC; Delineating the margin of GC | Ling et al. [ | ME-NBI | Image classification | For CNN1, identifying EGC differentiation status: Training dataset: 2217 images from 145 EGC patients Testing dataset: 1870 images from 139 EGC patients CNN1 was then compared with experts on 882 images from 58 EGC patients For CNN2, delineating EGC margins: Training dataset: 928 images from 132 EGC patients Testing dataset: 742 images from 87 EGC patients | VGG-16 and ResNet-50 UNet++ | The system predicted the differentiation status of EGCs with an accuracy of 83.3% on the testing dataset In the man-machine contest, CNN1 performed significantly better than the five experts (86.2% vs. 69.7%) The system delineated EGC margins with an accuracy of 82.7% for differentiated EGC and 88.1% for undifferentiated EGC under an overlap ratio of 0.80 On unprocessed EGC videos, the system achieved real-time diagnosis of EGC differentiation status and margin delineation during ME-NBI endoscopy |
| HP | Detection of HP | Itoh et al. [ | WLI | Image classification | 179 upper gastrointestinal endoscopy images obtained from 139 patients (65 HP-positive and 74 HP-negative) Training dataset: 149 images, expanded to 596 by data augmentation Testing dataset: the remaining 30 images (15 from HP-negative and 15 from HP-positive patients) | GoogLeNet | Sensitivity: 86.7% Specificity: 86.7% AUC: 0.956 |
| | | Nakashima et al. [ | WLI BLI LCI | Image classification | Training dataset: per modality (WLI, BLI, LCI), 162 original images plus 486 rotated copies (90, 180, and 270 degrees), for a total of 648 Testing dataset: 60 images per modality | GoogLeNet | AUC for WLI: 0.66 AUC for BLI: 0.96 AUC for LCI: 0.95 |
| | | Zheng et al. [ | WLI | Image classification | Training dataset: 11,729 gastric images Testing dataset: 3755 gastric images | ResNet-50 | The AUC for a single gastric image was 0.93, with sensitivity, specificity, and accuracy of 81.4%, 90.1%, and 84.5%, respectively, at an optimal cut-off value of 0.3 The AUC for multiple gastric images per patient was 0.97, with sensitivity, specificity, and accuracy of 91.6%, 98.6%, and 93.8%, respectively, at an optimal cut-off value of 0.4 |
| | Diagnosis of HP | Nakashima et al. [ | LCI WLI | Image classification | Training dataset: 6639 WLI images and 6248 LCI images from 395 subjects Testing dataset: videos of 120 subjects | A 22-layer skip-connection architecture | LCI-CAD system accuracy: 84.2% for uninfected, 82.5% for currently infected, and 79.2% for post-eradication status WLI-CAD system accuracy: 75.0% for uninfected, 77.5% for currently infected, and 74.2% for post-eradication status The LCI-CAD system demonstrated significantly superior diagnostic accuracy to the WLI-CAD system and accuracy comparable to that of experienced endoscopists |
| | | Shichijo et al. [ | WLI | Object detection | Training dataset: 98,564 endoscopic images from 5236 patients (742 H. pylori-positive, 3649 negative, and 845 eradicated) Testing dataset: 23,699 images from 847 patients (70 positive, 493 negative, and 284 eradicated) | GoogLeNet | 80% (465/582) of negative diagnoses, 84% (147/174) of eradicated diagnoses, and 48% (44/91) of positive diagnoses were accurate Diagnosing all 23,699 images took 261 s |
| GP | Detection of GP | Zhang et al. [ | WLI | Image classification | Training dataset: 708 images Testing dataset: 50 images | SSD | The model performs real-time polyp detection at 50 frames per second (FPS) The model achieves a mean average precision (mAP) of 90.4% The model improves polyp detection recall by more than 10%, especially for small polyps |
| GIM | Diagnosis of GIM | Yan et al. [ | NBI ME-NBI | Image classification | Training dataset: 1880 endoscopic images (1048 GIM and 832 non-GIM) from 336 patients Testing dataset: 477 pathologically confirmed images (242 GIM and 235 non-GIM) from 80 patients | EfficientNetB4 | AUC: 0.928 Sensitivity: 91.9% Specificity: 86.0% Accuracy: 88.8% |
| CAG | Diagnosis of CAG | Zhang et al. [ | White-light i-Scan | Image classification | A total of 5470 images of the gastric antrum of 1699 patients (3042 images depicted atrophic gastritis and 2428 did not) fivefold cross-validation The diagnoses of the deep learning model were compared with those of three experts | DenseNet121 | Accuracy: 0.942 Sensitivity: 0.945 Specificity: 0.940 The detection rates of mild, moderate, and severe atrophic gastritis were 93%, 95%, and 99%, respectively The diagnostic performance of the CNN model was higher than that of the experts |
WLI, white-light imaging; CE, chromoendoscopy; NBI, narrow-band imaging; GC, gastric cancer; SSD, single-shot multibox detection; CNN, convolutional neural network; HP, Helicobacter pylori; AUC, area under curve; BLI, blue-light imaging; LCI, linked colour imaging; DCNN, deep convolutional neural network; EGC, early gastric cancer; FPS, frame per second; mAP, mean average precision; GP, gastric polyp; CAD, computer-aided diagnosis; WLE, white-light endoscopy; ESD, endoscopic submucosal dissection; ME, magnifying endoscope; PPV, positive predictive value; NPV, negative predictive value; CGT, chronic gastritis; LGN, low-grade neoplasia; AI, artificial intelligence; GIM, gastric intestinal metaplasia; PU, peptic ulcer; HGIN, high-grade intraepithelial neoplasia; AGC, advanced gastric cancer; SMTs, submucosal tumours; CAG, chronic atrophic gastritis
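A note on the metrics that dominate the Result column above: sensitivity, specificity, PPV, NPV, and accuracy all derive from the four confusion-matrix counts. A minimal sketch follows, with counts chosen to reproduce the Hirasawa et al. row (71 of 77 lesions detected and 161 false positives yield a sensitivity of 92.2% and a PPV of 30.6%):

```python
def diagnostic_metrics(tp, fp, fn, tn=None):
    """Screening metrics from confusion-matrix counts (tn optional)."""
    metrics = {
        "sensitivity": tp / (tp + fn),  # fraction of true lesions detected
        "ppv": tp / (tp + fp),          # fraction of positive calls that are correct
    }
    if tn is not None:
        metrics["specificity"] = tn / (tn + fp)
        metrics["npv"] = tn / (tn + fn)
        metrics["accuracy"] = (tp + tn) / (tp + tn + fp + fn)
    return metrics

# Hirasawa et al.: 71 of 77 lesions detected (6 missed), 161 false positives.
print(diagnostic_metrics(tp=71, fp=161, fn=6))
# sensitivity = 0.922, ppv = 0.306, matching the 92.2% and 30.6% in the table
```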
Non-disease-related application of deep learning in gastroscopic image processing
| Main purpose | Reference | Imaging modality | DL task type | Dataset information | Network Architecture | Result |
|---|---|---|---|---|---|---|
| Anatomical classification | Takiyama et al. [ | WLI | Image classification | Training dataset: 27,335 EGD images categorized into four major anatomical locations (larynx, oesophagus, stomach, and duodenum), with stomach images further subclassified into upper, middle, and lower regions Testing dataset: 17,081 EGD images | GoogLeNet | AUCs of 1.00 for larynx and oesophagus images and 0.99 for stomach and duodenum images AUCs of 0.99 for the upper, middle, and lower regions within the stomach |
| | Chen et al. [ | WLI | Image classification | 437 patients were randomized to unsedated U-TOE, unsedated C-EGD, or sedated C-EGD; each group was divided into two subgroups, with or without the assistance of a DL system to monitor blind spots during EGD | VGG-16 | The blind spot rate with DL-assisted sedated C-EGD was significantly lower than with unsedated U-TOE and unsedated C-EGD (3.42% vs. 21.77% vs. 31.23%) The blind spot rate of the DL subgroup was lower than that of the control subgroup in all three groups (sedated C-EGD: 3.42% vs. 22.46%; unsedated U-TOE: 21.77% vs. 29.92%; unsedated C-EGD: 31.23% vs. 42.46%) |
| | Igarashi et al. [ | WLI NBI CE | Image classification | A total of 85,246 raw upper GI endoscopic images from 441 patients with gastric cancer Training dataset: 49,174 images Testing dataset: 36,072 images | AlexNet | Accuracy: 0.965 |
| | Li et al. [ | WLI | Image classification | Training dataset: 170,297 images and 5779 endoscopic videos Testing dataset: 3100 EGD images and 129 videos | Inception-v3 + LSTM | For images, the sensitivity, specificity, and accuracy of the DCNN were 97.18%, 99.91%, and 99.83%, respectively For videos, they were 96.29%, 93.32%, and 95.30%, respectively The DCNN processed one image in 80 ms on an NVIDIA GTX 1080 Ti GPU, i.e., a frame rate of 12.5 fps, meeting the real-time requirement |
| Detection of artefacts | Ali et al. [ | WLI NBI | Object detection | A total of 1290 endoscopic images from seven unique patient videos, of which 1229 are WLI and 61 are NBI Training dataset: 90% of the total (1161 images) Testing dataset: 10% of the total (129 images) | YOLOv3-spp | mAP of 45.7 at an IoU threshold of 0.25 Overall mAP of 30.63 Detection speed of 88 ms per image |
| | Zhang et al. [ | WLI NBI | Object detection | Training dataset: 2322 images Validation dataset: 291 images Testing dataset: 195 images | Cascade R-CNN | Score_d (0.6 × mAP + 0.4 × IoU): 0.3429 |
| Depth estimation and 3D reconstruction | Widya et al. [ | WLI CE | Semantic segmentation | Training dataset: 7978 no-IC images and 7453 IC-sprayed images Testing dataset: 7 subjects | CycleGAN | The VIC images generated by the CycleGAN achieve better results on all subjects than the baseline no-IC green-channel images Using the VIC images for SfM significantly increases the number of reconstructed images All reconstruction results using the VIC images recover more than 95% of the input images The triangulated 3D points also improve significantly |
| | Ozyoruk et al. [ | N/A | Semantic segmentation | A total of 42,700 images from ex vivo porcine organs Testing dataset: 1548 stomach frames from ex vivo porcine organs | Spatial-attention-based ResNet | The RMSE of stomach depth estimation using Endo-SfMLearner is 0.41 cm |
| Screening of informative frames; Anatomical classification | Wu et al. [ | WLI | Image classification | Training dataset: 12,220 in vitro, 25,222 in vivo, and 16,760 unqualified EGD images from over 3000 patients to train a network (DCNN1) to identify whether the scope was inside or outside the body; 34,513 qualified EGD images labelled into 26 different sites to train a network (DCNN2) to classify gastric sites Testing dataset: 107 stored EGD videos In addition, 324 patients were recruited and randomized: 153 and 150 patients were analysed in the system-assisted and control groups, respectively | VGG-16 | The system monitored blind spots with an accuracy of 90.40% on the EGD videos The blind spot rate was lower in the system-assisted group than in the control group (5.86% vs. 22.46%) |
| | Xu et al. [ | WLI NBI | Image classification Object detection | Training dataset: 34,145 images for the classification task; 47,623 images for the detection task Testing dataset: 6000 images for the classification task; 12,600 images for the detection task | SSD | 93.74% mean average precision (mAP) for the detection task 98.77% accuracy for the classification task |
| Detection of GC; Anatomical classification | Wu et al. [ | NBI BLI WLI | Image classification | Training dataset: 3170 gastric cancer and 5981 benign images for detecting GC; 24,549 images from different parts of the stomach for monitoring blind spots Testing dataset: 100 gastric cancer and 100 benign images for detecting GC; 170 images for monitoring blind spots | VGG-16 ResNet-50 | The DCNN identified EGC from nonmalignancy with an accuracy of 92.5%, a sensitivity of 94.0%, a specificity of 91.0%, a positive predictive value of 91.3%, and a negative predictive value of 93.8% The DCNN classified gastric locations into 10 or 26 parts with an accuracy of 90% or 65.9%, respectively |
WLI, white-light imaging; EGD, oesophagogastroduodenoscopy; AUC, area under curve; DCNN, deep convolutional neural network; NBI, narrow-band imaging; BLI, blue-light imaging; GC, gastric cancer; EGC, early gastric cancer; mAP, mean average precision; IoU, intersection over union; U-TOE, ultrathin transoral endoscopy; C-EGD, conventional oesophagogastroduodenoscopy; DL, deep learning; CE, chromoendoscopy; GI, gastrointestinal; LSTM, long short-term memory networks; IC, indigo carmine; VIC, virtual indigo carmine; SfM, structure from motion; RMSE, root mean square error
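Several rows in both tables score spatial agreement rather than per-image labels: An et al. and Ling et al. apply an overlap-ratio threshold to delineated margins, Shibata et al. report a Dice index, and Ali et al. report mAP at an IoU threshold of 0.25. A minimal sketch of the two standard overlap measures on binary masks (the masks here are synthetic examples):

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def dice(pred, gt):
    """Dice index, 2|A∩B| / (|A| + |B|); equivalently 2*IoU / (1 + IoU)."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2 * inter / total if total else 1.0

# Synthetic 50x50-pixel masks offset by 10 pixels in each direction.
pred = np.zeros((100, 100), dtype=bool); pred[20:70, 20:70] = True
gt = np.zeros((100, 100), dtype=bool);   gt[30:80, 30:80] = True

print(iou(pred, gt))   # ~0.47: below an overlap-ratio threshold of 0.60, a "miss"
print(dice(pred, gt))  # ~0.64
```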
Fig. 16 Statistical analysis of the publications cited in this review. a The proportion of application types of deep learning to gastroscopy; b the percentage of each gastric disease in the disease-related applications of deep learning; c the percentage of each nondisease-related application of deep learning; d the number of publications in this field each year