Francesco Renna1,2, Miguel Martins1,2, Alexandre Neto1,3, António Cunha1,3, Diogo Libânio4, Mário Dinis-Ribeiro4, Miguel Coimbra1,2.
Abstract
Stomach cancer is the third deadliest type of cancer in the world (0.86 million deaths in 2017). By 2035, a 20% increase in both incidence and mortality is expected due to demographic effects if no interventions are made. Upper GI endoscopy (UGIE) plays a paramount role in early diagnosis and, therefore, in improved survival rates. On the other hand, human and technical factors can contribute to misdiagnosis while performing UGIE. In this scenario, artificial intelligence (AI) has recently shown its potential to compensate for the pitfalls of UGIE by leveraging deep learning architectures able to efficiently recognize endoscopic patterns from UGIE video data. This work presents a review of the current state-of-the-art algorithms in the application of AI to gastroscopy. It focuses specifically on the threefold task of assuring exam completeness (i.e., detecting the presence of blind spots) and assisting in the detection and characterization of clinical findings, namely gastric precancerous conditions and neoplastic lesions. Early and promising results have already been obtained using well-known deep learning architectures for computer vision, but many algorithmic challenges remain in achieving the vision of AI-assisted UGIE. Future challenges in the roadmap for the effective integration of AI tools within UGIE clinical practice are discussed, namely the adoption of more robust deep learning architectures, methods able to embed domain knowledge into image/video classifiers, and the availability of large, annotated datasets.
Keywords: artificial intelligence; computer vision; convolutional neural networks; deep learning; upper GI endoscopy (UGIE)
Year: 2022 PMID: 35626433 PMCID: PMC9141387 DOI: 10.3390/diagnostics12051278
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Single-frame algorithms for anatomical landmark detection.
| Authors | Data Annotation Protocol | Dataset | Classes | Algorithm | Pre-Processing | Validation | Average Performance | AI Impact (Clinical Setting) |
|---|---|---|---|---|---|---|---|---|
|  | Undefined |  | 4 sites + 3 gastric sites | GoogLeNet | Black frame cropping | Holdout set | (4 anatomical classes) |  |
|  | 2 experts with >10 years of experience |  | 10 or 26 sites * | VGG-16 + ResNet-50 | CNN filters blurry frames | Holdout set | Accuracy: |  |
|  | 2 expert endoscopists (years of experience unknown) |  | 10 sites + uninformative + NBI | Multi-task custom CNN + SSD | None | Holdout set | Mean average precision (mAP): |  |
|  | 1 doctoral student | 3704 UGIE images (WLI + LCI frames, optimal views) | 11 sites + N/A ** | Inception-v3 | Data-driven ROI cropping | 5-fold C.V. | Accuracy: |  |
|  | 1 expert with >30 years of experience |  | 10 sites from UGIE + 4 classes pertaining to specimens and other examinations | AlexNet | None | Holdout set | Accuracy: |  |
|  | >1 endoscopist with >5 years of experience |  | 11 sites + NBI | Custom CNN + RCF | ROI extraction + bilinear interpolation | 5-fold C.V. | Accuracy: |  |
|  | Unclear |  | 8 classes | ResNeSt *** | None | Holdout set | Accuracy: |  |
* Addressed in different classification tasks. ** We only consider the protocol with all the landmarks and the N/A class in this article. *** We considered only the values for the first ResNeSt, since the ampulla was divided into two categories and trained with a second model and a different dataset. **** Only findings concerning blind spots were considered.
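The single-frame approaches above share a common recipe: fine-tune an ImageNet-pretrained CNN on labeled still frames. The sketch below illustrates that recipe in PyTorch; the backbone choice (ResNet-50), class count, crop size, and hyperparameters are illustrative assumptions, not settings taken from any cited study.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

NUM_SITES = 11  # e.g., 10 anatomical sites + one N/A class (assumption)

# ImageNet-pretrained backbone; only the classification head is replaced.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_SITES)

# Typical frame pre-processing: crop away the black border, resize to the
# backbone's input resolution, normalize with ImageNet statistics.
preprocess = transforms.Compose([
    transforms.CenterCrop(480),   # stand-in for black-frame cropping
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(frames: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step over a batch of (frame, site-label) pairs."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(frames), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```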
Multi-frame algorithms for anatomical landmark detection.
| Authors | Data Annotation Protocol | Dataset | Classes | Algorithm | Augmentation | Validation | Average Performance | AI Impact (Clinical Setting) |
|---|---|---|---|---|---|---|---|---|
|  | 2 seniors with 1–5 years of experience |  | 26 sites + N/A ** | VGG-16 + DRL | None | 10-fold C.V. | Accuracy: | (Single-center randomized controlled trial) |
|  | 5 blinded seniors with >3 years of experience |  | 8 sites | SENet + positional loss | Random scaling, cropping, rotation and horizontal flip | 10-fold C.V. | Accuracy: |  |
|  | UGIE images from clinical reports |  | 6 sites + background * | EfficientNet-b3 + thresholded sliding window with exponential decay | Random shear, scaling and translation | Holdout set and weighted oversampling | Accuracy: |  |
|  | 3 blinded seniors with 5–10 years of experience | 170,297 UGIE images | 31 sites | Inception-v3 + LSTM | Random HSV jittering and corner cropping | Holdout set | Accuracy: |  |
* Measured on isolated still frames that do not necessarily belong to the same UGIE video. Background class included. ** Doctoral students only classified unqualified frames in the N/A class for the anatomical site classification task. *** Only findings regarding blind spots in conventional EGD were included; human level of expertise unknown.
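The multi-frame systems above add a temporal model on top of per-frame CNN features, as in the Inception-v3 + LSTM row. The sketch below follows that pattern with a ResNet-50 encoder substituted for simplicity; the hidden size and the 31-site class count are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

class FrameSequenceClassifier(nn.Module):
    """Per-frame CNN features aggregated over time by an LSTM."""

    def __init__(self, num_sites: int = 31, hidden_size: int = 256):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        backbone.fc = nn.Identity()               # keep 2048-d pooled features
        self.encoder = backbone
        self.lstm = nn.LSTM(2048, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_sites)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, H, W), a short window of consecutive frames
        b, t = clip.shape[:2]
        feats = self.encoder(clip.flatten(0, 1))  # (b*t, 2048)
        out, _ = self.lstm(feats.view(b, t, -1))  # (b, t, hidden_size)
        return self.head(out[:, -1])              # classify from the last step

# Example: a batch of 2 clips, 8 frames each, at 224x224 resolution.
logits = FrameSequenceClassifier()(torch.randn(2, 8, 3, 224, 224))
```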
Lesion detection algorithms.
| Authors | Classes | Algorithm | Dataset | Image Modality | Results | AI Impact (Clinical Setting) |
|---|---|---|---|---|---|---|
|  | Non-atrophic gastritis | DenseNet121 |  | WLI | Accuracy: 94% |  |
|  | EGC | SSD |  | WLI | Sensitivity: 92% |  |
|  | Non-EGC | VGG-16 |  | WLI | Specificity: 98% |  |
|  | Intestinal metaplasia (IM) | Xception |  | NBI | EfficientNetB4: |  |
|  | Neoplastic lesions | YOLOv3 + ResNet-50 |  | WLI | Accuracy: 89%; Sensitivity: 92%; Specificity: 88% | (Multi-center prospective controlled trial) |
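Detectors such as the SSD and YOLOv3 entries above regress bounding boxes around suspicious regions instead of classifying whole frames. A hedged sketch using torchvision's off-the-shelf SSD follows; in practice the model would be re-trained on annotated endoscopic frames, and the confidence threshold here is an arbitrary assumption.

```python
import torch
from torchvision.models import detection

# COCO-pretrained SSD as a stand-in; a real system would re-train the
# heads on endoscopic frames annotated with lesion bounding boxes.
model = detection.ssd300_vgg16(weights=detection.SSD300_VGG16_Weights.COCO_V1)
model.eval()

@torch.no_grad()
def detect_lesions(frame: torch.Tensor, score_threshold: float = 0.5):
    """frame: (3, H, W) tensor with values in [0, 1].
    Returns boxes and scores above the (assumed) confidence threshold."""
    output = model([frame])[0]          # dict with boxes, labels, scores
    keep = output["scores"] >= score_threshold
    return output["boxes"][keep], output["scores"][keep]

boxes, scores = detect_lesions(torch.rand(3, 300, 300))
```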
Lesion characterization algorithms.
| Authors | Classes | Algorithm | Dataset | Image Modality | Results | AI Impact (Clinical Setting) |
|---|---|---|---|---|---|---|
|  | Invasion depth: | ResNet-50 |  | WLI | EGC invasion WLI: | (Multi-center prospective controlled trial) |
|  | T1a | VGG-16 |  | WLI | Specificity: 75%; Sensitivity: 82% |  |
|  | M-SM1 | ResNet-50 |  | WLI | WLI Accuracy: 95% |  |
|  | P0 (M or SM1) | ResNet-50 |  | WLI | Accuracy: 89% | CNN |
|  | GA | VGG-16 |  | Magnifying NBI | GA | (Multi-center prospective blinded trial) |
* An external dataset of 1526 images was used to test the performance of the model.
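Invasion-depth characterization, as in the ResNet-50 rows above and in Figure 6, is typically cast as a binary image classification problem (e.g., M/SM1 vs. deeper invasion). The following is a minimal sketch under that assumption; the class labels and decision threshold are illustrative, not taken from the cited studies.

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained backbone with a two-class head (hypothetical labels).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 2)   # {M/SM1, deeper invasion}

@torch.no_grad()
def predict_depth(frame: torch.Tensor, threshold: float = 0.5) -> str:
    """frame: pre-processed (1, 3, 224, 224) image of the lesion."""
    model.eval()
    prob_deep = torch.softmax(model(frame), dim=1)[0, 1].item()
    return "suspect deeper invasion" if prob_deep >= threshold else "M/SM1"

print(predict_depth(torch.randn(1, 3, 224, 224)))
```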
Figure 1. Examples of key anatomical landmarks, containing (from left to right, top to bottom): the middle esophagus, the gastroesophageal junction (GEJ) and Z-line, the second portion (D2) of the duodenum, the antrum, the pylorus, the transition from the antrum into the gastric body, the incisura, the fundus and cardia in retroflexed view (R), and the body in retroflexed view (R).
Figure 2. (Top) Convolutional single-frame algorithm versus (Bottom) recurrent multi-frame algorithm. In single-frame algorithms, frames sampled from UGIE videos are processed independently. In multi-frame approaches, the frames sampled from UGIE videos are processed in sequence.
Figure 3. The single-frame classifier pipeline proposed by Wu et al. [40]. Note that there are two separate classification objectives with respect to anatomical locations: one considering 10 anatomical sites and another considering 26 anatomical sites (adapted from [40]).
Figure 4. The DRL scheme adopted in [45]. There are 27 possible states concerning anatomical locations and N/A. The last row shows the classes observed up to time t. At each timestep, a new prediction is displayed on the top row, with its color-coded confidence score: the lighter the shade, the higher the confidence (adapted from [45]).
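Whatever the per-frame classifier, blind-spot detection ultimately reduces to tracking which anatomical sites have been confidently observed over time, as Figure 4 illustrates and as the "thresholded sliding window with exponential decay" entry in the multi-frame table suggests. The sketch below shows one plausible aggregation scheme; the decay factor and evidence threshold are assumptions.

```python
# Hypothetical aggregation of per-frame site predictions into a
# blind-spot checklist (illustrative decay/threshold values).
from collections import defaultdict

DECAY = 0.9        # per-frame decay of accumulated evidence (assumption)
THRESHOLD = 5.0    # evidence required to mark a site as visited (assumption)

def visited_sites(frame_predictions):
    """frame_predictions: iterable of (site_label, confidence) pairs,
    one per frame, in temporal order."""
    evidence = defaultdict(float)
    visited = set()
    for site, confidence in frame_predictions:
        for s in evidence:              # old evidence fades over time
            evidence[s] *= DECAY
        evidence[site] += confidence    # reinforce the currently seen site
        if evidence[site] >= THRESHOLD:
            visited.add(site)
    return visited

# Sites never added to `visited` by the end of the exam are candidate
# blind spots to flag to the endoscopist.
print(visited_sites([("antrum", 0.9)] * 10 + [("pylorus", 0.4)] * 3))
```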
Figure 5. (Left) An early gastric cancer lesion missed by an endoscopist and (Right) correctly identified by the convolutional neural network bounding-box regression model of [52,53]. Note that the CNN's prediction is not only cued by subtle textural changes in the mucosa, but also outputs a plausible bounding box around them. (Adapted from data kindly provided by Ishioka et al. [52]).
Figure 6. Schematic illustration of the method proposed in [63], which uses a ResNet-50 model to classify gastric lesion invasion depth in WLI, NBI, and indigo carmine images (adapted from [63]).