Young Joo Yang, Chang Seok Bang.
Abstract
Artificial intelligence (AI) using deep learning (DL) has emerged as a breakthrough computer technology. In the era of big data, the accumulation of an enormous number of digital images and medical records has driven the need for AI to handle these data efficiently, and the data have in turn become fundamental resources from which a machine learns by itself. Among several DL models, the convolutional neural network has shown outstanding performance in image analysis. In the field of gastroenterology, physicians handle large amounts of clinical data and various kinds of imaging devices such as endoscopy and ultrasound. AI has been applied in gastroenterology for diagnosis, prognosis, and image analysis. However, potential inherent selection bias cannot be excluded in retrospective studies. Because overfitting and spectrum bias (class imbalance) can lead to overestimation of accuracy, external validation using datasets that were not used for model development, collected in a way that minimizes spectrum bias, is mandatory. For robust verification, prospective studies with adequate inclusion/exclusion criteria, which represent the target populations, are needed. DL also has an inherent lack of interpretability. Because interpretability is important in that it can provide safety measures, help to detect bias, and create social acceptance, further investigations should be performed.
Keywords: Artificial intelligence; Computer-assisted; Convolutional neural network; Deep-learning; Endoscopy; Gastroenterology
Year: 2019 PMID: 31011253 PMCID: PMC6465941 DOI: 10.3748/wjg.v25.i14.1666
Source DB: PubMed Journal: World J Gastroenterol ISSN: 1007-9327 Impact factor: 5.742
Figure 1 Schematic graphical summary for artificial intelligence, machine learning and deep learning development. A: Definition of artificial intelligence, machine learning (ML) and deep learning (DL). B: Comparison of process between classic ML and DL. C: Modes of learning and examples of ML.
Artificial intelligence terminology
| Term | Definition |
| Artificial intelligence | Machine intelligence that has cognitive functions similar to those of humans such as “learning” and “problem solving.” |
| Machine learning | Mathematical algorithms that are built automatically from given data (known as input training data) and that predict or make decisions in uncertain conditions without being explicitly programmed |
| Support vector machines | Discriminative classifier formally defined by an optimizing hyperplane with the largest functional margin |
| Artificial neural networks | Multilayered interconnected network which consists of an input, hidden connection (between the input and output layer), and output layer |
| Deep learning | Subset of machine learning techniques composed of multilayered neural network algorithms |
| Convolutional neural networks | Specific class of artificial neural networks that consists of (1) convolutional and pooling layers, which are the two main components to extract distinct features; and (2) fully connected layers to make an overall classification |
| Overfitting | Modelling error that occurs when a learning model tailors itself too closely to the training dataset, so that its predictions do not generalize well to new datasets |
| Spectrum bias | Systematic error that occurs when the dataset used for model development does not adequately represent the range of patients to whom the model will be applied in clinical practice (target population) |
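The convolutional neural network entry above names three building blocks: convolution, pooling, and a fully connected layer. The following is a minimal pure-Python sketch of one forward pass through those blocks. The toy image, the hand-picked edge kernel, and the fully connected weights are all illustrative assumptions; a trained network would learn its kernels and weights from data.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation of a 2-D list with a smaller kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Element-wise rectifier: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete edge windows are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def fully_connected(features, weights, bias):
    """Single output unit: weighted sum of the flattened features plus bias."""
    return sum(f * w for f, w in zip(features, weights)) + bias


# Toy 5x5 "image" with a dark-to-bright vertical edge (hypothetical data).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]          # hand-picked vertical-edge detector

feature_map = relu(conv2d(image, kernel))   # 4x4 map, responds at the edge
pooled = max_pool(feature_map)              # 2x2 summary of the feature map
flat = [v for row in pooled for v in row]   # flatten for the dense layer
score = fully_connected(flat, [0.5] * len(flat), -0.5)  # assumed weights
```

The convolution responds only where the edge lies, pooling keeps the strongest response in each 2 × 2 window, and the fully connected unit turns the pooled features into a single classification score.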
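Summary of clinical studies using artificial intelligence for recognition of diagnosis and prediction of prognosis
| Study | Year | Aim of study | Design | Number of subjects | Type of AI | Input variables (number/type) | Outcomes |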
| Pace et al[ | 2005 | Diagnosis of gastroesophageal reflux disease | Retrospective | 159 patients (10 times cross validation) | “backpropagation” ANN | 101/clinical variables | Accuracy: 100% |
| Lahner et al[ | 2005 | Recognition of atrophic corpus gastritis | Retrospective | 350 patients (subdivided several times into training and test set equally) | ANN | 37 to 3 /clinical and biochemical variables (experiment 1 to 5) | Accuracy: 96.6%, 98.8%, 98.4%, 91.3% and 97.7% (experiment 1-5, respectively) |
| Pofahl et al[ | 1998 | Prediction of length of stay for patients with acute pancreatitis | Retrospective | 195 patients (training set: 156, test set: 39) | “backpropagation” ANN | 71/clinical variables | Sensitivity: 75% (for prediction of a length of stay more than 7 d) |
| Das et al[ | 2003 | Prediction of outcomes in acute lower gastrointestinal bleeding | Prospective | 190 patients (training set: 120, internal validation set: 70, external validation set: 142) | ANN | 26/clinical variables | Accuracy (external validation set): 97% for death, 93% for recurrent bleeding, 94% for need for intervention |
| Sato et al[ | 2005 | Prediction of 1-year and 5-year survival of esophageal cancer | Retrospective | 418 patients (training:validation:test set = 53%:27%:20%) | ANN | 199/clinicopathologic, biologic, and genetic variables | AUROC for 1-year and 5-year survival prediction: 0.883 and 0.884, respectively |
| Rotondano et al[ | 2011 | Prediction of mortality in nonvariceal upper gastrointestinal bleeding | Prospective, multicenter | 2380 patients (5 × 2 cross-validation) | ANN | 68/clinical variables | Accuracy: 96.8%, AUROC: 0.95, sensitivity: 83.8%, specificity: 97.5% |
| Takayama et al[ | 2015 | Prediction of prognosis in ulcerative colitis after cytoapheresis therapy | Retrospective | 90 patients (training set: 54, test set: 36) | ANN | 13/clinical variables | Sensitivity: 96.0%, specificity: 97.0% |
| Hardalaç et al[ | 2015 | Prediction of mucosal healing by azathioprine therapy in IBD | Retrospective | 129 patients (training set: 103, validation set: 13, test set: 13) | “feed-forward back-propagation” and “cascade-forward” ANN | 6/clinical variables | Total correct classification rate: 79.1% |
| Peng et al[ | 2015 | Prediction of frequency of onset, relapse, and severity of IBD | Retrospective | 569 UC and 332 CD patients (training set: data from 2003-2010, validation set: data in 2011) | ANN | 5/meteorological data | Accuracy in predicting the frequency of relapse of IBD (mean square error = 0.009, mean absolute percentage error = 17.1%) |
| Ichimasa et al[ | 2018 | Prediction of lymph node metastasis, thus minimizing the need for additional surgery in T1 colorectal cancer | Retrospective | 690 patients (training set: 590, validation set: 100) | SVM | 45/clinicopathological variables | Accuracy: 69%, sensitivity: 100%, specificity: 66% |
| Yang et al[ | 2013 | Prediction of postoperative distant metastasis in esophageal squamous cell carcinoma | Retrospective | 483 patients (training set: 319, validation set: 164) | SVM | 30/7 clinicopathological variables and 23 immunomarkers | Accuracy: 78.7%, sensitivity: 56.6%, specificity: 97.7%, PPV: 95.6%, NPV: 72.3% |
AI: Artificial intelligence; ANN: Artificial neural network; AUROC: Area under receiver operating characteristic; IBD: Inflammatory bowel disease; UC: Ulcerative colitis; CD: Crohn’s disease; SVM: Support vector machine; PPV: Positive predictive value; NPV: Negative predictive value.
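Several studies in the table above report results under "10 times cross validation". Assuming this refers to standard 10-fold cross-validation, the splitting step can be sketched as below. The function name `k_fold_splits` and the fixed seed are illustrative choices, not details taken from any of the cited studies.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Every sample lands in exactly one test fold, and the model evaluated on
    a fold is trained only on the remaining k - 1 folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example with 159 subjects, the cohort size of the first row in the table.
splits = list(k_fold_splits(159, k=10))
```

With image data, the same idea is usually applied at the patient level rather than the image level, so that images from one patient never appear in both the training and test folds.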
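Summary of clinical studies using artificial intelligence in the upper gastrointestinal field
| Study | Year | Aim of study | Design | Number of subjects | Type of AI | Endoscopic or ultrasound modality | Outcomes |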
| Takiyama et al[ | 2018 | Recognition of anatomical locations of EGD images | Retrospective | Training set: 27335 images from 1750 patients. Validation set: 17081 images from 435 patients | CNN | White-light endoscopy | AUROCs: 1.00 for the larynx and esophagus, and 0.99 for the stomach and duodenum recognition |
| van der Sommen et al[ | 2016 | Discrimination of early neoplastic lesions in Barrett’s esophagus | Retrospective | 100 endoscopic images from 44 patients (leave-one-out cross-validation on a per-patient basis) | SVM | White-light endoscopy | Sensitivity: 83%, specificity: 83% (per-image analysis) |
| Swager et al[ | 2017 | Identification of early Barrett’s esophagus neoplasia on ex vivo volumetric laser endomicroscopy images. | Retrospective | 60 volumetric laser endomicroscopy images | Combination of several methods (SVM, discriminant analysis, AdaBoost, random forest, | Ex vivo volumetric laser endomicroscopy | Sensitivity: 90%, specificity: 93% |
| Kodashima et al[ | 2007 | Discrimination between normal and malignant tissue at the cellular level in the esophagus | Prospective | 10 patients | ImageJ program | Endocytoscopy | Difference in the mean ratio of total nuclei to the entire selected field, 6.4 ± 1.9% in normal tissues and 25.3 ± 3.8% in malignant samples |
| Shin et al[ | 2015 | Diagnosis of esophageal squamous dysplasia | Prospective, multicenter | 375 sites from 177 patients (training set: 104 sites, test set: 104 sites, validation set: 167 sites) | Linear discriminant analysis | HRME | Sensitivity: 87%, specificity: 97% |
| Quang et al[ | 2016 | Diagnosis of esophageal squamous cell neoplasia | Retrospective, multicenter | Same data from reference number 26 | Linear discriminant analysis | Tablet-interfaced HRME | Sensitivity: 95%, specificity: 91% |
| Horie et al[ | 2019 | Diagnosis of esophageal cancer | Retrospective | Training set: 8428 images from 384 patients. Test set: 1118 images from 97 patients | CNN | White-light endoscopy with NBI | Sensitivity: 98% |
| Huang et al[ | 2004 | Diagnosis of H. pylori infection | Prospective | Training set: 30 patients. Test set: 74 patients | Refined feature selection with neural network | White-light endoscopy | Sensitivity: 85.4%, specificity: 90.9% |
| Shichijo et al[ | 2017 | Diagnosis of H. pylori infection | Retrospective | Training set: CNN1: 32208 images; CNN2: images classified according to 8 different locations in the stomach. Test set: 11481 images from 397 patients | CNN | White-light endoscopy | Accuracy: 87.7%, sensitivity: 88.9%, specificity: 87.4%, diagnostic time: 194 s |
| Itoh et al[ | 2018 | Diagnosis of H. pylori infection | Prospective | Training set: 149 images (596 images through data augmentation). Test set: 30 images | CNN | White-light endoscopy | AUROC: 0.956, sensitivity: 86.7%, specificity: 86.7% |
| Nakashima et al[ | 2018 | Diagnosis of H. pylori infection | Prospective pilot | 222 patients (training set: 162, test set: 60) | CNN | White-light endoscopy and image-enhanced endoscopy, such as blue laser imaging-bright and linked color imaging | AUROC: 0.96 (blue laser imaging-bright), 0.95 (linked color imaging) |
| Kubota et al[ | 2012 | Diagnosis of depth of invasion in gastric cancer | Retrospective | 902 images (10 times cross validation) | “backpropagation” ANN | White-light endoscopy | Accuracy: 77.2%, 49.1%, 51.0%, and 55.3% for T1-4 staging, respectively |
| Hirasawa et al[ | 2018 | Detection of gastric cancers | Retrospective | Training set: 13584 images. Test set: 2296 images. | CNN | White-light endoscopy, chromoendoscopy, NBI | Sensitivity: 92.2%, detection rate with a diameter of 6 mm or more: 98.6% |
| Zhu et al[ | 2018 | Diagnosis of depth of invasion in gastric cancer (mucosa/SM1/deeper than SM1) | Retrospective | Training set: 790 images. Test set: 203 images | CNN | White-light endoscopy | Accuracy: 89.2%, AUROC: 0.94, sensitivity: 74.5%, specificity: 95.6% |
| Kanesaka et al[ | 2018 | Diagnosis of early gastric cancer using magnifying NBI images | Retrospective | Training set: 126 images. Test set: 81 images | SVM | Magnifying NBI | Accuracy: 96.3%, sensitivity: 96.7%, specificity: 95%, PPV: 98.3% |
| Gatos et al[ | 2017 | Diagnosis of chronic liver disease | Retrospective | 126 patients (56 healthy controls, 70 with chronic liver disease) | SVM | Ultrasound shear wave elastography imaging with a stiffness value-clustering | AUROC: 0.87, highest accuracy: 87.3%, sensitivity: 93.5%, specificity: 81.2% |
| Kuppili et al[ | 2017 | Detection and characterization of fatty liver | Prospective | 63 patients who underwent liver biopsy (10 times cross validation) | Extreme Learning Machine to train single-layer feed-forward neural network | Ultrasound liver images | Accuracy: 96.75%, AUROC: 0.97 (validation performance) |
| Liu et al[ | 2017 | Diagnosis of liver cirrhosis | Retrospective | 44 images from controls and 47 images from patients with cirrhosis | SVM | Ultrasound liver capsule images | AUROC: 0.951 |
AI: Artificial intelligence; EGD: Esophagogastroduodenoscopy; CNN: Convolutional neural network; AUROC: Area under receiver operating characteristic; SVM: Support vector machine; HRME: High-resolution microendoscopy; NBI: Narrow-band imaging; H. pylori: Helicobacter pylori; ANN: Artificial neural network; PPV: Positive predictive value.
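Many rows above summarize performance as an AUROC. As a rough illustration of what that number measures, the sketch below computes it as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The scores are made-up values for illustration, not data from any cited study.

```python
def auroc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of scores.

    Equivalent to the normalized Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs in which the positive case scores higher.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model outputs for three diseased and three healthy cases.
diseased = [0.9, 0.8, 0.4]
healthy = [0.7, 0.3, 0.2]
example_auc = auroc(diseased, healthy)   # 8 of 9 pairs ranked correctly
```

An AUROC of 1.0 means every positive case outranks every negative case, and 0.5 corresponds to ranking by chance. The pairwise form is quadratic in the number of cases, so it suits small examples rather than large datasets.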
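Summary of clinical studies using artificial intelligence in the lower gastrointestinal field
| Study | Year | Aim of study | Design | Number of subjects | Type of AI | Endoscopic modality | Outcomes |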
| Fernandez-Esparrach et al[ | 2016 | Detection of colonic polyps | Retrospective | 24 videos containing 31 polyps | Window Median Depth of Valleys Accumulation maps | White-light colonoscopy | Sensitivity: 70.4%. Specificity: 72.4% |
| Misawa et al[ | 2018 | Detection of colonic polyps | Retrospective | 546 short videos (training set: 105 polyp-positive videos and 306 polyp-negative videos, test set: 50 polyp-positive videos and 85 polyp-negative videos) from 73 full length videos | CNN | White-light colonoscopy | Accuracy: 76.5%. Sensitivity: 90.0%. Specificity: 63.3%. |
| Urban et al[ | 2018 | Detection of colonic polyps | Retrospective | 8641 images with 20 colonoscopy videos | CNN | White-light colonoscopy with NBI | Accuracy: 96.4%. AUROC: 0.991 |
| Klare et al[ | 2019 | Detection of colonic polyps | Prospective | 55 patients | Automated polyp detection software | White-light colonoscopy | Polyp detection rate: 50.9%. Adenoma detection rate: 29.1% |
| Wang et al[ | 2018 | Detection of colonic polyps | Retrospective | Training set: 5545 images from 1290 patients. Validation set A: 27113 images from 1138 patients. Validation set B: 612 images. Validation set C: 138 video clips from 110 patients. Validation set D: 54 videos from 54 patients | CNN | White-light colonoscopy | Dataset A: AUROC: 0.98 for at least one polyp detection, per-image sensitivity: 94.4%, per-image specificity: 95.2%. Dataset B: per-image sensitivity: 88.2%. Dataset C: per-image sensitivity: 91.6%, per-polyp sensitivity: 100%. Dataset D: per-image specificity: 95.4% |
| Tischendorf et al[ | 2010 | Classification of colorectal polyps on the basis of vascularization features | Prospective pilot | 209 polyps from 128 patients | SVM | Magnifying NBI images | Accurate classification rate: 91.9% |
| Gross et al[ | 2011 | Differentiation of small colonic polyps of < 10 mm | Prospective | 434 polyps from 214 patients | SVM | Magnifying NBI images | Accuracy: 93.1%. Sensitivity: 95.0%. Specificity: 90.3%. |
| Takemura et al[ | 2010 | Classification of pit patterns | Retrospective | Training set: 72 images. Validation set: 134 images | HuPAS software version 1.3 | Magnifying endoscopic images with crystal violet staining | Accuracies of the type I, II, IIIL, and IV pit patterns of colorectal lesions: 100%, 100%, 96.6%, and 96.7%, respectively |
| Takemura et al[ | 2012 | Classification of histology of colorectal tumors | Retrospective | Training set: 1519 images. Validation set: 371 images | HuPAS software version 3.1 using SVM | Magnifying NBI images | Accuracy: 97.8% |
| Kominami et al[ | 2016 | Classification of histology of colorectal polyps | Prospective | Training set: 2247 images from 1262 colorectal lesions. Validation: 118 colorectal lesions | SVM with logistic regression | Magnifying NBI images | Accuracy: 93.2%, Sensitivity: 93.0%, Specificity: 93.3%, PPV: 93%, NPV: 93.3% |
| Byrne et al[ | 2017 | Differentiation of histology of diminutive colorectal polyps | Retrospective | Training set: 223 videos, Validation set: 40 videos. Test set: 125 videos | CNN | NBI video frames | Accuracy: 94%, Sensitivity: 98%, Specificity: 83% |
| Chen et al[ | 2018 | Identification of neoplastic or hyperplastic polyps of < 5 mm | Retrospective | Training set: 2157 images. Test set: 284 images | CNN | Magnifying NBI images | Sensitivity: 96.3%, specificity: 78.1%, PPV: 89.6%, NPV: 91.5% |
| Komeda et al[ | 2017 | Discrimination of adenomas from non-adenomatous polyps | Retrospective | 1200 images from the endoscopic videos (10 times cross validation) | CNN | White-light colonoscopy with NBI and chromoendoscopy | Accuracy in validation: 75.1% |
| Mori et al[ | 2015 | Discrimination of neoplastic changes in small polyps | Retrospective | Test set: 176 polyps from 152 patients | Multivariate regression analysis | Endocytoscopy | Accuracy: 89.2%, Sensitivity: 92.0% |
| Mori et al[ | 2016 | Development of 2nd generation model, which was mentioned in reference number 56 | Retrospective | Test set: 205 small colorectal polyps (≤ 10 mm) from 123 patients | SVM | Endocytoscopy | Accuracy: 89% for both diminutive (< 5 mm) and small (< 10 mm) polyps |
| Misawa et al[ | 2016 | Diagnosis of colorectal lesions using microvascular findings | Retrospective | Training set: 979 images, validation set: 100 images | SVM | Endocytoscopy with NBI | Accuracy: 90% |
| Mori et al[ | 2018 | Diagnosis of neoplastic diminutive polyp | Prospective | 466 diminutive polyps from 325 patients | SVM | Endocytoscopy with NBI and stained images | Prediction rate: 98.1% |
| Takeda et al[ | 2017 | Diagnosis of invasive colorectal cancer | Retrospective | Training set: 5543 images from 238 lesions. Test set: 200 images | SVM | Endocytoscopy with NBI and stained images | Accuracy: 94.1%, Sensitivity: 89.4%, Specificity: 98.9%, PPV: 98.8%, NPV: 90.1% |
| Maeda et al[ | 2018 | Prediction of persistent histologic inflammation in ulcerative colitis patients | Retrospective | Training set: 12900 images. Test set: 9935 images | SVM | Endocytoscopy with NBI | Accuracy: 91%, Sensitivity: 74%, Specificity: 97% |
AI: Artificial intelligence; CNN: Convolutional neural network; NBI: Narrow-band imaging; AUROC: Area under receiver operating characteristic; SVM: Support vector machine; PPV: Positive predictive value; NPV: Negative predictive value.
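The outcome columns above mix accuracy, sensitivity, specificity, PPV, and NPV. All five derive from the same four counts of a binary confusion matrix, as the short sketch below shows. The counts in the example are hypothetical and do not come from any study in the table.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-matrix counts.

    tp/fn: diseased cases classified as positive/negative
    tn/fp: non-diseased cases classified as negative/positive
    """
    return {
        "sensitivity": tp / (tp + fn),            # detected among diseased
        "specificity": tn / (tn + fp),            # cleared among non-diseased
        "ppv": tp / (tp + fp),                    # correct among positive calls
        "npv": tn / (tn + fn),                    # correct among negative calls
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical test set: 100 diseased and 100 non-diseased cases.
metrics = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
```

Sensitivity and specificity are each computed within one true class, whereas PPV, NPV, and accuracy also depend on how many diseased and non-diseased cases the test set contains.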
Summary of clinical studies using artificial intelligence in capsule endoscopy
| Study | Year | Aim of study | Design | Number of subjects | Type of AI | Outcomes |
| Leenhardt et al[ | 2019 | Detection of gastrointestinal angiectasia | Retrospective | 600 control images and 600 typical angiectasia images (divided equally into training and test datasets) | CNN | Sensitivity: 100%, specificity: 96%, PPV: 96%, NPV: 100%. |
| Zhou et al[ | 2017 | Classification of celiac disease | Retrospective | Training set: 6 celiac disease patients, 5 controls. Test set: additional 5 celiac disease patients, 5 controls | CNN | Sensitivity: 100%, specificity: 100% (for test dataset) |
| He et al[ | 2018 | Detection of intestinal hookworms | Retrospective | 440000 images | CNN | Sensitivity: 84.6%, specificity: 88.6% |
| Seguí et al[ | 2016 | Characterization of small intestinal motility | Retrospective | 120000 images (training set: 100000, test set: 20000) | CNN | Mean classification accuracy: 96% |
AI: Artificial intelligence; CNN: Convolutional neural network; PPV: Positive predictive value; NPV: Negative predictive value.
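The first row of the table above illustrates why class balance matters when reading predictive values. With equal numbers of control and angiectasia images (assuming the test half kept the 1:1 ratio), a sensitivity of 100% and a specificity of 96% give a PPV of about 96%, as reported. The sketch below recomputes PPV and NPV from sensitivity, specificity, and prevalence. The 5% prevalence is a hypothetical value chosen to show how the same classifier would look in a population where the lesion is uncommon.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence                # true-positive fraction
    fn = (1 - sensitivity) * prevalence          # false-negative fraction
    tn = specificity * (1 - prevalence)          # true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)    # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Balanced dataset, as in the 600 vs 600 image design (prevalence 0.5).
ppv_balanced, npv_balanced = predictive_values(1.00, 0.96, 0.50)

# Hypothetical target population in which only 5% of images show the lesion.
ppv_rare, npv_rare = predictive_values(1.00, 0.96, 0.05)
```

Under these assumptions, the PPV falls from about 96% to about 57% although sensitivity and specificity are unchanged. This is the effect that the abstract attributes to spectrum bias (class imbalance).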
Figure 2 Interpretability-accuracy tradeoff in classification algorithms of machine learning.