| Literature DB >> 35318364 |
Lukas Buendgens1, Didem Cifci1, Narmin Ghaffari Laleh1, Marko van Treeck1, Maria T Koenen1,2, Henning W Zimmermann1, Till Herbold3, Thomas Joachim Lux4, Alexander Hann4, Christian Trautwein1, Jakob Nikolas Kather5,6,7.
Abstract
Artificial intelligence (AI) is widely used to analyze gastrointestinal (GI) endoscopy image data. AI has led to several clinically approved algorithms for polyp detection, but application of AI beyond this specific task is limited by the high cost of manual annotations. Here, we show that a weakly supervised AI can be trained on data from a clinical routine database to learn visual patterns of GI diseases without any manual labeling or annotation. We trained a deep neural network on a dataset of N = 29,506 gastroscopy and N = 18,942 colonoscopy examinations from a large endoscopy unit serving patients in Germany, the Netherlands and Belgium, using only routine diagnosis data for the 42 most common diseases. Despite a high data heterogeneity, the AI system reached a high performance for diagnosis of multiple diseases, including inflammatory, degenerative, infectious and neoplastic diseases. Specifically, a cross-validated area under the receiver operating curve (AUROC) of above 0.70 was reached for 13 diseases, and an AUROC of above 0.80 was reached for two diseases in the primary data set. In an external validation set including six disease categories, the AI system was able to significantly predict the presence of diverticulosis, candidiasis, colon and rectal cancer with AUROCs above 0.76. Reverse engineering the predictions demonstrated that plausible patterns were learned on the level of images and within images and potential confounders were identified. In summary, our study demonstrates the potential of weakly supervised AI to generate high-performing classifiers and identify clinically relevant visual patterns based on non-annotated routine image data in GI endoscopy and potentially other clinical imaging modalities.Entities:
Mesh:
Year: 2022 PMID: 35318364 PMCID: PMC8941159 DOI: 10.1038/s41598-022-08773-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Target diseases and classifier performance for the gastroscopy dataset.
| ICD | Diagnosis | AUROC mean | AUROC 95% CI | N pos. exams | N neg. exams | p-value |
|---|---|---|---|---|---|---|
| K29 | Gastritis and duodenitis | 0.698 | 0.007 | 9740 | 19,766 | 0.0000 |
| K44 | Diaphragmatic hernia | 0.637 | 0.029 | 3725 | 25,781 | 0.0000 |
| K21 | Gastro-oesophageal reflux disease | 0.619 | 0.023 | 3497 | 26,009 | 0.0000 |
| K22 | Other diseases of oesophagus | 0.621 | 0.023 | 2619 | 26,887 | 0.0000 |
| R10 | Abdominal and pelvic pain | 0.672 | 0.033 | 1419 | 28,087 | 0.0000 |
| K25 | Gastric ulcer | 0.613 | 0.022 | 772 | 28,734 | 0.0000 |
| K26 | Duodenal ulcer | 0.694 | 0.039 | 626 | 28,880 | 0.0000 |
| K31 | Other diseases of stomach and duodenum | 0.612 | 0.059 | 401 | 29,105 | 0.0001 |
| I85 | Oesophageal varices | 0.650 | 0.003 | 332 | 29,174 | 0.0000 |
| R12 | Heartburn | 0.587 | 0.107 | 312 | 29,194 | 0.0420 |
| R11 | Nausea and vomiting | 0.648 | 0.068 | 279 | 29,227 | 0.0000 |
| C16 | Malignant neoplasm of stomach | 0.693 | 0.066 | 245 | 29,261 | 0.0000 |
| K92 | Other diseases of digestive system | 0.697 | 0.030 | 222 | 29,284 | 0.0000 |
| D13 | Benign neoplasm of other and ill-defined parts of digestive system | 0.593 | 0.142 | 210 | 29,296 | 0.0948 |
| D37 | Neoplasm of uncertain or unknown behaviour of oral cavity and digestive organs | 0.595 | 0.092 | 136 | 29,370 | 0.0728 |
| D48 | Neoplasm of uncertain or unknown behaviour of other and unspecified sites | 0.512 | 0.131 | 127 | 29,379 | 0.5610 |
| K57 | Diverticular disease of intestine | 0.543 | 0.196 | 93 | 29,413 | 0.4163 |
| K20 | Oesophagitis | 0.640 | 0.066 | 86 | 29,420 | 0.0136 |
All targets that reached an area under the receiver operating curve (AUROC, mean ± standard deviation [std]) above 0.70 are highlighted in bold. N pos./neg. exams: number of examinations with and without the diagnosis, respectively. P-value: p-value for the difference in examination scores between groups.
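The per-disease AUROC values above compare examination-level classifier scores between exams with and without a given diagnosis. AUROC is equivalent to the probability that a randomly chosen positive exam scores higher than a randomly chosen negative one (the Mann-Whitney U formulation). A minimal sketch of this computation, with illustrative scores rather than the study's data:

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a positive exam scores above a
    negative one, counting ties as half (Mann-Whitney U formulation)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# perfect separation -> 1.0; complete overlap of tied scores -> 0.5
print(auroc([0.9, 0.8], [0.2, 0.1]))  # -> 1.0
```

This pairwise definition also underlies the reported p-values, since a significant AUROC corresponds to a significant rank difference in scores between the positive and negative groups.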
Target diseases and classifier performance for the colonoscopy dataset.
| ICD | Diagnosis | AUROC mean | AUROC 95% CI | N pos. exams | N neg. exams | p-value |
|---|---|---|---|---|---|---|
| K57 | Diverticular disease of intestine | 0.691 | 0.020 | 3403 | 15,538 | 0.0000 |
| D12 | Benign neoplasm of colon, rectum, anus and anal canal | 0.694 | 0.013 | 3192 | 15,749 | 0.0000 |
| K64 | Haemorrhoids and perianal venous thrombosis | 0.613 | 0.033 | 1657 | 17,284 | 0.0000 |
| C18 | Malignant neoplasm of colon | 0.686 | 0.051 | 332 | 18,609 | 0.0000 |
| A09 | Other gastroenteritis and colitis of infectious and unspecified origin | 0.639 | 0.063 | 247 | 18,694 | 0.0001 |
| K92 | Other diseases of digestive system | 0.646 | 0.073 | 239 | 18,702 | 0.0000 |
| K60 | Fissure and fistula of anal and rectal regions | 0.593 | 0.028 | 171 | 18,770 | 0.0233 |
| K55 | Vascular disorders of intestine | 0.620 | 0.067 | 151 | 18,790 | 0.0085 |
| K62 | Other diseases of anus and rectum | 0.574 | 0.163 | 130 | 18,811 | 0.2370 |
| R19 | Other symptoms and signs involving the digestive system and abdomen | 0.551 | 0.107 | 114 | 18,827 | 0.3459 |
| L30 | Other dermatitis | 0.542 | 0.152 | 62 | 18,879 | 0.4652 |
All targets that reached an area under the receiver operating curve (AUROC, mean ± standard deviation [std]) above 0.70 are highlighted in bold. N pos./neg. exams: number of examinations with and without the diagnosis, respectively. P-value: p-value for the difference in examination scores between groups.
Figure 1: Outline of this study. (A) From a large clinical database of gastroscopy and colonoscopy examinations, we retrieved images (left) and matched them to disease diagnoses obtained from the corresponding reports (right). (B) Images were preprocessed by cropping, resizing and color normalization using adaptive histogram equalization. (C) The experimental workflow relied on examination-level three-fold cross-validation, in which three artificial intelligence (AI) networks were trained for each disease category and the test set was rotated. (D) In the weakly supervised AI workflow, all images from a given examination (exam) inherited the one-hot-encoded disease label. No image-level or pixel-level annotations were used.
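The weakly supervised labeling step in Figure 1D can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the ICD code list, exam structure and file names are hypothetical, but the key idea is faithful to the figure, namely that every image of an examination inherits that examination's one-hot disease label with no image-level annotation.

```python
# Hypothetical subset of the 42 ICD target codes used in the study.
ICD_TARGETS = ["K29", "K44", "K21"]

def one_hot(diagnoses, targets=ICD_TARGETS):
    """Encode an examination's set of ICD diagnoses as a binary vector."""
    return [1 if code in diagnoses else 0 for code in targets]

def inherit_labels(exam_images, exam_diagnoses):
    """Assign the examination-level label to every image of the exam;
    no image-level or pixel-level annotation is used."""
    label = one_hot(exam_diagnoses)
    return [(img, label) for img in exam_images]

labeled = inherit_labels(["img_001.png", "img_002.png"], {"K29"})
# both images now carry the exam-level label [1, 0, 0]
```

Because labels are only correct at the examination level, individual images may be mislabeled (e.g. normal mucosa in a positive exam), which is the noise the weakly supervised training must tolerate.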
Figure 2Classification performance for end-to-end artificial intelligence in the gastroscopy and colonoscopy data set. (A) Classifier performance in the gastroscopy data set as measured as the mean area under the receiver operating curve (AUROC) of three cross-validation experiments for each disease category with 95% confidence intervals. (B) Receiver operating curve (ROC) for C15, (C) for K91, (D) for B37 and (E) for R13. (F) AUROCs for all disease categories in the colonoscopy data set and ROCs for (G) disease category I84, (H) for K91, (I) for K51 and (J) for C20.
Figure 3Highly scoring images identified by the artificial intelligence system in the gastroscopy and colonoscopy data set. The two highest scoring images for the eight highest scoring patients are shown for the gastroscopy dataset: (A) C16, (B) B37, and the colonoscopy data set: (C) K57 and (D) D12.
Figure 4: Importance maps for highly predictive image regions in the gastroscopy dataset. (A) Representative images with corresponding importance maps (generated by GradCAM) for C16 (gastric cancer). (B) Representative images for B37 (candidiasis). (C) Representative images for K29 (gastritis and duodenitis). White circles indicate corresponding areas in the original image and the GradCAM heatmap.
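The Grad-CAM maps in Figure 4 weight a convolutional layer's feature maps by the pooled gradients of the class score and keep only positive evidence. A minimal sketch of that core computation, assuming the activations and gradients have already been extracted from the network (the exact layer and framework used in the study are not reproduced here):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from a conv layer's activations (C, H, W) and the
    gradients of the class score w.r.t. those activations (C, H, W)."""
    # Global-average-pool the gradients to get one weight per channel.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
    if cam.max() > 0:
        cam /= cam.max()  # normalize to [0, 1] for display as a heatmap
    return cam
```

In practice the resulting low-resolution map is upsampled to the input image size and overlaid on the endoscopy frame, which is what the white circles in Figure 4 point to.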
Results of the reader study in the gastroscopy data set.
| ICD | Diagnosis | Typical | Not typical | Missing | Device | Artifact |
|---|---|---|---|---|---|---|
| B37 | Candidiasis | 0.68 | 0.18 | 0.03 | 0.07 | 0.03 |
| K26 | Duodenal ulcer | 0.60 | 0.23 | 0.00 | 0.15 | 0.02 |
| K20 | Oesophagitis | 0.57 | 0.32 | 0.02 | 0.08 | 0.02 |
| K22 | Other diseases of oesophagus | 0.55 | 0.15 | 0.22 | 0.05 | 0.03 |
| C15 | Malignant neoplasm of oesophagus | 0.50 | 0.13 | 0.13 | 0.18 | 0.05 |
| K29 | Gastritis and duodenitis | 0.48 | 0.23 | 0.03 | 0.18 | 0.07 |
| C16 | Malignant neoplasm of stomach | 0.48 | 0.33 | 0.05 | 0.13 | 0.00 |
| K25 | Gastric ulcer | 0.47 | 0.33 | 0.10 | 0.08 | 0.02 |
| K92 | Other diseases of digestive system | 0.47 | 0.35 | 0.03 | 0.15 | 0.00 |
| D13 | Benign neoplasm of other and ill-defined parts of digestive system | 0.42 | 0.32 | 0.07 | 0.18 | 0.02 |
| K21 | Gastro-oesophageal reflux disease | 0.42 | 0.38 | 0.02 | 0.18 | 0.00 |
| CXX | Malignant neoplasm of oesophagus or stomach | 0.40 | 0.17 | 0.12 | 0.28 | 0.03 |
| R11 | Nausea and vomiting | 0.38 | 0.37 | 0.08 | 0.13 | 0.03 |
| K31 | Other diseases of stomach and duodenum | 0.37 | 0.35 | 0.02 | 0.22 | 0.05 |
| D37 | Neoplasm of uncertain/unknown behaviour, oral cavity/digestive organs | 0.37 | 0.43 | 0.02 | 0.17 | 0.02 |
| R13 | Dysphagia | 0.28 | 0.47 | 0.12 | 0.10 | 0.03 |
| I85 | Oesophageal varices | 0.27 | 0.48 | 0.03 | 0.20 | 0.02 |
| D48 | Neoplasm of uncertain/unknown behaviour of other/unspecified sites | 0.25 | 0.43 | 0.07 | 0.23 | 0.02 |
| K57 | Diverticular disease of intestine | 0.20 | 0.68 | 0.00 | 0.10 | 0.02 |
| R12 | Heartburn | 0.18 | 0.65 | 0.05 | 0.10 | 0.02 |
| K91 | Postprocedural disorders of digestive system, not elsewhere classified | 0.17 | 0.28 | 0.10 | 0.42 | 0.03 |
| K44 | Diaphragmatic hernia | 0.12 | 0.57 | 0.00 | 0.30 | 0.02 |
| R10 | Abdominal and pelvic pain | 0.10 | 0.75 | 0.08 | 0.05 | 0.02 |
For each diagnosis category in the gastroscopy dataset, the AI system selected 60 highly predictive images (the 2 highest scoring images from the 10 highest scoring patients in each of the 3 folds). These images were manually classified into five categories by a trained observer. The fraction of images in each category is shown.
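The selection rule for the reader study (top images from top patients, per fold) can be sketched as a simple two-level top-k. This is an illustrative reconstruction under the assumption that a patient's score is the maximum of their image scores; the study's exact aggregation is not restated here.

```python
def select_top_images(scores_by_patient, n_patients=10, n_images=2):
    """Pick the n_images highest-scoring images from each of the
    n_patients highest-scoring patients (patient score = max image score)."""
    top_patients = sorted(scores_by_patient,
                          key=lambda p: max(scores_by_patient[p]),
                          reverse=True)[:n_patients]
    return {p: sorted(scores_by_patient[p], reverse=True)[:n_images]
            for p in top_patients}
```

Running this per fold and per diagnosis, with 3 folds, 10 patients and 2 images, yields the 60 images per category that the observer classified.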
Figure 5Classification performance in the external validation set. (A) Receiver operating curves (ROC) with area under the curve (AUC) in the colonoscopy validation set for K57 (diverticulosis), C18 (colon cancer) and C20 (rectal cancer). (B) ROC in the gastroscopy validation set for B37 (candidiasis), C15 (esophageal cancer) and C16 (gastric cancer). (C) Highest scoring images (two each) in highest scoring patients (eight) as selected by the AI model from the validation set for K57 (diverticulosis) (D) and B37 (candidiasis).