Lijia Wang1, Liping Chen1, Xianyuan Wang2, Kaiyuan Liu2, Ting Li2, Yue Yu2, Jian Han1, Shuai Xing1, Jiaxin Xu1, Dean Tian1, Ursula Seidler3, Fang Xiao1.
Abstract
Objective: Evaluating the endoscopic features of Crohn's disease (CD) and ulcerative colitis (UC) is the key diagnostic approach for distinguishing the two diseases. However, differentiating them on endoscopic images requires precise interpretation by experienced clinicians and remains a challenge. This study therefore aimed to establish a convolutional neural network (CNN)-based model to facilitate diagnostic classification of CD, UC, and healthy controls based on colonoscopy images.
Keywords: Crohn’s disease; artificial intelligence; classification; colonoscopy image; convolutional neural network; deep learning; inflammatory bowel disease; ulcerative colitis
Year: 2022 PMID: 35463023 PMCID: PMC9024394 DOI: 10.3389/fmed.2022.789862
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
FIGURE 1. Overall study design. The main processes are construction of the eligible colonoscopy image set, model development, and a final comparison of performance between CNNs and clinicians. CD, Crohn’s disease; UC, ulcerative colitis; IBD, inflammatory bowel disease; PPV, positive predictive value; NPV, negative predictive value.
FIGURE 2. Proposed convolutional neural network for colonoscopy image classification, based on the ResNeXt-101 residual network architecture. Each layer is shown as (# in channels, filter size, # out channels). Conv, convolutional layer; AvgPool, average pooling; FC, fully connected layer; CD, Crohn’s disease; UC, ulcerative colitis.
Demographic characteristics of CD patients, UC patients and healthy controls.
| Variables | CD patients (n = 217) | UC patients (n = 279) | Healthy controls (n = 100) |
| Gender, n (%) | | | |
| Male | 163 (75.12) | 179 (64.16) | 56 (56.00) |
| Female | 54 (24.88) | 100 (35.84) | 44 (44.00) |
| Age, Median (IQR), y | 28 (22–35) | 45 (31–54) | 42 (34–51) |
| Montreal classification, n (%) | | | |
| UC extent | | | |
| E1 Proctitis | NA | 45 (16.13) | NA |
| E2 Left-sided colitis | NA | 92 (32.97) | NA |
| E3 Extensive colitis | NA | 142 (50.90) | NA |
| Age at diagnosis (A), n (%) | | | |
| A1 16 years or younger | 18 (8.29) | NA | NA |
| A2 17–40 years | 159 (73.27) | NA | NA |
| A3 Over 40 years | 40 (18.43) | NA | NA |
| Location (L), n (%) | | | |
| L1 Terminal ileum | 6 (2.76) | NA | NA |
| L2 Colon | 96 (44.24) | NA | NA |
| L3 Ileocolon | 115 (53.00) | NA | NA |
| L4 Upper GI | 0 | NA | NA |
| Behavior (B), n (%) | | | |
| B1 Non-stricturing, non-penetrating | 129 (59.45) | NA | NA |
| B2 Stricturing | 69 (31.80) | NA | NA |
| B3 Penetrating | 19 (8.76) | NA | NA |
| P Perianal disease modifier | 12 (5.5) | NA | NA |
CD, Crohn’s disease; UC, ulcerative colitis; IQR, interquartile range; GI, gastrointestinal; NA, not applicable.
Diagnostic performance of the CNN model and clinicians in classifying CD, UC or normal on endoscopic images in the test dataset.
| | The CNN model | Clinician 1 | Clinician 2 | Clinician 3 | Clinician 4 | Clinician 5 | Clinician 6 |
| CD | | | | | | | |
| Accuracy | 92.39 (90.88–93.67) | 91.70 (90.14–93.04) | 81.28 (79.16–83.23) | 87.24 (85.39–88.89) | 78.53 (76.31–80.59) | 73.53 (71.17–75.76) | 86.90 (85.03–88.57) |
| Sensitivity | 87.53 (84.16–90.28) | 86.28 (82.80–89.16) | 76.92 (72.84–80.56) | 65.49 (61.03–69.70) | 36.38 (32.10–40.88) | 26.82 (22.96–31.06) | 80.04 (76.13–83.46) |
| Specificity | 94.78 (93.14–96.05) | 94.37 (92.69–95.69) | 83.42 (80.90–85.67) | 97.95 (96.79–98.71) | 99.28 (98.46–99.68) | 96.52 (95.12–97.54) | 90.28 (88.21–92.03) |
| PPV | 89.19 (85.95–91.77) | 88.30 (84.96–90.99) | 69.55 (65.41–73.40) | 94.03 (90.78–96.22) | 96.15 (91.91–98.30) | 79.14 (71.94–84.94) | 80.21 (76.30–83.62) |
| NPV | 93.91 (92.18–95.28) | 93.32 (91.53–94.76) | 88.01 (85.70–90.00) | 85.22 (82.98–87.22) | 90.15 (88.17–91.83) | 72.82 (70.29–75.21) | 90.18 (88.10–91.94) |
| F1–score | 0.88 (0.85–0.91) | 0.87 (0.84–0.90) | 0.73 (0.69–0.77) | 0.77 (0.73–0.81) | 0.53 (0.48–0.58) | 0.40 (0.35–0.45) | 0.80 (0.76–0.84) |
| UC | | | | | | | |
| Accuracy | 93.35 (91.92–94.55) | 92.39 (90.88–93.67) | 79.84 (77.67–81.85) | 90.26 (88.59–91.71) | 83.20 (81.16–85.06) | 59.60 (57.02–62.12) | 86.76 (84.89–88.44) |
| Sensitivity | 90.49 (87.47–92.86) | 92.91 (90.19–94.94) | 67.81 (63.46–71.88) | 92.51 (89.73–94.60) | 84.62 (81.06–87.63) | 96.36 (94.20–97.76) | 80.57 (76.74–83.91) |
| Specificity | 94.81 (93.17–96.09) | 92.12 (90.19–93.71) | 86.00 (83.61–88.10) | 89.11 (86.93–90.97) | 82.47 (79.89–84.79) | 40.77 (37.66–43.96) | 89.94 (87.82–91.73) |
| PPV | 89.94 (86.87–92.37) | 85.79 (82.48–88.58) | 71.28 (66.92–75.29) | 81.32 (77.79–84.41) | 71.21 (67.33–74.81) | 45.46 (42.42–48.54) | 80.40 (76.57–83.75) |
| NPV | 95.11 (93.50–96.35) | 96.21 (94.71–97.31) | 83.91 (81.43–86.12) | 95.87 (94.30–97.04) | 91.27 (89.15–93.02) | 95.62 (93.40–97.31) | 90.03 (87.92–91.81) |
| F1–score | 0.90 (0.87–0.93) | 0.89 (0.86–0.92) | 0.70 (0.65–0.74) | 0.87 (0.83–0.89) | 0.77 (0.74–0.81) | 0.62 (0.58–0.65) | 0.80 (0.77–0.84) |
| Normal | | | | | | | |
| Accuracy | 98.35 (97.52–98.92) | 97.26 (96.25–98.01) | 95.54 (94.32–96.52) | 94.65 (93.34–95.72) | 85.60 (83.67–87.34) | 83.47 (81.44–85.32) | 98.77 (98.02–99.25) |
| Sensitivity | 98.14 (96.37–99.09) | 92.75 (89.97–94.83) | 90.48 (87.42–92.88) | 100 (99.02–100) | 99.59 (98.35–99.93) | 50.72 (46.17–55.26) | 98.14 (96.37–99.09) |
| Specificity | 98.46 (97.41–99.10) | 99.49 (98.74–99.81) | 98.05 (96.91–98.79) | 92.00 (90.07–93.59) | 78.67 (75.94–81.17) | 99.69 (99.02–99.92) | 99.08 (98.19–99.55) |
| PPV | 96.93 (94.87–98.21) | 98.90 (97.30–99.60) | 95.83 (93.45–97.40) | 86.10 (82.89–88.80) | 69.81 (66.21–73.19) | 98.79 (96.21–99.69) | 98.14 (96.37–99.09) |
| NPV | 99.07 (98.18–99.55) | 96.52 (95.14–97.53) | 95.41 (93.88–96.58) | 100 (99.47–100) | 99.74 (98.96–99.95) | 80.33 (77.95–82.51) | 99.08 (98.19–99.55) |
| F1–score | 0.98 (0.96–0.99) | 0.96 (0.93–0.97) | 0.93 (0.90–0.95) | 0.93 (0.90–0.94) | 0.82 (0.79–0.84) | 0.67 (0.62–0.71) | 0.98 (0.96–0.99) |
| Overall accuracy | 92.04 (90.50–93.35) | 90.67 (89.03–92.09) | 78.33 (76.11–80.40) | 86.08 (84.17–87.79) | 73.66 (71.30–75.89) | 58.30 (55.72–60.84) | 86.21 (84.31–87.92) |
All results except the F1–score are given as a percentage (95% CI); the F1–score is given as a proportion (95% CI).
CNN, convolutional neural network; CD, Crohn’s disease; UC, ulcerative colitis; CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value.
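The per-class rows in this table are one-vs-rest statistics: for each class, every test image is treated as positive (that class) or negative (either of the other two). The following is a minimal Python sketch of that computation from a 3×3 confusion matrix. The counts are invented placeholders, not the study's data, and the function names `one_vs_rest_metrics` and `overall_accuracy` are hypothetical. The sketch also shows why the final accuracy row is lower than every per-class accuracy: each misclassified image is an error for two classes (a false negative for its true class and a false positive for the predicted class), so the overall error rate is half the sum of the three per-class error rates. For the CNN column this gives (7.61 + 6.65 + 1.65) / 2 ≈ 7.96% error, in line with the 92.04 in the last row.

```python
from typing import Dict, List

CLASSES = ["CD", "UC", "Normal"]


def one_vs_rest_metrics(cm: List[List[int]], k: int) -> Dict[str, float]:
    """Metrics for class index k; cm[i][j] = images of true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                       # true class k, predicted otherwise
    fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
    tn = total - tp - fn - fp
    sens = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sens,
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "f1": 2 * ppv * sens / (ppv + sens),
    }


def overall_accuracy(cm: List[List[int]]) -> float:
    """Share of images on the diagonal of the confusion matrix."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(r) for r in cm)


# Placeholder counts for illustration only (NOT the study's data).
cm = [
    [80, 15, 5],  # true CD
    [10, 85, 5],  # true UC
    [2, 3, 95],   # true Normal
]
per_class = {name: one_vs_rest_metrics(cm, k) for k, name in enumerate(CLASSES)}
for name, m in per_class.items():
    print(name, {key: round(val, 4) for key, val in m.items()})

# Each misclassified image is an error for exactly two classes.
half_sum_err = sum(1 - m["accuracy"] for m in per_class.values()) / 2
print(round(overall_accuracy(cm), 4), round(1 - half_sum_err, 4))
```

The two values printed on the last line agree, which is the identity used above to read the unlabeled final row as an overall three-class accuracy.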
Diagnostic accuracy of the CNN model and clinicians in per-patient analysis.
| Threshold value | The CNN model | Clinician 1 | Clinician 2 | Clinician 3 | Clinician 4 | Clinician 5 | Clinician 6 |
| 50% | 90.91 (80.62–96.25) | 93.94 (84.44–98.04) | 78.79 (66.66–87.52) | 83.33 (71.71–90.99) | 59.09 (46.30–70.82) | 56.06 (43.35–68.07) | 90.91 (80.62–96.25) |
| 60% | 87.88 (76.96–94.25) | 93.94 (84.44–98.04) | 69.70 (57.00–80.09) | 81.82 (70.01–89.96) | 59.09 (46.30–70.82) | 50.00 (37.56–62.44) | 86.36 (75.18–93.19) |
| 70% | 86.36 (75.18–93.19) | 89.39 (78.77–95.27) | 63.64 (50.82–74.86) | 75.76 (63.38–85.11) | 53.03 (40.43–65.27) | 46.97 (34.73–59.57) | 77.27 (65.01–86.32) |
| 80% | 77.27 (65.01–86.32) | 81.82 (70.01–89.86) | 53.03 (40.43–65.27) | 66.67 (53.89–77.50) | 46.97 (34.73–59.57) | 40.91 (29.18–53.70) | 63.64 (50.82–74.86) |
| 90% | 60.61 (47.80–72.18) | 68.18 (55.43–78.80) | 36.36 (25.14–49.18) | 54.55 (41.89–66.68) | 40.91 (29.18–53.70) | 36.36 (25.14–49.18) | 50.00 (37.56–62.44) |
All results are given as a percentage (95% CI).
CNN, convolutional neural network; CI, confidence interval.
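This excerpt does not state how the threshold is applied or how the confidence intervals were computed, so the sketch below rests on two labeled assumptions. First, a patient is assumed to count as correctly diagnosed when at least the threshold fraction of that patient's images is classified correctly; this is one plausible reading, consistent with accuracy falling as the threshold rises. Second, the percentages in the table are all multiples of 1/66, and the reported bounds are numerically close to a continuity-corrected Wilson score interval on 66 patients; both the patient count and the interval method are inferences from the numbers, not statements from the source. The function names and toy data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, List, Tuple


def patient_accuracy(correct_by_patient: Dict[str, List[bool]], threshold: float) -> float:
    """Fraction of patients whose share of correctly classified images >= threshold.

    Assumed aggregation rule; the source does not define it.
    """
    hits = sum(
        1
        for flags in correct_by_patient.values()
        if sum(flags) / len(flags) >= threshold
    )
    return hits / len(correct_by_patient)


def wilson_cc(successes: int, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval with continuity correction for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    q = 1 - p
    denom = 2 * (n + z * z)
    lo = (2 * n * p + z * z - 1
          - z * sqrt(z * z - 2 - 1 / n + 4 * p * (n * q + 1))) / denom
    hi = (2 * n * p + z * z + 1
          + z * sqrt(z * z + 2 - 1 / n + 4 * p * (n * q - 1))) / denom
    if successes == 0:
        lo = 0.0
    if successes == n:
        hi = 1.0
    return max(0.0, lo), min(1.0, hi)


# Toy data: per-patient flags marking whether each image was classified correctly.
toy = {
    "patient_a": [True, True, True, True],     # 100% of images correct
    "patient_b": [True, True, True, False],    # 75%
    "patient_c": [True, False, True, False],   # 50%
    "patient_d": [False, False, True, False],  # 25%
}
for t in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"threshold {t:.0%}: accuracy {patient_accuracy(toy, t):.2f}")

# Compare with the table entry 93.94 (84.44–98.04), assuming 62 of 66 patients.
lo, hi = wilson_cc(62, 66)
print(f"62/66 = {62 / 66:.2%}, 95% CI {lo:.2%} to {hi:.2%}")
```

If the study used a different aggregation rule (for example, a majority vote over predicted classes), only `patient_accuracy` would change; the interval calculation is independent of it.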
FIGURE 3. Comparison of prediction results between the CNN model and clinicians for each test image. (A) Bar diagram comparing the results of the clinicians and the CNN model. The horizontal axis shows the proportion of clinicians who classified an image correctly; the height of each column, read against the vertical axis, shows the number of such images. Blue columns: the CNN model’s prediction is correct; red columns: the CNN model’s prediction is wrong. (B) Images misclassified by the CNN model but not by the participating clinicians: (a–c) UC images misclassified as CD; (d) Normal images misclassified as CD. (C) Images misclassified by all clinicians but not by the CNN model: (a–c) CD images misclassified as UC by all clinicians; (d–k) CD images misclassified as UC by most clinicians and as normal by the rest; (l) UC misclassified as CD by all clinicians. CD, Crohn’s disease; UC, ulcerative colitis; CNN, convolutional neural network.
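The tally behind panel (A) can be sketched as follows, assuming each test image has a true label, one CNN prediction, and one prediction per clinician. The records are invented for illustration (three clinicians rather than the study's six), and the function name `tally_agreement` is hypothetical.

```python
from collections import Counter
from typing import List


def tally_agreement(
    truth: List[str], cnn: List[str], clinicians: List[List[str]]
) -> Counter:
    """Count images by (number of clinicians correct, whether the CNN was correct).

    clinicians[c][i] is clinician c's label for image i.
    """
    counts: Counter = Counter()
    for i, label in enumerate(truth):
        n_correct = sum(1 for reads in clinicians if reads[i] == label)
        counts[(n_correct, cnn[i] == label)] += 1
    return counts


# Invented example: 4 images, 3 clinicians.
truth = ["CD", "UC", "Normal", "CD"]
cnn = ["CD", "CD", "Normal", "CD"]
clinicians = [
    ["UC", "UC", "Normal", "CD"],
    ["UC", "UC", "Normal", "UC"],
    ["UC", "UC", "Normal", "CD"],
]
counts = tally_agreement(truth, cnn, clinicians)
for (n_correct, cnn_ok), n_images in sorted(counts.items()):
    bar = "blue (CNN correct)" if cnn_ok else "red (CNN wrong)"
    print(f"{n_correct}/{len(clinicians)} clinicians correct, {bar}: {n_images} image(s)")
```

Images in the bin where no clinician was correct but the CNN was correspond to the cases shown in panel (C); images in the bin where all clinicians were correct but the CNN was wrong correspond to panel (B).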