Thomas K L Lui1, Kenneth K Y Wong2, Loey L Y Mak1, Elvis W P To1, Vivien W M Tsui1, Zijie Deng3, Jiaqi Guo3, Li Ni3, Michael K S Cheung1,3, Wai K Leung1.
Abstract
Background and study aims: Artificial intelligence (AI)-assisted image classification has been shown to achieve high accuracy in endoscopic diagnosis. We evaluated the potential effect of an AI-assisted image classifier on the training of junior endoscopists in histological prediction of gastric lesions.
Methods: An AI image classifier was built on a convolutional neural network with five convolutional layers and three fully connected layers. A ResNet backbone was trained on 2,000 non-magnified endoscopic gastric images. The independent validation set consisted of another 1,000 endoscopic images from 100 gastric lesions. The first part of the validation set was reviewed by six junior endoscopists. The AI prediction was then disclosed to three of them (Group A), while the remaining three (Group B) were not given this information. All endoscopists then reviewed the second part of the validation set independently.
Results: The overall accuracy of AI was 91.0 % (95 % CI: 89.2-92.7 %) with 97.1 % sensitivity (95 % CI: 95.6-98.7 %), 85.9 % specificity (95 % CI: 83.0-88.4 %) and 0.91 area under the ROC curve (AUROC) (95 % CI: 0.89-0.93). AI was superior to all junior endoscopists in accuracy and AUROC in both validation sets. The performance of Group A endoscopists, but not Group B endoscopists, improved on the second validation set (accuracy 69.3 % to 74.7 %; P = 0.003).
Conclusion: The trained AI image classifier can accurately predict the presence of a neoplastic component in gastric lesions. Feedback from the AI image classifier can also hasten the learning curve of junior endoscopists in predicting the histology of gastric lesions.
Year: 2020 PMID: 32010746 PMCID: PMC6976335 DOI: 10.1055/a-1036-6114
Source DB: PubMed Journal: Endosc Int Open ISSN: 2196-9736
Fig. 1 Representative figures of AI image classifier for prediction of histology of sessile gastric lesions.
Fig. 2 Study flow.
Clinicopathological characteristics of the validation set.
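| | All lesions | First part of validation set | Second part of validation set | P value |
| Number of lesions | 100 | 50 | 50 | 1.00 |
| Lesion size | 14.9 mm | 15.4 mm | 14.7 mm | 0.73 |
| Morphology | | | | |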
| IIa or IIa-like | 55.0 % (n = 55) | 48.0 % (n = 24) | 62.0 % (n = 31) | 0.23 |
| IIb or IIb-like | 22.0 % (n = 22) | 18.0 % (n = 9) | 26.0 % (n = 13) | 0.47 |
| IIc or IIc-like | 11.0 % (n = 11) | 14.0 % (n = 7) | 8.0 % (n = 4) | 0.52 |
| Is or Is-like | 12.0 % (n = 12) | 20.0 % (n = 10) | 4.0 % (n = 2) | 0.12 |
| Location | | | | |
| Antrum | 71.0 % (n = 71) | 72.0 % (n = 36) | 70.0 % (n = 35) | 1.00 |
| Body | 29.0 % (n = 29) | 28.0 % (n = 14) | 30.0 % (n = 15) | 1.00 |
| Histology | | | | |
| Gastritis | 36.0 % (n = 36) | 34.0 % (n = 17) | 38.0 % (n = 19) | 0.84 |
| Intestinal metaplasia | 14.0 % (n = 14) | 12.0 % (n = 6) | 16.0 % (n = 8) | 0.77 |
| Hyperplastic | 2.0 % (n = 2) | 4.0 % (n = 2) | 0 % (n = 0) | 0.47 |
| Low-grade dysplasia | 30.0 % (n = 30) | 32.0 % (n = 16) | 28.0 % (n = 14) | 0.83 |
| High-grade dysplasia | 5.0 % (n = 5) | 4.0 % (n = 2) | 6.0 % (n = 3) | 1.00 |
| Adenocarcinoma | 13.0 % (n = 13) | 14.0 % (n = 7) | 12.0 % (n = 6) | 1.00 |
| Depth of invasion | | | | |
| Intramucosal | 4 % (n = 4) | 4 % (n = 2) | 4 % (n = 2) | 1.00 |
| Submucosal | 9 % (n = 9) | 10 % (n = 5) | 8 % (n = 4) | 1.00 |
| Differentiation | | | | |
| Well differentiated | 6 % (n = 6) | 6 % (n = 3) | 6 % (n = 3) | 1.00 |
| Moderately differentiated | 6 % (n = 6) | 6 % (n = 3) | 6 % (n = 3) | 1.00 |
| Poorly differentiated | 1 % (n = 1) | 2 % (n = 1) | 0 % (n = 0) | 1.00 |
| Performance of AI | | | | |
| AUROC (95 % CI) | 0.92 (0.89–0.93) | 0.92 (0.90–0.94) | 0.91 (0.89–0.93) | 0.53 |
| Accuracy (95 % CI) | 91.0 % (89.1–92.7 %) | 91.6 % (88.8–93.9 %) | 90.4 % (87.5–92.8 %) | 0.52 |
AUROC, area under the receiver operating characteristics curve; CI, confidence interval
Analysis of the performance of AI according to lesion characteristics.
| | Accuracy (95 % CI) | AUROC (95 % CI) |
| Size | | |
| > 10 mm | 90.7 % (88.5–92.7 %) | 0.90 (0.88–0.92) |
| ≤ 10 mm | 91.9 % (87.4–95.2 %) | 0.93 (0.89–0.97) |
| Morphology | | |
| IIa or IIa-like | 91.4 % (88.8–93.6 %) | 0.92 (0.89–0.94) |
| IIb or IIb-like | 83.6 % (78.0–88.2 %) | 0.91 (0.89–0.94) |
| IIc or IIc-like | 98.2 % (96.5–99.9 %) | 0.99 (0.97–0.99) |
| Is or Is-like | 95.8 % (90.5–98.6 %) | 0.95 (0.91–0.99) |
| Location | | |
| Antrum | 89.2 % (86.8–91.4 %) | 0.90 (0.88–0.91) |
| Body | 95.2 % (92.0–97.3 %) | 0.95 (0.92–0.97) |
AUROC, area under the receiver operating characteristics curve.
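As a reminder of what the AUROC column measures, the following is a minimal sketch, assuming each image has a classifier score and a binary histology label. It uses the rank interpretation of AUROC: the probability that a randomly chosen neoplastic image receives a higher score than a randomly chosen non-neoplastic one. The function name `auroc` and the scores below are hypothetical and are not taken from the study.

```python
def auroc(scores, labels):
    """AUROC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores and labels, for illustration only (not study data).
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0, 0]
example_auroc = auroc(scores, labels)
print(f"AUROC = {example_auroc:.3f}")
```

The pairwise loop is quadratic in the number of images, which is acceptable for a sketch; a rank-based implementation would be preferable for large sets.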
Summary of the performance of AI and all endoscopists: first part of validation.
| | | Endoscopist | | | | | | |
| | AI | Senior | I | II | III | IV | V | VI |
| Sensitivity | 96.0 % (93.4–98.6 %) | 88.1 % (83.7–91.6 %) | 96.0 % (93.4–98.6 %) | 42.3 % (35.8–48.8 %) | 77.1 % (71.6–82.6 %) | 52.5 % (45.9–59.0 %) | 87.9 % (83.6–92.1 %) | 85.2 % (80.5–89.8 %) |
| Specificity | 88.1 % (84.3–91.9 %) | 79.8 % (73.9–84.8 %) | 48.0 % (42.1–54.0 %) | 94.2 % (91.4–96.9 %) | 58.8 % (53.1–64.6 %) | 82.7 % (78.2–87.1 %) | 61.7 % (56.0–67.4 %) | 40.4 % (34.6–46.2 %) |
| PPV | 86.6 % (82.4–90.9 %) | 84.4 % (79.7–88.4 %) | 59.8 % (54.7–64.8 %) | 85.4 % (78.9–92.0 %) | 60.1 % (54.5–65.8 %) | 70.9 % (64.0–77.8 %) | 64.9 % (59.5–70.2 %) | 53.5 % (48.3–58.7 %) |
| NPV | 96.4 % (94.2–98.7 %) | 84.4 % (79.7–88.4 %) | 93.7 % (89.7–97.7 %) | 67.0 % (62.3–71.7 %) | 76.1 % (70.5–81.9 %) | 68.4 % (63.4–73.3 %) | 86.4 % (81.6–91.1 %) | 77.2 % (70.4–84.1 %) |
| Accuracy | 91.6 % (89.1–94.0 %) | 84.4 % (80.9–87.5 %) | 69.4 % (65.3–73.4 %) | 71.1 % (67.1–75.1 %) | 67.0 % (62.9–71.1 %) | 69.2 % (65.2–73.3 %) | 73.4 % (69.5–77.2 %) | 60.4 % (56.1–64.7 %) |
| AUROC | 0.92 (0.89–0.95) | 0.84 (0.81–0.87) | 0.72 (0.68–0.77) | 0.68 (0.63–0.73) | 0.68 (0.63–0.73) | 0.68 (0.63–0.72) | 0.75 (0.71–0.79) | 0.63 (0.58–0.68) |
| Mean confidence | 84.0 % (82.6–85.4 %) | 94.6 % (60.0–100.0 %) | 92.5 % (91.1–93.9 %) | 75.4 % (74.5–76.2 %) | 75.0 % (74.0–75.9 %) | 85.6 % (84.6–86.7 %) | 87.1 % (86.0–88.4 %) | 75.5 % (74.5–76.5 %) |
PPV, positive predictive value; NPV, negative predictive value; AUROC, area under the receiver operating characteristics curve.
AI is superior to all junior endoscopists in terms of accuracy and AUROC (all P < 0.01). Numbers in brackets refer to 95 % confidence intervals.
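The rows of this table all derive from a 2 × 2 confusion matrix of predicted versus histological diagnosis. The sketch below shows one way such figures could be computed, assuming a normal-approximation (Wald) 95 % confidence interval. The source does not state which interval method the authors used, so that choice is an assumption, as are the function names and the counts.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation (Wald) CI, clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV and accuracy, each as (estimate, lower, upper)."""
    return {
        "sensitivity": proportion_ci(tp, tp + fn),
        "specificity": proportion_ci(tn, tn + fp),
        "ppv": proportion_ci(tp, tp + fp),
        "npv": proportion_ci(tn, tn + fn),
        "accuracy": proportion_ci(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical counts for illustration only; not taken from the study.
metrics = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
for name, (p, lower, upper) in metrics.items():
    print(f"{name}: {100 * p:.1f} % ({100 * lower:.1f}-{100 * upper:.1f} %)")
```

The Wald interval is the simplest choice but behaves poorly near 0 % or 100 %; an exact or Wilson interval may be more appropriate for values such as an NPV of 99 %.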
Summary of the performance of AI and all endoscopists: second part of validation.
| | | Endoscopist | | | | | | |
| | AI | Senior | I | II | III | IV | V | VI |
| Sensitivity | 98.4 % (96.8–99.9 %) | 87.6 % (84.4–90.4 %) | 99.6 % (98.7–99.9 %) | 60.8 % (53.6–67.2 %) | 73.9 % (68.2–79.6 %) | 39.1 % (32.8–45.4 %) | 80.8 % (75.8–86.0 %) | 73.9 % (68.2–79.6 %) |
| Specificity | 82.4 % (77.6–87.1 %) | 73.4 % (67.3–79.1 %) | 51.5 % (45.5–57.4 %) | 85.2 % (81.0–89.4 %) | 81.5 % (76.9–86.1 %) | 96.3 % (94.0–98.6 %) | 55.6 % (49.6–61.5 %) | 59.3 % (53.4–65.1 %) |
| PPV | 84.8 % (80.7–89.0 %) | 81.5 % (76.9–85.6 %) | 63.6 % (58.6–68.6 %) | 77.8 % (71.7–83.8 %) | 77.3 % (71.7–82.8 %) | 90.0 % (84.1–95.9 %) | 39.2 % (33.8–44.7 %) | 60.7 % (55.0–66.4 %) |
| NPV | 98.1 % (96.3–99.9 %) | 99.4 % (96.7–99.9 %) | 99.3 % (97.9–99.9 %) | 71.9 % (67.0–76.8 %) | 78.6 % (73.8–83.4 %) | 65.0 % (60.3–69.7 %) | 77.3 % (71.4–83.2 %) | 72.7 % (66.8–78.6 %) |
| Accuracy | 90.4 % (87.8–93.0 %) | 87.6 % (84.4–90.4 %) | 73.6 % (69.7–77.5 %) | 74.0 % (70.2–77.8 %) | 78.0 % (74.4–81.6 %) | 70.0 % (65.9–74.0 %) | 67.2 % (63.1–71.3 %) | 66.0 % (61.8–70.1 %) |
| AUROC | 0.91 (0.88–0.93) | 0.90 (0.88–0.93) | 0.75 (0.71–0.80) | 0.73 (0.69–0.78) | 0.78 (0.73–0.82) | 0.68 (0.63–0.73) | 0.68 (0.64–0.73) | 0.67 (0.65–0.70) |
| Mean confidence | 82.3 % (80.6–84.0 %) | 94.9 % (70.0–100.0 %) | 90.4 % (89.7–91.0 %) | 75.6 % (74.9–76.3 %) | 75.2 % (74.5–75.9 %) | 78.1 % (77.1–79.1 %) | 87.5 % (86.4–88.6 %) | 75.3 % (74.4–76.3 %) |
PPV, positive predictive value; NPV, negative predictive value; AUROC, area under the receiver operating characteristics curve.
AI is superior to all junior endoscopists in terms of accuracy and AUROC (all P < 0.01). Numbers in brackets refer to 95 % confidence intervals.
Comparison of the performance of Group A and Group B endoscopists.
| | Group A endoscopists | | | Group B endoscopists | | |
| | 1st part of validation set | 2nd part of validation set | P value | 1st part of validation set | 2nd part of validation set | P value |
| Sensitivity | 72.0 % (68.7–75.4 %) | 82.7 % (79.8–85.6 %) | 0.049 | 75.1 % (71.9–78.5 %) | 64.6 % (61.0–68.2 %) | < 0.001 |
| Specificity | 67.0 % (63.7–70.1 %) | 68.1 % (64.8–71.4 %) | 0.80 | 61.6 % (58.3–64.9 %) | 70.4 % (67.2–73.5 %) | < 0.001 |
| PPV | 63.9 % (60.5–67.3 %) | 68.4 % (65.1–71.6 %) | 0.29 | 61.2 % (57.9–64.5 %) | 65.0 % (61.5–68.6 %) | 0.50 |
| NPV | 74.7 % (71.6–77.9 %) | 82.5 % (79.5–85.5 %) | 0.049 | 75.5 % (72.2–78.8 %) | 70.0 % (66.9–73.2 %) | 0.12 |
| Accuracy | 69.3 % (67.0–71.6 %) | 74.7 % (72.5–77.0 %) | 0.003 | 67.7 % (65.3–70.0 %) | 67.7 % (65.3–70.1 %) | 0.11 |
| AUROC | 0.69 (0.67–0.72) | 0.75 (0.72–0.77) | 0.02 | 0.68 (0.66–0.71) | 0.67 (0.65–0.70) | 0.12 |
| Mean confidence | 80.7 % (80.1–81.3 %) | 80.4 % (79.9–80.9 %) | 0.88 | 82.8 % (82.1–83.5 %) | 80.3 % (79.7–80.9 %) | < 0.001 |
PPV, positive predictive value; NPV, negative predictive value; AUROC, area under the receiver operating characteristics curve.
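The P values in this table compare each group's performance on the first and second parts of the validation set. The record does not say which statistical test was used, so the sketch below is only one plausible approach: a pooled two-proportion z-test with a two-sided P value, treating the two sets of reads as independent samples. The function name and the counts are hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided P value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts of correct reads, for illustration only (not study data):
# 690 of 1,000 correct on the first part, 750 of 1,000 on the second part.
z, p_value = two_proportion_z_test(690, 1000, 750, 1000)
print(f"z = {z:.2f}, P = {p_value:.4f}")
```

Because the same endoscopists read both parts, their reads are not strictly independent, and a method that accounts for clustering by reader may be more appropriate than this simple test.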