Sun-Ju Byeon, Jungkap Park, Yoon Ah Cho, Bum-Joo Cho.
Abstract
Colonoscopy is an effective tool for detecting colorectal lesions, but it requires the support of pathological diagnosis. This study aimed to develop and validate deep learning models that automatically classify digital pathology images of colon lesions obtained from colonoscopy-related specimens. Histopathological slides of colonoscopic biopsy or resection specimens were collected and grouped into six classes by disease category: adenocarcinoma, tubular adenoma (TA), traditional serrated adenoma (TSA), sessile serrated adenoma (SSA), hyperplastic polyp (HP), and non-specific lesions. Digital photographs of each pathological slide were used to fine-tune two pre-trained convolutional neural networks, and the model performances were evaluated. A total of 1865 images from 703 patients were included, of which 10% were used as a test dataset. For six-class classification, the mean diagnostic accuracy was 97.3% (95% confidence interval [CI], 96.0-98.6%) by DenseNet-161 and 95.9% (95% CI 94.1-97.7%) by EfficientNet-B7. The per-class area under the receiver operating characteristic curve (AUC) was highest for adenocarcinoma (1.000; 95% CI 0.999-1.000) by DenseNet-161 and for TSA (1.000; 95% CI 1.000-1.000) by EfficientNet-B7. The lowest per-class AUCs were still excellent: 0.991 (95% CI 0.983-0.999) for HP by DenseNet-161 and 0.995 (95% CI 0.992-0.998) for SSA by EfficientNet-B7. The deep learning models also discriminated adenocarcinoma from non-adenocarcinoma lesions with excellent performance, with AUCs of 0.995 (DenseNet-161) and 0.998 (EfficientNet-B7). The pathognomonic area for each class was appropriately highlighted in the digital images by saliency maps, which focused particularly on epithelial lesions. Deep learning models might be a useful tool to assist the diagnosis of pathological slides of colonoscopy-related specimens.
Year: 2022 PMID: 35896791 PMCID: PMC9329279 DOI: 10.1038/s41598-022-16885-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Data composition of the first dataset split used to build the deep learning models.
| Class | Whole dataset, images (N) | Whole dataset, patients (N) | Training set, images (N) | Training set, patients (N) | Tuning set, images (N) | Tuning set, patients (N) | Test set, images (N) | Test set, patients (N) |
|---|---|---|---|---|---|---|---|---|
| Overall | 1865 | 703 | 1484 | 561 | 173 | 71 | 208 | 71 |
| ADC | 429 | 206 | 332 | 158 | 42 | 23 | 55 | 25 |
| TA | 462 | 231 | 357 | 179 | 46 | 25 | 59 | 27 |
| TSA | 150 | 71 | 137 | 61 | 3 | 3 | 10 | 7 |
| SSA | 278 | 192 | 220 | 149 | 34 | 26 | 24 | 17 |
| HP | 189 | 161 | 154 | 132 | 12 | 10 | 23 | 19 |
| NC | 357 | 261 | 284 | 208 | 36 | 28 | 37 | 25 |
ADC advanced tubular adenocarcinoma, TA tubular adenoma, TSA traditional serrated adenoma, SSA sessile serrated adenoma, HP hyperplastic polyp, NC nonspecific change.
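The split proportions implied by the table can be checked with a few lines of Python. This is only an arithmetic sketch: the counts are copied from the Overall row, while the dictionary layout and the helper name `split_shares` are illustrative assumptions and not part of the study.

```python
# Counts copied from the "Overall" row of the data-composition table.
OVERALL = {
    "images": {"training": 1484, "tuning": 173, "test": 208},
    "patients": {"training": 561, "tuning": 71, "test": 71},
}


def split_shares(counts):
    """Return each subset's share of the total, in percent (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


image_shares = split_shares(OVERALL["images"])
patient_shares = split_shares(OVERALL["patients"])
print(image_shares)
print(patient_shares)
```

By this arithmetic the test set holds about 11.2% of the images and about 10.1% of the patients, so the 10% figure quoted in the abstract appears to correspond more closely to the patient-level share.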
Figure 1. Heatmap of the confusion matrix for the best-performing models: (A) DenseNet-161, (B) EfficientNet-B7.
Per-class model performances of deep learning models for six-class classification.
| Model | Class | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | F1 score (%) | AUC (95% CI) |
|---|---|---|---|---|---|---|---|
| DenseNet-161 | ADC | 99.4 (98.4–100.0) | 99.6 (98.9–100.0) | 98.2 (95.4–100.0) | 99.8 (99.4–100.0) | 98.8 (97.5–100.0) | 1.000 (0.999–1.000) |
| DenseNet-161 | TA | 97.7 (95.3–100.0) | 98.7 (97.6–99.7) | 95.6 (91.3–99.8) | 99.3 (98.7–99.9) | 96.6 (93.3–99.9) | 0.995 (0.989–1.000) |
| DenseNet-161 | TSA | 98.4 (95.9–100.0) | 100.0 (100.0–100.0) | 100.0 (100.0–100.0) | 99.8 (99.5–100.0) | 99.2 (97.9–100.0) | 0.999 (0.999–1.000) |
| DenseNet-161 | SSA | 96.5 (93.4–99.5) | 99.1 (98.8–99.3) | 93.8 (91.4–96.1) | 99.4 (98.8–100.0) | 95.0 (94.0–96.1) | 0.993 (0.985–1.000) |
| DenseNet-161 | HP | 91.5 (87.7–95.3) | 99.6 (99.3–99.9) | 97.0 (94.7–99.4) | 99.1 (98.8–99.3) | 94.1 (92.9–95.3) | 0.991 (0.983–0.999) |
| DenseNet-161 | NC | 97.7 (94.0–100.0) | 99.8 (99.5–100.0) | 99.1 (97.7–100.0) | 99.3 (98.3–100.0) | 98.4 (96.7–100.0) | 0.995 (0.986–1.000) |
| EfficientNet-B7 | ADC | 97.3 (94.7–99.8) | 99.8 (99.4–100.0) | 99.1 (97.6–100.0) | 99.1 (98.2–100.0) | 98.1 (96.7–99.6) | 0.997 (0.991–1.000) |
| EfficientNet-B7 | TA | 95.3 (94.2–96.4) | 97.8 (95.6–100.0) | 93.9 (88.7–99.0) | 98.4 (98.1–98.8) | 94.5 (92.3–96.6) | 0.997 (0.994–0.999) |
| EfficientNet-B7 | TSA | 95.1 (90.5–99.7) | 100.0 (100.0–100.0) | 100.0 (100.0–100.0) | 99.7 (99.4–99.9) | 97.4 (95.0–99.9) | 1.000 (1.000–1.000) |
| EfficientNet-B7 | SSA | 97.5 (95.5–99.6) | 99.2 (99.0–99.5) | 95.0 (92.6–97.4) | 99.6 (99.3–99.9) | 96.2 (94.3–98.2) | 0.995 (0.992–0.998) |
| EfficientNet-B7 | HP | 93.6 (87.5–99.6) | 99.2 (98.9–99.6) | 93.7 (92.2–95.2) | 99.3 (98.5–100.0) | 93.5 (91.2–95.8) | 0.995 (0.995–0.995) |
| EfficientNet-B7 | NC | 95.0 (90.9–99.0) | 98.8 (97.8–99.8) | 95.1 (91.1–99.1) | 98.8 (97.7–99.8) | 94.9 (92.5–97.4) | 0.997 (0.994–1.000) |
PPV positive predictive value, NPV negative predictive value, AUC area under the receiver operating characteristic curve, CI confidence interval, ADC advanced tubular adenocarcinoma, TA tubular adenoma, TSA traditional serrated adenoma, SSA sessile serrated adenoma, HP hyperplastic polyp, NC nonspecific change.
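The per-class figures in this table are one-vs-rest quantities, which can be derived from a multi-class confusion matrix. The following is a minimal sketch under stated assumptions: the function name `one_vs_rest_metrics` and the 3-class matrix `TOY` are hypothetical and contain made-up counts, not study data, and the source does not describe how the authors computed or averaged their metrics.

```python
def one_vs_rest_metrics(confusion, class_index):
    """Sensitivity, specificity, PPV, NPV and F1 for one class.

    confusion[i][j] is the number of images whose true class is i and
    whose predicted class is j. Zero denominators are not handled here.
    """
    n_classes = len(confusion)
    tp = confusion[class_index][class_index]
    fn = sum(confusion[class_index]) - tp  # rest of the true-class row
    fp = sum(confusion[i][class_index] for i in range(n_classes)) - tp  # rest of the column
    tn = sum(map(sum, confusion)) - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical 3-class confusion matrix (made-up counts, not study data).
TOY = [
    [18, 2, 0],
    [1, 27, 2],
    [0, 1, 9],
]
metrics = one_vs_rest_metrics(TOY, 0)
print(metrics)
```

For class 0 of the toy matrix this gives 18 true positives, 2 false negatives, 1 false positive, and 39 true negatives, hence a sensitivity of 0.90 and a specificity of 0.975.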
Figure 2. Per-class receiver operating characteristic curves for the best-performing models: (A) DenseNet-161, (B) EfficientNet-B7.
Model performances of deep learning models for binary classification discriminating advanced colorectal adenocarcinoma.
| Model | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | F1 score (%) | AUC (95% CI) |
|---|---|---|---|---|---|---|
| DenseNet-161 | 97.1 (96.5–97.8) | 99.8 (99.4–100.0) | 99.1 (97.6–100.0) | 99.1 (98.8–99.5) | 98.1 (97.3–98.9) | 0.995 (0.988–1.000) |
| EfficientNet-B7 | 98.5 (97.2–99.8) | 99.8 (99.4–100.0) | 99.1 (97.6–100.0) | 99.6 (99.2–99.9) | 98.8 (97.5–100.0) | 0.998 (0.995–1.000) |
PPV positive predictive value, NPV negative predictive value, AUC area under the receiver operating characteristic curve, CI confidence interval.
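As a reminder of what the AUC column measures, the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below illustrates that pairwise definition only; the function name `auc_from_scores` and all scores are hypothetical, and the source does not state which software the authors used to compute AUCs or their confidence intervals.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 when the positive score is higher and 0.5 when tied.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model scores for the adenocarcinoma class (made-up values).
adc_scores = [0.95, 0.90, 0.80, 0.40]
non_adc_scores = [0.70, 0.30, 0.20, 0.10, 0.40]
auc = auc_from_scores(adc_scores, non_adc_scores)
print(auc)
```

In this toy example 18.5 of the 20 pairs are ranked correctly, so the AUC is 0.925. The quadratic loop is chosen for clarity; a rank-based computation would be preferable for large test sets.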
Figure 3. Representative images of Grad-CAM for each class: (A) adenocarcinoma, (B) tubular adenoma, (C) traditional serrated adenoma, (D) sessile serrated adenoma, (E) hyperplastic polyp, and (F) non-specific change.
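For readers unfamiliar with Grad-CAM, its usual formulation is given below. This is the standard definition from the Grad-CAM literature, not a description of the authors' specific implementation, and the excerpt does not say which network layer was used. The symbols are introduced here for illustration: $A^k$ is the $k$-th feature map of the chosen convolutional layer, $y^c$ is the pre-softmax score for class $c$, and $Z$ is the number of spatial positions in the feature map.

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L_{\text{Grad-CAM}}^c = \operatorname{ReLU}\!\left( \sum_{k} \alpha_k^c A^k \right)
```

The weights $\alpha_k^c$ average the gradients over all spatial positions, and the ReLU keeps only the regions that contribute positively to class $c$, which is why the resulting heatmap highlights the areas supporting the predicted diagnosis.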