Bum-Joo Cho1,2, Jeong-Won Kim3, Jungkap Park1, Gui-Young Kwon4, Mineui Hong5, Si-Hyong Jang6, Heejin Bang7, Gilhyang Kim3, Sung-Taek Park8.
Abstract
Artificial intelligence has enabled the automated diagnosis of several cancer types. We aimed to develop and validate deep learning models that automatically classify cervical intraepithelial neoplasia (CIN) based on histological images. Microscopic images of CIN3, CIN2, CIN1, and non-neoplasm were obtained. The performances of two pre-trained convolutional neural network (CNN) models adopting DenseNet-161 and EfficientNet-B7 architectures were evaluated and compared with those of pathologists. The dataset comprised 1106 images from 588 patients; images of 10% of patients were included in the test dataset. The mean accuracies for the four-class classification were 88.5% (95% confidence interval [CI], 86.3-90.6%) by DenseNet-161 and 89.5% (95% CI, 83.3-95.7%) by EfficientNet-B7, which were similar to human performance (93.2% and 89.7%). The mean per-class area under the receiver operating characteristic curve values by EfficientNet-B7 were 0.996, 0.990, 0.971, and 0.956 in the non-neoplasm, CIN3, CIN1, and CIN2 groups, respectively. The class activation map detected the diagnostic area for CIN lesions. In the three-class classification of CIN2 and CIN3 as one group, the mean accuracies of DenseNet-161 and EfficientNet-B7 increased to 91.4% (95% CI, 88.8-94.0%), and 92.6% (95% CI, 90.4-94.9%), respectively. CNN-based deep learning is a promising tool for diagnosing CIN lesions on digital histological images.
Keywords: artificial intelligence; cervical intraepithelial neoplasia; convolutional neural network; deep learning; histology image
Year: 2022 PMID: 35204638 PMCID: PMC8871214 DOI: 10.3390/diagnostics12020548
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Data composition for the first splitting of the training and test datasets.
| Class | Whole Dataset: Image N | Whole Dataset: Patient N | Training Set: Image N | Training Set: Patient N | Test Set: Image N | Test Set: Patient N |
|---|---|---|---|---|---|---|
| Overall | 1106 | 588 | 989 | 542 | 117 | 68 |
| CIN 3 | 266 | 183 | 236 | 165 | 30 | 19 |
| CIN 2 | 231 | 108 | 210 | 97 | 21 | 11 |
| CIN 1 | 266 | 143 | 234 | 129 | 32 | 14 |
| Non-neoplasm | 343 | 250 | 309 | 225 | 34 | 25 |
N, numbers; CIN, cervical intraepithelial neoplasia.
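The abstract states that images from 10% of patients formed the test dataset. One common way to build such a hold-out is to split by patient rather than by image, so that all images from a held-out patient land in the test set. The following is a minimal sketch under that assumption; the source does not describe the authors' splitting code or whether the split was stratified by class, and the function name, the `image_to_patient` mapping, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict

def split_by_patient(image_to_patient, test_fraction=0.10, seed=0):
    """Split image IDs into train/test lists so that no patient is in both."""
    # Group image IDs under the patient they belong to.
    by_patient = defaultdict(list)
    for image_id, patient_id in image_to_patient.items():
        by_patient[patient_id].append(image_id)

    # Shuffle patients (not images) with a fixed seed for reproducibility.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Hold out the requested fraction of patients, at least one.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [img for p in patients[n_test:] for img in by_patient[p]]
    test = [img for p in patients[:n_test] for img in by_patient[p]]
    return train, test, test_patients

# Hypothetical toy data: 20 patients with two images each.
toy = {f"img{p}_{k}": f"patient{p}" for p in range(20) for k in range(2)}
train_ids, test_ids, held_out = split_by_patient(toy)
print(len(train_ids), len(test_ids), sorted(held_out))
```

Splitting on patients instead of images avoids near-duplicate tissue from the same patient appearing on both sides of the split, which would otherwise inflate test accuracy.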
Accuracies of deep learning models.
| Metric | Four-Class: DenseNet-161 | Four-Class: EfficientNet-B7 | Three-Class: DenseNet-161 | Three-Class: EfficientNet-B7 |
|---|---|---|---|---|
| Mean accuracy | 0.885 | 0.895 | 0.914 | 0.926 |
| 95% CI | 0.863–0.906 | 0.833–0.957 | 0.888–0.940 | 0.904–0.949 |
| Test 1 | 0.906 | 0.957 | 0.940 | 0.949 |
| Test 2 | 0.873 | 0.853 | 0.901 | 0.919 |
| Test 3 | 0.875 | 0.875 | 0.901 | 0.911 |
CI, confidence interval.
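The mean accuracies and intervals in this table can be approximately recovered from the three test-split accuracies. The sketch below assumes a normal-approximation interval, mean ± 1.96 × (sample SD / √n); the source does not state how the intervals were computed, so this is an illustration rather than the authors' method. With the tabulated inputs it agrees with the reported bounds to within about 0.001, which is consistent with rounding of the per-test values.

```python
import math
import statistics

def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) using mean +/- z * sample SD / sqrt(n)."""
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width

# Test 1-3 accuracies copied from the table above.
runs = {
    "DenseNet-161, four-class": [0.906, 0.873, 0.875],
    "EfficientNet-B7, four-class": [0.957, 0.853, 0.875],
    "DenseNet-161, three-class": [0.940, 0.901, 0.901],
    "EfficientNet-B7, three-class": [0.949, 0.919, 0.911],
}

for name, accuracies in runs.items():
    mean, lower, upper = mean_with_ci(accuracies)
    print(f"{name}: {mean:.3f} ({lower:.3f}-{upper:.3f})")
```

With only three test splits, an interval based on the t distribution would be considerably wider; the normal approximation is used here only because it matches the tabulated values.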
Figure 1. A training curve showing training and validation accuracies. The validation accuracy reached a plateau within 20 epochs during model training.
Figure 2. Heatmaps of the confusion matrices of the best-performing CNN models and human pathologists in the four-class classification. The best-performing DenseNet-161 model (a) had three false-negative cases, whereas the best-performing EfficientNet-B7 model (b) had no false-negative or false-positive cases. Pathologist 1 (c) classified CIN2 with higher sensitivity than pathologist 2 (d).
Per-class performances of the deep learning models in the four-class classification.
| Model/Class | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | F1 Score | AUC (95% CI) |
|---|---|---|---|---|---|---|
| DenseNet-161 | ||||||
| CIN3 | 95.3 (93.7–96.8) | 94.4 (93.1–95.6) | 85.0 (82.3–87.7) | 98.3 (97.8–98.9) | 89.8 (88.8–90.8) | 0.989 (0.982–0.996) |
| CIN2 | 75.2 (67.7–82.8) | 94.1 (91.8–96.4) | 76.1 (62.3–89.9) | 93.8 (92.5–95.0) | 75.5 (64.9–86.1) | 0.947 (0.932–0.963) |
| CIN1 | 82.1 (77.8–86.5) | 98.3 (97.4–99.2) | 94.2 (92.2–96.2) | 94.5 (92.6–96.4) | 87.7 (84.5–91.0) | 0.979 (0.968–0.990) |
| Non-neoplasm | 95.6 (90.9–100.0) | 98.0 (96.3–99.7) | 95.0 (91.0–99.0) | 98.3 (96.6–100.0) | 95.2 (92.0–98.4) | 0.996 (0.991–1.000) |
| EfficientNet-B7 | ||||||
| CIN3 | 97.5 (95.4–99.5) | 96.3 (94.1–98.6) | 90.0 (84.2–95.8) | 99.1 (98.4–99.8) | 93.6 (89.6–97.5) | 0.990 (0.981–0.999) |
| CIN2 | 73.0 (62.2–83.9) | 96.7 (93.7–99.7) | 86.8 (75.2–98.4) | 93.6 (92.3–94.8) | 79.1 (69.1–89.1) | 0.956 (0.946–0.967) |
| CIN1 | 85.2 (73.3–97.1) | 96.3 (95.1–97.6) | 88.5 (88.2–88.8) | 95.5 (91.3–99.8) | 86.5 (80.4–92.6) | 0.971 (0.950–0.993) |
| Non-neoplasm | 95.6 (90.9–100.0) | 96.3 (92.2–100.0) | 92.3 (84.8–99.8) | 98.3 (96.6–100.0) | 93.8 (88.8–98.8) | 0.996 (0.992–0.999) |
PPV, positive predictive value; NPV, negative predictive value; AUC, area under the receiver operating characteristic curve; CI, confidence interval.
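Per-class sensitivity, specificity, PPV, NPV, and F1 score in a multi-class setting are typically derived one-vs-rest from the confusion matrix: each class in turn is treated as "positive" and all others as "negative". The sketch below shows that arithmetic. The confusion matrix is made-up illustrative data, not the study's results, and the function name is hypothetical.

```python
def per_class_metrics(cm, labels):
    """One-vs-rest metrics; cm[i][j] = count of true class i predicted as j.

    Assumes every class has at least one true and one predicted image,
    so that none of the denominators is zero.
    """
    total = sum(sum(row) for row in cm)
    results = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                  # true class k, predicted other
        fp = sum(row[k] for row in cm) - tp   # other class, predicted k
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn)
        ppv = tp / (tp + fp)
        results[label] = {
            "sensitivity": sensitivity,
            "specificity": tn / (tn + fp),
            "ppv": ppv,
            "npv": tn / (tn + fn),
            "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        }
    return results

# Made-up four-class confusion matrix (rows: true class, columns: predicted).
labels = ["Non-neoplasm", "CIN1", "CIN2", "CIN3"]
cm = [
    [9, 1, 0, 0],
    [1, 8, 1, 0],
    [0, 1, 7, 2],
    [0, 0, 1, 9],
]

for label, m in per_class_metrics(cm, labels).items():
    print(label, {name: round(value, 3) for name, value in m.items()})
```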
Figure 3. Per-class ROC curves for the four-class classification by the best-performing CNN models. For the best-performing DenseNet-161 (a) and EfficientNet-B7 (b) models, the AUC was higher for discriminating non-neoplasm and CIN3 than for classifying CIN2 and CIN1.
Figure A1. Histology of cases misclassified by the CNN models. A case with scarce koilocytotic cells but basal atypia was false-negative (a). CIN3 showing basal/parabasal-type atypia throughout most, but not all, of the epithelium was downgraded to CIN2 (b). CIN2 (c) downgraded to CIN1 showed koilocytotic changes in the upper half and maturation in the uppermost layers but had atypia focally extending to the lower half of the epithelium (black arrow). In CIN1 upgraded to CIN2, the epithelium was disoriented (d). CIN2 with koilocytosis (e) and atrophic CIN2 (f) were upgraded to CIN3.
Figure 4. Heatmaps of the confusion matrices of the best-performing CNN models and human pathologists in the three-class classification. The overall accuracies increased to 94.0% for DenseNet-161 (a) and 94.9% for EfficientNet-B7 (b), similar to those of human pathologists 1 and 2, 95.7% (c) and 92.3% (d), respectively.
Per-class performances of the deep learning models in the three-class classification.
| Model/Class | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | F1 Score | AUC (95% CI) |
|---|---|---|---|---|---|---|
| DenseNet-161 | ||||||
| CIN2-3 | 92.0 (86.9–97.1) | 92.4 (85.3–99.6) | 92.5 (87.0–98.0) | 93.4 (90.2–96.7) | 92.1 (88.9–95.3) | 0.981 (0.973–0.989) |
| CIN1 | 80.9 (70.9–90.8) | 96.0 (94.2–97.7) | 87.0 (84.0–89.9) | 94.5 (93.3–95.6) | 83.5 (77.6–89.4) | 0.974 (0.968–0.980) |
| Non-neoplasm | 97.8 (94.2–100.0) | 97.5 (95.6–99.5) | 94.4 (90.0–98.9) | 99.1 (97.6–100.0) | 95.9 (95.5–96.4) | 0.996 (0.992–0.999) |
| EfficientNet-B7 | ||||||
| CIN2-3 | 94.8 (92.8–96.7) | 93.4 (90.1–96.8) | 92.9 (90.3–95.6) | 95.1 (92.3–97.9) | 93.8 (91.7–96.0) | 0.982 (0.971–0.993) |
| CIN1 | 86.1 (82.4–89.7) | 96.4 (95.2–97.5) | 87.6 (81.2–94.0) | 95.6 (94.3–96.9) | 86.8 (82.1–91.4) | 0.979 (0.972–0.985) |
| Non-neoplasm | 94.7 (92.8–96.6) | 98.4 (97.0–99.7) | 96.0 (92.8–99.2) | 97.8 (97.1–98.6) | 95.3 (94.0–96.6) | 0.993 (0.985–1.000) |
PPV, positive predictive value; NPV, negative predictive value; AUC, area under the receiver operating characteristic curve; CI, confidence interval.
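Grouping CIN2 and CIN3 into one class can raise accuracy because confusions between the two grades no longer count as errors. The sketch below illustrates that arithmetic by collapsing a four-class confusion matrix into three classes. It is only an illustration: the source does not say whether the three-class results came from retrained models or from merged predictions, the matrix is made-up data, and the function names are hypothetical.

```python
def merge_classes(cm, labels, merge, new_label):
    """Collapse the classes listed in `merge` into a single class."""
    new_labels = [l for l in labels if l not in merge] + [new_label]
    # Map each original label to its index in the merged label list.
    target = {
        l: new_labels.index(new_label if l in merge else l) for l in labels
    }
    n = len(new_labels)
    merged = [[0] * n for _ in range(n)]
    for i, true_label in enumerate(labels):
        for j, pred_label in enumerate(labels):
            merged[target[true_label]][target[pred_label]] += cm[i][j]
    return merged, new_labels

def accuracy(cm):
    """Fraction of images on the diagonal of the confusion matrix."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))

# Made-up four-class confusion matrix (rows: true class, columns: predicted).
labels4 = ["Non-neoplasm", "CIN1", "CIN2", "CIN3"]
cm4 = [
    [9, 1, 0, 0],
    [1, 8, 1, 0],
    [0, 1, 7, 2],
    [0, 0, 1, 9],
]

cm3, labels3 = merge_classes(cm4, labels4, {"CIN2", "CIN3"}, "CIN2-3")
print(labels3, cm3)
print(f"four-class: {accuracy(cm4):.3f}, three-class: {accuracy(cm3):.3f}")
```

In this toy matrix the three CIN2/CIN3 confusions move onto the diagonal after merging, which accounts for the whole gain in accuracy.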
Figure 5. Grad-CAM images by EfficientNet-B7. Normal squamous epithelium was highlighted in Grad-CAM images (a–d). Images from the cervix interpreted as non-neoplasm by EfficientNet-B7 include exocervix (a), metaplastic mucosa from the transformation zone (b), cervicitis and erosion (c), and atrophic mucosa (d). In CIN1, layers with koilocytotic cells were mainly highlighted (e). The highlighted areas extended to the upper two-thirds of the epithelium in CIN2 (f) and the full thickness of the epithelium in CIN3 (g). Normal endocervical glands ((g), black arrows) were not highlighted.