Jaesung Heo1, June Hyuck Lim1, Hye Ran Lee2, Jeon Yeob Jang2, Yoo Seob Shin2, Dahee Kim3, Jae Yol Lim3, Young Min Park3, Yoon Woo Koh3, Soon-Hyun Ahn4, Eun-Jae Chung4, Doh Young Lee4, Jungirl Seok5, Chul-Ho Kim6.
Abstract
In this study, we developed a deep learning model to identify patients with tongue cancer from a validated dataset of oral endoscopic images. We retrospectively constructed a dataset of 12,400 verified endoscopic images from five university hospitals in South Korea, collected between 2010 and 2020 with the participation of otolaryngologists. Several deep learning models were developed using various convolutional neural network (CNN) architectures to calculate the probability of malignancy. Of the 12,400 total images, 5576 images related to the tongue were extracted. The CNN models showed a mean area under the receiver operating characteristic curve (AUROC) of 0.845 and a mean area under the precision-recall curve (AUPRC) of 0.892. The best model was DenseNet169 (AUROC 0.895 and AUPRC 0.918). The deep learning model, general physicians, and oncology specialists had sensitivities of 81.1%, 77.3%, and 91.7%; specificities of 86.8%, 75.0%, and 90.9%; and accuracies of 84.7%, 75.9%, and 91.2%, respectively. Fair agreement between the oncologist and the developed model was shown for cancer diagnosis (kappa value = 0.685). The deep learning model developed from the verified endoscopic image dataset showed acceptable performance in tongue cancer diagnosis.
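The sensitivity, specificity, and accuracy figures quoted in the abstract are standard confusion-matrix ratios. The following is a minimal sketch of how such values are derived; the function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn count malignant images classified correctly/incorrectly,
    tn/fp count non-malignant images classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)                # true positive rate
    specificity = tn / (tn + fp)                # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # overall fraction correct
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not from the paper).
sens, spec, acc = diagnostic_metrics(tp=80, fp=15, tn=85, fn=20)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because accuracy weights both classes by their frequency, it depends on the malignancy prevalence of the test set, whereas sensitivity and specificity do not.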
Year: 2022 PMID: 35428854 PMCID: PMC9012779 DOI: 10.1038/s41598-022-10287-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Dataset characteristics.
| Hospital | Diagnosis | n | % |
|---|---|---|---|
| AUH | Non-malignancy | 1867 | 75.71 |
| AUH | Malignancy | 599 | 24.29 |
| SNUH | Non-malignancy | 157 | 23.54 |
| SNUH | Malignancy | 510 | 76.46 |
| NCC | Non-malignancy | 648 | 94.74 |
| NCC | Malignancy | 36 | 5.26 |
| BRH | Non-malignancy | 220 | 62.50 |
| BRH | Malignancy | 132 | 37.50 |
| YUH | Non-malignancy | 743 | 52.81 |
| YUH | Malignancy | 664 | 47.19 |
| Total | Non-malignancy | 3635 | 65.19 |
| Total | Malignancy | 1941 | 34.81 |
| Total | Total | 5576 | 100.00 |
AUH, Ajou University Hospital; SNUH, Seoul National University Hospital; NCC, National Cancer Center; BRH, Boramae Medical Center; YUH, Yonsei University Hospital.
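The class balance differs widely between hospitals, which matters when comparing prevalence-dependent metrics across sites. As a quick check, the sketch below recomputes the malignancy share per hospital from the counts in the table above; the variable and function names are illustrative choices, not part of the source.

```python
# (non-malignant, malignant) image counts per hospital, copied from the table.
counts = {
    "AUH": (1867, 599),
    "SNUH": (157, 510),
    "NCC": (648, 36),
    "BRH": (220, 132),
    "YUH": (743, 664),
}


def malignancy_share(non_malignant, malignant):
    """Percentage of images labelled malignant."""
    return 100.0 * malignant / (non_malignant + malignant)


shares = {site: round(malignancy_share(*c), 2) for site, c in counts.items()}
total_non = sum(c[0] for c in counts.values())
total_mal = sum(c[1] for c in counts.values())
overall = round(malignancy_share(total_non, total_mal), 2)

for site, share in shares.items():
    print(f"{site}: {share:.2f}% malignant")
print(f"Total: {total_non + total_mal} images, {overall:.2f}% malignant")
```

The recomputed shares range from about 5% (NCC) to about 76% (SNUH), in line with the percentages listed in the table.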
Figure 1. Overview of the development and evaluation of the tongue cancer diagnosis algorithm.
Diagnostic performance of CNN models in internal validation (a) and external validation (b).
| Model | Sensitivity (95% CI) | Specificity (95% CI) | Precision (95% CI) | F1-score (95% CI) | Accuracy (95% CI) | AUROC (95% CI) | AUPRC (95% CI) |
|---|---|---|---|---|---|---|---|
| (a) Internal validation | | | | | | | |
| CNN | 0.712 (0.685–0.739) | 0.860 (0.839–0.881) | 0.733 (0.706–0.760) | 0.720 (0.693–0.747) | 0.809 (0.785–0.833) | 0.882 (0.862–0.902) | 0.932 (0.917–0.947) |
| VGG16 | 0.822 (0.799–0.845) | 0.911 (0.894–0.928) | 0.832 (0.809–0.855) | 0.826 (0.803–0.849) | 0.880 (0.860–0.900) | 0.950 (0.937–0.963) | 0.974 (0.964–0.984) |
| VGG19 | 0.801 (0.777–0.825) | 0.910 (0.893–0.927) | 0.828 (0.805–0.851) | 0.813 (0.789–0.837) | 0.872 (0.852–0.892) | 0.941 (0.927–0.955) | 0.969 (0.959–0.979) |
| DenseNet121 | 0.886 (0.867–0.905) | 0.913 (0.896–0.930) | 0.844 (0.822–0.866) | 0.864 (0.843–0.885) | 0.904 (0.886–0.922) | 0.959 (0.947–0.971) | 0.977 (0.968–0.986) |
| DenseNet169 | 0.890 (0.871–0.909) | 0.921 (0.905–0.937) | 0.859 (0.838–0.880) | 0.873 (0.853–0.893) | 0.910 (0.893–0.927) | 0.960 (0.948–0.972) | 0.977 (0.968–0.986) |
| DenseNet201 | 0.866 (0.845–0.887) | 0.928 (0.912–0.944) | 0.866 (0.845–0.887) | 0.865 (0.844–0.886) | 0.907 (0.889–0.925) | 0.960 (0.948–0.972) | 0.978 (0.969–0.987) |
| MobileNetV1 | 0.817 (0.794–0.840) | 0.913 (0.896–0.930) | 0.840 (0.818–0.862) | 0.822 (0.799–0.845) | 0.879 (0.859–0.899) | 0.946 (0.932–0.960) | 0.969 (0.959–0.979) |
| MobileNetV2 | 0.612 (0.582–0.642) | 0.925 (0.909–0.941) | 0.819 (0.796–0.842) | 0.782 (0.757–0.807) | 0.817 (0.794–0.840) | 0.931 (0.916–0.946) | 0.961 (0.949–0.973) |
| ResNet34 | 0.690 (0.662–0.718) | 0.842 (0.820–0.864) | 0.709 (0.681–0.737) | 0.687 (0.659–0.715) | 0.789 (0.764–0.814) | 0.873 (0.853–0.893) | 0.934 (0.919–0.949) |
| ResNet101 | 0.710 (0.683–0.737) | 0.905 (0.887–0.923) | 0.802 (0.778–0.826) | 0.749 (0.723–0.775) | 0.838 (0.816–0.860) | 0.920 (0.904–0.936) | 0.957 (0.945–0.969) |
| ResNet152 | 0.744 (0.718–0.770) | 0.908 (0.891–0.925) | 0.812 (0.788–0.836) | 0.775 (0.750–0.800) | 0.851 (0.829–0.873) | 0.926 (0.910–0.942) | 0.960 (0.948–0.972) |
| EfficientNetB3 | 0.618 (0.589–0.647) | 0.920 (0.904–0.936) | 0.804 (0.780–0.828) | 0.681 (0.653–0.709) | 0.815 (0.791–0.839) | 0.899 (0.881–0.917) | 0.944 (0.930–0.958) |
| (b) External validation | | | | | | | |
| CNN | 0.767 (0.723–0.811) | 0.563 (0.511–0.615) | 0.521 (0.469–0.573) | 0.614 (0.563–0.665) | 0.639 (0.589–0.689) | 0.716 (0.669–0.763) | 0.818 (0.778–0.858) |
| VGG16 | 0.701 (0.653–0.749) | 0.821 (0.781–0.861) | 0.706 (0.658–0.754) | 0.700 (0.652–0.748) | 0.776 (0.732–0.820) | 0.866 (0.830–0.902) | 0.917 (0.888–0.946) |
| VGG19 | 0.642 (0.592–0.692) | 0.893 (0.861–0.925) | 0.784 (0.741–0.827) | 0.704 (0.656–0.752) | 0.799 (0.757–0.841) | 0.887 (0.854–0.920) | 0.930 (0.903–0.957) |
| DenseNet121 | 0.795 (0.753–0.837) | 0.831 (0.792–0.870) | 0.750 (0.705–0.795) | 0.765 (0.721–0.809) | 0.817 (0.777–0.857) | 0.885 (0.852–0.918) | 0.906 (0.876–0.936) |
| DenseNet169 | 0.793 (0.751–0.835) | 0.853 (0.816–0.890) | 0.773 (0.729–0.817) | 0.777 (0.734–0.820) | 0.830 (0.791–0.869) | 0.895 (0.863–0.927) | 0.918 (0.889–0.947) |
| DenseNet201 | 0.769 (0.725–0.813) | 0.876 (0.842–0.910) | 0.793 (0.751–0.835) | 0.778 (0.735–0.821) | 0.836 (0.797–0.875) | 0.892 (0.860–0.924) | 0.913 (0.884–0.942) |
| MobileNetV1 | 0.701 (0.653–0.749) | 0.878 (0.844–0.912) | 0.789 (0.746–0.832) | 0.730 (0.684–0.776) | 0.811 (0.770–0.852) | 0.884 (0.851–0.917) | 0.906 (0.876–0.936) |
| MobileNetV2 | 0.435 (0.383–0.487) | 0.909 (0.879–0.939) | 0.757 (0.712–0.802) | 0.619 (0.568–0.670) | 0.732 (0.686–0.778) | 0.802 (0.760–0.844) | 0.847 (0.809–0.885) |
| ResNet34 | 0.674 (0.625–0.723) | 0.717 (0.670–0.764) | 0.607 (0.556–0.658) | 0.623 (0.572–0.674) | 0.701 (0.653–0.749) | 0.793 (0.751–0.835) | 0.871 (0.836–0.906) |
| ResNet101 | 0.532 (0.480–0.584) | 0.883 (0.849–0.917) | 0.741 (0.695–0.787) | 0.612 (0.561–0.663) | 0.751 (0.706–0.796) | 0.842 (0.804–0.880) | 0.902 (0.871–0.933) |
| ResNet152 | 0.662 (0.613–0.711) | 0.856 (0.819–0.893) | 0.744 (0.698–0.790) | 0.695 (0.647–0.743) | 0.783 (0.740–0.826) | 0.856 (0.819–0.893) | 0.908 (0.878–0.938) |
| EfficientNetB3 | 0.524 (0.472–0.576) | 0.865 (0.829–0.901) | 0.739 (0.693–0.785) | 0.572 (0.520–0.624) | 0.737 (0.691–0.783) | 0.816 (0.776–0.856) | 0.873 (0.838–0.908) |
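The table reports each metric with a 95% confidence interval, but this record does not state how the intervals were computed or how many images each validation set contained. One common choice for a proportion is the normal-approximation (Wald) interval, sketched below under that assumption. The function name `wald_interval` and the sample size `n = 352` are illustrative assumptions, not values reported for the validation sets.

```python
import math


def wald_interval(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    p: observed proportion (e.g. sensitivity), n: number of cases,
    z: standard normal quantile (1.96 for a two-sided 95% interval).
    The result is clipped to [0, 1].
    """
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Assumed sample size for illustration; not taken from the paper.
lo, hi = wald_interval(p=0.767, n=352)
print(f"0.767 ({lo:.3f}-{hi:.3f})")
```

With this assumed `n`, the sketch gives 0.723 to 0.811, which coincides with the interval listed for CNN sensitivity in external validation (b). That is suggestive, but it does not confirm the method the authors used. The Wald interval is also known to behave poorly for proportions near 0 or 1, where a Wilson interval is often preferred.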
Figure 2. Receiver operating characteristic curves and precision-recall curves for the deep learning algorithm on the internal validation dataset (A) and external validation datasets (B).
Figure 3. Performance of the deep learning model and comparison with human readers.
Agreement of the model and human readers.
| Reader (malignancy prediction) | Kappa value | 95% CI | P value |
|---|---|---|---|
| Specialist | 0.685 | 0.606–0.763 | < 0.001 |
| General physician | 0.482 | 0.389–0.575 | < 0.001 |
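The kappa values above measure agreement between two raters beyond what chance alone would produce. The following is a minimal sketch of Cohen's kappa for two label sequences; the function name `cohens_kappa` and the example labels are hypothetical, and the source does not describe the software used for its own calculation.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    labels = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(label) / n) * (ratings_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels (1 = malignant, 0 = non-malignant), not study data.
model = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reader = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
kappa = cohens_kappa(model, reader)
print(f"kappa = {kappa:.3f}")
```

In this toy example the raters agree on 8 of 10 items, but chance agreement is 0.52, so kappa is noticeably lower than the raw agreement rate.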
Figure 4. Validation and test structure diagram of the tongue cancer dataset for deep learning.