Wanjun Zhao1, Qingbo Kang2, Feiyan Qian3, Kang Li2, Jingqiang Zhu1, Buyun Ma4.
Abstract
PURPOSE: This study investigates the performance of deep learning models for the automated diagnosis of Hashimoto's thyroiditis (HT) by artificial intelligence-based computer-assisted diagnosis (CAD), using real-world data from ultrasound examinations.
Keywords: Hashimoto’s thyroiditis; artificial intelligence; convolutional neural networks; radiologists; ultrasound
Year: 2022 PMID: 34907442 PMCID: PMC8947219 DOI: 10.1210/clinem/dgab870
Source DB: PubMed Journal: J Clin Endocrinol Metab ISSN: 0021-972X Impact factor: 5.958
Baseline characteristics of patients with HT or non-HT in the training and validation sets
| Items | HT: all (n = 10 739) | HT: training set (n = 7463) | HT: validation set (n = 3276) | HT: P value | Non-HT: all (n = 10 379) | Non-HT: training set (n = 7426) | Non-HT: validation set (n = 2953) | Non-HT: P value |
|---|---|---|---|---|---|---|---|---|
| Age | 36.92±14.81 | 37.00±14.85 | 36.68±14.80 | 0.949 | 44.73±14.85 | 44.61±14.78 | 45.48±14.84 | 0.880 |
| Gender | | | | 0.406 | | | | 0.437 |
| Male | 2684 | 1835 | 849 | | 2594 | 1840 | 754 | |
| Female | 8055 | 5578 | 2477 | | 7785 | 5586 | 2199 | |
| TSH | 4.87±17.39 | 4.97±17.63 | 4.79±13.85 | 0.720 | 5.35±51.56 | 5.44±52.72 | 4.82±45.02 | 0.694 |
| FT3 | 15.40±88.00 | 14.96±79.95 | 16.76±104.15 | 0.464 | 8.75±39.20 | 8.66±39.15 | 9.27±44.67 | 0.640 |
| FT4 | 34.79±164.36 | 34.19±158.36 | 36.44±175.98 | 0.666 | 20.12±44.70 | 20.15±54.00 | 19.94±31.42 | 0.904 |
| TgAb | 845.75±1099.25 | 843.45±1092.08 | 861.68±1189.79 | 0.646 | 32.87±129.61 | 32.21±126.00 | 33.03±137.08 | 0.870 |
| TPOAb | 373.53±859.29 | 371.35±859.17 | 380.10±877.72 | 0.736 | 13.19±10.03 | 13.21±10.00 | 13.05±9.49 | 0.579 |
| Hyperthyroidism | 2093 (19.49) | 1476 (20.05) | 617 (18.83) | 0.267 | 2004 (19.31) | 1424 (19.18) | 580 (19.64) | 0.607 |
| Hypothyroidism | 2863 (26.66) | 1969 (26.38) | 894 (27.29) | 0.340 | 2736 (26.36) | 1975 (26.60) | 761 (25.77) | 0.403 |
Qualitative variables are in n (%), and quantitative variables are in mean ± SD.
Abbreviations: FT3, free triiodothyronine; FT4, free thyroxine; HT, Hashimoto's thyroiditis; TgAb, thyroglobulin antibodies; TPOAb, thyroid peroxidase antibodies; TSH, thyroid-stimulating hormone.
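The P value columns compare the training and validation sets within each group, but the source does not say which tests produced them. For continuous variables reported as mean ± SD, one common option is a Welch-type comparison computed directly from the summary statistics. The minimal sketch below assumes large samples (it uses a normal approximation to the t distribution); the function name and example numbers are hypothetical, and it is not meant to reproduce the published P values.

```python
import math


def welch_test_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided comparison of two group means from mean, SD, and n.

    Hypothetical helper: uses the Welch standard error with a normal
    approximation to the t distribution, reasonable only for large groups.
    It is not necessarily the test used in the source.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Hypothetical summary statistics, not taken from the table
z, p = welch_test_from_summary(40.0, 15.0, 7000, 39.0, 15.0, 3000)
print(f"z = {z:.2f}, P = {p:.4f}")
```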
Figure 1. Flowchart of the procedures in the development of deep learning models for Hashimoto’s thyroiditis (HT) diagnosis on ultrasound. Using data sets from 2 hospitals, the deep learning model with convolutional neural networks was trained to differentiate HT from non-HT. Abbreviations: FN, false negative; FP, false positive; NPV, negative predictive value; PPV, positive predictive value; TN, true negative; TP, true positive.
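All the performance measures reported in the tables can be derived from the four confusion-matrix counts named in Figure 1 (TP, FP, TN, FN). The sketch below is a minimal illustration with HT as the positive class. Two choices are assumptions: "F1 (avg)" is read as the macro average of the per-class F1 scores, and agreement is computed as Cohen's κ, whereas the source labels its κ as Fleiss' κ.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Binary diagnostic metrics, with HT as the positive class."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # recall for HT
    specificity = tn / (tn + fp)          # recall for non-HT
    ppv = tp / (tp + fp)                  # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / n
    # Assumed reading of "F1 (avg)": mean of the HT and non-HT F1 scores
    f1_ht = 2 * tp / (2 * tp + fp + fn)
    f1_non_ht = 2 * tn / (2 * tn + fn + fp)
    f1_avg = (f1_ht + f1_non_ht) / 2
    # Cohen's kappa (a stand-in here): observed agreement corrected for
    # the agreement expected from the predicted and true class marginals
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1_avg": f1_avg,
        "kappa": kappa,
    }


# Hypothetical counts for illustration
print(diagnostic_metrics(tp=90, fp=10, tn=80, fn=20))
```

When the two classes are roughly balanced, as in this data set, chance agreement is close to 0.5 and κ ≈ 2 × accuracy − 1; the reported pairs 0.892/0.784, 0.801/0.602, and 0.654/0.308 are consistent with that relationship.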
Diagnostic performance of the final ensemble model and the 9 baseline convolutional neural network models, with and without test time augmentation
| Model | Accuracy | Sensitivity | Specificity | PPV | NPV | AUC | F1 (avg) | κ value |
|---|---|---|---|---|---|---|---|---|
| VGG19 | 0.842 | 0.835 | 0.850 | 0.860 | 0.823 | 0.900 | 0.842 | 0.684 |
| VGG19 (TTA) | 0.851 | 0.845 | 0.858 | 0.868 | 0.833 | 0.918 | 0.851 | 0.702 |
| ResNet18 | 0.850 | 0.846 | 0.856 | 0.867 | 0.833 | 0.917 | 0.850 | 0.700 |
| ResNet18 (TTA) | 0.865 | 0.867 | 0.862 | 0.875 | 0.854 | 0.928 | 0.864 | 0.729 |
| ResNet50 | 0.860 | 0.855 | 0.865 | 0.875 | 0.843 | 0.922 | 0.859 | 0.719 |
| ResNet50 (TTA) | 0.870 | 0.867 | 0.874 | 0.884 | 0.856 | 0.931 | 0.870 | 0.740 |
| ResNet152 | 0.864 | 0.859 | 0.868 | 0.879 | 0.848 | 0.926 | 0.863 | 0.727 |
| ResNet152 (TTA) | 0.874 | 0.871 | 0.878 | 0.888 | 0.859 | 0.932 | 0.874 | 0.748 |
| DenseNet169 | 0.860 | 0.856 | 0.865 | 0.875 | 0.844 | 0.923 | 0.860 | 0.720 |
| DenseNet169 (TTA) | 0.871 | 0.863 | 0.879 | 0.888 | 0.852 | 0.931 | 0.870 | 0.741 |
| DenseNet264 | 0.866 | 0.860 | 0.874 | 0.883 | 0.849 | 0.930 | 0.866 | 0.732 |
| DenseNet264 (TTA) | 0.876 | 0.867 | 0.886 | 0.894 | 0.857 | 0.932 | 0.876 | 0.752 |
| EfficientNet-b0 | 0.864 | 0.874 | 0.853 | 0.868 | 0.859 | 0.924 | 0.864 | 0.727 |
| EfficientNet-b0 (TTA) | 0.874 | 0.879 | 0.869 | 0.882 | 0.867 | 0.933 | 0.874 | 0.748 |
| EfficientNet-b4 | 0.870 | 0.879 | 0.860 | 0.875 | 0.865 | 0.930 | 0.870 | 0.739 |
| EfficientNet-b4 (TTA) | 0.878 | 0.882 | 0.874 | 0.886 | 0.870 | 0.935 | 0.878 | 0.756 |
| EfficientNet-b7 | 0.874 | 0.880 | 0.868 | 0.881 | 0.867 | 0.933 | 0.874 | 0.748 |
| EfficientNet-b7 (TTA) | 0.881 | 0.885 | 0.877 | 0.889 | 0.873 | 0.937 | 0.881 | 0.762 |
| Ensemble model | 0.889 | 0.887 | 0.892 | 0.901 | 0.877 | 0.938 | 0.889 | 0.778 |
| Ensemble (TTA) model | 0.892 | 0.890 | 0.895 | 0.904 | 0.880 | 0.940 | 0.892 | 0.784 |
Abbreviations: DenseNet, Dense Network; EfficientNet, Efficient Network; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; κ value, Fleiss’s κ value; ResNet, Residual Network; TTA, test time augmentation; VGG, Visual Geometry Group Network.
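The table compares each network with and without test time augmentation (TTA) and then combines the 9 networks into an ensemble. A common way to do both is to average predicted probabilities: first over augmented views of one image, then over models. The sketch below assumes that soft-voting scheme and an identity-plus-horizontal-flip augmentation set; the source does not list its transforms or its combination rule, and the `Model` interface is hypothetical.

```python
from __future__ import annotations

from typing import Callable, Sequence

import numpy as np

# Hypothetical interface: a model maps a 2-D grayscale image to P(HT) in [0, 1].
Model = Callable[[np.ndarray], float]


def tta_views(image: np.ndarray) -> list[np.ndarray]:
    # Assumed augmentation set (identity + horizontal flip)
    return [image, np.fliplr(image)]


def predict_with_tta(model: Model, image: np.ndarray) -> float:
    # Average one model's probability over all augmented views of the image
    return float(np.mean([model(view) for view in tta_views(image)]))


def predict_ensemble(models: Sequence[Model], image: np.ndarray,
                     use_tta: bool = True) -> float:
    # Soft voting: average per-model probabilities (each optionally TTA-averaged)
    scores = [predict_with_tta(m, image) if use_tta else m(image) for m in models]
    return float(np.mean(scores))


# Toy stand-in "models" for illustration only
image = np.array([[1.0, 0.0], [1.0, 0.0]])
left_edge = lambda img: float(img[:, 0].mean())  # changes under a flip
mean_intensity = lambda img: float(img.mean())   # unchanged under a flip
print(predict_ensemble([left_edge, mean_intensity], image))                 # 0.5
print(predict_ensemble([left_edge, mean_intensity], image, use_tta=False))  # 0.75
```

Averaging probabilities rather than hard labels keeps the output usable as a continuous score, which is what an ROC curve and its AUC require.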
Figure 2. Receiver operating characteristic (ROC) curves of the HT-CAD model in the two hospitals. The orange line shows the performance of the HT-CAD model on all validation images of Hashimoto’s thyroiditis (HT), including images from Hospitals A and B; the area under the curve (AUC) is 0.940. The green line shows its performance on HT images from Hospital A (AUC 0.949), and the purple line shows its performance on HT images from Hospital B (AUC 0.936). The differences are not statistically significant (P > 0.05).
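The AUC values quoted for each hospital summarize the ROC curve as a single number: the probability that a randomly chosen HT image receives a higher score than a randomly chosen non-HT image. The sketch below computes it from that rank (Mann-Whitney U) interpretation, with ties counted as one half. The source does not say how its AUCs or their comparison were computed, so this is an illustration of the standard definition, not the authors' code.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney U statistic (labels: 1 = HT, 0 = non-HT)."""
    n = len(scores)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores that starts at sorted position i
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # pairs where HT outranks non-HT
    return u / (n_pos * n_neg)


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```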
Comparison of HT-CAD performance in the two hospitals
| | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | PPV | NPV | AUC | F1 (avg) | κ value |
|---|---|---|---|---|---|---|---|---|
| Performance | | | | | | | | |
| All | 0.892 (0.881-0.902) | 0.890 (0.868-0.911) | 0.895 (0.874-0.913) | 0.904 | 0.880 | 0.940 | 0.892 | 0.784 |
| Hospital A | 0.901 (0.890-0.911) | 0.898 (0.878-0.916) | 0.902 (0.884-0.919) | 0.892 | 0.892 | 0.949 | 0.886 | 0.798 |
| Hospital B | 0.887 (0.876-0.898) | 0.884 (0.866-0.903) | 0.891 (0.875-0.909) | 0.911 | 0.873 | 0.936 | 0.896 | 0.780 |
| P value | | | | | | | | |
| All vs Hospital A | 0.127 | 0.135 | 0.188 | — | — | — | — | — |
| All vs Hospital B | 0.314 | 0.265 | 0.377 | — | — | — | — | — |
| Hospital A vs Hospital B | 0.071 | 0.069 | 0.104 | — | — | — | — | — |
Abbreviations: AUC, area under the curve; κ value, Fleiss’s κ value; NPV, negative predictive value; PPV, positive predictive value.
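Accuracy, sensitivity, and specificity are proportions, so each is reported with a 95% confidence interval. The source does not state how these intervals were obtained. One widely used choice for a binomial proportion is the Wilson score interval, sketched below; the counts in the usage line are hypothetical, and the sketch is not expected to reproduce the published intervals exactly.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.959963984540054):
    """Wilson score interval for a binomial proportion (default: 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 50 correct calls out of 100 images
low, high = wilson_ci(50, 100)
print(f"{low:.3f}-{high:.3f}")  # 0.404-0.596
```

Unlike the simple normal-approximation interval, the Wilson interval never extends below 0 or above 1, which matters for small subgroups or proportions near the extremes.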
Comparison of diagnostic performance between HT-CAD and senior and junior radiologists
| | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | PPV | NPV | AUC | F1 (avg) | κ value |
|---|---|---|---|---|---|---|---|---|
| Performance | | | | | | | | |
| CNN model | 0.892 (0.881-0.902) | 0.890 (0.868-0.911) | 0.895 (0.874-0.913) | 0.904 | 0.880 | 0.940 | 0.892 | 0.784 |
| Radiologists | | | | | | | | |
| Senior | 0.801 (0.784-0.818) | 0.805 (0.786-0.822) | 0.797 (0.778-0.814) | 0.805 | 0.797 | 0.801 | 0.805 | 0.602 |
| Junior | 0.654 (0.639-0.667) | 0.660 (0.644-0.676) | 0.647 (0.626-0.667) | 0.662 | 0.646 | 0.654 | 0.661 | 0.308 |
| P value | | | | | | | | |
| Senior vs CNN model | <0.001 | <0.001 | <0.001 | | | | | |
| Junior vs CNN model | <0.001 | <0.001 | <0.001 | | | | | |
| Senior vs junior | <0.001 | <0.001 | <0.001 | | | | | |
Abbreviations: PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; κ value, Fleiss’s κ value; CNN, convolutional neural networks.
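Because the model and the radiologists read the same images, their accuracies are paired rather than independent. The source does not name the test behind these P values; McNemar's test is one standard choice for paired binary outcomes and is sketched below. It uses only the discordant images (one reader right, the other wrong). The per-image correctness lists in the usage lines are hypothetical.

```python
import math
from typing import Sequence


def mcnemar_test(model_correct: Sequence[bool], reader_correct: Sequence[bool],
                 continuity: bool = True):
    """McNemar's test for two readers scored on the same cases.

    b = model right and reader wrong; c = reader right and model wrong.
    Returns (chi-square statistic, two-sided P value).
    """
    b = sum(1 for m, r in zip(model_correct, reader_correct) if m and not r)
    c = sum(1 for m, r in zip(model_correct, reader_correct) if r and not m)
    if b + c == 0:
        return 0.0, 1.0  # no discordant cases: no evidence of a difference
    diff = abs(b - c) - (1 if continuity else 0)  # continuity correction
    stat = max(diff, 0) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p


# Hypothetical per-image correctness on the same 15 images
model = [True] * 15
reader = [False] * 10 + [True] * 5
stat, p = mcnemar_test(model, reader)
print(f"chi2 = {stat:.1f}, P = {p:.4f}")
```

Applied only to the HT images, the same test compares sensitivities; applied only to the non-HT images, it compares specificities.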
Comparison of HT-CAD performance in subgroups defined by thyroid hormone levels
| | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | PPV | NPV | AUC | F1 (avg) | κ value |
|---|---|---|---|---|---|---|---|---|
| Performance | | | | | | | | |
| All | 0.892 (0.881-0.902) | 0.890 (0.868-0.911) | 0.895 (0.874-0.913) | 0.904 | 0.880 | 0.940 | 0.892 | 0.784 |
| Group A (with hyperthyroidism) | 0.871 (0.861-0.880) | 0.911 (0.893-0.929) | 0.674 (0.656-0.692) | 0.922 | 0.660 | 0.861 | 0.920 | 0.586 |
| Group B (with hypothyroidism) | 0.888 (0.877-0.897) | 0.883 (0.861-0.905) | 0.874 (0.854-0.891) | 0.950 | 0.754 | 0.931 | 0.920 | 0.731 |
| Group C (with euthyroidism) | 0.894 (0.884-0.902) | 0.896 (0.874-0.915) | 0.908 (0.889-0.925) | 0.879 | 0.908 | 0.947 | 0.887 | 0.787 |
| P value | | | | | | | | |
| All vs Group A | <0.001 | <0.001 | <0.001 | — | — | | | |
| All vs Group B | 0.384 | 0.219 | 0.003 | — | — | | | |
| All vs Group C | 0.625 | 0.247 | 0.084 | — | — | | | |
| Group A vs Group C | <0.001 | <0.001 | <0.001 | — | — | | | |
| Group B vs Group C | 0.289 | 0.084 | <0.001 | — | — | | | |
| Group A vs Group B | 0.005 | <0.001 | <0.001 | — | — | | | |
Abbreviations: AUC, area under the curve; κ value, Fleiss’s κ value; NPV, negative predictive value; PPV, positive predictive value.
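Pooled metrics can hide subgroup differences: in the table above, specificity is 0.674 in Group A (hyperthyroidism) but 0.908 in Group C (euthyroidism). The minimal sketch below stratifies predictions by a subgroup tag and computes accuracy, sensitivity, and specificity per stratum. The record layout and subgroup labels are hypothetical; the source does not describe how its subgroup analysis was implemented.

```python
from collections import defaultdict


def subgroup_metrics(records):
    """records: iterable of (subgroup, y_true, y_pred), with 1 = HT and 0 = non-HT."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        if y_true == 1:
            counts[group]["tp" if y_pred == 1 else "fn"] += 1
        else:
            counts[group]["fp" if y_pred == 1 else "tn"] += 1

    def ratio(num, den):
        # NaN when a subgroup has no cases of the relevant class
        return num / den if den else float("nan")

    results = {}
    for group, c in counts.items():
        results[group] = {
            "accuracy": ratio(c["tp"] + c["tn"], sum(c.values())),
            "sensitivity": ratio(c["tp"], c["tp"] + c["fn"]),
            "specificity": ratio(c["tn"], c["tn"] + c["fp"]),
        }
    return results


# Hypothetical predictions tagged with each patient's thyroid-function subgroup
records = [
    ("hyperthyroidism", 1, 1), ("hyperthyroidism", 0, 1),
    ("euthyroidism", 1, 1), ("euthyroidism", 0, 0),
]
for group, metrics in subgroup_metrics(records).items():
    print(group, metrics)
```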
Figure 3. Receiver operating characteristic (ROC) curves of the HT-CAD model in subgroups defined by thyroid hormone levels. (A) The ROC curve of the HT-CAD model in the hyperthyroidism subgroup. (B) The ROC curve of the HT-CAD model in the hypothyroidism subgroup. (C) The ROC curve of the HT-CAD model in the euthyroidism subgroup. The red dots indicate the diagnostic sensitivities and specificities of senior radiologists, and the green dots those of junior radiologists. The HT-CAD model showed better diagnostic performance than both senior and junior radiologists in the hyperthyroidism, hypothyroidism, and euthyroidism subgroups.
Figure 4. Visualization of the HT-CAD model for Hashimoto’s thyroiditis (HT). (A and C) Original ultrasound images of HT patients. (B and D) Heat maps generated by the HT-CAD model for the 2 HT ultrasound images.
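The caption does not name the method used to produce the heat maps. Grad-CAM is one common way to obtain such maps from a convolutional network: each feature map of a chosen convolutional layer is weighted by the spatial average of the gradient of the class score, the weighted maps are summed, and negative values are clipped. The sketch below shows only that combination step in NumPy, assuming the activations and gradients for one image have already been extracted (how to do so depends on the deep learning framework). Upsampling the map to the image size and overlaying it are omitted.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map from one convolutional layer (assumed technique).

    activations: feature maps A_k, shape (K, H, W), for one image.
    gradients: d(HT score) / dA_k, same shape.
    Returns an (H, W) map scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k: spatially averaged gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k -> (H, W)
    cam = np.maximum(cam, 0.0)                        # keep regions that raise the HT score
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: channel 0 supports HT at the top-left, channel 1 opposes it at the bottom-right
acts = np.zeros((2, 2, 2))
acts[0, 0, 0] = 1.0
acts[1, 1, 1] = 1.0
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
print(grad_cam(acts, grads))
```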