Ge-Ge Wu, Wen-Zhi Lv, Rui Yin, Jian-Wei Xu, Yu-Jing Yan, Rui-Xue Chen, Jia-Yu Wang, Bo Zhang, Xin-Wu Cui, Christoph F Dietrich.
Abstract
OBJECTIVE: The purpose of this study was to improve the differentiation between malignant and benign thyroid nodules using deep learning (DL) in categories 4 and 5 of the Thyroid Imaging Reporting and Data System (TI-RADS, TR) from the American College of Radiology (ACR). DESIGN AND METHODS: From June 2, 2017 to April 23, 2019, 2082 thyroid ultrasound images from 1396 consecutive patients with confirmed pathology were retrospectively collected, of which 1289 nodules were category 4 (TR4) and 793 nodules were category 5 (TR5). Ninety percent of the B-mode ultrasound images were used for training and validation; the remaining 10% and an independent external dataset were used to test three different deep learning algorithms.
Keywords: artificial intelligence; deep learning; thyroid cancer; thyroid imaging reporting and data system (TI-RADS); ultrasound
Year: 2021 PMID: 33987082 PMCID: PMC8111071 DOI: 10.3389/fonc.2021.575166
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. Workflow of the construction of the training and test datasets.
Figure 3. Performance of the ensemble DCNN models in identifying patients with thyroid cancer in TR4 (A), TR5 (C), and TR4&5 (E) on the three internal test datasets and in TR4 (B), TR5 (D), and TR4&5 (F) on the three external test datasets. The red dots on each ROC curve show the performance of the radiologists. AUC, area under the curve; DCNN, deep convolutional neural network; ROC, receiver operating characteristic.
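A radiologist's reading yields one sensitivity/specificity pair rather than a continuous score, so it appears as a single point on the ROC plot. The source does not state how the radiologists' AUROC was obtained. The minimal sketch below assumes it is the area under the two-segment curve through (0, 0), the operating point, and (1, 1), which reduces to (sensitivity + specificity) / 2. Under that assumption the result agrees with the radiologist AUROC values in the performance tables to within rounding. The function name `single_point_auc` is illustrative only.

```python
def single_point_auc(sensitivity, specificity):
    """Trapezoidal area under the ROC polyline (0,0) -> (FPR,TPR) -> (1,1)."""
    fpr, tpr = 1.0 - specificity, sensitivity
    xs = [0.0, fpr, 1.0]
    ys = [0.0, tpr, 1.0]
    area = 0.0
    for i in range(1, len(xs)):
        # Width of the segment times the mean height of its two endpoints.
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area


# (sensitivity, specificity, reported AUROC) for the radiologists,
# copied from the performance tables.
radiologists = {
    "TR4 internal": (0.684, 0.786, 0.735),
    "TR4 external": (0.686, 0.766, 0.726),
    "TR5 internal": (0.635, 0.813, 0.724),
    "TR5 external": (0.677, 0.750, 0.713),
    "TR4&5 internal": (0.662, 0.794, 0.728),
    "TR4&5 external": (0.680, 0.761, 0.721),
}

for name, (sens, spec, reported) in radiologists.items():
    auc = single_point_auc(sens, spec)
    print(f"{name}: computed {auc:.4f}, reported {reported:.3f}")
```

The CNN curves, by contrast, are traced by sweeping a threshold over continuous model outputs, which is why they can enclose more area than any single operating point on them.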
Basic information of the patients.
| | Internal dataset (n=1396) | External dataset (n=197) |
|---|---|---|
| Age (year) | 45.48 ± 10.33 (8-71) | 45.54 ± 11.82 (16-77) |
| ≤20 | 13(0.9) | 1(0.5) |
| 20-30 | 85(6.1) | 27(13.7) |
| 30-40 | 281(20.1) | 37(18.8) |
| 40-50 | 549(39.3) | 62(31.5) |
| ≥50 | 468(33.5) | 70(35.5) |
| Gender | ||
| Male | 337(24.1) | 47(23.9) |
| Female | 1059(75.9) | 150(76.1) |
Characteristics of the thyroid nodules in the internal set enrolled in this study.
| | Task 1: Training dataset (n=1146) | Task 1: Test dataset A (n=143) | Task 1: Test dataset B (n=112) | Task 2: Training dataset (n=698) | Task 2: Test dataset A (n=95) | Task 2: Test dataset B (n=101) | Task 3: Training dataset (n=1844) | Task 3: Test dataset A (n=238) | Task 3: Test dataset B (n=213) |
|---|---|---|---|---|---|---|---|---|---|
| Pathology | |||||||||
| benign | 637(55.6) | 70(49.0) | 77(68.8) | 297(42.6) | 32(33.7) | 36(35.6) | 934(50.7) | 102(42.9) | 113(53.1) |
| malignant | 509(44.4) | 73(51.0) | 35(31.2) | 401(57.4) | 63(66.3) | 65(64.4) | 910(49.3) | 136(57.1) | 100(46.9) |
| Diameter (mm) | |||||||||
| ≤ 0.5 | 221(19.3) | 26(18.1) | 19(17.0) | 93(13.3) | 14(14.7) | 9(8.9) | 314(17.0) | 40(16.8) | 28(13.1) |
| 0.5‐1.0 | 431(37.6) | 57(39.9) | 55(49.0) | 295(42.3) | 41(43.2) | 35(34.7) | 726(39.4) | 98(41.2) | 90(42.3) |
| 1.0‐2.0 | 176(15.4) | 39(27.3) | 28(25.0) | 125(17.9) | 25(26.3) | 29(28.7) | 301(16.3) | 64(26.9) | 57(26.8) |
| > 2.0 | 318(27.7) | 21(14.7) | 10(9.0) | 185(26.5) | 15(15.8) | 28(27.7) | 503(27.3) | 36(15.1) | 38(17.8) |
| Internal Composition | |||||||||
| Cystic/partially | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) |
| Mixed | 48(4.2) | 7(4.9) | 6(5.3) | 3(0.4) | 1(1.1) | 1(1.0) | 51(2.8) | 8(3.4) | 7(3.3) |
| Solid/almost | 1098(95.8) | 136(95.1) | 106(94.6) | 695(99.6) | 94(98.9) | 100(99.0) | 1793(97.2) | 230(96.6) | 206(96.7) |
| Echogenicity | |||||||||
| Anechoic | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) |
| Hyperechoic/ | 30(2.6) | 6(4.2) | 4(3.6) | 5(0.7) | 1(1.1) | 0(0) | 35(1.9) | 7(2.9) | 4(1.9) |
| Hypoechoic | 1113(97.1) | 137(95.8) | 107(95.5) | 681(97.6) | 92(96.8) | 100(99.0) | 1814(98.3) | 229(96.2) | 207(97.2) |
| Very hypoechoic | 3(0.3) | 0(0) | 1(0.9) | 12(1.7) | 2(2.1) | 1(1.0) | 15(0.8) | 2(0.8) | 2(0.9) |
| Shape | |||||||||
| Wider-than-tall | 1143(99.7) | 142(99.3) | 112(100.0) | 478(68.5) | 65(68.4) | 70(69.3) | 1621(87.9) | 207(87.0) | 182(85.4) |
| Taller-than-wide | 3(0.3) | 1(0.7) | 0(0) | 220(31.5) | 30(31.6) | 32(31.7) | 223(12.1) | 31(13.0) | 31(14.6) |
| Margins | |||||||||
| Smooth/ | 992(86.6) | 108(75.5) | 85(75.9) | 477(68.3) | 60(63.2) | 62(61.4) | 1469(79.7) | 168(70.6) | 147(69.0) |
| Lobulated/ | 153(13.3) | 35(24.5) | 27(24.1) | 210(30.1) | 30(31.5) | 32(31.7) | 363(19.7) | 65(27.3) | 59(27.7) |
| Extra-thyroid extension | 1(0.1) | 0(0) | 0(0) | 11(1.6) | 5(5.3) | 7(6.9) | 12(0.6) | 5(2.1) | 7(3.3) |
| Echogenic foci | |||||||||
| None/large comet-tail artifacts | 991(86.5) | 122(85.3) | 93(83.0) | 92(13.2) | 16(16.8) | 19(18.8) | 1083(58.7) | 138(58.0) | 112(52.6) |
| Macrocalcifications | 133(11.6) | 16(11.2) | 10(8.9) | 17(2.4) | 5(5.3) | 3(3.0) | 150(8.1) | 21(8.8) | 13(6.1) |
| Peripheral calcifications | 22(1.9) | 6(4.2) | 6(5.4) | 3(0.4) | 0(0) | 1(1.0) | 25(1.4) | 6(2.5) | 7(3.3) |
| Punctate echogenic foci | 22(1.9) | 7(4.9) | 5(4.5) | 601(86.1) | 76(80.0) | 78(77.2) | 623(33.8) | 83(34.9) | 83(39.0) |
*Test dataset A and test dataset B refer to the internal test set and the external test set, respectively. Data in parentheses are percentages.
Performance of the three deep learning CNNs compared with the radiologists in differentiating benign from malignant thyroid nodules classified as ACR TI-RADS category 4.
| | ResNet-50 | Inception-ResNet-v2 | DenseNet-121 | Radiologists | P value |
|---|---|---|---|---|---|
| Internal dataset (n=143) | |||||
| Accuracy | 0.874 | 0.846 | 0.846 | 0.734 | 0.010 |
| Sensitivity | 0.836 | 0.918 | 0.863 | 0.684 | 0.004 |
| Specificity | 0.914 | 0.771 | 0.871 | 0.786 | 0.066 |
| PPV | 0.910 | 0.807 | 0.875 | 0.769 | 0.115 |
| NPV | 0.842 | 0.900 | 0.859 | 0.705 | 0.024 |
| Kappa value | 0.749 | 0.691 | 0.693 | 0.470 | |
| F1 | 0.846 | 0.775 | 0.846 | 0.649 | |
| AUROC | 0.936 | 0.902 | 0.911 | 0.735 | |
| External dataset (n=112) | |||||
| Accuracy | 0.830 | 0.821 | 0.795 | 0.741 | 0.033 |
| Sensitivity | 0.829 | 0.657 | 0.800 | 0.686 | 0.108 |
| Specificity | 0.831 | 0.896 | 0.792 | 0.766 | 0.101 |
| PPV | 0.690 | 0.742 | 0.636 | 0.571 | 0.037 |
| NPV | 0.914 | 0.852 | 0.897 | 0.843 | 0.226 |
| Kappa value | 0.626 | 0.571 | 0.553 | 0.429 | |
| F1 | 0.812 | 0.785 | 0.775 | 0.713 | |
| AUROC | 0.904 | 0.845 | 0.842 | 0.726 | |
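Most of the tabulated metrics follow directly from a 2x2 confusion matrix. The source does not report the matrices themselves. As a worked check, the sketch below uses counts inferred from the ResNet-50 internal-dataset column, given the 73 malignant and 70 benign nodules in TR4 test dataset A: 61 true positives, 12 false negatives, 64 true negatives, and 6 false positives. These counts are an assumption. With them, the code reproduces the reported accuracy, sensitivity, specificity, PPV, NPV, and kappa to three decimals. The helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Diagnostic metrics from a 2x2 confusion matrix (malignant = positive)."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall on malignant nodules
    specificity = tn / (tn + fp)   # recall on benign nodules
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the row and column marginals.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
    }


# Inferred counts (assumption, not reported in the source):
# ResNet-50 on the TR4 internal test set, 73 malignant and 70 benign nodules.
metrics = binary_metrics(tp=61, fn=12, tn=64, fp=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on the benign-to-malignant ratio, they are not directly comparable between the internal set (51.0% malignant) and the external set (31.2% malignant), whereas sensitivity and specificity are.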
Performance of the three deep learning CNNs compared with the radiologists in differentiating benign from malignant thyroid nodules classified as ACR TI-RADS categories 4 and 5.
| | ResNet-50 | Inception-ResNet-v2 | DenseNet-121 | Radiologists | P value |
|---|---|---|---|---|---|
| Internal dataset (n=238) | |||||
| Accuracy | 0.832 | 0.811 | 0.824 | 0.718 | 0.007 |
| Sensitivity | 0.882 | 0.794 | 0.824 (0.747-0.882) | 0.662 | <0.001 |
| Specificity | 0.745 | 0.833 | 0.843 (0.755-0.905) | 0.794 | 0.227 |
| PPV | 0.822 | 0.864 | 0.875 (0.802-0.925) | 0.811 | 0.429 |
| NPV | 0.826 | 0.752 | 0.782 (0.691-0.853) | 0.638 | 0.009 |
| Kappa value | 0.635 | 0.619 | 0.660 | 0.442 | |
| F1 | 0.852 | 0.784 | 0.836 | 0.668 | |
| AUROC | 0.879 | 0.883 | 0.892 | 0.728 | |
| External dataset (n=213) | |||||
| Accuracy | 0.784 | 0.770 | 0.761 | 0.723 | 0.009 |
| Sensitivity | 0.790 | 0.860 | 0.710 | 0.680 | 0.023 |
| Specificity | 0.779 | 0.690 | 0.805 | 0.761 | 0.530 |
| PPV | 0.760 | 0.711 | 0.763 | 0.716 | 0.055 |
| NPV | 0.807 | 0.848 | 0.758 | 0.729 | 0.071 |
| Kappa value | 0.567 | 0.544 | 0.517 | 0.442 | |
| F1 | 0.784 | 0.770 | 0.758 | 0.722 | |
| AUROC | 0.829 | 0.807 | 0.793 | 0.721 | |
Figure 2. Heatmaps of the region of interest (ROI) of the thyroid nodules generated with class activation mapping (CAM). Red marks the regions the CNNs focused on when predicting thyroid cancer. The three radiologists and the DL models correctly predicted a malignant thyroid nodule (A), diagnosed as micro papillary carcinoma, TR4, and a benign one (B), diagnosed as non-toxic nodular goiter, TR4. ResNet-50, DenseNet-121, and the radiologists classified a malignant nodule (C), diagnosed as papillary carcinoma, TR5, as malignant, whereas Inception-ResNet-v2 judged it to be benign. All CNNs correctly predicted a benign thyroid nodule (D), diagnosed as Hashimoto’s thyroiditis, TR5, whereas all the radiologists misclassified it.
Performance of the three deep learning CNNs compared with the radiologists in differentiating benign from malignant thyroid nodules classified as ACR TI-RADS category 5.
| | ResNet-50 | Inception-ResNet-v2 | DenseNet-121 | Radiologists | P value |
|---|---|---|---|---|---|
| Internal dataset (n=95) | |||||
| Accuracy | 0.863 | 0.811 | 0.832 | 0.695 (0.596-0.778) | 0.022 |
| Sensitivity | 0.841 | 0.841 | 0.952 | 0.635 | <0.001 |
| Specificity | 0.906 | 0.750 | 0.594 | 0.813 | 0.026 |
| PPV | 0.946 | 0.869 | 0.822 | 0.870 | 0.055 |
| NPV | 0.744 | 0.706 | 0.864 | 0.531 | 0.026 |
| Kappa value | 0.709 | 0.592 | 0.582 | 0.396 | |
| F1 | 0.854 | 0.791 | 0.793 | 0.688 | |
| AUROC | 0.915 | 0.838 | 0.906 | 0.724 | |
| External dataset (n=101) | |||||
| Accuracy | 0.822 | 0.713 | 0.802 | 0.703 | 0.080 |
| Sensitivity | 0.846 | 0.615 | 0.754 | 0.677 | 0.211 |
| Specificity | 0.778 | 0.889 | 0.889 | 0.750 | 0.128 |
| PPV | 0.873 | 0.909 | 0.925 | 0.830 | 0.132 |
| NPV | 0.737 | 0.561 | 0.667 | 0.563 | 0.203 |
| Kappa value | 0.616 | 0.446 | 0.598 | 0.397 | |
| F1 | 0.808 | 0.711 | 0.796 | 0.694 | |
| AUROC | 0.845 | 0.770 | 0.842 | 0.713 | |