
Deep Learning in Thyroid Ultrasonography to Predict Tumor Recurrence in Thyroid Cancers.

Jieun Kil, Kwang Gi Kim, Young Jae Kim, Hye Ryoung Koo, Jeong Seon Park.   

Abstract

Purpose: To evaluate a deep learning model to predict recurrence of thyroid tumor using preoperative ultrasonography (US).
Materials and Methods: We included representative images from 229 patients (male:female = 42:187; mean age, 49.6 years) who had been diagnosed with thyroid cancer on preoperative US and subsequently underwent thyroid surgery. After selecting a representative transverse or longitudinal US image for each patient, we created a data set of 898 images by augmentation. Python 2.7.6 and the Keras 2.1.5 framework for neural networks were used for deep learning with a convolutional neural network. We compared the clinical and histological features between patients with and without recurrence. The predictive performance of the deep learning model between groups was evaluated using receiver operating characteristic (ROC) analysis, and the area under the ROC curve served as a summary of the prognostic performance of the deep learning model to predict recurrent thyroid cancer.
Results: Tumor recurrence was noted in 49 (21.4%) among the 229 patients. Tumor size and multifocality varied significantly between the groups with and without recurrence (p < 0.05). The overall mean area under the curve (AUC) value of the deep learning model for prediction of recurrent thyroid cancer was 0.9 ± 0.06. The mean AUC value was 0.87 ± 0.03 in macrocarcinoma and 0.79 ± 0.16 in microcarcinoma.
Conclusion: A deep learning model for analysis of US images of thyroid cancer showed the possibility of predicting recurrence of thyroid cancer.
Copyrights © 2020 The Korean Society of Radiology.

Keywords:  Deep Learning; Recurrence; Thyroid Cancer, Papillary; Ultrasonography

Year:  2020        PMID: 36238043      PMCID: PMC9431857          DOI: 10.3348/jksr.2019.0147

Source DB:  PubMed          Journal:  Taehan Yongsang Uihakhoe Chi        ISSN: 1738-2637


INTRODUCTION

Thyroid cancer is the most common endocrine malignancy worldwide (1, 2, 3). Papillary thyroid carcinoma (PTC) is the most common (70–95%) histologic type and accounts for most new cases (4, 5, 6). Advances in technology have yielded high-frequency ultrasonic transducers and improved spatial resolution, and ultrasonography (US) is the preferred imaging method for thyroid disease screening. The incidence of papillary thyroid microcarcinoma (PTMC) is rapidly increasing due largely to improved ultrasonographic detection of small nodules not detected by clinical palpation (7, 8, 9). Thyroid microcarcinomas, which are almost exclusively PTC, are defined as thyroid cancers measuring ≤ 1 cm in maximum dimension (10). PTMC was introduced as a subtype of papillary thyroid cancer by the World Health Organization in 2004 and accounts for 30% to 40% of all papillary thyroid cancers (11). Although the prognosis of differentiated thyroid cancer (DTC) is excellent, cervical lymph node metastasis is found pathologically in 30–80% of patients with PTC (3, 12, 13, 14, 15), and the overall tumor recurrence rate varies from 7.7% to 36% (3, 15). Poor prognostic factors in thyroid carcinoma include large tumors, tumors that extend beyond the thyroid capsule, male sex, and lymph-node metastases (16, 17, 18). Several studies have revealed that the location, tumor size, degree of thyroid capsule abutment, multiplicity, or microcalcifications may be predictive of lateral lymph node metastasis in patients with PTC (19, 20, 21). Recently, a deep learning model has been used as a tool for medical image analysis with promising results in various applications such as computerized prognosis for Alzheimer's disease, organ segmentation, and detection of diseases (22, 23, 24). A few studies have investigated the utility of gray-scale US characteristics and a deep learning model in predicting prognosis of PTC (25, 26, 27, 28).
However, there are no published data regarding prognostic US features using a deep learning model in patients with thyroid cancer. Therefore, we evaluated a deep learning model to predict recurrence of thyroid cancer on preoperative US images.

MATERIALS AND METHODS

PATIENTS

This study was approved by our Institutional Review Board, and informed consent was waived (IRB No. 2010-03-031-001). Two radiologists consecutively reviewed the preoperative US images of patients who were diagnosed with thyroid cancer and underwent surgery between 2006 and 2011. We included 229 patients, 42 males and 187 females (mean age, 49.6 years; range, 19–83 years). These patients underwent either total thyroidectomy (n = 173) or hemithyroidectomy (n = 56). The mean follow-up period was 7.1 years (range, 3.2–13.7 years). The clinical and histological features of age, sex, recurrence site, tumor size, histologic type, multifocality, and lymph node metastasis were evaluated in each group. According to our institutional protocol, postoperative surveillance for thyroid cancer patients consisted of annual neck US, clinical physical examination, measurement of serum thyroid-stimulating hormone, free thyroxine, and thyroglobulin (Tg), and chest X-ray. Additionally, computed tomography examinations were performed in patients with suspected lateral neck lymph node or lung metastasis.

ULTRASONOGRAPHIC IMAGES

Preoperative US was performed prospectively by one of two radiologists (four and nine years of experience, respectively) dedicated to thyroid imaging interpretation. US examinations were performed using a 5–12 MHz transducer (Philips Medical Systems, Bothell, WA, USA). Each representative transverse or longitudinal US image was selected by consensus of the two radiologists. We used the Image J program (National Institute of Mental Health, Bethesda, MD, USA) to manually draw a region of interest on the US images to outline the thyroid cancer (Fig. 1). All raw images were cropped into squares whose side followed the long axis of the image, which reduces the image distortion caused by the resizing process. Finally, the cropped images were resized to 224 × 224 pixels (Fig. 2).
Fig. 1

The process of manually drawing a region of interest on a representative ultrasonography image of thyroid cancer using the Image J program. The bright line in the image on the right represents the tumor margin.

Fig. 2

A schematic diagram of the process of cropping and resizing an ultrasonography image of a patient with thyroid cancer.
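
The cropping step can be illustrated with a small sketch. It assumes that the drawn region of interest is summarized by a bounding box and that the square takes the longer side of that box, which is one reading of the description above and not necessarily the authors' exact procedure. The function names `square_crop_box` and `resize_scale` are hypothetical. The resize itself (to 224 × 224 pixels) would be done by an image library and is only represented here by its scale factor.

```python
def square_crop_box(roi_box, image_size):
    """Return a square (left, top, right, bottom) box around an ROI.

    roi_box: (left, top, right, bottom) bounding box of the drawn outline.
    image_size: (width, height) of the raw image in pixels.
    The side of the square equals the longer side of the ROI box, so the
    later resize scales both axes by the same factor (no distortion).
    """
    left, top, right, bottom = roi_box
    width, height = image_size
    side = max(right - left, bottom - top)
    side = min(side, width, height)  # the square cannot exceed the image

    # Center the square on the ROI, then shift it back inside the image.
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2
    new_left = int(round(center_x - side / 2))
    new_top = int(round(center_y - side / 2))
    new_left = max(0, min(new_left, width - side))
    new_top = max(0, min(new_top, height - side))
    return (new_left, new_top, new_left + side, new_top + side)


def resize_scale(box, target=224):
    """Uniform scale factor that maps the square crop to target x target."""
    return target / (box[2] - box[0])


# A 200 x 100 pixel ROI in a 640 x 480 image becomes a 200 x 200 crop.
box = square_crop_box((100, 120, 300, 220), (640, 480))  # (100, 70, 300, 270)
scale = resize_scale(box)
```

Because the crop is square before resizing, the same factor applies to both axes, which is the distortion-avoiding property the text describes.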

DEEP LEARNING ANALYSIS

Image augmentation was performed to compensate for the limited statistical power of a small sample. The initial images were expanded by image transformations such as rotation, translation, scaling up, and horizontal flipping, and the resulting augmented data set of 898 images was used for image analysis. We used 717 of the 898 images for training and 181 images for testing, and all data were divided on a per-patient basis. Each training/validation split was an approximate 80:20 split of the 898 images that retained the underlying distribution of disease prevalence. The mean number of augmented US images per patient was 3.96 in the train set, 3.77 in the test set, and 3.92 in the whole data set. All augmented US images of the same patient were assigned to either the train set or the test set, never to both. The learning algorithm generated from the train set was therefore evaluated on a test set consisting of completely independent patient data. The breakdown of the training and validation data sets according to tumor size is listed in Table 1. The images were resized to 224 × 224 pixels with a bicubic interpolation algorithm.
Table 1

Breakdown of Training and Validation Data Sets According to Tumor Size

Thyroid Cancer    No. of Samples
                  Training Set                   Validation Set
                  No Recurrence    Recurrence    No Recurrence    Recurrence
Microcarcinoma    259              24            65               6
Macrocarcinoma    380              54            96               14

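
The per-patient division described above can be sketched as follows. This is a minimal illustration with hypothetical identifiers (`split_by_patient`, the toy image and patient ids), not the authors' code. It keeps all images of one patient on the same side of the split but, unlike the study, does not stratify by recurrence status.

```python
import random
from collections import defaultdict


def split_by_patient(image_to_patient, test_fraction=0.2, seed=0):
    """Split image ids into train/test so that no patient appears in both.

    image_to_patient: dict mapping an image id to its patient id.
    Returns (train_images, test_images) as sorted lists.
    """
    by_patient = defaultdict(list)
    for image_id, patient_id in image_to_patient.items():
        by_patient[patient_id].append(image_id)

    # Shuffle patients, not images, so augmented copies stay together.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train, test = [], []
    for patient_id, images in by_patient.items():
        (test if patient_id in test_patients else train).extend(images)
    return sorted(train), sorted(test)


# Toy data: 10 hypothetical patients with 4 augmented images each.
toy = {f"p{p}_img{i}": f"p{p}" for p in range(10) for i in range(4)}
train_images, test_images = split_by_patient(toy)
```

The reported means per patient follow directly from the counts: 717/181 ≈ 3.96 for the train set, 181/48 ≈ 3.77 for the test set, and 898/229 ≈ 3.92 overall.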
We built the system with an NVIDIA TITAN Xp (NVIDIA, Santa Clara, CA, USA) graphics processing unit, a Xeon E5-1650 v4 (Intel, Santa Clara, CA, USA) central processing unit, and 128 GB of random access memory. Python 2.7.6 and the Keras 2.1.5 framework for neural networks (with the TensorFlow backend) were used for deep learning with a convolutional neural network. The proposed model for thyroid classification was transferred from a model pre-trained on 1.2 million everyday color images from ImageNet (http://www.image-net.org/), which consists of 1000 categories. The model was then fine-tuned on a total of 434 training data with a batch size of 16, 150 epochs, and a learning rate of 0.0001 using the ResNet-50 model (29). The ResNet-50 model has improved performance in terms of training and generalization errors by utilizing shortcut connections that significantly reduce the difficulty of training deep networks (30).
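
For readers unfamiliar with shortcut connections, the residual building block used in ResNet architectures is usually written as below. Here x and y are the input and output of the block, and F is the residual mapping learned by the stacked layers with weights W_i. This is the standard formulation from the residual learning literature and is given only as background, not as a detail specific to this study.

```latex
y = \mathcal{F}(x, \{W_i\}) + x
```

Because the identity term x is added back unchanged, the stacked layers only need to learn the difference from the identity, which is what eases the optimization of very deep networks.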

STATISTICAL ANALYSIS

We used the moonBook package within the R statistical language to compare demographic characteristics between the no recurrence and recurrence groups (31, 32). We also compared demographic information between the train set and the test set. All statistical analyses were performed on per-patient data. Values for predicting recurrent thyroid cancer were calculated as the area under the curve (AUC), and the US images were used as input data for the deep learning approach. The predictive performance of the deep learning model between the no recurrence and recurrence groups was evaluated using receiver operating characteristic (ROC) analysis, and the area under the ROC curve served as a summary of the prognostic performance of the model in recurrent thyroid cancer. We also performed ROC analysis in subgroups of thyroid carcinoma larger than 1 cm in greatest diameter (macrocarcinoma) and 1 cm or less (microcarcinoma). In this experiment, 181 US images of pathologically diagnosed thyroid cancer were sampled with k-fold cross-validation (k = 3) to assess, with ROC curves, the performance of the deep learning program in predicting thyroid cancer recurrence. We also compared the AUC values of the validation steps between macrocarcinoma and microcarcinoma. All reported p-values were two-sided, and p < 0.05 was considered statistically significant. We used the t-distributed stochastic neighbor embedding (tSNE) method to visualize the predictive performance of the deep learning model.
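
To make the AUC summary concrete, the sketch below computes it from labels and predicted scores using the rank-based (Mann-Whitney) definition, and generates index lists for a three-fold split. It is a pure-Python illustration under simple assumptions (binary labels, contiguous folds). The function names `roc_auc` and `k_fold_indices` are hypothetical, and this is not the software used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: sequence of 0 (no recurrence) or 1 (recurrence).
    scores: predicted probability of recurrence for each sample.
    The AUC equals the probability that a random positive sample scores
    higher than a random negative one, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=3):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size


# Three of the four positive/negative pairs are ranked correctly.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
folds = list(k_fold_indices(181, k=3))
```

A mean ± standard deviation such as the one reported for the AUC would then be taken over the per-fold values.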

RESULTS

We first compared the clinical characteristics of the train set and the test set. There was no significant difference in sex, age, cancer size, histologic type, or lymph node metastasis between the train set and the test set (Table 2). Only multifocality differed significantly between the two sets (p = 0.022).
Table 2

Comparison of Demographic Characteristics of the Train Set and Test Set

Clinical Variable       Subgroup                 Train Set (n = 181, %)   Test Set (n = 48, %)   p-Value
Age                                              50.1 ± 11.7              49.2 ± 13.1            0.612
Sex                     Male                     151 (83.7)               39 (82.0)              0.880
                        Female                   30 (16.3)                9 (18.0)
Tumor size (cm)         Median diameter [IQR]    0.6 [0.4–1.2]            0.6 [0.5–1.2]          0.647
                        Macrocarcinoma           113 (62.5)               30 (61.7)              0.376
                        Microcarcinoma           68 (37.5)                18 (38.3)
Histologic type         PTC                      176 (97.1)               47 (98.4)              0.149
                        FC                       3 (1.9)                  0 (0.0)
                        MC                       0 (0.0)                  1 (1.6)
                        Other                    2 (1.0)                  0 (0.0)
Multifocality           No                       120 (66.3)               39 (80.5)              0.022
                        Yes                      61 (33.7)                9 (19.5)
Lymph node metastasis   Yes                      172 (95.2)               45 (92.2)              0.127
                        No                       9 (4.8)                  4 (7.8)

The results for categorical variables are presented as numbers with percentages in parentheses. The Shapiro-Wilk test is used to assess the normality of the continuous variables. The continuous variables are non-normally distributed and are described as median and IQR in brackets. Differences between the groups are assessed using Fisher's exact test for categorical variables and the Kruskal-Wallis rank sum test for interval scale measurements. A p-value < 0.05 is considered statistically significant.

FC = follicular carcinoma, IQR = interquartile range, MC = medullary carcinoma, PTC = papillary thyroid carcinoma

The demographic characteristics of the no recurrence and recurrence groups are listed in Table 3. Among a total of 229 patients, 180 patients (78.6%) had no recurrent tumors (no recurrence group), while 49 patients (21.4%) had tumor recurrence (recurrence group). The recurrence rate was higher for macrocarcinoma (40.8%) than microcarcinoma (12.7%). Recurrence sites comprised regional lymph nodes (n = 34, 69.4%), lung (n = 4, 8.2%), bone (n = 2, 4.1%), and others (n = 9, 18.4%). In the patients with tumor recurrence, the mean time to recurrence was 42.9 ± 30.4 months (median, 35.7 months; range, 5.3–110.9 months) after initial surgery. At the time of recurrence, the mean serum Tg level was 140.5 ± 734 ng/mL. Among 34 patients with regional lymph node recurrence, 8 patients presented with a palpable neck mass on physical examination and 48 patients were detected on US.
Table 3

Comparison of Demographic Characteristics of the No Recurrence and Recurrence Groups

Clinical Variable       Subgroup                 No Recurrence (n = 180, %)   Recurrence (n = 49, %)   p-Value
Age                                              50.4 ± 12.2                  46.9 ± 13.7              0.086
Sex                     Male                     152 (84.4)                   35 (71.4)                0.060
                        Female                   28 (15.6)                    14 (28.6)
Tumor size (cm)         Median diameter [IQR]    0.6 [0.4–1.0]                1.4 [0.7–2.5]            < 0.001
                        Macrocarcinoma           138 (76.7)                   20 (40.8)                < 0.001
                        Microcarcinoma           42 (23.3)                    29 (59.2)
Histologic type         PTC                      177 (98.3)                   47 (95.9)                0.157
                        FC                       1 (0.6)                      1 (2.0)
                        MC                       2 (1.1)                      0 (0.0)
                        Other                    0 (0.0)                      1 (2.0)
Multifocality           No                       145 (80.6)                   29 (59.2)                0.001
                        Yes                      35 (19.4)                    20 (40.8)
Lymph node metastasis   Yes                      175 (97.2)                   17 (34.7)                < 0.001
                        No                       5 (2.8)                      32 (65.3)

The results for categorical variables are presented as numbers with percentages in parentheses. The Shapiro-Wilk test is used to assess the normality of the continuous variables. The continuous variables are non-normally distributed and are described as median and IQR in brackets. Differences between the groups are assessed using Fisher's exact test for categorical variables and the Kruskal-Wallis rank sum test for interval scale measurements. A p-value < 0.05 is considered statistically significant.

FC = follicular carcinoma, IQR = interquartile range, MC = medullary carcinoma, PTC = papillary thyroid carcinoma

Tumor size and multifocality were significantly different between the no recurrence and recurrence groups (p < 0.05). Fig. 3 displays the ROC curves for predicting tumor recurrence in thyroid cancers using the deep learning model with three-fold cross-validation. The overall mean AUC value of the deep learning model for predicting recurrent thyroid cancer was 0.9 ± 0.06. The mean AUC value was 0.87 ± 0.03 in macrocarcinoma and 0.79 ± 0.16 in microcarcinoma (Table 4). There were no statistically significant differences in AUC values among the cross-validation folds (p > 0.05). The sensitivity, specificity, and accuracy of the deep learning program predicting thyroid cancer recurrence are listed in Table 5.
Fig. 3

Receiver operating characteristic curves for prediction of thyroid cancer recurrence.

AUC = area under the curve

Table 4

Three-Fold Cross-Validation of Performance of the Deep Learning Program Predicting Thyroid Cancer Recurrence with AUC

Group (No. of Samples)     Actual Group                    Prediction   1st Fold   2nd Fold   3rd Fold   Mean Accuracy   Mean AUC
Macrocarcinoma (n = 110)   No recurrence group (n = 96)    Correct      94         91         95         0.92            0.87
                                                           Incorrect    2          5          1
                           Recurrence group (n = 14)       Correct      11         9          5
                                                           Incorrect    3          5          9
Microcarcinoma (n = 71)    No recurrence group (n = 65)    Correct      65         65         65         0.87            0.79
                                                           Incorrect    0          0          0
                           Recurrence group (n = 6)        Correct      2          2          1
                                                           Incorrect    4          4          5
Overall (n = 181)          No recurrence group (n = 161)   Correct      155        159        160        0.93            0.90
                                                           Incorrect    6          2          1
                           Recurrence group (n = 20)       Correct      13         8          9
                                                           Incorrect    7          12         11

AUC = area under the curve

Table 5

The Sensitivity, Specificity, and Accuracy of the Deep Learning Program Predicting Thyroid Cancer Recurrence

                  Sensitivity (%)   Specificity (%)   Accuracy (%)
Macrocarcinoma    59.5              97.2              92.4
Microcarcinoma    27.8              100               93.9
Overall           50.0              97.9              90.0

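
As a worked check of how such percentages relate to per-fold counts, the sketch below pools the counts from Table 4 over the three folds. It assumes that the first row of each actual group in Table 4 holds the correctly classified images, and that the authors pooled folds in this way; both are assumptions. The function name `pooled_metrics` is hypothetical. With these assumptions the macrocarcinoma and microcarcinoma rows of Table 5 are reproduced, whereas the same pooling gives slightly different values for the overall row.

```python
def pooled_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy (in %) from per-fold counts.

    Each argument is a list with one count per cross-validation fold.
    Counts are summed over the folds before the ratios are taken.
    """
    total_tp, total_fn = sum(tp), sum(fn)
    total_tn, total_fp = sum(tn), sum(fp)
    sensitivity = 100.0 * total_tp / (total_tp + total_fn)
    specificity = 100.0 * total_tn / (total_tn + total_fp)
    accuracy = 100.0 * (total_tp + total_tn) / (
        total_tp + total_fn + total_tn + total_fp
    )
    return round(sensitivity, 1), round(specificity, 1), round(accuracy, 1)


# Per-fold counts (1st, 2nd, 3rd fold) as read from Table 4.
macro = pooled_metrics(tp=[11, 9, 5], fn=[3, 5, 9], tn=[94, 91, 95], fp=[2, 5, 1])
micro = pooled_metrics(tp=[2, 2, 1], fn=[4, 4, 5], tn=[65, 65, 65], fp=[0, 0, 0])
# macro -> (59.5, 97.2, 92.4), micro -> (27.8, 100.0, 93.9)
```
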
The intraclass features were divided into two groups on a 2-dimensional (2D) diagram of tumor recurrence using the tSNE technique, which revealed patterns distinguishing them in each group (Fig. 4).
Fig. 4

2D visualization of the presence of tumor recurrence generated using the tSNE technique in thyroid cancer between recurrence (open triangles) and no recurrence (closed circles) groups.

tSNE = t-distributed stochastic neighbor-embedding

DISCUSSION

DTC has an excellent prognosis; however, there is considerable local recurrence and distant metastasis in long-term follow-up. As microcarcinomas comprise an increasing proportion of thyroid cancers, a series of observational trials has shown that most PTMC can be observed with active surveillance (33, 34, 35). In the era of active surveillance for DTC or PTMC, studies exploring methods for preoperative prediction of thyroid cancer prognosis have been emerging. According to previous studies, tumor size, multiplicity, and extrathyroidal tumor extension or lymph node metastases are poor prognostic factors, and patients with these risk factors for recurrence have a nearly seven-fold higher mortality rate (3, 15, 16, 17, 18, 36). Our results also showed that thyroid cancers that were larger and multifocal had a significantly higher rate of recurrence (p < 0.05). Several studies have evaluated poor prognostic factors for thyroid cancer using thyroid US image analysis, and several US features have been reported as poor prognostic factors, including upper location, presence of calcifications, or features suggestive of malignant thyroid nodules (20, 37). Recently, various computer-aided analyses of thyroid US images have been used in the detection and differential diagnosis of malignant thyroid nodules, and they showed excellent diagnostic performance with accuracy over 70% or high AUC values over 0.9 (25, 38, 39). Kim et al. (40) investigated texture analysis of thyroid US in patients with PTMC; however, there was no correlation between prognosis and lymph node metastasis. In our deep learning model for predicting tumor recurrence, the overall AUC value was 0.90, and the AUC was slightly higher in macrocarcinoma (0.87 ± 0.03) than in microcarcinoma (0.79 ± 0.16). In our data, recurrence of microcarcinoma was less frequent than that of macrocarcinoma (p < 0.001), and the difference in incidence and prevalence of recurrence according to tumor size may have affected model performance.
Our deep learning model for predicting thyroid cancer recurrence had low sensitivity (50%) and high specificity (97.9%). With these results, the model can provide a parameter to subdivide the low-risk group of thyroid cancer patients at the time of preoperative US. In the era of active surveillance for thyroid cancer patients, further investigations including multivariate analysis with a deep learning algorithm and many clinical parameters could yield a better predictive model.
This study has several limitations. First, it was retrospective and performed using data from a single institution; thus, it may be subject to bias. Second, the included US data were obtained using one machine and may not be representative of results from other machines. Third, one representative transverse or longitudinal US image was selected among many slices. Using 3D volumetric data including the whole tumor would be optimal considering the heterogeneity of cancer. The last limitation is the possibility of overfitting, because the deep learning algorithm was trained on a relatively small volume of data. To reduce overfitting, we collected US images of thyroid cancers of various sizes. We also analyzed cropped images that excluded normal thyroid tissue to reduce the number of potential data features that could degrade the model. A larger sample of patients with recurrent thyroid cancer or a multicenter trial may overcome such overfitting.
In conclusion, a deep learning model for the analysis of US images showed potential as a feasible algorithm for predicting thyroid cancer recurrence.
  34 in total

1.  Clinical risk factors associated with cervical lymph node recurrence in papillary thyroid carcinoma.

Authors:  Seung-Kuk Baek; Kwang-Yoon Jung; Sun-Mook Kang; Soon-Young Kwon; Jeong-Soo Woo; Seung-Hyun Cho; Eun-Jae Chung
Journal:  Thyroid       Date:  2010-02       Impact factor: 6.568

Review 2.  Lymph node dissection in patients with differentiated thyroid carcinoma--who benefits?

Authors:  B Mann; H J Buhr
Journal:  Langenbecks Arch Surg       Date:  1998-10       Impact factor: 3.445

3.  Computerized analysis of calcification of thyroid nodules as visualized by ultrasonography.

Authors:  Woo Jung Choi; Jeong Seon Park; Kwang Gi Kim; Soo-Yeon Kim; Hye Ryoung Koo; Young-Jun Lee
Journal:  Eur J Radiol       Date:  2015-06-25       Impact factor: 3.528

4.  Trends in Thyroid Cancer Incidence and Mortality in the United States, 1974-2013.

Authors:  Hyeyeun Lim; Susan S Devesa; Julie A Sosa; David Check; Cari M Kitahara
Journal:  JAMA       Date:  2017-04-04       Impact factor: 56.272

Review 5.  Prognosis and growth activity depend on patient age in clinical and subclinical papillary thyroid carcinoma.

Authors:  Yasuhiro Ito; Akira Miyauchi; Kaoru Kobayashi; Akihiro Miya
Journal:  Endocr J       Date:  2013-12-09       Impact factor: 2.349

6.  Latent feature representation with stacked auto-encoder for AD/MCI diagnosis.

Authors:  Heung-Il Suk; Seong-Whan Lee; Dinggang Shen
Journal:  Brain Struct Funct       Date:  2013-12-22       Impact factor: 3.270

7.  Long-term impact of initial surgical and medical therapy on papillary and follicular thyroid cancer.

Authors:  E L Mazzaferri; S M Jhiang
Journal:  Am J Med       Date:  1994-11       Impact factor: 4.965

8.  Association of Preoperative US Features and Recurrence in Patients with Classic Papillary Thyroid Carcinoma.

Authors:  Soo-Yeon Kim; Jin Young Kwak; Eun-Kyung Kim; Jung Hyun Yoon; Hee Jung Moon
Journal:  Radiology       Date:  2015-05-08       Impact factor: 11.105

9.  Cancer statistics in Korea: incidence, mortality, survival, and prevalence in 2008.

Authors:  Kyu-Won Jung; Sohee Park; Hyun-Joo Kong; Young-Joo Won; Joo Young Lee; Eun-Cheol Park; Jin-Soo Lee
Journal:  Cancer Res Treat       Date:  2011-03-31       Impact factor: 4.679

10.  Establishment and validation of the scoring system for preoperative prediction of central lymph node metastasis in papillary thyroid carcinoma.

Authors:  Wen Liu; Ruochuan Cheng; Yunhai Ma; Dan Wang; Yanjun Su; Chang Diao; Jianming Zhang; Jun Qian; Jin Liu
Journal:  Sci Rep       Date:  2018-05-03       Impact factor: 4.379
