Literature DB >> 34745008

Computed Tomography-Based Radiomics Model to Predict Central Cervical Lymph Node Metastases in Papillary Thyroid Carcinoma: A Multicenter Study.

Jingjing Li^1,2, Xinxin Wu², Ning Mao^3,4,5, Guibin Zheng⁶, Haicheng Zhang⁴, Yakui Mou^2,5, Chuanliang Jia^2,4,5, Jia Mi⁷, Xicheng Song^2,4,5.

Abstract

Objectives: This study aimed to develop a computed tomography (CT)-based radiomics model to predict central lymph node metastases (CLNM) preoperatively in patients with papillary thyroid carcinoma (PTC).
Methods: In this retrospective study, 678 patients with PTC were enrolled from Yantai Yuhuangding Hot3spital (n=605) and the Affiliated Hospital of Binzhou Medical University (n=73) within August 2010 to December 2020. The patients were randomly divided into a training set (n=423), an internal test set (n=182), and an external test set (n=73). Radiomics features of each patient were extracted from preoperative plain scan and contrast-enhanced CT images (arterial and venous phases). One-way analysis of variance (ANOVA) and least absolute shrinkage and selection operator algorithm were used for feature selection. The K-nearest neighbor, logistics regression, decision tree, linear-support vector machine (linear-SVM), Gaussian-SVM, and polynomial-SVM algorithms were used to establish radiomics models for CLNM prediction. The clinical risk factors were selected by ANOVA and multivariate logistic regression. Incorporated with clinical risk factors, a combined radiomics model was established for the preoperative prediction of CLNM in patients with PTCs. The performance of the combined radiomics model was evaluated using the receiver operating characteristic (ROC) and calibration curves in the training and test sets. The clinical usefulness was evaluated through decision curve analysis (DCA).
Results: A total of 4227 radiomic features were extracted from the CT images of each patient, and 14 non-zero coefficient features associated with CLNM were selected. Four clinical variables (sex, age, tumor diameter, and CT-reported lymph node status) were significantly associated with CLNM. Linear-SVM led to the best prediction model, which incorporated radiomic features and clinical risk factors. Areas under the ROC curves of 0.747 (95% confidence interval [CI] 0.706-0.782), 0.710 (95% CI 0.634-0.786), and 0.764 (95% CI 0.654-0.875) were obtained in the training, internal, and external test sets, respectively. The linear-SVM algorithm also showed better sensitivity (0.702 [95% CI 0.600-0.790] vs. 0.477 [95% CI 0.409-0.545]) and accuracy (0.670 [95% CI 0.600-0.738] vs. 0.642 [95% CI 0.569-0.712]) than an experienced radiologist in the internal test set in the combined radiomics model. The calibration plot reflected a favorable agreement between the actual and estimated probabilities of CLNM. The DCA indicated the clinical usefulness of the combined radiomics model.
Conclusion: The combined radiomics model is a non-invasive preoperative tool that incorporates radiomic features and clinical risk factors to predict CLNM in patients with PTC.

Entities: Chemical

Keywords: central lymph node metastases; computer aided diagnosis; machine learning; papillary thyroid carcinoma; radiomics

Mesh：

Year: 2021 PMID： 34745008 PMCID： PMC8567994 DOI： 10.3389/fendo.2021.741698

Source DB: PubMed Journal: Front Endocrinol (Lausanne) ISSN： 1664-2392 Impact factor: 5.555

Introduction

Papillary thyroid carcinoma (PTC) is the predominant pathological type of thyroid malignancy (1), and its incidence has rapidly increased in the past decade (2, 3). Although PTC is regarded as an indolent tumor, a portion of cancer cells metastasize to lymph nodes (LNs) around the thyroid gland, including central cervical lymph node metastases (CLNM) and lateral CLNM (4). CLNM has been reported in up to 60%–70% of patients and is an important risk factor for locoregional recurrence (5–9). CLNM is an independent risk factor for the prognosis of patients with PTC (10). Tumor cells usually occur first in the central neck (level VI) and then in the lateral neck (levels II, III, and IV) (11). LN metastases rarely occur first in the lateral neck compartments (12, 13). Therefore, central cervical compartment has the highest risk of metastases (14). Central cervical LNs are the “sentinel” LNs of PTC neck LNs. Improving the diagnostic level of central cervical LNs will help reduce unnecessary LN dissection. However, how to identify patients with CLNM accurately remains a problem because of the lack of discriminate features. Preoperative diagnosis of CLNM plays an important role in the formulation of surgical plans (15–17). Thus, the accurate preoperative prediction of CLNM provides a reference for individualized treatment decision. Currently, ultrasound (US) and contrast-enhanced computed tomography (CT) plays an important role in preoperative nodal staging and is a standard procedure in clinical practice (18, 19). However, preoperative US can only detect 20%–31% of CLNM (20, 21) and may only change the surgical procedure of 20% patients (22). Its efficacy in identifying malignant nodes is unsatisfactory (23). By contrast, CT has a variable sensitivity of 35%–77% and specificity of 70%–96% for the diagnosis of CLNM in patients with thyroid cancer (24, 25). Meanwhile, the diagnostic efficacy of CT is still limited by the subjective judgment of different radiologists, which may lead to a considerable proportion of patients being under-staged or over-staged. Radiomics have recently been introduced into medical imaging for preoperative diagnosis as computer-aided systems. It refers to the high-throughput extraction of large numbers of imaging features, thus converting medical images into mineable high-dimensional data; the subsequent quantitative analysis of these data can be used to support surgical decision making (26, 27). Radiomics-based methods were also proposed for the prediction of LN metastases in patients with PTC by converting radiological images into minable high-dimensional data (28–30). To our knowledge, some radiomics-based studies have focused on the preoperative prediction of CLNM in patients with PTC to date, but most of these studies are based on US images to develop radiomics models (31, 32) and are single centered. Studies have yet to develop a radiomics-based model with CT images among multicenter to predict CLNM in patients with PTC. Hence, the study aimed to develop a CT-based radiomics model that would incorporate radiomics features and clinical risk factors for the preoperative prediction of CLNM in patients with PTC and being tested in different centers. This comprehensive model may assist clinicians in selecting the most appropriate treatment strategies to achieve a better outcome.

Materials and Methods

Patients

This retrospective study was approved by the Ethics Committees of Yantai Yuhuangding Hospital and the Affiliated Hospital of Binzhou Medical University, exempted from informed consent in accordance with the Helsinki declaration. This analysis was performed on patients who underwent surgical treatment and were pathologically diagnosed with PTC in Yantai Yuhuangding Hospital (denote as Hospital 1) and the Affiliated Hospital of Binzhou Medical University (denote as Hospital 2) from August 2010 to December 2020. The inclusion criteria were as follows: (a) single lesion; (b) histologically confirmed with PTC; (c) underwent no preoperative anticancer treatment; (d) contrast-enhanced CT was performed within 2 weeks before surgery; and (e) patients underwent neck dissection with ipsilateral lobectomy or total thyroidectomy and received pathological diagnosis of LNs. The exclusion criteria were as follows: (a) postoperative pathological examination showed concomitant non-PTC components in the lesion (such as atypical hyperplasia, follicular tumors, medullary carcinomas, undifferentiated carcinomas, and metastatic carcinomas); (b) biopsy before CT scan; (c) without preoperative thyroid CT scan; (d) the patient was suffering from other concomitant malignancies, such as lymphoma, breast cancer, or liver cancer; (e) the primary tumor was unclear on CT images; (f) the primary tumor had maximum diameter < 0.5 cm; (g) the primary tumor was difficult to segment because of nodular goiter or chronic lymphocytic thyroiditis; and (h) postoperative pathological examination showed multifocal PTC. A total of 678 patients from two hospitals were included in this study. Patients in Hospital 1 (N=605) were divided into a training set (N=423) and an internal test set (N=182) at a ratio of 7:3 by using a completely random classification method. Patients in Hospital 2 were included as an external test set (N=73). Clinical data of patients were collected, including age, sex, CT primary tumor maximum diameter, location, and CT-reported LN status. Preoperative CT findings on LN status were used as the CT-reported LN status. In accordance with the National Comprehensive Cancer Network (version3.2020) Guidelines and some previous studies (19, 23, 33, 34), metastatic lymph nodes (LN+) were considered when at least one of the following criteria presented: (a) LN ≥ 10 mm in the maximal short axis diameter; (b) round or irregular shape; (c) rough margin, fuzzy boundary, and/or invasion into adjacent tissues; (d) calcification or cystic and/or necrotic change; (e) strong enhancement (similar to or stronger than that of the pharyngeal mucosa); and (f) heterogeneous enhancement. On the basis of previous studies (35, 36), suspicious LNs (LNsuspicious) were considered when the LNs did not meet the above criteria but were ≥5 mm in the maximal short axial diameter at cervical level VI. CT-reported LN status was assessed by a radiologist (Dr. A) with 12 years of experience in thyroid radiology, and the results were verified by another radiologist (Dr. B) who had 22 years of subspecialty experience in thyroid radiology. The weighted kappa coefficient was used to evaluate the agreement between these two radiologists. For assessment of overall agreement, the mean κ value was calculated from these pairs. Strength of κ agreement was defined as follows: < 0.00, poor; 0.00–0.20, slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; and 0.81–1.00, almost perfect (37). All disagreements were resolved through consultation.

Surgery Strategy

Patients received lobectomy and isthmectomy or total thyroidectomy depending on the clinical TNM stage. For patients with lateral cervical LN metastases, lateral cervical LN dissection was performed. The resected thyroid tissues were processed for pathological examination (including the determination of unifocal or multifocal PTC). The resected LNs were also subjected to pathological examination, and LN metastases were determined.

CT Images Acquisition

Before receiving surgery, all patients underwent contrast-enhanced CT on thyroid with a 16-slice spiral CT scanner (Sensation 16; SIEMENS) or a 64-slice spiral CT scanner (Sensation 64; SIEMENS). After plain CT scanning, a dynamic contrast-enhanced CT scan was performed after intravenous administration of 80–100 mL nonionic contrast material (Iopamidol, 370 mg I/mL, Bracco, Milan, Italy) by using power injection at a rate of 3.5 mL/s followed by saline flush (20 mL). Arterial and venous phase images were obtained at 25 and 60 s, respectively. The slice thickness of the reconstructed image was 1.0 mm. Arterial phase, venous phase, and plain scan CT images were retrieved for image feature extraction. All lesion image parameters (primary tumor maximum diameter, location) were examined and assessed by radiologist Dr. A, and the results were verified by another radiologist Dr. B. The CT images of all target lesions were exported from the Picture Archiving and Communicating System with Digital Imaging and Communications in Medicine format and imported into Radcloud (Huiying Medical Technology Co., Ltd.).

CT Images Preprocessing and Segmentation

Al CT images were imported into Radcloud for image preprocessing, including image normalization, to reduce the difference in grayscale texture between images. The data were resampled to 1 mm*1 mm*1 mm to eliminate the difference in different image scales. Before feature selection, tumor segmentation was performed by manually delineating the volume of interest (VOI) along the tumor contour on each CT image slice using Radcloud. Radiologist Dr. A (12 years of experience) used Radcloud to manually delineate the VOI of the primary thyroid tumor as consistent as possible with the margins of the primary tumor lesion. This study did not delineate the LN region but the primary thyroid tumor because many metastatic LNs are extremely small and difficult to identify on the CT image. In addition, many LNs were unable to correspond to each other between CT image and pathology. For these reasons, this prediction model was based on the primary thyroid tumor. The primary tumor VOI region was evaluated by radiologist Dr. B (22 years of subspecialty experience) to ensure the accuracy of segmentation. Following the same guideline describing how to define the boundary of tumors, both doctors were aware of the diagnosis of thyroid cancer but were blinded to the clinical and histopathologic data. The schematic of image segmentation is shown in .

Figure 1

Flowchart of the process of CT image segmentation and VOI construction. An example of primary tumor segmentation and VOI construction for patients with central lymph node metastasis. VOI segmentation is performed on unenhanced and contrast-enhanced computed tomography images (A). Features are extracted from the VOI of primary tumor (B), including tumor shape, intensity, and texture. The VOI of primary tumor can be shown in a 3D status (C). VOI, volume of interest.

Feature Extraction and Selection

Radiomic features (shape, first-order, texture features) were extracted from each VOI by Radcloud. All feature extraction methods were implemented in Radcloud. These features were divided into three groups, namely, first-order statistics, shape features, and texture features. Intra class correlation coefficients (ICCs) were utilized for evaluating the intra- and inter-observer agreement in terms of feature extraction. First, Dr. A and Dr. B randomly selected the images of 30 patients to evaluate the inter-class reproducibility. After a month, Dr. A repeated the same procedure. The remaining VOI segmentation was performed by Dr. A. Features with ICCs greater than 0.75 indicated satisfactory reproducibility and were selected for radiomic features, and those with ICCs less than 0.75 were eliminated. After one-way analysis of variance (ANOVA), features with p <0.05 were selected. Then, the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm was used to select non-zero coefficients. The optimal penalization coefficient alpha was set by five-fold cross-validation, and radiomic features with non-zero coefficients within the training set were finally selected. To avoid overfitting, we ensured the feature selection, and the subsequent classifications were performed only on the training set rather than on the entire dataset.

Radiomics Model Construction

Basing the radiomic features that were selected from the training set, this study used six algorithms to construct the radiomics model for CLNM prediction: K-nearest neighbor (KNN), logistic regression (LR), decision tree (DT), linear-support vector machine (linear-SVM), Gaussian-SVM, and polynomial-SVM. Linear-SVM, Gaussian-SVM, and polynomial-SVM are SVM algorithms with different kernels, which may cause different results.

Clinical Independent Risk Factor Selection

The significant clinical risk factors were determined using ANOVA. Then, the significant clinical risk factors were integrated into the multivariate LR analysis of backward-stepwise selection. During this procedure, collinearity was considered and the factors with variance inflation factor (VIF) > 10 and p value > 0.05 were eliminated. Akaike information was used as the criterion when the smallest Akaike information criterion was reached, the stepwise procedure was stopped, and the final multivariate LR selected the independent clinical factors.

Construction of the Combined Radiomics Model

The six algorithms (KNN, LR, DT, linear-SVM, Gaussian-SVM, and polynomial-SVM) were used to construct combined radiomics models, which cooperate selected radiomics features and independent clinical risk factors, and the best combined model was selected.

Assessment and Test of Combined Radiomics Model Performance

The prediction performance of the combined radiomics models in the training and test sets was evaluated using the area under the receiver operator characteristic (ROC) curve (AUC). The Youden index method was used to identify the optimal cut-off value in ROC curves. Sensitivity, specificity, and accuracy values were determined by the cut-off value. Calibration of the model was assessed using Platt method (38). The calibration curve of the prediction model is an important index to evaluate the probability accuracy of a disease risk model to predict an individual outcome event in the future. A good degree of calibration indicates a high accuracy of the prediction model, whereas a poor degree of calibration suggests that the model may overestimate or underestimate the risk of disease. Good agreement between the true state of CLNM and the predicted probability based on the radiomics model was achieved when the calibration curves were close to the diagonal line. Decision curve analysis (DCA) was used to evaluate the impact of the radiomics model on the patients’ benefit in predicting the CLNM of PTC to assist clinicians in making treatment decisions. Probability thresholds are important for the trade-off between accepting necessary clinical procedures and avoiding unnecessary ones. The curve with the highest gain is the best prediction method.

Statistical Analysis

All statistical tests were conducted in Python 3.6, R software 4.0.3, and SPSS 26. Scikit-Learn, a python library, was performed for the radiomic feature selection and model construction. The modules of “feature-selection”, “linear-model”, “svm”, “neighbors”, “tree”, and “metrics” were used in feature selection and model construction. In R software, the “rms” package was used to select clinical variables and plot the calibration curves. The “rmda” package was used to perform DCA. In SPSS, categorical variables (i.e., sex and CT-reported LN status) were compared using the χ2 test or Fisher exact test. Continuous variables (i.e., age, tumor diameter) were compared using Student’s t test or Mann-Whitney U test, when appropriate. All levels of statistical significance were two-sided, and p <0.05 was considered to indicate statistical significance. DeLong’s test was used to compare the differences in ROC curves.

Results

Demographics Features

The flow chart of patient screening is shown in . From August 2010 to December 2020, 3689 candidates at Hospital 1 were identified, and 605 who met the inclusion criteria were enrolled and divided into a training set (mean age ± standard deviation [SD], 44±12) and an internal test set (mean age ± standard deviation [SD], 44±12). Similarly, 73 patients were enrolled from August 2010 to December 2020 at Hospital 2 (external test set, mean age ± SD, 44 ±11). The clinical characteristics of 678 patients with PTC were summarized. Of the 678 eligible patients, their average age was 44 ± 12 years. The male-to-female patient ratio was 1:2.8. The mean tumor size was 1.18 cm (median=1.0 cm). The weighted kappa coefficient in CT-reported LN status was 0.8. The clinical information of patients in the training set, internal test set, and external test set is shown in .

Figure 2

Table 1

Clinical characteristics of patients in the training and test sets.

Characteristics	Training Set (N = 423)			Internal test set (N = 182)			External test set (N = 73)
	CLNM (+) (N = 218)	CLNM (-) (N = 205)	p-value	CLNM (+) (N = 94)	CLNM (-) (N = 88)	p-value	CLNM (+) (N = 46)	CLNM (-) (N = 27)	p-value
Age, mean ± SD, years	44± 12	44 ± 12		44± 12	44 ± 12		44 ± 12	44 ± 12
< 45, No. (%)	131 (60.09)	84 (40.98)	<0.001	52 (55.32)	41 (46.59)	0.239	21 (45.65)	12 (44.44)	0.92
≥ 45, No. (%)	87(39.91)	121 (59.02)		42 (44.68)	47 (53.41)		25 (54.35)	15 (55.56)
Sex, No. (%)
Male	70 (32.11)	45 (21.95)	<0.001	24 (25.53)	19 (21.59)	0.532	18 (39.13)	2 (28.57)	0.793
Female	148 (67.89)	160 (78.05)		70 (74.47)	69 (78.41)		28 (60.87)	25 (71.43)
CT-maximum diameter*, No. (%)
<1cm	88 (40.37)	120 (58.54)	<0.001	35 (37.23)	55 (62.50)	<0.05	19 (41.30)	12 (44.44)	0.793
≧1cm	130 (59.63)	85(41.46)		59 (62.77)	33 (37.50)		27 (58.70)	5 (55.56)
CT-reported LN status, No. (%)
LN-negative	114 (52.29)	147 (71.71)	<0.001	45 (47.87)	68 (77.27)	<0.05	10 (21.74)	4 (14.81)	0.468
LN-suspicious	39 (17.89)	45 (21.95)		23 (24.47)	12 (13.64)		0 (0.00)	0 (0.00)
LN-positive	65 (29.82)	13 (6.34)		26 (27.66)	8 (9.09)		36 (78.26)	23 (85.19)

Data are n (%) unless otherwise indicated. No significant divergences were found between CLNM (+) and CLNM (-) in terms of age, sex, CT-maximum diameter, CT-reported LN status in the training set (p < 0.05). CT-longest diameter and CT-reported LN status were significantly different between CLNM (+) and CLNM (-) in the internal test set (p < 0.05), but not in the external test set.

CT, computed tomography; LN, lymph node; CLNM, central lymph node metastasis.

*Largest diameter of the target primary lesion.

The flow chart of patients Recruitment process. Hospital 1 stands for patients from Yantai Yuhuangding Hospital, Hospital 2 stands for patients from the Affiliated Hospital of Binzhou Medical University. Clinical characteristics of patients in the training and test sets. Data are n (%) unless otherwise indicated. No significant divergences were found between CLNM (+) and CLNM (-) in terms of age, sex, CT-maximum diameter, CT-reported LN status in the training set (p < 0.05). CT-longest diameter and CT-reported LN status were significantly different between CLNM (+) and CLNM (-) in the internal test set (p < 0.05), but not in the external test set. CT, computed tomography; LN, lymph node; CLNM, central lymph node metastasis. *Largest diameter of the target primary lesion.

Feature Selection and Radiomics Model Construction

A total of 4227 radiomic features were extracted from the arterial phase, venous phase, and plain scan CT images of each patient. The ICCs ranged from 0.864 to 0.989 in the intra-observer and from 0.832 to 0.957 in the inter-observer. This result indicated that feature extraction within and between observers had good repeatability. Finally, 14 non-zero coefficient features ( ) associated with CLNM prediction were selected after ANOVA and LASSO ( ). In the training set, the polynomial-SVM algorithm demonstrated the highest values for AUC 0.871(95% [confidence interval (CI) 0.841–0.898), whereas the AUC values of KNN, LR, DT, linear-SVM, and Gaussian-SVM were 0.709 (95% CI 0.760–0.744), 0.711 (95% CI 0.669–0.748), 0.760 (95% CI 0.717–0.793), 0.713 (95% CI 0.671–0.751), and 0.796 (95% CI 0.760–0.827), respectively. In the internal test set, the KNN algorithm showed the highest values for AUC (0.695 [CI 0.584–0.699]), whereas the AUC values of LR, DT, linear-SVM, Gaussian-SVM, and polynomial-SVM were 0.669 (95% CI 0.522–0.646), 0.640 (95% CI 0.522–0.646), 0.681 (95% CI 0.533–0.656), 0.687 (95% CI 0.549–0.644), 0.671 (95% CI 0.526–0.647), respectively ( ).

Table 2

Radiomics features selected in ANOVA and LASSO regression analysis.

Radiomics features	Coefficient
lbp-2D_firstorder_Range_ scan	0.007418999
lbp-2D_firstorder_10Percentile_scan	0.004015608
Wavelet-HHL_glrlm_RunLengthNonUniformityNormalized_scanwavelet-HHH_ngtdm_Contrast_scan	-0.07654446-0.00770341
wavelet-HLH_glszm_GrayLevelNonUniformityNormalized_scan	-0.01873799
wavelet-HHL_firstorder_10Percentile_scan	0.015804638
wavelet-HLH_glszm_SmallAreaHighGrayLevelEmphasis_scan	-0.02602846
wavelet-HHH_glszm_HighGrayLevelZoneEmphasis_scan	0.001441019
wavelet-HHL_firstorder_Mean_scan	0.003108156
original_shape_LeastAxisLength_AP	0.006855814
original_shape_Elongation_AP	0.00434467
original_gldm_LargeDependenceEmphasis_AP	0.017969827
original_glrlm_RunLengthNonUniformityNormalized_AP	-0.0047654
logarithm_glrlm_RunLengthNonUniformityNormalized_AP	-0.00065297

Fourteen radiomics features with non-zero coefficients in one-way analysis of variance (ANOVA) and the least absolute shrinkage and selection operator (LASSO) logistic regression model were selected.

Figure 3

Computed tomography (CT) image features selection using one-way analysis of variance (ANOVA) and the least absolute shrinkage and selection operator (LASSO) logistic regression model in the training set. (A) The five-fold cross-validation and the minimal criteria process was used to generate the optimal penalization coefficient lambda (λ) in the LASSO model. The vertical line defines the optimal values of λ, where the model provides its best fit to the data. The optimal λ value of 0.165 with -log (λ) =1.5 was selected. (B) LASSO coefficient profiles of the radiomics features. The vertical line was drawn at the value selected using five-fold cross-validation, where optimal λ resulted in 14 nonzero coefficients.

Figure 4

Receiver operating characteristic (ROC) curves of the radiomics model constructed by six algorithms in the training set (A) and internal test set (B).

Radiomics features selected in ANOVA and LASSO regression analysis. Fourteen radiomics features with non-zero coefficients in one-way analysis of variance (ANOVA) and the least absolute shrinkage and selection operator (LASSO) logistic regression model were selected. Computed tomography (CT) image features selection using one-way analysis of variance (ANOVA) and the least absolute shrinkage and selection operator (LASSO) logistic regression model in the training set. (A) The five-fold cross-validation and the minimal criteria process was used to generate the optimal penalization coefficient lambda (λ) in the LASSO model. The vertical line defines the optimal values of λ, where the model provides its best fit to the data. The optimal λ value of 0.165 with -log (λ) =1.5 was selected. (B) LASSO coefficient profiles of the radiomics features. The vertical line was drawn at the value selected using five-fold cross-validation, where optimal λ resulted in 14 nonzero coefficients. Receiver operating characteristic (ROC) curves of the radiomics model constructed by six algorithms in the training set (A) and internal test set (B).

Development and Test of the Combined Radiomics Model

Tumor size, sex, age, tumor diameter, and CT-reported LN status were identified as independent clinical risk factors of CLNM in PTC. No collinearity was observed because the VIF of the predictor ranged from 1.10 to 1.25. The combined radiomics model included radiomic features and independent clinical risk factors. The performance of prediction among these six machine learning models in the training set and internal test set are detailed in . As shown in , in the internal test set, the highest AUC was 0.709 (95% CI 0.634–0.786) in linear-SVM, whereas the AUC values of the KNN, LR, DT, Gaussian-SVM, and polynomial-SVM were only 0.605 (95% CI 0.507–0.611), 0.706 (95% CI 0.610–0.725), 0.662 (95% CI 0.538–0.655), 0.644 (95% CI 0.520–0.630), and 0.663 (95% CI 0.533–0.647), respectively. In addition, the linear-SVM algorithm was also better than the CT reported LN status (experienced radiologist’s performance) in sensitivity (0.702 [95% CI 0.600–0.790] vs. 0.477 [95% CI 0.409–0.545]) and accuracy (0.670 [95% CI 0.600–0.738] vs. 0.642 [95% CI 0.569–0.712]) in the internal test set in the combined radiomics model. The combined radiomics model constructed using the linear-SVM algorithm performed the best for AUC 0.747 (95% CI 0.706–0.782) in the training set ( ), 0.709 (95% CI 0.634–0.786) in the internal test set ( ), and 0.764 (95%CI 0.654–0.875) in the external test set ( ). In the calibration curve ( ), the gray line represents perfect prediction, and the dotted line represents the calibration curve of the combined radiomics model. The calibration curve showed good agreement between the true state of central LN metastases and the predicted probability based on radiomics. The results of DCA are shown in . When the threshold probability ranged from 0.2 to 0.7 in the internal test set and 0.1 to 0.9 in the external test set, the combined radiomics model to predict CLNM provides more net benefit than the “treat all” or “treat none” scheme. Therefore, this combined radiomics model constructed by linear-SVM showed excellent performance in discrimination, calibration, and clinical usefulness.

Figure 5

Receiver operating characteristic (ROC) curves of the combined model constructed by six algorithms in the training set (A) and internal test set (B).

Table 3

The performance of combined models constructed by six algorithms in the internal test set.

	AUC	95%CI	SEN	95%CI	SPE	95%CI	ACC	95%CI
KNN+ Clinical	0.605	0.507-0.611	0.777	0.677-0.853	0.341	0.245-0.451	0.566	0.491-0.639
LR+ Clinical	0.706	0.610-0.725	0.723	0.620-0.808	0.614	0.503-0.714	0.670	0.570-0.738
DT+ Clinical	0.662	0.538-0.655	0.511	0.406-0.614	0.670	0.561-0.765	0.588	0.513-0.660
Lin-SVM+ Clinical	0.709	0.634-0.786	0.702	0.600-0.790	0.636	0.526-0.734	0.670	0.600-0.738
Gaus-SVM+ Clinical	0.644	0.520-0.630	0.766	0.665-0.845	0.386	0.286-0.497	0.582	0.507-0.645
Ploy-SVM+ Clinical	0.663	0.533-0.647	0.691	0.587-0.780	0.489	0.381-0.597	0.593	0.518-0.666
Clinical	0.660	0.570-0.786	0.574	0.468-0.674	0.636	0.526-0.734	0.604	0.530-0.676

KNN, K-nearest neighbor; LR, logistic regression; DT, decision tree; Lin-SVM, linear-support vector machine; Gaus-SVM, Gaussian- support vector machine; Ploy-SVM, polynomial-SVM.

Figure 6

Receiver operating characteristic (ROC) curves of the combined model (blue lines), Rad-score model (orange lines) and clinical model (green lines) in the training set (A), internal test set (B) set, and external test set (C), respectively. Calibration curves (D) of the combined model in the training and test sets, respectively. The diagonal dotted line represents an ideal prediction, while the red line represents the performance of the training set, the blue line and black line represents the performance of the internal test set and external test set, respectively, Closer fit to the diagonal dotted line indicates a better prediction.

Figure 7

Decision curve analysis (DCA) of training set (A), internal test set (B), and external test set (C). The y-axis measures the net benefit. The x-axis shows the corresponding risk threshold. The grey line represents the assumption that all lesions were central lymph node metastases. The black straight line represents the assumption that all nodules were non-CLNM. If the threshold probability was more than 30%, using the combined model (black line) to predict CLNM added more benefit than the Rad-score model (red line) and clinical model (blue line) in the internal test set; If the threshold probability was more than 10%, using the combined model (black line) to predict CLNM added more benefit than the Rad-score model (red line) and clinical model (blue line) in the external test set.

Receiver operating characteristic (ROC) curves of the combined model constructed by six algorithms in the training set (A) and internal test set (B). The performance of combined models constructed by six algorithms in the internal test set. KNN, K-nearest neighbor; LR, logistic regression; DT, decision tree; Lin-SVM, linear-support vector machine; Gaus-SVM, Gaussian- support vector machine; Ploy-SVM, polynomial-SVM. Receiver operating characteristic (ROC) curves of the combined model (blue lines), Rad-score model (orange lines) and clinical model (green lines) in the training set (A), internal test set (B) set, and external test set (C), respectively. Calibration curves (D) of the combined model in the training and test sets, respectively. The diagonal dotted line represents an ideal prediction, while the red line represents the performance of the training set, the blue line and black line represents the performance of the internal test set and external test set, respectively, Closer fit to the diagonal dotted line indicates a better prediction. Decision curve analysis (DCA) of training set (A), internal test set (B), and external test set (C). The y-axis measures the net benefit. The x-axis shows the corresponding risk threshold. The grey line represents the assumption that all lesions were central lymph node metastases. The black straight line represents the assumption that all nodules were non-CLNM. If the threshold probability was more than 30%, using the combined model (black line) to predict CLNM added more benefit than the Rad-score model (red line) and clinical model (blue line) in the internal test set; If the threshold probability was more than 10%, using the combined model (black line) to predict CLNM added more benefit than the Rad-score model (red line) and clinical model (blue line) in the external test set.

Discussion

Due to there’s still controversial in surgery strategies for thyroid carcinoma, preoperative prediction of thyroid cancer and LN status is of vital importance for the precise treatment of thyroid carcinoma (39). The judgement of LN status is the core factor for routine central LN dissection (40–43). Thus, noninvasive LN assessment tool is warranted. In this multicenter retrospective study, a CT-based comprehensive radiomics model including clinical risk factors and radiomic features was developed for the pretreatment prediction of CLNM in patients with PTC. The linear-SVM model demonstrated the highest performance of predicting CLNM, with an AUC of 0.764 in the external test set. Thus, the linear-SVM algorithm was chosen as the best and compared with experienced radiologist’s performance. The application of the combined radiomics model provides a new approach for establishing prediction models with multiple characteristics. This individualized model has satisfied the requirements of precision medicine development and can be used by doctors and patients easily. Many studies have been conducted on this topic (44–48). Zhu et al. (46) applied six machine learning algorithms coupled with preoperative clinical characteristics and intraoperative information to develop prediction models for CLNM. Yang et al. (47) developed a multicenter nomogram with internal and external validation set C-index values of 0.854 and 0.825, respectively. However, their internal validation set was through a bootstrapping analysis, which might cause the overfitting of the model. In addition, these studies solely used clinical information to construct predictive models. By contrast, our multicenter research included an independent internal test set and an external test set and provided additional radiomics evidence in predicting the CLNM of PTC. Compared with US signature-based nomograms (45, 49, 50), our CT-based combined radiomics model, given the high clarity of CT images in identifying LN metastases, incorporated the CT-reported LN status, clinical risk factors, and radiomics features, which could be obtained through a non-invasive method before surgery. This combined radiomics model showed acceptable performance in predicting LN metastases. This study offers other notable advantages. First, six combined radiomics models were constructed by machine learning algorithms (KNN, LR, DT, linear-SVM, Gaussian-SVM, and polynomial-SVM). The reliability and accuracy of the models can be better demonstrated by comparing the performance of these models. Linear-SVM showed the highest value among these six algorithms. The linear-SVM classifier is implemented specifically for massive levels of data and features (51). Relative to the KNN, LR, Gaussian-SVM, polynomial-SVM, and DT, linear-SVM has faster training and classification speed because it is well suited for high-dimensional features (52). In the present study, linear-SVM showed the highest classification accuracy (0.709), which is higher than the recognition accuracy of the KNN, LR, Gaussian-SVM, polynomial-SVM, and DT algorithms. Hence, the linear-SVM has prominent advantages over the other five algorithms in this study. Second, some preprocessing techniques were applied prior to feature extraction to improve feature discrimination. Third, this study is a multicenter study, and CT images were extracted from different data centers and different machines, which improved the generalization ability of the combined radiomics model. This study has several limitations. First, our sample size was modest and only carried out in China, especially for the testing set, which may limit the performance of our predictive modelling. A large-scale, cross-race study is needed to improve the prediction efficiency of the model. Second, although the manual segmentation method has achieved satisfactory inter- and intra-observer reproducibility of radiomics feature extraction, over 90% of features have good reproducibility (53). However, the automated method of image segmentation may have high stability. Third, tumor diameters < 0.5 cm were not included because they could not be reliably identified and segmented on CT images. In the future, we will use smaller slice gaps and more advanced CT software to improve the detection of smaller tumors. Fourth, this study only analyzed single lesion tumor in patients with PTC instead of multiple tumors to avoid selection bias. Thus, the generalization ability of the model is limited. Fifth, the radiomic features were extracted from the primary tumor instead of LNs, which may affect the accuracy of the model. Moreover, only the conventional machine learning radiomics method was used in this study. Some studies have shown that deep learning methods have a certain value in predicting LN metastases. We will continue our research with deep learning methods in the future. In summary, our combined radiomics model is a noninvasive predictive tool that combines the radiomic features of CT images with independent clinical risk factors and shows good application prospects. A large sample size, cross-race study should be carried out to optimize the model and improve the efficiency of the model in subsequent studies.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.

Ethics Statement

Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author Contributions

JL implemented the conceptualization, methodology, data curation, and writing original draft preparation. XW implemented literature searching, and manuscript writing. NM Identified the radiological characteristics of PTC and estimated and adjusted the accuracy of VOIs. GZ implemented VOI segmentation, and literature searching. HZ implemented data analysis and figures making. YM and CJ implemented visualization, and investigation. JM implemented writing- reviewing and editing. XS conducted the design, quality control, and data interpretation. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the “Taishan Scholar” Project (No. ts20190991).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

50 in total

Review 1. Controversies in the surgical management of thyroid follicular neoplasms. Retrospective analysis of 721 patients.

Authors: Giovanni Conzo; Pietro Giorgio Calò; Claudio Gambardella; Ernesto Tartaglia; Claudio Mauriello; Cristina Della Pietra; Fabio Medas; Rosa Santa Cruz; Francesco Podda; Luigi Santini; Giancarlo Troncone
Journal: Int J Surg Date: 2014-05-22 Impact factor: 6.071

2. Integrated 18F-FDG PET/CT for the initial evaluation of cervical node level of patients with papillary thyroid carcinoma: comparison with ultrasound and contrast-enhanced CT.

Authors: Han-Sin Jeong; Chung-Hwan Baek; Young-Ik Son; Joon-Young Choi; Hyung-Jin Kim; Young-Hyeh Ko; Jae-Hoon Chung; Hye-Jin Baek
Journal: Clin Endocrinol (Oxf) Date: 2006-09 Impact factor: 3.478

3. Long-Term Efficacy of Lymph Node Reoperation for Persistent Papillary Thyroid Cancer: 13-Year Follow-Up.

Authors: Amblessed E Onuma; Eliza W Beal; Fadi Nabhan; Tasha Hughes; William B Farrar; John Phay; Matthew D Ringel; Richard T Kloos; Lawrence A Shirley
Journal: Ann Surg Oncol Date: 2019-02-28 Impact factor: 5.344

4. Prophylactic central compartment lymph node dissection in papillary thyroid carcinoma: clinical implications derived from the first prospective randomized controlled single institution study.

Authors: D Viola; G Materazzi; L Valerio; E Molinaro; L Agate; P Faviana; V Seccia; E Sensi; C Romei; P Piaggi; L Torregrossa; S Sellari-Franceschini; F Basolo; P Vitti; R Elisei; P Miccoli
Journal: J Clin Endocrinol Metab Date: 2015-01-15 Impact factor: 5.958

1. Ultrasound-based radiomics XGBoost model to assess the risk of central cervical lymph node metastasis in patients with papillary thyroid carcinoma: Individual application of SHAP.

Authors: Yan Shi; Ying Zou; Jihua Liu; Yuanyuan Wang; Yingbin Chen; Fang Sun; Zhi Yang; Guanghe Cui; Xijun Zhu; Xu Cui; Feifei Liu
Journal: Front Oncol Date: 2022-08-26 Impact factor: 5.738

1 in total