Wenping Chen1, Mengying Xu2, Yiwen Sun3, Changfeng Ji1, Ling Chen4, Song Liu1, Kefeng Zhou1, Zhengyang Zhou1. 1. From the Department of Radiology, Nanjing Drum Tower Hospital, Clinical College of Nanjing Medical University. 2. Department of Radiology, The Affiliated Hospital of Nanjing University Medical School. 3. Nuclear Medicine. 4. Pathology, Nanjing Drum Tower Hospital, Clinical College of Nanjing Medical University, Nanjing, China.
Abstract
OBJECTIVES: The aims of the study were to integrate characteristics of computed tomography (CT), texture, and hematological parameters and to establish predictive models for lymph node (LN) metastasis in lung adenocarcinoma. METHODS: A total of 207 lung adenocarcinoma cases with confirmed postoperative pathology and preoperative CT scans between February 2017 and April 2019 were included in this retrospective study. All patients were divided chronologically into 1 training cohort and 2 validation cohorts in a ratio of 3:1:1. The χ² test or Fisher exact test was used for categorical variables. The Shapiro-Wilk test and Mann-Whitney U test were used for continuous variables. Logistic regression and machine learning algorithm models based on CT characteristics, texture, and hematological parameters were used to predict LN metastasis. The performance of the multivariate models was evaluated using a receiver operating characteristic curve; prediction performance was evaluated in the validation cohorts. Decision curve analysis confirmed its clinical utility. RESULTS: Logistic regression analysis demonstrated that pleural thickening (P = 0.013), percentile 25th (P = 0.033), entropy gray-level co-occurrence matrix 10 (P = 0.019), red blood cell distribution width (P = 0.012), and lymphocyte-to-monocyte ratio (P = 0.049) were independent risk factors associated with LN metastasis. The area under the curve of the predictive model established using these 5 independent risk factors was 0.929 in the receiver operating characteristic analysis. Among the machine learning models, the highest area under the curve in the training cohort was 0.777, obtained using the Naive Bayes algorithm. CONCLUSIONS: Integrative predictive models of CT characteristics, texture, and hematological parameters could predict LN metastasis in lung adenocarcinomas. These findings may provide a reference for clinical decision making.
Lung carcinoma is the second most commonly diagnosed cancer and the leading cause of cancer-related deaths worldwide.[1] The most common path of metastasis in patients with nonsmall cell lung cancer (NSCLC) is through the lymph node (LN). The National Comprehensive Cancer Network guidelines (2020) demonstrated that the presence of mediastinal LN metastasis has a profound impact on prognosis and treatment decisions.[2] For medically operable diseases, resection is the preferred local treatment modality. A thorough dissection of metastatic mediastinal LN during surgery plays a key role in improving disease-free survival and overall survival rates among the patients.[3] Therefore, it is necessary to accurately evaluate the preoperative LN metastasis in NSCLC.
Currently, there are many methods to preoperatively assess the LN status in NSCLC. However, invasive methods, including endobronchial ultrasonography-guided transbronchial needle aspiration and thoracoscopy, are not routinely performed.[4,5] Noninvasive methods include computed tomography (CT), positron emission tomography–CT, and magnetic resonance imaging. The misdiagnosis and false-negative rates are higher in positron emission tomography–CT for diagnosing LN metastasis relative to the final pathological staging after complete nodal dissection (the criterion standard).[6] The time required for performing a magnetic resonance scan is long. Lymph nodes greater than 1 cm in short-axis diameter are considered metastatic nodes. However, the accuracy of preoperative CT scanning in distinguishing LN status is too low for sufficient preoperative staging.[7,8] Lymph node metastasis is misdiagnosed in CT scan analysis because of the presence of normal-sized N2 nodes.
Most studies have reported a significant association between LN metastasis and the radiological features of the primary tumor. Zhao et al[9] reported that tumor size greater than 2.65 cm was an independent predictor of LN metastasis. Moreover, several studies using texture analysis describe a correlation between primary tumor and LN metastasis. Bayanati et al[10] confirmed the potential of CT texture analysis for accurately differentiating malignant from benign mediastinal nodes in lung cancer. In addition, several studies reported that hematological inflammatory biomarkers could be used to predict the tumor–node–metastasis stage of lung cancer.[11] Xu et al[12] showed that the neutrophil-to-lymphocyte ratio (NLR) in lung cancer may be an independent predictive marker for the N stage. However, only a few studies have established an integrative model based on radiological features, texture, and hematological parameters to predict LN metastasis.
Recently, with the widespread application of regression models and the development of machine learning algorithms, multivariate model evaluation methods have also matured. Therefore, our study aimed to incorporate the radiological features, texture, and hematological parameters to establish predictive models for LN metastasis for lung adenocarcinomas.
MATERIAL AND METHODS
Patients
This retrospective study was approved by the local ethics committee, and the requirement for informed consent was waived. Patients treated at our hospital between February 2017 and April 2019 who met the following criteria were collected and analyzed retrospectively. The inclusion criteria were as follows: (1) radical resection of lung cancer with systematic LN dissection; (2) postoperative pathology confirmed as lung adenocarcinoma; (3) complete preoperative clinical data and CT images; and (4) single lesion. The exclusion criteria were as follows: (1) image quality too poor for analysis; (2) radiotherapy and/or chemotherapy before surgery; (3) a history of malignancy at other sites; and (4) CT examination performed more than 1 month before the surgery (Fig. 1).
FIGURE 1
Flowchart shows the patient selection process.
Finally, 207 patients (91 men and 116 women; age range, 36–85 years; mean age, 60.5 years) were enrolled in this study. The patients were divided into 3 cohorts (1 training cohort and 2 validation cohorts) in a ratio of 3:1:1.
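A chronological split of this kind can be sketched as follows. This is only an illustration under stated assumptions: the function name, the record layout, and the use of a single date per patient are hypothetical, and the cohort sizes (123, 42, and 42) are the column totals of Table 1 rather than a rule stated in the text.

```python
from datetime import date, timedelta


def chronological_split(records, sizes, key):
    """Sort records by date and cut them into consecutive cohorts.

    `sizes` gives the number of cases per cohort (hypothetical helper;
    the exact rule used to reach a 3:1:1 ratio is not stated in the paper).
    """
    if sum(sizes) != len(records):
        raise ValueError("cohort sizes must add up to the number of records")
    ordered = sorted(records, key=key)
    cohorts, start = [], 0
    for size in sizes:
        cohorts.append(ordered[start:start + size])
        start += size
    return cohorts


# Synthetic example: 207 cases with made-up dates, deliberately unsorted.
records = [
    {"id": i, "date": date(2017, 2, 1) + timedelta(days=i)}
    for i in reversed(range(207))
]
training, validation_1, validation_2 = chronological_split(
    records, sizes=(123, 42, 42), key=lambda r: r["date"]
)
```

Splitting by time rather than at random means the validation cohorts contain only cases that are later than every training case, which mimics prospective use of a model.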
Hematological Test
Hematological parameters, including white blood cell (WBC) count, lymphocyte count, neutrophil count, monocyte count (MONO), red blood cell count, platelet count, hemoglobin, red blood cell distribution width (RDW), C-reactive protein, albumin (ALB), and globulin, were recorded within 2 weeks before the surgery. Based on the previously mentioned parameters, the hemoglobin/RDW ratio, NLR, lymphocyte-to-monocyte ratio (LMR), platelet-to-lymphocyte ratio, C-reactive protein-to-albumin ratio (CAR), and platelet-to-monocyte ratio (PMR) were calculated.
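The derived ratios listed above are simple quotients of the recorded counts. A minimal sketch is given below; the class and field names, the units in the comments, and the key "HRR" for the hemoglobin/RDW ratio are illustrative assumptions, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BloodPanel:
    # Units are assumptions for illustration only.
    neutrophils: float  # x10^9/L
    lymphocytes: float  # x10^9/L
    monocytes: float    # x10^9/L
    platelets: float    # x10^9/L
    hemoglobin: float   # g/L
    rdw: float          # %
    crp: float          # mg/L
    albumin: float      # g/L


def derived_ratios(panel):
    """Return the six ratios named in the text as a dictionary."""
    return {
        "HRR": panel.hemoglobin / panel.rdw,          # hemoglobin/RDW ratio
        "NLR": panel.neutrophils / panel.lymphocytes,
        "LMR": panel.lymphocytes / panel.monocytes,
        "PLR": panel.platelets / panel.lymphocytes,
        "CAR": panel.crp / panel.albumin,
        "PMR": panel.platelets / panel.monocytes,
    }


# Made-up values, not patient data.
example = BloodPanel(neutrophils=3.0, lymphocytes=1.5, monocytes=0.5,
                     platelets=200.0, hemoglobin=130.0, rdw=13.0,
                     crp=5.0, albumin=40.0)
ratios = derived_ratios(example)
```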
Computed Tomography Examination
Chest CT images were acquired using 16- or 64-row multidetector spiral CT (VCT 64 or Discovery HD 750, GE Healthcare; iCT 256 or Ingenuity Flex 16, Philips Healthcare; or uCT780, United Imaging, China). The CT scan parameters were as follows: tube voltage, 120 kV; tube current, automatic; rotation time, 0.7 seconds; and matrix, 512 × 512. The CT images were scanned at 5-mm section thickness and reconstructed with a 1.25-mm section thickness. The flow diagram of our study is shown in Figure 2.
FIGURE 2
Workflow of key steps in our study. Polygonal regions of interest on the axial CT section are manually drawn. Hematological features are collected. Computed tomography characteristics and texture parameters are extracted from the defined tumor regions of CT images. The logistic regression and machine learning algorithms are used to construct predictive models. The prediction models are established incorporating CT characteristics, texture parameters, and hematological parameters. The performance of the multivariate models is evaluated using the ROC curve. NEU, neutrophil count; LYM, lymphocyte count; RBC, red blood cell; PLT, platelet count; CRP, C-reactive protein; LASSO, least absolute shrinkage and selection operator. Figure 2 can be viewed online in color at www.jcat.org.
Imaging Analysis
Computed Tomography Morphological Characteristics
Readers 1 and 2 (both with 5 years of experience in chest CT diagnosis) evaluated each lesion on the CT images together. Disagreements were resolved by consensus after consultation. All the CT images were reviewed using the lung and mediastinal window settings in the image processing software. Computed tomography morphological characteristics included: (1) border; (2) attenuation; (3) lobulation; (4) spiculation; (5) calcification; (6) vascular convergence sign; (7) air bronchogram sign; (8) vacuole sign; (9) nodule/mass type; (10) adjacent pleural thickening; (11) pleural indentation; (12) obstructive pulmonary emphysema; (13) peripheral fibrosis; (14) pleural effusion; (15) single enlarged LN; (16) multiple enlarged LNs; and (17) calcified LNs.
Quantitative CT Value Parameters
Quantitative CT values were measured by reader 1 on the maximal section of each lesion, avoiding calcification, vacuoles, cavities, and bronchial shadows. The mean, maximum, and minimum CT attenuations in the nonenhanced phase were recorded as CTmean, CTmax, and CTmin, respectively. The corresponding standard deviation (SD) value was recorded as SD1. In addition, the long and short diameters of the lesions were measured and recorded. To determine interobserver reproducibility, reader 2 repeated the same procedure.
Computed Tomography Texture Parameters
Polygonal regions of interest in the nonenhanced CT images were manually drawn by reader 1 along the margin of the lesion on the largest cross-section, avoiding calcification, vacuoles, cavities, and bronchial shadows. Texture parameters were as follows: (1) the first-order features included the mean, SD2, max frequency, mode, minimum, maximum, cumulative percentiles (5th, 10th, 25th, 50th, 75th, and 90th percentiles), skewness, kurtosis, entropy, and histogram width; (2) the second-order features were from the gray-level co-occurrence matrix (GLCM) and included entropy GLCM, energy GLCM, inertia GLCM, and variance GLCM. To confirm interobserver reproducibility, reader 2 repeated the same procedure.
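To make the second-order features concrete, the sketch below builds a symmetric, normalized GLCM for one pixel offset and derives entropy, energy, inertia, and variance from it using common textbook definitions. These definitions, the offset, the gray-level quantization, and the function name are assumptions; the software used in the study and the meaning of the suffixes 10 to 13 are not specified in the text.

```python
import math
from collections import Counter


def glcm_features(image, dx=1, dy=0):
    """GLCM features for a 2D list of non-negative integer gray levels.

    Pixel pairs are taken at offset (dx, dy) and counted in both
    directions, so the matrix is symmetric.
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1
    total = sum(counts.values())
    # Joint probabilities p(i, j); absent pairs have probability zero.
    p = {pair: n / total for pair, n in counts.items()}
    mean_i = sum(i * v for (i, _), v in p.items())
    return {
        "entropy": -sum(v * math.log2(v) for v in p.values()),
        "energy": sum(v * v for v in p.values()),
        "inertia": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        "variance": sum((i - mean_i) ** 2 * v for (i, _), v in p.items()),
    }


# Toy 2 x 2 patch with two gray levels (not CT data).
features = glcm_features([[0, 0], [1, 1]])
```

Under these definitions, a homogeneous region concentrates the joint probabilities in a few cells, which gives low entropy and high energy, while a heterogeneous region spreads them out and does the opposite.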
Development, Performance, and Testing of Multivariate Models
First, in the training cohort, variables with significant differences (P < 0.05) in the univariate analysis were entered into a multivariate binomial logistic regression. The Hosmer-Lemeshow test was used to measure the goodness of fit. The resulting multivariate model was then applied to the 2 validation cohorts.
Next, variables that reached significance (P < 0.05) in the univariate analysis of the training cohort underwent dimension reduction using least absolute shrinkage and selection operator analysis. The retained features were input into our in-house software programmed using the Python Scikit-learn package (Python version 3.8, Scikit-learn version 0.22.2, http://scikit-learn.org/). Three machine learning classifiers, support vector machine (SVM), Naive Bayes (NB), and random forest (RF), were used to generate multivariate models. The ratio of cases in the training and validation cohorts was 3:1:1. In the training phase, the Synthetic Minority Oversampling Technique, a widely used data preprocessing method in machine learning, was applied to address the class imbalance problem. The models were evaluated by repeated stratified K-fold cross-validation (K = 5). Multivariate models based on the machine learning classifiers were applied to the 2 validation cohorts. The performance of the multivariate models was evaluated using a receiver operating characteristic (ROC) curve, and the area under the curve (AUC), diagnostic sensitivity, specificity, and accuracy were determined. Furthermore, to evaluate the clinical usefulness of the multivariate model, a decision curve analysis was performed by calculating the net benefits for a range of threshold probabilities in the 2 validation cohorts.
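The evaluation metrics named above can be illustrated with a short, dependency-free sketch. It is not the authors' code: the function names are hypothetical, the AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), and a score at or above the cutoff is assumed to be called positive.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    # A tie between a positive and a negative case counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at_cutoff(labels, scores, cutoff):
    """Sensitivity, specificity, and accuracy when score >= cutoff is positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in pairs if y == 1 and s < cutoff)
    tn = sum(1 for y, s in pairs if y == 0 and s < cutoff)
    fp = sum(1 for y, s in pairs if y == 0 and s >= cutoff)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy data: 1 = LN metastasis, 0 = no metastasis (made-up scores).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
metrics = metrics_at_cutoff(labels, scores, cutoff=0.35)
```

For a parameter whose values are lower in metastatic cases, the scores would need to be negated (or the comparison reversed) before applying these functions.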
Statistical Analysis
Statistical analyses were performed using SPSS (version 22.0, Microsoft Windows x64; SPSS) and MedCalc Statistical Software (version 11.4.2.0, MedCalc Software bvba; http://www.medcalc.org; 2011), and a 2-tailed P value less than 0.05 was defined as statistically significant. The χ2 test or Fisher exact test (n < 5) was used for categorical variables. Continuous variables, including hematological parameters, CT value parameters, texture parameters, and the long and short diameters of the lesion, were tested for their normality using the Shapiro-Wilk test, and accordingly, the Mann-Whitney U test was used for nonnormally distributed variables. The interobserver agreement of CT values and texture parameters was estimated using the intraclass correlation coefficient (ICC; 0.000–0.200: poor; 0.201–0.400: fair; 0.401–0.600: moderate; 0.601–0.800: good; and 0.801–1.000: excellent).
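The ICC interpretation scale quoted above maps directly onto a small lookup, sketched here with a hypothetical function name. Values are rounded to 3 decimals first, since the published bands are stated to that precision.

```python
def icc_category(icc):
    """Map an intraclass correlation coefficient to the bands used in the text."""
    value = round(icc, 3)
    if not 0.0 <= value <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1 on this scale")
    if value <= 0.200:
        return "poor"
    if value <= 0.400:
        return "fair"
    if value <= 0.600:
        return "moderate"
    if value <= 0.800:
        return "good"
    return "excellent"
```

For example, an ICC of 0.891 falls in the "excellent" band and an ICC of 0.254 in the "fair" band.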
RESULTS
Patient Characteristics
Among the 207 lung adenocarcinoma cases, 50 (24.2%) had LN metastasis, while 157 (75.8%) did not. As shown in Table 1, a statistically significant difference was found in sex between patients with and without LN metastasis in the training cohort (P < 0.05). No significant differences in age were found in the training cohort (P > 0.05). No significant differences in sex or age were found between the 2 validation cohorts (all P > 0.05).
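As a worked example of the categorical comparison, the sketch below applies an uncorrected Pearson χ2 test to the training-cohort sex counts from Table 1 (female: 60 N−, 16 N+; male: 28 N−, 19 N+). Whether the authors applied a continuity correction is not stated, so the uncorrected form is an assumption; with it, the result is approximately χ2 = 5.35 and P = 0.021, in line with the P value reported in Table 1.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    With 1 degree of freedom, the upper-tail probability of a chi-square
    value x equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: female, male. Columns: N-, N+ (training cohort, Table 1).
chi2, p_value = chi_square_2x2(60, 16, 28, 19)
```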
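TABLE 1
Demographic and CT Morphological Characteristics of Patients With Lung Adenocarcinomas

| Variable | Training Cohort N− | Training Cohort N+ | Training Cohort P | Validation Cohort One N− | Validation Cohort One N+ | Validation Cohort One P | Validation Cohort Two N− | Validation Cohort Two N+ | Validation Cohort Two P |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Age, y | | | 0.626 | | | 0.987 | | | 0.761 |
| <60 | 44 | 14 | | 18 | 4 | | 13 | 3 | |
| ≥60 | 44 | 21 | | 17 | 3 | | 21 | 5 | |
| Sex | | | 0.021* | | | 0.413 | | | 0.433 |
| Female | 60 | 16 | | 13 | 4 | | 20 | 3 | |
| Male | 28 | 19 | | 22 | 3 | | 14 | 5 | |
| Pleural indentation | | | 0.048* | | | 0.244 | | | 1.000 |
| Absent | 28 | 5 | | 10 | 1 | | 6 | 1 | |
| Present | 60 | 30 | | 25 | 6 | | 28 | 7 | |
| Pleural thickening | | | 0.003* | | | 0.631 | | | 0.237 |
| Absent | 69 | 18 | | 27 | 4 | | 31 | 6 | |
| Present | 19 | 17 | | 8 | 3 | | 3 | 2 | |
| Air bronchogram | | | 0.016* | | | 0.353 | | | 0.238 |
| Absent | 25 | 18 | | 7 | 2 | | 15 | 6 | |
| Present | 63 | 17 | | 28 | 5 | | 19 | 2 | |
| Attenuation | | | 0.003* | | | 0.654 | | | 0.173 |
| GGO | 11 | 1 | | 4 | 0 | | 2 | 0 | |
| Part solid | 30 | 4 | | 15 | 1 | | 15 | 1 | |
| Solid | 47 | 30 | | 16 | 6 | | 17 | 7 | |

*P < 0.05 was considered statistically significant.
GGO indicates ground-glass opacity.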
Univariate Analyses
Among the CT qualitative parameters, attenuation, pleural indentation, pleural thickening, and air bronchogram were significantly different between patients with and without LN metastasis in the training cohort (all P < 0.05; Table 1, Fig. 3).
FIGURE 3
Typical morphological features of lung adenocarcinomas on CT images. A, Ground-glass nodule of the right upper lobe without solid component (white arrow). B, Part solid nodule of the right lower lobe (white arrow). C, Solid mass of the right lower lobe with 2 linear pleural tags (black arrows) and spiculation sign (white arrow). D, Spiculated homogeneous solid mass of right lower lobe with vacuoles (white arrow). E, Irregular mass of the left upper lobe with large cavitation (white arrow). F, Irregular mass of the right upper lobe with lobulated margins and air bronchiolograms (white arrow).
Among the CT quantitative parameters, there were significant differences in the mean CT attenuation, minimum CT attenuation, SD1, long diameter, and short diameter between patients with and without LN metastasis in the training cohort (all P < 0.05).
Among the texture parameters, 24 of 35 were significantly different between patients with and without LN metastasis in the training cohort (all P < 0.05; Table 2). Among the hematological parameters, MONO, RDW, NLR, LMR, and PMR differed significantly between patients with and without LN metastasis in the training cohort (Table 3).
TABLE 2
Univariate Analysis of Quantitative CT and Texture Parameters in the Training Cohort

| Variable | N− | N+ | P |
| --- | --- | --- | --- |
| CTmean, HU | −16.85 (−202.40 to 30.88) | 31.01 (8.76 to 49.51) | <0.001* |
| CTmin, HU | −811.00 (−1024.00 to −198.25) | −163.00 (−330.00 to −100.00) | <0.001* |
| SD1 | 185.73 (64.83 to 273.49) | 68.35 (47.69 to 96.16) | <0.001* |
| Long diameter, cm | 2.26 (1.60 to 3.00) | 2.76 (2.00 to 3.75) | 0.018* |
| Short diameter, cm | 1.60 (1.30 to 2.28) | 2.49 (1.50 to 3.03) | 0.003* |
| Mean, HU | 20.89 (−146.96 to 37.49) | 35.09 (28.10 to 57.51) | 0.002* |
| SD2 | 86.77 (62.36 to 192.88) | 60.15 (51.06 to 79.43) | <0.001* |
| Max frequency | 5.00 (3.00 to 8.00) | 11.00 (6.00 to 16.00) | <0.001* |
| Mode, HU | 7.5 (−175.75 to 44.50) | 37.00 (9.00 to 64.00) | 0.010* |
| Min, HU | −306.50 (−724.50 to −164.25) | −177.00 (−256.00 to −125.00) | 0.001* |
| Percentile 5th, HU | −115.00 (−516.50 to −57.50) | −56.00 (−98.00 to −37.00) | <0.001* |
| Percentile 10th, HU | −72.00 (−433.25 to −36.25) | −33.00 (−66.00 to −19.00) | 0.001* |
| Percentile 25th, HU | −18.50 (−317.25 to 0.75) | −1.00 (−19.00 to 18.00) | 0.002* |
| Percentile 50th, HU | 23.50 (−115.25 to 40.00) | 34.00 (27.00 to 59.00) | 0.008* |
| Area, mm² | 154.02 (84.78 to 260.24) | 318.39 (168.56 to 580.39) | <0.001* |
| Max diameter, mm | 18.26 (13.51 to 26.00) | 25.27 (16.82 to 35.22) | 0.004* |
| Histogram width, HU | 217.00 (148.00 to 495.50) | 153.00 (122.00 to 215.00) | <0.001* |
| Entropy GLCM 10 | 7.42 (5.55 to 8.23) | 8.63 (7.63 to 9.26) | <0.001* |
| Entropy GLCM 11 | 7.60 (5.90 to 8.29) | 8.62 (7.77 to 9.33) | <0.001* |
| Entropy GLCM 12 | 7.46 (5.80 to 8.24) | 8.57 (7.69 to 9.18) | <0.001* |
| Entropy GLCM 13 | 7.61 (6.03 to 8.42) | 8.72 (7.88 to 9.45) | <0.001* |
| Energy GLCM 10† | 6.33 (3.54 to 21.68) | 2.87 (1.81 to 5.19) | <0.001* |
| Energy GLCM 11† | 5.51 (3.48 to 17.47) | 2.91 (1.82 to 4.68) | <0.001* |
| Energy GLCM 12† | 6.00 (3.56 to 18.25) | 3.05 (1.93 to 5.10) | <0.001* |
| Energy GLCM 13† | 5.41 (3.21 to 15.95) | 2.60 (1.61 to 4.66) | <0.001* |
| Inertia GLCM 10 | 270.55 (204.91 to 515.96) | 198.34 (150.62 to 336.28) | 0.013* |
| Variance GLCM 10 | 150.90 (110.97 to 237.46) | 124.04 (87.15 to 165.90) | 0.008* |
| Variance GLCM 11 | 167.42 (111.73 to 237.04) | 123.72 (87.04 to 200.13) | 0.027* |
| Variance GLCM 13 | 170.53 (118.31 to 260.85) | 132.15 (95.72 to 195.09) | 0.030* |

The data are presented as median (1st quartile to 3rd quartile).
*P < 0.05 was considered statistically significant.
†×10⁻³.
SD1, standard deviation in quantitative CT parameters; SD2, standard deviation in texture parameters.
TABLE 3
Univariate Analysis of Hematological Parameters in the Training Cohort

| Variable | N− | N+ | P |
| --- | --- | --- | --- |
| MONO count, ×10⁹/L | 0.30 (0.30 to 0.48) | 0.40 (0.30 to 0.60) | 0.009* |
| RDW, % | 12.85 (12.50 to 13.30) | 13.20 (12.80 to 13.80) | 0.020* |
| NLR | 1.63 (1.25 to 2.19) | 1.94 (1.43 to 2.79) | 0.042* |
| LMR | 5.00 (4.04 to 6.92) | 3.83 (2.60 to 6.00) | 0.008* |
| PMR | 533.33 (436.25 to 789.72) | 434.00 (263.75 to 650.00) | 0.008* |

The data are presented as median (1st quartile to 3rd quartile).
*P < 0.05 was considered statistically significant.
Among the quantitative CT parameters in the training cohort, CTmean had the highest AUC (0.719), with a sensitivity of 80.0% and a specificity of 61.4%. Among the texture parameters, max frequency showed good ability to predict LN metastasis, with an AUC of 0.736, a sensitivity of 60.0%, and a specificity of 77.3% in the training cohort (Table 4). Among the 5 selected hematological parameters, PMR had the highest AUC (0.852), with a sensitivity of 57.1% and a specificity of 72.7% in the training cohort (Table 5, Fig. 4).
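The accuracy values reported alongside sensitivity and specificity follow from the group sizes. As a small consistency sketch (the function name is hypothetical), the training cohort contains 35 patients with and 88 without LN metastasis according to Table 1, so the CTmean sensitivity of 80.0% and specificity of 61.4% imply an accuracy of about 66.7%, which matches the value listed for CTmean in Table 4.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy implied by sensitivity, specificity, and group sizes."""
    true_positives = sensitivity * n_pos
    true_negatives = specificity * n_neg
    return (true_positives + true_negatives) / (n_pos + n_neg)


# CTmean in the training cohort: 35 N+ and 88 N- patients.
ctmean_accuracy = accuracy_from_rates(0.800, 0.614, n_pos=35, n_neg=88)
```

Because patients without metastasis outnumber those with metastasis, accuracy is weighted toward specificity, so a parameter with high sensitivity but low specificity can still show modest accuracy.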
TABLE 4
The Diagnostic Performance of Quantitative CT and Texture Parameters in the Training Cohort

| Variable | Cutoff | Sensitivity | Specificity | AUC | Accuracy | P |
| --- | --- | --- | --- | --- | --- | --- |
| CTmean, HU | 2.90 | 80.0% | 61.4% | 0.719 | 66.7% | <0.001* |
| CTmin, HU | −344.00 | 77.1% | 67.0% | 0.716 | 69.9% | <0.001* |
| SD1 | 130.80 | 82.9% | 61.4% | 0.717 | 67.5% | <0.001* |
| Long diameter, cm | 2.32 | 71.4% | 53.4% | 0.638 | 58.5% | 0.018* |
| Short diameter, cm | 2.40 | 51.4% | 81.8% | 0.674 | 73.1% | 0.003* |
| Mean, HU | 22.08 | 85.7% | 51.1% | 0.676 | 60.9% | 0.002* |
| SD2 | 115.83 | 94.3% | 43.2% | 0.716 | 57.7% | <0.001* |
| Max frequency | 8.00 | 60.0% | 77.3% | 0.736 | 72.4% | <0.001* |
| Mode, HU | 23.00 | 68.6% | 61.4% | 0.649 | 63.4% | 0.010* |
| Min, HU | −257.00 | 77.1% | 58.0% | 0.699 | 63.4% | 0.001* |
| Percentile 5th, HU | −163.00 | 91.4% | 45.5% | 0.709 | 58.6% | <0.001* |
| Percentile 10th, HU | −151.00 | 94.3% | 40.9% | 0.698 | 56.1% | 0.001* |
| Percentile 25th, HU | −51.00 | 91.4% | 39.8% | 0.675 | 54.5% | 0.002* |
| Percentile 50th, HU | 24.00 | 85.7% | 51.1% | 0.654 | 60.9% | 0.008* |
| Area, mm² | 260.54 | 60.0% | 76.1% | 0.702 | 71.5% | <0.001* |
| Max diameter, mm | 20.42 | 71.4% | 58.0% | 0.667 | 61.8% | 0.004* |
| Histogram width, HU | 246.00 | 91.4% | 46.6% | 0.706 | 59.3% | <0.001* |
| Entropy GLCM 10 | 8.56 | 57.1% | 84.1% | 0.721 | 76.4% | <0.001* |
| Entropy GLCM 11 | 8.54 | 57.1% | 84.1% | 0.722 | 76.4% | <0.001* |
| Entropy GLCM 12 | 8.29 | 60.0% | 79.5% | 0.723 | 74.0% | <0.001* |
| Entropy GLCM 13 | 8.56 | 60.0% | 83.0% | 0.723 | 76.5% | <0.001* |
| Energy GLCM 10† | 3.20 | 60.0% | 80.7% | 0.714 | 74.8% | <0.001* |
| Energy GLCM 11† | 3.00 | 60.0% | 83.0% | 0.718 | 76.5% | <0.001* |
| Energy GLCM 12† | 5.30 | 82.9% | 56.8% | 0.722 | 64.2% | <0.001* |
| Energy GLCM 13† | 2.80 | 60.0% | 83.0% | 0.720 | 76.5% | <0.001* |
| Inertia GLCM 10 | 198.33 | 51.4% | 78.4% | 0.644 | 70.7% | 0.013* |
| Variance GLCM 10 | 96.83 | 40.0% | 88.6% | 0.654 | 74.8% | 0.008* |
| Variance GLCM 11 | 104.89 | 40.0% | 85.2% | 0.628 | 72.3% | 0.027* |
| Variance GLCM 13 | 107.43 | 40.0% | 81.8% | 0.625 | 69.9% | 0.030* |

*P < 0.05 was considered statistically significant.
†×10⁻³.
SD1 indicates standard deviation in quantitative CT parameters; SD2, standard deviation in texture parameters.
TABLE 5
The Diagnostic Performance of Hematological Parameters in the Training Cohort

| Variable | Cutoff | Sensitivity | Specificity | AUC | Accuracy | P |
| --- | --- | --- | --- | --- | --- | --- |
| MONO count, ×10⁹/L | 0.30 | 65.7% | 54.5% | 0.648 | 57.7% | 0.009* |
| RDW, % | 13.30 | 48.6% | 78.4% | 0.634 | 69.9% | 0.020* |
| NLR | 2.52 | 42.9% | 84.1% | 0.618 | 72.4% | 0.042* |
| LMR | 4.00 | 57.1% | 75.0% | 0.653 | 69.9% | 0.008* |
| PMR | 457.50 | 57.1% | 72.7% | 0.852 | 68.3% | 0.008* |

*P < 0.05 was considered statistically significant.
FIGURE 4
The histogram shows hematological parameters in different LN status. * P < 0.05 was considered statistically significant. CAR, C-reactive protein-to-albumin ratio; PLR, platelet-to-lymphocyte ratio.
Multivariate Analyses
Variables with significant differences (P < 0.05) in the univariate analysis were entered into a binary logistic regression analysis in the training cohort. Pleural thickening (P = 0.013), percentile 25th (P = 0.033), entropy GLCM 10 (P = 0.019), RDW (P = 0.012), and LMR (P = 0.049) were independent risk factors associated with LN metastasis (Table 6). These 5 independent risk factors were used to establish the predictive model. In the ROC analysis, the AUC of the predictive model was 0.929 (Fig. 5), which was higher than the AUC of any single parameter. The model was tested in the 2 validation cohorts, where the AUCs were 0.886 and 0.871, respectively (Table 7, Supplementary Table 1, http://links.lww.com/RCT/A136). Decision curve analysis results for the multivariate models in the 2 validation cohorts are plotted in Figure 6.
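The net benefit plotted in a decision curve can be sketched with the standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The function names and the toy data below are illustrative assumptions and do not reproduce the study's curves.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold."""
    n = len(labels)
    pairs = list(zip(labels, probabilities))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: every patient is assumed to have LN metastasis."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data: 1 = LN metastasis, 0 = none (made-up predicted probabilities).
labels = [1, 1, 0, 0, 0]
probabilities = [0.9, 0.6, 0.7, 0.2, 0.1]
model_nb = net_benefit(labels, probabilities, threshold=0.5)
treat_all_nb = net_benefit_treat_all(labels, threshold=0.5)
```

The "treat none" strategy always has a net benefit of zero, so a model is useful at a given threshold only if its net benefit exceeds both zero and the "treat all" value.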
TABLE 6
Binomial Logistic Regression Results for Prediction of LN Metastasis in Lung Adenocarcinomas

| Variable | B | SE | Wald | df | P |
| --- | --- | --- | --- | --- | --- |
| Pleural thickening | 2.234 | 0.900 | 6.165 | 1 | 0.013* |
| Percentile 25th, HU | 0.275 | 0.129 | 4.533 | 1 | 0.033* |
| Entropy GLCM 10 | −16.428 | 7.024 | 5.470 | 1 | 0.019* |
| RDW, % | 1.278 | 0.506 | 6.375 | 1 | 0.012* |
| LMR | −0.580 | 0.294 | 3.889 | 1 | 0.049* |
| Predictive model | −7.659 | 10.685 | 0.514 | 1 | 0.473 |

*P < 0.05 was considered statistically significant.
B indicates the estimated value of the regression coefficient given by the statistical software; df, degree of freedom; SE, standard error.
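A fitted logistic regression turns the B coefficients into a predicted probability through the logistic function. The sketch below shows only that general mechanism with made-up numbers. It deliberately does not plug in the Table 6 values, because the coding and scaling of the input variables are not described, and the reading of the "Predictive model" row as the constant term is an assumption.

```python
import math


def predicted_probability(intercept, coefficients, features):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(B_k * x_k))))."""
    linear_predictor = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients and feature values, for illustration only.
toy_coefficients = {"feature_a": 2.0, "feature_b": -0.5}
toy_features = {"feature_a": 1.0, "feature_b": 2.0}
probability = predicted_probability(-1.0, toy_coefficients, toy_features)
```

A positive coefficient raises the predicted probability as the variable increases and a negative coefficient lowers it, and it is this predicted probability that serves as the score in the ROC analysis.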
FIGURE 5
Receiver operating characteristic analysis to predict LN metastasis in lung adenocarcinomas. The values of AUCs for pleural thickening, percentile 25th, entropy GLCM 10, RDW, LMR, and predictive model were 0.635, 0.675, 0.721, 0.634, 0.653, and 0.929, respectively. The predictive model performed better in predicting LN metastasis than the univariate parameters. PRE, prediction probability. Figure 5 can be viewed online in color at www.jcat.org.
TABLE 7
The Diagnostic Performance of the Models in the Training and 2 Validation Cohorts

| Cohort | Logistic Regression AUC | SVM AUC | Naive Bayes AUC | Random Forest AUC |
| --- | --- | --- | --- | --- |
| Training cohort | 0.929 | 0.767 | 0.777 | 0.734 |
| Validation cohort 1 | 0.886 | 0.747 | 0.710 | 0.714 |
| Validation cohort 2 | 0.871 | 0.879 | 0.702 | 0.842 |
FIGURE 6
Decision curve analysis for the multivariate models based on regression analysis in validation cohort 1 (A) and validation cohort 2 (B). The y-axis indicates the net benefit, and the x-axis indicates threshold probability. Compared with the simple diagnoses such as all LN metastasis in patients with lung adenocarcinomas (blue lines) or all patients without LN metastasis (black lines), the multivariate models (red lines) had the highest net benefit across the majority of the range of reasonable threshold probabilities at which a patient would be diagnosed as LN metastasis. Figure 6 can be viewed online in color at www.jcat.org.
Machine Learning Algorithm
Table 7 lists the AUCs of the 3 models based on machine learning algorithms. In the training cohort, the highest AUC (0.777) was obtained using the NB algorithm (Supplementary Table 2, http://links.lww.com/RCT/A136).
Interobserver Agreement
Among the 41 continuous CT variables, interobserver agreement was good for 4 (ICC, 0.643–0.796) and excellent for 29 (ICC, 0.803–0.982; Table 8).
TABLE 8
Interobserver Agreement of Quantitative CT and Texture Parameters

| Variable | ICC (95% CI) | Variable | ICC (95% CI) |
| --- | --- | --- | --- |
| CTmean | 0.891 (0.859–0.916) | CTmax | 0.254 (0.122–0.377) |
| CTmin | 0.625 (0.534–0.701) | SD1 | 0.697 (0.619–0.761) |
| Long diameter | 0.889 (0.856–0.914) | Short diameter | 0.889 (0.857–0.915) |
| Mean | 0.952 (0.937–0.963) | Histogram width | 0.851 (0.809–0.885) |
| SD2 | 0.847 (0.804–0.882) | Entropy GLCM 10 | 0.979 (0.972–0.984) |
| Max frequency | 0.939 (0.921–0.954) | Entropy GLCM 11 | 0.980 (0.973–0.985) |
| Mode | 0.796 (0.740–0.841) | Entropy GLCM 12 | 0.981 (0.975–0.985) |
| Minimum | 0.803 (0.749–0.847) | Entropy GLCM 13 | 0.979 (0.973–0.984) |
| Maximum | 0.836 (0.790–0.873) | Energy GLCM 10 | 0.901 (0.871–0.924) |
| Percentile 5th | 0.875 (0.838–0.903) | Energy GLCM 11 | 0.924 (0.902–0.942) |
| Percentile 10th | 0.890 (0.858–0.915) | Energy GLCM 12 | 0.897 (0.867–0.921) |
| Percentile 25th | 0.927 (0.905–0.944) | Energy GLCM 13 | 0.820 (0.770–0.860) |
| Percentile 50th | 0.965 (0.955–0.973) | Inertia GLCM 10 | 0.263 (0.132–0.385) |
| Percentile 75th | 0.982 (0.976–0.986) | Inertia GLCM 11 | 0.643 (0.556–0.717) |
| Percentile 90th | 0.970 (0.961–0.977) | Inertia GLCM 12 | 0.288 (0.158–0.408) |
| Skewness | 0.537 (0.433–0.628) | Inertia GLCM 13 | 0.651 (0.565–0.723) |
| Kurtosis | 0.547 (0.443–0.635) | Variance GLCM 10 | 0.342 (0.215–0.456) |
| Entropy | 0.899 (0.870–0.923) | Variance GLCM 11 | 0.973 (0.965–0.980) |
| Area | 0.975 (0.968–0.981) | Variance GLCM 12 | 0.412 (0.293–0.519) |
| Max diameter | 0.937 (0.918–0.952) | Variance GLCM 13 | 0.848 (0.805–0.883) |
| SsD low | 0.851 (0.808–0.984) | | |

SD1 indicates standard deviation in CT value quantitative parameters; SD2, standard deviation in texture parameters.
DISCUSSION
In this study, 207 lung adenocarcinoma cases were divided into a training cohort and 2 validation cohorts. Qualitative CT, quantitative CT, texture, and hematological parameters were analyzed to predict LN metastasis. Parameters with significant differences (P < 0.05) in the univariate analysis were chosen as input parameters for the binary logistic regression analysis and the machine learning algorithms, and the prediction models were established. The AUC values of the binary logistic regression models were 0.929, 0.886, and 0.871 in the 3 cohorts, respectively. The highest AUC value of the machine learning algorithm models was 0.777 in the training cohort, obtained with the NB algorithm. In the 2 validation cohorts, the highest AUC values of the machine learning algorithm models were 0.747 and 0.879, respectively, both obtained with SVM.
Among the qualitative CT parameters, pleural indentation, pleural thickening, attenuation, and air bronchogram were significantly different in the training cohort. Malignant lesions close to the pleura tend to cause pleural thickening and indentation.[13,14] Malignant tumors are prone to LN metastasis. The risk of LN metastasis is greater in lung adenocarcinomas that present as solid lesions. This is probably because the blood supply to ground-glass opacity lesions is not as rich as that to solid lesions.[15] Among patients with lung adenocarcinoma, LN metastasis was found in 21.3% of those with an air bronchogram and in 41.9% of those without. Hattori et al[16] confirmed the significance of the presence of an air bronchogram in lung adenocarcinoma as a predictor of the absence of LN metastasis. However, Li et al[13] reported that tumors with an air bronchogram were more common in the LN-positive metastasis group than in the LN-negative metastasis group. Further validation with a larger sample size is needed to confirm these results.
Five quantitative CT value parameters were found to be statistically significant in the training cohort. The mean CT attenuation and minimum CT attenuation were higher in the LN-positive metastasis group. This might be because lesions with more solid components have an abundant blood supply.[15] The long diameter and short diameter were also greater in the LN-positive metastasis group. The larger the tumor, the higher the risk of LN metastasis.[17]
Among the texture parameters, 24 were statistically significant; these mainly comprised the percentile, second-order entropy, and second-order energy series. The lower percentiles (5th–25th) reflect the ground-glass component.[18] The higher the value, the smaller the ground-glass component. Lesions with fewer ground-glass components are more likely to develop LN metastasis.[19] This was consistent with our results of CT morphological assessment. In this study, the values of the second-order entropy GLCM 10-13 were higher in the LN-positive metastasis group than in the LN-negative metastasis group. Entropy quantifies the heterogeneity of the tumor CT values.[20,21] The greater the heterogeneity, the more malignant the tumor, and hence the higher the risk of LN metastasis. In contrast, the values of the second-order energy GLCM 10-13 were lower in the LN-positive metastasis group than in the LN-negative metastasis group. The energy features indicate the uniformity of gray-level voxel pairs.[22] The more uniform the tumor, the lower its degree of malignancy and the lower the associated risk of LN metastasis.
Recently, it has become common practice to add clinical information to radiological studies. In this study, we combined hematological factors with radiological parameters to predict LN metastasis. The results showed significant differences in the values of MONO, RDW, NLR, LMR, and PMR between patients with different LN statuses in the training cohort. The correlation between hematological factors and tumors needs further confirmation. Wang et al[23] reported that a decreased LMR is associated with a worse prognosis, because lymphocytes and monocytes play important roles in the initiation and development of cancers. Our results showed that the LMR values were significantly lower in the LN-positive metastasis group. Thus, our findings are consistent with theirs.
Statistically significant parameters, including 5 CT morphological characteristics, 5 CT value quantitative parameters, 24 texture parameters, and 5 hematological parameters, were subjected to binary logistic regression analysis in the training cohort. The results demonstrated that pleural thickening, percentile 25th, entropy GLCM 10, RDW, and LMR were independent risk factors associated with LN metastasis, and these were used to establish a predictive model. Decision curve analysis indicated that the multivariate models based on regression analysis were useful for predicting LN metastasis in lung adenocarcinomas, yielding a net clinical benefit across most threshold probabilities.
The AUC of the predictive model was 0.929, which was higher than those reported in previous studies[18,24] and may indicate overfitting. Therefore, we also established models using machine learning algorithms to predict LN metastasis. Before model building, least absolute shrinkage and selection operator analysis was used for dimension reduction. The highest AUC value of the machine learning algorithm models was 0.777 in the training cohort, obtained with the NB algorithm. Machine learning models generally require larger sample sizes, and their performance tends to improve as the sample size grows. Therefore, further studies with larger sample sizes should be performed. In addition, we analyzed the consistency of the included parameters. Interobserver agreement was good for 4 parameters (0.643–0.796) and excellent for 29 (0.803–0.982).
However, our study had some limitations. First, the sample size was relatively small, and this was a single-center study without external validation. Thus, larger samples and multicenter cooperation are necessary to validate these findings. Second, our study was retrospective in design, and patient inclusion bias was inevitable. Third, we did not evaluate the interobserver consistency of the CT morphological characteristics. Finally, texture analysis was performed on 2-dimensional images by selecting only the cross-section with the maximum tumor area. A single slice contains limited information and may not reflect the features of the entire tumor.
CONCLUSIONS
Multivariate models incorporating CT morphological characteristics, CT value quantitative parameters, texture, and hematological parameters using logistic regression and machine learning algorithms could predict LN metastasis in lung adenocarcinomas. These findings may provide a reference for clinical decision making.