Tianqi Zhang1,2, Xiuling Li1, Jianhua Liu2. 1. College of Applied Mathematics, Jilin University of Finance and Economics, Changchun, China. 2. Department of Radiology, the Second Hospital of Jilin University, Changchun, China.
Abstract
BACKGROUND: Pure ground-glass nodules (pGGNs) have been considered inert tumors because of their biological behavior; however, their prognosis is not completely consistent because of differences in their internal pathological components. The aim of this study was to explore whether radiomics can be used to identify the invasiveness of pGGNs. METHODS: This retrospective study received the relevant ethical approval. After postoperative pathological confirmation, sixty-five patients with lung adenocarcinoma pGGNs (≤30 mm) seen from January 2015 to October 2018 were enrolled. All cases were randomly divided into training and test groups in a 7:3 ratio. In total, 385 radiomics features were obtained from HRCT images, and least absolute shrinkage and selection operator (LASSO) logistic regression was then applied to the training group to obtain the optimal features for distinguishing the degree of invasion. The diagnostic efficiency of the radiomics model was estimated by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and verified in the test group. RESULTS: The optimal features ("GLCMEntropy_angle135_offset1" and "Sphericity") were selected by LASSO regression to develop the proposed radiomics model. This prediction model exhibited good differentiation between pre-invasive and invasive lesions. The AUC for the test group was 0.824 (95% CI: 0.599-1.000), indicating that the radiomics model has some predictive ability. CONCLUSION: HRCT radiomics features can discriminate pre-invasive from invasive lung adenocarcinoma pGGNs. This non-invasive method can provide more information for surgeons before operation and can also predict the prognosis of patients to some extent.
The mortality and morbidity of lung cancer remain high.[1-3] Recent studies have shown that the incidence of lung adenocarcinoma now exceeds that of other lung cancer types.[4,5] As morbidity due to lung adenocarcinoma rises rapidly, the number of deaths worldwide is increasing year by year,[6,7] and the disease has consequently attracted the attention of many researchers. In China, some studies have confirmed that the incidence of lung adenocarcinoma in different regions is as high as 43%–46%, about twice that of squamous cell carcinoma.[8,9] In 2011, IASLC/ATS/ERS re-classified lung adenocarcinoma according to its invasive component: atypical adenomatous hyperplasia (AAH) and adenocarcinoma in situ (AIS) are pre-invasive lesions, whereas minimally invasive adenocarcinoma (MIA) and invasive adenocarcinoma (IAC) are considered invasive lesions. Some early lung adenocarcinomas appear on CT as ground-glass nodules (GGNs) <30 mm.[10-12] Depending on whether solid components are present, GGNs can be categorized into pure ground-glass nodules (pGGNs) and mixed ground-glass nodules (mGGNs). Some researchers suggest that pGGNs do not have an invasive component; however, many studies have disproved this.[13-15] Lung adenocarcinoma can be regarded as a continuous process, from AAH to IAC.
Therefore, it is crucial to judge the pathological type of pGGNs accurately, so that surgeons can treat lesions in time and avoid a poor prognosis. However, the diagnosis of AIS and MIA requires complete surgical excision of the lesion, which means that needle biopsy cannot accurately determine invasiveness before the operation. It is therefore a challenge for clinicians to make appropriate decisions. With the innovation and popularity of high-resolution computed tomography (HRCT), the detection of pGGNs has shown an upward trend. However, the diagnostic ability of radiologists varies with years of experience and educational background.[17,18] A more accurate and objective method is therefore urgently needed to distinguish pre-invasive from invasive lesions and to assist clinical decisions.
Chest CT has become a routine diagnostic method for patients suspected of having lung cancer. Through simple and convenient imaging examination, clinicians can detect lesions earlier and provide timely treatment to reduce mortality.[19,20] Compared with ordinary chest CT, HRCT better displays the anatomical structure of the chest, especially the fine boundaries of blood vessels and bronchi. HRCT can therefore provide more detail of the interior and edge of nodules.[21-24]
In recent years, radiologists have focused on radiomics because texture analysis can extract more information than the naked eye. Medical images can be analyzed to obtain high-dimensional digital information and quantitative characteristics related to disease (Figure 1).[25,26] Some studies have confirmed that radiomics is an effective method for assessing the invasion, metastasis, and prognosis of lung adenocarcinoma, and an important aid to clinical decision-making.[27,28]
Figure 1.
Process of radiomics analysis.
This study focuses on using HRCT radiomics features to differentiate the degree of invasion of pGGNs. If the pathological type of pGGNs can be accurately determined by non-invasive means before the operation, clinicians can prepare individualized treatment plans for lung adenocarcinoma patients. For elderly and infirm patients, long-term follow-up rather than surgery can significantly improve quality of life. Invasive lesions usually require lobectomy and mediastinal lymph node dissection, so the extent of surgical damage is larger. In summary, radiologists hope to use radiomics to judge the invasiveness of pGGNs preoperatively and so avoid both delayed treatment and overtreatment.
Methods
Patients
This retrospective study was approved by the ethics department, which exempted patients from informed consent (No. 66, 2019). pGGNs were defined as a single, roughly round, non-solid, hazy opacity in the pulmonary parenchyma measuring ≤ 30 mm. Two radiologists with > 11 years of experience in chest diagnosis reviewed each lesion at its longest axial diameter and reconfirmed it as a pure ground-glass opacity. This study was conducted in accordance with the STARD 2015 guidelines.
Inclusion Criteria
(a) pGGNs on HRCT between January 2015 and October 2018, with a longest axial diameter of less than 30 mm; (b) no history of malignant tumors and no antineoplastic therapy before the HRCT scan; (c) histopathological confirmation of lung adenocarcinoma in the official pathological report after surgery; (d) complete clinical history and image data available in the HIS (Hospital Information System); (e) HRCT examination performed within 3 weeks before the operation.
Exclusion Criteria
(a) large areas of infection, severe interstitial lung disease, or respiratory motion artifacts; (b) preoperative radiotherapy or chemotherapy. The identity information of all patients was hidden; only the CT images of the lesions were retained.
Histological Evaluation
All surgical specimens were fixed in aqueous formalin solution, dehydrated, and paraffin-embedded, and pathological sections were then prepared with H&E (hematoxylin and eosin) staining. The diagnosis was made by one pathologist and then checked by a second, senior pathologist. According to the 2011 pathological classification of lung adenocarcinoma, all patients were assigned to the pre-invasive (AAH/AIS) or invasive (MIA/IAC) group.
Radiomics Feature Extraction
HRCT chest scans were acquired on a 256-slice CT scanner (Brilliance, Philips, the Netherlands) with a tube voltage of 140 kVp, a tube current of 350 mAs, a slice thickness of 1 mm, a slice spacing of 1 mm, a pitch of 0.342, a matrix of 1024 × 1024, and a scan time of 1-3 seconds. All patients were scanned in the supine position during inspiration. Image post-processing conformed to the standards. Routine chest HRCT includes pulmonary window and mediastinal window images; the region of interest (ROI) was delineated manually on the pulmonary window images with ITK-SNAP software. Texture extraction was performed with the Artificial Intelligence Kit software (A.K. software; GE Healthcare, China): the manually delineated ROI was imported into the A.K. software, which automatically analyzed the image data and extracted the features. A total of 385 radiomics parameters were obtained from the ROI, comprising first-order histogram features (n = 42), shape-based features (n = 9), and high-order textural features (n = 334).
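To make the shape-based and textural feature families more concrete, the following minimal sketch computes one example of each: sphericity and the entropy of a gray-level co-occurrence matrix (GLCM). It uses the textbook definitions, not the A.K. software's implementation, which the source does not describe. The function names, the symmetric counting of pixel pairs, the base-2 logarithm, and the reading of "angle 135, offset 1" as one pixel up and one pixel to the left are all assumptions made for illustration.

```python
import math
from collections import Counter


def sphericity(volume, surface_area):
    """Surface area of a sphere with the same volume, divided by the actual
    surface area: 1.0 for a perfect sphere, lower for irregular shapes."""
    return (math.pi ** (1 / 3)) * (6 * volume) ** (2 / 3) / surface_area


def glcm_entropy(image, d_row=-1, d_col=-1):
    """Shannon entropy (base 2) of the normalized, symmetric co-occurrence
    matrix for one pixel offset. The default (-1, -1) is an assumed reading
    of the 135-degree direction at offset 1."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1  # count the pair in both orders
                counts[(b, a)] += 1  # so the matrix is symmetric
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical toy inputs, not study data.
unit_cube = sphericity(volume=1.0, surface_area=6.0)
flat_patch = glcm_entropy([[5, 5, 5], [5, 5, 5], [5, 5, 5]])
print(unit_cube, flat_patch)
```

A perfectly uniform patch has zero GLCM entropy because every pixel pair is identical, while a cube scores below 1 on sphericity because it has more surface area than a sphere of the same volume.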
Construction of Radiomics Model
After all features had been extracted for the training group, the least absolute shrinkage and selection operator (LASSO) was used to remove redundant features. LASSO selects the most informative radiomics features from a large number of multicollinear image features. The optimal value of log(λ) was read from the curve of binomial deviance against log(λ), using the minimum criterion value and the standard error of that criterion. The LASSO binary logistic regression analysis was performed in R (3.5.3) and RStudio (1.2.1335).[31-33] Within the pre-invasive and invasive groups, the R software randomly divided all eligible patients in a 7:3 ratio: 70% of patients formed the training group, used to select suitable parameters and establish the radiomics model, and the remaining 30% formed the test group, used to verify the established model.
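The 7:3 division within each pathological group can be sketched as a stratified random split. The study performed this step in R; the Python version below is only an illustration. The function name, the seed, and the choice to round each group's training size down are assumptions, since the source does not state how fractional counts were handled.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), sampling each class separately so that
    both groups keep roughly the same class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        n_train = int(train_fraction * len(members))  # rounds down (assumption)
        train_idx.extend(members[:n_train])
        test_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Group sizes reported in the Results: 38 pre-invasive (0) and 27 invasive (1).
labels = [0] * 38 + [1] * 27
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting within each group, rather than over all 65 patients at once, keeps the proportion of invasive lesions similar in the training and test groups, which matters when the groups are this small.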
Statistical Analysis
Statistical analysis for feature extraction and model construction was carried out in RStudio. Logistic regression was fitted to the features retained after selection. The sensitivity, specificity, receiver operating characteristic (ROC) curve, and area under the curve (AUC) were calculated from the resulting model. A P-value ≤ 0.05 was considered statistically significant.
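The performance measures named above have simple definitions, sketched below in plain Python. This is not the R code used in the study. The AUC is computed from its rank interpretation, the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The scores, the 0.5 cut-off, and the coding of invasive lesions as the positive class (1) are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs in which the positive case has
    the higher score; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model outputs for three invasive and three pre-invasive cases.
invasive_scores = [0.9, 0.8, 0.4]
pre_invasive_scores = [0.7, 0.3, 0.2]
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1 if s >= 0.5 else 0 for s in invasive_scores + pre_invasive_scores]

print(auc(invasive_scores, pre_invasive_scores))
print(sensitivity_specificity(y_true, y_pred))
```

Sensitivity and specificity depend on the chosen cut-off, whereas the AUC summarizes discrimination over all possible cut-offs.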
Results
Patients and Histological Characteristics
We sifted through all HRCT examinations from January 2015 to October 2018 and selected patients according to the above criteria. In this study, pathological results were obtained from surgical excision rather than biopsy procedures. Finally, 65 pGGNs in 65 individuals constituted the study population, including 38 women and 27 men, with an age range from 37 to 74 years (mean age: 58.8 ± 8.4). The 65 patients were assigned to pre-invasive (7 AAH and 31 AIS) and invasive (20 MIA and 7 IAC) groups according to the pathological type.
Radiomics Feature Extraction Results
A total of 385 radiomics features were extracted from the ROIs in the training group, many of which were redundant. The LASSO regression model was used to select the best features for calculating the Rad-score of the radiomics model. To avoid overfitting during LASSO regression, 10-fold cross-validation was adopted to choose the best penalty parameter log(λ). As log(λ) increased, the number of remaining features gradually decreased (Figure 2), and only 3 features were retained after dimension reduction. After stepwise regression with 5 iterations, only 2 meaningful features (GLCMEntropy_angle135_offset1 and Sphericity) remained, and the A.K. software established the radiomics model from these 2 features. Table 1 presents the radiomics model parameters. The 2 features were validated in the training and test groups, respectively. The medians and interquartile ranges of the 2 texture and shape features used in the logistic regression were compared between pre-invasive and invasive lung adenocarcinomas. The median values of “GLCMEntropy_angle135_offset1” and “Sphericity” were significantly higher in pre-invasive than in invasive lesions (Figure 3), indicating that both features discriminated well between the pre-invasive and invasive groups and supporting the ability of the radiomics model to distinguish invasive from pre-invasive lesions.
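The shrinking feature count as the penalty grows follows from the soft-thresholding step at the core of LASSO solvers. The sketch below applies it to a set of hypothetical unpenalized coefficients; the values and function names are illustrative and are not taken from the study. For an orthonormal design in linear regression this operator gives the exact LASSO solution; for the logistic model used in the study it only illustrates the mechanism.

```python
def soft_threshold(value, lam):
    """Shrink a coefficient toward zero by lam; set it to exactly zero when
    its magnitude does not exceed lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def n_retained(coefficients, lam):
    """Number of coefficients that remain non-zero at penalty lam."""
    return sum(1 for b in coefficients if soft_threshold(b, lam) != 0.0)


# Hypothetical unpenalized coefficients for five candidate features.
unpenalized = [2.5, -1.2, 0.6, -0.3, 0.05]
for lam in (0.0, 0.1, 0.5, 1.0, 2.0):
    print(lam, n_retained(unpenalized, lam))
```

Because weak coefficients are set to exactly zero rather than merely made small, the penalty acts as a feature selector, and cross-validation is then used to choose how strong the penalty should be.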
Figure 2.
Selection of the optimal feature according to binomial deviance by the least absolute shrinkage and selection operator model.
Table 1.
Coefficients for the Radiomics Model (* P < 0.05, ** P < 0.01).
Characteristics              | Coefficients | Std. Error | Z Value | P-Value
Intercept                    | 18.455       | 6.307      | 2.93    | 0.0034 **
GLCMEntropy_angle135_offset1 | −0.734       | 0.252      | −2.91   | 0.0036 **
Sphericity                   | −14.758      | 6.930      | −2.13   | 0.0332 *
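As a hedged illustration of how the coefficients in Table 1 could be turned into a score, the sketch below treats the Rad-score as the linear predictor of a logistic regression and converts it to a probability with the logistic function. The source does not spell out this formula, and it does not state which group was coded as the positive class or whether the features were rescaled before fitting. The negative coefficients, together with the higher feature values reported for pre-invasive lesions, are consistent with the invasive group being the positive class, and that is assumed here. The input values are hypothetical.

```python
import math

# Coefficients copied from Table 1.
INTERCEPT = 18.455
COEF_ENTROPY = -0.734      # GLCMEntropy_angle135_offset1
COEF_SPHERICITY = -14.758  # Sphericity


def rad_score(glcm_entropy, sphericity):
    """Linear predictor of the logistic model (assumed form of the Rad-score)."""
    return INTERCEPT + COEF_ENTROPY * glcm_entropy + COEF_SPHERICITY * sphericity


def predicted_probability(score):
    """Logistic transform; read here as the probability of an invasive lesion
    (assumption: invasive coded as the positive class)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical feature values, not taken from the study.
score = rad_score(glcm_entropy=12.0, sphericity=0.6)
print(score, predicted_probability(score))
```

Under these assumptions, a rounder nodule (higher sphericity) receives a lower score and therefore a lower predicted probability of invasiveness, in line with the direction of effect reported in the text.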
Figure 3.
A, GLCMEntropy_angle135_offset1 and B, sphericity. The medians and interquartile ranges of each of the texture and shape features on high-resolution computed tomography.
Performance of the Radiomics Model
We verified the discriminative power of the radiomics model in the training group and the test group separately. The ROC curve of the training group is shown in Figure 4; the AUC was 0.880 (95% CI: 0.778–0.982), with a specificity of 0.808 and a sensitivity of 0.944. Because the model features were selected in the training group, it was not surprising that the model exhibited high predictive ability there, so it was necessary to re-evaluate its diagnostic ability in the test group. The AUC of the test group was 0.824 (95% CI: 0.599–1.000), with a specificity of 0.750 and a sensitivity of 0.889. Because the AUC of the test group was slightly lower than that of the training group, we checked the consistency of the 2 groups with DeLong’s test, which is used to compare 2 correlated ROC curves; a P-value of 0.7 indicated no significant difference between the 2 groups.
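A rough arithmetic check can relate the reported rates to whole numbers of patients. The group sizes below are not reported in the source. They are an assumption: a 7:3 split applied within each pathological group (38 pre-invasive, 27 invasive) and rounded down, which gives 26 and 18 cases for training and 12 and 9 for testing. The function names are illustrative.

```python
def implied_count(rate, n):
    """Nearest whole number of correctly classified cases for a reported rate."""
    return round(rate * n)


def is_consistent(rate, n, decimals=3):
    """True if implied_count / n reproduces the rate at the reported precision."""
    return abs(implied_count(rate, n) / n - rate) < 0.5 * 10 ** -decimals


# (group, measure, reported rate, assumed number of cases in the denominator)
reported = [
    ("training", "specificity", 0.808, 26),
    ("training", "sensitivity", 0.944, 18),
    ("test", "specificity", 0.750, 12),
    ("test", "sensitivity", 0.889, 9),
]
for group, measure, rate, n in reported:
    print(group, measure, implied_count(rate, n), n, is_consistent(rate, n))
```

Under these assumed group sizes, all four reported rates correspond to whole-number counts. With a test group this small, a single reclassified invasive case would move the sensitivity by about 0.11, which may help explain the wide confidence interval around the test AUC.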
Figure 4.
Receiver operating characteristic curves for the prediction of the radiomics model in the 2 groups. A, the AUC of the training group is 0.880, with specificity and sensitivity of 0.808 and 0.944, respectively. B, the AUC of the test group is 0.824, with specificity and sensitivity of 0.750 and 0.889, respectively. AUC, area under the curve.
Discussion
In 2011, IASLC/ATS/ERS published the latest edition of the pathological classification of pulmonary adenocarcinoma.[10] An ever-increasing number of radiologists began to pay attention to the relationship between imaging features and tumor molecular subtypes.[34,35] As an inert tumor, pulmonary adenocarcinoma has a very high postoperative survival rate.[36,37] However, many patients and doctors are still anxious about pGGNs and eager to know more about their surgical treatment. This has led to much unnecessary injury and treatment, bringing economic and psychological pressure to patients. According to the 2011 classification, pre-invasive lesions include AAH and AIS, which exhibit a lepidic pattern without invasion, whereas MIA mainly exhibits a lepidic pattern with minor invasion (≤0.5 cm). Therefore, MIA and IAC progress faster than pre-invasive lesions and require closer attention during follow-up. The advantage of paying close attention to invasive lesions is that enlarged lesions or the appearance of solid components, which require immediate intervention, can be detected, reducing the possibility of recurrence or lymph node metastasis after surgery. Conversely, elderly patients or individuals with poor lung function and pre-invasive lesions can choose long-term follow-up or sublobar resection.[36,38,39] Based on the above, it is necessary to study lung adenocarcinoma pGGNs comprehensively to distinguish invasive from pre-invasive lesions.
As equipment has been updated, chest CT resolution has gradually improved; however, the information radiologists can obtain with the naked eye is still limited. This study took HRCT as the research object: high-throughput features were extracted from the medical images by computer software, and a radiomics model was then established. In this way, image features could be transformed into high-dimensional information, establishing a non-invasive method to assist radiologists in judging the pathological types of tumors.
The basis of feature extraction is accurate segmentation of lesions in medical images, and a correct ROI ensures the reliability of a radiomics study.[40,41] In the present study, HRCT images were selected because their high resolution displays more lesion detail. The ROI was manually delineated by radiologists in the open-source software ITK-SNAP, and the results were imported into the A.K. software. After automatic calculation, we obtained 385 radiomic parameters based on Histogram, Form Factor, Run-length Matrix (RLM), Gray-level Co-occurrence Matrix (GLCM), and Gray-level Size Zone Matrix (GLSZM). These raw parameters contained much redundant information; therefore, further simplification was necessary to achieve meaningful results. After screening by LASSO regression, 2 meaningful features, “GLCMEntropy_angle135_offset1” and “Sphericity,” remained to construct the radiomics model. The specificity, sensitivity, accuracy (ACC), and AUC of the model in both the training and test groups showed good predictive ability, which supported the feasibility of a radiomics model in the study of lung adenocarcinoma pGGNs.
Sphericity is a parameter describing the shape of nodules. As shown in Figure 3, the sphericity of the pre-invasive group was significantly higher than that of the invasive group, reflecting the morphological differences between lesions of different malignant degrees. Some researchers suggest that lobulated and spiculated margins appear more frequently in invasive than in pre-invasive lesions, which may affect the sphericity of nodules.[15,43] Entropy, in turn, is a parameter describing the randomness of image intensities.
Some researchers suggest that entropy is a quantitative feature that reflects the heterogeneity of tumors and is associated with the invasiveness of malignant tumors.[44,45]
Several studies have shown that radiomics analysis can differentiate inert from invasive lesions. Hwang et al used texture analysis to study lung adenocarcinomas, reporting that features such as entropy and homogeneity reliably discriminated IAC from pre-invasive pGGNs ≥5 mm; they also analyzed lesions ≥10 mm and reached the same conclusion. She et al retrospectively studied 402 nodules and obtained 5 features, including entropy, through texture analysis. The radiomics model established from these features exhibited good prediction in both the primary (AUC = 0.95) and validation (AUC = 0.89) cohorts. Xue et al constructed a radiomics model based on quantitative and qualitative features, and the AUC of their model in the training and validation cohorts was 0.76 and 0.79, respectively. This result is slightly lower than the AUC of the training and test groups in this study (0.880 and 0.824, respectively). Shim et al analyzed 191 GGNs and reported that the 75th percentile CT attenuation value and entropy could be regarded as independent predictors of IAC. Despite the differences among studies, these results still demonstrate that radiomics can reasonably predict the invasiveness of lung adenocarcinoma pGGNs.
The present research had some limitations. First, the number of samples in this retrospective study was limited; there were only 7 cases each of AAH and IAC. The small sample size and potential selection bias may have affected the division into training and test groups, and the impact of this on the radiomics model needs to be further confirmed. Second, DICOM data obtained from one center and one CT scanner ensure the consistency of the data but limit the comprehensive evaluation of the model. Finally, manual delineation of the ROI by radiologists may be subjective. Therefore, we will continue to enlarge the sample size and collect multi-center data for further research.
Conclusion
In conclusion, the radiomics model may provide a non-invasive method for judging the invasiveness of lung adenocarcinoma pGGNs ≤ 30 mm. It may be able to assist clinicians in providing personalized treatment to patients with different pathological types in the future.