Development and Validation of Machine Learning Models to Predict Epidermal Growth Factor Receptor Mutation in Non-Small Cell Lung Cancer: A Multi-Center Retrospective Radiomics Study.

Yafeng Liu¹, Jiawei Zhou¹, Jing Wu^1,2, Wenyang Wang¹, Xueqin Wang¹, Jianqiang Guo¹, Qingsen Wang¹, Xin Zhang¹, Danting Li¹, Jun Xie³, Xuansheng Ding^1,4,5, Yingru Xing^1,6, Dong Hu^1,2,3.

Abstract

OBJECTIVE: To develop and validate a generalized prediction model that can classify epidermal growth factor receptor (EGFR) mutation status in non-small cell lung cancer patients.
METHODS: A total of 346 patients (296 in the training cohort and 50 in the validation cohort) from four centers were included in this retrospective study. First, 1085 features were extracted using IBEX from the computed tomography images. The features were screened using the intraclass correlation coefficient, hypothesis tests and least absolute shrinkage and selection operator. Logistic regression (LR), decision tree (DT), random forest (RF), and support vector machine (SVM) were used to build a radiomics model for classification. The models were evaluated using the following metrics: area under the curve (AUC), calibration curve (CAL), decision curve analysis (DCA), concordance index (C-index), and Brier score.
RESULTS: Sixteen features were selected, and models were built using LR, DT, RF, and SVM. In the training cohort, the AUCs were .723, .842, .995, and .883; in the validation cohort, the AUCs were .658, .567, .88, and .765. The RF model had the best AUC, and its CAL, C-index (training cohort=.998; validation cohort=.883), and Brier score (training cohort=.007; validation cohort=.137) showed satisfactory predictive accuracy; DCA indicated that the RF model had better clinical application value.
CONCLUSION: Machine learning models based on computed tomography images can be used to evaluate EGFR status in patients with non-small cell lung cancer, and the RF model outperformed LR, DT, and SVM.

Keywords: computed tomography; epidermal growth factor receptor; machine learning; non–small cell lung cancer; radiomics

Substances:
ErbB Receptors

Year: 2022 PMID: 35417660 PMCID: PMC9016531 DOI: 10.1177/10732748221092926

Source DB: PubMed Journal: Cancer Control ISSN: 1073-2748 Impact factor: 2.339

Introduction

Approximately 85% of lung cancers are non–small cell lung cancers (NSCLC), which have high recurrence rates and poor prognosis.[1,2] In the treatment of NSCLC, first-line chemotherapy regimens are only 30% effective, whereas the effectiveness of epidermal growth factor receptor-tyrosine kinase inhibitor (EGFR-TKI) therapy in patients with EGFR-sensitive mutations reaches 70%. The presence of EGFR-sensitive mutations is therefore a major predictor of the effectiveness of EGFR-targeted drugs. Tissue biopsy determines the EGFR gene status of NSCLC patients with high accuracy; however, it has limitations, such as difficulty in obtaining tissue samples and high cost.[6,7] With the rapid development of artificial intelligence and radiomics, high-throughput extraction of radiomics features from medical images can quantify the shape, intensity, and texture of tumors and thereby comprehensively characterize the tumor phenotype; noninvasive radiomics models have shown great potential in diagnosis, prognosis, and the prediction of genetic information.[10-12] In recent years, several studies have used positron emission tomography/computed tomography (PET/CT) or enhanced CT images to predict EGFR mutation status.[13-17] However, because population distribution, living area, economic conditions, and the equipment of medical institutions differ between studies, results obtained in economically developed regions may not apply to the region where this research team is located. Therefore, in this retrospective study, we collected imaging data from four centers involving populations with different demographic characteristics. By applying machine learning to radiomics, we aimed to construct a model with strong generalizability to predict EGFR mutations in patients with NSCLC and to provide a reference for clinical practice.

Data and Methods

Patient Imaging and Clinical Data

NSCLC radiogenomics data were obtained from the Cancer Imaging Archive portal and included 211 patients. Among these, 129 patients had wild-type EGFR, 43 had EGFR mutations, and 39 had unknown EGFR status. We included all patients who underwent chest CT scans and had known EGFR mutation status; the 39 patients with unknown EGFR status and two patients in whom IBEX generated errors during feature extraction were excluded, leading to a total of 168 patients to be included in the study. Supplementary Data 1 (S1) contains information regarding the scanning parameters. The personal information of patients in the medical materials was anonymized. This study was conducted in accordance with the STROBE guidelines. In addition, we collected clinical and imaging data of patients with primary NSCLC between January 2016 and December 2020 at the Cancer Hospital of Anhui University of Science and Technology, the Eastern Hospital of Anhui University of Science and Technology, and the Huainan Chaoyang Hospital of Anhui University of Science and Technology, using the following inclusion criteria: (1) pathologically proven NSCLC, (2) EGFR gene status testing performed on biopsy tissues, and (3) CT scans performed within 2 weeks before treatment. The exclusion criteria were as follows: (1) radiotherapy, chemotherapy, concurrent radiotherapy, or traditional Chinese medicine treatment received before CT imaging and (2) incomplete image information. In compliance with these conditions, 86 patients from the Cancer Hospital of Anhui University of Science and Technology, 50 from the Eastern Hospital of Anhui University of Science and Technology, and 41 from the Huainan Chaoyang Hospital of Anhui University of Science and Technology were included.
To improve the generalization ability of the model constructed from the heterogeneous and complex dataset, 296 patients from the NSCLC radiogenomics data, the Cancer Hospital of Anhui University of Science and Technology, and the Huainan Chaoyang Hospital of Anhui University of Science and Technology were used as the training cohort, and 50 patients from the Eastern Hospital of Anhui University of Science and Technology were used as the validation cohort. This retrospective study was conducted in accordance with the principles of the Helsinki Declaration. The Ethics Committee of Anhui University of Science and Technology (approval no. L2022001) conducted an ethical review of the three medical institutions involved (Cancer Hospital of Anhui University of Science and Technology, Eastern Hospital of Anhui University of Science and Technology, and Huainan Chaoyang Hospital of Anhui University of Science and Technology). Oral consent was obtained, and data were anonymized before the study was conducted. The research flow is illustrated in Figure 1.

Figure 1.

The overall framework of data analysis and model integration.

Image Segmentation, Image Pre-processing, and Feature Extraction

The collected CT images were uploaded to IBEX in Digital Imaging and Communications in Medicine (DICOM) format, and regions of interest (ROIs) were manually outlined layer-by-layer by two experienced cardiothoracic imaging physicians (with 8 and 10 years of working experience, respectively) who were blinded to the EGFR test results (lung window: 1500 HU, −500 HU; mediastinal window: 300 HU, 30 HU). After delineation was completed, the images were preprocessed in IBEX using voxel size resampling, bit depth range rescaling, and a log filter to achieve image-scale uniformity, correction of grayscale inhomogeneity, and image denoising. Five types of radiomics features were extracted from the ROIs: (1) intensity histogram (n = 49); (2) shape (n = 18); (3) texture-based features, including gray level co-occurrence matrix (n = 840) and gray level run length matrix (n = 33) features; (4) grayscale intensity (n = 135); and (5) neighborhood intensity difference (n = 10). Supplementary Data 2 (S2) lists the kinds of features extracted from the 3D images.

Radiomics Feature Selection

Feature selection is important for improving model generalization and optimizing the model. First, the two physicians independently performed ROI delineation and feature extraction on all data, and the features extracted by the two physicians were subjected to the intraclass correlation coefficient (ICC) test to select stable and repeatable features (ICC < .5, poor reliability; .5 < ICC < .75, medium reliability; .75 < ICC < .9, good reliability; and ICC > .9, excellent reliability). Second, features with ICC > .75 were standardized using the Z-score method. Third, the Shapiro–Wilk test (P > .05) and Bartlett's test (P > .05) were used to test the normality and homogeneity of variance of the features with ICC > .75. An independent-samples t-test (P < .05) was used for features that met the assumptions of normality and homogeneity of variance, and the Mann–Whitney U test (P < .05) was used for the remaining features. Finally, to avoid overfitting or selection bias, LASSO regression with 10-fold cross-validation was used to select the radiomics features for model construction.
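The inter-reader stability screen described above can be illustrated with a minimal sketch. The paper does not state which ICC form was used; the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) is an assumption here, as are the function names, the toy feature values, and the handling of values that fall exactly on a category boundary.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per patient and one column per reader.
    The choice of this ICC form is an assumption, not taken from the paper.
    """
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of readers
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares of the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    return (ms_rows - ms_err) / denom if denom else float("nan")


def reliability_label(icc):
    """Map an ICC to the reliability categories quoted in the text.

    Boundary handling (exactly .5, .75, .9) is an assumption.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "medium"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical values of one feature measured by two readers on three patients.
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]  # reader 2 is always 1 unit higher

print(icc_2_1(identical), reliability_label(icc_2_1(identical)))
print(icc_2_1(offset), reliability_label(icc_2_1(offset)))
```

Under this form a constant offset between readers lowers the ICC even though the readers rank patients identically, so a feature such as the second toy example would fall below the ICC > .75 cut-off.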

Machine Learning Model Construction and External Validation

After screening the core radiomics features, four widely used machine learning classifiers (logistic regression (LR), decision tree (DT), random forest (RF), and radial basis function support vector machine (SVM)) were applied to construct radiomics models, which were built in the training cohort and evaluated in the validation cohort. An exhaustive grid search was applied to identify the hyperparameter values that optimized model prediction performance. Supplementary Data 3 (S3) shows the hyperparameter settings of the different machine learning classifiers. The area under the curve (AUC), calibration curve (CAL), decision curve analysis (DCA), concordance index (C-index), and Brier score were used to estimate the discrimination, calibration, and clinical applicability of the models constructed using the different classifiers. The C-index ranges from .5 to 1: a C-index of .5 reflects complete inconsistency, meaning that the model has no predictive value, and a C-index of 1 reflects complete consistency. The Brier score was used to measure the overall performance of the model: a Brier score of 0 indicates perfect overall performance, with predicted and actual values in perfect agreement, whereas a Brier score >.25 indicates that the model has no value.
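The two summary metrics defined above can be written out in a few lines. The sketch below uses hypothetical labels and predicted probabilities, and the function names are illustrative. For a binary outcome the pairwise C-index shown here coincides with the AUC; the paper reports slightly different AUC and C-index values, so the authors' computation may have differed (for example, through resampling), which is not stated in the text.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def c_index_binary(y_true, y_prob):
    """Share of (mutant, wild-type) pairs in which the mutant case gets the
    higher predicted probability; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                concordant += 1.0
            elif p_pos == p_neg:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical data: 1 = EGFR mutant, 0 = wild type.
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

print(c_index_binary(y, p))   # 8 of 9 pairs are concordant
print(brier_score(y, p))

# A model that always predicts 0.5 carries no information:
uninformative = [0.5] * len(y)
print(c_index_binary(y, uninformative))  # 0.5
print(brier_score(y, uninformative))     # 0.25
```

The last two lines show where the reference values in the text come from: a constant prediction of .5 yields a C-index of .5 and a Brier score of .25.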

Statistical Analysis

All statistical analyses were performed using Empower Stats (version 2.2) and R software (version 4.0.5). Quantitative data are described as the mean ± standard deviation (SD), and qualitative data are described as frequencies (percentages). The "glmnet" package was used to implement the LASSO. CAL, DCA, C-index, and Brier score were used to evaluate the performance of the machine learning classifier models. Differences between the AUC values of the models were compared using the DeLong test. Statistical significance was set at P < .05.

Results

Clinical Data Analysis

The patients were divided into training and validation cohorts (Table 1). The training cohort consisted of 296 patients (184 men and 112 women; mean age, 66.82 ± 11.49 years; range, 24–89 years) from three centers. Of these, 117 (39.53%) had EGFR mutations and 179 (60.47%) had wild-type EGFR; 253 (85.47%) had adenocarcinoma, 39 (13.18%) had squamous cell carcinoma, and 4 (1.35%) had other types of cancer (3 large cell carcinomas and 1 pulmonary sarcomatoid carcinoma). There were 202 (68.24%) smokers and 94 (31.76%) non-smokers. The validation cohort included 50 patients (21 men and 29 women; mean age, 66.56 ± 9.44 years; range, 43–85 years). Of these, 28 (56.00%) had EGFR mutations and 22 (44.00%) had wild-type EGFR; 32 (64.00%) had adenocarcinoma and 18 (36.00%) had squamous cell carcinoma. There were 23 (46.00%) smokers and 27 (54.00%) non-smokers.

Table 1.

Patients in the Training and Validation Cohorts.

	Training				Validation	P-value
Characteristic	n = 169^a	n = 86^b	n = 41^c	total = 296	n = 50^d
Age (y, mean ± SD)	67.65 ± 10.33	64.65 ± 11.65	67.93 ± 14.89	66.82 ± 11.49	66.56 ± 9.44	.675
EGFR status						.029
Wild type	126 (74.56%)	34 (39.53%)	19 (46.34%)	179 (60.47%)	22 (44.00%)
Mutant	43 (25.44%)	52 (60.47%)	22 (53.66%)	117 (39.53%)	28 (56.00%)
Sex						.007
Female	62 (36.69%)	31 (36.05%)	19 (46.34%)	112 (37.84%)	29 (58.00%)
Male	107 (63.31%)	55 (63.95%)	22 (53.66%)	184 (62.16%)	21 (42.00%)
Smoking status						.002
Never smoker	40 (23.67%)	33 (38.37%)	21 (51.22%)	94 (31.76%)	27 (54.00%)
Smoker	129 (76.33%)	53 (61.63%)	20 (48.78%)	202 (68.24%)	23 (46.00%)
TYPE						<.001
Luad	149 (88.17%)	70 (81.40%)	34 (82.93%)	253 (85.47%)	32 (64.00%)
Lusc	17 (10.06%)	15 (17.44%)	7 (17.07%)	39 (13.18%)	18 (36.00%)
Other	3 (1.78%)	1 (1.16%)	0 (.00%)	4 (1.35%)	0 (.00%)

Note: Luad, Lung adenocarcinoma.

Lusc, lung squamous cell carcinoma.

Other, 3 Large cell carcinoma and 1 pulmonary sarcomatoid carcinoma.

aThe Cancer Imaging Archive.

bCancer Hospital of Anhui University of Science and Technology.

cHuainan Chaoyang Hospital of Anhui University of Science and Technology.

dEastern Hospital of Anhui University of Science and Technology.

There were no significant differences in age between the training and validation cohorts. However, there were significant differences in EGFR mutation rates, sex, smoking status, and tumor type (Table 1).

Feature Extraction and Selection

A total of 1085 radiomics features were successfully extracted from each patient’s ROI. First, 376 features with an ICC value < .75 were eliminated (Figure 2A). Second, 191 features were eliminated following hypothesis testing. Finally, the remaining 518 features were analyzed using 10-fold cross-validated LASSO regression and a standard error rule (Figures 2B and 2C). Sixteen core features were screened based on optimal λ = .03202 and standard error = .05841 (Table 2).
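The text reports an optimal λ and a standard error but does not say whether the minimum-error λ or the one-standard-error λ was retained; `cv.glmnet` in the "glmnet" package returns both (`lambda.min` and `lambda.1se`). The sketch below shows how a one-standard-error rule picks a penalty from a cross-validation curve. All numbers are hypothetical, and the function name is illustrative rather than part of any library.

```python
def one_se_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validation curve.

    lambda_min minimizes the mean cross-validated error; lambda_1se is the
    largest (most penalized) lambda whose error is within one standard
    error of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return lambdas[i_min], max(eligible)


# Hypothetical 10-fold cross-validation results (not the study's values).
lambdas = [0.1, 0.05, 0.03, 0.01, 0.005]
cv_mean = [1.30, 1.22, 1.18, 1.15, 1.17]   # mean deviance per lambda
cv_se = [0.05, 0.05, 0.05, 0.05, 0.05]     # standard error per lambda

lam_min, lam_1se = one_se_lambda(lambdas, cv_mean, cv_se)
print(lam_min, lam_1se)  # 0.01 0.03
```

A larger λ shrinks more coefficients to zero, so the one-standard-error choice trades a small loss in cross-validated error for a sparser feature set.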

Figure 2.

Selection of radiomics features. (A): ICC histogram of radiomics features; (B/C): LASSO method for screening of radiomics features.

Table 2.

Texture Features Selection for Radiomics Models.

Parameters	Parameter category	Importance
Mean absolute deviation	Intensity histogram	−.065652535
60 Percentile area	Intensity histogram	−.027231004
Convex	Shape	.737731264
Correlation	Gray level cooccurence matrix 3	−.364074713
Dissimilarity	Gray level cooccurence matrix 3	.338776007
5-1 Homogeneity 2	Gray level cooccurence matrix 3	−.176750873
10-4 Homogeneity 2	Gray level cooccurence matrix 3	−.048826405
-333-7 Information measure corr 1	Gray level cooccurence matrix 3	−.053598886
8-1 Information measure corr 1	Gray level cooccurence matrix 3	−.217087549
9-7 Information measure corr 1	Gray level cooccurence matrix 3	.095533464
12-4 Inverse diff norm	Gray level cooccurence matrix 3	−.013040947
6-4 Inverse variance	Gray level cooccurence matrix 3	.322764285
8-4 Inverse variance	Gray level cooccurence matrix 3	−.092512731
8-1 Max Probability	Gray level cooccurence matrix 3	−.024734566
12-7 Max Probability	Gray level cooccurence matrix 3	−.102971831
−333 Run length nonuniformity	Gray level Run length matrix 25	−.002072575

Radiomics Model Performance

According to the 16 screened radiomics features, the LR, DT, RF, and SVM classifiers were used to construct the model in the training cohort and validated in the validation cohort. The specific performances of the four classifier prediction models are shown in Figure 3 and Table 3.

Figure 3.

Building and performance of four machine learning classifier models. Receiver operating characteristic curves (3A), calibration curves (3B), and decision curves (3C) of the different classifiers and models generated from the development (training) cohort; receiver operating characteristic curves (3D), calibration curves (3E), and decision curves (3F) of the different classifiers and models generated from the validation cohort.

Table 3.

Performance of the Radiomics Signature.

	AUC	Accuracy	Sensitivity	Specificity	C-index	Brier score
Training
LR	.723	.69	.53	.794	.725	.203
DT	.842	.794	.701	.855	.837	.153
RF	.995	.99	.992	.989	.998	.007
SVM	.883	.845	.838	.85	.905	.119
Validation
LR	.658	.68	.75	.591	.664	.235
DT	.567	.46	.358	.591	.605	.244
RF	.88	.96	.965	.955	.883	.137
SVM	.765	.84	.715	.99	.773	.162

In the training cohort, Figure 3A shows that the RF classifier performed best (AUC=.995; 95% confidence interval [CI], .98–.996; sensitivity, 99.2%; specificity, 98.9%; accuracy, 99%). The remaining three classifiers performed as follows: LR, AUC=.723; DT, AUC=.842; SVM, AUC=.883. The calibration curves (Figure 3B) show excellent agreement between the predicted and actual values for the four machine learning classifiers. DCA (Figure 3C) indicated that the four machine learning classifiers provided more net benefit than the treat-all or treat-none strategies. In the validation cohort, Figure 3D shows that the RF classifier performed better (AUC=.88; 95% CI, .75–.946; sensitivity, 96.5%; specificity, 95.5%; accuracy, 96%) than the other three classifiers (LR: AUC=.658, DT: AUC=.567, SVM: AUC=.765). The calibration curve (Figure 3E) shows that the predicted values of the RF classifier are closer to the 45° reference line, indicating better calibration of the RF model. DCA (Figure 3F) also indicated that the RF classifier could achieve more clinical net benefit at almost all threshold probabilities. The C-index of the RF model (training: RF=.998; validation: RF=.883) was higher than that of the other models (training: LR=.725, DT=.855, SVM=.905; validation: LR=.664, DT=.605, SVM=.773) in both the training and validation cohorts (Table 3).
The Brier score of the RF model (training: RF=.007; validation: RF=.137) was lower than that of the other models (training: LR=.203, DT=.153, SVM=.119; validation: LR=.235, DT=.244, SVM=.162) in both the training and validation cohorts (Table 3). The DeLong test showed that the AUC of the RF classifier in the training cohort was not significantly different from that of the RF classifier in the validation cohort (P > .05). However, it differed significantly (P < .001) from the AUCs of the LR, DT, and SVM classifiers in the validation cohort (Table 4).
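To make the validation-cohort figures for the RF model concrete, the sketch below relates sensitivity, specificity, accuracy, and decision-curve net benefit to a confusion matrix. The validation cohort contained 28 mutant and 22 wild-type patients (Table 1); the individual cell counts used here (27 true positives, 1 false negative, 21 true negatives, 1 false positive) are an assumption chosen because they come close to the values reported in Table 3, not counts given in the paper. The net-benefit formula is the standard one used in decision curve analysis, and the threshold probability of .5 is likewise illustrative.

```python
def metrics_from_counts(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    n = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / n,
    }


def net_benefit(tp, fp, n, threshold):
    """Net benefit of acting on model-positive patients at a threshold probability."""
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of treating every patient as positive."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical counts for the RF model in the validation cohort (assumption).
tp, fn, tn, fp = 27, 1, 21, 1
n = tp + fn + tn + fp

m = metrics_from_counts(tp, fn, tn, fp)
print({name: round(value, 3) for name, value in m.items()})

threshold = 0.5  # illustrative threshold probability
nb_model = net_benefit(tp, fp, n, threshold)
nb_all = net_benefit_treat_all((tp + fn) / n, threshold)
nb_none = 0.0    # treating nobody yields zero net benefit by definition
print(round(nb_model, 3), round(nb_all, 3), nb_none)
```

A decision curve repeats this comparison over a range of threshold probabilities; a model is considered useful where its net benefit exceeds both the treat-all and treat-none lines.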

Table 4.

Delong Test of Machine Learning Classifier Model.

Training- AUC	Validation- AUC	P-value
RF	RF	>.05
RF	LR	<.001
RF	DT	<.001
RF	SVM	<.001

Discussion

This study aimed to construct a predictive model with strong generalizability. We hope that this radiomics model can be used to determine the EGFR status of patients with NSCLC and provide a reference for guiding personalized targeted therapies. We obtained 16 radiomic features with accurate prediction ability, including intensity histogram (n = 2), shape (n = 1), and GLCM (n = 13) features. These features describe the intensity distribution, the spatial relationships between different intensity levels, the shape of texture patterns, and tumor heterogeneity. The intensity histogram reflects the gray level frequency distribution within the ROI; it relies on single-voxel values rather than on interactions between adjacent voxels and can be obtained from the voxel intensity histogram. Morphological features describe tumor characteristics computed from the ROI and provide information on the size of the lesion tissue. The correlation of some features with EGFR mutations has been confirmed in other radiomics studies on the prediction of EGFR status.[24,25] Different machine learning algorithms have their own advantages and disadvantages. Currently, the most common machine learning methods are LR, SVM, RF, and DT. In this study, the performance of the radiomics models was evaluated using these four classifiers, and the RF classifier, which showed the highest diagnostic performance and good calibration and stability in the validation cohort, was selected. In a similar study, Yang et al applied an RF classifier to construct a model for predicting EGFR mutation status in patients with lung adenocarcinoma based on CT radiomics features; the AUC of the training cohort was .826, while that of the validation cohort was .779; however, this was only a single-center study.
Velazquez et al used CT radiomics features combined with clinical variables to predict EGFR mutations, with an AUC of .75; however, their study lacked external validation, which limited its clinical applicability. Histological examination, the gold standard for EGFR detection, may provide additional support in clinical practice. However, if the puncture position is unavailable or the patient's basic condition is poor, multiple aspiration biopsies are required. Imaging examinations can provide a reference regarding EGFR gene status while also helping to understand tumorigenesis and progression. Similarly, in patients with multiple tumors, radiography is beneficial for selecting the most suspicious tumor for biopsy. Thus, when histopathological examination is difficult, radiomics may play a useful role in clinical practice. In this study, data from three centers were mixed to construct a training cohort, and core radiomics features that reflected EGFR status were screened. The model was verified in a validation cohort and the results were stable, which reflects the generalizability of the model to a certain extent. However, the limited dataset cannot include all information reflecting EGFR status; therefore, the test results may not fully reflect the generalizability of the model. Future research will focus on verifying the generalizability of this model. In addition, this study has the following limitations. (1) The radiomics analysis used a retrospective study design, which differs from the prospective prediction needed in clinical practice; further validation in prospective studies is therefore needed. (2) CT imaging protocols differ between hospitals, and radiomics features are influenced by CT scanner parameters (e.g., reconstruction kernel or slice thickness). Although resampling and pre-processing were performed to limit these differences, undiscovered differences may still exist. (3) For ROI outlining, manual and automatic approaches each offer unique advantages, and the differences between the two approaches in image alignment and contour generation may affect the calculation of radiomic features.

Conclusion

Among the four machine learning models compared, the RF model showed satisfactory performance in predicting the EGFR status of NSCLC. However, these results are preliminary and need to be validated using prospective datasets to assess their potential clinical applications.

27 in total

1. Quantitative radiomics: impact of stochastic effects on textural feature analysis implies the need for standards.

Authors: Matthew J Nyflot; Fei Yang; Darrin Byrd; Stephen R Bowen; George A Sandison; Paul E Kinahan
Journal: J Med Imaging (Bellingham) Date: 2015-08-05

Review 2. New treatment options for lung adenocarcinoma--in view of molecular background.

Authors: Nora Bittner; Gyula Ostoros; Lajos Géczi
Journal: Pathol Oncol Res Date: 2013-12-05 Impact factor: 3.201

3. Cancer treatment and survivorship statistics, 2019.

Authors: Kimberly D Miller; Leticia Nogueira; Angela B Mariotto; Julia H Rowland; K Robin Yabroff; Catherine M Alfano; Ahmedin Jemal; Joan L Kramer; Rebecca L Siegel
Journal: CA Cancer J Clin Date: 2019-06-11 Impact factor: 508.702

4. Cancer statistics, 2020.

Authors: Rebecca L Siegel; Kimberly D Miller; Ahmedin Jemal
Journal: CA Cancer J Clin Date: 2020-01-08 Impact factor: 508.702

Review 5. Intratumor heterogeneity: evolution through space and time.

Authors: Charles Swanton
Journal: Cancer Res Date: 2012-09-20 Impact factor: 12.701

6. Predicting EGFR mutation status in lung adenocarcinoma on computed tomography image using deep learning.

Authors: Shuo Wang; Jingyun Shi; Zhaoxiang Ye; Di Dong; Dongdong Yu; Mu Zhou; Ying Liu; Olivier Gevaert; Kun Wang; Yongbei Zhu; Hongyu Zhou; Zhenyu Liu; Jie Tian
Journal: Eur Respir J Date: 2019-03-28 Impact factor: 16.671

7. Color image segmentation using adaptive hierarchical-histogram thresholding.

Authors: Min Li; Lei Wang; Shaobo Deng; Chunhua Zhou
Journal: PLoS One Date: 2020-01-10 Impact factor: 3.240

8. Radiomics: Images Are More than Pictures, They Are Data.

Authors: Robert J Gillies; Paul E Kinahan; Hedvig Hricak
Journal: Radiology Date: 2015-11-18 Impact factor: 11.105

Review 9. Imaging-Based Prediction of Molecular Therapy Targets in NSCLC by Radiogenomics and AI Approaches: A Systematic Review.

Authors: Gaia Ninatti; Margarita Kirienko; Emanuele Neri; Martina Sollini; Arturo Chiti
Journal: Diagnostics (Basel) Date: 2020-05-30

10. Radiomics Signature as a Predictive Factor for EGFR Mutations in Advanced Lung Adenocarcinoma.

Authors: Duo Hong; Ke Xu; Lina Zhang; Xiaoting Wan; Yan Guo
Journal: Front Oncol Date: 2020-01-31 Impact factor: 6.244