Jiameng Lu, Xiaoqing Ji, Lixia Wang, Yunxiu Jiang, Xinyi Liu, Zhenshen Ma, Yafei Ning, Jie Dong, Haiying Peng, Fei Sun, Zihan Guo, Yanbo Ji, Jianping Xing, Yue Lu, Degan Lu.
Abstract
Identifying epidermal growth factor receptor (EGFR) mutations is important because EGFR tyrosine kinase inhibitors are the first-line treatment of choice for patients with EGFR mutation-positive lung adenocarcinoma (LUAC). This study aimed to develop and validate a radiomics-based machine learning (ML) approach for identifying EGFR mutations in patients with LUAC. We retrospectively collected data from 201 patients with LUAC and known EGFR mutation status (140 in the training cohort and 61 in the validation cohort). We extracted 1316 radiomics features from preprocessed CT images and used a filter-based feature selection method to choose the 14 radiomics features and 1 clinical feature most relevant to EGFR mutation status. We then built models with 7 ML approaches and used receiver operating characteristic (ROC) curves to assess their discriminative performance. For predicting EGFR mutation, the model built on radiomics features alone achieved an area under the ROC curve (AUC) of 0.79 (95% confidence interval (CI): 0.77-0.82), and the combined model (radiomics features plus relevant clinical factors) achieved an AUC of 0.86 (0.87-0.88). Our study offers a radiomics-based ML model, built with filter-based feature selection, for detecting EGFR mutations in patients with LUAC. This convenient, low-cost method may help identify such patients noninvasively before tumor samples are obtained for molecular testing.
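The abstract says only that a filter method reduced the 1316 radiomics features (plus clinical variables) to 15. A filter method scores each feature on its own, independently of the downstream classifier, and keeps the top-ranked ones. The sketch below assumes the absolute point-biserial correlation as that score; the paper's actual filter criterion is not given here, and `point_biserial`, `filter_select`, and the toy data are hypothetical.

```python
import math
from typing import Sequence


def point_biserial(feature: Sequence[float], labels: Sequence[int]) -> float:
    """Pearson correlation between one continuous feature and a 0/1 label
    (assumed coding: 1 = EGFR mutant, 0 = wild type)."""
    n = len(feature)
    mean_x = sum(feature) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, labels))
    var_x = sum((x - mean_x) ** 2 for x in feature)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the label
    return cov / math.sqrt(var_x * var_y)


def filter_select(features: dict[str, Sequence[float]],
                  labels: Sequence[int], k: int) -> list[str]:
    """Filter-style selection (assumed criterion: |point-biserial r|):
    score every feature separately, train no classifier, keep the top k."""
    scores = {name: abs(point_biserial(col, labels)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 4 patients, 3 candidate features.
labels = [0, 0, 1, 1]
features = {
    "feature_signal": [0.1, 0.2, 0.9, 1.0],
    "feature_noise": [0.5, 0.1, 0.4, 0.2],
    "feature_constant": [1.0, 1.0, 1.0, 1.0],
}
print(filter_select(features, labels, k=1))
```

Because the score never consults a classifier, the same ranking can be reused for every one of the 7 ML approaches, which is the main practical appeal of filter methods over wrapper methods.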
Year: 2022 PMID: 35578691 PMCID: PMC9107363 DOI: 10.1155/2022/2056837
Source DB: PubMed Journal: Dis Markers ISSN: 0278-0240 Impact factor: 3.464
Figure 1. Patient recruitment workflow. In total, 201 of 568 patients were included in this study according to the selection criteria.
Figure 2. Workflow of the radiomics analysis.
Characteristics of patients in the training and validation cohorts.

| Variable | Training: mutant type (n = 75) | Training: wild type (n = 65) | P | Validation: mutant type (n = 26) | Validation: wild type (n = 35) | P | P (training vs. validation) |
|---|---|---|---|---|---|---|---|
| Age (y, mean ± SD) | 65.27 ± 1.44 | 66.14 ± 1.24 | 0.66 | 64.35 ± 1.34 | 63.48 ± 1.57 | 0.68 | 0.83 |
| Sex, n (%) | | | 0.06 | | | 0.19 | 0.44 |
| Male | 34 (24.29) | 40 (28.57) | | 9 (14.75) | 19 (31.15) | | |
| Female | 41 (29.29) | 25 (17.86) | | 17 (27.87) | 16 (26.23) | | |
| Smoking status, n (%) | | | 0.01 | | | 0.05 | 0.64 |
| Smoker | 24 (32.00) | 36 (55.38) | | 5 (19.23) | 19 (54.29) | | |
| Never smoker | 51 (68.00) | 29 (44.62) | | 21 (80.77) | 16 (45.71) | | |
| Stage, n (%) | | | 0.43 | | | 0.15 | 0.07 |
| III | 46 (61.33) | 44 (68.75) | | 16 (61.54) | 15 (42.86) | | |
| IV | 29 (38.67) | 20 (31.25) | | 10 (38.46) | 20 (57.14) | | |
| Serum level of tumor marker (mean ± SD) | | | | | | | |
| CEA | 109.0 ± 75.82 | 30.86 ± 7.56 | 0.31 | 114.6 ± 77.44 | 129.3 ± 78.20 | 0.89 | 0.28 |
| NSE | 98.39 ± 75.75 | 28.75 ± 6.79 | 0.36 | 100.6 ± 77.31 | 29.05 ± 6.932 | 0.36 | 0.27 |
| CYFRA 21-1 | 6.91 ± 0.79 | 9.36 ± 1.58 | 0.17 | 6.88 ± 0.82 | 9.63 ± 1.62 | 0.13 | 0.17 |
| SCC | 0.85 ± 0.77 | 1.26 ± 1.65 | 0.08 | 0.62 ± 0.38 | 0.97 ± 0.92 | 0.06 | 0.03 |
| Pro-GRP | 45.50 ± 8.23 | 49.31 ± 6.49 | 0.72 | 45.33 ± 8.40 | 51.17 ± 7.09 | 0.59 | 0.72 |
CEA: carcinoembryonic antigen; NSE: neuron-specific enolase; CYFRA 21-1: fragment of cytokeratin subunit 19; SCC: squamous cell carcinoma antigen; Pro-GRP: pro-gastrin-releasing peptide.
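The table does not state which test produced its P values. As an assumption, Pearson's chi-square test without continuity correction can be applied to the 2×2 counts. For the training cohort it gives p ≈ 0.055 for sex and p ≈ 0.005 for smoking status, which round to the reported 0.06 and 0.01. The helper name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Training cohort; rows = category, columns = (mutant type, wild type).
sex_stat, sex_p = chi2_2x2(34, 40, 41, 25)      # male / female
smoke_stat, smoke_p = chi2_2x2(24, 36, 51, 29)  # smoker / never smoker
print(f"sex: chi2={sex_stat:.2f}, p={sex_p:.3f}")
print(f"smoking: chi2={smoke_stat:.2f}, p={smoke_p:.3f}")
```

The closed-form 2×2 statistic avoids computing expected counts cell by cell; for sparse tables a Fisher exact test would be the usual alternative.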
Figure 3. Heat maps of the AUC for different combinations of feature selection (FS) methods (rows) and classification algorithms (columns). (a) Average cross-validated AUC of 70 models based on radiomics features. (b) Average cross-validated AUC of 70 models based on radiomics features and clinical data.
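Each heat-map cell is an average over cross-validation folds. The captions do not say how the folds were formed; the sketch below assumes stratified folds, which keep the mutant/wild-type ratio roughly constant so every held-out fold contains both classes. `stratified_kfold` is a hypothetical helper, and the example uses the training cohort's 75 mutant and 65 wild-type labels from the table.

```python
import random
from collections import defaultdict


def stratified_kfold(labels: list[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Assign sample indices to k folds, spreading each class evenly so every
    fold keeps roughly the cohort's mutant/wild-type ratio."""
    rng = random.Random(seed)
    by_class: dict[int, list[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: list[list[int]] = [[] for _ in range(k)]
    position = 0  # running counter shared across classes, so fold sizes differ by at most one
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds


# Training cohort from the table: 75 EGFR mutant (1) and 65 wild type (0).
labels = [1] * 75 + [0] * 65
folds = stratified_kfold(labels, k=5)
for i, test_idx in enumerate(folds):
    train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
    n_mutant = sum(labels[j] for j in test_idx)
    print(f"fold {i}: train={len(train_idx)}, test={len(test_idx)}, mutant in test={n_mutant}")
```

Each fold serves once as the held-out set while the other four train the model; averaging the five held-out AUCs gives one cross-validated value per FS method and classifier pair.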
Figure 4. ROC curves of the models on the training and validation sets. (a) Fivefold cross-validated ROC curve of model RF-MUIF. (b) Fivefold cross-validated ROC curve of model XGBoost-MUIF. (c) ROC curve of XGBoost-MUIF on the validation set. (d) ROC curve of RF-MUIF on the validation set.
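The AUC summarized by these ROC curves equals the probability that a randomly chosen mutant case receives a higher model score than a randomly chosen wild-type case (the Mann-Whitney formulation), with ties counted as one half. A minimal pure-Python sketch follows; `roc_auc` and the scores are hypothetical, and a library routine such as scikit-learn's `roc_auc_score` computes the same quantity.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. Assumed coding: 1 = EGFR mutant, 0 = wild type."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for five patients.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
# 5 of the 6 mutant/wild-type pairs are ranked correctly.
print(roc_auc(scores, labels))
```

The pairwise loop is quadratic, which is harmless at this study's size (201 patients); a rank-based version scales better for large cohorts.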
Figure 5. Boxplots comparing the 15 features ultimately incorporated into the XGBoost-MUIF model between EGFR mutant-type and EGFR wild-type patients: (a) original_shape_Flatness; (b) original_firstorder_Kurtosis; (c) original_glcm_Idmn; (d) original_gldm_DependenceEntropy; (e) original_glrlm_GrayLevelNonUniformityNormalized; (f) original_glrlm_ShortRunEmphasis; (g) original_gldm_GrayLevelNonUniformity; (h) original_glszm_LargeAreaLowGrayLevelEmphasis; (i) original_glszm_SmallAreaHighGrayLevelEmphasis; (j) original_ngtdm_Contrast; (k) wavelet-HLH_gldm_HighGrayLevelEmphasis; (l) wavelet-HLH_gldm_SmallDependenceHighGrayLevelEmphasis; (m) log-sigma-2-0-mm-3D_gldm_LargeDependenceEmphasis; (n) log-sigma-2-0-mm-3D_gldm_LargeDependenceLowGrayLevelEmphasis; (o) SCC.
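The caption does not state how the boxes and whiskers are drawn. Assuming the common Tukey convention (box from first to third quartile, whiskers to the most extreme values within 1.5 × IQR of the box), the summary for one feature in one EGFR group could be computed as follows. `box_summary` and the toy values are hypothetical, and quartile conventions differ slightly between plotting libraries.

```python
import statistics


def box_summary(values: list[float]) -> dict[str, float]:
    """Tukey-style boxplot summary (assumed convention): quartiles,
    whisker ends, and outlier count. Assumes at least two values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme values still inside the fences
        "whisker_high": max(inside),
        "n_outliers": len(values) - len(inside),
    }


# Hypothetical values of one feature in one EGFR group; 100.0 lies beyond the upper fence.
print(box_summary([1.0, 2.0, 3.0, 4.0, 5.0, 100.0]))
```

Running the same function separately on the mutant-type and wild-type values of a feature gives the two side-by-side boxes shown in each panel.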