Literature DB >> 31074592

Toward automatic prediction of EGFR mutation status in pulmonary adenocarcinoma with 3D deep learning.

Wei Zhao^1,2, Jiancheng Yang^3,4,5, Bingbing Ni^3,4, Dexi Bi⁶, Yingli Sun¹, Mengdi Xu⁵, Xiaoxia Zhu¹, Cheng Li¹, Liang Jin¹, Pan Gao¹, Peijun Wang⁷, Yanqing Hua¹, Ming Li^1,2,8.

Abstract

To develop a deep learning system based on 3D convolutional neural networks (CNNs), and to automatically predict EGFR-mutant pulmonary adenocarcinoma in CT images. A dataset of 579 nodules with EGFR mutation status labels of mutant (Mut) or wild-type (WT) was retrospectively analyzed. A deep learning system, namely 3D DenseNets, was developed to process 3D patches of nodules from CT data, and learn strong representations with supervised end-to-end training. The 3D DenseNets were trained with a training subset of 348 nodules and tuned with a development subset of 116 nodules. A strong data augmentation technique, mixup, was used for better generalization. We evaluated our model on a holdout subset of 115 nodules. An independent public dataset of 37 nodules from the cancer imaging archive (TCIA) was also used to test the generalization of our method. Conventional radiomics analysis was also performed for comparison. Our method achieved promising performance on predicting EGFR mutation status, with AUCs of 75.8% and 75.0% for our holdout test set and public test set, respectively. Moreover, strong relations were found between deep learning feature and conventional radiomics, while deep learning worked through an enhanced radiomics manner, that is, deep learned radiomics (DLR), in terms of robustness, compactness and expressiveness. The proposed deep learning system predicts EGFR-mutant of lung adenocarcinomas in CT images noninvasively and automatically, indicating its potential to help clinical decision-making by identifying eligible patients of pulmonary adenocarcinoma for EGFR-targeted therapy.

Entities: Chemical Disease Gene Mutation Species

Keywords: zzm321990EGFRzzm321990; convolutional neural networks; deep learning; mixup training technique; radiomics

Mesh：

Substances：

Year: 2019 PMID： 31074592 PMCID： PMC6601587 DOI： 10.1002/cam4.2233

Source DB: PubMed Journal: Cancer Med ISSN： 2045-7634 Impact factor: 4.452

INTRODUCTION

Lung cancer is one of the leading causes of cancer‐related death.1 Non‐small cell lung cancer (NSCLC) accounts for more than 80% lung cancer cases, where adenocarcinoma is the most common histological subtype.2 With the advancements of genomics, several genomic alternations, such as oncogenic ALK rearrangements,3 ROS1 rearrangement,4 KRAS mutations,5 and sensitizing EGFR mutations,6 have been identified as predictive and prognostic markers for NSCLC. Targeted therapy with these molecular markers has become an essential part of precision medicine for lung cancer. Small molecule tyrosine kinase inhibitors (TKIs) that target specific EGFR mutations have resulted in improved progression‐free survival (PFS) and higher objective radiographic response rate in patients with EGFR mutations than standard chemotherapy.7, 8, 9 Moreover, approximately 80% patients with EGFR‐mutant lung cancer respond to EGFR TKIs therapy (at initial treatment).10 However, the administration of EGFR TKIs (for example, gefitinib) on the patients without EGFR mutations, not just showed no effect but even resulted in worse PFS and unnecessary costs compared to platinum‐based chemotherapy,11 which highlights the importance of identifying eligible patients for the first‐line TKI therapy. Mutation profiling after biopsies or surgical resections has become a standard and informative medical procedure. However, the high cost and invasiveness of the approaches and repeating tumor sampling strongly limit the applicability of molecular testing. Besides, the poor DNA quality, intratumoral heterogeneity and long turnaround time raise much concern on the balance of costs and benefits.12, 13 Notwithstanding re‐biopsy is feasible in clinical practice, it is an invasive operation and faces the same challenges voiced by previous studies.14, 15 These defects enormously limit the practicability of precision medicine at scale. Alternatively, noninvasive markers are promising for predicting EGFR mutation.16, 17 Cancers with different genotypes drive specific biological processes involved in the development and progression of tumors, thus ultimately leading to different phenotypes. In other words, it is potentially feasible to predict the genotypes by identifying specific phenotypes. There have been studies investigating the phenotype‐genotype associations to identify associated genomic changes,18, 19 based on tumor morphology on computed tomography (CT)20, 21 and magnetic resonance imaging (MRI).22 Radiomics,20 that encodes tumor phenotypes with innumerable quantitative features using predefined image analysis algorithms. In particular, previous studies have demonstrated that certain radiomic features are associated with EGFR mutations status, suggesting that those identified features may be driven by somatic mutations.16, 23 Although these studies have achieved impressive performances, especially when combined with clinical information,17, 23, 24 conventional radiomics‐based methods are born with three main challenges. Firstly, conventional radiomics methods require strict procedures, including detection, segmentation, feature extraction, selection etc,25 which is tedious and time‐consuming. Secondly, radiomic features are susceptible to the manual segmentation as well as CT scanning parameters, thus interobserver reproducibility analysis is necessary. Finally, hand‐craft radiomic features are expressive but may be not enough for high‐level tasks. On the other hand, deep neural networks, or Deep Learning, have achieved remarkable success in several important problems of artificial intelligence (AI), for example, natural image classification26 and human language translation.27 Deep convolutional neural networks (CNNs), a family of neural networks, have shown incredible effectiveness in several tasks of natural image computer vision and medical image computing.28, 29 As powerful algorithms of representation learning, CNNs largely reduce the necessities of hand‐craft feature engineering. Our previous study has proven the effectiveness and efficacy of deep learning in predicting the invasiveness of lung adenocarcinomas from CT images.30 In this regard, we addressed the problem of CT‐based EGFR mutation prediction by deep neural networks, to make our system automatic, robust and accurate. In this study, we aimed to develop a deep learning system to predict the EGFR mutation status of lung adenocarcinoma based on CT images by integrating recent advances in deep supervised learning, such as dense connections31 and mixup training,32 to significantly reduce the empirical risks of overfitting. Our method is a labor‐saving strategy without the requirement of precise nodule segmentation, and also expected to obtain more stable performance due to the enhanced nature of the employed learning algorithms.

MATERIALS AND METHODS

We developed a novel analytic framework based on deep learning of the CT imaging phenotypes of lung adenocarcinoma to predict EGFR mutation status (see Figure 1). This retrospective study was approved by the Institutional Review Board of our institution (NO.20170103), which waived the requirement for patients' informed consent referring to the CIOMS guideline.

Figure 1

Overall pipeline for this study. A local CT dataset (HdH Dataset) and a public dataset selected from TCIA database (TCIA Dataset) of lung adenocarcinoma patients with EGFR mutation testing were used. Nodules were manually localized, segmented, and labelled as EGFR mutant (Mut)/wild‐type (WT). For deep learning, 3D DenseNets were trained using the training subset. A strong data augmentation technique, mixup, was used for better regularization. Expressive representations, that is, deep learned radiomics (DLR), for the nodules were end‐to‐end learned during the training procedure. Meanwhile, conventional radiomics analysis following the common practice was carried out for performance comparison, and association study between the 3D deep leaning and conventional radiomics was performed by calculating the pairwise correlation coefficients

Data collection

This study used local data collected in Huadong Hospital (HdH Dataset) and publicly available data selected from TCIA (TCIA Dataset).33, 34 In HdH Dataset, the inclusion criteria were (a) patient receiving thin‐slice chest CT (0.75‐1.5 mm) scan prior to biopsies or surgical treatment, (b) with pathology reports for the diagnosis of pulmonary adenocarcinoma, and (c) with detailed EGFR mutation testing reports. Only one malignant nodule was studied for each patient due to the availability of EGFR testing report. Among the 579 lung adenocarcinoma patients, 308 were EGFR‐mutant (Mut) and 271 were EGFR wild‐type (WT). The detailed characteristics of patients and lesions in HdH Dataset are presented in Table 1. CT acquisition parameters are described in Appendix S1. In the TCIA Dataset, 37 nodules were selected to validate the stability and generalization of the analytic framework. The inclusion criteria were (a) CT with slice thickness ≤1.5 mm (to avoid data inconsistency), (b) with EGFR mutations testing reports, (c) with pathology reports for the diagnosis of lung adenocarcinoma, (d) and lesions that could be certainly identified as the resected or biopsied lesions. The detailed information of the TCIA Dataset is described in Table S1 .

Table 1

Characteristics of patients and lesions in HdH dataset

Characteristics	Number	Percentage
Gender
Male	245	42.3
Female	334	57.6
Mean age (range) (y)
Male	61.8 ± 11.6 (29‐85)	‐
Female	58.4 ± 11.9 (22‐85)	‐
Total	59.8 ± 11.9 (22‐85)	‐
Mean size (range) (cm)	1.8 (0.3‐8.6)	‐
Location
Right lobe	342	59.1
Left lobe	237	40.9
Pathology
Adenocarcinoma in situ	31	5.4
Minimally invasive adenocarcinoma	157	27.1
Invasive adenocarcinoma	391	67.5
TMN classification (eighth edition)
0	31	5.4
I A‐B	356	61.5
II A‐B	7	1.2
III A‐C	10	1.7
IV A‐B	175	30.2
EGFR Mut	308	53.2

Characteristics of patients and lesions in HdH dataset

EGFR mutation profiling

Drug target‐associated mutations on EGFR exons 18, 19, 20, and 21 were examined using a PCR‐based amplification‐refractory mutation system (ARMS) with Human EGFR Gene Mutations Fluorescence Polymerase Chain Reaction Diagnostic Kit (ADx‐EG01, AmoyDx). Wild‐type EGFR in this study referred to no mutation detected among those loci.

Data annotation and pretreatment

Volumes of interest (VOIs) of the enrolled nodules in both datasets were manually delineated at voxel level by a radiologist with 4‐year experience in chest CT interpretation using medical image processing and navigation software 3D Slicer (version 4.8.0, Brigham and Women's Hospital), and subsequently confirmed (modified or re‐delineated) by another radiologist with 12‐year experience in chest CT interpretation. DICOM‐format images were imported into the software, subsequently the images with VOI information were exported into NII format for further analysis. Each segmented nodule was assigned a specific EGFR mutation label (Mut or WT) according to the corresponding EGFR mutation testing report. Deep learning is known to require heavy tuning. Thus, we randomized the HdH Dataset into training, development, and test subsets, containing 60%, 20%, and 20% (only once) of each EGFR categories (Mut or WT), respectively. The training set was used for training the neural networks. The development set was used for tuning the hyperparameters, such as determining early stopping, discovering the most suitable neural architectures, and choosing the best model snapshots. Once we chose the best model using the development set, the holdout test set was used for fairly evaluating the performance of our method. To further validate the stability and generalization of our method, we evaluate our model on the independent TCIA Dataset, without any fine tuning.

EGFR mutation status prediction with conventional radiomics

We investigated the performance of the conventional radiomics method using the two datasets. Following the common practice of radiomics method,23 the radiomic features were extracted with numbers of image analysis algorithms. Specifically, 475 radiomics features, including 50 histogram features, 325 co‐occurrence matrix features, and 100 run lengths matrix features, were automatically extracted using MATLAB 2016b for all delineated nodules in the two datasets. The detailed radiomics extraction methodology is described in Appendix S1. Radiomic features require refined manual segmentation of nodules. Even so, many radiomic features are not stable due to different segmentation manners. To ensure the reproducibility of radiomics, 50 nodules in the HdH Dataset were randomly selected for independent segmentation by two radiologists. The 475 radiomics features of the 50 nodules were evaluated with interclass correlation coefficient (ICC) analysis using the “irr” package in R software (version 3.4.3). Features with an ICC > 0.8 were considered reliable. Finally, a total of 401 features were selected. Logistic regression was used to model the relation between the radiomic features and EGFR status with the widely used scikit‐learn library in Python.17, 35 Due to the simple hyperparameter setting of logistic regression, the training and development sets were merged into a “train‐dev” set. Ten‐fold cross validation search was performed on the train‐dev set with 1000 randomly sampled values of regularization term C. The trained model was used for scoring the HdH test Dataset and TCIA Dataset. We also tried heavy search for models and hypermeters with an automated machine learning tool (AutoML),36 built upon widely used scikit‐learn Python package37; however it did not yield better performance. Note that the feature selection procedure to reduce the redundancy between the features had been included in the AutoML search procedure. Although there may exist a better model theoretically, the heavy search process implied conventional radiomic features are not strong enough representations.

EGFR status prediction with 3D DenseNets

To take the advantage of deep representation learning, we designed a deep learning‐based framework for analyzing the EGFR mutation status of nodules. Taking into account the characteristics of the used data, we have adopted the following principle to design the models: 3D: CT images were presented in 3D, and the 3D views provided critical visual information from a practical point of radiology; Parameter‐efficient: the model should be compact and easy to train with limited data; Data‐efficient: our training method should lead to less overfitting; Automatic: at inference stage, our model should be easy to use, given approximate locations of nodules, rather than the manual segmentation. Following these principles, we adopted 3D DenseNets, which were derived from the powerful deep convolutional neural networks 2D DenseNets.31 DenseNets have achieved great success in 2D natural images and medical images,38 which elegantly reuse the features from lower layers, thus exploring high‐level representations in an efficient way. Our 3D DenseNets used same notations but 3D convolutions, with a growth rate k = 16, a compression rate θ = 2, and a bottleneck B = 4. Batch Normalization39 and Leaky ReLU40 (α = 0.1) were used together as activation functions. The resulting neural networks contained only 0.55 M parameters, making the training compatible with limited data. We implemented the neural networks with Keras 2.1.541 and TensorFlow 1.4.0.42 The detailed structure of the 3D DenseNets is depicted in Table 2.

Table 2

3D DenseNet architectures for EGFR mutation classification

Layer	Tensor size	Building blocks
Input	48 × 48 × 48 × 1
Convolution	48 × 48 × 48 × 32	3 × 3 × 3 conv
Pooling	24 × 24 × 24 × 32	2 × 2 × 2 average pool
Dense Block (1)	24 × 24 × 24 × 80	bn-leakyrelu-1×1×1convbn-leakyrelu-3×3×3conv×3
Compression and Pooling (1)	12 × 12 × 12 × 40	bn-leakyrelu-1×1×1conv2×2×2averagepool
Dense Block (2)	12 × 12 × 12 × 136	bn-leakyrelu-1×1×1convbn-leakyrelu-3×3×3conv×6
Compression and Pooling (2)	6 × 6 × 6 × 68	bn-leakyrelu-1×1×1conv2×2×2averagepool
Dense Block (3)	6 × 6 × 6 × 132	bn-leakyrelu-1×1×1convbn-leakyrelu-3×3×3conv×4
Compression and Pooling (3)	3 × 3 × 3 × 66	bn-leakyrelu-1×1×1conv2×2×2averagepool
Dense Block (4)	3 × 3 × 3 × 114	bn-leakyrelu-1×1×1convbn-leakyrelu-3×3×3conv×3
Global Pooling (DLR)	114	3 × 3 × 3 average pool
Output	1	sigmoid

3D DenseNet architectures for EGFR mutation classification The inputs of the proposed 3D DenseNets were cubic patches of 48 × 48 × 48 mm, generated by (pre‐processed) chest CT scans and the coordinates of the approximate mass centers of nodule. In practice, the coordinates can be marked by radiologists manually, or by automatic nodule detection systems.43 Our method did not require the manually refined segmentation, which were usually done by the users (radiologists) manually. The preprocessing followed a “standard” procedure: the input patches were converted into Hounsfield units, followed by resizing the volumetric data into spacing of 1 × 1 × 1 mm by trilinear interpolation, clipping the voxel intensity into , quantifying the density into grayscale, and transforming the values to by the mapping . We further applied multiple data augmentation techniques to represent nodules in different views: Moving the centers by small amounts in [−m, m] pixels in the three axes Reordering the axes and rotation by 90° increments Left‐right flipping.

Training for the deep learning models

During training, we applied the above data augmentation techniques with m = 8. These conventional techniques could effectively increase the training data size, yet not enough for a good training convergence. We further used a novel data augmentation technique: mixup 32 to stabilize the training. mixup showed remarkable improvements in state‐of‐the‐art natural image classification neural networks. To our knowledge, this is the first comprehensive study introducing mixup into medical image computing. The mixup produces extra informative training samples by the following easy‐to‐implement data augmentation routine. Given , α is a predefined hyperparameter, and (x, y) and (x, y) are two input‐label pairs from the training distribution, then extra training samples are obtained by the following equations: α defines the strength of mixup by controlling the beta distribution; when α = 0, mixup ceases to be effective. In our experiments, we set α = 0.9 for modest usage of mixup. As shown later in the Results section, mixup was critical for enabling the training of deep neural networks with limited data on this task. We trained the 3D DenseNets with binary cross‐entropy loss:where y is the ground truth of the input, and is the prediction given by the models. “Kaiming uniform” method44 was used for initializing the neural networks. During training, we sampled 20 positive and 20 negative samples (with a batch size of 40) for balanced optimization. A Nesterov‐momentum SGD45 with momentum = 0.9 was used as neural optimizer. The initial learning rate was 3 × 10−2, and decayed by a factor of 1/3 at the end of epoch = 30, 60, 100, and 150. No weight decay nor dropout46 was used. All hyperparameters were manually tuned on the HdH development set. A model snapshot was saved at the end of each epoch. Finally, the best model of a single run was selected by the following heuristics: initially, select five candidates with the highest AUCs on the development set; then, choose one of the candidates with the lowest absolute difference of training and development AUCs. Two hundred epochs were usually enough for a good convergence. In our experiments, the best model snapshot was generated at the end of epoch 131.

Inference for the deep learning models

Representing nodules in multiple views not only increases the training data size, but also effectively reduces empirical variance of inference by the ensemble trick. In practice, given an input, we forwarded the well‐trained neural network using the mentioned conventional data augmentation (m = 3) with 100 runs. Then, probability scores of EGFR Mut were obtained by averaging the multiple runs: We used the last‐layer (the DLR layer in Table 1) 114‐d outputs of the trained 3D DenseNet as the learned representations given nodules, that is, the deep learned radiomics (DLR). More robust DLR was obtained by averaging the feature outputs of multiple runs. Note the DLR and the probability scores could be obtained within a single neural network forward pass.

RESULTS

Performance of deep learning on EGFR mutation status prediction

We compared the performance between deep learning and conventional radiomics on predicting EGFR mutation status of lung adenocarcinoma, which implied the associations of EGFR genotype and radiomic phenotype. The AUC, which is insensitive of class skews, was used for evaluation. As depicted in Table 3 and Figure 2C, the 3D DenseNets (with mixup, ensemble) outperformed our conventional radiomics‐based method (P = 0.021, DeLong test47). Several prior studies were also listed in Table 3. Since they used nonshared datasets independently, their results were only for reference only. On our holdout test set (HdH test Dataset) containing 115 patients and the TCIA Dataset of 37 patients, 3D DenseNets achieved AUCs of 75.8% and 75.0%, respectively. Considering the class‐imbalance and threshold bias, we believed the threshold‐free AUROC was the most appropriate metric. For reference, we applied a threshold that maximizes the accuracy to obtain the sensitivity, specificity, accuracy, and precision (or positive predictive value). These results were depicted in Table S2. Notably, the data distribution of the TCIA Dataset differed much from that of the HdH Dataset in terms of patients and imaging; the comparable performances on the two datasets indicated the robustness of our method. More importantly, unlike conventional radiomics methods that require labor‐intensive manual segmentation masks, our deep learning method was compatible with approximate locations of the nodules instead.

Table 3

Presentation of our method and several previous studies in terms of methods, datasets, and resulting classification AUCs

Method	Training #Patients	Test #Patients	#EGFR muta	AUC (%)
Radiomics23	353	352	183 (24.0%)b	69
+ clinical information	353	352	183 (24.0%)b	75
Radiomics17	298	NAc	137 (46.0%)	64.7
+ clinical information	298	NAc	137 (46.0%)	70.9
Radiomics18	47	NAc	19 (40.4%)	67.0
Radiomics (This study)	464 (HdH train‐dev Dataset)	115 (HdH test Dataset)	62 (53.9%)	64.5
3D DenseNets w/mixup, ensemble (This study)	348 (HdH training Dataset)	115 (HdH test Dataset)	62 (53.9%)	75.8
Radiomics (this study)	464 (HdH train‐dev Dataset)	37 (TCIA Dataset)	9 (24.3%)	68.7
3D DenseNets w/mixup, ensemble (this study)	348 (HdH training Dataset)	37 (TCIA Dataset)	9 (24.3%)	75.0

Shown as the number of cases (percentage).

Estimated using the proportion of EGFR Mut on the entire data set, rather than the test set.

The evaluation results are based on multivariate statistical analysis, rather than the practice of training – validation (development) – test in machine learning. Since the prior studies listed in the above table used nonshared datasets independently, the results are for reference only.

Figure 2

Visualization of conventional radiomics, deep learned radiomics (DLR) and their associations. A, Cluster map of conventional radiomics, with 115 nodules on the x‐axis and 401 radiomic features on the y‐axis. Each feature was normalized into zero mean and unit standard variance. The nodules of a same cluster (adjacent columns) shared similar radiomic features in Euclidean space. The semantic label EGFR Mut/WT of each nodule was shown on the black‐grey bar below the x‐axis. B, Correlation coefficient matrix for conventional radiomics and DLR. Note the radiomic features (y‐axis in A) and the DLR features (x‐axis in D) were both aligned with the correlation coefficient matrix. C, The classification ROC curves of our radiomics‐based and DLR‐based methods. The brighter (red or blue) blocks show the higher correlation. The black denotes no correlation. D, Cluster map of DRL with 115 nodules on the y‐axis and 114 radiomic features on the x‐axis. Each feature was normalized into zero mean and unit standard variance. The nodules of a same cluster (adjacent rows) shared similar DLR in Euclidean space. The semantic label EGFR Mut/WT of each nodule was shown on the black‐grey bar on the left of the y‐axis

Presentation of our method and several previous studies in terms of methods, datasets, and resulting classification AUCs Shown as the number of cases (percentage). Estimated using the proportion of EGFR Mut on the entire data set, rather than the test set. The evaluation results are based on multivariate statistical analysis, rather than the practice of training – validation (development) – test in machine learning. Since the prior studies listed in the above table used nonshared datasets independently, the results are for reference only. Visualization of conventional radiomics, deep learned radiomics (DLR) and their associations. A, Cluster map of conventional radiomics, with 115 nodules on the x‐axis and 401 radiomic features on the y‐axis. Each feature was normalized into zero mean and unit standard variance. The nodules of a same cluster (adjacent columns) shared similar radiomic features in Euclidean space. The semantic label EGFR Mut/WT of each nodule was shown on the black‐grey bar below the x‐axis. B, Correlation coefficient matrix for conventional radiomics and DLR. Note the radiomic features (y‐axis in A) and the DLR features (x‐axis in D) were both aligned with the correlation coefficient matrix. C, The classification ROC curves of our radiomics‐based and DLR‐based methods. The brighter (red or blue) blocks show the higher correlation. The black denotes no correlation. D, Cluster map of DRL with 115 nodules on the y‐axis and 114 radiomic features on the x‐axis. Each feature was normalized into zero mean and unit standard variance. The nodules of a same cluster (adjacent rows) shared similar DLR in Euclidean space. The semantic label EGFR Mut/WT of each nodule was shown on the black‐grey bar on the left of the y‐axis

Potential association between the conventional radiomics and deep learned radiomics

Radiomics analysis provides important medical insights. However, conventional radiomics requires manual segmentation, which is a tedious process in practice. Instead, our method based on 3D deep learning predicts the labels automatically with even better performance. To our knowledge, few prior studies compare these two methods together. Assumed that deep learning networks particularly learn the representation of radiomics with end‐to‐end training, we investigated the relation between the both, and discuss the reason why deep learning provided better discriminative performance on this problem. To this end, we analyzed 401 conventional radiomic features (ICC > 0.8), against 114 deep learned radiomics (DLR) features extracted from 3D DenseNets (see the “Inference for the deep learning models” section). As illustrated in Figure 2A,D, the unsupervised clusters of both the conventional radiomics and DLR matched much with the semantic labels (Mut/WT); in other words, the continuous regions on the black‐grey bars shared numbers of similar features respectively. However, the cluster map of DLR had obviously shaper contrast than that of conventional radiomics, which indicated the DLR features are more representative and compact than the hand‐craft features. It well explained the higher classification performance of DLR than conventional radiomics. Moreover, we associated DLR and conventional radiomics, by visualizing the pairwise Pearson correlation coefficients on the HdH test Dataset with a cluster map in Figure 2B. Though the radiomic features and DLR features do not highly correlate (correlation coefficients range from −0.5 to +0.5), the cluster map indicated relatively high correlation (bright red or blue) between DLR and radiomics. It is worth noting that, most of the radiomic features were potentially correlated with at least one DLR feature, while certain DLR features showed low correlation with all the radiomic features, which potentially corresponded to the extra high‐level information with more hints on modeling the EGFR status. Considering that our DLR‐based method showed better performance, it is reasonable to assume that the DLR features encoded automatically not only almost all information of the (highly correlated) conventional radiomics with even more compact representation (more informative with lower feature dimension), but also critical discriminative information not in the conventional radiomics. To verify our assumption further, we used the 114 DLR features together with the 401 radiomics features to fit a cross validation searched logistic regression model as in our conventional radiomics analysis. An automatic machine learning tool (AutoML)36 was also applied for searching a best‐performing model. The resulting model achieved AUCs of 71.2% and 74.2% on the HdH test Dataset and TCIA Dataset, respectively, which performs poorer than using DLR alone. It implied that, combining the conventional radiomics and DLR naïvely could not boost the performance, due to the high correlation between the both. Note that the feature selection procedure to reduce the redundancy between the features had been included in the AutoML search procedure.

Ensemble vs vanilla (no ensemble)

The inference‐stage data augmentation by representing a nodule in multiple views, was not only reasonable from a medical perspective, but also made sense in our empirical results. As depicted in Table 4, this simple ensemble technique was beneficial to reducing the inference variance. The vanilla term referred to inference with a single forward. The ensemble inference produced a more stable performance, on the HdH Dataset (training, development, test) and TCIA Dataset, though the vanilla inference might produce even better results in some cases.

Table 4

Dataset summary and prediction performance of deep learning systems on HdH Dataset (training, development and test) and TCIA Dataset

Dataset	#Patients	EGFR⁺ a	AUC (w/mixup, ensemble) (%)	AUC (w/mixup, vanilla) (%)	AUC (w/o mixup, ensemble) (%)	AUC (w/o mixup, vanilla) (%)
HdH training dataset	348	185 (53.2%)	76.7	76.0	71.0	70.1
HdH development dataset	116	61 (52.6%)	74.1	74.6	69.2	70.4
HdH test dataset	115	62 (53.9%)	75.8	76.8	67.9	67.9
TCIA dataset	37	9 (24.3%)	75.0	68.3	70.6	71.4

Shown as the number of cases (percentage).

Dataset summary and prediction performance of deep learning systems on HdH Dataset (training, development and test) and TCIA Dataset Shown as the number of cases (percentage).

Mixup vs no mixup

The mixup technique was critical for training our 3D DenseNets. As shown in Table 4, the models with mixup training significantly outperformed those without in all of our experiments. The models without mixup training follows the same model selection procedure. We explained the reason for these remarkable differences with the surprisingly strong regularization effects of mixup training. As demonstrated in Figure 3, the learning curves with mixup training were much smoother, and no severe overfitting was observed. On the contrary, the training without mixup was not stable; besides, there existed typical overfitting, making the model selection impractical. Considering the unreasonable effectiveness of mixup training, it could potentially be a standard technique in medical image computing in the future.

Figure 3

The learning curves of the best models with mixup training and those without, in terms of binary cross‐entropy loss. The losses on the HdH training, development (val) set and test Dataset were shown on the figures. “epochs” on the x‐axis means the training consumes once the entire training set

The t‐SNE visualization of the deep learned radiomics

To explore the manifold structure of the DLR intuitively, we visualized the DLR features on the HdH test Dataset using t‐Distributed Stochastic Neighbor Embedding (t‐SNE)48 as illustrated in Figure 4. No distinct pattern was shown on the t‐SNE visualization, which was reasonable considering the difficulty of using the phenotype information to predict the genotype expression. Even so, two clusters of EGFR Mut and EGFR WT could be found, see Figure 4. Most of the DLR features outside of the contours of the two clusters were assigned a score with high uncertainty by the 3D DenseNets. Nevertheless, the DLR features were shown to be meaningful and representative despite the difficulty of the task.

Figure 4

t‐SNE visualization of the deep learned radiomics in a 2D space on the HdH test Dataset. A, The t‐SNE visualization scatter plot colored by the EGFR labels. Clusters can be found labeled by the ground truth. B, The t‐SNE visualization scatter plot colored by the EGFR probability score predicted by the 3D DenseNets. The prediction scores are consistent with ground truth

DISCUSSION

This study developed a deep supervised learning approach to predict EGFR‐mutant lung adenocarcinoma in CT images. Compared with conventional radiomics, the models are less prone to overfitting on the limited training data and show better prediction performance. The deep learning method is also labor‐saving since it does not require precise segmentation of nodules. More importantly, we empirically find that the deep learning models learn more representative features than conventional radiomics, named deep learned radiomics (DLR), which are the keys to better analytic performances. Our method outperformed previous studies and established a new methodology on this task. Using imaging information only, our deep 3D DenseNets achieved AUCs of 75.8% and 75.0% on the HdH test Dataset and independent public TCIA Dataset, respectively, which were better than the best of previous radiomics method (69%), and on par with their model combining additional clinical information (75%). We analyzed the associations between DLR and conventional radiomics, and empirically showed that DLR features were more representative and learned extra imaging information compared to conventional radiomics. Furthermore, we introduced the mixup training to our deep learning method, which suggested “unreasonable effectiveness” in regularizing the training of the deep neural networks with limited data. Decoding image phenotypes using radiomics to predict tumor genotypes, called radio‐genomics as well, shows promising performances and outcomes in precision medicine research. A few recent studies have investigated radiomics analysis to noninvasively predict EGFR mutation status.16, 23 For example, Rios et al23 discriminate EGFR Mut and WT cases with an AUC of 69%, with an improvement on AUC (75%) when combining clinical information. Despite these impressive results, radiomics‐based methods are limited with expressiveness of the hand‐craft features; besides, the requirement of manual segmentation makes it difficult to apply in practical clinical contexts. In contrast, our deep learning method is automatic in predicting the EGFR mutation status, which only requires an approximate location of nodule instead of labor‐intensive voxel‐wise segmentation. More importantly, the resulting DLR features are learned automatically with end‐to‐end training. In our association analysis, we found that some DLR features were not correlated with any radiomic features in our experiments, implying extra information extracted by our 3D DenseNets. Powered by dense connections31 and mixup training,32 our method achieved a state‐of‐the‐art performance. Considering that sensitizing EGFR mutations are more likely found in Asian patients,2 a selected subset from TCIA database33 was used for testing the generality of our system. Our deep learning system showed robust performance on the public data set. Radiomic features, defined by image analysis algorithms on numbers of visual characteristics, such as shape, volume, and intensity, are low‐level features without high‐level semantic information. However, these features are naturally “grey‐box” interpretable. On the other hand, DLR features are semantically high‐level thanks to the end‐to‐end learning, yet deep learning models are known to be black‐box artificial intelligence lack of the desirable interpretability, especially in the medical contexts. Our association analysis not only provides insight on understanding the DLR features, but also approaches opening the black‐box of deep learning in medical image computing by a grey‐box model. However, it is hardly possible to identify the specific correlation between radiomics and DLR, we leave this for further exploration. There were limitations in our study. Due to resource limitation, EGFR mutations were detected with ARMS‐PCR covering specific loci, thus EGFR mutations were narrowly defined in this study. Further studies with mutations detected by second‐generation sequencing are needed. Besides, information within the CT images is relative limited; integrating more available information of patients, such as clinicopathological facets, blood testing result, proteomics, and even the lifestyles, into the models can be beneficial to inferring the genotypes in multiple views. Recently, liquid biopsy, using blood as opposed to tumor samples for molecular analysis, was developed to identify EGFR mutations with promising results.49, 50 Taking advantage of these valuable information or integrating different promising approaches may improve the performance of predicting the EGFR mutation status. We consider it as a future direction. Moreover, the sample size of the independent validation was still small, and our deep leaning approach should be tested in larger cohorts. Also, this was a single‐center study, our deep learning systems desire larger datasets with more diversity. Further improvement can be made to practically help scalable precision medicine; however, the methodology established by this study, could serve as a paradigm for future studies. Lastly, only thin‐slice CT images were included in the current study to mitigate the radiomic feature variabilities. However, whether the deep learning method could perform better with different slice thickness CT images is worth further investigating. Actually, this is one of our ongoing research. In conclusion, the proposed deep learning system predicts EGFR‐mutant lung adenocarcinoma in CT images automatically and noninvasively with promising performance, indicating the potential to help clinical decision‐making by identifying eligible patients of pulmonary adenocarcinoma for EGFR‐targeted therapy. The association study between conventional radiomics and deep learned radiomics discusses the relation between the both, and approaches toward a grey‐box explanation methodology of black‐box model.

CONFLICT OF INTEREST

The authors declare that they have no conflict of interest.

AUTHORS CONTRIBUTION

Conception and design: W. Zhao, J. Yang, Y. Hua, M. Li. Development of methodology: W. Zhao, J. Yang, D. Bi, B.Ni, M.Xu, Y. Hua, M. Li. Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc): W. Zhao, J. Yang, X. Zhu, Y. Sun, L. Jin, C. li, P. Gao, Y.Hua, M. Li. Analysis and interpretation of data (eg, statistical analysis, biostatistics, computational analysis):W. Zhao, J. Yang, D.Bi, Y. Hua, M. Li. Writing, review, and/or revision of the manuscript: All authors Administrative, technical, or material support (ie, reporting or organizing data, constructing databases): W. Zhao J. Yang, P. Wang, Y. Hua, M. Li. Study supervision: W. Zhao, B. Ni, P. Wang, Y. Hua, M. Li. Other (algorithm and software development): J. Yang.

ETHICAL APPROVAL

This retrospective study was approved by the Institutional Review Board of our institution (NO.20170103), which waived the requirement for patients’ informed consent referring to the CIOMS guideline. Click here for additional data file. Click here for additional data file. Click here for additional data file.

33 in total

Review 1. New driver mutations in non-small-cell lung cancer.

Authors: William Pao; Nicolas Girard
Journal: Lancet Oncol Date: 2011-02 Impact factor: 41.316

2. Non-small cell lung cancer: identifying prognostic imaging biomarkers by leveraging public gene expression microarray data--methods and preliminary results.

Authors: Olivier Gevaert; Jiajing Xu; Chuong D Hoang; Ann N Leung; Yue Xu; Andrew Quon; Daniel L Rubin; Sandy Napel; Sylvia K Plevritis
Journal: Radiology Date: 2012-06-21 Impact factor: 11.105

3. Anaplastic lymphoma kinase inhibition in non-small-cell lung cancer.

Authors: Eunice L Kwak; Yung-Jue Bang; D Ross Camidge; Alice T Shaw; Benjamin Solomon; Robert G Maki; Sai-Hong I Ou; Bruce J Dezube; Pasi A Jänne; Daniel B Costa; Marileila Varella-Garcia; Woo-Ho Kim; Thomas J Lynch; Panos Fidias; Hannah Stubbs; Jeffrey A Engelman; Lecia V Sequist; WeiWei Tan; Leena Gandhi; Mari Mino-Kenudson; Greg C Wei; S Martin Shreeve; Mark J Ratain; Jeffrey Settleman; James G Christensen; Daniel A Haber; Keith Wilner; Ravi Salgia; Geoffrey I Shapiro; Jeffrey W Clark; A John Iafrate
Journal: N Engl J Med Date: 2010-10-28 Impact factor: 91.245

4. Gefitinib or chemotherapy for non-small-cell lung cancer with mutated EGFR.

Authors: Makoto Maemondo; Akira Inoue; Kunihiko Kobayashi; Shunichi Sugawara; Satoshi Oizumi; Hiroshi Isobe; Akihiko Gemma; Masao Harada; Hirohisa Yoshizawa; Ichiro Kinoshita; Yuka Fujita; Shoji Okinaga; Haruto Hirano; Kozo Yoshimori; Toshiyuki Harada; Takashi Ogura; Masahiro Ando; Hitoshi Miyazawa; Tomoaki Tanaka; Yasuo Saijo; Koichi Hagiwara; Satoshi Morita; Toshihiro Nukiwa
Journal: N Engl J Med Date: 2010-06-24 Impact factor: 91.245

5. Molecular characteristics of bronchioloalveolar carcinoma and adenocarcinoma, bronchioloalveolar carcinoma subtype, predict response to erlotinib.

Authors: Vincent A Miller; Gregory J Riely; Maureen F Zakowski; Allan R Li; Jyoti D Patel; Robert T Heelan; Mark G Kris; Alan B Sandler; David P Carbone; Anne Tsao; Roy S Herbst; Glenn Heller; Marc Ladanyi; William Pao; David H Johnson
Journal: J Clin Oncol Date: 2008-03-20 Impact factor: 44.544

6. Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma.

Authors: Tony S Mok; Yi-Long Wu; Sumitra Thongprasert; Chih-Hsin Yang; Da-Tong Chu; Nagahiro Saijo; Patrapim Sunpaweravong; Baohui Han; Benjamin Margono; Yukito Ichinose; Yutaka Nishiwaki; Yuichiro Ohe; Jin-Ji Yang; Busyamas Chewaskulyong; Haiyi Jiang; Emma L Duffield; Claire L Watkins; Alison A Armour; Masahiro Fukuoka
Journal: N Engl J Med Date: 2009-08-19 Impact factor: 91.245

Review 7. Prognostic value of K-RAS mutations in patients with non-small cell lung cancer: a systematic review with meta-analysis.

Authors: Daquan Meng; Mingli Yuan; Xiaojuan Li; Lijun Chen; Jie Yang; Xin Zhao; Wanli Ma; Jianbao Xin
Journal: Lung Cancer Date: 2013-04-19 Impact factor: 5.705

Review 8. Genotyping and genomic profiling of non-small-cell lung cancer: implications for current and future therapies.

Authors: Tianhong Li; Hsing-Jien Kung; Philip C Mack; David R Gandara
Journal: J Clin Oncol Date: 2013-02-11 Impact factor: 44.544

9. Phase III study of afatinib or cisplatin plus pemetrexed in patients with metastatic lung adenocarcinoma with EGFR mutations.

Authors: Lecia V Sequist; James Chih-Hsin Yang; Nobuyuki Yamamoto; Kenneth O'Byrne; Vera Hirsh; Tony Mok; Sarayut Lucien Geater; Sergey Orlov; Chun-Ming Tsai; Michael Boyer; Wu-Chou Su; Jaafar Bennouna; Terufumi Kato; Vera Gorbunova; Ki Hyeong Lee; Riyaz Shah; Dan Massey; Victoria Zazulina; Mehdi Shahidi; Martin Schuler
Journal: J Clin Oncol Date: 2013-07-01 Impact factor: 44.544

10. Molecular testing guideline for selection of lung cancer patients for EGFR and ALK tyrosine kinase inhibitors: guideline from the College of American Pathologists, International Association for the Study of Lung Cancer, and Association for Molecular Pathology.

Authors: Neal I Lindeman; Philip T Cagle; Mary Beth Beasley; Dhananjay Arun Chitale; Sanja Dacic; Giuseppe Giaccone; Robert Brian Jenkins; David J Kwiatkowski; Juan-Sebastian Saldivar; Jeremy Squire; Erik Thunnissen; Marc Ladanyi
Journal: J Mol Diagn Date: 2013-04-04 Impact factor: 5.568

20 in total

1. Prediction of KRAS, NRAS and BRAF status in colorectal cancer patients with liver metastasis using a deep artificial neural network based on radiomics and semantic features.

Authors: Ruichuan Shi; Weixing Chen; Bowen Yang; Jinglei Qu; Yu Cheng; Zhitu Zhu; Yu Gao; Qian Wang; Yunpeng Liu; Zhi Li; Xiujuan Qu
Journal: Am J Cancer Res Date: 2020-12-01 Impact factor: 6.166

Review 2. Gut microbiome, big data and machine learning to promote precision medicine for cancer.

Authors: Giovanni Cammarota; Gianluca Ianiro; Anna Ahern; Carmine Carbone; Andriy Temko; Marcus J Claesson; Antonio Gasbarrini; Giampaolo Tortora
Journal: Nat Rev Gastroenterol Hepatol Date: 2020-07-09 Impact factor: 46.802

3. Radiomic Feature-Based Nomogram: A Novel Technique to Predict EGFR-Activating Mutations for EGFR Tyrosin Kinase Inhibitor Therapy.

Authors: Qiaoyou Weng; Junguo Hui; Hailin Wang; Chuanqiang Lan; Jiansheng Huang; Chun Zhao; Liyun Zheng; Shiji Fang; Minjiang Chen; Chenying Lu; Yuyan Bao; Peipei Pang; Min Xu; Weibo Mao; Zufei Wang; Jianfei Tu; Yuan Huang; Jiansong Ji
Journal: Front Oncol Date: 2021-08-06 Impact factor: 6.244

4. A Machine Learning-Based Predictive Model of Epidermal Growth Factor Mutations in Lung Adenocarcinomas.

Authors: Ruimin He; Xiaohua Yang; Tengxiang Li; Yaolin He; Xiaoxue Xie; Qilei Chen; Zijian Zhang; Tingting Cheng
Journal: Cancers (Basel) Date: 2022-09-25 Impact factor: 6.575

5. Development and validation of novel radiomics-based nomograms for the prediction of EGFR mutations and Ki-67 proliferation index in non-small cell lung cancer.

Authors: Yinjun Dong; Zekun Jiang; Chaowei Li; Shuai Dong; Shengdong Zhang; Yunhong Lv; Fenghao Sun; Shuguang Liu
Journal: Quant Imaging Med Surg Date: 2022-05

6. External Validation of Deep Learning Algorithms for Radiologic Diagnosis: A Systematic Review.

Authors: Alice C Yu; Bahram Mohajer; John Eng
Journal: Radiol Artif Intell Date: 2022-05-04

7. Evaluating Solid Lung Adenocarcinoma Anaplastic Lymphoma Kinase Gene Rearrangement Using Noninvasive Radiomics Biomarkers.

Authors: De-Ning Ma; Xin-Yi Gao; Yi-Bo Dan; An-Ni Zhang; Wei-Jun Wang; Guang Yang; Hong-Zhou Zhu
Journal: Onco Targets Ther Date: 2020-07-16 Impact factor: 4.147