
Explaining Deep Features Using Radiologist-Defined Semantic Features and Traditional Quantitative Features.

Rahul Paul1, Matthew Schabath2, Yoganand Balagurunathan3, Ying Liu4, Qian Li4, Robert Gillies3, Lawrence O Hall1, Dmitry B Goldgof1.   

Abstract

Quantitative features are generated from a tumor phenotype by various data-characterization and feature-extraction approaches and have been used successfully as biomarkers. These features describe a nodule: for example, nodule size, pixel intensity, histogram-based information, and texture information from wavelets or a convolution kernel. Semantic features, on the other hand, are generated by an experienced radiologist and consist of common characteristics of a tumor: for example, the location of the tumor, fissure or pleural wall attachment, the presence of fibrosis or emphysema, or a concave cut on the nodule surface. These features have been derived for lung nodules by our group. Semantic features have also shown promise in predicting malignancy. Deep features from images are generally extracted from the last layers before the classification layer of a convolutional neural network (CNN). By training on different types of images, the CNN learns to recognize various patterns and textures. However, when we extract deep features, there is no specific naming approach for them other than the feature column number (the position of a neuron in a hidden layer). In this study, we tried to relate and explain deep features with respect to traditional quantitative features and semantic features. We discovered that 26 deep features from the Vgg-S neural network and 12 deep features from our trained CNN could be explained by semantic or traditional quantitative features. From this, we concluded that those deep features can be given a recognizable definition via semantic or quantitative features.


Keywords:  CNN; deep features; explainable AI; interpretation of features; quantitative features; radiomics; semantic features


Year:  2019        PMID: 30854457      PMCID: PMC6403047          DOI: 10.18383/j.tom.2018.00034

Source DB:  PubMed          Journal:  Tomography        ISSN: 2379-1381


Introduction

Lung cancer is one of the most common malignancies worldwide, with a 5-year survival rate of 18% (1). The American Cancer Society estimated that 14% of new cancer cases in 2018 would be lung cancer, making it the second most frequently detected cancer in the United States. It also estimated 154,050 deaths from lung cancer in 2018, more than from any other cancer in the United States (2). Because lung cancer typically remains undetected during the initial stages, ∼75% of patients with lung cancer are first diagnosed at an advanced stage (III/IV) (3). As a result, early detection and diagnosis are a high priority. Low-dose computed tomography (LDCT) is a noninvasive and widely used imaging technique for detecting lung nodules. By analyzing CT scans, radiologists can generate specific features from a patient's lung nodule, which can provide guidance for detection and diagnosis. These distinctive features are named semantic features. They can be categorized into the following groups: shape (eg, lobulation), location (eg, lobe location), margin (eg, spiculation), and external (eg, peripheral emphysema). On CT scans, cavitation is found in 22% of primary lung cancers, and the cavities in benign nodules often mimic those of malignant nodules, which makes precise diagnosis difficult (4). In another study (5), emphysema was found to increase the risk of lung cancer 3- to 4-fold among heavy smokers. Nodule size also influences cancer diagnosis and treatment (6). Hence, semantic features can be used in creating a predictor of lung cancer. Using CT scans, quantitative information from a lung nodule can be generated and analyzed using statistics, machine learning, or high-dimensional data analysis. This approach is termed radiomics (7).
These quantitative features can be categorized into the following groups: texture (eg, Law's texture features, wavelet features), size (eg, longest diameter, volume), and location (eg, attachment to the pleural wall, distance from the boundary). These traditional quantitative features can be used to create a biomarker for tumor prognosis, analysis, and prediction (8–10). Deep learning is an emerging approach mainly applied to recognition-, prediction-, and classification-related tasks. Propagating data through multiple hidden layers eventually lets a neural network learn and build a representation of the data, which can then be used for prediction or classification. For image data, a convolutional neural network (CNN) typically uses several convolutional kernels to extract different textures and edges before propagating the extracted information through multiple hidden layers. CNNs have been used effectively for lung nodule analysis in recent years (11). In the medical imaging field, data are currently scarce, so transfer learning has been used as an alternative to building a new model (12). After learning, the convolution layers of a CNN contain representations of edge gradients and textures, and when these are propagated through fully connected layers, various high-level features are posited to have been learned by the network. From the fully connected layers, deep features (the outputs of units in the layer) are extracted and denoted by the number of the feature from the learning tool (the position of a neuron in a hidden-layer row vector). Two pretrained CNNs were used in the work described in this paper for extracting deep features: the Vgg-S network (13), which was trained on the ImageNet data set (14) of color camera images, and our designed CNN (15), which was trained on lung nodule images.
There were 23 traditional quantitative features [RIDER subset features (16)] used in this study, along with 20 semantic features generated by an experienced radiologist from Tianjin Medical University Cancer Institute and Hospital, China. This study is an extension of our previous study (17), which analyzed the similarity between deep features and semantic features. In the current study, we also focused on traditional quantitative features, that is, we analyzed the similarity of deep feature(s) to traditional quantitative features. The analysis was conducted by replacing a semantic or traditional quantitative feature with ≥1 correlated deep features; the goal was to show that equivalent classification performance could be achieved. That would mean those deep features contained information similar to that of the semantic or traditional quantitative features, and we can equate those deep features with the name of the corresponding semantic or traditional quantitative feature. We found that location-based semantic features are difficult to replace, but size-, shape-, and texture-based semantic features can be replaced by deep feature(s). Similarly, shape and texture quantitative features can be used to explain deep feature(s). By "explain," we mean the features can replace deep features and a classifier will achieve the same accuracy. We successfully explained 26 deep features (out of 4096) from the Vgg-S network and 12 deep features from our trained CNN by semantic and traditional quantitative features. This provides a semantic meaning for those deep features.

Methodology

Data Set

A subset of cases from the LDCT arm of the NLST (National Lung Screening Trial) data set was chosen for this study. The NLST study was conducted over 3 years: 1 baseline scan (T0) and 2 follow-up scans (T1 and T2) in the 2 subsequent years, with an interval of ∼1 year between scans (18). For this study, a subset of nodule-positive cases and screen-detected lung cancer (SDLC) cases (diagnosed in later years) from the baseline (T0) scans was chosen, and the patient data were deidentified under an IRB-approved process. These cases were further divided into 2 categories: cohort 1 and cohort 2. Cohort 1 consisted of cases with a baseline scan (T0) and a follow-up scan after 1 year (T1), wherein some of the nodules became cancerous, whereas cohort 2 consisted of nodules that became cancerous 2 years after the baseline scan (at the T2 scan). Selection of the cohorts is shown in Figure 1. Only cohort 2 (SDLC, 85; positive control cases, 152) was chosen for our study. Between the SDLC and positive control cases, there is no statistically significant difference with respect to sex, race, age, ethnicity, and smoking (19). Nodule segmentation was performed using the Definiens software suite (20). From our initial set of cases, 52 cases were excluded owing to ≥1 of the following reasons: multiple malignant nodules, inability to identify the nodule, or unknown location of the tumor. Finally, 185 cases (SDLC, 58; positive control cases, 127) were selected for our study.
Figure 1.

Selection of cohort 1 and cohort 2.


Semantic Features

Semantic features are described by a radiologist from the CT scan of a lung tumor and can be used further for diagnosis. A radiologist (Y.L.) with 7 years of experience from Tianjin Medical University Cancer Institute and Hospital, China, described 20 semantic features (21–24) on a subset of cases that intersected cohort 2. Semantic features can be categorized into the following groups: shape, size, location, margin, attenuation, external, and associated findings. These features have been derived with respect to lung nodules by our group. Table 1 shows a detailed description of our semantic features.
Table 1.

Description of Semantic Features

Characteristic | Definition | Scoring
Location
1. Lobe Location | Lobe location of the nodule | Left lower lobe (5), left upper lobe (4), right lower lobe (3), right middle lobe (2), right upper lobe (1)
Size
2. Long-Axis Diameter | Longest diameter of the nodule | NA
3. Short-Axis Diameter | Longest perpendicular diameter of the nodule in the same section | NA
Shape
4. Contour | Roundness of the nodule | 1, round; 2, oval; 3, irregular
5. Lobulation | Wavy surface of the nodule | 1, none; 2, yes
6. Concavity | Concave cut on the nodule surface | 1, none; 2, slight concavity; 3, deep concavity
Margin
7. Border Definition | Edge appearance of the nodule | 1, well defined; 2, slightly poorly defined; 3, poorly defined
8. Spiculation | Lines radiating from the margins of the tumor | 1, none; 2, yes
Attenuation
9. Texture | Solid, non-solid, or part-solid appearance | 1, non-solid; 2, part solid; 3, solid
10. Cavitation | Presence of air in the tumor at the time of diagnosis | 0, no; 1, yes
External
11. Fissure Attachment | Nodule attaches to the fissure | 0, no; 1, yes
12. Pleural Attachment | Nodule attaches to the pleura | 0, no; 1, yes
13. Vascular Convergence | Convergence of vessels to the nodule | 0, no significant convergence; 1, significant convergence
14. Pleural Retraction | Retraction of the pleura toward the nodule | 0, absent; 1, present
15. Peripheral Emphysema | Peripheral emphysema caused by the nodule | 1, absent; 2, slightly present; 3, severely present
16. Peripheral Fibrosis | Peripheral fibrosis caused by the nodule | 1, absent; 2, slightly present; 3, severely present
17. Vessel Attachment | Nodule attaches to a blood vessel | 0, no; 1, yes
Associated Findings
18. Nodules in Primary Lobe | Any nodules suspected to be malignant or intermediate | 0, no; 1, yes
19. Nodules in Nonprimary Lobe | Any nodules suspected to be malignant or intermediate | 0, no; 1, yes
20. Lymphadenopathy | Lymph nodes with short-axis diameter greater than 1 cm | 0, no; 1, yes

Traditional Quantitative Features

Definiens software (20), with guidance from a radiologist, was used to segment the lung nodules. Then, the 23 RIDER stable features (16) were extracted using the same software. Table 2 shows a detailed description of these "traditional" quantitative features.
Table 2.

Description of Rider Stable Traditional Quantitative Features

Characteristic | Features
Size
    1. Long-axis diameter
    2. Short-axis diameter
    3. Long-axis diameter × short-axis diameter
    4. Volume (cm)
    5. Volume (pixel)
    6. Number of pixels
    7. Length/width
Pixel Intensity Histogram
    8. Mean (HU)
    9. Standard deviation (HU)
Tumor Location
    10. 8a_3D_Is attached to pleural wall
    11. 8b_3D_Relative border to lung
    12. 8c_3D_Relative border to pleural wall
    13. 9e_3D_Standard deviation_COG to border
    14. 9g_3D_Max_Dist_COG to border
Tumor Shape (Roundness)
    15. 9b-3D circularity
    16. 5a_3D-MacSpic
    17. Asymmetry
    18. Roundness
Run-Length and Co-occurrence
    19. Avg_RLN
Law's Texture Features
    20. E5 E5 L5 layer 1
    21. E5 E5 R5 layer 1
    22. E5 W5 L5 layer 1
    23. L5 W5 L5 layer 1

Deep Features from Vgg-S Network

Nowadays, CNNs are used effectively for image classification and prediction (11, 13). A CNN has many layers of convolution kernels along with multiple hidden layers, which make the network architecture deeper; features extracted from such a network are called "deep features." In the medical imaging field, there is typically not enough original data available to train a CNN; as a result, transfer learning (12) is an alternative option. Applying previously learned knowledge from 1 domain to a new task domain is called transfer learning. To extract deep features from a CT scan, the 2-dimensional slice with the largest nodule area was chosen for every case. We extracted only the nodule region by cropping the rectangular box enclosing the nodule. Bicubic interpolation was used to resize the nodule images to 224 × 224, the required input size of the Vgg-S network. Figure 2 shows a lung image with a nodule and the extracted nodule region. The Vgg-S network was trained on natural camera images with 3 channels (R, G, B), but the nodule images were grayscale (no color component; the CT voxel intensities were converted to 0-255). So, the same grayscale nodule image was used 3 times to mimic an image with 3 color channels, and normalization was then performed using the appropriate color channel. The deep features were generated from the last fully connected layer after applying the ReLU activation function. The size of the feature vector was 4096.
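The channel replication and ReLU truncation described above can be sketched in a few lines. This is a minimal illustration with hypothetical toy values, not the authors' code; the bicubic resize to 224 × 224 is assumed to be handled by any standard imaging library.

```python
def to_three_channel(gray):
    """Replicate a 2-D grayscale image (rows of 0-255 values) into
    three identical channels, mimicking an R, G, B input."""
    return [gray, gray, gray]  # channel-first: all three channels equal

def relu(activations):
    """ReLU as applied to the fully connected layer outputs:
    negative activations are truncated to zero."""
    return [max(0.0, a) for a in activations]

# hypothetical 2 x 2 nodule crop, already rescaled to 0-255
crop = [[0, 128], [255, 64]]
rgb = to_three_channel(crop)     # 3 identical channels
deep = relu([-1.2, 0.0, 3.4])    # -> [0.0, 0.0, 3.4]
```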
Figure 2.

(Left) Lung image with the nodule outlined in blue (nodule pixel size = 0.74 mm) and the box used for nodule extraction in red; (right) extracted nodule.


Deep Features from Our Trained CNN

We also experimented with extracting deep features from our designed CNN (15). Augmented nodule images from cohort 1 were used to train our CNN architecture. Each nodule image was first augmented by being flipped horizontally and vertically, and then all images were rotated by 15°. Keras (25) with a TensorFlow (26) backend was used to train our CNN. We used the same 2-dimensional slice from a nodule for training the CNN as for transfer learning with the Vgg-S network. The input image size for the CNN architecture was 100 × 100 pixels. The augmented data set was divided into 2 parts: 70% of the data for training and the remaining 30% for validation. The CNN was trained for 100 epochs with a learning rate of 0.0001, RMSprop (27) optimization, and binary cross-entropy as the loss function. A batch size of 16 was chosen for training and validation. L2 regularization (28) along with dropout (29) was used to reduce overfitting of our small and shallow CNN. Our designed CNN is described in detail in Table 3. The deep features were extracted from the last layer before the classification layer; the size of the feature vector was 1024. After applying the ReLU activation function, some features will be all zeros because ReLU truncates negative feature values to zero. We removed such features, and as a result, the final numbers of features from the Vgg-S pretrained CNN and our trained CNN became 3844 and 560, respectively.
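The zero-feature removal described above (dropping post-ReLU feature columns that are zero for every case, which reduced the vectors to 3844 and 560 features) can be sketched as follows; the matrix and values here are hypothetical.

```python
def drop_zero_features(matrix):
    """Drop feature columns that are zero for every case.

    `matrix` is cases x features (post-ReLU deep features); an
    all-zero column carries no information for any classifier.
    Returns the filtered matrix and the indices of the kept columns.
    """
    keep = [j for j in range(len(matrix[0]))
            if any(row[j] != 0 for row in matrix)]
    return [[row[j] for j in keep] for row in matrix], keep

# hypothetical 3-case, 4-feature matrix; columns 1 and 3 are all zero
X = [[0.5, 0.0, 1.2, 0.0],
     [0.0, 0.0, 0.7, 0.0],
     [2.1, 0.0, 0.0, 0.0]]
filtered, kept = drop_zero_features(X)   # kept == [0, 2]
```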
Table 3.

Our Designed CNN architecture

Layers | Parameter
Left branch
    Input Image | 100 × 100
    Max Pool 1 | 10 × 10
    Dropout | 0.1
Right branch
    Input Image | 100 × 100
    Conv 1 | 64 × 5 × 5, pad 0, stride 1
    Leaky ReLU | alpha = 0.01
    Max Pool 2a | 3 × 3, pad 0, stride 3
    Conv 2 | 64 × 2 × 2, pad 0, stride 1
    Leaky ReLU | alpha = 0.01
    Max Pool 2b | 3 × 3, pad 0, stride 3
    Dropout | 0.1
Concatenate Left Branch + Right Branch
    Conv 3 + ReLU | 64 × 2 × 2, pad 0, stride 1
    Max Pool 3 | 2 × 2, pad 0, stride 2
    L2 regularizer | 0.01
    Dropout | 0.1
    Fully Connected 1 | 1, sigmoid
Total parameters | 39,553

Experiments and Results

This section describes the procedure for representing deep feature(s) using semantic or traditional quantitative features. Wrapper feature selection (30) was applied to the traditional quantitative or semantic features of cohort 2 to select the best subset of features with maximum accuracy. Backward feature selection using the best-first strategy and a random forests classifier (31) with 200 trees was applied within the wrapper approach. Tenfold cross-validation was used for selecting the best subset of features. We analyzed quantitative features and semantic features separately. A subset of 9 quantitative features was chosen, enabling a maximum accuracy of 84.32% (AUC 0.87), whereas a subset of 13 semantic features was selected, enabling a maximum accuracy of 83.78% (AUC 0.84). Here, we aim to use semantic features or traditional quantitative features to interpret/explain deep feature(s).
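The wrapper selection above can be approximated by a simple greedy backward pass. The sketch below is a simplification (the study used best-first search inside the wrapper, with a 200-tree random forests classifier and 10-fold cross-validation); `score` is a hypothetical callback assumed to return the cross-validated accuracy of a feature subset.

```python
def backward_wrapper_selection(features, score):
    """Greedy backward wrapper selection: repeatedly drop the single
    feature whose removal yields the best score, tracking the best
    subset seen overall.

    `features`: list of feature names.
    `score(subset)`: assumed to train/evaluate the classifier and
    return its cross-validated accuracy for that subset.
    """
    current = list(features)
    best_subset, best_score = list(current), score(current)
    while len(current) > 1:
        # try removing each remaining feature; keep the best reduction
        current = max(([f for f in current if f != drop]
                       for drop in current), key=score)
        s = score(current)
        if s >= best_score:
            best_subset, best_score = list(current), s
    return best_subset, best_score

# toy score: reward features in a hypothetical "useful" set, penalize size
useful = {'a', 'c'}
def score(sub):
    return len(set(sub) & useful) - 0.1 * len(sub)

best, s = backward_wrapper_selection(['a', 'b', 'c'], score)  # best == ['a', 'c']
```

With the real classifier plugged in as `score`, the analogous wrapper procedure in the study selected 9 quantitative features (84.32% accuracy) and 13 semantic features (83.78%).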

Explaining Deep Features With Respect to Semantic Features

The 13 chosen semantic features were lobe location, long-axis diameter, short-axis diameter, lobulation, concavity, border definition, spiculation, texture, cavitation, vascular convergence, vessel attachment, peripheral fibrosis, and nodules in primary tumor lobe. After selecting the best subset of semantic features, the Pearson correlation coefficient was calculated between each semantic feature and the deep features, and the 5 most correlated deep features for each semantic feature were selected. We then replaced each semantic feature with its correlated deep feature(s) and checked whether the same classification accuracy of 83.78% could be achieved; the purpose was to determine whether semantic features could explain deep features. To do this, we replaced 1 semantic feature at a time from the subset of 13 features: first by its single most correlated deep feature, then by its 2 most correlated deep features, and so on, until the 5 most correlated deep features had been used as replacements. The accuracy was calculated using a random forests classifier with 200 trees and 10-fold cross-validation. Deep features from the Vgg-S pretrained CNN and our trained CNN were examined separately. Figure 3 shows the approach taken for the analysis.
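The correlation step above can be sketched as a plain Pearson correlation plus a top-k ranking by absolute correlation. The function names and toy vectors are illustrative, not from the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def top_correlated(semantic, deep_columns, k=5):
    """Indices of the k deep-feature columns most correlated (by
    absolute value) with a single semantic feature."""
    ranked = sorted(range(len(deep_columns)),
                    key=lambda j: abs(pearson(semantic, deep_columns[j])),
                    reverse=True)
    return ranked[:k]
```

For example, with a toy semantic vector `[1, 2, 3, 4]` and deep-feature columns `[[2, 4, 6, 8], [4, 3, 2, 1], [1, 1, 2, 2]]`, the first two columns (perfectly positively and negatively correlated) rank above the third.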
Figure 3.

Overview of the approach taken in this study.

After replacing a feature with deep features extracted from the Vgg-S pretrained CNN, we obtained the same original classification accuracy of 83.78% for the following 8 semantic features: long-axis diameter, lobulation, concavity, spiculation, texture, cavitation, vascular convergence, and peripheral fibrosis. Using the deep features acquired from our trained CNN, we achieved the same original classification accuracy of 83.78% for the following 4 semantic features: long-axis diameter, concavity, cavitation, and nodules in primary tumor lobe. We found that 3 semantic features (long-axis diameter, concavity, cavitation) could be used to explain deep features from both Vgg-S and our trained CNN. Five semantic features could be used to explain only deep features from Vgg-S, and only 1 semantic feature could be used to explain deep features from our trained CNN alone. The Vgg-S network was trained on camera images from at least 1000 classes of objects, but not on lung nodule images. The large training set helped the network develop general features, which in turn were explained by texture, spiculation, lobulation, vascular convergence, and peripheral fibrosis. The replacement of the first 3 and the last of these features appears to result from training on a large number of images of different types. Table 4 shows the classification performance after removing 1 semantic feature at a time from the subset of 13 features; that is, we calculated the classification performance of the remaining 12 features with a random forests classifier and 10-fold cross-validation to check whether removing each feature changed the classification accuracy. In Table 4, we show only the semantic features from the chosen 13-feature subset that could be used to explain deep feature(s). Table 5 shows the explainable deep features and their equivalent semantic feature(s).
We also show the correlation value of each deep feature with a semantic feature in Table 5.
Table 4.

Classification Performance After Feature Removal

Features | Feature Names | Accuracy, % (AUC)
Semantic Features
    Long-axis diameter | 82.70 (0.82)
    Lobulation | 82.70 (0.83)
    Concavity | 83.24 (0.83)
    Spiculation | 83.24 (0.83)
    Texture | 82.70 (0.83)
    Cavitation | 82.70 (0.83)
    Vascular convergence | 83.24 (0.84)
    Peripheral fibrosis | 82.70 (0.83)
    Nodules in primary lobe | 81.62 (0.83)
Traditional Quantitative Features
    9b-3D circularity | 82.16 (0.86)
    Roundness | 82.70 (0.87)
    L5W5L5 layer 1 | 82.70 (0.87)

These features were from our chosen subset of features, leaving 12 features for training/testing.

Table 5.

Semantic and Traditional Quantitative Features and Corresponding Deep Feature(s)

Features | Feature Names | Deep Features From Vgg-S (Correlation) | Deep Features From Our Trained CNN (Correlation)
Semantic Features
    Long-axis diameter | 3353 (0.4334), 2135 (0.42) | 230 (0.3055)
    Lobulation | 3534 (0.5742), 1372 (0.5614), 2975 (0.5611), 2111 (0.5520) | NA
    Concavity | 3534 (0.5), 2975 (0.4839), 1372 (0.4837), 2111 (0.475), 3246 (0.4612) | 547 (0.1776), 440 (0.1514)
    Spiculation | 2811 (0.4111) | NA
    Texture | 1201 (−0.3119), 3350 (0.2936) | NA
    Cavitation | 3353 (0.3888), 526 (0.3551) | 395 (0.2748)
    Vascular convergence | 1464 (0.7052), 2115 (0.701) | NA
    Peripheral fibrosis | 3305 (0.2076), 3064 (0.2043) | NA
    Nodules in primary lobe | NA | 42557 (0.1871, 0.1836)
Traditional Quantitative Features
    Roundness | 1395 (0.3), 2510 (0.27) | 160 (0.16), 20 (0.13)
    9b-3D circularity | 1395 (0.24), 1757 (−0.234), 3401 (−0.2069), 2777 (−0.2069) | 160 (0.14), 20 (0.13)
    L5W5L5 layer 1 | 5166163476928547169265309 (Vgg-S correlations: 0.77, 0.75, 0.69, 0.69, 0.69; CNN correlations: 0.28, 0.27, 0.26, 0.26)
After replacing semantic features with deep feature(s), similar classification performance was obtained for 9 semantic features. For example, 2 deep features (3353 and 526) from the Vgg-S network achieved the same classification performance of 83.78% when used in place of cavitation. The deep features 3353 and 526 had correlations of 0.3888 and 0.3551, respectively, with the semantic feature cavitation, and the deep feature 395 from our trained CNN, which had a correlation coefficient of 0.2748, was also explained by cavitation. Similarly, 2 deep features (3353 and 2135) from the Vgg-S network and 1 deep feature (230) from our trained CNN were explained by long-axis diameter, providing equivalent performance.

Explaining Deep Features Using Traditional Quantitative Features

The 9 traditional quantitative features that enabled the best accuracy were Mean (HU), 8a-3D_is attached to pleural wall, 8c-3D_Relative border to pleural wall, 9b-3D circularity, Asymmetry, Roundness, Volume, E5W5L5 layer 1, and L5W5L5 layer 1. The Pearson correlation coefficient was calculated between each traditional quantitative feature and the deep features, and the top 5 correlated deep features were selected to replace each traditional quantitative feature. We replaced each traditional quantitative feature by ≥1 deep features to try to achieve the same classification accuracy of 84.32%. Using deep features extracted from the Vgg-S pretrained CNN as replacements, we obtained the same original classification accuracy of 84.32% for the following 3 traditional quantitative features: 9b-3D circularity, roundness, and L5W5L5 layer 1. Hence, these features can be used to explain what the deep features that replaced them have learned. Traditional quantitative features capture tumor size, tumor shape, Law's texture features, tumor location, etc. As we saw for semantic features, deep features could be explained by shape-based quantitative features. In Table 4, we show only the 3 quantitative features that could be replaced by (and hence used to explain) deep feature(s). Table 5 shows the quantitative features, their equivalent deep feature(s), and the correlations.

Discussion

We showed that some deep features can be explained by a semantic feature or a traditional quantitative feature. From a lung nodule CT image, experienced radiologists generated semantic features describing different types of information about the nodule, for example, size, shape, location, the boundary of the nodule, attachment to a vessel, and the presence of fibrosis. These features were shown to provide useful information toward the prognosis and diagnosis of lung cancer. From a tumor phenotype, quantitative information can be extracted using various data characterization approaches, and these features are called traditional quantitative features. Deep features are extracted from a CNN, generally from the last layer before the final classification layer. For this study, deep features were extracted from the last fully connected layer of the following 2 pretrained CNNs: the Vgg-S network, which was trained on the ImageNet data set, and our designed CNN, which was trained on LDCT lung nodule images. The Vgg-S architecture is a network with 5 convolution layers followed by 3 fully connected layers. Our designed CNN is a small and shallow network with 3 convolution layers and 1 fully connected layer. Because the Vgg-S network was trained on a large set of classes of camera images, various textures and other features could be extracted that can be used effectively for tumor classification. Our CNN was trained with LDCT lung nodule images and gave us better performance than transfer learning in our previous study (15). In this study, we attempted to explain deep features using semantic or traditional quantitative features. A subset of features was chosen from the semantic or traditional quantitative features using a wrapper with a random forests classifier.
For the semantic features, the best subset had 13 features with an accuracy of 83.78% (AUC 0.84), whereas for the traditional quantitative features, the best subset had 9 features with an accuracy of 84.32% (AUC 0.87). The Pearson correlation coefficient was calculated between each of the chosen semantic or traditional quantitative features and the deep features, and for every semantic or traditional quantitative feature, the top 5 most correlated deep features were chosen. Then, from our chosen subset of semantic or traditional quantitative features, 1 feature was removed, it was substituted by its most correlated deep feature, and the classification performance was calculated. If the original classification performance was achieved with a single substituted deep feature, we stopped; otherwise, the semantic or traditional quantitative feature was substituted by its 2 most correlated deep features, and this process continued until the 5 most correlated deep features had been used. In total, 26 deep features from the Vgg-S network and 12 deep features from our trained CNN were explained by 9 semantic features and 3 traditional quantitative features. From this, we hypothesized that those deep features can have a recognizable definition in terms of semantic or quantitative features; that is, they can be given some meaningful definition. We also trained our CNN on cohort 2 (all 237 cases) and then extracted deep features for only the subset of 185 cases for which semantic features were available. The deep feature vector size was 1024; after removing all-zero features, 699 features remained from cohort 2. We then used these deep features to represent semantic and quantitative features. We found that, in addition to the semantic features previously found useful for explaining deep features from our CNN trained on cohort 1 (shown in Table 5), some additional semantic features could be used to explain deep features.
Lobulation, spiculation, vascular convergence, peripheral fibrosis, and border definition could explain features from our new deep feature set (CNN trained on cohort 2 data only). Among these semantic features, border definition explained 4 deep features (147, 160, 504, and 372), although it could not explain any deep features from Vgg-S or from our CNN trained on cohort 1. For this study, we extracted only the nodule region from a CT slice. Because only the nodule region was extracted, information regarding pleural wall attachment, fissure attachment, relative border to the lung, or distance was lost. Nevertheless, deep features from our trained CNN were explained by 1 location-based semantic feature (nodules in primary lobe). For training the CNN, we performed data augmentation by rotation and flipping, which enabled the extracted deep features to achieve comparable accuracy. The deep features capture boundary and shape information quite well, because that information could be obtained from the extracted nodule region; thus, 2 traditional quantitative features (9b-3D circularity and roundness) and 3 semantic features (lobulation, concavity, and spiculation) were able to explain deep features. Deep features are known to capture texture-based information as well; as a result, the L5W5L5 Law's texture feature and cavitation were useful for explaining deep features. We also found that deep features 3353, 3534, 1372, 2975, and 2111 from the Vgg-S network were correlated with, and explained by, >1 semantic feature, and that feature 1395 was correlated with and explained by 2 traditional quantitative features (roundness and 9b-3D circularity). Deep features 160 and 20 from our trained CNN were explained by the same 2 traditional quantitative features. In this work, the 5 most correlated deep features were used to replace a semantic or radiomics feature; our only requirement was a nonzero correlation.
With this many comparisons, some correlations will inevitably be spurious. Hence, the Bonferroni correction was used to assess the significance of the correlations between deep features and each semantic (or radiomics) feature. As an example, cavitation could be replaced by 2 deep features from the Vgg-S network: feature 1 (3353) had an original P value of 4.8651e-08, and feature 2 (526) had an original P value of 7.0822e-07. After the Bonferroni correction, the P value of feature 1 was 9.73e-08 and that of feature 2 was 1.4164e-06; both corrected P values remained below the more rigorous significance level. When combined, the 2 features added more information to our model and hence appear to be associated with cavitation. After applying the Bonferroni correction, we found that some of the features with the 5 highest correlation values did not have a significant correlation with a semantic or radiomics feature. Nonetheless, these weakly correlated features were able to explain some CNN features. We interpret this to mean that insignificant, but nonzero, correlations taken together can provide insight into (some) deep features. In total, 26 deep features from the Vgg-S network and 12 deep features from our trained CNN were explained by 9 semantic features and 3 traditional quantitative features.
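The corrected P values quoted above are consistent with multiplying each raw P value by the number of comparisons in the pair (m = 2); a minimal sketch of the correction, under that assumption:

```python
def bonferroni(p_values):
    """Bonferroni correction: multiply each raw P value by the number
    of comparisons, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# raw P values for the 2 Vgg-S features that replaced cavitation
raw = [4.8651e-08, 7.0822e-07]
corrected = bonferroni(raw)  # approximately [9.73e-08, 1.4164e-06]
```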

Conclusions

The recent success of CNNs on various classification tasks raises the question of what they have learned. Here, deep features are explained with respect to semantic features and traditional quantitative features. In this study, we found explanations for 26 of the 4096 deep features from the Vgg-S network and for 12 deep features from our trained CNN in terms of semantic and traditional quantitative features. One can also view this as providing semantic information about deep features. Although there has been some research (32–39) on the semantic understanding of natural scenes using deep CNN features, to our knowledge, this is the first work to explain deep features with respect to traditional quantitative features and semantic features extracted from a lung nodule. In the future, deep features with semantic meaning can be included in biomarkers for tumor prognosis and diagnosis of lung nodules from CT scans, along with semantic features and traditional quantitative features. Our study had 2 limitations. First, only 10-fold cross-validation was used to evaluate performance, because we had a limited set of expensive-to-obtain semantic annotations. Second, a single slice per patient was used to extract deep features, whereas semantic information was generated from multiple slices. In the future, with more semantically annotated data, we will investigate deep features from a 3D CNN.
References (16 in total)

1.  Guidelines for Management of Incidental Pulmonary Nodules Detected on CT Images: From the Fleischner Society 2017.

Authors:  Heber MacMahon; David P Naidich; Jin Mo Goo; Kyung Soo Lee; Ann N C Leung; John R Mayo; Atul C Mehta; Yoshiharu Ohno; Charles A Powell; Mathias Prokop; Geoffrey D Rubin; Cornelia M Schaefer-Prokop; William D Travis; Paul E Van Schil; Alexander A Bankier
Journal:  Radiology       Date:  2017-02-23       Impact factor: 11.105

2.  Reduced lung-cancer mortality with low-dose computed tomographic screening.

Authors:  Denise R Aberle; Amanda M Adams; Christine D Berg; William C Black; Jonathan D Clapp; Richard M Fagerstrom; Ilana F Gareen; Constantine Gatsonis; Pamela M Marcus; JoRean D Sicks
Journal:  N Engl J Med       Date:  2011-06-29       Impact factor: 91.245

3.  Effect of emphysema on lung cancer risk in smokers: a computed tomography-based assessment.

Authors:  Yan Li; Stephen J Swensen; Leman Günbey Karabekmez; Randolph S Marks; Shawn M Stoddard; Ruoxiang Jiang; Joel B Worra; Fang Zhang; David E Midthun; Mariza de Andrade; Yong Song; Ping Yang
Journal:  Cancer Prev Res (Phila)       Date:  2010-11-30

4.  Cancer statistics, 2015.

Authors:  Rebecca L Siegel; Kimberly D Miller; Ahmedin Jemal
Journal:  CA Cancer J Clin       Date:  2015-01-05       Impact factor: 508.702

5.  Radiological Image Traits Predictive of Cancer Status in Pulmonary Nodules.

Authors:  Ying Liu; Yoganand Balagurunathan; Thomas Atwater; Sanja Antic; Qian Li; Ronald C Walker; Gary T Smith; Pierre P Massion; Matthew B Schabath; Robert J Gillies
Journal:  Clin Cancer Res       Date:  2016-09-23       Impact factor: 12.531

6.  Test-retest reproducibility analysis of lung CT image features.

Authors:  Yoganand Balagurunathan; Virendra Kumar; Yuhua Gu; Jongphil Kim; Hua Wang; Ying Liu; Dmitry B Goldgof; Lawrence O Hall; Rene Korn; Binsheng Zhao; Lawrence H Schwartz; Satrajit Basu; Steven Eschrich; Robert A Gatenby; Robert J Gillies
Journal:  J Digit Imaging       Date:  2014-12       Impact factor: 4.056

7.  CT Features Associated with Epidermal Growth Factor Receptor Mutation Status in Patients with Lung Adenocarcinoma.

Authors:  Ying Liu; Jongphil Kim; Fangyuan Qu; Shichang Liu; Hua Wang; Yoganand Balagurunathan; Zhaoxiang Ye; Robert J Gillies
Journal:  Radiology       Date:  2016-03-03       Impact factor: 11.105

8.  Lung cancer survival and stage at diagnosis in Australia, Canada, Denmark, Norway, Sweden and the UK: a population-based study, 2004-2007.

Authors:  Sarah Walters; Camille Maringe; Michel P Coleman; Michael D Peake; John Butler; Nicholas Young; Stefan Bergström; Louise Hanna; Erik Jakobsen; Karl Kölbeck; Stein Sundstrøm; Gerda Engholm; Anna Gavin; Marianne L Gjerstorff; Juanita Hatcher; Tom Børge Johannesen; Karen M Linklater; Colleen E McGahan; John Steward; Elizabeth Tracey; Donna Turner; Michael A Richards; Bernard Rachet
Journal:  Thorax       Date:  2013-02-11       Impact factor: 9.139

9.  Differences in Patient Outcomes of Prevalence, Interval, and Screen-Detected Lung Cancers in the CT Arm of the National Lung Screening Trial.

Authors:  Matthew B Schabath; Pierre P Massion; Zachary J Thompson; Steven A Eschrich; Yoganand Balagurunathan; Dmitry Goldgof; Denise R Aberle; Robert J Gillies
Journal:  PLoS One       Date:  2016-08-10       Impact factor: 3.240

10.  Radiomics: Images Are More than Pictures, They Are Data.

Authors:  Robert J Gillies; Paul E Kinahan; Hedvig Hricak
Journal:  Radiology       Date:  2015-11-18       Impact factor: 11.105

Cited By (7 in total)

Review 1.  Understanding artificial intelligence based radiology studies: What is overfitting?

Authors:  Simukayi Mutasa; Shawn Sun; Richard Ha
Journal:  Clin Imaging       Date:  2020-04-23       Impact factor: 1.605

Review 2.  Systematic review: radiomics for the diagnosis and prognosis of hepatocellular carcinoma.

Authors:  Emily Harding-Theobald; Jeremy Louissaint; Bharat Maraj; Edward Cuaresma; Whitney Townsend; Mishal Mendiratta-Lala; Amit G Singal; Grace L Su; Anna S Lok; Neehar D Parikh
Journal:  Aliment Pharmacol Ther       Date:  2021-08-12       Impact factor: 9.524

Review 3.  Radiomics in immuno-oncology.

Authors:  Z Bodalal; I Wamelink; S Trebeschi; R G H Beets-Tan
Journal:  Immunooncol Technol       Date:  2021-04-16

4.  Prognostic Value of Transfer Learning Based Features in Resectable Pancreatic Ductal Adenocarcinoma.

Authors:  Yucheng Zhang; Edrise M Lobo-Mueller; Paul Karanicolas; Steven Gallinger; Masoom A Haider; Farzad Khalvati
Journal:  Front Artif Intell       Date:  2020-10-05

5.  Multisite Technical and Clinical Performance Evaluation of Quantitative Imaging Biomarkers from 3D FDG PET Segmentations of Head and Neck Cancer Images.

Authors:  Brian J Smith; John M Buatti; Christian Bauer; Ethan J Ulrich; Payam Ahmadvand; Mikalai M Budzevich; Robert J Gillies; Dmitry Goldgof; Milan Grkovski; Ghassan Hamarneh; Paul E Kinahan; John P Muzi; Mark Muzi; Charles M Laymon; James M Mountz; Sadek Nehmeh; Matthew J Oborski; Binsheng Zhao; John J Sunderland; Reinhard R Beichel
Journal:  Tomography       Date:  2020-06

6.  Deep Feature Stability Analysis Using CT Images of a Physical Phantom Across Scanner Manufacturers, Cartridges, Pixel Sizes, and Slice Thickness.

Authors:  Rahul Paul; Mohammed Shafiq-Ul Hassan; Eduardo G Moros; Robert J Gillies; Lawrence O Hall; Dmitry B Goldgof
Journal:  Tomography       Date:  2020-06

7.  Quantitative Imaging Enters the Clinical Arena: A Personal Viewpoint.

Authors:  Robert J Nordstrom
Journal:  Tomography       Date:  2020-06
