Rahul Paul, Samuel H Hawkins, Yoganand Balagurunathan, Matthew B Schabath, Robert J Gillies, Lawrence O Hall, Dmitry B Goldgof.
Abstract
Lung cancer is the most common cause of cancer-related deaths in the USA. It can be detected and diagnosed using computed tomography images. For an automated classifier, identifying predictive features from medical images is a key concern. Deep feature extraction using pretrained convolutional neural networks (CNNs) has recently been applied successfully in some image domains. Here, we applied a pretrained CNN to extract deep features from 40 contrast-enhanced computed tomography images of non-small cell adenocarcinoma lung cancer, combined the deep features with traditional image features, and trained classifiers to predict short- and long-term survivors. We experimented with several pretrained CNNs and several feature selection strategies. The best previously reported accuracy when using traditional quantitative features was 77.5% (area under the curve [AUC], 0.712), which was achieved by a decision tree classifier. The best reported accuracy from transfer learning and deep features was 77.5% (AUC, 0.713) using a decision tree classifier. When extracted deep neural network features were combined with traditional quantitative features, we obtained an accuracy of 90% (AUC, 0.935) with the 5 best post-rectified linear unit features extracted from a vgg-f pretrained CNN and the 5 best traditional features. The best results were achieved with the symmetric uncertainty feature ranking algorithm followed by a random forest classifier.
Keywords: adenocarcinoma; computed tomography; deep features; deep neural network; lung cancer; pre-trained CNN; symmetric uncertainty; transfer learning
Year: 2016 PMID: 28066809 PMCID: PMC5218828 DOI: 10.18383/j.tom.2016.00211
Source DB: PubMed Journal: Tomography ISSN: 2379-1381
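The abstract names symmetric uncertainty as the feature ranking algorithm behind the best results. The sketch below illustrates the standard definition, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and a top-k ranking built on it. It assumes the features have already been discretized (the binning scheme is not given here), and the toy data, feature names, and the helper `rank_features` are hypothetical; it is not the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); ranges from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy
    return 2.0 * mutual_info / (hx + hy)


def rank_features(columns, labels, k):
    """Hypothetical helper: names of the k features with the highest SU."""
    scores = {name: symmetric_uncertainty(col, labels)
              for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up, already-discretized toy data: 0 = short survival, 1 = long survival.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "f_perfect": [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label
    "f_noisy":   [0, 0, 0, 1, 1, 1, 1, 0],  # agrees on 6 of 8 cases
    "f_useless": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
}
top2 = rank_features(toy, labels, k=2)
print(top2)
```

In the setting the abstract describes, a ranking of this kind would be run separately on the deep features and on the traditional features, keeping the 5 best of each before training a classifier.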
vgg-F Architecture
| Layer | Configuration |
|---|---|
| Conv 1 | 64 × 11 × 11 st. 4, pad 0 |
| Conv 2 | 256 × 5 × 5 st. 1, pad 2 |
| Conv 3 | 256 × 3 × 3 st. 1, pad 1 |
| Conv 4 | 256 × 3 × 3 st. 1, pad 1 |
| Conv 5 | 256 × 3 × 3 st. 1, pad 1 |
| Full 6 | 4096 dropout |
| Full 7 | 4096 dropout |
| Full 8 | 1000 softmax |
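Each convolutional row above reads as number of filters × kernel height × kernel width, followed by the stride (st.) and zero padding (pad). As a rough illustration of what those numbers imply, the sketch below applies the usual convolution arithmetic to Conv 1. The 224 × 224 three-channel input and the absence of filter grouping are assumptions, not values stated in the table, and pooling or normalization layers are not listed, so only the first layer is worked through.

```python
def conv_output_size(input_size, kernel, stride, pad):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * pad - kernel) // stride + 1


def conv_param_count(filters, kernel, in_channels):
    """Weights plus one bias per filter, assuming no filter grouping."""
    return filters * (kernel * kernel * in_channels + 1)


# Assumed input: 224 x 224 pixels, 3 channels (not stated in the table).
INPUT_SIZE, INPUT_CHANNELS = 224, 3

# vgg-F Conv 1: 64 filters of 11 x 11, stride 4, pad 0.
conv1_side = conv_output_size(INPUT_SIZE, kernel=11, stride=4, pad=0)
conv1_params = conv_param_count(filters=64, kernel=11, in_channels=INPUT_CHANNELS)

print(conv1_side)    # 54, i.e. a 54 x 54 x 64 activation volume
print(conv1_params)  # 23296
```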
vgg-S Architecture
| Layer | Configuration |
|---|---|
| Conv 1 | 96 × 7 × 7 st. 2, pad 0 |
| Conv 2 | 256 × 5 × 5 st. 1, pad 1 |
| Conv 3 | 512 × 3 × 3 st. 1, pad 1 |
| Conv 4 | 512 × 3 × 3 st. 1, pad 1 |
| Conv 5 | 512 × 3 × 3 st. 1, pad 1 |
| Full 6 | 4096 dropout |
| Full 7 | 4096 dropout |
| Full 8 | 1000 softmax |
Figure 1. Example computed tomography (CT) slices of long-term (A) and short-term (B) survival groups with tumors outlined.
Demographic Summary of Patients in the Data Set
| Characteristics | Short Survival Class | Long Survival Class | P Value (Test) |
|---|---|---|---|
| Age, mean (SD) | 69 (8.07) | 64.45 (9.75) | 0.1161 (unpaired Student t test) |
| Sex, N (%) | |||
| Male | 12 (60%) | 7 (35%) | 0.2049 (Fisher exact test) |
| Female | 8 (40%) | 13 (65%) | |
| Race | |||
| White | 20 (100%) | 20 (100%) | 1 (Fisher exact test) |
| Black, Asian, and Others | 0 (0%) | 0 (0%) | |
| Ethnicity, N (%) | |||
| Hispanic or Latino | 1 (5%) | 0 (0%) | 1 (Fisher exact test) |
| Not Hispanic/Latino or unknown | 19 (95%) | 20 (100%) | |
| Histology, N (%) | |||
| Adenocarcinoma | 20 (100%) | 20 (100%) | |
| Squamous cell carcinoma | 0 (0%) | 0 (0%) | |
| Other, NOS, unknown | 0 (0%) | 0 (0%) | |
| Stage, N (%) | |||
| I | 4 (20%) | 10 (50%) | |
| II | 5 (25%) | 5 (25%) | |
| III | 10 (50%) | 3 (15%) | |
| IV | 1 (5%) | 2 (10%) | |
| Carcinoid, unknown | 0 (0%) | 0 (0%) | |
| Tobacco Use, N (%) | |||
| Moderate (1–2 PPD) | 4 (20%) | 4 (20%) | |
| Light (<1 PPD) | 0 (0%) | 1 (5%) | |
| HIST | 12 (60%) | 12 (60%) | |
| None | 0 (0%) | 3 (15%) | |
| Cigarettes, NOS | 4 (20%) | 0 (0%) | |
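The Fisher exact test values in the table can be checked directly from the counts. The sketch below is a plain-Python two-sided Fisher exact test for a 2 × 2 table, written for illustration only (the software the authors used is not stated here). It sums the hypergeometric probabilities of all tables that are no more likely than the observed one, with the row and column totals held fixed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1 = a + b, a + c
    n = a + b + c + d
    col2 = n - col1

    def weight(k):
        # Number of tables with k in the top-left cell (unnormalized probability).
        return comb(col1, k) * comb(col2, row1 - k)

    lowest = max(0, row1 - col2)
    highest = min(row1, col1)
    observed = weight(a)
    # Integer comparison avoids floating-point ties between equally likely tables.
    extreme = sum(weight(k) for k in range(lowest, highest + 1)
                  if weight(k) <= observed)
    return extreme / comb(n, row1)


# Rows: male, female. Columns: short survival, long survival.
p_sex = fisher_exact_two_sided(12, 7, 8, 13)
# Rows: Hispanic or Latino, all others. Columns: short, long.
p_ethnicity = fisher_exact_two_sided(1, 0, 19, 20)

print(round(p_sex, 4))  # 0.2049, as reported for sex
print(p_ethnicity)      # 1.0, as reported for ethnicity
```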
Figure 2. Example of lung patch used for the warped approach.
Figure 3. Example of lung patch used for the cropped approach.
Selected Results
| Pretrained CNN | vgg-m (postReLU 5 Features) | vgg-m (postReLU 5 Features) | vgg-f (postReLU 5 Features) | vgg-f (postReLU Features) | vgg-f (postReLU Features) |
|---|---|---|---|---|---|
| Feature type | Deep features | Deep features | Deep features | Mixed (Deep + Traditional quantitative) features | Mixed (Deep + Traditional quantitative) features |
| | Single | Single | Multiple | Single | Multiple |
| Classifier | Decision tree | Random forest | Random forest | Naïve Bayes | Random forest |
| Feature selector | Symmetric uncertainty | Symmetric uncertainty | Symmetric uncertainty | Relief-f | Symmetric uncertainty |
| Number of features | 5 | 5 | 5 | 10 (5 Deep + 5 Traditional quantitative image features) | 10 (5 Deep + 5 Traditional quantitative image features) |
| Accuracy | 82.5% | 72.5% | 87.5% | 90% | 90% |
| AUC | 0.778 | 0.804 | 0.875 | 0.935 | 0.935 |
Abbreviations: CNN, convolutional neural network; ReLU, rectified linear unit; AUC, area under the curve.
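The "postReLU" and "Mixed" entries in the table can be pictured with a small sketch. A post-ReLU feature is a layer activation after the rectified linear unit, max(0, x), has been applied, and a mixed feature vector is the concatenation of the selected deep features with the selected traditional ones. All numbers below are made up for illustration, and the layer the deep features come from is not specified here.

```python
def relu(activations):
    """Rectified linear unit applied element-wise: max(0, x)."""
    return [max(0.0, a) for a in activations]


# Made-up pre-ReLU activations standing in for 5 selected deep features.
pre_relu = [-1.2, 0.0, 0.7, 3.1, -0.4]
deep_top5 = relu(pre_relu)

# Made-up values standing in for 5 selected traditional quantitative features.
traditional_top5 = [12.5, 0.83, 4.0, 151.0, 0.27]

# Mixed feature vector: 5 deep + 5 traditional = 10 features per case.
mixed = deep_top5 + traditional_top5
print(deep_top5)
print(len(mixed))
```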
vgg-M Architecture
| Layer | Configuration |
|---|---|
| Conv 1 | 96 × 7 × 7 st. 2, pad 0 |
| Conv 2 | 256 × 5 × 5 st. 2, pad 1 |
| Conv 3 | 512 × 3 × 3 st. 1, pad 1 |
| Conv 4 | 512 × 3 × 3 st. 1, pad 1 |
| Conv 5 | 512 × 3 × 3 st. 1, pad 1 |
| Full 6 | 4096 dropout |
| Full 7 | 4096 dropout |
| Full 8 | 1000 softmax |