| Literature DB >> 31921650 |
Darcie A P Delzell1, Sara Magnuson1, Tabitha Peter1, Michelle Smith1, Brian J Smith2.
Abstract
As awareness of the habits and risks associated with lung cancer has increased, so has interest in promoting and improving lung cancer screening procedures. Recent research demonstrates the benefits of lung cancer screening; the National Lung Screening Trial (NLST) found as its primary result that preventative screening significantly decreases the death rate for patients battling lung cancer. However, it also noted that the false positive rate was very high (>94%). In this work, we investigated the ability of various machine learning classifiers to accurately predict lung cancer nodule status while also considering the associated false positive rate. We utilized 416 quantitative imaging biomarkers taken from CT scans of lung nodules from 200 patients, where the nodules had been verified as cancerous or benign. These imaging biomarkers were created from both nodule and parenchymal tissue. A variety of linear, nonlinear, and ensemble predictive classifiers, along with several feature selection methods, were used to classify the binary outcome of malignant or benign status. Elastic net and support vector machine, combined with either a linear-combination or correlation feature selection method, were among the best-performing classifiers (average cross-validation AUC near 0.72 for these models), while random forest and bagged trees were the worst-performing classifiers (AUC near 0.60). For the best-performing models, the false positive rate was near 30%, notably lower than that reported in the NLST. The use of radiomic biomarkers with machine learning methods is a promising diagnostic tool for tumor classification, with the potential to provide good classification and simultaneously reduce the false positive rate.
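The evaluation protocol described above (repeated cross-validation of classifiers on a 200-patient, 416-feature radiomics matrix, scored by AUC) can be sketched as follows. The abbreviations in the tables below (glmStepAIC, nnet, gbm) suggest the study used R's caret package; this is a scikit-learn approximation on synthetic stand-in data, with all hyperparameters assumed rather than taken from the paper.

```python
# Hedged sketch: repeated cross-validation AUC for an elastic-net
# classifier, mirroring the "50 repeated cross-validation testing sets"
# design (5 folds x 10 repeats). Synthetic data stands in for the
# radiomic biomarker matrix; no parameter here comes from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the 200-patient, 416-biomarker matrix.
X, y = make_classification(n_samples=200, n_features=416,
                           n_informative=20, random_state=0)

# Elastic net = logistic regression with a mixed L1/L2 penalty.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=1.0, max_iter=5000),
)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
aucs = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")  # 50 AUCs
print(f"mean AUC = {aucs.mean():.3f} (SD {aucs.std():.3f})")
```

Reporting the mean and SD over the 50 test folds matches the form of the results table below (mean AUC with SD over the repeated testing sets).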
Keywords: CT image; biomarkers; lung cancer; machine learning; radiomics
Year: 2019 PMID: 31921650 PMCID: PMC6917601 DOI: 10.3389/fonc.2019.01393
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Demographics of patient cohort.
| Characteristic | Malignant | Benign |
| Number of patients | 110 | 90 |
| Female | 51 (46.4%) | 63 (70.0%) |
| Male | 59 (53.6%) | 27 (30.0%) |
| Age, yrs (mean ±SD) | 65.7 ± 11.2 | 58.2 ± 13.2 |
| Pack-years (mean ±SD) | 38.4 ± 31.2 | 11.2 ± 16.9 |
| Nodule size, mm (range, mean ±SD) | 7−44, 19.1 ± 6.3 | 6−30, 15.2 ± 5.8 |
Summary of feature selection methods.
| Method | Abbreviation |
| Linear combination | lincom |
| Pairwise correlation | corr.95 |
| PCA - 0.85 cutoff | pca.85 |
| PCA - 0.90 cutoff | pca.90 |
| PCA - 0.95 cutoff | pca.95 |
| Unfiltered | nofilter |
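The pairwise correlation filter (corr.95) above removes one feature from every pair whose absolute correlation exceeds 0.95. A minimal sketch of this idea, similar in spirit to caret's findCorrelation (the greedy keep-the-earlier-column tie-breaking here is an assumption, not the paper's exact rule):

```python
# Hedged sketch of a pairwise-correlation filter: for each feature pair
# with |Pearson r| above the cutoff, drop the later column and keep the
# earlier one. Tie-breaking strategy is assumed for illustration.
import numpy as np

def correlation_filter(X, threshold=0.95):
    """Return the column indices that survive greedy de-correlation."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature |r|
    n = corr.shape[0]
    drop = set()
    for i in range(n):
        if i in drop:
            continue
        for j in range(i + 1, n):
            if j not in drop and corr[i, j] > threshold:
                drop.add(j)  # keep column i, drop its near-duplicate j
    return [k for k in range(n) if k not in drop]

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))
# Append a near-duplicate of column 0 so exactly one pair exceeds 0.95.
X = np.column_stack([base, base[:, 0] + 1e-3 * rng.normal(size=100)])
print(correlation_filter(X))  # column 3 is dropped
```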
Summary of classifiers.
| Category | Classifier | Abbreviation |
| Linear | Elastic net | elasticnet |
| Linear | Logistic regression | logistic |
| Linear | Partial least squares | pls |
| Linear | Logistic regression with step AIC | glmStepAIC |
| Nonlinear | K-nearest neighbors | knn |
| Nonlinear | Neural network | nnet |
| Nonlinear | Support vector machine (linear kernel) | svml |
| Nonlinear | SVM (polynomial kernel) | svmpoly |
| Nonlinear | SVM (radial kernel) | svmr |
| Ensemble | Bagged trees | bag |
| Ensemble | Random forest | rf |
| Ensemble | Stochastic gradient boosting | gbm |
Figure 1Average AUC values (over the 50 repeated cross-validation testing sets) of each feature selection/classifier combination.
AUC values for classifiers with highest predictive performance (SD taken over the 50 cross-validation testing sets).
| Classifier | Filter | AUC | AUC SD | Sensitivity | Specificity | FPR | FPR SD |
| elasticnet | lincom | 0.747 | 0.111 | 0.616 | 0.729 | 0.271 | 0.136 |
| svml | lincom | 0.745 | 0.112 | 0.549 | 0.765 | 0.235 | 0.126 |
| svmpoly | lincom | 0.741 | 0.113 | 0.569 | 0.781 | 0.219 | 0.132 |
| pls | lincom | 0.728 | 0.111 | 0.627 | 0.707 | 0.293 | 0.126 |
| svmr | corr.95 | 0.728 | 0.106 | 0.542 | 0.780 | 0.220 | 0.148 |
| gbm | lincom | 0.714 | 0.106 | 0.596 | 0.733 | 0.267 | 0.140 |
| glmStepAIC | corr.95 | 0.714 | 0.110 | 0.636 | 0.684 | 0.316 | 0.130 |
| nnet | lincom | 0.709 | 0.113 | 0.620 | 0.707 | 0.293 | 0.143 |
| logistic | corr.95 | 0.684 | 0.108 | 0.600 | 0.689 | 0.311 | 0.116 |
| knn | corr.95 | 0.676 | 0.109 | 0.482 | 0.738 | 0.262 | 0.117 |
| rf | corr.95 | 0.663 | 0.124 | 0.473 | 0.730 | 0.270 | 0.127 |
| bag | lincom | 0.658 | 0.106 | 0.529 | 0.702 | 0.298 | 0.146 |
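The operating-point columns in the table above are related by FPR = 1 − specificity (e.g., specificity 0.729 gives FPR 0.271 in the first row). A minimal sketch of how sensitivity, specificity, and FPR are computed from thresholded classifier scores; the 0.5 threshold and the toy labels/scores here are illustrative assumptions:

```python
# Hedged sketch: sensitivity, specificity, and false positive rate from
# a thresholded confusion matrix. Threshold and toy data are assumptions.
import numpy as np

def operating_point(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity, FPR) at the given threshold."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    sens = tp / (tp + fn)   # true positive rate
    spec = tn / (tn + fp)   # true negative rate
    return sens, spec, 1 - spec  # FPR = 1 - specificity

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
sens, spec, fpr = operating_point(y_true, y_score)
print(sens, spec, fpr)
```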
Figure 2Boxplots of AUC values (over the 50 repeated cross-validation testing sets) for each feature selection method for the four best-performing classifiers.
Figure 3Boxplots of the false positive rates (over the 50 repeated cross-validation testing sets) for each feature selection method for the four best-performing classifiers.
Figure 4ROC curve for the elastic net classifier with the linear combinations filter.