Bo He1, Wei Zhao4, Jiang-Yuan Pi3, Dan Han1, Yuan-Ming Jiang1, Zhen-Guang Zhang1, Wei Zhao4.
Abstract
BACKGROUND: This study aimed to predict the survival status of non-small cell lung cancer patients using phenotypic radiomics features obtained from CT images.
Keywords: CT; Non-small cell lung cancer; Radiomics; Random forest; Survival status
Year: 2018 PMID: 30305102 PMCID: PMC6180390 DOI: 10.1186/s12931-018-0887-8
Source DB: PubMed Journal: Respir Res ISSN: 1465-9921
Demographic characteristics
| Characteristic | Number of Patients (%) |
|---|---|
| Gender | |
| Male | 120 (64.5%) |
| Female | 66 (35.5%) |
| Smoking Status | |
| Nonsmoker | 39 (20.9%) |
| Former smoker | 117 (62.9%) |
| Current smoker | 30 (16.2%) |
| Histology | |
| Adenocarcinoma | 154 (82.7%) |
| Squamous cell carcinoma | 29 (15.6%) |
| NOS | 3 (1.7%) |
| Treatment | |
| Surgery | 33 (17.7%) |
| Chemotherapy | 40 (21.5%) |
| Radiotherapy | 19 (10.2%) |
| Adjuvant Treatment | 40 (21.5%) |
| Unknown | 54 (29.1%) |
| Overall Survival | |
| Dead | 37 (19.9%) |
| Alive | 149 (80.1%) |
NOS not otherwise specified
This table shows the clinical data of all 186 NSCLC patients, including gender, smoking history, histology, treatment, and overall survival.
Fig. 1 Survival Status Distribution of Patients. a Before oversampling. b After oversampling. The X axis is the survival status of the patients: the blue bar represents the alive group and the orange bar represents the dead group. The Y axis is the number of patients in each survival group
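The oversampling step in Fig. 1 can be illustrated with a minimal sketch. The record does not say which oversampling method was used, so the code below assumes plain random oversampling with replacement; the function name `random_oversample` is hypothetical, and only the class counts (149 alive, 37 dead) come from the demographic table.

```python
import random

def random_oversample(labels, seed=0):
    """Return sample indices in which every class is as large as the majority class.

    Assumption: simple random oversampling with replacement. The source
    does not state the method actually used.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = max(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        # Draw the missing samples with replacement from the same class.
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    return balanced

# Class counts taken from the demographic table.
labels = ["alive"] * 149 + ["dead"] * 37
balanced_idx = random_oversample(labels)
balanced_labels = [labels[i] for i in balanced_idx]
print(balanced_labels.count("alive"), balanced_labels.count("dead"))
```

Oversampling of this kind is normally applied to the training split only, so that duplicated cases do not leak into the validation data.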
Model ranking based on mean precision score
| Model | Mean Precision | Mean Recall | Mean Accuracy | Max Depth | Max Features | Min Samples Split | N Estimators |
|---|---|---|---|---|---|---|---|
| 149 | 0.886 | 0.892 | 0.883 | 15 | 3 | 3 | 1100 |
| 252 | 0.882 | 0.91 | 0.888 | 25 | 10 | 3 | 100 |
| 164 | 0.878 | 0.901 | 0.883 | 15 | 5 | 3 | 500 |
| 154 | 0.878 | 0.892 | 0.879 | 15 | 3 | 5 | 900 |
| 233 | 0.878 | 0.847 | 0.861 | 25 | 3 | 10 | 1100 |
This table shows the results of automatic hyper-parameter tuning based on two evaluation standards, with the models ranked by mean precision score. The last four columns give the hyper-parameter values of each model
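As a small illustration of the ranking, the sketch below sorts the five rows of the table above by one metric and breaks ties with another. The tie-break rule is an assumption (the record does not state one), although breaking precision ties by recall does reproduce the listed order. The helper name `rank` is hypothetical.

```python
# Rows copied from the precision-ranked table:
# (model id, mean precision, mean recall, mean accuracy)
rows = [
    (149, 0.886, 0.892, 0.883),
    (252, 0.882, 0.910, 0.888),
    (164, 0.878, 0.901, 0.883),
    (154, 0.878, 0.892, 0.879),
    (233, 0.878, 0.847, 0.861),
]

def rank(rows, primary, tie_break):
    """Sort rows by the primary metric column, then the tie-break column, descending."""
    return sorted(rows, key=lambda r: (r[primary], r[tie_break]), reverse=True)

by_precision = rank(rows, primary=1, tie_break=2)  # column 1 = precision
by_recall = rank(rows, primary=2, tie_break=1)     # column 2 = recall

print([r[0] for r in by_precision])
print([r[0] for r in by_recall])
```

Re-ranking these same five rows by recall puts a different model first, which is why the choice of evaluation standard matters for model selection.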
Model Ranking Based on Mean Recall Score
| Model | Mean Precision | Mean Recall | Mean Accuracy | Max Depth | Max Features | Min Samples Split | N Estimators |
|---|---|---|---|---|---|---|---|
| 221 | 0.886 | 0.892 | 0.883 | 25 | 3 | 3 | 1100 |
| 279 | 0.879 | 0.892 | 0.874 | 25 | 20 | 5 | 700 |
| 153 | 0.884 | 0.883 | 0.879 | 15 | 3 | 5 | 700 |
| 225 | 0.884 | 0.883 | 0.879 | 25 | 3 | 5 | 700 |
| 146 | 0.878 | 0.883 | 0.874 | 15 | 3 | 3 | 500 |
This table shows the results of automatic hyper-parameter tuning based on two evaluation standards, with the models ranked by mean recall score. The last four columns give the hyper-parameter values of each model
Fig. 2 Confusion Matrix of Parameter Tuning Based on Different Evaluation. a Based on precision. b Based on recall. The horizontal axis shows the predicted number in each group; the vertical axis shows the actual number in each survival group. The leading diagonal represents correct predictions; the minor diagonal represents incorrect predictions
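The precision, recall, and accuracy scores reported in the tables follow directly from the four cells of a confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical and are not the values in Fig. 2, which are not reproduced in this record.

```python
def metrics(tp, fp, fn, tn):
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Hypothetical counts, chosen only for illustration.
precision, recall, accuracy = metrics(tp=40, fp=5, fn=10, tn=45)
print(round(precision, 3), round(recall, 3), round(accuracy, 3))
```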
Fig. 3 Fifty Features with Top Gini Importance Values. The X axis shows the feature names and the Y axis shows the Gini importance score
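Gini importance is built from the drop in Gini impurity that a feature achieves at each split; a random forest sums these drops per feature and averages them over its trees. The sketch below shows the calculation for a single split only. The labels and the split are hypothetical, and the function names are illustrative rather than taken from any library.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# Hypothetical node with 4 alive and 4 dead cases, split on one feature.
parent = ["alive"] * 4 + ["dead"] * 4
left = ["alive"] * 3 + ["dead"] * 1
right = ["alive"] * 1 + ["dead"] * 3
decrease = impurity_decrease(parent, left, right)
print(decrease)
```

A feature that separates the survival groups cleanly produces a larger decrease and therefore ranks higher in a plot such as Fig. 3.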
Fig. 4 The Radiological Images of Three Certain Samples. a-c The survival statuses of the patients in (a-c) are Alive, Dead, and Dead, respectively
Basic information and the value of certain features of three cases
| Feature / Case ID | R01–005 | R01–006 | R01–129 |
|---|---|---|---|
| Histology | Adenocarcinoma | Adenocarcinoma | Adenocarcinoma |
| Survival Status | Alive | Alive | Dead |
| original_glszm_LargeAreaLowGrayLevelEmphasis | 0.023323 | 0.330361 | 100.1903 |
| wavelet.LLH_glszm_LargeAreaLowGrayLevelEmphasis | 0.001637 | 0.008254 | 2.941066 |
| wavelet.HHH_glcm_ClusterProminence | 400.5173 | 413.4463 | 8.821475 |
| wavelet.LLL_glszm_LargeAreaEmphasis | 2.594846 | 8.408592 | 191.8787 |
| wavelet.HLL_gldm_DependenceVariance | 0.137999 | 1.28458 | 16.99916 |
The table shows that the feature values differ with survival status, and the difference between the alive and dead cases is pronounced
Fig. 5 Precision and Recall Score as a Function of Decision Values. Blue dashed line: precision score; green line: recall score. The Y axis is the score value and the X axis is the decision threshold value. The intersection of the two curves is the optimal point, where the trade-off between precision and recall is achieved
Fig. 6 Precision and Recall with the Determined Decision Threshold. a Precision and Recall Curve (this curve shows how recall and precision change as the decision threshold value changes; the triangle represents the decision threshold we chose). b Confusion Matrix. The horizontal axis shows the predicted number in each group; the vertical axis shows the actual number in each survival group. The leading diagonal represents correct predictions; the minor diagonal represents incorrect predictions
Fig. 7 The ROC Curve for Random Forest Model Performance on the Validation Data. The X axis represents the false positive rate and the Y axis represents the true positive rate. The diagonal dashed line represents random prediction
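The threshold choice described for Fig. 5 can be sketched as a sweep over candidate thresholds that keeps the one where precision and recall are closest. The scores and labels below are hypothetical, the function names are illustrative, and treating each distinct score as a candidate threshold is an assumption rather than the authors' stated procedure.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def balanced_threshold(scores, labels):
    """Return the candidate threshold with the smallest |precision - recall|."""
    def gap(threshold):
        precision, recall = precision_recall_at(scores, labels, threshold)
        return abs(precision - recall)
    return min(sorted(set(scores)), key=gap)

# Hypothetical predicted probabilities and true labels (1 = positive class).
scores = [0.1, 0.3, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1, 1]
best = balanced_threshold(scores, labels)
print(best, precision_recall_at(scores, labels, best))
```

Lowering the threshold from this point raises recall at the cost of precision, and raising it does the opposite.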
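An ROC curve such as the one in Fig. 7 is traced by lowering the decision threshold step by step and recording the false positive rate and true positive rate at each step; the area under it can then be estimated with the trapezoidal rule. The sketch below uses hypothetical scores and labels, not the study's validation data, and its function names are illustrative.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, starting at (0, 0)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    # Lower the threshold one distinct score at a time.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical predicted probabilities and true labels (1 = positive class).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
print(points[-1], round(area, 4))
```

A model that predicts at random follows the diagonal and has an area of about 0.5, so an area closer to 1 indicates better separation of the two groups.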