Bora Lee1, Sang Hoon Chun2, Ji Hyung Hong2, In Sook Woo2, Seoree Kim2, Joon Won Jeong2, Jae Jun Kim3, Hyun Woo Lee4, Sae Jung Na5, Kyongmin Sarah Beck5, Bomi Gil5, Sungsoo Park1, Ho Jung An6, Yoon Ho Ko7,8.
Abstract
Accurate prediction of non-small cell lung cancer (NSCLC) prognosis after surgery remains challenging. The Cox proportional hazard (PH) model is widely used; however, it has several limitations. In this study, we developed novel neural network models, called binned time survival analysis (DeepBTS) models, using 30 clinico-pathological features of surgically resected NSCLC patients (training cohort, n = 1,022; external validation cohort, n = 298). We employed the root-mean-square error (in the supervised learning model, s-DeepBTS) or the negative log-likelihood (in the semi-unsupervised learning model, su-DeepBTS) as the loss function. The su-DeepBTS algorithm achieved better performance (C-index = 0.7306; AUC = 0.7677) than the other models (Cox PH: C-index = 0.7048 and AUC = 0.7390; s-DeepBTS: C-index = 0.7126 and AUC = 0.7420). The top 14 features were selected using the su-DeepBTS model as the selector and could distinguish the low- and high-risk groups in the training cohort (p = 1.86 × 10⁻¹¹) and the validation cohort (p = 1.04 × 10⁻¹⁰). When trained with the optimal feature set for each model, the su-DeepBTS model predicted the prognoses of NSCLC better than the traditional model, especially in stage I patients. Follow-up studies using combined radiological, pathological imaging, and genomic data to enhance the performance of our model are ongoing.
Year: 2020 PMID: 32029785 PMCID: PMC7005286 DOI: 10.1038/s41598-020-58722-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Baseline characteristics of the training and validation cohorts.
| Characteristic | Category / unit | Training cohort | External validation cohort | p-value |
|---|---|---|---|---|
| Age (years) | Median (range) | 66 (33–86) | 66 (25–85) | 0.387 |
| Gender | Male | 666 (65.2) | 195 (65.4) | 0.931 |
| | Female | 356 (34.8) | 103 (34.6) | |
| Smoking history | Never | 461 (45.7) | 132 (45.2) | 0.884 |
| | Former/current | 548 (54.3) | 160 (54.8) | |
| ECOG performance status | 0 | 503 (49.2) | 157 (52.7) | 0.292 |
| | 1 | 519 (50.8) | 141 (47.3) | |
| CEA | ng/mL | 2.1 (1.0–230.1) | 1.9 (1.0–1070.9) | 0.428 |
| WBC | 10⁶/L | 7,399 ± 3.8 | 7,611 ± 3.3 | 0.381 |
| Neutrophil | % | 59.7 ± 31.1 | 60.0 ± 12.4 | 0.722 |
| Lymphocyte | % | 29.9 ± 18.7 | 28.5 ± 10.9 | 0.227 |
| Haemoglobin | g/dL | 13.2 ± 3.8 | 13.0 ± 1.7 | 0.518 |
| Platelet | 10⁹/L | 237 ± 79 | 239 ± 83 | 0.610 |
| C-reactive protein | mg/dL | 0.14 (0.0–34.1) | 0.12 (0.02–23.6) | 0.450 |
| Pulmonary function | FEV1 (L) | 2.4 (0.08–352.0) | 2.37 (0.96–139.0) | 0.915 |
| | DLCo (%) | 85 (8–173) | 83 (9–159) | 0.327 |
| Histology | Adenocarcinoma | 651 (63.7) | 200 (67.1) | 0.442 |
| | Squamous | 303 (29.6) | 77 (25.8) | |
| | Others | 68 (6.7) | 21 (7.0) | |
| Tumour size | cm | 2.5 (0.4–13.0) | 2.5 (0.3–13.0) | 0.226 |
| No. of LN positivity | | 0 (0–23) | 0 (0–31) | 0.847 |
| T stage | T1 | 474 (46.5) | 160 (53.7) | 0.064 |
| | T2 | 433 (42.5) | 114 (38.3) | |
| | T3/4 | 113 (11.1) | 24 (8.1) | |
| N stage | N0 | 757 (74.9) | 239 (80.5) | 0.136 |
| | N1 | 135 (13.4) | 30 (10.1) | |
| | N2 | 119 (11.8) | 28 (9.4) | |
| TNM stage | I | 669 (65.7) | 211 (71.0) | 0.218 |
| | II | 200 (19.6) | 50 (16.8) | |
| | III | 150 (14.7) | 36 (12.1) | |
| Tumour differentiation | Well | 197 (19.7) | 75 (25.6) | 0.054 |
| | Moderately | 603 (60.4) | 156 (53.2) | |
| | Poorly | 199 (19.9) | 62 (21.1) | |
| Vascular invasion | Yes | 143 (14.0) | 35 (11.9) | 0.489 |
| Lymphatic invasion | Yes | 353 (34.6) | 95 (32.0) | 0.560 |
| Perineural invasion | Yes | 59 (5.8) | 13 (4.4) | 0.642 |
| Resection status* | R0 | 980 (97.5) | 289 (98.0) | 0.849 |
| | R1 | 19 (1.9) | 5 (1.7) | |
| | R2 | 6 (0.6) | 1 (0.3) | |
| Neoadjuvant treatment | Yes | 50 (4.9) | 14 (4.7) | 0.888 |
| Adjuvant treatment | Yes | 333 (33.1) | 86 (29.3) | 0.214 |
| Recurrence | Yes | 272 (26.6) | 76 (25.2) | 0.618 |
ECOG, Eastern Cooperative Oncology Group; CEA, carcinoembryonic antigen; WBC, white blood cell; FEV1, forced expiratory volume in the first second; DLCo, diffusing capacity of the lung for carbon monoxide; LN, lymph node.
*R0, no cancer cells seen microscopically at the resection margin; R1, microscopic positive margin; R2, macroscopic positive margin.
Performance scores of three different models.
| Model | Metric | Training cohort, 28 features | Training cohort, optimal feature set | External validation cohort, 28 features | External validation cohort, optimal feature set |
|---|---|---|---|---|---|
| Cox PH | C-index | 0.7048 ± 0.0067 | 0.7248 ± 0.0030 | 0.6939 ± 0.0017 | 0.6924 ± 0.0009 |
| | AUC | 0.7390 ± 0.0071 | 0.7622 ± 0.0041 | 0.7064 ± 0.0016 | 0.7112 ± 0.0010 |
| s-DeepBTS | C-index | 0.7126 ± 0.0089 | 0.7338 ± 0.0022 | 0.6879 ± 0.0048 | 0.6944 ± 0.0008 |
| | AUC | 0.7420 ± 0.0183 | 0.7727 ± 0.0024 | 0.7020 ± 0.0054 | 0.7083 ± 0.0012 |
| su-DeepBTS | C-index | 0.7306 ± 0.0042 | 0.7419 ± 0.0044 | 0.7077 ± 0.0019 | 0.7013 ± 0.0018 |
| | AUC | 0.7677 ± 0.0049 | 0.7780 ± 0.0054 | 0.7224 ± 0.0021 | 0.7123 ± 0.0021 |
Cox PH, Cox proportional-hazards; AUC, area under the curve; s-DeepBTS, supervised deep neural network for binned time survival analysis; su-DeepBTS, semi-unsupervised deep neural network for binned time survival analysis.
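For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index in plain Python. It is an illustration only, not the authors' evaluation code; the function name `concordance_index` and the convention that a higher risk score means earlier expected recurrence are assumptions made here.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index (illustrative sketch, hypothetical helper).

    A pair (i, j) is comparable when subject i had an observed event
    and a strictly shorter time than subject j. The pair is concordant
    when the model gave subject i the higher risk score. Tied scores
    count as half-concordant; pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the scores in the table should be read.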
Performance comparison of model–feature selector pairs.
| Pairs (model - feature selector) | Area under the graph | Peak score | Peak feature number |
|---|---|---|---|
| su-DeepBTS–su-DeepBTS erase | 19.896134 | 0.742358 | 14 |
| su-DeepBTS–s-DeepBTS erase | 19.782187 | 0.739613 | 12 |
| s-DeepBTS–s-DeepBTS erase | 19.437697 | 0.726892 | 17 |
| su-DeepBTS–Cox PH erase | 18.982598 | 0.736879 | 14 |
| su-DeepBTS–Cox PH log(p) value | 18.912272 | 0.735058 | 4 |
| s-DeepBTS–su-DeepBTS erase | 18.835688 | 0.73088 | 3 |
| Cox PH–su-DeepBTS erase | 18.683417 | 0.723161 | 5 |
| Cox PH–Cox PH erase | 18.587178 | 0.72231 | 7 |
| Cox PH–s-DeepBTS erase | 18.41109 | 0.717018 | 5 |
| s-DeepBTS–Cox PH log(p) value | 18.375609 | 0.734164 | 4 |
| Cox PH–Cox PH log(p) value | 18.358491 | 0.72157 | 5 |
| s-DeepBTS–Cox PH erase | 17.984331 | 0.719938 | 2 |
| Standard Deviation | 0.591 | 0.008 | — |
Each row presents the area under the graph drawn in Fig. 1, with the number of features used as the x-value and the C-index as the y-value (“Area under the graph” column), the peak C-index score in each graph (“Peak score” column), and the number of features used when the C-index score is maximal (“Peak feature number” column).
Cox PH, Cox proportional-hazards; s-DeepBTS, supervised deep neural network for binned time survival analysis; su-DeepBTS, semi-unsupervised deep neural network for binned time survival analysis.
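The three summary columns of this table can be sketched as follows. This is a hedged illustration: the source does not state how the area was integrated, so the trapezoidal rule is an assumption, and `summarize_curve` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def summarize_curve(
    feature_counts: Sequence[int],
    c_indices: Sequence[float],
) -> Tuple[float, float, int]:
    """Return (area under the graph, peak score, peak feature number).

    Assumption: the area is computed with the trapezoidal rule over
    (number of features, C-index) points sorted by feature count.
    """
    if len(feature_counts) != len(c_indices) or len(c_indices) < 2:
        raise ValueError("need at least two matching points")
    area = 0.0
    for k in range(1, len(feature_counts)):
        width = feature_counts[k] - feature_counts[k - 1]
        area += width * (c_indices[k] + c_indices[k - 1]) / 2.0
    # Position of the first maximum of the C-index curve
    peak_pos = max(range(len(c_indices)), key=lambda k: c_indices[k])
    return area, c_indices[peak_pos], feature_counts[peak_pos]
```

Under this reading, a larger area indicates a model and selector pair that stays accurate across many feature-set sizes, while the peak columns describe only its single best configuration.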
Figure 1. Comparison of model–feature selector pairs. The curves for all combinations of model–feature selector pairs are presented, with the x-axis representing the number of features used and the y-axis indicating the C-index.
Top 15 important features selected by four different feature selectors.
| Rank | Cox PH log(p) value (ascending order) | Cox PH erasing feature selection | s-DeepBTS erasing feature selection | su-DeepBTS erasing feature selection |
|---|---|---|---|---|
| 1 | No. of LN positivity | No. of LN positivity | No. of LN positivity | No. of LN positivity |
| 2 | T stage | T stage | T stage | T stage |
| 3 | ECOG | WBC | Age | R0 resection |
| 4 | Vascular invasion | Sex | R0 resection | Sex |
| 5 | WBC | Lymphocyte fraction | Vascular invasion | Vascular invasion |
| 6 | Adjuvant treatment | DLCO | WBC | DLCO |
| 7 | Age | CEA | Tumour differentiation | Lymphocyte fraction |
| 8 | CEA | Vascular invasion | Lymphatic invasion | WBC |
| 9 | CRP | Haemoglobin | Perineural invasion | ECOG |
| 10 | Tumour size | Tumour differentiation | DLCO | Lymphatic invasion |
| 11 | Lymphocyte fraction | Albumin | Tumour size | Histology |
| 12 | Tumour differentiation | ECOG | ECOG | Neoadjuvant treatment |
| 13 | DLCO | Smoking | LDH | Adjuvant treatment |
| 14 | Histology | Adjuvant treatment | Albumin | Albumin |
| 15 | Perineural invasion | Tumour size | Haemoglobin | Tumour differentiation |
Cox PH, Cox proportional-hazards; s-DeepBTS, supervised deep neural network for binned time survival analysis; su-DeepBTS, semi-unsupervised deep neural network for binned time survival analysis; LN, lymph node; ECOG, Eastern Cooperative Oncology Group; WBC, white blood cell; DLCO, diffusion capacity of carbon monoxide; CEA, carcinoembryonic antigen; LDH, lactate dehydrogenase.
Sensitivity, specificity, and accuracy for 3-year recurrence prediction using su-DeepBTS model.
| Cohort | Number of features | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| Training cohort | 14 | 0.72047143 | 0.73982096 | 0.73432601 |
| Training cohort | 28 | 0.68479412 | 0.76053801 | 0.73782872 |
| External validation cohort | 14 | 0.621875 | 0.74204545 | 0.7100 |
| External validation cohort | 28 | 0.634375 | 0.73522727 | 0.7083333 |
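As a reminder of how the three columns relate, here is a minimal sketch that derives them from binary labels. It assumes 1 means recurrence within 3 years and 0 means no recurrence, and that the model's predicted risk has already been thresholded into a 0/1 prediction; the threshold used in the study is not given here, and `classification_metrics` is a hypothetical helper.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int],
    y_pred: Sequence[int],
) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy for binary labels (sketch)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # recurrences caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # recurrences missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }
```

Because accuracy is a prevalence-weighted mix of the other two quantities, it sits between sensitivity and specificity in every row of the table.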
Figure 2. Kaplan–Meier curves according to the predicted risk of recurrence for all patients, obtained using the su-DeepBTS model trained with (a) the optimal 14 features and (b) all 28 features (left: training cohort; right: external validation cohort). (c) Kaplan–Meier curves according to the predicted risk of recurrence in stage I/IA/IB patients of the external validation cohort, obtained using the su-DeepBTS model trained with the optimal 14 features.
Figure 3. Overview of the proposed binned-time survival analysis models. (a) Simple example explaining how survival probabilities are calculated to build the output values. The total time-bin count of the output is based on the maximum RFS duration among all of the samples. Since 36 months is the longest duration defined, the total number of bins is 37. Each bin was filled with a survival probability value according to the recurrence status of the sample. For all of the samples, each time bin was filled with 1 until recurrence or loss to follow-up. After relapse or loss to follow-up, the time bin was filled with 0 for patients with recurrence and with the calculated Kaplan–Meier survival probability for censored patients. Schema of the (b) s-DeepBTS and (c) su-DeepBTS models. RFS, recurrence-free survival; s-DeepBTS, supervised deep neural network for binned time survival analysis; su-DeepBTS, semi-unsupervised deep neural network for binned time survival analysis.
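The target construction described in panel (a) can be sketched as below. This is not the authors' code. Two points are assumptions made for the illustration: the bin at the recurrence month itself is set to 0, and the "calculated Kaplan–Meier survival probability" for a censored patient is taken as the conditional value S(b)/S(c), where c is the censoring month. Times are assumed to be whole months, and both function names are hypothetical.

```python
from typing import List, Sequence


def kaplan_meier(times: Sequence[int], events: Sequence[int],
                 max_time: int) -> List[float]:
    """Kaplan-Meier survival estimate S[t] for t = 0..max_time."""
    surv = []
    s = 1.0
    for t in range(max_time + 1):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        if at_risk > 0:
            s *= 1.0 - deaths / at_risk
        surv.append(s)
    return surv


def binned_targets(times: Sequence[int],
                   events: Sequence[int]) -> List[List[float]]:
    """Build one target row per patient with max(times) + 1 bins."""
    max_time = max(times)  # e.g. 36 months gives 37 bins
    km = kaplan_meier(times, events, max_time)
    targets = []
    for t_i, e_i in zip(times, events):
        row = []
        for b in range(max_time + 1):
            if b < t_i:
                row.append(1.0)  # still recurrence-free and under follow-up
            elif e_i:
                row.append(0.0)  # assumption: 0 from the recurrence bin on
            else:
                # assumption: conditional KM probability after censoring
                row.append(km[b] / km[t_i] if km[t_i] > 0 else 0.0)
        targets.append(row)
    return targets
```

For four hypothetical patients with times `[1, 2, 3, 4]` and events `[1, 0, 1, 0]`, the patient censored at month 2 receives the row `[1, 1, 1, 0.5, 0.5]`: certain survival while observed, then a soft label that decays with the cohort-level estimate.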