Ting Lin1, Jinhai Mai2,3, Meng Yan1, Zhenhui Li4, Xianyue Quan1, Xin Chen3,5.
Abstract
PURPOSE: To develop and validate a deep learning signature-based nomogram from computed tomography (CT) images for predicting overall survival (OS) in patients with resected non-small cell lung cancer (NSCLC). PATIENTS AND METHODS: A total of 1792 deep learning features were extracted from non-enhanced and venous-phase CT images for each NSCLC patient in the training cohort (n=231). A deep learning signature was then built with the least absolute shrinkage and selection operator (LASSO) Cox regression model for OS estimation. Finally, a nomogram was constructed from the signature and other independent clinical risk factors. The performance of the nomogram was assessed in terms of discrimination, calibration and clinical usefulness. To quantify the improvement in performance added by the deep learning signature, the net reclassification improvement (NRI) was calculated. The results were validated in an external validation cohort (n=77).
Keywords: deep learning; nomogram; non-small cell lung cancer; prognosis
Year: 2021 PMID: 33833572 PMCID: PMC8019610 DOI: 10.2147/CMAR.S299020
Source DB: PubMed Journal: Cancer Manag Res ISSN: 1179-1322 Impact factor: 3.989
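The abstract names the net reclassification improvement (NRI) as the measure of what the deep learning signature adds. As a minimal sketch of the category-based form of that statistic, assuming binary event status and ignoring censoring (survival analyses usually need a censoring-aware estimate, and the abstract does not say which variant was used), the function and toy data below are hypothetical and are not taken from the study:

```python
def net_reclassification_improvement(old_risk, new_risk, events):
    """Category-based NRI for two risk classifications.

    old_risk / new_risk: integer risk category per patient (higher = riskier)
    events: 1 if the patient had the event (death), 0 otherwise
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, event in zip(old_risk, new_risk, events):
        if event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    if n_e == 0 or n_ne == 0:
        raise ValueError("need at least one event and one non-event")
    # Events should move up; non-events should move down.
    nri_events = (up_e - down_e) / n_e
    nri_nonevents = (down_ne - up_ne) / n_ne
    return nri_events + nri_nonevents


# Toy data (hypothetical): categories from a clinical-only model vs. a
# model that also includes the signature.
old = [0, 0, 1, 1, 0, 1, 0, 1]
new = [1, 0, 1, 1, 0, 0, 0, 0]
died = [1, 1, 1, 1, 0, 0, 0, 0]
nri = net_reclassification_improvement(old, new, died)
```

A positive value means the second model moves patients who died toward higher risk categories and survivors toward lower ones more often than the reverse.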
Figure 1. Deep learning workflow for feature extraction. Image segmentation was performed by an experienced radiologist on the CT images. Sub-images containing the whole tumor were cropped from the segmented images and then combined into an RGB image. The deep learning features were extracted from the RGB images.
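The cropping and channel-stacking step in this workflow can be sketched in plain Python. The caption does not say which three images feed the three channels, so the three inputs here are an assumption, as are all function names and the toy arrays:

```python
def bounding_box(mask):
    """Return (row_min, row_max, col_min, col_max) of the nonzero mask pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask is empty")
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], rows[-1], cols[0], cols[-1]


def crop(image, box):
    """Cut the bounding box (inclusive bounds) out of a 2D image."""
    r0, r1, c0, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]


def stack_rgb(ch_r, ch_g, ch_b):
    """Combine three equally sized grayscale crops into H x W pixels of (R, G, B)."""
    return [[(r, g, b) for r, g, b in zip(rr, gg, bb)]
            for rr, gg, bb in zip(ch_r, ch_g, ch_b)]


# Toy 4x4 images (hypothetical); the tumor mask covers rows 1-2, columns 1-2.
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
image_a = [[r * 4 + c for c in range(4)] for r in range(4)]
image_b = [[v + 100 for v in row] for row in image_a]
image_c = [[v + 200 for v in row] for row in image_a]

box = bounding_box(mask)
rgb = stack_rgb(crop(image_a, box), crop(image_b, box), crop(image_c, box))
```

In practice the crops would also be resized and intensity-normalized before being passed to a network; those steps are omitted because the caption does not describe them.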
Clinical Characteristics of Patients with Resectable NSCLC in Training and Validation Cohort
| Characteristics | Training Cohort (n=231) | Validation Cohort (n=77) | P |
|---|---|---|---|
| Age (years) | | | 0.028* |
| Median | 63 (26–85) | 60 (28–77) | |
| Gender | | | 0.785** |
| Male | 144 (62.3%) | 50 (64.9%) | |
| Female | 87 (37.7%) | 27 (35.1%) | |
| Smoking status | | | 0.833** |
| No | 158 (68.4%) | 51 (66.2%) | |
| Yes | 73 (31.6%) | 26 (33.8%) | |
| TNM stage | | | 0.004** |
| I | 158 (68.4%) | 37 (48.1%) | |
| II | 21 (9.1%) | 14 (18.2%) | |
| III | 52 (22.5%) | 26 (33.7%) | |
| Lymphatic vessel invasion | | | 0.373** |
| No | 199 (86.1%) | 70 (90.9%) | |
| Yes | 32 (13.9%) | 7 (9.1%) | |
| Differentiation grade | | | 0.0001** |
| Well | 14 (6.1%) | 11 (14.3%) | |
| Moderate | 168 (72.7%) | 36 (46.8%) | |
| Poor | 49 (21.2%) | 30 (38.9%) | |
| Follow-up time (days) | | | 0.007* |
| Median | 1314 (781.5–2029.5) | 1940 (1296–2268) | |
| Maximum | 3530 | 3611 | |
| Pathological type | | | 4.97e-10** |
| SC | 47 (20.4%) | 6 (7.8%) | |
| ADC | 183 (79.2%) | 57 (74.0%) | |
| Other | 1 (0.4%) | 14 (18.2%) | |
| Location of tumor | | | 0.1091** |
| Central | 17 (7.4%) | 11 (14.3%) | |
| Peripheral | 214 (92.6%) | 66 (85.7%) | |
Notes: Unless otherwise specified, data are expressed as median for continuous variables, or number (%) for categorical variables. *p value was calculated with the Mann–Whitney test, **p value was calculated with the Pearson χ2 test.
Abbreviations: NSCLC, non-small cell lung cancer; TNM, tumor, node, metastasis; SC, squamous carcinoma; ADC, adenomatous carcinoma.
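The categorical comparisons in the table use the Pearson χ2 test. For the 2×2 rows, the test can be sketched from the counts alone. The notes do not say whether a continuity correction was applied; this sketch assumes the Yates correction, and the function name is hypothetical:

```python
import math


def chi2_2x2_yates(table):
    """Pearson chi-square test with Yates continuity correction for a 2x2 table.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates correction: shrink each |O - E| by 0.5, never below 0.
            diff = max(abs(observed - expected) - 0.5, 0.0)
            stat += diff ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts from the table: rows are categories, columns are (training, validation).
gender = [[144, 50], [87, 27]]
smoking = [[158, 51], [73, 26]]
_, p_gender = chi2_2x2_yates(gender)
_, p_smoking = chi2_2x2_yates(smoking)
print(round(p_gender, 3), round(p_smoking, 3))
```

Under this assumption the two p values come out close to the 0.785 and 0.833 reported for gender and smoking status. Rows with more than two categories, such as TNM stage, need the general r×c form of the test, which has more degrees of freedom and is not covered here.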
Figure 2. Deep learning feature selection using the LASSO Cox regression model. Each horizontal row shows the feature selection result for one feature group. The left column shows the coefficient profile of each feature, plotted against the log(λ) sequence. The right column shows the 10-fold cross-validation used to tune the penalty parameter λ of the LASSO model, with the C-index plotted against log(λ). Dotted vertical lines mark the optimal values under the minimum criterion and under the 1 standard error of the minimum criterion (the 1-SE criterion). The 9 most significant features with non-zero coefficients were retained as the predictive features.
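The two dotted lines in the cross-validation panels correspond to two ways of choosing λ. A minimal sketch of that choice, assuming a higher C-index is better and using an invented cross-validation curve (the function name and all numbers are hypothetical, not the study's values):

```python
def select_lambdas(lambdas, mean_cindex, se_cindex):
    """Pick lambda under the minimum criterion and under the 1-SE criterion.

    Minimum criterion: the lambda with the best mean cross-validated C-index.
    1-SE criterion: the largest lambda (sparsest model) whose mean C-index is
    still within one standard error of that best value.
    """
    best = max(range(len(lambdas)), key=lambda i: mean_cindex[i])
    threshold = mean_cindex[best] - se_cindex[best]
    lambda_1se = max(lam for lam, m in zip(lambdas, mean_cindex) if m >= threshold)
    return lambdas[best], lambda_1se


# Hypothetical 10-fold cross-validation summary, one entry per lambda.
lambdas = [0.20, 0.10, 0.05, 0.02, 0.01]
mean_c = [0.60, 0.66, 0.70, 0.71, 0.69]
se_c = [0.03, 0.03, 0.02, 0.02, 0.02]
lam_min, lam_1se = select_lambdas(lambdas, mean_c, se_c)
```

The 1-SE choice trades a small loss in cross-validated performance for a stronger penalty and therefore fewer non-zero coefficients.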
Figure 3. Kaplan–Meier survival analyses according to the deep learning signature for patients in the training and external validation cohorts. The signature was significantly associated with OS in the training cohort, and this association was confirmed in the external validation cohort.
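For readers unfamiliar with the curves in this figure, the Kaplan–Meier estimator multiplies, at each observed death time, the fraction of at-risk patients who survive it. A minimal sketch with toy follow-up data (hypothetical, not from the study):

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time.

    times: follow-up time per patient (e.g. days)
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Toy follow-up data in days; 0 marks a censored patient.
times = [100, 200, 200, 300, 400, 500]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Running this once per risk group gives the two curves that a log-rank test would then compare.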
OS and Death Rate in High-Risk and Low-Risk Groups
| Parameter | Training Cohort: High-Risk Group | Training Cohort: Low-Risk Group | Training Cohort: Total | Validation Cohort: High-Risk Group | Validation Cohort: Low-Risk Group | Validation Cohort: Total |
|---|---|---|---|---|---|---|
| No. of patients | 115 | 116 | 231 | 39 | 38 | 77 |
| 3-year OS (days) | | | | | | |
| Median† | 1177 (620–1897) | 1606 (1044–2082) | 1314 (781.5–2029.5) | 1705 (726.5–2126) | 2006 (1572–2280) | 1940 (1296–2268) |
| Mean | 1317 | 1598 | 1458.5 | 1556.7 | 1911 | 1732 |
| No. of deaths (%) | | | | | | |
| At 1-year | 9 (7.83) | 3 (2.59) | 12 (5.19) | 6 (15.38) | 1 (2.63) | 7 (9.09) |
| At 2-year | 20 (17.39) | 4 (3.45) | 24 (10.39) | 10 (25.64) | 3 (7.89) | 13 (16.88) |
| At 3-year | 27 (23.48) | 6 (5.17) | 33 (14.29) | 14 (35.90) | 3 (7.89) | 17 (22.08) |
Note: †Data in parentheses are interquartile ranges.
Abbreviation: OS, overall survival.
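The values in parentheses next to the death counts are consistent with cumulative deaths divided by the number of patients in the group, expressed as a percentage. A short check of that reading against the table's own counts (the helper name is hypothetical):

```python
def death_rate_percent(deaths, group_size):
    """Cumulative deaths as a percentage of the group, to two decimals."""
    return round(100.0 * deaths / group_size, 2)


# Counts at 1, 2 and 3 years, taken from the table.
training_high = [death_rate_percent(d, 115) for d in (9, 20, 27)]
training_low = [death_rate_percent(d, 116) for d in (3, 4, 6)]
validation_high = [death_rate_percent(d, 39) for d in (6, 10, 14)]
validation_low = [death_rate_percent(d, 38) for d in (1, 3, 3)]
```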
Figure 4. Receiver operating characteristic (ROC) curve of the deep learning signature for 3-year OS estimation. The blue line represents the ROC curve of the training set. The red line represents the ROC curve of the external validation set.
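The area under an ROC curve equals the probability that a randomly chosen patient with the event receives a higher score than a randomly chosen patient without it. A minimal sketch of that pairwise form, assuming 3-year status is treated as a plain binary label and censoring is ignored (time-dependent ROC methods handle censoring, and the caption does not say which was used); the data are hypothetical:

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy signature scores; label 1 = died within 3 years.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```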
Figure 5. The constructed deep learning signature-based nomogram. The nomogram was developed in the training set, with the deep learning signature, TNM stage, lymphatic vessel invasion and differentiation grade.
Figure 6. The performance of the nomogram. (A and B) Receiver operating characteristic (ROC) curves for the nomogram (A) and the clinical model (B) show the predictive accuracy of each model, in terms of the area under the curve (AUC), for predicting 3-year OS in the training and validation cohorts. (C and D) Calibration curves for the nomogram show the agreement between the estimated and observed 1-, 2- and 3-year outcomes in the training cohort (C) and the validation cohort (D). (E) Decision curve analysis for the nomogram. The nomogram had a higher net benefit than the clinical model, the deep learning signature and simple strategies such as follow-up of all patients (grey line) or no patients (horizontal black line) across most of the range of threshold probabilities.
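The decision curve in panel E plots net benefit against the threshold probability. A minimal sketch of the standard net-benefit formula and the "all patients" reference strategy, assuming binary outcomes without censoring; the function names and toy predictions are hypothetical:

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy predicted risks and observed outcomes (1 = event).
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0]
nb_model = net_benefit(probs, outcomes, 0.5)
nb_all = net_benefit_treat_all(outcomes, 0.5)
nb_none = 0.0  # acting on no one has zero net benefit by definition
```

Evaluating these functions over a grid of thresholds and plotting the results gives the kind of curve shown in the panel.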