Sebastian Starke1,2,3, Stefan Leger4,5,6, Alex Zwanenburg4,5,6, Karoline Leger4,5,6,7, Fabian Lohaus4,5,6,7, Annett Linge4,5,6,7, Andreas Schreiber8, Goda Kalinauskaite9,10, Inge Tinhofer9,10, Nika Guberina11,12, Maja Guberina11,12, Panagiotis Balermpas13,14, Jens von der Grün13,14, Ute Ganswindt15,16,17,18, Claus Belka15,16,17, Jan C Peeken15,19,20, Stephanie E Combs15,19,20, Simon Boeke21,22, Daniel Zips21,22, Christian Richter4,5,7,23, Esther G C Troost4,5,6,7,23, Mechthild Krause4,5,6,7,23, Michael Baumann4,5,6,7,23,24, Steffen Löck4,5,7.
Abstract
For treatment individualisation of patients with locally advanced head and neck squamous cell carcinoma (HNSCC) treated with primary radiochemotherapy, we explored the capabilities of different deep learning approaches for predicting loco-regional tumour control (LRC) from treatment-planning computed tomography images. Based on multicentre cohorts for exploration (206 patients) and independent validation (85 patients), multiple deep learning strategies including training of 3D- and 2D-convolutional neural networks (CNN) from scratch, transfer learning and extraction of deep autoencoder features were assessed and compared to a clinical model. Analyses were based on Cox proportional hazards regression and model performances were assessed by the concordance index (C-index) and the model's ability to stratify patients based on predicted hazards of LRC. Among all models, an ensemble of 3D-CNNs achieved the best performance (C-index 0.31) with a significant association to LRC on the independent validation cohort. It performed better than the clinical model including the tumour volume (C-index 0.39). Significant differences in LRC were observed between patient groups at low or high risk of tumour recurrence as predicted by the model ([Formula: see text]). This 3D-CNN ensemble will be further evaluated in a currently ongoing prospective validation study once follow-up is complete.
Year: 2020 PMID: 32973220 PMCID: PMC7518264 DOI: 10.1038/s41598-020-70542-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Patient characteristics of the exploratory and independent validation cohort: p values were obtained by using two-sided Mann–Whitney U-tests for continuous variables and homogeneity tests for categorical variables.
| Variable | Exploratory cohort (n = 206), median | (Range) | Independent validation cohort (n = 85), median | (Range) | p value |
|---|---|---|---|---|---|
| Follow-up time of patients alive (months) | 52.62 | (4.27–131.91) | 42.55 | (7.85–107.27) | 0.72 |
| Age (years) | 59.00 | (39.20–84.50) | 55.00 | (37.00–76.00) | 0.023 |
| Primary tumour volume (cm³) | 29.13 | (4.52–321.74) | 40.62 | (2.70–239.07) | 0.039 |
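The two-sided Mann–Whitney U-test named in the table caption can be illustrated with a minimal sketch. The function name `mann_whitney_u` is hypothetical, and the p value uses a plain normal approximation without tie or continuity correction, which is an assumption made for brevity and not necessarily the procedure used by the authors.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U-test via the normal approximation.

    Returns (U statistic of sample x, approximate p value).
    Assumption: no tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the x value exceeds the y value; ties count 0.5.
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    # Mean and standard deviation of U under the null hypothesis.
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p

# Hypothetical ages for two small cohorts (illustrative values only).
u_stat, p_value = mann_whitney_u([59.0, 61.5, 70.2, 48.3], [55.0, 52.1, 44.8, 60.0])
```

For cohort sizes such as 206 and 85 patients the normal approximation is usually adequate; for very small samples an exact test would be preferable.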
Figure 1. Design of the analysis. (i) To provide baseline results, a clinical Cox proportional hazards model (CPHM) was trained on the exploratory cohort and evaluated on the independent validation cohort. (ii)–(iv) Three deep learning approaches were evaluated by training convolutional neural networks in a cross-validation approach. Subsequently, for each approach, ensembles were constructed from the models obtained during cross-validation and their performance was evaluated on the independent validation cohort.
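The ensemble step of the design can be sketched as a per-patient average over the predictions of the cross-validation models, which is how the table captions in this record describe it. The function name `ensemble_prediction` and the input layout (one list of predictions per model, all in the same patient order) are assumptions for illustration.

```python
def ensemble_prediction(model_predictions):
    """Average the predictions of several models for each patient.

    model_predictions: list of lists, one inner list per cross-validation
    model, each holding one prediction per patient in the same order.
    """
    n_models = len(model_predictions)
    n_patients = len(model_predictions[0])
    if any(len(p) != n_patients for p in model_predictions):
        raise ValueError("all models must predict for the same patients")
    # The ensemble output for a patient is the mean over all models.
    return [
        sum(p[i] for p in model_predictions) / n_models
        for i in range(n_patients)
    ]

# Three hypothetical models predicting for two patients.
ensemble = ensemble_prediction([[0.2, -0.4], [0.4, -0.2], [0.3, -0.3]])
```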
Figure 2. Architecture used when training a 2D-convolutional neural network from scratch. Numbers give shapes of computed feature maps. The network consists of convolutional filters (‘conv’, light orange) with ReLU activation functions (orange). These are followed by a flattening layer and fully-connected dense layers (‘fc’, green). Network output is computed through a tanh activation (purple). (a) This architecture was used when training only on image data. The model output is given by [Formula: see text]. (b) An additional dense layer was introduced when clinical features were used in addition to image data. The network output in this case is given by [Formula: see text].
Figure 3. Architecture of the applied autoencoder. Numbers describe the shapes of computed feature maps. Convolutional layers (‘conv’) consist of convolutional filters (light orange) and Leaky ReLU ([Formula: see text]) activation functions (orange). Spatial downsampling is performed using max-pooling layers (red), resulting in a set of bottleneck features. Upsampling operations (‘up’, blue) and convolutional layers are then used to reconstruct the input image. A sigmoid activation (purple) is used as model output to match the range of the input data.
Ensemble training from scratch: C-indices for the endpoint loco-regional control (LRC) are computed by averaging the model predictions of the repeated cross-validation models to build an ensemble model.
| Model | Final activation | Batch normalisation | C-index, exploratory cohort (training) | C-index, exploratory cohort (internal test) | C-index, independent validation cohort | Log-rank p value |
|---|---|---|---|---|---|---|
| 3D-CNN | tanh | Yes | 0.02 (0.01–0.03) | 0.39 (0.33–0.46) | **0.31** (0.22–0.39) | |
| 2D-CNN | linear | No | 0.02 (0.01–0.02) | 0.43 (0.36–0.49) | 0.39 (0.29–0.49) | 0.039 |
| 2D-CNN | linear | Yes | 0.01 (0.00–0.02) | 0.43 (0.36–0.49) | 0.38 (0.27–0.48) | 0.015 |
| 2D-CNN | tanh | No | 0.07 (0.05–0.09) | 0.42 (0.36–0.48) | 0.38 (0.28–0.48) | 0.051 |
| 2D-CNN | tanh | Yes | 0.01 (0.01–0.02) | 0.42 (0.36–0.48) | 0.38 (0.27–0.48) | 0.015 |
| 2D-CNN + volume | tanh | Yes | 0.06 (0.04–0.08) | 0.47 (0.41–0.53) | 0.40 (0.29–0.50) | 0.070 |
Values in parentheses denote 95% confidence intervals. In addition, differences in LRC between Kaplan–Meier curves of the stratified patient groups are assessed by the log-rank test. Best performance is marked in bold.
C-index, concordance index; tanh, hyperbolic tangent.
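The log-rank test used to compare the stratified patient groups can be sketched in a few lines. This is a minimal two-group version under stated assumptions: the function name `logrank_test` is hypothetical, events are coded as 1 (loco-regional recurrence) and 0 (censored), and the p value comes from the chi-square distribution with one degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    # Distinct time points at which at least one event occurred.
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t in each group.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t in each group.
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for a single patient
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p
```

A small p value indicates that the Kaplan–Meier curves of the two groups differ more than expected by chance.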
Figure 4. Ensemble Kaplan–Meier curves: Kaplan–Meier curves for patient groups at low risk (blue) and high risk (orange) of loco-regional recurrence for training and internal test folds as well as for the independent validation cohort. The stratification was created using the median of the training ensemble predictions as cutoff. The top row shows the curves obtained from an ensemble of 3D-CNN models trained from scratch based on the architecture of Hosny et al.[23] with tanh as final activation. The centre row shows the curves obtained from an ensemble of 2D-CNN models trained from scratch without batch normalisation and with tanh as final activation. The bottom row shows the curves obtained from an ensemble of transfer learning models based on DenseNet201 with the last convolutional layer as foundation.
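The stratification rule in this caption, a cutoff at the median of the training ensemble predictions, can be sketched as follows. The function name is hypothetical, and two details are assumptions: larger predictions are taken to mean higher risk, and predictions equal to the cutoff are assigned to the low-risk group.

```python
from statistics import median

def stratify_by_training_median(train_predictions, new_predictions):
    """Split patients into risk groups with a cutoff learned on training data.

    Returns (cutoff, list of "low"/"high" labels for new_predictions).
    """
    # The cutoff is fixed on the training ensemble predictions only, so the
    # validation patients do not influence it.
    cutoff = median(train_predictions)
    groups = ["high" if p > cutoff else "low" for p in new_predictions]
    return cutoff, groups

# Hypothetical ensemble predictions for training and validation patients.
cutoff, groups = stratify_by_training_median([1, 2, 3, 4], [2, 3])
```

Fixing the cutoff on the training data and transferring it unchanged is what keeps the evaluation on the independent validation cohort free of information leakage.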
Ensemble of transfer learning models: C-indices for the endpoint loco-regional control (LRC) are computed by averaging the model predictions of the repeated cross-validation models to build an ensemble model.
| Architecture | Layer name | C-index, exploratory cohort (training) | C-index, exploratory cohort (internal test) | C-index, independent validation cohort | Log-rank p value |
|---|---|---|---|---|---|
| ResNet50 | last | 0.06 (0.04–0.07) | 0.37 (0.31–0.42) | 0.39 (0.30–0.48) | 0.17 |
| ResNet50 | activation_37 | 0.14 (0.11–0.17) | 0.39 (0.33–0.44) | 0.41 (0.31–0.51) | 0.15 |
| DenseNet201 | last | 0.05 (0.04–0.06) | 0.39 (0.33–0.45) | (0.27–0.47) | 0.041 |
| DenseNet201 | conv4_block48 | 0.12 (0.10–0.15) | 0.43 (0.37–0.50) | 0.43 (0.33–0.53) | 0.032 |
| IRNV2 | last | 0.08 (0.06–0.10) | 0.38 (0.32–0.44) | 0.41 (0.31–0.52) | 0.25 |
| IRNV2 | block17_10_ac | 0.26 (0.22–0.31) | 0.41 (0.36–0.47) | 0.42 (0.32–0.53) | |
Values in parentheses denote 95% confidence intervals. In addition, differences in LRC between Kaplan–Meier curves of the stratified patient groups are assessed by the log-rank test. Best performance is marked in bold.
C-index, concordance index; IRNV2, InceptionResNetV2.
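To make the C-index values in these tables easier to read, here is a minimal sketch of a concordance index. It assumes a convention in which concordance is measured between the predicted hazard and the observed time to event, so that 0.5 corresponds to random predictions and values closer to 0 indicate better models. That reading is an assumption, although it matches the abstract, where a C-index of 0.31 is described as better than 0.39. The function name is hypothetical.

```python
def concordance_index(times, events, hazards):
    """Concordance between predicted hazards and observed event times.

    A pair (i, j) is comparable if patient i had an event (events[i] == 1)
    strictly before the observed time of patient j. The pair counts as
    concordant if the hazard ordering matches the time ordering.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if hazards[i] < hazards[j]:
                    concordant += 1.0  # earlier event but lower predicted hazard
                elif hazards[i] == hazards[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Under this convention, a model that assigns higher hazards to patients with earlier recurrences produces few concordant pairs and therefore a C-index near 0.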
Ensemble of autoencoder models: C-indices for the endpoint loco-regional control (LRC) are computed by averaging the model predictions of the repeated cross-validation models to build an ensemble model.
| Feature selection + ML algorithm | C-index, exploratory cohort (training) | C-index, exploratory cohort (internal test) | C-index, independent validation cohort | Log-rank p value |
|---|---|---|---|---|
| - + LCPHM | 0.01 (0.00–0.01) | 0.50 (0.43–0.57) | (0.32–0.53) | |
| PCA(1) + CPHM | 0.49 (0.42–0.56) | 0.53 (0.47–0.60) | 0.54 (0.42–0.66) | 0.63 |
| PCA(2) + CPHM | 0.47 (0.40–0.54) | 0.51 (0.44–0.58) | 0.53 (0.42–0.64) | |
| PCA(5) + CPHM | 0.44 (0.37–0.50) | 0.50 (0.44–0.57) | 0.50 (0.41–0.60) | 0.72 |
| PCA(10) + CPHM | 0.35 (0.29–0.40) | 0.42 (0.36–0.48) | 0.43 (0.33–0.53) | 0.40 |
Values in parentheses denote 95% confidence intervals. In addition, differences in LRC between Kaplan–Meier curves of the stratified patient groups are assessed by the log-rank test. Best performance is marked in bold.
C-index, concordance index; ML, machine learning; CPHM, Cox proportional hazards model; LCPHM, Lasso-Cox proportional hazards model; PCA, principal component analysis.
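Fitting a Cox proportional hazards model amounts to minimising the negative partial log-likelihood, and the same quantity is commonly used as a loss when a network predicts a log-hazard. The sketch below evaluates that loss for given predictions. It is an illustration under assumptions: the function name is hypothetical, tied event times are handled in the Breslow manner, and this record does not state the exact loss or fitting software used by the authors.

```python
import math

def cox_negative_log_partial_likelihood(times, events, log_hazards):
    """Negative Cox partial log-likelihood (Breslow handling of ties).

    times: observed follow-up times
    events: 1 if the event was observed, 0 if censored
    log_hazards: predicted log-hazard per patient (e.g. a linear predictor)
    """
    nll = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only contribute through risk sets
        # Risk set: all patients still under observation at time t_i.
        risk_sum = sum(
            math.exp(h) for t, h in zip(times, log_hazards) if t >= t_i
        )
        nll -= log_hazards[i] - math.log(risk_sum)
    return nll

# Hypothetical example: the patient with the earlier event gets the higher hazard.
loss = cox_negative_log_partial_likelihood([1.0, 2.0], [1, 1], [1.0, -1.0])
```

Predictions that rank patients with earlier events as higher risk yield a lower loss than predictions with the reverse ordering.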