| Literature DB >> 35264661 |
Shekoufeh Gorgi Zadeh, Charlotte Behning, Matthias Schmid.
Abstract
With the popularity of deep neural networks (DNNs) in recent years, many researchers have proposed DNNs for the analysis of survival data (time-to-event data). These networks learn the distribution of survival times directly from the predictor variables without making strong assumptions on the underlying stochastic process. In survival analysis, it is common to observe several types of events, also called competing events. The occurrences of these competing events are usually not independent of one another and have to be incorporated in the modeling process in addition to censoring. In classical survival analysis, a popular method to incorporate competing events is the subdistribution hazard model, which is usually fitted using weighted Cox regression. In the DNN framework, only a few architectures have been proposed to model the distribution of time to a specific event in a competing-events situation. These architectures are characterized by a separate subnetwork/pathway per event, leading to large networks with huge numbers of parameters that may become difficult to train. In this work, we propose a novel imputation strategy for data preprocessing that incorporates weights derived from a time-discrete version of the classical subdistribution hazard model. With this, it is no longer necessary to add multiple subnetworks to the DNN to handle competing events. Our experiments on synthetic and real-world datasets show that DNNs with multiple subnetworks per event can simply be replaced by a DNN designed for a single-event analysis without loss in accuracy.
Year: 2022 PMID: 35264661 PMCID: PMC8907249 DOI: 10.1038/s41598-022-07828-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
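The subdistribution weighting mentioned in the abstract can be illustrated with a short sketch. In the Fine-Gray-style scheme, an individual who experiences a competing event at time s remains in the risk set afterwards with weight G(t)/G(s), where G is the Kaplan-Meier estimate of the censoring survival function. The code below is a minimal illustration under these assumptions; the function names and the exact convention for G (e.g. left-continuity) are mine, not the paper's:

```python
import numpy as np

def km_censoring(times, events):
    """Kaplan-Meier estimate of the censoring survival function G.
    `events == 0` marks censoring; returns a callable G(t)."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    uniq = np.unique(times[events == 0])   # distinct censoring times
    surv, s = [], 1.0
    for t in uniq:
        at_risk = np.sum(times >= t)
        d = np.sum((times == t) & (events == 0))
        s *= 1.0 - d / at_risk
        surv.append(s)
    surv = np.array(surv)

    def G(t):
        idx = np.searchsorted(uniq, t, side="right") - 1
        return 1.0 if idx < 0 else surv[idx]
    return G

def subdist_weight(t, comp_time, G):
    """Fine-Gray-style weight at time t for an individual whose
    competing event occurred at `comp_time`: 1 up to the competing
    event, then G(t) / G(comp_time)."""
    return 1.0 if t <= comp_time else G(t) / G(comp_time)
```

In a time-discrete version, such weights are evaluated on a grid of interval endpoints rather than in continuous time.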
Figure 1. Illustration of the imputation strategy. The left panel presents the subdistribution times of eight randomly sampled individuals. Individuals 1 and 5 experienced the competing event first, implying that their censoring times are unobserved (as illustrated by the time spans in the right panel). For these individuals, censoring times are estimated by first calculating estimated subdistribution hazard weights (see upper right diagram). From these, the weight differences are calculated and used to sample censoring times, which are in turn used to impute the unobserved values. Note that the bars in the lower right panel correspond to the heights of the steps in the upper right panel.
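The sampling step described in the caption can be sketched as follows: the estimated weight function is non-increasing, so its step heights define a discrete probability distribution over the time grid from which censoring times can be drawn. This is a minimal sketch under that assumption; the function name and the normalization are mine, not taken from the paper:

```python
import numpy as np

def impute_censoring_time(weight_steps, time_grid, rng=None):
    """Sample one censoring time from the step heights of a
    non-increasing weight function.  `weight_steps[j]` is the
    estimated weight at `time_grid[j]`; the probability mass put
    on time j is the height of the step down at j."""
    rng = np.random.default_rng(rng)
    w = np.asarray(weight_steps, dtype=float)
    # step heights: w(t_{j-1}) - w(t_j), with w before the grid = 1
    drops = np.concatenate(([1.0], w[:-1])) - w
    drops = np.clip(drops, 0.0, None)
    probs = drops / drops.sum()
    j = rng.choice(len(w), p=probs)
    return time_grid[j]
```

Individuals with an observed competing event would receive such an imputed censoring time and could then be passed to a single-event network as censored observations.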
Figure 2. Visualization of the single-subnetwork and two-subnetwork DeepHit architectures used in the experiments.
Characteristics of the datasets used in the experiments.
| Dataset | Censoring rate | Type-1 rate | Type-2 rate | Training | Validation | Test |
|---|---|---|---|---|---|---|
| Simulated | – | – | – | 15,000 | 5000 | 10,000 |
| Simulated | – | – | – | 15,000 | 5000 | 10,000 |
| Simulated | – | – | – | 15,000 | 5000 | 10,000 |
| CRASH-2 | – | – | – | 9729 | 3256 | 6851 |
| CRASH-2 | – | – | – | 9729 | 3256 | 6851 |
| SEER | – | – | – | 60,898 | 24,361 | 36,539 |

The censoring-rate, type-1, and type-2 columns give the respective rates in the training/validation/test datasets. The three rightmost columns give the respective numbers of instances in the simulated, CRASH-2, and SEER breast cancer data. For CRASH-2, the event of interest is either death due to bleeding (upper row) or death due to any recorded cause (lower row).
Figure 3. Calibration plots obtained from the test data in Table 1, using the DeepHit architecture. Each plot presents the averaged type-1 cumulative incidence functions obtained from (i) training a single-subnetwork DeepHit with the preprocessed data (cyan), (ii) training a single-subnetwork DeepHit treating individuals with observed competing events as censored (orange), and (iii) training a two-subnetwork DeepHit for both the event of interest and the competing event (gray). Red curves refer to the nonparametric Aalen-Johansen reference curves.
Figure 4. Calibration plots obtained from the simulated test data in Table 1 using the DRSA architecture. Each plot presents the averaged type-1 cumulative incidence functions obtained from (i) training the DRSA network with the preprocessed training data (cyan) and (ii) training DRSA treating individuals with observed competing events as censored (orange). Red curves refer to the nonparametric Aalen-Johansen reference curves.
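The Aalen-Johansen reference curves in the calibration plots are nonparametric estimates of the cause-specific cumulative incidence function, F_k(t) = Σ_{t_i ≤ t} Ŝ(t_i−) · d_{k,i} / n_i, where Ŝ is the all-cause Kaplan-Meier survival, n_i the at-risk count, and d_{k,i} the number of cause-k events at t_i. A compact sketch (variable names are mine):

```python
import numpy as np

def aalen_johansen_cif(times, events, cause=1):
    """Aalen-Johansen cumulative incidence for `cause`.
    `events`: 0 = censored, 1, 2, ... = event type.
    Returns (unique event times, CIF evaluated at those times)."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    uniq = np.unique(times[events > 0])   # distinct event times
    surv, cif, out = 1.0, 0.0, []         # surv tracks S(t-)
    for t in uniq:
        at_risk = np.sum(times >= t)
        d_cause = np.sum((times == t) & (events == cause))
        d_any = np.sum((times == t) & (events > 0))
        cif += surv * d_cause / at_risk   # mass added at t
        surv *= 1.0 - d_any / at_risk     # update all-cause survival
        out.append(cif)
    return uniq, np.array(out)
```

Unlike one minus the Kaplan-Meier estimator applied to the event of interest alone, this estimator correctly accounts for the competing event removing individuals from the risk of a type-1 event.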
Mean estimated C-indices (averaged over time) with estimated standard deviations, as obtained from training the DeepHit architecture on the simulated, CRASH-2, and SEER breast cancer data.
| Data | Type-1 rate | Type-2 rate | DeepHit, subdist.-based imp. | DeepHit, two subnetworks | DeepHit, no imp. |
|---|---|---|---|---|---|
| CRASH-2 | – | – | 78.17 ± 1.04 | 76.80 ± 4.96 | – |
| CRASH-2 | – | – | 79.88 ± 2.01 | 80.05 ± 4.23 | – |
| SEER | – | – | 81.75 ± 3.46 | 81.73 ± 3.34 | – |
| Simulated | – | – | 62.58 ± 2.17 | 63.71 ± 0.96 | – |
| Simulated | – | – | 64.59 ± 2.25 | 65.20 ± 3.26 | – |
| Simulated | – | – | 64.97 ± 2.51 | 64.39 ± 6.26 | – |
DeepHit, subdist.-based imp. = DeepHit architecture with one subnetwork, trained with the preprocessed input data; DeepHit, two subnetworks = DeepHit architecture with two subnetworks; DeepHit, no imp. = DeepHit architecture with one subnetwork, trained on the original input data (treating individuals with observed competing events as censored individuals). Note that the C-indices must be compared within each row, as the datasets used for training differed in terms of size, censoring, and event rates across the rows. For CRASH-2, the event of interest in the upper and lower rows is death due to bleeding and death due to any recorded cause, respectively. The numbers in this table were obtained from the test datasets.
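The reported C-indices are time-dependent concordance estimates. As a toy illustration (not the exact estimator used in the paper, which is averaged over time and typically inverse-probability-of-censoring weighted), an unweighted version counts, among pairs where one individual experiences a type-1 event before the other's observed time, how often that individual also carries the higher predicted risk:

```python
import numpy as np

def td_concordance(times, events, risk, cause=1):
    """Unweighted concordance for competing risks.  `risk[i]` is a
    (time-constant, for simplicity) predicted type-`cause` risk
    score.  A pair (i, j) is comparable if i has a `cause` event
    strictly before j's observed time; it is concordant if i also
    has the higher risk score.  Assumes at least one comparable
    pair exists."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    risk = np.asarray(risk, float)
    conc = comp = 0
    for i in range(len(times)):
        if events[i] != cause:
            continue
        for j in range(len(times)):
            if times[j] > times[i]:
                comp += 1
                conc += int(risk[i] > risk[j])
    return conc / comp
```

A value of 0.5 corresponds to random ordering, 1.0 to perfect ranking of type-1 event times by predicted risk.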
Mean estimated C-indices (averaged over time) with estimated standard deviations, as obtained from training the DRSA architecture on the simulated data.
| Data | Type-1 rate | Type-2 rate | DRSA, subdist.-based imp. | DRSA, no imp. |
|---|---|---|---|---|
| Simulated | – | – | 55.62 ± 0.86 | – |
| Simulated | – | – | 57.60 ± 0.93 | – |
| Simulated | – | – | 63.41 ± 1.00 | – |
DRSA, subdist.-based imp. = DRSA architecture trained with the preprocessed input data; DRSA, no imp. = DRSA architecture trained on the original input data (treating individuals with observed competing events as censored individuals). Note that the C-indices must be compared within each row, as the datasets used for training differed in terms of censoring and event rates across the rows. The numbers in this table were obtained from the test datasets.
Average time (in seconds) and number of iterations needed for training the single-subnetwork (subdist.-based imp.) and two-subnetwork DeepHit architectures per dataset.
| Architecture | Simulated: Time | Simulated: #itr | SEER: Time | SEER: #itr | CRASH-2: Time | CRASH-2: #itr |
|---|---|---|---|---|---|---|
| DeepHit, subdist.-based imp. | – | – | – | – | – | – |
| DeepHit, two subnetworks | – | – | – | – | – | – |
Performance on validation data was used as the stopping criterion.