W A C van Amsterdam, J J C Verhoeff, P A de Jong, T Leiner, M J C Eijkemans.
Abstract
Deep learning has shown remarkable results for image analysis and is expected to aid individual treatment decisions in health care. Treatment recommendations are predictions with an inherently causal interpretation. To use deep learning for these applications in the setting of observational data, deep learning methods must be made compatible with the required causal assumptions. We present a scenario with real-world medical images (CT scans of lung cancer) and simulated outcome data. Through the data simulation scheme, the images contain two distinct factors of variation that are both associated with survival, but represent a collider (tumor size) and a prognostic factor (tumor heterogeneity), respectively. If a deep network used all the information available in the image to predict survival, it would condition on the collider and thereby bias the estimate of the treatment effect. We show that when this collider can be quantified, unbiased individual prognosis predictions are attainable with deep learning. This is achieved by (1) setting a dual task for the network to predict both the outcome and the collider and (2) enforcing a form of linear independence of the activation distributions of the last layer. Our method provides an example of combining deep learning and structural causal models to achieve unbiased individual prognosis predictions. Extensions of machine learning methods for applications to causal questions are required to attain the long-standing goal of personalized medicine supported by artificial intelligence.
Keywords: Computed tomography; Computer science; Epidemiology; Prognosis
Year: 2019 PMID: 31840093 PMCID: PMC6904461 DOI: 10.1038/s41746-019-0194-x
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Fig. 1 Directed acyclic graph (DAG) describing the data-generating mechanism for the simulations. Signs indicate positive or negative associations. Rectangle-shaped variables are image variables; dashed variables are unobserved. Tumor aggressiveness and patient fitness cannot be directly measured. They represent biological processes that cause the outcome and the image patterns. We cannot directly observe these biological processes, but the image variables are noisy views of them that are measurable from the image. Tumor size is a collider since it is the child of tumor aggressiveness and patient fitness. Conditioning on tumor size will induce an artificial association between tumor aggressiveness and patient fitness, thereby opening a confounding path between treatment and survival that only exists when conditioning on the collider.
Parameters for sampling images and modeling outcome data.
| Variable | Variable model | |
|---|---|---|
| Aggressiveness | ||
| Fitness | ||
| Heterogeneity | ||
| Size | ||
| Treatment | ||
| Survival | ||
For each observation, an image is drawn from the total pool of images with the closest tumor size and tumor heterogeneity. This ensures the required association between factors of variation in the image and the simulated outcome data. The parametric equations follow the DAG presented in Fig. 1, and the noise terms are continuous, independent variables. The collider (tumor size) is the difference between the two unobserved variables, tumor aggressiveness and patient fitness, plus a small amount of Gaussian noise. The standard deviations of these two parent variables are chosen so that the collider has a fixed standard deviation. Treatment is modeled as a Bernoulli variable with a logistic link function, where increased patient fitness increases the probability of being treated; a constant is subtracted so that a fixed proportion of patients is treated. Gaussian noise is added to the log-odds of being treated so that every patient has some probability of receiving the more intense treatment. This better reflects clinical practice, as some patients may have strong preferences regarding their treatment regardless of their underlying health status. Overall survival increases with treatment and decreases with heterogeneity in radiodensity and with tumor aggressiveness. Again, Gaussian noise is added to introduce some uncertainty in the data.
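The data-generating mechanism described above can be illustrated with a minimal tabular sketch that leaves out the images entirely. All numeric choices here (unit variances, a noise scale of 0.1 on the collider, a treatment effect of 1.0, no offset in the treatment model, heterogeneity drawn as an independent variable) are assumptions made for illustration and are not the parameters of the original simulation. The function names `simulate` and `ols_treatment_coef` are hypothetical.

```python
import numpy as np


def simulate(n=100_000, effect=1.0, seed=0):
    """Draw a toy data set that follows the DAG in words (assumed parameters)."""
    rng = np.random.default_rng(seed)
    u1 = rng.normal(size=n)  # tumor aggressiveness (unobserved)
    u2 = rng.normal(size=n)  # patient fitness (unobserved)
    z = rng.normal(size=n)   # tumor heterogeneity (prognostic factor)
    # Collider: difference of the two unobserved variables plus small noise.
    x = u1 - u2 + 0.1 * rng.normal(size=n)
    # Treatment: Bernoulli with logistic link; fitter patients are treated more often.
    log_odds = u2 + rng.normal(size=n)
    t = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))
    # Survival: increases with treatment, decreases with heterogeneity and aggressiveness.
    y = effect * t - z - u1 + rng.normal(size=n)
    return t, x, z, y


def ols_treatment_coef(y, t, *covariates):
    """Ordinary least squares coefficient of treatment, given extra covariates."""
    design = np.column_stack([np.ones_like(y), t, *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


t, x, z, y = simulate()
ate_valid = ols_treatment_coef(y, t, z)      # adjusts for the prognostic factor only
ate_biased = ols_treatment_coef(y, t, z, x)  # also conditions on the collider
print(f"ATE without collider: {ate_valid:.2f}")
print(f"ATE with collider:    {ate_biased:.2f}")
```

In this toy setup the estimate that adjusts only for heterogeneity stays close to the assumed effect, while adding the collider to the regression pulls the treatment coefficient downward, which is the qualitative pattern the scenario is built to expose.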
Fig. 2 Schematic overview of the proposed convolutional neural network (CNN) architecture. The network receives two inputs: an image and the treatment indicator. Loss functions are depicted in double octagons. The last-layer activations are used to separate factors of variation in the image: one part of the last layer is trained to approximate the measurement of the collider, and the rest of the last-layer activations are constrained to be linearly independent of it through a dedicated loss term. The total loss combines the individual loss terms.
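As a rough illustration of how a linear-independence constraint on last-layer activations could be expressed as a loss term, the sketch below penalizes the squared Pearson correlation between the collider activation and every other activation. The source only states that "a form of linear independence" is enforced, so this particular penalty, the unweighted sum in `total_loss`, and the names `linear_dependence_penalty`, `total_loss`, and `weight` are assumptions, not the authors' formulation. NumPy is used for brevity; a trainable version would need a differentiable framework.

```python
import numpy as np


def linear_dependence_penalty(h_collider, h_rest):
    """Sum of squared Pearson correlations between the collider activation
    (shape [n]) and each remaining last-layer activation (shape [n, k])."""
    hc = h_collider - h_collider.mean()
    hr = h_rest - h_rest.mean(axis=0)
    cov = hc @ hr / len(hc)                       # shape [k]
    denom = hc.std() * hr.std(axis=0) + 1e-8      # guard against zero variance
    return float(np.sum((cov / denom) ** 2))


def total_loss(y, y_hat, x, h_collider, h_rest, weight=1.0):
    """Assumed combination: outcome MSE + collider MSE + weighted penalty."""
    mse_y = np.mean((y - y_hat) ** 2)
    mse_x = np.mean((x - h_collider) ** 2)
    return float(mse_y + mse_x + weight * linear_dependence_penalty(h_collider, h_rest))


rng = np.random.default_rng(0)
h_c = rng.normal(size=10_000)
independent = rng.normal(size=(10_000, 3))
dependent = independent.copy()
dependent[:, 0] = h_c  # one activation copies the collider activation

penalty_independent = linear_dependence_penalty(h_c, independent)
penalty_dependent = linear_dependence_penalty(h_c, dependent)
print(penalty_independent, penalty_dependent)
```

The penalty is near zero when the remaining activations carry no linear information about the collider activation and grows toward the number of fully correlated activations otherwise.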
Main results.
| Model | Variables | MSEy | ATE |
|---|---|---|---|
| Regression | | 2.74 | 1.02 |
| Regression | | 1.39 | 0.65 |
| Regression* | | 1.99 | 1.00 |
| BiasedNet | | 1.83 | 0.66 |
| CausalNet | | 2.23 | 1.02 |
Mean squared error for survival (MSEy) along with the estimated average treatment effect (ATE). The linear regression metrics are the expected outcomes according to whether or not the model conditions on the collider. Regression* gives the optimal values for our setup: (1) predicting the outcome based on relevant prognostic information from the image while (2) retaining a valid estimate of the treatment effect. All metrics were calculated on the validation set.
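The source does not spell out how the ATE is computed for a model that takes the treatment indicator as an input. One common approach, shown here purely as an assumption, is to predict the outcome for every observation under both treatment values and average the difference. The names `average_treatment_effect` and `toy_predict` are hypothetical, and `toy_predict` stands in for a fitted model.

```python
import numpy as np


def average_treatment_effect(predict, features):
    """Mean difference between predictions with everyone treated and no one treated."""
    n = len(features)
    treated = predict(features, np.ones(n))
    control = predict(features, np.zeros(n))
    return float(np.mean(treated - control))


def toy_predict(features, t):
    """Stand-in for a fitted model: additive treatment effect of 1.0 (assumed)."""
    return 1.0 * t - features[:, 0]


features = np.random.default_rng(0).normal(size=(1000, 1))
ate = average_treatment_effect(toy_predict, features)
print(f"estimated ATE: {ate:.2f}")
```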
Sensitivity analysis to measuring the collider on the wrong scale.
| Model | Actual | Measured | MSEy | ATE |
|---|---|---|---|---|
| Regression* | Area | Area | 1.99 | 1.00 |
| CausalNet | Area | Area | 2.23 | 1.02 |
| CausalNet | Diameter | Volume | 2.24 | 0.99 |
| CausalNet | Volume | Diameter | 2.21 | 1.02 |
Mean squared error for survival (MSEy) along with the estimated average treatment effect (ATE). The Regression* results indicate the optimal results attainable for this simulated scenario.