Ladislav Rampášek, Daniel Hidru, Petr Smirnov, Benjamin Haibe-Kains, Anna Goldenberg.
Abstract
MOTIVATION: Individualized drug response prediction is a fundamental part of personalized medicine for cancer. Great effort has been made to discover biomarkers and to develop machine learning methods for accurate drug response prediction in cancers. Incorporating prior knowledge of biological systems into these methods is a promising avenue to improve prediction performance. High-throughput cell line assays of drug-induced transcriptomic perturbation effects are a form of prior knowledge that has not yet been fully incorporated into drug response prediction models.
Year: 2019 PMID: 30850846 PMCID: PMC6761940 DOI: 10.1093/bioinformatics/btz158
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. An overview of the Dr.VAE prediction process. In training, Dr.VAE learns a drug response classifier jointly with a latent representation of pre-treatment gene expression and its drug-induced change. To make a prediction, we first embed the pre-treatment gene expression and then, from this latent representation, predict the latent representation of the post-treatment state. Based on both latent representations, a logistic regression classifier predicts the probability of positive response. Additionally, we can decode the predicted post-treatment latent representation to the gene expression data space, but this is not required for drug response classification.
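The prediction path described in the Fig. 1 caption (embed, predict the post-treatment latent state, classify on both latent states) can be sketched in a few lines. This is only an illustrative sketch: the linear encoder, the residual linear latent shift, the names `x1`, `z1`, `z2`, and all weights are assumptions made for the example, not the trained Dr.VAE networks, which are stochastic neural encoders and decoders.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def predict_response(x1, W_enc, W_pert, w_clf, b_clf):
    """Map pre-treatment expression to a probability of positive response."""
    z1 = matvec(W_enc, x1)                    # embed pre-treatment expression
    delta = matvec(W_pert, z1)                # assumed drug-induced shift in latent space
    z2 = [a + d for a, d in zip(z1, delta)]   # predicted post-treatment latent state
    features = z1 + z2                        # the classifier sees both latent states
    logit = sum(w * f for w, f in zip(w_clf, features)) + b_clf
    return sigmoid(logit), z1, z2

# Toy, made-up parameters: 3 genes, 2 latent dimensions.
W_enc = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
W_pert = [[0.1, 0.0], [0.0, -0.1]]
w_clf = [1.0, -1.0, 0.5, 0.5]
b_clf = 0.0

prob, z1, z2 = predict_response([1.0, 2.0, 3.0], W_enc, W_pert, w_clf, b_clf)
print(f"P(positive response) = {prob:.3f}")
```

Note that the decoder back to gene expression space does not appear anywhere in this path, which matches the caption's remark that decoding is optional for classification.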
Fig. 2. Dr.VAE model and its derivatives. (a) Factorization of the generative distribution p (solid edges) and of the approximate posterior q (dashed edges). If the post-treatment gene expression is not observed, we use the expected posterior instead. (b, c) Hyperparameters of the generative and inference models, respectively. Node labels show the dimensionality of the corresponding random variables, while edge labels show the architecture of the encoders/decoders between the respective random variables. Note that the ‘data decoder’ is shared between the pre- and post-treatment gene expression, as is the ‘data encoder’. (d) Detailed depiction of the data-to-latent-space encoder and of the reparameterization trick. (e) Factorization of the SSVAE model (Kingma ); we set the hyperparameters of the generative and inference distributions equivalently to the analogous distributions in Dr.VAE, as shown in (b, c, d). (f) Factorization of the PertVAE model; we set the hyperparameters of the generative and inference distributions equivalently to the analogous distributions in Dr.VAE (b, c, d).
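The reparameterization trick mentioned in panel (d) can be illustrated with a small sampler. This sketch assumes a diagonal Gaussian posterior parameterized by a mean and a log-variance, which is the common VAE convention but is not spelled out in the caption; the function name and the toy values are hypothetical. It shows only the sampling step; in a real model the benefit is that gradients flow through `mu` and `log_var` because the randomness is isolated in `eps`.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps, eps ~ N(0, 1)."""
    sample = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)    # standard deviation from log-variance
        eps = rng.gauss(0.0, 1.0)     # noise that does not depend on the parameters
        sample.append(m + sigma * eps)
    return sample

rng = random.Random(0)                # fixed seed so the sketch is repeatable
mu = [0.0, 1.0]                       # made-up encoder outputs
log_var = [0.0, -2.0]

draws = [reparameterize(mu, log_var, rng) for _ in range(20000)]
mean_est = [sum(d[i] for d in draws) / len(draws) for i in range(len(mu))]
print("empirical means:", mean_est)
```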
Fig. 3. Summarized classification results. (a) AUROC of Dr.VAE and baseline methods. Shown is the average over 26 drugs, each evaluated on 100 train-validation-test splits. (b) Dr.VAE is comparable to or better than every baseline for >80% of the drugs (P-value <0.05, Wilcoxon test).
Fig. 4. All-to-all comparison of tested methods. For each method, there is a row showing the number of drugs (out of 26) for which this method significantly outperforms the method corresponding to each column. The comparison is based on test AUROC performance in 100 train-validation-test splits. Statistical significance of observed differences in test performance for any two methods was tested by a one-sided Wilcoxon signed-rank test (P-value <0.05). The heatmap color is normalized within each column, emphasizing the methods that are the strongest contenders against the method corresponding to that column.
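The win counts summarized in Fig. 4 can be reproduced in spirit with a paired one-sided Wilcoxon signed-rank test applied per drug and per ordered pair of methods. The sketch below is an assumption-laden illustration: it uses the normal approximation with tie correction and no continuity correction (the caption does not say which variant was used), and the method names, drug name, and AUROC values are made up.

```python
import math

def wilcoxon_greater_pvalue(a, b):
    """One-sided paired Wilcoxon signed-rank test that a tends to exceed b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]   # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:                                      # assign average ranks to ties
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    var = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    z = (w_plus - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))        # upper-tail normal probability

def win_counts(auroc, alpha=0.05):
    """auroc[method][drug] is a list of per-split test AUROCs (same splits for all methods)."""
    methods = list(auroc)
    counts = {m: {o: 0 for o in methods if o != m} for m in methods}
    for m in methods:
        for o in methods:
            if o == m:
                continue
            for drug in auroc[m]:
                if wilcoxon_greater_pvalue(auroc[m][drug], auroc[o][drug]) < alpha:
                    counts[m][o] += 1
    return counts

# Made-up AUROCs for one hypothetical drug over 30 splits.
a = [0.70 + 0.002 * (i % 7) for i in range(30)]
b = [x - 0.01 - 0.001 * (i % 3) for i, x in enumerate(a)]
counts = win_counts({"method_A": {"drug_1": a}, "method_B": {"drug_1": b}})
print(counts)
```

Each row of `counts` corresponds to one heatmap row: the number of drugs on which the row method significantly beats the column method.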
The ability of Dr.VAE to model post-treatment gene expression correlates with signal/noise ratio and quantity of perturbation experiments
| Δ RMSE evaluated on | Dataset property correlated to | Pearson correlation | |
|---|---|---|---|
| | Effect/rep. variance ratio (ERVR) | 0.66 | |
| | ERVR | 0.72 | |
| | Num. unique CLs in CMap (NCL) | 0.71 | |
| | NCL | 0.52 | |
| | ERVR * NCL | 0.81 | |
| | ERVR * NCL | 0.73 | |
| | Dr.VAE-SSVAE [AUROC] | 0.29 | 0.15 |
| | Dr.VAE-SSVAE [AUPR] | 0.20 | 0.33 |
Note: We computed the Δ RMSE improvement of Dr.VAE in post-treatment expression prediction over Dr.VAE w/I, averaged over validation data splits, and correlated it with overall CMap-L1000v1 dataset statistics. The Pearson correlation was computed for the prediction Δ improvement of both the post-treatment gene expression and its latent representation. Additionally, we include the correlation with the difference between Dr.VAE and SSVAE classification performance.
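The correlation analysis described in the note amounts to computing a Pearson coefficient between a per-drug improvement and a per-drug dataset statistic. A minimal sketch follows, assuming one value per drug for each quantity; the variable names and every number in the toy data are invented for illustration and are not taken from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up per-drug statistics for four hypothetical drugs.
ervr = [0.5, 1.0, 1.5, 2.0]               # effect/replicate variance ratio
ncl = [2, 4, 3, 6]                        # number of unique cell lines in CMap
delta_rmse = [0.01, 0.03, 0.035, 0.08]    # per-drug RMSE improvement

ervr_times_ncl = [e * n for e, n in zip(ervr, ncl)]
r_ervr = pearson_r(delta_rmse, ervr)
r_ncl = pearson_r(delta_rmse, ncl)
r_product = pearson_r(delta_rmse, ervr_times_ncl)
print(r_ervr, r_ncl, r_product)
```

The product row in the table corresponds to `ervr_times_ncl` here: the two dataset properties are multiplied per drug before correlating.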