Cheng Qian, Amin Emad, Nicholas D Sidiropoulos.
Abstract
The biological processes involved in a drug's mechanisms of action are often dynamic, complex and difficult to discern. Time-course gene expression data is a rich source of information that can be used to unravel these complex processes, identify biomarkers of drug sensitivity and predict the response to a drug. However, the majority of previous work has not fully utilized this temporal dimension. In these studies, the gene expression data is considered at either one time-point (before the administration of the drug) or two time-points (before and after the administration of the drug). This is clearly inadequate for modeling dynamic gene-drug interactions, especially for applications such as long-term drug therapy. In this work, we present a novel REcursive Prediction (REP) framework for drug response prediction that takes advantage of time-course gene expression data. Our goal is to predict drug response values at every stage of a long-term treatment, given the expression levels of genes collected at the previous time-points. To this end, REP employs a built-in recursive structure that exploits the intrinsic time-course nature of the data and integrates past values of drug responses for subsequent predictions. It also incorporates tensor completion, which can not only alleviate the impact of noise and missing data, but also predict unseen gene expression levels (GEXs). These advantages enable REP to estimate drug response at any stage of a given treatment from some GEXs measured at the beginning of the treatment. Extensive experiments on two datasets corresponding to multiple sclerosis patients treated with interferon are included to showcase the effectiveness of REP.
Year: 2020 PMID: 33077880 PMCID: PMC7573611 DOI: 10.1038/s41598-020-74725-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Sketch view of the proposed method. (a) Step (1) shows the raw data with missing values marked in black; Step (2) shows the low-rank tensor factorization; Step (3) shows the completion of the missing values. (b) Composition of the training data: features and labels. (c) Prediction for a new patient at a specific time t.
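The composition of training data and the recursive prediction step described in panels (b) and (c) can be illustrated with a minimal sketch. It is not the authors' implementation: the exact feature layout is not given here, so the sketch assumes that the feature vector for time t concatenates the GEX measured at time t-1 with the last `n_lags` response values, and that predicted responses are fed back as inputs for later time points. The names `build_training_rows`, `predict_recursively` and `n_lags` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def build_training_rows(
    gex: Sequence[Sequence[Sequence[float]]],
    responses: Sequence[Sequence[float]],
    n_lags: int = 1,
) -> Tuple[List[Vector], List[float]]:
    """Turn per-patient time courses into (features, labels) rows.

    gex[p][t] holds the gene expression values of patient p at time t and
    responses[p][t] the drug response at time t. ASSUMPTION: the features
    for time t are the GEX at t-1 plus the previous n_lags responses.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    features: List[Vector] = []
    labels: List[float] = []
    for patient_gex, patient_resp in zip(gex, responses):
        for t in range(n_lags, len(patient_resp)):
            past = list(patient_resp[t - n_lags:t])
            features.append(list(patient_gex[t - 1]) + past)
            labels.append(patient_resp[t])
    return features, labels


def predict_recursively(
    model: Callable[[Vector], float],
    patient_gex: Sequence[Sequence[float]],
    initial_responses: Sequence[float],
    n_lags: int = 1,
) -> List[float]:
    """Predict responses for a new patient, reusing earlier predictions.

    `model` is any callable mapping a feature vector to a response value
    (a stand-in for a trained classifier or regressor).
    """
    if n_lags < 1 or len(initial_responses) < n_lags:
        raise ValueError("need at least n_lags known initial responses")
    history = list(initial_responses)
    for t in range(len(history), len(patient_gex)):
        x = list(patient_gex[t - 1]) + history[-n_lags:]
        history.append(model(x))  # the prediction becomes a future input
    return history
```

Any trained predictor can be passed as `model`; the returned list starts with the known initial responses and continues with one prediction per remaining time point.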
Figure 2. Choosing F and for the tensor completion algorithm.
ACC & AUC versus percentage of missing values, where all metrics are obtained via LOO CV on the 27 patients with seven time points in Dataset1.
| Metrics | Missing values (%) | REP-SVM | dl-HMM | EN-LR | SVM | KNN |
|---|---|---|---|---|---|---|
| ACC | 0.23 | 0.925 | 0.851 | 0.882 | 0.898 | 0.719 |
| | 5 | 0.912 | 0.818 | 0.865 | 0.857 | 0.701 |
| | 10 | 0.886 | 0.809 | 0.878 | 0.855 | 0.685 |
| | 15 | 0.876 | 0.818 | 0.844 | 0.837 | 0.721 |
| | 20 | 0.872 | 0.781 | 0.830 | 0.825 | 0.718 |
| AUC | 0.23 | 0.971 | 0.875 | 0.927 | 0.968 | 0.413 |
| | 5 | 0.963 | 0.769 | 0.923 | 0.940 | 0.472 |
| | 10 | 0.951 | 0.744 | 0.926 | 0.934 | 0.461 |
| | 15 | 0.942 | 0.723 | 0.875 | 0.907 | 0.454 |
| | 20 | 0.941 | 0.710 | 0.878 | 0.886 | 0.497 |
ACC and AUC comparison using all patients’ data in Dataset1, where the training set contains the 27 patients with seven time points and the testing set consists of the remaining 26 patients with fewer than seven time points.
| Metrics | REP-SVM | dl-HMM | EN-LR | SVM | KNN |
|---|---|---|---|---|---|
| ACC | 0.790 | 0.651 | 0.751 | 0.775 | 0.549 |
| AUC | 0.884 | 0.687 | 0.805 | 0.853 | 0.547 |
Percentage of missing values in the training set is 0.23% and that in the testing set is 18.1%.
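For reference, the two classification metrics reported above can be computed as in the following minimal sketch. It assumes that ACC denotes classification accuracy and AUC the area under the ROC curve, as is conventional; the function names and the pairwise (Mann-Whitney) formulation of AUC are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample (label 1)
    receives a higher score than a randomly chosen negative one (label 0),
    counting ties as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of a few dozen patients such as the ones in these tables.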
ACC & AUC for different imputation methods using Dataset1, where the training set contains the 27 patients with seven time points and the testing set consists of the remaining 26 patients with fewer than seven time points.
| Metrics | Imputation method | REP-SVM | EN-LR | SVM |
|---|---|---|---|---|
| ACC | Tensor completion | 0.790 | 0.751 | 0.775 |
| | Median | 0.724 | 0.713 | 0.721 |
| | Mean | 0.736 | 0.706 | 0.722 |
| | KNN | 0.738 | 0.723 | 0.721 |
| AUC | Tensor completion | 0.884 | 0.805 | 0.853 |
| | Median | 0.838 | 0.743 | 0.812 |
| | Mean | 0.835 | 0.739 | 0.812 |
| | KNN | 0.837 | 0.748 | 0.798 |
Percentage of missing values in the training set is 0.23% and that in the testing set is 18.1%.
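The mean and median baselines in the comparison above are simple column-wise fills. The sketch below shows one plausible form, assuming that each column is one feature (for example, one gene at one time point), that missing entries are encoded as `None`, and that the fill value is computed from the observed entries of the same column. How the baselines were actually configured is not stated here, and `impute_columns` is a hypothetical helper.

```python
from statistics import mean, median
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def impute_columns(rows: Sequence[Row], strategy: str = "mean") -> List[List[float]]:
    """Replace None entries with the mean or median of their column."""
    aggregators = {"mean": mean, "median": median}
    if strategy not in aggregators:
        raise ValueError("strategy must be 'mean' or 'median'")
    aggregate = aggregators[strategy]

    n_cols = len(rows[0])
    fills: List[float] = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        fills.append(aggregate(observed))

    # Keep observed values untouched; fill only the gaps.
    return [
        [fills[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]
```

Unlike tensor completion, fills of this kind ignore any structure shared across patients, genes and time points, since every gap in a column receives the same value.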
Figure 3. Top 20 genes in Dataset1 selected by REP-SVM according to their weights in (12), learned from the Dataset1 training set.
Regression performance comparison using Dataset1 and Dataset2.
| Dataset | Method | MAE | MSE | Method | MAE | MSE |
|---|---|---|---|---|---|---|
| Dataset1 | REP-ElasticNet | 0.919 | 1.504 | ElasticNet | 1.208 | 2.683 |
| | REP-KNN | 0.839 | 1.305 | KNN | 1.345 | 3.236 |
| | REP-RandomForest | 0.809 | 1.098 | RandomForest | 1.294 | 2.946 |
| | REP-SVR | 0.727 | 0.891 | SVR | 1.282 | 3.110 |
| Dataset2 | REP-ElasticNet | 0.934 | 1.237 | ElasticNet | 1.007 | 1.447 |
| | REP-KNN | 0.658 | 0.780 | KNN | 1.129 | 1.798 |
| | REP-RandomForest | 0.930 | 1.265 | RandomForest | 1.095 | 1.753 |
| | REP-SVR | 0.660 | 0.792 | SVR | 1.146 | 1.897 |
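The regression metrics in the table above follow the standard definitions of mean absolute error (MAE) and mean squared error (MSE). A minimal sketch, with illustrative function names:

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average of |true - predicted|."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the average of (true - predicted) squared."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because MSE squares each error, a few large deviations raise it much more than they raise MAE, which is worth keeping in mind when comparing the two columns.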
Regression performance comparison using estimated and actual GEX values on Dataset1 and Dataset2.
| Dataset | Method | Estimated GEX: MAE | Estimated GEX: MSE | Actual GEX: MAE | Actual GEX: MSE |
|---|---|---|---|---|---|
| Dataset1 | REP-ElasticNet | 1.131 | 2.423 | 0.919 | 1.504 |
| | REP-KNN | 1.087 | 2.281 | 0.839 | 1.305 |
| | REP-RandomForest | 0.893 | 1.513 | 0.809 | 1.098 |
| | REP-SVR | 1.105 | 2.049 | 0.727 | 0.891 |
| Dataset2 | REP-ElasticNet | 0.918 | 1.335 | 0.934 | 1.237 |
| | REP-KNN | 0.852 | 1.176 | 0.658 | 0.780 |
| | REP-RandomForest | 0.813 | 1.148 | 0.930 | 1.265 |
| | REP-SVR | 0.84 | 1.144 | 0.660 | 0.792 |
In the “estimated GEX” columns, only the test-set GEX values at the initial time point were used, and the GEX values at later time points were estimated. In the “actual GEX” columns, the test-set GEX values at all time points were used.