Kexin Qiu, JoongHo Lee, HanByeol Kim, Seokhyun Yoon, Keunsoo Kang.
Abstract
Although many models have been proposed in recent years to accurately predict drug response in cell lines, understanding how the genome relates to drug response is also key to realizing precision medicine in oncology. In this paper, based on cancer cell line gene expression and drug response data, we established a reliable and accurate drug response prediction model and found predictor genes for some drugs of interest. To this end, we first pre-selected genes based on the Pearson correlation coefficient and then used an ElasticNet regression model for drug response prediction and fine gene selection. To find a more reliable set of predictor genes, we performed regression twice for each drug, once with IC50 and once with area under the curve (AUC) (or activity area). For the 12 drugs we tested, the predictive performance in terms of Pearson correlation coefficient exceeded 0.6, and the highest was for 17-AAG, for which the Pearson correlation coefficient was 0.811 for IC50 and 0.81 for AUC. We identified common predictor genes for IC50 and AUC, with which the performance was similar to that obtained with genes found separately for IC50 and AUC, but with a much smaller number of predictor genes. Using only common predictor genes, the highest performance was for AZD6244 (0.8016 for IC50, 0.7945 for AUC) with 321 predictor genes.
Keywords: cell line gene expression data; drug response prediction; machine learning; predictor genes
Year: 2021 PMID: 33840174 PMCID: PMC8042299 DOI: 10.5808/gi.20076
Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X
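The pre-selection step described in the abstract can be illustrated with a minimal sketch. It assumes that genes are ranked by the absolute Pearson correlation between their expression and the drug response and that a fixed number of top-ranked genes is kept; the abstract does not state the actual cut-off, so `top_k`, the function names, and the toy data below are hypothetical. The ElasticNet stage that follows pre-selection is not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant vector: correlation is undefined, treat as 0
    return sxy / sqrt(sxx * syy)

def preselect_genes(expression, response, top_k):
    """Rank genes by |PCC| with the drug response and keep the top_k.

    expression: dict of gene name -> expression values, one per cell line
    response:   drug response values (e.g. IC50 or AUC), same cell-line order
    top_k:      assumed cut-off; the source does not give the real criterion
    """
    scores = {gene: pearson(values, response) for gene, values in expression.items()}
    ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
    return ranked[:top_k], scores

# Toy data, made up for illustration: 5 cell lines, 4 hypothetical genes.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with the response
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls with the response
    "GENE_C": [2.0, 2.0, 2.0, 2.0, 2.0],  # constant, uninformative
    "GENE_D": [1.0, 3.0, 2.0, 5.0, 1.0],  # only weakly related
}
ic50 = [0.9, 2.1, 2.9, 4.2, 5.0]

selected, scores = preselect_genes(expression, ic50, top_k=2)
print(selected)
```

Ranking by the absolute value keeps both positively and negatively correlated genes, which matters because a gene can mark either sensitivity or resistance.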
Fig. 1. Experimental workflow. GDSC, Genomics of Drug Sensitivity in Cancer; AUC, area under the curve.
Fig. 2. A comparison of three regression methods in terms of Pearson’s correlation coefficients (PCC) between the predicted IC50 and the measured ones. GDSC, Genomics of Drug Sensitivity in Cancer.
Comparisons of the PCC between the estimated response and the true value for the 12 drugs in GDSC
| Drug name | No. of features (IC50) | PCC (IC50) | No. of features (AUC) | PCC (AUC) |
|---|---|---|---|---|
| 17-AAG | 566 | 0.811 | 520 | 0.81 |
| AZD-0530 | 262 | 0.612 | 214 | 0.702 |
| AZD6244 | 570 | 0.823 | 551 | 0.792 |
| Erlotinib | 253 | 0.603 | 222 | 0.60 |
| Lapatinib | 261 | 0.698 | 213 | 0.625 |
| Nilotinib | 475 | 0.782 | 340 | 0.839 |
| Nutlin-3a | 475 | 0.819 | 310 | 0.783 |
| PD-0325901 | 570 | 0.775 | 520 | 0.742 |
| PD-0332991 | 527 | 0.743 | 432 | 0.671 |
| PHA-665752 | 224 | 0.635 | 155 | 0.522 |
| PLX4720 | 499 | 0.715 | 348 | 0.705 |
| Sorafenib | 297 | 0.619 | 248 | 0.647 |
PCC, Pearson’s correlation coefficient; GDSC, Genomics of Drug Sensitivity in Cancer; AUC, area under the curve.
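The number of features varies by drug because ElasticNet performs gene selection while fitting: its L1 term drives some coefficients exactly to zero. This section does not give the authors' exact parametrization, so the following is only the commonly used form of the objective, with assumed symbols: $y_i$ is the response (IC50 or AUC) of cell line $i$, $x_i$ its pre-selected gene expression vector, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the mix between the L1 and L2 penalties.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}\;
    \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
    + \lambda \left( \alpha \lVert \beta \rVert_1
    + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right)
```

Under this reading, the "No. of features" for a drug would be the count of genes $j$ with $\hat{\beta}_j \neq 0$.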
Comparisons of the PCC of the predicted IC50 and AUC with those reported in literature [6]
| Drug name | No. of features | PCC of the predicted IC50 | PCC of the predicted AUC | Existing prediction results of IC50 [6] |
|---|---|---|---|---|
| 17-AAG | 260 | 0.795 | 0.785 | - |
| AZD-0530 | 80 | 0.547 | 0.591 | 0.58 |
| AZD6244 | 321 | 0.8016 | 0.7945 | 0.6 |
| Erlotinib | 43 | 0.505 | 0.562 | 0.590 |
| Lapatinib | 229 | 0.588 | 0.61 | 0.585 |
| Nilotinib | 184 | 0.745 | 0.799 | - |
| Nutlin-3a | 198 | 0.764 | 0.742 | - |
| PD-0325901 | 234 | 0.742 | 0.728 | 0.8 |
| PD-0332991 | 195 | 0.707 | 0.688 | - |
| PHA-665752 | 48 | 0.468 | 0.359 | 0.35 |
| PLX4720 | 171 | 0.643 | 0.654 | 0.57 |
| Sorafenib | 244 | 0.595 | 0.583 | 0.38 |
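The results show that the proposed method performs better than the existing results for most of the drugs with a reported value: of the eight drugs with an IC50 PCC reported in [6], the PCC of the predicted IC50 is higher for five (AZD6244, Lapatinib, PHA-665752, PLX4720, and Sorafenib) and lower for three (AZD-0530, Erlotinib, and PD-0325901).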
PCC, Pearson’s correlation coefficient; AUC, area under the curve.
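The common predictor genes mentioned in the abstract can be sketched as a set intersection. This assumes that a predictor gene is one with a non-zero coefficient in the fitted regression, and that the common set is the intersection of the IC50 and AUC sets; both are assumptions about the procedure, and the gene names and coefficient values below are invented for illustration.

```python
def predictor_genes(coefficients, tol=0.0):
    """Return the genes whose fitted coefficient is non-zero (|coef| > tol)."""
    return {gene for gene, coef in coefficients.items() if abs(coef) > tol}

# Hypothetical fitted coefficients from two separate regressions for one drug.
coef_ic50 = {"GENE_A": 0.42, "GENE_B": -0.10, "GENE_C": 0.0, "GENE_D": 0.07}
coef_auc = {"GENE_A": -0.35, "GENE_B": 0.0, "GENE_C": 0.0, "GENE_D": -0.02, "GENE_E": 0.11}

# Keep only genes selected by both the IC50 model and the AUC model.
common = predictor_genes(coef_ic50) & predictor_genes(coef_auc)
print(sorted(common))
```

Because the intersection can only be as large as the smaller of the two sets, it gives a more compact gene list, which is consistent with the smaller feature counts reported for the common genes.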
Fig. 3. Heatmap for the predictor genes of the four selected drugs: for lapatinib (A), for nilotinib (B), for PD-0332991 (C), and for sorafenib (D). The type abbreviation S stands for “sensitive” and R for “resistant.”