Tianxing Ma, Qiao Liu, Haochen Li, Mu Zhou, Rui Jiang, Xuegong Zhang.
Abstract
BACKGROUND: Drug resistance is a critical obstacle in cancer therapy. Understanding cancer drug response is important for improving anti-cancer treatment and guiding anti-cancer drug design. Abundant genomic and drug response resources for cancer cell lines provide unprecedented opportunities for such studies. However, cancer cell lines cannot fully reflect heterogeneous tumor microenvironments. Transferring knowledge learned from in vitro cell lines to single-cell and clinical data is a promising direction for better understanding drug resistance. Most current studies include single nucleotide variants (SNVs) as features and focus on improving the predictive ability of cancer drug response models on cell lines. However, SNVs cannot be obtained reliably from clinical tumor samples or single-cell data, which makes it difficult to generalize such SNV-based models to clinical tumor data or single-cell studies.
Keywords: Cancer drug response; Graph convolutional networks; Protein–protein interactions; Tumor heterogeneity
Year: 2022 PMID: 35428192 PMCID: PMC9011932 DOI: 10.1186/s12859-022-04664-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Overview of DualGCN. DualGCN takes the chemical structure of a drug and the gene features of a cancer sample as inputs to the (1) drug-GCN module and (2) bio-GCN module, respectively. It outputs the response (IC50) of the given cancer sample to the given drug. (1) In the drug-GCN module, drug chemical structure data are first transformed using a previously published algorithm [29]. The transformed features serve as the features of nodes (atoms). Edges between nodes represent connections between atoms of the drug. (2) The bio-GCN module is built on PPI networks, where nodes indicate cancer-related proteins (genes) and edges represent interactions between proteins. This module takes gene expression and copy number variation of cancer-related genes as inputs, and these gene features serve as the features of the corresponding nodes. Embeddings from the two GCN modules are then concatenated and fed into an MLP to predict cancer drug response.
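The data flow described in the caption (two graph encoders, concatenation, a prediction head) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the symmetric adjacency normalization, the single graph-convolution layer per module, the mean pooling, and the single linear output unit standing in for the MLP are all assumptions, and every function and parameter name below is hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 (assumed normalization, with self-loops)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj_norm, feats, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    h = matmul(matmul(adj_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in h]

def mean_pool(node_embeddings):
    """Average node embeddings into one graph-level vector (assumed readout)."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]

def predict_response(drug_graph, bio_graph, w_drug, w_bio, w_out):
    """Embed both graphs, concatenate, and apply one linear unit (stand-in for the MLP)."""
    drug_adj, drug_feats = drug_graph  # nodes are atoms
    bio_adj, bio_feats = bio_graph     # nodes are genes; features are [expression, CNV]
    drug_vec = mean_pool(gcn_layer(normalized_adjacency(drug_adj), drug_feats, w_drug))
    bio_vec = mean_pool(gcn_layer(normalized_adjacency(bio_adj), bio_feats, w_bio))
    joint = drug_vec + bio_vec  # list concatenation of the two embeddings
    return sum(x * w for x, w in zip(joint, w_out))

# Toy inputs: a one-atom "drug" and two interacting genes (made-up numbers).
identity = [[1.0, 0.0], [0.0, 1.0]]
drug = ([[0.0]], [[1.0, -1.0]])
bio = ([[0.0, 1.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
score = predict_response(drug, bio, identity, identity, [1.0, 1.0, 1.0, 1.0])
```

In a trained model the weight matrices would be learned and the output head would have hidden layers; the sketch only shows how the two embeddings meet before prediction.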
Performance comparison
| Method | Pearson’s correlation | Spearman’s correlation | RMSE |
|---|---|---|---|
| SVM | 0.336 ± 0.078 | 0.230 ± 0.071 | 3.115 ± 0.053 |
| Random Forest | 0.864 ± 0.001 | 0.839 ± 0.003 | 1.441 ± 0.008 |
| Lasso | 0.893 ± 0.002 | 0.873 ± 0.002 | 1.284 ± 0.007 |
| Ridge | 0.895 ± 0.002 | 0.875 ± 0.002 | 1.268 ± 0.007 |
| DeepCDR (-) | 0.900 ± 0.004 | 0.877 ± 0.004 | 1.265 ± 0.020 |
| CDRscan | 0.911 ± 0.002 | 0.894 ± 0.002 | 1.173 ± 0.011 |
| DualGCN | 0.925 ± 0.001 | 0.907 ± 0.002 | 1.079 ± 0.007 |
| DeepCDR | 0.928 ± 0.001 | 0.910 ± 0.001 | 1.066 ± 0.004 |
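For readers unfamiliar with the three metrics in the table, the following standard-library sketch shows how each could be computed from paired true and predicted IC50 values. The function names and toy values are illustrative assumptions; the source does not state which implementation produced the reported numbers.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rmse(x, y):
    """Root mean squared error between true and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Toy values, not data from the study.
true_ic50 = [1.0, 2.0, 3.0, 4.0]
pred_ic50 = [1.5, 2.5, 3.5, 4.5]
metrics = (pearson(true_ic50, pred_ic50), spearman(true_ic50, pred_ic50), rmse(true_ic50, pred_ic50))
```

A constant offset between predictions and truth leaves both correlations at 1 while the RMSE stays nonzero, which is why the table reports the error alongside the correlations.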
Fig. 2 Performance of DualGCN across cancers and drugs. (a) Pearson’s correlation for each type of cancer. We calculated the average Pearson’s correlation coefficient over the samples belonging to each type of cancer and sorted the coefficients from largest to smallest (left to right in the figure). Blue dots indicate the mean Pearson’s correlation across cross-validation runs (CVs), and vertical blue bars represent its variance across CVs. The average sample size across CVs is also annotated. The largest and smallest Pearson’s correlation coefficients were obtained on lung squamous cell carcinoma (LUSC) and neuroblastoma (NB), respectively. (b) Scatterplot of true versus predicted IC50 on LUSC. (c) Scatterplot of true versus predicted IC50 on NB. (d) Pearson’s correlation for each drug. We calculated the average Pearson’s correlation coefficient over the samples belonging to each drug and sorted the coefficients from largest to smallest. The ten drugs on the left of the figure have the best predictive performance, and the ten on the right have the worst. Dots, bars, and sample sizes are shown as in (a). The largest and smallest Pearson’s correlation coefficients were obtained on CAY10603 and cetuximab, respectively. (e) Scatterplot of true versus predicted IC50 on CAY10603. (f) Scatterplot of true versus predicted IC50 on cetuximab.
Ablation study on gene features
| Gene features | Pearson’s correlation | Spearman’s correlation | RMSE |
|---|---|---|---|
| Expr. | 0.908 ± 0.005 | 0.887 ± 0.008 | 1.191 ± 0.031 |
| CNV | 0.911 ± 0.007 | 0.892 ± 0.007 | 1.172 ± 0.046 |
| Expr. + CNV | 0.925 ± 0.001 | 0.907 ± 0.002 | 1.079 ± 0.007 |