Literature DB >> 35428192

DualGCN: a dual graph convolutional network model to predict cancer drug response.

Tianxing Ma¹, Qiao Liu², Haochen Li³, Mu Zhou⁴, Rui Jiang¹, Xuegong Zhang^5,6.

Abstract

BACKGROUND: Drug resistance is a critical obstacle in cancer therapy. Discovering cancer drug response is important to improve anti-cancer drug treatment and guide anti-cancer drug design. Abundant genomic and drug response resources of cancer cell lines provide unprecedented opportunities for such study. However, cancer cell lines cannot fully reflect heterogeneous tumor microenvironments. Transferring knowledge studied from in vitro cell lines to single-cell and clinical data will be a promising direction to better understand drug resistance. Most current studies include single nucleotide variants (SNV) as features and focus on improving predictive ability of cancer drug response on cell lines. However, obtaining accurate SNVs from clinical tumor samples and single-cell data is not reliable. This makes it difficult to generalize such SNV-based models to clinical tumor data or single-cell level studies in the future.
RESULTS: We present a new method, DualGCN, a unified Dual Graph Convolutional Network model to predict cancer drug response. DualGCN encodes both chemical structures of drugs and omics data of biological samples using graph convolutional networks. Then the two embeddings are fed into a multilayer perceptron to predict drug response. DualGCN incorporates prior knowledge on cancer-related genes and protein-protein interactions, and outperforms most state-of-the-art methods while avoiding using large-scale SNV data.
CONCLUSIONS: The proposed method outperforms most state-of-the-art methods in predicting cancer drug response without the use of large-scale SNV data. These favorable results indicate its potential to be extended to clinical and single-cell tumor samples and advancements in precision medicine.

Entities: Chemical

Keywords: Cancer drug response; Graph convolutional networks; Protein–protein interactions; Tumor heterogeneity

Mesh：

Substances：
Antineoplastic Agents

Year: 2022 PMID： 35428192 PMCID： PMC9011932 DOI： 10.1186/s12859-022-04664-4

Source DB: PubMed Journal: BMC Bioinformatics ISSN： 1471-2105 Impact factor: 3.307

Background

Anti-cancer drugs have played important roles in cancer therapy in recent years. However, the occurrence of drug resistance limits the effectiveness of anti-cancer drugs [1]. It is essential to fully explore the cancer drug response (CDR) underlying comprehensive biological systems. Cancer drug response can be studied with cancer cell line models. Drug response on these models is quantitatively described by the half-maximal inhibitory concentration (IC50). The IC50 depicts the amount of drug needed to inhibit cancer cell growth by half. A smaller IC50 indicates that the drug is relatively more powerful. Comprehensive genetic and pharmacologic characterizations of cancer cell line models are collected by projects such as Cancer Cell Line Encyclopedia (CCLE) [2], Catalogue of Somatic Mutations in Cancer (COSMIC) [3], and Genomics of Drug Sensitivity in Cancer (GDSC) [4]. Such data enable researchers to develop predictive machine learning models of anti-cancer drug sensitivity [5-9]. These models consist of two parts that are responsible for encoding drugs and cell lines separately. Drugs are represented through one-hot encoding using simplified molecular-input line-entry system (SMILES) data [7, 8]. Genomic mutations have been reported to have significantly different patterns across cell lines [4]. They are widely used as features of cancer cell lines, and are encoded by models such as multilayer perceptrons (MLP) [7] and convolutional neural networks (CNN) [8, 9]. However, drug resistance could not be fully discovered using these in vitro cancer cell lines. It has been revealed that tumors are highly heterogeneous [10], and tumor microenvironments have essential influences on tumor progression [11-13]. Such heterogeneity and interaction could not be reflected with in vitro cancer cell lines only. Emerging single-cell data and clinical data show the potential to decipher complex tumor microenvironments and to unlock drug response [14-16]. Transferring knowledge studied from in vitro cancer cell lines to single-cell and clinical data is a promising avenue [14]. There are some limitations in current methods to be generalized to single-cell and clinical data. First, most existing methods include SNVs as features to improve the predictive ability on cancer cell lines. However, it has been revealed that calling SNVs reliably from cancer samples cannot always be reached. High-frequency genomic aberrations and aneuploidy are common in cancers, and these variations reduce SNV detection efficiency [17]. Similarly, detecting reliable SNVs covering all hotspots simultaneously from single-cell data is unattainable. Both sequencing coverage and sequencing depth in single-cell data are too low to detect SNVs completely from the data [18, 19]. Second, current methods encode gene features as separate units. However, recent evidence from single-cell studies shows that the tumor microenvironment is a complex system [11]. Tumor cells interact with surrounding cells. Such interactions form a biological network, and the whole ecosystems contribute to drug response simultaneously [20-22]. These inspired us to develop new methods without using SNVs as features and considering cancer samples as systems with interactions between proteins. In this paper, we propose a novel deep learning model called DualGCN. It consists of dual graph convolutional networks (GCN) [23] and takes drug structures and omics data as input to predict cancer drug response. One GCN module learns intrinsic chemical features of drugs. Nodes in this module represent atoms of drugs, and edges indicate connections between the atoms. Meanwhile, the other GCN module incorporates protein–protein interactions (PPI) and extracts the underlying biological features of cancer samples. Nodes in this module represent proteins, and edges indicate protein–protein interactions. In this study, we used gene expression and copy number variation as gene features. These features have been demonstrated to be vital to depict cancer cell types in recent single-cell studies [24-28]. We conducted extensive experiments and demonstrated that our method outperforms most state-of-the-art methods while avoiding the use of SNVs. In addition, we conducted a case study on clinical cancer patients with DualGCN and showed its potential to be extended to clinical and single-cell cancer samples.

Results and discussion

Overview of DualGCN

DualGCN takes chemical structure data of drugs and gene features of cancer samples as inputs and outputs drug response (IC50). The concept of DualGCN is shown in Fig. 1. The top panel of Fig. 1 is a GCN module (named drug-GCN below) used to encode the drug chemical structure. Nodes in this module represent atoms of drugs. Edges between nodes indicate connections between the atoms of drugs. Features of atoms are learned from the previous algorithm [29]. The bottom panel of Fig. 1 is another GCN module (named bio-GCN below) used to encode biological features of cancer samples. It is built on PPI networks and takes features of cancer-related genes as inputs. We used gene expression (Expr.) and copy number variation (CNV) as gene features in this study. These gene features were demonstrated to have important roles in decoding cancer cell types from recent studies [26-28]. Both GCN modules use ReLU as activation functions and adopt batch normalization [30] and dropout [31] strategies to improve model robustness. Two embeddings from the drug-GCN module and the bio-GCN module are then concatenated together to be fed into a multilayer perceptron to study the response of the given drug on the given cancer sample. Detailed settings of the model can be found in Additional file 1: Table S1.

Fig. 1

Overview of DualGCN. DualGCN takes chemical structure information of a drug and gene features of a cancer sample as inputs to the (1) drug-GCN module and (2) bio-GCN module, respectively. It outputs the response (IC50) of the given drug on the given cancer sample. (1) In the drug-GCN module, drug chemical structure data are first transformed using the previous algorithm [29]. The transformed features are considered as features of nodes (atoms). Edges between nodes represent connections between atoms of drugs. (2) The bio-GCN module is built based on PPI networks where nodes indicate cancer-related proteins (genes) and edges represent interactions between proteins. This module takes gene expression and copy number variation of cancer-related genes as inputs. Such gene features are considered as features of corresponding nodes. Embeddings from the two GCN modules are then concatenated and fed into MLP to study cancer drug response

Assessment of methods

We evaluated the performance of DualGCN as well as baselines including support vector machine (SVM), random forest, Lasso regression, ridge regression, CDRscan [7], and DeepCDR [8]. The evaluation was conducted on 86,530 drug-cell line pairs. These data included 208 drugs and 525 cell lines covering 27 kinds of cancers. Data preparation and configurations of baselines are described in the “Methods” section. The evaluation was conducted with five-fold cross-validation (CV). We used evaluation metrics including Pearson’s correlation coefficient, Spearman’s correlation coefficient, and root mean square error (RMSE). DualGCN achieves strong predictive performance without the use of SNVs. It gained Pearson’s correlation = 0.925, Spearman’s correlation = 0.907, and RMSE = 1.079. It significantly outperformed traditional methods, including SVM, random forest, Lasso regression, and ridge regression (Table 1). Detailed configurations and results of these methods can be found in Additional file 1: Table S5, Additional file 1: Table S6, and Additional file 1: Table S7. In addition, we also compared DualGCN with deep learning models. DualGCN had consistent improvements over CDRscan among all evaluation metrics. Improvements in Pearson’s correlation, Spearman’s correlation, and RMSE were 0.014, 0.013, and 0.094, respectively. DeepCDR gained higher predictive performance than DualGCN. The differences in Pearson’s correlation, Spearman’s correlation, and RMSE were 0.003, 0.003, and 0.013, respectively. Such differences needed huge SNV information. DeepCDR contains several sub-networks encoding multi-omics data. We evaluated its performance without SNV by removing the corresponding sub-network and denoted it by DeepCDR (-). Pearson’s correlation, Spearman’s correlation, and RMSE of DeepCDR (-) dropped to 0.900, 0.877, and 1.265, respectively. DualGCN gained a large margin over it without tens of thousands of SNVs. Improvements in Pearson’s correlation, Spearman’s correlation, and RMSE are 0.025, 0.030, and 0.186, respectively. There are two major reasons SNV data should be treated with caution. First, different projects collected SNVs in different patterns and used different references (human reference genome or normal tissues) in SNV calling algorithms. Thus, SNVs might not be aligned across data from different sources. Second, studying drug responses on in vitro cancer cell lines only cannot fully reveal the mechanisms of drug resistance. Transferring knowledge studied from in vitro cancer cell lines to single-cell and clinical data tends to be an important direction [14]. However, it is unreliable to call SNVs from clinical and single-cell tumor data covering all candidate loci [17-19]. In addition, recent evidence shows that whole tumors collectively act on drugs [12]. These studies gradually accumulate protein–protein interactions influencing cancer progression and drug response [13]. DeepCDR encodes different features of the same unit (gene) separately. It is difficult for such encoding systems to further include constantly discovered and important interacting protein pairs. DualGCN encodes genes as basic units. It achieves strong predictive performance without SNV data. Such advances indicate its potential to absorb new biological knowledge and to be generalized to studies on clinical data and at single-cell resolution.

Table 1

Performance comparison

Method	Pearson’s correlation	Spearman’s correlation	RMSE
SVM	0.336 ± 0.078	0.230 ± 0.071	3.115 ± 0.053
Random Forest	0.864 ± 0.001	0.839 ± 0.003	1.441 ± 0.008
Lasso	0.893 ± 0.002	0.873 ± 0.002	1.284 ± 0.007
Ridge	0.895 ± 0.002	0.875 ± 0.002	1.268 ± 0.007
DeepCDR (-)	0.900 ± 0.004	0.877 ± 0.004	1.265 ± 0.020
CDRscan	0.911 ± 0.002	0.894 ± 0.002	1.173 ± 0.011
DualGCN	0.925 ± 0.001	0.907 ± 0.002	1.079 ± 0.007
DeepCDR	0.928 ± 0.001	0.910 ± 0.001	1.066 ± 0.004

Performance comparison DualGCN achieves high performance across different types of cancers consistently. Pearson’s correlation coefficients on different cancers ranged from 0.942 to 0.893 (Fig. 2a). The highest and the lowest coefficients were obtained on lung squamous cell carcinoma (LUSC) and neuroblastoma (NB), respectively. Scatterplots of these two cases are shown in Fig. 2b and Fig. 2c. We also evaluated the performance across drugs. Pearson’s correlation coefficients for different drugs varied in a wide range from 0.861 to 0.132 (Fig. 2d). The highest and the lowest coefficients were obtained on CAY10603 and cetuximab, respectively. Scatterplots of these two cases are shown in Fig. 2e, f. We performed principal component analysis (PCA) on SMILEs of drugs. We observed that latent representations of CAY10603 and cetuximab were close in low-dimensional space. This result indicates that the structures of these two drugs are similar, though the prediction performances on these two drugs were significantly different (Additional file 1: Figure S2). In addition, we found that the IC50 of cetuximab was much higher than that of other drugs. These findings indicate that drugs with low prediction performances may be affected by their isolation of IC50 from the overall distribution.

Fig. 2

Performance of DualGCN across cancers and drugs. a Pearson’s correlation on each type of cancer. We calculated the average Pearson’s correlation coefficients of samples belonging to each type of cancer and sorted the coefficients from large to small (from left to right in the figure). Blue dots indicate the mean of Pearson’s correlation across CVs and are denoted by . Vertical blue bars represent variances of Pearson’s correlation across CVs. denotes average sample size across CVs. The largest and smallest Pearson’s correlation coefficients were obtained on lung squamous cell carcinoma (LUSC) and neuroblastoma (NB), respectively. b Scatterplot of correlations between true and predicted IC50 on LUSC. c Scatterplot of correlations between true and predicted IC50 on NB. d Pearson’s correlation on each drug. We calculated the average Pearson’s correlation coefficients of samples belonging to each drug and sorted the coefficients from large to small. The left ten in the figure are drugs with the best predictive performance, and the right ten are drugs with the worst predictive performance. Blue dots indicate the mean of Pearson’s correlation across CVs and are denoted by . Vertical blue bars represent variances of Pearson’s correlation across CVs. denotes average sample size across CVs. The largest and smallest Pearson’s correlation coefficients were obtained on CAY10603 and cetuximab, respectively. e Scatterplot of correlations between true and predicted IC50 on CAY10603. f Scatterplot of correlations between true and predicted IC50 on cetuximab

Ablation analysis

We conducted ablation studies to evaluate the effects of different gene features on DualGCN. We studied such effects by taking only one kind of features as the input. The results are shown in Table 2. CNV data contributed more than gene expression data to our model. In addition, simultaneously taking gene expression and CNV data gained higher predictive performance than single kind of features.

Table 2

Ablation study on gene features

	Pearson’s correlation	Spearman’s correlation	RMSE
Expr.	0.908 ± 0.005	0.887 ± 0.008	1.191 ± 0.031
CNV	0.911 ± 0.007	0.892 ± 0.007	1.172 ± 0.046
Expr. + CNV	0.925 ± 0.001	0.907 ± 0.002	1.079 ± 0.007

Ablation study on gene features

A case study on clinical cancer patients

We conducted a case study on clinical BRCA patients using the trained DualGCN model. Gene features and drug response annotations of patients were obtained from The Cancer Genome Atlas Program (TCGA) [32]. There is a noticeable difference in analyzing drug response from in vitro cancer cell lines and clinical cancer data. Drug response annotations of clinical cancer data are qualitatively described as grades. In contrast, responses on cancer cell lines are quantitatively depicted by the IC50. We first binarized the clinical drug response annotations of patients into “sensitive” and “resistant”. Such binary labels were considered as ground truth. Then, we predicted the drug responses of patients and calculated the corresponding drug sensitivity score (DSS). A high DSS indicates sensitivity, and a low DSS indicates resistance. Detailed descriptions of annotation transformation and definitions of the DSS are given in the “Methods” section. We set DSS on cancer samples as discrimination thresholds of the receiver operating characteristic (ROC) curve. We observed a modest consistency between the predicted drug responses and clinical annotations. The area of the curve (AUC) of the ROC curve was 0.661 (95% confidence interval: 0.558 to 0.765, shown in Additional file 1: Figure S3. Future studies may need to combine single-cell cancer data and cellular interactions to further decode cell-type composition and cancer drug resistance mechanisms.

Conclusions

Anti-cancer drugs have played important roles in cancer treatments. However, resistance to anti-cancer drugs continues to be a serious challenge. Studying drug response on tumors is essential to improve the treatment of cancers and guide anti-cancer drug design. Cancer cell line models have been widely used for such research. However, tumors are heterogeneous and consist of different cell types and complex interactions. Studying in vitro cancer cell lines only cannot fully decode the mechanisms of drug resistance. Emerging single-cell technologies are powerful toolkits to explore cell-type composition and cellular interactions in tumors. Transferring drug response knowledge obtained from cell line models to clinical and single-cell data is an important direction. Single nucleotide variants are widely used as features of cancer cell lines in current cancer drug response studies. However, detecting SNVs covering all candidate genomic loci from clinical tumor data is not always reliable, let alone from single-cell data. Such SNV-based models are hard to extend to studies on clinical data and at single-cell resolution. In this study, we developed a unified dual graph convolutional network model, DualGCN, to predict cancer drug response. DualGCN encodes both drugs and cancer samples using graph convolutional networks with protein–protein interactions embedded. We demonstrated that DualGCN gained high predictive abilities without the use of SNV data. Such advances indicate its potential to be further extended to clinical and single-cell data. Meanwhile, recent single-cell tumor studies have constantly discovered important interactions in tumors. DualGCN sets genes as units of the encoding system with links across them. Such structures make it easy to absorb newly discovered protein interactions essential to tumor progression and drug resistance. We organized a case study on analyzing clinical cancer samples using knowledge learned from cell line models, and observed a modest consistency between the predicted drug responses and clinical annotations. In addition, we notice limitations of the proposed method. Units of the module encoding cancer samples are genes. Thus, input features are at the gene level. Such structures provide a convenient interface to combine interacting protein pairs constantly discovered from cancer research. However, other non-gene level signals, such as histone modifications, are hard to encode into the module directly. In summary, we introduce a method, DualGCN, that achieves high predictive abilities on cancer drug response without using SNV data. The method could be extended to clinical and single-cell data and has the potential to promote the development of precision medicine.

Methods

Drug and cell line data preparation

Drug data were downloaded from the GDSC (version: GDSC1) [4]. We only kept drugs that were recorded in PubChem [33]. In addition, drugs sharing the same PubChem identifiers but owning different GDSC identifiers were also filtered out. Finally, we collected 208 drugs. Detailed descriptions of these drugs can be found in Additional file 1: Table S2. We then transformed drug chemical structure data to obtain feature vectors of atoms of drugs using the previous algorithm [29]. Dimension of these feature vectors was . It has been proved that these feature vectors reflect the intrinsic properties of drugs, such as atom type, atom connectivity, and degrees of freedom. Gene features of cancer cell lines were downloaded from CCLE (version: 19Q2) [2]. We filtered out cell lines if (1) either gene expression or CNV data were unavailable, or (2) cancer type annotations were missed, or (3) the sample size of the corresponding cancer type was less than 10. Finally, we collected 525 cell lines covering 27 kinds of cancers. Detailed descriptions of these cell lines can be found in Additional file 1: Table S3. Gene expression data were represented as . CNV data were represented as , where represents the relative copy number. We then used z-score normalization on these gene features. Cancer drug response data (IC50) were downloaded from GDSC (version: GDSC1) [4]. The IC50 describes the amount of drug needed to inhibit cancer cell growth by half. In GDSC, the IC50 is recorded in the scale of µM and is transformed with natural logarithm. Finally, we collected 86,530 drug-cell line pairs.

Construction of drug-GCN module

Drug-GCN module takes feature and adjacency matrix of drugs as inputs. It considers each drug as a graph where nodes represent atoms of the drug and edges indicate connections between atoms. This module extracts intrinsic chemical attributes using the graph convolutional network algorithm [23]. Different drugs have different number of atoms (from 5 to 96 in this study), so the scales of these raw drug graphs vary. We first built a fixed-scale graph , and then embedded the raw drug graph into it. Such operations ensure that the drug-GCN module is unified to all drugs. The number of nodes of graph is 100. Mathematically, raw drug graph is a sub-graph of the fixed-scale graph . Additional nodes in are filled with zeros,where denotes the feature matrix of the fixed-scale graph . denotes binary adjacency matrix of . Similarly, and denote the feature matrix and adjacency matrix of , respectively. denotes the number of atoms of drug . , , , and are zero matrices. According to the GCN algorithm [23], we have,where is the output of layer , and is the initial feature matrix . is a modified adjacency matrix with self-connections. is an identity matrix. Diagonal matrix is a degree matrix of with . represents weights of the layer . Detailed configurations of the drug-GCN module can be found in Additional file 1: Table S1.

Construction of bio-GCN module

Bio-GCN module takes the gene features of cancer samples as inputs. Gene expression and CNV data were used in this study. These gene features were first fed into a two-layer MLP and the latent features were considered as features of genes. The module considers each cancer sample as a graph where nodes are proteins (genes) and edges indicate interactions between proteins. Such protein–protein interaction information was obtained from the STRING database (version 11.0, Taxonomy ID: 9606) [34]. Meanwhile, we only kept proteins that are known to be related to cancers. Such cancer-related proteins (genes) were collected from COSMIC [3] and TCGA [32]. We finally obtained 697 cancer-related genes (Table S4 in Additional file 1) and 55,140 protein–protein interaction pairs among them. Mathematically, the biological graph of cancer sample is denoted by . and denote the feature matrix and adjacency matrix, respectively. denotes the number of nodes. denotes dimension of features of genes. is a symmetric binary matrix. if gene and gene have interactions in the PPI network. Otherwise, . Then, the bio-GCN module uses graph convolutional network algorithms to extract intrinsic biological features of the cancer sample. The formula is as same as Eq. (1). Detailed configurations of the bio-GCN module can be found in Additional file 1: Table S1.

Configurations of baselines

We compared DualGCN with six baselines, including DeepCDR [8], CDRscan [7], SVM, random forest, Lasso regression, and ridge regression. We additionally collected SNV data from the CCLE because they were necessary when using some of the baselines. We finally collected 27,180 SNVs within the cancer-related genes. We encoded the SNV features as binary vectors with one denoting the occurrence of a mutation. DeepCDR [8] encodes multi-omics data using CNN separately. Genomic features including SNVs, gene expression, and copy number variation were used. Besides, it encodes drug data using graph convolutional networks. Meanwhile, we also tested the performance of DeepCDR without using SNV data by removing the corresponding CNN module. This modified version is denoted by DeepCDR (-). CDRscan [7] encodes SNVs using CNN. Besides, drugs are represented through one-hot encoding on SMILES data. SMILES is a string where characters represent atoms and connectivity relationships. We obtained SMILES (isomeric type) of drugs through parsing related XML files from PubChem. In addition, we also tested SVM, random forest, Lasso regression, and ridge regression using SNVs as features of cell lines, and drugs were represented through one-hot encoding of SMILES. We applied kernels including radial basis function (RBF) kernel, polynomial kernel, and sigmoid kernel for SVM. We applied multiple number of trees (n = 50, 100, 200) for random forest. We set coefficient alpha = 0.01, 0.1, 0.5 for Lasso regression. We set coefficient alpha = 0.1, 0.5, 1.0, 2.0 for ridge regression.

Clinical cancer data preparation

We conducted a case study on clinical cancer patients using DualGCN. First, we curated data of patients whose drug response information was available in TCGA. Patients with breast invasive carcinoma (BRCA) owned the largest scale (195 records) and were included in this case study. Then, we downloaded the gene features of these cancer patients through Firehose Broad GDAC (http://gdac.broadinstitute.org/). Gene expression data of patients were transformed as . CNV data were at segment-level originally. We further transformed these segment-level CNV data into gene-level. There are segments overlapping some gene, and the length of each overlapped region is denoted by . Length of the gene is denoted by . The relative copy number ratio of each segment is denoted by . We extracted the locations of genes from Ensembl (GRCh37) [35]. We transformed segment-level CNV data into gene-level and adopted logarithmic transformation using the following formula, There is a noticeable difference in analyzing drug response from in vitro cancer cell lines and clinical cancer data. In clinical cancer data, drug response annotations are qualitative rather than quantitative. Drug responses are labeled as four types in TCGA: (1) complete response, (2) partial response, (3) clinical progressive disease, and (4) stable disease. We binarized such labels into “sensitive” and “resistant”. We considered drugs to be sensitive if annotations in TCGA were (1) complete response or (2) partial response. We considered drugs to be resistant if annotations were (3) clinical progressive disease or (4) stable disease. On the other hand, drug responses on cell lines are quantified by IC50. However, the range of IC50 of each drug is different (Figure S1 in Additional file 1). We thereby introduced a metric, drug sensitivity score (DSS), to transform drug responses into the same scale and to make responses comparable across drugs,where MSC denotes max screening concentration of the drug. We collected MSC from the GDSC. is indicator function. If , . This indicates that the given drug is not sufficient to kill the cancer cells, and the DSS is smaller than 0. If , . This indicates that the given drug has the potential to kill the cancer cells, and the DSS is larger than 0. The larger the DSS is, the more sensitive the drug is. Gene features and drug response annotations of clinical samples are given in Additional file 2: Table S8. We predicted the IC50 of drugs on clinical cancer patients and calculated the DSS. We then adopted the ROC curve to analyze the consistency between our predictions and the binary clinical annotations obtained from the TCGA. Additional file 1: Supplementary figures and Supplementary tables S1–S7 for additional results. Figure S1. IC50 and MSC of drugs. Figure S2. PCA of structures of drugs. Figure S3. ROC curve on clinical cancer patients. Table S1. Parameter settings of DualGCN. Table S2. Descriptions of drugs. Table S3. Descriptions of cell lines. Table S4. List of cancer-related genes. Table S5. Results of SVM regression with various kernels. Table S6. Results of random forest with various number of trees. Table S7. Results of Lasso regression with various alpha. Additional file 2: Supplementary Table S8 for gene features and clinical annotations of the TCGA data.

31 in total

Review 1. Single-cell genome sequencing: current state of the science.

Authors: Charles Gawad; Winston Koh; Stephen R Quake
Journal: Nat Rev Genet Date: 2016-01-25 Impact factor: 53.242

Review 2. Single-Cell Techniques and Deep Learning in Predicting Drug Response.

Authors: Zhenyu Wu; Patrick J Lawrence; Anjun Ma; Jian Zhu; Dong Xu; Qin Ma
Journal: Trends Pharmacol Sci Date: 2020-11-02 Impact factor: 14.819

3. DeepCDR: a hybrid graph convolutional network for predicting cancer drug response.

Authors: Qiao Liu; Zhiqiang Hu; Rui Jiang; Mu Zhou
Journal: Bioinformatics Date: 2020-12-30 Impact factor: 6.937

4. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma.

Authors: Anoop P Patel; Itay Tirosh; John J Trombetta; Alex K Shalek; Shawn M Gillespie; Hiroaki Wakimoto; Daniel P Cahill; Brian V Nahed; William T Curry; Robert L Martuza; David N Louis; Orit Rozenblatt-Rosen; Mario L Suvà; Aviv Regev; Bradley E Bernstein
Journal: Science Date: 2014-06-12 Impact factor: 47.728

Review 5. A view on drug resistance in cancer.

Authors: José Baselga; David M Hyman; Neil Vasan
Journal: Nature Date: 2019-11-13 Impact factor: 49.962

6. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.

Authors: Jordi Barretina; Giordano Caponigro; Nicolas Stransky; Kavitha Venkatesan; Adam A Margolin; Sungjoon Kim; Christopher J Wilson; Joseph Lehár; Gregory V Kryukov; Dmitriy Sonkin; Anupama Reddy; Manway Liu; Lauren Murray; Michael F Berger; John E Monahan; Paula Morais; Jodi Meltzer; Adam Korejwa; Judit Jané-Valbuena; Felipa A Mapa; Joseph Thibault; Eva Bric-Furlong; Pichai Raman; Aaron Shipway; Ingo H Engels; Jill Cheng; Guoying K Yu; Jianjun Yu; Peter Aspesi; Melanie de Silva; Kalpana Jagtap; Michael D Jones; Li Wang; Charles Hatton; Emanuele Palescandolo; Supriya Gupta; Scott Mahan; Carrie Sougnez; Robert C Onofrio; Ted Liefeld; Laura MacConaill; Wendy Winckler; Michael Reich; Nanxin Li; Jill P Mesirov; Stacey B Gabriel; Gad Getz; Kristin Ardlie; Vivien Chan; Vic E Myer; Barbara L Weber; Jeff Porter; Markus Warmuth; Peter Finan; Jennifer L Harris; Matthew Meyerson; Todd R Golub; Michael P Morrissey; William R Sellers; Robert Schlegel; Levi A Garraway
Journal: Nature Date: 2012-03-28 Impact factor: 49.962

7. Ensembl 2020.

Authors: Andrew D Yates; Premanand Achuthan; Wasiu Akanni; James Allen; Jamie Allen; Jorge Alvarez-Jarreta; M Ridwan Amode; Irina M Armean; Andrey G Azov; Ruth Bennett; Jyothish Bhai; Konstantinos Billis; Sanjay Boddu; José Carlos Marugán; Carla Cummins; Claire Davidson; Kamalkumar Dodiya; Reham Fatima; Astrid Gall; Carlos Garcia Giron; Laurent Gil; Tiago Grego; Leanne Haggerty; Erin Haskell; Thibaut Hourlier; Osagie G Izuogu; Sophie H Janacek; Thomas Juettemann; Mike Kay; Ilias Lavidas; Tuan Le; Diana Lemos; Jose Gonzalez Martinez; Thomas Maurel; Mark McDowall; Aoife McMahon; Shamika Mohanan; Benjamin Moore; Michael Nuhn; Denye N Oheh; Anne Parker; Andrew Parton; Mateus Patricio; Manoj Pandian Sakthivel; Ahamed Imran Abdul Salam; Bianca M Schmitt; Helen Schuilenburg; Dan Sheppard; Mira Sycheva; Marek Szuba; Kieron Taylor; Anja Thormann; Glen Threadgold; Alessandro Vullo; Brandon Walts; Andrea Winterbottom; Amonida Zadissa; Marc Chakiachvili; Bethany Flint; Adam Frankish; Sarah E Hunt; Garth IIsley; Myrto Kostadima; Nick Langridge; Jane E Loveland; Fergal J Martin; Joannella Morales; Jonathan M Mudge; Matthieu Muffato; Emily Perry; Magali Ruffier; Stephen J Trevanion; Fiona Cunningham; Kevin L Howe; Daniel R Zerbino; Paul Flicek
Journal: Nucleic Acids Res Date: 2020-01-08 Impact factor: 16.971

Review 8. Advantages of targeting the tumor immune microenvironment over blocking immune checkpoint in cancer immunotherapy.

Authors: Tianyu Tang; Xing Huang; Gang Zhang; Zhengtao Hong; Xueli Bai; Tingbo Liang
Journal: Signal Transduct Target Ther Date: 2021-02-20

9. Single-cell RNA sequencing reveals the tumor microenvironment and facilitates strategic choices to circumvent treatment failure in a chemorefractory bladder cancer patient.

Authors: Hye Won Lee; Woosung Chung; Hae-Ock Lee; Da Eun Jeong; Areum Jo; Joung Eun Lim; Jeong Hee Hong; Do-Hyun Nam; Byong Chang Jeong; Se Hoon Park; Kyeung-Min Joo; Woong-Yang Park
Journal: Genome Med Date: 2020-05-27 Impact factor: 11.117

10. Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma.

Authors: Yu-Pei Chen; Jian-Hua Yin; Wen-Fei Li; Han-Jie Li; Dong-Ping Chen; Cui-Juan Zhang; Jia-Wei Lv; Ya-Qin Wang; Xiao-Min Li; Jun-Yan Li; Pan-Pan Zhang; Ying-Qin Li; Qing-Mei He; Xiao-Jing Yang; Yuan Lei; Ling-Long Tang; Guan-Qun Zhou; Yan-Ping Mao; Chen Wei; Ke-Xu Xiong; Hong-Bo Zhang; Shi-Da Zhu; Yong Hou; Ying Sun; Michael Dean; Ido Amit; Kui Wu; Dong-Ming Kuang; Gui-Bo Li; Na Liu; Jun Ma
Journal: Cell Res Date: 2020-07-20 Impact factor: 46.297