Peiran Jiang1,2, Shujun Huang3, Zhenyuan Fu4, Zexuan Sun1,5, Ted M Lakowski3, Pingzhao Hu1,6.
Abstract
Drug combinations are frequently used to treat cancer patients in order to increase efficacy, reduce adverse side effects, or overcome drug resistance. Given the enormous number of possible drug combinations, screening all drug pairs experimentally is costly and time-consuming. Integrating multiple networks with recently developed deep learning technologies to predict synergistic drug combinations has not yet been fully explored. In this study, we proposed a Graph Convolutional Network (GCN) model to predict synergistic drug combinations in particular cancer cell lines. Specifically, the GCN method used a convolutional neural network model to perform heterogeneous graph embedding and thereby solved a link prediction task. The graph in this study was a multimodal graph constructed by integrating the drug-drug combination, drug-protein interaction, and protein-protein interaction networks. We found that the GCN model was able to correctly predict cell line-specific synergistic drug combinations from a large heterogeneous network. The majority (30) of the 39 cell line-specific models showed an area under the receiver operating characteristic curve (AUC) larger than 0.80, resulting in a mean AUC of 0.84. Moreover, we conducted an in-depth literature survey of the top predicted drug combinations in specific cancer cell lines and found that many of them have been reported to show synergistic antitumor activity against the same or other cancers in vitro or in vivo. Taken together, the results indicate that our study provides a promising way to better predict and optimize synergistic drug pairs in silico.
Keywords: ACC, accuracy; AUC, area under the curve; CNN, convolutional neural network; Cancer; Cell line; DDS, drug-drug synergy; DNN, deep neural network; DTI, drug-target interaction; ER, estrogen receptor; FPR, false positive rate; GBM, glioblastoma multiforme; GCN, graph convolutional network; Graph convolutional network; HTS, high throughput screening; Heterogeneous network; PPI, protein–protein interaction; RF, random forest; ROC, receiver operating characteristic; SD, standard deviation; SVM, support vector machine; Synergistic drug combination; TNBC, triple negative breast cancer; TPR, true positive rate; XGBoost, extreme gradient boosting
Year: 2020 PMID: 32153729 PMCID: PMC7052513 DOI: 10.1016/j.csbj.2020.02.006
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. The study design. (a) Data collection. The drug-drug synergy (DDS), drug-target interaction (DTI), and protein–protein interaction (PPI) data were collected for the three subnetworks. (b) Network construction. For a given cell line, the synergy scores of drug pairs were binarized to construct the DDS subnetwork, which was combined with the DTI and PPI networks to build the cell line-specific heterogeneous network. (c) Model inference. The heterogeneous network for a specific cell line is the input of the GCN encoder. Each encoded node is then mapped to an embedding space in which drug-drug synergy is predicted. (d) Model evaluation. The negative sampling method was used together with the accuracy, AUC, and Pearson correlation coefficient metrics. (e) Exploration of the embedding space. The t-SNE method was used to examine the distribution of synergistic drug combinations.
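Step (b) of Fig. 1 can be illustrated with a minimal sketch that merges one cell line's binarized DDS pairs with the shared DTI and PPI edges into a single undirected edge set. The function name, the dictionary layout of the synergy scores, and the `threshold` value are assumptions for illustration; the entry does not state the binarization cutoff used by the authors.

```python
def build_heterogeneous_network(dds_scores, dti_edges, ppi_edges, cell_line, threshold):
    """Return an undirected edge set for one cell line (hypothetical helper).

    dds_scores: dict mapping (drug_a, drug_b, cell_line) -> synergy score
    dti_edges:  iterable of (drug, protein) pairs, shared across cell lines
    ppi_edges:  iterable of (protein, protein) pairs, shared across cell lines
    threshold:  assumed cutoff; pairs scoring above it count as synergistic
    """
    edges = set()
    # DDS subnetwork: keep only this cell line's pairs that pass the cutoff.
    for (a, b, line), score in dds_scores.items():
        if line == cell_line and score > threshold:
            edges.add(frozenset((("drug", a), ("drug", b))))
    # DTI edges link drugs to their protein targets.
    for drug, protein in dti_edges:
        edges.add(frozenset((("drug", drug), ("protein", protein))))
    # PPI edges connect proteins to each other.
    for p1, p2 in ppi_edges:
        edges.add(frozenset((("protein", p1), ("protein", p2))))
    return edges
```

Nodes are tagged with their type (`"drug"` or `"protein"`) so that a drug and a protein sharing an identifier cannot collide, and `frozenset` edges make each link undirected.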
The data sources of the three types of interactions.
| Data source | Number of links | Number of entries | Number of entities |
|---|---|---|---|
| I (DDS) | 23,052 DDS | 23,052 DDS | 38 drugs, 39 cell lines |
| II (DTIs) | 8,083,600 DTIs | 871 DTIs | 519,022 drugs, 8,934 proteins |
| III (PPIs) | 719,402 PPIs | 5,296 PPIs | 19,085 proteins |
Fig. 2. The cell line-specific heterogeneous network derived from the cell line CAOV3. Teal represents the drugs (nodes) and their interactions (edges), which make up the DDS network. Orange represents the proteins (nodes) and their interactions (edges), which make up the PPI network. Olive represents the interactions (edges) between the drugs and the proteins, which make up the DTI network. For CAOV3, the cell line-specific DDS network was first linked to the DTI network and then connected to the PPI network. Any area of the network can be zoomed in on to view it in more detail; for example, (a) displays the entry numbers, names, and linkages of the proteins in the selected area. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. The workflow of the GCN encoder and matrix decoder. The GCN encoder has 4 hidden layers. A ReLU activation function is applied between each pair of consecutive hidden layers, and the output of each ReLU is the input to the next hidden layer. A sigmoid activation function is applied after the last hidden layer. The input of the GCN model is a graph, and the output is an embedding vector for each node. The matrix decoder decodes the embedding vectors to predict the synergy score of any given drug combination.
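The encoder/decoder workflow in Fig. 3 can be sketched in NumPy as below. This is not the authors' implementation: the symmetric normalization D^-1/2 (A + I) D^-1/2 is the common Kipf and Welling propagation rule, assumed here, and the decoder is assumed to be a bilinear form sigmoid(z_i^T R z_j) with a hypothetical matrix `R`. Layer widths and weights are placeholders.

```python
import numpy as np


def normalize_adjacency(adj):
    """Assumed propagation rule: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gcn_encode(adj, features, weights):
    """Stacked GCN layers: ReLU between layers, sigmoid after the last one."""
    a_norm = normalize_adjacency(adj)
    h = features
    for i, w in enumerate(weights):
        h = a_norm @ h @ w                      # aggregate neighbours, then project
        h = sigmoid(h) if i == len(weights) - 1 else relu(h)
    return h                                    # one embedding row per node


def decode_pair(z, i, j, r):
    """Hypothetical bilinear decoder: synergy probability for nodes i and j."""
    return sigmoid(z[i] @ r @ z[j])
```

With four weight matrices (for example, shapes 8x16, 16x16, 16x16, 16x8) the encoder matches the four hidden layers of Fig. 3; `decode_pair(z, i, j, np.eye(8))` then scores any drug pair (i, j) from the resulting embeddings.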
Fig. 4. The performance of DDS prediction for all cell lines. The x-axis is the cell line index; the y-axis ranges from 0 to 1. (a) AUC of the negative sampling method (dashed line) and the 10-fold CV method (solid line) across cell lines. (b) Accuracy (ACC) of the negative sampling method (dashed line) and the 10-fold CV method (solid line) across cell lines.
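The AUC and accuracy plotted in Fig. 4 can be computed from the predicted scores of held-out synergistic pairs (positives) and sampled non-synergistic pairs (negatives). The sketch below uses the rank-based (Mann-Whitney) form of the AUC and an assumed cutoff of 0.5 for accuracy; the function names are hypothetical, and the authors' exact evaluation code is not shown in this entry.

```python
def auc_score(pos_scores, neg_scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy(pos_scores, neg_scores, cutoff=0.5):
    """Fraction of pairs on the correct side of an assumed probability cutoff."""
    correct = sum(s >= cutoff for s in pos_scores) + sum(s < cutoff for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))
```

The pairwise loop is quadratic and is meant only to make the definition explicit; a sort-based implementation gives the same value more efficiently.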
Fig. 5. The Pearson correlation coefficients of the GCN models. (a) Boxplot of the Pearson correlation coefficients between true and predicted synergy scores per tissue type. The x-axis shows the tissue names and the number of cell lines. (b) Bar plot of the Pearson correlation coefficients between true and predicted synergy scores per drug. The x-axis shows the drug names. Error bars, computed from 10 repetitions across all cell lines, are shown for each drug.
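The per-tissue (or per-drug) correlations in Fig. 5 amount to grouping (true, predicted) score pairs and computing a Pearson coefficient for each group. This is a minimal sketch; the record layout `(group, true_score, predicted_score)` is an assumption for illustration.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pearson_by_group(records):
    """records: iterable of (group, true_score, predicted_score) -> {group: r}."""
    groups = defaultdict(lambda: ([], []))
    for group, true, pred in records:
        groups[group][0].append(true)
        groups[group][1].append(pred)
    return {g: pearson(t, p) for g, (t, p) in groups.items()}
```

Passing tissue names as the group key reproduces the grouping of panel (a); passing drug names gives panel (b).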
Fig. 6. Visualization and regression of drug synergy scores. (a) The 3-D matrix representations of the experimentally measured and the predicted drug synergy scores. Each dot represents either an experimental measurement above the cutoff of 60 (blue) or a prediction above the cutoff of 0.75 (orange) of the synergy effect of drugs A and B in a specific cell line. The x-axis is the first drug index, the y-axis is the second drug index, and the z-axis is the cell line index. (b) The regression of the predicted and measured synergy scores for all cell lines. The dots are the flattened entries of the two 3-D matrices. The x-axis is the normalized measured synergy score and the y-axis is the predicted synergy probability (from 0 to 1). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
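The 3-D representation in Fig. 6(a) can be sketched as a binary tensor indexed by (first drug, second drug, cell line), filled from a score dictionary using the caption's cutoffs (60 for measured scores, 0.75 for predicted probabilities). The index layout, the helper names, and the use of min-max scaling for the "normalized measured synergy scores" of panel (b) are assumptions.

```python
def binary_tensor(scores, n_drugs, n_lines, cutoff):
    """scores: dict (drug_i, drug_j, line_k) -> value; returns nested list T[i][j][k] in {0, 1}."""
    t = [[[0] * n_lines for _ in range(n_drugs)] for _ in range(n_drugs)]
    for (i, j, k), value in scores.items():
        if value > cutoff:
            t[i][j][k] = 1
            t[j][i][k] = 1  # drug pairs are unordered, so mark both orientations
    return t


def flatten(t):
    """Flatten a 3-D nested list into one list, as for the dots of panel (b)."""
    return [v for plane in t for row in plane for v in row]


def min_max_normalize(values):
    """Assumed normalization of measured scores onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `binary_tensor(measured, n_drugs, n_lines, 60.0)` and `binary_tensor(predicted, n_drugs, n_lines, 0.75)` give the blue and orange dot sets of panel (a).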
Comparison of the performance of GCN with other traditional methods.
| Method | AUC mean | AUC std | AUPRC mean | AUPRC std | Accuracy mean | Accuracy std | Kappa mean | Kappa std | Evaluation method |
|---|---|---|---|---|---|---|---|---|---|
| GCN | 0.892 | 0.008 | 0.794 | 0.015 | 0.919 | 0.018 | 0.584 | 0.031 | 10-fold CV (Negative sampling) |
| DNN (adjacency) | 0.752 | 0.052 | 0.691 | 0.029 | 0.882 | 0.029 | 0.541 | 0.021 | 10-fold CV |
| DNN (physicochemical) | 0.811 | 0.021 | 0.666 | 0.041 | 0.833 | 0.045 | 0.486 | 0.044 | 10-fold CV |
| SVM | 0.762 | 0.072 | 0.682 | 0.13 | 0.872 | 0.059 | 0.514 | 0.14 | 10-fold CV |
| EN | 0.741 | 0.059 | 0.522 | 0.09 | 0.881 | 0.051 | 0.531 | 0.062 | 10-fold CV |
| RF | 0.779 | 0.054 | 0.534 | 0.043 | 0.873 | 0.043 | 0.522 | 0.056 | 10-fold CV |
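In the table above, accuracies of 0.833 to 0.919 sit alongside kappa values of only 0.486 to 0.584. Cohen's kappa corrects observed agreement for the agreement expected by chance, which matters when non-synergistic pairs dominate the data. A minimal sketch for binary labels follows; the function name is illustrative, and it assumes both classes are not perfectly predictable by chance (otherwise the denominator is zero).

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary 0/1 labels: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n   # observed agreement
    p_true1 = sum(y_true) / n
    p_pred1 = sum(y_pred) / n
    p_e = p_true1 * p_pred1 + (1 - p_true1) * (1 - p_pred1)  # chance agreement
    return (p_o - p_e) / (1 - p_e)
```

A classifier that simply predicts "non-synergistic" for every pair can reach high accuracy on imbalanced data, yet its kappa is 0, which is why the two metrics are reported together.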
Comparison of the performance of GCN with the state-of-the-art method DeepSynergy.
| Method | AUC mean | AUC std | AUPRC mean | AUPRC std | Accuracy mean | Accuracy std | Kappa mean | Kappa std | Evaluation method |
|---|---|---|---|---|---|---|---|---|---|
| GCN | 0.892 | 0.008 | 0.794 | 0.015 | 0.919 | 0.018 | 0.584 | 0.031 | 10-fold CV (Negative sampling) |
| DeepSynergy | 0.893 | 0.034 | 0.568 | 0.089 | 0.929 | 0.014 | 0.568 | 0.106 | 10-fold CV |
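GCN and DeepSynergy have nearly identical AUCs but differ widely in AUPRC. Unlike the AUC, the precision-recall curve depends on the fraction of positives in the evaluation set, whose baseline AUPRC equals that fraction. A common estimator of AUPRC is average precision, sketched below; whether the authors used this estimator or trapezoidal integration is not stated, and tied scores are not given special treatment here.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision at each rank where a positive is found."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total_pos, ap = 0, sum(labels), 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at this rank
    return ap / total_pos
```

For labels `[1, 0, 1, 0]` scored `[0.9, 0.8, 0.7, 0.1]`, positives appear at ranks 1 and 3, giving (1 + 2/3) / 2 = 5/6.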
Top predicted synergistic drug combinations for each of the 39 cancer cell lines.
| Cell line | Cancer | Drug A | Drug B | Probability for synergy |
|---|---|---|---|---|
| OCUBM | Breast | ABT-888 | MK-8669 | 0.98 |
| ZR751 | Breast | AZD1775 | BEZ-235 | 0.92 |
| MDAMB436 | Breast | BEZ-235 | Temozolomide | 0.86 |
| T47D | Breast | Sunitinib | BEZ-235 | 0.86 |
| KPL1 | Breast | MK-8669 | MK-2206 | 0.82 |
| EFM192B | Breast | Dasatinib | MK-8669 | 0.78 |
| HT29 | Colon | MK-4827 | Temozolomide | 0.95 |
| RKO | Colon | MK-2206 | MK-8669 | 0.88 |
| SW620 | Colon | Dasatinib | Sunitinib | 0.87 |
| SW837 | Colon | Lapatinib | MK-2206 | 0.87 |
| HCT116 | Colon | BEZ-235 | MK-8776 | 0.82 |
| LOVO | Colon | Lapatinib | Dasatinib | 0.82 |
| DLD1 | Colon | Sunitinib | Temozolomide | 0.73 |
| SKMES1 | Lung | MK-4827 | SN-38 | 0.93 |
| NCIH460 | Lung | BEZ-235 | MK-4827 | 0.90 |
| MSTO | Lung | Bortezomib | Dasatinib | 0.87 |
| NCIH23 | Lung | Temozolomide | MK-4827 | 0.84 |
| A427 | Lung | MK-8669 | Temozolomide | 0.82 |
| NCIH1650 | Lung | Dasatinib | MK-8669 | 0.81 |
| NCIH2122 | Lung | MK-4827 | Temozolomide | 0.68 |
| NCIH520 | Lung | Oxaliplatin | Sunitinib | 0.10 |
| SKMEL30 | Melanoma | MK-8776 | MK-8669 | 0.98 |
| A375 | Melanoma | BEZ-235 | Temozolomide | 0.96 |
| UACC62 | Melanoma | MK-8669 | MK-4827 | 0.96 |
| A2058 | Melanoma | MK-8776 | Temozolomide | 0.89 |
| RPMI7951 | Melanoma | AZD1775 | MK-8669 | 0.84 |
| HT144 | Melanoma | BEZ-235 | MK-8669 | 0.62 |
| OV90 | Ovarian | Vinorelbine | MK-8776 | 0.97 |
| PA1 | Ovarian | BEZ-235 | MK-4827 | 0.94 |
| SKOV3 | Ovarian | MK-8669 | MK-4827 | 0.93 |
| UWB1289BRCA1 | Ovarian | BEZ-235 | Temozolomide | 0.91 |
| A2780 | Ovarian | MK-8669 | MK-2206 | 0.85 |
| CAOV3 | Ovarian | Etoposide | MK-2206 | 0.83 |
| OVCAR3 | Ovarian | Dasatinib | MK-8776 | 0.82 |
| UWB1289 | Ovarian | AZD1775 | BEZ-235 | 0.80 |
| ES2 | Ovarian | Sunitinib | BEZ-235 | 0.75 |
| VCAP | Prostate | BEZ-235 | MK-4541 | 0.93 |
| LNCAP | Prostate | BEZ-235 | Geldanamycin | 0.77 |
The colon cancer cell line COLO320DM and the lung cancer cell line NCIH520 were not included in the table because the top drug combinations in these two cell lines had low predicted synergy probabilities.
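Selecting the table entries amounts to taking the highest-probability drug pair for each cell line and dropping cell lines whose best pair is too weak. In this sketch, the input layout and the `min_probability` threshold are hypothetical; the entry does not state the exact cutoff used to exclude cell lines.

```python
def top_combination_per_cell_line(predictions, min_probability):
    """predictions: dict cell_line -> list of (drug_a, drug_b, probability).

    Returns {cell_line: (drug_a, drug_b, probability)} for the best pair,
    skipping cell lines whose best pair falls below min_probability.
    """
    top = {}
    for line, pairs in predictions.items():
        if not pairs:
            continue
        best = max(pairs, key=lambda pair: pair[2])
        if best[2] >= min_probability:
            top[line] = best
    return top
```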
Performance comparison of AUC in 10-fold CV using different negative sampling settings (GCN with only DDI data).
| Values of p | 5% | 10% | 15% | 20% | 25% | 50% |
|---|---|---|---|---|---|---|
| Values of r | 0.5:1 | 1:1 | 1.5:1 | 2:1 | 2.5:1 | 5:1 |
| Average AUC | 0.809 ± 0.05 | 0.857 ± 0.04 | 0.853 ± 0.04 | 0.837 ± 0.04 | 0.803 ± 0.04 | 0.753 ± 0.04 |
p: the percentage of the number of selected negative samples relative to the number of benchmark positive samples.
r: the ratio of the size of the negative dataset to that of the positive dataset, used in both the training and the performance evaluation processes.
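The ratio r in the table above controls how many negative pairs are drawn per positive pair. A minimal sketch, assuming negatives are sampled uniformly from drug pairs that are not known synergistic pairs (the authors' exact sampling scheme is not described in this entry), is:

```python
import random


def sample_negatives(positive_pairs, drugs, ratio, seed=0):
    """Draw about ratio * len(positives) unordered drug pairs that are not positives."""
    rng = random.Random(seed)
    positives = {frozenset(p) for p in positive_pairs}
    candidates = [
        frozenset((a, b))
        for i, a in enumerate(drugs)
        for b in drugs[i + 1:]
        if frozenset((a, b)) not in positives
    ]
    # Cannot draw more negatives than there are non-positive pairs.
    k = min(len(candidates), round(ratio * len(positives)))
    return [tuple(sorted(pair)) for pair in rng.sample(candidates, k)]
```

With `ratio=1.0` this corresponds to the 1:1 setting, which gave the highest average AUC (0.857) in the table; larger ratios add more negatives per positive, and the table shows the AUC falling to 0.753 at 5:1.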
Fig. 7. Visualization of synergistic effects by t-SNE to explore the embedding space. The left panel (a) is the t-SNE result for the KPL1-specific embedding space, and the right panel (b) is the t-SNE result for the SW620-specific embedding space. The two red frames in the middle are magnifications of particular areas in (a) and (b). The x-axis is the first t-SNE dimension and the y-axis is the second t-SNE dimension. Each dot represents a specific drug. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)