Isidro Cortés-Ciriano, Gerard J P van Westen, Guillaume Bouvier, Michael Nilges, John P Overington, Andreas Bender, Thérèse E Malliavin.
Abstract
MOTIVATION: Recent large-scale omics initiatives have catalogued the somatic alterations of cancer cell line panels along with their pharmacological response to hundreds of compounds. In this study, we have explored these data to advance computational approaches that enable more effective and targeted use of current and future anticancer therapeutics.
Year: 2015 PMID: 26351271 PMCID: PMC4681992 DOI: 10.1093/bioinformatics/btv529
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Pharmacogenomic modelling concept and illustration of the learning strategies explored. (a) The pGI50 values for 17 142 compounds on 59 cancer cell lines (941 831 data points) were modelled with RF and SVM models and conformal prediction. (b–e) Illustration of the training data used in the following learning strategies: (b) 10-fold CV PGM models (interpolation); (c) LOCCO; (d) LOCO; and (e) Family QSAR. As can be seen in (b–e), the training data used in each learning strategy differ with respect to (i) the subset of data points from the whole dataset used for training and (ii) the type and combination of input descriptors, which can be only compound descriptors, only cell line descriptors, or the combination of both. In all models reported in this article, Morgan fingerprints were used as compound descriptors, whereas the dataset views indicated in Table 1 and four cell line kernels were used to encode the cell lines. Overall, this validation enabled us to assess the models' performance in real-world settings, where extrapolation to novel cell lines and compounds is often a necessary step.
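The grouped validation strategies in panels (c) and (d) can be sketched as leave-one-group-out splits. The sketch below is only an illustration under stated assumptions: it assumes LOCO holds out one cell line at a time and LOCCO holds out one compound cluster at a time (the caption does not expand either acronym), and the compound names, cluster labels and pGI50 values are invented toy data, not values from the study.

```python
from collections import defaultdict


def leave_one_group_out(records, group_key):
    """Yield (held_out_group, train_rows, test_rows) for each distinct group.

    Every row whose `group_key` equals the held-out group goes to the test
    set, so the model never sees that group during training (extrapolation).
    """
    by_group = defaultdict(list)
    for row in records:
        by_group[row[group_key]].append(row)
    for group in sorted(by_group):
        test = by_group[group]
        train = [row for row in records if row[group_key] != group]
        yield group, train, test


# Toy data points (hypothetical): one pGI50 value per (compound, cell line) pair.
records = [
    {"compound": "c1", "cluster": "A", "cell_line": "A498", "pGI50": 4.2},
    {"compound": "c1", "cluster": "A", "cell_line": "UO-31", "pGI50": 5.1},
    {"compound": "c2", "cluster": "A", "cell_line": "UO-31", "pGI50": 6.0},
    {"compound": "c3", "cluster": "B", "cell_line": "A498", "pGI50": 6.8},
    {"compound": "c3", "cluster": "B", "cell_line": "UO-31", "pGI50": 7.3},
]

# Assumed reading of LOCO: hold out every measurement on one cell line.
loco_splits = list(leave_one_group_out(records, "cell_line"))
# Assumed reading of LOCCO: hold out every compound in one chemical cluster.
locco_splits = list(leave_one_group_out(records, "cluster"))
```

In contrast to these grouped splits, a plain 10-fold CV assigns individual data points to folds at random, so the same cell lines and compound clusters typically appear in both training and test sets, which is why the caption describes it as interpolation.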
Table 1. Description of the dataset views benchmarked for the compound sensitivity prediction using the NCI60 panel
| Original profiling dataset | Abbreviated dataset view name | Details |
|---|---|---|
| Cell line fingerprints | Cell Fingerprints | Number of short tandem repeats at 16 genomic loci |
| DNA copy-number variation | CNV | CNV for the 967 genes |
| DNA copy-number variation | CNV Onc. & T. Suppre. | CNV for oncogenes and tumour suppressors |
| mRNA | G.t.l ABC | Transcript levels (log2) of 47 ABC transporters |
| mRNA | G.t.l Onc. & T. Suppre. | Transcript levels (log2) of (i) oncogenes, and (ii) tumour suppressors |
| mRNA | G.t.l Kin. | Transcript levels (log2) of 402 human kinases |
| mRNA | G.t.l 1000 genes | Transcript levels (log2) of the 1000 genes displaying the highest variability among the NCI60 panel |
| mRNA | G.t.l 1000 pathways | Average transcript levels (log2) of the 1000 pathways displaying the highest variance among the NCI60 panel |
| mRNA | G.t.l 1000 genes & Kin. & Onco. & T. Suppre. | Transcript levels (log2) of (i) the 1000 genes displaying the highest variance among the NCI60 panel, (ii) the human kinome, (iii) oncogenes, and (iv) tumour suppressors |
| mRNA | G.t.l Kin. & Onco. & T. Suppre. | Transcript levels (log2) of (i) the human kinome, (ii) oncogenes, and (iii) tumour suppressors |
| miRNA | miRNA | Expression (log2) of 627 miRNAs |
| Reverse-phase lysate arrays | RPLA | Normalized protein abundance levels (log2) for 89 proteins |
| Whole exome sequencing | Exome | Mutation status (1: mutated, 0: non-mutated) of 112 Type II variants |
| Whole exome sequencing & DNA copy-number variation | Exome & CNV | Concatenation of the Exome and CNV dataset views |
The abbreviated names used in Fig. 3 are indicated in the second column. Prior biological knowledge, such as pathway information, was included in some dataset views. The gene transcript levels and mutational status for genes implicated in cancer, kinases and ABC transporters were gathered independently and then combined in the dataset views to assess the redundancy of their predictive signal.
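Several dataset views keep only the features with the highest variance across the panel (for example the 1000 most variable genes). A minimal sketch of that kind of filter follows. The function name, gene names and log2 values are hypothetical, and the use of population variance is an assumption, since the table does not say how variability was computed.

```python
from statistics import pvariance


def top_variance_features(matrix, feature_names, k):
    """Return the k feature names with the highest variance across rows.

    `matrix` holds one row per cell line and one column per feature
    (e.g. log2 transcript levels); ties keep their original column order.
    """
    columns = list(zip(*matrix))
    variances = {
        name: pvariance(column) for name, column in zip(feature_names, columns)
    }
    ranked = sorted(feature_names, key=lambda name: variances[name], reverse=True)
    return ranked[:k]


# Toy log2 transcript levels (hypothetical): rows are cell lines, columns are genes.
genes = ["gene_a", "gene_b", "gene_c"]
log2_levels = [
    [5.0, 2.0, 7.0],
    [5.1, 6.0, 7.5],
    [4.9, 4.0, 9.0],
]

selected = top_variance_features(log2_levels, genes, k=2)
```

Here `gene_a` is nearly constant across the three toy cell lines, so it is the one dropped; a near-constant feature carries little information for distinguishing cell lines.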
Fig. 3. Benchmarking the cell line profiling dataset views for the cell line sensitivity prediction. (a) The predictive power of the 14 dataset views (Table 1) and two cell line kernels, namely cor. Proteome and cor. Transcriptome, was quantified by the RMSE values on the test set. For each dataset view, we trained the 10-fold CV PGM models on the uncorrelated bioactivities 0.5 dataset. We found significant differences among the dataset views (ANOVA, P < 0.01). Post-hoc analyses (Tukey’s HSD, α = 0.05) were used to cluster the dataset views according to their predictive power. Dataset views sharing a letter label performed at the same level of statistical significance. We consistently found that the gene transcript levels and the abundance of proteins and miRNA led to the most predictive models (labelled with ‘a’). (b) Both interpolation and extrapolation power were evaluated on the complete dataset. After finding significant differences across groups (ANOVA, P < 0.01), we found that the PGM models interpolate and extrapolate to new cell lines and tissues at the same level of statistical significance (Tukey’s HSD, α = 0.05). In contrast, we found statistically significant differences in performance between interpolation and extrapolation to new chemical clusters. The blue points indicate the median and the interquartile range (25th–75th percentile), whereas the red points indicate the mean RMSE value.
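The quantities behind this figure (RMSE per model, the median, interquartile range and mean of the RMSE values, and a one-way ANOVA across groups) can be sketched as follows. This is an illustration only: the function names are hypothetical, the sketch stops at the F statistic (no P value and no Tukey's HSD step), and the numbers are toy values rather than results from the study.

```python
from math import sqrt
from statistics import mean, median, quantiles


def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    return sqrt(mean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def summarize(values):
    """Mean, median and quartiles, i.e. the summary points drawn in the figure."""
    q1, _, q3 = quantiles(values, n=4)
    return {"mean": mean(values), "median": median(values), "q1": q1, "q3": q3}


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy RMSE values (hypothetical) for repeated models on two dataset views.
rmse_view_1 = [1.0, 2.0, 3.0]
rmse_view_2 = [2.0, 3.0, 4.0]

f_statistic = anova_f([rmse_view_1, rmse_view_2])
summary = summarize([1.0, 2.0, 3.0, 4.0, 5.0])
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

A large F statistic relative to the F distribution with the corresponding degrees of freedom is what yields a small ANOVA P value; the pairwise post-hoc comparison then decides which groups share a letter label.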
Fig. 2. Comparison between (i) RF and SVM and (ii) the cell line kernel and PGM models. (a) The predictive power of the 10-fold CV RF and SVM models was compared on the uncorrelated bioactivities 1 dataset across the cell line kernels explored and the dataset view ‘G.t.l. 1000 genes’ (Table 1). The CV RMSEtest values on the left-out sets in CV were used as a proxy to monitor the predictive power of the models. RF and SVM trained on the ‘G.t.l. 1000 genes’ dataset view displayed comparable predictive power, whereas RF and SVM exhibited diverse performance across the cell line kernels used in this study. Models sharing a letter label performed at the same level of statistical significance (Tukey’s HSD, α = 0.05). The blue points indicate the median and the interquartile range (25th–75th percentile), whereas the red points indicate the mean RMSEtest value. (b) Comparison between the individual QSAR and PGM models. The 10-fold CV Ind. QSAR models were trained on increasingly large training sets and their performance was assessed on the left-out data (orange). Thus, each point in the figure corresponds to the average 10-fold CV RMSEtest value across 59 models (one per cell line). The 10-fold CV PGM models were trained jointly on the compound and cell line descriptors (‘G.t.l. 1000 genes’ dataset view). For each model, the training set comprised a fraction of the data annotated on a given cell line (x-axis) and a percentage of the data annotated on the remaining cell lines (indicated in the legend). Overall, lower RMSEtest values are obtained when integrating information from several cell lines, indicating that the PGM models enable us to share information across cell lines and compounds, thereby outperforming the individual QSAR models.
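The difference between the two model types in panel (b) lies in how the training rows are built. The sketch below assumes that "trained jointly" means concatenating a compound descriptor vector with a cell line descriptor vector, which the caption does not state explicitly. All names, descriptor values and pGI50 values are hypothetical.

```python
def pgm_rows(measurements, compound_desc, cell_desc):
    """Build one PGM training row per measurement.

    Each row concatenates compound and cell line descriptors (an assumed
    encoding), so a single model covers all cell lines at once.
    """
    features, targets = [], []
    for compound, cell_line, pgi50 in measurements:
        features.append(compound_desc[compound] + cell_desc[cell_line])
        targets.append(pgi50)
    return features, targets


def qsar_rows(measurements, compound_desc, cell_line):
    """Build training rows for an individual QSAR model on one cell line.

    Only compound descriptors are used, and only data for `cell_line`.
    """
    features, targets = [], []
    for compound, line, pgi50 in measurements:
        if line == cell_line:
            features.append(compound_desc[compound])
            targets.append(pgi50)
    return features, targets


# Toy descriptors (hypothetical): 3-bit fingerprints and 2 transcript levels.
compound_desc = {"c1": [1, 0, 1], "c2": [0, 1, 1]}
cell_desc = {"A498": [7.2, 3.1], "UO-31": [6.4, 5.0]}
measurements = [
    ("c1", "A498", 4.2),
    ("c1", "UO-31", 5.1),
    ("c2", "UO-31", 6.0),
]

pgm_x, pgm_y = pgm_rows(measurements, compound_desc, cell_desc)
qsar_x, qsar_y = qsar_rows(measurements, compound_desc, "UO-31")
```

With this encoding the joint model sees all three toy measurements, while the individual model for UO-31 sees only two, which illustrates how a joint model can draw on data from other cell lines.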
Fig. 4. Evaluation of the predicted growth inhibition patterns for MTX on the NCI60 panel. (a) The relative growth inhibition pattern (z-scores) on the NCI60 panel was calculated from the experimental pGI50 values together with the experimental uncertainty of the measurements. (b) Predicted relative growth inhibition pattern in the 10-fold CV model (i.e. interpolation) along with the 75% CI calculated using conformal prediction. The predictions reflect complex inhibition patterns that overall match the experimental ones. For instance, the TK-10, RXF-393 and A498 renal cell lines (marked with an asterisk) were predicted to be highly resistant to MTX, whereas the effect of MTX on sensitive cell lines, namely UO-31, SN12C, CAKI-1 and ACHN, was also correctly predicted. Cell lines originating from the same tissue are in the same colour (breast: red, central nervous system: orange, colon: olive green, lung cancer: dark green, leukaemia: turquoise, melanoma: blue, ovarian: blue, prostate: purple, renal: magenta).
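The two quantities shown in this figure, z-scores across the panel and a conformal interval at 75% confidence, can be illustrated with a short sketch. Two assumptions are worth flagging: the z-scores use the population standard deviation, and the interval uses plain split (inductive) conformal prediction on absolute residuals, which is a simpler variant swapped in for illustration and not necessarily the scheme used in the study. The residuals and pGI50 values are toy numbers.

```python
from math import ceil
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values across the panel: (value - mean) / standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def conformal_half_width(calibration_residuals, confidence):
    """Half-width of a split-conformal interval.

    Uses the ceil((n + 1) * confidence)-th smallest absolute residual on a
    held-out calibration set; returns infinity when n is too small to
    guarantee the requested confidence.
    """
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    rank = ceil((n + 1) * confidence)  # 1-based rank of the required score
    if rank > n:
        return float("inf")
    return scores[rank - 1]


# Toy calibration residuals (hypothetical): observed minus predicted pGI50.
residuals = [0.1, -0.2, 0.3, -0.4, 0.5, 0.6, -0.7]
half_width = conformal_half_width(residuals, confidence=0.75)

# A prediction of 5.0 then gets the interval [5.0 - half_width, 5.0 + half_width].
prediction = 5.0
interval = (prediction - half_width, prediction + half_width)

# Toy pGI50 values (hypothetical) for one compound on three cell lines.
panel_z = z_scores([4.0, 5.0, 6.0])
```

With seven calibration residuals and 75% confidence the required rank is 6, so the half-width is the sixth smallest absolute residual. The z-scores express each cell line's sensitivity relative to the panel average, which is what makes patterns comparable across compounds with different overall potency.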