Mehreen Ali, Tero Aittokallio.
Abstract
In-depth modeling of the complex interplay among multiple omics data measured from cancer cell lines or patient tumors is providing new opportunities for identifying tailored therapies for individual cancer patients. Supervised machine learning algorithms are increasingly being applied to omics profiles because they enable integrative analyses among the high-dimensional data sets, as well as personalized predictions of therapy responses using multi-omics panels of response-predictive biomarkers identified through feature selection and cross-validation. However, technical variability and frequent missingness in the input "big data" require dedicated data preprocessing pipelines, which often lead to some loss of information and a compressed view of the biological signal. We describe here the state-of-the-art machine learning methods for anti-cancer drug response modeling and prediction, and give our perspective on further opportunities to make better use of high-dimensional multi-omics profiles, along with knowledge about the cancer pathways targeted by anti-cancer compounds, when predicting their phenotypic responses.
Keywords: Drug response prediction; Feature selection; Multi-view regression; Omics profiling; Precision oncology; Predictive biomarkers
Year: 2018 PMID: 30097794 PMCID: PMC6381361 DOI: 10.1007/s12551-018-0446-z
Source DB: PubMed Journal: Biophys Rev ISSN: 1867-2450
Details of the key omics datasets available from representative cancer cell lines and patient genomic resources, along with the dataset sizes and dimensionalities of the raw and processed profiles for the NGS-based datatypes (rows in italics). For the other datatypes, only the dimensionality of the processed data is reported for comparison
| | NCI-DREAM7¹ | NCI-60² | GDSC1000³ | TCGA⁴/TCPA⁵ |
|---|---|---|---|---|
| Cancer type | Breast cancer | 9 tissue types | 29 tissue types | 33 tissue types |
| Number of samples | 53 cell lines | 59 cell lines | 1124 cell lines | ~ 11,000 patient tumors |
| Total size (processed datasets) | | | | |
| Whole genome sequencing | ~ 20,000 genes | ~ 17,000 genes | 19,100 genes | ~ 21,000 genes |
| Whole exome sequencing | ~ 22,000 genes | ~ 13,000 genes | ~ 23,000 genes | ~ 20,000 genes |
| RNA sequencing | ~ 40,000 transcripts | ~ 60,000 transcripts | ~ 50,000 transcripts | ~ 55,000 transcripts |
| MicroRNA profiles | – | ~ 800 miRNA transcripts | – | ~ 1800 miRNA transcripts |
| Microarray gene expression | ~ 18,000 genes | 25,722 genes | 17,737 genes | ~ 22,000 genes |
| Somatic mutation calling | ~ 33,000 SNPs⁶ | ~ 500,000 SNPs | ~ 485,000 SNPs | ~ 500,000 SNPs |
| Copy number variation | ~ 27,000 variants | ~ 25,000 variants | ~ 50,000 variants | ~ 50,000 variants |
| DNA methylation patterns | ~ 27,000 CpGs⁷ | 20,000 CpGs | ~ 35,000 CpGs | ~ 486,000 CpGs |
| RPPA⁸ proteomics | 131 proteins | 162 proteins | – | ~ 240 proteins |
| MS⁹ proteomics | – | 10,350 proteins | – | ~ 16,000 proteins¹⁰ |
| Drug response data | 28 compounds | > 100,000 compounds | 265 compounds | Survival data for clinical treatments |
Boldface entries represent total sizes of raw and processed datasets. These are not statistical significance values
¹ NCI-DREAM7, DREAM7 Challenge (http://dreamchallenges.org/), organized together with the National Cancer Institute (NCI; Costello et al. 2014)
² NCI-60, The National Cancer Institute drug screening panel (Shoemaker 2006)
³ GDSC1000, Genomics of Drug Sensitivity in Cancer project (Yang et al. 2012)
⁴ TCGA, The Cancer Genome Atlas (http://cancergenome.nih.gov/; Weinstein et al. 2013)
⁵ TCPA, The Cancer Proteome Atlas (http://tcpaportal.org/tcpa/, Li et al. 2013)
⁶ SNPs, single-nucleotide polymorphism
⁷ CpGs, CpG island in DNA where “C” is connected to “G” by a phosphodiester bond “p”
⁸ RPPA, reverse phase protein array
⁹ MS, mass spectrometry
¹⁰ CPTAC, Clinical Proteomic Tumor Analysis Consortium (https://proteomics.cancer.gov/programs/cptac)
Representative drug sensitivity prediction models, classified by whether or not they also implement feature selection
| | Prediction model | Example applications |
|---|---|---|
| Kernel-based | SVM¹ | Dong et al. ( |
| | BEMKL | Kernelized regression model for drug response prediction based on data integration across multiple omics profiles, through multi-task, multiple kernel learning (Costello et al. |
| | cwKBMF⁵ | Drug response prediction model (Ammad-ud-din et al. |
| | KRL | Kernelized rank learning (KRL; He et al. |
| Feature selection-based | Ridge regression | Geeleher et al. ( |
| | Elastic net | Jang et al. ( |
| | Random forests | Riddick et al. ( |
| | MVLR⁸ | Bayesian multi-view multi-task linear regression model (Ammad-ud-din et al. |
¹ SVM, support vector machines
² CCLE, Cancer Cell Line Encyclopedia
³ CGP, Cancer Genome Project
⁴ NCI, National Cancer Institute
⁵ cwKBMF, component-wise kernelized Bayesian matrix factorization
⁶ GDSC, Genomics of Drug Sensitivity in Cancer project
⁷ CTRP, Cancer Therapeutic Response Portal
⁸ MVLR, multi-view linear regression
⁹ TNBC, triple-negative breast cancer
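
To make the feature selection-based entries in the table above more concrete, the following is a minimal sketch of elastic net regression with cross-validated regularization strength, using only the Python standard library. It is an illustration under stated assumptions, not the implementation used in any of the cited studies: the data are synthetic stand-ins for an omics matrix, and the function names (`fit_elastic_net`, `cross_validate`), the penalty grid, and the mixing parameter `alpha` are all hypothetical choices made for this example.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator that produces exact zeros (feature selection)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_elastic_net(X, y, lam, alpha=0.9, sweeps=50):
    """Coordinate descent for
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||^2).
    No intercept is fitted: inputs are assumed to be roughly centred."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Covariance of feature j with the residual, adding back its own contribution
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            if new != b[j]:
                delta = new - b[j]
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                b[j] = new
    return b

def mse(X, y, b):
    """Mean squared prediction error of coefficients b on (X, y)."""
    preds = [sum(x * bj for x, bj in zip(row, b)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

def cross_validate(X, y, lambdas, k=5):
    """Mean held-out MSE for each candidate penalty, using k interleaved folds."""
    n = len(X)
    folds = [list(range(f, n, k)) for f in range(k)]
    scores = {}
    for lam in lambdas:
        errs = []
        for test_idx in folds:
            held = set(test_idx)
            X_tr = [X[i] for i in range(n) if i not in held]
            y_tr = [y[i] for i in range(n) if i not in held]
            b = fit_elastic_net(X_tr, y_tr, lam)
            errs.append(mse([X[i] for i in test_idx], [y[i] for i in test_idx], b))
        scores[lam] = sum(errs) / k
    return scores

# Synthetic stand-in for an omics matrix (assumption): 60 samples, 20 features,
# of which only three actually drive the simulated drug response.
rng = random.Random(0)
n_samples, n_features = 60, 20
true_coef = {0: 3.0, 5: -2.0, 12: 2.0}
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [sum(c * row[j] for j, c in true_coef.items()) + rng.gauss(0, 0.5) for row in X]

lambdas = [0.01, 0.03, 0.1, 0.3]          # hypothetical penalty grid
cv_scores = cross_validate(X, y, lambdas)
best_lam = min(cv_scores, key=cv_scores.get)
coef = fit_elastic_net(X, y, best_lam)     # refit on all samples
selected = [j for j, bj in enumerate(coef) if bj != 0.0]
top3 = sorted(range(n_features), key=lambda j: abs(coef[j]), reverse=True)[:3]

print("best lambda:", best_lam)
print("selected features:", selected)
print("three largest coefficients at features:", sorted(top3))
```

The L1 part of the penalty sets many coefficients exactly to zero, so the surviving features act as a candidate biomarker panel, while cross-validation chooses the penalty strength by held-out error. Real omics data would typically have far more features than samples and would need the preprocessing steps discussed in the abstract before a model like this is applied.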