Farzaneh Firoozbakht, Behnam Yousefi, Benno Schwikowski.
Abstract
For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand and predict drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer-term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that comprises more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are provided with a fundamental understanding of the biological problem and of how ML methods are configured for it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods and their application to the problem of drug response prediction. We also provide systematic overviews of data sources commonly used for training and evaluating methods.
Year: 2022 PMID: 34619752 PMCID: PMC8769705 DOI: 10.1093/bib/bbab408
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Publicly available drug screening datasets on cell lines
| Dataset | Drug response measure | Sample size | Number of drugs | Number of tissue types | Reference |
|---|---|---|---|---|---|
| NCI60 | GI50, IC50 | 60 | <3000 | 9 | Shoemaker |
| Genomics of Drug Sensitivity in Cancer (GDSC) | IC50, AUC | 988 | 518 | 36 | Garnett |
| Cancer Cell Line Encyclopedia (CCLE) | IC50, EC50, AA | 504 | 24 | 36 | Barretina |
| GRAY | GI50 | 70 | 90 | 1 | Heiser |
| Institute for Molecular Medicine Finland (FIMM) | EC50, DSS | 106 | 308 | 1 | Pemovska |
| Genentech (GNE) | IC50 | 675 | >350 | 17 | Klijn |
| Human B-cell Cancer Cell Lines (HBCCL) | AUC, Sen/Res | 26 | 3 | 2 | Falgreen |
| Cancer Therapeutics Response Portal (CTRPv2) | AUC | 860 | 481 | 25 | Seashore-Ludlow |
| Genentech Cell Line Screening Initiative (gCSI) | AUC, IC50 | 410 | 16 | 23 | Haverty |
| Head and neck squamous cell carcinoma cell lines | IC50 | 8 | 276 | 1 | Chia |
| Texas Southwestern Medical Center non-small cell lung cancer cell lines | AUC, ED50 | 100 | 222 | 2 | McMillan |
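The response measures in the table summarize a dose-response curve. As a minimal sketch, assuming a two-parameter Hill (log-logistic) viability curve and a hypothetical set of tested concentrations, the IC50 can be read off as the concentration at which viability crosses 50%, and an AUC as the area under the viability curve over the log-concentration range, normalized so that 1 means no effect. The function names are hypothetical, and the datasets listed above fit, normalize and cap these quantities in their own ways, so values are not directly comparable across datasets.

```python
import numpy as np


def hill_viability(conc, ic50, slope=1.0):
    """Fraction of viable cells under a two-parameter Hill (log-logistic) model."""
    conc = np.asarray(conc, dtype=float)
    return 1.0 / (1.0 + (conc / ic50) ** slope)


def ic50_from_curve(conc, viability):
    """Interpolate, in log10 concentration, where viability first drops below 0.5.

    Returns None when the tested range never reaches 50% inhibition.
    """
    log_c = np.log10(np.asarray(conc, dtype=float))
    v = np.asarray(viability, dtype=float)
    for i in range(len(v) - 1):
        if v[i] >= 0.5 > v[i + 1]:
            # Linear interpolation between the two doses that bracket 50% viability
            frac = (v[i] - 0.5) / (v[i] - v[i + 1])
            return float(10 ** (log_c[i] + frac * (log_c[i + 1] - log_c[i])))
    return None


def normalized_auc(conc, viability):
    """Trapezoidal area under viability vs log10(conc), divided by the log range."""
    log_c = np.log10(np.asarray(conc, dtype=float))
    v = np.asarray(viability, dtype=float)
    area = np.sum((v[1:] + v[:-1]) / 2.0 * np.diff(log_c))
    return float(area / (log_c[-1] - log_c[0]))


conc = np.logspace(-3, 1, 9)                # hypothetical doses from 0.001 to 10 (e.g. μM)
sensitive = hill_viability(conc, ic50=0.2)  # hypothetical sensitive cell line
resistant = hill_viability(conc, ic50=5.0)  # hypothetical resistant cell line
print(ic50_from_curve(conc, sensitive))     # approx. 0.199
print(normalized_auc(conc, sensitive) < normalized_auc(conc, resistant))
```

A lower IC50 and a lower AUC both indicate higher sensitivity; the AUC has the practical advantage of being defined even when viability never falls below 50% within the tested range.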
Figure 1. (A) Preclinical samples (e.g. cell lines) are, on the one hand, profiled to generate omics data and, on the other hand, screened against different drugs to obtain their sensitivities. Together, this information is used to train machine learning models. (B) The trained model predicts the sensitivity of a new cell line.
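The workflow in Figure 1 can be sketched for a single drug with ridge-regularized linear regression, one of the simplest model classes in the overview that follows. This is a minimal illustration on synthetic data: the expression matrix, the three "marker" genes and the log IC50 values are all hypothetical, and the helper names `fit_ridge` and `predict` are not taken from any of the reviewed methods.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    X: (n_cell_lines, n_genes) omics matrix; y: (n_cell_lines,) responses to ONE drug.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X_new, w, b):
    """Predicted responses for new cell lines (rows of X_new)."""
    return X_new @ w + b


rng = np.random.default_rng(0)
n_lines, n_genes = 80, 20
X = rng.normal(size=(n_lines, n_genes))            # hypothetical expression profiles
true_w = np.zeros(n_genes)
true_w[:3] = [1.5, -2.0, 0.8]                      # hypothetical marker genes
y = X @ true_w + 0.1 * rng.normal(size=n_lines)    # hypothetical log IC50 values

# (A) train on all cell lines but the last; (B) predict the held-out "new" cell line.
w, b = fit_ridge(X[:-1], y[:-1], alpha=1.0)
y_hat = predict(X[-1:], w, b)
```

Fitting one such model per drug corresponds to the single-drug setting; the penalty `alpha` matters in practice because real omics matrices usually have far more genes than screened cell lines.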
Systematic overview of drug response prediction methods
| Model | Paper | Method name | Dataset | Input | Output | Evaluation | Code/software availability |
|---|---|---|---|---|---|---|---|
| Single-drug learning (SDL) | | | | | | | |
| Linear Regression | | | | | | | |
| | Barretina | - | CCLE | MAGE, CNV, Mu, TT | IC50, AA | KFCV | - |
| | Papillon-Cavanagh | - | GDSC, CCLE | MAGE | IC50 | KFCV, CDV | - |
| | Niepel | - | NCI | Mu | GI50 | LOOCV | - |
| | Geeleher | - | GDSC, three clinical datasets | MAGE | IC50 | LOOCV, CDV | |
| | Falgreen | - | HBCCL, GDSC, IDRC, LLMPP, MDFCI, UAMS | MAGE | AUC | KFCV, LOOCV | - |
| | Covell | - | GDSC, CCLE | MAGE | GI50 | CDV | - |
| | Li | - | OncoPanel, BATTLE | MAGE | IC50 | RRSSCV | - |
| | Sokolov | GELnet | LBNLBC | MAGE | GI50 | LPOCV | |
| | Aben | TANDEM | GDSC | MAGE, Mu, CNV, DM, TT | IC50 | KFCV | |
| | Huang | TG-LASSO | GDSC, TCGA | MAGE | IC50 | RRSSCV, CDV | |
| Logistic Regression | | | | | | | |
| | Geeleher | - | GDSC, three clinical datasets | MAGE | Res/Non-Res | LOOCV, CDV | |
| | Falgreen | - | HBCCL, GDSC, IDRC, LLMPP, MDFCI, UAMS | MAGE | Sen/Res | KFCV, LOOCV | - |
| | Ding | - | TCGA | MAGE, CNV, DM, miRNA | Sen/Res | - | |
| | Geeleher | - | GDSC, TCGA | MAGE | IC50 | KFCV, CDV | |
| | Ding | - | GDSC, CCLE | MAGE, CNV, Mu | Sen/Res | RRSSCV | - |
| | Huang | L | GDSC, BATTLE | MAGE | IC50 | KFCV | - |
| Maximum Margin Models | | | | | | | |
| | Dong | - | GDSC, CCLE | MAGE | Sen/Res | KFCV, CDV | - |
| | Gupta | - | CCLE | MAGE, CNV, Mu | IC50 | - | |
| | Parca | - | GDSC | MAGE | IC50 | KFCV | |
| Ensemble Learning Methods | | | | | | | |
| | Riddick | - | NCI-60 | MAGE | IC50 | OOBP | - |
| | Daemen | - | LBNLBC, TCGA | MAGE, DM, ES, RSGE, PA, CNV | Sen/Res | RRSSCV, CDV | - |
| | Stetson | - | NCI60, CCLE, GDSC | MAGE, SNP, CNV | IC50 | KFCV, CDV | - |
| | Wan & Pal (2014) | - | CCLE, NCI-DREAM | MAGE, CNV, PA, DM, RSGE | GI50, IC50 | KFCV, LOOCV | - |
| | Fang | QRF | CCLE | MAGE, Mu, CNV | AA | OOBP | - |
| | Xu | AutoBorutaRF | GDSC, CCLE | MAGE, SNP, CNV | Sen/Res | KFCV | |
| | Oskooei | NetBiTE | GDSC | MAGE | IC50 | KFCV | - |
| | Su | Deep-Resp-Forest | GDSC, CCLE | MAGE, CNV | Sen/Res | KFCV | |
| | Kurilov | - | GDSC, CCLE, CTRP, gCSI, NIBR PDXE | MAGE, CNV, Mu, RSGE, TT | IC50, AUC, V1, TVC | RRSSCV | |
| Nearest Neighbour Method | | | | | | | |
| | Li | GA/KNN | GDSC, CCLE, TCGA, GTEx | MAGE, RSGE | IC50 | RRSSCV, CDV | - |
| Artificial Neural Networks and Deep Learning | | | | | | | |
| | Sakellaropoulos | - | GDSC, TCGA, MD Anderson, OCCAMS, multiple myeloma | MAGE | IC50 | KFCV | |
| | Sharifi-Noghabi | MOLI | GDSC, PDX, TCGA | MAGE, SNP, CNV | Res/Non-Res | CDV | |
| | Ahmed | - | NSCLCCLP | RSGE | AUC, ED50 | RRSSCV | |
| | Ma | TCRP | GDSC, CCLE, DepMap, PDTC BioBank, PDX Encyclopedia | MAGE, Mu | AUC, TVC | CDV | |
| | Malik | - | GDSC, TCGA | MAGE, Mu, CNV, DM | IC50 | KFCV, CDV | |
| Molecular Network Similarity-Based Methods | | | | | | | |
| | Kim | NBC | GDSC, CCLE | MAGE | Sen/Res | KFCV | - |
| | Stanfield | - | GDSC, CCLE | SNP | CDSS | LOOCV, CDV | - |
| Multi-drug learning (MDL) | | | | | | | |
| Linear Regression | | | | | | | |
| | Costello | Bayesian multitask MKL | NCI-DREAM, GDSC | MAGE, RSGE, CNV, Mu, DM, PA | GI50 | Challenge test cell lines | |
| | Yuan | - | CCLE, CTRPv2, NCI60 | MAGE, CNV, Mu | AA, AUC, GI50 | KFCV | - |
| | Ammad-ud-din | MVLR | GDSC, FIMM | MAGE | IC50, CDSS | LOOCV | |
| Ensemble Learning Methods | | | | | | | |
| | Matlock | - | GDSC, CCLE | MAGE, DT, DD | AUC | RRSSCV | - |
| | Liu | - | GDSC, CCLE | MAGE | AA, AUC | KFCV | |
| | Sharma & Rani (2019) | - | GDSC, CCLE, NCI-DREAM | MAGE | IC50, GI50 | KFCV | - |
| | Su | Meta-GDBP | GDSC, CCLE | MAGE, DD | IC50, AA | KFCV, CDV | |
| Artificial Neural Networks and Deep Learning | | | | | | | |
| | Menden | - | GDSC | MS, Mu, CNV, DD | IC50 | KFCV | - |
| | Chang | CDRscan | GDSC, CCLP | SNP, DD | IC50 | KFCV | - |
| | Li | DeepDSC | GDSC, CCLE | MAGE, DD | IC50 | KFCV, LOTO, LOCO | - |
| | Chiu | DeepDR | GDSC, TCGA | MAGE, Mu | IC50 | RRSSCV | - |
| | Joo | DeepIC50 | GDSC, CCLE, TCGA | Mu, DD | 3-class sensitivity | RRSSCV, CDV | |
| | Liu | tCNNS | GDSC | Mu, CNV, DD | IC50 | RRSSCV, LOTO | |
| | Manica | MCA | GDSC | MAGE, DD | IC50 | KFCV, RRSSCV | |
| | Choi | RefDNN | GDSC, CCLE | MAGE, DD | Sen/Res | KFCV | |
| | Zhu | - | GDSC, CCLE, CTRP, gCSI | MAGE, DD | AUC | KFCV | - |
| | Bazgir | REFINED | GDSC, NCI-60 | MAGE, DD | IC50, GI50, Sen/Res | KFCV, RRSSCV | |
| | Liu | DeepCDR | GDSC, CCLE, TCGA | MAGE, Mu, DM, DD | IC50, Sen/Res | KFCV, CDV | |
| | Tang & Gottlieb (2021) | PathDSP | GDSC, CCLE | MAGE, CNV, Mu, DD | IC50 | KFCV, CDV | |
| | Nguyen | GraphDRP | GDSC | Mu, CNV, DD | IC50 | RRSSCV | |
| Recommender Systems (neighbourhood-based) | | | | | | | |
| | Zhang | Dual-layer integrated cell line-drug network | GDSC, CCLE | MAGE, DD | AA, IC50 | LOOCV | - |
| | Sheng | - | GDSC, CCLE | MAGE, DD | IC50 | KFCV, CDV | - |
| | Liu | NCFGER | GDSC, CCLE | MAGE, DD | IC50, AA | KFCV | - |
| | Zhang | HIWCF | GDSC, CCLE | MAGE, DD, DT | IC50, AA | KFCV | |
| | Le & Pham (2018) | GloNetDRP | GDSC, CCLE | MAGE, Mu | AUC, IC50 | KFCV | - |
| | Wei | CDCN | GDSC, CCLE | MAGE, DD | AA, Sen/Res | LOOCV | |
| Recommender Systems (model-based) | | | | | | | |
| | Ammad-ud-din | KBMF | GDSC | MAGE, CNV, Mu, DD, DT | IC50 | KFCV | |
| | Ammad-ud-din | cwKBMF | GDSC, CTRPv1, AMLCLP | MAGE, DD | IC50, AUC | KFCV, CDV | |
| | Wang | SRMF | GDSC, CCLE | MAGE, DD | AA, IC50 | KFCV | |
| | Suphavilai | CaDRReS | GDSC, CCLE, HNCCLP | MAGE | IC50 | KFCV | |
| | Yang | Macau | GDSC | MAGE, DT | IC50 | KFCV | |
| | Guan | WGRMF | GDSC, CCLE | MAGE, DD | IC50, AA | KFCV | - |
| | Moughari & Eslahchi (2020) | ADRML | GDSC, CCLE | MAGE, Mu, CNV, DD, DT | IC50 | KFCV | |
| | Emdadi & Eslahchi (2020) | DSPLMF | GDSC, CCLE | MAGE, Mu, CNV, DD | Sen/Res | KFCV | |
| | Emdadi & Eslahchi (2021) | Auto-HMM-LMF | GDSC, CCLE | MAGE, CNV, Mu, TT, DD | Sen/Res | KFCV | |
| Network Representation Learning-Based Models | | | | | | | |
| | Yang | NRL2DRP | GDSC | Mu, CNV, DM | Sen/Res | KFCV | |
| | Yu | DREMO | GDSC, CCLE | MAGE, Mu, CNV, DD, DT | Sen/Res | KFCV | |
| Network Propagation-Based Method | | | | | | | |
| | Zhang | HNMDRP | GDSC | MAGE, DD | Sen/Res | LOOCV | |
AA = activity area; AUC = area under curve; CDSS = custom drug sensitivity score; CDV = cross-dataset validation; CNV = copy number variation; DD = drug descriptors; DM = DNA methylation; ES = exome sequencing; KFCV = K-fold cross validation; LOCO = leave-one-compound-out; LOOCV = leave-one-out cross validation; LOTO = leave-one-tissue-out; LPOCV = leave-pair-out cross validation; MAGE = microarray gene expression; miRNA = micro-RNA; MS = microsatellite; Mu = mutation; OOBP = out-of-bag prediction; PA = protein abundance; Res/Non-Res = responder/non-responder; RRSSCV = repeated random sub-sampling cross validation; RSGE = RNA-seq gene expression; Sen/Res = sensitive/resistant; SNP = single nucleotide polymorphism; TT = tissue type; TVC = tumor volume changes; V1 = viability at 1 μM.
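Several evaluation schemes in the legend differ only in how cell lines are assigned to test sets. The sketch below, with hypothetical helper names and NumPy only, generates index splits for KFCV, LOTO and RRSSCV; LOOCV is the special case of KFCV in which K equals the number of cell lines.

```python
import numpy as np


def k_fold(n_samples, k, seed=0):
    """KFCV: shuffle once, cut into k folds; each sample is tested exactly once."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        yield train, test


def leave_one_tissue_out(tissues):
    """LOTO: all cell lines of one tissue type form the test set in turn."""
    tissues = np.asarray(tissues)
    for tissue in np.unique(tissues):
        yield tissue, np.flatnonzero(tissues != tissue), np.flatnonzero(tissues == tissue)


def repeated_random_subsampling(n_samples, test_fraction, n_repeats, seed=0):
    """RRSSCV: independent random splits; a sample may be tested several times or never."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * n_samples))
    for _ in range(n_repeats):
        idx = rng.permutation(n_samples)
        yield idx[n_test:], idx[:n_test]


tissues = ["lung", "breast", "lung", "skin", "breast"]   # hypothetical tissue labels
for tissue, train, test in leave_one_tissue_out(tissues):
    print(tissue, train.tolist(), test.tolist())
```

LOTO is usually the harder test, because the model never sees any cell line of the held-out tissue during training, whereas random folds typically place cell lines of every tissue in both sets.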
Figure 2. Artificial neural networks (ANNs) in drug response prediction can be configured for (A) single-drug learning (SDL) or (B, C) multi-drug learning (MDL). An SDL ANN learns the response of a sample to a single drug from omics data, whereas an MDL ANN learns the responses to multiple drugs, either (B) with multiple outputs, each corresponding to the prediction for a specific drug, or (C) with a single output and drug features as additional input.
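The three configurations in Figure 2 differ mainly in the shapes of the input and output layers. As a sketch under simplifying assumptions (one hidden ReLU layer, random untrained weights, and hypothetical dimensions and drug descriptors), the forward passes look as follows; no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp(n_in, n_hidden, n_out):
    """Random, untrained weights for a one-hidden-layer network (only shapes matter here)."""
    return (rng.normal(size=(n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(size=(n_hidden, n_out)), np.zeros(n_out))


def forward(params, x):
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2                 # linear output layer


n_genes, n_drugs, n_drug_features, n_hidden = 50, 10, 16, 32
omics = rng.normal(size=(4, n_genes))                      # 4 hypothetical cell lines
drug_feats = rng.normal(size=(n_drugs, n_drug_features))   # hypothetical drug descriptors

# (A) SDL: omics -> response to one fixed drug; one such network per drug.
sdl = mlp(n_genes, n_hidden, 1)
y_a = forward(sdl, omics)                                  # shape (4, 1)

# (B) MDL, multiple outputs: omics -> one output unit per drug.
mdl_multi = mlp(n_genes, n_hidden, n_drugs)
y_b = forward(mdl_multi, omics)                            # shape (4, 10)

# (C) MDL, single output: [omics, drug features] -> response for that cell line/drug pair.
mdl_single = mlp(n_genes + n_drug_features, n_hidden, 1)
pairs = np.array([np.concatenate([c, d]) for c in omics for d in drug_feats])
y_c = forward(mdl_single, pairs).reshape(len(omics), n_drugs)   # shape (4, 10)
```

Only configuration (C) can, in principle, score a drug that was absent from training, provided its features are available; in (B) every drug is tied to a dedicated output unit.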
Figure 3. Two principal types of recommender systems (RSs) for drug response prediction (DRP). (A) Neighbourhood-based RSs, which use cell line similarity and drug similarity measures to predict the response of a cell line to a drug. (B) Model-based RSs for DRP, which typically use matrix factorization (left), in which cell lines and drugs are represented as vectors in a latent space (right). The response of a cell line to a drug is then modelled as proportional to the lengths of both vectors and as a decreasing function of the angle between the cell line vector and the drug vector.
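Both flavours in Figure 3 can be sketched in a few lines of NumPy. The neighbourhood-based predictor below averages the responses of the k most similar cell lines to the same drug, weighted by a precomputed cell line similarity matrix (assumed non-negative). The model-based part is plain regularized matrix factorization fitted by alternating least squares on the observed entries; it is a generic stand-in, not the algorithm of any specific method in the table, several of which add similarity or kernel information. All names and data are hypothetical.

```python
import numpy as np


def neighbourhood_predict(R, observed, cell_sim, i, j, k=3):
    """Similarity-weighted mean response to drug j of the k cell lines most similar
    to cell line i that were screened with drug j.

    Assumes non-negative similarities and at least one such neighbour.
    """
    order = np.argsort(-cell_sim[i])                        # most similar first
    neighbours = [c for c in order if c != i and observed[c, j]][:k]
    weights = cell_sim[i, neighbours]
    return float(weights @ R[neighbours, j] / weights.sum())


def factorize(R, observed, rank=2, lam=0.01, n_iter=100, seed=0):
    """Regularized matrix factorization R[i, j] ~ U[i] @ V[j],
    fitted by alternating least squares on the observed entries only."""
    rng = np.random.default_rng(seed)
    n_cells, n_drugs = R.shape
    U = rng.normal(scale=0.1, size=(n_cells, rank))
    V = rng.normal(scale=0.1, size=(n_drugs, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iter):
        for i in range(n_cells):                            # update cell line vectors
            m = observed[i]
            U[i] = np.linalg.solve(V[m].T @ V[m] + reg, V[m].T @ R[i, m])
        for j in range(n_drugs):                            # update drug vectors
            m = observed[:, j]
            V[j] = np.linalg.solve(U[m].T @ U[m] + reg, U[m].T @ R[m, j])
    return U, V


# Usage on a hypothetical rank-2 response matrix with one unscreened drug per cell line.
rng = np.random.default_rng(1)
R = rng.normal(size=(8, 2)) @ rng.normal(size=(6, 2)).T
observed = np.ones(R.shape, dtype=bool)
observed[np.arange(8), np.arange(8) % 6] = False
U, V = factorize(R, observed)
predicted = U @ V.T                                         # estimates for every pair

# Panel (B): the prediction U[i] @ V[j] equals |u| * |v| * cos(theta).
u, v = U[0], V[2]
cos_theta = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
assert np.isclose(u @ v, np.linalg.norm(u) * np.linalg.norm(v) * cos_theta)
```

The final check restates the geometric reading of panel (B): for fixed vector lengths, the predicted response shrinks as the angle between the cell line and drug vectors widens.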
Figure 4. Different MDL model validation schemes. (A) Known cell line/new drug; (B) new cell line/known drug; (C) new cell line/new drug; (D) known cell line/known drug.
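The four schemes in Figure 4 can be expressed as train/test masks over the cell line × drug response matrix. In this sketch (hypothetical function names; the default 20% hold-out fraction is an arbitrary choice), scheme (C) trains only on pairs in which neither the cell line nor the drug is held out, so pairs with exactly one held-out member are discarded.

```python
import numpy as np


def _hold_out(n, fraction, rng):
    """Boolean vector marking a random subset (at least one item) as held out."""
    held = np.zeros(n, dtype=bool)
    held[rng.permutation(n)[: max(1, int(round(fraction * n)))]] = True
    return held


def mdl_split(n_cells, n_drugs, scheme, fraction=0.2, seed=0):
    """Boolean (train, test) masks of shape (n_cells, n_drugs) for schemes A-D."""
    rng = np.random.default_rng(seed)
    new_cell = _hold_out(n_cells, fraction, rng)[:, None]   # column vector
    new_drug = _hold_out(n_drugs, fraction, rng)[None, :]   # row vector
    shape = (n_cells, n_drugs)
    if scheme == "A":    # known cell line / new drug: hide entire drug columns
        test = np.broadcast_to(new_drug, shape).copy()
        train = ~test
    elif scheme == "B":  # new cell line / known drug: hide entire cell line rows
        test = np.broadcast_to(new_cell, shape).copy()
        train = ~test
    elif scheme == "C":  # new cell line / new drug: test on the intersection only
        test = new_cell & new_drug
        train = ~new_cell & ~new_drug
    elif scheme == "D":  # known cell line / known drug: hide random individual entries
        test = rng.random(shape) < fraction
        train = ~test
    else:
        raise ValueError(f"unknown scheme {scheme!r}")
    return train, test


train, test = mdl_split(10, 8, "C")
print(int(train.sum()), int(test.sum()))   # pairs available for training / testing
```

Scheme (D) is the easiest, because every test cell line and drug has been seen in other pairs, while scheme (C) is the strictest and the closest to predicting responses for a new sample treated with a new compound.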