Mingyang Liu1, Xiaotong Shen1, Wei Pan2.
Abstract
In precision medicine, the ultimate goal is to recommend the most effective treatment to an individual patient based on patient-specific molecular and clinical profiles, possibly high-dimensional. To advance cancer treatment, large-scale screenings of cancer cell lines against chemical compounds have been performed to help better understand the relationship between genomic features and drug response; existing machine learning approaches use exclusively supervised learning, including penalized regression and recommender systems. However, it would be more efficient to apply reinforcement learning to sequentially learn as data accrue, including selecting the most promising therapy for a patient given individual molecular and clinical features and then collecting and learning from the corresponding data. In this article, we propose a novel personalized ranking system called Proximal Policy Optimization Ranking (PPORank), which ranks the drugs based on their predicted effects per cell line (or patient) in the framework of deep reinforcement learning (DRL). Modeled as a Markov decision process, the proposed method learns to recommend the most suitable drugs sequentially and continuously over time. As a proof-of-concept, we conduct experiments on two large-scale cancer cell line data sets in addition to simulated data. The results demonstrate that the proposed DRL-based PPORank outperforms the state-of-the-art competitors based on supervised learning. Taken together, we conclude that novel methods in the framework of DRL have great potential for precision medicine and should be further studied.
Keywords: Proximal Policy Optimization; actor-critic methods; deep learning; precision medicine; recommender systems
Year: 2022 PMID: 35716038 PMCID: PMC9427729 DOI: 10.1002/sim.9491
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.497
Mathematical notations
| Meaning | Notation |
|---|---|
| Cell line | |
| Number of training cell lines | |
| Maximum number of drugs for each cell line | M |
| Feature vector of cell line | |
| Drug | |
| Number of drugs associated with cell line | |
| Feature vector or binary features of a drug | |
| Embedding vector of a drug | |
| Feature vectors or binary indexes of drugs associated with cell line | |
| Remaining drugs associated with cell line | |
| Ground truth response score of a drug | |
| Ground truth permutation (ranking list) for drugs associated with cell line | |
| Original drug index of the | |
| Ranking position of drug | |
Note: The superscript indicates the associated cell line; if it is missing, the notation applies to the general case for any associated cell line.
FIGURE 1 The actor network
FIGURE 2 The actor and critic networks
FIGURE 3 Mean NDCG values by five‐fold cross‐validation (with 1 SD as error bars). (A) NDCG values using the full CCLE data set; (B) NDCG values using the full GDSC data set
FIGURE 4 DNN and PPORank's performance using on the GDSC data
FIGURE 5 Mean NDCG values by five‐fold cross‐validation on the GDSC data with different types of omic data features
FIGURE 6 PPORank performance on GDSC with sequential data
FIGURE 7 values with the full CCLE or GDSC data for different ; the error bars show one SD based on three‐fold cross‐validation. (A) with the CCLE data; (B) with the GDSC data
PPORank's recommendation rates of lapatinib for the TCGA patients with two different breast cancer subtypes
| Recommendation | | mBRCA ( |
|---|---|---|
| lapatinib | 0.92 | 0.22 |
| lapatinib | 0.95 | 0.22 |
| lapatinib | 0.94 | 0.44 |
| lapatinib | 0.92 | 0.22 |
| lapatinib | 0.95 | 0.44 |
FIGURE 8 Simulation setups: (A) The correlation matrix of the simulated cell line features with 10 clusters; (B) the weight matrix with 10 clusters
FIGURE 9 Performance of DNN and PPORank in terms of in two simulation scenarios. (A) Simulation scenario (1); (B) simulation scenario (3)
Mean NDCG (SD) in the primary simulations
| Method | Scenario 1, n=1000 | Scenario 1, n=10 000 | Scenario 2, n=1000 | Scenario 2, n=10 000 | Scenario 3, n=1000 | Scenario 3, n=10 000 |
|---|---|---|---|---|---|---|
| EN | 0.793 (0.03) | 0.801 (0.02) | 0.593 (0.05) | 0.621 (0.04) | 0.890 (0.04) | 0.905 (0.03) |
| KRR | 0.774 (0.04) | 0.769 (0.07) | 0.532 (0.05) | 0.582 (0.05) | 0.882 (0.06) | 0.901 (0.04) |
| KRL | 0.783 (0.02) | 0.793 (0.01) | 0.625 (0.02) | 0.647 (0.03) | 0.920 (0.03) | 0.927 (0.02) |
| CaDRRes | | 0.824 (0.02) | 0.632 (0.03) | 0.682 (0.02) | 0.922 (0.02) | 0.931 (0.02) |
| ppo‐w/o | 0.798 (0.07) | 0.811 (0.10) | | 0.693 (0.01) | 0.940 (0.03) | 0.948 (0.04) |
| PPORank | 0.790 (0.06) | | 0.651 (0.02) | | | |
Note: The boldfaced values show the best performances.
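
For readers unfamiliar with the metric reported in these tables, the following is a minimal sketch of how NDCG can be computed for one cell line. It assumes the common exponential-gain form, with gain 2^rel - 1 and discount log2(position + 1); the exact variant used in the paper is not stated in this record and may differ. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical and chosen only for illustration.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items, taken in the given order.

    Assumption: exponential gain (2**rel - 1) and log2 position discount.
    """
    return sum(
        (2.0 ** rel - 1.0) / math.log2(pos + 2)  # pos is 0-based, so rank = pos + 1
        for pos, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(true_scores, predicted_scores, k):
    """NDCG@k for one cell line: DCG of the predicted ranking over the ideal DCG."""
    # Rank drugs by predicted score, best first.
    order = sorted(
        range(len(true_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ranked_truth = [true_scores[i] for i in order]
    # The ideal ranking sorts drugs by their ground-truth response.
    ideal_truth = sorted(true_scores, reverse=True)
    ideal_dcg = dcg_at_k(ideal_truth, k)
    if ideal_dcg <= 0:
        return 0.0  # no positive gain available; define NDCG as 0
    return dcg_at_k(ranked_truth, k) / ideal_dcg
```

Under this definition a ranking that orders the drugs exactly by their ground-truth response scores attains the maximum value of 1, and a mean NDCG such as those tabulated here would be the average of `ndcg_at_k` over cell lines.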
Mean NDCG (SD) for the secondary simulations
| | | |
|---|---|---|
| EN | 0.645 (0.08) | 0.679 (0.07) |
| KRR | 0.641 (0.11) | 0.650 (0.10) |
| KRL | 0.689 (0.09) | 0.707 (0.08) |
| CaDRRes | 0.647 (0.11) | 0.689 (0.09) |
| ppo‐w/o | 0.701 (0.11) | 0.713 (0.12) |
| PPORank | | |
Note: The boldfaced values show the best performances.