| Literature DB >> 30728403 |
Ning Liu1, Ying Liu2, Brent Logan3, Zhiyuan Xu4, Jian Tang4,5, Yanzhi Wang1.
Abstract
This paper presents a deep reinforcement learning (DRL) framework to estimate optimal dynamic treatment regimes from observational medical data. The framework handles high-dimensional action and state spaces more flexibly and adaptively than existing reinforcement learning methods, capturing the real-life complexity of heterogeneous disease progression and treatment choices, with the goal of providing doctors and patients with data-driven, personalized treatment recommendations. The proposed DRL framework comprises (i) a supervised learning step to predict expert actions, and (ii) a deep reinforcement learning step to estimate the long-term value function of dynamic treatment regimes. Both steps depend on deep neural networks. As a key motivational example, we applied the proposed framework to a data set from the Center for International Blood and Marrow Transplant Research (CIBMTR) registry database, focusing on the sequence of prevention and treatment for acute and chronic graft-versus-host disease (GVHD) after transplantation. The experimental results demonstrate promising accuracy in predicting human experts' decisions, as well as a high expected reward function for the DRL-based dynamic treatment regimes.
Year: 2019 PMID: 30728403 PMCID: PMC6365640 DOI: 10.1038/s41598-018-37142-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
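The abstract describes a two-step framework: a supervised step that learns to imitate the expert's treatment choices, followed by a deep Q-network (DQN) step that estimates the long-term value of treatments from logged transitions. The following is a minimal, illustrative sketch of that two-step idea on synthetic data, not the authors' implementation: a per-state majority vote stands in for the supervised network, and tabular off-policy Q-learning stands in for the DQN, with all variable names and the data format assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an observational data set: each record is
# (state, expert_action, reward, next_state) with discretized states.
n_states, n_actions = 6, 3
records = [(int(rng.integers(n_states)), int(rng.integers(n_actions)),
            float(rng.random()), int(rng.integers(n_states)))
           for _ in range(2000)]

# Step (i): supervised step -- predict the expert's action for each state.
# A per-state majority vote plays the role of the deep classifier here.
counts = np.zeros((n_states, n_actions))
for s, a, _, _ in records:
    counts[s, a] += 1
expert_policy = counts.argmax(axis=1)

# Step (ii): off-policy Q-learning on the logged transitions to estimate
# the long-term value of treatment choices (a tabular stand-in for DQN).
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9
for _ in range(20):  # sweep the logged batch several times
    for s, a, r, s2 in records:
        td_target = r + gamma * Q[s2].max()       # bootstrapped target
        Q[s, a] += alpha * (td_target - Q[s, a])  # TD update

# Greedy treatment recommendation per state from the learned values.
learned_policy = Q.argmax(axis=1)
```

In the paper both steps use deep neural networks over high-dimensional patient states; the tabular version above only illustrates the data flow from behavior cloning to value estimation.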
Figure 1Accuracies on predicting experts’ treatment for initial conditioning and GVHD prophylaxis.
Figure 2Accuracies on predicting experts’ treatment for acute GVHD.
Figure 3Accuracy results on predicting experts’ treatment for chronic GVHD at 100 days, 6 months, 1 year, and 2 years.
Reward comparison among the proposed DQN method, one-size-fits-all, random forest, and experts' treatment.
| Treatments | Method | Reward | 95% Confidence Interval |
|---|---|---|---|
| AGVHD | DQN | 0.717 | (0.683, 0.729) |
| | One-size-fits-all | 0.693 | (0.659, 0.705) |
| | Random forest | 0.677 | (0.666, 0.703) |
| | Experts' treatment | 0.673 | (0.663, 0.694) |
| CGVHD | DQN | 0.706 | (0.678, 0.722) |
| | One-size-fits-all | 0.684 | (0.671, 0.712) |
| | Random forest | 0.672 | (0.663, 0.713) |
| | Experts' treatment | 0.671 | (0.661, 0.697) |
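The table reports mean rewards with 95% confidence intervals for each method. The record does not state how those intervals were computed; one common way to obtain such intervals from per-patient rewards is a percentile bootstrap, sketched below on illustrative synthetic data (the function name, sample, and all numbers are assumptions, not the paper's procedure).

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_ci(rewards, n_boot=2000, level=0.95):
    """Percentile-bootstrap confidence interval for the mean reward."""
    rewards = np.asarray(rewards, dtype=float)
    # Resample patients with replacement and record each resample's mean.
    means = np.array([
        rng.choice(rewards, size=rewards.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(means, [(1 - level) / 2, 1 - (1 - level) / 2])
    return rewards.mean(), (lo, hi)

# Illustrative per-patient rewards roughly on the scale of the table.
sample = rng.normal(0.70, 0.15, size=500).clip(0.0, 1.0)
mean, (lo, hi) = bootstrap_ci(sample)
```

With 500 patients the interval is fairly tight around the mean, consistent with the narrow intervals shown in the table.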
Figure 4Comparison results between the proposed DRL method and the baseline (details in the text) for acute GVHD treatment.
Figure 5Comparison results between the proposed DRL method and the baseline (details in the text) for chronic GVHD treatment.
Figure 6Histogram of patient ages in the data set of interest.
Matching information of patients and donors in the data set of interest.
| Identical Sibling | Other Relative | URD Well Matched | URD Partially Matched | URD Mismatched | Other |
|---|---|---|---|---|---|
| 3877 | 451 | 686 | 433 | 173 | 401 |
Figure 7The proposed DRL/DQN framework for prevention and treatment of GVHD, as well as initial conditioning.