Chao Yu, Yinzhao Dong, Jiming Liu, Guoqi Ren.
Abstract
BACKGROUND: Reinforcement learning (RL) provides a promising technique for solving complex sequential decision-making problems in health-care domains. However, existing studies simply apply naive RL algorithms to discover optimal treatment strategies for a targeted problem. Such direct application ignores the abundant causal relationships between treatment options and their associated outcomes that are inherent in medical domains.
Keywords: Causal factors; Dynamic treatment regime; HIV; Reinforcement learning
Year: 2019 PMID: 30961606 PMCID: PMC6454675 DOI: 10.1186/s12911-019-0755-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
The Causal Policy Gradient (CPG) Algorithm
| Algorithm 1: The CPG Algorithm |
|---|
| Function CPG |
| Input: a differentiable policy parameterization |
| Initialize policy parameter |
| Repeat forever: |
| Define event A and event B; |
| Generate an episode |
| For each step of the episode t=0,...,T-1: |
| G ← average future return from step t; |
| |
| |
| End for |
| Return |
| End CPG |
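Algorithm 1's update equations were lost in extraction, so the sketch below fills in a standard REINFORCE-style policy gradient in which each step's gradient is scaled by a causal factor C, matching the structure of the pseudocode (average future return G at each step). The toy two-action environment, the softmax parameterization, and the constant causal factor are all assumptions for illustration, not the paper's HIV simulator or its exact update rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def run_episode(theta, horizon=10):
    """Toy stateless environment (an assumption, not the paper's HIV
    simulator): action 0 yields reward 1, action 1 yields reward 0."""
    actions, rewards = [], []
    for _ in range(horizon):
        a = rng.choice(2, p=softmax(theta))
        actions.append(a)
        rewards.append(1.0 if a == 0 else 0.0)
    return actions, rewards

def cpg(episodes=500, alpha=0.1, causal_factor=1.0):
    """Sketch of Algorithm 1: a REINFORCE-style update in which each
    step's gradient is weighted by a causal factor C (constant here;
    in the paper C is derived from events A and B)."""
    theta = np.zeros(2)  # softmax policy parameters, one logit per action
    for _ in range(episodes):
        actions, rewards = run_episode(theta)
        for t in range(len(rewards)):
            # G: average future return from step t, as in Algorithm 1
            G = np.mean(rewards[t:])
            pi = softmax(theta)
            grad_log = -pi
            grad_log[actions[t]] += 1.0  # grad of log pi(a_t | theta)
            theta += alpha * causal_factor * G * grad_log
    return theta

theta = cpg()
print(softmax(theta))  # the learned policy strongly prefers action 0
```

With `causal_factor` fixed at 1 this reduces to plain policy gradient; the paper's comparison of direct PG and CPG (Fig. 5) corresponds to varying how C is computed during learning.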
Different equilibrium points of the six cells
| Equilibrium point | T1 | T2 | T1* | T2* | V | E |
|---|---|---|---|---|---|---|
| The healthy, unstable state | 1000000 | 3198 | 0 | 0 | 0 | 10 |
| The healthy, locally stable state | 967839 | 621 | 76 | 6 | 415 | 353108 |
| The non-healthy, locally stable state | 163573 | 5 | 11945 | 46 | 63919 | 24 |
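The table's three equilibria can serve as reference points for judging where a simulated patient trajectory ends up. The utility below stores the tabulated values and classifies a state by its nearest equilibrium in log space, the natural scale for populations spanning several orders of magnitude; the `T1s`/`T2s` keys for the infected-cell columns are an assumed labeling, not taken from the paper.

```python
import math

# Equilibrium points of the six cell types, from the table above
# (the T1s/T2s names for the infected compartments are an assumption).
EQUILIBRIA = {
    "healthy_unstable":   dict(T1=1000000, T2=3198, T1s=0,     T2s=0,  V=0,     E=10),
    "healthy_stable":     dict(T1=967839,  T2=621,  T1s=76,    T2s=6,  V=415,   E=353108),
    "non_healthy_stable": dict(T1=163573,  T2=5,    T1s=11945, T2s=46, V=63919, E=24),
}

def nearest_equilibrium(state):
    """Classify a state by Euclidean distance in log10 space; the +1
    shift keeps zero populations well-defined."""
    def log_dist(eq):
        return sum((math.log10(state[k] + 1) - math.log10(eq[k] + 1)) ** 2
                   for k in eq)
    return min(EQUILIBRIA, key=lambda name: log_dist(EQUILIBRIA[name]))

# A state close to the healthy, locally stable equilibrium:
print(nearest_equilibrium(dict(T1=9e5, T2=600, T1s=80, T2s=5, V=400, E=3e5)))
# -> healthy_stable
```

A trajectory ending near `healthy_stable` (high effector count E, low viral load V) is the outcome the learned treatment policy aims for; one ending near `non_healthy_stable` indicates treatment failure.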
Fig. 1 The medication regimen: a before learning; b, c during learning; and d after learning
Fig. 2 The evolution of the six types of cells for the first patient (i.e., before learning). a-f correspond to the continuous change of the T1, T2, T1*, T2*, E and V cells, respectively
Fig. 3 The evolution of the six types of cells after learning over 300 patients. a-f correspond to the continuous change of the T1, T2, T1*, T2*, E and V cells, respectively
Fig. 4 The evolution of reward for a the first patient; and b the 300th patient
Fig. 5 a Comparison of the performance of the direct PG and CPG algorithms; b Dynamic evolution of causal factor C during learning
Definition of different causal factors
| Event | | |
|---|---|---|
| A | adding RTI or PI | adding RTI or PI |
| ¬A | without adding RTI or PI | without adding RTI or PI |
| B | | |
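The B-row definitions were lost in extraction, so the paper's exact formula for the causal factor C cannot be recovered here. As a hedged stand-in, the snippet below computes the classical ΔP contingency measure, P(B|A) − P(B|¬A), from episode counts of the events A (e.g. adding RTI or PI) and B; the paper's own definition of C may differ.

```python
def delta_p(n_ab, n_a, n_b_not_a, n_not_a):
    """Contingency-based causal strength: P(B|A) - P(B|not A).

    A stand-in for the paper's causal factor C (an assumption, since
    the paper's exact formula is not recoverable from this record).
    n_ab:      episodes where A occurred and B followed
    n_a:       episodes where A occurred
    n_b_not_a: episodes where B occurred without A
    n_not_a:   episodes where A did not occur
    """
    return n_ab / n_a - n_b_not_a / n_not_a

# e.g. B followed A in 8 of 10 episodes, but occurred in only 2 of 10
# episodes without A: C = 0.8 - 0.2, i.e. approximately 0.6
print(delta_p(8, 10, 2, 10))
```

A positive value indicates that the treatment event A promotes outcome B; a value near zero would leave the CPG update essentially unweighted.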
Fig. 6 a Comparison of reward values for causal algorithms with different causal factors; b The optimal strategy for HIV treatment