Chao Yu, Jiming Liu, Hongyi Zhao.
Abstract
BACKGROUND: Reinforcement learning (RL) is a promising technique for solving complex sequential decision-making problems in health care. Such applications require an explicit reward function, encoding domain knowledge, to be specified beforehand to indicate the goal of the task. However, medical records usually contain no explicit information about the reward function. It is therefore necessary to consider an approach in which the reward function is learned from a set of presumably optimal treatment trajectories drawn from retrospective real medical data. This paper applies inverse RL to infer the reward functions that clinicians have in mind when deciding on weaning from mechanical ventilation and sedative dosing in Intensive Care Units (ICUs).
Keywords: Intensive care units; Inverse learning; Mechanical ventilation; Reinforcement learning; Sedative dosing
Year: 2019 PMID: 30961594 PMCID: PMC6454602 DOI: 10.1186/s12911-019-0763-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Example trajectories of three vital signs (Heart Rate, SpO2 and Respiratory Rate) after preprocessing. a Heart Rate, b SpO2, c Respiratory Rate
Fig. 2 The process of IBL
Weight vectors for different RL policies
| Policy | Weight of reward function |
|---|---|
| | [1/7, 1/7, 1/7, 1/7, 1/7, 1/7, 1/7] |
| | [0.14, 0.24, 0.15, 0.19, 0.07, 0.07, 0.14] |
| | [0.08, 0.17, 0.16, 0.18, 0.29, 0.10, 0.02] |
| | [0.07, 0.19, 0.12, 0.21, 0.26, 0.04, 0.11] |
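To make the role of these weight vectors concrete, the following is a minimal sketch that assumes the reward is a linear combination of seven reward features, which the seven-element vectors suggest but this record does not state. The function name `linear_reward` and the feature values in `example_features` are hypothetical; only the two weight vectors are copied from the table.

```python
from typing import Sequence


def linear_reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Return the weighted sum of reward features (assumed linear reward)."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


# Weight vectors copied from the table; the policy labels were not recoverable.
uniform_weights = [1 / 7] * 7
learned_weights = [0.14, 0.24, 0.15, 0.19, 0.07, 0.07, 0.14]

# Hypothetical feature vector for one state, invented purely for illustration.
example_features = [1.0, 0.0, 1.0, 0.5, 0.0, 1.0, 0.0]

uniform_reward = linear_reward(uniform_weights, example_features)
learned_reward = linear_reward(learned_weights, example_features)
```

Under this assumption, two policies that see the same state can assign it different rewards purely because their weight vectors emphasise different features.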
Fig. 3 Convergence of using FQI-GBDT and the inverse FQI-GBDT methods
Fig. 4 The convergence of P(O|R) using different initial values of the weights
The correctness of learned policies using RL and IRL methods in the test data set
| Policy | Overall Action | Ventilation | Sedative |
|---|---|---|---|
| | 53.9 | 99.7 | 54.2 |
| | 53.5 | 99.6 | 53.9 |
| | 23.5 | 45.7 | 51.0 |
| | 14.1 | 35.5 | 39.1 |
| | 17.2 | 34.9 | 54.1 |
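One plausible reading of "correctness" is the percentage of test-set decisions on which a learned policy picks the same action as the recorded clinician. This record does not define the metric, so the sketch below is an assumption. The `(ventilation, sedative)` action encoding, the function name `match_rates`, and the sample data are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical joint action: (ventilation_action, sedative_level).
Action = Tuple[int, int]


def match_rates(
    policy_actions: List[Action], clinician_actions: List[Action]
) -> Tuple[float, float, float]:
    """Return (overall, ventilation, sedative) agreement as percentages."""
    if not policy_actions or len(policy_actions) != len(clinician_actions):
        raise ValueError("action lists must be non-empty and of equal length")
    n = len(policy_actions)
    pairs = list(zip(policy_actions, clinician_actions))
    overall = sum(p == c for p, c in pairs)        # both components agree
    ventilation = sum(p[0] == c[0] for p, c in pairs)
    sedative = sum(p[1] == c[1] for p, c in pairs)
    return (100 * overall / n, 100 * ventilation / n, 100 * sedative / n)


# Invented sample decisions, for illustration only.
policy = [(1, 0), (1, 2), (0, 1), (1, 3)]
clinician = [(1, 0), (1, 1), (0, 1), (0, 3)]
rates = match_rates(policy, clinician)
```

Under this reading, the overall rate can never exceed either component rate, since a joint match requires both components to match.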
The correctness of sedative dosing policies using RL and IRL methods in the test data set
| Policy | Expert Data | Ordinary Single Intubation Data | Multiple Intubation Data |
|---|---|---|---|
| | 44.5 | 48.5 | 63.4 |
| | 44.4 | 48.4 | 62.8 |
Fig. 5 Comparison of feature importance using different RL and IRL policies
Fig. 6 Feature importance using π
Fig. 7 Feature importance using π