| Literature DB >> 33716408 |
Tingting Xu, Henghui Zhu, Ioannis Ch. Paschalidis.
Abstract
We consider the problem of estimating the policy and transition probability model of a Markov Decision Process from data (state, action, next state tuples). The transition probability and policy are assumed to be parametric functions of a sparse set of features associated with the tuples. We propose two regularized maximum likelihood estimation algorithms for learning the transition probability model and policy, respectively. An upper bound is established on the regret, which is the difference between the average reward of the estimated policy under the estimated transition probabilities and that of the original unknown policy under the true (unknown) transition probabilities. We provide a sample complexity result showing that we can achieve a low regret with a relatively small amount of training samples. We illustrate the theoretical results with a healthcare example and a robot navigation experiment.Entities:
Keywords: Learning transition dynamics; Markov decision processes; Maximum likelihood estimation; Policy learning; Regularization
Year: 2020 PMID: 33716408 PMCID: PMC7944408 DOI: 10.1016/j.ejcon.2020.04.003
Source DB: PubMed Journal: Eur J Control ISSN: 0947-3580 Impact factor: 2.395