Learning parametric policies and transition probability models of Markov decision processes from data.

Tingting Xu, Henghui Zhu, Ioannis Ch Paschalidis.

Abstract

We consider the problem of estimating the policy and transition probability model of a Markov Decision Process from data in the form of (state, action, next state) tuples. The transition probability and policy are assumed to be parametric functions of a sparse set of features associated with the tuples. We propose two regularized maximum likelihood estimation algorithms, one for learning the transition probability model and one for learning the policy. An upper bound is established on the regret, which is the difference between the average reward of the estimated policy under the estimated transition probabilities and that of the original unknown policy under the true (unknown) transition probabilities. We provide a sample complexity result showing that a low regret can be achieved with a relatively small number of training samples. We illustrate the theoretical results with a healthcare example and a robot navigation experiment.
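
The regularized maximum likelihood idea for the policy can be illustrated with a minimal sketch. The abstract does not state the parametric form, the regularizer, or the solver, so everything specific here is an assumption made for illustration: a softmax policy over per-action feature vectors, an L1 penalty to encourage sparsity, a proximal gradient (ISTA) solver, and hypothetical names and hyperparameters (`fit_policy`, `lam`, `step`, `iters`). It is not the authors' algorithm, and it covers only the policy, not the transition model.

```python
import math
import random

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def policy_probs(theta, feats):
    # feats[a] is the feature vector of action a in the current state
    # (assumed layout; the abstract only says features are tied to the tuples).
    return softmax([sum(t * f for t, f in zip(theta, phi)) for phi in feats])

def soft_threshold(x, tau):
    # Proximal operator of tau * |x|: shrinks x toward zero.
    return math.copysign(max(abs(x) - tau, 0.0), x)

def fit_policy(samples, dim, lam=0.02, step=0.5, iters=200):
    # Minimizes mean negative log-likelihood + lam * ||theta||_1 by ISTA.
    # samples: list of (feats, chosen_action_index). Hyperparameters are guesses.
    theta = [0.0] * dim
    n = len(samples)
    for _ in range(iters):
        grad = [0.0] * dim
        for feats, a in samples:
            p = policy_probs(theta, feats)
            for j in range(dim):
                expected = sum(p[b] * feats[b][j] for b in range(len(feats)))
                grad[j] += (expected - feats[a][j]) / n
        theta = [soft_threshold(t - step * g, step * lam)
                 for t, g in zip(theta, grad)]
    return theta

# Synthetic demonstration data: a sparse "true" parameter vector (made up).
rng = random.Random(0)
true_theta = [2.0, 0.0, 0.0, -1.5, 0.0]
samples = []
for _ in range(400):
    feats = [[rng.uniform(-1, 1) for _ in true_theta] for _ in range(3)]
    p = policy_probs(true_theta, feats)
    action = rng.choices(range(3), weights=p)[0]
    samples.append((feats, action))

theta_hat = fit_policy(samples, dim=len(true_theta))
print([round(t, 2) for t in theta_hat])
```

On this synthetic data the estimate should be large in magnitude on the two informative features and close to zero on the others. The same recipe would apply to a transition model by replacing actions with next states, again under the assumed softmax form.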

Keywords:  Learning Transition Dynamics; Markov Decision Processes; Maximum likelihood estimation; Policy Learning; Regularization

Year:  2020        PMID: 33716408      PMCID: PMC7944408          DOI: 10.1016/j.ejcon.2020.04.003

Source DB:  PubMed          Journal:  Eur J Control        ISSN: 0947-3580            Impact factor:   2.395


  4 in total

1.  Personalized Diabetes Management Using Electronic Medical Records.

Authors:  Dimitris Bertsimas; Nathan Kallus; Alexander M Weinstein; Ying Daisy Zhuo
Journal:  Diabetes Care       Date:  2016-12-05       Impact factor: 19.112

2.  Multiplicative Forests for Continuous-Time Processes.

Authors:  Jeremy C Weiss; Sriraam Natarajan; David Page
Journal:  Adv Neural Inf Process Syst       Date:  2012

3.  Doctor AI: Predicting Clinical Events via Recurrent Neural Networks.

Authors:  Edward Choi; Mohammad Taha Bahadori; Andy Schuetz; Walter F Stewart; Jimeng Sun
Journal:  JMLR Workshop Conf Proc       Date:  2016-12-10

4.  Longitudinal modeling of glaucoma progression using 2-dimensional continuous-time hidden Markov model.

Authors:  Yu-Ying Liu; Hiroshi Ishikawa; Mei Chen; Gadi Wollstein; Joel S Schuman; James M Rehg
Journal:  Med Image Comput Comput Assist Interv       Date:  2013