
Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting.

Haoyu Chen, Wenbin Lu, Rui Song.

Abstract

Online decision-making problems require us to make a sequence of decisions based on incremental information. Common solutions often need to learn a reward model of different actions given the contextual information and then maximize the long-term reward. It is meaningful to know whether the posited model is reasonable and how the model performs in the asymptotic sense. We study this problem under the contextual bandit framework with a linear reward model. The ε-greedy policy is adopted to address the classic exploration-and-exploitation dilemma. Using the martingale central limit theorem, we show that the online ordinary least squares estimator of the model parameters is asymptotically normal. When the linear model is misspecified, we propose an online weighted least squares estimator using inverse propensity score weighting and also establish its asymptotic normality. Based on the properties of the parameter estimators, we further show that the in-sample inverse propensity weighted value estimator is asymptotically normal. We illustrate our results using simulations and an application to a news article recommendation dataset from Yahoo!.
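The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and every specific in it is an assumption made for illustration: two actions, a fixed exploration rate `eps`, synthetic Gaussian contexts, the hypothetical coefficients in `beta_true`, a tiny ridge term for invertibility, and the function name `run_epsilon_greedy`. It runs an ε-greedy policy, updates per-action OLS estimates online, forms inverse-propensity-weighted least squares estimates, and accumulates one plausible form of an in-sample IPW value estimate.

```python
import numpy as np

def run_epsilon_greedy(T=5000, eps=0.1, seed=0):
    """Simulate an epsilon-greedy contextual bandit with a linear reward model.

    All settings here (two arms, fixed eps, Gaussian contexts, beta_true)
    are illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical true coefficients: one row per action.
    beta_true = np.array([[0.5, -0.2, 0.1],
                          [0.1, 0.4, -0.3]])
    K, d = beta_true.shape

    # Sufficient statistics for OLS and IPW-weighted LS, per action.
    # A tiny ridge term keeps the matrices invertible early on.
    A_ols = np.stack([1e-6 * np.eye(d) for _ in range(K)])
    b_ols = np.zeros((K, d))
    A_wls = A_ols.copy()
    b_wls = np.zeros((K, d))
    beta_hat = np.zeros((K, d))
    ipw_sum = 0.0

    for _ in range(T):
        # Context with an intercept and d-1 standard normal features.
        x = np.concatenate(([1.0], rng.normal(size=d - 1)))

        # Greedy action under the current OLS estimates.
        greedy = int(np.argmax(beta_hat @ x))
        # Explore uniformly with probability eps, otherwise exploit.
        a = int(rng.integers(K)) if rng.random() < eps else greedy
        # Probability with which the chosen action was selected.
        prop = (1.0 - eps) * (a == greedy) + eps / K

        # Observe a reward from the (here correctly specified) linear model.
        y = beta_true[a] @ x + rng.normal()

        # IPW value term: only rounds where the greedy action was taken count.
        if a == greedy:
            ipw_sum += y / prop

        # Online updates of the normal equations for the chosen action.
        A_ols[a] += np.outer(x, x)
        b_ols[a] += x * y
        A_wls[a] += np.outer(x, x) / prop
        b_wls[a] += x * y / prop
        beta_hat[a] = np.linalg.solve(A_ols[a], b_ols[a])

    beta_wls = np.stack([np.linalg.solve(A_wls[k], b_wls[k]) for k in range(K)])
    return beta_hat, beta_wls, ipw_sum / T, beta_true

beta_ols, beta_wls, value_ipw, beta_true = run_epsilon_greedy()
print("OLS estimates:\n", beta_ols)
print("IPW-weighted LS estimates:\n", beta_wls)
print("IPW value estimate:", value_ipw)
```

In this sketch the linear model is correctly specified, so both estimators should land near `beta_true`. To explore the misspecified case the abstract mentions, one could replace the reward line with a nonlinear function of `x` and compare how the two sets of estimates differ.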

Keywords:  epsilon-greedy; inverse propensity weighted estimator; model misspecification; online decision-making; statistical inference

Year:  2020        PMID: 33737759      PMCID: PMC7962379          DOI: 10.1080/01621459.2020.1770098

Source DB:  PubMed          Journal:  J Am Stat Assoc        ISSN: 0162-1459            Impact factor:   5.033


  6 in total

1.  Reinforcement learning design for cancer clinical trials.

Authors:  Yufan Zhao; Michael R Kosorok; Donglin Zeng
Journal:  Stat Med       Date:  2009-11-20       Impact factor: 2.373

2.  A robust method for estimating optimal treatment regimes.

Authors:  Baqun Zhang; Anastasios A Tsiatis; Eric B Laber; Marie Davidian
Journal:  Biometrics       Date:  2012-05-02       Impact factor: 2.571

3.  Concordance-Assisted Learning for Estimating Optimal Individualized Treatment Regimes.

Authors:  Caiyun Fan; Wenbin Lu; Rui Song; Yong Zhou
Journal:  J R Stat Soc Series B Stat Methodol       Date:  2016-10-31       Impact factor: 4.488

4.  TARGETED SEQUENTIAL DESIGN FOR TARGETED LEARNING INFERENCE OF THE OPTIMAL TREATMENT RULE AND ITS MEAN REWARD.

Authors:  Antoine Chambaz; Wenjing Zheng; Mark J van der Laan
Journal:  Ann Stat       Date:  2017-12-15       Impact factor: 4.028

5.  Estimating Individualized Treatment Rules Using Outcome Weighted Learning.

Authors:  Yingqi Zhao; Donglin Zeng; A John Rush; Michael R Kosorok
Journal:  J Am Stat Assoc       Date:  2012-09-01       Impact factor: 5.033

6.  Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions.

Authors:  Baqun Zhang; Anastasios A Tsiatis; Eric B Laber; Marie Davidian
Journal:  Biometrika       Date:  2013       Impact factor: 2.445

  2 in total

1.  Statistical Inference with M-Estimators on Adaptively Collected Data.

Authors:  Kelly W Zhang; Lucas Janson; Susan A Murphy
Journal:  Adv Neural Inf Process Syst       Date:  2021-12

2.  A single-index model with a surface-link for optimizing individualized dose rules.

Authors:  Hyung Park; Eva Petkova; Thaddeus Tarpey; R Todd Ogden
Journal:  J Comput Graph Stat       Date:  2021-06-21       Impact factor: 1.884

