| Literature DB >> 24302771 |
Baqun Zhang, Anastasios A Tsiatis, Eric B Laber, Marie Davidian.
Abstract
A dynamic treatment regime is a list of sequential decision rules for assigning treatment based on a patient's history. Q- and A-learning are two main approaches for estimating the optimal regime, i.e., the one yielding the most beneficial outcome in the patient population, using data from a clinical trial or observational study. Q-learning requires postulated regression models for the outcome, while A-learning involves models for the part of the outcome regression representing treatment contrasts and for treatment assignment. We propose an alternative to Q- and A-learning that maximizes a doubly robust augmented inverse probability weighted estimator for the population mean outcome over a restricted class of regimes. Simulations demonstrate the method's performance and its robustness to model misspecification, a key concern in practice.
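To make the abstract's estimator concrete, below is a minimal sketch of a single-decision augmented inverse probability weighted (AIPW) value estimator of the kind the paper maximizes over a class of regimes. The function names (`aipw_value`, `regime`, `propensity`, `outcome_model`) and the toy data-generating process are illustrative assumptions, not the paper's notation; the paper treats the general multi-stage case.

```python
import numpy as np

def aipw_value(X, A, Y, regime, propensity, outcome_model):
    """Hypothetical single-stage AIPW estimator of the mean outcome
    if everyone followed `regime`. Doubly robust: consistent if either
    the propensity model or the outcome regression is correct."""
    d = regime(X)                        # treatment the regime would assign
    C = (A == d).astype(float)           # 1 if observed treatment agrees with regime
    # P(A = d(X) | X) implied by the propensity model for A = 1
    p = np.where(d == 1, propensity(X), 1.0 - propensity(X))
    m = outcome_model(X, d)              # predicted outcome under the regime's action
    # IPW term plus augmentation by the outcome regression
    return np.mean(C * Y / p - (C - p) / p * m)

# Toy randomized-trial data: X ~ Uniform(-1, 1), A ~ Bernoulli(0.5),
# Y = X * A + noise, so treating exactly when X > 0 is optimal,
# with true value E[max(X, 0)] = 0.25.
rng = np.random.default_rng(0)
n = 20000
X = rng.uniform(-1.0, 1.0, n)
A = rng.integers(0, 2, n)
Y = X * A + rng.normal(0.0, 0.1, n)

v_hat = aipw_value(
    X, A, Y,
    regime=lambda x: (x > 0).astype(int),
    propensity=lambda x: np.full_like(x, 0.5),   # known trial randomization
    outcome_model=lambda x, a: x * a,            # correctly specified here
)
```

With both working models correct, `v_hat` should be close to the true value 0.25; misspecifying one of the two models (but not both) leaves the estimator consistent, which is the double-robustness property the abstract refers to.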
Keywords: A-learning; Double robustness; Outcome regression; Propensity score; Q-learning
Year: 2013 PMID: 24302771 PMCID: PMC3843953 DOI: 10.1093/biomet/ast014
Source DB: PubMed Journal: Biometrika ISSN: 0006-3444 Impact factor: 2.445