Performance Guarantees for Policy Learning.

Abstract

This article gives performance guarantees for the regret decay in optimal policy estimation. We give a margin-free result showing that the regret decay for estimating a within-class optimal policy is second-order for empirical risk minimizers over Donsker classes when the data are generated from a fixed data distribution that does not change with sample size, with regret decaying at a faster rate than the standard error of an efficient estimator of the value of an optimal policy. We also present a result giving guarantees on the regret decay of policy estimators for the case that the policy falls within a restricted class and the data are generated from local perturbations of a fixed distribution, where this guarantee is uniform in the direction of the local perturbation. Finally, we give a result from the classification literature that shows that faster regret decay is possible via plug-in estimation provided a margin condition holds. Three examples are considered. In these examples, the regret is expressed in terms of either the mean value or the median value, and the number of possible actions is either two or finitely many.

Entities: Chemical

Keywords: individualized treatment rules; personalized medicine; policy learning; precision medicine

Year: 2020 PMID: 35321441 PMCID: PMC8939837 DOI: 10.1214/19-aihp1034

Source DB: PubMed Journal: Ann I H P Probab Stat ISSN: 0246-0203 Impact factor: 1.851

13 in total

1. Targeted maximum likelihood estimation of natural direct effects.

2. A doubly robust censoring unbiased transformation.

3. Comment.

4. Targeted Sequential Design for Targeted Learning Inference of the Optimal Treatment Rule and Its Mean Reward.

5. Interactive Q-learning for Quantiles.

6. Targeted Learning of the Mean Outcome under an Optimal Dynamic Treatment Rule.

7. Super-Learning of an Optimal Dynamic Treatment Rule.

8. Estimating Individualized Treatment Rules Using Outcome Weighted Learning.

9. Estimating Optimal Treatment Regimes from a Classification Perspective.

10. Statistical issues and limitations in personalized medicine research with clinical trials.

11. Rejoinder: Optimal individualized decision rules using instrumental variable methods.