
Learning Modular Safe Policies in the Bandit Setting with Application to Adaptive Clinical Trials

2019-03-04
Hossein Aboutalebi, Doina Precup, Tibor Schuster

Abstract

The stochastic multi-armed bandit problem is a well-known model for studying the exploration-exploitation trade-off. It has significant potential applications in adaptive clinical trials, which allow the treatment allocation probabilities of patients to change dynamically. However, most bandit learning algorithms are designed with the goal of minimizing the expected regret. While this approach is useful in many areas, in clinical trials it can be sensitive to outlier data, especially when the sample size is small. In this paper, we define and study a new robustness criterion for bandit problems. Specifically, we consider optimizing a function of the distribution of returns as a regret measure. This gives practitioners more flexibility to define an appropriate regret measure. The learning algorithm we propose to solve this type of problem is a modification of the BESA algorithm [Baransi et al., 2014], which considers a more general version of regret. We present a regret bound for our approach and evaluate it empirically both on synthetic problems and on a dataset from the clinical trial literature. Our approach compares favorably to a suite of standard bandit algorithms.
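
For readers unfamiliar with BESA, the following is a minimal two-armed sketch of the subsampling idea from Baransi et al. (2014), not the paper's modified algorithm. The names `besa_choose`, `run_besa`, `pull`, and `score` are hypothetical. The `score` argument stands in for the "function of the distribution of returns" mentioned in the abstract; the abstract does not specify that function, so the `mean_minus_variance` example here is purely an illustrative assumption.

```python
import random
from statistics import fmean, pvariance


def mean_minus_variance(rewards, penalty=0.5):
    # Illustrative risk-aware score only; NOT the measure used in the paper.
    return fmean(rewards) - penalty * pvariance(rewards)


def besa_choose(rewards_a, rewards_b, score=fmean, rng=random):
    """Return 0 to pull arm a next, 1 to pull arm b."""
    n = min(len(rewards_a), len(rewards_b))
    # Subsample both histories (without replacement) to the shorter length,
    # so the two arms are compared on equally sized samples.
    sub_a = rng.sample(rewards_a, n)
    sub_b = rng.sample(rewards_b, n)
    score_a, score_b = score(sub_a), score(sub_b)
    if score_a != score_b:
        return 0 if score_a > score_b else 1
    # On a tie, favor the arm that has been played less often.
    return 0 if len(rewards_a) <= len(rewards_b) else 1


def run_besa(pull, horizon, score=fmean, seed=0):
    """Play `horizon` rounds; `pull(arm, rng)` is a hypothetical reward oracle."""
    rng = random.Random(seed)
    # Pull each arm once so that both histories are non-empty.
    history = [[pull(0, rng)], [pull(1, rng)]]
    for _ in range(horizon - 2):
        arm = besa_choose(history[0], history[1], score, rng)
        history[arm].append(pull(arm, rng))
    return history
```

Passing `score=mean_minus_variance` to `run_besa` swaps the expected-reward comparison for a variance-penalized one, which is one simple way to see how the comparison criterion can be made a design choice rather than fixed to the mean.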

URL

http://arxiv.org/abs/1903.01026

PDF

http://arxiv.org/pdf/1903.01026

