Linear Thompson Sampling Under Unknown Linear Constraints
Ahmadreza Moradipari, Mahnoosh Alizadeh, Christos Thrampoulidis
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 14:18
We study how adding unknown linear safety constraints affects the performance of Thompson Sampling in the linear stochastic bandit problem. The additional constraints must be met at each round in spite of uncertainty about the environment requiring that the learner acts conservatively in choosing her actions. In this setting, we propose Safe-LTS, the first safe Thompson Sampling based algorithm, and we prove that it achieves no-regret learning. We obtain regrets that have the same dependence on the total number of rounds (modulo logarithmic factors) as Safe-UCB, a recently proposed safe algorithm that uses the upper confidence bound principle. Finally, we provide numerical simulations that demonstrate the efficacy of our algorithm.