Generalized Linear Bandits With Safety Constraints
Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis
-
SPS
IEEE Members: $11.00
Non-members: $15.00Length: 17:40
The classical multi-armed bandit is a class of sequential decision making problems where selecting actions incurs costs that are sampled independently from an unknown underlying distribution. Bandit algorithms have many applications in safety critical systems, where several constraints must be respected during the run of the algorithm in spite of uncertainty about problem parameters. This paper formulates a generalized linear stochastic multi-armed bandit problem with generalized linear safety constraints that depend on an unknown parameter vector. In this setting, we propose a Safe UCB-GLM algorithm for which we provide general and problem-dependent regret bounds.