Correlated Multi-Armed Bandits With A Latent Random Source
Samarth Gupta, Gauri Joshi, Osman Yagan
Multi-armed bandit models are widely studied sequential decision-making problems that exemplify the exploration-exploitation trade-off. We study a novel correlated multi-armed bandit model where the rewards obtained from the arms are functions of a common latent random variable. We propose and analyze the performance of the C-UCB algorithm, which leverages the correlations between arms to reduce the cumulative regret (i.e., to increase the total reward obtained after T rounds). Unlike the standard UCB algorithm, which pulls every sub-optimal arm O(log T) times, the C-UCB algorithm requires only O(1) pulls to identify that some arms, which we refer to as non-competitive arms, are sub-optimal. Thus, we effectively reduce a K-armed bandit problem to a (C+1)-armed bandit problem, where C is the number of competitive arms.
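As a rough illustration of the setup, the sketch below implements the standard UCB1 baseline that the abstract contrasts against, with every arm's reward generated as a noisy function of a shared latent sample. The reward functions, latent distribution, noise level, and horizon are hypothetical placeholders, and the competitive-set filtering that distinguishes C-UCB from plain UCB is not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reward functions of the shared latent variable X (illustrative only).
ARM_FUNCS = [lambda x: x, lambda x: 1.0 - x, lambda x: 0.5 * x + 0.2]

def pull(arm, x):
    # Each arm's reward is a noisy function of the same latent sample x.
    return ARM_FUNCS[arm](x) + rng.normal(0.0, 0.05)

def ucb1(T=5000, K=3):
    # Standard UCB baseline from the abstract: every sub-optimal arm is pulled O(log T) times.
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(T):
        x = rng.uniform()  # shared latent random source, here X ~ U[0, 1]
        if t < K:
            arm = t  # pull each arm once to initialise its estimate
        else:
            means = sums / counts
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(means + bonus))  # optimism in the face of uncertainty
        reward = pull(arm, x)
        counts[arm] += 1
        sums[arm] += reward
    return counts  # number of pulls per arm after T rounds

if __name__ == "__main__":
    print(ucb1())

In this setting, C-UCB would additionally exploit the known functional relationship between arms and the latent source to rule out non-competitive arms after O(1) pulls; the sketch above makes no attempt at that step.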