CASCADING BANDIT UNDER DIFFERENTIAL PRIVACY

Kun Wang, Shuai Li, Jing Dong, Baoxiang Wang

DOI

SPS

Members: Free
IEEE Members: $11.00
Non-members: $15.00

Length: 00:09:15

13 May 2022

This paper studies \emph{differential privacy (DP)} and \emph{local differential privacy (LDP)} in cascading bandits. Under DP, we propose a UCB-based algorithm which guarantees $\epsilon$-indistinguishability and a regret of $\mathcal{O}((\frac{\log T}{\epsilon})^{1+\xi})$ for an arbitrarily small $\xi$. This result significantly improves $O(\frac{\log^3 T}{\epsilon})$ in the previous work. Under ($\epsilon$,$\delta$)-LDP, we relax the $K^2$ dependence through the tradeoff between privacy budget $\epsilon$ and error probability $\delta$, and obtain a regret of $\mathcal{O}(\frac{K\log (1/\delta) \log T}{\epsilon^2})$, where $K$ is the size of the arm subset. This result holds for both Gaussian mechanism and Laplace mechanism by analyses on the composition. Extensive experiments corroborate our theoretic findings.

Tags:

online learning

multi-armed bandit

differential privacy

cascading bandit