IEEE Members: $11.00
Non-members: $15.00Length: 58:52
Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning practices. However, theoretical understandings on its non-asymptotic sample complexity remain unsatisfactory, despite significant recent efforts. In this talk, we first show a tight sample complexity bound of Q-learning in the single-agent setting, together with a matching lower bound to establish its minimax sub-optimality. We then show how federated versions of Q-learning allow collaborative learning using data collected by multiple agents without central sharing, where an importance averaging scheme is introduced to unveil the blessing of heterogeneity.