On Information Asymmetry In Online Reinforcement Learning
Ezra Tampubolon, Haris Ceribasic, Holger Boche
SPS
IEEE Members: $11.00
Non-members: $15.00
Length: 00:14:00
In this work, we study a system of two interacting non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur in an environment of independent learners. Furthermore, we discuss the resulting post-learning policies, show that they are almost optimal in the sense of the underlying game, and provide numerical indications that the resulting policies are almost welfare-optimal.
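To make the setting in the abstract concrete, the following is a minimal sketch of two Q-learning agents with asymmetric information, not the authors' implementation. It assumes a stateless repeated two-player matrix game, epsilon-greedy exploration, and a constant learning rate; the function name `run_asymmetric_q_learning`, the payoff matrices, and all parameter values are hypothetical choices for illustration. The uninformed agent keeps one value per own action, while the informed agent observes the other's action first and keeps one value per (observed action, own action) pair.

```python
import numpy as np


def run_asymmetric_q_learning(payoff_a, payoff_b, episodes=5000,
                              alpha=0.1, epsilon=0.1, seed=0):
    """Illustrative sketch (assumed setup): two Q-learners in a repeated matrix game.

    payoff_a[a, b] and payoff_b[a, b] are the rewards of agents A and B
    when A plays action a and B plays action b.
    """
    rng = np.random.default_rng(seed)
    n_a, n_b = payoff_a.shape

    # Uninformed agent A: one Q-value per own action.
    q_a = np.zeros(n_a)
    # Informed agent B: Q-values indexed by (observed action of A, own action).
    q_b = np.zeros((n_a, n_b))

    for _ in range(episodes):
        # Agent A chooses epsilon-greedily without seeing B's action.
        if rng.random() < epsilon:
            a = int(rng.integers(n_a))
        else:
            a = int(np.argmax(q_a))

        # Agent B observes a, then chooses epsilon-greedily from the row q_b[a].
        if rng.random() < epsilon:
            b = int(rng.integers(n_b))
        else:
            b = int(np.argmax(q_b[a]))

        # Stateless Q-learning updates: move each estimate toward the reward.
        q_a[a] += alpha * (payoff_a[a, b] - q_a[a])
        q_b[a, b] += alpha * (payoff_b[a, b] - q_b[a, b])

    return q_a, q_b


if __name__ == "__main__":
    # Hypothetical coordination game, chosen only for illustration.
    payoff = np.array([[1.0, 0.0], [0.0, 2.0]])
    q_a, q_b = run_asymmetric_q_learning(payoff, payoff)
    print("Agent A values:", q_a)
    print("Agent B values (rows: observed action of A):")
    print(q_b)
```

Conditioning the informed agent's table on the observed action is one simple way to encode the privilege described above; the environment and learning rules studied in the talk may differ.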
Chairs:
Mingyi Hong