NETWORKED POLICY GRADIENT PLAY IN MARKOV POTENTIAL GAMES
Sarper Aydin (Texas A&M University); Ceyhun A. Eksin (Texas A&M University)
We propose a networked policy gradient play algorithm for solving Markov potential games. In a Markov game, each agent has a reward function that depends on the actions of all the agents and a common dynamic state. A differentiable Markov potential game admits a potential value function whose local gradients equal the gradients of the agents' local value functions. In the proposed algorithm, agents use parameterized policies that depend on the state and on the other agents' policies. Each agent updates its policy using stochastic gradients and the local parameter values received from its neighbors over a communication network. We show that the joint policy parameters converge in expectation to a first-order stationary point of the Markov potential game for general state and action spaces. Numerical results on the lake game demonstrate the convergence of the proposed method.
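To make the update concrete, here is a minimal Python sketch of the networked gradient-play pattern the abstract describes: each agent keeps estimates of every agent's policy parameter, averages those estimates with its network neighbors (consensus), and adds a stochastic gradient of its own reward to its own parameter. Everything below is a hypothetical stand-in, not the paper's actual model: a ring communication graph, scalar Gaussian policies, toy linear state dynamics, a quadratic reward that happens to admit an exact potential, and a one-step REINFORCE estimate in place of the paper's value-gradient estimator.

```python
import numpy as np

# Hypothetical toy setup (not the paper's lake game): N agents with
# scalar Gaussian policies a_i ~ N(theta_i * s, sigma^2) on a ring graph.
N, sigma, step = 4, 0.5, 0.01
rng = np.random.default_rng(0)

# Ring graph: each agent averages over itself and its two neighbors,
# giving a doubly stochastic consensus step with uniform 1/3 weights.
neighbors = [[(i - 1) % N, i, (i + 1) % N] for i in range(N)]

# Row j of estimates[i] is agent i's local estimate of agent j's
# parameter; agents exchange these rows with neighbors, as in
# networked play. Initialized at random.
estimates = rng.normal(size=(N, N))

def reward(i, s, a):
    # Toy quadratic reward: track the state, stay close to others.
    # This admits an exact potential Phi = -sum_i (a_i - s)^2
    # - 0.1 * sum_{i<j} (a_i - a_j)^2, so it is a potential game.
    return -(a[i] - s) ** 2 - 0.1 * np.sum((a[i] - a) ** 2)

def step_state(s, a):
    # Toy linear state dynamics driven by the mean action.
    return 0.5 * s + 0.1 * np.mean(a) + 0.01 * rng.normal()

s = rng.normal()
for t in range(5000):
    # Each agent acts using its own entry of its local estimate matrix.
    a = np.array([estimates[i, i] * s + sigma * rng.normal()
                  for i in range(N)])
    new_estimates = np.empty_like(estimates)
    for i in range(N):
        # (1) Consensus: average neighbors' parameter estimates.
        new_estimates[i] = estimates[neighbors[i]].mean(axis=0)
        # (2) Stochastic gradient on the agent's OWN parameter:
        # one-step REINFORCE, r * grad log pi(a_i | s), where for a
        # Gaussian policy grad_theta log pi = (a_i - theta_i s) s / sigma^2.
        score = (a[i] - estimates[i, i] * s) * s / sigma ** 2
        new_estimates[i, i] += step * reward(i, s, a) * score
    estimates = new_estimates
    s = step_state(s, a)

# Diagonal entries are each agent's own learned parameter; off-diagonal
# entries are its consensus-based estimates of the others.
print([round(float(estimates[i, i]), 3) for i in range(N)])
```

In this sketch the consensus averaging plays the role of the neighbor-to-neighbor parameter exchange in the abstract, while the score-function term is a crude stand-in for the stochastic value gradient; under the toy potential reward above, the parameters settle near a first-order stationary point rather than anywhere guaranteed to be a global optimum.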