  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:14:47
08 Jun 2021

This paper considers the policy evaluation problem in reinforcement learning with agents connected over a decentralized, directed network. The focus is on decentralized temporal-difference (TD) learning with linear function approximation in the presence of unreliable or even malicious agents, termed Byzantine agents. To evaluate the quality of a fixed policy in a common environment, the agents collaboratively run decentralized TD($\lambda$). However, when some Byzantine agents behave adversarially, decentralized TD($\lambda$) is unable to learn an accurate linear approximation of the true value function. We propose a trimmed-mean based decentralized TD($\lambda$) algorithm to perform policy evaluation in this setting. We establish the finite-time convergence rate, as well as the asymptotic learning error that depends on the number of Byzantine agents. Numerical experiments corroborate the robustness of the proposed algorithm.
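The page does not include the authors' code, so the snippet below is only a rough, hypothetical sketch of the kind of update the abstract describes: a coordinate-wise trimmed mean over neighbors' parameter estimates (dropping the b largest and b smallest values per coordinate) combined with a standard TD($\lambda$) step using linear features and an eligibility trace. The function names, hyperparameters (b, gamma, lam, alpha), and the ordering of aggregation and local update are all assumptions, not the paper's specification.

```python
import numpy as np

def trimmed_mean(param_list, b):
    """Coordinate-wise trimmed mean: drop the b largest and b smallest
    values in each coordinate, then average (robust aggregation sketch)."""
    stacked = np.sort(np.stack(param_list, axis=0), axis=0)
    return stacked[b:stacked.shape[0] - b].mean(axis=0)

def local_td_lambda_step(theta, z, phi_s, phi_s_next, reward, gamma, lam, alpha):
    """One local TD(lambda) update with linear features phi and eligibility trace z."""
    delta = reward + gamma * phi_s_next @ theta - phi_s @ theta  # TD error
    z = gamma * lam * z + phi_s                                  # trace update
    return theta + alpha * delta * z, z

def agent_update(theta, z, neighbor_thetas, sample,
                 gamma=0.95, lam=0.7, alpha=0.05, b=1):
    """Hypothetical per-agent step: robustly aggregate in-neighbors' estimates
    (including the agent's own), then apply a local TD(lambda) update."""
    phi_s, reward, phi_s_next = sample
    theta_agg = trimmed_mean(neighbor_thetas + [theta], b)
    return local_td_lambda_step(theta_agg, z, phi_s, phi_s_next, reward,
                                gamma, lam, alpha)
```

In this sketch, the trimming parameter b would be chosen at least as large as the presumed number of Byzantine in-neighbors, so that adversarial parameter values are discarded before they can bias the local TD($\lambda$) update.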

Chairs:
Marcelo Bruno

