PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Perttu Hämäläinen, Amin Babadi, Xiaoxiao Ma, Jaakko Lehtinen

Length: 14:09

21 Sep 2020

Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, we observe that in a continuous action space, PPO can prematurely shrink the exploration variance, which leads to slow progress and may make the algorithm prone to getting stuck in local optima. Drawing inspiration from CMA-ES, a black-box evolutionary optimization method designed for robustness in similar situations, we propose PPO-CMA, a proximal policy optimization approach that adaptively expands the exploration variance to speed up progress. With only minor changes to PPO, our algorithm considerably improves performance in Roboschool continuous control benchmarks. Our results also show that PPO-CMA, as opposed to PPO, is significantly less sensitive to the choice of hyperparameters, allowing one to use it in complex movement optimization tasks without requiring tedious tuning.
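The abstract does not spell out the update rule, so the following is only a toy one-dimensional sketch and not the authors' implementation. It assumes a quadratic reward, an advantage defined as reward minus the batch average, and an advantage-weighted Gaussian fit over the positive-advantage samples; the function and variable names are hypothetical. It contrasts measuring the new variance around the old mean, which is the ordering the CMA-ES rank-mu update uses, with measuring it around the freshly updated mean.

```python
import random


def weighted_updates(mean, std, target, n_samples=200, seed=0):
    """Toy 1-D comparison of two variance updates from the same batch.

    Hypothetical setup (not from the source): reward is -(a - target)**2,
    advantage is reward minus the batch-average reward, and only
    positive-advantage samples are kept, weighted by their advantage.
    """
    rng = random.Random(seed)
    actions = [rng.gauss(mean, std) for _ in range(n_samples)]
    rewards = [-(a - target) ** 2 for a in actions]
    baseline = sum(rewards) / len(rewards)

    # Keep only the samples that did better than the batch average.
    kept = [(a, r - baseline) for a, r in zip(actions, rewards) if r > baseline]
    total_w = sum(w for _, w in kept)

    # Advantage-weighted mean of the kept actions.
    new_mean = sum(w * a for a, w in kept) / total_w

    # Variance measured around the OLD mean (CMA-ES-style ordering).
    var_old_mean = sum(w * (a - mean) ** 2 for a, w in kept) / total_w

    # Variance measured around the NEW mean (plain weighted fit).
    var_new_mean = sum(w * (a - new_mean) ** 2 for a, w in kept) / total_w

    return new_mean, var_old_mean, var_new_mean


new_mean, var_old, var_new = weighted_updates(mean=0.0, std=1.0, target=5.0)
print(f"new mean: {new_mean:.3f}")
print(f"variance around old mean: {var_old:.3f}")
print(f"variance around new mean: {var_new:.3f}")
```

In this sketch the two variance estimates differ by exactly the squared shift of the mean, so the old-mean estimate is never smaller. That extra term is what stretches the sampling distribution along the direction of progress when the optimum is still far away.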

Tags: SPS conference, MLSP 2020, virtual workshop, MLSP 2020 workshop, September 2020
