MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling

Julius Ott (Infineon Technologies AG / Technical University of Munich); Lorenzo Servadei (Infineon Technologies AG); Jose Arjona-Medina (Johannes Kepler University Linz); Enrico Rinaldi (University of Michigan); Gianfranco Mauro (Infineon Technologies AG); Daniela Sanchez Lopera (Infineon Technologies AG / Technical University of Munich); Michael Stephan (Infineon Technologies AG); Thomas Stadelmayer (Infineon Technologies AG); Avik Santra (Infineon Technologies AG); Robert Wille (Technical University of Munich)

06 Jun 2023

Data selection is essential for any data-based optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-value estimation and therefore cannot adapt the sampling strategy, including the exploration and exploitation of transitions, to the complexity of the task. To address this limitation, this paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. It is enabled by an uncertainty estimate of the Q-value function, which guides the sampling to explore the more significant transitions and thus learn a more efficient policy. Experiments on classical control tasks demonstrate stable results across a variety of environments and show that the proposed method outperforms state-of-the-art sampling strategies on dense-reward tasks, improving convergence and peak performance by 26% on average.
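
The abstract does not include the algorithm itself, but the core idea it describes, weighting replay-buffer sampling by both the estimated Q-value (exploitation) and the uncertainty of that estimate (exploration), can be sketched roughly as follows. This is a minimal illustration, not the authors' MEET method: the class name UncertaintyReplayBuffer, the mixing weight beta, and the assumption that per-transition uncertainty comes from a Monte Carlo ensemble of Q-value estimates (e.g., bootstrapped Q-heads or dropout passes) are choices made here for illustration only.

    # Hypothetical sketch of uncertainty-guided replay-buffer sampling.
    # Assumes each transition carries several Q-value samples (an ensemble);
    # none of these names or design choices come from the paper itself.

    import numpy as np

    class UncertaintyReplayBuffer:
        """Replay buffer whose sampling probabilities blend exploitation
        (high mean Q-value) with exploration (high Q-value variance across
        a Monte Carlo ensemble of Q estimates)."""

        def __init__(self, capacity, n_ensemble, beta=0.5, eps=1e-6):
            self.capacity = capacity
            self.n_ensemble = n_ensemble  # Q-value samples per transition
            self.beta = beta              # exploration weight in [0, 1]
            self.eps = eps                # keeps every transition sampleable
            self.transitions = []         # (s, a, r, s_next, done) tuples
            self.q_samples = []           # arrays of shape (n_ensemble,)
            self.pos = 0                  # overwrite cursor once full

        def add(self, transition, q_estimates):
            """q_estimates: n_ensemble Q-value samples for this transition,
            e.g. from bootstrapped heads or MC-dropout (assumed here)."""
            q_estimates = np.asarray(q_estimates, dtype=np.float64)
            if len(self.transitions) < self.capacity:
                self.transitions.append(transition)
                self.q_samples.append(q_estimates)
            else:
                self.transitions[self.pos] = transition
                self.q_samples[self.pos] = q_estimates
                self.pos = (self.pos + 1) % self.capacity

        def sample(self, batch_size, rng=None):
            rng = rng or np.random.default_rng()
            q = np.stack(self.q_samples)   # (N, n_ensemble)
            mean_q = q.mean(axis=1)        # exploitation signal
            std_q = q.std(axis=1)          # exploration signal (uncertainty)

            # Rescale both signals to [0, 1] so they are comparable.
            def normalize(x):
                span = x.max() - x.min()
                return (x - x.min()) / span if span > 0 else np.ones_like(x)

            score = (1 - self.beta) * normalize(mean_q) + self.beta * normalize(std_q)
            probs = (score + self.eps) / (score + self.eps).sum()
            idx = rng.choice(len(self.transitions), size=batch_size, p=probs)
            return [self.transitions[i] for i in idx], idx

With beta near 1 the sampler favors transitions whose value is still uncertain; near 0 it favors transitions the current Q-function already rates highly. Annealing beta over training would mimic a gradual shift from exploration to exploitation, which is one plausible reading of the trade-off the abstract describes.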
