09 Jul 2020

We propose a method for optimizing the random action selection of the ε-greedy policy to facilitate more efficient exploration of an environment by a reinforcement learning agent. Our directed ε-greedy policy selects random actions from a biased probability distribution, so that some actions are more likely to be chosen than others. The distribution used for selecting random actions is one that tends to favor actions that increase the agent's uncertainty about its environment. The agent's uncertainty is measured by the error in self-supervised prediction of future environment states at the pixel level, given the previous states and the probabilities of the next actions. By propagating the reverse gradient from the future-state predictor model to a model that generates probability distributions from random noise, we create an end-to-end trainable model that learns to generate such action probability distributions for ε-greedy, facilitating directed exploration of the environment. We evaluate our method in two environments: Minecraft and Super Mario Bros. The directed ε-greedy policy achieves efficient curiosity-driven exploration without the use of any intrinsic reward function, outperforming vanilla ε-greedy exploration, softmax exploration, and exploration using intrinsic rewards.
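
As a rough illustration of the selection rule described above, the sketch below shows what a directed ε-greedy step might look like: with probability ε the agent samples an action from a learned, noise-conditioned probability distribution rather than uniformly at random; otherwise it acts greedily on its Q-values. The generator here is only a placeholder (a fixed random linear map with a softmax), and the names DistributionGenerator, directed_epsilon_greedy, ε = 0.1, and the toy dimensions are illustrative assumptions, not the paper's implementation; in the actual method the generator would be trained end-to-end from the future-state prediction error.

```python
import numpy as np

rng = np.random.default_rng(0)

class DistributionGenerator:
    """Placeholder for the noise-to-action-distribution model.

    In the paper this model is trained end-to-end by backpropagating the
    future-state prediction error; here it is just a fixed random linear
    map followed by a softmax, for illustration only.
    """
    def __init__(self, noise_dim, n_actions):
        self.W = rng.normal(size=(noise_dim, n_actions))
        self.noise_dim = noise_dim

    def __call__(self):
        z = rng.normal(size=self.noise_dim)       # random noise input
        logits = z @ self.W                       # map noise to action logits
        exp = np.exp(logits - logits.max())       # numerically stable softmax
        return exp / exp.sum()                    # biased action probabilities

def directed_epsilon_greedy(q_values, generator, epsilon=0.1):
    """Directed ε-greedy: exploratory actions follow the learned
    distribution instead of being drawn uniformly at random."""
    if rng.random() < epsilon:
        probs = generator()                       # biased exploration step
        return int(rng.choice(len(q_values), p=probs))
    return int(np.argmax(q_values))               # greedy exploitation step

# Toy usage: 6 discrete actions (illustrative, not the paper's action set).
gen = DistributionGenerator(noise_dim=16, n_actions=6)
q = rng.normal(size=6)
action = directed_epsilon_greedy(q, gen, epsilon=0.1)
print("selected action:", action)
```

The only change relative to vanilla ε-greedy is the source of the exploratory action: a learned distribution replaces the uniform one, which is what makes the exploration "directed".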
