DIRECTED EXPLORATION VIA LEARNABLE PROBABILITY DISTRIBUTION FOR RANDOM ACTION SELECTION
Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
We propose a method for optimizing the random action selection of the ε-greedy policy to facilitate more efficient exploration of an environment by a reinforcement learning agent. Our directed ε-greedy policy selects random actions according to a biased probability distribution, so that some actions are more likely to be selected than others. The distribution is biased toward actions that increase the agent's uncertainty about its environment. This uncertainty is measured by the error of a self-supervised model that predicts future environment states at the pixel level, given the previous states and the probabilities of the next actions. By propagating the reversed gradient from the future-state predictor to a model that generates probability distributions from random noise, we obtain an end-to-end trainable model that learns to produce action probability distributions for ε-greedy, thereby enabling directed exploration of the environment. We evaluate our method in two environments: Minecraft and Super Mario Bros. The directed ε-greedy policy achieves efficient curiosity-driven exploration without any intrinsic reward function, outperforming vanilla ε-greedy exploration, softmax exploration, and exploration using intrinsic rewards.
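The sketch below illustrates the mechanism the abstract describes: a generator maps random noise to an action probability distribution, a forward model predicts the next observation from the current one and those probabilities, and a gradient-reversal step lets a single loss train the predictor to minimize pixel-level prediction error while pushing the generator toward distributions whose actions make that error large. All architectural details (layer sizes, noise dimension, optimizer, the flattened 84x84 observation, the helper names) are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal PyTorch sketch of directed epsilon-greedy exploration.
# Hypothetical architecture; only the overall training signal follows the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, negated gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

class DistributionGenerator(nn.Module):
    """Maps random noise to a probability distribution over actions."""
    def __init__(self, noise_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )
    def forward(self, z):
        return F.softmax(self.net(z), dim=-1)

class ForwardPredictor(nn.Module):
    """Predicts the next (flattened) observation from the current one
    and the action probabilities."""
    def __init__(self, obs_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + num_actions, 256), nn.ReLU(),
            nn.Linear(256, obs_dim),
        )
    def forward(self, obs, action_probs):
        return self.net(torch.cat([obs, action_probs], dim=-1))

noise_dim, num_actions, obs_dim = 16, 6, 84 * 84  # assumed sizes
generator = DistributionGenerator(noise_dim, num_actions)
predictor = ForwardPredictor(obs_dim, num_actions)
optimizer = torch.optim.Adam(
    list(generator.parameters()) + list(predictor.parameters()), lr=1e-4)

def train_step(obs, next_obs):
    """One joint update: the predictor minimizes pixel-level prediction error,
    while the reversed gradient pushes the generator toward distributions
    whose actions make that error large (i.e., raise the agent's uncertainty)."""
    z = torch.randn(obs.shape[0], noise_dim)
    action_probs = GradReverse.apply(generator(z))
    pred_next = predictor(obs, action_probs)
    loss = F.mse_loss(pred_next, next_obs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def select_action(q_values, epsilon=0.1):
    """Directed epsilon-greedy: random actions are sampled from the learned
    distribution rather than uniformly."""
    if torch.rand(1).item() < epsilon:
        with torch.no_grad():
            probs = generator(torch.randn(1, noise_dim))
        return torch.multinomial(probs, 1).item()
    return int(q_values.argmax())
```

In this sketch the exploration bias requires no intrinsic reward: the learned distribution itself encodes which actions the agent is currently poor at predicting, and it is consulted only on the random branch of ε-greedy, leaving the greedy branch unchanged.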