ENSEMBLE GRAPH Q-LEARNING FOR LARGE SCALE NETWORKS
Talha Bozkus (University of Southern California); Urbashi Mitra (USC)
The optimization of large-scale networks, such as finding optimal control strategies through cost minimization, is challenged by large state spaces. For networks that can be modeled via Markov Decision Processes (MDPs), a previously proposed graph reduction strategy is used in conjunction with a novel ensemble learning method based on the Q-learning algorithm for policy optimization in unknown environments. By exploiting the structural properties of the network, several structurally related Markov chains are created; these chains are sampled to learn multiple policies, which are then fused. The convergence of the learning approach is analyzed, and the ensemble learning strategy is shown to inherit the properties of classical Q-learning. Numerical results show that the proposed algorithm achieves a 60% reduction in policy error and an 80% reduction in runtime relative to other state-of-the-art Q-learning algorithms.
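To make the ensemble idea concrete, the Python sketch below is a minimal illustration, not the paper's implementation: it runs tabular Q-learning on a few synthetic, structurally related Markov chains and fuses the learned Q-tables by simple averaging. The graph-reduction step is replaced here by hand-made perturbations of a shared transition kernel, and all names (q_learning, ensemble_policy, make_step) and hyperparameters are illustrative assumptions.

```python
import numpy as np

def q_learning(env_step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, eps=0.1, horizon=100, rng=None):
    """Tabular Q-learning on one (reduced) Markov chain.

    env_step(s, a) -> (next_state, cost) samples a transition. Since the
    objective is cost minimization, Q estimates expected discounted cost
    and the greedy action is the argmin.
    """
    rng = rng or np.random.default_rng()
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = int(rng.integers(n_states))
        for _ in range(horizon):
            # epsilon-greedy exploration (greedy = lowest estimated cost)
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmin())
            s_next, cost = env_step(s, a)
            # standard Q-learning update, written for minimization
            Q[s, a] += alpha * (cost + gamma * Q[s_next].min() - Q[s, a])
            s = s_next
    return Q

def ensemble_policy(q_tables, weights=None):
    """Fuse per-chain Q-tables; a (weighted) average is assumed here."""
    q_fused = np.average(np.stack(q_tables), axis=0, weights=weights)
    return q_fused.argmin(axis=1), q_fused

# --- Toy demo: K synthetic chains standing in for graph-reduced chains ---
rng = np.random.default_rng(0)
n_s, n_a, K = 20, 4, 3
base_P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # shared transition kernel
costs = rng.random((n_s, n_a))                          # shared cost function

def make_step(perturbation):
    """Build a sampler for one structurally related chain (perturbed kernel)."""
    P = base_P + perturbation
    P /= P.sum(axis=2, keepdims=True)
    def step(s, a):
        return int(rng.choice(n_s, p=P[s, a])), costs[s, a]
    return step

q_tables = [q_learning(make_step(0.01 * rng.random((n_s, n_a, n_s))),
                       n_s, n_a, rng=rng) for _ in range(K)]
policy, q_fused = ensemble_policy(q_tables)
print("fused greedy policy:", policy)
```

Averaging is used above only as the simplest plausible fusion rule; the paper's actual chain construction and fusion procedures should be taken from the full text.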