UAV Local Path Planning Based on Improved Proximal Policy Optimization Algorithm
Jiahao Xu (Nanjing University of Aeronautics and Astronautics); Xuefeng Yan (Nanjing University of Aeronautics and Astronautics); Peng Cui (Dalian Naval Academy); Xinquan Wu (Nanjing University of Aeronautics and Astronautics); Lipeng Gu (Nanjing University of Aeronautics and Astronautics); Yanbiao Niu (Nanjing University of Aeronautics and Astronautics)
SPS
Recently, a growing number of researchers have applied deep reinforcement learning (DRL) to the UAV local path planning problem. However, existing DRL methods, such as proximal policy optimization (PPO), do not account for the importance of recent experience in path planning. Moreover, the Actor-Critic framework of PPO suffers from high variance. To address these issues, we propose a Delayed-policy-update PPO with Prioritized Replay of Recent experience (DPPO-PR2) for local path planning. First, we design an adaptive parameter to compute the resampling probability; by limiting the range of resampling, the probability of resampling recent experience is increased. Second, a parameter is chosen experimentally so that the Actor network is updated less frequently than the Critic network. Finally, we verify in six test scenarios that our algorithm achieves better convergence and faster execution.
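The two mechanisms in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the buffer class, the window parameter `rho`, and the update period are hypothetical names chosen for illustration. The sketch assumes the adaptive parameter restricts resampling to the most recent fraction of stored transitions, and that the delayed update simply skips actor updates on most steps while the critic updates every step.

```python
import random


class RecentPriorityBuffer:
    """Sketch of a replay buffer biased toward recent experience.

    The parameter `rho` (hypothetical; stands in for the paper's
    adaptive parameter) limits the range of resampling: only the most
    recent fraction `rho` of stored transitions is eligible, so a
    smaller `rho` concentrates sampling on newer experience.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []

    def add(self, transition):
        self.data.append(transition)
        if len(self.data) > self.capacity:
            self.data.pop(0)  # drop the oldest transition

    def sample(self, batch_size, rho):
        # Restrict resampling to the newest fraction `rho` of the buffer.
        start = int(len(self.data) * (1.0 - rho))
        window = self.data[start:] or self.data
        return [random.choice(window) for _ in range(batch_size)]


def train_step(step, actor_update_period=3):
    """Delayed policy update: the critic updates every step, while the
    actor updates only once every `actor_update_period` steps
    (the period would be picked experimentally, as in the paper)."""
    updates = ["critic"]
    if step % actor_update_period == 0:
        updates.append("actor")
    return updates
```

In a full PPO loop, `sample` would feed minibatches to the loss computation, and the actor's gradient step would run only on the iterations where `train_step` returns `"actor"`; updating the actor against a more settled critic is the same variance-reduction idea used by delayed policy updates in TD3.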