
Trust Region-Guided Proximal Policy Optimization

2019-01-29
Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan

Abstract

Model-free reinforcement learning relies heavily on a safe yet exploratory policy search. Proximal policy optimization (PPO) is a prominent algorithm for the safe-search problem, exploiting a heuristic clipping mechanism motivated by theoretically justified “trust region” guidance. However, we find that PPO's clipping mechanism can cause a lack of exploration. Based on this finding, we improve the original PPO with an adaptive clipping mechanism guided by a “trust region” criterion. Our method, termed Trust Region-Guided PPO (TRPPO), improves PPO with more exploration and better sample efficiency, while maintaining the safe-search property and design simplicity of PPO. On several benchmark tasks, TRPPO significantly outperforms the original PPO and is competitive with several state-of-the-art methods.
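To make the clipping mechanism concrete, here is a minimal sketch of the standard PPO clipped surrogate that the abstract refers to, plus an illustrative adaptive variant. The standard objective is the well-known min(r·A, clip(r, 1−ε, 1+ε)·A). The adaptive-range rule shown (widening the clipping range for low-probability actions) is only a hypothetical stand-in for the paper's trust-region-derived criterion, not the authors' actual formula; the function names and the `pi_old` parameter are assumptions for illustration.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A).

    ratio     -- probability ratio pi_new(a|s) / pi_old(a|s)
    advantage -- advantage estimate A(s, a)
    eps       -- fixed clipping range (0.2 is the common default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum makes the bound pessimistic: large policy
    # updates gain nothing beyond the clipped value.
    return np.minimum(unclipped, clipped)

def adaptive_clip_objective(ratio, advantage, pi_old, base_eps=0.2):
    """Illustrative adaptive clipping (NOT the paper's exact rule).

    Sketch of the idea that a trust-region criterion can yield a
    state/action-dependent clipping range: here we simply widen the
    range when the old policy assigns low probability to the action,
    allowing more exploration of rarely taken actions.
    """
    eps = base_eps * (1.0 + (1.0 - pi_old))  # hypothetical widening rule
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)
```

With a fixed ε = 0.2, a ratio of 1.5 and a positive advantage is clipped to 1.2·A, so the gradient through the ratio vanishes; the adaptive variant keeps a wider range for low-probability actions, which is the kind of extra exploration room the abstract describes.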

URL

http://arxiv.org/abs/1901.10314

PDF

http://arxiv.org/pdf/1901.10314
