Faster and More Accurate Trace-based Policy Evaluation via Overall Target Error Meta-Optimization

2019-05-25
Mingde Zhao, Ian Porada, Sitao Luan, Xiaowen Chang, Doina Precup

Abstract

To improve the speed and accuracy of the trace-based policy evaluation method TD(λ), we derive, under appropriate assumptions, an off-policy-compatible method for meta-learning state-based λ's online with efficient incremental updates. Furthermore, we prove that the derived bias-variance tradeoff minimization method, with slight adjustments, is equivalent to minimizing the overall target error with respect to the state-based λ's. In experiments, the method performs significantly better than the existing method and the baselines.
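For readers unfamiliar with state-based λ, the following is a minimal sketch of tabular, on-policy TD(λ) with accumulating traces in which each state has its own λ. It is not the paper's meta-learning method: the λ values here are fixed inputs rather than learned, there is no off-policy correction, and the function name `td_lambda_state_based`, the transition tuple layout, and the default step size are assumptions made for illustration.

```python
import numpy as np

def td_lambda_state_based(episodes, n_states, lam, alpha=0.1, gamma=1.0):
    """Tabular TD(lambda) policy evaluation with a per-state lambda.

    episodes: iterable of lists of (state, reward, next_state, done) tuples
              (hypothetical layout chosen for this sketch).
    lam:      array of shape (n_states,), one lambda per state. These are
              fixed here; the paper instead meta-learns them online.
    """
    v = np.zeros(n_states)
    for episode in episodes:
        z = np.zeros(n_states)  # eligibility trace, reset at episode start
        for s, r, s_next, done in episode:
            # Decay all traces by gamma * lambda(current state), then
            # increment the trace of the state just visited.
            z *= gamma * lam[s]
            z[s] += 1.0
            # One-step TD error; terminal transitions do not bootstrap.
            target = r if done else r + gamma * v[s_next]
            delta = target - v[s]
            # Credit every state in proportion to its trace.
            v += alpha * delta * z
    return v
```

Setting every entry of `lam` to 0 recovers TD(0), and setting every entry to 1 yields a Monte Carlo-like update in which earlier states receive full credit for later rewards. Intermediate per-state values trade off bias against variance state by state, which is the quantity the paper proposes to tune automatically.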

URL

http://arxiv.org/abs/1904.11439

PDF

http://arxiv.org/pdf/1904.11439

