Faster and More Accurate Trace-based Policy Evaluation via Overall Target Error Meta-Optimization

2019-05-25

Mingde Zhao, Ian Porada, Sitao Luan, Xiaowen Chang, Doina Precup

arXiv_AI

arXiv_AI Optimization

Abstract

To improve the speed and accuracy of the trace-based policy evaluation method TD(λ), we derive and propose, under appropriate assumptions, an off-policy-compatible method for meta-learning state-based λ’s online with efficient incremental updates. Furthermore, we prove that the derived bias-variance tradeoff minimization method, with slight adjustments, is equivalent to minimizing the overall target error with respect to the state-based λ’s. In experiments, the method performs significantly better than the existing method and the baselines.
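
For readers unfamiliar with state-based λ’s, the following is a minimal sketch of plain tabular, on-policy TD(λ) with accumulating traces in which the trace decay depends on the current state. It is not the meta-learning method of the paper: the λ values are fixed by hand, and the function names, the (state, reward, next_state) episode format, and the random-walk environment are assumptions made only for illustration.

```python
import random


def td_lambda_state_based(episodes, n_states, lam, alpha=0.1, gamma=1.0):
    """Tabular TD(lambda) with a fixed per-state lambda (illustrative only).

    episodes: iterable of episodes, each a list of (s, r, s_next) tuples,
              where s_next is None on the terminal transition.
    lam:      list of trace-decay values, one per state (hand-chosen here).
    """
    v = [0.0] * n_states
    for episode in episodes:
        z = [0.0] * n_states  # eligibility traces, reset at episode start
        for s, r, s_next in episode:
            # Decay every trace with the lambda of the current state,
            # then accumulate credit for the current state.
            decay = gamma * lam[s]
            z = [decay * zi for zi in z]
            z[s] += 1.0
            v_next = 0.0 if s_next is None else v[s_next]
            delta = r + gamma * v_next - v[s]  # one-step TD error
            for i in range(n_states):
                v[i] += alpha * delta * z[i]
    return v


def random_walk_episode(n_states, rng):
    """Hypothetical test environment: symmetric random walk on a chain.

    Reward 1 for exiting on the right, 0 for exiting on the left.
    """
    s = n_states // 2
    transitions = []
    while True:
        s_next = s + rng.choice((-1, 1))
        if s_next < 0:
            transitions.append((s, 0.0, None))
            return transitions
        if s_next >= n_states:
            transitions.append((s, 1.0, None))
            return transitions
        transitions.append((s, 0.0, s_next))
        s = s_next


if __name__ == "__main__":
    rng = random.Random(0)
    episodes = [random_walk_episode(5, rng) for _ in range(2000)]
    lam = [0.8, 0.5, 0.2, 0.5, 0.8]  # arbitrary per-state values
    print(td_lambda_state_based(episodes, 5, lam, alpha=0.05))
```

In this sketch the choice of λ per state is arbitrary; the paper's contribution, as the abstract states, is to adapt those values online rather than fix them in advance.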

URL

http://arxiv.org/abs/1904.11439

PDF

http://arxiv.org/pdf/1904.11439
