papers AI Learner

Reward Potentials for Planning with Learned Neural Network Transition Models

2019-04-19
Buser Say, Scott Sanner, Sylvie Thiébaux

Abstract

Optimal planning with respect to learned neural network (NN) models in continuous action and state spaces using mixed-integer linear programming (MILP) is a challenging task for branch-and-bound solvers due to the poor linear relaxation of the underlying MILP model. For a given set of features, potential heuristics provide an efficient framework for computing bounds on cost (reward) functions. In this paper, we introduce a finite-time algorithm for computing an optimal potential heuristic for learned NN models. We then strengthen the linear relaxation of the underlying MILP model by introducing constraints to bound the reward function based on the precomputed reward potentials. Experimentally, we show that our algorithm efficiently computes reward potentials for learned NN models, and the overhead of computing reward potentials is justified by the overall strengthening of the underlying MILP model for the task of planning over long-term horizons.
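The paper's own algorithm for computing optimal reward potentials is not reproduced here, but the general idea of precomputing valid bounds on a learned NN output and feeding them back as constraints can be illustrated with a simpler, generic technique: interval bound propagation. The sketch below (pure Python; the network weights and the input box are made up for illustration) propagates input intervals through a small ReLU network to obtain sound lower and upper bounds on its output, the kind of bounds one could add as linear constraints to tighten a MILP relaxation.

```python
# A minimal sketch, NOT the paper's reward-potential algorithm: interval
# bound propagation through a tiny ReLU network. The resulting [lo, hi]
# bounds on the output are valid over the whole input box and could be
# added as constraints to strengthen a MILP model of the network.

def interval_layer(lo, hi, W, b):
    """Propagate elementwise input intervals [lo, hi] through x -> Wx + b."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        # A positive weight attains its extreme at the same-side bound,
        # a negative weight at the opposite bound.
        l = bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row))
        h = bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row))
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def relu_interval(lo, hi):
    """ReLU is monotone, so it maps interval endpoints to interval endpoints."""
    return [max(0.0, l) for l in lo], [max(0.0, h) for h in hi]

# Illustrative 2-2-1 network (hypothetical weights, not a learned model).
W1, b1 = [[1.0, -2.0], [0.5, 1.0]], [0.0, -1.0]
W2, b2 = [[1.0, 1.0]], [0.5]

# State variables assumed to lie in the box [0, 1]^2.
lo, hi = [0.0, 0.0], [1.0, 1.0]
lo, hi = interval_layer(lo, hi, W1, b1)
lo, hi = relu_interval(lo, hi)
lo, hi = interval_layer(lo, hi, W2, b2)
print(lo, hi)  # prints [0.5] [2.0]: sound bounds on the output over the box
```

These interval bounds are generally looser than the optimal potentials the paper computes, but they show the mechanism: any precomputed sound bound on the learned function can be injected into the MILP as a cut, reducing the gap of the linear relaxation that branch-and-bound relies on.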

URL

http://arxiv.org/abs/1904.09366

PDF

http://arxiv.org/pdf/1904.09366
