Abstract
We investigate using a Double Deep Q-learning reinforcement learning agent to find optimal policies in environments with continuous state spaces and discrete actions, using the LunarLander-v2 OpenAI Gym environment as the testbed.
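
The abstract names the method but does not spell out its update rule. As a minimal sketch, assuming the standard Double DQN formulation (the online network selects the greedy next action, the target network evaluates it), the bootstrap target for one transition might be computed as below. The function name, argument names, and numeric values are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Return the Double DQN bootstrap target for a single transition.

    q_online_next and q_target_next hold the Q-values of every discrete
    action in the next state, as estimated by the online and target
    networks respectively (hypothetical inputs for this sketch).
    """
    if done:
        # Terminal transitions do not bootstrap from the next state.
        return reward
    # The online network selects the greedy next action...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...and the target network evaluates that selected action.
    return reward + gamma * q_target_next[best_action]


# Made-up Q-values for a 4-action problem (LunarLander-v2 has 4 discrete actions).
target = double_dqn_target(
    reward=1.0,
    done=False,
    q_online_next=[0.2, 0.9, 0.1, 0.4],
    q_target_next=[0.5, 0.3, 0.8, 0.6],
    gamma=0.5,
)
print(target)
```

Here the online estimates pick action 1, so the target uses the target network's value for action 1 rather than its own maximum. Decoupling selection from evaluation in this way is what distinguishes Double DQN from plain DQN, where a single network does both.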
URL
http://arxiv.org/abs/1708.02378