Abstract
We explore the challenge of action prediction from textual descriptions of scenes, a testbed to approximate whether text inference can be used to predict upcoming actions. As a case of study, we consider the world of the Harry Potter fantasy novels and inferring what spell will be cast next given a fragment of a story. Spells act as keywords that abstract actions (e.g. ‘Alohomora’ to open a door) and denote a response to the environment. This idea is used to automatically build HPAC, a corpus containing 82,836 samples and 85 actions. We then evaluate different baselines. Among the tested models, an LSTM-based approach obtains the best performance for frequent actions and large scene descriptions, but approaches such as logistic regression behave well on infrequent actions.
Abstract (translated by Google)
URL
http://arxiv.org/abs/1905.11037