
Visual Entailment Task for Visually-Grounded Language Learning

2019-01-21
Ning Xie, Farley Lai, Derek Doran, Asim Kadav

Abstract

We introduce a new inference task, Visual Entailment (VE), which differs from traditional Textual Entailment (TE) tasks in that the premise is defined by an image rather than by a natural language sentence. We propose a novel dataset for VE tasks, SNLI-VE (publicly available at this https URL), built from the Stanford Natural Language Inference corpus and Flickr30k. We also introduce a differentiable architecture, the Explainable Visual Entailment model (EVE), to tackle the VE problem. EVE and several other state-of-the-art visual question answering (VQA) based models are evaluated on the SNLI-VE dataset, facilitating grounded language understanding and providing insights into how modern VQA based models perform.
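
To make the task format concrete, the following is a minimal sketch of how a single VE example and a simple accuracy metric might be represented. It assumes the three-way label set of the SNLI corpus (entailment, neutral, contradiction). The names `Label`, `VEExample`, `image_id`, and `accuracy` are hypothetical and chosen for illustration; they are not the schema of the released SNLI-VE files or part of the EVE model.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """Three-way label set, assumed to follow the SNLI corpus."""
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass(frozen=True)
class VEExample:
    """One VE instance: an image premise paired with a text hypothesis."""
    image_id: str    # hypothetical identifier of the premise image
    hypothesis: str  # natural language hypothesis about the image
    label: Label     # gold relation between image and hypothesis


def accuracy(predictions, examples):
    """Fraction of examples whose predicted label matches the gold label."""
    if len(predictions) != len(examples):
        raise ValueError("predictions and examples must have the same length")
    if not examples:
        return 0.0
    correct = sum(pred is ex.label for pred, ex in zip(predictions, examples))
    return correct / len(examples)
```

In this sketch the image replaces the premise sentence of a TE example, so a model would receive the image identified by `image_id` together with `hypothesis` and predict one `Label`.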

URL

https://arxiv.org/abs/1811.10582

PDF

https://arxiv.org/pdf/1811.10582

