A large annotated corpus for learning natural language inference

2015-08-21
Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning

Abstract

Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.

URL

https://arxiv.org/abs/1508.05326

PDF

https://arxiv.org/pdf/1508.05326
