A Short Survey on Sense-Annotated Corpora for Diverse Languages and Resources

Abstract

Large sense-annotated datasets are increasingly necessary for training deep supervised systems in word sense disambiguation. However, gathering high-quality sense-annotated data for as many instances as possible is a laborious and expensive task. This has led to the proliferation of automatic and semi-automatic methods for overcoming the so-called knowledge-acquisition bottleneck. In this short survey, we present an overview of currently available sense-annotated corpora, both manually and (semi-)automatically constructed, for diverse languages and lexical resources (i.e., WordNet, Wikipedia, BabelNet). General statistics and specific features of each sense-annotated dataset are also provided.

URL

http://arxiv.org/abs/1802.04744

PDF

http://arxiv.org/pdf/1802.04744
