Abstract
Large sense-annotated datasets are increasingly necessary for training deep supervised systems in word sense disambiguation. However, gathering high-quality sense-annotated data for as many instances as possible is a laborious and expensive task. This has led to the proliferation of automatic and semi-automatic methods for overcoming the so-called knowledge-acquisition bottleneck. In this short survey we present an overview of currently available sense-annotated corpora, both manually and (semi)automatically constructed, for diverse languages and lexical resources (i.e. WordNet, Wikipedia, BabelNet). General statistics and specific features of each sense-annotated dataset are also provided.
URL
http://arxiv.org/abs/1802.04744