ETNLP: A Toolkit for Extraction, Evaluation and Visualization of Pre-trained Word Embeddings

Abstract

In this paper, we introduce a comprehensive toolkit, ETNLP, which can evaluate, extract, and visualize multiple sets of pre-trained word embeddings. First, for evaluation, ETNLP analyses the quality of pre-trained embeddings based on an input word analogy list. Second, for extraction, ETNLP provides a subset of the embeddings to be used in downstream NLP tasks. Finally, ETNLP has a visualization module for exploring the embedded words interactively. We demonstrate the effectiveness of ETNLP on our pre-trained word embeddings in Vietnamese. Specifically, we create a large Vietnamese word analogy list to evaluate the embeddings. We then use the pre-trained embeddings for the named entity recognition (NER) task in Vietnamese and achieve new state-of-the-art results on a benchmark dataset for the NER task. A video demonstration of ETNLP is available at this https URL. The source code and data are available at https://github.com/vietnlp/etnlp.
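
To make the evaluation and extraction steps concrete, here is a minimal sketch of how an analogy-based evaluation and a vocabulary-based extraction can work. It is not ETNLP's code or API: the function names (`solve_analogy`, `analogy_accuracy`, `extract_subset`), the 3CosAdd scoring rule, and the toy English vectors are all assumptions chosen for illustration.

```python
import numpy as np


def normalize(emb):
    """Return a copy of the embedding table with unit-length vectors."""
    return {w: v / np.linalg.norm(v) for w, v in emb.items()}


def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' using the 3CosAdd rule.

    Assumes `emb` is already normalized, so ranking candidates by the
    dot product with the target is the same as ranking by cosine.
    """
    target = emb[b] - emb[a] + emb[c]
    best_word, best_score = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):  # the query words are excluded as answers
            continue
        score = float(np.dot(vec, target))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def analogy_accuracy(emb, analogies):
    """Fraction of (a, b, c, d) analogies answered correctly.

    Analogies containing out-of-vocabulary words are skipped.
    """
    emb = normalize(emb)
    covered = [q for q in analogies if all(w in emb for w in q)]
    if not covered:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in covered)
    return correct / len(covered)


def extract_subset(emb, vocab):
    """Keep only the vectors for words that a downstream task needs."""
    return {w: emb[w] for w in vocab if w in emb}


# Hypothetical toy embeddings, hand-built so the analogy holds.
toy = {
    "king": np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "man": np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "apple": np.array([-1.0, 0.0, 0.0]),
}

accuracy = analogy_accuracy(toy, [("man", "king", "woman", "queen")])
subset = extract_subset(toy, ["king", "queen", "banana"])
```

With real embeddings, the same accuracy figure could be computed for each embedding set on one analogy list to compare them, and the extracted subset keeps a downstream model from loading vectors for words it never sees.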

URL

https://arxiv.org/abs/1903.04433

PDF

https://arxiv.org/pdf/1903.04433
