Unsupervised Training for Large Vocabulary Translation Using Sparse Lexicon and Word Classes

2019-01-06

Yunsu Kim, Julian Schamper, Hermann Ney

arXiv_CL

Abstract

We address for the first time unsupervised training for a translation task with hundreds of thousands of vocabulary words. We scale up the expectation-maximization (EM) algorithm to learn a large translation table without any parallel text or seed lexicon. First, we solve the memory bottleneck and enforce the sparsity with a simple thresholding scheme for the lexicon. Second, we initialize the lexicon training with word classes, which efficiently boosts the performance. Our methods produced promising results on two large-scale unsupervised translation tasks.
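The abstract does not spell out the thresholding scheme, so the following is only a minimal sketch of one plausible reading: after each EM update, lexicon entries whose probability falls below a threshold are dropped and the surviving entries are renormalized. The function name `threshold_lexicon`, the nested-dict layout, and the parameter `tau` are assumptions made for illustration, not the authors' implementation.

```python
def threshold_lexicon(lexicon, tau):
    """Prune a translation lexicon and renormalize it (illustrative sketch).

    lexicon: dict mapping a conditioning word to a dict of
             {translation candidate: probability}  (hypothetical layout).
    tau:     probability threshold; entries below it are removed.
    """
    pruned = {}
    for word, row in lexicon.items():
        if not row:
            continue  # nothing to keep for an empty row
        kept = {cand: p for cand, p in row.items() if p >= tau}
        if not kept:
            # Assumption: keep the single best candidate so the row
            # remains a valid probability distribution.
            best = max(row, key=row.get)
            kept = {best: row[best]}
        total = sum(kept.values())
        pruned[word] = {cand: p / total for cand, p in kept.items()}
    return pruned
```

Storing only the surviving entries is what would keep memory use manageable for a vocabulary of hundreds of thousands of words, since a dense table grows with the product of the two vocabulary sizes.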

URL

http://arxiv.org/abs/1901.01577

PDF

http://arxiv.org/pdf/1901.01577
