
Ancient-Modern Chinese Translation with a Large Training Dataset

2018-08-11
Dayiheng Liu, Jiancheng Lv, Kexin Yang, Qian Qu

Abstract

Ancient Chinese carries the wisdom and spiritual culture of the Chinese nation. Automatic translation from ancient Chinese to modern Chinese helps to inherit and carry forward the quintessence of the ancients. In this paper, we propose an Ancient-Modern Chinese clause alignment approach and apply it to create a large-scale Ancient-Modern Chinese parallel corpus that contains about 1.24M bilingual pairs. To the best of our knowledge, this is the first large, high-quality Ancient-Modern Chinese dataset. Furthermore, we train SMT and various NMT-based models on this dataset and provide a strong baseline for this task.

URL

https://arxiv.org/abs/1808.03738

PDF

https://arxiv.org/pdf/1808.03738

