Abstract
Spoken language translation (SLT) is becoming more important in an increasingly globalized world, from both a social and an economic point of view. It is one of the major challenges for automatic speech recognition (ASR) and machine translation (MT), driving intense research activity in these areas. While past research in SLT, due to technology limitations, dealt mostly with speech recorded under controlled conditions, today’s major challenge is the translation of spoken language as it occurs in real life. Application scenarios under consideration range from portable translators for tourists, through lecture and presentation translation, to broadcast news and shows with live captioning. We present PJIIT’s experience in SLT gained from the Eu-Bridge 7th framework project and the U-Star consortium activities for the Polish/English language pair. The presented research concentrates on ASR adaptation for Polish (state-of-the-art acoustic models: DBN-BLSTM training, Kaldi: LDA+MLLT+SAT+MMI), language modeling for ASR and MT (text normalization, RNN-based LMs, n-gram model domain interpolation), and statistical translation techniques (hierarchical models, factored translation models, automatic casing and punctuation, comparable and bilingual corpora preparation). While results for well-defined domains (phrases for travelers, parliament speeches, medical documentation, movie subtitling) are very encouraging, less well-defined domains (presentations, lectures) remain a challenge. Our progress in the IWSLT TED task (MT only) will be presented, as well as current progress in Polish ASR.
URL
https://arxiv.org/abs/1511.07788