Abstract
The Transformer translation model is easier to parallelize and delivers better performance compared with recurrent seq2seq models, which has made it popular in both industry and the research community. In this work we implement Neutron, covering the Transformer model and several variants from recent research. Neutron is easy to modify, provides comparable performance along with interesting features, and keeps the code readable.
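To illustrate the parallelism claim, below is a minimal NumPy sketch of the standard scaled dot-product attention from the original Transformer paper, not Neutron's actual code: every sequence position is handled in a single matrix product, whereas a recurrent model must step through tokens one at a time. The function name and toy shapes are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # All positions attend to all others in one matrix multiply,
    # which is what lets the Transformer parallelize over the
    # sequence, unlike an RNN's sequential recurrence.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # (seq_len, d_k)

# Toy self-attention: 4 positions, model width 8.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```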
URL
http://arxiv.org/abs/1903.07402