Abstract
Many hardware accelerators have been proposed to improve the computational efficiency of inference in deep neural networks (DNNs). However, off-chip memory accesses are the most energy-consuming operations in such architectures, and they prevent these designs from reaching their full efficiency potential. To address this, we propose ROMANet, a methodology for investigating efficient dataflow patterns that reduce the number of off-chip accesses. ROMANet adaptively determines the data reuse pattern for each convolutional layer of a network by considering the reuse factors of weights, input activations, and output activations. It also considers the data mapping inside off-chip memory to reduce row buffer misses and increase parallelism. Our experimental results show that the ROMANet methodology achieves up to 50% dynamic energy savings in state-of-the-art DNN accelerators.
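To make the notion of a per-layer reuse factor concrete, the following is a minimal sketch and not the ROMANet algorithm itself. It assumes a common textbook definition (multiply-accumulate operations per data element, stride 1, border effects ignored); the names `ConvLayer`, `reuse_factors`, and `highest_reuse` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Shape of one convolutional layer (assumption: stride 1, square kernel)."""
    in_channels: int
    out_channels: int
    kernel: int       # side length of the square kernel
    out_height: int
    out_width: int


def reuse_factors(layer: ConvLayer) -> dict:
    """Approximate number of MACs that touch each element of each data type."""
    k2 = layer.kernel * layer.kernel
    return {
        # Each weight is applied once per output pixel.
        "weights": layer.out_height * layer.out_width,
        # Each input activation feeds up to K*K window positions in every
        # output channel (upper bound; border pixels are used less often).
        "input_activations": k2 * layer.out_channels,
        # Each output activation accumulates K*K*C_in partial sums.
        "output_activations": k2 * layer.in_channels,
    }


def highest_reuse(layer: ConvLayer) -> str:
    """Name the data type with the largest reuse factor for this layer."""
    factors = reuse_factors(layer)
    return max(factors, key=factors.get)


if __name__ == "__main__":
    early = ConvLayer(in_channels=3, out_channels=64, kernel=3,
                      out_height=32, out_width=32)
    late = ConvLayer(in_channels=256, out_channels=512, kernel=3,
                     out_height=7, out_width=7)
    print(highest_reuse(early), reuse_factors(early))
    print(highest_reuse(late), reuse_factors(late))
```

Under these assumptions, an early layer with large feature maps favors weight reuse, while a late layer with many channels and small feature maps favors activation reuse, which is the kind of layer-to-layer variation that motivates choosing the reuse pattern per layer.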
URL
https://arxiv.org/abs/1902.10222