Gradient Descent Finds Global Minima of Deep Neural Networks

2019-05-28

Simon S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Xiyu Zhai

arXiv_AI CNN Gradient_Descent

Abstract

Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show the Gram matrix is stable throughout the training process and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result.
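The role of the Gram matrix can be illustrated with a small numerical sketch. The code below is not the paper's setting: it assumes a two-layer ReLU network with a fixed output layer instead of a deep ResNet, and the function name `train_two_layer`, the random data, the width `m`, and the step size are all hypothetical choices made for illustration. It runs plain gradient descent on the squared loss and records, at every step, the smallest eigenvalue of the Gram matrix formed from inner products of per-example gradients. If that eigenvalue stays bounded away from zero during training, the loss is expected to decay roughly geometrically, which is the mechanism the abstract describes.

```python
import numpy as np

def train_two_layer(n=5, d=20, m=2000, lr=0.5, steps=500, seed=0):
    """Gradient descent on a wide two-layer ReLU net (illustrative assumption,
    not the deep ResNet analyzed in the paper)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
    y = rng.standard_normal(n)                       # arbitrary targets
    W = rng.standard_normal((m, d))                  # trained hidden layer
    a = rng.choice([-1.0, 1.0], size=m)              # fixed output weights

    losses, min_eigs = [], []
    for _ in range(steps):
        pre = X @ W.T                                # pre-activations, shape (n, m)
        act = (pre > 0).astype(float)                # ReLU derivative
        u = (np.maximum(pre, 0.0) @ a) / np.sqrt(m)  # predictions, shape (n,)
        resid = u - y
        losses.append(0.5 * np.sum(resid ** 2))

        # Gram matrix G_ij = <d f(x_i)/dW, d f(x_j)/dW>.
        # Since a_r^2 = 1, this is (x_i . x_j) * (1/m) * sum_r act_ir * act_jr.
        G = (X @ X.T) * (act @ act.T) / m
        min_eigs.append(np.linalg.eigvalsh(G)[0])    # eigenvalues are ascending

        # Gradient of the squared loss with respect to W, shape (m, d).
        grad = ((act * resid[:, None]) * a[None, :]).T @ X / np.sqrt(m)
        W -= lr * grad
    return np.array(losses), np.array(min_eigs)

if __name__ == "__main__":
    losses, min_eigs = train_two_layer()
    print("initial loss:", losses[0])
    print("final loss:  ", losses[-1])
    print("min eigenvalue range:", min_eigs.min(), min_eigs.max())
```

In this sketch the width `m` is much larger than the number of samples `n`, which stands in for the over-parameterization assumption. Shrinking `m` makes the activation patterns, and therefore the Gram matrix, change more during training.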

URL

http://arxiv.org/abs/1811.03804

PDF

http://arxiv.org/pdf/1811.03804
