Abstract: In this paper, we study the convergence properties of natural gradient methods. By reviewing the mathematical condition for the equivalence between the Fisher information matrix and the ...
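The generic natural gradient step preconditions the ordinary gradient with the inverse Fisher information matrix, θ ← θ − η F⁻¹∇L(θ). The following is a minimal sketch of that update on a toy logistic-regression model, written with NumPy. It is not the method or setting of the paper above. The model, the damping term added to F, the step size, and the helper names (`logistic_loss_and_grad`, `fisher_matrix`, `natural_gradient_descent`) are all illustrative assumptions.

```python
import numpy as np


def logistic_loss_and_grad(theta, X, y):
    """Mean negative log-likelihood of a logistic model, its gradient, and p = sigmoid(X theta)."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    eps = 1e-12  # guards against log(0)
    loss = -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    grad = X.T @ (p - y) / len(y)
    return loss, grad, p


def fisher_matrix(X, p, damping=1e-4):
    """Fisher information of the logistic model averaged over the inputs:
    F = (1/n) X^T diag(p(1-p)) X. The damping term is an assumption added
    so that F stays invertible."""
    w = p * (1.0 - p)
    F = (X * w[:, None]).T @ X / X.shape[0]
    return F + damping * np.eye(X.shape[1])


def natural_gradient_descent(X, y, lr=0.5, steps=20, damping=1e-4):
    """Run `steps` natural gradient updates theta <- theta - lr * F^{-1} grad."""
    theta = np.zeros(X.shape[1])
    losses = []
    for _ in range(steps):
        loss, grad, p = logistic_loss_and_grad(theta, X, y)
        losses.append(loss)
        F = fisher_matrix(X, p, damping)
        # Solve F d = grad rather than forming F^{-1} explicitly.
        direction = np.linalg.solve(F, grad)
        theta = theta - lr * direction
    return theta, losses


# Usage on synthetic data (hypothetical example, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(X @ true_theta)))).astype(float)
theta, losses = natural_gradient_descent(X, y)
```

Solving the linear system with `np.linalg.solve` instead of inverting F is the usual choice for numerical stability. For large networks, F is far too big to form explicitly, so practical natural gradient methods rely on structured approximations of it.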
Abstract: Stochastic gradient descent (SGD) and its many variants are widely used algorithms for training deep neural networks (DNNs). However, SGD has some unavoidable drawbacks, including vanishing ...