As the field of deep learning has been growing rapidly, optimization methods (optimizers) have attracted keen attention for efficiently training neural networks. While SGD exhibits practically favorable performance on various tasks, adaptive methods such as ADAM have also been formulated to equip the gradient-based update with adaptive scaling in a sophisticated way. In this paper, we propose a novel optimizer that integrates these two approaches, the adaptive method and SGD, by assigning stochastic confidence weights to the gradient-based update. We define the statistical uncertainty of the gradients that is implicitly embedded in the adaptive scaling of ADAM, and then, based on this uncertainty, naturally incorporate stochasticity into the optimizer as a bridge between SGD and ADAM. Thereby, the proposed optimizer, SCWSGD, endows the parameter update with two types of stochasticity, multiplicative scaling of the gradient and mini-batch sampling to compute the gradient, so as to improve generalization performance. In experiments on image classification using various CNNs, the proposed optimizer produces favorable performance in comparison to other optimizers.
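To make the general idea concrete, below is a minimal Python sketch of an SGD step modulated by a stochastic confidence weight derived from Adam-style moment estimates. The abstract does not give the exact update rule, so every specific here is an assumption for illustration: the function name `scw_sgd_step`, the use of the first and second moments to estimate a per-coordinate gradient variance, the particular confidence formula, and the Bernoulli sampling of the weight are all hypothetical choices, not the paper's actual method.

```python
import numpy as np

def scw_sgd_step(param, grad, state, lr=0.01, beta1=0.9, beta2=0.999,
                 eps=1e-8, rng=np.random.default_rng()):
    """One illustrative SGD step with a stochastically sampled confidence weight.

    The confidence is derived from the gradient uncertainty implied by
    Adam-style moment estimates; the formulas are assumptions for
    illustration, not the exact SCWSGD update rule.
    """
    # Adam-style exponential moving averages of the gradient and its square.
    state["m"] = beta1 * state.get("m", np.zeros_like(grad)) + (1 - beta1) * grad
    state["v"] = beta2 * state.get("v", np.zeros_like(grad)) + (1 - beta2) * grad**2

    # Assumed proxy for the "statistical uncertainty" of the gradient:
    # per-coordinate variance estimated from the two moments.
    var = np.maximum(state["v"] - state["m"] ** 2, 0.0)

    # Confidence in [0, 1]: high when the mean gradient dominates its spread.
    confidence = state["m"] ** 2 / (state["m"] ** 2 + var + eps)

    # Stochastic confidence weight: a per-coordinate Bernoulli mask, so each
    # coordinate randomly takes either a plain SGD step or no step at all.
    w = rng.random(grad.shape) < confidence

    # Confidence-weighted SGD update.
    return param - lr * w * grad
```

Under this sketch, coordinates whose gradient estimate is consistent across mini-batches are updated almost as in plain SGD, while noisy coordinates are updated only occasionally, which is one plausible way to realize the two kinds of stochasticity (multiplicative weighting and mini-batch sampling) described in the abstract.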