Lookahead Converges To Stationary Points Of Smooth Non-Convex Functions
Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael Rabbat
The Lookahead optimizer [Zhang et al., 2019] was recently proposed and demonstrated to improve the performance of stochastic first-order methods for training deep neural networks. Lookahead can be viewed as a two time-scale algorithm, where the fast dynamics (inner optimizer) determine a search direction and the slow dynamics (outer optimizer) perform updates by moving along this direction. We prove that, with an appropriate choice of step sizes, Lookahead converges to a stationary point of smooth non-convex functions. Although Lookahead is described and implemented as a serial algorithm, our analysis is based on viewing Lookahead as a multi-agent optimization method with two agents communicating periodically.
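To make the two time-scale structure concrete, here is a minimal sketch of Lookahead in NumPy, not the authors' implementation: the inner (fast) optimizer is plain SGD, and the outer (slow) update interpolates toward the fast weights after a fixed number of inner steps. The function names, step sizes, inner-loop length, and the noisy quadratic used in the usage example are illustrative assumptions.

```python
import numpy as np

def lookahead_sgd(grad_fn, x0, outer_steps=100, inner_steps=5,
                  inner_lr=0.1, outer_lr=0.5):
    """Sketch of Lookahead with SGD as the inner (fast) optimizer.

    Each outer iteration runs `inner_steps` SGD updates from the current
    slow weights; the resulting displacement (fast - slow) is the search
    direction, and the slow weights move a fraction `outer_lr` along it.
    """
    slow = np.array(x0, dtype=float)            # slow ("outer") weights
    for _ in range(outer_steps):
        fast = slow.copy()                      # fast weights start at the slow weights
        for _ in range(inner_steps):            # fast dynamics: inner SGD steps
            fast -= inner_lr * grad_fn(fast)
        slow += outer_lr * (fast - slow)        # slow dynamics: move along the direction
    return slow

# Usage example (hypothetical test problem): noisy gradients of f(x) = 0.5 * ||x||^2
rng = np.random.default_rng(0)
grad = lambda x: x + 0.01 * rng.standard_normal(x.shape)
x_final = lookahead_sgd(grad, x0=np.ones(10))
print(np.linalg.norm(x_final))  # should be small: near the stationary point x = 0
```

The two nested loops make the multi-agent reading in the paper visible: the fast and slow weights evolve separately and only interact once per outer iteration, when the slow weights are updated and the fast weights are reset to them.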