On weighted cross-entropy for label-imbalanced separable data: An algorithmic-stability study
Puneesh Deora (University of British Columbia); Christos Thrampoulidis (University of British Columbia)
Implicit bias theory characterizes notions of simplicity in the weights learned by gradient descent when training without explicit regularization continues beyond zero training error, and has served as a cornerstone for theoretically justifying the good generalization of interpolating models. However, its asymptotic nature (in the number of gradient steps) limits its practical relevance. This motivates developing finite-time generalization bounds. Specifically, recent works have proposed bounding the generalization error indirectly by controlling the corresponding test loss via the algorithmic-stability framework. Concretely, for cross-entropy (CE) training on separable balanced data, they show that the CE test loss decays as fast (up to logarithmic factors) as the test error. In this paper, we study generalization under label imbalances. Motivated by our empirical observation that weighted CE (wCE) can significantly outperform the max-margin classifier at early training phases, we ask whether the stability framework can prove this early-stopping result. To this end, we extend the analysis to the imbalanced setting and bound the test loss of wCE. For Gaussian mixtures, we show this bound is order-wise the same as the balanced error of the max-margin classifier, suggesting the test loss might not be a good proxy for the balanced error of wCE under label imbalance. We further support this conjecture with empirical results.
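To make the comparison concrete, the following is a minimal sketch (not the paper's experimental setup) of the kind of experiment described above: it trains a linear classifier with inverse-frequency-weighted CE (logistic) loss for a fixed, early-stopped number of gradient-descent steps on a label-imbalanced two-class Gaussian mixture, and compares its balanced error against a hard-margin-like linear SVM baseline standing in for the max-margin classifier. The mixture parameters, class weights, step size, and stopping time are illustrative assumptions; scikit-learn is used only for the SVM baseline.

    import numpy as np
    from scipy.special import expit
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)

    # Label-imbalanced two-class Gaussian mixture (illustrative parameters):
    # majority class (+1) centered at +mu, minority class (-1) at -mu.
    d, n_maj, n_min = 50, 200, 20
    mu = 2.0 * np.ones(d) / np.sqrt(d)
    X = np.vstack([rng.normal(+mu, 1.0, (n_maj, d)),
                   rng.normal(-mu, 1.0, (n_min, d))])
    y = np.concatenate([np.ones(n_maj), -np.ones(n_min)])

    # Weighted CE: each example weighted by the inverse of its class size
    # (one common choice of weights; the paper's weighting may differ).
    sample_w = np.where(y > 0, 1.0 / n_maj, 1.0 / n_min)

    def wce_grad(w):
        # Gradient of (1 / sum_i w_i) * sum_i w_i * log(1 + exp(-y_i <x_i, w>)).
        p = expit(-y * (X @ w))  # sigmoid of the negative margin
        return -(X.T @ (sample_w * y * p)) / sample_w.sum()

    # Early-stopped gradient descent on the weighted CE loss
    # (step size and number of steps are assumptions).
    w, lr = np.zeros(d), 0.1
    for _ in range(500):
        w -= lr * wce_grad(w)

    # Max-margin baseline: soft-margin linear SVM with a large C, which
    # approximates the hard-margin solution when the data are separable.
    svm = LinearSVC(C=1e4, fit_intercept=False, max_iter=50000).fit(X, y)

    # Balanced error = average of per-class errors on a fresh balanced test sample.
    n_test = 2000
    Xt = np.vstack([rng.normal(+mu, 1.0, (n_test, d)),
                    rng.normal(-mu, 1.0, (n_test, d))])
    yt = np.concatenate([np.ones(n_test), -np.ones(n_test)])

    def balanced_error(pred):
        return np.mean([np.mean(pred[yt == c] != c) for c in (+1, -1)])

    print("wCE (early-stopped GD) balanced error:", balanced_error(np.sign(Xt @ w)))
    print("max-margin (linear SVM) balanced error:", balanced_error(svm.predict(Xt)))

Under this kind of setup, the balanced error of the early-stopped wCE iterate can be compared directly against that of the (near) max-margin classifier; the specific mixture and weighting above are only meant to illustrate the quantities the abstract refers to, not to reproduce the paper's results.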