
On weighted cross-entropy for label-imbalanced separable data: An algorithmic-stability study

Puneesh Deora (University of British Columbia); Christos Thrampoulidis (University of British Columbia)

09 Jun 2023

Implicit-bias theory characterizes notions of simplicity in the weights learned by gradient descent when training without explicit regularization beyond the point of zero training error, and it has served as a cornerstone result for theoretically justifying the good generalization of interpolating models. However, its asymptotic nature (in the number of gradient steps) limits its practical relevance. This motivates developing finite-time generalization bounds. Specifically, recent works have proposed bounding the generalization error indirectly by controlling the corresponding test loss via the algorithmic-stability framework. Concretely, for cross-entropy (CE) training on separable, balanced data, they show that the CE test loss decays as fast (up to logarithmic factors) as the test error. In this paper, we study generalization under label imbalances. Motivated by our empirical observation that weighted CE (wCE) can significantly outperform the max-margin classifier at early training phases, we ask whether the stability framework can prove this early-stopping result. To this end, we extend the analysis to the imbalanced setting and bound the test loss of wCE. For Gaussian mixtures, we show this bound is order-wise the same as the balanced error of the max-margin classifier, suggesting that the test loss might not be a good proxy for the balanced error of wCE under imbalances. We further support this conjecture with empirical results.
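The sketch below is a minimal, self-contained illustration of the setting the abstract describes (it is not the authors' experimental setup): a linear classifier is trained with weighted cross-entropy via gradient descent on a label-imbalanced two-class Gaussian mixture, its balanced test error is recorded at a few early-stopping points, and it is compared against a max-margin baseline approximated by a weakly regularized linear SVM. All dimensions, sample sizes, class priors, weighting choices, and step sizes are assumptions made for illustration only.

# Illustrative sketch, not the paper's method or experiments. Assumed choices throughout.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)

def sample_gmm(n_major, n_minor, d=50, margin=2.0):
    """Two spherical Gaussians with opposite means; labels in {-1, +1}."""
    mu = np.zeros(d); mu[0] = margin
    X_pos = rng.normal(size=(n_major, d)) + mu   # majority class (+1)
    X_neg = rng.normal(size=(n_minor, d)) - mu   # minority class (-1)
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(n_major), -np.ones(n_minor)])
    return X, y

# Imbalanced training set; balanced test set so that test error equals balanced error.
X_tr, y_tr = sample_gmm(n_major=500, n_minor=25)
X_te, y_te = sample_gmm(n_major=1000, n_minor=1000)

# Per-sample weights inversely proportional to class frequency (one common wCE choice).
n_pos, n_neg = (y_tr == 1).sum(), (y_tr == -1).sum()
w = np.where(y_tr == 1, 1.0 / n_pos, 1.0 / n_neg)
w /= w.sum()

theta = np.zeros(X_tr.shape[1])
step = 1.0
for t in range(1, 2001):
    margins = y_tr * (X_tr @ theta)
    # Gradient of the weighted logistic (cross-entropy) loss sum_i w_i log(1 + exp(-y_i x_i^T theta)).
    grad = -(X_tr * (w * y_tr / (1.0 + np.exp(margins)))[:, None]).sum(axis=0)
    theta -= step * grad
    if t in (10, 100, 1000, 2000):
        bal_err = 1.0 - balanced_accuracy_score(y_te, np.sign(X_te @ theta))
        print(f"wCE, step {t:4d}: balanced test error = {bal_err:.3f}")

# Max-margin baseline: hard-margin SVM approximated by a large regularization constant C.
svm = LinearSVC(C=1e6, fit_intercept=False, max_iter=100000).fit(X_tr, y_tr)
bal_err_mm = 1.0 - balanced_accuracy_score(y_te, svm.predict(X_te))
print(f"max-margin baseline: balanced test error = {bal_err_mm:.3f}")

Under this kind of synthetic setup, the early-iteration wCE iterates can already achieve a lower balanced error than the max-margin solution, which is the empirical phenomenon motivating the paper's stability analysis.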
