Tutorial 23 Oct 2024

The Neural Collapse phenomenon has garnered significant attention in both practical and theoretical deep learning, as is evident from the extensive research on the topic. The presenters' own work has made key contributions to this body of research. The tutorial outline is summarized below: the first half focuses on the structure of the representations in the last layer, and the second half generalizes the study to the intermediate layers.

1. Prevalence of Neural Collapse & Global Optimality. The tutorial opens by introducing the Neural Collapse phenomenon in the last layer and its universality in deep network training, and lays out the mathematical foundations for understanding its cause based upon the simplified unconstrained feature model (UFM). We then generalize the phenomenon and explain its implications under data imbalance. (A sketch of the standard Neural Collapse diagnostics follows this outline.)

2. Optimization Theory of Neural Collapse. We provide a rigorous explanation of the emergence of Neural Collapse from an optimization perspective and demonstrate its impact on algorithmic choices, drawing on recent works. Specifically, we conduct a global landscape analysis under the UFM to show that benign landscapes are prevalent across various loss functions and problem formulations. We then demonstrate the practical algorithmic implications of Neural Collapse for training deep neural networks. (A minimal UFM training sketch also appears after this outline.)

3. Progressive Data Compression & Separation Across Intermediate Layers. We open the black box of deep representation learning by introducing a law that governs how real-world deep neural networks separate data according to class membership from the bottom layers to the top layers: each layer improves a certain measure of data separation by roughly the same multiplicative factor. We demonstrate the universality of this law across different network architectures, datasets, and training losses. (See the layer-wise separation sketch after this outline.)

4. Theory & Applications of Progressive Data Separation. Finally, we turn to a theoretical understanding of the structures in the intermediate layers by studying the learning dynamics of gradient descent. In particular, we reveal parsimonious structures in the gradient dynamics under which a certain measure of data separation exhibits layer-wise linear decay from shallow to deep layers. We close by demonstrating the practical implications of this phenomenon for transfer learning and the study of foundation models, leading to efficient fine-tuning methods with reduced overfitting.
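To make the last-layer structure in item 1 concrete, here is a minimal sketch (not the presenters' code) of the standard Neural Collapse diagnostics: NC1 measures within-class variability collapse, and NC2 measures how far the centered class means are from a simplex equiangular tight frame (ETF). The function name and array shapes (features `H` of shape `(num_samples, dim)`, integer labels `y`) are illustrative assumptions.

```python
# Minimal sketch of the usual Neural Collapse diagnostics on last-layer features.
import numpy as np

def neural_collapse_metrics(H, y):
    classes = np.unique(y)
    K = len(classes)
    global_mean = H.mean(axis=0)

    # Class means and within-/between-class covariance matrices.
    means = np.stack([H[y == c].mean(axis=0) for c in classes])      # (K, dim)
    Sigma_B = (means - global_mean).T @ (means - global_mean) / K
    Sigma_W = sum((H[y == c] - means[k]).T @ (H[y == c] - means[k])
                  for k, c in enumerate(classes)) / len(H)

    # NC1: within-class variability relative to between-class variability.
    nc1 = np.trace(Sigma_W @ np.linalg.pinv(Sigma_B)) / K

    # NC2: distance of the normalized, centered class means from a simplex ETF,
    # whose Gram matrix has ones on the diagonal and -1/(K-1) off the diagonal.
    M = means - global_mean
    M = M / np.linalg.norm(M, axis=1, keepdims=True)
    gram = M @ M.T
    etf_gram = (np.eye(K) - np.ones((K, K)) / K) * K / (K - 1)
    nc2 = np.linalg.norm(gram - etf_gram) / np.linalg.norm(etf_gram)
    return nc1, nc2
```

Evaluated on penultimate-layer features over the course of training, both quantities are driven toward zero as collapse emerges.

The unconstrained feature model referenced in items 1 and 2 treats the last-layer features as free optimization variables, decoupled from any backbone. Below is a minimal sketch, assuming a cross-entropy UFM with weight decay trained by plain SGD in PyTorch; all sizes and hyperparameters are illustrative, not the tutorial's settings.

```python
# Minimal sketch of a regularized unconstrained feature model (UFM):
# the "features" H are free variables optimized jointly with the classifier.
import torch

K, n, d = 10, 50, 128                            # classes, samples/class, feature dim (illustrative)
y = torch.arange(K).repeat_interleave(n)
H = torch.randn(K * n, d, requires_grad=True)    # free last-layer features
W = torch.randn(K, d, requires_grad=True)        # linear classifier weights
b = torch.zeros(K, requires_grad=True)
opt = torch.optim.SGD([H, W, b], lr=0.1, weight_decay=5e-4)

for step in range(10_000):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(H @ W.T + b, y)
    loss.backward()
    opt.step()
# At convergence, the class means of H (and the rows of W) align into a simplex ETF
# and the within-class variation of H shrinks toward zero, i.e. Neural Collapse.
```

This is the simplified setting in which the global landscape analysis of item 2 is typically carried out, since the features are no longer constrained by a particular architecture.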
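For items 3 and 4, the per-layer behavior can be probed by computing a separation measure on intermediate-layer features. The following is a minimal sketch, assuming per-layer feature matrices are available (e.g., collected with forward hooks); it uses the common within- over between-class variability ratio tr(Sigma_W Sigma_B^+) as the separation measure, which is an assumption rather than necessarily the exact quantity used in the tutorial. An equal multiplicative improvement per layer shows up as a straight line when the log of the measure is plotted against depth.

```python
# Minimal sketch: layer-wise data-separation measure and its per-layer decay factor.
import numpy as np

def separation(H, y):
    """Within- over between-class variability, tr(Sigma_W pinv(Sigma_B))."""
    classes = np.unique(y)
    mu = H.mean(axis=0)
    means = np.stack([H[y == c].mean(axis=0) for c in classes])
    Sigma_B = (means - mu).T @ (means - mu) / len(classes)
    Sigma_W = sum((H[y == c] - means[k]).T @ (H[y == c] - means[k])
                  for k, c in enumerate(classes)) / len(H)
    return np.trace(Sigma_W @ np.linalg.pinv(Sigma_B))

def per_layer_decay(layer_features, y):
    """Fit log D_l ~ a*l + b; exp(a) estimates the multiplicative factor per layer."""
    D = np.array([separation(H.reshape(len(H), -1), y) for H in layer_features])
    depths = np.arange(1, len(D) + 1)
    slope, _ = np.polyfit(depths, np.log(D), 1)
    return np.exp(slope), D
```

A decay factor well below one, roughly constant across depth, is the signature of the progressive, layer-wise linear (in log scale) separation described in items 3 and 4.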
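Both sketches above assume access to the relevant feature matrices; how those features are extracted (hooks, saved activations, or an explicit UFM) is an implementation choice, not prescribed by the tutorial.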
