CLMAE: A Lighter and Faster Masked Autoencoder

Yiran Song (Shanghai Jiao Tong University); Lizhuang Ma (Shanghai Jiao Tong University)

06 Jun 2023

Self-supervised pre-training has been widely applied to vision tasks and has achieved great success (e.g., BEiT and MAE). However, pre-training on large datasets suffers from lengthy training schedules and high memory consumption. To alleviate these problems, we propose a lightweight model called the Convolutional Lite Masked Autoencoder (CLMAE). To improve the transformer's convergence speed during pre-training, we introduce a two-stage convolutional progressive patch embedding and an additional convolution in the feed-forward layer, which promote better correlation among patches in the spatial dimension. The most important design is the cross-layer parameter sharing mechanism, which reduces model parameters with little impact on performance. We find that sharing parameters among layers not only improves parameter efficiency but also acts as a form of regularization that stabilizes training. Experimental results on downstream tasks show the effectiveness and generalization ability of CLMAE, which accelerates training significantly (by 5 times relative to MAE with ViT-B) and reduces parameters by a quarter (25M fewer than ViT-B), with competitive accuracy (82.8% on ImageNet-1K).
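To make the parameter savings of cross-layer sharing concrete, the sketch below counts the block parameters of a 12-layer transformer with per-layer weights versus one weight set reused across all layers. This is an illustration of the general idea only, not the paper's implementation; the layer sizes (embedding dim 768, MLP dim 3072, depth 12, ViT-B-like) and the `Weight` placeholder class are assumptions for the example.

```python
# Hypothetical sketch of cross-layer parameter sharing in a transformer.
# Sizes are assumed ViT-B-like (dim 768, MLP dim 3072, 12 layers), not
# the exact CLMAE configuration from the paper.

class Weight:
    """Placeholder for a learnable weight tensor; stores only its shape."""
    def __init__(self, *shape):
        self.shape = shape
        self.size = 1
        for d in shape:
            self.size *= d

DIM, MLP_DIM, DEPTH = 768, 3072, 12

def make_block_params():
    """One transformer block: attention projections plus the MLP weights."""
    return {
        "qkv":  Weight(DIM, 3 * DIM),
        "proj": Weight(DIM, DIM),
        "fc1":  Weight(DIM, MLP_DIM),
        "fc2":  Weight(MLP_DIM, DIM),
    }

def count_params(blocks):
    # Count each unique weight object once, so shared weights are not
    # double-counted across layers.
    seen = {id(w): w.size for b in blocks for w in b.values()}
    return sum(seen.values())

per_layer = [make_block_params() for _ in range(DEPTH)]  # standard ViT stack
shared = [make_block_params()] * DEPTH                   # one set reused 12 times

print(count_params(per_layer) // count_params(shared))   # prints 12
```

With full sharing the block parameters shrink by a factor of the depth; in practice a scheme like CLMAE's would trade off how much is shared against accuracy, which is why the reported reduction is about a quarter of the model rather than the full 12x shown here.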
