WordReg: Mitigating the Gap between Training and Inference with Worst-case Drop Regularization
Jun Xia (Westlake University); Ge Wang (Westlake University); Bozhen Hu (Zhejiang University & Westlake University); Cheng Tan (Zhejiang University & Westlake University); Jiangbin Zheng (Westlake University); Yongjie Xu (Westlake University); Stan Z. Li (Westlake University)
Dropout has emerged as one of the most frequently used techniques for training deep neural networks (DNNs). Although effective, the sub-model sampled by random dropout during training is inconsistent with the full model (without dropout) used at inference. To mitigate this undesirable gap, we propose WordReg, a simple yet effective regularization built on dropout that enforces consistency between the outputs of different sub-models sampled by dropout. Specifically, WordReg first obtains the worst-case dropout by maximizing the divergence between the outputs of two sub-models with different random dropout masks. It then encourages agreement between the outputs of the two sub-models with the worst-case divergence. Extensive experiments on diverse DNNs and tasks show that WordReg achieves notable and consistent improvements over non-regularized models and yields some state-of-the-art results. Theoretically, we verify that WordReg reduces the gap between training and inference. The code for reproducing the results will be released.
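The abstract does not specify how the worst-case dropout is searched, but the two-step idea (maximize divergence over dropout masks, then penalize that divergence) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration only: the helper names `symmetric_kl` and `wordreg_loss`, the use of several randomly sampled candidate masks as a crude stand-in for the worst-case maximization, and the weighting coefficient `alpha` are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def symmetric_kl(logits_p, logits_q):
    # Symmetric KL divergence between the output distributions of two sub-models.
    p = F.log_softmax(logits_p, dim=-1)
    q = F.log_softmax(logits_q, dim=-1)
    return 0.5 * (F.kl_div(q, p, log_target=True, reduction="batchmean")
                  + F.kl_div(p, q, log_target=True, reduction="batchmean"))

def wordreg_loss(model, x, y, num_candidates=4, alpha=1.0):
    """Hypothetical WordReg-style objective (sketch).

    Step 1: approximate the worst-case dropout by sampling several sub-models
            (forward passes with different random masks) and keeping the pair
            whose outputs diverge the most.
    Step 2: add the task loss on that pair plus a penalty on their divergence,
            which encourages agreement between the worst-case sub-models.
    """
    # Forward passes with different random dropout masks (model must be in train mode).
    outputs = [model(x) for _ in range(num_candidates)]

    # "Worst-case" search over sampled pairs: keep the maximally divergent pair.
    worst_div, worst_pair = None, None
    for i in range(num_candidates):
        for j in range(i + 1, num_candidates):
            div = symmetric_kl(outputs[i], outputs[j])
            if worst_div is None or div > worst_div:
                worst_div, worst_pair = div, (i, j)

    i, j = worst_pair
    # Supervised loss on both worst-case sub-models plus the consistency penalty.
    task_loss = 0.5 * (F.cross_entropy(outputs[i], y) + F.cross_entropy(outputs[j], y))
    return task_loss + alpha * worst_div
```

In this sketch the maximization is approximated by enumerating a few randomly sampled dropout masks rather than optimizing the masks directly; the divergence of the selected pair is kept in the computation graph so that minimizing the total loss pulls the worst-case sub-models toward agreement.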