Document Layout Analysis Via Positional Encoding

EJian Zhou, Xingjiao Wu, Luwei Xiao, Xiangcheng Du, TianLong Ma, Liang He

  • SPS
    Members: Free
    IEEE Members: $11.00
    Non-members: $15.00
    Length: 00:05:38
06 Oct 2022

Head pose estimation has received considerable attention in computer vision research. This work presents a framework based on an adaptive graph convolutional network (AGCN) that processes both 2D and 3D facial landmarks extracted from the input RGB image. The network has a two-stream architecture (a teacher/3D stream and a student/2D stream) and is trained with a 3D-to-2D knowledge distillation process that transfers features from the 3D stream to the 2D stream to improve performance. Several processing modules, such as depth denoising for the detected 3D landmarks and multi-stream fusion at inference, are also proposed to further increase the prediction performance and robustness of the method. In the experiments, we follow standard protocols (in terms of datasets and metrics) to evaluate performance. Three datasets are used: 300W-LP, AFLW2000, and BIWI. Performance is measured as mean absolute error (MAE). The method achieves better performance than most state-of-the-art methods.
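
As an illustration of the evaluation metric named above, here is a minimal sketch of MAE for head pose, assuming each pose is a (yaw, pitch, roll) tuple in degrees. The function name `pose_mae` and the tuple layout are hypothetical choices for this example, not taken from the authors' code, and the sketch does not handle angle wrap-around.

```python
def pose_mae(predictions, ground_truth):
    """Return (per_angle_mae, overall_mae) for lists of (yaw, pitch, roll) tuples.

    Hypothetical helper: assumes angles are in degrees and that no
    wrap-around correction (e.g. 359 vs. 1 degree) is needed.
    """
    if not predictions or len(predictions) != len(ground_truth):
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(predictions)
    # Average the absolute error separately for yaw, pitch, and roll.
    per_angle = [
        sum(abs(p[i] - t[i]) for p, t in zip(predictions, ground_truth)) / n
        for i in range(3)
    ]
    # The overall score is the mean of the three per-angle errors.
    overall = sum(per_angle) / 3
    return per_angle, overall


if __name__ == "__main__":
    preds = [(10.0, 0.0, -5.0), (20.0, 5.0, 0.0)]
    truth = [(12.0, 0.0, -5.0), (16.0, 5.0, 3.0)]
    per_angle, overall = pose_mae(preds, truth)
    print("yaw/pitch/roll MAE:", per_angle)
    print("overall MAE:", overall)
```

Reporting the per-angle values alongside the overall mean makes it easier to see which rotation axis contributes most of the error.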
