  • SPS Members: Free
  • IEEE Members: $11.00
  • Non-members: $15.00
  • Length: 00:09:51
17 Oct 2022

The Transformer has shown outstanding performance on time-series data, which makes it well suited to quality assessment of video sequences. However, its quadratic time and memory complexities potentially impede its application to long video sequences. In this work, we study a mechanism for sharing attention across video clips in the video quality assessment (VQA) scenario. Consequently, an efficient architecture that integrates shared multi-head attention (MHA) into the Transformer is proposed for VQA, which greatly eases the time and memory complexities. A long video sequence is first divided into individual clips. The quality features derived by an image quality model for each frame in a clip are aggregated by a shared MHA layer. The aggregated features across all clips are then fed into a global Transformer encoder for quality prediction at the sequence level. The proposed model, with a lightweight architecture, demonstrates promising performance in no-reference VQA (NR-VQA) modelling on publicly available databases. The source code can be found at https://github.com/junyongyou/lagt_vqa.
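To make the described pipeline concrete, the following is a minimal PyTorch sketch of the two-stage design from the abstract: a single MHA layer whose weights are shared across all clips aggregates per-frame quality features within each clip, and a global Transformer encoder then attends across the clip-level features. All module names, dimensions, the mean-pooling steps, and the regression head are illustrative assumptions, not the authors' implementation; see https://github.com/junyongyou/lagt_vqa for the actual code.

```python
# Sketch of clip-wise shared attention followed by a global Transformer
# encoder, assuming per-frame features are precomputed by an image quality
# model. Dimensions and pooling choices are hypothetical.
import torch
import torch.nn as nn


class ClipSharedAttentionVQA(nn.Module):
    def __init__(self, feat_dim=512, num_heads=8, num_layers=2):
        super().__init__()
        # One MHA layer; its weights are reused for every clip.
        self.shared_mha = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        encoder_layer = nn.TransformerEncoderLayer(
            feat_dim, num_heads, dim_feedforward=2 * feat_dim, batch_first=True
        )
        # Global encoder operating on one token per clip.
        self.global_encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        self.regressor = nn.Linear(feat_dim, 1)  # scalar quality score

    def forward(self, x):
        # x: (batch, num_clips, frames_per_clip, feat_dim) per-frame features.
        b, c, f, d = x.shape
        frames = x.reshape(b * c, f, d)
        # Self-attention within each clip; attention is quadratic only in the
        # (short) clip length, not in the full sequence length.
        attended, _ = self.shared_mha(frames, frames, frames)
        clip_feats = attended.mean(dim=1).reshape(b, c, d)  # frames -> clip token
        # Sequence-level attention across clips.
        encoded = self.global_encoder(clip_feats)
        return self.regressor(encoded.mean(dim=1)).squeeze(-1)


# Example: 4 videos, each split into 6 clips of 16 frames with 512-d features.
model = ClipSharedAttentionVQA()
features = torch.randn(4, 6, 16, 512)
scores = model(features)  # shape: (4,)
```

Under these assumptions, attention cost drops from quadratic in the number of frames to quadratic in the clip length plus quadratic in the number of clips, which is the efficiency gain the abstract attributes to the shared-attention design.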
