Efficient Compressed Video Action Recognition via Late Fusion with a Single Network
Hayato Terao (Hokkaido University); Wataru Noguchi (Hokkaido University); Hiroyuki Iizuka (Hokkaido University); Masahito Yamamoto (Hokkaido University)
SPS
Compressed video action recognition is an approach to action recognition that achieves efficient inference by classifying videos directly from the multiple features stored in compressed video. Most conventional methods process these features with multiple networks, and they reduce computational complexity by adopting lightweight networks that do not degrade classification performance. This study explores another way to reduce computational complexity: processing the compressed video features with a single network instead of multiple networks. Because straightforward training of a single network does not reach the practical classification performance of conventional methods, we propose an extended MIMO training method for action recognition that lets the network process different features simultaneously. Our experiments demonstrate that our method is the most efficient in terms of computational complexity while achieving classification performance comparable to that of conventional methods.
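To make the idea of late fusion with a single network concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It reuses one toy linear classifier for every feature stream and averages the per-stream class probabilities. The stream names (`stream_a`, `stream_b`, `stream_c`), the class `SharedLinearClassifier`, the feature dimension, and the averaging rule are all hypothetical choices made for illustration. The sketch does not reproduce the extended MIMO training method; it only shows shared weights followed by score-level fusion.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SharedLinearClassifier:
    """Hypothetical stand-in for the single network: one weight matrix
    that is reused for every feature stream."""

    def __init__(self, in_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
            for _ in range(num_classes)
        ]

    def forward(self, features):
        logits = [sum(w * x for w, x in zip(row, features)) for row in self.weights]
        return softmax(logits)


def late_fusion(model, streams):
    """Average the class probabilities that the same model produces per stream."""
    per_stream = [model.forward(features) for features in streams.values()]
    num_classes = len(per_stream[0])
    return [
        sum(probs[c] for probs in per_stream) / len(per_stream)
        for c in range(num_classes)
    ]


# Assumed toy inputs: three feature streams, each an 8-dimensional vector.
rng = random.Random(1)
streams = {
    name: [rng.uniform(-1.0, 1.0) for _ in range(8)]
    for name in ("stream_a", "stream_b", "stream_c")
}

model = SharedLinearClassifier(in_dim=8, num_classes=4)
fused = late_fusion(model, streams)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Because every stream passes through the same weights, the parameter count does not grow with the number of feature streams, which is the kind of saving a single-network design aims for. How the paper's training method handles several features within one network is not shown here.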