Self-Attention-Based Action Segmentation Using Intra- and Inter-Segment Representations
Constantin Patsch (Technical University of Munich); Eckehard Steinbach (TUM)
SPS
Segmenting activities in untrimmed videos remains a critical challenge for fully understanding complex human activity sequences. A correct representation of temporal action relations is key to correcting erroneous segmentations. We propose a self-attention-based model that refines initial segmentations by separately considering intra- and inter-segment relations between predicted action segments. Furthermore, to enhance training, we use a similarity-guided regularization technique that ensures intra-segment similarity and the validity of action transitions between adjacent segments. In an extensive evaluation on three public datasets (Georgia Tech Egocentric Activities (GTEA), 50Salads, and Breakfast), our proposed architecture improves the backbone model's F1@50 score by 6.1% on GTEA, 3.8% on 50Salads, and 3.9% on Breakfast.
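To illustrate what separating intra- and inter-segment relations can mean in practice, here is a minimal sketch. It is not the authors' architecture. It assumes that segments are the runs of identical labels in the backbone's initial frame-wise prediction, that intra-segment attention is ordinary self-attention restricted by a block-diagonal mask so that each frame only sees frames of its own segment, and that inter-segment attention runs over one mean-pooled vector per segment. All function names (`segments_from_labels`, `attend`, `intra_inter_refine`) are hypothetical. The sketch also uses identity query/key/value projections and a single head, whereas a real model would learn these.

```python
import math

# Hypothetical sketch: single head, identity Q/K/V (no learned projections).


def segments_from_labels(labels):
    """Split frame-wise labels into (label, start, end) runs, end exclusive."""
    segs, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segs.append((labels[start], start, t))
            start = t
    return segs


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values, allowed):
    """Scaled dot-product attention; allowed(i, j) masks which keys query i sees."""
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        idx = [j for j in range(len(keys)) if allowed(i, j)]
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in idx]
        w = softmax(scores)
        out.append([sum(wk * values[j][c] for wk, j in zip(w, idx))
                    for c in range(len(values[0]))])
    return out


def intra_inter_refine(features, labels):
    """features: T x d frame features; labels: T initial frame-wise predictions."""
    segs = segments_from_labels(labels)
    seg_of = [0] * len(labels)  # segment index of every frame
    for s, (_, a, b) in enumerate(segs):
        for t in range(a, b):
            seg_of[t] = s

    # Intra-segment: each frame attends only to frames of its own predicted
    # segment (a block-diagonal mask over the T x T attention matrix).
    intra = attend(features, features, features,
                   lambda i, j: seg_of[i] == seg_of[j])

    # Pool each segment into a single vector (mean over its frames).
    d = len(features[0])
    seg_feats = [[sum(intra[t][c] for t in range(a, b)) / (b - a) for c in range(d)]
                 for _, a, b in segs]

    # Inter-segment: every segment attends to every other segment.
    inter = attend(seg_feats, seg_feats, seg_feats, lambda i, j: True)

    # Broadcast each segment's context back to its frames (residual sum).
    return [[intra[t][c] + inter[seg_of[t]][c] for c in range(d)]
            for t in range(len(labels))]
```

Masking at the segment level keeps intra-segment attention local to one predicted action, while the pooled inter-segment step lets the model reason about which actions precede and follow each other without attending over every frame pair.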
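The reported gains use the segmental F1@50 score. The following sketch follows the commonly used definition of F1@k. A predicted segment counts as a true positive if its intersection-over-union (IoU) with a not-yet-matched ground-truth segment of the same class is at least k. Otherwise it counts as a false positive. Unmatched ground-truth segments count as false negatives. The function names are illustrative, and some evaluation scripts additionally exclude background classes, which this sketch does not do.

```python
def segments_from_labels(labels):
    """Split frame-wise labels into (label, start, end) runs, end exclusive."""
    segs, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segs.append((labels[start], start, t))
            start = t
    return segs


def f1_at_k(pred, gt, k=0.50):
    """Segmental F1 at IoU threshold k between frame-wise label sequences."""
    p_segs, g_segs = segments_from_labels(pred), segments_from_labels(gt)
    used = [False] * len(g_segs)  # each ground-truth segment matches at most once
    tp = fp = 0
    for lab, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (glab, gs, ge) in enumerate(g_segs):
            if glab != lab:
                continue  # only same-class segments can match
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= k and not used[best_j]:
            tp += 1
            used[best_j] = True
        else:
            fp += 1
    fn = len(g_segs) - sum(used)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with `gt = [0, 0, 0, 0, 1, 1, 1, 1]` and `pred = [0, 0, 1, 0, 1, 1, 1, 1]`, 7 of 8 frames are correct, but F1@50 is only 2/3. The short spurious segments count as false positives, which is why this metric rewards the kind of segment-level refinement described above.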