Interaction-Assisted Multi-Modal Representation Learning for Recommendation

Hao Wu (Alibaba Group); Jiajie Wang (Alibaba Group); Zhonglin Zu (Alibaba Group)

06 Jun 2023

Personalized recommender systems have attracted significant attention from both industry and academia. Recent studies have shed light on incorporating multi-modal side information into recommender systems to further boost performance. Meanwhile, transformer-based multi-modal representation learning has brought substantial gains to downstream visual and textual tasks. However, these self-supervised pre-training methods are not tailored for recommendation and may yield suboptimal representations. To this end, we propose Interaction-Assisted Multi-Modal Representation Learning for Recommendation (IRL), which injects user-interaction information into item multi-modal representation learning. Specifically, we extract item graph embeddings from user-item interactions and use them to formulate a novel triplet IRL training objective, which serves as a behavior-aware pre-training task for the representation learning model. Extensive experiments on several real-world datasets demonstrate the effectiveness of IRL.
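The triplet objective described in the abstract can be pictured as pulling an item's multi-modal representation toward its own interaction-derived graph embedding while pushing it away from that of an unrelated item. The sketch below is a minimal PyTorch illustration of that idea, not the paper's actual formulation: the margin value, the negative-sampling strategy, and all names used here (irl_triplet_loss, mm_anchor, graph_pos, graph_neg) are assumptions, since the abstract does not specify them.

    import torch
    import torch.nn.functional as F

    def irl_triplet_loss(mm_anchor, graph_pos, graph_neg, margin=0.2):
        """Hypothetical sketch of a behavior-aware triplet objective.

        mm_anchor: multi-modal item representation from the transformer
                   encoder, shape (batch, dim).
        graph_pos: graph embedding of the same item, derived from
                   user-item interactions, shape (batch, dim).
        graph_neg: graph embedding of a sampled negative item,
                   shape (batch, dim).
        """
        # Pull the multi-modal representation toward the item's own
        # interaction-based graph embedding, and push it away from the
        # negative item's embedding by at least `margin`.
        return F.triplet_margin_loss(mm_anchor, graph_pos, graph_neg,
                                     margin=margin)

Used as a pre-training loss, a term like this would make the multi-modal encoder behavior-aware: items that users interact with similarly end up close in the learned representation space, rather than only items that look or read alike.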
