Overcoming the Seesaw in Monocular 3D Object Detection via Language Knowledge Transferring

Weichen Xu (Peking University); Tianhao Fu (Peking University)

07 Jun 2023

Monocular 3D object detection is a challenging problem in the self-driving and computer vision communities. Previous works suffer from a severe seesaw phenomenon: multi-category training performs worse than single-category training, and feature learning for one category inhibits that of the others. We reveal that the real culprit is the significant difference in depth distribution between categories, and that confused feature representations further degrade depth estimation. In this paper, we propose Language Knowledge Transferring, which introduces language information into monocular 3D object detection; we term the resulting method MonoLT. Multimodal language-image guidance helps the network learn more class-specific features, which eases the burden on depth estimation. We also propose the Polar Depth Aggregator, which makes depth estimation less susceptible to interference from the environment and from other instances, especially those of different classes. Comprehensive experiments on the KITTI dataset demonstrate the superiority of the proposed method.

Tags:

Applications of machine learning

Value-Added Bundle(s) Including this Product

IEEE ICASSP 2023, 4-10 June 2023, Greece. Virtual and In-Person Conference - Presentation Videos Product Bundle
