MULTI-SCALE REPRESENTATION LEARNING TRANSFORMER FRAMEWORK FOR POINT CLOUD CLASSIFICATION
Yajie Sun, Ali Zia, Jun Zhou
Extracting and aggregating feature representations from multiple scales has become key to point cloud tasks. Although the Vision Transformer (ViT) is currently popular for processing point clouds, it lacks adequate multi-scale features and the interaction among them, which is vital for identifying structural details in a point cloud. In addition, learning an efficient and effective representation from a point cloud is challenging due to its irregular, unordered, and sparse nature. Motivated by these observations, we propose a novel multi-scale representation learning transformer framework that employs varied geometric features beyond the common Cartesian coordinates. Our approach enriches the description of a point cloud with local geometric relationships and then groups points at multiple scales. This scale information is aggregated, and new patches are extracted so as to minimize feature overlap. A bottleneck projection head follows to enhance the information, and all patches are fed to multi-head attention to capture the deep dependencies among representations across patches. Evaluation on public benchmark datasets shows the competitive performance of our framework on point cloud classification.
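To make the described pipeline concrete, the following is a minimal PyTorch sketch of the stages named in the abstract: geometric enrichment beyond Cartesian coordinates, multi-scale grouping, aggregation into patches, a bottleneck projection head, and multi-head attention. Every concrete choice here (kNN grouping, the relative-offset-plus-distance features, max-pooling aggregation, the 4x bottleneck ratio, and all layer sizes and names) is an assumption for illustration only, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

def knn_group(points, k):
    """Group each point with its k nearest neighbours (assumed grouping scheme)."""
    # points: (B, N, 3); pairwise distances: (B, N, N)
    dists = torch.cdist(points, points)
    idx = dists.topk(k, largest=False).indices            # (B, N, k)
    grouped = torch.gather(
        points.unsqueeze(1).expand(-1, points.size(1), -1, -1),
        2, idx.unsqueeze(-1).expand(-1, -1, -1, 3))       # (B, N, k, 3)
    # Geometric enrichment beyond raw coordinates: relative offsets
    # and Euclidean distances to each neighbour.
    rel = grouped - points.unsqueeze(2)                   # (B, N, k, 3)
    dist = rel.norm(dim=-1, keepdim=True)                 # (B, N, k, 1)
    return torch.cat([rel, dist], dim=-1)                 # (B, N, k, 4)

class MultiScalePointTransformer(nn.Module):
    """Illustrative multi-scale transformer classifier; sizes are assumptions."""
    def __init__(self, scales=(8, 16, 32), dim=128, heads=4, num_classes=40):
        super().__init__()
        self.scales = scales
        # One small embedding per scale for the enriched geometric features.
        self.embed = nn.ModuleList([nn.Linear(4, dim) for _ in scales])
        # Bottleneck projection head: squeeze then expand (assumed ratio 4).
        self.bottleneck = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.GELU(), nn.Linear(dim // 4, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, points):                            # points: (B, N, 3)
        patches = []
        for k, proj in zip(self.scales, self.embed):
            feats = knn_group(points, k)                  # (B, N, k, 4)
            # Aggregate each neighbourhood by max-pooling over its k members.
            patches.append(proj(feats).max(dim=2).values)  # (B, N, dim)
        # Aggregate the per-scale features into one patch sequence.
        x = torch.stack(patches, dim=2).mean(dim=2)       # (B, N, dim)
        x = self.bottleneck(x)
        # Multi-head attention captures dependencies across patches.
        x, _ = self.attn(x, x, x)
        return self.cls(x.mean(dim=1))                    # (B, num_classes)

# Usage: classify a batch of 2 clouds with 1024 points into 40 classes
# (40 matching the class count of a ModelNet40-style benchmark).
logits = MultiScalePointTransformer()(torch.randn(2, 1024, 3))
print(logits.shape)  # torch.Size([2, 40])
```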