MULTI-SCALE REPRESENTATION LEARNING TRANSFORMER FRAMEWORK FOR POINT CLOUD CLASSIFICATION
Yajie Sun, Ali Zia, Jun Zhou
Extracting and aggregating feature representations from multiple scales has become key to point cloud tasks. Although the Vision Transformer (ViT) is currently popular for processing point clouds, it lacks adequate multi-scale features and the interaction among them, which is vital for identifying structural details in a point cloud. In addition, learning an efficient and effective representation from a point cloud is challenging due to its irregular, unordered, and sparse nature. Motivated by these observations, we propose a novel multi-scale representation learning transformer framework that employs varied geometric features beyond the common Cartesian coordinates. Our approach enriches the description of a point cloud with local geometric relationships and then groups points at multiple scales. This scale information is aggregated, and new patches are extracted so as to minimize feature overlap. A bottleneck projection head follows to enhance the information, and all patches are fed to multi-head attention to capture the deep dependencies among representations across patches. Evaluation on public benchmark datasets shows the competitive performance of our framework on point cloud classification.
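To make the described pipeline concrete, the following is a minimal PyTorch sketch of the stages named in the abstract: geometric enrichment beyond Cartesian coordinates, multi-scale grouping, aggregation into patches, a bottleneck projection head, and multi-head attention. Every concrete choice here (kNN grouping, the relative-offset-plus-distance features, max-pooling aggregation, the 4x bottleneck ratio, and all layer sizes and names) is an assumption for illustration only, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

def knn_group(points, k):
    """Group each point with its k nearest neighbours (assumed grouping scheme)."""
    # points: (B, N, 3); pairwise distances: (B, N, N)
    dists = torch.cdist(points, points)
    idx = dists.topk(k, largest=False).indices            # (B, N, k)
    grouped = torch.gather(
        points.unsqueeze(1).expand(-1, points.size(1), -1, -1),
        2, idx.unsqueeze(-1).expand(-1, -1, -1, 3))       # (B, N, k, 3)
    # Geometric enrichment beyond raw coordinates: relative offsets
    # and Euclidean distances to each neighbour.
    rel = grouped - points.unsqueeze(2)                   # (B, N, k, 3)
    dist = rel.norm(dim=-1, keepdim=True)                 # (B, N, k, 1)
    return torch.cat([rel, dist], dim=-1)                 # (B, N, k, 4)

class MultiScalePointTransformer(nn.Module):
    """Illustrative multi-scale transformer classifier; sizes are assumptions."""
    def __init__(self, scales=(8, 16, 32), dim=128, heads=4, num_classes=40):
        super().__init__()
        self.scales = scales
        # One small embedding per scale for the enriched geometric features.
        self.embed = nn.ModuleList([nn.Linear(4, dim) for _ in scales])
        # Bottleneck projection head: squeeze then expand (assumed ratio 4).
        self.bottleneck = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.GELU(), nn.Linear(dim // 4, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, points):                            # points: (B, N, 3)
        patches = []
        for k, proj in zip(self.scales, self.embed):
            feats = knn_group(points, k)                  # (B, N, k, 4)
            # Aggregate each neighbourhood by max-pooling over its k members.
            patches.append(proj(feats).max(dim=2).values)  # (B, N, dim)
        # Aggregate the per-scale features into one patch sequence.
        x = torch.stack(patches, dim=2).mean(dim=2)       # (B, N, dim)
        x = self.bottleneck(x)
        # Multi-head attention captures dependencies across patches.
        x, _ = self.attn(x, x, x)
        return self.cls(x.mean(dim=1))                    # (B, num_classes)

# Usage: classify a batch of 2 clouds with 1024 points into 40 classes
# (40 matching the class count of a ModelNet40-style benchmark).
logits = MultiScalePointTransformer()(torch.randn(2, 1024, 3))
print(logits.shape)  # torch.Size([2, 40])
```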