·
Journal : JICRS (Journal of Institute of Control, Robotics and Systems), Vol. 30, No. 10, pp. 1090-1103, 2024. 10.
·
Paper title : "Open-vocabulary 3D Semantic Segmentation With 3D Region Mask Proposal and 2D-3D Visual Feature Ensemble"
·
Authors : 배혜림, 김인철
·
Abstract : Open-vocabulary scene understanding aims to recognize arbitrary novel categories beyond the base label space. In this paper, we propose a novel open-vocabulary 3D semantic segmentation model, OV-3DRENet, to address the limitations of existing models. Unlike existing 3D semantic segmentation models, which perform point-level categorization, the proposed model performs region-level categorization, using Mask3D as a 3D region mask proposal module to generate multiple class-agnostic point cloud regions. The proposed model uses OpenScene, a pretrained open-vocabulary point cloud segmentation model, as a 3D point encoder to extract language-aligned 3D visual features for each region from the scene point cloud. Furthermore, it adopts OpenSeg, a pretrained open-vocabulary image segmentation model, as a 2D pixel encoder to extract language-aligned 2D visual features for each region from the multi-view scene images. Finally, our model applies a novel 2D-3D visual feature ensemble scheme to assign semantically well-matched open-vocabulary class labels to point cloud regions. Through various quantitative and qualitative experiments on the large benchmark dataset ScanNet v2, we show the superiority of the proposed model.
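As a rough illustration of region-level open-vocabulary labeling, the sketch below pools per-point 3D and 2D features over each proposed region and compares them with text embeddings of candidate class names. The abstract does not specify the ensemble scheme, so two choices here are assumptions made only for illustration: mean pooling within each region, and taking the larger of the two cosine similarities (2D versus 3D) per class. The function names and the toy feature values are hypothetical; in the paper's setting the masks would come from Mask3D and the features from OpenScene and OpenSeg.

```python
import math


def mean_pool(features):
    """Average a non-empty list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def label_regions(region_masks, feats_3d, feats_2d, text_embeddings):
    """Assign one class name to each region.

    region_masks: list of non-empty lists of point indices (one per region).
    feats_3d, feats_2d: per-point feature vectors in a shared embedding space.
    text_embeddings: dict mapping class name -> text feature vector.
    """
    labels = []
    for mask in region_masks:
        region_3d = mean_pool([feats_3d[i] for i in mask])
        region_2d = mean_pool([feats_2d[i] for i in mask])
        best_label, best_score = None, -2.0
        for name, text_vec in text_embeddings.items():
            # Assumed ensemble rule: keep the more confident modality per class.
            score = max(cosine(region_3d, text_vec), cosine(region_2d, text_vec))
            if score > best_score:
                best_label, best_score = name, score
        labels.append(best_label)
    return labels


# Toy example with made-up 2-dimensional features for four points.
text_embeddings = {"chair": [1.0, 0.0], "table": [0.0, 1.0]}
feats_3d = [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.5, 0.5]]
feats_2d = [[0.7, 0.3], [0.9, 0.1], [0.1, 0.9], [0.2, 0.8]]
region_masks = [[0, 1], [2, 3]]

print(label_regions(region_masks, feats_3d, feats_2d, text_embeddings))
# prints ['chair', 'table']
```

In the second toy region the 3D features are ambiguous between the two classes, and the 2D features decide the label, which is the kind of complementary behavior a 2D-3D ensemble is meant to exploit.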