
Paper Published in the SCOPUS-Indexed International Journal JICRS - 최정현

2024.06.01


· Journal: JICRS (Journal of Institute of Control, Robotics and Systems), Vol. 30, No. 6, pp. 596-606, June 2024.
· Paper title: "Goal Object Grounding and Multimodal Mapping for Multi-object Visual Navigation"
· Authors: 최정현, 김인철
· Abstract: Multi-object visual navigation (MultiON) is a special type of visual navigation task that requires an embodied agent to visit multiple goal objects distributed over an unseen three-dimensional (3D) environment in a predefined order. To successfully execute MultiON, an agent should be able to accurately ground individual goal objects based on language descriptions regarding their color and shape attributes and build a semantically rich map that effectively covers the entire environment. In this paper, we propose a novel deep neural network-based agent model for performing MultiON tasks. The proposed model provides unique solutions to three different issues regarding MultiON agent design. First, the model adopts the pre-trained Grounding DINO module to ground the language descriptions of goal objects to the visual objects in input images in a zero-shot manner. Moreover, the model uses Bayesian posterior probabilities to effectively register the uncertain local contexts extracted from input images onto the global map. Finally, the model applies a novel reward function to efficiently motivate the agent to explore unvisited areas in the given environment for rapid and accurate map expansion. We demonstrate the superiority of the proposed model by conducting various quantitative and qualitative experiments using the 3D simulation platform, AI-Habitat, and the benchmark scene dataset, Matterport3D.
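
The abstract mentions registering uncertain local observations onto a global map with Bayesian posterior probabilities, but this post does not give the paper's actual formulation. As a rough illustration only, the sketch below applies a generic per-cell Bayes update (posterior proportional to prior times likelihood). The function name `bayes_update`, the two-class setup, and all numeric values are hypothetical and are not taken from the paper.

```python
def bayes_update(prior, likelihood):
    """Return the normalized posterior for a single map cell.

    prior and likelihood are equal-length lists with one value per class.
    This is a generic Bayes rule sketch, not the paper's method.
    """
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        # No usable evidence: keep the prior unchanged (an assumption).
        return list(prior)
    return [u / total for u in unnormalized]


# Hypothetical cell with two classes: [goal object, background].
cell = [0.5, 0.5]            # uniform prior before any observation
observation = [0.8, 0.2]     # assumed detector confidence for this cell

# Fusing the same evidence over two frames sharpens the belief.
for _ in range(2):
    cell = bayes_update(cell, observation)
```

Under these assumptions, repeated consistent observations push a cell's belief toward one class, while a cell with no evidence keeps its prior.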