[2405.17859] Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation

[Submitted on 28 May 2024 (v1), last revised 5 Mar 2025 (this version, v3)]

Authors: Yangxiao Lu and 4 other authors

Abstract: Novel Instance Detection and Segmentation (NIDS) aims at detecting and segmenting novel object instances given a few examples of each instance. We propose a unified, simple, yet effective framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advancements in large vision methods, we utilize Grounding DINO and Segment Anything Model (SAM) to obtain object proposals with accurate bounding boxes and masks. Central to our approach is the generation of high-quality instance embeddings. We use foreground feature averages of patch embeddings from the DINOv2 ViT backbone, followed by refinement through a weight adapter mechanism that we introduce.
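The embedding and matching steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the max-over-templates scoring rule, and the `threshold` parameter are all assumptions made for illustration, and plain Python lists stand in for the DINOv2 patch embeddings and the SAM foreground masks.

```python
import math

def foreground_average(patch_embeddings, foreground_mask):
    """Average the patch embeddings whose mask entry is True (foreground)."""
    selected = [e for e, keep in zip(patch_embeddings, foreground_mask) if keep]
    if not selected:
        raise ValueError("mask selects no foreground patches")
    dim = len(selected[0])
    return [sum(e[i] for e in selected) / len(selected) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_proposals(proposal_embeddings, template_embeddings, threshold=0.5):
    """Assign each proposal the label of its most similar template.

    template_embeddings maps an instance label to a list of embeddings.
    Scoring by the maximum over templates and rejecting below `threshold`
    are assumptions of this sketch, not details taken from the paper.
    """
    results = []
    for proposal in proposal_embeddings:
        best_label, best_score = None, -1.0
        for label, templates in template_embeddings.items():
            score = max(cosine_similarity(proposal, t) for t in templates)
            if score > best_score:
                best_label, best_score = label, score
        if best_score < threshold:
            best_label = None  # treat as background / unknown instance
        results.append((best_label, best_score))
    return results
```

In this toy setting, a proposal whose foreground patches average to a vector aligned with the hypothetical `"mug"` template would receive that label, while a proposal with no template above the threshold is left unlabeled.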

We show experimentally that our weight adapter can adjust the embeddings locally within their feature space and effectively limit overfitting in the few-shot setting. Furthermore, the weight adapter optimizes weights to enhance the distinctiveness of instance embeddings during similarity computation. This methodology enables a straightforward matching strategy that results in significant performance gains. Our framework surpasses current state-of-the-art methods, demonstrating notable improvements on four detection datasets. In the segmentation tasks on seven core datasets of the BOP challenge, our method outperforms the leading published RGB methods and remains competitive with the best RGB-D method. We have also verified our method using real-world images from a Fetch robot and a RealSense camera. Project Page: this https URL
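One way to picture an adapter that adjusts embeddings "locally" is an element-wise reweighting, sketched below. The abstract does not specify the architecture, so everything here is an assumption: the two-layer form, the ReLU and sigmoid choices, and the names `w1`, `w2`, and `weight_adapter` are hypothetical, and only the forward pass is shown (no training loop).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def weight_adapter(embedding, w1, w2):
    """Hypothetical adapter: predict per-dimension weights in (0, 1)
    from the embedding and rescale the embedding element-wise.

    w1 has shape (hidden, dim); w2 has shape (dim, hidden).
    """
    hidden = [max(0.0, h) for h in matvec(w1, embedding)]  # ReLU layer
    weights = [sigmoid(z) for z in matvec(w2, hidden)]     # weights in (0, 1)
    return [w * e for w, e in zip(weights, embedding)]
```

Because each weight lies strictly between 0 and 1, every output coordinate keeps its sign and cannot grow in magnitude, which is one simple way an adapter could stay close to the pre-trained feature space rather than remapping it freely.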

Submission history

From: Yangxiao Lu [view email]
[v1]
Tue, 28 May 2024 06:16:57 UTC (49,269 KB)
[v2]
Mon, 2 Dec 2024 19:51:41 UTC (41,892 KB)
[v3]
Wed, 5 Mar 2025 01:48:25 UTC (43,367 KB)
