March 14, 2025
Visual Feature Learning Without Supervision
The field of computer vision is seeing a rise in foundation models, similar to those in natural language processing (NLP). These models aim to produce general-purpose visual features that can be applied across image distributions and tasks without fine-tuning. The recent success of unsupervised learning in NLP paved the way for similar advances in computer vision. This article covers DINOv2, an approach that uses self-supervised learning to generate robust visual features.

Figure 1. DINOv2 principal component analysis visualization (source: https://github.com/facebookresearch/dinov2).

The DINOv2 Framework

In this section, we will cover various components of the DINOv2 framework, including