# facebookresearch/dino

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/facebookresearch-dino).**

7,592 stars · 1,045 forks · Python · Apache-2.0 · archived

## Links

- GitHub: https://github.com/facebookresearch/dino
- awesome-repositories: https://awesome-repositories.com/repository/facebookresearch-dino.md

## Description

This project is a PyTorch framework for self-supervised training of vision transformers. It implements DINO (self-distillation with no labels), which learns visual representations without labeled data by training a student network to match the outputs of a momentum teacher.
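
The "momentum teacher" is an exponential moving average (EMA) of the student, updated as θ_t ← m·θ_t + (1 − m)·θ_s after each student step, with m raised from a base value (0.996 in the DINO paper) towards 1 on a cosine schedule. The minimal sketch below assumes parameters are plain NumPy arrays in a dict rather than PyTorch modules. The helper names `momentum_at` and `ema_update` are hypothetical and do not belong to the repository's API.

```python
import math

import numpy as np


def momentum_at(step: int, total_steps: int, base: float = 0.996, final: float = 1.0) -> float:
    """Cosine schedule for the teacher momentum: `base` at step 0, `final` at `total_steps`."""
    return final + 0.5 * (base - final) * (1.0 + math.cos(math.pi * step / total_steps))


def ema_update(teacher: dict, student: dict, m: float) -> None:
    """In place, per parameter: teacher <- m * teacher + (1 - m) * student."""
    for name, t_param in teacher.items():
        t_param *= m
        t_param += (1.0 - m) * student[name]


# Toy parameters (hypothetical); the teacher starts as a copy of the student.
student = {"w": np.ones(3), "b": np.zeros(1)}
teacher = {name: p.copy() for name, p in student.items()}

# Pretend an optimizer step moved the student, then let the teacher follow slowly.
student["w"] += 1.0
ema_update(teacher, student, momentum_at(step=0, total_steps=100))
print(teacher["w"])  # each entry moved only 0.4% of the way towards the student's 2.0
```

The teacher receives no gradients. It only trails the student, and it trails more and more closely as m approaches 1 late in training.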

The library also serves as an image feature extractor and attention visualizer. It produces high-dimensional feature vectors for images and renders self-attention maps as heatmaps or videos that show which image regions the model focuses on.
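
Both the feature vectors and the attention maps are defined over the ViT's patch grid. A 224×224 image cut into 16×16 patches gives a 14×14 grid of 196 patch tokens, and a class token is prepended to them; for ViTs, the class token's final embedding serves as the image feature. The NumPy sketch below shows only the patch-splitting step and assumes a channels-last image. The learned linear projection is replaced by a random matrix, positional embeddings are omitted, and `patchify` is a hypothetical helper. The repository's `vision_transformer.py` performs the same split and projection with a strided convolution.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into (num_patches, patch*patch*C), row-major over the patch grid."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be a multiple of the patch size"
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c) -> (gh*gw, patch*patch*c)
    x = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
patches = patchify(image)                                          # 196 patches of 16*16*3 values
embed_dim = 384                                                    # ViT-S width, illustrative
proj = rng.standard_normal((patches.shape[1], embed_dim)) * 0.02   # stand-in for learned weights
cls_token = np.zeros((1, embed_dim))                               # learned in the real model
tokens = np.concatenate([cls_token, patches @ proj], axis=0)
print(patches.shape, tokens.shape)                                 # (196, 768) (197, 384)
```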

It provides tools for evaluating learned features on downstream vision tasks, including linear-probe classification, k-nearest-neighbor classification, and visual similarity search. The repository also supports semi-supervised video object segmentation and image copy detection.
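
The k-NN evaluation classifies each test image by a weighted vote among its nearest training images, with neighbors ranked by cosine similarity of frozen features. The NumPy sketch below assumes the features have already been extracted. In this sketch, k = 20 and a temperature T = 0.07 for weights exp(sim / T) mirror the values reported for DINO's k-NN evaluation, but treat them as assumptions here. The function name `knn_classify` and the toy clustered data are hypothetical.

```python
import numpy as np


def knn_classify(train_feats, train_labels, test_feats, num_classes, k=20, T=0.07):
    """Weighted k-NN vote on cosine similarity; returns one predicted label per test row."""
    # L2-normalise so that dot products are cosine similarities.
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                                   # (n_test, n_train)
    k = min(k, train.shape[0])
    idx = np.argsort(-sims, axis=1)[:, :k]                  # indices of the k most similar images
    top_sims = np.take_along_axis(sims, idx, axis=1)        # (n_test, k)
    weights = np.exp(top_sims / T)                          # closer neighbours get much larger votes
    scores = np.zeros((test.shape[0], num_classes))
    for c in range(num_classes):
        scores[:, c] = (weights * (train_labels[idx] == c)).sum(axis=1)
    return scores.argmax(axis=1)


# Toy "features": two well-separated clusters standing in for two classes.
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
train_labels = np.repeat([0, 1], 50)
train_feats = centers[train_labels] + 0.1 * rng.standard_normal((100, 3))
test_feats = np.array([[0.9, 0.1, 0.0], [0.05, 1.1, 0.0]])
pred = knn_classify(train_feats, train_labels, test_feats, num_classes=2)
print(pred)  # [0 1]
```

Dropping the vote and returning `idx` directly turns the same code into image-to-image retrieval over a gallery.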

The framework includes infrastructure for multi-node distributed training and utilities for loading pretrained model weights, so the learned features can be reused downstream without training from scratch.
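
In multi-node training, the effective batch size is the per-GPU batch size multiplied by the number of processes. DINO's training script therefore scales its base learning rate linearly with the global batch size, relative to a reference batch of 256. The NumPy sketch below computes that scaling and then applies a linear-warmup, cosine-decay schedule. The function `lr_schedule`, its defaults, and the numbers in the usage line are illustrative assumptions, not the repository's code.

```python
import numpy as np


def lr_schedule(base_lr, batch_per_gpu, world_size, epochs, iters_per_epoch,
                warmup_epochs=10, min_lr=1e-6):
    """Per-iteration learning rates: linear warmup to a scaled peak, then cosine decay to min_lr."""
    peak = base_lr * batch_per_gpu * world_size / 256.0     # linear scaling rule, reference batch 256
    warmup_iters = warmup_epochs * iters_per_epoch
    warmup = np.linspace(0.0, peak, warmup_iters)           # ramps from 0 up to the peak
    decay_iters = np.arange(epochs * iters_per_epoch - warmup_iters)
    decay = min_lr + 0.5 * (peak - min_lr) * (1 + np.cos(np.pi * decay_iters / len(decay_iters)))
    return np.concatenate([warmup, decay])


# Illustrative: 2 nodes x 8 GPUs with 64 images per GPU gives a global batch of 1024.
sched = lr_schedule(base_lr=5e-4, batch_per_gpu=64, world_size=16, epochs=100, iters_per_epoch=10)
print(len(sched), sched.max())
```

Adding nodes therefore raises the peak rate rather than leaving it fixed, so hyperparameters tuned at one scale transfer approximately to another.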

## Tags

### Artificial Intelligence & ML

- [Self-Supervised Vision Representation Trainers](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/word-embeddings/self-supervised-embedding-trainers/self-supervised-vision-representation-trainers.md) — Implements a self-supervised learning method using a momentum teacher and temperature warmup to train vision architectures. ([source](https://github.com/facebookresearch/dino#readme))
- [Vision Transformers](https://awesome-repositories.com/f/artificial-intelligence-ml/vision-transformers.md) — Implements a vision transformer that processes images as sequences of fixed-size patches.
- [Attention Visualizations](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-visualizations.md) — Generates heatmaps and videos to visualize which image regions the transformer focuses on.
- [Self-Distillation Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/masked-language-modeling/self-distillation-pipelines.md) — Trains a student network to predict the output of a momentum-updated teacher without using labeled data.
- [Vision Transformer Training](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training-frameworks/vision-model-training/vision-transformer-training.md) — Processes images by dividing them into patches and embedding them into a latent space using a transformer architecture. ([source](https://github.com/facebookresearch/dino/blob/main/README.md))
- [Exponential Moving Average Weight Updates](https://awesome-repositories.com/f/artificial-intelligence-ml/model-weight-reconstruction/weight-smoothing/exponential-moving-average-weight-updates.md) — Stabilizes training using an exponential moving average to update teacher weights based on student weights.
- [PyTorch Vision Transformer Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-vision-transformer-frameworks.md) — Provides a comprehensive PyTorch implementation for training Vision Transformers via self-supervised learning.
- [Augmentation Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/augmentation-pipelines.md) — Provides composed pipelines of random image augmentations, including Gaussian blur and solarization. ([source](https://github.com/facebookresearch/dino/blob/main/utils.py))
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training.md) — Supports scaling model training across multiple GPUs and compute nodes for large-scale workloads.
- [Downstream Vision Evaluation](https://awesome-repositories.com/f/artificial-intelligence-ml/downstream-vision-evaluation.md) — Evaluates pretrained weight quality using linear probes and k-nearest neighbor classification on standard datasets.
- [Multi-Node Inference Scaling](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-model-deployments/multi-node-inference-scaling.md) — Distributes training and evaluation workloads across multiple GPUs and compute nodes.
- [Image-to-Image Retrieval](https://awesome-repositories.com/f/artificial-intelligence-ml/image-retrieval-systems/text-to-image-retrieval/image-to-image-retrieval.md) — Matches query images to target galleries by calculating similarity between learned feature vectors. ([source](https://github.com/facebookresearch/dino#readme))
- [K-Nearest Neighbor Classifiers](https://awesome-repositories.com/f/artificial-intelligence-ml/k-nearest-neighbor-classifiers.md) — Provides k-nearest neighbor classification to categorize images based on latent feature similarity.
- [Linear Classifiers](https://awesome-repositories.com/f/artificial-intelligence-ml/linear-regression/linear-classifiers.md) — Trains linear classifiers as probes on frozen features to evaluate the quality of learned representations.
- [Output Centering & Sharpening](https://awesome-repositories.com/f/artificial-intelligence-ml/model-distillation-methods/teacher-student-distillation/output-centering-sharpening.md) — Implements output centering and sharpening to prevent collapse during self-supervised distillation.
- [Image Augmentations](https://awesome-repositories.com/f/artificial-intelligence-ml/training-data-transformations/image-augmentations.md) — Utilizes random image transformations to create multiple views of the same image for invariant feature learning.
- [Vector Similarity Search](https://awesome-repositories.com/f/artificial-intelligence-ml/vector-similarity-search.md) — Performs visual similarity searches across datasets using high-dimensional vector embeddings. ([source](https://github.com/facebookresearch/dino/blob/main/README.md))
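
Output centering and sharpening, and the self-distillation loss they feed, can be sketched in NumPy for a single pair of views. The teacher's K-dimensional outputs are centered by subtracting a running mean c and sharpened with a low temperature τ_t, while the student uses a higher temperature τ_s. The loss is the cross-entropy −Σ P_t log P_s, and c is updated as an EMA of the batch mean of teacher outputs. The defaults τ_s = 0.1, τ_t = 0.04, and center momentum 0.9 are taken to match the DINO paper but should be treated as assumptions. The function names are hypothetical, and multi-crop pairing of views is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def log_softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def dino_loss(student_logits, teacher_logits, center, tau_s=0.1, tau_t=0.04):
    """Cross-entropy between the centred, sharpened teacher and the student, averaged over the batch."""
    p_t = softmax((teacher_logits - center) / tau_t)    # centring + sharpening (teacher gets no gradient)
    log_p_s = log_softmax(student_logits / tau_s)
    return float(-(p_t * log_p_s).sum(axis=1).mean())


def update_center(center, teacher_logits, momentum=0.9):
    """EMA of the batch mean of teacher outputs."""
    return momentum * center + (1 - momentum) * teacher_logits.mean(axis=0)


rng = np.random.default_rng(0)
K, B = 8, 4                                   # output dimension and batch size (toy values)
student = rng.standard_normal((B, K))
teacher = rng.standard_normal((B, K))
center = np.zeros(K)
loss = dino_loss(student, teacher, center)
center = update_center(center, teacher)
print(loss, center.shape)
```

Centering alone would push the teacher towards a uniform output, and sharpening alone towards a single dominant dimension. Applying both balances these two failure modes, which is how the method avoids collapse without labels.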

### Part of an Awesome List

- [Neural Feature Extractors](https://awesome-repositories.com/f/awesome-lists/devtools/feature-extraction/neural-feature-extractors.md) — Generates high-dimensional vectors used for k-NN classification and image retrieval.
- [Self-Attention Implementations](https://awesome-repositories.com/f/awesome-lists/ai/attention-mechanisms/self-attention-implementations.md) — Extracts and renders the self-attention of the class token across different heads to determine model focus. ([source](https://github.com/facebookresearch/dino#readme))
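
Class-token attention maps come from the last transformer block. For each head, take the softmax of q_cls · k_j / √d over all tokens, keep the entries for the patch tokens, and reshape them onto the patch grid to get one heatmap per head. In the NumPy sketch below, random queries and keys stand in for a real model's activations. It also shows a mask step that keeps the most-attended patches accounting for a fixed fraction of each head's attention mass (60% here, an assumed value). The names `cls_attention` and `mass_threshold_mask` are hypothetical.

```python
import numpy as np


def cls_attention(q, k):
    """q, k: (heads, tokens, dim) with token 0 = [CLS]. Returns (heads, tokens-1) CLS->patch attention."""
    d = q.shape[-1]
    logits = np.einsum("hd,htd->ht", q[:, 0], k) / np.sqrt(d)   # CLS query against every key
    logits -= logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)                      # softmax over all tokens, incl. CLS
    return attn[:, 1:]                                           # drop the CLS->CLS entry


def mass_threshold_mask(attn, keep=0.6):
    """Per head, keep the most-attended patches that together hold `keep` of the patch attention mass."""
    order = np.argsort(attn, axis=1)                             # ascending attention
    vals = np.take_along_axis(attn, order, axis=1)
    vals = vals / vals.sum(axis=1, keepdims=True)
    keep_sorted = np.cumsum(vals, axis=1) > (1 - keep)           # everything beyond the lowest (1-keep) mass
    mask = np.zeros_like(keep_sorted)
    np.put_along_axis(mask, order, keep_sorted, axis=1)          # undo the sort
    return mask


rng = np.random.default_rng(0)
heads, grid, dim = 6, 14, 64                                     # ViT-S/16 at 224px: 6 heads, 14x14 patches
tokens = 1 + grid * grid
q = rng.standard_normal((heads, tokens, dim))
k = rng.standard_normal((heads, tokens, dim))
attn = cls_attention(q, k)
maps = attn.reshape(heads, grid, grid)                           # one heatmap per head
mask = mass_threshold_mask(attn).reshape(heads, grid, grid)
print(maps.shape, mask.dtype, mask.sum(axis=(1, 2)))
```

For display, each 14×14 map is typically upsampled by the patch size back to image resolution and overlaid on the input.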

### Graphics & Multimedia

- [Image Feature Extraction](https://awesome-repositories.com/f/graphics-multimedia/image-feature-extraction.md) — Converts images into high-dimensional latent vectors for similarity search and image retrieval.
- [Attention Map Visualizations](https://awesome-repositories.com/f/graphics-multimedia/attention-map-visualizations.md) — Produces video files by extracting frames from source media and rendering the model's attention maps for each frame. ([source](https://github.com/facebookresearch/dino#readme))
