# liheyoung/depth-anything

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/liheyoung-depth-anything).**

8,124 stars · 613 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/LiheYoung/Depth-Anything
- Homepage: https://depth-anything.github.io
- awesome-repositories: https://awesome-repositories.com/repository/liheyoung-depth-anything.md

## Topics

`depth-estimation` `image-synthesis` `metric-depth-estimation` `monocular-depth-estimation`

## Description

Depth-Anything is a monocular depth estimation foundation model that produces dense per-pixel depth maps from a single RGB image. It is built on a DINOv2 Vision Transformer encoder backbone and trained on 62 million unlabeled images using a teacher-student pseudo-labeling framework, enabling robust generalization across diverse scenes without task-specific training. The pretrained model outputs relative depth maps, which capture the ordering of scene points. After fine-tuning on datasets such as NYUv2 or KITTI, it outputs metric depth maps in real-world units.
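To make the relative versus metric distinction concrete, here is a minimal sketch, assuming a relative prediction is only meaningful up to an unknown scale and shift. It fits those two numbers against a reference map by least squares, which is a common way to compare relative predictions with ground truth. This is not the project's fine-tuning procedure for metric depth. The function name `align_scale_shift` and the synthetic arrays are hypothetical.

```python
import numpy as np

def align_scale_shift(relative, reference, mask=None):
    """Fit scale s and shift t so that s * relative + t approximates reference.

    Hypothetical helper for illustration; not part of the Depth-Anything code.
    """
    rel = np.asarray(relative, dtype=np.float64)
    ref = np.asarray(reference, dtype=np.float64)
    if mask is None:
        mask = np.ones(rel.shape, dtype=bool)  # use every pixel by default
    x, y = rel[mask], ref[mask]
    # Solve the two-parameter least-squares problem [x, 1] @ [s, t] = y.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * rel + t, float(s), float(t)

# Synthetic data: the "relative" map is the reference with scale and shift removed.
rng = np.random.default_rng(0)
reference = rng.uniform(0.1, 1.0, size=(4, 6))
relative = 0.5 * reference - 0.25  # so reference == 2.0 * relative + 0.5

aligned, scale, shift = align_scale_shift(relative, reference)
print(round(scale, 6), round(shift, 6))
```

The ordering of pixels is identical in `relative` and `reference`. Only the recovered `scale` and `shift` turn one into the other, and a relative model gives no way to know them without a reference.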

Beyond single images, the project handles video by estimating depth frame by frame, producing a depth map for every frame of a clip. It also integrates with ControlNet pipelines for depth-conditioned image generation, where it replaces the default depth estimator to provide more precise conditioning signals. In addition, it offers a fine-tuning framework for adapting the pretrained model to custom datasets or to downstream tasks such as semantic segmentation, with reported results on benchmarks like Cityscapes and ADE20K.
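Frame-by-frame video processing can be sketched as follows. This is an illustration under stated assumptions, not the project's video script. `estimate_depth` is any single-image depth function, `fake_estimator` is a hypothetical stand-in so the example runs without model weights, and normalizing the whole clip with one shared range is a design choice made here for comparable brightness across frames.

```python
import numpy as np

def depth_per_frame(frames, estimate_depth):
    """Run a single-image depth estimator independently on each frame."""
    for frame in frames:
        yield estimate_depth(frame)

def normalize_clip(depths):
    """Scale all frames to 0..255 using one min/max shared across the clip.

    Note: this buffers every depth map in memory, which suits short clips only.
    """
    stack = np.stack(list(depths)).astype(np.float64)
    lo, hi = stack.min(), stack.max()
    scaled = (stack - lo) / max(hi - lo, 1e-8)
    return (scaled * 255.0).astype(np.uint8)

def fake_estimator(frame):
    # Hypothetical stand-in for a real model: mean brightness as "depth".
    return frame.mean(axis=-1)

# Three tiny synthetic RGB frames (height 2, width 3) of increasing brightness.
frames = [np.full((2, 3, 3), v, dtype=np.uint8) for v in (0, 100, 200)]
clip = normalize_clip(depth_per_frame(frames, fake_estimator))
print(clip.shape, clip.dtype)
```

With a real model, `estimate_depth` would wrap the network's forward pass. Because each frame is predicted independently, any temporal smoothing would have to be added as a separate step.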

Depth-Anything provides a command-line interface for batch processing images and videos, with options for grayscale output or side-by-side visualization. The model can be loaded via Hugging Face Transformers pipelines for minimal-code inference, or loaded from disk for direct tensor-based inference.
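The Transformers route and the two visualization styles can be sketched like this. Treat it as a hedged example rather than the project's own code. The checkpoint name `LiheYoung/depth-anything-small-hf` and the helper names (`to_grayscale`, `side_by_side`, `estimate_depth`) are assumptions that do not appear in the text above, and `estimate_depth` needs the `transformers`, `torch`, and `Pillow` packages plus network access to download weights.

```python
import numpy as np

def to_grayscale(depth):
    """Min-max scale a raw depth array to an 8-bit grayscale image."""
    d = np.asarray(depth, dtype=np.float64)
    span = d.max() - d.min()
    scaled = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    return (scaled * 255.0).astype(np.uint8)

def side_by_side(rgb, depth_gray):
    """Concatenate an RGB image and its depth map horizontally."""
    depth_rgb = np.repeat(depth_gray[..., None], 3, axis=-1)  # gray -> 3 channels
    return np.concatenate([rgb, depth_rgb], axis=1)

def estimate_depth(image_path, model_id="LiheYoung/depth-anything-small-hf"):
    """Run the Transformers depth-estimation pipeline on one image file.

    The model id is an assumed checkpoint name; swap in whichever one you use.
    """
    # Imported lazily so the helpers above work without these packages.
    from PIL import Image
    from transformers import pipeline

    pipe = pipeline(task="depth-estimation", model=model_id)
    image = Image.open(image_path).convert("RGB")
    result = pipe(image)
    # "depth" is a PIL image resized to the input; convert both to arrays.
    return np.asarray(image), np.asarray(result["depth"])
```

A typical call would be `rgb, depth = estimate_depth("example.jpg")` followed by `side_by_side(rgb, to_grayscale(depth))`, where `example.jpg` is a placeholder path.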

## Tags

### Artificial Intelligence & ML

- [Depth](https://awesome-repositories.com/f/artificial-intelligence-ml/foundation-models/depth.md) — Provides a large-scale pretrained depth estimation model that generalizes across diverse scenes without task-specific training.
- [Monocular Depth Estimators](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-systems/computer-vision/object-pose-estimations/monocular-depth-estimators.md) — Estimates dense per-pixel depth maps from single RGB images using a DINOv2 encoder backbone.
- [Metric Depth Estimators](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-systems/computer-vision/object-pose-estimations/monocular-depth-estimators/metric-depth-estimators.md) — Fine-tunes the model on metric datasets to output depth values in real-world units from a single image. ([source](https://cdn.jsdelivr.net/gh/liheyoung/depth-anything@main/README.md))
- [Depth Estimation](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-systems/computer-vision/object-pose-estimations/monocular-depth-estimators/multi-view-depth-estimators/depth-estimation.md) — Processes a single RGB image through a fully convolutional decoder to produce a per-pixel depth map.
- [Pretrained Depth Models](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-systems/computer-vision/object-pose-estimations/monocular-depth-estimators/multi-view-depth-estimators/depth-estimation/pretrained-depth-models.md) — Provides a pretrained monocular depth estimation model that outputs relative and metric depth maps out of the box.
- [Relative Depth Estimators](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-systems/computer-vision/object-pose-estimations/monocular-depth-estimators/relative-depth-estimators.md) — Produces depth maps that capture the relative ordering of scene points from a single image without domain-specific training. ([source](https://cdn.jsdelivr.net/gh/liheyoung/depth-anything@main/README.md))
- [Teacher-Student Distillation](https://awesome-repositories.com/f/artificial-intelligence-ml/model-distillation-methods/teacher-student-distillation.md) — Generates pseudo depth labels from a teacher model on unlabeled data and trains a student model to predict them.
- [Teacher-Student Pseudo-Label Training](https://awesome-repositories.com/f/artificial-intelligence-ml/unlabeled-data-training/teacher-student-pseudo-label-training.md) — Trains the depth model on 62 million unlabeled images using a teacher-student pseudo-labeling framework.
- [Self-Supervised](https://awesome-repositories.com/f/artificial-intelligence-ml/vision-encoders/self-supervised.md) — Uses a DINOv2 Vision Transformer encoder pre-trained with self-supervised learning as the backbone for depth estimation.
- [Depth Estimation CLI Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-systems/computer-vision/object-pose-estimations/monocular-depth-estimators/depth-estimation-cli-tools.md) — Provides a command-line interface for batch processing images to generate depth maps with grayscale or side-by-side output. ([source](https://cdn.jsdelivr.net/gh/liheyoung/depth-anything@main/README.md))
- [Relative Depth Map Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-systems/computer-vision/object-pose-estimations/monocular-depth-estimators/multi-view-depth-estimators/depth-estimation/relative-depth-map-generators.md) — Outputs depth values that indicate which parts of a scene are closer or farther without providing absolute scale. ([source](https://liheyoung.github.io/))
- [Video Depth Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/computer-vision-systems/computer-vision/object-pose-estimations/monocular-depth-estimators/multi-view-depth-estimators/depth-estimation/video-depth-frameworks.md) — Processes video frames sequentially to generate consistent depth maps for each frame in a clip.
- [Depth Estimation Fine-Tunings](https://awesome-repositories.com/f/artificial-intelligence-ml/full-parameter-fine-tuning/custom-data-fine-tunings/depth-estimation-fine-tunings.md) — Provides a framework for fine-tuning the pretrained depth model on custom datasets for improved accuracy. ([source](https://depth-anything.github.io))
- [Depth Map Conditioning](https://awesome-repositories.com/f/artificial-intelligence-ml/image-generation-models/conditional-image-generation/depth-map-conditioning.md) — Integrates with ControlNet pipelines to provide precise depth maps as conditioning signals for image synthesis.
- [Depth Estimation Fine-Tunings](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training-frameworks/vision-model-training/vision-language-training/vision-language-fine-tunings/fine-tuning-frameworks/depth-estimation-fine-tunings.md) — Ships a fine-tuning framework for adapting the pretrained depth model to custom datasets and downstream tasks.
- [Relative-to-Metric Depth Scaling](https://awesome-repositories.com/f/artificial-intelligence-ml/relative-to-metric-depth-scaling.md) — Fine-tunes the relative depth model on metric datasets like NYUv2 or KITTI to output depth in real-world units.

### Graphics & Multimedia

- [Single-Image Metric Depth Mappers](https://awesome-repositories.com/f/graphics-multimedia/depth-accuracy-metrics/metric-depth-mapping/single-image-metric-depth-mappers.md) — Produces depth maps with real-world units from one image, enabling direct measurement of scene geometry.
- [Metric Depth Mapping](https://awesome-repositories.com/f/graphics-multimedia/depth-accuracy-metrics/metric-depth-mapping.md) — Outputs depth values in real-world units when a metric model is used, enabling direct measurement of scene geometry. ([source](https://liheyoung.github.io/))
- [Depth Frame Processors](https://awesome-repositories.com/f/graphics-multimedia/video-frame-processing/depth-frame-processors.md) — Processes video frames sequentially to generate consistent depth maps for each frame in a clip.

### Data & Databases

- [Depth Map Batch Processors](https://awesome-repositories.com/f/data-databases/batch-input-processing/depth-map-batch-processors.md) — Provides a command-line interface for batch processing images and videos to generate depth maps.

### DevOps & Infrastructure

- [Depth Estimation Pipelines](https://awesome-repositories.com/f/devops-infrastructure/model-conversion/hugging-face/depth-estimation-pipelines.md) — Ships a Hugging Face pipeline wrapper for running depth estimation on images with minimal code. ([source](https://cdn.jsdelivr.net/gh/liheyoung/depth-anything@main/README.md))
