This project is a monocular depth estimation model and computer vision framework designed to calculate absolute distance and scale from single images. It functions as a metric depth estimator that generates high-resolution depth maps without requiring camera-specific focal length metadata.
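Standard pinhole geometry shows how a metric depth map plus a focal-length estimate yields absolute scale. This is a minimal sketch, not Depth Pro's API. It assumes the model supplies an estimated horizontal field of view in place of EXIF focal-length metadata, and a metric depth value for the object of interest. It also assumes the object is roughly parallel to the image plane. The function names and input values are hypothetical.

```python
import math


def focal_length_px(image_width_px: int, hfov_deg: float) -> float:
    """Pinhole relation: f_px = (W / 2) / tan(HFOV / 2)."""
    return 0.5 * image_width_px / math.tan(math.radians(hfov_deg) / 2.0)


def metric_extent_m(pixel_extent: float, depth_m: float, f_px: float) -> float:
    """Similar triangles: real-world length = pixel length * depth / focal length.
    Only valid for an object roughly parallel to the image plane."""
    return pixel_extent * depth_m / f_px


# Hypothetical inputs: a 1920-px-wide image, an assumed model-estimated 60-degree
# horizontal FOV, and a 4.0 m depth read from the predicted metric depth map.
f = focal_length_px(1920, 60.0)
height = metric_extent_m(pixel_extent=500, depth_m=4.0, f_px=f)
print(f"f_px ~ {f:.1f}, object height ~ {height:.2f} m")
```

With a 60-degree field of view on a 1920-pixel-wide image, f_px is about 1663. A 500-pixel object at 4 m therefore spans roughly 1.2 m.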
The main features of apple/ml-depth-pro are: Monocular Depth Estimators, Computer Vision Models, Metric Depth Estimators, Single-Image Metric Depth Mappers, Depth Map Evaluation, Computer Vision, Vision Transformer Encoders, Depth Accuracy Metrics.
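The feature list mentions depth map evaluation and accuracy metrics but does not name them. The sketch below is for illustration only. It computes three error measures that are widely used in depth-estimation papers: absolute relative error (AbsRel), RMSE, and δ1 threshold accuracy. Which metrics ml-depth-pro's own evaluation code reports is not stated here, and `depth_metrics` is a hypothetical helper.

```python
import numpy as np


def depth_metrics(pred: np.ndarray, gt: np.ndarray, min_depth: float = 1e-3) -> dict:
    """Common metric-depth error measures over pixels with valid ground truth."""
    valid = gt > min_depth                       # skip pixels with missing/zero ground truth
    p = np.clip(pred[valid], min_depth, None)    # keep predictions positive for the ratio
    g = gt[valid]
    ratio = np.maximum(p / g, g / p)             # symmetric ratio used by threshold accuracy
    return {
        "abs_rel": float(np.mean(np.abs(p - g) / g)),
        "rmse": float(np.sqrt(np.mean((p - g) ** 2))),
        "delta1": float(np.mean(ratio < 1.25)),  # fraction of pixels within a 1.25x ratio
    }


gt = np.array([[1.0, 2.0], [4.0, 0.0]])    # 0.0 marks a pixel with no ground truth
pred = np.array([[1.1, 2.0], [3.0, 5.0]])
m = depth_metrics(pred, gt)
print(m)
```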
Open-source alternatives to apple/ml-depth-pro include:
liheyoung/depth-anything: Depth-Anything is a monocular depth estimation foundation model that produces dense per-pixel depth maps from a single…
pytorch/vision: This project is a comprehensive computer vision library for the PyTorch ecosystem, providing a standardized collection…
depthanything/depth-anything-v2: Depth-Anything-V2 is a computer vision foundation model designed for general-purpose spatial understanding and depth…
nianticlabs/monodepth2: This project is a computer vision system for monocular depth estimation and 3D point cloud generation. It provides a…
pytorch/examples: This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning…
tingsongyu/pytorch_tutorial: This project is a comprehensive collection of educational examples and reference implementations for building vision…
Depth-Anything is a monocular depth estimation foundation model that produces dense per-pixel depth maps from a single RGB image. It is built on a DINOv2 Vision Transformer encoder backbone. It is trained on 62 million unlabeled images using a teacher-student pseudo-labeling framework, which lets it generalize robustly across diverse scenes without task-specific training. The model outputs relative depth maps, which capture scene structure only up to an unknown scale and shift. After fine-tuning on datasets such as NYUv2 or KITTI, it also outputs metric depth maps in real-world units. The project distinguishes itself through its ab…
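Relative depth carries no absolute units. A common way to compare it against metric ground truth is to fit a per-image scale and shift by least squares. This is an evaluation technique, not how Depth-Anything itself produces metric depth. That, per the description, comes from fine-tuning. The sketch assumes the target is expressed in the same space as the prediction, for example inverse depth if the model outputs disparity-like values. `align_scale_shift` is a hypothetical helper.

```python
import numpy as np


def align_scale_shift(rel: np.ndarray, target: np.ndarray, mask: np.ndarray):
    """Fit s, t minimizing ||s * rel + t - target||^2 over masked pixels,
    then apply them to the whole relative map."""
    x = rel[mask].ravel()
    y = target[mask].ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)      # design matrix columns: [rel, 1]
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares
    return s * rel + t, float(s), float(t)


rel = np.array([[0.1, 0.2], [0.3, 0.4]])            # hypothetical relative prediction
target = 2.0 * rel + 0.5                             # synthetic target with known s=2, t=0.5
aligned, s, t = align_scale_shift(rel, target, mask=np.ones_like(rel, dtype=bool))
print(s, t)
```

After alignment, metrics such as AbsRel can be computed on `aligned`. In practice the mask would exclude pixels without valid ground truth.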
This project is a comprehensive computer vision library for the PyTorch ecosystem, providing a standardized collection of neural network architectures, datasets, and high-performance transformation utilities. It serves as a foundational framework for building, training, and deploying deep learning models, offering a centralized model registry that allows developers to instantiate architectures with pre-trained weights for tasks such as image classification, object detection, and semantic segmentation. The library distinguishes itself through its modular approach to data and compute management…
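Recent torchvision releases expose this registry through functions such as `torchvision.models.list_models()` and `torchvision.models.get_model()`. The underlying pattern maps a string name to a builder function that accepts configuration such as a weights specification. It can be sketched in plain Python. Everything below is hypothetical and illustrative, not torchvision's implementation: `register_model`, `build_model`, `list_registered_models`, and the `tiny_mlp` builder.

```python
from typing import Callable, Dict, List

# Hypothetical registry: string name -> builder function.
_MODELS: Dict[str, Callable[..., object]] = {}


def register_model(name: str):
    """Decorator that records a builder function under a unique name."""
    def wrap(fn: Callable[..., object]) -> Callable[..., object]:
        if name in _MODELS:
            raise ValueError(f"duplicate model name: {name}")
        _MODELS[name] = fn
        return fn
    return wrap


def list_registered_models() -> List[str]:
    return sorted(_MODELS)


def build_model(name: str, **kwargs) -> object:
    """Look up a builder by name and forward keyword configuration to it."""
    try:
        builder = _MODELS[name]
    except KeyError:
        raise ValueError(f"unknown model: {name}") from None
    return builder(**kwargs)


@register_model("tiny_mlp")
def tiny_mlp(weights=None, hidden: int = 16) -> dict:
    # A real builder would construct a network and load `weights` if given;
    # this placeholder just echoes its configuration.
    return {"arch": "tiny_mlp", "hidden": hidden, "weights": weights}


print(list_registered_models())
print(build_model("tiny_mlp", hidden=32))
```

The decorator keeps registration next to each architecture's definition. Callers only need the string name, which is what makes a central lookup with optional pre-trained weights convenient.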
Depth-Anything-V2 is a computer vision foundation model designed for general-purpose spatial understanding and depth perception. It functions as a monocular depth estimation model that predicts relative and absolute depth maps from single images or video sequences. The project provides specialized tools for both relative depth estimation and metric depth calculation, allowing for the determination of absolute physical distances in indoor and outdoor environments. It includes a video depth estimation framework that ensures temporal consistency across sequential frames to maintain stable depth…
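The description does not say how temporal consistency is achieved. To illustrate what "stable depth across frames" means, the sketch below uses a naive exponential moving average over per-frame depth maps. This is a swapped-in technique, not the project's method. It assumes a mostly static camera and scene, because an EMA smears moving objects. `ema_smooth` is a hypothetical helper.

```python
import numpy as np


def ema_smooth(frames, alpha: float = 0.8):
    """state = alpha * state + (1 - alpha) * frame for each depth map in sequence.
    Higher alpha gives steadier output but lags behind real changes."""
    state = None
    out = []
    for d in frames:
        d = np.asarray(d, dtype=np.float64)
        state = d if state is None else alpha * state + (1.0 - alpha) * d
        out.append(state)
    return out


# Hypothetical flickering per-frame predictions for a single static pixel.
raw = [np.array([2.2]), np.array([1.8]), np.array([2.2]), np.array([1.8])]
smoothed = ema_smooth(raw, alpha=0.8)
print([float(s[0]) for s in smoothed])
```

The raw sequence jumps by 0.4 m between frames. The smoothed sequence changes by at most 0.08 m, at the cost of responding slowly to genuine depth changes.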
This project is a computer vision system for monocular depth estimation and 3D point cloud generation. It provides a self-supervised depth learning framework and a depth predictor that estimates depth and disparity from single 2D images using pretrained neural networks. The system includes tools to transform 2D depth maps into 3D point clouds via pixel-coordinate backprojection, and to project 3D point clouds back into 2D depth maps. Its training pipeline supports model fine-tuning and hyperparameter optimization. The library covers broader capabilities in spatial a…
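Both directions of the depth-map/point-cloud conversion follow from the pinhole camera model. Backprojection lifts a pixel to a 3D point: X = d · K⁻¹ [u, v, 1]ᵀ. Projection maps that point back: p = K X, followed by division by z. This minimal NumPy sketch assumes a known 3x3 intrinsics matrix `K` and no lens distortion. The function names are hypothetical, and this is not monodepth2's own code.

```python
import numpy as np


def backproject(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift an HxW depth map to an (H*W)x3 point cloud: X = depth * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))          # pixel column / row indices
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T                          # camera rays with z = 1
    return rays * depth.reshape(-1, 1)                       # scale each ray by its depth


def project_to_depth(points: np.ndarray, K: np.ndarray, h: int, w: int) -> np.ndarray:
    """Render points into an HxW depth map, keeping the nearest point per pixel."""
    depth = np.full((h, w), np.inf)
    z = points[:, 2]
    front = z > 0                                            # drop points behind the camera
    uvw = points[front] @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    zf = z[front]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    np.minimum.at(depth, (v[inside], u[inside]), zf[inside])  # simple z-buffer
    depth[np.isinf(depth)] = 0.0                             # 0 marks pixels with no point
    return depth


K = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, 0.5], [0.0, 0.0, 1.0]])  # hypothetical intrinsics
depth = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
points = backproject(depth, K)
rerendered = project_to_depth(points, K, *depth.shape)
print(points.shape)
```

Rounding to the nearest pixel and keeping the minimum depth per pixel is the simplest occlusion handling. Projecting into a different camera pose would also require applying a rigid transform to `points` first.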