Depth Anything V2

Monocular Depth Estimators - Provides a foundation model that infers per-pixel depth from a single two-dimensional image.

Computer Vision Models - Provides a large-scale pre-trained neural network designed for general purpose spatial understanding and depth perception.

Depth Estimation - Calculates absolute distance measurements in indoor and outdoor scenes using scale-aware models.

Metric - Calculates absolute distance measurements for indoor and outdoor scenes using specialized scale-aware models.

Relative - Produces a relative depth map from a single input image using pre-trained foundation models.

Temporal Video - Generates depth maps for video sequences while maintaining temporal consistency across frames.

Video Depth Frameworks - Provides a framework for generating temporally consistent depth maps across sequential video frames.

Relative-to-Metric Depth Scaling - Translates dimensionless relative depth maps into absolute distance measurements using scale-aware model variants.
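As a point of comparison for the relative-to-metric entry above, here is a minimal sketch of a different, generic technique: fitting a global scale and shift by least squares against a few pixels whose metric depth is already known. It is not the scale-aware model variant this entry describes. The function names (`fit_scale_shift`, `apply_scale_shift`) and the toy data are hypothetical, and the sketch assumes the relative map differs from the metric one only by an affine transform.

```python
import numpy as np


def fit_scale_shift(relative, reference, mask):
    """Least-squares fit of reference ~ scale * relative + shift on masked pixels."""
    x = relative[mask].astype(np.float64)
    y = reference[mask].astype(np.float64)
    # Design matrix with one column for the scale and one for the shift.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(scale), float(shift)


def apply_scale_shift(relative, scale, shift):
    """Map a relative depth map into the units of the reference measurements."""
    return scale * relative + shift


# Toy data (assumption): a synthetic relative map and sparse metric references.
relative = np.linspace(0.0, 1.0, 12).reshape(3, 4)
reference = 2.5 * relative + 0.4           # pretend ground truth, in metres
mask = np.zeros_like(relative, dtype=bool)
mask[0, 0] = mask[1, 2] = mask[2, 3] = True  # only three pixels are "measured"

scale, shift = fit_scale_shift(relative, reference, mask)
metric = apply_scale_shift(relative, scale, shift)
```

A single global scale and shift cannot correct per-region errors, which is one reason dedicated metric models exist.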

Spatial Understanding - Extracts fine-grained geometric information from images to perceive the layout of a physical space.

Metric Depth Mapping - Determines absolute physical distance between the camera and objects in indoor or outdoor environments.

Video Depth Analysis - Generates consistent depth maps across video frames to understand the three-dimensional structure of moving scenes.

Temporal Prediction Smoothing - Ensures depth predictions remain stable and smooth across consecutive video frames to reduce jitter.
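To illustrate the jitter reduction described above, the following sketch applies an exponential moving average to a sequence of per-frame depth maps. This is a generic post-processing filter chosen for illustration, not the mechanism any particular video depth model uses. The function name `smooth_depth_sequence` and the `alpha` parameter are hypothetical.

```python
import numpy as np


def smooth_depth_sequence(frames, alpha=0.8):
    """Exponentially smooth depth maps over time.

    alpha is the weight of the current frame; lower values smooth more
    but lag further behind real scene motion.
    """
    smoothed = []
    state = None
    for frame in frames:
        frame = np.asarray(frame, dtype=np.float64)
        # The first frame initialises the running state unchanged.
        state = frame if state is None else alpha * frame + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed


# Toy data (assumption): two constant 2x2 depth maps that jump from 1.0 to 3.0.
frames = [np.full((2, 2), 1.0), np.full((2, 2), 3.0)]
out = smooth_depth_sequence(frames, alpha=0.5)
```

A per-pixel filter like this assumes a roughly static camera. With camera or object motion it blends depths of different surfaces, so it suits mild flicker rather than fast-moving scenes.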

Transformer Encoders - Uses a vision transformer architecture to extract global context and high-resolution spatial features.

Unsupervised Pre-training - Pre-trains on large unlabeled image datasets to learn general depth representations.

Model Size Variants - Offers several model sizes with different parameter counts to trade off inference speed against accuracy.

DepthAnything/Depth-Anything-V2