Depth Anything 3 | Awesome Repository

Depth-Anything-3 is a collection of core model implementations for depth prediction, multi-view geometry estimation, and RGB-D spatial pipelines. It includes a monocular depth estimation model that predicts depth maps from single images or video frames, and a 3D Gaussian splatting generator that predicts Gaussian parameters used to render high-fidelity novel views of a scene.
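A Gaussian splatting renderer turns predicted Gaussian parameters into an image by projecting each Gaussian onto the screen and alpha-compositing them front to back, so that each pixel's color is C = sum_i c_i * alpha_i * prod_{j<i}(1 - alpha_j). The sketch below covers only that per-pixel compositing step, assuming the Gaussians are already projected to 2D. Projection, tiling, and how Depth Anything 3 actually parameterizes its Gaussians are not shown. `Splat2D` and `shade_pixel` are hypothetical names, and the 0.99 alpha cap is an illustrative stability guard.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[float, float, float]


@dataclass
class Splat2D:
    """A Gaussian already projected to screen space (hypothetical layout)."""
    mean: Tuple[float, float]            # pixel-space center
    inv_cov: Tuple[float, float, float]  # (a, b, c) of inverse 2x2 covariance [[a, b], [b, c]]
    opacity: float                       # in [0, 1]
    color: RGB
    depth: float                         # camera-space depth, used only for sorting


def shade_pixel(px: float, py: float, splats: List[Splat2D],
                background: RGB = (0.0, 0.0, 0.0)) -> RGB:
    """Front-to-back compositing: C = sum_i c_i * alpha_i * T_i + T_final * background."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i = prod_{j<i} (1 - alpha_j)
    for s in sorted(splats, key=lambda g: g.depth):  # nearest Gaussian first
        dx, dy = px - s.mean[0], py - s.mean[1]
        a, b, c = s.inv_cov
        # Gaussian falloff: exp(-0.5 * d^T * inv_cov * d)
        power = -0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy)
        alpha = min(0.99, s.opacity * math.exp(power))
        for ch in range(3):
            color[ch] += s.color[ch] * alpha * transmittance
        transmittance *= 1.0 - alpha
    # Whatever light is not absorbed by the splats comes from the background.
    return tuple(color[ch] + transmittance * background[ch] for ch in range(3))


if __name__ == "__main__":
    red = Splat2D((0.0, 0.0), (1.0, 0.0, 1.0), 0.5, (1.0, 0.0, 0.0), 1.0)
    print(shade_pixel(0.0, 0.0, [red], background=(0.0, 0.0, 1.0)))  # (0.5, 0.0, 0.5)
```

Sorting by depth before accumulating is what makes nearer splats occlude farther ones; the running transmittance records how much light still reaches the viewer.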

The project provides a multi-view geometry estimator that computes spatially consistent depth and camera poses across synchronized visual inputs. It can also serve as a visual SLAM enhancement, intended to reduce drift and improve mapping accuracy in autonomous navigation systems.
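Spatial consistency across views means that every camera's depth can be expressed in one shared world frame once that camera's pose is known. The sketch below assumes a camera-to-world pose given as a rotation matrix `R` and translation `t`, so p_world = R @ p_cam + t, and shows its inverse. The pose convention and the helper names `camera_to_world` and `world_to_camera` are assumptions for illustration; the convention Depth Anything 3 actually outputs (camera-to-world or world-to-camera, and its axis orientation) should be checked in its own documentation.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def camera_to_world(points: List[Vec3], R: Mat3, t: Vec3) -> List[Vec3]:
    """Apply an assumed camera-to-world pose: p_world = R @ p_cam + t."""
    return [
        tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def world_to_camera(points: List[Vec3], R: Mat3, t: Vec3) -> List[Vec3]:
    """Invert the same pose: p_cam = R^T @ (p_world - t), valid because R is a rotation."""
    out: List[Vec3] = []
    for p in points:
        d = [p[k] - t[k] for k in range(3)]
        # Row i of R^T is column i of R.
        out.append(tuple(sum(R[k][i] * d[k] for k in range(3)) for i in range(3)))
    return out


if __name__ == "__main__":
    # Toy pose: 90-degree rotation about z plus a translation.
    R = [[0.0, -1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
    t = (1.0, 2.0, 3.0)
    world = camera_to_world([(1.0, 0.0, 0.0)], R, t)
    print(world)                         # [(1.0, 3.0, 3.0)]
    print(world_to_camera(world, R, t))  # [(1.0, 0.0, 0.0)]
```

Using the transpose instead of a general matrix inverse relies on R being orthonormal, which holds for any valid rotation.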

The framework covers broader capabilities in 3D reconstruction, including camera pose estimation, multi-camera depth fusion, and the export of geometry data to common 3D formats. It also provides an interactive gallery interface for visualizing model outputs, and sliding-window video inference that bounds GPU memory usage on long sequences.
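Sliding-window inference keeps peak GPU memory tied to the window length rather than the full sequence length: frames are processed in fixed-size chunks, and consecutive chunks share a few frames. The sketch below only computes those chunk boundaries. The helper name `sliding_windows` and its `window`/`overlap` parameters are hypothetical, and using the shared frames to align consecutive chunks (for example their scale or pose) is an assumption about how such windows are typically stitched, not a description of Depth Anything 3's own procedure.

```python
from typing import List, Tuple


def sliding_windows(num_frames: int, window: int, overlap: int) -> List[Tuple[int, int]]:
    """Split frames [0, num_frames) into half-open (start, end) chunks.

    Each chunk holds at most `window` frames, and consecutive chunks share
    `overlap` frames that can later be used to align their predictions.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")
    if num_frames <= 0:
        return []
    stride = window - overlap  # always >= 1, so the loop terminates
    chunks: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + window, num_frames)
        chunks.append((start, end))
        if end == num_frames:
            break  # the last chunk may be shorter than `window`
        start += stride
    return chunks


if __name__ == "__main__":
    print(sliding_windows(10, window=4, overlap=1))  # [(0, 4), (3, 7), (6, 10)]
```

A larger overlap gives more shared frames for alignment at the cost of more redundant computation per sequence.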

The project includes a scriptable command-line interface for executing batch geometry estimation tasks across multiple files and video streams.

Features

Monocular Depth Estimators - Predict real-world metric depth values from single RGB images using a trained neural network.
3D Pose Estimation - Estimates 3D camera positions and orientations from visual inputs using coordinate-based pose estimation.
Multi-View Depth Estimators - Integrate depth maps from different camera perspectives into a consistent spatial representation for 3D reconstruction.
Depth Estimation - Predicts spatially consistent depth maps from one or more visual inputs to recover the 3D structure of the scene.
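Integrating depth maps from several cameras amounts to lifting each depth map to 3D with the camera's intrinsics, moving the points into a shared world frame with the camera's pose, and merging points that land in the same place. The sketch below uses naive voxel averaging for the merge step and writes the result as a minimal ASCII PLY file, as one example of a common 3D format. It assumes pinhole intrinsics `(fx, fy, cx, cy)`, z-depth maps, and camera-to-world poses; voxel averaging is a deliberately simple stand-in (TSDF fusion is a common, more robust alternative), and all function names are hypothetical rather than the project's API.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy)
View = Tuple[List[List[float]], Intrinsics, Sequence[Sequence[float]], Point3]


def backproject(depth: List[List[float]], K: Intrinsics) -> List[Point3]:
    """Pinhole back-projection of a z-depth map; non-positive depths are skipped."""
    fx, fy, cx, cy = K
    return [((u - cx) * z / fx, (v - cy) * z / fy, z)
            for v, row in enumerate(depth)
            for u, z in enumerate(row) if z > 0.0]


def to_world(points: List[Point3], R: Sequence[Sequence[float]], t: Point3) -> List[Point3]:
    """Camera-to-world transform p_w = R @ p_c + t (assumed pose convention)."""
    return [tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
            for p in points]


def fuse_views(views: List[View], voxel_size: float) -> List[Point3]:
    """Merge all views into one cloud, averaging points that fall in the same voxel."""
    buckets: Dict[Tuple[int, ...], List[Point3]] = defaultdict(list)
    for depth, K, R, t in views:
        for p in to_world(backproject(depth, K), R, t):
            key = tuple(math.floor(c / voxel_size) for c in p)
            buckets[key].append(p)
    fused: List[Point3] = []
    for key in sorted(buckets):  # deterministic output order
        pts = buckets[key]
        fused.append(tuple(sum(p[i] for p in pts) / len(pts) for i in range(3)))
    return fused


def to_ascii_ply(points: List[Point3]) -> str:
    """Serialize points as a minimal ASCII PLY file (vertex positions only)."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = [f"{x} {y} {z}" for x, y, z in points]
    return "\n".join(header + body) + "\n"


if __name__ == "__main__":
    I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    K = (1.0, 1.0, 0.0, 0.0)
    # Two toy one-pixel views of the same surface point; camera B is shifted
    # 1 unit along +z, so it observes the point at depth 1 instead of 2.
    view_a = ([[2.0]], K, I3, (0.0, 0.0, 0.0))
    view_b = ([[1.0]], K, I3, (0.0, 0.0, 1.0))
    cloud = fuse_views([view_a, view_b], voxel_size=0.5)
    print(cloud)  # [(0.0, 0.0, 2.0)]
    print(to_ascii_ply(cloud))
```

Because both views agree on the point's world position, they fall into the same voxel and collapse to a single fused point; inconsistent depth or pose estimates would instead produce duplicated, offset points.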
