Depth-Anything-3 is a collection of core model implementations for depth prediction, multi-view geometry estimation, and RGB-D spatial pipelines. It includes a monocular depth estimation model that predicts depth maps from single images or video, and a 3D Gaussian splatting generator that predicts the Gaussian parameters used to render high-fidelity novel views of a scene.
The project provides a multi-view geometry estimator that computes spatially consistent depth and camera poses across synchronized visual inputs. It can also serve as an enhancement for visual SLAM, where it is intended to reduce drift and improve mapping precision in autonomous navigation systems.
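To make "spatially consistent" concrete, here is a minimal sketch of how a per-pixel depth value and a camera pose combine into a world-space point. It is not taken from the project's code. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the camera-to-world convention, and both function names are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with depth along the optical axis to a camera-frame point.

    Assumes an ideal pinhole camera with no lens distortion.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def camera_to_world(point: Vec3,
                    rotation: Sequence[Sequence[float]],
                    translation: Sequence[float]) -> Vec3:
    """Apply a camera-to-world pose: p_world = R @ p_cam + t."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


# Hypothetical setup: two cameras with the same intrinsics, the second one
# shifted one unit along the world x axis.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
INTRINSICS = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

point_from_a = camera_to_world(
    backproject(420.0, 240.0, 2.0, **INTRINSICS), IDENTITY, [0.0, 0.0, 0.0])
point_from_b = camera_to_world(
    backproject(170.0, 240.0, 2.0, **INTRINSICS), IDENTITY, [1.0, 0.0, 0.0])
```

Under these assumptions, depth and pose estimates are consistent when pixels that observe the same surface point in different views land on (nearly) the same world coordinates, as `point_from_a` and `point_from_b` do here.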
The framework also covers broader 3D reconstruction capabilities, including camera pose estimation, multi-camera depth fusion, and the export of geometry data to common 3D formats. Model outputs can be inspected through an interactive gallery interface, and sliding-window video inference helps manage GPU memory usage during long sequences.
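The sliding-window idea can be sketched as follows: a long sequence is split into overlapping chunks so that only one chunk is processed at a time. This is an illustration of the general technique, not the project's implementation. The function names, the `window` and `overlap` parameters, and the `predict` callable are all hypothetical.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def sliding_windows(num_frames: int, window: int,
                    overlap: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index ranges, end exclusive, that cover every frame."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    stride = window - overlap
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        yield start, end
        if end == num_frames:
            break  # the last window already reaches the final frame
        start += stride


def run_in_windows(frames: Sequence, predict: Callable[[Sequence], List],
                   window: int = 8, overlap: int = 2) -> List:
    """Run `predict` one window at a time and keep one result per frame.

    `predict` is a stand-in for a model call that returns one output per
    input frame. Only `window` frames are passed to it at once.
    """
    results: List = [None] * len(frames)
    for start, end in sliding_windows(len(frames), window, overlap):
        outputs = predict(frames[start:end])
        for offset, output in enumerate(outputs):
            index = start + offset
            # Frames shared by two windows keep the earlier prediction.
            if results[index] is None:
                results[index] = output
    return results
```

Peak memory then scales with the window size rather than the sequence length. The overlap gives neighbouring windows shared frames, which a real pipeline could use to align or blend predictions instead of simply keeping the first one as this sketch does.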
The project includes a scriptable command-line interface for running batch geometry estimation across multiple files and video streams.
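As a rough picture of what a batch entry point of this kind involves, the sketch below gathers image and video files from the given paths and reports where each result would go. It does not reproduce the project's actual CLI. The program name `batch-geometry`, the `--output-dir` flag, the extension sets, and every function name are hypothetical, and the model call is left as a placeholder.

```python
import argparse
from pathlib import Path
from typing import List, Optional, Sequence

# Hypothetical extension sets; a real tool may accept more formats.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
VIDEO_EXTS = {".mp4", ".mov"}
MEDIA_EXTS = IMAGE_EXTS | VIDEO_EXTS


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="batch-geometry",
        description="Illustrative batch runner (not the project's CLI).")
    parser.add_argument("inputs", nargs="+", type=Path,
                        help="image files, video files, or directories")
    parser.add_argument("--output-dir", type=Path, default=Path("outputs"),
                        help="directory that would receive the results")
    return parser


def collect_inputs(paths: Sequence[Path]) -> List[Path]:
    """Expand directories recursively and keep only recognised media files."""
    found: List[Path] = []
    for path in paths:
        candidates = sorted(path.rglob("*")) if path.is_dir() else [path]
        found.extend(p for p in candidates if p.suffix.lower() in MEDIA_EXTS)
    return found


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    for item in collect_inputs(args.inputs):
        kind = "video" if item.suffix.lower() in VIDEO_EXTS else "image"
        # Placeholder: a real run would load the model and process `item` here.
        print(f"{kind}\t{item}\t->\t{args.output_dir / item.stem}")
    return 0
```

Calling `main(["scenes/", "clip.mp4", "--output-dir", "results"])` would list every recognised file under `scenes/` plus `clip.mp4`. Accepting an explicit `argv` keeps the entry point easy to call from scripts and tests.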