This project is a 3D visual localization framework designed to determine a camera's exact position and orientation by matching 2D image features against a 3D reference model. It includes a structure-from-motion pipeline to reconstruct 3D scene geometry from unordered image sets, creating the necessary spatial maps for localization.
The system employs a hierarchical coarse-to-fine localization approach. This process begins with a global-descriptor image retrieval system to identify candidate reference images from a large database and progresses through local feature matching to final 3D model alignment using perspective-n-point pose estimation.
The framework covers deep learning-based feature extraction, sparse feature matching, and camera model calibration to ensure accurate 3D reconstruction. It also provides tools for localization data visualization and 3D scene analysis to examine keypoint visibility and debug mapping errors.