SpatialLM is a spatial modeling framework that uses large language models to transform monocular video and sensor data into structured indoor semantic maps. It functions as a system for indoor layout estimation and a point cloud semantic parser, converting raw geometric data into representations of architectural elements and object categories. The project aligns multi-modal sensor inputs with linguistic tokens, allowing a language model to serve as a reasoning engine for inferring room topology. It employs mechanisms to convert 3D point clouds and 2D image sequences into discrete tokens and s
Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image
Polygon Detection for Room Layout Estimation using Heterogeneous Graphs and Wireframes
PyTorch implementation of the ECCV 2020 paper: AtlantaNet: Inferring the 3D Indoor Layout from a Single 360 Image beyond the Manhattan World Assumption