Lingbot Map

Lingbot-map is a feed-forward neural network designed for real-time 3D scene reconstruction from streaming video. It processes video frames one at a time without iterative optimization, producing dense geometry and camera poses at interactive frame rates directly from a live feed.

The project distinguishes itself through its ability to maintain stable geometry and pose alignment across very long video sequences, handling thousands of frames without drift. It achieves this through a combination of coordinate grounding memory, sliding-window inference with overlapping keyframes, and a paged KV cache attention mechanism that manages transformer memory within limited GPU resources. The system also includes a headless rendering pipeline that can produce MP4 flythrough videos of reconstructed point clouds, with camera motion controlled by configurable YAML presets supporting chase-cam, birdseye, static, and pivot shots.

Additional capabilities include keyframe caching to reduce memory footprint during long sequences, and ONNX-based sky point filtering to improve visual quality in outdoor reconstructions. The project provides tools for custom virtual camera path design and supports processing both live video feeds and pre-recorded image sequences.
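Sky filtering comes down to dropping points whose source pixel is classified as sky. The following is a minimal sketch. It assumes a per-pixel point map aligned with a per-pixel sky probability map, for example the output of a segmentation model run through ONNX Runtime. The function name, the 0.5 threshold, and the input layout are hypothetical, and the model inference step itself is omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def filter_sky_points(
    points: Sequence[Sequence[Point]],
    sky_prob: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[Point]:
    """Keep only points whose pixel is predicted as non-sky.

    Hypothetical layout: points[v][u] is the 3D point predicted for pixel (u, v)
    and sky_prob[v][u] is the sky probability for that same pixel.
    """
    if len(points) != len(sky_prob):
        raise ValueError("point map and sky mask must have the same height")
    kept: List[Point] = []
    for row_pts, row_sky in zip(points, sky_prob):
        if len(row_pts) != len(row_sky):
            raise ValueError("point map and sky mask must have the same width")
        for p, s in zip(row_pts, row_sky):
            if s < threshold:  # low sky probability: keep as scene geometry
                kept.append(p)
    return kept
```

Thresholding a probability map rather than a hard mask lets the cutoff be tuned per scene. Outdoor sequences with hazy horizons may need a stricter value.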

Features

Streaming 3D Reconstruction - Reconstructs 3D scenes in real time from a live video feed. A feed-forward network processes frames one at a time, without full-sequence processing, and produces geometry at interactive frame rates.

Drift Correction - Maintains stable geometry and pose alignment across thousands of frames by combining coordinate grounding with trajectory memory.

Sliding-Window Inference - Processes sequences longer than the model's training range by resetting the transformer context window (KV cache) with overlapping keyframes. This keeps camera pose estimates aligned across long sequences.
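The scheduling side of sliding-window inference with overlapping keyframes can be sketched as follows. This is an illustration, not lingbot-map's code: the window and overlap sizes are free parameters, and `translation_offset` is a hypothetical, translation-only alignment step that uses the shared frames as anchors. A full rigid or similarity alignment is the more general choice.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sliding_windows(num_frames: int, window: int, overlap: int) -> List[range]:
    """Split frames 0..num_frames-1 so consecutive windows share `overlap` frames."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if num_frames <= 0:
        return []
    stride = window - overlap
    windows: List[range] = []
    start = 0
    while True:
        end = min(start + window, num_frames)
        windows.append(range(start, end))
        if end == num_frames:
            break
        start += stride  # next window re-uses the last `overlap` frames
    return windows


def translation_offset(prev: Sequence[Vec3], curr: Sequence[Vec3]) -> Vec3:
    """Average shift mapping the new window's camera positions for the shared
    keyframes onto the positions the previous window estimated for them."""
    n = len(prev)
    if n == 0 or n != len(curr):
        raise ValueError("need the same non-empty set of shared keyframes")
    return tuple(sum(p[i] - c[i] for p, c in zip(prev, curr)) / n for i in range(3))
```

For example, `sliding_windows(10, 4, 1)` yields windows covering frames 0-3, 3-6, and 6-9. Frames 3 and 6 each appear in two windows, which gives the next window a reference for aligning its poses to the previous one.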

Paged KV Cache Management - Manages transformer key-value memory in fixed-size pages, swapping old tokens to disk for long-sequence inference.
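A paged KV cache groups tokens into fixed-size pages so memory can be managed one page at a time. The toy allocator below shows the idea under stated assumptions and is not the project's implementation. The class name, the oldest-first eviction policy, and the in-memory dict standing in for disk are all hypothetical, and a (float, float) pair stands in for real key and value tensors.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple

KV = Tuple[float, float]  # stand-in for a (key, value) vector pair


class PagedKVCache:
    """Tokens live in fixed-size pages. When more than `max_resident_pages`
    pages are resident (GPU memory), the oldest page is moved to an offload
    store (a dict standing in for disk)."""

    def __init__(self, page_size: int, max_resident_pages: int) -> None:
        if page_size <= 0 or max_resident_pages <= 0:
            raise ValueError("page_size and max_resident_pages must be positive")
        self.page_size = page_size
        self.max_resident_pages = max_resident_pages
        self.resident: "OrderedDict[int, List[KV]]" = OrderedDict()
        self.offloaded: Dict[int, List[KV]] = {}
        self.num_tokens = 0

    def append(self, kv: KV) -> None:
        page_id, slot = divmod(self.num_tokens, self.page_size)
        if slot == 0:  # previous page is full (or this is the first token)
            self.resident[page_id] = []
            while len(self.resident) > self.max_resident_pages:
                old_id, old_page = self.resident.popitem(last=False)  # oldest page
                self.offloaded[old_id] = old_page
        self.resident[page_id].append(kv)
        self.num_tokens += 1

    def get(self, token_idx: int) -> KV:
        page_id, slot = divmod(token_idx, self.page_size)
        page = self.resident.get(page_id)
        if page is None:
            page = self.offloaded[page_id]  # a real system would page this back in
        return page[slot]
```

Because eviction happens at page granularity, bookkeeping stays proportional to the number of pages rather than the number of tokens.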

Coordinate Grounding - Maintains a persistent coordinate frame across thousands of frames by aligning geometric cues with trajectory memory.

Keyframe Caching - Reduces memory footprint by storing only every Nth frame in the attention cache while still producing predictions for every frame.
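The caching rule "predict every frame, cache every Nth" can be sketched as a small driver loop. Here `predict` is a hypothetical callable standing in for the model, and the cache simply holds frames rather than attention keys and values.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

F = TypeVar("F")  # frame type
R = TypeVar("R")  # per-frame prediction type


def run_with_keyframe_cache(
    frames: Sequence[F],
    predict: Callable[[F, List[F]], R],
    every_n: int,
) -> Tuple[List[R], List[F]]:
    """Predict every frame against the current cache; cache only every Nth frame."""
    if every_n <= 0:
        raise ValueError("every_n must be positive")
    outputs: List[R] = []
    cache: List[F] = []
    for i, frame in enumerate(frames):
        outputs.append(predict(frame, list(cache)))  # every frame gets a prediction
        if i % every_n == 0:
            cache.append(frame)  # only keyframes grow the cache
    return outputs, cache
```

With `every_n=3`, a 7-frame stream caches frames 0, 3, and 6, so cache growth is roughly one third of caching every frame.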

Flythrough Rendering - Produces MP4 flythrough videos of reconstructed point clouds along configurable virtual camera paths.

YAML Camera Presets - Controls flythrough camera motion with YAML presets for chase-cam, birdseye, static, and pivot shots.
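A camera preset ultimately expands into one camera pose per output frame. The following is a minimal sketch of that expansion for two of the shot types. Only static and pivot are covered, and the preset keys (`type`, `center`, `eye`, `radius`, `height`, `degrees`) are hypothetical, not the project's YAML schema. The preset is shown as a plain dict, such as one might get from parsing a YAML file. A y-up world is assumed.

```python
import math
from typing import Any, Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Pose = Tuple[Vec3, Vec3]  # (eye position, look-at target)


def pivot_path(center: Vec3, radius: float, height: float,
               num_frames: int, degrees: float = 360.0) -> List[Pose]:
    """Orbit `center` at a fixed radius and height, evenly spaced over [0, degrees)."""
    cx, cy, cz = center
    poses: List[Pose] = []
    for i in range(num_frames):
        theta = math.radians(degrees) * i / num_frames
        eye = (cx + radius * math.cos(theta), cy + height, cz + radius * math.sin(theta))
        poses.append((eye, center))
    return poses


def camera_path(preset: Dict[str, Any], num_frames: int) -> List[Pose]:
    """Expand a preset dict (e.g. parsed from YAML) into per-frame poses."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    center = tuple(preset.get("center", (0.0, 0.0, 0.0)))
    kind = preset["type"]
    if kind == "static":
        return [(tuple(preset["eye"]), center)] * num_frames
    if kind == "pivot":
        return pivot_path(center, preset["radius"], preset["height"],
                          num_frames, preset.get("degrees", 360.0))
    raise ValueError(f"unsupported preset type: {kind!r}")
```

Sampling a full orbit over [0, 360) rather than [0, 360] avoids a duplicated first and last frame when the video loops.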

Headless Rendering Pipelines - Feeds video or image sequences through the model and outputs an MP4 flythrough without requiring a display or interactive viewer.
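Rendering headlessly means rasterising each point cloud into image frames in memory, with no window involved. The rough sketch below handles a single frame. It assumes points already in camera coordinates (x right, y down, z forward), a pinhole camera with a hypothetical focal length in pixels and its principal point at the image centre, and nearest-point-wins z-buffering. The function name and image representation are illustrative, and MP4 encoding is left out.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def splat_points(points: Sequence[Vec3], colors: Sequence[int],
                 width: int, height: int, focal: float) -> List[List[Optional[int]]]:
    """Project camera-space points into a width x height image, keeping the nearest point per pixel."""
    image: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    cx, cy = width / 2, height / 2
    for (x, y, z), color in zip(points, colors):
        if z <= 0:
            continue  # behind the camera
        u = math.floor(focal * x / z + cx)
        v = math.floor(focal * y / z + cy)
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z  # closer than anything drawn here so far
            image[v][u] = color
    return image
```

Running this once per pose from a camera path and passing the frames to a video encoder gives a flythrough without a display.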

Robbyant/lingbot-map
