30 open-source projects similar to openhelix-team/spatial-forcing, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Spatial Forcing alternative.
RynnVLA-002: A Unified Vision-Language-Action and World Model
ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning
Official implementation of ActQuant: Sub-4-bit Action-Guided Quantization for Vision-Language-Action Models
Being-H is BeingBeyond's family of human-centric embodied foundation models.
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos (ICML 2026)
This repo provides training and inference code for the paper "Large Video Planner Enables Generalizable Robot Control"
Official implementation of ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver.
NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards
Dexbotic: Open-Source Vision-Language-Action Toolbox
vjepa2 is a joint-embedding predictive architecture and video self-supervised learning framework. It functions as a visual representation learner and a robotic manipulation model designed to learn representations by predicting future latent states without reconstructing pixels. The system enables the pretraining of video encoders that learn temporally consistent features through masked-token prediction and multi-modal tokenization. It further maps these latent embeddings to specific physical movements via action-conditioned post-training to plan and execute robot arm grasping and picking task
F1: A Vision Language Action Model Bridging Understanding and Generation to Actions
InternVLA-A1: Unifying Understanding, Generation, and Action for Robotic Manipulation
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification (NeurIPS 2025)
Implementation of RA-L (2026) paper: VLA-Touch: Enhancing Vision-Language-Action Models with Dual-Level Tactile Feedback
Vision-Language-Action Optimization with Trajectory Ensemble Voting
This is the official repository for villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models.
VITRA: Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos (ICRA 2026)
Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding.
giga-brain-0 is a robot action model framework designed to train and deploy neural networks that map multi-modal sensor data to physical robot control signals. It functions as a robot manipulation controller that processes high-dimensional observations to execute dexterous, long-horizon physical tasks. The project provides a multi-modal robot inference server using a client-server architecture to stream real-time vision and language observations for instant action prediction. It includes an embodiment fine-tuning pipeline to adapt pre-trained base models to specific robot hardware configurati
LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model (ICRA 2026)