Interactive Post-Training for Vision-Language-Action Models
Official implementation of RIPT-VLA. Parts of the repo are built on a fork of QueST.
We provide examples for fine-tuning Octo on top of HIL-SERL, which provides the base environment for performing robotic manipulation tasks with human interventions. The following sections describe how to use our code.
This repository contains the code for the paper "What Can RL Bring to VLA Generalization? An Empirical Study". The pretrained checkpoints are available on HuggingFace.