EasyR1 is a distributed training system and reinforcement learning framework for large language and vision-language models. It serves as a multimodal trainer and implements a Proximal Policy Optimization (PPO) pipeline designed to refine the reasoning and perception capabilities of models that process both text and images. The system distributes reinforcement learning workloads across multiple compute nodes to manage high memory requirements. It optimizes hardware utilization through padding-free training and fine-tuning to fit large models onto available graphics
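
For readers unfamiliar with Proximal Policy Optimization, the clipped surrogate objective at its core can be sketched as follows. This is a minimal per-sample illustration of the standard PPO clipping rule, not code taken from EasyR1; the function name `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate (hypothetical helper, for illustration).

    logp_new / logp_old are log-probabilities of the same action under the
    current and the behavior policy; advantage is the estimated advantage.
    """
    # Probability ratio between the new and old policy.
    ratio = math.exp(logp_new - logp_old)
    # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)
```

A trainer would typically maximize the mean of this quantity over a batch; the clipping limits how far one update can move the policy away from the one that generated the samples.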
AReaL is a system for agent orchestration, distributed model training, and parameter-efficient tuning. It provides a framework for developing multi-turn reasoning agents and training large models using reinforcement learning from human feedback. The project implements a toolkit for improving the visual reasoning and geometry problem solving capabilities of vision-language models. It utilizes a memory-efficient tuning system to optimize mathematical and reasoning models across different inference backends. The infrastructure supports large-scale training through tensor, pipeline, and expert p
🚀 Reinforcement Learning for Language Agents🌟
TinyZero is a reinforcement learning framework for training language models to develop reasoning and self-verification abilities. It provides a training pipeline to optimize model performance on mathematical and logical tasks. The project serves as a minimal reproduction of the DeepSeek R1 architectural and training approach. It focuses on creating reasoning models that can solve structured problems through autonomous chain-of-thought discovery. The framework incorporates Group Relative Policy Optimization (GRPO) and reward-based self-correction to improve accuracy on logica
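
The group-relative part of group relative policy optimization can be sketched in a few lines: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and spread of its own group. The snippet below is an illustrative sketch only, not TinyZero's code; the function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration: returns (r - mean) / (std + eps)
    for each reward r, using the population standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers to one prompt, two judged correct (reward 1.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive positive advantages and incorrect ones negative, so no separate value network is needed to form a baseline. When all responses in a group score the same, every advantage is zero and the group contributes no learning signal.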