rllm is an asynchronous reinforcement learning framework for training language agents. It provides a unified pipeline that runs the same agent code for both evaluation and training, automatically capturing traces for gradient computation. The framework supports distributed reinforcement learning across multiple GPUs and nodes through pluggable backends, and it executes agents in isolated sandboxes, either locally or in the cloud, for safe and scalable rollout collection. It trains agents built with LangGraph, SmolAgents, the OpenAI Agents SDK, or custom frameworks without requiring changes to their core logic.
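The idea of running the same agent code for evaluation and training can be sketched as follows. This is a minimal illustration, not rllm's API: `agent`, `with_tracing`, and `fake_llm` are hypothetical names, and the model is a stand-in function. The agent only sees a callable, so a recording wrapper can capture every call without touching the agent's logic.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]


def agent(question: str, llm: LLM) -> str:
    """Hypothetical two-step agent; it knows nothing about training."""
    draft = llm(f"Solve: {question}")
    return llm(f"Check and finalize: {draft}")


def with_tracing(llm: LLM, trace: List[Dict[str, str]]) -> LLM:
    """Wrap a model callable so that each call is appended to `trace`."""
    def traced(prompt: str) -> str:
        response = llm(prompt)
        trace.append({"prompt": prompt, "response": response})
        return response
    return traced


def fake_llm(prompt: str) -> str:
    """Stand-in for a real inference call (assumption for this sketch)."""
    return prompt.upper()


# Evaluation: call the agent directly.
eval_answer = agent("2 + 2", fake_llm)

# Training: same agent code, but every model call is recorded.
trace: List[Dict[str, str]] = []
train_answer = agent("2 + 2", with_tracing(fake_llm, trace))
```

In a real system the recorded entries would also carry token IDs and log-probabilities; the sketch keeps only prompt and response text.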
rllm natively orchestrates multi-agent training: collaborative workflows such as solver-judge pairs learn from shared or competing trajectories, with rewards differentiated per agent role. It includes a library of over 50 curated benchmarks spanning math, code, QA, and vision, along with a suite of pre-built reward functions and graders. To reduce rollout latency, it uses pre-provisioned sandbox queues and startup snapshot caching. A transparent HTTP proxy captures token-level data from any inference request without modifying agent code.
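As an illustration of per-role rewards in a solver-judge workflow, here is a minimal sketch. It assumes one particular scoring rule (exact-match correctness for each solver, and credit for the judge only when the answer it selects is correct); the `Episode` type and `role_rewards` function are hypothetical and do not come from rllm.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    solver_answers: List[str]  # one candidate answer per solver rollout
    judge_choice: int          # index of the answer the judge selected
    ground_truth: str


def role_rewards(ep: Episode) -> Dict[str, List[float]]:
    """Score each role with its own rule (assumed rules, for illustration)."""
    # Solvers: 1.0 for an exact match with the ground truth, else 0.0.
    solver = [
        1.0 if answer.strip() == ep.ground_truth else 0.0
        for answer in ep.solver_answers
    ]
    # Judge: rewarded only if the answer it picked is correct.
    judge = [solver[ep.judge_choice]]
    return {"solver": solver, "judge": judge}


example = Episode(solver_answers=["42", "41"], judge_choice=1, ground_truth="42")
rewards = role_rewards(example)
```

Here the first solver is rewarded for a correct answer while the judge receives no reward for selecting the wrong candidate, so the two roles get different learning signals from the same episode.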
Beyond its core training capability, rllm offers a CLI for launching training and evaluation jobs with automated dataset handling. It supports progressive context length scaling, parameter-efficient fine-tuning via LoRA, and multimodal model training. It also integrates AI-backed run analysis, real-time monitoring through a web dashboard, and full-text search across training artifacts. The pluggable backend interface and environment-variable-driven configuration allow switching between Ray-distributed, managed-service, and single-machine backends without code changes. Curated dataset management and custom dataset integration methods make it straightforward to bring new tasks into the training workflow.
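Environment-variable-driven backend selection can be sketched in general terms as a small registry lookup. Everything here is an assumption made for illustration: the variable name `RL_BACKEND`, the backend keys, and the launcher functions are hypothetical and are not rllm's actual configuration.

```python
import os
from typing import Callable, Dict, Mapping


def launch_ray(job: str) -> str:
    return f"ray:{job}"  # placeholder for a distributed launch


def launch_managed(job: str) -> str:
    return f"managed:{job}"  # placeholder for a managed-service launch


def launch_local(job: str) -> str:
    return f"local:{job}"  # placeholder for a single-machine launch


# Hypothetical registry mapping a backend name to its launcher.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "ray": launch_ray,
    "managed": launch_managed,
    "local": launch_local,
}


def select_backend(env: Mapping[str, str] = os.environ) -> Callable[[str], str]:
    """Pick a launcher from the (hypothetical) RL_BACKEND variable."""
    name = env.get("RL_BACKEND", "local").lower()
    if name not in BACKENDS:
        raise ValueError(f"unknown backend {name!r}; expected one of {sorted(BACKENDS)}")
    return BACKENDS[name]


# The calling code is identical regardless of which backend is configured.
result = select_backend({"RL_BACKEND": "ray"})("train-job")
default_result = select_backend({})("train-job")
```

Because the training script only ever calls the launcher returned by `select_backend`, changing the environment variable is enough to move between backends.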