verl

verl is a distributed training system designed for large language model alignment and reinforcement learning. It provides a framework for executing post-training pipelines, including supervised fine-tuning and reinforcement learning from human feedback, to refine model behavior and agentic capabilities.

The system utilizes a hybrid training and inference engine that optimizes memory and communication when switching between model generation and gradient updates. It supports multi-modal reinforcement learning for models processing both image and text data, and implements algorithms such as PPO and GRPO to align models using reward signals.
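As a rough illustration of the two algorithms named above, the sketch below computes GRPO-style group-relative advantages and a PPO-style clipped surrogate loss for a single sample. It is a minimal sketch in plain Python, not verl's implementation: the function names, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: score each response relative to its own group.

    `rewards` holds the scalar rewards of several responses sampled for the
    same prompt. Assumption: population std; some implementations use the
    sample std instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def ppo_clipped_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate loss for one sample (to be minimized)."""
    ratio = math.exp(logp_new - logp_old)  # pi_new / pi_old
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return -min(ratio * advantage, clipped * advantage)


# Four responses to one prompt, scored by a hypothetical reward function.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
# Hypothetical log-probabilities under the new and old policies.
losses = [ppo_clipped_loss(-1.0, -1.2, a) for a in advantages]
```

Responses that score above the group mean receive positive advantages and the rest negative ones, so no separate value model is needed to form a baseline in this sketch. The clipping term limits how far a single update can move the policy away from the one that generated the samples.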

The architecture focuses on distributed scaling through expert parallelism, device-aware placement mapping, and memory resharding. It further reduces resource overhead via low-rank adaptation and decoupled computation dataflows, while providing modular interfaces to integrate with various training and inference engines.
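To make the low-rank adaptation saving concrete, the following sketch counts trainable parameters for one weight matrix and forms the adapted weight W + (alpha / r) * B A on a toy example. It assumes the standard LoRA formulation; the helper names and the matrix sizes are hypothetical and do not come from verl.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two adapter matrices: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A); W itself stays frozen during training."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Parameter count for a hypothetical 4096 x 4096 projection with rank 16.
full_params = 4096 * 4096
adapter_params = lora_trainable_params(4096, 4096, 16)
fraction = adapter_params / full_params

# Toy adapted weight: d_out = 2, d_in = 3, rank = 1.
w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
a = [[1.0, 0.0, -1.0]]   # rank x d_in
b = [[1.0], [2.0]]       # d_out x rank
w_adapted = lora_effective_weight(w, a, b, alpha=2.0, rank=1)
```

In this example the adapter holds 131,072 trainable parameters against 16,777,216 in the full matrix, which is 1/128 of the original. Gradients and optimizer state are only needed for the adapter matrices, which is where most of the memory reduction comes from.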

The project includes tools for experiment tracking to log training metrics and performance data to external monitoring platforms.

Features

Distributed Training - Offers a distributed framework for scaling large language model training across multiple GPUs using expert parallelism.

Large-Scale Model Training - Enables training of models with hundreds of billions of parameters using expert parallelism and distributed backends.

Expert Parallelism Configurations - Provides an expert-parallel distributed backend to scale training for massive parameter counts across multiple GPU nodes.

Reinforcement Learning Alignment - Implements reinforcement learning algorithms like PPO and GRPO to align large language models using reward signals.

Memory Resharding - Reduces communication overhead by dynamically reconfiguring model partitions when switching between training and generation phases.

Sharded Device Mapping - Distributes model layers and tensors across multiple GPUs to improve hardware utilization and fit models that exceed a single GPU's memory.

Model Training and Inference Engines - Ships a hybrid engine that optimizes memory and communication when switching between model generation and gradient updates.

Memory Resharding Optimizations - Optimizes memory and communication during transitions between generation and gradient updates using a hybrid engine.

Post-Training Configuration Recipes - Executes post-training workflows including supervised fine-tuning and reinforcement learning to refine model behavior.

Reinforcement Learning Algorithms - Implements core reinforcement learning algorithms including PPO and GRPO for reward-based model alignment.

Reinforcement Learning Training Pipelines - Provides an orchestration pipeline for executing supervised fine-tuning and RLHF to improve model safety and accuracy.

Supervised Fine-Tuning - Provides supervised fine-tuning capabilities as a prerequisite or complement to reinforcement learning.

Model Alignment and Feedback - Facilitates large-scale model alignment using reward signals and reinforcement learning while minimizing communication costs.

Agentic Interaction Training - Enables training conversation models to handle multiple turns and tool calls using reinforcement learning for complex agentic behaviors.

RL Dataflow Construction - Implements decoupled computation dataflows to construct complex post-training reinforcement learning workflows.

Large Language Model Optimization - Optimizes the speed and efficiency of large language models through reinforcement learning post-training processes.

Model Integration Interfaces - Provides modular interfaces to integrate with various training and inference engines for scalable model execution.

Multi-Modal Reinforcement Learning - Supports reinforcement learning for models that process both image and text data.

Multi-Modal Training - Applies reinforcement learning to models processing images and text to improve performance across diverse data types.

Low-Rank Adaptation - Integrates low-rank adaptation (LoRA) to reduce memory footprint by training small low-rank adapter matrices while the base model weights stay frozen.

Modular Provider Interfaces - Provides standardized wrappers to connect diverse training and inference engines while maintaining backend independence.

Logic And Infrastructure Decoupling - Implements architectural separation between reinforcement learning algorithm logic and the underlying hardware execution and data movement.

Agentic Reinforcement Learning - Reinforcement learning framework for large models.

Fine-Tuning Frameworks - Reinforcement learning framework for large models.

Large Language Models - Reinforcement learning framework for LLMs.

Model Training - Scalable reinforcement learning framework for training reward models.

Model Training Frameworks - Flexible and efficient reinforcement learning framework for LLMs.

Reinforcement Learning - Industrial-level RLHF training framework for LLMs.

Reinforcement Learning Frameworks - Flexible and efficient framework for reinforcement learning from human feedback.

volcengine/verl
