# hiyouga/easyr1

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/hiyouga-easyr1).**

5,034 stars · 372 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/hiyouga/EasyR1
- Homepage: https://verl.readthedocs.io
- awesome-repositories: https://awesome-repositories.com/repository/hiyouga-easyr1.md

## Description

EasyR1 is a distributed reinforcement learning framework for training large language models and vision-language models. It serves as a multimodal trainer and implements a Proximal Policy Optimization (PPO) pipeline designed to refine the reasoning and perception capabilities of models that process both text and images.
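To make the Proximal Policy Optimization step concrete, here is a minimal sketch of the standard clipped surrogate loss in plain Python. It is a generic illustration of the technique, not code taken from EasyR1; the function name `ppo_clipped_loss` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean PPO clipped-surrogate loss over a batch of samples.

    Hypothetical helper for illustration; inputs are plain lists of floats.
    """
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not advantages:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)

    # Negate because optimizers minimize, while PPO maximizes the surrogate.
    return -total / len(advantages)
```

Clipping removes the incentive to move the policy far from the one that generated the samples: once the ratio leaves the allowed band in the direction the advantage favors, the objective stops improving.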

The system distributes reinforcement learning workloads across multiple compute nodes to handle high memory requirements. It uses padding-free training and fine-tuning techniques to improve hardware utilization and fit large models onto the available GPUs.
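The idea behind padding-free training can be sketched as follows: instead of padding every sequence to the longest one in the batch, the sequences are concatenated into one flat buffer and their boundaries are recorded as cumulative offsets. This is a generic illustration under that assumption, not EasyR1's actual implementation; `pack_sequences` and `padded_size` are hypothetical names.

```python
def pack_sequences(sequences):
    """Concatenate variable-length token lists into one flat list.

    Returns the flat list and the cumulative sequence boundaries, so that
    sequence i occupies flat[cu_seqlens[i]:cu_seqlens[i + 1]].
    """
    flat = []
    cu_seqlens = [0]
    for seq in sequences:
        flat.extend(seq)
        cu_seqlens.append(len(flat))
    return flat, cu_seqlens


def padded_size(sequences):
    """Number of token slots a padded batch would need, for comparison."""
    return len(sequences) * max(len(seq) for seq in sequences)


batch = [[1, 2, 3], [4], [5, 6]]
flat, cu_seqlens = pack_sequences(batch)
print(len(flat), "packed slots vs", padded_size(batch), "padded slots")
```

The boundary offsets let an attention kernel keep tokens from different sequences separate, so no compute is spent on padding positions.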

The framework covers reinforcement learning and reward model orchestration, including reinforcement learning from human feedback (RLHF) workflows. It also supports distributed data parallelism, mixed precision training, and multimodal input pipelines for interleaved text and image data.

The project includes utilities for checkpoint-based state recovery and integrates with external logging tools for tracking training progress and performance metrics.
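Checkpoint-based recovery generally follows a save-then-resume pattern, sketched below with the standard library only. This is an assumption-laden illustration rather than EasyR1's checkpoint format: a real trainer would store model weights and optimizer state as tensors, whereas this example uses a small JSON payload, and `save_checkpoint` and `load_latest_checkpoint` are hypothetical names.

```python
import json
import os
import tempfile


def save_checkpoint(directory, step, state):
    """Write the state for `step` atomically so a crash leaves no partial file."""
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, f"step_{step:08d}.json")
    # Write to a temporary file first, then rename it into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp_path, final_path)
    return final_path


def load_latest_checkpoint(directory):
    """Return (step, state) from the newest checkpoint, or (0, None) if none exists."""
    if not os.path.isdir(directory):
        return 0, None
    # Zero-padded step numbers make lexicographic order match numeric order.
    names = sorted(
        name for name in os.listdir(directory)
        if name.startswith("step_") and name.endswith(".json")
    )
    if not names:
        return 0, None
    with open(os.path.join(directory, names[-1])) as f:
        payload = json.load(f)
    return payload["step"], payload["state"]
```

On restart, a training loop would call `load_latest_checkpoint` and continue from the returned step instead of starting from zero.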

## Tags

### Artificial Intelligence & ML

- [Multimodal Reinforcement Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/large-language-models/reinforcement-learning-alignment/multimodal-reinforcement-learning.md) — Uses RL algorithms to refine the outputs of vision and language models through scalable training.
- [Data-Parallel Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/data-parallel-training.md) — Synchronizes model gradients across multiple compute nodes to enable training of models exceeding single-node memory.
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/distributed-training.md) — Provides a framework for distributing large-model RL workloads across multiple compute nodes.
- [Multi-Node Training Scaling](https://awesome-repositories.com/f/artificial-intelligence-ml/gpu-model-deployments/multi-node-training-scaling.md) — Distributes large-scale model training across multiple hardware nodes to manage GPU resources and memory. ([source](https://github.com/hiyouga/easyr1#readme))
- [Multimodal Model Trainers](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-trainers/multimodal-training-interfaces/multimodal-model-trainers.md) — Implements a training pipeline designed to optimize reasoning and perception in multimodal vision-language models.
- [Multimodal Training Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-trainers/multimodal-training-interfaces/multimodal-training-pipelines.md) — Provides end-to-end workflows for processing interleaved text and image data streams for vision-language model training.
- [Distributed Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/distributed-training.md) — Scales the training of large language models across multiple compute nodes to increase processing speed.
- [RL Post-Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/large-language-model-training-frameworks/rl-post-training.md) — Offers a scalable system for RL post-training of large language and vision-language models.
- [Vision-Language Training](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training-frameworks/vision-model-training/vision-language-training.md) — Runs reinforcement learning pipelines to improve reasoning and perception in models processing both text and images.
- [Reinforcement Learning Training Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-training-pipelines.md) — Orchestrates scalable reinforcement learning pipelines to improve reasoning in multimodal models. ([source](https://github.com/hiyouga/easyr1#readme))
- [RLHF Training Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/rlhf-alignment-algorithms/rlhf-training-pipelines.md) — Coordinates the interaction between policy models, reward models, and value functions for iterative model refinement.
- [Checkpoint-Based Recovery](https://awesome-repositories.com/f/artificial-intelligence-ml/checkpoint-based-recovery.md) — Implements mechanisms to save and restore model weights and optimizer states for training stability.
- [Sequence Packing](https://awesome-repositories.com/f/artificial-intelligence-ml/convolutional-operations/input-padding-utilities/padding-maskers/sequence-padding-utilities/sequence-packing.md) — Packs variable-length sequences into single dense tensors to eliminate wasteful compute cycles during training.
- [Mixed Precision Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/distributed-and-accelerated-compute/training-acceleration-tools/mixed-precision-training.md) — Employs mixed-precision floating point formats to reduce graphics memory usage and accelerate training.
- [Large Model Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/inference-optimizations/large-model-optimizations.md) — Reduces hardware requirements through padding-free training and fine-tuning to fit large models on available GPUs.
- [Reinforcement Learning Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-optimizers.md) — Executes reinforcement learning algorithms using text and image datasets to refine model outputs. ([source](https://github.com/hiyouga/easyr1#readme))
- [Training Checkpointing](https://awesome-repositories.com/f/artificial-intelligence-ml/training-checkpointing.md) — Saves training progress and state to ensure fault tolerance and the ability to resume training. ([source](https://github.com/hiyouga/easyr1#readme))

### Part of an Awesome List

- [PPO Implementations](https://awesome-repositories.com/f/awesome-lists/ai/training-and-alignment/proximal-policy-optimization-alignment/ppo-implementations.md) — Provides a concrete implementation of Proximal Policy Optimization for refining generative multimodal models.
- [Reasoning Models](https://awesome-repositories.com/f/awesome-lists/ai/reasoning-models.md) — User-friendly framework for reasoning model training.
- [Reinforcement Learning Frameworks](https://awesome-repositories.com/f/awesome-lists/ai/reinforcement-learning-frameworks.md) — Simplified training pipeline for reasoning-focused models.

### Data & Databases

- [Training Memory Optimizers](https://awesome-repositories.com/f/data-databases/memory-optimization-strategies/training-memory-optimizers.md) — Implements padding-free training and fine-tuning techniques to reduce graphics memory requirements for large-scale model training. ([source](https://github.com/hiyouga/easyr1#readme))
