30 open-source projects similar to hiyouga/easyr1, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best EasyR1 alternative.
xtuner is a comprehensive training engine for large language models, offering a toolkit for pre-training, supervised fine-tuning, and the optimization of vision-language multimodal models. It serves as a distributed training accelerator and a specialized framework for scaling Mixture-of-Experts models and aligning model behavior through reinforcement learning from human feedback. The project distinguishes itself through advanced memory and compute optimizations, such as sequence parallelism for ultra-long context windows and interleaved pipeline parallelism to reduce GPU idle time. It provide
This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across distributed GPU clusters. The repository distinguishes itself by offering deep-dive tutorials and implementation strategies for complex system challenges. It emphasizes high-performance architectural primitives, such as collective communication orchestration, distributed tensor sharding, and static gr
mmagic is a multimodal training pipeline and framework for generative AI, focusing on visual synthesis and restoration. It provides the infrastructure to build and train models for tasks such as text-to-image and text-to-video generation, 3D-aware content synthesis, and high-fidelity image translation using diffusion models and generative adversarial networks. The project distinguishes itself through specialized capabilities for generative model personalization, including techniques for fine-tuning subjects and styles. It also supports advanced visual manipulations such as latent space interp
Open CLIP is an open source framework for training and deploying Contrastive Language-Image Pre-training models. It serves as a vision-language training framework and multimodal embedding engine that maps images and text into a shared vector space for similarity searches and zero-shot classification. The project provides a toolkit for distributed training of contrastive models and includes an image-to-text generative model for producing natural language descriptions. It supports custom text encoder integration and utilizes teacher-student model distillation to transfer knowledge from large pr
Torchtune is a PyTorch-native library for fine-tuning, aligning, and quantizing large language models. It provides a configurable training pipeline orchestrated through YAML recipes, with CLI overrides and component swapping, distributed training via FSDP2, memory optimizations, and parameter-efficient fine-tuning methods like LoRA, DoRA, and QLoRA. The library distinguishes itself through its YAML-driven configuration system that defines all training parameters and instantiates components from config files, with full CLI override capability for any field or component at launch time. It suppo
AReaL is a system for agent orchestration, distributed model training, and parameter-efficient tuning. It provides a framework for developing multi-turn reasoning agents and training large models using reinforcement learning from human feedback. The project implements a toolkit for improving the visual reasoning and geometry problem solving capabilities of vision-language models. It utilizes a memory-efficient tuning system to optimize mathematical and reasoning models across different inference backends. The infrastructure supports large-scale training through tensor, pipeline, and expert p
Swin-Transformer is a deep learning framework designed for training and deploying hierarchical vision transformer models. It serves as a research library and toolkit for computer vision tasks, providing the infrastructure to build models that replace standard convolution operations with sliding window self-attention mechanisms. By utilizing a multi-scale feature hierarchy, the framework enables the processing of visual data at varying resolutions and spatial scales. The project distinguishes itself through its implementation of shifted window partitioning, which facilitates global information
MMF is a modular framework for building, training, and evaluating vision-and-language models. It provides a configuration-driven experiment system where model, dataset, and training parameters are defined through composable YAML files, alongside a curated model zoo of pretrained checkpoints for state-of-the-art multimodal architectures. The framework includes a multimodal dataset loader that downloads, processes, and batches vision-and-language data, and a vision-language model trainer supporting distributed training, mixed precision, and checkpoint-based resumption. The framework distinguish
SLIME is a distributed reinforcement learning framework for large language model post-training that bridges Megatron training with SGLang inference servers. It orchestrates scalable RL loops across GPU clusters, decoupling training and inference into independent processes that communicate over HTTP and NCCL for independent scaling and fault tolerance. The system supports multi-agent reinforcement learning workflows with parallel agent instances, customizable rollout strategies, and personalized agent serving that improves models from prior conversations without disrupting API serving. The fra
Torchtune is a PyTorch-native library for fine-tuning, aligning, and quantizing large language models. It provides a config-driven system for instantiating components, orchestrating distributed training, and managing parameter-efficient fine-tuning with quantization support, all through YAML-based configurations and command-line overrides. The library distinguishes itself through its comprehensive post-training workflow orchestration, combining supervised fine-tuning, preference optimization (DPO, PPO, GRPO), knowledge distillation, and quantization-aware training in a single configurable pip
RLinf is a distributed reinforcement learning orchestrator and embodied AI training framework. It provides the infrastructure to train vision-language-action models and robotic policies using a combination of reinforcement learning and supervised fine-tuning. The system is designed for scaling workloads across GPU clusters, managing the placement of actors, rollout workers, and environment components. It features a specialized robotics data collection pipeline for gathering teleoperated demonstrations and simulation trajectories into standardized replay buffers, alongside a hardware interface
Composer is a PyTorch distributed training framework designed for scaling large-scale models across multi-node GPU clusters. It functions as a large language model trainer, a distributed model optimizer, and a training lifecycle manager. The project differentiates itself as a deep learning regularization library, providing specialized optimization techniques such as Sharpness Aware Minimization, MixUp, and CutMix to improve model generalization. It further distinguishes its training flow through the use of sequence length warmup, progressive layer freezing, and sharded-state checkpointing for
This project is a distributed training infrastructure designed for aligning large language models through reinforcement learning. It functions as an end-to-end engine for complex alignment tasks, including proximal policy optimization, direct preference optimization, and iterative self-play. By providing a unified framework for multi-turn interactions and tool-use scenarios, it enables the development of models capable of reasoning and external environment engagement. The framework distinguishes itself through a decoupled architecture that separates model training from sample generation. This
TinyZero is a reinforcement learning framework and implementation designed to train language models to develop reasoning and self-verification abilities. It provides a training pipeline to optimize model performance on mathematical and logical tasks. The project serves as a minimal reproduction of the DeepSeek R1 architectural and training approach. It focuses on creating reasoning models that can solve structured problems through autonomous chain-of-thought discovery. The framework incorporates group relative policy optimization and reward-based self-correction to improve accuracy on logica
VLM-R1 is a reasoning vision-language model and embodied AI framework designed to map visual inputs and language instructions into physical navigation waypoints and robotic actions. It functions as a multimodal policy optimizer and an open vocabulary detector capable of locating objects based on arbitrary natural language descriptions. The system distinguishes itself through the use of chain-of-thought reasoning and reinforcement learning to solve complex visual and spatial tasks. It utilizes a video semantic memory system, which employs a visual cache to maintain a history of live video for
trlx is a reinforcement learning library and training framework designed to align large language models using human feedback. It serves as a distributed trainer and compute orchestrator for scaling high-parameter models across multiple GPUs and nodes. The project provides tools for reinforcement learning from human feedback and model alignment. It implements reward-model-based optimization and proximal policy optimization to refine model behavior based on goal-oriented rewards or human-labeled datasets. The framework covers distributed training strategies, including model parallelism, parame
This project is a collection of scripts and workflows for training, fine-tuning, and deploying large language models using the Hugging Face Transformers toolkit. It functions as a distributed training framework, a library for natural language processing task implementations, and a system for building retrieval-augmented generation chatbots. The repository includes specialized tools for model optimization, such as a Bayesian hyperparameter optimizer for automatically tuning model settings. It provides implementations for scaling model training across multiple graphics processors using data par
Fairseq is a PyTorch toolkit for sequence-to-sequence modeling, specializing in neural machine translation, automatic speech recognition, and large-scale language model training. It provides a framework for processing and aligning diverse data sources, including text, audio, and video, to support tasks such as speech-to-text conversion and multimodal sequence learning. The project is distinguished by its distributed training capabilities, which utilize parameter sharding, mixed-precision training, and CPU offloading to handle models that exceed single-device memory. It also includes specializ
AISystem is a comprehensive AI full-stack infrastructure project covering the entire pipeline from AI chip architecture to high-level training frameworks. It encompasses the development of AI compiler frameworks, inference engines, and distributed training orchestrators designed to coordinate workloads across a heterogeneous compute stack of CPUs, GPUs, and NPUs. The project focuses on the deep integration of software and hardware, employing software-hardware co-design to align tensor layouts with physical memory structures. It provides specialized capabilities for accelerating Transformer mo
verl is a distributed training system designed for large language model alignment and reinforcement learning. It provides a framework for executing post-training pipelines, including supervised fine-tuning and reinforcement learning from human feedback, to refine model behavior and agentic capabilities. The system utilizes a hybrid training and inference engine that optimizes memory and communication when switching between model generation and gradient updates. It supports multi-modal reinforcement learning for models processing both image and text data, and implements algorithms such as PPO
DeepSpeedExamples is a collection of reference implementations and scripts for training, fine-tuning, and executing inference on large-scale AI models using DeepSpeed optimization. It provides a distributed model training guide and practical workflows for adapting large language models through memory-efficient techniques. The repository includes specialized implementations for pipeline parallelism to handle models exceeding single GPU memory and a suite of examples for ZeRO memory optimization to reduce per-device overhead. It also features standardized test suites for benchmarking the throug
Swift is a toolkit for the full-parameter and parameter-efficient fine-tuning of large language and multimodal models. It functions as a multimodal model trainer for text, image, video, and audio data, and includes specialized tools for model compression and reinforcement learning from human feedback. The framework provides an alignment toolkit for optimizing model behavior using preference learning algorithms and reinforcement learning. It integrates parameter-efficient fine-tuning methods to adapt models with minimal memory and compute requirements, alongside utilities for reducing hardware
Megatron-LM is a distributed transformer training library and large language model training framework designed to scale models across thousands of GPUs. It functions as a GPU-optimized deep learning toolkit and a scaling engine for mixture-of-experts architectures, enabling the training of models with hundreds of billions of parameters. The project implements multi-dimensional model parallelism, combining tensor, pipeline, data, expert, and context-based workload distribution. It specifically optimizes mixture-of-experts architectures through integrated memory and communication improvements t
Accelerate is a PyTorch distributed training library that abstracts the boilerplate required to run models across multiple GPUs, TPUs, and CPUs. It functions as a deep learning model scaler and distributed hardware orchestrator, allowing the same training script to run on different hardware backends without modifying the core logic. The project provides a distributed training command line interface for configuring compute environments and launching jobs across single or multi-node clusters. It includes a mixed precision training framework to implement FP16 and BF16 precision, reducing memory
PyTorch Lightning is a high-level deep learning framework for PyTorch that automates training loops and removes repetitive engineering boilerplate. It functions as a structured pipeline for managing machine learning experiments, providing a distributed training orchestrator and tools for mixed-precision training. The framework decouples scientific model architecture from the engineering required for infrastructure and scaling. This separation allows the same model code to execute across CPUs, GPUs, or TPUs through a hardware-agnostic execution engine and a centralized trainer that manages the
This project is a comprehensive educational curriculum and structured learning path covering the full lifecycle of large language models. It provides a guided progression through the theory, architecture, training, and deployment of these models. The curriculum includes specialized guides on transformer architecture, model training tutorials, and frameworks for designing autonomous agents. It also provides dedicated resources for studying model safety and ethics. The material covers a wide range of technical capabilities, including distributed training strategies, parameter-efficient fine-tu
This project is a comprehensive educational resource and tutorial handbook for building, training, and deploying machine learning models using TensorFlow 2. It serves as a structured learning guide covering core deep learning concepts, including neural network architectures, automatic differentiation, and tensor operations. The handbook provides technical guidance on optimizing execution efficiency through GPU memory management, distributed training, and model quantization. It also includes detailed manuals for constructing high-performance data pipelines and exporting models for production s
Search-R1 is a distributed training system and reinforcement learning framework designed to create search-augmented language models. It provides an architecture for scaling model workloads across head and worker nodes while optimizing how models interleave internal reasoning with external tool calls. The system focuses on refining model behavior through custom reward signals and reinforcement learning to improve tool-use formatting and information retrieval. It implements an interleaved reasoning-search loop that allows models to alternate between internal thought generation and external data
OpenRLHF is a training framework and alignment library designed for reinforcement learning from human feedback across distributed GPU clusters. It provides tools for aligning large language models and multimodal vision-language models using algorithms such as PPO, GRPO, and DPO. The framework distinguishes itself through a distributed inference engine that overlaps sample rollout with training to increase throughput. It supports scaling to models exceeding 70 billion parameters via parameter sharding and handles long-context sequences through ring-attention sequence parallelism. The project