Open R1

Open R1 is a framework for large-scale training, distillation, and optimization of language models focused on complex reasoning and programming tasks. It provides tools for managing distributed training jobs across multi-node clusters, enabling the development of high-performance models through reinforcement learning and supervised fine-tuning.

The project distinguishes itself by integrating secure, containerized code execution environments directly into the training and evaluation lifecycle. By allowing models to run and verify code snippets against test cases, the framework improves accuracy in mathematical and logical problem-solving. It further supports advanced reasoning capabilities through group relative policy optimization and automated synthetic data pipelines, which curate and filter high-quality reasoning traces for model updates.
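To illustrate how running model-generated code against test cases can yield a training signal, here is a minimal sketch. It is not the project's implementation: the function names `run_snippet` and `code_reward` are hypothetical, the test-case format (stdin text paired with expected stdout) is an assumption, and a plain subprocess with a timeout stands in for the isolated sandbox a real setup would use.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_snippet(code: str, stdin_text: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run `code` in a separate Python process and return its stdout.

    Returns None if the process times out or exits with a non-zero status.
    NOTE: a subprocess is NOT a security sandbox; it only stands in for one here.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout


def code_reward(code: str, test_cases: List[Tuple[str, str]]) -> float:
    """Return the fraction of (stdin, expected_stdout) test cases the code passes."""
    if not test_cases:
        return 0.0
    passed = 0
    for stdin_text, expected in test_cases:
        output = run_snippet(code, stdin_text)
        if output is not None and output.strip() == expected.strip():
            passed += 1
    return passed / len(test_cases)
```

A reward like this is a fraction between 0 and 1, so partially correct programs still receive some signal. Whether to use a fractional or an all-or-nothing reward is a design choice this sketch does not settle.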

The system utilizes modular, configuration-driven recipes to streamline complex workflows, including data decontamination, dataset composition, and multi-node orchestration. It includes standardized benchmarking tools to measure performance across reasoning and coding domains, ensuring that training processes remain reproducible and data-centric. The framework is built to handle the full lifecycle of model improvement, from initial synthetic data generation to final performance evaluation on high-performance computing clusters.
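The data decontamination step mentioned above can be sketched as an n-gram overlap check between training samples and a reference set such as benchmark problems. This is a simplified illustration under stated assumptions: the helper names are hypothetical, tokenization is plain whitespace splitting, and the default n-gram size of 8 is an arbitrary choice rather than a value taken from the project.

```python
from typing import Iterable, List, Set, Tuple


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text` (empty if text is shorter than n words)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(train: Iterable[str], reference: Iterable[str], n: int = 8) -> List[str]:
    """Keep only training samples that share no word n-gram with any reference sample."""
    reference_ngrams: Set[Tuple[str, ...]] = set()
    for sample in reference:
        reference_ngrams |= word_ngrams(sample, n)
    return [s for s in train if word_ngrams(s, n).isdisjoint(reference_ngrams)]
```

Samples shorter than `n` words produce no n-grams and are therefore always kept, so a production pipeline would likely handle short samples separately.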

Features

Code-Integrated Training Frameworks - Integrates code execution environments during training to verify outputs and improve problem-solving accuracy.
Large Scale Training Suites - Orchestrates distributed training jobs across multi-node computing clusters to scale the development of high-performance language models.
Reasoning Model Training Suites - Provides a collection of tools and workflows for distilling, fine-tuning, and optimizing large language models for complex reasoning and coding tasks.
Distributed Training Orchestrators - Manages multi-node GPU clusters to execute large-scale model training and reinforcement learning jobs.
Model Evaluation Frameworks - Provides a standardized suite of benchmarks and testing tools designed to measure performance on mathematical, logical, and programming problem-solving tasks.
Reinforcement Learning Optimizers - Improves reasoning capabilities by optimizing model policies against output-derived rewards.
Synthetic Data Pipelines - Provides a framework for generating, filtering, and curating high-quality training datasets through model distillation and automated data integrity verification processes.
Distributed Training Managers - Configures and scales distributed training jobs across multiple nodes on computing clusters.
Model Benchmarking Suites - Measures model performance on reasoning, mathematics, and coding tasks by running a suite of standardized benchmarks against model outputs.
Model Distillation Frameworks - Trains smaller student models by leveraging reasoning traces and high-quality outputs generated by larger, more capable teacher models.
Model Distillation Pipelines - Trains smaller models on outputs from larger reasoning models to improve task-specific performance.
Secure Execution Environments - Runs model-generated code in isolated sandboxes to validate programming and mathematical solutions during training or evaluation workflows.
Sandboxed Execution Environments - Provides secure, isolated environments for executing code to validate model outputs against test cases.
Synthetic Data Generators - Creates synthetic training data using distilled reasoning models to produce high-quality datasets for fine-tuning and model improvement workflows.
Training Data Curators - Cleans, filters, and synthesizes high-quality datasets to ensure model integrity and improve performance on specialized reasoning tasks.
Code and Formal Reasoning - Fully open reproduction of reasoning-focused language models.
Model Implementations - Fully open-source reproduction of advanced reasoning models.
Reasoning Datasets - Fully open reproduction of reasoning model datasets.
Reasoning Models - Open-source reproduction of reasoning model training pipelines.
Model Inference Clusters - Deploys and serves large language models on high-performance computing clusters with multi-node GPU support.
External Execution Providers - Connects external code execution services to run and verify model-generated code snippets within a secure, isolated environment for training and evaluation tasks.
Reinforcement Learning Data Filters - Applies specific criteria to model outputs to select high-quality or relevant reasoning traces for reinforcement learning.
Training Configuration Frameworks - Uses modular configuration files to define and reproduce complex data processing, distillation, and reinforcement learning workflows.
Containerized Worker Orchestrators - Manages scalable, isolated worker processes in containers with configurable resource limits for code evaluation.
Training Workflow Orchestrators - Streamlines the reproduction of model performance using pre-configured training and evaluation workflows.
Data Decontamination Tools - Removes overlapping data samples from training datasets by comparing them against reference sets to ensure data integrity.
Dataset Filtering Pipelines - Applies automated criteria to training datasets to remove noise and ensure only high-quality reasoning samples are used for model updates.
Post-Training Configuration Recipes - Defines post-training workflows using predefined recipes for model distillation and fine-tuning.
Cluster Job Schedulers - Deploys and manages code execution workers across computing clusters using Slurm orchestration.
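
The group relative policy optimization mentioned in the overview scores each sampled completion relative to the other completions drawn for the same prompt. The sketch below shows only that group-normalization step, not a full training loop. The function name is hypothetical, and the use of the population standard deviation and the `eps` value are assumptions; implementations differ on both.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize the rewards of one prompt's completions to zero mean and unit scale.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` avoids division by zero when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When all completions in a group receive the same reward, every advantage is zero, so that group contributes no learning signal.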

PeterGriffinJin/Search-R1

5,022 stars on GitHub

Search-R1 is a distributed training system and reinforcement learning framework designed to create search-augmented language models. It provides an architecture for scaling model workloads across head and worker nodes while optimizing how models interleave internal reasoning with external tool calls. The system focuses on refining model behavior through custom reward signals and reinforcement learning to improve tool-use formatting and information retrieval. It implements an interleaved reasoning-search loop that allows models to alternate between internal thought generation and external data

Jiayi-Pan/TinyZero

13,168 stars on GitHub

TinyZero is a reinforcement learning framework and implementation designed to train language models to develop reasoning and self-verification abilities. It provides a training pipeline to optimize model performance on mathematical and logical tasks. The project serves as a minimal reproduction of the DeepSeek R1 architectural and training approach. It focuses on creating reasoning models that can solve structured problems through autonomous chain-of-thought discovery. The framework incorporates group relative policy optimization and reward-based self-correction to improve accuracy on logica

deepseek-ai/DeepSeek-R1

91,996 stars on GitHub

DeepSeek-R1 is an open-weights large language model focused on advanced reasoning. It uses chain-of-thought processing and internal monologues to solve complex mathematical and logical problems by breaking tasks into sequential, verifiable thought processes. The model is developed using reinforcement learning to optimize reasoning patterns and verify logical steps. It employs a distillation process to transfer these high-performance logic capabilities from a large teacher model into smaller, computationally efficient versions. The training framework incorporates group relative policy optimiz

OpenPipe/ART

8,630 stars on GitHub

ART is a platform for agentic training, providing a reinforcement learning framework, training environment, and compute orchestrator. It enables the improvement of multi-step agent reasoning and tool usage through group relative policy optimization and a judge-based reward modeling system. The project features tools for model distillation to transfer capabilities from large teacher models to smaller architectures, as well as a system for capturing execution trajectories to generate synthetic training data. It supports specialized training workflows including supervised fine-tuning for baselin
