© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Open R1 | Awesome Repository

huggingface/open-r1

25,887 stars · 2,412 forks · Python · Apache-2.0

Open R1

Features

  • Code-Integrated Training Frameworks - Integrates code execution environments during training to verify outputs and improve problem-solving accuracy.
  • Large Scale Training Suites - Orchestrates distributed training jobs across multi-node computing clusters to scale the development of high-performance language models.
  • Reasoning Model Training Suites - Provides a collection of tools and workflows for distilling, fine-tuning, and optimizing large language models for complex reasoning and coding tasks.
  • Distributed Training Orchestrators - Manages multi-node GPU clusters to execute large-scale model training and reinforcement learning jobs.
  • Model Evaluation Frameworks - Provides a standardized suite of benchmarks and testing tools designed to measure performance on mathematical, logical, and programming problem-solving tasks.
  • Reinforcement Learning Optimizers - Improves reasoning capabilities by optimizing model policies against output-derived rewards.
  • Synthetic Data Pipelines - Provides a framework for generating, filtering, and curating high-quality training datasets through model distillation and automated data-integrity checks.
  • Distributed Training Managers - Configures and scales distributed training jobs across multiple nodes on computing clusters.
  • Model Benchmarking Suites - Measures model performance on reasoning, mathematics, and coding tasks by running a suite of standardized benchmarks against model outputs.
  • Model Distillation Frameworks - Trains smaller student models by leveraging reasoning traces and high-quality outputs generated by larger, more capable teacher models.
  • Model Distillation Pipelines - Trains smaller models on outputs from larger reasoning models to improve task-specific performance.
  • Sandboxed Code Execution Environments - Provides secure infrastructure for running and validating model-generated code snippets inside isolated containers during training and evaluation workflows.
  • Secure Execution Environments - Runs model-generated code in isolated sandboxes to validate programming and mathematical solutions during training or evaluation workflows.
  • Sandboxed Execution Environments - Provides secure, isolated environments for executing code to validate model outputs against test cases.
  • Execution Sandboxes - Runs untrusted model-generated code within isolated environments to verify correctness against test cases during training and evaluation.
  • Synthetic Data Generators - Creates synthetic training data by sampling reasoning traces from large reasoning models, producing high-quality datasets for fine-tuning and distillation workflows.
  • Training Data Curators - Cleans, filters, and synthesizes high-quality datasets to ensure data integrity and improve performance on specialized reasoning tasks.
  • Model Inference Clusters - Deploys and serves large language models on high-performance computing clusters with multi-node GPU support.
  • External Execution Providers - Connects external code execution services to run and verify model-generated code snippets within a secure, isolated environment for training and evaluation tasks.
  • Reinforcement Learning Data Filters - Applies specific criteria to model outputs to select high-quality or relevant reasoning traces for reinforcement learning.
  • Training Configuration Frameworks - Uses modular configuration files to define and reproduce complex data processing, distillation, and reinforcement learning workflows.
  • Containerized Worker Orchestrators - Manages scalable, isolated worker processes in containers with configurable resource limits for code evaluation.
  • Training Workflow Orchestrators - Streamlines the reproduction of model performance using pre-configured training and evaluation workflows.
  • Data Decontamination Tools - Removes overlapping data samples from training datasets by comparing them against reference sets to ensure data integrity.
  • Dataset Filtering Pipelines - Applies automated criteria to training datasets to remove noise and ensure only high-quality reasoning samples are used for model updates.
  • Post-Training Configuration Recipes - Defines post-training workflows using predefined recipes for model distillation and fine-tuning.
  • Cluster Job Schedulers - Deploys and manages code execution workers across computing clusters using the Slurm workload manager.

Open R1 is a framework for large-scale training, distillation, and optimization of language models for complex reasoning and programming tasks. It provides tools for managing distributed training jobs across multi-node clusters and for building high-performance models with supervised fine-tuning and reinforcement learning.

What distinguishes the project is that secure, containerized code execution is built directly into the training and evaluation lifecycle. Running model-generated code against test cases gives a verifiable signal of correctness, which the framework uses to improve accuracy on mathematical, logical, and programming problems. Reasoning is further strengthened through Group Relative Policy Optimization (GRPO) and automated synthetic data pipelines that curate and filter high-quality reasoning traces for model updates.
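The execution-based check described above can be sketched as a reward that scores a program by the fraction of test cases it passes. This is a minimal illustration under stated assumptions, not Open R1's implementation: the function name `code_pass_rate`, the `(stdin, expected_stdout)` test-case format, and the 5-second timeout are hypothetical, and a plain `subprocess` call provides no isolation, unlike the sandboxed environments the project relies on.

```python
import subprocess
import sys


def code_pass_rate(code: str, test_cases: list[tuple[str, str]], timeout: float = 5.0) -> float:
    """Return the fraction of (stdin, expected_stdout) pairs the program passes.

    Hypothetical helper for illustration. WARNING: subprocess is NOT a sandbox;
    untrusted model output should run in an isolated environment instead.
    """
    if not test_cases:
        return 0.0
    passed = 0
    for stdin_text, expected in test_cases:
        try:
            result = subprocess.run(
                [sys.executable, "-c", code],  # run the candidate program with this interpreter
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            continue  # a program that hangs fails this test case
        # A test passes only if the program exits cleanly and prints the expected output.
        if result.returncode == 0 and result.stdout.strip() == expected.strip():
            passed += 1
    return passed / len(test_cases)


solution = "a, b = map(int, input().split())\nprint(a + b)"
cases = [("1 2\n", "3"), ("10 -4\n", "6"), ("0 0\n", "1")]  # the last expectation is deliberately wrong
print(code_pass_rate(solution, cases))  # 2 of 3 cases pass
```

A fractional score rewards partially correct programs; a stricter variant could return 1.0 only when every case passes.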
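In GRPO, each prompt is sampled several times and each completion's reward is normalized against the other completions in its group, A_i = (r_i - mean(r)) / (std(r) + ε), so no separate value model is needed. The sketch below computes only that advantage step. The function name and the `eps` value are illustrative assumptions, and implementations differ on whether they use the population or the sample standard deviation (this sketch uses the population form).

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each completion's reward against its group (hypothetical helper)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt: two scored 1.0, two scored 0.0.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
```

When every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal, which is one reason reward design and data filtering matter for this method.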

The system uses modular, configuration-driven recipes to streamline complex workflows, including data decontamination, dataset composition, and multi-node orchestration, which keeps training runs reproducible. Standardized benchmarking tools measure performance across reasoning and coding domains. The framework covers the full model-improvement lifecycle, from initial synthetic data generation to final performance evaluation on high-performance computing clusters.
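Data decontamination is commonly done by discarding training samples that share long n-grams with benchmark data. The sketch below illustrates that idea under assumptions: whitespace tokenization, lowercasing, and n = 8 are choices made here, the function names `ngrams` and `decontaminate` are hypothetical, and this is not presented as the project's exact procedure.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase whitespace-token n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(train: list[str], benchmark: list[str], n: int = 8) -> list[str]:
    """Keep only training samples that share no n-gram with any benchmark sample."""
    reference: set[tuple[str, ...]] = set()
    for sample in benchmark:
        reference |= ngrams(sample, n)
    return [s for s in train if not (ngrams(s, n) & reference)]


benchmark = ["What is the sum of the first ten positive integers"]
train = [
    "Compute: what is the sum of the first ten positive integers please",
    "Prove that the square root of two is irrational",
]
print(decontaminate(train, benchmark))  # only the second sample survives
```

Samples shorter than n tokens produce no n-grams and are therefore always kept, so a real pipeline may need a separate check for very short texts.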