# openpipe/art

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/openpipe-art).**

8,630 stars · 721 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/OpenPipe/ART
- Homepage: https://art.openpipe.ai
- awesome-repositories: https://awesome-repositories.com/repository/openpipe-art.md

## Topics

`agent` `agentic-ai` `grpo` `llms` `lora` `qwen` `qwen3` `reinforcement-learning` `rl`

## Description

ART is a platform for agentic training that combines a reinforcement learning framework, a training environment, and a compute orchestrator. It improves multi-step agent reasoning and tool use through group relative policy optimization (GRPO) and a judge-based reward modeling system.

The project includes tools for distilling capabilities from large teacher models into smaller architectures, and a system that captures execution trajectories to generate synthetic training data. It also supports supervised fine-tuning to establish a baseline before reinforcement learning, and the creation of reproducible task scenarios.

The infrastructure manages GPU compute resources via ephemeral environment provisioning and hybrid local-remote execution. It includes capabilities for trajectory-based data capture, model checkpoint management, and the routing of low-rank adaptations for inference.

The system provides observability through agent workflow scoring, compute cost monitoring, and training metric tracking.
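The group-relative comparison mentioned above can be illustrated with a small sketch. This is not ART's implementation; it assumes the commonly described GRPO normalization, where each trajectory's reward is compared against the mean and standard deviation of its own group. The function name `group_relative_advantages` and the `eps` parameter are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to the other rewards in the same group.

    Hypothetical helper: assumes mean/std normalization within one group
    of trajectories that were all sampled for the same scenario.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group
    # eps keeps the division defined when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three trajectories for one scenario: worst, middle, best.
advantages = group_relative_advantages([0.0, 0.5, 1.0])
```

Trajectories that beat their group's average get a positive advantage and the rest get a negative one, so the signal depends on relative rather than absolute reward.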

## Tags

### Artificial Intelligence & ML

- [Reinforcement Learning Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-optimizers.md) — Implements reinforcement learning optimizers, specifically group relative policy optimization, to refine agent reasoning and tool usage. ([source](https://art.openpipe.ai/integrations/langgraph-integration))
- [Agent Environments](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/development-runtime-environments/agent-environments.md) — Provides reproducible environments and standardized APIs for agents to practice complex tasks and tool usage. ([source](https://art.openpipe.ai/integrations/openenv-integration.md))
- [Agentic Training Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-training-frameworks.md) — Provides a platform for creating reproducible task scenarios and capturing execution trajectories for agent training.
- [Training Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/cross-hardware-model-inference/training-execution.md) — Executes model training and token generation across local GPUs and managed autoscaling clusters. ([source](https://art.openpipe.ai/getting-started/installation-setup.md))
- [Custom Evaluation Judges](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-evaluation-judges.md) — Defines custom evaluation criteria to guide judge models in prioritizing or penalizing specific response characteristics. ([source](https://art.openpipe.ai/fundamentals/ruler.md))
- [Synthetic Dataset Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/dataset-generation/synthetic-dataset-generators.md) — Generates synthetic training data by automatically creating diverse interaction tasks and task scenarios.
- [Group Relative Policy Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/group-relative-policy-optimization.md) — Implements group relative policy optimization to stabilize reinforcement learning by comparing rewards across trajectory groups.
- [Reward Functions](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/objectives-and-optimization/mathematical-training-objectives/reward-functions.md) — Merges relative rankings from judge models with hand-crafted scores to create composite performance signals. ([source](https://art.openpipe.ai/fundamentals/ruler.md))
- [Training Loop Managers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/pipelines-and-orchestration/training-orchestration-systems/training-loop-managers.md) — Coordinates the iterative cycle of parallel inference rollouts and model weight updates. ([source](https://cdn.jsdelivr.net/gh/openpipe/art@main/README.md))
- [Model Distillation Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/model-distillation-pipelines.md) — Provides pipelines to transfer capabilities from large teacher models to smaller architectures via distillation.
- [Reinforcement Learning Reward Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-reward-systems.md) — Implements a judge-based reward modeling system that ranks agent trajectories to provide RL signals.
- [Model-Based Trajectory Ranking](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-reward-systems/model-based-trajectory-ranking.md) — Compares multiple agent execution paths and assigns rewards using a model-based judge. ([source](https://art.openpipe.ai/fundamentals/ruler.md))
- [Reinforcement Learning Training](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-training.md) — Provides the execution engine for running training scenarios and updating model weights via reinforcement learning. ([source](https://art.openpipe.ai/fundamentals/art-client.md))
- [Group Relative Policy Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-training-pipelines/group-relative-policy-optimization.md) — Implements an iterative reinforcement learning loop using group relative policy optimization to refine agent reasoning. ([source](https://art.openpipe.ai/fundamentals/training-loop.md))
- [RL Loop Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/tool-integration-frameworks/rl-loop-integrations.md) — Connects reinforcement learning loops to orchestration tools and server protocols to optimize multi-step reasoning. ([source](https://cdn.jsdelivr.net/gh/openpipe/art@main/README.md))
- [Reward-Based Trajectory Management](https://awesome-repositories.com/f/artificial-intelligence-ml/training-data-generation/execution-trajectory-distillation/reward-based-trajectory-management.md) — Records sequences of system and user messages during rollouts to serve as data for reward-based optimization. ([source](https://art.openpipe.ai/fundamentals/training-loop.md))
- [Trajectory-Based Agent Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/trajectory-based-agent-optimization.md) — Refines model behavior by capturing and scoring sequences of tool calls and system messages.
- [Protocol Training](https://awesome-repositories.com/f/artificial-intelligence-ml/artificial-intelligence-tooling/agent-and-tool-integrations/mcp-server-integrations/protocol-training.md) — Teaches models how to interact with MCP servers to perform multi-step tool-based workflows. ([source](https://art.openpipe.ai/features/mcp-rl.md))
- [Conversation Branching Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/conversation-history-management/conversation-branching-systems.md) — Stores multiple branching conversation histories within a single trajectory to support sub-agent interactions and delegations.
- [Non-Linear Trajectory Tracking](https://awesome-repositories.com/f/artificial-intelligence-ml/conversation-history-management/non-linear-trajectory-tracking.md) — Stores multiple separate conversation histories within a single trajectory to support complex sub-agent interactions. ([source](https://art.openpipe.ai/resources/glossary.md))
- [Synthetic Scenario Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/dataset-generation/synthetic-dataset-generators/synthetic-scenario-generators.md) — Automatically generates diverse interaction tasks and edge cases to test external server integrations. ([source](https://art.openpipe.ai/features/mcp-rl.md))
- [Knowledge Distillation](https://awesome-repositories.com/f/artificial-intelligence-ml/knowledge-distillation.md) — Provides tools for transferring capabilities from large teacher models to smaller, more efficient architectures. ([source](https://art.openpipe.ai/fundamentals/sft-training.md))
- [Hybrid Execution Engines](https://awesome-repositories.com/f/artificial-intelligence-ml/local-model-execution/hybrid-execution-engines.md) — Enables execution of inference and training across both local hardware and remote GPU backends. ([source](https://art.openpipe.ai/fundamentals/art-client.md))
- [Local Model Training Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/local-model-integrations/local-model-training-integrations.md) — Runs training processes on user-owned hardware for a variety of open-weight model architectures. ([source](https://art.openpipe.ai/resources/models.md))
- [Training Progress Monitoring](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/machine-learning-training/utilities/training-progress-monitoring.md) — Tracks reward metrics and model performance over time to validate continuous improvement. ([source](https://art.openpipe.ai/getting-started/about.md))
- [Model Performance Benchmarking](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-evaluation-analysis/model-analysis/model-performance-benchmarking.md) — Benchmarks trained models against multiple baselines using validation sets to measure accuracy improvements. ([source](https://art.openpipe.ai/tutorials/summarizer.md))
- [Low-Rank Adaptation](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning/low-rank-adaptation.md) — Dynamically loads and serves low-rank adaptation weights to specialize agent behavior for different tasks.
- [Model Distillation Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/model-distillation-tools.md) — Includes tools for transferring knowledge from large teacher models to smaller, efficient student architectures.
- [LoRA Adapter Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/speech-to-text-integrations/unified-model-interfaces/llm-completion-interfaces/lora-adapter-interfaces.md) — Provides a backend that dynamically loads and switches between LoRA adapters during inference to improve agent reliability. ([source](https://art.openpipe.ai/fundamentals/training-loop.md))
- [Sub-Agent Trajectory Training](https://awesome-repositories.com/f/artificial-intelligence-ml/sub-agent-trajectory-training.md) — Supports training with multiple separate conversation histories within a single trajectory for sub-agent delegation. ([source](https://art.openpipe.ai/features/additional-histories.md))
- [Supervised Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/supervised-fine-tuning.md) — Supports supervised fine-tuning to establish model baselines and format adherence before reinforcement learning.
- [Adapter-Based Warm-Starts](https://awesome-repositories.com/f/artificial-intelligence-ml/supervised-fine-tuning/cold-start-initializations/adapter-based-warm-starts.md) — Allows reinforcement learning to start from existing adapters to stabilize early training and reduce compute costs. ([source](https://art.openpipe.ai/fundamentals/art-client.md))
- [Synthetic Data Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/synthetic-data-generators.md) — Trains custom models without pre-labeled datasets by automatically generating inputs and evaluating performance. ([source](https://cdn.jsdelivr.net/gh/openpipe/art@main/README.md))
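
Several entries above describe merging a judge model's relative ranking with hand-crafted scores. The sketch below shows one way such a composite signal could be built. It is an illustration under stated assumptions, not ART's API: the names `ranking_to_scores` and `composite_rewards`, the linear rank-to-score mapping, and the `judge_weight` parameter are all hypothetical.

```python
def ranking_to_scores(ranking):
    """Map a best-first ranking of trajectory ids to scores in [0, 1]."""
    n = len(ranking)
    if n == 1:
        return {ranking[0]: 1.0}
    # Best trajectory gets 1.0, worst gets 0.0, the rest are spaced evenly.
    return {tid: 1.0 - i / (n - 1) for i, tid in enumerate(ranking)}


def composite_rewards(judge_ranking, handcrafted, judge_weight=0.5):
    """Blend judge-derived scores with hand-crafted scores per trajectory.

    Assumption: both signals are already on a comparable [0, 1] scale.
    """
    if not 0.0 <= judge_weight <= 1.0:
        raise ValueError("judge_weight must be between 0 and 1")
    judge_scores = ranking_to_scores(judge_ranking)
    if set(judge_scores) != set(handcrafted):
        raise ValueError("both signals must cover the same trajectories")
    return {
        tid: judge_weight * judge_scores[tid] + (1.0 - judge_weight) * handcrafted[tid]
        for tid in judge_scores
    }


# The judge prefers "b", while a hand-written check only passes for "a".
rewards = composite_rewards(["b", "a", "c"], {"a": 1.0, "b": 0.0, "c": 0.0})
```

A weighted sum is only one possible design; the weight controls how much the hand-crafted check can override the judge's preference.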

### Part of an Awesome List

- [Workflow Reward Scoring](https://awesome-repositories.com/f/awesome-lists/ai/agentic-workflows/workflow-reward-scoring.md) — Evaluates the correctness of multi-step agent workflows using reward functions to guide the training process. ([source](https://art.openpipe.ai/integrations/langgraph-integration))

### DevOps & Infrastructure

- [GPU Training Clusters](https://awesome-repositories.com/f/devops-infrastructure/cloud-infrastructure-management/gpu-training-clusters.md) — Launches short-lived compute clusters for training tasks to decouple hardware management from training logic.
- [Training Orchestrators](https://awesome-repositories.com/f/devops-infrastructure/worker-node-management/distributed-orchestration/training-orchestrators.md) — Coordinates the transition between parallel inference rollouts and weight updates across distributed GPU hardware.
- [Training](https://awesome-repositories.com/f/devops-infrastructure/infrastructure-automation/training.md) — Automatically manages inference and training hardware to reduce operational overhead. ([source](https://cdn.jsdelivr.net/gh/openpipe/art@main/README.md))
- [Ephemeral GPU Environments](https://awesome-repositories.com/f/devops-infrastructure/infrastructure/infrastructure-as-code/management/infrastructure-orchestration/ephemeral-gpu-environments.md) — Launches short-lived compute clusters to decouple hardware management from agent training logic. ([source](https://art.openpipe.ai/getting-started/about.md))
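
The alternation between parallel inference rollouts and weight updates described in this section can be sketched as a simple loop. This is a toy model, not ART's orchestrator: `rollout`, `update_weights`, and `training_loop` are hypothetical names, the reward is a placeholder, and the "weight update" only increments a version counter.

```python
import asyncio


async def rollout(policy_version, scenario):
    """Hypothetical rollout: run one agent episode and return a scored trajectory."""
    await asyncio.sleep(0)  # stands in for inference calls to the current policy
    return {"scenario": scenario, "policy_version": policy_version, "reward": 0.0}


def update_weights(policy_version, groups):
    """Hypothetical update: a real trainer would take a gradient step here."""
    return policy_version + 1


async def training_loop(scenarios, steps, group_size):
    policy_version = 0
    total_rollouts = 0
    for _ in range(steps):
        # Phase 1: gather one group of rollouts per scenario, all in parallel.
        groups = await asyncio.gather(
            *(
                asyncio.gather(*(rollout(policy_version, s) for _ in range(group_size)))
                for s in scenarios
            )
        )
        total_rollouts += sum(len(group) for group in groups)
        # Phase 2: rollouts are finished, so the weights can be updated.
        policy_version = update_weights(policy_version, groups)
    return policy_version, total_rollouts


final_version, rollouts_seen = asyncio.run(
    training_loop(["task-a", "task-b"], steps=3, group_size=4)
)
```

Keeping the two phases separate means every rollout in a step is produced by the same policy version, which is what makes rewards within a group comparable.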

### System Administration & Monitoring

- [Agent Trajectory Logs](https://awesome-repositories.com/f/system-administration-monitoring/audit-logs/agent-trajectory-logs.md) — Logs agent interactions and tool calls during execution to generate training data for reinforcement learning. ([source](https://art.openpipe.ai/integrations/langgraph-integration))
- [Training Trajectory Capture](https://awesome-repositories.com/f/system-administration-monitoring/audit-logs/agent-trajectory-logs/training-trajectory-capture.md) — Logs sequences of system messages and tool calls as training examples for reinforcement learning and supervised fine-tuning.
- [Model Training Metrics](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/model-training-metrics.md) — Logs critical training data including rewards, loss, and throughput to observability platforms. ([source](https://art.openpipe.ai/features/tracking-metrics.md))

### Testing & Quality Assurance

- [LLM-As-A-Judge Scoring](https://awesome-repositories.com/f/testing-quality-assurance/llm-as-a-judge-scoring.md) — Evaluates agent performance using a larger teacher model to rank trajectory quality against defined rubrics.
