This search looks for software frameworks and libraries that support aligning large language models (LLMs) with reinforcement learning from human feedback (RLHF) and preference-based datasets.

lucidrains/PaLM-rlhf-pytorch is the closest match: it provides a comprehensive PyTorch implementation of RLHF, including reward modeling, policy optimization, and preference-based alignment. Other strong matches: allenai/open-instruct, inclusionAI/AReaL, shibing624/MedicalGPT, and OpenAccess-AI-Collective/axolotl.

LLM Alignment and RLHF Frameworks

These open-source libraries provide tools for aligning large language models using reinforcement learning from human feedback.

lucidrains/PaLM-rlhf-pytorch (7,863 GitHub stars)
This is a PyTorch implementation of reinforcement learning from human feedback designed to align large language models with human values and preferences. It provides a framework for the PaLM architecture and incorporates parameter-efficient fine-tuning to adapt models while minimizing the number of updated weights. The system enables the development of reward models that act as scoring mechanisms built from human preference data. These models evaluate generative outputs to guide the alignment process. The workflow covers policy optimization using a clipped objective, reward modeling based on preference pairs, and the use of divergence penalties to keep the tuned model close to the original reference. It includes a transformer-based policy architecture and a buffer for sampling generated rollouts.
This repository provides a comprehensive PyTorch implementation for RLHF, including reward modeling, policy optimization, and preference-based alignment, making it a direct fit for your requirements.
Python | Reward Modeling | RLHF Implementations | RLHF PyTorch Frameworks
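The PaLM-rlhf-pytorch entry mentions a clipped policy objective and a divergence penalty that keeps the tuned model close to its reference. Both ideas can be sketched in plain Python. This is a generic PPO-style illustration under simplifying assumptions, not the repository's API:

- per-token log-probabilities arrive as lists;
- the KL penalty is the simple log-ratio estimate folded into the reward;
- the names `kl_coef` and `clip_eps` are illustrative.

```python
import math


def kl_penalized_rewards(reward, logp_policy, logp_ref, kl_coef=0.1):
    """Per-token shaped rewards: a KL penalty at every token, plus the
    scalar reward-model score added at the final token (assumed scheme)."""
    shaped = [-kl_coef * (lp - lr) for lp, lr in zip(logp_policy, logp_ref)]
    shaped[-1] += reward
    return shaped


def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean negative clipped surrogate over tokens (lower is better)."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new / pi_old for this token
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    print(kl_penalized_rewards(1.0, [-1.0, -2.0], [-1.2, -2.0]))
    print(ppo_clipped_loss([-0.9, -2.0], [-1.0, -2.0], [1.0, -0.5]))
```

Clipping the probability ratio stops a single update from moving the policy far from the one that generated the rollouts. The KL term plays the complementary role of anchoring the policy to the original reference model.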
allenai/open-instruct (3,586 GitHub stars)
Open-Instruct is a distributed training and instruction tuning framework for large language models. It functions as a coordinator for supervised fine-tuning, reinforcement learning from human feedback pipelines, and tool-use training, providing specialized roles for dataset curation and model alignment. The project distinguishes itself through a high-performance training architecture that utilizes actor-based distributed coordination and hybrid sharding to manage large GPU clusters. It implements advanced alignment techniques including direct preference optimization, group relative policy optimization, and a dynamic rubric system that evolves evaluation criteria via judge models. The framework covers a broad capability surface including instruction dataset engineering with contamination detection, the generation of preference-pair datasets, and the integration of external environments for tool-use learning. It also includes GPU-efficient training kernels, tensor parallelism for layer splitting, and performance benchmarking tools.
Open-Instruct is a comprehensive framework designed specifically for LLM alignment, providing native support for RLHF pipelines, reward modeling, preference-based fine-tuning, and distributed training across large GPU clusters.
Python | Direct Preference Optimization | Preference Alignment Datasets | Reward Modeling
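Direct preference optimization, listed in the Open-Instruct entry, replaces an explicit reward model with a loss computed from policy and reference log-probabilities. Below is a minimal per-pair sketch of the standard DPO loss. It assumes summed sequence log-probabilities are already available. The function names and the `beta` default are illustrative, not Open-Instruct's code.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy or the frozen reference model.
    """
    chosen_logratio = pol_chosen - ref_chosen
    rejected_logratio = pol_rejected - ref_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # The policy favors the chosen answer more strongly than the reference does.
    print(dpo_loss(-10.0, -14.0, -11.0, -12.0))
```

The loss equals log 2 when the policy and reference agree. It falls below log 2 as the policy shifts probability toward the chosen response relative to the reference.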
inclusionAI/AReaL (3,559 GitHub stars)
AReaL is a system for agent orchestration, distributed model training, and parameter-efficient tuning. It provides a framework for developing multi-turn reasoning agents and training large models using reinforcement learning from human feedback. The project implements a toolkit for improving the visual reasoning and geometry problem solving capabilities of vision-language models. It utilizes a memory-efficient tuning system to optimize mathematical and reasoning models across different inference backends. The infrastructure supports large-scale training through tensor, pipeline, and expert parallelism. Its capability surface includes reward model construction based on human preference comparisons and both synchronous and asynchronous reinforcement learning algorithms to improve goal alignment and model reasoning.
AReaL is a comprehensive framework designed for RLHF, reward modeling, and distributed fine-tuning, providing the exact infrastructure needed to align large language models using human preference data.
Python | Distributed Training | Reward Modeling | RLHF Training Pipelines
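Reward models built from human preference comparisons, as in the AReaL entry, are commonly trained with a Bradley-Terry pairwise loss. The sketch below applies that loss to scalar scores. It assumes the reward model has already produced one score per response. It illustrates the general technique rather than AReaL's implementation, and the helper names are hypothetical.

```python
import math


def pairwise_reward_loss(score_chosen, score_rejected):
    """Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected)."""
    diff = score_chosen - score_rejected
    # log1p(exp(-diff)), computed stably for both signs of diff
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def preference_accuracy(pairs):
    """Fraction of (chosen, rejected) score pairs ranked correctly."""
    correct = sum(1 for c, r in pairs if c > r)
    return correct / len(pairs)


if __name__ == "__main__":
    print(pairwise_reward_loss(2.0, 0.5))
    print(preference_accuracy([(2.0, 1.0), (0.3, 0.9)]))
```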
shibing624/MedicalGPT (4,774 GitHub stars)
MedicalGPT is an open-source framework for fine-tuning large language models, with a dedicated focus on adapting general models to the medical domain. It provides a complete pipeline that covers continued pretraining on domain-specific corpora, supervised instruction tuning, tokenizer vocabulary extension with medical terminology, and alignment to clinician preferences through direct preference optimization, reinforcement learning, or knowledge distillation. The framework also supports training models to invoke external tools and functions in multi-turn clinical conversations. The platform distinguishes itself by integrating multiple adaptation techniques into a single, configurable workflow. It handles multi-stage domain adaptation (chaining continued pretraining, supervised fine-tuning, preference alignment, and optional knowledge distillation) to inject specialized knowledge and then align model behavior. Beyond standard alignment methods, it offers adapter-based model merging, incremental pretraining with extended vocabularies, and a unified interface that supports over twenty open-source LLM families without requiring manual architecture adaptation. In addition to core training capabilities, MedicalGPT includes utilities for dataset preparation, such as formatting multi-turn conversations, converting dataset formats, generating synthetic role-play dialogues, and compiling pretraining corpora. It provides inference tools like an interactive command-line chat session and a web-based demo interface for serving trained models.
MedicalGPT is a comprehensive framework for fine-tuning and aligning LLMs that explicitly includes support for RLHF, DPO, and reward modeling, making it a strong tool for the requested alignment tasks despite its specific focus on the medical domain.
Python | Direct Preference Optimization | Preference Alignment Datasets | Reward Modeling
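Adapter-based model merging, mentioned in the MedicalGPT entry, folds a low-rank update back into the base weights so the merged model can be served without the adapter. Below is a minimal sketch of the standard LoRA merge rule W' = W + (alpha / r) * B A. It uses plain nested lists to stay dependency-free. The function names are hypothetical, and real frameworks apply this per layer on framework tensors.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A.

    weight: d_out x d_in base matrix
    lora_a: rank x d_in down-projection
    lora_b: d_out x rank up-projection
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # d_out x d_in low-rank update
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 2.0]]      # rank 1, d_in 2
    B = [[0.5], [-0.5]]   # d_out 2, rank 1
    print(merge_lora(W, A, B, alpha=2.0, rank=1))
```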
OpenAccess-AI-Collective/axolotl (12,062 GitHub stars)
Axolotl is a distributed training orchestrator and fine-tuning framework for large language models, multimodal systems, and quantized models. It provides a structured environment for specializing pre-trained models through full parameter updates or low-rank adaptation, as well as aligning model outputs with human expectations via preference tuning pipelines and reward modeling. The system distinguishes itself through a configuration-driven pipeline that manages preprocessing and training workflows via a single file for reproducibility. It implements high-throughput optimizations such as multipacking sequence processing and distributed tensor parallelism to scale workloads across multiple GPUs and hardware nodes. The framework covers broad capability areas including memory optimization through quantization and reduced-precision fine-tuning, sharded data distribution for large datasets, and specialized training workflows for vision and audio models. It further supports human-aligned behavior tuning using reinforcement learning from human feedback.
Axolotl is a comprehensive fine-tuning framework that natively supports RLHF, reward modeling, and preference-based alignment pipelines alongside its distributed training and quantization capabilities.
Python | Distributed Training | Reward Modeling
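Multipacking, as described in the Axolotl entry, concatenates several short training examples into one fixed-length sequence to reduce padding waste. The bin-assignment step can be sketched as first-fit-decreasing bin packing over token counts. This is an assumed, simplified stand-in for Axolotl's own packing logic. It omits the attention masking and position-id resets that keep packed examples from attending to each other.

```python
def pack_sequences(lengths, max_len):
    """Group sequence indices into bins whose total length <= max_len.

    First-fit decreasing: sort by length, then place each sequence in
    the first bin with enough room. Sequences longer than max_len are
    rejected rather than truncated.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins, used = [], []
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sequence {i} has length {n} > {max_len}")
        for b, total in enumerate(used):
            if total + n <= max_len:
                bins[b].append(i)
                used[b] += n
                break
        else:  # no existing bin had room: open a new one
            bins.append([i])
            used.append(n)
    return bins


if __name__ == "__main__":
    print(pack_sequences([700, 300, 500, 200, 400, 100], max_len=1024))
```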
OpenRLHF/OpenRLHF (9,675 GitHub stars)
OpenRLHF is a training framework and alignment library designed for reinforcement learning from human feedback across distributed GPU clusters. It provides tools for aligning large language models and multimodal vision-language models using algorithms such as PPO, GRPO, and DPO. The framework distinguishes itself through a distributed inference engine that overlaps sample rollout with training to increase throughput. It supports scaling to models exceeding 70 billion parameters via parameter sharding and handles long-context sequences through ring-attention sequence parallelism. The project covers a broad range of capabilities, including supervised fine-tuning, reward model development, and the training of multi-turn agents. It incorporates memory optimization techniques such as low-rank adaptation, optimizer state offloading, and sample packing to reduce compute overhead.
OpenRLHF is a comprehensive framework specifically designed for RLHF and preference-based alignment, offering native support for reward modeling, distributed training, and multiple alignment algorithms like PPO and DPO.
Python | Distributed Training | Reward Modeling | Preference Alignment
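GRPO, one of the algorithms named in the OpenRLHF entry, drops the learned value function. Instead, it normalizes each sampled response's reward against the other responses generated for the same prompt. Below is a minimal sketch of that advantage computation, assuming one scalar reward per response. The function name is hypothetical and this is not OpenRLHF's code. It uses the population standard deviation, whereas some implementations use the sample standard deviation.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for a group of responses to one prompt.

    Each advantage is the reward minus the group mean, divided by the
    group standard deviation (plus eps to avoid division by zero).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one prompt, scored 1 (correct) or 0.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

If every response in a group receives the same reward, all advantages are zero, so that prompt contributes no policy-gradient signal.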
jingyaogong/minimind (51,834 GitHub stars)
This project is a comprehensive framework for the entire lifecycle of transformer-based language models, supporting everything from foundational pretraining to specialized deployment. It provides a modular toolkit for defining neural network architectures, managing data preparation pipelines, and executing training routines across various scales. The framework is designed to handle the full model development process, including supervised fine-tuning, behavioral alignment, and the integration of agentic capabilities. What distinguishes this framework is its focus on efficient training and advanced alignment methodologies. It incorporates techniques such as low-rank parameter adaptation and mixture-of-experts routing to optimize memory usage and computational efficiency. The system also features built-in support for direct preference optimization and automated feedback training, allowing users to refine model behavior and align outputs with human intent without requiring extensive manual labeling. The platform covers a broad range of capabilities, including knowledge distillation for creating efficient student models, sequence length extrapolation for extended context processing, and robust tool-calling integration for agentic workflows. It includes utilities for benchmarking model performance, converting weights for cross-platform compatibility, and serving predictions through standardized network APIs or local command-line interfaces.
This framework provides a comprehensive suite for the entire LLM lifecycle, specifically including built-in support for direct preference optimization, reward modeling, and behavioral alignment techniques required for RLHF.
Python | Preference Alignment Datasets | Alignment Pipelines
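Mixture-of-experts routing, mentioned in the minimind entry, sends each token to a small subset of expert feed-forward networks chosen by a learned router. Below is a minimal sketch of top-k gating with renormalized softmax weights, assuming router logits are already computed. It is a generic illustration rather than minimind's implementation. It also omits the load-balancing auxiliary loss that most MoE training setups add.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-probability experts for one token.

    Returns (expert_indices, gate_weights); the gate weights are the
    chosen experts' softmax probabilities renormalized to sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return top, [probs[i] / total for i in top]


def moe_output(expert_outputs, router_logits, k=2):
    """Gate-weighted sum of the selected experts' output vectors.

    A real layer only runs the selected experts; here all outputs are
    precomputed to keep the example short.
    """
    idx, gates = top_k_route(router_logits, k)
    dim = len(expert_outputs[0])
    return [sum(g * expert_outputs[i][d] for i, g in zip(idx, gates))
            for d in range(dim)]


if __name__ == "__main__":
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```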
thinking-machines-lab/tinker-cookbook (2,856 GitHub stars)
Tinker Cookbook is an open-source framework for fine-tuning large language models, supporting supervised learning, reinforcement learning, and parameter-efficient techniques like LoRA adapters. It provides a complete pipeline for aligning models with human preferences through multi-stage RLHF workflows, from supervised fine-tuning through preference optimization to reinforcement learning. The framework distinguishes itself through recipe-based training orchestration, where fine-tuning workflows are defined as composable recipe files that chain data loading, model configuration, and training loops into repeatable pipelines. It includes an async concurrent sampling engine that maximizes throughput during training rollouts and evaluation, and supports multi-agent reinforcement learning with self-play or competitive environments. The system manages model checkpoints through hub-centric weight management, enabling saving, loading, downloading, and publishing to remote hubs for sharing and deployment. Beyond core training, the framework covers hyperparameter sweeping across learning rates, LoRA ranks, and RL parameters to find optimal configurations. It handles vision-language model fine-tuning, prompt distillation into model weights, and multi-turn conversation training. The system includes tools for building, merging, and exporting LoRA adapters for efficient serving and HuggingFace compatibility, along with evaluation capabilities for measuring model performance on standard benchmarks. The documentation provides guidance on configuring training runs, building custom reinforcement learning environments, and diagnosing training issues through AI assistant skills.
This framework provides a comprehensive pipeline for RLHF, including reward modeling, preference optimization, and distributed training, making it a complete solution for aligning large language models.
Python | Direct Preference Optimization | RLHF Training Pipelines
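The async concurrent sampling engine in the Tinker Cookbook entry keeps many generation requests in flight, so training is not idle while individual completions finish. The sketch below bounds concurrency with an `asyncio` semaphore. `sample_one` is a hypothetical placeholder for a real sampling client and is not Tinker's API.

```python
import asyncio
import random


async def sample_one(prompt, rng):
    """Hypothetical stand-in for a remote sampling call."""
    await asyncio.sleep(rng.uniform(0.001, 0.01))  # simulated latency
    return f"{prompt} -> completion"


async def sample_all(prompts, max_concurrency=4, seed=0):
    """Sample every prompt with at most max_concurrency requests in flight.

    Returns (results, peak_in_flight). Results keep prompt order even
    though requests finish out of order.
    """
    rng = random.Random(seed)
    sem = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def bounded(prompt):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await sample_one(prompt, rng)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(bounded(p) for p in prompts))
    return results, peak


if __name__ == "__main__":
    outs, peak = asyncio.run(sample_all([f"prompt {i}" for i in range(10)]))
    print(outs[0], peak)
```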
verl-project/verl (22,000 GitHub stars)
This project is a distributed training infrastructure designed for aligning large language models through reinforcement learning. It functions as an end-to-end engine for complex alignment tasks, including proximal policy optimization, direct preference optimization, and iterative self-play. By providing a unified framework for multi-turn interactions and tool-use scenarios, it enables the development of models capable of reasoning and external environment engagement. The framework distinguishes itself through a decoupled architecture that separates model training from sample generation. This asynchronous design allows for continuous throughput by partitioning compute resources between actor, reference, and rollout models. It supports large-scale distributed execution across multi-node clusters, utilizing high-performance communication primitives to synchronize model states and aggregate losses while maintaining stability through advanced policy clipping and variance reduction techniques. Beyond its core reinforcement learning capabilities, the system includes comprehensive infrastructure for data management, reward modeling, and performance optimization. It features modular interfaces for integrating custom tools and external reward servers, alongside built-in support for sequence parallelism, low-precision training, and hardware-specific acceleration. Observability is integrated throughout the pipeline, providing tools for profiling distributed tasks, monitoring policy divergence, and tracking GPU memory usage. The project is implemented in Python and provides a containerized environment for deployment across diverse hardware architectures.
This framework provides a comprehensive, distributed infrastructure specifically built for RLHF, supporting key alignment techniques like PPO and DPO alongside reward modeling and large-scale fine-tuning.
Python | Distributed Training | Reward Modeling | Preference Alignment
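Monitoring policy divergence, as described in the verl entry, usually means estimating the KL divergence between the policy and the reference model from sampled tokens. Below is a minimal sketch of two common per-token estimators: the naive log-ratio and the non-negative "k3" estimator. It assumes per-token log-probabilities of policy-sampled tokens are available. The function names are hypothetical and this is not verl's code.

```python
import math


def kl_estimates(logp_policy, logp_ref):
    """Per-token Monte Carlo estimates of KL(policy || reference).

    Tokens are assumed to be sampled from the policy. With
    log_ratio = logp_ref - logp_policy:
      k1 = -log_ratio                      unbiased, can be negative
      k3 = exp(log_ratio) - 1 - log_ratio  unbiased, always >= 0
    """
    k1, k3 = [], []
    for lp, lr in zip(logp_policy, logp_ref):
        log_ratio = lr - lp
        k1.append(-log_ratio)
        k3.append(math.expm1(log_ratio) - log_ratio)
    return k1, k3


if __name__ == "__main__":
    k1, k3 = kl_estimates([-0.5, -3.0, -1.0], [-1.5, -2.0, -1.0])
    print(sum(k1) / len(k1), sum(k3) / len(k3))
```

Both estimators share the same expectation. The k3 form never goes negative on individual tokens, which makes it easier to read in per-step logs.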
huggingface/open-r1 (26,326 GitHub stars)
Open-r1 is a framework designed for the large-scale training, distillation, and optimization of language models focused on complex reasoning and programming tasks. It provides a comprehensive suite of tools for managing distributed training jobs across multi-node clusters, enabling the development of high-performance models through reinforcement learning and supervised fine-tuning. The project distinguishes itself by integrating secure, containerized code execution environments directly into the training and evaluation lifecycle. By allowing models to run and verify code snippets against test cases, the framework improves accuracy in mathematical and logical problem-solving. It further supports advanced reasoning capabilities through group relative policy optimization and automated synthetic data pipelines, which curate and filter high-quality reasoning traces for model updates. The system utilizes modular, configuration-driven recipes to streamline complex workflows, including data decontamination, dataset composition, and multi-node orchestration. It includes standardized benchmarking tools to measure performance across reasoning and coding domains, ensuring that training processes remain reproducible and data-centric. The framework is built to handle the full lifecycle of model improvement, from initial synthetic data generation to final performance evaluation on high-performance computing clusters.
Open-r1 is a comprehensive framework specifically built for the post-training lifecycle of LLMs, providing the necessary tools for reinforcement learning, reward modeling, and distributed training to align models on complex reasoning tasks.
Python | Code-Integrated Training Frameworks | Large Scale Training Suites | Reasoning Model Training Suites
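The Open-r1 entry describes running generated code against test cases, which turns execution results into a reward signal. The sketch below assumes each test case is a (stdin, expected stdout) pair. It runs the candidate program in a subprocess and returns the pass rate. The function name is hypothetical. Unlike the containerized execution described above, a bare subprocess provides no isolation.

```python
import subprocess
import sys


def code_pass_rate(program, test_cases, timeout=5.0):
    """Fraction of (stdin, expected_stdout) cases the program passes.

    Each case runs the candidate in a fresh Python process. A plain
    subprocess is NOT a security sandbox; untrusted model output should
    be executed in an isolated container.
    """
    if not test_cases:
        return 0.0
    passed = 0
    for stdin_text, expected in test_cases:
        try:
            result = subprocess.run(
                [sys.executable, "-c", program],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            continue  # a hung program fails this case
        if result.returncode == 0 and result.stdout.strip() == expected.strip():
            passed += 1
    return passed / len(test_cases)


if __name__ == "__main__":
    prog = "a, b = map(int, input().split())\nprint(a + b)"
    print(code_pass_rate(prog, [("1 2\n", "3"), ("0 0\n", "1")]))
```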
modelscope/ms-swift (14,597 GitHub stars)
This project is a comprehensive toolkit designed for the full lifecycle management of large language and multimodal models. It functions as a unified orchestrator that handles the entire development process, ranging from dataset preparation and supervised fine-tuning to advanced reinforcement learning alignment and production-ready inference deployment. The platform distinguishes itself through a specialized reinforcement learning library that supports complex optimization algorithms, including group relative policy optimization and leave-one-out techniques, to improve model instruction-following and safety. It provides extensive support for training stability through sequence-level importance sampling, token-level loss normalization, and uncertainty-based weighting, ensuring reliable policy updates during the alignment phase. Beyond its core training capabilities, the framework integrates high-performance inference backends and model quantization to facilitate efficient production access. It supports diverse data modalities (text, image, video, and audio) and offers a modular interface for registering custom model architectures, dialogue templates, and training callbacks. Users can manage these complex workflows through a centralized configuration system or a web-based graphical interface that simplifies task execution and performance monitoring.
This framework provides a comprehensive suite for the entire LLM alignment lifecycle, including native support for RLHF, reward modeling, and advanced reinforcement learning algorithms like GRPO, making it a direct fit for your requirements.
Python | Large Language Model Fine-Tuning Frameworks | LLM Fine-Tuning Engines | Multimodal Training Platforms
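The leave-one-out technique in the ms-swift entry, as used in REINFORCE Leave-One-Out (RLOO), builds a per-sample baseline from the other samples drawn for the same prompt. Below is a minimal sketch of that advantage computation, assuming one scalar reward per sample. The function name is hypothetical and this is not ms-swift's implementation.

```python
def rloo_advantages(rewards):
    """REINFORCE leave-one-out advantages for k samples of one prompt.

    Each sample's baseline is the mean reward of the other k - 1
    samples, so a sample is never compared against itself.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


if __name__ == "__main__":
    print(rloo_advantages([1.0, 0.0, 0.0, 1.0]))
```

Excluding the sample's own reward from its baseline keeps the baseline independent of that sample, so subtracting it does not bias the policy-gradient estimate.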
hiyouga/LlamaFactory (72,213 GitHub stars)
LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a single interface. The project distinguishes itself by offering a low-code visual dashboard that enables users to configure experiments and monitor performance metrics in real time without writing extensive custom scripts. It also features a configuration-driven orchestration system that decouples experiment logic from the underlying execution engine, alongside an OpenAPI-compliant server that exposes trained models as standard network endpoints for integration with external software. Beyond its core training capabilities, the platform supports real-time experiment tracking by streaming performance data to external monitoring services. This allows for the evaluation of model progress and the optimization of parameters throughout the development lifecycle. The software is designed to be installed and configured as a standalone environment for managing the end-to-end lifecycle of language model adaptation.
LlamaFactory is a comprehensive framework that natively supports RLHF, reward modeling, and preference-based dataset fine-tuning, making it a complete solution for LLM alignment workflows.
Python | Experiment Tracking | Language Model Fine-Tuning | Large Language Model Fine-Tuning Frameworks
OptimalScale/LMFlow (8,488 GitHub stars)
LMFlow is a comprehensive suite for large language model fine-tuning, context extension, multimodal processing, and inference execution. It provides a toolkit for updating model parameters through full tuning or memory-efficient adapter algorithms, alongside an inference engine for executing tuned models via command-line or web-based interfaces. The framework includes a dedicated alignment suite for supervised tuning and reward model training to refine model behavior. It features a context window extender to increase maximum input lengths and a multimodal framework for building chatbots that process and generate responses from combined image and text inputs. The project covers broad capability areas including domain-specific and instruction-following fine-tuning, vocabulary expansion, and model performance benchmarking. It also incorporates memory optimization techniques, low-bit weight quantization for inference acceleration, and utilities for conversation formatting and training data ingestion.
LMFlow provides a comprehensive suite for fine-tuning and includes a dedicated alignment module for reward modeling and preference-based training, making it a capable framework for LLM alignment tasks.
Python | Model Fine-Tuning | Reward Modeling
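A common way to extend a context window is linear position interpolation for rotary position embeddings (RoPE). It rescales position indices so that a longer input maps back into the position range seen during training. Below is a minimal sketch of that rescaling, assuming a RoPE-based model. Whether LMFlow's context window extender uses exactly this scheme is an assumption, and the function names are hypothetical.

```python
import math


def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Rotary-embedding angles for one position.

    Returns dim // 2 angles, one per (even, odd) channel pair. Position
    interpolation divides the position by `scale`, so a model trained on
    L tokens sees only in-range positions when run on scale * L tokens.
    """
    pos = position / scale
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]


def rotate_pair(x0, x1, angle):
    """Apply a 2-D rotation by `angle` to one channel pair."""
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


if __name__ == "__main__":
    trained_len, target_len = 2048, 8192
    scale = target_len / trained_len
    # The last position of the long input lands inside the trained range.
    print(rope_angles(target_len - 1, 8, scale=scale)[0])
```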
hiyouga/LLaMA-Efficient-Tuning (72,239 GitHub stars; former name of the hiyouga/LlamaFactory repository)
This project is a fine-tuning framework and training pipeline designed to optimize and adapt large language and vision models. It provides a specialized toolkit for parameter-efficient tuning and supervised learning, serving as both a trainer for multimodal models and a deployment tool for serving fine-tuned models via high-performance inference engines. The framework focuses on reducing memory and compute requirements by updating a small subset of model parameters. It supports a wide range of adaptation strategies, including vision-language model training to align text, image, video, and audio data, as well as preference alignment to match model behavior with human expectations. The system covers a broad set of capabilities including supervised fine-tuning, instruction tuning, and core pre-training. It incorporates memory optimization through quantization and weight-merging pipelines, alongside data management for importing and preparing custom datasets. For operational management, it includes a web-based interface for task execution and integration with external dashboards for experiment metric tracking. The project provides utilities for exporting model checkpoints and deploying tuned models as web services using standardized, OpenAI-compatible API interfaces.
This framework provides a comprehensive suite for LLM alignment, including built-in support for reward modeling and preference-based optimization alongside its core fine-tuning capabilities.
Python | Reward Modeling
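Quantization, listed in the LLaMA-Efficient-Tuning entry as a memory optimization, stores weights as small integers plus a scale factor. Below is a minimal sketch of symmetric absmax int8 quantization over a flat list of weights. It shows the basic idea only: production quantizers typically use per-channel or per-block scales and often lower bit widths. The function names are hypothetical.

```python
def quantize_absmax_int8(values):
    """Symmetric per-tensor int8 quantization.

    scale = max(|x|) / 127 and q = round(x / scale), clamped to
    [-127, 127]. Returns the integer codes and the scale.
    """
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0  # all-zero input: any scale works
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Approximate the original floats from codes and scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    codes, scale = quantize_absmax_int8([0.5, -1.27, 0.0, 1.27])
    print(codes, scale, dequantize(codes, scale))
```

Each reconstructed value is within half a quantization step (scale / 2) of the original. That error bound is why outlier weights, which inflate the scale, hurt per-tensor schemes.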
ludwig-ai/ludwig (11,717 GitHub stars)
Ludwig is a multimodal machine learning platform and low-code framework designed for building, training, and deploying neural networks. It enables the construction of models that process text, images, audio, and tabular data through a unified interface using declarative configuration files rather than custom code. The system features a specialized low-code framework for large language models, supporting supervised fine-tuning, preference alignment, and a constrained decoding tool to force structured data output via logit extraction. It also includes an automated model architecture search to identify optimal encoder and combiner combinations for specific datasets. The platform provides a distributed model training engine to scale workloads across compute clusters and containerized environments. Its capabilities extend to computer vision tasks like semantic segmentation, time-series forecasting, and a deployment pipeline that exports models as high-performance REST APIs for real-time inference. The project includes a command-line interface for executing training and evaluation tasks within provisioned container images.
Ludwig is a declarative machine learning framework that supports LLM fine-tuning and preference alignment, providing a unified interface for training and deploying models without requiring extensive custom code.
Python | Distributed Training
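Constrained decoding, as in the Ludwig entry, restricts generation to tokens that keep the output valid for a target structure. Below is a minimal sketch of the usual logit-masking approach. It assumes an external grammar or schema checker (not shown) supplies the allowed token ids at each step. It is a generic illustration, not Ludwig's implementation, and the function names are hypothetical.

```python
import math


def masked_greedy_step(logits, allowed_ids):
    """Pick the highest-scoring token among allowed_ids.

    Disallowed tokens get -inf so they can never be selected.
    """
    masked = [x if i in allowed_ids else -math.inf for i, x in enumerate(logits)]
    best = max(range(len(masked)), key=lambda i: masked[i])
    if masked[best] == -math.inf:
        raise ValueError("no allowed token available")
    return best


def constrained_decode(step_logits, allowed_per_step):
    """Greedy decoding with a separate allowed-token set per step.

    Logits are precomputed here for brevity; a real decoder recomputes
    them after each chosen token.
    """
    return [masked_greedy_step(l, a) for l, a in zip(step_logits, allowed_per_step)]


if __name__ == "__main__":
    print(constrained_decode([[5.0, 1.0, 3.0], [0.2, 0.1, 0.9]], [{1, 2}, {0, 1}]))
```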
huggingface/transformers (161,630 GitHub stars)
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering specialized architectures for both text and vision processing. The framework includes tools for managing the entire model lifecycle, from data preprocessing and tokenization to distributed training and inference. The library features extensive support for model optimization and performance, including techniques like quantization, speculative decoding, and paged memory management for key-value caches. It provides native integration for distributed training across multi-node clusters, as well as flexible APIs for serving models via compatible inference servers. Developers can also utilize built-in utilities for model patching, custom kernel execution, and automated documentation generation to streamline development workflows.
This library provides the foundational infrastructure for training and fine-tuning transformer models, including the model classes commonly used for reward modeling, while RLHF-specific trainers live in Hugging Face's companion TRL library; it is a general-purpose machine learning framework rather than a specialized RLHF tool.
Python | API Frameworks | Byte Pair Encodings | Hybrid
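The Transformers entry mentions paged memory management for key-value caches. In that approach, each sequence's cached keys and values live in fixed-size blocks tracked by a per-sequence block table, similar to virtual-memory paging. Below is a toy sketch of that bookkeeping in plain Python. The class and method names are hypothetical and do not correspond to the Transformers API. Real implementations manage GPU tensors rather than integer ids.

```python
class PagedKVCache:
    """Toy block table for a paged key-value cache.

    Memory is split into fixed-size blocks and each sequence keeps a
    list of block ids, so sequences can grow without one contiguous
    region. Only bookkeeping is modeled, not the key/value tensors.
    """

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))
        self.tables = {}   # seq_id -> list of block ids
        self.lengths = {}  # seq_id -> number of cached tokens

    def append_token(self, seq_id):
        """Reserve space for one more token; return (block_id, offset)."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % self.block_size == 0:  # current block full, or none yet
            if not self.free:
                raise MemoryError("KV cache out of blocks")
            table.append(self.free.pop(0))
        self.lengths[seq_id] = n + 1
        return table[-1], n % self.block_size

    def release(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=4, block_size=2)
    print([cache.append_token("a") for _ in range(3)])
```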
unslothai/unsloth (66,628 GitHub stars)
Unsloth is a high-performance training and inference platform designed to optimize the lifecycle of large language and multimodal models. It provides a comprehensive engine for fine-tuning, executing, and managing models locally, with a focus on reducing memory consumption and increasing compute speed on consumer-grade hardware. The platform distinguishes itself through hand-optimized kernels and automated computational graph techniques that maximize hardware throughput. It supports advanced training methodologies, including reinforcement learning for reasoning and efficient adapter-based fine-tuning, while offering a unified web-based interface for no-code model training, data preparation, and real-time performance monitoring. Beyond its core training capabilities, the project includes a local inference runtime that supports API-based deployment, tool-calling, and automated output verification. It manages the entire model development process, from dataset generation and hyperparameter configuration to model exporting and performance benchmarking across diverse hardware configurations. The software provides setup utilities for local development environments and includes diagnostic tools to assist with installation and hardware compatibility.
Unsloth provides a high-performance framework for fine-tuning and reinforcement learning, offering the necessary infrastructure for reward modeling and model alignment despite its primary focus on memory-efficient training optimization.
Python | Language Model Training | Custom Kernel Accelerators | Efficient Training Pipelines
InternLM/InternLM (7,224 GitHub stars)
InternLM is a family of large language models with openly released weights, designed for text generation and complex reasoning. It functions as an inference engine for serving responses, a fine-tuning framework for adjusting model weights, and a platform for building autonomous AI agents. The system can process long-context input sequences of up to one million tokens for document analysis. It employs chain-of-thought reasoning to solve knowledge-intensive tasks by generating intermediate logic steps before producing a final answer. The project covers model weight optimization through supervised fine-tuning and reinforcement learning from human feedback. It also provides the architecture needed to execute external tools and to deploy pre-trained weights via local or server-based hosting.
InternLM provides a comprehensive framework for fine-tuning and aligning large language models, including built-in support for reinforcement learning from human feedback and preference-based optimization.
Python | Large Language Models | Autonomous AI Agent Frameworks | Autonomous AI Agents