# LLM Alignment and RLHF Frameworks

> Search results for `align models with RLHF and preference data` on awesome-repositories.com. 118 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/align-models-with-rlhf-and-preference-data

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/align-models-with-rlhf-and-preference-data).**

## Results

- [axolotl-ai-cloud/axolotl](https://awesome-repositories.com/repository/axolotl-ai-cloud-axolotl.md) (12,059 ⭐) — Axolotl is a configuration-driven framework designed for the fine-tuning, evaluation, and quantization of large language models. It functions as a comprehensive orchestrator for distributed training, enabling users to manage complex workflows across multi-node and multi-GPU environments. By utilizing structured configuration files, the platform streamlines the setup of training parameters, dataset paths, and hardware distribution strategies.

The project distinguishes itself through its support for diverse training methodologies, including full-parameter tuning, parameter-efficient adaptation,
- [lucidrains/palm-rlhf-pytorch](https://awesome-repositories.com/repository/lucidrains-palm-rlhf-pytorch.md) (7,863 ⭐) — This is a PyTorch implementation of reinforcement learning from human feedback designed to align large language models with human values and preferences. It provides a framework for the PaLM architecture and incorporates parameter-efficient fine-tuning to adapt models while minimizing the number of updated weights.

The system enables the development of reward models that act as scoring mechanisms built from human preference data. These models evaluate generative outputs to guide the alignment process.

The workflow covers policy optimization using a clipped objective, reward modeling based on
- [eleutherai/gpt-neo](https://awesome-repositories.com/repository/eleutherai-gpt-neo.md) (8,275 ⭐) — GPT-Neo is an open-source distributed training framework designed for scaling GPT-2 and GPT-3-style language models across multiple devices using mesh-tensorflow for model parallelism. It provides the infrastructure to train transformer-based language models with billions of parameters across distributed computing environments, making large-scale language model research accessible outside of proprietary systems.

The framework supports training both autoregressive GPT-style models and masked language models like BERT or RoBERTa, with configurable masking strategies and token handling. It inclu
- [pku-alignment/safe-rlhf](https://awesome-repositories.com/repository/pku-alignment-safe-rlhf.md) (1,605 ⭐) — Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
- [infrasys-ai/aiinfra](https://awesome-repositories.com/repository/infrasys-ai-aiinfra.md) (7,414 ⭐)
- [llava-rlhf/llava-rlhf](https://awesome-repositories.com/repository/llava-rlhf-llava-rlhf.md) (396 ⭐) — Aligning LMMs with Factually Augmented RLHF
- [allenai/open-instruct](https://awesome-repositories.com/repository/allenai-open-instruct.md) (3,586 ⭐) — Open-Instruct is a distributed training and instruction tuning framework for large language models. It functions as a coordinator for supervised fine-tuning, reinforcement learning from human feedback pipelines, and tool-use training, providing specialized roles for dataset curation and model alignment.

The project distinguishes itself through a high-performance training architecture that utilizes actor-based distributed coordination and hybrid sharding to manage large GPU clusters. It implements advanced alignment techniques including direct preference optimization, group relative policy opt
- [rlhflow/rlhf-reward-modeling](https://awesome-repositories.com/repository/rlhflow-rlhf-reward-modeling.md) (1,534 ⭐) — The initial release of this project focuses on Bradley-Terry reward modeling and the pairwise preference model. Since then, we have included more advanced techniques to construct a preference model. The structure of this project is
- [jingyaogong/minimind](https://awesome-repositories.com/repository/jingyaogong-minimind.md) (51,834 ⭐) — This project is a comprehensive framework for the entire lifecycle of transformer-based language models, supporting everything from foundational pretraining to specialized deployment. It provides a modular toolkit for defining neural network architectures, managing data preparation pipelines, and executing training routines across various scales. The framework is designed to handle the full model development process, including supervised fine-tuning, behavioral alignment, and the integration of agentic capabilities.

What distinguishes this framework is its focus on efficient training and adva
- [cube-js/cube](https://awesome-repositories.com/repository/cube-js-cube.md) (20,251 ⭐) — Cube is a semantic data layer that provides a unified framework for defining business metrics, dimensions, and relationships across diverse data sources. By acting as a headless business intelligence engine, it transforms raw data into a governed model that can be queried via SQL, REST, and GraphQL interfaces. This architecture ensures consistent data definitions and logic across all downstream analytical applications and reporting tools.

The platform distinguishes itself through its integrated conversational AI capabilities, which allow users to explore data using natural language. It orches
- [shibing624/medicalgpt](https://awesome-repositories.com/repository/shibing624-medicalgpt.md) (4,774 ⭐) — MedicalGPT is an open-source framework for fine-tuning large language models, with a dedicated focus on adapting general models to the medical domain. It provides a complete pipeline that covers continued pretraining on domain-specific corpora, supervised instruction tuning, tokenizer vocabulary extension with medical terminology, and alignment to clinician preferences through direct preference optimization, reinforcement learning, or knowledge distillation. The framework also supports training models to invoke external tools and functions in multi-turn clinical conversations.

The platform di
- [datajuicer/data-juicer](https://awesome-repositories.com/repository/datajuicer-data-juicer.md) (6,574 ⭐) — Data-Juicer is an open-source framework for cleaning, filtering, deduplicating, and transforming multimodal datasets to prepare them for training large language and vision models. It functions as a distributed data pipeline engine that runs processing jobs across Ray clusters, handling billions of samples with automatic operator fusion and adaptive parallelism. The framework provides a library of operators that leverage large language models for semantic extraction, filtering, and data synthesis within processing pipelines.

The project distinguishes itself through a YAML-based data recipe sys
- [rlhf-v/rlhf-v](https://awesome-repositories.com/repository/rlhf-v-rlhf-v.md) (309 ⭐) — [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
- [eclipse-theia/theia](https://awesome-repositories.com/repository/eclipse-theia-theia.md) (21,569 ⭐) — Theia is a modular framework designed for building professional-grade development environments that function as both local desktop applications and remote browser-based services. It provides a comprehensive toolkit for constructing specialized coding tools, allowing developers to assemble custom interfaces and backend logic through a flexible, contribution-based architecture.

The platform distinguishes itself through a highly extensible workbench that supports the integration of existing third-party editor plugins and standard language servers. By utilizing a dependency injection container an
- [pku-alignment/align-anything](https://awesome-repositories.com/repository/pku-alignment-align-anything.md) (4,661 ⭐) — Align Anything: Training All-modality Model with Feedback
- [thinking-machines-lab/tinker-cookbook](https://awesome-repositories.com/repository/thinking-machines-lab-tinker-cookbook.md) (2,856 ⭐) — Tinker Cookbook is an open-source framework for fine-tuning large language models, supporting supervised learning, reinforcement learning, and parameter-efficient techniques like LoRA adapters. It provides a complete pipeline for aligning models with human preferences through multi-stage RLHF workflows, from supervised fine-tuning through preference optimization to reinforcement learning.

The framework distinguishes itself through recipe-based training orchestration, where fine-tuning workflows are defined as composable recipe files that chain data loading, model configuration, and training l
- [datawhalechina/hello-agents](https://awesome-repositories.com/repository/datawhalechina-hello-agents.md) (59,685 ⭐) — This project provides a comprehensive framework for building, training, and managing autonomous agents. It enables the construction of systems that utilize language models to plan, manage memory, and execute multi-step tasks through iterative reasoning loops and tool-based actions.

The framework distinguishes itself by offering specialized capabilities for interacting with graphical user interfaces and legacy software, allowing agents to perceive visual elements and perform actions like a human user. It supports complex, cross-application workflows through graph-based orchestration and provid
- [yfzhang114/mm-rlhf](https://awesome-repositories.com/repository/yfzhang114-mm-rlhf.md) (200 ⭐)
- [camel-ai/camel](https://awesome-repositories.com/repository/camel-ai-camel.md) (17,253 ⭐) — This project is a comprehensive framework for building and managing autonomous agent systems. It provides a unified architecture for orchestrating multi-agent societies, where specialized agents collaborate through roleplay to decompose and solve complex tasks. The system integrates language models with external environments, enabling agents to perform real-world actions through a standardized tool-calling abstraction layer.

The framework distinguishes itself through its focus on iterative reasoning and data reliability. It employs automated feedback loops to refine agent outputs and self-eva
- [lyogavin/airllm](https://awesome-repositories.com/repository/lyogavin-airllm.md) (11,508 ⭐) — Airllm is a framework designed to execute and fine-tune large language models on consumer-grade hardware. By employing layer-wise model decomposition and memory-efficient loading techniques, the engine enables the operation of massive models that would otherwise exceed available system or video memory.

The project distinguishes itself through a suite of optimization strategies that balance memory footprint with performance. It utilizes block-wise weight quantization and asynchronous layer prefetching to reduce resource consumption and hide data transfer latency. Additionally, the framework su
- [ant-design/ant-design](https://awesome-repositories.com/repository/ant-design-ant-design.md) (98,362 ⭐) — Ant Design is an enterprise-grade component library and design system framework built for developing complex, data-heavy web applications. It provides a comprehensive collection of pre-built, state-driven interface elements that map data properties to rendered components, ensuring consistent interaction patterns and visual language across large-scale projects.

The library distinguishes itself through a robust styling architecture that utilizes design tokens and hierarchical configuration providers to propagate global settings like themes, locale, and layout direction. By employing component-l
- [etcd-io/etcd](https://awesome-repositories.com/repository/etcd-io-etcd.md) (51,838 ⭐) — etcd is a distributed, strongly consistent key-value store designed to provide reliable storage for critical system metadata and coordination primitives. It functions as a distributed consensus engine, utilizing a replicated log and leader-based state machine to ensure that all nodes in a cluster maintain a synchronized view of data. By providing atomic operations and linearizable reads and writes, it serves as a foundational component for distributed systems requiring high availability and fault tolerance.

The system distinguishes itself through its multi-version concurrency control, which e
- [linkedin/liger-kernel](https://awesome-repositories.com/repository/linkedin-liger-kernel.md) (6,148 ⭐) — Liger-Kernel is a collection of pre-built fused Triton kernels and patching utilities designed to accelerate large language model training. It provides drop-in kernel replacements for common LLM operations such as RMSNorm, cross-entropy loss, and attention, enabling increased throughput and reduced memory usage while preserving bitwise-exact gradients. The project serves as a toolkit for composing custom model architectures from individual optimized kernels and for patching pre-existing models with minimal code changes.

The project distinguishes itself through its ability to perform runtime m
- [guitarbum722/align](https://awesome-repositories.com/repository/guitarbum722-align.md) (84 ⭐) — A general purpose application and library for aligning text.
- [aseprite/aseprite](https://awesome-repositories.com/repository/aseprite-aseprite.md) (37,521 ⭐) — Aseprite is a specialized graphics editor and animation suite designed for the creation of pixel-based artwork. It provides a comprehensive environment for managing multi-layered animation sequences, offering tools for frame-by-frame design, onion skinning, and real-time motion previews. The application is built to handle both indexed color palettes and full-color RGB editing, allowing users to maintain precise control over pixel data and transparency.

What distinguishes Aseprite is its focus on programmable workflows and game asset production. It features a scriptable command architecture th
- [klaude/eloquent-preferences](https://awesome-repositories.com/repository/klaude-eloquent-preferences.md) (30 ⭐) — Use this library to bind multiple key/value pair preferences to your application's Eloquent models. Preferences are stored in your application's database so they can be easily stored and queried for. This library supports Eloquent 5 through 8 installed either standalone or as a part of the full…
- [verl-project/verl](https://awesome-repositories.com/repository/verl-project-verl.md) (22,000 ⭐) — This project is a distributed training infrastructure designed for aligning large language models through reinforcement learning. It functions as an end-to-end engine for complex alignment tasks, including proximal policy optimization, direct preference optimization, and iterative self-play. By providing a unified framework for multi-turn interactions and tool-use scenarios, it enables the development of models capable of reasoning and external environment engagement.

The framework distinguishes itself through a decoupled architecture that separates model training from sample generation. This
- [wangclnlp/vision-llm-alignment](https://awesome-repositories.com/repository/wangclnlp-vision-llm-alignment.md) (3 ⭐) — This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
- [cmusatyalab/openface](https://awesome-repositories.com/repository/cmusatyalab-openface.md) (15,398 ⭐) — Openface is a deep learning toolkit designed for facial recognition and identity verification. It provides a comprehensive pipeline for detecting faces, aligning landmarks, and transforming facial images into compact numerical vectors. By utilizing these embeddings, the system enables identity classification and similarity comparison through geometric distance calculations.

The project distinguishes itself by integrating research-oriented diagnostic tools alongside its core recognition capabilities. It includes utilities for visualizing high-dimensional feature clusters, inspecting internal c
- [facebookresearch/fairseq](https://awesome-repositories.com/repository/facebookresearch-fairseq.md) (32,228 ⭐) — Fairseq is a PyTorch toolkit for sequence-to-sequence modeling, specializing in neural machine translation, automatic speech recognition, and large-scale language model training. It provides a framework for processing and aligning diverse data sources, including text, audio, and video, to support tasks such as speech-to-text conversion and multimodal sequence learning.

The project is distinguished by its distributed training capabilities, which utilize parameter sharding, mixed-precision training, and CPU offloading to handle models that exceed single-device memory. It also includes specializ
- [erikrhanson/problem-solving-with-algorithms-and-data-structures-using-python](https://awesome-repositories.com/repository/erikrhanson-problem-solving-with-algorithms-and-data-structures-using-python.md) (330 ⭐) — Problem-Solving-with-Algorithms-and-Data-Structures-Using-Python
- [openrlhf/openrlhf](https://awesome-repositories.com/repository/openrlhf-openrlhf.md) (9,675 ⭐) — OpenRLHF is a training framework and alignment library designed for reinforcement learning from human feedback across distributed GPU clusters. It provides tools for aligning large language models and multimodal vision-language models using algorithms such as PPO, GRPO, and DPO.

The framework distinguishes itself through a distributed inference engine that overlaps sample rollout with training to increase throughput. It supports scaling to models exceeding 70 billion parameters via parameter sharding and handles long-context sequences through ring-attention sequence parallelism.

The project
- [scottyab/secure-preferences](https://awesome-repositories.com/repository/scottyab-secure-preferences.md) (1,517 ⭐) — Android SharedPreferences wrapper that encrypts the values of Shared Preferences. It's not bulletproof security but rather a quick win for incrementally making your Android app more secure.
- [opengvlab/internvl](https://awesome-repositories.com/repository/opengvlab-internvl.md) (10,061 ⭐) — InternVL is a vision-language model framework that fuses a visual encoder with a large language model to translate image features into textual tokens for reasoning. It provides a system for multimodal inference and dialogue, enabling the processing of images and text to answer questions or generate descriptions.

The project is distinguished by its high-resolution image processing, which uses dynamic tiling to maintain detail for images up to 4K resolution, and its chain-of-thought visual reasoning for solving complex mathematical and spatial problems. It also supports temporal frame sampling
- [directus/directus](https://awesome-repositories.com/repository/directus-directus.md) (36,030 ⭐) — Directus is a headless content platform that functions as a backend service, automatically generating REST and GraphQL APIs by performing introspection on existing SQL database schemas. It serves as a unified data orchestration layer, decoupling content management from frontend delivery while providing a secure, stateless gateway for database transactions.

The platform distinguishes itself through a granular role-based access control engine that enforces security policies at the field level across all API endpoints. It includes a visual, low-code administrative dashboard that allows non-techn
- [apple/foundationdb](https://awesome-repositories.com/repository/apple-foundationdb.md) (16,446 ⭐) — FoundationDB is an ACID-compliant distributed transactional key-value store. It functions as a scalable database engine that ensures strict serializability and data consistency across a cluster of servers using a shared-nothing architecture.

The system is distinguished by its multi-region replication capabilities, allowing data to be synchronized across different datacenters for high availability and disaster recovery. It utilizes optimistic concurrency control to manage distributed transactions and employs a majority-based coordination system to maintain cluster state.

The platform provides
- [hiyouga/llama-efficient-tuning](https://awesome-repositories.com/repository/hiyouga-llama-efficient-tuning.md) (72,239 ⭐) — This project is a fine-tuning framework and training pipeline designed to optimize and adapt large language and vision models. It provides a specialized toolkit for parameter-efficient tuning and supervised learning, serving as both a trainer for multimodal models and a deployment tool for serving fine-tuned models via high-performance inference engines.

The framework focuses on reducing memory and compute requirements by updating a small subset of model parameters. It supports a wide range of adaptation strategies, including vision-language model training to align text, image, video, and aud
- [wbond/sublime_alignment](https://awesome-repositories.com/repository/wbond-sublime-alignment.md) (521 ⭐) — Easy alignment of multiple selections and multi-line selections
- [ankitects/anki](https://awesome-repositories.com/repository/ankitects-anki.md) (28,571 ⭐) — Anki is a cross-platform flashcard management system designed to optimize long-term memory retention through spaced-repetition learning. It functions as a digital learning assistant that uses active recall practice and automated scheduling algorithms to determine the ideal timing for card reviews based on individual performance history. The core system relies on a local relational database to ensure data persistence and portability, while supporting complex study workflows through flexible note-type schema modeling and template-driven content rendering.

The platform distinguishes itself throu
- [huggingface/smol-course](https://awesome-repositories.com/repository/huggingface-smol-course.md) (6,661 ⭐) — This project is an educational program focused on the alignment of small language models. It provides a technical curriculum and a series of courses designed to teach how to align models with human preferences and behaviors.

The material covers the implementation of preference optimization algorithms and the adaptation of vision-language models to process both text and image data simultaneously. It also includes instructional guides on synthetic data generation to improve model performance in specialized domains.

The curriculum encompasses supervised fine-tuning workflows, the use of chat te
- [sindresorhus/preferences](https://awesome-repositories.com/repository/sindresorhus-preferences.md) (1,546 ⭐) — Just pass in some view controllers and this package will take care of the rest. Built-in SwiftUI support.
- [ml-gsai/llada](https://awesome-repositories.com/repository/ml-gsai-llada.md) (3,580 ⭐) — LLaDA is a masked diffusion language model and conditional text generator. It generates text by iteratively refining masked tokens through a diffusion process rather than predicting the next token in a sequence.

The project functions as a vision-language diffusion model, converting visual inputs into text responses. It also serves as a preference optimization framework that uses log-likelihood estimation and evidence lower bounds to tune model responses.

The system supports multi-round conversational AI and text sequence evaluation. It integrates vision-language embedding for cross-modal con
- [fingerprintjs/fingerprintjs](https://awesome-repositories.com/repository/fingerprintjs-fingerprintjs.md) (27,334 ⭐) — Fingerprint is a visitor identification and fraud detection platform that generates persistent, unique identifiers by analyzing browser and device attributes. By extracting technical signals from the client environment, it enables reliable user tracking across sessions without relying on traditional cookies.

The platform distinguishes itself through its focus on high-accuracy identification and security-first architecture. It employs edge-side proxying to bypass ad-blockers and privacy restrictions, ensuring consistent data collection. To maintain data integrity, it uses cryptographic payload
- [ujjwalkarn/data-mining-with-r](https://awesome-repositories.com/repository/ujjwalkarn-data-mining-with-r.md) (6 ⭐) — These are notes on data mining with R. Please refer to http://www.liaad.up.pt/~ltorgo/DataMiningWithR. Thanks go to the author. 20111203
- [huggingface/trl](https://awesome-repositories.com/repository/huggingface-trl.md) (18,653 ⭐) — This library provides a comprehensive framework for fine-tuning, aligning, and distilling transformer-based language models. It serves as a toolkit for adapting models to specialized domains through supervised learning, while offering advanced methodologies to improve output quality and reasoning capabilities.

The project distinguishes itself through specialized alignment and optimization techniques, including direct preference optimization and reinforcement learning, which allow models to be tuned against human preferences without complex reward modeling. It further supports training efficie
- [implicit-seman-align/implicit-semantic-response-alignment](https://awesome-repositories.com/repository/implicit-seman-align-implicit-semantic-response-alignment.md) (6 ⭐) — PyTorch implementation for "Implicit Semantic Response Alignment for Partial Domain Adaptation"
- [anthropics/claude-code](https://awesome-repositories.com/repository/anthropics-claude-code.md) (132,728 ⭐) — Anthropic's terminal-native AI coding agent.
- [intel/ipex-llm](https://awesome-repositories.com/repository/intel-ipex-llm.md) (8,836 ⭐) — Intel XPU LLM Acceleration Library is a toolkit designed to accelerate large language model inference and finetuning on Intel CPUs, GPUs, and NPUs. It provides a distributed inference engine for scaling models across multiple accelerators, a multimodal model runtime for vision and speech tasks, and a low-bit model quantization tool for converting weights into INT4, FP8, and GGUF formats.

The project features a parameter-efficient finetuning framework that enables model adaptation using QLoRA and DPO on Intel hardware. It distinguishes itself by providing specialized optimizations for Intel XP
- [open-speech/speech-aligner](https://awesome-repositories.com/repository/open-speech-speech-aligner.md) (410 ⭐) — speech-aligner is a tool that generates phoneme-level time-alignment annotations from human speech and its transcription.
- [docmost/docmost](https://awesome-repositories.com/repository/docmost-docmost.md) (19,049 ⭐) — Docmost is an open-source knowledge management system designed as a collaborative documentation platform for teams. It functions as an enterprise wiki that centralizes organizational information into structured, searchable workspaces, enabling users to create, organize, and share content through a hierarchical system of spaces and pages.

The platform distinguishes itself by integrating artificial intelligence directly into the documentation lifecycle. It utilizes vector-based semantic search to allow for natural language queries across stored content and provides AI-assisted tools for draftin