What does eric-mitchell/direct-preference-optimization do?

This project is a framework for aligning large language models with human preferences. It provides a library for optimizing model behavior by mapping preference data directly to a policy objective, bypassing the need for a separate reward model.
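To make "mapping preference data directly to a policy objective" concrete, here is a minimal sketch of the per-example DPO loss as it is commonly stated. It is not taken from the repository's code. The function name, the argument names, and the default `beta=0.1` are assumptions chosen for illustration. The inputs are summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    All names and the default beta are illustrative assumptions,
    not the repository's API.
    """
    # How much more the policy favors the chosen response than the rejected one.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    # The same margin under the frozen reference model.
    ref_margin = ref_chosen_logp - ref_rejected_logp
    x = beta * (policy_margin - ref_margin)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch to avoid overflow.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# The loss falls below log(2) once the policy prefers the chosen
# response more strongly than the reference model does.
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

No reward model appears anywhere in the computation: the preference pair and the two sets of log-probabilities are enough to produce a gradient signal for the policy.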

What are the main features of eric-mitchell/direct-preference-optimization?

The main features of eric-mitchell/direct-preference-optimization are: Direct Preference Optimization, Preference Alignment Objectives, Reward Modeling, Fine-Tuning Toolkits, Gradient-Based Parameter Updates, Data Parallelism, Large Language Models, Large-Scale Model Training.

What are some open-source alternatives to eric-mitchell/direct-preference-optimization?

Open-source alternatives to eric-mitchell/direct-preference-optimization include:
internlm/xtuner: xtuner is a comprehensive training engine for large language models, offering a toolkit for pre-training, supervised…
allenai/open-instruct: Open-Instruct is a distributed training and instruction tuning framework for large language models. It functions as a…
huggingface/alignment-handbook: This project is an alignment framework and suite of pipelines for training language models using supervised…
nndl/llm-beginner: This project is a collection of educational resources and technical guides focused on the development and…
eleutherai/gpt-neox: gpt-neox is a distributed training system and framework for building large-scale autoregressive language models. It…
inclusionai/areal: AReaL is a system for agent orchestration, distributed model training, and parameter-efficient tuning. It provides a…

Direct Preference Optimization - align LLMs w…

Open-source alternatives to Direct Preference Optimization

Similar open-source projects, ranked by how many features they share with Direct Preference Optimization.

InternLM/xtuner
5,150 (GitHub)
xtuner is a comprehensive training engine for large language models, offering a toolkit for pre-training, supervised fine-tuning, and the optimization of vision-language multimodal models. It serves as a distributed training accelerator and a specialized framework for scaling Mixture-of-Experts models and aligning model behavior through reinforcement learning from human feedback. The project distinguishes itself through advanced memory and compute optimizations, such as sequence parallelism for ultra-long context windows and interleaved pipeline parallelism to reduce GPU idle time. It provide…
Python, agent, deepseek-v3, gpt-oss
allenai/open-instruct
3,586 (GitHub)
Open-Instruct is a distributed training and instruction tuning framework for large language models. It functions as a coordinator for supervised fine-tuning, reinforcement learning from human feedback pipelines, and tool-use training, providing specialized roles for dataset curation and model alignment. The project distinguishes itself through a high-performance training architecture that utilizes actor-based distributed coordination and hybrid sharding to manage large GPU clusters. It implements advanced alignment techniques including direct preference optimization, group relative policy opt…
Python
huggingface/alignment-handbook
5,621 (GitHub)
This project is an alignment framework and suite of pipelines for training language models using supervised fine-tuning and preference optimization. It provides tools for executing large-scale distributed training across multiple GPUs and compute nodes, alongside a system for measuring model helpfulness and dialogue quality through single-turn and multi-turn benchmarks. The framework includes specialized tools for direct preference optimization to refine model behavior using paired data without a separate reward model. It also supports constitutional AI alignment and the training of reward mo…
Python
