This project is a framework for aligning large language models with human preferences. It provides a library for optimizing model behavior by mapping preference data directly to a policy objective, bypassing the need for a separate reward model.
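As a rough illustration of what "mapping preference data directly to a policy objective" can look like, the sketch below computes the standard DPO loss for a single preference pair from summed sequence log-probabilities. It is not taken from the repository: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example, and it uses plain Python floats rather than tensors.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Hypothetical helper: DPO loss for one (chosen, rejected) pair.

    Each argument is the total log-probability of a full response under
    either the trainable policy or the frozen reference model.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled margin: positive when the policy favors the chosen response
    # more strongly than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy shifted toward the chosen response: loss drops below log(2).
    print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

No reward model appears anywhere in the computation: the preference signal enters only through the difference of log-ratios, which is why the approach needs just a policy and a frozen reference copy.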
The main features of eric-mitchell/direct-preference-optimization are: Direct Preference Optimization, Preference Alignment Objectives, Reward Modeling, Fine-Tuning Toolkits, Gradient-Based Parameter Updates, Data Parallelism, Large Language Models, Large-Scale Model Training.
Open-source alternatives to eric-mitchell/direct-preference-optimization include:
internlm/xtuner: xtuner is a comprehensive training engine for large language models, offering a toolkit for pre-training, supervised…
allenai/open-instruct: Open-Instruct is a distributed training and instruction tuning framework for large language models. It functions as a…
huggingface/alignment-handbook: This project is an alignment framework and suite of pipelines for training language models using supervised…
nndl/llm-beginner: This project is a collection of educational resources and technical guides focused on the development and…
eleutherai/gpt-neox: gpt-neox is a distributed training system and framework for building large-scale autoregressive language models. It…
inclusionai/areal: AReaL is a system for agent orchestration, distributed model training, and parameter-efficient tuning. It provides a…
xtuner is a comprehensive training engine for large language models, offering a toolkit for pre-training, supervised fine-tuning, and the optimization of vision-language multimodal models. It serves as a distributed training accelerator and a specialized framework for scaling Mixture-of-Experts models and aligning model behavior through reinforcement learning from human feedback. The project distinguishes itself through advanced memory and compute optimizations, such as sequence parallelism for ultra-long context windows and interleaved pipeline parallelism to reduce GPU idle time. It provide
Open-Instruct is a distributed training and instruction tuning framework for large language models. It functions as a coordinator for supervised fine-tuning, reinforcement learning from human feedback pipelines, and tool-use training, providing specialized roles for dataset curation and model alignment. The project distinguishes itself through a high-performance training architecture that utilizes actor-based distributed coordination and hybrid sharding to manage large GPU clusters. It implements advanced alignment techniques including direct preference optimization, group relative policy opt
huggingface/alignment-handbook is an alignment framework and suite of pipelines for training language models using supervised fine-tuning and preference optimization. It provides tools for executing large-scale distributed training across multiple GPUs and compute nodes, alongside a system for measuring model helpfulness and dialogue quality through single-turn and multi-turn benchmarks. The framework includes specialized tools for direct preference optimization to refine model behavior using paired data without a separate reward model. It also supports constitutional AI alignment and the training of reward mo
nndl/llm-beginner is a collection of educational resources and technical guides focused on the development and implementation of large language models. It provides a comprehensive curriculum covering transformer architectures, training methods, and deployment strategies. The materials provide detailed instructions for building autonomous agents using reasoning loops and tool integration, as well as guides for fine-tuning models through supervised learning and preference optimization. It also includes tutorials for constructing retrieval augmented generation pipelines and implementing transformer m