What does eric-mitchell/direct-preference-optimization do?

This project is a framework for aligning large language models (LLMs) with human preferences. It provides a library for optimizing model behavior by mapping preference data directly to a policy objective, avoiding the need for a separate reward model.
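To make "mapping preference data directly to a policy objective" concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It is an illustration only, not code taken from the repository: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for this example. The inputs are summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper, for illustration).

    Each argument is the total log-probability a model assigns to a response.
    `beta` scales how strongly the policy is pushed away from the reference.
    """
    # How much more (in log space) the policy favors each response
    # compared with the frozen reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Implicit reward margin between the chosen and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)

    # Loss is -log(sigmoid(margin)), computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss drops below log(2) once the policy prefers the chosen response
# more strongly than the reference model does.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

No reward model appears anywhere in this computation: the preference pair and the two sets of log-probabilities are all that the objective needs. A real training loop would compute these quantities on batches of tensors and backpropagate through the policy log-probabilities.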

What are the main features of eric-mitchell/direct-preference-optimization?

The main features of eric-mitchell/direct-preference-optimization are: Direct Preference Optimization, Preference Alignment Objectives, Reward Modeling, Fine-Tuning Toolkits, Gradient-Based Parameter Updates, Data Parallelism, Large Language Models, Large-Scale Model Training.

What are some open-source alternatives to eric-mitchell/direct-preference-optimization?

Open-source alternatives to eric-mitchell/direct-preference-optimization include:
internlm/xtuner: xtuner is a comprehensive training engine for large language models, offering a toolkit for pre-training, supervised…
allenai/open-instruct: Open-Instruct is a distributed training and instruction tuning framework for large language models. It functions as a…
huggingface/alignment-handbook: This project is an alignment framework and suite of pipelines for training language models using supervised…
nndl/llm-beginner: This project is a collection of educational resources and technical guides focused on the development and…
eleutherai/gpt-neox: gpt-neox is a distributed training system and framework for building large-scale autoregressive language models. It…
inclusionai/areal: AReaL is a system for agent orchestration, distributed model training, and parameter-efficient tuning. It provides a…

Direct Preference Optimization - align LLMs…

Open-source alternatives to Direct Preference Optimization

Similar open-source projects, ranked by how many features they share with Direct Preference Optimization.

InternLM/xtuner
GitHub stars: 5,150
xtuner is a comprehensive training engine for large language models, offering a toolkit for pre-training, supervised fine-tuning, and the optimization of vision-language multimodal models. It serves as a distributed training accelerator and a specialized framework for scaling Mixture-of-Experts models and aligning model behavior through reinforcement learning from human feedback. The project distinguishes itself through advanced memory and compute optimizations, such as sequence parallelism for ultra-long context windows and interleaved pipeline parallelism to reduce GPU idle time. It provide
Python; tags: agent, deepseek-v3, gpt-oss
allenai/open-instruct
GitHub stars: 3,586
Open-Instruct is a distributed training and instruction tuning framework for large language models. It functions as a coordinator for supervised fine-tuning, reinforcement learning from human feedback pipelines, and tool-use training, providing specialized roles for dataset curation and model alignment. The project distinguishes itself through a high-performance training architecture that utilizes actor-based distributed coordination and hybrid sharding to manage large GPU clusters. It implements advanced alignment techniques including direct preference optimization, group relative policy opt
Python
huggingface/alignment-handbook
GitHub stars: 5,621
This project is an alignment framework and suite of pipelines for training language models using supervised fine-tuning and preference optimization. It provides tools for executing large-scale distributed training across multiple GPUs and compute nodes, alongside a system for measuring model helpfulness and dialogue quality through single-turn and multi-turn benchmarks. The framework includes specialized tools for direct preference optimization to refine model behavior using paired data without a separate reward model. It also supports constitutional AI alignment and the training of reward mo
Python
nndl/llm-beginner
GitHub stars: 6,421
This project is a collection of educational resources and technical guides focused on the development and implementation of large language models. It provides a comprehensive curriculum covering transformer architectures, training methods, and deployment strategies. The materials provide detailed instructions for building autonomous agents using reasoning loops and tool integration, as well as guides for fine-tuning models through supervised learning and preference optimization. It also includes tutorials for constructing retrieval augmented generation pipelines and implementing transformer m
Python; tags: agent, fudannlp, llm

See all 30 alternatives to Direct Preference Optimization
