What are the best open-source alternatives to RL4LMs?

30 open-source projects similar to allenai/rl4lms, ranked by shared features. Top picks: carperai/trlx, volcengine/verl, OpenRLHF/OpenRLHF, lvwerra/trl, axolotl-ai-cloud/axolotl, GanjinZero/RRHF, allenai/FineGrainedRLHF, dunzeng/MORE, anthropics/ConstitutionalHarmlessnessPaper, EIT-NLP/AccuracyParadox-RLHF.

Is carperai/trlx a good alternative to RL4LMs?

trlx is a reinforcement learning library and training framework designed to align large language models using human feedback. It serves as a distributed trainer and compute orchestrator for scaling high-parameter models across multiple GPUs and nodes. The project provides tools for reinforcement l…

Is volcengine/verl a good alternative to RL4LMs?

verl is a distributed training system designed for large language model alignment and reinforcement learning. It provides a framework for executing post-training pipelines, including supervised fine-tuning and reinforcement learning from human feedback, to refine model behavior and agentic capabili…

Is OpenRLHF/OpenRLHF a good alternative to RL4LMs?

OpenRLHF is a training framework and alignment library designed for reinforcement learning from human feedback across distributed GPU clusters. It provides tools for aligning large language models and multimodal vision-language models using algorithms such as PPO, GRPO, and DPO. The framework dist…

Is lvwerra/trl a good alternative to RL4LMs?

trl is a transformer post-training toolkit and reinforcement learning library designed to align language model behavior with human preferences. It provides a framework for managing the transition from supervised fine-tuning to reinforcement learning and preference optimization. The librar…

Is axolotl-ai-cloud/axolotl a good alternative to RL4LMs?

Axolotl is a configuration-driven framework designed for the fine-tuning, evaluation, and quantization of large language models. It functions as a comprehensive orchestrator for distributed training, enabling users to manage complex workflows across multi-node and multi-GPU environments. By utilizi…

Is dunzeng/MORE a good alternative to RL4LMs?

dunzeng/MORE is an open-source alternative to RL4LMs.

Is anthropics/ConstitutionalHarmlessnessPaper a good alternative to RL4LMs?

This repository provides supplementary material for the paper "Constitutional AI: Harmlessness from AI Feedback."

Is EIT-NLP/AccuracyParadox-RLHF a good alternative to RL4LMs?

EIT-NLP/AccuracyParadox-RLHF is an open-source alternative to RL4LMs.

Open-source alternatives to RL4LMs

30 open-source projects similar to allenai/rl4lms, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best RL4LMs alternative.

carperai/trlx
4,749 stars
trlx is a reinforcement learning library and training framework designed to align large language models using human feedback. It serves as a distributed trainer and compute orchestrator for scaling high-parameter models across multiple GPUs and nodes. The project provides tools for reinforcement learning from human feedback and model alignment. It implements reward-model-based optimization and proximal policy optimization to refine model behavior based on goal-oriented rewards or human-labeled datasets. The framework covers distributed training strategies, including model parallelism, parame…
Python

volcengine/verl
22,015 stars
verl is a distributed training system designed for large language model alignment and reinforcement learning. It provides a framework for executing post-training pipelines, including supervised fine-tuning and reinforcement learning from human feedback, to refine model behavior and agentic capabilities. The system utilizes a hybrid training and inference engine that optimizes memory and communication when switching between model generation and gradient updates. It supports multi-modal reinforcement learning for models processing both image and text data, and implements algorithms such as PPO…
Python

OpenRLHF/OpenRLHF
9,675 stars
OpenRLHF is a training framework and alignment library designed for reinforcement learning from human feedback across distributed GPU clusters. It provides tools for aligning large language models and multimodal vision-language models using algorithms such as PPO, GRPO, and DPO. The framework distinguishes itself through a distributed inference engine that overlaps sample rollout with training to increase throughput. It supports scaling to models exceeding 70 billion parameters via parameter sharding and handles long-context sequences through ring-attention sequence parallelism. The project…
Python, large-language-models, openai-o1, proximal-policy-optimization

Open-source alternatives to RL4LMs

carperai/trlx

volcengine/verl

OpenRLHF/OpenRLHF

lvwerra/trl

axolotl-ai-cloud/axolotl

GanjinZero/RRHF

allenai/FineGrainedRLHF

dunzeng/MORE

anthropics/ConstitutionalHarmlessnessPaper

EIT-NLP/AccuracyParadox-RLHF

ernie-research/MA-RLHF

exlaw/DLMA

karpathy/micrograd

gximinglu/quark

hiyouga/LLaMA-Factory

deepspeedai/DeepSpeed

Haoxiang-Wang/directional-preference-alignment

huggingface/autotrain-advanced

huggingface/nanotron

huggingface/peft

huggingface/transformers

huggingface/trl

JAEarly/MIL-for-Non-Markovian-Reward-Modelling

jax-ml/jax

jhejna/few-shot-preference-rl

jhejna/inverse-preference-learning

halfrot/ALaRM

Kwai-YuanQi/MM-RLHF

lamini-ai/lamini

Cornell-RL/drpo