30 open-source projects similar to allenai/rl4lms, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best RL4LMs alternative.
trlx is a reinforcement learning library and training framework designed to align large language models using human feedback. It serves as a distributed trainer and compute orchestrator for scaling high-parameter models across multiple GPUs and nodes. The project provides tools for reinforcement learning from human feedback and model alignment. It implements reward-model-based optimization and proximal policy optimization to refine model behavior based on goal-oriented rewards or human-labeled datasets. The framework covers distributed training strategies, including model parallelism, parame
verl is a distributed training system designed for large language model alignment and reinforcement learning. It provides a framework for executing post-training pipelines, including supervised fine-tuning and reinforcement learning from human feedback, to refine model behavior and agentic capabilities. The system utilizes a hybrid training and inference engine that optimizes memory and communication when switching between model generation and gradient updates. It supports multi-modal reinforcement learning for models processing both image and text data, and implements algorithms such as PPO
OpenRLHF is a training framework and alignment library designed for reinforcement learning from human feedback across distributed GPU clusters. It provides tools for aligning large language models and multimodal vision-language models using algorithms such as PPO, GRPO, and DPO. The framework distinguishes itself through a distributed inference engine that overlaps sample rollout with training to increase throughput. It supports scaling to models exceeding 70 billion parameters via parameter sharding and handles long-context sequences through ring-attention sequence parallelism. The project
This project is a transformer post-training toolkit and reinforcement learning library designed to align language model behavior with human preferences. It provides a framework for managing the transition from supervised fine-tuning to reinforcement learning and preference optimization. The library distinguishes itself through a specialized focus on preference optimization and reward modeling, enabling the adjustment of model outputs based on preferred versus rejected examples. It also includes capabilities for training agents within controlled sandbox environments using task suites and verif
Axolotl is a configuration-driven framework designed for the fine-tuning, evaluation, and quantization of large language models. It functions as a comprehensive orchestrator for distributed training, enabling users to manage complex workflows across multi-node and multi-GPU environments. By utilizing structured configuration files, the platform streamlines the setup of training parameters, dataset paths, and hardware distribution strategies. The project distinguishes itself through its support for diverse training methodologies, including full-parameter tuning, parameter-efficient adaptation,
This repository provides supplementary material for our paper Constitutional AI: Harmlessness from AI Feedback.
micrograd is a scalar autograd engine and minimal neural network library. It implements a system for reverse-mode automatic differentiation over a dynamic graph of scalar operations to calculate gradients. The project includes a computation graph visualizer that generates representations of data flow and gradient propagation. It provides a set of tools for constructing and training multi-layer perceptrons using an API modeled after PyTorch. The library covers the fundamentals of backpropagation and neural network construction, specifically for binary classification tasks. This includes the i
LLaMA-Factory is a comprehensive suite for dataset preparation, model fine-tuning, memory optimization, and standardized API deployment. It provides a unified platform for the supervised and reward-based fine-tuning of large language models and vision-language models. The framework includes a specialized toolkit for training vision-language models and a model serving interface that deploys trained models through high-performance APIs. It utilizes precision tuning and quantization techniques to reduce the hardware requirements and memory footprint of large models. The system covers data pipel
DeepSpeed is a high-performance library designed to scale deep learning model training and inference across massive clusters of GPUs and compute nodes. It provides a comprehensive suite of tools for distributed training, enabling the execution of models that exceed the memory capacity of single devices through advanced parameter partitioning, pipeline-based model parallelism, and memory-efficient state offloading. The framework distinguishes itself through specialized communication-efficient optimizers and hardware-aware acceleration techniques. By utilizing gradient compression, quantization
This project is a multimodal model trainer and machine learning fine-tuning tool that provides a containerized workflow for adapting pre-trained models to specific tasks. It features a no-code web interface and dashboard for training large language models and other machine learning models. The system distinguishes itself by pairing this interface with remote GPU orchestration, allowing users to deploy containerized training environments on cloud infrastructure or local hardware. It includes a dedicated integrator for uploading trained model weights and config
Minimalistic large language model 3D-parallelism training
This library provides a framework for parameter-efficient fine-tuning, enabling the adaptation of large pretrained models by training only a small subset of parameters. It functions as a distributed model training system and optimization toolkit, designed to reduce the computational and memory requirements typically associated with full model fine-tuning. The project distinguishes itself through a suite of methods for modular adapter composition, including low-rank matrix decomposition and activation-based scaling. It supports the integration of multiple task-specific adapter modules, allowin
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering specialized architectures for both text and vision processing. The framework includes tools for managing the entire model lifecycle, from data preprocessing and tokenization to distributed training and inference. The library features extensive support for model optimization and
This library provides a comprehensive framework for fine-tuning, aligning, and distilling transformer-based language models. It serves as a toolkit for adapting models to specialized domains through supervised learning, while offering advanced methodologies to improve output quality and reasoning capabilities. The project distinguishes itself through specialized alignment and optimization techniques, including reinforcement learning and direct preference optimization, the latter of which tunes models against human preferences without a separate reward model. It further supports training efficie
This project is a high-performance numerical computing library designed for large-scale scientific and machine learning workloads. It functions as an automatic differentiation framework and a just-in-time compilation engine, transforming high-level Python code into optimized machine instructions. By enforcing pure functional programming patterns and immutable array semantics, the library ensures that mathematical functions remain compatible with automated graph transformations and automatic differentiation. The platform distinguishes itself through its distributed array computing capabilities,
The Official Python Client for Lamini's API