Torchtune

Torchtune is a PyTorch-native library for fine-tuning, aligning, and quantizing large language models. It provides a config-driven system for instantiating components, orchestrating distributed training, and managing parameter-efficient fine-tuning with quantization support, all through YAML-based configurations and command-line overrides.

The library distinguishes itself through its comprehensive post-training workflow orchestration, combining supervised fine-tuning, preference optimization (DPO, PPO, GRPO), knowledge distillation, and quantization-aware training in a single configurable pipeline. It supports distributed training across multiple nodes using FSDP, with parameter-efficient methods like LoRA, QLoRA, and DoRA that reduce memory requirements while maintaining model quality. The system includes built-in recipes for common workflows, from dataset loading and tokenization through evaluation, quantization, and deployment to edge devices via ExecuTorch.
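The memory argument for LoRA comes down to simple arithmetic: a frozen weight matrix of shape d_out x d_in is augmented with two trainable low-rank factors of shapes d_out x r and r x d_in. The sketch below counts trainable parameters for one linear layer. The function names and the example sizes (4096 and rank 8) are illustrative assumptions, not values taken from torchtune, and biases are ignored.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights in the two low-rank factors (d_out x rank and rank x d_in)."""
    return rank * (d_in + d_out)


# Illustrative sizes only: one square projection with a rank-8 adapter.
d_in = d_out = 4096
rank = 8
full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

Fewer trainable parameters means fewer gradients and optimizer states to hold in memory, which is where most of the savings come from. QLoRA additionally stores the frozen base weights in 4-bit precision.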

Beyond core training capabilities, torchtune offers tools for model evaluation using standard benchmarks, text generation with configurable sampling parameters, and integration with Hugging Face Transformers and vLLM for serving. The library manages the full model lifecycle, including downloading pretrained weights, uploading trained checkpoints to the Hugging Face Hub, and logging experiments to platforms like Weights & Biases and Comet.
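The sampling parameters mentioned above can be illustrated with a small standard-library sketch of temperature scaling followed by top-k filtering over a list of logits. The function `sample_top_k` and its signature are hypothetical and are not torchtune's generation API.

```python
import math
import random
from typing import Optional


def sample_top_k(
    logits: list[float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample a token index after temperature scaling and optional top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    rng = rng or random.Random()

    # Rank token indices by logit and keep only the k most likely ones.
    indices = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        indices = indices[:top_k]

    # Softmax weights over the kept logits; subtracting the max avoids overflow.
    scaled = [logits[i] / temperature for i in indices]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(indices, weights=weights, k=1)[0]


logits = [0.1, 2.5, -1.0, 2.0]
greedy = sample_top_k(logits, top_k=1)
sampled = sample_top_k(logits, temperature=0.7, top_k=2, rng=random.Random(0))
print(greedy, sampled)
```

A lower temperature sharpens the distribution toward the highest logit, and a smaller `top_k` removes unlikely tokens from consideration entirely. With `top_k=1` the function always returns the index of the largest logit.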

Features

LLM Workflow Orchestrations - Orchestrates the complete LLM post-training pipeline from data loading through evaluation and deployment.
Config-Driven Dataset Builders - Configures custom datasets for chat, instruction, preference, or freeform text from YAML configs.
Pretrained Model Snapshots - Downloads pretrained model weights from a remote registry to a shared filesystem for distributed training.
Direct Preference Optimization - Tunes models to prefer desirable outputs over undesirable ones using direct preference optimization.
Distributed Training Orchestration - Orchestrates distributed fine-tuning jobs across multiple nodes with FSDP and SLURM support.
End-to-End Training Pipelines - Ships a single-command pipeline that trains, evaluates, quantizes, and generates from a language model.
Custom Data Fine-Tunings - Runs DPO alignment using either parameter-efficient LoRA adapters or full-weight fine-tuning, adjustable for compute constraints.
LLM Fine-Tuning - Fine-tunes large language models on custom datasets using full-parameter or parameter-efficient methods.
LoRA Fine-Tuning Tools - Applies low-rank adapters to attention and MLP layers for memory-efficient fine-tuning.
Multi-Node Training Scaling - Spreads fine-tuning jobs across multiple nodes using FSDP and SLURM for large model training.
Instruction Fine-tuning - Fine-tunes language models on structured instruction-response pairs for task-specific behavior.
Language Model Fine-Tuning - Applies supervised fine-tuning with full or parameter-efficient methods on one or multiple devices.
Dialogue-Based Fine-Tuning - Fine-tunes language models on multi-turn conversational datasets to improve dialogue response accuracy.
LoRA Fine-Tuning Pipelines - Applies low-rank adapters to attention and MLP layers for memory-efficient LLM fine-tuning.
LLM Fine-Tuning Toolsets - Provides a framework for supervised fine-tuning, preference optimization, and knowledge distillation with parameter-efficient methods.
LLM Training Orchestrators - Orchestrates distributed fine-tuning across multiple nodes using FSDP and SLURM for large language models.
Distributed Training - Spawns multiple processes across nodes to run distributed fine-tuning jobs with a single command.
Preference Alignment - Fine-tunes language models using DPO-style losses to align outputs with human preferences.
Preference-Based Fine-Tuning - Loads binary preference datasets and tokenizes them for Direct Preference Optimization training.
Quantized Fine-Tuning - Compresses model weights to lower precision during fine-tuning to reduce memory footprint.
Distributed Fine-Tuning - Runs distributed fine-tuning across multiple GPUs and nodes using FSDP model sharding.
Human Preference Alignment - Tunes models using DPO, PPO, or GRPO with full or LoRA/QLoRA weight updates for alignment.
QLoRA Adapters - Combines 4-bit quantization with low-rank adapters to minimize memory during LLM fine-tuning.
Post-Training Datasets - Provides built-in support for loading and tokenizing custom datasets in instruct, chat, or preference formats.
Language Model Training - Provides supervised fine-tuning recipes for adapting pretrained language models to custom tasks.
Teacher-Student Distillation - Transfers knowledge from larger teacher models to smaller student models via distillation.
Quantization-Aware Training - Simulates quantization noise during fine-tuning so weights adapt to lower precision before conversion.
Parameter-Efficient Adaptation - Applies LoRA, QLoRA, and DoRA adapters to selected layers for parameter-efficient fine-tuning.
Parameter Efficient Fine-Tuning - Applies low-rank adaptation (LoRA) to selected model layers for memory-efficient fine-tuning.
Preference-Aligned Adapter Training - Runs Direct Preference Optimization using LoRA, QLoRA, or DoRA adapters for memory-efficient alignment.
Post-Training Configuration Recipes - Launches predefined training, evaluation, or inference recipes using YAML configuration files with optional overrides.
Preference Alignment Datasets - Loads local or Hugging Face preference datasets for DPO-style alignment training without requiring a specific format.
Preference Optimization - Implements an engine for aligning language model outputs with human preferences using DPO, PPO, and GRPO algorithms.
Quantized Fine-Tuning - Provides quantization-aware fine-tuning with QLoRA to reduce memory during LLM training.
Recipe Configuration Definitions - Defines all parameters for a recipe in a YAML file, including model, dataset, optimizer, and loss function settings.
Custom Dataset Training - Supports fine-tuning on custom instruct, chat, and preference datasets with full-parameter or LoRA methods.
Training Recipes - Provides predefined training scripts that accept YAML configs to orchestrate fine-tuning workflows.
LLM Quantization Exports - Reduces model precision post-training or during fine-tuning and exports quantized models for edge deployment.
Preference Alignment - Aligns language model outputs with human preferences using DPO, PPO, or GRPO with configurable loss functions.
Text Dataset Loaders - Loads common text-only datasets like Alpaca and summarization from Hugging Face for fine-tuning.
CLI Configuration Overrides - Ships a CLI-based mechanism to override any configuration field at launch time via key-value pairs.
Configuration Overrides - Provides command-line key-value overrides for any training configuration parameter without editing config files.
Command-Line - Passes any configuration option as a command-line argument to change settings without editing the config file.
Config-Field Override Systems - Provides a command-line key=value override system for any configuration parameter at launch time.
Platform-Specific Config Overrides - Changes any config value at runtime by passing key-value pairs on the command line without editing the file.
Training Config Override Systems - Modifies recipe hyperparameters, loggers, and datasets through YAML configs or command-line key=value overrides.
Dataset Loaders - Pulls chat datasets from Hugging Face repositories by specifying repo name and conversation column.
Preference Dataset Loaders - Fetches preference datasets from Hugging Face repositories and tokenizes them for alignment training.
Training Job Launchers - Starts fine-tuning jobs with a chosen recipe and config, logging loss and GPU memory usage automatically.
Config-Driven Instantiation - Creates Python objects like models and datasets by resolving import paths and keyword arguments from YAML configs.
Activation Recomputation Strategies - Discards intermediate activations during forward pass and recomputes them during backward pass to save memory.
Cross-Sample Attention Masks - Automatically masks cross-sample attention in packed sequences to maintain sample isolation during training.
Sequence Packing - Packs multiple dataset samples into single sequences to reduce padding overhead during training.
Model Distillation - Transfers knowledge from larger teacher models to smaller student models through knowledge distillation.
Full Parameter Fine-Tuning - Supports full-parameter fine-tuning on a single device for maximum accuracy.
Preference-Aligned Full Training - Performs Direct Preference Optimization on all model parameters across multiple GPUs for alignment.
Multimodal Dataset Loaders - Loads multimodal datasets combining images with text instructions for vision-language fine-tuning.
Prompt Templates - Prepends task-specific instructions or system messages to each sample before tokenization for fine-tuning.
Activation and KV Cache Offloaders - Offloads intermediate activations from GPU to CPU during forward pass to save VRAM.
LoRA Training - Supports selecting which attention and MLP layers receive LoRA adapters via CLI flags.
LoRA Configuration Customization - Allows selecting attention and MLP layers for LoRA adapters via CLI arguments without code changes.
Configurable Loss Variants - Supports multiple preference optimization losses including DPO and RSO, selectable via configuration.
Margin-Based Rejection Sampling Losses - Uses a margin-based hinge loss from statistical rejection sampling to enforce a larger gap between chosen and rejected responses.
Training Config Customizers - Modifies recipe hyperparameters, loggers, and datasets through YAML configs or command-line overrides.
ExecuTorch Deployments - Exports quantized models to ExecuTorch for on-device inference on resource-constrained hardware.
Fine-Tuned Model Evaluators - Runs structured evaluations using EleutherAI's evaluation harness to measure model accuracy on tasks like truthfulness.
Text Generation Interfaces - Produces text completions from trained models using configurable sampling parameters like temperature and top-k.
Post-Fine-Tuning Quantizers - Converts float models from quantization-aware fine-tuning into quantized weights for deployment.
Quantization-Aware Fine-Tuning - Simulates quantization effects during fine-tuning so the model maintains accuracy when later quantized.
Quantization Toolkits - Ships a toolkit for quantization-aware training and post-training quantization to reduce model size and accelerate inference.
Model Distillation Frameworks - Transfers behavior from large teacher models to smaller student models via knowledge distillation.
Cross-Architecture Distillation - Supports distillation recipes that work across different model families like Qwen2 variants.
Quantization-Aware Training Exporters - Ships a post-training conversion step that applies a quantizer to QAT-fine-tuned models for inference.
Message Format Converters - Transforms ShareGPT or OpenAI message formats into a standardized internal message structure for training.
Weight-Decomposed Low-Rank Adaptation - Implements DoRA to decouple magnitude and direction updates for parameter-efficient LLM training.
Preference Optimization Loss Functions - Provides selectable DPO and RSO loss functions for controlling how models penalize un-preferred responses.
JSON Preference Dataset Loaders - Reads local JSON files with chosen and rejected conversation columns for preference fine-tuning.
JIT Compilation Functions - Provides JIT compilation of PyTorch functions into optimized kernels for accelerated execution.
Weight Quantization - Reduces numerical precision of model weights to lower memory usage and accelerate inference with minimal accuracy loss.
Fake Quantization Timing Controls - Delays the start of fake quantization by a configurable number of steps so weights and activations stabilize before quantization begins.
Post-Training Quantization - Applies weight-only quantization techniques such as int4 to reduce model size and accelerate inference.
Quantized Model Exporters - Provides tools to convert QAT-trained models into fully quantized formats for inference.
Quantization Evaluation - Runs standard language-modeling benchmarks on quantized models to measure perplexity and accuracy against baselines.
Fake Quantization Schedulers - Provides configurable scheduling of fake quantization insertion during quantization-aware fine-tuning.
Remote JSON Corpus Loading - Reads conversational data from local files or remote HTTPS URLs using the Hugging Face datasets loader.
Gradient Accumulation Strategies - Accumulates gradients over multiple forward passes to simulate larger effective batch sizes.
Dataset Blending - Combines multiple sub-datasets into a single unified dataset for training via concatenation.
PPO Implementations - Implements Proximal Policy Optimization for aligning model outputs using a reward model.
Optimizer State Offloading - Moves optimizer and gradient states to CPU memory to free GPU VRAM during single-device training.
Quantized Optimizers - Quantizes optimizer state dictionaries using 8-bit or paged optimizers to lower GPU memory consumption.
Component Swapping Overrides - Replaces components in the config with different classes or functions and adjusts nested parameters from the command line.
Configuration Validators - Checks that YAML configs are well-formed and all referenced components can be instantiated, reporting errors.
Adapter Loading - Loads base models and their trained adapters or merged weights into Hugging Face Transformers for inference.
Training Rendezvous Coordinators - Coordinates multi-node worker discovery and group formation for fault-tolerant distributed training.
PyTorch JIT Compilation - JIT-compiles PyTorch operations into optimized kernels at runtime, accelerating execution with minimal code changes.
Directional Weight Decompositions - Implements DoRA, a directional weight decomposition method that extends LoRA for improved fine-tuning.
Remote Dataset Loaders - Reads instruction-tuning data from local files or remote URLs in CSV or JSON formats.
Gradient Fusing Optimizations - Fuses optimizer updates into the backward pass to eliminate gradient buffer memory during training.
Model Serving & Deployment - Supports authoring and experimenting with LLMs using PyTorch.
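Several entries above mention selectable preference losses. As a rough sketch of what such losses compute, the block below scores one chosen/rejected pair from sequence log-probabilities under the policy and a frozen reference model, then applies either the DPO logistic loss or a margin-based hinge loss of the kind used in statistical rejection sampling. All function names, parameter names, and the default `beta` and `gamma` values are illustrative assumptions, not torchtune's API.

```python
import math


def preference_margin(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
) -> float:
    """How much more the policy favors the chosen response than the reference does."""
    return (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)


def dpo_loss(margin: float, beta: float = 0.1) -> float:
    """-log(sigmoid(beta * margin)), written as a numerically stable softplus."""
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def hinge_loss(margin: float, gamma: float = 0.1) -> float:
    """Hinge variant: zero once the scaled margin reaches 1, linear below that."""
    return max(0.0, 1.0 - gamma * margin)


# Made-up sequence log-probabilities for a single preference pair.
margin = preference_margin(
    policy_chosen=-10.0,
    policy_rejected=-14.0,
    ref_chosen=-12.0,
    ref_rejected=-12.0,
)
print(margin, dpo_loss(margin), hinge_loss(margin))
```

Both losses shrink as the margin grows. The logistic form keeps a small gradient for every pair, while the hinge form stops penalizing pairs once the scaled margin exceeds 1.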

pytorch/torchtune

5,774 View on GitHub

Torchtune is a PyTorch-native library for fine-tuning, aligning, and quantizing large language models. It provides a configurable training pipeline orchestrated through YAML recipes, with CLI overrides and component swapping, distributed training via FSDP2, memory optimizations, and parameter-efficient fine-tuning methods like LoRA, DoRA, and QLoRA. The library distinguishes itself through its YAML-driven configuration system that defines all training parameters and instantiates components from config files, with full CLI override capability for any field or component at launch time. It suppo

thinking-machines-lab/tinker-cookbook

2,856 View on GitHub

Tinker Cookbook is an open-source framework for fine-tuning large language models, supporting supervised learning, reinforcement learning, and parameter-efficient techniques like LoRA adapters. It provides a complete pipeline for aligning models with human preferences through multi-stage RLHF workflows, from supervised fine-tuning through preference optimization to reinforcement learning. The framework distinguishes itself through recipe-based training orchestration, where fine-tuning workflows are defined as composable recipe files that chain data loading, model configuration, and training l

verl-project/verl

22,000 View on GitHub

This project is a distributed training infrastructure designed for aligning large language models through reinforcement learning. It functions as an end-to-end engine for complex alignment tasks, including proximal policy optimization, direct preference optimization, and iterative self-play. By providing a unified framework for multi-turn interactions and tool-use scenarios, it enables the development of models capable of reasoning and external environment engagement. The framework distinguishes itself through a decoupled architecture that separates model training from sample generation. This

huggingface/alignment-handbook

5,621 View on GitHub

This project is an alignment framework and suite of pipelines for training language models using supervised fine-tuning and preference optimization. It provides tools for executing large-scale distributed training across multiple GPUs and compute nodes, alongside a system for measuring model helpfulness and dialogue quality through single-turn and multi-turn benchmarks. The framework includes specialized tools for direct preference optimization to refine model behavior using paired data without a separate reward model. It also supports constitutional AI alignment and the training of reward mo
