Tinker Cookbook

Tinker Cookbook is an open-source framework for fine-tuning large language models, supporting supervised learning, reinforcement learning, and parameter-efficient techniques like LoRA adapters. It provides a complete pipeline for aligning models with human preferences through multi-stage RLHF workflows, from supervised fine-tuning through preference optimization to reinforcement learning.

The framework distinguishes itself through recipe-based training orchestration, where fine-tuning workflows are defined as composable recipe files that chain data loading, model configuration, and training loops into repeatable pipelines. It includes an async concurrent sampling engine that maximizes throughput during training rollouts and evaluation, and supports multi-agent reinforcement learning with self-play or competitive environments. The system manages model checkpoints through hub-centric weight management, enabling saving, loading, downloading, and publishing to remote hubs for sharing and deployment.
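
The concurrent sampling idea above can be illustrated with a minimal sketch using only Python's standard `asyncio` module. This is not the framework's API: `fake_sample`, `sample_concurrently`, and `max_in_flight` are hypothetical names, and the sleep stands in for a remote generation request.

```python
import asyncio


async def fake_sample(prompt: str) -> str:
    """Hypothetical stand-in for a remote sampling call."""
    await asyncio.sleep(0.01)  # simulate network and generation latency
    return f"completion for: {prompt}"


async def sample_concurrently(prompts: list[str], max_in_flight: int = 4) -> list[str]:
    """Issue all sampling calls concurrently, capping how many run at once."""
    limiter = asyncio.Semaphore(max_in_flight)

    async def bounded(prompt: str) -> str:
        async with limiter:
            return await fake_sample(prompt)

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(bounded(p) for p in prompts))


def run_demo() -> list[str]:
    """Sample eight toy prompts, at most four at a time."""
    prompts = [f"prompt {i}" for i in range(8)]
    return asyncio.run(sample_concurrently(prompts))
```

Because the requests overlap instead of running back to back, total wall-clock time is closer to the slowest batch than to the sum of all calls, which is the throughput benefit the description refers to.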

Beyond core training, the framework covers hyperparameter sweeping across learning rates, LoRA ranks, and RL parameters to find optimal configurations. It handles vision-language model fine-tuning, prompt distillation into model weights, and multi-turn conversation training. The system includes tools for building, merging, and exporting LoRA adapters for efficient serving and HuggingFace compatibility, along with evaluation capabilities for measuring model performance on standard benchmarks.
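
As a rough illustration of the sweep described above, the sketch below enumerates a grid over learning rates and LoRA ranks and picks the best configuration. The function names and the `toy_eval` scoring function are hypothetical; in a real sweep the score would come from training and evaluating a model, not from a formula.

```python
import itertools


def build_sweep(learning_rates, lora_ranks):
    """Return one config dict per (learning_rate, lora_rank) combination."""
    return [
        {"learning_rate": lr, "lora_rank": rank}
        for lr, rank in itertools.product(learning_rates, lora_ranks)
    ]


def pick_best(configs, evaluate):
    """Return the config with the lowest score according to `evaluate`."""
    return min(configs, key=evaluate)


def toy_eval(config):
    # Hypothetical stand-in for "train, then measure validation loss".
    lr_penalty = abs(config["learning_rate"] - 1e-4)
    rank_penalty = abs(config["lora_rank"] - 32) / 1000
    return lr_penalty + rank_penalty


configs = build_sweep([1e-5, 1e-4, 1e-3], [8, 32, 128])
best = pick_best(configs, toy_eval)
```

With these toy inputs the grid has nine configurations, and `best` is the one with a learning rate of 1e-4 and a LoRA rank of 32, because `toy_eval` is zero only at that point.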

The documentation provides guidance on configuring training runs, building custom reinforcement learning environments, and diagnosing training issues through AI assistant skills.

Features

Language Model Fine-Tuning - Core framework for fine-tuning large language models using supervised and reinforcement learning.

Large Language Model Fine-Tuning Frameworks - Provides an open-source framework for fine-tuning large language models using supervised learning, reinforcement learning, and LoRA adapters.

Reinforcement Learning Training - Training language models to maximize reward signals through on-policy rollouts, scoring, and importance-sampled loss updates.

Supervised Training Pipelines - Uses configuration objects and dataset builders to set up and run supervised fine-tuning jobs.

Reinforcement Learning Fine-Tuning - Applies the GRPO algorithm with reward functions to fine-tune models on math reasoning tasks.

Large Language Model Fine-Tuning - Trains open-weight models from 1B to 1T+ parameters, including dense and mixture-of-experts architectures.

Reinforcement Learning Environments - Defines custom ProblemEnv subclasses for RL training with token-level and message-level completion strategies.

Low-Rank Adaptation - Performs parameter-efficient fine-tuning using LoRA adapters, matching full fine-tuning performance for many use cases.

Multi-Stage Pipelines - Orchestrates supervised fine-tuning, preference model training, and reinforcement learning as sequential stages.

GRPO Training Loops - Executes complete GRPO reinforcement learning training runs with higher-level abstractions.

Parameter Efficient Fine-Tuning - Provides parameter-efficient fine-tuning via LoRA adapters, reducing memory and compute requirements.

Preference Optimization - Runs direct preference optimization and full RLHF pipelines to align model outputs with human preferences.

GRPO Training Loop Configurations - Implements full GRPO training loops with higher-level environment and dataset abstractions.

Reinforcement Learning Training Utilities - Trains language models to maximize reward signals by sampling on-policy rollouts and applying importance-sampled loss.

RLHF Training Pipelines - Provides a complete three-stage RLHF pipeline from supervised fine-tuning through preference optimization to reinforcement learning.

Supervised Fine-Tuning - Ships a supervised fine-tuning loop that constructs data, performs forward and backward passes, and updates weights.

Supervised Fine-Tuning Frameworks - Provides configuration objects and dataset builders for setting up supervised fine-tuning jobs.

Training Recipes - Defines fine-tuning workflows as composable recipe files that chain data loading, model configuration, and training loops.

Model Fine-Tuning - Provides supervised fine-tuning with cross-entropy loss and weight updates for language models.

Vision Model Inputs - Supports passing image inputs alongside text for vision-language model fine-tuning and sampling.

Direct Preference Optimization - Fine-tunes models using direct preference optimization from paired preference data without a separate reward model.

Model Exporters - Merges a LoRA adapter into a full model and saves it in HuggingFace-compatible format.

Model Uploads to Hub - Provides functionality to publish fine-tuned model checkpoints to the HuggingFace Hub.

Chat Template Configurations - Tokenizes and formats prompts according to the chat template expected by different model families.

Trained Model Outputs - Ships sampling and inference capabilities for generating outputs from fine-tuned models.

Custom Environment Builders - Provides the ability to build custom reinforcement learning environments by subclassing ProblemEnv.

Custom Loss Functions - Supports cross-entropy, importance sampling, PPO, and custom loss functions during model training.

Hyperparameter Sweep Orchestrators - Includes a hyperparameter sweep engine that scans learning rates, LoRA ranks, and RL parameters to find optimal configurations.

Fine-Tuned Model Evaluators - Runs evaluations on fine-tuned models to measure performance on standard benchmarks.

Hub Weight Managers - Manages model checkpoints through save, load, download, and upload operations to a remote hub.

Vision-Language Fine-Tuning - Trains multimodal models for image understanding tasks through the same fine-tuning API used for language models.

Model Weight Management - Manages model checkpoints through save, load, download, and publish operations.

Adapter Builders - Converts trained adapters into PEFT format for efficient serving and deployment.

Adapter Exporters - Merges a LoRA adapter into a full model and exports it for use with the HuggingFace ecosystem.

LoRA Adapter Builders and Exporters - Provides tools for building, merging, and exporting LoRA adapters for deployment and sharing.

Multi-Adapter Serving Kernels - Converts trained adapters into PEFT format to serve the model with lower memory usage.

Model-Specific Prompt Formats - Automatically applies model-specific chat templates to structure prompts before inference.

Multi-Agent Training - Supports multi-agent reinforcement learning with self-play and competitive environments for training multiple agents.

RL Reference Environments - Supports building custom RL environments by subclassing ProblemEnv for reinforcement learning tasks.

Self-Play Training Pipelines - Sets up self-play and competitive environments for multi-agent reinforcement learning training.

Async Request Throughput Optimizers - Optimizes throughput by sending multiple concurrent generation requests asynchronously.

Model Evaluation and Benchmarking - Includes evaluation capabilities for running standard benchmarks on fine-tuned models.

Model Checkpoint Uploads - Provides tools for uploading fine-tuned model checkpoints to the HuggingFace Hub for sharing.

Checkpoint Saving and Restoration - Ships checkpoint saving and restoration utilities for managing model states during fine-tuning.

Concurrent Sampling Engines - Ships an async concurrent sampling engine that maximizes throughput during training rollouts and evaluation.

Concurrent Sampling and Training Pipelines - Pipelines concurrent sampling with forward-backward passes and optimizer steps for higher throughput.

Asynchronous Request Handlers - Issues multiple concurrent API calls using futures to maximize throughput during training and inference.

Concurrent AI Requests - Implements an async concurrent sampling engine for parallel generation requests during training.
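
Several entries above mention GRPO and importance-sampled loss updates. The sketch below shows one common formulation of both pieces in plain Python: group-relative advantages (each reward centered on its group mean and scaled by the group standard deviation) and a loss weighted by the ratio between new and old policy probabilities. It is an assumption that this matches the framework's exact variant; the function names and the `eps` parameter are hypothetical.

```python
import math


def group_relative_advantages(rewards, eps=1e-6):
    """Center each reward on its group mean and scale by the group std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # eps avoids division by zero when every reward in the group is equal
    return [(r - mean) / (std + eps) for r in rewards]


def importance_sampled_loss(new_logprobs, old_logprobs, advantages):
    """Negative mean of ratio * advantage, with ratio = exp(new - old)."""
    terms = [
        math.exp(new - old) * adv
        for new, old, adv in zip(new_logprobs, old_logprobs, advantages)
    ]
    return -sum(terms) / len(terms)
```

For a group of rollouts on the same prompt, samples that score above the group average receive positive advantages and are reinforced, while below-average samples are pushed down. When every rollout in a group earns the same reward, all advantages are zero and the group contributes no gradient.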

thinking-machines-lab/tinker-cookbook
