Discover open-source frameworks and libraries for optimizing large language model training using LoRA and QLoRA.
LLaMA-Factory is a comprehensive suite for dataset preparation, model fine-tuning, memory optimization, and standardized API deployment. It provides a unified platform for the supervised and reward-based fine-tuning of large language models and vision-language models. The framework includes a specialized toolkit for training vision-language models and a model serving interface that deploys trained models through high-performance APIs. It utilizes precision tuning and quantization techniques to reduce the hardware requirements and memory footprint of large models. The system covers data pipel
LLaMA-Factory is a comprehensive framework specifically designed for parameter-efficient fine-tuning of large language models, offering native support for LoRA, QLoRA, quantization, and distributed training within a unified pipeline.
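As a rough illustration of why quantization lowers hardware requirements, the sketch below estimates the memory needed just to hold a model's weights at different precisions. The 7-billion-parameter size and the function name are illustrative assumptions, not taken from LLaMA-Factory, and the figures ignore activations, gradients, optimizer state, and quantization metadata.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Return the decimal gigabytes needed to store the weights alone."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical 7B-parameter model, used only to make the arithmetic concrete.
PARAMS = 7_000_000_000
for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>9}: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts weight storage by a factor of four, which is the main reason QLoRA-style training fits on smaller GPUs.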
This library provides a framework for parameter-efficient fine-tuning, enabling the adaptation of large pretrained models by training only a small subset of parameters. It functions as a distributed model training system and optimization toolkit, designed to reduce the computational and memory requirements typically associated with full model fine-tuning. The project distinguishes itself through a suite of methods for modular adapter composition, including low-rank matrix decomposition and activation-based scaling. It supports the integration of multiple task-specific adapter modules, allowin
This library is a widely used framework for parameter-efficient fine-tuning in the Hugging Face ecosystem, with native support for LoRA, QLoRA, quantization, and memory-optimized distributed training.
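The low-rank decomposition mentioned above can be sketched without any library. LoRA keeps a pretrained weight matrix W (d_out x d_in) frozen and trains two small matrices B (d_out x r) and A (r x d_in), so the effective weight is W + (alpha / r) * B A. The code below is a minimal pure-Python illustration of that idea, not this library's implementation; the function names and the 4096 x 4096 layer size are assumptions chosen for the example.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA trainable params) for one layer."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)  # B has d_out*rank entries, A has rank*d_in
    return full, lora


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha: float):
    """Compute W + (alpha / r) * B @ A, where r is the number of rows of A."""
    scale = alpha / len(a)
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


full, lora = lora_param_counts(4096, 4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

For a hypothetical 4096 x 4096 layer at rank 8, the adapter trains 65,536 parameters instead of 16,777,216, which is why only a small subset of parameters needs gradients and optimizer state.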
Axolotl is a configuration-driven framework designed for the fine-tuning, evaluation, and quantization of large language models. It functions as a comprehensive orchestrator for distributed training, enabling users to manage complex workflows across multi-node and multi-GPU environments. By utilizing structured configuration files, the platform streamlines the setup of training parameters, dataset paths, and hardware distribution strategies. The project distinguishes itself through its support for diverse training methodologies, including full-parameter tuning, parameter-efficient adaptation,
Axolotl is a comprehensive, configuration-driven framework specifically designed for fine-tuning large language models, offering native support for LoRA, QLoRA, distributed training, and extensive Hugging Face integration.
LMFlow is a comprehensive suite for large language model fine-tuning, context extension, multimodal processing, and inference execution. It provides a toolkit for updating model parameters through full tuning or memory-efficient adapter algorithms, alongside an inference engine for executing tuned models via command-line or web-based interfaces. The framework includes a dedicated alignment suite for supervised tuning and reward model training to refine model behavior. It features a context window extender to increase maximum input lengths and a multimodal framework for building chatbots that
LMFlow is a comprehensive framework designed specifically for LLM fine-tuning that natively supports LoRA, QLoRA, quantization, and memory-efficient training techniques while integrating seamlessly with the Hugging Face ecosystem.
This project is a fine-tuning framework and training pipeline designed to optimize and adapt large language and vision models. It provides a specialized toolkit for parameter-efficient tuning and supervised learning, serving as both a trainer for multimodal models and a deployment tool for serving fine-tuned models via high-performance inference engines. The framework focuses on reducing memory and compute requirements by updating a small subset of model parameters. It supports a wide range of adaptation strategies, including vision-language model training to align text, image, video, and aud
This framework provides a comprehensive suite for parameter-efficient fine-tuning, including native support for LoRA, QLoRA, quantization, and distributed training, making it a complete solution for adapting large language and multimodal models.
This project is a comprehensive framework for the training, fine-tuning, and deployment of large language models. It functions as a distributed deep learning platform that enables users to scale model workflows across multiple hardware nodes while providing tools for model evaluation and performance benchmarking. The platform distinguishes itself by offering specialized utilities for model compression and weight transformation, allowing users to reduce memory footprints and latency through quantization and pruning. It supports the adaptation of large models for consumer-grade hardware, facili
This framework provides a comprehensive suite for distributed training, quantization, and parameter-efficient fine-tuning, making it a direct tool for adapting large language models to specific hardware constraints.
LitGPT is a training and deployment framework for large language models, providing a suite of tools for pretraining, finetuning, quantizing, evaluating, and serving models within a production environment. It includes a dedicated training pipeline for adapting pretrained models to specific tasks, a quantization tool for reducing weight precision, and an inference server for hosting models via web interfaces. The framework supports high-performance model development through custom architecture implementation and the use of predefined recipes to standardize pretraining and finetuning. It enables
LitGPT is a comprehensive framework designed for the full lifecycle of LLMs, including native support for parameter-efficient fine-tuning techniques like LoRA and QLoRA, quantization, and distributed training workflows.
LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a single interface. The project distinguishes itself by offering a low-code visual dashboard that enables users to configure experiments and monitor performance metrics in real time without writing extensive custom scripts. It also features a configuration-driven orchestration system that decouples experim
LlamaFactory is a comprehensive framework specifically designed for fine-tuning large language models, offering native support for LoRA, QLoRA, quantization, and distributed training within a unified interface.
PaddleFormers is a framework for the training, fine-tuning, and deployment of large language models. It provides a full lifecycle pipeline for executing large-scale model training and applying adaptation methods to align models with specialized tasks. The project focuses on scaling model operations through distributed training and hardware accelerator integration. It employs pipeline parallelism and mixed-precision training to manage memory and increase throughput across multiple hardware devices. The library includes a curated model zoo for serving pre-trained architectures and tools for pr
PaddleFormers is a comprehensive framework for training and fine-tuning large language models that includes support for parameter-efficient fine-tuning and distributed training, though it is built specifically for the PaddlePaddle ecosystem rather than the more common PyTorch-based workflows.
AirLLM is a framework designed to execute and fine-tune large language models on consumer-grade hardware. By employing layer-wise model decomposition and memory-efficient loading techniques, the engine enables the operation of massive models that would otherwise exceed available system or video memory. The project distinguishes itself through a suite of optimization strategies that balance memory footprint with performance. It utilizes block-wise weight quantization and asynchronous layer prefetching to reduce resource consumption and hide data transfer latency. Additionally, the framework su
AirLLM is a framework for running and fine-tuning large language models on consumer hardware, providing native support for LoRA, quantization, and distributed training to optimize memory usage.
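To make "block-wise weight quantization" concrete, here is a minimal sketch of generic absmax int8 quantization applied per block. The source does not specify which scheme this project uses, so the block size, the int8 range, and the function names are assumptions for illustration only.

```python
def quantize_blockwise(weights, block_size=4):
    """Quantize a flat list of floats to int8 codes with one scale per block."""
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block)
        # Map the largest magnitude in the block to 127; avoid dividing by zero.
        scale = absmax / 127 if absmax > 0 else 1.0
        codes = [round(w / scale) for w in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_blockwise(blocks):
    """Reconstruct approximate floats from (scale, codes) pairs."""
    return [code * scale for scale, codes in blocks for code in codes]


weights = [0.1, -0.5, 0.25, 1.0, 100.0, -50.0, 0.0, 25.0]
restored = dequantize_blockwise(quantize_blockwise(weights))
print([round(x, 3) for x in restored])
```

Because each block carries its own scale, the large values in the second block do not reduce the precision available to the small values in the first block, which is the usual motivation for quantizing block by block rather than per tensor.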
WeClone is an end-to-end framework designed for the creation, training, and deployment of personalized conversational AI digital twins. By fine-tuning large language models on individual chat history, the platform enables the replication of unique communication styles, speech patterns, and conversational habits. The system manages the entire lifecycle of these digital avatars, from initial data preparation to final integration into messaging platforms for real-time interaction. The platform distinguishes itself through a comprehensive suite of data processing utilities that prepare raw messag
WeClone is a specialized framework for fine-tuning conversational AI models that includes essential features like quantization, distributed training, and memory optimization, though it is more narrowly focused on creating digital twins than general-purpose fine-tuning libraries.
Unsloth is a high-performance training and inference platform designed to optimize the lifecycle of large language and multimodal models. It provides a comprehensive engine for fine-tuning, executing, and managing models locally, with a focus on reducing memory consumption and increasing compute speed on consumer-grade hardware. The platform distinguishes itself through hand-optimized kernels and automated computational graph techniques that maximize hardware throughput. It supports advanced training methodologies, including reinforcement learning for reasoning and efficient adapter-based fin
Unsloth is a specialized framework for efficient LLM fine-tuning that natively supports LoRA, QLoRA, and memory-optimized training, making it one of the most directly relevant tools in this list.
This project is a collection of scripts and workflows for training, fine-tuning, and deploying large language models using the Hugging Face Transformers toolkit. It functions as a distributed training framework, a library for natural language processing task implementations, and a system for building retrieval-augmented generation chatbots. The repository includes specialized tools for model optimization, such as a Bayesian hyperparameter optimizer for automatically tuning model settings. It provides implementations for scaling model training across multiple graphics processors using data par
This repository provides a collection of scripts and workflows for fine-tuning large language models using Hugging Face Transformers, including support for parameter-efficient techniques and distributed training. It functions more as a set of implementation examples and utilities than as a standalone library or framework, but it covers LoRA, quantization, and multi-GPU training directly.
PaddleNLP is a development library and toolkit for training, fine-tuning, and deploying large and small language models using the PaddlePaddle framework. It provides a comprehensive suite for the entire natural language processing lifecycle, from model development to high-performance inference. The project features a standardized model zoo for loading and managing pre-trained models and tokenizers through a unified interface. It distinguishes itself with a specialized model compression framework that reduces memory footprints via weight precision conversion and lossless size optimization, alo
PaddleNLP is a comprehensive framework for training and fine-tuning large language models that natively supports parameter-efficient techniques like LoRA, distributed training, and quantization, though it is built on the PaddlePaddle ecosystem rather than the Hugging Face ecosystem.
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering specialized architectures for both text and vision processing. The framework includes tools for managing the entire model lifecycle, from data preprocessing and tokenization to distributed training and inference. The library features extensive support for model optimization and
Transformers is a widely used library for training and fine-tuning transformer models. LoRA and QLoRA workflows are typically built on it through its PEFT and bitsandbytes integrations, alongside its distributed training and memory optimization features.
This project is a comprehensive toolkit for adapting large language models to the Chinese language, providing a specialized framework for fine-tuning, inference, and local deployment. It serves as a coordinated suite for language-specific adaptation, including tools for expanding tokenizers and implementing retrieval-augmented generation. The project distinguishes itself through a complete pipeline for model adaptation, featuring multilingual tokenizer expansion and a fine-tuning framework that supports instruction-based supervised training and adapter merging. It also includes a dedicated de
This project provides a comprehensive pipeline for fine-tuning and deploying Llama-based models, including native support for LoRA, quantization, and instruction-based training, though it is specifically tailored for Chinese language adaptation rather than being a general-purpose framework.
This project provides an end-to-end framework for adapting large language models to follow user instructions through supervised fine-tuning. It functions as a comprehensive training pipeline that enables the creation of specialized assistant models by minimizing the difference between predicted outputs and target responses within structured instruction datasets. The framework distinguishes itself by integrating synthetic data generation with memory-efficient training techniques. It utilizes powerful language models to iteratively expand small sets of human-written seeds into diverse, high-qua
This project provides a comprehensive pipeline for instruction fine-tuning and parameter-efficient adaptation, though it is primarily focused on the specific Alpaca methodology rather than serving as a general-purpose library for diverse LLM architectures.
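The objective described above, minimizing the difference between predicted outputs and target responses, is usually implemented as a token-level cross-entropy loss. The sketch below assumes the common convention of masking prompt tokens so that only response tokens contribute; the source does not state whether this project masks the prompt, and the function and argument names are hypothetical.

```python
import math


def sft_loss(token_probs, is_response):
    """Mean negative log-likelihood over response tokens only.

    token_probs[i] is the probability the model assigned to the correct
    token at position i; is_response[i] marks positions in the target reply.
    """
    losses = [-math.log(p) for p, keep in zip(token_probs, is_response) if keep]
    if not losses:
        raise ValueError("no response tokens to train on")
    return sum(losses) / len(losses)


# One prompt token (masked out) followed by two response tokens.
print(round(sft_loss([0.1, 0.5, 0.25], [False, True, True]), 4))
```

In this example the low probability on the prompt token has no effect on the loss, so the model is only penalized for how well it predicts the response.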
ERNIE is a development toolkit for training, fine-tuning, and deploying large language models built on the PaddlePaddle deep learning platform. It provides a comprehensive suite of core components, including an inference server for vision and language models, a training and fine-tuning toolkit, and a framework for building retrieval-augmented generation systems using private knowledge bases. The project features multimodal AI models capable of reasoning across text, images, and video to perform complex visual understanding and information extraction. It distinguishes itself through specialize
This toolkit provides a comprehensive environment for training and fine-tuning large language models, including support for distributed training, quantization, and memory optimization techniques necessary for efficient model adaptation.
OpenRLHF is a training framework and alignment library designed for reinforcement learning from human feedback across distributed GPU clusters. It provides tools for aligning large language models and multimodal vision-language models using algorithms such as PPO, GRPO, and DPO. The framework distinguishes itself through a distributed inference engine that overlaps sample rollout with training to increase throughput. It supports scaling to models exceeding 70 billion parameters via parameter sharding and handles long-context sequences through ring-attention sequence parallelism. The project
OpenRLHF is a comprehensive framework for reinforcement learning and alignment that includes support for LoRA, memory optimization, and distributed training, making it a capable tool for fine-tuning tasks despite its primary focus on RLHF.
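Of the algorithms named above, DPO is the simplest to show in a few lines. The sketch below computes the standard DPO loss for a single preference pair from summed log-probabilities under the policy and a frozen reference model. It is a generic illustration, not OpenRLHF's code; the function name, argument names, and the default beta of 0.1 are assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Return -log(sigmoid(margin)) for one (chosen, rejected) pair."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the loss is log(2).
print(round(dpo_loss(-10.0, -10.0, -10.0, -10.0), 4))
# Policy raises the chosen response's log-probability: the loss drops.
print(round(dpo_loss(-5.0, -10.0, -10.0, -10.0), 4))
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference does, which is how DPO aligns a model without training a separate reward model.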
CoreNet is a deep learning training framework and computer vision model library designed for developing neural networks across vision, text, and audio modalities. It functions as a distributed training orchestrator for scaling workloads across multiple compute nodes and provides a multimodal data pipeline for processing image, text, and video data. The project includes a model conversion toolkit for transforming weights and architectures between different machine learning frameworks. It also provides tools for optimizing model performance on Apple Silicon and reducing response latency in gene
CoreNet is a comprehensive deep learning framework that supports parameter-efficient fine-tuning and distributed training, making it a capable tool for adapting large models even though its primary focus extends across vision and multimodal tasks.
StarCoder is a large language model and associated framework designed to generate, complete, and evaluate source code across multiple programming languages. It functions as a source code model that can produce complete function implementations and predict subsequent characters in a line of code based on provided prompts. The project provides a specialized toolkit for adapting base models to specific coding tasks and instruction-following behaviors. This includes a conversational code assistant framework for training models to generate code via natural language chat, as well as a parameter-eff
This repository provides a specialized framework for parameter-efficient fine-tuning and adaptation of code-focused language models.
This project is a comprehensive framework for the entire lifecycle of transformer-based language models, supporting everything from foundational pretraining to specialized deployment. It provides a modular toolkit for defining neural network architectures, managing data preparation pipelines, and executing training routines across various scales. The framework is designed to handle the full model development process, including supervised fine-tuning, behavioral alignment, and the integration of agentic capabilities. What distinguishes this framework is its focus on efficient training and adva
This framework provides a comprehensive suite for the entire LLM lifecycle, including built-in support for low-rank parameter adaptation and memory-efficient training techniques suitable for fine-tuning tasks.
ChatGLM-6B is a generative AI inference engine designed for local execution of transformer-based language models. It provides a comprehensive runtime environment that allows users to load and run pre-trained neural network weights directly on their own hardware, ensuring data privacy and independence from external cloud services. The project distinguishes itself through a hardware-agnostic execution backend that supports deployment across diverse environments, including standard processors, Apple Silicon, and multi-GPU configurations. It incorporates advanced optimization techniques such as w
This repository provides a comprehensive toolkit for local model execution that includes built-in utilities for parameter-efficient fine-tuning and quantization, making it a functional choice for adapting models like ChatGLM-6B.
ColossalAI is a distributed deep learning framework designed for training and deploying massive artificial intelligence models across clusters of hardware accelerators. It functions as a parallel computing engine that partitions model workloads and data across multiple processors to maximize memory efficiency and throughput. The platform distinguishes itself through a comprehensive suite of parallelization strategies, including multi-dimensional tensor parallelism and pipeline-based model parallelism, which segment neural network layers and stages across devices. To support large-scale genera
ColossalAI is a powerful distributed deep learning framework that provides the memory optimization and parallelization infrastructure necessary for fine-tuning large models, though it focuses more on general-purpose distributed training than on specialized LoRA/QLoRA workflows.
DeepSpeed is a high-performance library designed to scale deep learning model training and inference across massive clusters of GPUs and compute nodes. It provides a comprehensive suite of tools for distributed training, enabling the execution of models that exceed the memory capacity of single devices through advanced parameter partitioning, pipeline-based model parallelism, and memory-efficient state offloading. The framework distinguishes itself through specialized communication-efficient optimizers and hardware-aware acceleration techniques. By utilizing gradient compression, quantization
DeepSpeed is a high-performance distributed training framework that provides the essential memory optimization and quantization infrastructure required for fine-tuning large language models, though it functions as a foundational acceleration library rather than a dedicated LoRA-specific toolkit.
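The effect of parameter partitioning can be estimated with simple arithmetic. The sketch below follows the per-parameter accounting commonly used for DeepSpeed's ZeRO stages under mixed-precision Adam: 2 bytes for 16-bit weights, 2 bytes for 16-bit gradients, and 12 bytes for optimizer state. It is an estimate under those assumptions, it ignores activations, buffers, and offloading, and the function name is hypothetical.

```python
def per_gpu_model_state_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Estimate per-GPU model-state memory (decimal GB) for ZeRO stages 0-3."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    weights, grads, optim = 2.0, 2.0, 12.0  # bytes per parameter
    if stage >= 1:
        optim /= num_gpus    # stage 1 partitions optimizer state
    if stage >= 2:
        grads /= num_gpus    # stage 2 also partitions gradients
    if stage >= 3:
        weights /= num_gpus  # stage 3 also partitions the weights
    return num_params * (weights + grads + optim) / 1e9


# Hypothetical 7.5B-parameter model trained on 64 GPUs.
for stage in range(4):
    print(f"stage {stage}: {per_gpu_model_state_gb(7.5e9, 64, stage):.1f} GB")
```

Under these assumptions the per-GPU model state shrinks from about 120 GB with no partitioning to under 2 GB when weights, gradients, and optimizer state are all sharded, which is what lets models larger than a single device's memory be trained.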
AutoGluon is an automated machine learning framework and multimodal library designed to automate the end-to-end pipeline from data preprocessing to high-accuracy model training and validation. It functions as an automated model trainer for tabular, image, text, and time series data, as well as a tool for time series forecasting and foundation model finetuning. The project is distinguished by its ability to jointly process and fuse different data types, allowing for the construction of multimodal neural networks that integrate images, text, and structured tables. It supports zero-shot inferenc
AutoGluon is an automated machine learning framework that includes support for parameter-efficient fine-tuning of foundation models, making it a capable option even though its primary focus is broader AutoML pipelines.
Sonnet is a modular machine learning framework and TensorFlow neural network library designed for building composable deep learning architectures. It functions as a model orchestrator that manages parameters, state serialization, and graph exports during the training process. The framework provides a distributed training system to synchronize gradients and spread workloads across multiple GPUs or hardware devices. It enables the design of reusable research components through high-level abstractions and subclassing. The library covers neural network architecture design through sequential laye
Sonnet is a general-purpose deep learning library for building neural network architectures in TensorFlow, but it lacks the specialized LLM fine-tuning abstractions and parameter-efficient techniques, such as LoRA or QLoRA, that this list focuses on.
This project is a low-dependency engine designed for training large language models using native C and CUDA. It provides a bare-metal environment for tensor computation, allowing for the execution of neural network operations directly on hardware accelerators without the overhead of high-level software abstractions. The framework distinguishes itself by implementing manual gradient backpropagation and custom hardware-specific kernels, providing granular control over memory mapping and computational precision. It supports distributed training across multiple graphics processors and compute nod
This project is a low-level C/CUDA engine for training LLMs from scratch rather than a framework designed for parameter-efficient fine-tuning techniques like LoRA or QLoRA.
kohya_ss is a graphical user interface and workbench for fine-tuning diffusion models, specifically designed for Stable Diffusion. It provides a suite of tools for training generative AI models, including specialized interfaces for creating Low-Rank Adaptation weights and training ControlNet spatial control networks. The project distinguishes itself through integrated VRAM usage optimization and hardware acceleration, featuring specific support for Intel GPUs via XPU-accelerated libraries. It implements parameter-efficient training methods and memory-saving techniques like gradient checkpoint
This is a specialized GUI and workbench for fine-tuning Stable Diffusion image generation models, rather than a general-purpose framework for fine-tuning large language models.
Open-r1 is a framework designed for the large-scale training, distillation, and optimization of language models focused on complex reasoning and programming tasks. It provides a comprehensive suite of tools for managing distributed training jobs across multi-node clusters, enabling the development of high-performance models through reinforcement learning and supervised fine-tuning. The project distinguishes itself by integrating secure, containerized code execution environments directly into the training and evaluation lifecycle. By allowing models to run and verify code snippets against test
While this framework is primarily focused on large-scale reasoning and reinforcement learning pipelines rather than standard LoRA/QLoRA fine-tuning, it provides the necessary distributed training and orchestration infrastructure to support advanced model optimization workflows.
Nanochat is a lightweight execution environment designed for training and running language models on standard consumer hardware. It functions as both a neural network training framework and an inference engine, enabling users to perform backpropagation-based training and model execution directly on general-purpose processors without the need for dedicated graphics hardware. The project distinguishes itself through a suite of optimization tools that prioritize efficiency on local machines. By utilizing memory-mapped weight loading and CPU-optimized vector math, it maximizes throughput for inte
This framework provides a lightweight environment for training and fine-tuning transformer models on consumer hardware, though it focuses on CPU-based execution rather than LoRA or QLoRA parameter-efficient techniques.