30 open-source projects similar to optimalscale/lmflow, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best LMFlow alternative.
This project is a comprehensive toolkit for adapting large language models to the Chinese language, providing a specialized framework for fine-tuning, inference, and local deployment. It serves as a coordinated suite for language-specific adaptation, including tools for expanding tokenizers and implementing retrieval-augmented generation. The project distinguishes itself through a complete pipeline for model adaptation, featuring multilingual tokenizer expansion and a fine-tuning framework that supports instruction-based supervised training and adapter merging. It also includes a dedicated de
This project is a fine-tuning framework and training pipeline designed to optimize and adapt large language and vision models. It provides a specialized toolkit for parameter-efficient tuning and supervised learning, serving as both a trainer for multimodal models and a deployment tool for serving fine-tuned models via high-performance inference engines. The framework focuses on reducing memory and compute requirements by updating a small subset of model parameters. It supports a wide range of adaptation strategies, including vision-language model training to align text, image, video, and aud
This project provides a Chinese large language model based on the LLaMA architecture. It is an instruction-tuned model optimized for natural language processing and multi-turn conversations in Chinese. The system includes a framework for parameter-efficient fine-tuning using low-rank adaptation and quantization to reduce memory requirements. It also implements retrieval-augmented generation for local document question answering and supports long-context processing for sequences up to 64K tokens. The project covers a broad set of capabilities including supervised instruction tuning, reinforce
Axolotl is a configuration-driven framework designed for the fine-tuning, evaluation, and quantization of large language models. It functions as a comprehensive orchestrator for distributed training, enabling users to manage complex workflows across multi-node and multi-GPU environments. By utilizing structured configuration files, the platform streamlines the setup of training parameters, dataset paths, and hardware distribution strategies. The project distinguishes itself through its support for diverse training methodologies, including full-parameter tuning, parameter-efficient adaptation,
Oumi is a comprehensive large language model development platform designed for synthesizing data, fine-tuning models, and running performance evaluations. It serves as a unified environment for the entire model lifecycle, encompassing a training and fine-tuning suite, an evaluation framework, and tools for synthetic data generation and model distillation. The platform is distinguished by its iterative, failure-driven synthesis approach, which analyzes model weaknesses during evaluation to generate targeted training data. It utilizes an LLM-based judge framework to programmatically score respo
This project is a generative AI educational resource and natural language processing course. It serves as a technical implementation guide for building, pre-training, and fine-tuning a large language model from scratch using PyTorch. The curriculum provides a step-by-step tutorial on large language model development, focusing specifically on the design of transformer-based text generation models. It includes dedicated instruction on parameter-efficient fine-tuning to optimize training by updating only a small subset of model weights. The material covers the end-to-end generative AI training
LLaMA-Factory is a comprehensive suite for dataset preparation, model fine-tuning, memory optimization, and standardized API deployment. It provides a unified platform for the supervised and reward-based fine-tuning of large language models and vision-language models. The framework includes a specialized toolkit for training vision-language models and a model serving interface that deploys trained models through high-performance APIs. It utilizes precision tuning and quantization techniques to reduce the hardware requirements and memory footprint of large models. The system covers data pipel
Firefly is a training framework and inference engine for large language models. It functions as a toolkit for pre-training and fine-tuning various open-weight architectures, providing a system for model alignment and parameter-efficient fine-tuning. The project includes utilities for merging adapter weights back into base models to create standalone files. It also provides a model alignment toolkit to format training data according to specific prompt templates, ensuring conversational consistency across different models. The framework supports distributed model training and preference-based
PaddleNLP is a development library and toolkit for training, fine-tuning, and deploying large and small language models using the PaddlePaddle framework. It provides a comprehensive suite for the entire natural language processing lifecycle, from model development to high-performance inference. The project features a standardized model zoo for loading and managing pre-trained models and tokenizers through a unified interface. It distinguishes itself with a specialized model compression framework that reduces memory footprints via weight precision conversion and lossless size optimization, alo
FastChat is a training and serving platform for large language models that provides an integrated toolkit for fine-tuning, hosting, and benchmarking chatbots. It functions as an inference server capable of hosting multiple models and exposing them via a standardized API for chat applications. The platform distinguishes itself through a distributed model controller that manages worker nodes and routes requests across a hardware-agnostic inference layer supporting various accelerators. It includes a dedicated evaluation framework for assessing model quality using automated judges, multi-turn di
OpenRLHF is a training framework and alignment library designed for reinforcement learning from human feedback across distributed GPU clusters. It provides tools for aligning large language models and multimodal vision-language models using algorithms such as PPO, GRPO, and DPO. The framework distinguishes itself through a distributed inference engine that overlaps sample rollout with training to increase throughput. It supports scaling to models exceeding 70 billion parameters via parameter sharding and handles long-context sequences through ring-attention sequence parallelism. The project
MiniCPM is a collection of small language models designed for local, on-device deployment in resource-constrained environments. The project focuses on running dense Transformer models on consumer hardware, including GPUs, CPUs, and Apple Silicon, without requiring custom code forks. The project distinguishes itself through heavy optimization for edge hardware, utilizing quantized weight compression in GGUF and MLX formats to reduce memory overhead. It implements advanced inference techniques such as speculative sampling and radix-tree prefix caching to accelerate generation speed and throughp
OpenNMT-py is a PyTorch framework for training and deploying neural machine translation models and large language models. It functions as a distributed model training system, an inference engine, and a toolkit for fine-tuning large language models. The framework distinguishes itself with a dedicated toolkit for adapting large language models through low-rank adaptation, quantization, and instruction tuning. It also includes a neural machine translation server that allows trained models to be hosted and exposed via REST API endpoints. The project covers a broad range
Qwen-7B is a pretrained causal language model designed for natural language generation, text processing, and complex reasoning tasks. It is available as an instruction-tuned model optimized for conversational interactions and a tool-use model capable of executing function calls and interacting with external APIs. The project provides a quantized version of the model to reduce GPU memory usage and supports the development of autonomous agents that can execute code and perform functions to complete complex goals. The system covers a wide range of capabilities including model fine-tuning throug
This project provides an end-to-end framework for adapting large language models to follow user instructions through supervised fine-tuning. It functions as a comprehensive training pipeline that enables the creation of specialized assistant models by minimizing the difference between predicted outputs and target responses within structured instruction datasets. The framework distinguishes itself by integrating synthetic data generation with memory-efficient training techniques. It utilizes powerful language models to iteratively expand small sets of human-written seeds into diverse, high-qua
This project is an alignment framework and suite of pipelines for training language models using supervised fine-tuning and preference optimization. It provides tools for executing large-scale distributed training across multiple GPUs and compute nodes, alongside a system for measuring model helpfulness and dialogue quality through single-turn and multi-turn benchmarks. The framework includes specialized tools for direct preference optimization to refine model behavior using paired data without a separate reward model. It also supports constitutional AI alignment and the training of reward mo
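As a rough illustration of the direct preference optimization idea mentioned in this entry, the sketch below computes the standard DPO loss for a single preference pair from summed log-probabilities. It is not this project's code: the function names, argument names, and the default `beta` are hypothetical, and a real trainer would operate on batched tensors rather than Python floats.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x)) for a Python float."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a response under either
    the policy being trained or the frozen reference model. No separate
    reward model is involved: the implicit reward is the log-ratio between
    policy and reference. `beta` (hypothetical default) scales that ratio.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Raising the chosen response's likelihood relative to the rejected one lowers the loss.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

The loss only depends on how much more the policy prefers the chosen response than the reference does, which is why paired data is sufficient.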
Open-Instruct is a distributed training and instruction tuning framework for large language models. It functions as a coordinator for supervised fine-tuning, reinforcement learning from human feedback pipelines, and tool-use training, providing specialized roles for dataset curation and model alignment. The project distinguishes itself through a high-performance training architecture that utilizes actor-based distributed coordination and hybrid sharding to manage large GPU clusters. It implements advanced alignment techniques including direct preference optimization, group relative policy opt
LitGPT is a training and deployment framework for large language models, providing a suite of tools for pretraining, finetuning, quantizing, evaluating, and serving models within a production environment. It includes a dedicated training pipeline for adapting pretrained models to specific tasks, a quantization tool for reducing weight precision, and an inference server for hosting models via web interfaces. The framework supports high-performance model development through custom architecture implementation and the use of predefined recipes to standardize pretraining and finetuning. It enables
ChatGLM2-6B is an open-weight large language model designed for natural language conversations and text generation in both English and Chinese. It functions as a bilingual chat model capable of processing and maintaining coherence across text sequences up to 32K tokens. The model is optimized for local deployment through precision quantization, which reduces memory requirements to allow execution on consumer-grade hardware. It supports distributing model weights across multiple graphics cards to handle parameters that exceed the memory of a single device. The project covers capabilities for
This library provides a framework for parameter-efficient fine-tuning, enabling the adaptation of large pretrained models by training only a small subset of parameters. It functions as a distributed model training system and optimization toolkit, designed to reduce the computational and memory requirements typically associated with full model fine-tuning. The project distinguishes itself through a suite of methods for modular adapter composition, including low-rank matrix decomposition and activation-based scaling. It supports the integration of multiple task-specific adapter modules, allowin
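To make the low-rank decomposition idea in this entry concrete, here is a minimal pure-Python sketch, assuming the usual formulation in which a frozen weight matrix W is adapted as W + scale * (B @ A). It does not use this library's API; all function names and the example shapes are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, scale=1.0):
    """Return W + scale * (B @ A).

    W is d_out x d_in (frozen), B is d_out x r and A is r x d_in (trainable).
    """
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, rank):
    """Compare parameter counts for full fine-tuning and a rank-r adapter."""
    return {"full": d_out * d_in, "lora": rank * (d_out + d_in)}


# Tiny example: a 2x3 frozen weight with a rank-1 adapter.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [[1.0], [2.0]]
a = [[0.5, 0.0, 1.0]]
merged = lora_effective_weight(w, a, b)

# For a hypothetical 4096 x 4096 layer at rank 8, the adapter is far smaller.
counts = trainable_params(4096, 4096, 8)
```

Because only A and B are trained, several task-specific adapters can share one frozen base weight and be swapped or merged as needed.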
ChatGLM3 is an open-weights large language model designed for bilingual conversational interactions in English and Chinese. It functions as a tool-augmented system capable of calling external functions and executing internal code to resolve complex tasks. The model utilizes four-bit quantization to reduce memory requirements, enabling inference on consumer hardware and diverse processing units including GPUs and CPUs. It features an expanded context window for processing and summarizing long documents and includes a supervised fine-tuning pipeline for adapting the model to specialized domains
Baichuan-7B is an open-source 7 billion parameter bilingual Transformer model designed for text generation and few-shot learning across Chinese and English. It is built on a large Transformer architecture trained on a bilingual corpus, enabling it to produce coherent text in both languages from a single model. The model incorporates several optimization techniques that distinguish it from standard large language models. It uses rotary position embeddings that can extrapolate to longer sequences than seen during training, allowing context extension beyond the original 4096-token training lengt
InternVL is a vision-language model framework that fuses a visual encoder with a large language model to translate image features into textual tokens for reasoning. It provides a system for multimodal inference and dialogue, enabling the processing of images and text to answer questions or generate descriptions. The project is distinguished by its high-resolution image processing, which uses dynamic tiling to maintain detail for images up to 4K resolution, and its chain-of-thought visual reasoning for solving complex mathematical and spatial problems. It also supports temporal frame sampling
ESPnet is a comprehensive speech processing toolkit and PyTorch-based trainer designed for building end-to-end speech recognition, synthesis, and translation models. It provides a structured framework for developing automatic speech recognition systems using transducer and encoder-decoder architectures, alongside engines for text-to-speech synthesis and speech translation pipelines. The project distinguishes itself through a recipe-based workflow execution system that ensures experimental reproducibility by running standardized sequences of scripts for data preparation and model training. It
DeepSpeedExamples is a collection of reference implementations for training and deploying large-scale AI models using the DeepSpeed optimization library. It provides Python code examples for training massive models across multiple GPUs through distributed optimization techniques. The repository includes optimized patterns for deploying and running large language model predictions in production environments. It also serves as a guide for model compression to reduce memory footprints and as a source for performance benchmarks to measure execution speed and resource utilization. The project cov
This project is a quantized fine-tuning framework for large language models. It combines a four-bit quantizer with low-rank adapters to reduce the GPU memory needed for training, enabling fine-tuning on consumer-grade hardware. It further reduces the memory footprint through double quantization and a paged optimizer that offloads states to system RAM. The system supports distributed training across multiple GPUs to handle larger parameter scales and includes utilities for custom dataset
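The memory saving from four-bit quantization can be sketched with a simple block-wise absmax scheme. This is an assumption-laden illustration, not this project's quantizer: it uses plain symmetric integer codes rather than any specialized four-bit data type, and the function names are hypothetical.

```python
def quantize_block(values, bits=4):
    """Quantize a block of floats to signed integer codes with one shared scale.

    Symmetric absmax quantization: codes lie in [-qmax, qmax], where
    qmax = 2**(bits - 1) - 1 (7 for four bits).
    """
    qmax = 2 ** (bits - 1) - 1
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate floats from integer codes and the block scale."""
    return [c * scale for c in codes]


block = [0.7, -0.3, 0.12, 0.0]
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
# Rounding error per value is bounded by half a quantization step.
max_error = max(abs(x - y) for x, y in zip(block, restored))
```

Each weight is stored as a four-bit code plus one scale per block; quantizing those per-block scales a second time is the idea behind double quantization.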
This project is a cross-platform machine learning inference engine designed to execute pre-trained models across diverse operating systems and hardware environments. It functions as a standardized execution framework that manages the entire lifecycle of model inference, from loading and graph optimization to hardware-accelerated execution and generative sequence management. The runtime distinguishes itself through a highly modular architecture that decouples model logic from hardware-specific kernels. By utilizing an execution provider abstraction, it enables developers to offload computation
This project serves as a comprehensive educational resource and technical handbook for engineers building applications powered by large language models. It provides a structured framework for mastering the principles of artificial intelligence engineering, covering the full lifecycle of model development from initial design to production deployment. The repository distinguishes itself by offering a deep dive into the practical implementation of advanced design patterns, including retrieval-augmented generation, agentic tool orchestration, and parameter-efficient model adaptation. It emphasize
This repository is a collection of frameworks and guides for Llama models, functioning as a fine-tuning framework, an inference pipeline, and an AI workflow orchestrator. It provides tools for adapting large language models to specific datasets and domains. The project includes a parameter-efficient fine-tuning toolkit that utilizes techniques like low-rank adaptation to reduce memory and compute requirements. It also serves as an implementation guide for retrieval-augmented generation, combining model inference with external data retrieval to improve response accuracy. The capability surfac
LARK is a development toolkit for training, fine-tuning, and deploying large language models and multimodal models based on PaddlePaddle. It functions as a comprehensive framework that includes an LLM training orchestrator, an inference server, and a multimodal model framework for processing text, image, and video inputs. The project features a retrieval-augmented generation system for building conversational applications that integrate web search and private knowledge bases. It provides specific capabilities for multimodal reasoning and complex logic, enabling the extraction of structured da