What are the best open-source alternatives to Text Generation Inference?

30 open-source projects similar to huggingface/text-generation-inference, ranked by shared features. Top picks: sgl-project/sglang, intel/ipex-llm, kvcache-ai/ktransformers, internlm/lmdeploy, bentoml/openllm, vllm-project/vllm, zhaochenyang20/awesome-ml-sys-tutorial, zai-org/chatglm3, openvinotoolkit/openvino, modeltc/lightllm.

Is sgl-project/sglang a good alternative to Text Generation Inference?

Sglang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains thr…

Is intel/ipex-llm a good alternative to Text Generation Inference?

Intel XPU LLM Acceleration Library is a toolkit designed to accelerate large language model inference and finetuning on Intel CPUs, GPUs, and NPUs. It provides a distributed inference engine for scaling models across multiple accelerators, a multimodal model runtime for vision and speech tasks, and…

Is kvcache-ai/ktransformers a good alternative to Text Generation Inference?

Ktransformers is a comprehensive framework designed for the operation, fine-tuning, and serving of large language models. It functions as a heterogeneous inference engine and quantized execution runtime, enabling the deployment of massive models by distributing computational workloads across both C…

Is internlm/lmdeploy a good alternative to Text Generation Inference?

lmdeploy is a high-performance inference engine and deployment framework for large language models and vision models. It functions as a multi-modal model server and compression toolkit designed to serve models with high throughput and low latency. The system enables the distribution of model servi…

Is bentoml/openllm a good alternative to Text Generation Inference?

OpenLLM is a framework for deploying, managing, and scaling open-source large language models

Is vllm-project/vllm a good alternative to Text Generation Inference?

vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built…

Is zhaochenyang20/awesome-ml-sys-tutorial a good alternative to Text Generation Inference?

This project provides a comprehensive technical guide and framework for engineering large-scale machine learning systems. It covers the full lifecycle of model development, focusing on the infrastructure and computational principles required to build, train, and serve generative AI models across di…

Is zai-org/chatglm3 a good alternative to Text Generation Inference?

ChatGLM3 is a comprehensive framework for deploying, fine-tuning, and serving large language models. It functions as a high-performance inference engine designed to support conversational AI, enabling developers to build interactive agents capable of multi-turn dialogue, autonomous code execution,…

Is openvinotoolkit/openvino a good alternative to Text Generation Inference?

OpenVINO is an AI inference engine and model serving platform designed to execute optimized deep learning models across CPUs, GPUs, and NPUs through a unified API. It includes a model optimization toolkit for converting, quantizing, and compressing models from various frameworks, alongside a specia…

Is modeltc/lightllm a good alternative to Text Generation Inference?

LightLLM is a high-performance serving framework for deploying and executing large language models. It functions as a multi-GPU inference engine and server capable of handling dense architectures, mixture-of-experts designs, and multimodal models that process both text and images. The system is di…

Back to huggingface/text-generation-inference

Open-source alternatives to Text Generation Inference

30 open-source projects similar to huggingface/text-generation-inference, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Text Generation Inference alternative.

sgl-project/sglang
sgl-project/sglang
29,079View on GitHub
Sglang is a high-performance inference engine and serving system designed for large language and multimodal models. It provides a programmable interface for orchestrating complex generation workflows, enabling developers to coordinate multi-turn dialogues, tool invocations, and reasoning chains through a domain-specific language. The platform is built to support production-scale deployments, offering an OpenAI-compatible API that allows for integration with existing application ecosystems. The system distinguishes itself through a disaggregated architecture that separates compute-intensive pr
Pythonattentionblackwellcuda
View on GitHub29,079
intel/ipex-llm
intel/ipex-llm
8,836View on GitHub
Intel XPU LLM Acceleration Library is a toolkit designed to accelerate large language model inference and finetuning on Intel CPUs, GPUs, and NPUs. It provides a distributed inference engine for scaling models across multiple accelerators, a multimodal model runtime for vision and speech tasks, and a low-bit model quantization tool for converting weights into INT4, FP8, and GGUF formats. The project features a parameter-efficient finetuning framework that enables model adaptation using QLoRA and DPO on Intel hardware. It distinguishes itself by providing specialized optimizations for Intel XP
Python
View on GitHub8,836
kvcache-ai/ktransformers
kvcache-ai/ktransformers
17,288View on GitHub
Ktransformers is a comprehensive framework designed for the operation, fine-tuning, and serving of large language models. It functions as a heterogeneous inference engine and quantized execution runtime, enabling the deployment of massive models by distributing computational workloads across both CPU and GPU resources. This architecture allows users to bypass local memory constraints, making it possible to run and train models that exceed the capacity of a single device. The project distinguishes itself through specialized support for sparse architectures, particularly mixture-of-experts mode
Python
View on GitHub17,288

Open-source alternatives to Text Generation Inference

sgl-project/sglang

intel/ipex-llm

kvcache-ai/ktransformers

InternLM/lmdeploy

bentoml/OpenLLM

vllm-project/vllm

zhaochenyang20/Awesome-ML-SYS-Tutorial

zai-org/ChatGLM3

openvinotoolkit/openvino

ModelTC/LightLLM

OpenNMT/CTranslate2

sgl-project/mini-sglang

jmorganca/ollama

vercel/ai

PrefectHQ/fastmcp

vllm-project/vllm-omni

triton-inference-server/server

b4rtaz/distributed-llama

PaddlePaddle/LARK

hiyouga/LLaMA-Factory

replicate/cog

jina-ai/serve

state-spaces/mamba

exo-explore/exo

GeeeekExplorer/nano-vllm

QwenLM/Qwen

NVIDIA-NeMo/NeMo

ggerganov/llama.cpp

NVIDIA/TensorRT-LLM

langchain-ai/deepagents